# PyTorch torch.nn.NLLLoss
The Negative Log-Likelihood Loss (`torch.nn.NLLLoss`) is a fundamental loss function in PyTorch used for training multiclass classification models. Despite its name, it does not actually compute the logarithm itself; rather, it expects log-probabilities as inputs.
This tutorial provides a comprehensive guide to understanding, configuring, and implementing `torch.nn.NLLLoss` in your PyTorch workflows.
---
## Introduction
`torch.nn.NLLLoss` is used to train a classification model with $C$ discrete classes. It measures the mismatch between the model's predicted probability distribution and the ground-truth target labels: the less probability the model assigns to the correct class, the larger the loss.
### The Mathematical Formula
For a single sample, the loss is calculated as:
$$\ell(x, y) = -x_{y}$$
Where:
* $x$ is the input tensor containing **log-probabilities** for each class.
* $y$ is the ground-truth class index (an integer in the range $[0, C-1]$).
* $x_y$ is the predicted log-probability of the correct class.
If the model predicts a high probability for the correct class (close to $1.0$, meaning its log-probability is close to $0$), the loss is low. If the model predicts a low probability (log-probability is a large negative number), the loss is high.
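The formula can be checked by hand. The following is a minimal sketch, using made-up probabilities, that picks out $-x_y$ with plain indexing and compares it with `torch.nn.functional.nll_loss`; the tensor values are illustrative assumptions, not outputs of a real model.

```python
import torch
import torch.nn.functional as F

# Illustrative probabilities for 2 samples over 3 classes (made-up values)
probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])
log_probs = probs.log()
targets = torch.tensor([0, 1])

# Per-sample loss: negate the log-probability found at the target index
manual = -log_probs[torch.arange(len(targets)), targets]
builtin = F.nll_loss(log_probs, targets, reduction="none")

print(manual)   # sample 0 (p = 0.7 for its target) has the smaller loss
print(builtin)  # same values as the manual computation
```

The first sample assigns probability $0.7$ to its target and gets a small loss; the second assigns only $0.1$ and gets a much larger one.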
### The Relationship with Softmax
To obtain the log-probabilities required by `NLLLoss`, you must apply a **LogSoftmax** activation function to your model's raw outputs (logits).
$$\text{LogSoftmax}(z_i) = \log \left( \frac{e^{z_i}}{\sum_{j} e^{z_j}} \right)$$
> **Note:** PyTorch's `nn.CrossEntropyLoss` is mathematically identical to combining `nn.LogSoftmax` and `nn.NLLLoss` into a single step. Using `NLLLoss` is preferred when you want explicit access to the log-probabilities during inference or when designing custom output layers.
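
The equivalence stated in the note can be verified numerically. This is a small sketch with random logits (the seed, shapes, and targets are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)              # 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])

# One step: CrossEntropyLoss takes raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Two steps: LogSoftmax over the class dimension, then NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(ce.item(), nll.item())  # the two values agree up to floating-point error
```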
---
## Syntax and Parameters
### Class Signature
```python
torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
```
### Parameters
| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `weight` | `Tensor` | `None` | A manual rescaling weight given to each class. If provided, must be a 1D Tensor of size $C$. Useful for handling imbalanced datasets. |
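| `ignore_index` | `int` | `-100` | Specifies a target value that is ignored and does not contribute to the input gradient. With `reduction='mean'`, the loss is averaged over the non-ignored targets only. |
| `reduction` | `str` | `'mean'` | Specifies the reduction to apply to the output: `'none'` (no reduction), `'mean'` (weighted mean of the output), or `'sum'` (sum of the output). |
| `size_average`, `reduce` | `bool` | `None` | Deprecated; use `reduction` instead. |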
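
To make `ignore_index` and `reduction` concrete, here is a short sketch with hand-written probabilities. The values and the idea of marking a padded sample with `-100` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

log_probs = torch.log(torch.tensor([[0.5, 0.5],
                                    [0.9, 0.1],
                                    [0.2, 0.8]]))
# The middle sample uses the default ignore_index (-100), e.g. a padding position
targets = torch.tensor([0, -100, 1])

per_sample = nn.NLLLoss(reduction="none")(log_probs, targets)
total = nn.NLLLoss(reduction="sum")(log_probs, targets)
mean = nn.NLLLoss(reduction="mean")(log_probs, targets)

print(per_sample)  # the ignored position contributes 0
print(total)       # sum over the two remaining samples
print(mean)        # total divided by 2 (non-ignored count), not by 3
```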
### Input and Output Shapes
* **Input ($x$):** $(N, C)$ where $N$ is the batch size and $C$ is the number of classes. For higher-dimensional inputs, such as per-pixel losses in 2D image segmentation, the shape is $(N, C, d_1, d_2, ..., d_K)$ with $K \geq 1$.
* **Target ($y$):** $(N)$ where each value is an integer in the range $[0, C-1]$. For higher-dimensional inputs, the shape is $(N, d_1, d_2, ..., d_K)$.
* **Output:** Scalar by default (if `reduction='mean'` or `'sum'`). If `reduction='none'`, the output shape matches the target shape.
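
The shape rules can be seen in a small segmentation-style sketch. The dimensions below (`N=2`, `C=3`, a 4x5 "image") are arbitrary assumptions chosen to keep the example tiny.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, C, H, W = 2, 3, 4, 5
logits = torch.randn(N, C, H, W)
log_probs = F.log_softmax(logits, dim=1)   # normalize over the class dimension
targets = torch.randint(0, C, (N, H, W))   # one class index per pixel

scalar_loss = nn.NLLLoss()(log_probs, targets)
map_loss = nn.NLLLoss(reduction="none")(log_probs, targets)

print(scalar_loss.shape)  # torch.Size([])
print(map_loss.shape)     # torch.Size([2, 4, 5])
```

Note that the class dimension is always dimension 1, so `log_softmax` must be applied along `dim=1` here as well.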
---
## Code Example
Below is a complete, runnable example demonstrating how to use `nn.LogSoftmax` alongside `nn.NLLLoss` in a standard PyTorch training step.
```python
import torch
import torch.nn as nn
# 1. Setup reproducible seed
torch.manual_seed(42)
# 2. Define dimensions
batch_size = 3
num_classes = 5
# 3. Create dummy model outputs (logits) and ground-truth targets
# Logits are raw, unnormalized predictions from a neural network
logits = torch.randn(batch_size, num_classes, requires_grad=True)
targets = torch.tensor([1, 0, 4], dtype=torch.long)
print("Raw Logits:\n", logits)
print("\nTarget Class Indices:\n", targets)
# 4. Apply LogSoftmax to get log-probabilities
log_softmax = nn.LogSoftmax(dim=1)
log_probs = log_softmax(logits)
print("\nLog-Probabilities (Sum of exp(log_probs) along dim 1 will equal 1.0):\n", log_probs)
# 5. Initialize NLLLoss
# We will use 'mean' reduction (default)
criterion = nn.NLLLoss()
# 6. Compute Loss
loss = criterion(log_probs, targets)
print(f"\nComputed NLLLoss: {loss.item():.4f}")
# 7. Backward pass demonstration
loss.backward()
print("\nGradient of logits (first row):\n", logits.grad[0])
```
### Explanation of the Output Calculation
Suppose the `log_probs` tensor generated in the code held the following values at the target positions (illustrative numbers, not the actual output):
* For batch index `0`, the target is `1`. Let's say `log_probs[0, 1]` is $-1.20$.
* For batch index `1`, the target is `0`. Let's say `log_probs[1, 0]` is $-2.10$.
* For batch index `2`, the target is `4`. Let's say `log_probs[2, 4]` is $-0.50$.
The individual losses are the negated values: $1.20$, $2.10$, and $0.50$.
The mean loss is: $\frac{1.20 + 2.10 + 0.50}{3} = \frac{3.80}{3} \approx 1.27$.
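
The same calculation can be reproduced in code. The sketch below first averages the illustrative values from the list, then repeats the procedure on real log-probabilities using `gather` and compares the result with `nll_loss`; the variable names are chosen for this example only.

```python
import torch
import torch.nn.functional as F

# Part 1: the illustrative values from the text
illustrative = torch.tensor([-1.20, -2.10, -0.50])
illustrative_mean = (-illustrative).mean()
print(f"{illustrative_mean.item():.4f}")  # 1.2667

# Part 2: the same procedure on actual log-probabilities
torch.manual_seed(42)
logits = torch.randn(3, 5)
targets = torch.tensor([1, 0, 4])
log_probs = F.log_softmax(logits, dim=1)

# gather picks log_probs[i, targets[i]] for every row i
picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
manual_mean = (-picked).mean()
builtin = F.nll_loss(log_probs, targets)
print(torch.allclose(manual_mean, builtin))
```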
---
## Best Practices and Common Pitfalls
### 1. Forgetting the LogSoftmax Layer
The most common mistake when using `nn.NLLLoss` is passing raw logits directly into the loss function without applying `nn.LogSoftmax` first.
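* **The Symptom:** No error is raised, because `NLLLoss` does not check that its inputs are valid log-probabilities. The loss value can become negative, and the model will typically fail to converge.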
* **The Fix:** Always ensure your model's forward pass ends with `nn.LogSoftmax(dim=1)` (or apply it explicitly before passing outputs to the loss function). If you prefer to output raw logits from your model, use `nn.CrossEntropyLoss` instead.
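
The symptom is easy to reproduce. In this sketch the logits are hand-picked so that the arithmetic is obvious; they are illustrative values, not model outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[4.0, 1.0, -2.0],
                       [0.5, 3.0, 0.0]])
targets = torch.tensor([0, 1])
criterion = nn.NLLLoss()

# Wrong: raw logits are accepted silently; the "loss" is just -(4.0 + 3.0) / 2
wrong = criterion(logits, targets)

# Right: convert logits to log-probabilities first
right = criterion(F.log_softmax(logits, dim=1), targets)

print(wrong.item())  # -3.5
print(right.item())  # a small positive value
```

A negative loss from `NLLLoss` (without class weights that are themselves negative) is a strong hint that the inputs are not log-probabilities.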
### 2. Target Tensor Data Type
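PyTorch expects target tensors for classification losses to be of type `torch.long` (64-bit signed integers). Passing float targets or 32-bit integers (`torch.int32`) typically results in a runtime error similar to the following (the exact message depends on the PyTorch version and on the offending dtype):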
```text
RuntimeError: expected scalar type Long but found Float
```
Always cast your targets using `.long()` or specify `dtype=torch.long` during tensor creation.
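
A minimal sketch of the cast, assuming the labels arrive as floats (for example, after being loaded from a float array):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(3, 5), dim=1)

# Hypothetical labels that arrived with a float dtype
float_targets = torch.tensor([1.0, 0.0, 4.0])

# Cast to int64 before computing the loss
targets = float_targets.long()
loss = nn.NLLLoss()(log_probs, targets)

print(targets.dtype)  # torch.int64
print(loss.item())
```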
### 3. Handling Class Imbalance with `weight`
When working with highly imbalanced datasets (e.g., 90% Class A, 10% Class B), the model might learn to always predict the majority class. You can mitigate this by passing a 1D tensor of weights to `NLLLoss`:
```python
# Assign higher weight to underrepresented classes
weights = torch.tensor([0.1, 0.9])
criterion = nn.NLLLoss(weight=weights)
```
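
With `reduction='mean'`, the weighted loss is normalized by the sum of the weights of the targets in the batch, not by the batch size. The sketch below checks this by hand; the probabilities and weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

log_probs = torch.log(torch.tensor([[0.8, 0.2],
                                    [0.6, 0.4],
                                    [0.3, 0.7]]))
targets = torch.tensor([0, 0, 1])
weights = torch.tensor([0.1, 0.9])

weighted = nn.NLLLoss(weight=weights)(log_probs, targets)

# Manual version: weight each sample by the weight of its target class,
# then divide by the sum of those weights
w = weights[targets]
picked = log_probs[torch.arange(len(targets)), targets]
manual = -(w * picked).sum() / w.sum()

print(weighted.item(), manual.item())  # the two values agree
```

Because of this normalization, scaling all weights by a constant factor leaves the mean-reduced loss unchanged; only the ratio between class weights matters.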