PyTorch torch.nn.NLLLoss

The Negative Log-Likelihood Loss (`torch.nn.NLLLoss`) is a fundamental loss function in PyTorch for training multiclass classification models. Despite its name, it does not compute a logarithm itself; it expects log-probabilities as input. This tutorial explains how to understand, configure, and use `torch.nn.NLLLoss` in your PyTorch workflows.

---

## Introduction

`torch.nn.NLLLoss` is used to train a classification model with $C$ discrete classes. It penalizes the model according to how little log-probability it assigns to the ground-truth class of each sample.

### The Mathematical Formula

For a single sample, the loss is calculated as:

$$\ell(x, y) = -x_{y}$$

Where:

* $x$ is the input tensor containing **log-probabilities** for each class.
* $y$ is the ground-truth class index (an integer in the range $[0, C-1]$).
* $x_y$ is the predicted log-probability of the correct class.

If the model predicts a high probability for the correct class (close to $1.0$, so its log-probability is close to $0$), the loss is low. If the model predicts a low probability (the log-probability is a large negative number), the loss is high.

### The Relationship with Softmax

To obtain the log-probabilities required by `NLLLoss`, apply a **LogSoftmax** activation function to your model's raw outputs (logits):

$$\text{LogSoftmax}(z_i) = \log \left( \frac{e^{z_i}}{\sum_{j} e^{z_j}} \right)$$

> **Note:** PyTorch's `nn.CrossEntropyLoss` is mathematically equivalent to applying `nn.LogSoftmax` followed by `nn.NLLLoss`, combined into a single step. Using `NLLLoss` is preferred when you want explicit access to the log-probabilities during inference or when designing custom output layers.

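
The equivalence stated in the note, and the per-sample formula $\ell(x, y) = -x_y$, can be checked numerically. The following is a minimal sketch; the tensor shapes, seed, and variable names are illustrative assumptions and not part of the original tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative data: 4 samples, 3 classes (values are arbitrary).
logits = torch.randn(4, 3)
targets = torch.tensor([2, 0, 1, 2])

# Route 1: explicit LogSoftmax followed by NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

# Route 2: CrossEntropyLoss applied directly to the raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Route 3: the per-sample formula -x_y, averaged by hand.
manual = -log_probs[torch.arange(len(targets)), targets].mean()

# All three routes should agree up to floating-point tolerance.
print(torch.allclose(nll, ce), torch.allclose(nll, manual))
```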

---

## Syntax and Parameters

### Class Signature

```python
torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
```

### Parameters

| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `weight` | `Tensor` | `None` | A manual rescaling weight given to each class. If provided, must be a 1D Tensor of size $C$. Useful for handling imbalanced datasets. |
| `ignore_index` | `int` | `-100` | Specifies a target value that is ignored and does not contribute to the input gradient. |
| `reduction` | `str` | `'mean'` | Specifies the reduction to apply to the output: `'none'` (no reduction), `'mean'` (weighted mean of the output), or `'sum'` (sum of the output). |

The `size_average` and `reduce` arguments in the signature are deprecated; use `reduction` instead.

### Input and Output Shapes

* **Input ($x$):** $(N, C)$ where $N$ is the batch size and $C$ is the number of classes. (For higher-dimensional inputs, such as per-pixel losses in image segmentation, the shape is $(N, C, d_1, d_2, ..., d_K)$.)
* **Target ($y$):** $(N)$ where each value is an integer in the range $[0, C-1]$. (For higher-dimensional inputs, the shape is $(N, d_1, d_2, ..., d_K)$.)
* **Output:** Scalar by default (if `reduction='mean'` or `'sum'`). If `reduction='none'`, the output shape matches the target shape.

---

## Code Example

Below is a complete, runnable example demonstrating how to use `nn.LogSoftmax` alongside `nn.NLLLoss` in a standard PyTorch training step.

```python
import torch
import torch.nn as nn

# 1. Set a seed for reproducibility
torch.manual_seed(42)

# 2. Define dimensions
batch_size = 3
num_classes = 5

# 3. Create dummy model outputs (logits) and ground-truth targets
# Logits are raw, unnormalized predictions from a neural network
logits = torch.randn(batch_size, num_classes, requires_grad=True)
targets = torch.tensor([1, 0, 4], dtype=torch.long)

print("Raw Logits:\n", logits)
print("\nTarget Class Indices:\n", targets)

# 4. Apply LogSoftmax to get log-probabilities
log_softmax = nn.LogSoftmax(dim=1)
log_probs = log_softmax(logits)

print("\nLog-Probabilities (exp(log_probs) sums to 1.0 along dim 1):\n", log_probs)

# 5. Initialize NLLLoss
# We use the default 'mean' reduction
criterion = nn.NLLLoss()

# 6. Compute the loss
loss = criterion(log_probs, targets)
print(f"\nComputed NLLLoss: {loss.item():.4f}")

# 7. Backward pass demonstration
loss.backward()
print("\nGradient of the loss with respect to the logits:\n", logits.grad)
```

### Explanation of the Output Calculation

The loss picks out one entry per row of the `log_probs` tensor, namely the entry at the target index. Using hypothetical values for illustration:

* For batch index `0`, the target is `1`. Suppose `log_probs[0, 1]` is $-1.20$.
* For batch index `1`, the target is `0`. Suppose `log_probs[1, 0]` is $-2.10$.
* For batch index `2`, the target is `4`. Suppose `log_probs[2, 4]` is $-0.50$.

The individual losses are $1.20$, $2.10$, and $0.50$. The mean loss is:

$$\frac{1.20 + 2.10 + 0.50}{3} = \frac{3.80}{3} \approx 1.27$$

---

## Best Practices and Common Pitfalls

### 1. Forgetting the LogSoftmax Layer

The most common mistake when using `nn.NLLLoss` is passing raw logits directly into the loss function without applying `nn.LogSoftmax` first.

* **The Symptom:** The loss value can become negative, or the model fails to converge.
* **The Fix:** Ensure your model's forward pass ends with `nn.LogSoftmax(dim=1)` (or apply it explicitly before passing outputs to the loss function). If you prefer to output raw logits from your model, use `nn.CrossEntropyLoss` instead.

### 2. Target Tensor Data Type

PyTorch expects target tensors for classification losses to be of type `torch.long` (64-bit signed integers). Passing float targets or 32-bit integers (`torch.int32`) typically raises a runtime error such as the following (the exact wording depends on the PyTorch version and the offending dtype):

```text
RuntimeError: expected scalar type Long but found Float
```

Always cast your targets using `.long()` or specify `dtype=torch.long` during tensor creation.

### 3. Handling Class Imbalance with `weight`

When working with highly imbalanced datasets (e.g., 90% Class A, 10% Class B), the model might learn to always predict the majority class. You can mitigate this by passing a 1D tensor of per-class weights to `NLLLoss`:

```python
# Assign higher weight to underrepresented classes
weights = torch.tensor([0.1, 0.9])
criterion = nn.NLLLoss(weight=weights)
```
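
To make the effect of `weight` and `ignore_index` on the default `'mean'` reduction concrete, here is a small sketch. The probabilities, targets, and variable names are made-up illustrative values, not taken from the tutorial. With class weights, the mean is a weighted mean: each sample's loss is scaled by the weight of its target class and the total is divided by the sum of those weights rather than by the batch size. Samples whose target equals `ignore_index` are left out entirely.

```python
import torch
import torch.nn as nn

# Hand-written probabilities for 3 samples and 2 classes (illustrative values).
log_probs = torch.log(torch.tensor([[0.8, 0.2],
                                    [0.6, 0.4],
                                    [0.3, 0.7]]))
targets = torch.tensor([0, 0, 1])
weights = torch.tensor([0.1, 0.9])

# Weighted loss as computed by PyTorch.
weighted = nn.NLLLoss(weight=weights)(log_probs, targets)

# Manual check: scale each sample's loss by its target class weight,
# then divide by the sum of those weights (not by the batch size).
per_sample = -log_probs[torch.arange(3), targets]
w = weights[targets]
manual_weighted = (w * per_sample).sum() / w.sum()

# ignore_index: the second sample's target is -100 (the default ignore_index),
# so only samples 0 and 2 contribute to the mean.
masked_targets = torch.tensor([0, -100, 1])
masked = nn.NLLLoss()(log_probs, masked_targets)
manual_masked = per_sample[[0, 2]].mean()

print(torch.allclose(weighted, manual_weighted))
print(torch.allclose(masked, manual_masked))
```

A common use of `ignore_index` is masking padding positions in sequence models, where padded targets are set to the ignored value so they affect neither the loss nor the gradients.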
