PyTorch Loss Functions

Loss functions measure the gap between model predictions and ground-truth values and guide neural network training: the optimizer updates the model parameters to minimize the loss. PyTorch provides over a dozen common loss functions in the `torch.nn` module, covering major task types such as classification, regression, and ranking.

* * *

## 1. Loss Function Basics

### Basic Usage

All PyTorch loss functions are subclasses of `nn.Module` and share a unified usage pattern:

```python
import torch
import torch.nn as nn

# Example data: raw logits for 4 samples and 3 classes, plus integer class labels
predictions = torch.randn(4, 3, requires_grad=True)
targets = torch.tensor([0, 2, 1, 0])

# 1. Instantiate the loss function
criterion = nn.CrossEntropyLoss()

# 2. Compute the loss (predictions first, targets second)
loss = criterion(predictions, targets)

# 3. Backpropagation (populates predictions.grad)
loss.backward()
```

### Input Shape Conventions

Different loss functions expect different input shapes, and this is where beginners most often make mistakes:

| Loss Function | Prediction (input) Shape | Label (target) Shape |
| --- | --- | --- |
| `CrossEntropyLoss` | `(N, C)` raw logits | `(N,)` integer class indices (`torch.long`), or `(N, C)` class probabilities |
| `BCELoss` | `(N,)` probabilities after Sigmoid | `(N,)` 0/1 floats |
| `BCEWithLogitsLoss` | `(N,)` raw logits | `(N,)` 0/1 floats |
| `MSELoss` | `(N,)` any real numbers | `(N,)` any real numbers |
| `NLLLoss` | `(N, C)` log-probabilities from `log_softmax` | `(N,)` integer class indices (`torch.long`) |

> **N** = batch size, **C** = number of classes. For `BCELoss`, `BCEWithLogitsLoss`, and `MSELoss`, any shape works as long as the target has the same shape as the input (e.g., `(N, num_labels)` for multi-label tasks).

* * *

## 2. Classification Task Loss Functions

### 2.1 CrossEntropyLoss

The most commonly used loss for multi-class classification. **It applies Softmax, Log, and negation internally** (i.e., `log_softmax` followed by the negative log-likelihood), so do not apply Softmax to the model output yourself.

**Mathematical Formula:** `Loss = -sum_c(y_c * log(p_c))`

where `p_c = exp(x_c) / sum_j exp(x_j)` is the Softmax output and `y_c` is the one-hot (or soft) label. For a hard label this reduces to `-log(p_target)`, and by default the loss is averaged over the batch.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Model output: raw logits, shape (batch_size, num_classes)
# No need to apply Softmax beforehand!
predictions = torch.tensor([
    [2.0, 0.5, 0.3],  # Sample 1, most likely class 0
    [0.1, 3.0, 0.2],  # Sample 2, most likely class 1
    [0.2, 0.1, 4.0],  # Sample 3, most likely class 2
])

# Labels: integer class indices, shape (batch_size,)
targets = torch.tensor([0, 1, 2])

loss = criterion(predictions, targets)
print(f"Loss: {loss.item():.4f}")  # Loss: 0.1640
```

**Label smoothing and soft labels** (both require PyTorch 1.10 or later):

```python
import torch
import torch.nn as nn

# Label smoothing mitigates overfitting; commonly used in image classification competitions
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Targets can also be soft labels (class probabilities), shape (batch_size, num_classes)
soft_targets = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.05, 0.9, 0.05],
])
predictions = torch.randn(2, 3)
loss = criterion(predictions, soft_targets)
```

> **Applicable Scenarios:** single-label multi-class classification (e.g., cat/dog/bird), such as image or text classification. For multi-label tasks, use `BCEWithLogitsLoss` instead.

* * *

### 2.2 BCELoss (Binary Cross-Entropy Loss)

Designed for **binary classification** or **multi-label classification**. The input must be probabilities in (0, 1), i.e., the output of `Sigmoid`.

**Mathematical Formula:** `Loss = -[y * log(p) + (1-y) * log(1-p)]`, averaged over all elements by default.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Model output must be passed through Sigmoid first, value range (0, 1)
raw_output = torch.tensor([2.0, -1.0, 0.5, -3.0])
predictions = torch.sigmoid(raw_output)  # [0.88, 0.27, 0.62, 0.05]

# Labels: float 0.0 or 1.0
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss = criterion(predictions, targets)
print(f"Loss: {loss.item():.4f}")  # Loss: 0.2407

# Multi-label classification (each sample can belong to several classes)
# predictions shape: (batch_size, num_labels)
predictions_ml = torch.sigmoid(torch.randn(4, 5))
targets_ml = torch.randint(0, 2, (4, 5)).float()
loss_ml = criterion(predictions_ml, targets_ml)
```

> `BCELoss` expects inputs in the (0, 1) range. Passing raw logits is a mistake: values outside [0, 1] raise an error, and probabilities that saturate at 0 or 1 are numerically unstable (PyTorch clamps the log terms at -100 to avoid infinities).
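To make the instability concrete, here is a minimal sketch with one illustrative, confidently wrong prediction (logit `30.0`, label `0`). In float32, `torch.sigmoid(30.0)` rounds to exactly `1.0`, so `BCELoss` evaluates `log(0)` and falls back to its clamped value, while `BCEWithLogitsLoss` works on the logit directly and returns the true loss, `softplus(30)`, which is about 30.

```python
import torch
import torch.nn as nn

# Illustrative values: a large positive logit whose true label is 0
logit = torch.tensor([30.0])
target = torch.tensor([0.0])

# sigmoid(30) rounds to exactly 1.0 in float32, so log(1 - p) = log(0).
# BCELoss clamps its log outputs at -100, so the loss is capped at 100.
bce = nn.BCELoss()(torch.sigmoid(logit), target)

# BCEWithLogitsLoss never forms the saturated probability and
# returns the exact value softplus(30), which is about 30.
bce_logits = nn.BCEWithLogitsLoss()(logit, target)

print(f"BCELoss:           {bce.item():.4f}")         # 100.0000
print(f"BCEWithLogitsLoss: {bce_logits.item():.4f}")  # 30.0000
```

The clamped value is unrelated to the actual logit: any logit large enough to saturate the Sigmoid produces the same 100.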
> For this reason, prefer `BCEWithLogitsLoss`, described next.

* * *

### 2.3 BCEWithLogitsLoss

A fused version of `Sigmoid` + `BCELoss`. **It applies Sigmoid internally** and uses the log-sum-exp trick, so it is more numerically stable; prefer it over `BCELoss`.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# Pass raw logits directly, no need to apply Sigmoid manually
predictions = torch.tensor([2.0, -1.0, 0.5, -3.0])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss = criterion(predictions, targets)
print(f"Loss: {loss.item():.4f}")  # Loss: 0.2407

# Mathematically equivalent to (but more numerically stable than):
# loss = nn.BCELoss()(torch.sigmoid(predictions), targets)
```

**With positive-sample weights (handling class imbalance):**

```python
import torch
import torch.nn as nn

# pos_weight > 1 increases the weight of positive samples in the loss.
# E.g., if negatives outnumber positives 10 to 1, set pos_weight=10.
# For multi-label tasks, pass one weight per label: shape (num_labels,)
pos_weight = torch.tensor([10.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

> **Applicable Scenarios:** binary classification (e.g., spam detection), multi-label classification (e.g., tagging an article with several labels), and foreground/background classification in object detection.

* * *

### 2.4 NLLLoss (Negative Log-Likelihood Loss)

Expects log-probabilities, so you must apply `log_softmax` to the model output yourself; this gives more flexibility. `CrossEntropyLoss = LogSoftmax + NLLLoss`.

```python
import torch
import torch.nn as nn

criterion = nn.NLLLoss()

# log_softmax must be applied manually first
raw_output = torch.randn(4, 3)  # (batch, num_classes)
log_probs = torch.log_softmax(raw_output, dim=1)
targets = torch.tensor([0, 2, 1, 0])

loss = criterion(log_probs, targets)

# Same value as CrossEntropyLoss applied to the raw logits
ce_loss = nn.CrossEntropyLoss()(raw_output, targets)
print(torch.allclose(loss, ce_loss))  # True
```

> **Use Cases:** when you need the log-probabilities themselves in other steps (e.g., CTC or beam search decoding). Otherwise, prefer `CrossEntropyLoss`.

* * *

## 3. Regression Task Loss Functions

### 3.1 MSELoss (Mean Squared Error)

The most common regression loss. It is **highly sensitive to large errors**, because squaring amplifies their impact.
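A quick illustration of this sensitivity, using made-up predictions against all-zero targets: both prediction sets below have the same total absolute error (4.0), but concentrating it in a single sample quadruples the MSE.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
targets = torch.zeros(4)

# Both prediction sets have a total absolute error of 4.0
spread_out = torch.tensor([1.0, 1.0, 1.0, 1.0])   # four errors of 1
one_outlier = torch.tensor([0.0, 0.0, 0.0, 4.0])  # a single error of 4

mse_spread = criterion(spread_out, targets)
mse_outlier = criterion(one_outlier, targets)

print(f"Spread-out errors: {mse_spread.item():.4f}")   # 1.0000 = (4 * 1^2) / 4
print(f"One large error:   {mse_outlier.item():.4f}")  # 4.0000 = 4^2 / 4
```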
**Mathematical Formula:** `MSELoss = (1/N) * sum((y_i - y_hat_i)^2)`

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

predictions = torch.tensor([2.5, 0.5, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

loss = criterion(predictions, targets)
print(f"MSE Loss: {loss.item():.4f}")  # MSE Loss: 0.5625

# Manual verification
manual = ((predictions - targets) ** 2).mean()
print(f"Manual calculation: {manual.item():.4f}")  # 0.5625
```

> **Applicable Scenarios:** continuous-value regression such as house price or temperature prediction. Works well when the data has no obvious outliers.

* * *

### 3.2 L1Loss (Mean Absolute Error)

**More robust to outliers**: it takes the absolute value instead of squaring, so large errors are not disproportionately amplified.

**Mathematical Formula:** `L1Loss = (1/N) * sum(|y_i - y_hat_i|)`

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()

predictions = torch.tensor([2.5, 0.5, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

loss = criterion(predictions, targets)
print(f"L1 Loss: {loss.item():.4f}")  # L1 Loss: 0.6250
```

* * *

### 3.3 SmoothL1Loss (Huber Loss)

**Combines the advantages of MSE and L1**: it behaves like MSE for small errors (smooth, stable gradients) and like L1 for large errors (robust to outliers). It is commonly used for bounding-box regression in object detection (e.g., Faster R-CNN). With the default `beta=1.0`, it matches `nn.HuberLoss` with `delta=1.0`.

**Mathematical Formula:** `SmoothL1(x) = 0.5 * x^2 if |x| < 1, else |x| - 0.5`, where `x = y_i - y_hat_i`, averaged over all elements.

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss()

predictions = torch.tensor([2.5, 0.5, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.
```
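The piecewise formula can be cross-checked against the built-in modules. This is a sketch with its own illustrative tensors (errors of 0, 0.5, and 3.0, so both branches are exercised); `smooth_l1_manual` is a hypothetical helper written for this example, not a PyTorch API, and its `beta` threshold generalizes the `1` in the formula above the same way PyTorch's `beta` argument does.

```python
import torch
import torch.nn as nn

def smooth_l1_manual(pred, target, beta=1.0):
    # Hypothetical helper: quadratic branch for |x| < beta, linear branch otherwise
    diff = (pred - target).abs()
    per_elem = torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return per_elem.mean()  # mean reduction, like the built-in default

pred = torch.tensor([0.0, 0.5, 3.0])
target = torch.zeros(3)

builtin = nn.SmoothL1Loss()(pred, target)       # default beta=1.0
manual = smooth_l1_manual(pred, target)
huber = nn.HuberLoss(delta=1.0)(pred, target)   # identical when delta = beta = 1

print(f"SmoothL1Loss: {builtin.item():.4f}")  # 0.8750
print(f"Manual:       {manual.item():.4f}")   # 0.8750
print(f"HuberLoss:    {huber.item():.4f}")    # 0.8750
```

The per-element losses are 0, 0.125, and 2.5, giving a mean of 0.875. For other thresholds the two built-ins are related by `HuberLoss(delta=b) = b * SmoothL1Loss(beta=b)`.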
