PyTorch torch.nn.ReLU
[PyTorch torch.nn Reference Manual](#)

* * *

`torch.nn.ReLU` is one of the most commonly used activation functions in PyTorch. It applies the Rectified Linear Unit operation element-wise to the input tensor.

ReLU is one of the most successful activation functions in deep learning because it is cheap to compute, tends to converge quickly, and helps mitigate the vanishing gradient problem: its gradient is 1 for every positive input.

### Function Definition

```python
torch.nn.ReLU(inplace=False)
```

**Parameter Description:**

* `inplace` (bool): If `True`, the input tensor is modified in place, which saves memory. Default is `False`.

### Mathematical Principle

`nn.ReLU` computes the following formula:

```
f(x) = max(0, x)
```

That is, each input element greater than 0 passes through unchanged, and each element less than or equal to 0 produces an output of 0.

This simple non-linear transformation lets the network learn complex patterns while keeping gradients flowing through the active (positive) units.

* * *

## Usage Examples

### Example 1: Basic Usage

Create a ReLU activation layer and apply it to an input tensor:

```python
import torch
import torch.nn as nn

# Create a ReLU activation layer
relu = nn.ReLU()

# Create an input tensor containing negative values
input_tensor = torch.tensor([[-1.0, 2.0, -3.0], [4.0, -5.0, 6.0]])

# Forward pass
output = relu(input_tensor)

print("Input:\n", input_tensor)
print("Output:\n", output)
```

Output result:

```
Input:
 tensor([[-1.,  2., -3.],
        [ 4., -5.,  6.]])
Output:
 tensor([[0., 2., 0.],
        [4., 0., 6.]])
```

As you can see, all negative values are set to 0, and positive values remain unchanged.

### Example 2: In-place Operation

Setting the `inplace` parameter to `True` can save memory:

```python
import torch
import torch.nn as nn

# Create a ReLU layer with in-place operation enabled
relu_inplace = nn.ReLU(inplace=True)

# Create the input tensor
input_tensor = torch.randn(2, 4)
original_id = id(input_tensor)

# In-place modification
output = relu_inplace(input_tensor)

# Check whether the output is the same object as the (now modified) input
print("Whether the Original Tensor Was Modified:", id(output) == original_id)
print("Input/Output:\n", input_tensor)
```

Output result (the values vary from run to run because the input is random):

```
Whether the Original Tensor Was Modified: True
Input/Output:
 tensor([[0.0000, 0.0000, 1.2345, 0.0000],
        [0.5432, 0.0000, 0.0000, 2.3456]])
```

> Note: In-place operations save memory, but they overwrite the input, which can interfere with gradient computation when the original values are still needed. When in doubt, or when intermediate activation values need to be preserved, use the default `inplace=False`.

### Example 3: Using in a Neural Network

In convolutional neural networks, ReLU usually follows the convolutional layer:

```python
import torch
import torch.nn as nn


# Define a simple convolutional neural network
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Convolutional layer: 3 input channels, 32 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # ReLU activation
        self.relu1 = nn.ReLU()
        # Second convolutional layer
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        # Pooling layer
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool(x)
        return x


# Create the model
model = SimpleCNN()

# Print the model structure
print("Model Structure:")
print(model)

# Test the forward pass with a simulated 3x32x32 color image
input_image = torch.randn(1, 3, 32, 32)
output = model(input_image)

print("\nInput Shape:", input_image.shape)
print("Output Shape:", output.shape)
```

Output result:

```
Model Structure:
SimpleCNN(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Input Shape: torch.Size([1, 3, 32, 32])
Output Shape: torch.Size([1, 64, 8, 8])
```

### Example 4: Using nn.functional.relu

PyTorch also provides a functional interface, `torch.nn.functional.relu`:

```python
import torch
import torch.nn.functional as F

input_tensor = torch.randn(1, 5, 5)

# Method 1: functional interface (returns a new tensor)
output1 = F.relu(input_tensor)

# Method 2: in-place version of the function (modifies input_tensor directly)
F.relu_(input_tensor)

print("Output:\n", output1)
```

Differences between the interfaces:

* `nn.ReLU` is a module class. It has no learnable parameters, but it can be registered as a layer in a model definition and appears in the printed model structure.
* `F.relu` is a functional interface, commonly called directly inside the `forward` method when no module object is needed.
* `F.relu_` is the in-place version of the function, indicated by the trailing underscore.

* * *

## Comparison with Other Activation Functions

| **Activation Function** | **Formula** | **Characteristics** | **Applicable Scenarios** |
| --- | --- | --- | --- |
| `nn.ReLU` | max(0, x) | Computationally simple, sparse activation | General deep learning (default choice) |
| `nn.LeakyReLU` | x if x > 0, else 0.01x (default slope) | Small gradient on the negative axis | Preventing "dead neurons" |
| `nn.GELU` | x * Φ(x) | Smooth, common default in Transformers | Transformer, BERT, etc. |
| `nn.Sigmoid` | 1/(1+e^(-x)) | Output in (0, 1), gradient saturation | Output layer, binary classification |
| `nn.Tanh` | (e^x - e^(-x))/(e^x + e^(-x)) | Output in (-1, 1), zero-centered | RNN, LSTM |

* * *

## Common Questions

### Q1: Why does ReLU cause the "dead neurons" problem?

When a neuron's input is consistently negative, its ReLU output is always 0 and its gradient is also 0, so the weights feeding that neuron stop being updated and the neuron no longer learns.

Solutions:

* Use `nn.LeakyReLU` or `nn.ELU` instead
* Use a smaller initial learning rate
* Apply Batch Normalization

### Q2: Is ReLU suitable for the output layer?

Usually not. The output layer more commonly uses `Sigmoid` or `Softmax` (classification) or an identity function (regression), because the output of ReLU is unbounded above and can never be negative.

* * *

## Usage Scenarios

`nn.ReLU` is the most commonly used activation function in deep learning. Its main application scenarios include:

* **Convolutional Neural Networks**: Introducing non-linearity after convolutional or fully connected layers.
* **Multilayer Perceptrons**: As the activation function for hidden layers.
* **Transformers**: As the activation function in the FFN (Feed-Forward Network), although GELU is more commonly used now.
* **Generative Adversarial Networks**: As the activation function for the generator.

> Tip: Unless there are special requirements, ReLU is the preferred choice for hidden layer activation functions when designing neural networks.
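
The "dead neurons" behavior discussed in Q1 follows directly from the derivative of max(0, x). The following is a minimal, framework-free sketch in plain Python, not PyTorch's actual implementation. The helper names (`relu`, `relu_grad`, `leaky_relu`, `leaky_relu_grad`) are hypothetical and exist only for illustration. The derivative at exactly 0 is taken as 0 here, which is an assumed convention, and the slope of 0.01 mirrors the LeakyReLU value shown in the comparison table.

```python
def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU; taken as 0 at x == 0 (assumed convention)."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: x for positive inputs, negative_slope * x otherwise."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    """Derivative of Leaky ReLU: never exactly zero."""
    return 1.0 if x > 0 else negative_slope


# Hypothetical pre-activation values for four neurons
pre_activations = [-3.0, -0.5, 0.0, 2.0]

relu_outputs = [relu(x) for x in pre_activations]
relu_grads = [relu_grad(x) for x in pre_activations]
leaky_grads = [leaky_relu_grad(x) for x in pre_activations]

print(relu_outputs)  # [0.0, 0.0, 0.0, 2.0]
print(relu_grads)    # [0.0, 0.0, 0.0, 1.0]
print(leaky_grads)   # [0.01, 0.01, 0.01, 1.0]
```

A neuron whose pre-activation stays non-positive receives a gradient of 0 under ReLU, so its weights stop changing. Under Leaky ReLU the same neuron still receives a small gradient, which is why it is listed as a remedy.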