
# PyTorch torch.nn.Conv2d

[![Image 1: PyTorch torch.nn Reference Manual](https://example.com/images/up.gif) PyTorch torch.nn Reference Manual](https://example.com/pytorch/pytorch-torch-nn-ref.html)

* * *

`torch.nn.Conv2d` is the PyTorch module for two-dimensional convolution and a core component of Convolutional Neural Networks (CNNs). It extracts spatial features by sliding learnable convolutional kernels over the input tensor, and is widely used in image processing and computer vision tasks.

### Function Definition

```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
```

**Parameter Description:**

* `in_channels` (int): Number of input channels. For example, RGB images have 3 channels.
* `out_channels` (int): Number of output channels, i.e., the number of convolutional kernels.
* `kernel_size` (int or tuple): Size of the convolutional kernel. Can be an integer (square kernel) or a tuple (height, width).
* `stride` (int or tuple): Step size with which the kernel moves over the input. Default is 1.
* `padding` (int, tuple, or str): Padding applied to the input edges, given as a size or as `'valid'` / `'same'`. Default is 0.
* `dilation` (int or tuple): Spacing between kernel elements. Default is 1 (standard convolution).
* `groups` (int): Number of groups for grouped convolution. Both `in_channels` and `out_channels` must be divisible by `groups`. Default is 1 (standard convolution).
* `bias` (bool): Whether to add a learnable bias term. Default is `True`.
* `padding_mode` (str): Padding mode. Options are `'zeros'`, `'reflect'`, `'replicate'`, `'circular'`. Default is `'zeros'`.

**Attributes:**

* `weight` (Tensor): Learnable weights with shape (out_channels, in_channels / groups, kernel_size[0], kernel_size[1]).
* `bias` (Tensor): Learnable bias with shape (out_channels,). It is `None` when `bias=False`.
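
The shape rules for `weight` and `bias` described above can be checked directly. The following is a minimal sketch; the layer configuration (16 output channels, a 3x5 kernel, per-dimension padding, no bias) is chosen purely for illustration and does not come from the original examples.

```python
import torch
import torch.nn as nn

# Illustrative layer: non-square 3x5 kernel, per-dimension padding, no bias term
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 5),
                 padding=(1, 2), bias=False)

# weight has shape (out_channels, in_channels / groups, kernel height, kernel width)
weight_shape = tuple(conv.weight.shape)
print("Weight shape:", weight_shape)

# With bias=False the bias attribute is None rather than a tensor
print("Bias:", conv.bias)

# Padding of (1, 2) offsets the 3x5 kernel, so the 32x32 spatial size is kept
x = torch.randn(2, 3, 32, 32)
out_shape = tuple(conv(x).shape)
print("Output shape:", out_shape)
```

Passing tuples lets the height and width be configured independently; an integer is shorthand for using the same value in both dimensions.
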
* * *

## Usage Examples

### Example 1: Basic Usage

Create a simple 2D convolutional layer:

```python
import torch
import torch.nn as nn

# Create convolutional layer: 3 input channels, 32 output channels, 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

# Print shapes of weight and bias
print("Weight shape:", conv.weight.shape)  # torch.Size([32, 3, 3, 3])
print("Bias shape:", conv.bias.shape)      # torch.Size([32])

# Create input tensor: batch=1, channels=3, height=32, width=32
input_tensor = torch.randn(1, 3, 32, 32)

# Forward pass
output = conv(input_tensor)
print("Input shape:", input_tensor.shape)  # torch.Size([1, 3, 32, 32])
print("Output shape:", output.shape)       # torch.Size([1, 32, 30, 30])
```

Output result:

```
Weight shape: torch.Size([32, 3, 3, 3])
Bias shape: torch.Size([32])
Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 32, 30, 30])
```

By default `padding=0`, so each spatial dimension shrinks by `kernel_size - 1` (here from 32 to 30). To keep the size unchanged, add padding.

### Example 2: Using padding to Maintain Size

Add padding to keep input and output dimensions consistent:

```python
import torch
import torch.nn as nn

# Create convolutional layer with padding: padding=1 keeps the size for a 3x3 kernel
conv_pad = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)

# Input
input_tensor = torch.randn(1, 3, 32, 32)

# Forward pass
output = conv_pad(input_tensor)
print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)  # Stays 32x32
```

Output result:

```
Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 32, 32, 32])
```

### Example 3: Different stride and dilation

Adjusting stride and dilation changes the output size and the receptive field:

```python
import torch
import torch.nn as nn

# Strided convolution: stride=2 halves the spatial size
conv_stride = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
input_tensor = torch.randn(1, 3, 32, 32)
output_stride = conv_stride(input_tensor)
print("Stride=2 -> Output shape:", output_stride.shape)

# Dilated convolution: dilation=2 increases the receptive field
conv_dilation = nn.Conv2d(3, 32, kernel_size=3, dilation=2)
output_dilation = conv_dilation(input_tensor)
print("Dilation=2 -> Output shape:", output_dilation.shape)
```

Output result:

```
Stride=2 -> Output shape: torch.Size([1, 32, 16, 16])
Dilation=2 -> Output shape: torch.Size([1, 32, 28, 28])
```

### Example 4: Grouped Convolution

The `groups` parameter enables grouped convolution, commonly used in lightweight networks:

```python
import torch
import torch.nn as nn

# Grouped convolution: groups=2 splits the 4 input channels into 2 groups of 2
conv_group = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)

# Input with 4 channels
input_tensor = torch.randn(1, 4, 16, 16)

# Forward pass
output = conv_group(input_tensor)
print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)
# Each kernel only sees in_channels / groups = 2 input channels
print("Weight shape:", conv_group.weight.shape)
```

Output result:

```
Input shape: torch.Size([1, 4, 16, 16])
Output shape: torch.Size([1, 8, 14, 14])
Weight shape: torch.Size([8, 2, 3, 3])
```

### Example 5: Using in a Neural Network

Build a simple handwritten digit recognition network:

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # First convolution block
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU()
        # Second convolution block
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU()
        # Pooling layer (halves height and width each time it is applied)
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer: 28x28 input -> 14x14 -> 7x7 after two poolings
        self.fc = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        # First convolution block
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.pool(x)
        # Second convolution block
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.pool(x)
        # Flatten and classify
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


# Create model
model = SimpleCNN(num_classes=10)

# Test input: batch=4, grayscale image 28x28
input_image = torch.randn(4, 1, 28, 28)
output = model(input_image)
print("Input shape:", input_image.shape)
print("Output shape:", output.shape)  # torch.Size([4, 10])
```

* * *

## Output Size Calculation

The output size of a convolutional layer is calculated as:

```
H_out = floor((H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1
W_out = floor((W_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1
```

When `padding`, `dilation`, `kernel_size`, or `stride` is given as a tuple, each formula uses the value for its own dimension (height for `H_out`, width for `W_out`). For Example 1, `H_out = floor((32 + 0 - 1 * 2 - 1) / 1) + 1 = 30`.

* * *

## Common Questions

### Q1: How to choose kernel size?

Common kernel sizes:

* `1x1`: Used to change the channel count and mix information across channels; any non-linearity comes from the activation that follows it
* `3x3`: Most common, balances parameter count and receptive field
* `5x5`, `7x7`: Larger receptive field but more parameters

### Q2: How to choose padding and stride?

* Use `padding = (kernel_size - 1) // 2` to maintain the feature map size (this holds for odd kernel sizes with `stride=1` and `dilation=1`)
* Use `stride > 1` for downsampling, e.g., `stride=2` roughly halves the height and width

* * *

## Application Scenarios

`nn.Conv2d` is one of the most important layers in computer vision, with main applications including:

* **Image Classification**: Extract image features, e.g., VGG, ResNet
* **Object Detection**: YOLO, Faster R-CNN, etc.
* **Semantic Segmentation**: U-Net, FCN, etc.
* **Style Transfer**: Generate artistic images

> Tip: In modern CNNs, 3x3 convolutions are the most commonly used, since stacking them covers enough spatial context while keeping the parameter count low.

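
To make the parameter-count argument for 3x3 kernels concrete, here is a small sketch in plain Python. The helper `conv2d_param_count` is a hypothetical name, not a PyTorch function, and the 64-channel width is an assumed value for the comparison. It counts the entries of a weight tensor of shape (out_channels, in_channels / groups, kernel height, kernel width) plus the bias.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Learnable parameters of a Conv2d layer (hypothetical helper, not a PyTorch API)."""
    # Accept an int (square kernel) or a (height, width) tuple
    if isinstance(kernel_size, int):
        k_h, k_w = kernel_size, kernel_size
    else:
        k_h, k_w = kernel_size
    # Each of the out_channels kernels spans in_channels / groups input channels
    weights = out_channels * (in_channels // groups) * k_h * k_w
    return weights + (out_channels if bias else 0)


channels = 64  # assumed channel width, used for both input and output
counts = {k: conv2d_param_count(channels, channels, k) for k in (3, 5, 7)}
for k, n in counts.items():
    print(f"{k}x{k}: {n} parameters")

# Two stacked 3x3 layers cover a 5x5 region of the input, like one 5x5 layer
stacked_3x3 = 2 * counts[3]
print("two 3x3 layers:", stacked_3x3, "vs one 5x5 layer:", counts[5])
```

For an actual layer, `sum(p.numel() for p in layer.parameters())` should give the same number, which is a quick way to cross-check the helper.
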