PyTorch CNN

A Convolutional Neural Network (CNN) is a class of deep learning model designed for data with a grid-like topology, such as images. CNNs are the core technology behind computer vision tasks such as image classification, object detection, and segmentation. This tutorial builds one with PyTorch.

The following image shows the structure and workflow of a typical CNN for an image recognition task.

[Figure: structure and workflow of a typical CNN for image recognition]

In the image, the output layer of the CNN assigns probabilities to three categories: Donald (0.2), Goofy (0.1), and Tweety (0.7), so the network considers the input image most likely to be Tweety.

Here is a brief description of each component:

* **Input Image**: The raw image data received by the network.
* **Convolution**: A kernel slides over the input image, extracts features, and generates feature maps.
* **Pooling**: Usually applied after convolutional layers. It reduces the size of the feature maps through max pooling or average pooling while retaining important features, producing pooled feature maps.
* **Feature Extraction**: High-level image features are extracted step by step by combining multiple convolution and pooling layers.
* **Flatten Layer**: Converts multi-dimensional feature maps into one-dimensional vectors for input to the fully connected layers.
* **Fully Connected Layer**: Similar to the layers of a traditional neural network; maps the extracted features to output categories.
* **Classification**: The output layer of the network, which classifies based on the output of the fully connected layers.
* **Probabilistic Distribution**: The output layer gives a probability for each category, representing the likelihood that the input image belongs to it.

### Basic Structure of Convolutional Neural Networks

**1. Input Layer**

Receives the raw image data, typically represented as a three-dimensional array in which two dimensions are the width and height of the image and the third is the color channels (e.g., an RGB image has three channels).

**2. Convolutional Layer**

Uses kernels to extract local features such as edges and textures.

Formula, with $y$ denoting the output feature map:

$$y(i, j) = \sum_{m} \sum_{n} x(i + m,\ j + n)\, k(m, n) + b$$

* x: Input image.
* k: Kernel (weight matrix).
* b: Bias.

The layer applies a set of learnable filters (kernels) to the input image to extract local features. Each filter slides over the input and produces a feature map that records the filter's activation at each position. A convolutional layer can have multiple filters, each producing its own feature map; together they form the layer's output.

**3. Activation Function**

A nonlinear activation function such as ReLU (Rectified Linear Unit) is typically applied after a convolutional layer to introduce nonlinearity, enabling the network to learn more complex patterns. ReLU is defined as f(x) = max(0, x): if the input is less than 0, the output is 0; otherwise, the output equals the input.

**4. Pooling Layer**

* Reduces the spatial dimensions of the feature maps, lowering the computational load of later layers (and the number of parameters in any fully connected layer that follows) while retaining the most important feature information.
* The most common pooling operations are max pooling and average pooling.
* Max pooling selects the maximum value within a region, while average pooling computes the average value within a region.

**5. Normalization Layer (Optional)**

* For example, Local Response Normalization (LRN) or Batch Normalization.
* These layers help accelerate training and improve model stability.

**6. Fully Connected Layer**

* At the end of the CNN, the feature maps extracted by the previous layers are flattened into a one-dimensional vector and passed to the fully connected layer.
* Each neuron in the fully connected layer is connected to all neurons in the previous layer; the layer combines the extracted features to perform the final classification or regression.

**7. Output Layer**

The output layer takes different forms depending on the task. For classification tasks, the Softmax function is typically used to convert the outputs into a probability distribution, giving the probability that the input belongs to each category.

**8. Loss Function**

Measures the difference between the model's predictions and the true labels. Common loss functions include Cross-Entropy Loss for multi-class classification tasks and Mean Squared Error (MSE) for regression tasks.

**9. Optimizer**

Updates the network weights based on the gradient of the loss function. Common optimizers include Stochastic Gradient Descent (SGD), Adam, and RMSprop.

**10. Regularization (Optional)**

Includes techniques such as Dropout and L1/L2 regularization to prevent overfitting.

These layers can be stacked to form deeper networks with greater learning capacity. The depth and complexity of a CNN can be adjusted to the requirements of the task.

* * *

## PyTorch Implementation of a CNN Example

The following example demonstrates how to build a simple CNN model with PyTorch for digit classification on the MNIST dataset.

Main steps:

* **Data Loading and Preprocessing**: Use torchvision to load and preprocess the MNIST data.
* **Model Construction**: Define the convolutional, pooling, and fully connected layers.
* **Training**: Train the model using a loss function and an optimizer.
* **Evaluation**: Calculate the model's accuracy on the test set.
* **Visualization**: Display some test samples and their predictions.

### 1. Import Required Libraries

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms  # needed by the data loading step
```

### 2. Data Loading

Use the MNIST dataset provided by torchvision to load and preprocess the data.

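The example in this section chains `transforms.ToTensor()` with `transforms.Normalize((0.5,), (0.5,))`. As a rough illustration of the arithmetic involved, the plain-Python sketch below mimics both steps on single pixel values: scaling an 8-bit intensity to [0, 1], then applying (x - mean) / std. The helper names `to_unit_range` and `normalize` are hypothetical and are not part of torchvision; the real transforms operate on whole tensors.

```python
def to_unit_range(pixel: int) -> float:
    """Scale an 8-bit intensity (0..255) to [0, 1], as ToTensor does for uint8 images."""
    return pixel / 255.0


def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply (value - mean) / std, the per-channel formula used by Normalize."""
    return (value - mean) / std


# Hypothetical sample intensities: black, mid-gray, white.
for pixel in (0, 128, 255):
    scaled = to_unit_range(pixel)
    print(pixel, round(scaled, 4), round(normalize(scaled), 4))
```

With a mean and standard deviation of 0.5, the endpoints 0 and 1 map to -1 and 1, which is why the tutorial's comment describes the result as normalized to [-1, 1].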
```python
transform = transforms.Compose([
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5,), (0.5,))   # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
```

### 3. Define CNN Model

Use nn.Module to build a CNN.

```python
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layer: 1 input channel, 32 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        # Convolutional layer: 32 input channels, 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Fully connected layers.
        # Input size = number of channels * feature map size. The 28x28 input is
        # halved by each of the two 2x2 poolings (28 -> 14 -> 7), giving 64 * 7 * 7.
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)  # 10 categories

    def forward(self, x):
        x = F.relu(self.conv1(x))    # First convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = F.relu(self.conv2(x))    # Second convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = x.view(-1, 64 * 7 * 7)   # Flatten
        x = F.relu(self.fc1(x))      # Fully connected layer + ReLU
        x = self.fc2(x)              # Fully connected layer output (logits)
        return x

# Create model instance
model = SimpleCNN()
```

### 4. Define Loss Function and Optimizer

Use cross-entropy loss and a stochastic gradient descent optimizer.

```python
criterion = nn.CrossEntropyLoss()  # Multi-class cross-entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # Learning rate and momentum
```

### 5. Train Model

Train the model for 5 epochs and print the average training loss after each epoch.

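Each pass through the training loop ends with `optimizer.step()`, which applies the update rule of the optimizer configured in step 4. As a hedged illustration, the plain-Python sketch below applies SGD with momentum to a single scalar parameter, using the formulation PyTorch documents for `optim.SGD` with its default `dampening=0` and without weight decay or Nesterov momentum: velocity = momentum * velocity + grad, then param = param - lr * velocity. The function name `sgd_momentum_step` and the toy loss w² are assumptions chosen for illustration; the real optimizer updates every tensor in `model.parameters()`.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter (illustrative only)."""
    velocity = momentum * velocity + grad   # accumulate a running direction
    param = param - lr * velocity           # step against the accumulated gradient
    return param, velocity


# Toy problem (an assumption, not from the tutorial): minimize loss(w) = w ** 2.
w, v = 1.0, 0.0
history = [w]
for _ in range(5):
    grad = 2.0 * w                          # derivative of w ** 2
    w, v = sgd_momentum_step(w, grad, v)
    history.append(w)

print(history)
```

The parameter moves toward the minimum at 0 on every step, and the velocity term makes successive steps larger while the gradient keeps pointing the same way.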
```python
num_epochs = 5
model.train()  # Set to training mode

for epoch in range(num_epochs):
    total_loss = 0
    for images, labels in train_loader:
        # Forward propagation
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss / len(train_loader):.4f}")
```

### 6. Test Model

Evaluate model accuracy on the test set.

```python
model.eval()  # Set to evaluation mode
correct = 0
total = 0

with torch.no_grad():  # No gradient calculation during evaluation
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # Predicted category
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
```

### 7. Complete Code

The complete code is as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

# 1. Data loading and preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5,), (0.5,))   # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

# 2. Define CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)   # 1 input channel, 32 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)  # 32 input channels, 64 output channels
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)  # Input to fully connected layer after flattening
        self.fc2 = nn.Linear(128, 10)          # 10 categories

    def forward(self, x):
        x = F.relu(self.conv1(x))    # First convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = F.relu(self.conv2(x))    # Second convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = x.view(-1, 64 * 7 * 7)   # Flatten
        x = F.relu(self.fc1(x))      # Fully connected layer + ReLU
        x = self.fc2(x)              # Final layer output
        return x

# Create model instance
model = SimpleCNN()

# 3. Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Multi-class cross-entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# 4. Model training
num_epochs = 5
model.train()  # Set model to training mode

for epoch in range(num_epochs):
    total_loss = 0
    for images, labels in train_loader:
        outputs = model(images)            # Forward propagation
        loss = criterion(outputs, labels)  # Calculate loss

        optimizer.zero_grad()              # Clear gradients
        loss.backward()                    # Backward propagation
        optimizer.step()                   # Update parameters

        total_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss / len(train_loader):.4f}")

# 5. Model testing
model.eval()  # Set model to evaluation mode
correct = 0
total = 0

with torch.no_grad():  # Disable gradient calculation
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
```

### Output Explanation

**1. Training Loss Output**

The code prints the average loss once per epoch, for example:

```
Epoch [1/5], Loss: 0.2325
Epoch [2/5], Loss: 0.0526
Epoch [3/5], Loss: 0.0366
Epoch [4/5], Loss: 0.0273
Epoch [5/5], Loss: 0.0221
```

**Explanation:** The steady decrease in loss indicates that the model is converging.

**2. Test Set Accuracy**

The code prints the final classification accuracy on the test set, for example:

```
Test Accuracy: 98.96%
```

**Explanation:** The model achieves 98.96% classification accuracy on the MNIST test set, which is a good result for a simple CNN model.

### 8. Visualization Results

We can visualize some samples from the test data along with their predictions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# 1. Data loading and preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5,), (0.5,))   # Normalize to [-1, 1]
])

# Load MNIST dataset
```
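
When displaying predictions, it can be useful to show a confidence value next to each predicted digit. `SimpleCNN.forward` returns raw scores (logits) rather than probabilities, because `nn.CrossEntropyLoss` applies log-softmax internally. The plain-Python sketch below shows the softmax and argmax steps on a made-up list of ten logits; the helper name `softmax` and the logit values are illustrative assumptions, and in PyTorch the equivalent would typically be `torch.softmax(outputs, dim=1)`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the ten digit classes 0..9 (not real model output).
logits = [0.1, -1.2, 0.3, 2.0, -0.5, 0.0, -2.1, 5.4, 0.7, 1.1]
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted digit: {predicted}, confidence: {probs[predicted]:.2%}")
```

Softmax preserves the ordering of the scores, so taking the argmax of the probabilities selects the same class as `torch.max(outputs, 1)` does on the logits.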
