PyTorch CNN
A convolutional neural network (CNN) is a class of deep learning model designed for data with a grid-like topology, such as images.
CNNs are a core technology for computer vision tasks such as image classification, object detection, and segmentation.
The following image shows the structure and workflow of a typical Convolutional Neural Network (CNN) for image recognition tasks.
[Figure: structure and workflow of a typical CNN for image recognition]
In the image, the output layer of the CNN gives probabilities for three categories: Donald (0.2), Goofy (0.1), and Tweety (0.7), indicating that the network believes the input image is most likely Tweety.
Here is a brief description of each component:
* **Input Image**: The raw image data received by the network.
* **Convolution**: Slides a kernel over the input image to extract features and generate feature maps.
* **Pooling**: Usually applied after convolutional layers, reducing the size of feature maps through max pooling or average pooling while retaining important features, generating pooled feature maps.
* **Feature Extraction**: Gradually extracting high-level features of images through the combination of multiple convolution and pooling layers.
* **Flatten Layer**: Converting multi-dimensional feature maps into one-dimensional vectors for input to fully connected layers.
* **Fully Connected Layer**: Similar to traditional neural network layers, used to map extracted features to output categories.
* **Classification**: The output layer of the network, performing classification based on the output of fully connected layers.
* **Probability Distribution**: The output layer gives the probability of each category, representing the likelihood that the input image belongs to each category.
### Basic Structure of Convolutional Neural Networks
**1. Input Layer**
Receives raw image data, which is typically represented as a three-dimensional array where two dimensions represent the width and height of the image, and the third dimension represents color channels (e.g., RGB images have three channels).
**2. Convolutional Layer**
Uses kernels to extract local features such as edges and textures.
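Formula:
$$y(i, j) = \sum_{m} \sum_{n} x(i + m,\, j + n)\, k(m, n) + b$$
* x: Input image.
* k: Kernel (weight matrix).
* b: Bias.
* y: Output feature map, where (i, j) indexes an output position and (m, n) indexes a position inside the kernel.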
Applies a set of learnable filters (or kernels) to perform convolution operations on the input image to extract local features.
Each filter slides over the input image to generate a feature map, representing the filter's activation at different positions.
A convolutional layer can have multiple filters, each generating a feature map, and all feature maps together form a feature map set.
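As a rough illustration of the description above, the following sketch applies a convolutional layer with several filters to a random input and inspects the shape of the resulting feature maps. It then recomputes one output value by hand as a sum of element-wise products plus the bias. The batch size, image size, and filter count are arbitrary values chosen for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One grayscale 28x28 image, laid out as (batch, channels, height, width)
x = torch.randn(1, 1, 28, 28)

# 8 learnable 3x3 filters; padding=1 keeps the spatial size unchanged
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
feature_maps = conv(x)
print(feature_maps.shape)  # torch.Size([1, 8, 28, 28]): one 28x28 map per filter

# Manual check of a single output value, using one filter and no padding
conv_nopad = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
with torch.no_grad():
    out = conv_nopad(x)
    k = conv_nopad.weight[0, 0]   # the 3x3 kernel
    b = conv_nopad.bias[0]        # the scalar bias
    manual = (x[0, 0, 0:3, 0:3] * k).sum() + b  # top-left window of the input

matches = torch.allclose(out[0, 0, 0, 0], manual, atol=1e-5)
print(matches)  # True
```

Without padding, a 3x3 kernel shrinks each spatial dimension by 2, so `out` has shape `(1, 1, 26, 26)`.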
**3. Activation Function**
Nonlinear activation functions such as ReLU (Rectified Linear Unit) are typically applied after convolutional layers to introduce nonlinearity, enabling the network to learn more complex patterns.
The ReLU function is defined as $f(x) = \max(0, x)$: if the input is less than 0, the output is 0; otherwise, the output is the input value.
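A minimal sketch of this definition, using a handful of made-up input values:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

relu_out = F.relu(x)                 # built-in ReLU
manual_out = torch.clamp(x, min=0)   # max(0, x) written out explicitly

print(relu_out.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Negative inputs are set to zero and non-negative inputs pass through unchanged.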
**4. Pooling Layer**
* Used to reduce the spatial dimensions of feature maps, decreasing computational load and parameter count while retaining the most important feature information.
* The most common pooling operations are max pooling and average pooling.
* Max pooling selects the maximum value within a region, while average pooling calculates the average value within a region.
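The difference between the two pooling operations can be seen on a small hand-written feature map. The sketch below uses a 4x4 input with a 2x2 window; the values are chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# A single 4x4 feature map, shaped (batch, channels, height, width)
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

max_pooled = F.max_pool2d(x, kernel_size=2)  # maximum of each 2x2 region
avg_pooled = F.avg_pool2d(x, kernel_size=2)  # mean of each 2x2 region

print(max_pooled.squeeze().tolist())  # [[4.0, 8.0], [12.0, 16.0]]
print(avg_pooled.squeeze().tolist())  # [[2.5, 6.5], [10.5, 14.5]]
```

Both operations halve the height and width here, turning the 4x4 map into a 2x2 map.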
**5. Normalization Layer (Optional)**
* For example, Local Response Normalization (LRN) or Batch Normalization.
* These layers help accelerate the training process and improve model stability.
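As a small sketch of what batch normalization does, the example below passes a deliberately shifted and scaled random batch through `nn.BatchNorm2d` in training mode and checks the per-channel statistics of the output. The tensor shape and the shift and scale values are assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A batch of 16 feature maps with 4 channels, with mean about 3 and std about 5
x = torch.randn(16, 4, 8, 8) * 5.0 + 3.0

bn = nn.BatchNorm2d(num_features=4)
bn.train()  # in training mode, statistics come from the current batch
y = bn(x)

# Statistics per channel, computed over the batch and spatial dimensions
per_channel_mean = y.mean(dim=(0, 2, 3))
per_channel_var = y.var(dim=(0, 2, 3), unbiased=False)
print(per_channel_mean)  # each entry close to 0
print(per_channel_var)   # each entry close to 1
```

In evaluation mode (`bn.eval()`), the layer uses running averages collected during training instead of the current batch statistics.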
**6. Fully Connected Layer**
* At the end of the CNN, flatten the feature maps extracted from previous layers into one-dimensional vectors, then input them to the fully connected layer.
* Each neuron in the fully connected layer is connected to all neurons in the previous layer, used to synthesize features and perform final classification or regression.
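The flatten-then-connect step can be sketched as follows. The feature map size (64 channels of 7x7) and the 10 output units are illustrative choices; a random tensor stands in for real convolutional features.

```python
import torch
import torch.nn as nn

# Stand-in for the feature maps of 2 images: (batch, channels, height, width)
features = torch.randn(2, 64, 7, 7)

# Flatten everything except the batch dimension into one vector per image
flat = torch.flatten(features, start_dim=1)
print(flat.shape)  # torch.Size([2, 3136])

# Every output unit is connected to all 3136 inputs
fc = nn.Linear(in_features=64 * 7 * 7, out_features=10)
scores = fc(flat)
print(scores.shape)     # torch.Size([2, 10])
print(fc.weight.shape)  # torch.Size([10, 3136]): one weight per (output, input) pair
```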
**7. Output Layer**
Depending on the task, the output layer can have different forms.
For classification tasks, the Softmax function is typically used to convert outputs into probability distributions, representing the probability that the input belongs to each category.
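A minimal sketch of Softmax turning raw scores into a probability distribution. The three scores below are invented values for three hypothetical classes.

```python
import torch
import torch.nn.functional as F

# Raw scores (logits) for one input and three hypothetical classes
logits = torch.tensor([[1.0, 0.2, 2.3]])

probs = F.softmax(logits, dim=1)    # normalize across the class dimension
predicted_class = probs.argmax(dim=1)

print(probs)                    # three positive values
print(probs.sum(dim=1))         # sums to 1 (up to floating-point error)
print(predicted_class.item())   # 2, the class with the largest score
```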
**8. Loss Function**
Used to measure the difference between model predictions and true labels.
Common loss functions include Cross-Entropy Loss for multi-class classification tasks and Mean Squared Error (MSE) for regression tasks.
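The sketch below computes both losses on small made-up tensors. One detail worth noting: `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, which the manual computation here reproduces.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Cross-entropy: 2 samples, 3 classes, integer class labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

ce_loss = nn.CrossEntropyLoss()(logits, targets)

# Same value by hand: mean negative log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual_ce = -log_probs[torch.arange(2), targets].mean()

# Mean squared error: mean of (prediction - target)^2
predictions = torch.tensor([2.5, 0.0])
true_values = torch.tensor([3.0, -1.0])
mse_loss = nn.MSELoss()(predictions, true_values)

print(ce_loss.item(), manual_ce.item())  # the two values agree
print(mse_loss.item())                   # 0.625 = (0.25 + 1.0) / 2
```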
**9. Optimizer**
Used to update network weights based on the gradient of the loss function. Common optimizers include Stochastic Gradient Descent (SGD), Adam, RMSprop, etc.
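To make the update rule concrete, here is a single plain SGD step on a toy two-element parameter. The loss function and learning rate are assumptions for the example, not values from the tutorial.

```python
import torch
import torch.optim as optim

# A toy parameter and a toy loss: loss = sum(w^2), so the gradient is 2 * w
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
optimizer.zero_grad()  # clear any old gradients
loss.backward()        # compute d(loss)/dw = [2.0, -4.0]
optimizer.step()       # w <- w - lr * grad

print(w.tolist())  # approximately [0.8, -1.6]
```

Adam and RMSprop follow the same `zero_grad` / `backward` / `step` pattern; only the way the gradient is turned into an update differs.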
**10. Regularization (Optional)**
Includes techniques such as Dropout and L1/L2 regularization to prevent model overfitting.
These layers can be stacked to form deeper network structures, which increases the model's learning capacity.
The depth and complexity of a CNN can be adjusted according to the requirements of the task.
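As a sketch of the regularization techniques mentioned above, the example below shows how Dropout behaves in training versus evaluation mode, and how L2 regularization is commonly applied through the optimizer's `weight_decay` argument. The dropout rate, layer sizes, and decay value are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)  # each element is zeroed with probability 0.5,
                        # survivors are scaled by 1 / (1 - p) = 2.0

dropout.eval()
eval_out = dropout(x)   # dropout is disabled: output equals input

print(train_out.unique().tolist())  # values drawn from {0.0, 2.0}
print(torch.equal(eval_out, x))     # True

# L2 regularization via weight decay on a hypothetical small layer
layer = nn.Linear(4, 2)
optimizer = optim.SGD(layer.parameters(), lr=0.01, weight_decay=1e-4)
```

This is why calling `model.train()` and `model.eval()` at the right time matters for models that contain Dropout.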
* * *
## PyTorch Implementation of a CNN Example
The following example demonstrates how to build a simple CNN model using PyTorch for digit classification on the MNIST dataset.
Main steps:
* **Data Loading and Preprocessing**: Use torchvision to load and preprocess MNIST data.
* **Model Construction**: Define convolutional layers, pooling layers, and fully connected layers.
* **Training**: Train the model using a loss function and an optimizer.
* **Evaluation**: Calculate model accuracy on the test set.
* **Visualization**: Display some test samples and their prediction results.
### 1. Import Required Libraries
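```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms  # Used for data loading in the next step
```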
### 2. Data Loading
Use the MNIST dataset provided by torchvision to load and preprocess data.
```python
transform = transforms.Compose([
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5,), (0.5,))   # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
```
### 3. Define CNN Model
Use nn.Module to build a CNN.
```python
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Define convolutional layer: input 1 channel, output 32 channels, kernel size 3x3
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        # Define convolutional layer: input 32 channels, output 64 channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Define fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)  # Input size = number of channels * feature map size
        self.fc2 = nn.Linear(128, 10)          # 10 categories

    def forward(self, x):
        x = F.relu(self.conv1(x))    # First convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = F.relu(self.conv2(x))    # Second convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = x.view(-1, 64 * 7 * 7)   # Flatten operation
        x = F.relu(self.fc1(x))      # Fully connected layer + ReLU
        x = self.fc2(x)              # Fully connected layer output
        return x

# Create model instance
model = SimpleCNN()
```
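The `64 * 7 * 7` input size of the first fully connected layer follows from the shapes of the intermediate tensors. The sketch below traces a random 28x28 input through stand-alone layers configured like those in the model, so the shapes can be checked without loading any data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer configurations as conv1 and conv2 in the model above
conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 1, 28, 28)            # one MNIST-sized grayscale image
after_block1 = F.max_pool2d(F.relu(conv1(x)), 2)
after_block2 = F.max_pool2d(F.relu(conv2(after_block1)), 2)
flat = after_block2.view(-1, 64 * 7 * 7)

print(after_block1.shape)  # torch.Size([1, 32, 14, 14]): pooling halves 28 -> 14
print(after_block2.shape)  # torch.Size([1, 64, 7, 7]):   pooling halves 14 -> 7
print(flat.shape)          # torch.Size([1, 3136]):       64 * 7 * 7 = 3136
```

The convolutions use `padding=1` with a 3x3 kernel, so only the pooling layers change the spatial size.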
### 4. Define Loss Function and Optimizer
Use cross-entropy loss and stochastic gradient descent (SGD) with momentum.
```python
criterion = nn.CrossEntropyLoss()  # Multi-class cross-entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # Learning rate and momentum
```
### 5. Train Model
Train the model for 5 epochs and print the average training loss after each epoch.
```python
num_epochs = 5
model.train()  # Set to training mode

for epoch in range(num_epochs):
    total_loss = 0
    for images, labels in train_loader:
        # Forward propagation
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss / len(train_loader):.4f}")
```
### 6. Test Model
Evaluate model accuracy on the test set.
```python
model.eval()  # Set to evaluation mode
correct = 0
total = 0

with torch.no_grad():  # No gradient calculation during evaluation
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # Predicted category
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
```
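The accuracy calculation relies on `torch.max(outputs, 1)` returning both the largest score per row and its index. The following sketch runs the same steps on two hand-written rows of scores with made-up labels.

```python
import torch

# Made-up scores for 2 samples and 3 classes, plus their true labels
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.3, 0.9]])
labels = torch.tensor([1, 2])

# torch.max along dim 1 returns (largest values, their column indices)
values, predicted = torch.max(outputs, 1)
print(values.tolist())     # [2.0, 1.5]
print(predicted.tolist())  # [1, 0]

correct = (predicted == labels).sum().item()  # only the first sample matches
accuracy = 100 * correct / labels.size(0)
print(accuracy)  # 50.0
```

`outputs.argmax(dim=1)` gives the same indices when the maximum values themselves are not needed.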
### 7. Complete Code
The complete code is as follows:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

# 1. Data loading and preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5,), (0.5,))   # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

# 2. Define CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Define convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)   # Input 1 channel, output 32 channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)  # Input 32 channels, output 64 channels
        # Define fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)  # Input to fully connected layer after flattening
        self.fc2 = nn.Linear(128, 10)          # 10 categories

    def forward(self, x):
        x = F.relu(self.conv1(x))    # First convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = F.relu(self.conv2(x))    # Second convolution + ReLU
        x = F.max_pool2d(x, 2)       # Max pooling
        x = x.view(-1, 64 * 7 * 7)   # Flatten
        x = F.relu(self.fc1(x))      # Fully connected layer + ReLU
        x = self.fc2(x)              # Final layer output
        return x

# Create model instance
model = SimpleCNN()

# 3. Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Multi-class cross-entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# 4. Model training
num_epochs = 5
model.train()  # Set model to training mode

for epoch in range(num_epochs):
    total_loss = 0
    for images, labels in train_loader:
        outputs = model(images)            # Forward propagation
        loss = criterion(outputs, labels)  # Calculate loss

        optimizer.zero_grad()  # Clear gradients
        loss.backward()        # Backward propagation
        optimizer.step()       # Update parameters

        total_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss / len(train_loader):.4f}")

# 5. Model testing
model.eval()  # Set model to evaluation mode
correct = 0
total = 0

with torch.no_grad():  # Disable gradient calculation
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
```
### Output Explanation
**1. Training Loss Output**
The code outputs the average loss once per epoch, for example:
```
Epoch [1/5], Loss: 0.2325
Epoch [2/5], Loss: 0.0526
Epoch [3/5], Loss: 0.0366
Epoch [4/5], Loss: 0.0273
Epoch [5/5], Loss: 0.0221
```
**Explanation:** The gradual decrease in loss indicates that the model is gradually converging.
**2. Test Set Accuracy**
The code outputs the final classification accuracy on the test set, for example:
Test Accuracy: 98.96%
**Explanation:** The model achieves 98.96% classification accuracy on the MNIST test set, which is a good result for a simple CNN model.
### 8. Visualization Results
We can visualize some samples from the test data along with their prediction results.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# 1. Data loading and preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5,), (0.5,))   # Normalize to [-1, 1]
])

# Load MNIST dataset
```
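One possible way to display samples with their predictions is sketched below. This is not the tutorial's original plotting code: the helper `show_predictions` is a hypothetical function written for this example, and a small untrained stand-in model with random images is used so that the block runs without downloading MNIST. It assumes the images were normalized with `Normalize((0.5,), (0.5,))` as in the earlier steps.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs headless; remove to open a window
import matplotlib.pyplot as plt
import torch
import torch.nn as nn


def show_predictions(model, images, labels, num_samples=6):
    """Plot the first num_samples images with their true and predicted labels."""
    model.eval()
    with torch.no_grad():
        predicted = model(images[:num_samples]).argmax(dim=1)

    fig, axes = plt.subplots(1, num_samples, figsize=(2 * num_samples, 2.5), squeeze=False)
    for ax, image, label, pred in zip(axes[0], images, labels, predicted):
        # Undo Normalize((0.5,), (0.5,)) so pixel values are back in [0, 1]
        ax.imshow((image.squeeze(0) * 0.5 + 0.5).numpy(), cmap="gray")
        ax.set_title(f"Label: {label.item()}\nPred: {pred.item()}")
        ax.axis("off")
    fig.tight_layout()
    return fig


# Stand-ins for illustration only: an untrained model and random "images"
stand_in_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
images = torch.rand(6, 1, 28, 28) * 2 - 1   # values in [-1, 1], like normalized MNIST
labels = torch.randint(0, 10, (6,))

fig = show_predictions(stand_in_model, images, labels)
```

With the trained model from the previous steps, one would instead take a batch with `images, labels = next(iter(test_loader))`, call `show_predictions(model, images, labels)`, and then `plt.show()` using an interactive backend.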