PyTorch Basics

PyTorch is an open-source deep learning framework, widely used for its flexibility and dynamic computation graphs.

PyTorch is built around a few basic concepts: Tensor, Autograd, nn.Module, optim, and Device.

- Tensor: The core data structure in PyTorch, a multi-dimensional array with accelerated computation on CPU or GPU.
- Autograd: Automatic differentiation, which computes the gradients needed for backpropagation and optimization.
- Neural Network (nn.Module): A simple yet powerful API for defining neural network models and their forward propagation.
- Optimizers: Algorithms (such as Adam and SGD) that update model parameters to minimize the loss.
- Device: Models and tensors can be moved to the GPU to accelerate computation.
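
As a rough illustration of how these five concepts fit together, the sketch below runs one optimization step on a single linear layer. The data is random and the layer sizes and learning rate are arbitrary choices made for this example, not values from the tutorial.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Device: use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor: 4 random samples with 3 features each, plus 4 random targets
inputs = torch.randn(4, 3, device=device)
targets = torch.randn(4, 1, device=device)

# nn.Module: a single linear layer, moved to the same device as the data
model = nn.Linear(3, 1).to(device)

# Optimizer: plain SGD over the layer's weight and bias
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Autograd: backward() fills .grad for every parameter
loss_before = nn.functional.mse_loss(model(inputs), targets)
optimizer.zero_grad()
loss_before.backward()
optimizer.step()

loss_after = nn.functional.mse_loss(model(inputs), targets)
print(loss_before.item(), loss_after.item())
```

Each of these pieces is covered in more detail in the sections that follow.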

PyTorch Architecture Overview

PyTorch adopts a modular design, consisting of multiple core components that work together. Understanding the roles and relationships of these components is key to mastering PyTorch.

PyTorch Architecture Diagram

┌──────────────────────────────────────────────────────────────┐
│                      PyTorch Ecosystem                       │
├──────────────────────────────────────────────────────────────┤
│ torchvision │ torchtext │ torchaudio │ Other specialized libs│
├──────────────────────────────────────────────────────────────┤
│                         PyTorch Core                         │
├───────────────┬─────────────────┬────────────────────────────┤
│   torch.nn    │   torch.optim   │        torch.utils         │
│ (Neural Net)  │  (Optimizers)   │        (Utilities)         │
├───────────────┼─────────────────┼────────────────────────────┤
│               │                 │      torch.utils.data      │
│  torch Core   │    autograd     │       (Data Loading)       │
│ (Tensor Calc) │   (Auto Diff)   │                            │
└───────────────┴─────────────────┴────────────────────────────┘

PyTorch adopts a layered architecture. From top to bottom:

1. Python API (Top Layer)

- torch: Core tensor computation (similar to NumPy, with GPU support).
- torch.nn: Neural network layers, loss functions, etc.
- torch.autograd: Automatic differentiation (backpropagation).
- Developer-facing interfaces, simple and easy to use.

2. C++ Core (Middle Layer)

- ATen: Core tensor operation library (400+ operations).
- JIT: Just-in-time compilation for model optimization.
- Autograd Engine: Low-level implementation of automatic differentiation.
- High-performance computation, connecting Python with the underlying hardware.

3. Base Libraries (Bottom Layer)

- TH/THNN: C implementations of basic tensor and neural network operations. These are legacy libraries whose functionality has largely been moved into ATen in recent PyTorch releases.
- THC/THCUNN: The corresponding CUDA (GPU) versions.
- Direct hardware access (CPU/GPU), optimized for speed.

Execution Flow:

Python code → C++ core computation → underlying CUDA/C library acceleration → return results.

This layering keeps the Python interface easy to use while the compiled layers provide high performance.
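
One way to see the layering from Python is to call the same addition through three entry points. The sketch below assumes the installed PyTorch version exposes ATen operators through the `torch.ops.aten` namespace; it is meant only to show that the Python operator and the ATen operator produce the same result, not how dispatch is implemented internally.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

# Python API: the + operator and torch.add are the usual entry points
via_operator = a + b
via_function = torch.add(a, b)

# The ATen "add" operator, reached through the torch.ops.aten namespace
via_aten = torch.ops.aten.add(a, b)

print(via_operator)  # tensor([11., 22., 33.])
print(torch.equal(via_operator, via_function))
print(torch.equal(via_operator, via_aten))
```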

Tensor

The tensor is the core data structure in PyTorch. It stores and manipulates multi-dimensional arrays and supports accelerated computation on them.

Tensors are similar to NumPy arrays, but they can live on different devices such as the CPU and GPU, which makes them well suited to large-scale parallel computation, especially in deep learning.

- Dimensionality: The number of dimensions (axes) of a tensor. For example, a scalar is a 0-dimensional tensor (a single number), a vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and so on.
- Shape: The size of each dimension. For example, a tensor with shape (3, 4) has 3 rows and 4 columns.
- Dtype: The data type of the elements, which determines how much memory each element needs and how it is interpreted. PyTorch supports integer types (such as torch.int8, torch.int32), floating-point types (such as torch.float32, torch.float64), and a boolean type (torch.bool).
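
The three properties above can be read directly from any tensor. The following sketch uses a few small tensors invented for this illustration and assumes the default floating-point type has not been changed from torch.float32.

```python
import torch

scalar = torch.tensor(3.14)           # 0-dimensional, floating point
vector = torch.tensor([1, 2, 3])      # 1-dimensional, integers
matrix = torch.zeros(3, 4)            # 2-dimensional, default float type
flags = torch.tensor([True, False])   # 1-dimensional, boolean

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("flags", flags)]:
    # ndim is the dimensionality, shape the size per dimension, dtype the element type
    print(name, t.ndim, tuple(t.shape), t.dtype)

# The dtype fixes the storage size of each element, in bytes
print(matrix.element_size())                     # 4 for torch.float32
print(matrix.to(torch.float64).element_size())   # 8 for torch.float64
```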
Tensor Creation:

Example

import torch
import numpy as np

# Create a 2x3 tensor of all zeros
a = torch.zeros(2, 3)
print(a)

# Create a 2x3 tensor of all ones
b = torch.ones(2, 3)
print(b)

# Create a 2x3 tensor of random numbers (standard normal distribution)
c = torch.randn(2, 3)
print(c)

# Create a tensor from a NumPy array
numpy_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.from_numpy(numpy_array)
print(tensor_from_numpy)

# Create a tensor on a specified device (CPU/GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
d = torch.randn(2, 3, device=device)
print(d)

Output similar to:

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[ 1.0189, -0.5718, -1.2814],
        [-0.5865,  1.0855,  1.1727]])
tensor([[1, 2],
        [3, 4]])
tensor([[-0.3360,  0.2203,  1.3463],
        [-0.5982, -0.2704,  0.5429]])

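
One detail worth knowing about the NumPy conversion: `torch.from_numpy` shares memory with the source array, while `torch.tensor` copies the data. The short sketch below, with an array invented for the illustration, makes the difference visible.

```python
import numpy as np
import torch

source = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(source)   # shares memory with the NumPy array
copied = torch.tensor(source)       # makes an independent copy

source[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())  # 100: the change is visible through the shared tensor
print(copied[0, 0].item())  # 1: the copy is unaffected
```

If the array and the tensor need to evolve independently, the copying form is the safer choice.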
Common Tensor Operations:

Example

# Tensor addition
e = torch.randn(2, 3)
f = torch.randn(2, 3)
print(e + f)

# Element-wise multiplication
print(e * f)

# Tensor transpose
g = torch.randn(3, 2)
print(g.t())  # or g.transpose(0, 1)

# Tensor shape
print(g.shape)  # torch.Size([3, 2])

Tensor and Device

PyTorch tensors can live on different devices, including the CPU and GPU. Moving a tensor to the GPU can accelerate computation:

# tensor_from_numpy is the tensor created in the Tensor Creation example
if torch.cuda.is_available():
    tensor_gpu = tensor_from_numpy.to('cuda')  # Move the tensor to the GPU

Gradient and Automatic Differentiation

PyTorch tensors support automatic differentiation, which is a key feature in deep learning. When you create a tensor that requires gradients, PyTorch can automatically compute its gradients:

Example

# Create a tensor that requires gradient
tensor_requires_grad = torch.tensor([1.0], requires_grad=True)

# Perform some operations
tensor_result = tensor_requires_grad * 2

# Compute gradient
tensor_result.backward()
print(tensor_requires_grad.grad)  # tensor([2.]), since d(2x)/dx = 2

Memory and Performance
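
PyTorch tensors also provide methods that control how data is copied and tracked, which helps with memory usage and performance: .clone() copies a tensor's data, .detach() returns a tensor that shares the same data but is excluded from gradient tracking, and .to() converts a tensor to another dtype or device.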
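
The difference between these three methods is easiest to see side by side. The sketch below uses a small tensor made up for the illustration and compares data pointers to show which results share memory with the original.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# clone(): new memory, still connected to the autograd graph
cloned = x.clone()

# detach(): same memory as x, but cut off from the autograd graph
detached = x.detach()

print(cloned.requires_grad, detached.requires_grad)   # True False
print(cloned.data_ptr() == x.data_ptr())              # False: separate storage
print(detached.data_ptr() == x.data_ptr())            # True: shared storage

# to(): converts dtype and/or device
as_double = detached.to(torch.float64)
print(as_double.dtype)                                # torch.float64
```

Because a detached tensor shares storage with its source, an in-place change to one is visible in the other; combining the two as `x.detach().clone()` gives an independent, untracked copy.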

Autograd

Automatic differentiation (autograd) is a core feature of deep learning frameworks that lets the computer compute derivatives of mathematical functions automatically.

In deep learning, autograd is used mainly to compute the gradients needed to train neural networks, which is how the backpropagation algorithm is implemented.

Autograd is based on the chain rule, a rule for differentiating composite functions. The chain rule states that the derivative of a composite function is the product of the derivatives of the functions it is composed of, each evaluated at the corresponding intermediate value. Deep learning models are typically complex functions composed of many layers, and autograd applies the chain rule layer by layer to compute their gradients efficiently.
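
As a small check of the chain rule, the sketch below differentiates y = sin(x^2) with autograd and compares the result with the derivative worked out by hand, cos(x^2) * 2x. The function and the input value 1.5 are arbitrary choices for this illustration.

```python
import torch

x = torch.tensor(1.5, requires_grad=True)

# Composite function: y = sin(u), where u = x ** 2
u = x ** 2
y = torch.sin(u)
y.backward()   # autograd applies the chain rule through sin and the square

# Chain rule by hand: dy/dx = dy/du * du/dx = cos(x^2) * 2x
with torch.no_grad():
    manual = torch.cos(x ** 2) * 2 * x

print(x.grad, manual)
print(torch.allclose(x.grad, manual))
```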

Dynamic Graph vs Static Graph:

- Dynamic Graph: The computation graph is built at runtime, as the operations execute, and is rebuilt on every forward pass. This makes debugging and modifying models easier. PyTorch uses dynamic graphs.
- Static Graph: The computation graph is built before execution and does not change. TensorFlow 1.x used static graphs; eager (dynamic) execution became the default in TensorFlow 2.
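
Because the graph is recorded while the code runs, ordinary Python control flow can change which operations are differentiated. In the sketch below, the hypothetical function `forward` takes a different branch depending on its input, and the gradients reflect whichever branch actually ran.

```python
import torch

def forward(x):
    # Ordinary Python control flow decides which operations get recorded
    if x.sum() > 0:
        return (x * 2).sum()    # gradient of this branch is 2 per element
    return (x ** 2).sum()       # gradient of this branch is 2 * x

pos = torch.tensor([1.0, 2.0], requires_grad=True)
neg = torch.tensor([-1.0, -2.0], requires_grad=True)

forward(pos).backward()
forward(neg).backward()

print(pos.grad)  # tensor([2., 2.])
print(neg.grad)  # tensor([-2., -4.])
```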
PyTorch provides automatic differentiation through the autograd module.

A torch.Tensor object has a requires_grad attribute, which indicates whether gradients need to be computed for that tensor. When you create a tensor with requires_grad=True, PyTorch tracks all operations on it so that gradients can be computed later.

Creating tensors that require gradients:

Example

# Create a tensor that requires gradient computation
x = torch.randn(2, 2, requires_grad=True)
print(x)

# Perform some operations
y = x + 2
z = y * y * 3
out = z.mean()
print(out)

Output similar to:

tensor([[-0.1908,  0.2811],
        [ 0.8068,  0.8002]], requires_grad=True)
tensor(18.1469, grad_fn=<MeanBackward0>)

Backpropagation

Once the forward computation has built the graph, gradients can be computed with the .backward() method.

Example

# Backpropagation, compute gradients
out.backward()

# View x's gradient
print(x.grad)

In neural network training, autograd is what implements the backpropagation algorithm.

Backpropagation trains a neural network by computing the gradients of the loss function with respect to the network parameters. In each iteration, forward propagation computes the output and the loss, backpropagation computes the gradient of the loss with respect to each parameter, and these gradients are then used to update the parameters.
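
The iteration described above can be written out by hand without any optimizer. The sketch below fits the line y = 2x + 1 with plain gradient descent; the data, learning rate, and step count are assumptions chosen for the illustration.

```python
import torch

# Synthetic data for the line y = 2x + 1 (illustrative values)
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for step in range(200):
    pred = x * w + b                  # forward propagation
    loss = ((pred - y) ** 2).mean()   # loss (mean squared error)
    loss.backward()                   # backpropagation fills w.grad and b.grad
    with torch.no_grad():             # the update itself is not tracked
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                    # gradients accumulate, so reset them
    b.grad.zero_()

print(w.item(), b.item())   # close to 2 and 1
```

In practice an optimizer from torch.optim performs the update and reset steps, but the sequence of forward pass, loss, backward pass, and update stays the same.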

Stopping Gradient Computation

If you don't want gradients to be computed for certain tensors (for example, during inference, when no backpropagation is needed), you can use torch.no_grad() or set requires_grad=False.

Example

# Use torch.no_grad() to disable gradient computation
with torch.no_grad():
    y = x * 2

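
Both approaches can be checked through the `requires_grad` attribute of the results. The sketch below uses a throwaway tensor and a throwaway `nn.Linear` layer, both invented for this illustration; freezing a layer's parameters is one common reason to set requires_grad to False.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 2, requires_grad=True)

# Inside torch.no_grad(), results are not attached to the graph
with torch.no_grad():
    y = x * 2
print(y.requires_grad)   # False

# Outside the block, tracking resumes
z = x * 2
print(z.requires_grad)   # True

# requires_grad_(False) switches tracking off per tensor, e.g. to freeze a layer
layer = nn.Linear(2, 2)
for p in layer.parameters():
    p.requires_grad_(False)
print(any(p.requires_grad for p in layer.parameters()))   # False
```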
Neural Network (nn.Module)

A neural network is a computational model loosely inspired by the connections between neurons in the human brain. It consists of multiple layers of nodes (neurons) that learn complex patterns and relationships in data.

Neural networks improve their predictions by adjusting the connection weights between neurons, a process that involves forward propagation, loss calculation, backpropagation, and parameter updates.

Types of neural networks include feedforward neural networks, convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory networks (LSTM). They are widely applied in image recognition, speech processing, natural language processing, and other fields.

PyTorch provides a convenient interface for building neural network models: torch.nn.Module. We can inherit from the nn.Module class and define our own network layers.

Creating a simple neural network:

Example

import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple fully connected neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 2)  # Input layer to hidden layer
        self.fc2 = nn.Linear(2, 1)  # Hidden layer to output layer

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # ReLU activation function
        x = self.fc2(x)
        return x

# Create network instance
model = SimpleNN()

# Print model structure
print(model)

Output:

SimpleNN(
  (fc1): Linear(in_features=2, out_features=2, bias=True)
  (fc2): Linear(in_features=2, out_features=1, bias=True)
)

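
The layers registered in `__init__` also determine which parameters the model exposes for training. The sketch below repeats the same two-layer network so that it runs on its own, then lists each parameter's name and shape and counts the total.

```python
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 2)
        self.fc2 = nn.Linear(2, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = SimpleNN()

# Each nn.Linear contributes a weight matrix and a bias vector
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# fc1: 2*2 weights + 2 biases, fc2: 1*2 weights + 1 bias
total = sum(p.numel() for p in model.parameters())
print(total)   # 9
```

These are the tensors that `model.parameters()` hands to an optimizer.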
Training Process:

- Forward Propagation: Input data is passed through the network layers, each applying its weights and activation function, until the output is produced.
- Calculate Loss: The loss function is evaluated from the network's output and the true labels.
- Backpropagation: Automatic differentiation computes the gradient of the loss function with respect to each parameter.
- Parameter Update: The optimizer updates the network's weights and biases based on the gradients.
- Iteration: The above steps are repeated until the model's performance on the training data reaches a satisfactory level.

Forward Propagation and Loss Calculation

Example

# Random input
x = torch.randn(1, 2)

# Forward propagation
output = model(x)
print(output)

# Define loss function (e.g., Mean Squared Error, MSE)
criterion = nn.MSELoss()

# Use a random target value (for demonstration only)
target = torch.randn(1, 1)

# Calculate loss
loss = criterion(output, target)
print(loss)

Optimizers

Optimizers update neural network parameters during training to reduce the value of the loss function.

PyTorch provides multiple optimizers, such as SGD and Adam.

Using optimizers for parameter updates:

Example

# Define optimizer (using Adam optimizer)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training steps
optimizer.zero_grad()  # Clear gradients from the previous step
loss.backward()        # Backpropagation
optimizer.step()       # Update parameters

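
A single step like the one above rarely changes the loss much; training repeats it many times. The sketch below is one possible loop under stated assumptions: it uses `nn.Sequential` in place of the SimpleNN class for brevity, synthetic data whose target is the sum of the two inputs, and a learning rate and epoch count picked for the illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 1))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Synthetic data: the target is the sum of the two input features
x = torch.randn(16, 2)
target = x.sum(dim=1, keepdim=True)

losses = []
for epoch in range(100):
    optimizer.zero_grad()               # gradients accumulate, so clear them first
    loss = criterion(model(x), target)  # forward propagation and loss
    loss.backward()                     # backpropagation
    optimizer.step()                    # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])   # the last loss should be lower than the first
```

Calling `zero_grad()` at the start of each iteration matters because `backward()` adds to any gradients already stored on the parameters.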
Training Models

Training is the core process in machine learning and deep learning. Its goal is to learn model parameters from large amounts of data so that the model can make accurate predictions on new, unseen data.

Training a model typically includes the following steps:

1. Data Preparation:
   - Collect and process data, including cleaning, standardization, and normalization.
   - Split the data into a training set, a validation set, and a test set.
2. Define Model:
   - Choose a model architecture, such as a decision tree or a neural network.
   - Initialize the model parameters (weights and biases).
3. Choose Loss Function:
   - Choose an appropriate loss function based on the task type (such as classification or regression).
4. Choose Optimizer:
   - Choose an optimization algorithm, such as SGD or Adam, to update the model parameters.
5. Forward Propagation:
   - In each iteration, pass input data through the model to compute the predicted output.
6. Calculate Loss:
   - Use the loss function to measure the difference between the predicted output and the true labels.
7. Backpropagation:
   - Use automatic differentiation to compute the gradients of the loss with respect to the model parameters.
8. Parameter Update:
   - Update the model parameters based on the computed gradients and the optimizer's strategy.
9. Iterative Optimization:
   - Repeat steps 5-8 until performance on the validation set stops improving or a predetermined number of iterations is reached.
10. Evaluation and Testing:
   - Evaluate the final model on the test set to check that it is not overfitting.
11. Model Tuning:
   - Adjust hyperparameters, such as the learning rate or regularization, based on validation performance; tuning against the test set would bias the final evaluation.
12. Deploy Model:
   - Deploy the trained model to a production environment for actual prediction tasks.

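
For the data preparation step, PyTorch's torch.utils.data module provides helpers for splitting and batching. The sketch below is a minimal illustration on a synthetic dataset; the sample count, the 70/15/15 split, and the batch size are assumptions made for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic dataset: 100 samples, 2 features, 1 regression target (illustrative only)
features = torch.randn(100, 2)
labels = features.sum(dim=1, keepdim=True)
dataset = TensorDataset(features, labels)

# Split into training, validation, and test sets
train_set, val_set, test_set = random_split(dataset, [70, 15, 15])

# Batch the data; only the training data is shuffled
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)

print(len(train_set), len(val_set), len(test_set))   # 70 15 15

batch_x, batch_y = next(iter(train_loader))
print(batch_x.shape, batch_y.shape)   # torch.Size([16, 2]) torch.Size([16, 1])
```

A training loop would then iterate over `train_loader` for steps 5-8 and over `val_loader` to decide when to stop.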
Example

import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define a simple neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(