Pandas Df Pivot Table
[ Pandas Common Functions](#)
* * *
`df.pivot_table()` is a member method of DataFrame, used to **create a pivot table**. It is an enhanced version of `pivot()`, supporting aggregation operations and can handle duplicate row index and column index combinations.
Pivot table is one of the most commonly used tools in data analysis, which can group, summarize and reshape data from different angles, making it easy to discover patterns and trends in data.
**Word meaning**: `pivot_table` means "pivot table", which is a table that can dynamically change data layout and supports automatic aggregation of duplicate data.
* * *
## Basic Syntax and Parameters
`df.pivot_table()` is an instance method of DataFrame, called through the dot operator.
### Syntax Format
DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
### Parameter Description
* **Parameter**: `index`
* Type: Column name, label, list of labels, or None.
* Description: The column to use as row index. Can be a single column or multiple columns (list), multiple columns will create a hierarchical index.
* **Parameter**: `columns`
* Type: Column name or label.
* Description: The column to use as column index, used to spread unique values as column names.
* **Parameter**: `values`
* Type: Column name, label, list of labels, or None.
* Description: The numeric column to aggregate. If not specified, all numeric columns will be aggregated.
* **Parameter**: `aggfunc`
* Type: Function, string, or list of functions.
* Description: Aggregation function. Common values include `'sum'` (sum), `'mean'` (average, default), `'count'` (count), `'min'` (minimum), `'max'` (maximum), `'median'` (median), `'std'` (standard deviation), etc.
* **Parameter**: `fill_value`
* Type: Scalar or None.
* Description: Value used to fill missing values. Default is None, keeping NaN.
* **Parameter**: `margins`
* Type: Boolean.
* Description: Whether to add row and column totals. Default is False. When set to True, summary data will be added at the end of rows and columns.
* **Parameter**: `margins_name`
* Type: String.
* Description: The name of the total row/column, default is 'All'. Only effective when margins=True.
### Function Description
* **Return value**: Returns a DataFrame, which is the pivot table.
* **Feature**: Unlike `pivot()`, `pivot_table()` will not error due to duplicate index+columns combinations, it will automatically use `aggfunc` for aggregation.
* * *
## Examples
Let's thoroughly master the usage of `df.pivot_table()` through a series of examples from simple to complex.
### Example 1: Basic Usage - Creating a Simple Pivot Table
## Example
import pandas as pd
# 1. Create sales data
sales = pd.DataFrame({
'product': ['A','B','C','A','B','C','A','B','C','A','B','C'],
'region': ['North','North','North','South','South','South','East','East','East','West','West','West'],
'sales': [100,150,200,180,170,160,190,200,210,220,230,240]
})
print("=== Sales data ===")
print(sales)
# 2. Create pivot table - summarize sales by product and region (default aggregation is mean)
pivot = sales.pivot_table(values='sales', index='product', columns='region', aggfunc='sum')
print("n=== df.pivot_table() Sales pivot table (sum)===")
print(pivot)
**Expected Output:**
=== Sales data === product region sales 0 A North 1001 B North 1502 C North 2003 A South 1804 B South 1705 C South 1606 A East 1907 B East 2008 C East 2109 A West 22010 B West 23011 C West 240=== df.pivot_table() Sales pivot table (sum)=== region East North South West product A 190 100 180 220 B 200 150 170 230 C 210 200 160 240
**Code Analysis:**
1. The original data is in long format, each row records a product's sales in a certain region.
2. `pivot_table()` uses product as row index, region as column index, and sales as values.
3. Use `aggfunc='sum'` to summarize the sales.
4. The result is a clear two-dimensional table, making it easy to compare sales performance of different products in different regions.
### Example 2: Using Different Aggregation Functions
The `aggfunc` parameter can specify various aggregation methods to analyze data.
## Example
import pandas as pd
import numpy as np
# 1. Create richer sales data (each product has multiple records)
sales = pd.DataFrame({
'product': ['A','B','A','B','A','B','A','B','A','B'],
'region': ['North','North','South','South','East','East','West','West','North','South'],
'sales': [100,150,200,180,190,200,220,230,110,190],
'quantity': [10,15,20,18,19,20,22,23,11,19]
})
print("=== Sales data ===")
print(sales)
# 2. Calculate average sales (default aggregation function)
print("n=== Average sales ===")
mean_result = sales.pivot_table(values='sales', index='product', columns='region', aggfunc='mean')
print(mean_result.round(1))
# 3. Use multiple aggregation functions at the same time (pass a dictionary)
print("n=== Multiple aggregation functions ===")
multi_agg = sales.pivot_table(
values='sales',
index='product',
columns='region',
aggfunc={'sales': ['sum','mean','max']}
)
print(multi_agg)
# 4. Different aggregation functions for different columns
print("n=== Different aggregations for different columns ===")
diff_agg = sales.pivot_table(
values=['sales','quantity'],
index='product',
columns='region',
aggfunc={'sales': 'sum','quantity': 'mean'}
)
print(diff_agg)
**Expected Output:**
===
YouTip