Pandas Pivot
Data reshaping is a common operation in data analysis, used to change the layout structure of data. Pandas provides functions like pivot, melt, stack, and unstack to convert between long format and wide format.
* * *
## pivot Pivot Table
`pivot` converts long format data to wide format, similar to Excel's pivot table functionality.
### Basic Usage
## Instance
import pandas as pd
# Create long format data
df = pd.DataFrame({
"Date": ["2024-01-01","2024-01-01","2024-01-02","2024-01-02"],
"Product": ["A","B","A","B"],
"Sales Volume": [100,150,120,90]
})
print("Long format Data:")
print(df)
print()
# Convert to wide format
pivot_df = df.pivot(index="Date", columns="Product", values="Sales Volume")
print("Wide format Data:")
print(pivot_df)
### Multiple Index
## Instance
import pandas as pd
df = pd.DataFrame({
"Year": ["2024","2024","2024","2024"],
"Quarter": ["Q1","Q1","Q2","Q2"],
"Product": ["A","B","A","B"],
"Sales Volume": [100,150,120,90]
})
print("Data:")
print(df)
print()
# Multiple index pivot
pivot_df = df.pivot(index=["Year","Quarter"], columns="Product", values="Sales Volume")
print("Multi-index pivot:")
print(pivot_df)
### Aggregation Function
## Instance
import pandas as pd
# Cases with duplicate values
df = pd.DataFrame({
"Date": ["2024-01-01","2024-01-01","2024-01-01","2024-01-02"],
"Product": ["A","A","B","B"],
"Sales Volume": [100,110,80,90]
})
print("Data with duplicate values:")
print(df)
print()
# pivot does not support duplicate values by default, need to use pivot_table and specify aggregation function
pivot_df = df.pivot_table(index="Date", columns="Product", values="Sales Volume", aggfunc="sum")
print("Aggregate using sum:")
print(pivot_df)
print()
# Use mean
print("Aggregate using mean:")
print(df.pivot_table(index="Date", columns="Product", values="Sales Volume", aggfunc="mean"))
> `pivot` does not allow duplicate values, use `pivot_table` with aggregation function when there are duplicates.
* * *
## melt Unpivot
`melt` converts wide format data to long format, which is the inverse operation of pivot.
## Instance
import pandas as pd
# Create wide format data
df = pd.DataFrame({
"Date": ["2024-01-01","2024-01-02"],
"A": [100,120],
"B": [150,90]
})
print("Wide format Data:")
print(df)
print()
# Convert to long format
melted = df.melt(id_vars="Date", var_name="Product", value_name="Sales Volume")
print("Long format Data:")
print(melted)
### Keep Multiple Columns Unchanged
## Instance
import pandas as pd
df = pd.DataFrame({
"City": ["Beijing","Shanghai"],
"2023Revenue": [1000,800],
"2023Profit": [200,150],
"2024Revenue": [1200,950],
"2024Profit": [250,180]
})
print("Wide format:")
print(df)
print()
# Keep "City" unchanged, convert other columns to long format
melted = df.melt(
id_vars="City",
var_name="Metrics",
value_name="Numeric values"
)
print("Long format:")
print(melted)
* * *
## stack and unstack
stack and unstack are reshaping functions specifically for MultiIndex.
### unstack
## Instance
import pandas as pd
# Create data with multi-level index
df = pd.DataFrame({
"A": [1,2,3,4],
"B": [5,6,7,8]
}, index=pd.MultiIndex.from_tuples(
[("X",1),("X",2),("Y",1),("Y",2)],
names=["Category","ID"]
))
print("MultiIndex Data:")
print(df)
print()
# unstack converts inner index to columns
print("unstack After:")
print(df.unstack())
### stack
## Instance
import pandas as pd
# Create wide format (with column index)
df = pd.DataFrame({
("A","X"): [1,2],
("A","Y"): [3,4],
("B","X"): [5,6],
("B","Y"): [7,8]
})
print("With multi-level column index:")
print(df)
print()
# stack converts column index to inner index
print("stack After:")
print(df.stack())
* * *
## Practice: Business Report Conversion
## Instance
import pandas as pd
# Simulate business data - sales records
sales = pd.DataFrame({
"Date": * 4,
"Product": ["Mobile Phone","Computer","Tablet","Headphones"],
"Channel": ["Online","Online","Offline","Online"],
"Sales Revenue": [10000,20000,8000,5000]
})
print("Raw sales Data:")
print(sales)
print()
# Use pivot to convert
pivot = sales.pivot_table(
index="Product",
columns="Channel",
values="Sales Revenue",
aggfunc="sum",
fill_value=0
)
print("ChannelSales Pivot:")
print(pivot)
print()
# Restore to long format
print("Convert back to Long format:")
print(pivot.reset_index().melt(id_vars="Product", var_name="Channel", value_name="Sales Revenue"))
* * *
## Reshaping Function Selection
| Function | Purpose | Typical Scenario |
| --- | --- | --- |
| `pivot` | LongβWide | Row/column conversion |
| `pivot_table` | LongβWide (with aggregation) | When there are duplicate values |
| `melt` | WideβLong | Data organization |
| `unstack` | IndexβColumn | Multi-level index |
| `stack` | ColumnβIndex | Multi-level index |
> The purpose of data reshaping is to make data more suitable for analysis or display. Choose the appropriate conversion method based on downstream requirements.
YouTip