YouTip LogoYouTip

Pandas Pivot

Data reshaping is a common operation in data analysis, used to change the layout structure of data. Pandas provides functions like pivot, melt, stack, and unstack to convert between long format and wide format. * * * ## pivot Pivot Table `pivot` converts long format data to wide format, similar to Excel's pivot table functionality. ### Basic Usage ## Instance import pandas as pd # Create long format data df = pd.DataFrame({ "Date": ["2024-01-01","2024-01-01","2024-01-02","2024-01-02"], "Product": ["A","B","A","B"], "Sales Volume": [100,150,120,90] }) print("Long format Data:") print(df) print() # Convert to wide format pivot_df = df.pivot(index="Date", columns="Product", values="Sales Volume") print("Wide format Data:") print(pivot_df) ### Multiple Index ## Instance import pandas as pd df = pd.DataFrame({ "Year": ["2024","2024","2024","2024"], "Quarter": ["Q1","Q1","Q2","Q2"], "Product": ["A","B","A","B"], "Sales Volume": [100,150,120,90] }) print("Data:") print(df) print() # Multiple index pivot pivot_df = df.pivot(index=["Year","Quarter"], columns="Product", values="Sales Volume") print("Multi-index pivot:") print(pivot_df) ### Aggregation Function ## Instance import pandas as pd # Cases with duplicate values df = pd.DataFrame({ "Date": ["2024-01-01","2024-01-01","2024-01-01","2024-01-02"], "Product": ["A","A","B","B"], "Sales Volume": [100,110,80,90] }) print("Data with duplicate values:") print(df) print() # pivot does not support duplicate values by default, need to use pivot_table and specify aggregation function pivot_df = df.pivot_table(index="Date", columns="Product", values="Sales Volume", aggfunc="sum") print("Aggregate using sum:") print(pivot_df) print() # Use mean print("Aggregate using mean:") print(df.pivot_table(index="Date", columns="Product", values="Sales Volume", aggfunc="mean")) > `pivot` does not allow duplicate values, use `pivot_table` with aggregation function when there are duplicates. * * * ## melt Unpivot `melt` converts wide format data to long format, which is the inverse operation of pivot. ## Instance import pandas as pd # Create wide format data df = pd.DataFrame({ "Date": ["2024-01-01","2024-01-02"], "A": [100,120], "B": [150,90] }) print("Wide format Data:") print(df) print() # Convert to long format melted = df.melt(id_vars="Date", var_name="Product", value_name="Sales Volume") print("Long format Data:") print(melted) ### Keep Multiple Columns Unchanged ## Instance import pandas as pd df = pd.DataFrame({ "City": ["Beijing","Shanghai"], "2023Revenue": [1000,800], "2023Profit": [200,150], "2024Revenue": [1200,950], "2024Profit": [250,180] }) print("Wide format:") print(df) print() # Keep "City" unchanged, convert other columns to long format melted = df.melt( id_vars="City", var_name="Metrics", value_name="Numeric values" ) print("Long format:") print(melted) * * * ## stack and unstack stack and unstack are reshaping functions specifically for MultiIndex. ### unstack ## Instance import pandas as pd # Create data with multi-level index df = pd.DataFrame({ "A": [1,2,3,4], "B": [5,6,7,8] }, index=pd.MultiIndex.from_tuples( [("X",1),("X",2),("Y",1),("Y",2)], names=["Category","ID"] )) print("MultiIndex Data:") print(df) print() # unstack converts inner index to columns print("unstack After:") print(df.unstack()) ### stack ## Instance import pandas as pd # Create wide format (with column index) df = pd.DataFrame({ ("A","X"): [1,2], ("A","Y"): [3,4], ("B","X"): [5,6], ("B","Y"): [7,8] }) print("With multi-level column index:") print(df) print() # stack converts column index to inner index print("stack After:") print(df.stack()) * * * ## Practice: Business Report Conversion ## Instance import pandas as pd # Simulate business data - sales records sales = pd.DataFrame({ "Date": * 4, "Product": ["Mobile Phone","Computer","Tablet","Headphones"], "Channel": ["Online","Online","Offline","Online"], "Sales Revenue": [10000,20000,8000,5000] }) print("Raw sales Data:") print(sales) print() # Use pivot to convert pivot = sales.pivot_table( index="Product", columns="Channel", values="Sales Revenue", aggfunc="sum", fill_value=0 ) print("ChannelSales Pivot:") print(pivot) print() # Restore to long format print("Convert back to Long format:") print(pivot.reset_index().melt(id_vars="Product", var_name="Channel", value_name="Sales Revenue")) * * * ## Reshaping Function Selection | Function | Purpose | Typical Scenario | | --- | --- | --- | | `pivot` | Long→Wide | Row/column conversion | | `pivot_table` | Long→Wide (with aggregation) | When there are duplicate values | | `melt` | Wide→Long | Data organization | | `unstack` | Index→Column | Multi-level index | | `stack` | Column→Index | Multi-level index | > The purpose of data reshaping is to make data more suitable for analysis or display. Choose the appropriate conversion method based on downstream requirements.
← Pandas RollingPandas Merge β†’