# Pandas MultiIndex

MultiIndex is the hierarchical index in Pandas: it lets rows or columns carry several index levels at once. It is particularly useful for high-dimensional data, group statistics, and panel data.

* * *

## Creating MultiIndex

### Creating from Lists

**Example**

```python
import pandas as pd

# Method 1: build from parallel arrays (one array per level)
arrays = [
    ["A", "A", "B", "B", "C", "C"],
    [1, 2, 1, 2, 1, 2],
]
index = pd.MultiIndex.from_arrays(arrays, names=["Category", "Number"])
print("Created from arrays:")
print(index)
print()

# Method 2: build from a list of tuples (one tuple per row)
tuples = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1), ("C", 2)]
index = pd.MultiIndex.from_tuples(tuples, names=["Category", "Number"])
print("Created from tuples:")
print(index)
print()

# Method 3: build from the Cartesian product of the level values
index = pd.MultiIndex.from_product(
    [["A", "B", "C"], [1, 2, 3]],
    names=["Category", "Number"],
)
print("Created from product:")
print(index)
```

### Creating DataFrame with MultiIndex

**Example**

```python
import pandas as pd
import numpy as np

# Create a DataFrame whose rows are indexed by (Year, Quarter)
df = pd.DataFrame(
    np.random.randn(6, 4),
    index=pd.MultiIndex.from_tuples(
        [("2024", "Q1"), ("2024", "Q2"), ("2024", "Q3"),
         ("2025", "Q1"), ("2025", "Q2"), ("2025", "Q3")]
    ),
    columns=["Beijing", "Shanghai", "Guangzhou", "Shenzhen"],
)
df.index.names = ["Year", "Quarter"]
print("DataFrame with MultiIndex:")
print(df)
```

* * *

## Accessing MultiIndex

### Using loc/iloc

**Example**

```python
import pandas as pd

# Create sample data
df = pd.DataFrame(
    {"Chinese": [85, 92, 78, 88], "Math": [90, 88, 95, 82]},
    index=pd.MultiIndex.from_tuples(
        [("Grade 1", "Class A"), ("Grade 1", "Class B"),
         ("Grade 2", "Class A"), ("Grade 2", "Class B")],
        names=["Grade", "Class"],
    ),
)
print("Original data:")
print(df)
print()

# Access outer index: a single label selects every row under it
print("Access all Grade 1:")
print(df.loc["Grade 1"])
print()

# Access inner index: slice(None) means "all values" of the outer level
print("Access Class A:")
print(df.loc[(slice(None), "Class A"), :])
print()

# Access multi-level index: a tuple gives one label per level
print("Access Grade 1 Class A:")
print(df.loc[("Grade 1", "Class A")])
print()

# Positional access with iloc ignores the index labels
print("First row by position:")
print(df.iloc[0])
```

### Using xs

**Example**

```python
import pandas as pd

# Create sample data
df = pd.DataFrame(
    {"Chinese": [85, 92, 78, 88], "Math": [90, 88, 95, 82]},
    index=pd.MultiIndex.from_tuples(
        [("Grade 1", "Class A"), ("Grade 1", "Class B"),
         ("Grade 2", "Class A"), ("Grade 2", "Class B")],
        names=["Grade", "Class"],
    ),
)

# Use xs to take a cross-section at a specific level value
print("Using xs to access Grade 1:")
print(df.xs("Grade 1", level="Grade"))
print()

print("Using xs to access Class A:")
print(df.xs("Class A", level="Class"))
```

* * *

## Transforming MultiIndex

### Stacking and Unstacking

**Example**

```python
import pandas as pd
import numpy as np

# Create a wide-format DataFrame with nested row and column labels
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=pd.MultiIndex.from_tuples(
        [("Beijing", "2024"), ("Shanghai", "2024"), ("Guangzhou", "2024")]
    ),
    columns=pd.MultiIndex.from_tuples(
        [("Q1", "Revenue"), ("Q1", "Profit"), ("Q2", "Revenue"), ("Q2", "Profit")]
    ),
)
print("Original data (nested columns):")
print(df)
print()

# unstack: move the innermost row level into the columns
df_unstacked = df.unstack()
print("After unstack:")
print(df_unstacked)
print()

# stack: move the innermost column level back into the row index
df_stacked = df_unstacked.stack()
print("After stack:")
print(df_stacked)
```

* * *

## Sorting MultiIndex

**Example**

```python
import pandas as pd

# Create a DataFrame with shuffled index entries
df = pd.DataFrame(
    {"Value": [1, 2, 3, 4, 5, 6]},
    index=pd.MultiIndex.from_tuples(
        [("C", 2), ("A", 1), ("B", 2), ("A", 2), ("C", 1), ("B", 1)],
        names=["Letter", "Number"],
    ),
)
print("Shuffled data:")
print(df)
print()

# Default sort: compares the outer level first, then the inner level
df_sorted1 = df.sort_index()
print("Sorted by outer level:")
print(df_sorted1)
print()

# Sort by the inner level first; ties are broken by the remaining level
df_sorted2 = df.sort_index(level=1)
print("Sorted by inner level:")
print(df_sorted2)
print()

# Name the sort levels explicitly, in priority order
df_sorted3 = df.sort_index(level=[0, 1])
print("Sorted by multiple levels:")
print(df_sorted3)
```

* * *

## Practical Example: Group Statistics

MultiIndex is ideal for group statistics and pivot analysis.
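As a rough sketch of the pivot side of that claim: passing several keys to `pivot_table` produces a result whose rows are indexed by a MultiIndex, which can then be queried with a tuple of labels. The `sales` frame and its values are invented for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical sales records, invented for this sketch
sales = pd.DataFrame({
    "Year": [2024, 2024, 2024, 2024, 2025, 2025],
    "Quarter": ["Q1", "Q1", "Q2", "Q2", "Q1", "Q1"],
    "Product": ["Phone", "Computer", "Phone", "Computer", "Phone", "Computer"],
    "Sales": [100, 200, 150, 250, 120, 220],
})

# Two keys in `index` give the result a two-level row MultiIndex
pivot = sales.pivot_table(
    index=["Year", "Quarter"],
    columns="Product",
    values="Sales",
    aggfunc="sum",
)
print(pivot)
print(pivot.index.names)

# A tuple of labels (one per level) selects a single (Year, Quarter) row
print(pivot.loc[(2024, "Q1"), "Phone"])
```

Each combination of `Year` and `Quarter` becomes one row, so the level-based access techniques from the earlier sections apply to the pivoted result unchanged.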
**Example**

```python
import pandas as pd
import numpy as np

# Create sales data
np.random.seed(42)
df = pd.DataFrame({
    # Year values are assumed for illustration; the source listing omits them
    "Year": [2024] * 6 + [2025] * 6,
    "Quarter": ["Q1", "Q2", "Q3", "Q4"] * 3,
    "Product": ["Phone", "Phone", "Computer", "Computer"] * 3,
    "Region": ["East China", "South China", "North China"] * 4,
    "Sales": np.random.randint(100, 500, 12),
})
print("Original sales data:")
print(df)
print()

# Set MultiIndex and perform group statistics
df_grouped = df.set_index(["Year", "Quarter", "Product", "Region"])
print("Grouped by Year, Quarter, Product, Region:")
print(df_grouped)

# Summarize by year
yearly = df_grouped.groupby(level="Year").sum()
print("\nAnnual Sales:")
print(yearly)

# Summarize by year and quarter
quarterly = df_grouped.groupby(level=["Year", "Quarter"]).sum()
print("\nQuarterly Sales:")
print(quarterly)
```

* * *

## Common Issues and Considerations

**1. Confused Index Levels**

Operations that reshape data (for example `stack`, `unstack`, or `reset_index`) add or remove index levels, so it is easy to lose track of them. Check `df.index.names` before and after such operations.

**2. Incorrect Index Access**

`loc` selects by label, while `iloc` selects by integer position. Do not mix the two: on a MultiIndex, pass `loc` a label or a tuple of labels (one per level), and pass `iloc` only integer positions.

**3. Duplicate Indices During concat**

`concat` keeps the index values of its inputs, so the result can contain duplicate index entries and later label lookups may return more rows than expected. Passing `ignore_index=True` replaces the index with a fresh 0 to n-1 range; note that this also discards the MultiIndex levels.

> MultiIndex is a core feature of Pandas for handling high-dimensional data. Proper use of MultiIndex can make data organization clearer and statistical analysis more convenient.
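
The `concat` caveat from the issues list can be sketched as follows. The two frames and their values are invented for illustration; `keys=` is shown as one possible alternative to `ignore_index=True` when the original levels should survive, not as the tutorial's own recommendation.

```python
import pandas as pd

# Two hypothetical frames that share the same MultiIndex (invented data)
idx = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2)], names=["Category", "Number"]
)
first = pd.DataFrame({"Value": [10, 20]}, index=idx)
second = pd.DataFrame({"Value": [30, 40]}, index=idx)

# Plain concat keeps both copies of every index entry
combined = pd.concat([first, second])
print(combined.index.is_unique)  # False

# ignore_index=True replaces the MultiIndex with a plain positional index
flat = pd.concat([first, second], ignore_index=True)
print(flat.index)

# keys= adds an outer level that records which input each row came from
keyed = pd.concat([first, second], keys=["first", "second"], names=["Source"])
print(keyed.index.names)
print(keyed.index.is_unique)
```

Printing `index.names` before and after each step, as done here, is also a quick way to catch the level confusion described in the first issue.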
