YouTip LogoYouTip

Pandas Df Stack

[![Image 1: Pandas Common Functions](#) Pandas Common Functions](#) * * * `df.stack()` is a member method of DataFrame used to **convert column indices into row indices**. It "stacks" columns from wide-format data into rows, producing a Series or DataFrame with hierarchical indexing. This is an important operation in data reshaping and is often used together with `unstack()` to convert data back and forth. **Word Definition**: `stack` means "to pile up", here it refers to piling column data into rows, transforming a wide table into a long table. * * * ## Basic Syntax and Parameters `df.stack()` is an instance method of DataFrame, called using dot notation. ### Syntax Format DataFrame.stack(level=-1, dropna=True) ### Parameter Description * **Parameter**: `level` * Type: int, str, int list, or str list. * Description: Specifies which level to stack. For multi-level indices, you can specify a specific level number (starting from 0) or level name. Default is -1, indicating the innermost level. * **Parameter**: `dropna` * Type: Boolean. * Description: Whether to drop rows where all values are NaN in the result. Default is True, meaning rows with all NaN values will be removed. ### Function Description * **Return Value**: Returns a Series or DataFrame. If only one column is stacked from the original DataFrame, a Series is returned; otherwise, a DataFrame is returned. * **Effect**: Converts column indices into row indices, producing a longer data format. The stacked data is in "long format", suitable for statistical analysis or visualization. * * * ## Examples Let's go through a series of examples from simple to complex to fully master the usage of `df.stack()`. ### Example 1: Basic Usage - Convert Columns to Rows ## Example import pandas as pd import numpy as np # 1. Create wide-format data (each quarter is a column) df = pd.DataFrame({ 'name': ['Alice','Bob','Charlie'], 'Q1': [100,150,200], 'Q2': [120,160,210], 'Q3': [130,170,220] }) print("=== Original Wide-Format Data ===") print(df) print(f"Column Names: {df.columns.tolist()}") # 2. Use df.stack() to convert columns to rows stacked = df.stack() print("n=== df.stack() Result ===") print(stacked) print(f"nType: {type(stacked)}") # 3. Convert result to DataFrame for viewing print("n=== Converted to DataFrame ===") stacked_df = stacked.reset_index() stacked_df.columns=['name','quarter','sales'] print(stacked_df) **Expected Output:** === Original Wide-Format Data === name Q1 Q2 Q3 0 Alice 100 120 1301 Bob 150 160 1702 Charlie 'Q2', 'Q3'], columns=['name', 'quarter', 'sales']... **Code Explanation:** 1. The original data is in wide format, with Q1, Q2, Q3 being three quarterly columns. 2. `stack()` "stacks" the Q1, Q2, Q3 columns into rows. 3. The result is a Series type with two levels of index: outer level is the original row index, inner level is the original column names (quarters). 4. The stacked data becomes "long format", with each row containing only one quarterly value. ### Example 2: Stacking Multi-Level Column Indices When a DataFrame has multi-level column indices, you can specify which level to stack. ## Example import pandas as pd import numpy as np # 1. Create data with multi-level column indices # Outer level: Product Type (Electronics, Furniture) # Inner level: Quarter (Q1, Q2) df = pd.DataFrame( [[100,110,200,210],[150,160,180,190]], index=['Store_A','Store_B'], columns=pd.MultiIndex.from_tuples([ ('Electronics','Q1'),('Electronics','Q2'), ('Furniture','Q1'),('Furniture','Q2') ], names=['Product','Quarter']) ) print("=== Original Data (Multi-Level Column Index) ===") print(df) print(f"Column Index Levels: {df.columns.names}") # 2. Stack inner level columns (default, level=-1) print("n=== Stack Inner Level Columns (level=-1) ===") stacked_inner = df.stack() print(stacked_inner) # 3. Stack outer level columns print("n=== Stack Outer Level Columns (level=0) ===") stacked_outer = df.stack(level=0) print(stacked_outer) # 4. Stack all columns (produce even longer format) print("n=== Stack All Columns ===") stacked_all = df.stack(level=[0,1]) print(stacked_all) **Expected Output:** === Original Data (Multi-Level Column Index) ===Product Electronics FurnitureQuarter Q1 Q2 Q1 Q ... **Code Explanation:** * The DataFrame with multi-level column indices has column names in tuple form: (Product, Quarter). * `level=-1` (default) stacks the innermost level, i.e., the Quarter level, leaving Product as a column. * `level=0` stacks the outer level, i.e., the Product level, leaving Quarter as a column. * This flexibility allows you to choose which level to stack based on your analytical needs. ### Example 3: Handling Missing Values Use the `dropna` parameter to control how missing values are handled. ## Example import pandas as pd import numpy as np # 1. Create data with missing values df = pd.DataFrame({ 'name': ['Alice','Bob'], 'Math': [100, np.nan], 'English': [90,85], 'Science': [np.nan,95] }) print("=== Original Data (With Missing Values) ===") print(df) # 2. dropna=True (default), remove rows with all NaN print("n=== dropna=True (Default) ===") stacked_drop = df.stack(dropna=True) print(stacked_drop) # 3. dropna=False, keep NaN values print("n=== dropna=False ===") stacked_keep = df.stack(dropna=False) print(stacked_keep) # 4. Practical application: organize stacked results into analyzable long format print("n=== Organize into Long-Format DataFrame ===") result = df.set_index('name').stack().reset_index() result.columns=['name','subject','score'] print(result) # 5. Remove missing values result_clean = result.dropna() print("n=== After Removing Missing Values ===") print(result_clean) **Expected Output:** === Original Data (With Missing Values) === name Math English Science0 Alice 100.0 90.0 NaN1 Bob NaN 85.0 95.02=== Stack Inner Level Columns (level=-1) ===Store_A Electronics Q1 100 Q2 110 Furniture Q1 200 Q2 210Store_B Electronics Q1 150 ... **Code Explanation:** * `dropna=True` (default) removes rows where all values are NaN. * In data analysis, it's common to preserve the original data and then remove missing values as needed. * After stacking, use `reset_index()` to convert hierarchical indices into regular columns for easier processing. ### Example 4: Using stack with unstack `stack()` and `unstack()` are inverse operations. Used together, they allow flexible data reshaping. ## Example import pandas as pd # 1. Create initial data df = pd.DataFrame({ 'product': ['A','B','C'], 'North': [100,150,200], 'South': [180,170,160], 'East': [190,200,210] }) print("=== Original Data ===") print(df) # 2. Use unstack to convert regions into columns (first make it multi-level indexed) df_indexed = df.set_index('product') print("n=== After Setting Index ===") print(df_indexed) # 3. Use stack to convert columns to rows (stack is the inverse of unstack) print("n=== unstack + stack Round-trip ===") # First unstack unstacked = df_indexed.unstack() print("unstack result:") print(unstacked) # Then stack back restacked = unstacked.stack() print("nstack back:") print(restacked) # 4. Complete example: create multi-level index then transform print("n=== Multi-Level Index Transformation ===") # Create multi-level indexed DataFrame multi_df = pd.DataFrame( [[100,110],[150,160]], index=pd.MultiIndex.from_tuples([('A','2023'),('B','2023')], names=['product','year']), columns=['North','South'] ) print("Original Multi-Level Indexed Data:") print(multi_df) # unstack year print("nunstack year:") print(multi_df.unstack(level='year')) # unstack columns print("nunstack columns (to columns):") print(multi_df.unstack(level='columns')) **Expected Output:** === unstack + stack Round-trip === unstack result: product North A 100 B 150 C 200South A 180 B 170... **Code Explanation:** * `stack()` converts column indices into row indices, transforming a wide table into a long table. * `unstack()` converts row indices into column indices, transforming a long table into a wide table. * Together, these operations enable flexible data shape transformations. * Understanding the inverse relationship between these operations is crucial for data analysis. > **Tip:** `stack()` and `unstack()` are core tools for data reshaping. `stack()` piles columns into rows (wide to long), while `unstack()` spreads rows into columns (long to wide). [![Image 2: Pandas Common Functions](#) Pandas Common Functions](#)
← Pandas Pd MeltPandas Groupby Transform β†’