Pandas Pd Melt

[![Image 1: Pandas Common Functions](#) Pandas Common Functions](#) * * * `pd.melt()` is a top-level function in the Pandas library used for **widening tables to long tables**. It "melts" wide-format data containing multiple columns into a long format, similar to the inverse pivot operation in Excel. This is one of the most commonly used functions in data reshaping, especially useful when you need to convert data into long format before data visualization or merge multiple columns into one. **Word Definition**: `melt` means "to melt or dissolve", here referring to melting the wide table structure into a long table structure, similar to gathering multiple column data together. * * * ## Basic Syntax and Parameters `pd.melt()` is a top-level function in the Pandas library used to convert wide-format data into long-format. ### Syntax Format pd.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None) ### Parameter Description * **Parameter**: `frame` * Type: DataFrame. * Description: **Required parameter**. The source DataFrame to be converted from wide to long format. * **Parameter**: `id_vars` * Type: Column name, list of tuples, or None. * Description: Columns to use as identifiers (IDs), which remain unchanged after conversion. Can be a single column name or a list of column names. If not specified, all columns are treated as value_vars. * **Parameter**: `value_vars` * Type: Column name, list of tuples, or None. * Description: Columns to "melt", i.e., which columns to convert into rows. If not specified, all columns except id_vars will be converted. * **Parameter**: `var_name` * Type: String or None. * Description: Name of the new column that stores original column names. Default is 'variable'. * **Parameter**: `value_name` * Type: String. * Description: Name of the generated value column. Default is 'value'. * **Parameter**: `col_level` * Type: int, str, or None. * Description: If columns are MultiIndex (multi-level index), specify which level to melt. ### Function Description * **Return Value**: Returns a DataFrame with data transformed from wide format to long format. * **Effect**: Converts values from multiple columns into two columns: one storing original column names (variable), and another storing corresponding values (value). * * * ## Examples Let's go through a series of examples from simple to complex to fully master the usage of `pd.melt()`. ### Example 1: Basic Usage - Simple Wide Table to Long Table ## Example import pandas as pd # 1. Create wide-format data (one column per quarter) df = pd.DataFrame({ 'name': ['Alice','Bob','Charlie'], 'Q1_sales': [100,150,200], 'Q2_sales': [120,160,210], 'Q3_sales': [130,170,220] }) print("=== Original Wide Format Data ===") print(df) # 2. Use pd.melt() to convert to long format melted = pd.melt(df, id_vars=['name'], value_vars=['Q1_sales','Q2_sales','Q3_sales']) print("n=== pd.melt() Converted to Long Format ===") print(melted) # 3. Custom column names melted_custom = pd.melt( df, id_vars=['name'], value_vars=['Q1_sales','Q2_sales','Q3_sales'], var_name='quarter', value_name='sales' ) print("n=== Custom Column Names ===") print(melted_custom) **Expected Output:** === Original Wide Format Data === name Q1_sales Q2_sales Q3_sales 0 Alice 100 120 1301 Bob 150 160 1702 Charlie 200 210 220=== pd.melt() Converted to Long Format === name variable value 0 Alice Q1_sales 1001 Bob Q1_sales 1502 Charlie Q1_sales 2003 Alice Q2_sales 1204 Bob Q2_sales 1605 Charlie Q2_sales 2106 Alice Q3_sales 1307 Bob Q3_sales 1708 Charlie Q3_sales 220=== Custom Column Names === name quarter sales 0 Alice Q1_sales 1001 Bob Q1_sales 1502 Charlie Q1_sales 2003 Alice Q2_sales 200 **Code Explanation:** 1. The original data is in wide format, with each row representing a person and Q1, Q2, Q3 being three quarterly sales data columns. 2. `id_vars=['name']` specifies that the name column remains unchanged as an identifier. 3. `value_vars=['Q1_sales', 'Q2_sales', 'Q3_sales']` specifies which columns to melt. 4. The result adds a new variable column (storing original column names) and a value column (storing corresponding values). 5. Using `var_name` and `value_name` allows customizing the names of the new columns. ### Example 2: Not Specifying value_vars - Melting All Columns Except id_vars When value_vars is not specified, it automatically melts all columns except id_vars. ## Example import pandas as pd # 1. Create data with more columns df = pd.DataFrame({ 'id': [1,2,3], 'name': ['Alice','Bob','Charlie'], 'math': [90,85,92], 'english': [88,90,87], 'science': [95,88,91] }) print("=== Original Data ===") print(df) # 2. Specify only id_vars, automatically melt other columns melted = pd.melt(df, id_vars=['id','name']) print("n=== Melting All Non-id Columns ===") print(melted) # 3. Multiple id_vars case print("n=== Multiple id_vars ===") melted_multi = pd.melt(df, id_vars=['name'], value_vars=['math','english','science']) print(melted_multi) # 4. Only one id_vars print("n=== One id_vars ===") melted_one = pd.melt(df, id_vars=['id']) print(melted_one) **Expected Output:** === Original Data === id name math english science 0 1 Alice 90 88 951 2 Bob 85 90 882 3 Charlie 92 87 91=== Melting All Non-id Columns === id name variable value 0 1 Alice math 901 2 Bob math 852 3 Charlie math 923 1 Alice english 884 2 Bob english 905 3 Charlie english 876 1 Alice science 957 2 Bob science 888 3 Charlie science 91=== Melting into Two Columns (variable and value). - When value_vars is not specified, the system automatically identifies and melts all columns except id_vars - id_vars can be a single column or a combination of multiple columns **Code Explanation:** * When `value_vars` is not specified, it automatically melts all columns except those in `id_vars`. * When only `id_vars` is provided without `value_vars`, each data row generates a number of new rows equal to the number of value_vars. * This method is very convenient when you need to uniformly process multi-column data. ### Example 3: No id_vars - Melting All Columns When id_vars is not specified, the original DataFrame index becomes the identifier. ## Example import pandas as pd # 1. Create data (no explicit id column) df = pd.DataFrame({ 'Q1': [100,150,200], 'Q2': [120,160,210], 'Q3': [130,170,220] }) print("=== Original Data ===") print(df) # 2. Do not specify id_vars, use default index melted = pd.melt(df) print("n=== Not Specifying id_vars ===") print(melted) # 3. Restore original index print("n=== Reset Index ===") melted_with_index = pd.melt(df, ignore_index=False) print(melted_with_index) # 4. Practical Case: Creating Identifiers df_with_id = pd.DataFrame({ 'year': [2023,2023,2023], 'Q1': [100,150,200], 'Q2': [120,160,210] }) print("n=== Original Data (with Year) ===") print(df_with_id) # First add year to id_vars melted_year = pd.melt(df_with_id, id_vars=['year']) print("n=== Melted Result Including Year ===") print(melted_year) **Expected Output:** === Original Data === Q1 Q2 Q3 0 100 120 1301 150 160 1702 200 210 220=== Not Specifying id_vars === variable value 0 Q1 1001 Q1 1502 Q1 2003 Q2 1204 Q2 1605 Q2 2106 Q3 1307 Q3 1708 Q3 220=== Melting All Columns, converting original column names to values in the variable column and original values to the value column. **Code Explanation:** * When id_vars is not specified, only two columns remain: variable and value. * Using `ignore_index=False` preserves the original index, making it easier to compare with the original data later. * In practice, it's usually better to specify one or more id_vars to retain key information. ### Example 4: Practical Case - Preparing Data for Visualization melt() is a great helper for data visualization; many plotting libraries (like Seaborn, Matplotlib) work better with long-format data. ## Example import pandas as pd # 1. Create sales data (one column per region) sales = pd.DataFrame({ 'product': ['A','B','C','D'], 'Jan': [1200,1500,900,1100], 'Feb': [1350,1600,950,1200], 'Mar': [1100,1450,880,1050] }) print("=== Wide Format Sales Data ===") print(sales) # 2. Melt into long format (for visualization) sales_long = pd.melt( sales, id_vars=['product'], var_name='month', value_name='sales' ) print("n=== Long Format Sales Data ===") print(sales_long) # 3. Data processing example: calculate year-over-year growth # Add month numbers month_map ={'Jan': 1,'Feb': 2,'Mar': 3} sales_long['month_num']= sales_long['month'].map(month_map) print("n=== Adding Month Numbers ===") print(sales_long) # 4. Filter data for specific months feb_sales = sales_long[sales_long['month']=='Feb'] print("n=== February Sales Data ===") print(feb_sales) **Expected Output:** === Wide Format Sales Data === product Jan Feb Mar0 A 1200 1350 1100 product Jan 180 170 6=== Long Format Sales Data (for Visualization) === product month sales 0 A Jan 12001 B Jan 15002 C Jan 9003 D Jan 11004 A Feb 1350 **Code Explanation:** * Wide-format data has one column per region/time point, making it easy to view and input. * Long-format data has one observation per row, which is the standard format for statistical analysis. * Visualization libraries like Seaborn accept long-format data, making it easy to create grouped bar charts, line plots, etc. * Using `id_vars` keeps classification variables such as product, region, etc., unaltered during melting. ### Example 5: Melting Multi-Level Column Indexes When a DataFrame has multi-level column indexes, you can use the `col_level` parameter. ## Example import pandas as pd # 1. Create data with multi-level column index df = pd.DataFrame( [[100,110,200,210],[150,160,180,190]], index=['Store_A','Store_B'], columns=pd.MultiIndex.from_tuples([ ('Electronics','Q1'),('Electronics','Q2'), ('Furniture','Q1'),('Furniture','Q2') ], names=['Category','Quarter']) ) print("=== Original Data (Multi-Level Column Index) ===") print(df) # 2. Melt all columns (without specifying level) print("n=== Melting All Columns ===") melted_all = pd.melt(df, ignore_index=False) print(melted_all) # 3. Only melt inner level (Quarter) print("n=== Only Melting Quarter Level ===") melted_quarter = pd.melt(df, col_level=1) print(melted_quarter) # 4. Melt outer level (Category) print("n=== Only Melting Category Level ===") melted_category = pd.melt(df, col_level=0) print(melted_category) **Expected Output:** === Original Data (Multi-Level Column Index) ===Category Electronics FurnitureQuarter Q1 Q2 Q1 Q2 0 100 110 200 2101 150 160 180 190=== Melting Different Levels Produces Different Results: - Melting Inner Level (Quarter) Preserves Category as ID - Melting Outer Level (Category) Preserves Quarter as ID **Code Explanation:** * For DataFrames with multi-level column indexes, column names are in tuple form: (Category, Quarter). * `col_level=0` melts the outer level (Category), preserving Quarter as ID. * `col_level=1` melts the inner level (Quarter), preserving Category as ID. * This flexibility allows you to choose which level to melt based on your analytical needs. > **Tip:** `pd.melt()` is a great helper for data visualization. Most plotting libraries (like Seaborn, Plotnine) work better with long-format data. Converting data with melt() before plotting is a common practice. [![Image 2: Pandas Common Functions](#) Pandas Common Functions](#)

YouTip

Pandas Pd Melt

📂 Categories