YouTip LogoYouTip

Pandas Pd Merge

[![Image 1: Pandas Common Functions](#) Pandas Common Functions](#)\n\n* * *\n\n`pd.merge()` is a function in the Pandas library used to **join two DataFrames by columns**. It is similar to the JOIN operation in SQL and can merge two data tables into one based on one or more common columns.\n\nThis is one of the most commonly used merging methods in data analysis and processing, particularly suitable for handling relational data and integrating related information scattered across different tables.\n\n**Word Definition**: `merge` means "to combine, blend", and here it refers to merging two data tables together based on common keys.\n\n* * *\n\n## Basic Syntax and Parameters\n\n`pd.merge()` is a top-level function in the Pandas library, used to implement JOIN operations similar to SQL.\n\n### Syntax Format\n\npd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)\n### Parameter Description\n\n* **Parameter**: `left`\n * Type: DataFrame.\n * Description: The left DataFrame, the first table participating in the merge.\n\n* **Parameter**: `right`\n * Type: DataFrame.\n * Description: The right DataFrame, the second table participating in the merge.\n\n* **Parameter**: `on`\n * Type: String or list of strings.\n * Description: The column name(s) (keys) used for joining. Both left and right DataFrames must have columns with the same name. If not specified, it will automatically look for columns with the same name to use as keys.\n\n* **Parameter**: `how`\n * Type: String ('left', 'right', 'outer', 'inner').\n * Description: The type of merge, similar to SQL JOIN types. `'inner'` means inner join (intersection), `'outer'` means outer join (union), `'left'` means left join (keep all records from the left table), `'right'` means right join (keep all records from the right table). The default is `'inner'`.\n\n* **Parameter**: `left_on`\n * Type: String or list of strings.\n * Description: The column name(s) from the left DataFrame to use as joining keys. Used when the joining key column names are different in the left and right tables.\n\n* **Parameter**: `right_on`\n * Type: String or list of strings.\n * Description: The column name(s) from the right DataFrame to use as joining keys. Used when the joining key column names are different in the left and right tables.\n\n* **Parameter**: `suffixes`\n * Type: Tuple.\n * Description: Suffixes used to differentiate when the left and right tables have columns with the same name (that are not joining keys). Defaults to `('_x', '_y')`.\n\n### Function Description\n\n* **Return Value**: Returns the merged DataFrame.\n* **Effect**: Horizontally merges two DataFrames into one DataFrame based on the specified keys and method.\n\n* * *\n\n## Examples\n\nLet's thoroughly master the usage of `pd.merge()` through a series of examples ranging from simple to complex.\n\n### Example 1: Basic Usage - Inner Join Based on a Single Column\n\n## Example\n\nimport pandas as pd\n\n# 1. Create two DataFrames\n\n employees = pd.DataFrame({\n\n'emp_id': [101,102,103,104],\n\n'name': ['Alice','Bob','Charlie','Diana'],\n\n'dept_id': [1,2,1,2]\n\n})\n\ndepartments = pd.DataFrame({\n\n'dept_id': [1,2,3],\n\n'dept_name': ['Engineering','Sales','Marketing']\n\n})\n\nprint("=== Employees table (employees) ===")\n\nprint(employees)\n\nprint("n=== Departments table (departments) ===")\n\nprint(departments)\n\n# 2. Using pd.merge() Merge based on dept_id column (inner join, default)\n\n result = pd.merge(employees, departments, on='dept_id')\n\nprint("n=== pd.merge(employees, departments, on='dept_id') Inner join ===")\n\nprint(result)\n\n**Expected Output:**\n\n=== Employees table (employees) === emp_id name dept_id 0 101 Alice 11 102 Bob 22 103 Charlie 13 104 Diana 2=== Departments table (departments) === dept_id dept_name 0 1 Engineering1 2 Sales2 3 Marketing=== pd.merge(employees, departments, on='dept_id') Inner join === emp_id name dept_id dept_name 0 101 Alice 1 Engineering1 103 Charlie 1 Engineering2 102 Bob 2 Sales3 104 Diana 2 Sales\n**Code Explanation:**\n\n1. The two tables are joined via the `dept_id` column, which is their common column.\n2. It uses an inner join by default (`how='inner'`), keeping only the key values present in both sides (dept_id 1 and 2), so the Marketing department with dept_id 3 is excluded.\n3. After merging, each employee is associated with their corresponding department name.\n\n### Example 2: Left Join and Right Join\n\nUsing the `how` parameter allows you to control the merge method, keeping all records from one side.\n\n## Example\n\nimport pandas as pd\n\n# Continue using the above data\n\n employees = pd.DataFrame({\n\n'emp_id': [101,102,103,104],\n\n'name': ['Alice','Bob','Charlie','Diana'],\n\n'dept_id': [1,2,1,2]\n\n})\n\ndepartments = pd.DataFrame({\n\n'dept_id': [1,2,3],\n\n'dept_name': ['Engineering','Sales','Marketing']\n\n})\n\n# 1. Left join - keeps all records from the left table\n\nprint("=== Left join (how='left') ===")\n\n left_result = pd.merge(employees, departments, on='dept_id', how='left')\n\nprint(left_result)\n\n# 2. Right join - keeps all records from the right table\n\nprint("n=== Right join (how='right') ===")\n\n right_result = pd.merge(employees, departments, on='dept_id', how='right')\n\nprint(right_result)\n\n# 3. Outer join - keeps all records from both tables\n\nprint("n=== Outer join (how='outer') ===")\n\n outer_result = pd.merge(employees, departments, on='dept_id', how='outer')\n\nprint(outer_result)\n\n**Expected Output:**\n\n=== Left join (how='left') === emp_id name dept_id dept_name 0 101 Alice 1 Engineering1 103 Charlie 1 Engineering2 102 Bob 2 Sales3 104 Diana 2 Sales=== Right join (how='right') === emp_id name dept_id dept_name 0 101 Alice 1 Engineering1 103 Charlie 1 Engineering2 102 Bob 2 Sales3 104 Diana 2 Sales4 NaN NaN 3 Marketing=== Outer join (how='outer') === emp_id name dept_id dept_name 0 101 Alice 1 Engineering1 103 Charlie 1 Engineering2 102 Bob 2 Sales3 104 Diana 2 Sales4 NaN NaN 3 Marketing\n**Code Explanation:**\n\n* The left join keeps all employee records from the left table (employees); unmatched departments in the right table are shown as `NaN`.\n* The right join keeps all departments from the right table (departments), even if no employees belong to that department.\n* The outer join keeps all records from both tables, filling unmatched fields with `NaN`.\n\n### Example 3: Handling Two Tables with Different Column Names\n\nWhen the joining key column names are different in the two tables, you can use the `left_on` and `right_on` parameters to specify them respectively.\n\n## Example\n\nimport pandas as pd\n\n# 1. Create two DataFrames with different column names\n\n students = pd.DataFrame({\n\n'student_id': [1,2,3,4],\n\n'name': ['Alice','Bob','Charlie','Diana'],\n\n'class_id': [101,102,101,103]\n\n})\n\nclasses = pd.DataFrame({\n\n'class_id': [101,102,103],\n\n'class_name': ['Class A','Class B','Class C'],\n\n'teacher': ['Mr. Smith','Ms. Johnson','Mrs. Lee']\n\n})\n\nprint("=== Students table ===")\n\nprint(students)\n\nprint("n=== Classes table ===")\n\nprint(classes)\n\n# 2. Handle different column names with left_on and right_on\n\n result = pd.merge(students, classes, left_on='class_id', right_on='class_id')\n\nprint("n=== Merging with different column names ===")\n\nprint(result)\n\n**Expected Output:**\n\n=== Students table === student_id name class_id 0 1 Alice 1011 2 Bob 1022 3 Charlie 1013 4 Diana 103=== Classes table === class_id class_name teacher 0 101 Class A Mr. Smith1 102 Class B Ms. Johnson2 103 Class C Mrs. Lee=== Merging with different column names === student_id name class_id class_name teacher 0 1 Alice 101 Class A Mr. Smith1 3 Charlie 101 Class A Mr. Smith2 2 Bob 102 Class B Indicator False3 indicator column print("n=== indicator=True Display merge details ===")print(result_indicator)

Expected Output:

=== Index as join key === name city score subject grade 0 Alice Beijing 85 Math A 1 Bob Shanghai 90 Math
← Pandas Pd CutPandas Pd Concat β†’