Pandas groupby.mean() Function
groupby.mean() is an aggregation function in Pandas that computes averages after grouping data. It is used together with groupby(): first, the data is grouped by the values of a specific column, and then the arithmetic mean of the numeric columns within each group is calculated.
In data analysis, computing averages is a very common requirement, for instance calculating average salaries per department, average sales per region, or average scores per class. The mean() function handles such tasks in a single call.
Basic Syntax and Parameters
mean() is a member function of the GroupBy object and must be called after using groupby() to group the data.
Syntax

    GroupBy.mean(numeric_only=False, engine=None, engine_kwargs=None)

Parameter Description

| Parameter | Type | Description | Default |
|---|---|---|---|
| numeric_only | bool | If True, only numeric columns are averaged; if False, Pandas attempts to average all columns, which fails for columns that cannot be averaged, such as strings. | False |
| engine | str | Specifies the computation engine: 'cython' or 'numba'. None defaults to 'cython' unless the global option compute.use_numba is set. | None |
| engine_kwargs | dict | Dictionary of additional arguments passed to the underlying engine. The 'numba' engine accepts the keys nopython, nogil and parallel; the 'cython' engine accepts none. | None |
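
As a minimal sketch of how `numeric_only` plays out in practice, the snippet below compares `numeric_only=True` with selecting the numeric column explicitly. The DataFrame and its column names (`Team`, `Player`, `Score`) are made up for illustration. What happens when `mean()` is called without `numeric_only=True` on a text column depends on the installed Pandas version, so the sketch reports the outcome rather than assuming it.

```python
import warnings

import pandas as pd

# Hypothetical data: a grouping column, a text column, and a numeric column
df = pd.DataFrame({
    "Team": ["x", "x", "y"],
    "Player": ["Ann", "Bob", "Cid"],
    "Score": [10, 20, 30],
})

# Option 1: ask Pandas to skip non-numeric columns
by_flag = df.groupby("Team").mean(numeric_only=True)

# Option 2: select the numeric column(s) before aggregating
by_selection = df.groupby("Team")[["Score"]].mean()

print(by_flag)
print("Same result:", by_flag.equals(by_selection))

# Without numeric_only=True, the text column "Player" is included.
# Recent Pandas versions raise an error here; older ones may drop the
# column with a warning instead, so the outcome is only reported.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        df.groupby("Team").mean()
    print("mean() without numeric_only succeeded on this Pandas version")
except Exception as exc:
    print("mean() without numeric_only raised", type(exc).__name__)
```

Selecting columns explicitly makes the intent visible at the call site, while `numeric_only=True` is convenient when the set of numeric columns is not known in advance.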
Return Value
- Return Type: Series or DataFrame
- Description: Returns the result after computing group-wise averages. If applied to a single selected column, returns a Series; if applied to multiple columns, returns a DataFrame.
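
The difference in return type can be checked directly. The following is a small sketch with made-up data; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical scores for two classes
df = pd.DataFrame({
    "Class": ["A", "A", "B"],
    "Math": [90, 80, 70],
    "English": [85, 95, 75],
})

# Single brackets select one column: the result is a Series
single = df.groupby("Class")["Math"].mean()

# A list of columns (double brackets) gives a DataFrame,
# even when the list contains only one name
one_col_frame = df.groupby("Class")[["Math"]].mean()
multiple = df.groupby("Class")[["Math", "English"]].mean()

print(type(single).__name__)
print(type(one_col_frame).__name__)
print(type(multiple).__name__)
```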
Examples
Let's work through the usage of groupby.mean() with a series of examples, from simple to complex.
Example 1: Basic Usage - Compute Averages After Grouping by a Single Column
The most basic usage is grouping by one column and computing the average of another column.

    import pandas as pd

    # Create a DataFrame of student scores
    # Contains: student name, class, Chinese, Math, English scores
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Sun Qi', 'Zhou Ba', 'Wu Jiu', 'Zheng Shi'],
        'Class': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A'],
        'Chinese': [85, 92, 78, 88, 95, 82, 90, 87],
        'Math': [90, 85, 92, 78, 88, 91, 85, 89],
        'English': [88, 90, 85, 92, 87, 89, 91, 86]
    }

    # Create DataFrame
    df = pd.DataFrame(data)

    print("Student grade data:")
    print(df)
    print()

    # Group by "Class", compute average scores per class
    avg_by_class = df.groupby('Class').mean(numeric_only=True)

    print("Average Score per Class:")
    print(avg_by_class)
    print()

    # Alternatively, compute average for a specific column only
    math_avg_by_class = df.groupby('Class')['Math'].mean()

    print("Average Math Score per Class:")
    print(math_avg_by_class)

Expected Output:

    Student grade data:
            Name Class  Chinese  Math  English
    0  Zhang San     A       85    90       88
    1      Li Si     A       92    85       90
    2    Wang Wu     A       78    92       85
    3   Zhao Liu     B       88    78       92
    4     Sun Qi     B       95    88       87
    5    Zhou Ba     B       82    91       89
    6     Wu Jiu     B       90    85       91
    7  Zheng Shi     A       87    89       86

    Average Score per Class:
           Chinese  Math  English
    Class
    A        85.50  89.0    87.25
    B        88.75  85.5    89.75

    Average Math Score per Class:
    Class
    A    89.0
    B    85.5
    Name: Math, dtype: float64

Code Explanation:
- df.groupby('Class') groups students into two groups, A and B, based on the "Class" column.
- .mean(numeric_only=True) computes the average for all numeric columns (Chinese, Math, English).
- In the result, the class serves as the index, and subject averages become column data.
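
To make the arithmetic behind the result concrete, the sketch below recomputes the per-class Math average by hand as sum divided by count and compares it with mean(). It reuses the Class and Math values from Example 1; the variable names are illustrative.

```python
import pandas as pd

# Class and Math columns from Example 1
df = pd.DataFrame({
    "Class": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "Math": [90, 85, 92, 78, 88, 91, 85, 89],
})

grouped = df.groupby("Class")["Math"]

by_mean = grouped.mean()
# Arithmetic mean = sum of the values / number of non-missing values
by_hand = grouped.sum() / grouped.count()

print(by_mean)
print(by_hand)
```

Class A has the Math scores 90, 85, 92 and 89, so its average is 356 / 4 = 89.0; class B has 78, 88, 91 and 85, giving 342 / 4 = 85.5.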
Example 2: Compute Averages After Grouping by Multiple Columns
You can group by multiple columns simultaneously and compute averages for numeric columns.

    import pandas as pd

    # Create sales data
    data = {
        'Region': ['North China', 'East', 'South China', 'North China', 'East', 'South China', 'North China', 'East'],
        'Product': ['A', 'B', 'C', 'B', 'A', 'C', 'A', 'B'],
        'SalesAmount': [1000, 2000, 1500, 1800, 2200, 1600, 1200, 2100],
        'Profit': [200, 400, 300, 360, 440, 320, 240, 420]
    }

    df = pd.DataFrame(data)

    print("Sales data:")
    print(df)
    print()

    # Group by "Region" and "Product", compute average sales and profit
    avg_grouped = df.groupby(['Region', 'Product'], as_index=False).mean(numeric_only=True)

    print("Mean SalesAmount and Profit after grouping by Region and Product:")
    print(avg_grouped)
    print()

    # Keep multi-level index format
    avg_indexed = df.groupby(['Region', 'Product']).mean(numeric_only=True)

    print("Result in MultiIndex format:")
    print(avg_indexed)

Expected Output (group keys are sorted alphabetically by default):

    Sales data:
            Region Product  SalesAmount  Profit
    0  North China       A         1000     200
    1         East       B         2000     400
    2  South China       C         1500     300
    3  North China       B         1800     360
    4         East       A         2200     440
    5  South China       C         1600     320
    6  North China       A         1200     240
    7         East       B         2100     420

    Mean SalesAmount and Profit after grouping by Region and Product:
            Region Product  SalesAmount  Profit
    0         East       A       2200.0   440.0
    1         East       B       2050.0   410.0
    2  North China       A       1100.0   220.0
    3  North China       B       1800.0   360.0
    4  South China       C       1550.0   310.0

    Result in MultiIndex format:
                         SalesAmount  Profit
    Region      Product
    East        A             2200.0   440.0
                B             2050.0   410.0
    North China A             1100.0   220.0
                B             1800.0   360.0
    South China C             1550.0   310.0

Code Explanation:
- ['Region', 'Product'] uses a list to group by multiple columns.
- With as_index=False, the result is a DataFrame with grouping columns retained as regular columns.
- Without as_index=False, the grouping columns form a multi-level index (MultiIndex), which is more compact and convenient for label-based lookups in subsequent analysis.
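
The two result shapes are interchangeable. As a sketch using the Region, Product and SalesAmount columns from Example 2 (variable names are illustrative), the snippet below looks up one group by label in the MultiIndex result and then flattens that result with reset_index(), which gives the same table as as_index=False.

```python
import pandas as pd

# Region, Product and SalesAmount columns from Example 2
df = pd.DataFrame({
    "Region": ["North China", "East", "South China", "North China",
               "East", "South China", "North China", "East"],
    "Product": ["A", "B", "C", "B", "A", "C", "A", "B"],
    "SalesAmount": [1000, 2000, 1500, 1800, 2200, 1600, 1200, 2100],
})

# MultiIndex result: (Region, Product) pairs become the index
indexed = df.groupby(["Region", "Product"])["SalesAmount"].mean()

# Label-based lookup of a single (Region, Product) pair
east_b = indexed.loc[("East", "B")]
print("East / B average:", east_b)

# Flatten the index levels back into ordinary columns
flat = indexed.reset_index()

# The same table produced directly with as_index=False
flat_direct = df.groupby(["Region", "Product"], as_index=False)["SalesAmount"].mean()

print(flat)
print("Same values:", flat["SalesAmount"].tolist() == flat_direct["SalesAmount"].tolist())
```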
Example 3: Computing Averages with Missing Values
When data contains missing values (NaN), mean() automatically ignores them during computation.

    import pandas as pd
    import numpy as np

    # Create employee salary data with missing values
    data = {
        'Department': ['Sales', 'Sales', 'Sales', 'Technology', 'Technology', 'Technology', 'Admin', 'Admin'],
        'Salary': [5000, 6000, np.nan, 8000, 9000, np.nan, 4500, np.nan]
    }

    df = pd.DataFrame(data)

    print("Employee Salary Data (including missing values):")
    print(df)
    print()

    # By default, mean() ignores NaN values
    avg_with_nan = df.groupby('Department')['Salary'].mean()

    print("Default mean calculation (ignoring NaN):")
    print(avg_with_nan)
    print()

    # To treat NaN as 0, fill missing values first
    avg_filled = df.groupby('Department')['Salary'].apply(lambda x: x.fillna(0).mean())

    print("Mean salary after treating NaN as 0:")
    print(avg_filled)
    print()

    # Note: with the signature shown above, groupby.mean() has no skipna parameter;
    # use fillna() to control how missing values are treated

Expected Output:

    Employee Salary Data (including missing values):
       Department  Salary
    0       Sales  5000.0
    1       Sales  6000.0
    2       Sales     NaN
    3  Technology  8000.0
    4  Technology  9000.0
    5  Technology     NaN
    6       Admin  4500.0
    7       Admin     NaN

    Default mean calculation (ignoring NaN):
    Department
    Admin         4500.0
    Sales         5500.0
    Technology    8500.0
    Name: Salary, dtype: float64

    Mean salary after treating NaN as 0:
    Department
    Admin         2250.000000
    Sales         3666.666667
    Technology    5666.666667
    Name: Salary, dtype: float64

Code Explanation:
- By default, mean() ignores NaN values in calculations.
- The Sales department has two valid values (5000, 6000), so the average is 5500.
- To treat NaN as 0 before computing the average, use fillna(0) to fill missing values first.
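
Filling can also happen before grouping, which avoids a Python-level lambda per group. The following is a sketch on a reduced, made-up version of the salary data; it is one possible alternative, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative salary data with missing values
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "Admin", "Admin"],
    "Salary": [5000, 6000, np.nan, 4500, np.nan],
})

# Default behavior: NaN rows are left out of both the sum and the count
skip_nan = df.groupby("Department")["Salary"].mean()

# Fill first, then group the filled Series by the Department column
nan_as_zero = df["Salary"].fillna(0).groupby(df["Department"]).mean()

print(skip_nan)
print(nan_as_zero)
```

For Sales, ignoring NaN gives (5000 + 6000) / 2 = 5500, while treating NaN as 0 gives (5000 + 6000 + 0) / 3, roughly 3666.67. Which of the two is appropriate depends on whether a missing salary really means zero.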
Example 4: Combining with transform to Compute Within-Group Proportions
The transform method broadcasts group-wise averages back to each row of the original data, which is useful for computing within-group proportions or deviations from group averages.

    import pandas as pd

    # Create student score data
    data = {
        'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Sun Qi', 'Zhou Ba'],
        'Class': ['A', 'A', 'A', 'B', 'B', 'B'],
        'Chinese': [85, 92, 78, 88, 95, 82],
        'Math': [90, 85, 92, 78, 88, 91]
    }

    df = pd.DataFrame(data)

    print("Student grade data:")
    print(df)
    print()

    # Compute class-level Math average, broadcast to each row
    df['ClassMathAvg'] = df.groupby('Class')['Math'].transform('mean')

    # Compute deviation of each student's score from class average
    df['DiffFromAvg'] = df['Math'] - df['ClassMathAvg']

    print("Data after adding class average and difference:")
    print(df)
    print()

    # Compute percentage of each student's Math score relative to class average
    df['PctOfClassAvg'] = (df['Math'] / df['ClassMathAvg'] * 100).round(2)

    print("Data after adding percentages:")
    print(df)

Expected Output:

    Student grade data:
            Name Class  Chinese  Math
    0  Zhang San     A       85    90
    1      Li Si     A       92    85
    2    Wang Wu     A       78    92
    3   Zhao Liu     B       88    78
    4     Sun Qi     B       95    88
    5    Zhou Ba     B       82    91

    Data after adding class average and difference:
            Name Class  Chinese  Math  ClassMathAvg  DiffFromAvg
    0  Zhang San     A       85    90     89.000000     1.000000
    1      Li Si     A       92    85     89.000000    -4.000000
    2    Wang Wu     A       78    92     89.000000     3.000000
    3   Zhao Liu     B       88    78     85.666667    -7.666667
    4     Sun Qi     B       95    88     85.666667     2.333333
    5    Zhou Ba     B       82    91     85.666667     5.333333

    Data after adding percentages:
            Name Class  Chinese  Math  ClassMathAvg  DiffFromAvg  PctOfClassAvg
    0  Zhang San     A       85    90     89.000000     1.000000         101.12
    1      Li Si     A       92    85     89.000000    -4.000000          95.51
    2    Wang Wu     A       78    92     89.000000     3.000000         103.37
    3   Zhao Liu     B       88    78     85.666667    -7.666667          91.05
    4     Sun Qi     B       95    88     85.666667     2.333333         102.72
    5    Zhou Ba     B       82    91     85.666667     5.333333         106.23

Code Explanation:
- transform('mean') computes the average for each group and broadcasts it back to every row in the original DataFrame.
- Each student can thus see their class's average score, facilitating comparison.
- This approach is highly useful in scenarios like analyzing "an individual's position within a group".
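
One way to see what "broadcasting back" means is to reproduce it in two steps: aggregate with mean(), then map each row's class label to its class average. The sketch below uses the Class and Math columns from Example 4 and checks that both routes give the same values; the variable names are illustrative.

```python
import pandas as pd

# Class and Math columns from Example 4
df = pd.DataFrame({
    "Class": ["A", "A", "A", "B", "B", "B"],
    "Math": [90, 85, 92, 78, 88, 91],
})

# One step: the group mean, aligned to the original rows
via_transform = df.groupby("Class")["Math"].transform("mean")

# Two steps: aggregate to one value per class, then look it up per row
class_means = df.groupby("Class")["Math"].mean()
via_map = df["Class"].map(class_means)

print(via_transform.tolist())
print("Max difference:", (via_transform - via_map).abs().max())
```

The aggregated result has one entry per class, while the transformed result has one entry per row of the original DataFrame, which is why it can be assigned directly as a new column.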
Notes: By default, the mean() function ignores NaN values. If all values in a group are missing, it returns NaN for that group. Unlike sum(), mean() does not have a min_count parameter.
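
The contrast with sum() can be sketched on a small made-up DataFrame in which one group contains only missing values. The column and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "b" contains only missing values
df = pd.DataFrame({
    "Group": ["a", "a", "b", "b"],
    "Value": [1.0, 3.0, np.nan, np.nan],
})

grouped = df.groupby("Group")["Value"]

means = grouped.mean()                      # all-NaN group gives NaN
sums_default = grouped.sum()                # all-NaN group sums to 0.0
sums_min_count = grouped.sum(min_count=1)   # require at least one valid value

print(means)
print(sums_default)
print(sums_min_count)
```

With sum(), min_count=1 is what turns the all-missing group into NaN instead of 0.0; mean() needs no such switch because an average over zero valid values is already reported as NaN.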