Pandas Series Median

Pandas Series.median() Function

Series.median() is a pandas method that calculates the median (middle value) of a Series. The median is the value at the middle position after the data is sorted. Because it depends on rank order rather than magnitude, it is largely unaffected by extreme values and serves as a robust measure of the central tendency of a dataset.

When there are outliers or skewed distributions in the data, the median can more accurately reflect the center of the data than the mean. It is frequently used in scenarios such as income analysis, real estate statistics, and academic performance evaluation.
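The sort-and-take-the-middle definition can be sketched by hand and checked against the built-in method. The helper name manual_median is hypothetical and exists only for this illustration; it is not part of pandas.

```python
import pandas as pd


def manual_median(values: pd.Series) -> float:
    """Median by definition: sort, then take the middle value
    (or the average of the two middle values)."""
    ordered = values.dropna().sort_values().reset_index(drop=True)
    n = len(ordered)
    if n == 0:
        return float("nan")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle element
        return float(ordered.iloc[mid])
    # Even count: average the two middle elements
    return float((ordered.iloc[mid - 1] + ordered.iloc[mid]) / 2)


odd = pd.Series([50, 7, 5, 9, 6, 10, 8])    # unsorted on purpose
even = pd.Series([100, 75, 92, 82, 95, 88])

print(manual_median(odd), odd.median())     # 8.0 8.0
print(manual_median(even), even.median())   # 90.0 90.0
```

The input order does not matter, because the data is sorted before the middle position is read.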

Basic Syntax and Parameters

median() is a method of the Series object and is called directly on a Series instance, for example s.median().

Syntax Format

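Series.median(axis=None, skipna=True, level=None, numeric_only=None, **kwargs)

This is the pandas 1.x signature. In pandas 2.0 and later the level parameter has been removed and the defaults are axis=0 and numeric_only=False: Series.median(axis=0, skipna=True, numeric_only=False, **kwargs).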

Parameter Description

Parameter	Type	Description	Default Value
axis	{0 or 'index'}	The axis to reduce along. A Series has only one axis, so this parameter exists mainly for compatibility with DataFrame.	None (0 in pandas 2.0+)
skipna	bool	If True, NaN values are skipped during calculation; if False, the result is NaN if any NaN is present.	True
level	int or str	If the Series has a MultiIndex, computes the median per group along the given level. Deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...).median() instead.	None
numeric_only	bool	If True, only numeric data is considered. Mainly relevant for DataFrame; it is rarely needed on a Series.	None (False in pandas 2.0+)
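Because the level parameter is not available in pandas 2.0 and later, a per-level median on a MultiIndex Series is usually written with groupby. The following is a minimal sketch; the department and employee labels are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: department, then employee
index = pd.MultiIndex.from_tuples(
    [("sales", "a"), ("sales", "b"), ("sales", "c"), ("tech", "a"), ("tech", "b")],
    names=["dept", "employee"],
)
income = pd.Series([5, 6, 50, 9, 11], index=index)

# Median over the whole Series
overall = income.median()

# Median per department: the groupby equivalent of the old level="dept"
per_dept = income.groupby(level="dept").median()

print(overall)
print(per_dept)
```

Here overall is 9.0, while per_dept holds 6.0 for sales and 10.0 for tech.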

Return Value

Return Type: scalar (a float for numeric data, even when every element is an integer)
Description: Returns the median of the elements in the Series, ignoring NaN by default. If the number of elements is even, it returns the average of the two middle elements.
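A quick check of the return value, as a sketch: an integer Series still yields a float, and an empty Series yields NaN rather than raising an error.

```python
import pandas as pd

ints = pd.Series([1, 2, 3, 4])
result = ints.median()

# The two middle values are 2 and 3, so the median is 2.5
print(result)
print(isinstance(result, float))

# An empty Series has no middle value; the result is NaN
empty = pd.Series([], dtype="float64")
empty_result = empty.median()
print(pd.isna(empty_result))
```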

Examples

Let's explore a series of examples from simple to complex to fully master the usage of Series.median().

Example 1: Basic Usage – Median of Odd Number of Elements

For an odd number of elements, the median is the middle value after sorting.

Example

    import pandas as pd

    # Monthly income (in thousands) of 7 employees in a department
    income = pd.Series([5, 6, 7, 8, 9, 10, 50])

    # Calculate the median
    median_income = income.median()

    print("Employee Monthly Income (in thousands):")
    print(income)
    print()
    print(f"Average Income: {income.mean():.2f} thousand")
    print(f"Median Income: {median_income} thousand")
    print()
    print("Note: The average is significantly affected by the extreme value (50 thousand), while the median better reflects the actual level.")

Output:

    Employee Monthly Income (in thousands):
    0     5
    1     6
    2     7
    3     8
    4     9
    5    10
    6    50
    dtype: int64

    Average Income: 13.57 thousand
    Median Income: 8.0 thousand

    Note: The average is significantly affected by the extreme value (50 thousand), while the median better reflects the actual level.

Code Explanation:

Sorted data: [5, 6, 7, 8, 9, 10, 50]
With 7 elements, the middle element is at zero-based position 3, which is 8.
Due to the extreme value (50), the average (13.57) is inflated, while the median (8) better represents most people’s income levels.
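To see how differently the two statistics react, the sketch below varies only the largest income and recomputes both. The replacement values 500 and 5000 are hypothetical and chosen purely for illustration.

```python
import pandas as pd

others = [5, 6, 7, 8, 9, 10]  # the six ordinary incomes from the example
results = {}

for top_income in (50, 500, 5000):  # hypothetical values for the top earner
    income = pd.Series(others + [top_income])
    results[top_income] = (income.mean(), income.median())
    print(f"top={top_income:>5}  mean={income.mean():8.2f}  median={income.median()}")
```

The mean grows with the outlier, while the median stays at 8.0 in all three runs, because the outlier never changes which value sits in the middle position.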

Example 2: Median of Even Number of Elements

For an even number of elements, the median is the average of the two middle elements.

Example

    import pandas as pd

    # Exam scores of 6 students
    scores = pd.Series([75, 82, 88, 92, 95, 100])

    # Calculate the median
    median_score = scores.median()

    print("Student Exam Scores:")
    print(scores)
    print()
    print(f"Average Score: {scores.mean():.2f}")
    print(f"Median Score: {median_score}")

Output:

    Student Exam Scores:
    0     75
    1     82
    2     88
    3     92
    4     95
    5    100
    dtype: int64

    Average Score: 88.67
    Median Score: 90.0

Code Explanation:

Sorted data: [75, 82, 88, 92, 95, 100]
The two middle elements are 88 (index 2) and 92 (index 3).
Median = (88 + 92) / 2 = 90.
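The averaging rule for an even count matches the 0.5 quantile under linear interpolation, which is the default of Series.quantile(). As a sketch, the interpolation argument can also pick one of the two middle elements instead of averaging them:

```python
import pandas as pd

scores = pd.Series([75, 82, 88, 92, 95, 100])

median_score = scores.median()
q_linear = scores.quantile(0.5)                          # default: linear interpolation
q_lower = scores.quantile(0.5, interpolation="lower")    # lower of the two middle elements
q_higher = scores.quantile(0.5, interpolation="higher")  # higher of the two middle elements

print(median_score, q_linear, q_lower, q_higher)
```

Here median_score and q_linear are both 90, while q_lower and q_higher are 88 and 92. This can be useful when the reported median must be a value that actually occurs in the data.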

Example 3: Handling Data with Missing Values

The skipna parameter determines how missing values are handled.

Example

    import numpy as np
    import pandas as pd

    # Create a Series with missing values
    data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])

    print("Data with Missing Values:")
    print(data_with_nan)
    print()

    # Default skipna=True: NaN values are skipped when calculating the median
    median_skipna = data_with_nan.median()
    print(f"Median with skipna=True (default): {median_skipna}")

    # skipna=False: any NaN makes the result NaN
    median_no_skipna = data_with_nan.median(skipna=False)
    print(f"Median with skipna=False: {median_no_skipna}")

Output:

    Data with Missing Values:
    0    10.0
    1    20.0
    2     NaN
    3    30.0
    4    40.0
    5     NaN
    6    50.0
    dtype: float64

    Median with skipna=True (default): 30.0
    Median with skipna=False: nan

Code Explanation:

Valid data: [10, 20, 30, 40, 50], total of 5 elements.
After sorting, the middle value is 30.
With skipna=False, any NaN results in NaN.
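A few related checks, as a sketch: skipping NaN gives the same result as dropping the missing values first, an all-NaN Series has no median, and filling gaps with a placeholder such as 0 (an arbitrary choice here) changes the answer.

```python
import numpy as np
import pandas as pd

data = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])

valid_count = data.count()              # number of non-missing values
dropped_median = data.dropna().median() # same as data.median() with skipna=True

# With nothing left after skipping NaN, the result is NaN
all_missing_median = pd.Series([np.nan, np.nan]).median()

# Filling with 0 adds two artificial low values and pulls the median down
filled_median = data.fillna(0).median()

print(valid_count, dropped_median, all_missing_median, filled_median)
```

Here the filled Series is [0, 0, 10, 20, 30, 40, 50] after sorting, so its median is 20.0 instead of 30.0.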

Example 4: Comparing Mean and Median in Skewed Data

This example demonstrates the advantage of the median over the mean on skewed data.

Example

    import pandas as pd

    # Monthly income data for 10 households in a community
    # Most earn between 3000 and 5000 yuan, but a few have high incomes
    income_data = pd.Series([3000, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 8000, 50000])

    print("Monthly Income Data of Community Residents (yuan):")
    print(income_data)
    print()

    mean_income = income_data.mean()
    median_income = income_data.median()

    print(f"Mean: {mean_income:.2f} yuan")
    print(f"Median: {median_income} yuan")
    print()

    # Use the computed values so the analysis text always matches the data
    print("Analysis:")
    print(f"The mean ({mean_income:.0f} yuan) is greatly inflated by the extreme value (50000 yuan).")
    print(f"The median ({median_income:.0f} yuan) better reflects the actual income level of most residents.")
    print("This is why statistical departments often use the median when reporting income data.")

Output:

    Monthly Income Data of Community Residents (yuan):
    0     3000
    1     3500
    2     3800
    3     4000
    4     4200
    5     4500
    6     4800
    7     5000
    8     8000
    9    50000
    dtype: int64

    Mean: 9080.00 yuan
    Median: 4350.0 yuan

    Analysis:
    The mean (9080 yuan) is greatly inflated by the extreme value (50000 yuan).
    The median (4350 yuan) better reflects the actual income level of most residents.
    This is why statistical departments often use the median when reporting income data.

Notes

The median is insensitive to extreme values and is more robust than the mean.
When the number of data points is even, the median is the average of the two middle numbers.
In datasets with significant skewness, the median better reflects the central location compared to the mean.
The behavior of the skipna parameter is consistent with that of mean().

Summary

Series.median() is an important function for describing the central tendency of data. Its main features include:

Largely unaffected by extreme values, so results stay stable.
For an even number of elements, returns the average of the two middle elements.
Especially useful for analyzing skewed data such as income or housing prices.
Takes the same parameters as mean(), making it easy to learn and use.

In practical data analysis, it is recommended to calculate both the mean and median to gain a more comprehensive understanding of the data distribution. When the two differ significantly, it suggests the presence of skewness or outliers in the data.
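One way to put this advice into practice is sketched below: compute both statistics in a single agg() call, then look at their relative gap and at Series.skew(). The name relative_gap is a hypothetical helper quantity for this illustration, and no particular threshold is implied; what counts as a significant gap depends on the data.

```python
import pandas as pd

income_data = pd.Series([3000, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 8000, 50000])

# Mean and median side by side
summary = income_data.agg(["mean", "median"])

# How far the mean sits above the median, relative to the median
relative_gap = (summary["mean"] - summary["median"]) / summary["median"]

# Sample skewness; a positive value indicates a long right tail
skewness = income_data.skew()

print(summary)
print(f"Relative gap: {relative_gap:.2f}")
print(f"Skewness is positive: {skewness > 0}")
```

For this data the mean is more than twice the median and the skewness is positive, which is consistent with a few very high incomes pulling the mean upward.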
