`Series.median()` is a pandas method that calculates the median (middle value) of a Series. The median is the value at the middle position once the data are sorted. Because it depends only on the order of the values, it is largely unaffected by extreme values and serves as a robust measure of a dataset's central tendency.
When there are outliers or skewed distributions in the data, the median can more accurately reflect the center of the data than the mean. It is frequently used in scenarios such as income analysis, real estate statistics, and academic performance evaluation.
Basic Syntax and Parameters
`median()` is a method of the Series object and is called with the dot operator, for example `s.median()`.
Syntax Format
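```python
Series.median(axis=0, skipna=True, numeric_only=False, **kwargs)
```

This is the signature in pandas 2.x. In pandas 1.x it was `Series.median(axis=None, skipna=True, level=None, numeric_only=None, **kwargs)`; the `level` parameter was deprecated in 1.3 and removed in 2.0.

Parameter Description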
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| axis | int or str (0 or 'index') | A Series has only one axis, so this parameter exists mainly for compatibility with DataFrame. | 0 (None in pandas 1.x) |
| skipna | bool | If True, NaN values are skipped during calculation; if False, the result is NaN whenever any NaN is present. | True |
| level | int or str | pandas 1.x only: if the Series has a MultiIndex, computes the median within the given level. Removed in pandas 2.0; use `groupby(level=...).median()` instead. | None |
| numeric_only | bool | If True, only numeric data is used. Mainly relevant for DataFrame. | False (None in pandas 1.x) |
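Because `level` no longer exists in pandas 2.x, a per-level median on a MultiIndex Series is computed with `groupby`. The sketch below is illustrative only: the city/district index and the prices are made-up data.

```python
import pandas as pd

# Hypothetical housing prices indexed by (city, district)
index = pd.MultiIndex.from_tuples(
    [("A", "north"), ("A", "south"), ("A", "east"),
     ("B", "north"), ("B", "south")],
    names=["city", "district"],
)
prices = pd.Series([100, 300, 120, 200, 260], index=index)

# pandas 2.x replacement for the old prices.median(level="city")
city_medians = prices.groupby(level="city").median()
print(city_medians)
```

City A has three values (median 120) and city B has two (median is the average, 230).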
Return Value
- Return Type: scalar (a float for numeric data)
- Description: Returns the median of all elements in the Series. If the number of elements is even, it returns the average of the two middle elements. If there are no valid values (an empty or all-NaN Series), the result is NaN.
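A short sketch of the return behavior described above: an integer Series of odd length still yields a float, an even-length Series yields the average of the two middle values, and a Series with no valid values yields NaN. The sample values are arbitrary.

```python
import pandas as pd

odd = pd.Series([3, 1, 2])            # integers, odd length
even = pd.Series([4, 1, 3, 2])        # integers, even length
empty = pd.Series([], dtype="float64")
all_nan = pd.Series([float("nan"), float("nan")])

odd_median = odd.median()     # middle of [1, 2, 3] -> 2.0
even_median = even.median()   # (2 + 3) / 2 -> 2.5

print(type(odd_median), odd_median)
print(even_median)
print(empty.median(), all_nan.median())  # both NaN
```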
Examples
The following examples, ordered from simple to more complex, cover the main uses of `Series.median()`.
Example 1: Basic Usage (Median of an Odd Number of Elements)
For an odd number of elements, the median is the middle value after sorting.

Example
```python
import pandas as pd

# Create a Series containing employee incomes
# Simulate monthly income (in thousands) of 7 employees in a department
income = pd.Series([5, 6, 7, 8, 9, 10, 50])

# Calculate median
median_income = income.median()

print("Employee Monthly Income (in thousands):")
print(income)
print()
print(f"Average Income: {income.mean():.2f} thousand")
print(f"Median Income: {median_income} thousand")
print()
print("Note: The average is significantly affected by the extreme value (50 thousand), while the median better reflects the actual level.")
```

Output:
```
Employee Monthly Income (in thousands):
0     5
1     6
2     7
3     8
4     9
5    10
6    50
dtype: int64

Average Income: 13.57 thousand
Median Income: 8.0 thousand

Note: The average is significantly affected by the extreme value (50 thousand), while the median better reflects the actual level.
```

Code Explanation:
- Sorted data: [5, 6, 7, 8, 9, 10, 50] (already in order).
- With 7 elements, the middle one is the 4th (position 3, counting from 0), which is 8.
- The extreme value (50) inflates the average to 13.57, while the median (8) better represents most people's income levels.
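`median()` sorts the values internally, so the input does not need to be in order. This sketch shuffles the Example 1 incomes with `Series.sample` (the seed `random_state=0` is an arbitrary choice for reproducibility) and checks that the median does not change.

```python
import pandas as pd

income = pd.Series([5, 6, 7, 8, 9, 10, 50])

# Same values in a different order; random_state makes the shuffle reproducible
shuffled = income.sample(frac=1, random_state=0)

print(shuffled.tolist())
print(income.median(), shuffled.median())  # 8.0 8.0
```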
Example 2: Median of an Even Number of Elements
For an even number of elements, the median is the average of the two middle elements.

Example
```python
import pandas as pd

# Create a Series with 6 elements
# Simulate exam scores of 6 students
scores = pd.Series([75, 82, 88, 92, 95, 100])

# Calculate median
median_score = scores.median()

print("Student Exam Scores:")
print(scores)
print()
print(f"Average Score: {scores.mean():.2f}")
print(f"Median Score: {median_score}")
```

Output:
```
Student Exam Scores:
0     75
1     82
2     88
3     92
4     95
5    100
dtype: int64

Average Score: 88.67
Median Score: 90.0
```

Code Explanation:
- Sorted data: [75, 82, 88, 92, 95, 100]
- The two middle elements are 88 (position 2) and 92 (position 3).
- Median = (88 + 92) / 2 = 90.
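The odd and even rules from Examples 1 and 2 can be written out by hand and cross-checked against pandas. `manual_median` is a hypothetical helper for illustration only, not part of pandas.

```python
import pandas as pd

def manual_median(values):
    """Hypothetical helper: sort, then take the middle value or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])                 # odd: the single middle element
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of the two middle elements

scores = pd.Series([75, 82, 88, 92, 95, 100])
income = pd.Series([5, 6, 7, 8, 9, 10, 50])

print(manual_median(scores), scores.median())   # 90.0 90.0
print(manual_median(income), income.median())   # 8.0 8.0
```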
Example 3: Handling Data with Missing Values
The `skipna` parameter determines how missing values (NaN) are handled.
Example
```python
import pandas as pd
import numpy as np

# Create a Series with missing values
data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])

print("Data with Missing Values:")
print(data_with_nan)
print()

# Default skipna=True: NaN values are skipped when calculating the median
median_skipna = data_with_nan.median()
print(f"Median with skipna=True (default): {median_skipna}")

# Set skipna=False: any NaN makes the result NaN
median_no_skipna = data_with_nan.median(skipna=False)
print(f"Median with skipna=False: {median_no_skipna}")
```

Output:
```
Data with Missing Values:
0    10.0
1    20.0
2     NaN
3    30.0
4    40.0
5     NaN
6    50.0
dtype: float64

Median with skipna=True (default): 30.0
Median with skipna=False: nan
```

Code Explanation:
- Valid data: [10, 20, 30, 40, 50], 5 elements in total.
- After sorting, the middle value is 30.
- With `skipna=False`, the presence of any NaN makes the result NaN.
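Skipping NaN gives the same median as dropping missing values first, whereas filling them in changes the data and therefore the median. Filling with 0 below is only an illustrative choice, not a recommendation.

```python
import numpy as np
import pandas as pd

data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])

# skipna=True (the default) matches dropping NaN before computing
dropped = data_with_nan.dropna().median()

# Filling NaN with 0 adds two zeros: [0, 0, 10, 20, 30, 40, 50]
filled_zero = data_with_nan.fillna(0).median()

print(data_with_nan.median(), dropped)  # 30.0 30.0
print(filled_zero)                      # 20.0
```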
Example 4: Comparing Mean and Median in Skewed Data
This example shows why the median is preferable to the mean for skewed data.

Example
```python
import pandas as pd

# Simulate monthly income data for 10 households in a community
# Most households earn between 3000 and 5000 yuan, but a few have high incomes
income_data = pd.Series([3000, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 8000, 50000])

print("Monthly Income Data of Community Residents (yuan):")
print(income_data)
print()

mean_income = income_data.mean()
median_income = income_data.median()

print(f"Mean: {mean_income:.2f} yuan")
print(f"Median: {median_income} yuan")
print()
print("Analysis:")
# Use the computed values rather than hard-coded numbers
print(f"The mean ({mean_income:.0f} yuan) is greatly inflated by the extreme value (50000 yuan).")
print(f"The median ({median_income:.0f} yuan) better reflects the actual income level of most residents.")
print("This is why statistical departments often use the median when reporting income data.")
```

Output:

```
Monthly Income Data of Community Residents (yuan):
0     3000
1     3500
2     3800
3     4000
4     4200
5     4500
6     4800
7     5000
8     8000
9    50000
dtype: int64

Mean: 9080.00 yuan
Median: 4350.0 yuan

Analysis:
The mean (9080 yuan) is greatly inflated by the extreme value (50000 yuan).
The median (4350 yuan) better reflects the actual income level of most residents.
This is why statistical departments often use the median when reporting income data.
```
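The median is the 50th percentile, so for numeric data `quantile(0.5)` (which uses linear interpolation by default) and the `50%` row of `describe()` return the same value. A quick cross-check on the Example 4 data:

```python
import pandas as pd

income_data = pd.Series([3000, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 8000, 50000])

median_value = income_data.median()
q50 = income_data.quantile(0.5)      # 50th percentile
described = income_data.describe()   # summary statistics, includes "mean" and "50%"

print(median_value, q50, described["50%"])  # 4350.0 4350.0 4350.0
print(described["mean"])                    # 9080.0
```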
Notes
- The median is insensitive to extreme values and is more robust than the mean.
- When the number of data points is even, the median is the average of the two middle numbers.
- In strongly skewed datasets, the median reflects the central location better than the mean.
- The `skipna` parameter behaves the same way as in `mean()`.
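To see the robustness claim directly, this sketch makes the top income in the Example 4 data ten times larger (an arbitrary change chosen for illustration). The mean jumps while the median stays put, because the two middle values are untouched.

```python
import pandas as pd

income_data = pd.Series([3000, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 8000, 50000])

# Change only the extreme value: 50000 -> 500000
more_extreme = income_data.copy()
more_extreme.iloc[-1] = 500000

print(income_data.mean(), more_extreme.mean())      # 9080.0 54080.0
print(income_data.median(), more_extreme.median())  # 4350.0 4350.0
```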
Summary
`Series.median()` is an important function for describing the central tendency of data. Its main features are:
- Largely unaffected by extreme values, giving stable results.
- For an even number of elements, returns the average of the two middle elements.
- Especially useful for analyzing skewed data such as income or housing prices.
- Its syntax and parameters are the same as those of `mean()`, making it easy to learn and use.
In practical data analysis, it is recommended to calculate both the mean and median to gain a more comprehensive understanding of the data distribution. When the two differ significantly, it suggests the presence of skewness or outliers in the data.
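The mean-versus-median check recommended above can be wrapped in a small helper. `center_report` is a hypothetical function, and its 10% relative-gap threshold is an arbitrary illustrative choice, not a statistical rule; treat a flag as a prompt to inspect the distribution, not as proof of skew.

```python
import pandas as pd

def center_report(s: pd.Series, threshold: float = 0.10) -> dict:
    """Hypothetical helper: compare mean and median and flag a large relative gap.

    The default threshold (10%) is an arbitrary choice for illustration.
    """
    mean, median = s.mean(), s.median()
    if median == 0:
        gap = 0.0 if mean == 0 else float("inf")
    else:
        gap = abs(mean - median) / abs(median)
    return {
        "mean": mean,
        "median": median,
        "relative_gap": gap,
        "possible_skew_or_outliers": bool(gap > threshold),
    }

scores = pd.Series([75, 82, 88, 92, 95, 100])
income_data = pd.Series([3000, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 8000, 50000])

print(center_report(scores))       # small gap, not flagged
print(center_report(income_data))  # mean is about twice the median, flagged
```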