
Pandas Series Mean

`Series.mean()` is a Pandas method that calculates the arithmetic mean (average) of the elements in a Series. The mean is one of the most commonly used statistical measures of a dataset's central tendency.

Whether you are calculating average wages, average sales, or average scores, `mean()` gives the result in a single call.

* * *

## Basic Syntax and Parameters

`mean()` is a method of the Series object and is called with the dot operator.

### Syntax Format

```python
Series.mean(axis=0, skipna=True, numeric_only=False, **kwargs)
```

Versions of pandas before 2.0 used the signature `Series.mean(axis=None, skipna=True, level=None, numeric_only=None, **kwargs)`, which also accepted a `level` argument.

### Parameter Description

| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| axis | int | Specifies the axis. A Series is one-dimensional, so this parameter exists mainly for compatibility with DataFrame. | 0 |
| skipna | bool | If True, NaN values are skipped during calculation; if False, the result is NaN whenever the Series contains a NaN. | True |
| level | int or str | Only in pandas versions before 2.0: for a Series with a MultiIndex, the level to aggregate along. Deprecated in 1.3 and removed in 2.0; use `groupby(level=...)` instead. | None |
| numeric_only | bool | Restricts the calculation to numeric data. Mainly relevant for DataFrame; it does not filter non-numeric elements out of a Series. | False |

### Return Value

* **Return Type**: `float` (integer input also produces a float result)
* **Description**: Returns the average of all valid elements in the Series.
If all elements are NaN, the result is NaN.

* * *

## Examples

The following examples walk through `Series.mean()` from simple to more complex usage.

### Example 1: Basic Usage - Calculating the Average Score of Students

The most basic usage is to create a numeric Series and call `mean()` on it.

```python
import pandas as pd

# Create a Series containing student scores
# Simulate exam scores of 8 students in a class
scores = pd.Series([85, 92, 78, 90, 88, 76, 95, 82])

# Calculate average score
average_score = scores.mean()

print("Student Exam Scores:")
print(scores)
print()
print(f"Average Score: {average_score:.2f} points")
```

**Output:**

```
Student Exam Scores:
0    85
1    92
2    78
3    90
4    88
5    76
6    95
7    82
dtype: int64

Average Score: 85.75 points
```

**Code Analysis:**

* The Series contains the scores of 8 students.
* `mean()` divides the sum of all scores (686) by the number of students (8), giving an average of 85.75.
* Note: the return value is a float, even though the input is integers.

### Example 2: Handling Data with Missing Values

Real-world data often contains missing values.
The `skipna` parameter determines how these missing values are handled.

```python
import pandas as pd
import numpy as np

# Create a Series containing missing values
# Simulate students who missed the exam due to illness
scores_with_nan = pd.Series([85, 92, np.nan, 90, 88, np.nan, 95, 82])

print("Data with missing exam scores:")
print(scores_with_nan)
print()

# Default skipna=True, skip NaN when calculating average
average_skipna = scores_with_nan.mean()
print(f"Average score when skipna=True (default): {average_skipna:.2f}")

# Set skipna=False, return NaN when encountering NaN
average_no_skipna = scores_with_nan.mean(skipna=False)
print(f"Average score when skipna=False: {average_no_skipna}")
```

**Output:**

```
Data with missing exam scores:
0    85.0
1    92.0
2     NaN
3    90.0
4    88.0
5     NaN
6    95.0
7    82.0
dtype: float64

Average score when skipna=True (default): 88.67
Average score when skipna=False: nan
```

**Code Analysis:**

* With `skipna=True` (the default), only the valid values are averaged.
* Calculation process: (85 + 92 + 90 + 88 + 95 + 82) / 6 = 532 / 6 = 88.67
* With `skipna=False`, a single NaN is enough to make the result NaN.

### Example 3: Using MultiIndex Series

For a Series with a hierarchical index (MultiIndex), you can average the whole Series or only the rows that belong to one value of an index level.

```python
import pandas as pd

# Create a MultiIndex Series
# Simulate scores of students from two classes
multi_index_scores = pd.Series(
    [85, 92, 78, 90, 88, 76, 95, 82],
    index=pd.MultiIndex.from_tuples(
        [
            ('AClass', 'Student1'), ('AClass', 'Student2'),
            ('AClass', 'Student3'), ('AClass', 'Student4'),
            ('BClass', 'Student1'), ('BClass', 'Student2'),
            ('BClass', 'Student3'), ('BClass', 'Student4'),
        ],
        names=['Class', 'Student'],
    ),
)

print("Student scores from two classes:")
print(multi_index_scores)
print()

# Calculate average score of all students
overall_mean = multi_index_scores.mean()
print(f"Average score of all students: {overall_mean:.2f}")
print()

# Calculate average score by class (select each class with xs(), then average)
class_a_mean = multi_index_scores.xs('AClass', level='Class').mean()
class_b_mean = multi_index_scores.xs('BClass', level='Class').mean()
print(f"Class A average score: {class_a_mean:.2f}")
print(f"Class B average score: {class_b_mean:.2f}")
```

**Output:**

```
Student scores from two classes:
Class   Student 
AClass  Student1    85
        Student2    92
        Student3    78
        Student4    90
BClass  Student1    88
        Student2    76
        Student3    95
        Student4    82
dtype: int64

Average score of all students: 85.75

Class A average score: 86.25
Class B average score: 85.25
```

**Code Analysis:**

* The Series has a two-level index, with level names "Class" and "Student".
* Calling `mean()` directly calculates the average of all elements.
* The `xs()` method selects the rows for one value of a level, and `mean()` is then applied to that selection.
* To get every class average in one call, `multi_index_scores.groupby(level='Class').mean()` returns a Series with one mean per class. This is also the replacement for the `level` argument that pandas 2.0 removed.

### Example 4: Application in Real Data Analysis

This example shows a typical use of `mean()` in a business scenario.

```python
import pandas as pd

# Create simulated daily closing price data for a stock
stock_prices = pd.Series(
    [100.5, 102.3, 98.7, 101.2, 103.5,
     105.2, 103.8, 107.1, 108.3, 106.9],
    index=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
           'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
)

# Split the data into two periods by position
week1 = stock_prices.iloc[:5]
week2 = stock_prices.iloc[5:]

print("Week 1 stock prices:")
print(week1)
print(f"Week 1 average price: {week1.mean():.2f}")
print()

print("Week 2 stock prices:")
print(week2)
print(f"Week 2 average price: {week2.mean():.2f}")
print()

# Calculate average price difference between two weeks
price_diff = week2.mean() - week1.mean()
print(f"Average price difference between two weeks: {price_diff:.2f}")
```

**Output:**

```
Week 1 stock prices:
Monday       100.5
Tuesday      102.3
Wednesday     98.7
Thursday     101.2
Friday       103.5
dtype: float64
Week 1 average price: 101.24

Week 2 stock prices:
Monday       105.2
Tuesday      103.8
Wednesday    107.1
Thursday     108.3
Friday       106.9
dtype: float64
Week 2 average price: 106.26

Average price difference between two weeks: 5.02
```

* * *

## Notes

* For numeric data, `mean()` returns a float, even if all input values are integers.
* The mean is sensitive to extreme values (outliers); a single one can shift the result substantially.
* If the data contains many missing values, keep in mind that with `skipna=True` the average is based only on the values that are present.
* A Series containing non-numeric data must be cleaned or converted first, for example with `pd.to_numeric(s, errors="coerce")`. Passing `numeric_only=True` does not filter non-numeric elements out of a Series.

* * *

## Summary

`Series.mean()` is one of the most commonly used statistical functions in Pandas. Its main features include:

* Simple to use: it is called directly on the Series with the dot operator.
* Handles missing values, controlled by the `skipna` parameter.
* Works with MultiIndex Series: select one level value with `xs()` or aggregate per level with `groupby(level=...)`.
* Returns a float for numeric input, so the fractional part of the average is preserved.

The mean is an important measure of central tendency, but when the data contains extreme values, consider using the median (`median()`), which better reflects the central position of the data.
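The effect of an outlier on the mean, and why the median can be the better choice in that case, can be seen in a small sketch. The salary figures below are made up for illustration and are not part of the examples above; only `Series.mean()` and `Series.median()` are used.

```python
import pandas as pd

# Hypothetical monthly salaries; the last value is a deliberate outlier
salaries = pd.Series([4200, 4500, 4800, 5000, 5200, 60000])

mean_salary = salaries.mean()      # pulled upward by the single outlier
median_salary = salaries.median()  # middle of the sorted values

print(f"Mean:   {mean_salary:.2f}")    # Mean:   13950.00
print(f"Median: {median_salary:.2f}")  # Median: 4900.00
```

Five of the six salaries are close to 5,000, yet the mean is almost three times that because of one large value, while the median stays near the typical salary.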