Pandas Series.sum() Function
Series.sum() is a Pandas method that returns the sum of all values in a Series. It is one of the most commonly used statistical methods in data analysis and a quick way to total numeric data.
Whether you need total sales, total profit, or total quantity, sum() gets it in a single call. It is especially common in financial data processing, statistical reporting, and sales analysis.
Basic Syntax and Parameters
sum() is a method of the Series object and is called directly on a Series using dot notation.
Syntax Format
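```python
Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
```

This is the signature in pandas 2.x. In pandas 1.x the method also accepted level=None, and numeric_only defaulted to None.

Parameter Description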
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| axis | int or str | The axis to reduce over. A Series is one-dimensional, so the only valid value is 0 (or 'index'); the parameter exists for compatibility with DataFrame. | None |
| skipna | bool | If True, NaN values are excluded from the calculation; if False, any NaN makes the result NaN. | True |
| level | int or str | Deprecated in pandas 1.3 and removed in 2.0. In older versions, if the Series has a MultiIndex, sums within the given index level. | None |
| numeric_only | bool | Include only float, int, and boolean data. Kept mainly for DataFrame compatibility; it does not filter non-numeric values out of a Series. | False (None in pandas 1.x) |
| min_count | int | Minimum number of valid (non-NaN) values required. If fewer are present, the result is NaN. | 0 |
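Since pandas 2.0 no longer accepts a level argument in sum(), summing within one level of a MultiIndex is done by grouping on that level first. The sketch below assumes pandas 2.x; the quarterly Series and its region/quarter labels are made-up illustration data.

```python
import pandas as pd

# Made-up sales indexed by (region, quarter): a two-level MultiIndex
index = pd.MultiIndex.from_tuples(
    [("East", "Q1"), ("East", "Q2"), ("North", "Q1"), ("North", "Q2")],
    names=["region", "quarter"],
)
quarterly = pd.Series([100, 150, 80, 120], index=index)

# pandas < 2.0 allowed quarterly.sum(level="region");
# in pandas 2.x, group by the index level and sum each group instead
by_region = quarterly.groupby(level="region").sum()
print(by_region)
```

Calling quarterly.sum() without grouping still returns a single total over every element.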
Return Value
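- Return Type: a scalar, typically a NumPy number such as numpy.int64 or numpy.float64 (NaN is returned as a float).
- Description: Returns the sum of all elements in the Series. If the Series is empty, or all of its elements are NaN (with skipna=True), the result is 0 under the default min_count=0; set min_count=1 to get NaN instead.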
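The edge cases of the return value are easy to check directly. This is a small sketch with made-up inputs (an empty Series and an all-NaN Series); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up inputs: an empty Series and a Series containing only NaN
empty = pd.Series([], dtype="float64")
all_nan = pd.Series([np.nan, np.nan])

empty_total = empty.sum()                   # empty Series sums to 0
all_nan_total = all_nan.sum()               # NaNs skipped, nothing left, so 0
all_nan_min1 = all_nan.sum(min_count=1)     # fewer than 1 valid value, so NaN
all_nan_noskip = all_nan.sum(skipna=False)  # NaN propagates, so NaN

# An integer Series returns a NumPy integer scalar (for example numpy.int64)
int_total = pd.Series([1, 2, 3]).sum()

print(empty_total, all_nan_total, all_nan_min1, all_nan_noskip, int_total)
```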
Examples
Let's work through a series of examples, from simple to more complex, that cover the main uses of Series.sum().
Example 1: Basic Usage (Summing a Numeric Series)
The most basic usage is to create a numeric Series and call sum() on it to get the total.
```python
import pandas as pd

# Create a Series containing sales data
# Simulate daily sales data for a week
sales_data = pd.Series([1200, 1500, 1800, 900, 2100, 1600, 1350])

# Calculate total sales
total_sales = sales_data.sum()

print("Daily Sales:")
print(sales_data)
print()
print(f"Total Sales: {total_sales}")
```

Output:
```
Daily Sales:
0    1200
1    1500
2    1800
3     900
4    2100
5    1600
6    1350
dtype: int64

Total Sales: 10450
```

Code Explanation:
- A Series holding 7 days of sales data is created; its data type is int64.
- sum() adds all the elements together, giving a total of 10450.
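A related pattern, sketched here with the same weekly data: because True counts as 1 and False as 0, calling sum() on a boolean Series counts how many values meet a condition. The 1500 threshold is an arbitrary choice for illustration.

```python
import pandas as pd

sales_data = pd.Series([1200, 1500, 1800, 900, 2100, 1600, 1350])

# The comparison produces a boolean Series; summing it counts the True values
strong_days = (sales_data > 1500).sum()

# Boolean indexing keeps only the matching values, which are then summed
strong_total = sales_data[sales_data > 1500].sum()

print(f"Days above 1500: {strong_days}, their total: {strong_total}")
```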
Example 2: Handling Data with Missing Values
In real-world data, missing values are common. The skipna parameter determines how these missing values are handled.
```python
import pandas as pd
import numpy as np

# Create a Series with missing values
# Simulate missing sales data for certain dates
sales_with_nan = pd.Series([1200, 1500, np.nan, 1800, np.nan, 2100, 1600])

print("Sales Data with Missing Values:")
print(sales_with_nan)
print()

# Default skipna=True: NaN values are skipped when summing
total_skipna = sales_with_nan.sum()
print(f"Sum when skipna=True (default): {total_skipna}")

# skipna=False: any NaN makes the result NaN
total_no_skipna = sales_with_nan.sum(skipna=False)
print(f"Sum when skipna=False: {total_no_skipna}")
```

Output:
```
Sales Data with Missing Values:
0    1200.0
1    1500.0
2       NaN
3    1800.0
4       NaN
5    2100.0
6    1600.0
dtype: float64

Sum when skipna=True (default): 8200.0
Sum when skipna=False: nan
```

Code Explanation:
- With skipna=True (the default), sum() automatically skips NaN values and adds up the valid ones.
- Calculation: 1200 + 1500 + 1800 + 2100 + 1600 = 8200
- With skipna=False, the presence of any NaN makes the result NaN, which is useful when data completeness is critical.
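Because the default skipna=True drops NaN silently, it can be worth checking how many values were left out. This sketch reuses the Example 2 data with count(), isna(), and fillna(); treating missing sales as 0 is an assumption that only makes sense when a missing day really means no sales.

```python
import numpy as np
import pandas as pd

sales_with_nan = pd.Series([1200, 1500, np.nan, 1800, np.nan, 2100, 1600])

missing = sales_with_nan.isna().sum()  # number of NaN values that sum() skips
valid = sales_with_nan.count()         # number of non-NaN values actually summed

# Assumption: a missing value means zero sales, so filling with 0
# gives the same total as skipping the NaN values
filled_total = sales_with_nan.fillna(0).sum()

print(f"Missing: {missing}, valid: {valid}, total: {filled_total}")
```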
Example 3: Using the min_count Parameter
The min_count parameter sets the minimum number of valid (non-NaN) values required for a result; with fewer, sum() returns NaN. This is useful when you need to make sure a total is based on enough data.
```python
import pandas as pd
import numpy as np

# Create a Series that is mostly missing values
sparse_data = pd.Series([np.nan, np.nan, 100, np.nan, np.nan])

print("Data Mostly Missing:")
print(sparse_data)
print()

# Default min_count=0: no minimum, so a result is always returned
result_default = sparse_data.sum()
print(f"Result when min_count=0 (default): {result_default}")

# min_count=3: requires at least 3 valid values
result_min_count = sparse_data.sum(min_count=3)
print(f"Result when min_count=3: {result_min_count}")

# min_count=2: requires at least 2 valid values
result_min_count_2 = sparse_data.sum(min_count=2)
print(f"Result when min_count=2: {result_min_count_2}")
```

Output:
```
Data Mostly Missing:
0      NaN
1      NaN
2    100.0
3      NaN
4      NaN
dtype: float64

Result when min_count=0 (default): 100.0
Result when min_count=3: nan
Result when min_count=2: nan
```

Code Explanation:
- This Series has only 1 valid value (100).
- With the default min_count=0, the sum of the valid values (100.0) is returned.
- min_count=3 requires at least 3 valid values, but there is only 1, so the result is NaN.
- min_count=2 requires at least 2 valid values, but there is still only 1, so the result is also NaN.
- This parameter is useful in data-quality checks, to flag totals that rest on too little valid data.
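The rule behind min_count can be seen by sweeping the threshold: the sum is returned while count() is at least min_count, and NaN once it is not. A sketch using the Example 3 data; the thresholds 0 to 3 are arbitrary.

```python
import numpy as np
import pandas as pd

sparse_data = pd.Series([np.nan, np.nan, 100, np.nan, np.nan])

valid_count = sparse_data.count()  # number of non-NaN values (1 here)

# Result for each threshold: the sum if valid_count >= threshold, else NaN
results = {t: sparse_data.sum(min_count=t) for t in [0, 1, 2, 3]}

print(f"Valid values: {valid_count}")
for threshold, value in results.items():
    print(f"min_count={threshold}: {value}")
```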
Example 4: Application in Real-World Data Analysis
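This example applies sum() to a typical business task: totaling sales by region. The data is a DataFrame, but selecting a single column returns a Series, so each regional total is a Series.sum() call.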
```python
import pandas as pd

# Create a simulated sales DataFrame
sales_data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'East China': [12000, 15000, 18000, 16000, 14000, 17000],
    'North China': [10000, 11000, 13000, 12500, 11500, 14000],
    'South China': [15000, 17000, 19000, 18000, 16000, 20000]
})

print("First Half-Year Sales by Region:")
print(sales_data)
print()

# Calculate total sales for each region:
# sales_data[region] selects one column as a Series, and Series.sum() totals it
for region in ['East China', 'North China', 'South China']:
    total = sales_data[region].sum()
    print(f"{region} Total Sales (First Half-Year): {total} Yuan")

print()

# Calculate total sales across all regions:
# the first sum() gives one total per column, the second adds those totals
total_all = sales_data[['East China', 'North China', 'South China']].sum().sum()
print(f"Total Sales Across All Regions (First Half-Year): {total_all} Yuan")
```

Output:
```
First Half-Year Sales by Region:
  Month  East China  North China  South China
0   Jan       12000        10000        15000
1   Feb       15000        11000        17000
2   Mar       18000        13000        19000
3   Apr       16000        12500        18000
4   May       14000        11500        16000
5   Jun       17000        14000        20000

East China Total Sales (First Half-Year): 92000 Yuan
North China Total Sales (First Half-Year): 72000 Yuan
South China Total Sales (First Half-Year): 105000 Yuan
Total Sales Across All Regions (First Half-Year): 269000 Yuan
```
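The per-region loop in Example 4 can also be written without iterating: a single DataFrame.sum() call returns a Series of column totals, and summing that Series gives the grand total. This is one possible sketch with the same data; moving Month into the index with set_index() is just one way to keep the text column out of the sum (selecting the numeric columns, as Example 4 does, works too).

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'East China': [12000, 15000, 18000, 16000, 14000, 17000],
    'North China': [10000, 11000, 13000, 12500, 11500, 14000],
    'South China': [15000, 17000, 19000, 18000, 16000, 20000]
})

# Move the text column into the index so only numeric columns remain
by_month = sales_data.set_index('Month')

region_totals = by_month.sum()        # Series: one total per region (column)
month_totals = by_month.sum(axis=1)   # Series: one total per month (row)
grand_total = region_totals.sum()     # Series.sum() over the region totals

print(region_totals)
print(month_totals)
print(f"Grand total: {grand_total} Yuan")
```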
Notes
- sum() skips NaN values by default, which is usually the expected behavior; pass skipna=False when a missing value should make the whole result NaN.
- To sum a Series that contains non-numeric data, clean or convert the data first. The numeric_only=True parameter does not help here: it exists mainly for DataFrame compatibility, and on a Series with non-numeric data it raises an error instead of skipping those values.
- The min_count parameter is useful in data validation and quality checks, to flag totals computed from too few valid values.
- For large datasets, sum() is fast because it relies on vectorized NumPy operations.
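For a Series whose values arrive as text, one option is to convert them with pd.to_numeric() before summing. This sketch assumes unparseable entries should count as missing: errors="coerce" turns them into NaN, which sum() then skips. The sample values are made up.

```python
import pandas as pd

# Made-up raw data: numbers stored as strings, plus one invalid entry
raw = pd.Series(["1200", "1500", "n/a", "1800"])

# On object (string) data, sum() concatenates the strings instead of adding numbers
concatenated = raw.sum()

# Convert to numbers; anything that cannot be parsed becomes NaN
clean = pd.to_numeric(raw, errors="coerce")
numeric_total = clean.sum()  # NaN is skipped by default

print(concatenated)
print(numeric_total)
```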
Summary
Series.sum() is one of the most fundamental and commonly used statistical functions in Pandas. Its main features include:
- Simple to use: called directly via dot notation.
- Handles missing values via the skipna parameter.
- Enforces a minimum number of valid values via the min_count parameter.
- Uses an optimized NumPy implementation under the hood for efficient computation.
In practical data analysis, sum() is often used together with other aggregation functions like mean(), count(), etc., for comprehensive statistical analysis.
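When sum() is needed alongside other statistics, Series.agg() can compute several in one call. A brief sketch with the Example 1 sales data; the choice of "sum", "mean", and "count" is just for illustration.

```python
import pandas as pd

sales_data = pd.Series([1200, 1500, 1800, 900, 2100, 1600, 1350])

# agg() with a list of function names returns a Series indexed by those names
stats = sales_data.agg(["sum", "mean", "count"])
print(stats)
```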