

Pandas Series.sum() Function


Series.sum() is a Pandas method that returns the sum of the values in a Series. It is one of the most commonly used statistical functions in data analysis and a quick way to total numeric data.

Whether it's calculating total sales, total profit, or total quantity, sum() can help you complete these tasks quickly. It is especially suitable for scenarios such as financial data processing, statistical reports, and sales data analysis.


Basic Syntax and Parameters

sum() is a method of the Series object and is called with dot notation, for example s.sum() on a Series named s.

Syntax Format

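```python
Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
```

This is the signature in pandas 2.x. Before 2.0 the method also accepted a level parameter (deprecated since pandas 1.3), and numeric_only defaulted to None.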

Parameter Description

| Parameter | Type | Description | Default Value |
|---|---|---|---|
| axis | {0 or 'index'} | Axis to reduce over. A Series is one-dimensional, so only 0 ('index') is valid; the parameter exists for compatibility with DataFrame. | None |
| skipna | bool | If True, NaN values are skipped during the calculation; if False, any NaN makes the result NaN. | True |
| level | int or str | For a Series with a MultiIndex, the index level to sum within. Deprecated in pandas 1.3 and removed in 2.0; use s.groupby(level=...).sum() instead. | None |
| numeric_only | bool | Present for compatibility with DataFrame. It does not filter or convert values in a Series; on a non-numeric Series, numeric_only=True raises an error. | False (None before pandas 2.0) |
| min_count | int | Minimum number of valid (non-NaN) values required. If fewer are present, the result is NaN. | 0 |
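The axis and level rows can be illustrated with a small sketch. It assumes pandas 2.x and a hypothetical MultiIndex Series of quarterly sales (the level names region and quarter are made up for illustration). It shows that axis=0 and axis="index" give the same result as the default, and how groupby(level=...) can stand in for the removed level keyword.

```python
import pandas as pd

# Hypothetical quarterly sales with a two-level MultiIndex (region, quarter)
idx = pd.MultiIndex.from_tuples(
    [("East", "Q1"), ("East", "Q2"), ("North", "Q1"), ("North", "Q2")],
    names=["region", "quarter"],
)
s = pd.Series([100, 150, 80, 120], index=idx)

# A Series has a single axis, so these three calls are equivalent
print(s.sum(), s.sum(axis=0), s.sum(axis="index"))  # 450 450 450

# pandas 2.x has no level= keyword on sum(); group by the index level instead
per_region = s.groupby(level="region").sum()
print(per_region)
```

The groupby version returns one total per value of the chosen level (here East and North), which is what level= used to produce.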

Return Value

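• Return Type: a scalar, usually a NumPy number such as numpy.int64 or numpy.float64.
• Description: the sum of the values in the Series. With the defaults (skipna=True, min_count=0), an empty Series or one whose values are all NaN sums to 0; pass min_count=1 to get NaN in that case instead. The result is NaN when skipna=False and a NaN is present, or when fewer than min_count valid values exist.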
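The return-value rules above can be checked with a short sketch; the small Series here are made up for illustration.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
all_nan = pd.Series([np.nan, np.nan])
empty = pd.Series([], dtype="float64")

print(type(ints.sum()))          # a NumPy integer scalar type
print(all_nan.sum())             # 0.0 with the default min_count=0
print(all_nan.sum(min_count=1))  # nan: fewer than 1 valid value
print(empty.sum())               # 0.0 for an empty float Series
```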

Examples

The following examples progress from simple to more complex uses of Series.sum().

Example 1: Basic Usage – Calculating the Sum of a Numerical List


The most basic way is to create a numerical Series and then call sum() to calculate the total.


Example

```python
import pandas as pd

# Create a Series containing sales data
# Simulate daily sales data for a week
sales_data = pd.Series([1200, 1500, 1800, 900, 2100, 1600, 1350])

# Calculate total sales
total_sales = sales_data.sum()

print("Daily Sales:")
print(sales_data)
print()
print(f"Total Sales: {total_sales}")
```

Output:

```
Daily Sales:
0    1200
1    1500
2    1800
3     900
4    2100
5    1600
6    1350
dtype: int64

Total Sales: 10450
```

Code Explanation:

• A Series holding 7 days of sales figures is created; because all values are integers, its dtype is int64.
• sum() adds all elements together, giving a total of 10450. The addition runs as a vectorized reduction on top of NumPy rather than as a Python loop.
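sum() can also be applied to part of a Series. A minimal sketch reusing the Example 1 figures: a boolean mask keeps only days with sales of at least 1500, and iloc[:5] keeps the first five entries. Treating those five entries as weekdays is a hypothetical assumption; the original data has no day labels.

```python
import pandas as pd

sales_data = pd.Series([1200, 1500, 1800, 900, 2100, 1600, 1350])

# Sum only the days whose sales reached at least 1500
high_days_total = sales_data[sales_data >= 1500].sum()

# Sum the first five positions (hypothetically, Monday to Friday)
first_five_total = sales_data.iloc[:5].sum()

print(high_days_total)   # 7000
print(first_five_total)  # 7500
```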

Example 2: Handling Data with Missing Values

Missing values are common in real-world data. The skipna parameter determines how sum() handles them.

Example

```python
import pandas as pd
import numpy as np

# Create a Series with missing values
# Simulate missing sales data for certain dates
sales_with_nan = pd.Series([1200, 1500, np.nan, 1800, np.nan, 2100, 1600])

print("Sales Data with Missing Values:")
print(sales_with_nan)
print()

# Default skipna=True: NaN values are skipped when calculating the sum
total_skipna = sales_with_nan.sum()
print(f"Sum when skipna=True (default): {total_skipna}")

# skipna=False: any NaN makes the result NaN
total_no_skipna = sales_with_nan.sum(skipna=False)
print(f"Sum when skipna=False: {total_no_skipna}")
```

Output:

```
Sales Data with Missing Values:
0    1200.0
1    1500.0
2       NaN
3    1800.0
4       NaN
5    2100.0
6    1600.0
dtype: float64

Sum when skipna=True (default): 8200.0
Sum when skipna=False: nan
```

Code Explanation:

• With skipna=True (the default), sum() skips NaN values and adds only the valid ones.
• Calculation: 1200 + 1500 + 1800 + 2100 + 1600 = 8200. It is printed as 8200.0 because the NaN values make the Series dtype float64.
• With skipna=False, a single NaN makes the result NaN, which is useful when data completeness is critical.
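Two related idioms help when working with missing values. Summing the boolean Series returned by isna() counts the missing entries (True counts as 1), and filling NaN with 0 before summing gives the same total as the default skipna=True. A short sketch using the Example 2 data:

```python
import numpy as np
import pandas as pd

sales_with_nan = pd.Series([1200, 1500, np.nan, 1800, np.nan, 2100, 1600])

# True counts as 1, so summing the mask counts the missing values
missing_count = sales_with_nan.isna().sum()

# Replacing NaN with 0 gives the same total as skipping NaN
filled_total = sales_with_nan.fillna(0).sum()

print(missing_count)  # 2
print(filled_total)   # 8200.0
```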

Example 3: Using the min_count Parameter

The min_count parameter sets the minimum number of valid (non-NaN) values required for a result, which is useful for making sure a total is based on enough data.

Example

```python
import pandas as pd
import numpy as np

# Create a Series that is mostly missing values
sparse_data = pd.Series([np.nan, np.nan, 100, np.nan, np.nan])

print("Data Mostly Missing:")
print(sparse_data)
print()

# Default min_count=0: a result is always returned
result_default = sparse_data.sum()
print(f"Result when min_count=0 (default): {result_default}")

# min_count=3: at least 3 valid values are required
result_min_count = sparse_data.sum(min_count=3)
print(f"Result when min_count=3: {result_min_count}")

# min_count=2: at least 2 valid values are required
result_min_count_2 = sparse_data.sum(min_count=2)
print(f"Result when min_count=2: {result_min_count_2}")
```

Output:

```
Data Mostly Missing:
0      NaN
1      NaN
2    100.0
3      NaN
4      NaN
dtype: float64

Result when min_count=0 (default): 100.0
Result when min_count=3: nan
Result when min_count=2: nan
```

Code Explanation:

• This Series has only 1 valid value (100).
• min_count=3 requires at least 3 valid values, but there is only 1, so the result is NaN.
• min_count=2 requires at least 2 valid values, but there is only 1, so it also returns NaN.
• This parameter is useful in data-quality checks: a sum based on too few valid values comes back as NaN instead of a misleading total.
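A common practical setting is min_count=1. It keeps the normal result whenever at least one valid value exists, but returns NaN instead of 0 when every value is missing, so "no data" is not confused with "a true total of zero". A sketch, where no_data is a made-up all-missing Series:

```python
import numpy as np
import pandas as pd

sparse_data = pd.Series([np.nan, np.nan, 100, np.nan, np.nan])
no_data = pd.Series([np.nan, np.nan, np.nan])

print(sparse_data.sum(min_count=1))  # 100.0: one valid value is enough
print(no_data.sum())                 # 0.0: default min_count=0
print(no_data.sum(min_count=1))      # nan: no valid values at all
```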

Example 4: Application in Real-World Data Analysis

This example applies sum() to a typical business task: totaling first-half-year sales for each region and for all regions combined.

Example

```python
import pandas as pd

# Create a DataFrame of simulated sales data
sales_data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'East China': [12000, 15000, 18000, 16000, 14000, 17000],
    'North China': [10000, 11000, 13000, 12500, 11500, 14000],
    'South China': [15000, 17000, 19000, 18000, 16000, 20000]
})

print("First Half-Year Sales by Region:")
print(sales_data)
print()

# Calculate total sales for each region; selecting one column gives a Series
for region in ['East China', 'North China', 'South China']:
    total = sales_data[region].sum()
    print(f"{region} Total Sales (First Half-Year): {total} Yuan")

print()

# Calculate total sales across all regions:
# the first sum() gives per-column totals, the second adds those totals
total_all = sales_data[['East China', 'North China', 'South China']].sum().sum()
print(f"Total Sales Across All Regions (First Half-Year): {total_all} Yuan")
```

Output:

```
First Half-Year Sales by Region:
  Month  East China  North China  South China
0   Jan       12000        10000        15000
1   Feb       15000        11000        17000
2   Mar       18000        13000        19000
3   Apr       16000        12500        18000
4   May       14000        11500        16000
5   Jun       17000        14000        20000

East China Total Sales (First Half-Year): 92000 Yuan
North China Total Sales (First Half-Year): 72000 Yuan
South China Total Sales (First Half-Year): 105000 Yuan

Total Sales Across All Regions (First Half-Year): 269000 Yuan
```
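The loop over regions is easy to read, but the same per-region totals can be obtained in one call, because DataFrame.sum() sums each column and returns a Series. A sketch using the same data; passing axis=1 instead sums across each row, giving one total per month (setting Month as the index is only for readable labels):

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'East China': [12000, 15000, 18000, 16000, 14000, 17000],
    'North China': [10000, 11000, 13000, 12500, 11500, 14000],
    'South China': [15000, 17000, 19000, 18000, 16000, 20000]
})
regions = ['East China', 'North China', 'South China']

# Column-wise sums: one total per region (a Series indexed by region name)
region_totals = sales_data[regions].sum()

# Row-wise sums: one total per month, labeled by Month
monthly_totals = sales_data.set_index('Month')[regions].sum(axis=1)

print(region_totals)
print(monthly_totals)
print(region_totals.sum())  # 269000, same as summing monthly_totals
```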

Notes

• sum() skips NaN values by default, which is usually the expected behavior.
• sum() does not convert text to numbers. If a Series holds numbers stored as strings, convert it first, for example with pd.to_numeric(); numeric_only=True does not filter values in a Series. On an object Series of strings, sum() concatenates the strings instead of adding numbers.
• The min_count parameter is useful in data validation and quality checks, because it flags sums based on too few valid values.
• For large datasets, sum() is fast because it runs as a vectorized NumPy operation rather than a Python loop.
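The note on non-numeric data can be seen in a small sketch. The string values are hypothetical. pd.to_numeric() with errors="coerce" turns entries it cannot parse into NaN, which sum() then skips; summing an object Series of strings concatenates them instead (dtype=object is set explicitly so the behavior does not depend on the pandas version's default string dtype).

```python
import pandas as pd

# Hypothetical sales figures read in as text, with one unusable entry
raw = pd.Series(["1200", "1500", "n/a"])

# Convert to numbers; "n/a" cannot be parsed and becomes NaN
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric.sum())  # 2700.0

# On an object Series of strings, sum() concatenates rather than adds
concat = pd.Series(["12", "34"], dtype=object).sum()
print(concat)  # 1234 (a string)
```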

Summary

Series.sum() is one of the most fundamental and commonly used statistical functions in Pandas. Its main features include:

• Simple to use: called directly with dot notation.
• Handles missing values through the skipna parameter.
• Enforces a minimum number of valid values through the min_count parameter.
• Uses an optimized NumPy implementation under the hood for efficient computation.

In practical data analysis, sum() is often used together with other aggregation functions such as mean() and count() for a more complete statistical picture.
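As a small illustration of combining sum() with other aggregations, the sketch below reuses the Example 2 data and calls Series.agg() with a list of function names. It also checks that, with the default skipna=True, mean() equals sum() divided by count(), since count() counts only non-NaN values.

```python
import numpy as np
import pandas as pd

sales_with_nan = pd.Series([1200, 1500, np.nan, 1800, np.nan, 2100, 1600])

# Several aggregations at once; the result is a Series indexed by function name
stats = sales_with_nan.agg(["sum", "mean", "count"])
print(stats)

# count() ignores NaN, so sum / count reproduces mean()
print(sales_with_nan.sum() / sales_with_nan.count())  # 1640.0
```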
