Pandas Series.var() Function


Series.var() is a function in Pandas used to calculate the variance of a Series. Variance is the square of the standard deviation and measures the spread of data, serving as a fundamental indicator in statistical analysis.


The larger the variance, the more dispersed the data; the smaller the variance, the more concentrated the data. It has wide applications in fields such as statistical analysis, machine learning, and signal processing.
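To make the definition concrete, the sketch below computes the sample variance by hand and compares it with Series.var(). The data and the names scores and manual_var are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
scores = pd.Series([4.0, 8.0, 6.0, 2.0])

mean = scores.mean()
squared_dev = (scores - mean) ** 2                  # squared distance of each value from the mean
manual_var = squared_dev.sum() / (len(scores) - 1)  # divide by n - 1 (sample variance)

print(f"Manual variance: {manual_var:.4f}")
print(f"Series.var():    {scores.var():.4f}")
```

Both lines should print the same number, since the default ddof=1 corresponds to dividing by n - 1.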


Basic Syntax and Parameters


var() is a member function of the Series object, called directly using the dot operator.


Syntax Format

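Series.var(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)

This is the pandas 2.x signature. Versions before 2.0 also accepted a level argument, and numeric_only defaulted to None.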

Parameter Description

Parameter	Type	Description	Default Value
axis	int	Specifies the axis. Since a Series is one-dimensional, this parameter exists mainly for compatibility with DataFrame.	None
skipna	bool	If True, NaN values are skipped during calculation; if False, the result is NaN whenever the Series contains a NaN.	True
level	int or str	Only in pandas versions before 2.0 (deprecated in 1.3, removed in 2.0): for a Series with a MultiIndex, the index level to group by before computing the variance. In current versions, use groupby(level=...).var() instead.	None
numeric_only	bool	Include only float, int, and boolean data. The pandas documentation notes that it is not implemented for Series; it exists for compatibility with DataFrame.	False (None before pandas 2.0)
ddof	int	Delta degrees of freedom. The divisor is n - ddof, where n is the number of non-NaN values: ddof=1 gives the sample variance (divide by n-1), ddof=0 gives the population variance (divide by n).	1
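For a Series with a MultiIndex, per-level variance can be obtained with groupby, which works in both older and current pandas versions. The following is a minimal sketch; the index names "group" and "trial" and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (group, trial)
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 3)],
    names=["group", "trial"],
)
values = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=index)

# Variance computed separately for each value of the "group" level
per_group_var = values.groupby(level="group").var()
print(per_group_var)
```

The result is itself a Series indexed by group, with one sample variance per group.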


Return Value


Return Type: float
Description: Returns the variance of elements in the Series. By default, it uses sample variance (divided by n-1).
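One edge case worth knowing: because the default divisor is n - 1, a Series with a single value has no defined sample variance. The sketch below illustrates this with a hypothetical one-element Series.

```python
import pandas as pd

# Hypothetical Series with a single observation
single = pd.Series([42.0])

sample_result = single.var()            # divisor n - 1 is zero, so the result is NaN
population_result = single.var(ddof=0)  # divisor n is one, so the result is 0.0

print(f"ddof=1: {sample_result}")
print(f"ddof=0: {population_result}")
```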


Examples


Let's go through a series of examples from simple to complex to fully master the usage of Series.var().


Example 1: Basic Usage – Understanding the Concept of Variance

Variance is the average squared deviation of the data from the mean; the standard deviation is its square root.

Example

import pandas as pd

# Two groups of data
group_a = pd.Series([85, 86, 87, 88, 89])
group_b = pd.Series([70, 75, 85, 95, 100])

print("Group A (more concentrated):")
print(group_a)
print(f"Mean: {group_a.mean():.2f}")
print(f"Variance: {group_a.var():.2f}")
print(f"Standard Deviation: {group_a.std():.2f}")
print()

print("Group B (more scattered):")
print(group_b)
print(f"Mean: {group_b.mean():.2f}")
print(f"Variance: {group_b.var():.2f}")
print(f"Standard Deviation: {group_b.std():.2f}")
print()

print("Note: Variance is the square of the standard deviation (1.58^2 ≈ 2.50, 12.75^2 ≈ 162.50)")

Output:

Group A (more concentrated):
0    85
1    86
2    87
3    88
4    89
dtype: int64
Mean: 87.00
Variance: 2.50
Standard Deviation: 1.58

Group B (more scattered):
0     70
1     75
2     85
3     95
4    100
dtype: int64
Mean: 85.00
Variance: 162.50
Standard Deviation: 12.75

Note: Variance is the square of the standard deviation (1.58^2 ≈ 2.50, 12.75^2 ≈ 162.50)

Code Explanation:

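Group A has a variance of 2.50, indicating tightly clustered data.
Group B has a variance of 162.50, indicating widely scattered data.
Variance = Standard Deviation² holds exactly when both are computed with the same ddof; any small mismatch when squaring the printed values comes from rounding, not from the ddof adjustment.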
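The relationship between variance and standard deviation can be checked numerically. This is a small sketch using the Group B values; the checks dictionary is an illustrative name, not part of pandas.

```python
import math
import pandas as pd

values = pd.Series([70, 75, 85, 95, 100])

# Compare var() with std() squared for both conventions
checks = {}
for ddof in (0, 1):
    var = values.var(ddof=ddof)
    std = values.std(ddof=ddof)
    checks[ddof] = math.isclose(var, std ** 2)
    print(f"ddof={ddof}: var={var:.4f}, std^2={std ** 2:.4f}, equal={checks[ddof]}")
```

The comparison uses math.isclose rather than == because squaring a square root can differ in the last floating-point digit.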


Example 2: The Role of the ddof Parameter


The ddof parameter controls whether to use sample variance or population variance.


Example

import math
import pandas as pd

# Create a dataset
data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
print("Data:")
print(data)
print()

# Default ddof=1: sample variance (divide by n-1)
sample_var = data.var(ddof=1)
print(f"Sample Variance (ddof=1): {sample_var:.4f}")

# ddof=0: population variance (divide by n)
population_var = data.var(ddof=0)
print(f"Population Variance (ddof=0): {population_var:.4f}")
print()

# Verify: the square root of the sample variance equals the sample standard deviation
print(f"Square root of sample variance: {math.sqrt(sample_var):.4f}")
print(f"Sample standard deviation: {data.std():.4f}")

Output:

Data:
0    2
1    4
2    4
3    4
4    5
5    5
6    7
7    9
dtype: int64

Sample Variance (ddof=1): 4.5714
Population Variance (ddof=0): 4.0000

Square root of sample variance: 2.1381
Sample standard deviation: 2.1381
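The two conventions differ only by a scaling factor of n / (n - 1). The sketch below checks that relationship on the same data and also compares against NumPy, whose np.var defaults to ddof=0 rather than pandas' ddof=1. Variable names such as rescaled and numpy_default are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
n = len(data)

population_var = data.var(ddof=0)
sample_var = data.var(ddof=1)

# Sample variance is the population variance rescaled by n / (n - 1)
rescaled = population_var * n / (n - 1)

# NumPy defaults to ddof=0, so pass ddof=1 explicitly to match pandas
numpy_default = np.var(data.to_numpy())
numpy_sample = np.var(data.to_numpy(), ddof=1)

print(f"pandas ddof=0: {population_var:.4f}, numpy default: {numpy_default:.4f}")
print(f"pandas ddof=1: {sample_var:.4f}, numpy ddof=1:  {numpy_sample:.4f}")
print(f"rescaled population variance: {rescaled:.4f}")
```

When moving code between NumPy and pandas, stating ddof explicitly avoids silently switching conventions.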


Example 3: Handling Data with Missing Values


Example

import pandas as pd
import numpy as np

# Create a Series with missing values
data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])
print("Data with missing values:")
print(data_with_nan)
print()

# Default skipna=True: NaN values are ignored
var_skipna = data_with_nan.var()
print(f"Variance when skipna=True (default): {var_skipna:.4f}")

# skipna=False: any NaN makes the result NaN
var_no_skipna = data_with_nan.var(skipna=False)
print(f"Variance when skipna=False: {var_no_skipna}")

Output:

Data with missing values:
0    10.0
1    20.0
2     NaN
3    30.0
4    40.0
5     NaN
6    50.0
dtype: float64

Variance when skipna=True (default): 250.0000
Variance when skipna=False: nan
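With skipna=True, the NaN entries are excluded from both the sum and the count, so the result matches calling var() after dropna(). A short sketch of that equivalence, reusing the data from this example:

```python
import numpy as np
import pandas as pd

data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])

valid_count = data_with_nan.count()         # number of non-NaN values used as n
var_default = data_with_nan.var()           # skipna=True
var_dropna = data_with_nan.dropna().var()   # drop NaN explicitly, then compute
var_strict = data_with_nan.var(skipna=False)

print(f"Non-NaN values: {valid_count}")
print(f"var() with skipna=True: {var_default:.4f}")
print(f"dropna().var():         {var_dropna:.4f}")
print(f"var(skipna=False):      {var_strict}")
```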


Example 4: Practical Application Comparison Between Variance and Standard Deviation

In practice the two measures complement each other: variance is the quantity used in further calculations, while standard deviation is read directly as volatility in the data's own units.

Example

import pandas as pd

# Simulate monthly returns (%) for two investment portfolios
portfolio_a = pd.Series([2.5, 3.0, -1.0, 1.5, 2.0, 2.8, -0.5, 1.2, 1.8, 2.2])
portfolio_b = pd.Series([5.0, -3.5, 8.0, -2.0, 6.5, -4.0, 7.0, -1.5, 4.5, -2.0])

print("Portfolio A (monthly returns %):")
print(portfolio_a)
print(f"Variance: {portfolio_a.var():.4f}")
print(f"Standard Deviation (volatility): {portfolio_a.std():.2f}%")
print()

print("Portfolio B (monthly returns %):")
print(portfolio_b)
print(f"Variance: {portfolio_b.var():.4f}")
print(f"Standard Deviation (volatility): {portfolio_b.std():.2f}%")
print()

print("Analysis:")
print("- Portfolio A has lower variance and standard deviation, indicating more stable performance")
print("- Portfolio B has higher variance and standard deviation, indicating higher risk")
print("- From a risk perspective, Portfolio A is better suited for conservative investors")

Output:

Portfolio A (monthly returns %):
0    2.5
1    3.0
2   -1.0
3    1.5
4    2.0
5    2.8
6   -0.5
7    1.2
8    1.8
9    2.2
dtype: float64
Variance: 1.7872
Standard Deviation (volatility): 1.34%

Portfolio B (monthly returns %):
0    5.0
1   -3.5
2    8.0
3   -2.0
4    6.5
5   -4.0
6    7.0
7   -1.5
8    4.5
9   -2.0
dtype: float64
Variance: 22.9556
Standard Deviation (volatility): 4.79%

Analysis:
- Portfolio A has lower variance and standard deviation, indicating more stable performance
- Portfolio B has higher variance and standard deviation, indicating higher risk
- From a risk perspective, Portfolio A is better suited for conservative investors
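To quantify how much riskier one portfolio is than the other, the ratio of the two measures can be computed directly. This sketch reuses the two portfolios from Example 4; std_ratio and var_ratio are illustrative names. Note that the variance ratio is the square of the standard deviation ratio, so the two numbers tell the same story on different scales.

```python
import pandas as pd

portfolio_a = pd.Series([2.5, 3.0, -1.0, 1.5, 2.0, 2.8, -0.5, 1.2, 1.8, 2.2])
portfolio_b = pd.Series([5.0, -3.5, 8.0, -2.0, 6.5, -4.0, 7.0, -1.5, 4.5, -2.0])

# How many times more volatile is B than A?
std_ratio = portfolio_b.std() / portfolio_a.std()
var_ratio = portfolio_b.var() / portfolio_a.var()

print(f"Standard deviation ratio (B / A): {std_ratio:.2f}")
print(f"Variance ratio (B / A):           {var_ratio:.2f}")
print(f"Std ratio squared:                {std_ratio ** 2:.2f}")
```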


Notes

Variance is the square of the standard deviation: std = sqrt(var), provided both use the same ddof.
By default, sample variance (ddof=1) is used, which is the usual choice when the data are a sample drawn from a larger population.
The unit of variance is the square of the original data unit, making it less intuitive to interpret; standard deviation retains the original unit, making it easier to understand.
In scenarios requiring further mathematical calculations involving variance (e.g., covariance analysis), variance should be used.
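The point about units can be seen by rescaling the data. In the sketch below, hypothetical returns are expressed first in percent and then as fractions (divided by 100): the standard deviation shrinks by a factor of 100, while the variance shrinks by 100 squared.

```python
import math
import pandas as pd

# Hypothetical returns, first in percent, then as fractions
returns_pct = pd.Series([2.5, 3.0, -1.0, 1.5, 2.0])
returns_frac = returns_pct / 100

std_pct, std_frac = returns_pct.std(), returns_frac.std()
var_pct, var_frac = returns_pct.var(), returns_frac.var()

# Standard deviation scales linearly, variance scales with the square
print(f"std ratio: {std_pct / std_frac:.1f}")
print(f"var ratio: {var_pct / var_frac:.1f}")
print(math.isclose(var_pct / var_frac, (std_pct / std_frac) ** 2))
```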


Summary


Series.var() is a basic function in statistical analysis. Its main features include:


By default, it uses sample variance (ddof=1).
Variance is the square of the standard deviation, making it more convenient in mathematical operations.
Standard deviation retains the original unit, making it easier to intuitively understand.
In finance, both variance and standard deviation are used to measure risk (volatility).


In practice, if you need an intuitive understanding of data dispersion, standard deviation is more appropriate; if mathematical operations (such as calculating covariance) are required, variance is more convenient.
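As an illustration of why variance is convenient in calculations, the sketch below checks two identities with Series.cov(): the covariance of a Series with itself equals its variance, and the variance of a sum decomposes into variances plus twice the covariance. The Series x and y are hypothetical.

```python
import math
import pandas as pd

# Hypothetical paired observations
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 1.0, 4.0, 3.0, 6.0])

# Covariance of a Series with itself is its variance (both default to ddof=1)
self_cov = x.cov(x)

# Var(x + y) = Var(x) + Var(y) + 2 * Cov(x, y)
var_of_sum = (x + y).var()
decomposed = x.var() + y.var() + 2 * x.cov(y)

print(math.isclose(self_cov, x.var()))
print(math.isclose(var_of_sum, decomposed))
```

No comparable additive rule exists for standard deviations, which is one reason derivations are usually carried out in terms of variance.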
