Pandas Series.var() Function
Series.var() is a Pandas method that calculates the variance of a Series. Variance is built from the squared deviations of the values from their mean, and its square root is the standard deviation. It measures how spread out the data are and is a fundamental quantity in statistical analysis.
The larger the variance, the more dispersed the data; the smaller the variance, the more concentrated the data. Variance is widely used in statistical analysis, machine learning, signal processing, and other fields.
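For reference, the quantity computed can be summarized with the standard textbook definition below. Here $N$ is the number of non-missing values, $\bar{x}$ is their mean, and ddof is the parameter described in the parameter table (1 by default). This is a summary of the usual formula, not a transcription of the pandas source.

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\operatorname{Var}(x) = \frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2, \qquad
\operatorname{Std}(x) = \sqrt{\operatorname{Var}(x)}
$$

With ddof=1 this is the sample variance; with ddof=0 it is the population variance.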
Basic Syntax and Parameters
var() is a method of the Series object and is called with the dot operator, for example data.var().
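Syntax

```python
Series.var(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)
```

This is the signature in pandas 2.x. In pandas 1.x the method also accepted level=None, and numeric_only defaulted to None; level was deprecated in pandas 1.3 and removed in pandas 2.0.

Parameter Description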
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| axis | {index (0)} | Unused for a Series, because a Series has only one axis; it exists for compatibility with DataFrame. | None |
| skipna | bool | If True, NaN values are excluded from the calculation; if False, the result is NaN whenever the Series contains a NaN. | True |
| level | int or str | For a Series with a MultiIndex, computes the variance within each group of the given index level. Deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...) instead. | None |
| numeric_only | bool | Include only float, int, and boolean data. Relevant for DataFrame; not implemented for Series. | False |
| ddof | int | Delta degrees of freedom. The divisor is N - ddof, where N is the number of valid (non-NaN) values: ddof=1 gives the sample variance (divide by N - 1), ddof=0 gives the population variance (divide by N). | 1 |
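Because level no longer exists in pandas 2.x, per-level variances on a MultiIndex Series are computed with groupby instead. A minimal sketch, assuming pandas 2.x; the class/student index and the scores are hypothetical data for illustration:

```python
import pandas as pd

# Hypothetical MultiIndex Series: scores indexed by class, then student
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3)],
    names=["class", "student"],
)
scores = pd.Series([85, 86, 87, 70, 85, 100], index=index)

# Replacement for the removed scores.var(level="class"):
# group by the index level, then take the (sample) variance of each group
per_class = scores.groupby(level="class").var()
print(per_class)
```

With this data, class A has a variance of 1.0 and class B a variance of 225.0; GroupBy.var() also defaults to ddof=1.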
Return Value
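- Return type: float (a scalar)
- Description: Returns the variance of the values in the Series. By default, NaN values are skipped and the sample variance is computed (dividing by N - 1, where N is the number of valid values). If there are ddof or fewer valid values, for example a single value with the default ddof=1, the result is NaN.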
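The NaN cases follow from the divisor N - ddof: when there are no more valid values than ddof, pandas returns NaN rather than dividing by zero or a negative number. A small sketch with illustrative data:

```python
import pandas as pd

single = pd.Series([5.0])                  # one valid value
empty = pd.Series([], dtype="float64")     # no valid values
with_nan = pd.Series([5.0, float("nan")])  # one valid value once NaN is skipped

print(single.var())          # N - ddof = 1 - 1 = 0, so the result is NaN
print(single.var(ddof=0))    # population variance of a single value: 0.0
print(empty.var())           # nothing to compute: NaN
print(with_nan.var())        # NaN is skipped, leaving one value: NaN again
```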
Examples
The following examples, ordered from simple to more complex, show how to use Series.var() in practice.
Example 1: Basic Usage and the Concept of Variance
Variance measures how far the data deviate from their mean; the standard deviation is its square root.
Example

```python
import pandas as pd

# Two groups of data
group_a = pd.Series([85, 86, 87, 88, 89])
group_b = pd.Series([70, 75, 85, 95, 100])

print("Group A (more concentrated):")
print(group_a)
print(f"Mean: {group_a.mean():.2f}")
print(f"Variance: {group_a.var():.2f}")
print(f"Standard Deviation: {group_a.std():.2f}")
print()

print("Group B (more scattered):")
print(group_b)
print(f"Mean: {group_b.mean():.2f}")
print(f"Variance: {group_b.var():.2f}")
print(f"Standard Deviation: {group_b.std():.2f}")
print()

# Variance equals the square of the standard deviation (both use ddof=1 here)
print(f"Check: std^2 for A = {group_a.std() ** 2:.2f}, std^2 for B = {group_b.std() ** 2:.2f}")
```

Output:

```
Group A (more concentrated):
0    85
1    86
2    87
3    88
4    89
dtype: int64
Mean: 87.00
Variance: 2.50
Standard Deviation: 1.58

Group B (more scattered):
0     70
1     75
2     85
3     95
4    100
dtype: int64
Mean: 85.00
Variance: 162.50
Standard Deviation: 12.75

Check: std^2 for A = 2.50, std^2 for B = 162.50
```

Code Explanation:
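- Group A has a variance of 2.50, so its values are tightly clustered around the mean.
- Group B has a variance of 162.50, so its values are widely spread.
- Variance equals the standard deviation squared. Because var() and std() use the same ddof, the relationship is exact; small mismatches such as 1.58² ≈ 2.50 come only from rounding the printed standard deviation.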
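One way to confirm that the relationship holds for either choice of ddof is to compare var() with the square of std(). This sketch reuses group_b from the example; math.isclose is used because squaring a floating-point square root can differ from the original in the last digit:

```python
import math

import pandas as pd

group_b = pd.Series([70, 75, 85, 95, 100])

for ddof in (0, 1):
    var = group_b.var(ddof=ddof)
    std = group_b.std(ddof=ddof)
    # std() is the square root of var() computed with the same ddof
    print(f"ddof={ddof}: var={var:.2f}, std^2={std ** 2:.2f}, equal={math.isclose(var, std ** 2)}")
```

Both lines report equal=True; only the divisor (5 for ddof=0, 4 for ddof=1) changes the values.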
Example 2: The Role of the ddof Parameter
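The ddof (delta degrees of freedom) parameter sets the divisor to N - ddof, where N is the number of valid values. It therefore controls whether the sample variance (ddof=1, the default) or the population variance (ddof=0) is computed.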
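To see exactly what ddof changes, the divisor can be applied by hand. A minimal sketch, assuming only that NaN values are dropped first (mirroring the default skipna=True); manual_var is a hypothetical helper written for this illustration, not a pandas function:

```python
import numpy as np
import pandas as pd

def manual_var(series: pd.Series, ddof: int = 1) -> float:
    """Hypothetical helper: sum of squared deviations divided by (N - ddof)."""
    values = series.dropna()            # mirror the default skipna=True
    n = len(values)                     # N counts only the non-missing values
    squared_dev = (values - values.mean()) ** 2
    return float(squared_dev.sum() / (n - ddof))

s = pd.Series([1.0, 2.0, 4.0, np.nan, 7.0])  # illustrative data
for ddof in (0, 1):
    print(f"ddof={ddof}: pandas={s.var(ddof=ddof):.4f}, manual={manual_var(s, ddof):.4f}")
```

For this data the squared deviations sum to 21, so ddof=0 divides by 4 (giving 5.25) and ddof=1 divides by 3 (giving 7.0).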
Example
```python
import math

import pandas as pd

# Create a dataset
data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
print("Data:")
print(data)
print()

# Default ddof=1: sample variance (divide by n - 1)
sample_var = data.var(ddof=1)
print(f"Sample Variance (ddof=1): {sample_var:.4f}")

# ddof=0: population variance (divide by n)
population_var = data.var(ddof=0)
print(f"Population Variance (ddof=0): {population_var:.4f}")
print()

# Verify: the square root of the sample variance equals the sample standard deviation
print(f"Square root of sample variance: {math.sqrt(sample_var):.4f}")
print(f"Sample standard deviation: {data.std():.4f}")
```

Output:

```
Data:
0    2
1    4
2    4
3    4
4    5
5    5
6    7
7    9
dtype: int64

Sample Variance (ddof=1): 4.5714
Population Variance (ddof=0): 4.0000

Square root of sample variance: 2.1381
Sample standard deviation: 2.1381
```

Example 3: Handling Data with Missing Values
Example

```python
import numpy as np
import pandas as pd

# Create a Series with missing values
data_with_nan = pd.Series([10, 20, np.nan, 30, 40, np.nan, 50])
print("Data with missing values:")
print(data_with_nan)
print()

# Default skipna=True: NaN values are ignored
var_skipna = data_with_nan.var()
print(f"Variance when skipna=True (default): {var_skipna:.4f}")

# skipna=False: any NaN makes the result NaN
var_no_skipna = data_with_nan.var(skipna=False)
print(f"Variance when skipna=False: {var_no_skipna}")
```

Output:

```
Data with missing values:
0    10.0
1    20.0
2     NaN
3    30.0
4    40.0
5     NaN
6    50.0
dtype: float64

Variance when skipna=True (default): 250.0000
Variance when skipna=False: nan
```

Example 4: Comparing Variance and Standard Deviation in Practice
In practice, variance and standard deviation are often reported together. This example compares the risk of two simulated investment portfolios.

Example

```python
import pandas as pd

# Simulated monthly returns (%) for two investment portfolios
portfolio_a = pd.Series([2.5, 3.0, -1.0, 1.5, 2.0, 2.8, -0.5, 1.2, 1.8, 2.2])
portfolio_b = pd.Series([5.0, -3.5, 8.0, -2.0, 6.5, -4.0, 7.0, -1.5, 4.5, -2.0])

print("Portfolio A (monthly returns %):")
print(portfolio_a)
print(f"Variance: {portfolio_a.var():.4f}")
print(f"Standard Deviation (volatility): {portfolio_a.std():.2f}%")
print()

print("Portfolio B (monthly returns %):")
print(portfolio_b)
print(f"Variance: {portfolio_b.var():.4f}")
print(f"Standard Deviation (volatility): {portfolio_b.std():.2f}%")
print()

print("Analysis:")
print("- Portfolio A has lower variance and standard deviation, indicating more stable performance")
print("- Portfolio B has higher variance and standard deviation, indicating higher risk")
print("- From a risk perspective, Portfolio A is better suited for conservative investors")
```

Output:

```
Portfolio A (monthly returns %):
0    2.5
1    3.0
2   -1.0
3    1.5
4    2.0
5    2.8
6   -0.5
7    1.2
8    1.8
9    2.2
dtype: float64
Variance: 1.7872
Standard Deviation (volatility): 1.34%

Portfolio B (monthly returns %):
0    5.0
1   -3.5
2    8.0
3   -2.0
4    6.5
5   -4.0
6    7.0
7   -1.5
8    4.5
9   -2.0
dtype: float64
Variance: 22.9556
Standard Deviation (volatility): 4.79%

Analysis:
- Portfolio A has lower variance and standard deviation, indicating more stable performance
- Portfolio B has higher variance and standard deviation, indicating higher risk
- From a risk perspective, Portfolio A is better suited for conservative investors
```
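Because variance is in squared units, it magnifies differences relative to the standard deviation: the ratio of two variances is the square of the ratio of the corresponding standard deviations. A short sketch using the same simulated portfolios:

```python
import pandas as pd

portfolio_a = pd.Series([2.5, 3.0, -1.0, 1.5, 2.0, 2.8, -0.5, 1.2, 1.8, 2.2])
portfolio_b = pd.Series([5.0, -3.5, 8.0, -2.0, 6.5, -4.0, 7.0, -1.5, 4.5, -2.0])

# Ratio of volatilities (original units) vs ratio of variances (squared units)
std_ratio = portfolio_b.std() / portfolio_a.std()
var_ratio = portfolio_b.var() / portfolio_a.var()

print(f"Std ratio (B/A): {std_ratio:.2f}")
print(f"Variance ratio (B/A): {var_ratio:.2f}")
print(f"Std ratio squared: {std_ratio ** 2:.2f}")
```

For these portfolios the standard deviation ratio is about 3.58 while the variance ratio is about 12.84, so Portfolio B is roughly 3.6 times as volatile as Portfolio A.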
Notes
- Variance is the square of the standard deviation: std = sqrt(var), provided both use the same ddof.
- By default, the sample variance (ddof=1) is computed, which is appropriate when the data are a sample from a larger population; use ddof=0 when the Series contains the entire population.
- Variance is expressed in the square of the original unit, which makes it less intuitive to interpret; the standard deviation keeps the original unit and is easier to understand.
- When further calculations build directly on variance (for example, covariance analysis), use the variance rather than the standard deviation.
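One practical consequence of the default ddof: NumPy's np.var defaults to ddof=0, so the same data can give different results in NumPy and pandas unless ddof is set explicitly. A short sketch using the data from Example 2:

```python
import numpy as np
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
arr = data.to_numpy()

print(f"pandas Series.var():  {data.var():.4f}")           # ddof=1 by default
print(f"NumPy np.var():       {np.var(arr):.4f}")          # ddof=0 by default
print(f"NumPy np.var(ddof=1): {np.var(arr, ddof=1):.4f}")  # matches pandas
```

Here pandas reports 4.5714 while NumPy's default reports 4.0000; passing ddof=1 to NumPy (or ddof=0 to pandas) makes them agree.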
Summary
Series.var() is a basic tool for statistical analysis. Its main features:
- By default, it computes the sample variance (ddof=1).
- Variance is the square of the standard deviation and is often more convenient in mathematical derivations; for example, the variances of independent variables add.
- The standard deviation keeps the original unit, so it is easier to interpret directly.
- In finance, both variance and standard deviation are used to measure risk; the standard deviation is what is usually reported as volatility.
In practice, if you need an intuitive understanding of data dispersion, standard deviation is more appropriate; if mathematical operations (such as calculating covariance) are required, variance is more convenient.