## Python statistics.variance() Method
`statistics.variance()` is a function in the `statistics` module of Python's standard library. It calculates the **sample variance** of a dataset.
Variance is a measurement of the spread of a dataset. It quantifies how much the numbers in the dataset deviate from their mean (average). A low variance indicates that the data points tend to be very close to the mean, while a high variance indicates that the data points are spread out over a wider range of values.
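To make the idea of spread concrete, the short sketch below compares two made-up datasets (the names `tight` and `wide` are illustrative only) that share the same mean but differ in how far their values sit from it.

```python
import statistics

# Two illustrative datasets with the same mean (50) but different spread
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

tight_var = statistics.variance(tight)
wide_var = statistics.variance(wide)

# The means match, but the variances differ by a large factor
print("tight:", statistics.mean(tight), tight_var)  # variance 2.5
print("wide: ", statistics.mean(wide), wide_var)    # variance 1000
```

The values in `wide` lie much further from 50 than those in `tight`, which is what the larger variance reflects.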
---
### Syntax
```python
statistics.variance(data, xbar=None)
```
### Parameters
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `data` | Iterable | A sequence or iterable of real-valued numbers (e.g., `list`, `tuple`). |
| `xbar` | Float / Decimal (Optional) | The known mean of the dataset. If omitted or set to `None`, the method will automatically calculate the mean of the dataset. |
### Return Value
* **Returns:** A `float` (or `Fraction`/`Decimal`, depending on the input types) representing the sample variance of the dataset.
* **Exceptions:** Raises a `StatisticsError` if `data` has fewer than two values.
---
### Mathematical Formula
The `variance()` method calculates the **sample variance** (often denoted as $s^2$), which uses Bessel's correction ($n - 1$ in the denominator) to provide an unbiased estimator of the population variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where:
* $x$ represents each value in the dataset.
* $\bar{x}$ is the sample mean (represented by the `xbar` parameter).
* $n$ is the number of data points.
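As a rough check of the formula, the sketch below computes $s^2$ step by step and compares it with the library result. The dataset and the variable names (`x_bar`, `manual_variance`) are assumptions chosen for illustration, and the manual version uses plain float arithmetic rather than the exact arithmetic the module uses internally.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

n = len(data)
x_bar = sum(data) / n                                   # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in data]   # (x - x_bar)^2 terms
manual_variance = sum(squared_deviations) / (n - 1)     # Bessel's correction

library_variance = statistics.variance(data)

print(manual_variance)
print(library_variance)
print(math.isclose(manual_variance, library_variance))
```

Here the squared deviations sum to 32 and $n - 1 = 7$, so both approaches give approximately 4.571.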
---
### Code Examples
#### Example 1: Basic Usage with a List of Integers
```python
import statistics
# Sample dataset
data = [1, 2, 3, 4, 5]
# Calculate sample variance
val_variance = statistics.variance(data)
print(f"The sample variance is: {val_variance}")
```
**Output:**
```text
The sample variance is: 2.5
```
#### Example 2: Providing a Pre-calculated Mean (`xbar`)
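If you have already calculated the mean of your dataset, you can pass it as `xbar` so that `variance()` does not compute it again. The function does not verify that `xbar` is the actual mean of `data`, and passing an arbitrary value can produce invalid results.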
```python
import statistics
data = [10.0, 20.0, 30.0, 40.0, 50.0]
# Pre-calculate the mean
mean_val = statistics.mean(data) # mean is 30.0
# Calculate variance using the pre-calculated mean
val_variance = statistics.variance(data, xbar=mean_val)
print(f"The sample variance is: {val_variance}")
```
**Output:**
```text
The sample variance is: 250.0
```
#### Example 3: Working with Decimal and Fraction Types
The `statistics` module accepts `Decimal` and `Fraction` values and returns the variance in the same type, so the result is not rounded to a binary `float`.
```python
from decimal import Decimal
from fractions import Fraction
import statistics
# Using Decimal
decimal_data = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]
print("Decimal Variance:", statistics.variance(decimal_data))
# Using Fraction
fraction_data = [Fraction(1, 2), Fraction(3, 4), Fraction(5, 6)]
print("Fraction Variance:", statistics.variance(fraction_data))
```
**Output:**
```text
Decimal Variance: 1
Fraction Variance: 13/432
```
---
### Considerations & Best Practices
#### 1. Sample Variance vs. Population Variance
* **`statistics.variance()`**: Calculates **sample variance** (divides by $n - 1$). Use this when your data represents a *sample* (subset) of a larger population.
* **`statistics.pvariance()`**: Calculates **population variance** (divides by $n$). Use this when your dataset represents the *entire* population.
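The difference between the two functions can be seen by running both on the same data. This is a minimal sketch with an illustrative dataset; the variable names are not part of the library.

```python
import statistics

data = [1, 2, 3, 4, 5]
n = len(data)

sample_var = statistics.variance(data)       # sum of squared deviations / (n - 1)
population_var = statistics.pvariance(data)  # sum of squared deviations / n

print("Sample variance:    ", sample_var)      # 2.5
print("Population variance:", population_var)  # 2

# The two results differ only by the factor (n - 1) / n
print(sample_var * (n - 1) / n == population_var)
```

The sample variance is always the larger of the two for the same data, and the gap shrinks as $n$ grows.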
#### 2. Minimum Data Points
The dataset passed to `variance()` must contain **at least two data points**. Passing a single value or an empty iterable will raise a `StatisticsError`:
```python
import statistics
# Raises StatisticsError: variance requires at least two data points
statistics.variance([5])
```
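When the size of the input is not known in advance, one option is to catch the exception. The helper below, `safe_variance`, is a hypothetical name introduced here for illustration and is not part of the `statistics` module; returning `None` is just one possible fallback.

```python
import statistics

def safe_variance(values):
    """Return the sample variance, or None if there are fewer than two values.

    Hypothetical helper for illustration; not part of the statistics module.
    """
    try:
        return statistics.variance(values)
    except statistics.StatisticsError:
        return None

print(safe_variance([]))      # None
print(safe_variance([7]))     # None
print(safe_variance([7, 9]))  # 2
```

Checking `len(values) < 2` before the call would work equally well for sequences; catching the exception also covers iterators, which have no length.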