Ref Stat Variance

## Python statistics.variance() Method

The `statistics.variance()` function is part of Python's built-in `statistics` module. It calculates the **sample variance** of a dataset.

Variance measures the spread of a dataset by quantifying how far the values deviate from their mean (average). A low variance means the data points sit close to the mean. A high variance means they are spread out over a wider range of values.

---

### Syntax

```python
statistics.variance(data, xbar=None)
```

### Parameters

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `data` | Iterable | A sequence or iterable of real-valued numbers (e.g., `list`, `tuple`). |
| `xbar` | Number (Optional) | The mean of `data`, if already known. If omitted or `None`, the mean is calculated automatically. Passing a value that is not the actual mean of `data` can produce invalid results. |

### Return Value

* **Returns:** The sample variance of `data`. This is typically a `float` for `int` or `float` data. `Decimal` and `Fraction` data return a value of the same type.
* **Exceptions:**
    * Raises `statistics.StatisticsError` if `data` has fewer than two values.

---

### Mathematical Formula

The `variance()` function calculates the **sample variance** (often denoted $s^2$). It applies Bessel's correction ($n - 1$ in the denominator) to give an unbiased estimator of the population variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Where:

* $x_i$ is each value in the dataset.
* $\bar{x}$ is the sample mean (the value of the `xbar` parameter, if supplied).
* $n$ is the number of data points.
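As a sanity check on the formula, the sketch below applies it directly using exact `Fraction` arithmetic and compares the result with `statistics.variance()`. The helper `sample_variance()` is hypothetical and written only for illustration. It is not the module's actual implementation.

```python
import statistics
from fractions import Fraction


def sample_variance(data):
    """Hypothetical helper: s^2 = sum((x_i - xbar)^2) / (n - 1), computed exactly."""
    values = [Fraction(x) for x in data]  # exact arithmetic, no float rounding
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    xbar = sum(values) / n                     # sample mean
    ss = sum((x - xbar) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1)                        # Bessel's correction


data = [1, 2, 3, 4, 5]
print(sample_variance(data))      # 5/2
print(statistics.variance(data))  # 2.5
```

Both lines report the same value. The helper keeps it as an exact fraction, while `statistics.variance()` returns a `float` for this integer input.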
---

### Code Examples

#### Example 1: Basic Usage with a List of Integers

```python
import statistics

# Sample dataset
data = [1, 2, 3, 4, 5]

# Calculate sample variance
val_variance = statistics.variance(data)
print(f"The sample variance is: {val_variance}")
```

**Output:**

```text
The sample variance is: 2.5
```

#### Example 2: Providing a Pre-calculated Mean (`xbar`)

If you have already calculated the mean of your dataset, you can pass it to the `xbar` parameter so that `variance()` does not recompute it.

```python
import statistics

data = [10.0, 20.0, 30.0, 40.0, 50.0]

# Pre-calculate the mean
mean_val = statistics.mean(data)  # mean is 30.0

# Calculate variance using the pre-calculated mean
val_variance = statistics.variance(data, xbar=mean_val)
print(f"The sample variance is: {val_variance}")
```

**Output:**

```text
The sample variance is: 250.0
```

#### Example 3: Working with Decimal and Fraction Types

The `statistics` module accepts `Decimal` and `Fraction` values and returns the result in the same type instead of converting it to `float`.

```python
from decimal import Decimal
from fractions import Fraction
import statistics

# Using Decimal
decimal_data = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]
print("Decimal Variance:", statistics.variance(decimal_data))

# Using Fraction
fraction_data = [Fraction(1, 2), Fraction(3, 4), Fraction(5, 6)]
print("Fraction Variance:", statistics.variance(fraction_data))
```

**Output:**

```text
Decimal Variance: 1
Fraction Variance: 13/432
```

---

### Considerations & Best Practices

#### 1. Sample Variance vs. Population Variance

* **`statistics.variance()`**: Calculates the **sample variance** (divides by $n - 1$). Use it when your data is a *sample* (subset) of a larger population.
* **`statistics.pvariance()`**: Calculates the **population variance** (divides by $n$). Use it when your data covers the *entire* population.

#### 2. Minimum Data Points

The dataset passed to `variance()` must contain **at least two data points**.
Passing a single value or an empty iterable raises a `StatisticsError`:

```python
import statistics

try:
    # One data point is not enough: raises StatisticsError ("variance requires at least two data points")
    statistics.variance([5])
except statistics.StatisticsError as exc:
    print(f"StatisticsError: {exc}")
```
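The sketch below runs `statistics.variance()` and `statistics.pvariance()` on the same data to show how sample and population variance differ. The two results differ only in their denominator, so multiplying the sample variance by $(n - 1)/n$ gives the population variance. The sketch also shows that `pvariance()`, unlike `variance()`, accepts a single data point.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(data)

# The sum of squared deviations from the mean (3.0) is 10.0
sample_var = statistics.variance(data)       # 10.0 / (n - 1)
population_var = statistics.pvariance(data)  # 10.0 / n

print(sample_var)      # 2.5
print(population_var)  # 2.0

# Rescale the sample variance by (n - 1) / n to get the population variance
print(sample_var * (n - 1) / n)  # 2.0

# pvariance() needs only one data point; variance() needs at least two
print(statistics.pvariance([5.0]))  # 0.0
```

For other float datasets, the rescaled value may differ from `pvariance()` in the last digits because the rescaling is done in floating point.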
