# NumPy Statistical Functions

NumPy provides many statistical functions for computing the minimum, maximum, percentiles, standard deviation, variance, and so on, of the elements of an array.

### numpy.amin() and numpy.amax()

`numpy.amin()` computes the minimum of array elements along a specified axis.

`numpy.amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)`

Parameter Description:

* `a`: The input array, which can be a NumPy array or an array-like object.
* `axis`: Optional. The axis along which to compute the minimum. If omitted, the minimum of the entire array is returned. It can be an integer (a single axis) or a tuple of integers (several axes).
* `out`: Optional. An alternative output array in which to store the result.
* `keepdims`: Optional. If True, the reduced axes are kept in the result as dimensions of size 1, so the result has the same number of dimensions as the input. By default, the reduced axes are removed.
* `initial`: Optional. A starting value for the minimum: the result is the minimum of this value and the array elements. It is required when reducing an empty array, and whenever `where` is used.
* `where`: Optional. A boolean array; only the elements where it is True are compared.

`numpy.amax()` computes the maximum of array elements along a specified axis.

`numpy.amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)`

Parameter Description:

* `a`: The input array, which can be a NumPy array or an array-like object.
* `axis`: Optional. The axis along which to compute the maximum. If omitted, the maximum of the entire array is returned. It can be an integer (a single axis) or a tuple of integers (several axes).
* `out`: Optional. An alternative output array in which to store the result.
* `keepdims`: Optional. If True, the reduced axes are kept in the result as dimensions of size 1, so the result has the same number of dimensions as the input. By default, the reduced axes are removed.
* `initial`: Optional. A starting value for the maximum: the result is the maximum of this value and the array elements. It is required when reducing an empty array, and whenever `where` is used.
* `where`: Optional. A boolean array; only the elements where it is True are compared.

## Example

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
print('Our array is:')
print(a)
print()
print('Calling amin() function along axis 1:')
print(np.amin(a, 1))
print()
print('Calling amin() function along axis 0:')
print(np.amin(a, 0))
print()
print('Calling amax() function:')
print(np.amax(a))
print()
print('Calling amax() function along axis 0:')
print(np.amax(a, axis=0))
```

Output:

```
Our array is:
[[3 7 5]
 [8 4 3]
 [2 4 9]]

Calling amin() function along axis 1:
[3 3 2]

Calling amin() function along axis 0:
[2 4 3]

Calling amax() function:
9

Calling amax() function along axis 0:
[8 7 9]
```

### numpy.ptp()

The **numpy.ptp()** function computes the range (maximum value minus minimum value) of the elements in an array. The name stands for "peak to peak".

`numpy.ptp(a, axis=None, out=None, keepdims=<no value>)`

Parameter Description:

* `a`: The input array, which can be a NumPy array or an array-like object.
* `axis`: Optional. The axis along which to compute the peak-to-peak value. If omitted, the peak-to-peak value of the entire array is returned. It can be an integer (a single axis) or a tuple of integers (several axes).
* `out`: Optional. An alternative output array in which to store the result.
* `keepdims`: Optional. If True, the reduced axes are kept in the result as dimensions of size 1, so the result has the same number of dimensions as the input. By default, the reduced axes are removed.
## Example

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
print('Our array is:')
print(a)
print()
print('Calling ptp() function:')
print(np.ptp(a))
print()
print('Calling ptp() function along axis 1:')
print(np.ptp(a, axis=1))
print()
print('Calling ptp() function along axis 0:')
print(np.ptp(a, axis=0))
```

Output:

```
Our array is:
[[3 7 5]
 [8 4 3]
 [2 4 9]]

Calling ptp() function:
7

Calling ptp() function along axis 1:
[4 5 7]

Calling ptp() function along axis 0:
[6 3 6]
```

### numpy.percentile()

A percentile is a statistical measure: the p-th percentile is the value below which p percent of the observations fall. The function `numpy.percentile()` computes it.

`numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False)`

Parameter Description:

* `a`: Input array.
* `q`: The percentile (or sequence of percentiles) to compute; values must be between 0 and 100 inclusive.
* `axis`: The axis along which the percentile is computed. By default, it is computed over the flattened array.
* `method`: How to estimate the percentile when it falls between two data points. The default, `'linear'`, interpolates linearly between them.
* `keepdims`: If True, the reduced axes are kept in the result as dimensions of size 1.

**First, let's clarify what a percentile is:**

The p-th percentile is a value such that at least p percent of the data items are less than or equal to it, and at least (100 - p) percent of the data items are greater than or equal to it.

For example, college entrance examination scores are often reported as percentiles. Suppose a candidate's raw score in the Chinese section of the exam is 54. On its own, this number does not show how the candidate did compared with the other students who took the same exam. If, however, the raw score of 54 corresponds to the 70th percentile, we know that about 70% of the students scored lower and about 30% scored higher. Here, p = 70.
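With the default `method='linear'`, `np.percentile` does not simply pick one of the data points: it places the percentile at the fractional position `(n - 1) * q / 100` in the sorted data and interpolates linearly between the two neighbouring values. The sketch below re-implements that rule in plain Python to make it concrete; `linear_percentile` is a hypothetical helper written for this illustration, not part of NumPy.

```python
import math

import numpy as np


def linear_percentile(data, q):
    """Hypothetical helper: q-th percentile (0 <= q <= 100) by linear
    interpolation between the two nearest sorted values."""
    values = sorted(data)
    # Fractional position of the percentile within the sorted values
    pos = (len(values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    # Move from values[lo] towards values[hi] by the fractional part of pos
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)


scores = [10, 7, 4, 3, 2, 1]
for q in (25, 50, 70):
    # Hand-rolled result next to NumPy's result
    print(q, linear_percentile(scores, q), np.percentile(scores, q))
```

For q = 50 both give 3.5: position 2.5 lies halfway between the sorted values 3 and 4.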
## Example

```python
import numpy as np

a = np.array([[10, 7, 4], [3, 2, 1]])
print('Our array is:')
print(a)
print('Calling percentile() function:')
# 50th percentile of all elements
print(np.percentile(a, 50))
# 50th percentile of each column
print(np.percentile(a, 50, axis=0))
# 50th percentile of each row
print(np.percentile(a, 50, axis=1))
# Same as above, but the reduced axis is kept as a dimension of size 1
print(np.percentile(a, 50, axis=1, keepdims=True))
```

Output:

```
Our array is:
[[10  7  4]
 [ 3  2  1]]
Calling percentile() function:
3.5
[6.5 4.5 2.5]
[7. 2.]
[[7.]
 [2.]]
```

### numpy.median()

The `numpy.median()` function computes the median (middle value) of the elements of array `a`.

`numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)`

Parameter Description:

* `a`: The input array, which can be a NumPy array or an array-like object.
* `axis`: Optional. The axis along which to compute the median. If omitted, the median of the entire array is computed. It can be an integer (a single axis) or a tuple of integers (several axes).
* `out`: Optional. An alternative output array in which to store the result.
* `overwrite_input`: Optional. If True, the memory of the input array may be used for the computation. This saves memory when you do not need to keep the input, but the input array is modified (it may end up partially sorted).
* `keepdims`: Optional. If True, the reduced axes are kept in the result as dimensions of size 1, so the result has the same number of dimensions as the input. By default, the reduced axes are removed.

## Example

```python
import numpy as np

a = np.array([[30, 65, 70], [80, 95, 10], [50, 90, 60]])
print('Our array is:')
print(a)
print()
print('Calling median() function:')
print(np.median(a))
print()
print('Calling median() function along axis 0:')
print(np.median(a, axis=0))
print()
print('Calling median() function along axis 1:')
print(np.median(a, axis=1))
```

Output:

```
Our array is:
[[30 65 70]
 [80 95 10]
 [50 90 60]]

Calling median() function:
65.0

Calling median() function along axis 0:
[50. 90. 60.]

Calling median() function along axis 1:
[65. 80. 60.]
```
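The median is the middle value of the sorted data; when the number of values is even, it is the mean of the two middle values, which also makes it the 50th percentile. The short sketch below checks this against `np.median()` on the same 3x3 array; `manual_median` is a hypothetical helper written only for this illustration.

```python
import numpy as np


def manual_median(data):
    """Hypothetical helper: median by sorting. Odd count: the middle value.
    Even count: the mean of the two middle values."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return float(values[mid])
    return (values[mid - 1] + values[mid]) / 2


a = np.array([[30, 65, 70], [80, 95, 10], [50, 90, 60]])
print(manual_median(a.ravel()))            # 65.0 (9 values, odd count)
print(manual_median([10, 7, 4, 3, 2, 1]))  # 3.5 (mean of 3 and 4, even count)
print(np.median(a), np.percentile(a, 50))  # the median is the 50th percentile
```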
### numpy.mean()

The `numpy.mean()` function returns the arithmetic mean of the array elements. If an axis is given, the mean is computed along that axis. The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

`numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)`

Parameter Description:

* `a`: The input array, which can be a NumPy array or an array-like object.
* `axis`: Optional. The axis along which to compute the mean. If omitted, the mean of the entire array is computed. It can be an integer (a single axis) or a tuple of integers (several axes).
* `dtype`: Optional. The data type used to compute the mean. For integer inputs the default is `float64`; for floating-point inputs it is the same as the input's dtype.
* `out`: Optional. An alternative output array in which to store the result.
* `keepdims`: Optional. If True, the reduced axes are kept in the result as dimensions of size 1, so the result has the same number of dimensions as the input. By default, the reduced axes are removed.

## Example

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print('Our array is:')
print(a)
print()
print('Calling mean() function:')
print(np.mean(a))
print()
print('Calling mean() function along axis 0:')
print(np.mean(a, axis=0))
print()
print('Calling mean() function along axis 1:')
print(np.mean(a, axis=1))
```

Output:

```
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]

Calling mean() function:
3.6666666666666665

Calling mean() function along axis 0:
[2.66666667 3.66666667 4.66666667]

Calling mean() function along axis 1:
[2. 4. 5.]
```

### numpy.average()

The `numpy.average()` function computes the weighted average of the array elements, using the weights given in another array. It accepts an `axis` parameter; if no axis is specified, the array is flattened.
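The statement that `np.average()` flattens the array when no axis is given can be checked directly. The sketch below reuses the 3x3 array from the `mean()` example and also shows that, without `weights`, every element counts equally, so `np.average()` returns the same values as `np.mean()`.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# No axis: all 9 elements are averaged, just as for the flattened array
print(np.average(a), np.average(a.ravel()))

# No weights: every element has the same weight, so this is the plain mean
print(np.average(a), np.mean(a))
print(np.average(a, axis=0), np.mean(a, axis=0))
```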
The weighted average is calculated by multiplying each value by its corresponding weight, summing these products, and dividing by the sum of the weights.

For example, for the array [1,2,3,4] and the corresponding weights [4,3,2,1]:

Weighted average = `(1*4 + 2*3 + 3*2 + 4*1) / (4 + 3 + 2 + 1) = 20 / 10 = 2.0`

Function syntax:

`numpy.average(a, axis=None, weights=None, returned=False)`

Parameter Description:

* `a`: The input array, which can be a NumPy array or an array-like object.
* `axis`: Optional. The axis along which to compute the weighted average. If omitted, the weighted average of the entire array is computed. It can be an integer (a single axis) or a tuple of integers (several axes).
* `weights`: Optional. The weights of the data points: either an array with the same shape as `a`, or, when `axis` is given, a 1-D array whose length matches that axis. If no weights are provided, all values get equal weight.
* `returned`: Optional. If True, a tuple `(average, sum_of_weights)` is returned instead of just the average.

## Example

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print('Our array is:')
print(a)
print()
print('Calling average() function:')
print(np.average(a))
print()
wts = np.array([4, 3, 2, 1])
print('Calling average() function with weights:')
print(np.average(a, weights=wts))
print()
print('Weighted average and sum of weights:')
print(np.average([1, 2, 3, 4], weights=[4, 3, 2, 1], returned=True))
```

Output:

```
Our array is:
[1 2 3 4]

Calling average() function:
2.5

Calling average() function with weights:
2.0

Weighted average and sum of weights:
(2.0, 10.0)
```

With NumPy 2.x, the last line prints as `(np.float64(2.0), np.float64(10.0))`, because NumPy scalars now show their type in their repr.

In a multidimensional array, you can specify the axis along which to compute.
## Example

```python
import numpy as np

a = np.arange(6).reshape(3, 2)
print('Our array is:')
print(a)
print()
print('Weighted average along axis 1:')
wt = np.array([3, 5])
print(np.average(a, axis=1, weights=wt))
print()
print('Weighted average and sum of weights along axis 1:')
print(np.average(a, axis=1, weights=wt, returned=True))
```

Output:

```
Our array is:
[[0 1]
 [2 3]
 [4 5]]

Weighted average along axis 1:
[0.625 2.625 4.625]

Weighted average and sum of weights along axis 1:
(array([0.625, 2.625, 4.625]), array([8., 8., 8.]))
```

### Standard Deviation

Standard deviation measures how spread out a set of values is around its mean. It is the (non-negative) square root of the variance.

The standard deviation formula is as follows:

`std = sqrt(mean((x - x.mean())**2))`

For the array [1, 2, 3, 4], the mean is 2.5, so the squared deviations are [2.25, 0.25, 0.25, 2.25]. Their mean is 5 / 4 = 1.25, and the standard deviation is sqrt(1.25), about 1.118.

## Example

```python
import numpy as np

print(np.std([1, 2, 3, 4]))
```

Output:

```
1.118033988749895
```

### Variance

In statistics, the variance is the average of the squared deviations from the mean, i.e., `mean((x - x.mean())**2)`. By default, `np.var()` divides by the number of elements N, which gives the population variance; passing `ddof=1` divides by N - 1 instead and gives the sample variance. In other words, the standard deviation is the square root of the variance.

## Example

```python
import numpy as np

print(np.var([1, 2, 3, 4]))
```

Output:

```
1.25
```
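To connect the formulas with the functions, the sketch below computes the variance and standard deviation of [1, 2, 3, 4] by hand and compares them with `np.var()` and `np.std()`. It also uses the `ddof` ("delta degrees of freedom") argument that both functions accept, which makes them divide by N - ddof: the default `ddof=0` gives the population variance, and `ddof=1` gives the sample variance.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Population variance (NumPy's default, ddof=0): divide by N
sq_dev = (x - x.mean()) ** 2          # [2.25 0.25 0.25 2.25]
pop_var = sq_dev.sum() / len(x)       # 5 / 4
print(pop_var, np.var(x))             # 1.25 1.25
print(np.sqrt(pop_var), np.std(x))    # both about 1.118

# Sample variance: divide by N - 1, selected with ddof=1
sample_var = sq_dev.sum() / (len(x) - 1)  # 5 / 3
print(sample_var, np.var(x, ddof=1))
print(np.sqrt(sample_var), np.std(x, ddof=1))
```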
