Matplotlib Hist
We can use the hist() function in pyplot, the plotting submodule of the Matplotlib library, to draw a histogram.
A histogram visualizes the distribution of data, which makes it easy to see the central tendency, the skewness, and any outliers of a dataset.
The syntax of the hist() method is as follows:
```python
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, **kwargs)
```
**Parameter Description:**
* `x`: the data to plot. This can be a single one-dimensional array or list, or a sequence of arrays (one per dataset) to draw several histograms at once.
* `bins`: optional parameter. An integer number of equal-width bins, a sequence of bin edges, or the name of a binning strategy such as 'auto'. The default is None, which uses `rcParams["hist.bins"]` (10 unless changed).
* `range`: optional parameter. The lower and upper limits of the bins as a tuple `(min, max)`. The default is None, which uses the minimum and maximum values of the data. It is ignored if `bins` is a sequence.
* `density`: optional parameter. Whether to normalize the histogram into a probability density, so that the total area of the bars is 1. The default is False, in which case the height of each bar is the number of samples in that bin.
* `weights`: optional parameter. An array of weights with the same shape as `x`; each data point adds its weight to its bin instead of 1. The default is None.
* `cumulative`: optional parameter. If True, each bar shows the count of its own bin plus all bins with smaller values, giving a cumulative distribution. The default is False.
* `bottom`: optional parameter. The baseline of the bars, i.e. the height at which each bar starts. The default is None, which means 0.
* `histtype`: optional parameter. The type of histogram to draw: 'bar', 'barstacked', 'step', or 'stepfilled'. The default is 'bar'.
* `align`: optional parameter. How the bars are aligned to the bins: 'left', 'mid', or 'right'. The default is 'mid'.
* `orientation`: optional parameter. The orientation of the histogram: 'vertical' or 'horizontal'. The default is 'vertical'.
* `rwidth`: optional parameter. The width of each bar as a fraction of the bin width (for example 0.8). The default is None, which computes the width automatically. It is ignored when `histtype` is 'step' or 'stepfilled'.
* `log`: optional parameter. Whether to use a logarithmic scale on the count axis (the y-axis of a vertical histogram). The default is False.
* `color`: optional parameter. The color of the bars, or a list of colors with one per dataset.
* `label`: optional parameter. The label shown in the legend, or a list of labels with one per dataset.
* `stacked`: optional parameter. Whether to stack multiple datasets on top of each other. The default is False.
* `**kwargs`: optional parameter. Other keyword arguments passed on to the bar patches, such as `alpha` or `edgecolor`.
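hist() also returns three values: the bar heights, the bin edges, and the bar artists it drew. The sketch below captures them to show how `bins` and `density` behave. It assumes a seeded NumPy generator (`np.random.default_rng(0)`), used here only so the run is reproducible; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, for reproducibility only
data = rng.normal(size=1000)

# hist() returns the bar heights, the bin edges and the drawn bars
counts, edges, patches = plt.hist(data, bins=20)
print(len(counts), len(edges))  # 20 bars are bounded by 21 edges
print(counts.sum())             # every sample lands in exactly one bin

# With density=True the heights form a probability density:
# summing height * bin width over all bins gives 1
density, edges2, _ = plt.hist(data, bins=20, density=True, histtype='step')
print(np.sum(density * np.diff(edges2)))
```

Because the default `range` spans the data's minimum to maximum, no sample is left out, so the counts add up to the number of samples.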
In the following example, we simply use hist() to create a histogram:
## Example
```python
import matplotlib.pyplot as plt
import numpy as np

# Generate a set of random data
data = np.random.randn(1000)

# Draw the histogram
plt.hist(data, bins=30, color='skyblue', alpha=0.8)

# Set chart properties
plt.title('TUTORIAL hist() Test')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Display the chart
plt.show()
```
The result is as follows:

[Figure: histogram titled "TUTORIAL hist() Test", x-axis "Value", y-axis "Frequency"]
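Besides an integer, `bins` also accepts a sequence of bin edges or the name of a NumPy binning strategy such as `'auto'`. The sketch below compares the two forms side by side; the seeded generator and the edge values `np.linspace(-4, 4, 9)` are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)  # seeded generator, for reproducibility only
data = rng.normal(size=1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Explicit edges: 9 edges define 8 equal-width bins covering [-4, 4];
# values outside that interval are not counted
n1, edges1, _ = ax1.hist(data, bins=np.linspace(-4, 4, 9))
ax1.set_title('bins=np.linspace(-4, 4, 9)')

# A strategy name lets NumPy pick the number of bins from the data
n2, edges2, _ = ax2.hist(data, bins='auto')
ax2.set_title("bins='auto'")
```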
The following example demonstrates how to use the hist() function to draw histograms of several datasets on one chart and compare them:
## Example
```python
import matplotlib.pyplot as plt
import numpy as np

# Generate three sets of random data
data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(2, 1, 1000)
data3 = np.random.normal(-2, 1, 1000)

# Draw the histograms
plt.hist(data1, bins=30, alpha=0.5, label='Data 1')
plt.hist(data2, bins=30, alpha=0.5, label='Data 2')
plt.hist(data3, bins=30, alpha=0.5, label='Data 3')

# Set chart properties
plt.title('TUTORIAL hist() TEST')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

# Display the chart
plt.show()
```
In the above example, we generated three sets of random data and used the hist() function to draw their histograms. The three sets come from normal distributions with different means (0, 2, and -2) and the same standard deviation (1), so they have the same shape but different centers.
We set the bins parameter to 30, which means the range of each dataset, from its minimum to its maximum, is divided into 30 equal-width intervals, and the number of values falling into each interval is counted.
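Since each hist() call in this example computes its own 30 edges from its own dataset, the bars of the three histograms do not line up exactly. One way to fix this, sketched below under the assumption that the datasets are regenerated with a seeded generator, is to compute a single set of edges over all the data with NumPy's `histogram_bin_edges()` and pass that array as `bins` to every call.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # seeded generator, for reproducibility only
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1, 1000)
data3 = rng.normal(-2, 1, 1000)

# One set of 30 bins spanning the combined minimum and maximum
all_data = np.concatenate([data1, data2, data3])
edges = np.histogram_bin_edges(all_data, bins=30)

# Every histogram now uses exactly the same bin edges
for d, name in [(data1, 'Data 1'), (data2, 'Data 2'), (data3, 'Data 3')]:
    plt.hist(d, bins=edges, alpha=0.5, label=name)

plt.legend()
```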
We set the alpha parameter to 0.5, which makes each histogram 50% transparent so that overlapping regions remain visible.
We used the label parameter to set the label for each histogram so that it can be displayed in the legend.
We then used the title(), xlabel(), and ylabel() functions to set the chart title and axis labels, and finally called legend() to display the legend.
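The result is as follows:

[Figure: three overlapping histograms labeled "Data 1", "Data 2", and "Data 3", titled "TUTORIAL hist() TEST"]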
From the chart above, we can clearly see the distributions of the three datasets: all three are approximately normal with a similar spread, centered around 0, 2, and -2 respectively.
Comparing histograms in this way helps us analyze how the distributions of different datasets differ.
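hist() can also take several datasets in one call when `x` is a list of arrays; Matplotlib then computes one shared set of bin edges for all of them. The sketch below draws the three datasets as outlines with `histtype='step'`, which can be easier to read than overlapping filled bars. The seeded generator is an assumption added for reproducibility.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # seeded generator, for reproducibility only
datasets = [rng.normal(mean, 1, 1000) for mean in (0, 2, -2)]

# One call, three datasets: one row of counts per dataset,
# and a single array of 31 edges shared by all of them
counts, edges, patches = plt.hist(datasets, bins=30, histtype='step',
                                  label=['Data 1', 'Data 2', 'Data 3'])
plt.legend()
```

Passing `stacked=True` instead would pile the three datasets on top of each other.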
### Combined with Pandas
In the following example, we use Pandas to draw a histogram:
## Example
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Use NumPy to generate random numbers
random_data = np.random.normal(170, 10, 250)

# Convert data to Pandas DataFrame
dataframe = pd.DataFrame(random_data)

# Use Pandas hist() method to draw the histogram
# (it creates a new figure with one subplot per column)
dataframe.hist()

# Set chart properties
plt.title('TUTORIAL hist() Test')
plt.xlabel('X-Value')
plt.ylabel('Y-Value')

# Display the chart
plt.show()
```
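The result is as follows:

[Figure: histogram of the 250 values, titled "TUTORIAL hist() Test", x-axis "X-Value", y-axis "Y-Value"]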
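DataFrame.hist() draws one subplot per numeric column, titles each subplot with the column name, and returns a NumPy array of the Axes it created; keyword arguments such as `bins` and `figsize` are accepted as well. The sketch below assumes a DataFrame with two hypothetical columns, `height` and `weight`, invented only for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # seeded generator, for reproducibility only

# Hypothetical columns, used only for this illustration
df = pd.DataFrame({
    'height': rng.normal(170, 10, 250),
    'weight': rng.normal(65, 8, 250),
})

# One subplot per column; the return value is an array of Axes
axes = df.hist(bins=20, figsize=(8, 3))
for ax in axes.ravel():
    ax.set_xlabel('Value')

plt.tight_layout()
```

Naming the columns gives each subplot a meaningful title, whereas the unnamed column in the example above is simply titled "0" until plt.title() replaces it.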
In addition to DataFrames, you can also draw histograms from a Pandas Series. A Series can be passed directly to plt.hist(), as in the following example:
## Example
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate random data
data = pd.Series(np.random.normal(size=100))

# Draw the histogram
# The bins parameter specifies the number of bars in the histogram
plt.hist(data, bins=10)

# Set chart properties
plt.xlabel('Value')
plt.ylabel('Frequency')

# Display the chart
plt.show()
```
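A Series also has its own hist() method, which forwards `bins` to Matplotlib and returns the Axes it drew on, so the chart can be adjusted afterwards. In the sketch below the Axes is created explicitly and passed in with `ax=`; the seeded generator and the Series name `value` are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)  # seeded generator, for reproducibility only
data = pd.Series(rng.normal(size=100), name='value')

fig, ax = plt.subplots()

# Series.hist() draws on the given Axes and returns it
ax = data.hist(bins=10, grid=False, ax=ax)
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
```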