Pandas df.head() Function
head() is one of the most commonly used methods of Pandas DataFrame and Series objects. It returns the first n rows, giving a quick look at the basic structure and content of the data without printing the entire dataset.
This function is used very frequently in data analysis tasks, especially when handling large datasets. Typically, we first use head() to inspect the first few rows of data to confirm whether the data structure meets expectations before proceeding with further analysis and processing.
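As a minimal sketch of that first check, the snippet below previews a small DataFrame and compares its columns with an expected list. The data, the column names, and the expected_columns variable are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for a freshly loaded dataset
df = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105, 106],
    "amount": [25.0, 40.5, 13.2, 99.9, 5.0, 61.3],
})

# Hypothetical list of the columns we expect to find
expected_columns = ["order_id", "amount"]

# Preview the first 5 rows (the default) before doing anything else
preview = df.head()
print(preview)

# Confirm the structure matches expectations before further processing
structure_ok = list(preview.columns) == expected_columns
print("Structure as expected:", structure_ok)
```

In a real workflow the DataFrame would come from a file or database, and the check could be as simple as reading the preview by eye.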
Basic Syntax and Parameters
head() is a method of both DataFrame and Series, called with the dot operator (.). It has no required parameters and accepts one optional parameter that specifies the number of rows to return.
Syntax
DataFrame.head(n=5)
Series.head(n=5)
Parameter Description
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| n | int | Optional | Return the first n rows. If n exceeds the total number of rows, all rows are returned. | 5 |
Return Value Description
- Return Type: If the caller is a DataFrame, a DataFrame is returned; if the caller is a Series, a Series is returned.
- Number of Rows Returned: At most n rows are returned. If the dataset contains fewer than n rows, all rows are returned.
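The two points above can be checked with a short sketch. The three-row DataFrame here is an assumption chosen only to keep the example small.

```python
import pandas as pd

# A tiny DataFrame with only 3 rows
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
s = df["x"]  # selecting one column gives a Series

# The return type follows the caller
df_head = df.head(2)
s_head = s.head(2)
print(type(df_head).__name__)
print(type(s_head).__name__)

# Asking for more rows than exist returns every row and raises no error
all_rows = df.head(10)
print(len(all_rows))
```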
Examples
Let's work through the usage of head() with a series of examples.
Example 1: Basic Usage - View the First Few Rows of a DataFrame
First, create a simple DataFrame, then use head() to view the first few rows.
Example
import pandas as pd
# Create a sample DataFrame containing student grade data
data = {
    'name': ['Alice','Bob','Charlie','David','Eve','Frank','Grace','Henry','Iris','Jack'],
    'age': [18,19,17,18,20,19,18,17,19,18],
    'score': [85,92,78,90,88,95,82,76,89,91],
    'grade': ['A','A','B','A','B','A','B','C','B','A']
}
df = pd.DataFrame(data)
# Default: return first 5 rows
print("First 5 rows by default:")
print(df.head())
# Specify returning first 3 rows
print("First 3 rows:")
print(df.head(3))
# Return first 7 rows
print("First 7 rows:")
print(df.head(7))
Output:
First 5 rows by default:
name age score grade
0 Alice 18 85 A
1 Bob 19 92 A
2 Charlie 17 78 B
3 David 18 90 A
4 Eve 20 88 B
First 3 rows:
name age score grade
0 Alice 18 85 A
1 Bob 19 92 A
2 Charlie 17 78 B
First 7 rows:
name age score grade
0 Alice 18 85 A
1 Bob 19 92 A
2 Charlie 17 78 B
3 David 18 90 A
4 Eve 20 88 B
5 Frank 19 95 A
6 Grace 18 82 B
Code Explanation:
- A DataFrame with 10 rows of student data was created.
- When df.head() is called without arguments, it returns the first 5 rows by default, which is the most common usage.
- df.head(3) returns the first 3 rows; the number of rows to view can be adjusted as needed.
- When the specified n exceeds the total number of rows, all rows are returned without raising an error.
Example 2: View the First Few Elements of a Series
head() is not only applicable to DataFrames but also to Series objects.
Example
import pandas as pd
# Create a Series containing a sequence of numeric values
s = pd.Series([10,20,30,40,50,60,70,80,90,100])
# View first 3 elements of the Series
print("First 3 elements:")
print(s.head(3))
# Create a Series with custom index
s2 = pd.Series([100,200,300,400,500], index=['a','b','c','d','e'])
print("First 2 elements of a Series with custom index:")
print(s2.head(2))
Output:
First 3 elements:
0 10
1 20
2 30
dtype: int64
First 2 elements of a Series with custom index:
a 100
b 200
dtype: int64
Code Explanation:
- Series also supports the head() method, returning the first n elements.
- A Series with a custom index also works normally with head(), preserving the original index values.
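A common way to end up with a Series is to select a single column from a DataFrame. The sketch below assumes a small table with made-up student IDs as index labels and shows that head() on the column keeps those labels and the column name.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie", "David"], "score": [85, 92, 78, 90]},
    index=["s1", "s2", "s3", "s4"],  # hypothetical student IDs used as labels
)

# Selecting one column returns a Series; head() then returns its first elements
top_scores = df["score"].head(2)
print(top_scores)
print(top_scores.name)
```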
Example 3: Combine with Other Functions
head() is frequently combined with other DataFrame functions for data exploration and analysis.
Example
import pandas as pd
import numpy as np
# Create a larger DataFrame
np.random.seed(42) # Set random seed for reproducibility
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=100),
    'value': np.random.randn(100).round(2),
    'category': np.random.choice(['A','B','C'], 100)
})
# View data type information
print("Data type information:")
print(df.dtypes)
print()
# View first 10 rows
print("First 10 rows:")
print(df.head(10))
# Sort first, then view first few rows
df_sorted = df.sort_values('value', ascending=False)
print("First 5 rows after sorting by 'value' in descending order:")
print(df_sorted.head())
# View statistical summary of the first few rows
print("Statistical summary of the first 5 rows:")
print(df.head().describe())
Output (the layout is what matters here; treat the specific values and categories as illustrative):
Data type information:
date datetime64[ns]
value float64
category object
dtype: object
First 10 rows:
date value category
0 2024-01-01 0.34 A
1 2024-01-02 -0.23 B
2 2024-01-03 0.54 C
3 2024-01-04 -1.58 B
4 2024-01-05 -0.29 C
5 2024-01-06 0.65 A
6 2024-01-07 0.86 A
7 2024-01-08 -0.37 C
8 2024-01-09 0.11 B
9 2024-01-10 0.22 A
First 5 rows after sorting by 'value' in descending order:
date value category
72 2024-03-13 2.87 C
55 2024-02-25 2.32 A
31 2024-02-01 2.21 B
64 2024-03-05 1.71 C
84 2024-03-25 1.55 A
Statistical summary of the first 5 rows:
(describe() prints count, mean, std, min, 25%, 50%, 75%, and max for the numeric value column of those five rows. pandas 2.0 and later also summarize the date column, and the non-numeric category column is omitted.)
Code Explanation:
- head() can be combined with sort_values() to sort first, then view the top rows.
- The result returned by head() is still a DataFrame, so other DataFrame methods can be called on it.
- describe() can compute statistical summaries for the data returned by head().
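Because head() returns a DataFrame, it fits anywhere in a method chain. The sketch below uses a made-up product table to filter, sort, and then take the top rows, and finally calls another method on the result.

```python
import pandas as pd

# Hypothetical product data for illustration
df = pd.DataFrame({
    "product": ["pen", "book", "lamp", "desk", "mug"],
    "price": [1.5, 12.0, 30.0, 120.0, 8.0],
    "in_stock": [True, False, True, True, True],
})

# Filter, sort by price (highest first), then keep the top 2 rows
top_two = (
    df[df["in_stock"]]
    .sort_values("price", ascending=False)
    .head(2)
)
print(top_two)

# The result is a regular DataFrame, so further methods can be applied to it
mean_price = top_two["price"].mean()
print(mean_price)
```

Note that the rows keep their original index labels after sorting, so the index of the result reflects where each row sat in the source table.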
Notes
- head() does not modify the original DataFrame or Series; it returns a new object.
- When n is 0, an empty DataFrame or Series is returned. When n is negative, all rows except the last |n| rows are returned.
- For large datasets, using head() first to inspect structure is a good practice.
- The returned data preserves the original index values and does not renumber them.
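These notes can be verified with a short sketch. The four-row DataFrame and its letter index are assumptions chosen to make the behavior easy to see.

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

first_two = df.head(2)      # first 2 rows, original labels kept
empty = df.head(0)          # n = 0 gives an empty DataFrame
all_but_last = df.head(-1)  # negative n drops the last |n| rows

print(list(first_two.index))
print(len(empty))
print(list(all_but_last.index))

# The original DataFrame is left untouched
print(len(df))
```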
Tip: In Jupyter Notebook or JupyterLab, evaluating a DataFrame variable as the last expression in a cell renders it as a table. Large DataFrames are shown truncated (the first and last few rows), so calling head() is still the direct way to see only the beginning of the data.
Summary
head() is one of the most fundamental and practical functions in Pandas for inspecting data. It allows quick preview of the beginning portion of a dataset, helping us understand key information such as data structure, column names, and data types.
In actual data analysis workflows, we typically follow this "four-step data inspection routine": first use head() to view the basic structure, then use tail() to inspect the end of the data, and finally use info() and describe() to understand the overall data characteristics. This routine is a basic skill that every data analyst should master.
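One way to package the routine is a small helper like the one sketched below. The function name quick_inspect and the sample data are hypothetical; the pandas calls it uses are head(), tail(), info(), and describe().

```python
import io

import pandas as pd


def quick_inspect(df, n=5):
    """Hypothetical helper that runs the four inspection steps on a DataFrame."""
    buffer = io.StringIO()
    df.info(buf=buffer)  # info() writes text to a buffer instead of returning it
    return {
        "head": df.head(n),         # beginning of the data
        "tail": df.tail(n),         # end of the data
        "info": buffer.getvalue(),  # column names, dtypes, non-null counts
        "describe": df.describe(),  # summary statistics for numeric columns
    }


# Made-up sample data
df = pd.DataFrame({"a": range(10), "b": [x * 0.5 for x in range(10)]})

report = quick_inspect(df, n=3)
for step in ("head", "tail", "info", "describe"):
    print(f"--- {step} ---")
    print(report[step])
```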