Pandas df.head() Function


head() is one of the most commonly used methods of the Pandas DataFrame and Series, used to quickly view the beginning of a dataset. It returns the first n rows, so we can check the basic structure and content of the data without printing the entire dataset. Note that head() works on data that is already loaded into memory; it limits what is displayed, not what is read.

This function is used very frequently in data analysis tasks, especially when handling large datasets. Typically, we first use head() to inspect the first few rows of data to confirm whether the data structure meets expectations before proceeding with further analysis and processing.


Basic Syntax and Parameters

head() is a method of DataFrame and Series, invoked with the dot operator (for example, df.head()). It has no mandatory parameters, but accepts one optional parameter that specifies the number of rows to return.

Syntax

DataFrame.head(n=5)
Series.head(n=5)

Parameter Description

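Parameter | Type | Required | Description | Default
n | int | Optional | Number of rows to return from the start. If n exceeds the total number of rows, all rows are returned. If n is negative, all rows except the last abs(n) rows are returned. | 5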

Return Value Description

  • Return Type: If the caller is a DataFrame, a DataFrame is returned; if the caller is a Series, a Series is returned.
  • Number of Rows Returned: For a non-negative n, at most n rows are returned. If the dataset contains fewer than n rows, all rows are returned.
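
The parameter rules above can be checked with a short sketch. The single-column DataFrame and the variable names are made up for illustration; only head() itself comes from Pandas.

```python
import pandas as pd

# Hypothetical 10-row frame used only to count the rows head() returns
df = pd.DataFrame({"x": range(10)})

default_rows = df.head()       # no argument: first 5 rows
all_rows = df.head(20)         # n larger than len(df): all 10 rows, no error
all_but_last_3 = df.head(-3)   # negative n: everything except the last 3 rows
no_rows = df.head(0)           # n = 0: empty frame that keeps the columns

print(len(default_rows), len(all_rows), len(all_but_last_3), len(no_rows))
# prints: 5 10 7 0
```

The same calls behave identically on a Series.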

Examples

The following examples walk through the common uses of head().

Example 1: Basic Usage - View the First Few Rows of a DataFrame

First, create a simple DataFrame, then use head() to view the first few rows.

Example

import pandas as pd

# Create a sample DataFrame containing student grade data

data = {
    'name': ['Alice','Bob','Charlie','David','Eve','Frank','Grace','Henry','Iris','Jack'],
    'age': [18,19,17,18,20,19,18,17,19,18],
    'score': [85,92,78,90,88,95,82,76,89,91],
    'grade': ['A','A','B','A','B','A','B','C','B','A']
}

df = pd.DataFrame(data)

# Default: return first 5 rows
print("First 5 rows by default:")
print(df.head())

# Specify returning first 3 rows
print("First 3 rows:")
print(df.head(3))

# Return first 7 rows
print("First 7 rows:")
print(df.head(7))

Output:

First 5 rows by default:
      name  age  score grade
0    Alice   18     85     A
1      Bob   19     92     A
2  Charlie   17     78     B
3    David   18     90     A
4      Eve   20     88     B
First 3 rows:
      name  age  score grade
0    Alice   18     85     A
1      Bob   19     92     A
2  Charlie   17     78     B
First 7 rows:
      name  age  score grade
0    Alice   18     85     A
1      Bob   19     92     A
2  Charlie   17     78     B
3    David   18     90     A
4      Eve   20     88     B
5    Frank   19     95     A
6    Grace   18     82     B

Code Explanation:

  1. A DataFrame with 10 rows of student data was created.
  2. When df.head() is called without arguments, it returns the first 5 rows by default. This is the most common usage.
  3. df.head(3) returns the first 3 rows; the number of rows to view can be adjusted as needed.
  4. df.head(7) returns the first 7 of the 10 rows. If the specified n exceeded the total number of rows (for example, df.head(20)), all 10 rows would be returned without raising an error.

Example 2: View the First Few Elements of a Series

head() is not only applicable to DataFrames but also to Series objects.

Example

import pandas as pd

# Create a Series containing a sequence of numeric values
s = pd.Series([10,20,30,40,50,60,70,80,90,100])

# View first 3 elements of the Series
print("First 3 elements:")
print(s.head(3))

# Create a Series with custom index
s2 = pd.Series([100,200,300,400,500], index=['a','b','c','d','e'])

print("First 2 elements of a Series with custom index:")
print(s2.head(2))

Output:

First 3 elements:
0    10
1    20
2    30
dtype: int64
First 2 elements of a Series with custom index:
a    100
b    200
dtype: int64

Code Explanation:

  • Series also supports the head() method, returning the first n elements.
  • A Series with a custom index also works normally with head(), preserving the original index values.

Example 3: Combine with Other Functions

head() is frequently combined with other DataFrame functions for data exploration and analysis.

Example

import pandas as pd
import numpy as np

# Create a larger DataFrame
np.random.seed(42)  # Set random seed for reproducibility

df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=100),
    'value': np.random.randn(100).round(2),
    'category': np.random.choice(['A','B','C'], 100)
})

# View data type information
print("Data type information:")
print(df.dtypes)
print()

# View first 10 rows
print("First 10 rows:")
print(df.head(10))

# Sort first, then view first few rows
df_sorted = df.sort_values('value', ascending=False)
print("First 5 rows after sorting by 'value' in descending order:")
print(df_sorted.head())

# View statistical summary of the first few rows
print("Statistical summary of the first 5 rows:")
print(df.head().describe())

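Output (dtypes as printed by pandas 2.x; the numbers in the value and category columns are illustrative and may differ from what your run prints):

Data type information:
date        datetime64[ns]
value              float64
category            object
dtype: object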
First 10 rows:
        date  value category
0 2024-01-01   0.34        A
1 2024-01-02  -0.23        B
2 2024-01-03   0.54        C
3 2024-01-04  -1.58        B
4 2024-01-05  -0.29        C
5 2024-01-06   0.65        A
6 2024-01-07   0.86        A
7 2024-01-08  -0.37        C
8 2024-01-09   0.11        B
9 2024-01-10   0.22        A
First 5 rows after sorting by 'value' in descending order:
        date  value category
72 2024-03-13   2.87        C
55 2024-02-25   2.32        A
31 2024-02-01   2.21        B
64 2024-03-05   1.71        C
84 2024-03-25   1.55        A
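Statistical summary of the first 5 rows:
          value
count  5.000000
mean  -0.244000
std    0.828088
min   -1.580000
25%   -0.290000
50%   -0.230000
75%    0.340000
max    0.540000

Note: this summary is computed from the five value entries listed under "First 10 rows". The category column is not numeric, so describe() leaves it out by default. Recent pandas versions (2.x) also add a date column to the table.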

Code Explanation:

  1. head() can be combined with sort_values() to sort first, then view the top rows.
  2. The result returned by head() is still a DataFrame, so other DataFrame methods can be called on it.
  3. describe() can compute statistical summaries for the data returned by head().

Notes

  • head() does not modify the original DataFrame or Series; it returns a new object.
  • When n is 0, an empty DataFrame or Series is returned. When n is negative, all rows except the last abs(n) rows are returned.
  • For large datasets, using head() first to inspect structure is a good practice.
  • The returned data preserves the original index values and does not renumber them.
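
The notes on index preservation and on the original object staying unchanged can be illustrated with a small sketch. The labels and scores below are invented for the example, and the comparison with iloc is an equivalence check rather than a statement about how Pandas implements head() internally.

```python
import pandas as pd

# Hypothetical frame with a non-default index
df = pd.DataFrame({"score": [85, 92, 78, 90]}, index=["d", "b", "a", "c"])

# head() after sorting keeps the original labels instead of renumbering
top = df.sort_values("score", ascending=False).head(2)
top_labels = top.index.tolist()
print(top_labels)  # ['b', 'c']

# For n >= 0, head(n) selects the same rows as positional slicing
same_as_iloc = df.head(3).equals(df.iloc[:3])

# The original frame still has all of its rows
original_len = len(df)

# If fresh 0..n-1 numbering is wanted, reset the index explicitly
renumbered_labels = top.reset_index(drop=True).index.tolist()
```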

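Tip: In Jupyter Notebook or JupyterLab, making df.head() the last expression in a cell renders the result as a formatted table without calling print(). Entering only the DataFrame variable name displays the whole frame, which Pandas truncates to the first and last rows once the frame exceeds the display.max_rows option.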


Summary

head() is one of the most fundamental and practical functions in Pandas for inspecting data. It allows quick preview of the beginning portion of a dataset, helping us understand key information such as data structure, column names, and data types.

In actual data analysis workflows, we typically follow a "four-step data inspection routine": first use head() to view the basic structure, then use tail() to inspect the end of the data, and finally use info() and describe() to understand the overall data characteristics. This routine is a basic skill that every data analyst should master.
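
A minimal sketch of that four-step routine follows, assuming a small made-up DataFrame; in practice df would come from a file or database.

```python
import pandas as pd

# Hypothetical data standing in for a freshly loaded dataset
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
    "score": [85, 92, 78, 90, 88, 95],
})

first_rows = df.head(3)   # step 1: beginning of the data
last_rows = df.tail(3)    # step 2: end of the data
print(first_rows)
print(last_rows)

df.info()                 # step 3: column dtypes and non-null counts (prints directly)

summary = df.describe()   # step 4: summary statistics for numeric columns
print(summary)
```

info() writes its report to standard output and returns None, which is why it is called without print().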
