
Pandas Series.dt.year Property


Series.dt.year is a Pandas property that extracts the year from datetime values. It is part of the dt accessor and lets you quickly pull the year out of a Series with a datetime dtype.

In time series analysis, it is often necessary to group, filter, or aggregate data by year. The dt.year property makes such operations simple and efficient.

Terminology: year is the calendar-year component of a date, for example 2023 for 2023-05-20.

Basic Syntax and Parameters


Series.dt.year is a property of the dt accessor in Series, used to extract the year.


Syntax Format

    Series.dt.year

Parameter Description


This property does not require any parameters; it directly accesses the year information from a datetime Series.


Return Value Description

  • Return Value: Returns an integer Series containing the years.
  • Effect: Extracts the year part from a datetime64 type Series and returns a 4-digit integer.
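As a quick check of the return value described above, the following minimal sketch extracts the years from a small, made-up Series and inspects the result. The sample dates are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
s = pd.to_datetime(pd.Series(["1999-12-31", "2000-01-01", "2024-02-29"]))

years = s.dt.year

print(years.tolist())               # [1999, 2000, 2024]
print(years.dtype)                  # an integer dtype; the exact width depends on the pandas version
print(years.index.equals(s.index))  # True: the result keeps the original index
```

Because the result keeps the original index, it can be assigned straight back to a DataFrame column or used as a boolean-mask source without realignment.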

Examples


Let's go through a series of examples from simple to complex to fully master the usage of Series.dt.year.


Example 1: Basic Usage - Extracting Years


Example

    import pandas as pd

    # 1. Create a Series of date strings
    print("=== Creating datetime Series ===")
    dates = pd.Series([
        '2023-01-15',
        '2023-05-20',
        '2022-11-30',
        '2021-07-10',
        '2024-03-25'
    ])

    # Convert to datetime type
    datetime_series = pd.to_datetime(dates)
    print("Original dates:")
    print(datetime_series)

    # 2. Use dt.year to extract years
    print("\n=== Using dt.year to extract years ===")
    years = datetime_series.dt.year
    print("Years:")
    print(years)
    print(f"Type: {years.dtype}")

    # 3. Convert and extract in one chained expression
    print("\n=== Direct chaining ===")
    years_direct = pd.to_datetime(dates).dt.year
    print(years_direct)

Output:

    === Creating datetime Series ===
    Original dates:
    0   2023-01-15
    1   2023-05-20
    2   2022-11-30
    3   2021-07-10
    4   2024-03-25
    dtype: datetime64[ns]

    === Using dt.year to extract years ===
    Years:
    0    2023
    1    2023
    2    2022
    3    2021
    4    2024
    dtype: int32
    Type: int32

    === Direct chaining ===
    0    2023
    1    2023
    2    2022
    3    2021
    4    2024
    dtype: int32

The dtype labels depend on the pandas version: releases before 2.0 report int64 for the years, and the datetime unit shown in brackets can also vary.

Code Explanation:

  1. First, convert the Series to datetime64 type before using the dt accessor.
  2. dt.year returns an integer-type Series where each row corresponds to the year of the original date.
  3. Chaining can be used for one-step conversion and extraction.

Example 2: Filtering Data by Year


Example

    import pandas as pd

    # Create sales data: ten orders, one per month, starting January 2022
    print("=== Sales Data Example ===")
    df = pd.DataFrame({
        'order_id': [f'ORD-{i:04d}' for i in range(1, 11)],
        'order_date': pd.date_range('2022-01-01', periods=10, freq='MS'),
        'sales': [1200, 1500, 1800, 2100, 1900, 2300, 2500, 2800, 3100, 3500]
    })
    print(df)

    # Extract year into its own column
    print("\n=== Adding year column ===")
    df['year'] = df['order_date'].dt.year
    print(df)

    # Filter by a single year
    print("\n=== Filtering orders from 2023 ===")
    orders_2023 = df[df['year'] == 2023]
    print(orders_2023)

    # Group by year for statistics
    print("\n=== Summarizing sales by year ===")
    yearly_sales = df.groupby('year')['sales'].sum()
    print(yearly_sales)

    # Filter by several years at once
    print("\n=== Filtering orders from 2022 and 2024 ===")
    selected_years = df[df['year'].isin([2022, 2024])]
    print(selected_years)

Output:

    === Sales Data Example ===
       order_id  order_date  sales
    0  ORD-0001  2022-01-01   1200
    1  ORD-0002  2022-02-01   1500
    2  ORD-0003  2022-03-01   1800
    3  ORD-0004  2022-04-01   2100
    4  ORD-0005  2022-05-01   1900
    5  ORD-0006  2022-06-01   2300
    6  ORD-0007  2022-07-01   2500
    7  ORD-0008  2022-08-01   2800
    8  ORD-0009  2022-09-01   3100
    9  ORD-0010  2022-10-01   3500

    === Adding year column ===
       order_id  order_date  sales  year
    0  ORD-0001  2022-01-01   1200  2022
    1  ORD-0002  2022-02-01   1500  2022
    2  ORD-0003  2022-03-01   1800  2022
    3  ORD-0004  2022-04-01   2100  2022
    4  ORD-0005  2022-05-01   1900  2022
    5  ORD-0006  2022-06-01   2300  2022
    6  ORD-0007  2022-07-01   2500  2022
    7  ORD-0008  2022-08-01   2800  2022
    8  ORD-0009  2022-09-01   3100  2022
    9  ORD-0010  2022-10-01   3500  2022

    === Filtering orders from 2023 ===
    Empty DataFrame
    Columns: [order_id, order_date, sales, year]
    Index: []

    === Summarizing sales by year ===
    year
    2022    22700
    Name: sales, dtype: int64

    === Filtering orders from 2022 and 2024 ===
       order_id  order_date  sales  year
    0  ORD-0001  2022-01-01   1200  2022
    1  ORD-0002  2022-02-01   1500  2022
    2  ORD-0003  2022-03-01   1800  2022
    3  ORD-0004  2022-04-01   2100  2022
    4  ORD-0005  2022-05-01   1900  2022
    5  ORD-0006  2022-06-01   2300  2022
    6  ORD-0007  2022-07-01   2500  2022
    7  ORD-0008  2022-08-01   2800  2022
    8  ORD-0009  2022-09-01   3100  2022
    9  ORD-0010  2022-10-01   3500  2022

Code Explanation:

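  • The extracted year can be used like a regular numeric column for filtering and grouping.
  • isin() can filter multiple years.
  • Comparison operators (>, <, >=, <=) are supported for range filtering.
  • All ten orders in this dataset fall in 2022, so the 2023 filter returns an empty DataFrame and the 2022/2024 filter returns every row.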
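The range filtering mentioned in the last points is not exercised by the example itself, so here is a minimal sketch of it. The orders DataFrame below is hypothetical and separate from the sales data above; it spans several years so that the filter has something to exclude.

```python
import pandas as pd

# Hypothetical order data spanning several years (not the DataFrame from the example above).
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2021-06-15", "2022-01-10", "2022-11-03", "2023-04-22", "2024-08-01"]
    ),
    "sales": [900, 1200, 1500, 1800, 2100],
})
orders["year"] = orders["order_date"].dt.year

# Comparison operators build boolean masks; combine them with & and parentheses.
from_2022_to_2023 = orders[(orders["year"] >= 2022) & (orders["year"] <= 2023)]

# Series.between is inclusive on both ends by default and expresses the same range.
same_rows = orders[orders["year"].between(2022, 2023)]

print(from_2022_to_2023["sales"].tolist())   # [1200, 1500, 1800]
print(from_2022_to_2023.equals(same_rows))   # True
```

Both forms select the same rows; between() tends to read more naturally for closed ranges, while explicit comparisons are easier to adapt to open-ended filters such as "2023 and later".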

Example 3: Year-related Analysis


Example

    import pandas as pd
    import numpy as np

    # Create a more complex dataset
    print("=== Creating a multi-year dataset ===")
    np.random.seed(42)

    # Generate 5 years of daily data
    dates = pd.date_range('2020-01-01', '2024-12-31', freq='D')
    df = pd.DataFrame({
        'date': dates,
        'temperature': np.random.uniform(10, 35, len(dates)),
        'sales': np.random.randint(100, 500, len(dates))
    })

    # Extract year
    df['year'] = df['date'].dt.year
    print(f"Dataset size: {len(df)} records")
    print(f"Year range: {df['year'].min()} - {df['year'].max()}")

    # Statistics by year
    print("\n=== Statistics by year ===")
    yearly_stats = df.groupby('year').agg({
        'temperature': ['mean', 'min', 'max'],
        'sales': ['sum', 'mean', 'count']
    }).round(2)

    # Flatten the two-level column index, e.g. ('sales', 'sum') -> 'sales_sum'
    yearly_stats.columns = ['_'.join(col).strip() for col in yearly_stats.columns.values]
    print(yearly_stats)

    # Monthly counts per year
    print("\n=== Monthly record counts per year ===")
    df['month'] = df['date'].dt.month
    monthly_counts = df.groupby(['year', 'month']).size().unstack(fill_value=0)
    print(monthly_counts)

Output:

    === Creating a multi-year dataset ===
    Dataset size: 1827 records
    Year range: 2020 - 2024

    === Statistics by year ===
    (One row per year from 2020 to 2024, with the columns temperature_mean,
    temperature_min, temperature_max, sales_sum, sales_mean and sales_count.
    The temperature and sales figures come from random draws and are not
    reproduced here. The sales_count column is deterministic:
    366, 365, 365, 365, 366.)

    === Monthly record counts per year ===
    month    1   2   3   4   5   6   7   8   9  10  11  12
    year
    2020    31  29  31  30  31  30  31  31  30  31  30  31
    2021    31  28  31  30  31  30  31  31  30  31  30  31
    2022    31  28  31  30  31  30  31  31  30  31  30  31
    2023    31  28  31  30  31  30  31  31  30  31  30  31
    2024    31  29  31  30  31  30  31  31  30  31  30  31

Code Explanation:

  • Use groupby().agg() to compute several statistics per year in one step.
  • groupby(['year', 'month']).size().unstack() creates a cross-tabulation table of years and months.
  • 2020 and 2024 each have 366 records because they are leap years; the other years have 365, for 1827 records in total.
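When the year is needed only as a grouping key, storing it as a column is optional. The sketch below, which uses a small hypothetical DataFrame with fixed values so the result is reproducible, passes the dt.year Series directly to groupby().

```python
import pandas as pd

# Hypothetical daily data that straddles a year boundary; values are fixed, not random.
df = pd.DataFrame({
    "date": pd.date_range("2023-12-30", periods=4, freq="D"),
    "sales": [10, 20, 30, 40],
})

# Pass the year Series as the grouping key instead of adding a 'year' column.
yearly = df.groupby(df["date"].dt.year)["sales"].sum()

print(yearly.index.tolist())  # [2023, 2024]
print(yearly.tolist())        # [30, 70]
```

This keeps the DataFrame free of helper columns; the trade-off is that the resulting index is named after the source column (here "date") rather than "year".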

Notes


Important Notice:

  • Series.dt.year can only be used on a Series with a datetime-like dtype such as datetime64; calling it on strings or numbers raises an AttributeError.
  • If the Series is not of datetime type, convert it first using pd.to_datetime().
  • The extracted year is a 4-digit integer that can be used directly in numerical operations and comparisons.
  • When the Series contains missing values (NaT), dt.year returns NaN at the corresponding positions, and the result has a float64 dtype instead of an integer dtype.
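The notes above can be illustrated with a short sketch. The raw input below is hypothetical and deliberately contains one unparseable entry; converting the result to the nullable Int64 dtype is one possible way to keep whole-number years alongside missing values, not the only one.

```python
import pandas as pd

# Hypothetical raw input containing one unparseable entry.
raw = pd.Series(["2021-03-01", "not a date", "2023-09-15"])

# The .dt accessor is not available on a Series of plain strings.
try:
    raw.dt.year
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")
years = parsed.dt.year

print(years.isna().tolist())  # [False, True, False]
print(years.dtype)            # float64, because NaN cannot be stored in a plain integer Series

# The nullable Int64 dtype keeps whole-number years and marks the gap as <NA>.
years_nullable = years.astype("Int64")
print(years_nullable.dtype)   # Int64
```

If the missing rows are not needed, calling dropna() on the parsed Series before accessing dt.year avoids the float conversion altogether.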
