Series.dt.year is a pandas property that extracts the year from each datetime value in a Series. It is part of the dt accessor, which exposes the datetime components of a Series with a datetime dtype.
In time series analysis, you often need to group, filter, or aggregate data by year. The dt.year property makes these operations simple and efficient.
Definition: year is the calendar-year component of a date, for example 2023 for 2023-01-15.

Basic Syntax and Parameters
Series.dt.year is a property of the Series dt accessor that extracts the year.
Syntax Format
    Series.dt.year
Parameter Description
This property takes no parameters; it is read directly from a datetime Series, without parentheses.
Return Value Description
- Return Value: An integer Series containing the years, with the same index as the original Series.
- Effect: Extracts the year component from each value of a datetime64 Series and returns it as an integer such as 2023.

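A minimal sketch of the property access described above. The sample dates are made up for illustration. Because year is a property rather than a method, it is read without parentheses, and calling it fails.

```python
import pandas as pd

# Hypothetical sample dates, parsed into a datetime64 Series
s = pd.Series(pd.to_datetime(["2021-07-10", "2023-01-15"]))

# year is a property: no parentheses, no arguments
years = s.dt.year
print(years.tolist())  # [2021, 2023]

# Calling it like a method fails, because s.dt.year is already a Series
call_failed = False
try:
    s.dt.year()
except TypeError as exc:
    call_failed = True
    print(f"TypeError: {exc}")
```

The result keeps the index of the original Series, so it can be assigned straight back as a new column.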
Examples
The following examples progress from simple to more complex uses of Series.dt.year.
Example 1: Basic Usage - Extracting Years
Example
    import pandas as pd

    # 1. Create a Series of date strings
    print("=== Creating datetime Series ===")
    dates = pd.Series([
        '2023-01-15',
        '2023-05-20',
        '2022-11-30',
        '2021-07-10',
        '2024-03-25'
    ])

    # Convert to datetime type
    datetime_series = pd.to_datetime(dates)
    print("Original dates:")
    print(datetime_series)

    # 2. Use dt.year to extract years
    print("\n=== Using dt.year to extract years ===")
    years = datetime_series.dt.year
    print("Years:")
    print(years)
    print(f"Type: {years.dtype}")

    # 3. Convert and extract in one chained expression
    print("\n=== Direct chaining ===")
    years_direct = pd.to_datetime(dates).dt.year
    print(years_direct)

Output:
    === Creating datetime Series ===
    Original dates:
    0   2023-01-15
    1   2023-05-20
    2   2022-11-30
    3   2021-07-10
    4   2024-03-25
    dtype: datetime64[ns]

    === Using dt.year to extract years ===
    Years:
    0    2023
    1    2023
    2    2022
    3    2021
    4    2024
    dtype: int32
    Type: int32

    === Direct chaining ===
    0    2023
    1    2023
    2    2022
    3    2021
    4    2024
    dtype: int32

Code Explanation:
- First convert the Series to a datetime64 dtype; only then is the dt accessor available.
- dt.year returns an integer Series in which each row holds the year of the corresponding date. The integer dtype is int32 on pandas 2.0 and later and int64 on earlier versions.
- Chaining performs the conversion and the extraction in one step.
Example 2: Filtering Data by Year
Example
    import pandas as pd

    # Create sales data
    print("=== Sales Data Example ===")
    df = pd.DataFrame({
        'order_id': [f'ORD-{i:04d}' for i in range(1, 11)],
        'order_date': pd.date_range('2022-01-01', periods=10, freq='MS'),
        'sales': [1200, 1500, 1800, 2100, 1900, 2300, 2500, 2800, 3100, 3500]
    })
    print(df)

    # Extract year
    print("\n=== Adding year column ===")
    df['year'] = df['order_date'].dt.year
    print(df)

    # Filter by year
    print("\n=== Filtering orders from 2023 ===")
    orders_2023 = df[df['year'] == 2023]
    print(orders_2023)

    # Group by year for statistics
    print("\n=== Summarizing sales by year ===")
    yearly_sales = df.groupby('year')['sales'].sum()
    print(yearly_sales)

    # Multiple year filtering
    print("\n=== Filtering orders from 2022 and 2024 ===")
    selected_years = df[df['year'].isin([2022, 2024])]
    print(selected_years)

Output:
    === Sales Data Example ===
       order_id order_date  sales
    0  ORD-0001 2022-01-01   1200
    1  ORD-0002 2022-02-01   1500
    2  ORD-0003 2022-03-01   1800
    3  ORD-0004 2022-04-01   2100
    4  ORD-0005 2022-05-01   1900
    5  ORD-0006 2022-06-01   2300
    6  ORD-0007 2022-07-01   2500
    7  ORD-0008 2022-08-01   2800
    8  ORD-0009 2022-09-01   3100
    9  ORD-0010 2022-10-01   3500

    === Adding year column ===
       order_id order_date  sales  year
    0  ORD-0001 2022-01-01   1200  2022
    1  ORD-0002 2022-02-01   1500  2022
    2  ORD-0003 2022-03-01   1800  2022
    3  ORD-0004 2022-04-01   2100  2022
    4  ORD-0005 2022-05-01   1900  2022
    5  ORD-0006 2022-06-01   2300  2022
    6  ORD-0007 2022-07-01   2500  2022
    7  ORD-0008 2022-08-01   2800  2022
    8  ORD-0009 2022-09-01   3100  2022
    9  ORD-0010 2022-10-01   3500  2022

    === Filtering orders from 2023 ===
    Empty DataFrame
    Columns: [order_id, order_date, sales, year]
    Index: []

    === Summarizing sales by year ===
    year
    2022    22700
    Name: sales, dtype: int64

    === Filtering orders from 2022 and 2024 ===
       order_id order_date  sales  year
    0  ORD-0001 2022-01-01   1200  2022
    1  ORD-0002 2022-02-01   1500  2022
    2  ORD-0003 2022-03-01   1800  2022
    3  ORD-0004 2022-04-01   2100  2022
    4  ORD-0005 2022-05-01   1900  2022
    5  ORD-0006 2022-06-01   2300  2022
    6  ORD-0007 2022-07-01   2500  2022
    7  ORD-0008 2022-08-01   2800  2022
    8  ORD-0009 2022-09-01   3100  2022
    9  ORD-0010 2022-10-01   3500  2022

Code Explanation:
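- The extracted year can be used like a regular numeric column for filtering and grouping.
- isin() filters on several years at once.
- Comparison operators (>, >=, <, <=) support range filtering.
- Because the sample data covers only 2022, the 2023 filter returns an empty DataFrame and the 2022/2024 filter returns every row.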
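The range filtering mentioned in the last points can be sketched as follows. The small orders table is hypothetical and exists only to span several years; Series.between is a standard pandas method that includes both bounds by default.

```python
import pandas as pd

# Hypothetical orders spanning several calendar years
df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2021-03-01", "2022-06-15", "2022-12-31", "2023-02-10", "2024-01-05"]
    ),
    "sales": [100, 200, 300, 400, 500],
})
df["year"] = df["order_date"].dt.year

# Orders from 2022 onward
since_2022 = df[df["year"] >= 2022]

# Orders from 2022 through 2023, both bounds inclusive
in_range = df[(df["year"] >= 2022) & (df["year"] <= 2023)]

# Series.between expresses the same inclusive range more compactly
in_range_between = df[df["year"].between(2022, 2023)]

print(since_2022["sales"].tolist())  # [200, 300, 400, 500]
print(in_range["sales"].tolist())    # [200, 300, 400]
```

Each comparison needs its own parentheses when combined with &, because & binds more tightly than the comparison operators.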
Example 3: Year-related Analysis
Example
    import pandas as pd
    import numpy as np

    # Create a more complex dataset
    print("=== Creating a multi-year dataset ===")
    np.random.seed(42)

    # Generate 5 years of daily data
    dates = pd.date_range('2020-01-01', '2024-12-31', freq='D')
    df = pd.DataFrame({
        'date': dates,
        'temperature': np.random.uniform(10, 35, len(dates)),
        'sales': np.random.randint(100, 500, len(dates))
    })

    # Extract year
    df['year'] = df['date'].dt.year
    print(f"Dataset size: {len(df)} records")
    print(f"Year range: {df['year'].min()} - {df['year'].max()}")

    # Statistics by year
    print("\n=== Statistics by year ===")
    yearly_stats = df.groupby('year').agg({
        'temperature': ['mean', 'min', 'max'],
        'sales': ['sum', 'mean', 'count']
    }).round(2)

    # Flatten the two-level column index into names like temperature_mean
    yearly_stats.columns = ['_'.join(col).strip() for col in yearly_stats.columns.values]
    print(yearly_stats)

    # Monthly counts per year
    print("\n=== Monthly record counts per year ===")
    df['month'] = df['date'].dt.month
    monthly_counts = df.groupby(['year', 'month']).size().unstack(fill_value=0)
    print(monthly_counts)

Output:
    === Creating a multi-year dataset ===
    Dataset size: 1827 records
    Year range: 2020 - 2024

    === Statistics by year ===
    (One row per year from 2020 to 2024, with the columns temperature_mean, temperature_min, temperature_max, sales_sum, sales_mean, and sales_count. The temperature and sales figures depend on the random draws and are not reproduced here; sales_count is 366 for 2020 and 2024 and 365 for the other years.)

    === Monthly record counts per year ===
    month   1   2   3   4   5   6   7   8   9  10  11  12
    year
    2020   31  29  31  30  31  30  31  31  30  31  30  31
    2021   31  28  31  30  31  30  31  31  30  31  30  31
    2022   31  28  31  30  31  30  31  31  30  31  30  31
    2023   31  28  31  30  31  30  31  31  30  31  30  31
    2024   31  29  31  30  31  30  31  31  30  31  30  31

Code Explanation:
- Use groupby().agg() to compute several statistics per year in one pass.
- groupby(['year', 'month']).size().unstack() creates a cross-tabulation of years and months.
- Because the date range runs through 2024-12-31, every year is complete: the leap years 2020 and 2024 have 366 records each, and the other years have 365.
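As a variation on the grouping shown above, the year values can be passed to groupby directly, without storing a helper column. The sketch below uses a small made-up dataset so the results are easy to check by hand; pd.crosstab is offered as one alternative to size().unstack(), not as the approach used in the example.

```python
import pandas as pd

# Hypothetical records: two in 2023, three in 2024
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-01", "2023-06-01", "2024-01-01", "2024-02-01", "2024-03-01"]
    ),
    "sales": [10, 20, 30, 40, 50],
})

# Group on the extracted years directly; rename so the index is labeled 'year'
by_year = df.groupby(df["date"].dt.year.rename("year"))["sales"].sum()
print(by_year)

# Cross-tabulate record counts by year and month
counts = pd.crosstab(
    df["date"].dt.year,
    df["date"].dt.month,
    rownames=["year"],
    colnames=["month"],
)
print(counts)
```

Grouping on the expression keeps the DataFrame unchanged, which is convenient when the year is needed for a single aggregation only.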
Notes
Important Notice:
- Series.dt.year can only be used on a Series with a datetime-like dtype such as datetime64.
- If the Series is not of a datetime type, convert it first with pd.to_datetime().
- The extracted year is an integer that can be used directly in numerical operations and comparisons.
- When the Series contains missing values (NaT), dt.year returns NaN at those positions, and the result dtype becomes float64 instead of an integer dtype.
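The notes above can be sketched in a few lines. The input strings are made up; errors="coerce" is a standard pd.to_datetime option that turns unparseable entries into NaT. The final cast to the nullable Int64 dtype is one possible way to keep whole-number years next to missing values, shown here as a suggestion.

```python
import pandas as pd

# Hypothetical raw input: one valid date, one bad string, one missing value
raw = pd.Series(["2023-01-15", "not a date", None])

# The dt accessor is unavailable until the Series is converted
accessor_failed = False
try:
    raw.dt.year
except AttributeError as exc:
    accessor_failed = True
    print(f"AttributeError: {exc}")

# errors="coerce" converts unparseable entries to NaT instead of raising
converted = pd.to_datetime(raw, errors="coerce")

# With NaT present, dt.year yields floats, with NaN for the missing rows
years = converted.dt.year
print(years)

# Optional: the nullable Int64 dtype keeps integers alongside <NA>
years_nullable = years.astype("Int64")
print(years_nullable)
```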