
Pandas Pd To Datetime

`pd.to_datetime()` is a function in the Pandas library used to **convert data into datetime type**. It converts data in various formats, such as strings and Unix timestamps, into Pandas' datetime64 type.

Handling dates and times is a common task in data analysis. After conversion to datetime type, it becomes convenient to extract date components, do time arithmetic, and handle timezones.

**Word Definition**: `to_datetime` means "convert to datetime": it turns various time data formats into standard datetime objects.

* * *

## Basic Syntax and Parameters

`pd.to_datetime()` is a top-level function, so it is called on the `pd` namespace rather than on a Series or DataFrame.

### Syntax Format

Only the commonly used parameters are shown:

```python
pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, unit=None, origin='unix')
```

### Parameter Description

| Parameter | Type | Description |
| --- | --- | --- |
| arg | integer, float, string, datetime, list, Series | Data to be converted to datetime type. |
| errors | string | Error handling method: 'raise' (default) throws an exception; 'coerce' converts invalid values to NaT; 'ignore' returns the original input (deprecated since pandas 2.2). |
| dayfirst | boolean | If True, the day comes before the month (e.g., 01/02/2023 represents 1 February 2023). |
| yearfirst | boolean | If True, the year comes first (e.g., 10/11/12 represents 2010-11-12). |
| utc | boolean | If True, returns timezone-aware datetimes in UTC. |
| format | string | Specifies the date-time format, such as '%Y-%m-%d'. Since pandas 2.0, 'mixed' parses each element individually. |
| unit | string | When arg is numeric, specifies the unit: 'D', 's', 'ms', 'us', 'ns' (defaults to 'ns'). |
| origin | scalar | Reference date that numeric values are counted from; defaults to 'unix' (1970-01-01). |

### Return Value Description

* **Return Value**: Depends on the input: a scalar returns a `Timestamp`, a list or array returns a `DatetimeIndex`, and a Series returns a Series with datetime64 dtype.
* **Effect**: Converts input data into Pandas' datetime type for convenient date-time operations.

* * *

## Examples

The examples below progress from simple to complex. The outputs shown are from pandas 2.x; other versions may display a different dtype or exception name.

### Example 1: Basic Usage - Convert String to Datetime

```python
import pandas as pd

# 1. Create a Series of date strings
dates = pd.Series([
    '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'
])
print("=== Original Series (Type:", dates.dtype, ")===")
print(dates)

# 2. Use pd.to_datetime() to convert to datetime type
result = pd.to_datetime(dates)
print("\n=== After pd.to_datetime() (Type:", result.dtype, ")===")
print(result)

# 3. Mixed separators in one Series (pandas 2.0+ requires format='mixed')
dates_mixed = pd.Series(['2023-01-01', '2023/02/03', '01.04.2023'])
print("\n=== Multiple formats parsed with format='mixed' ===")
print(pd.to_datetime(dates_mixed, format='mixed'))

# 4. Include time information
dates_with_time = pd.Series([
    '2023-01-01 12:30:45', '2023-02-03 08:15:30'
])
result_datetime = pd.to_datetime(dates_with_time)
print("\n=== Include time information ===")
print(result_datetime)
```

**Output:**

```
=== Original Series (Type: object )===
0    2023-01-01
1    2023-01-02
2    2023-01-03
3    2023-01-04
4    2023-01-05
dtype: object

=== After pd.to_datetime() (Type: datetime64[ns] )===
0   2023-01-01
1   2023-01-02
2   2023-01-03
3   2023-01-04
4   2023-01-05
dtype: datetime64[ns]

=== Multiple formats parsed with format='mixed' ===
0   2023-01-01
1   2023-02-03
2   2023-01-04
dtype: datetime64[ns]

=== Include time information ===
0   2023-01-01 12:30:45
1   2023-02-03 08:15:30
dtype: datetime64[ns]
```

**Code Explanation:**

1. `pd.to_datetime()` recognizes many common date-time formats. Since pandas 2.0 it infers one format from the first element and applies it to the whole Series, so inputs that mix formats need `format='mixed'`.
2. If only the date part is provided, the time part defaults to 00:00:00. A Series whose times are all midnight is displayed without the time part.
3. Separators such as hyphens, slashes, and dots are all accepted, but `'01.04.2023'` is read month-first by default (4 January). Text with embedded labels, such as `'2023Year 05 Month 06 Day'`, is not recognized automatically and needs an explicit `format` (see Example 2).

### Example 2: Handling Different Date Formats and the dayfirst Parameter

For different regional date formats, you can use the `dayfirst` parameter or the `format` parameter.

```python
import pandas as pd

# 1. European-style dates (day/month/year)
europe_dates = pd.Series(['01/02/2023', '02/03/2023', '03/04/2023'])
print("=== European-style dates (01/02/2023 means 1 February) ===")
print("Default (month/day/year):", pd.to_datetime(europe_dates).dt.day.tolist())
print("dayfirst=True:", pd.to_datetime(europe_dates, dayfirst=True).dt.day.tolist())

# 2. Use format parameter to specify exact format
print("\n=== Using format parameter ===")
result = pd.to_datetime('2023-06-15 14:30:00', format='%Y-%m-%d %H:%M:%S')
print(f"Conversion result: {result}")

# 3. Complex format parsing: give each layout its own format string
complex_dates = {
    '2023Year 12 Month 25 Day': '%YYear %m Month %d Day',
    '15/08/2023': '%d/%m/%Y',
    '2023-05-01': '%Y-%m-%d',
}
print("\n=== Mixed formats with explicit format strings ===")
for text, fmt in complex_dates.items():
    print(pd.to_datetime(text, format=fmt))
```

**Output:**

```
=== European-style dates (01/02/2023 means 1 February) ===
Default (month/day/year): [2, 3, 4]
dayfirst=True: [1, 2, 3]

=== Using format parameter ===
Conversion result: 2023-06-15 14:30:00

=== Mixed formats with explicit format strings ===
2023-12-25 00:00:00
2023-08-15 00:00:00
2023-05-01 00:00:00
```

**Code Explanation:**

* By default `01/02/2023` is read as month/day (2 January); with `dayfirst=True` it is read as day/month (1 February).
* The `format` parameter explicitly tells Pandas how to parse dates, avoiding ambiguity. Literal text in the format string, such as `Year` or `Month`, must match the input exactly.
* Auto-parsing is powerful, but when there is ambiguity (like interpreting 01/02/2023), use either the `dayfirst` or the `format` parameter to clarify.

### Example 3: Converting Unix Timestamps to Datetime

Unix timestamps exported from databases or systems can be converted using the `unit` parameter.

```python
import pandas as pd

# 1. Unix timestamps (seconds)
timestamps = pd.Series([1672531200, 1672617600, 1672704000])
print("=== Unix Timestamps (seconds) ===")
print(timestamps.values)

# 2. Convert to datetime
result = pd.to_datetime(timestamps, unit='s')
print("\n=== After pd.to_datetime(..., unit='s') ===")
print(result)

# 3. Millisecond Unix timestamps
timestamps_ms = pd.Series([1672531200000, 1672617600000, 1672704000000])
result_ms = pd.to_datetime(timestamps_ms, unit='ms')
print("\n=== Millisecond timestamps ===")
print(result_ms)

# 4. Convert from a fixed origin
print("\n=== Calculated from 2023-01-01 ===")
days = pd.Series([0, 1, 2, 3, 4])
result_origin = pd.to_datetime(days, origin='2023-01-01', unit='D')
print(result_origin)
```

**Output:**

```
=== Unix Timestamps (seconds) ===
[1672531200 1672617600 1672704000]

=== After pd.to_datetime(..., unit='s') ===
0   2023-01-01
1   2023-01-02
2   2023-01-03
dtype: datetime64[ns]

=== Millisecond timestamps ===
0   2023-01-01
1   2023-01-02
2   2023-01-03
dtype: datetime64[ns]

=== Calculated from 2023-01-01 ===
0   2023-01-01
1   2023-01-02
2   2023-01-03
3   2023-01-04
4   2023-01-05
dtype: datetime64[ns]
```

**Code Explanation:**

* `unit='s'` indicates that the numeric values are in seconds (Unix timestamp).
* `unit='ms'` indicates millisecond timestamps.
* The `origin` parameter sets the starting point from which the numeric values are counted; here each value is a number of days after 2023-01-01.

### Example 4: errors Parameter Handling Invalid Dates

```python
import pandas as pd

# 1. Series containing invalid dates
mixed_dates = pd.Series(['2023-01-01', '2023-02-30', 'invalid', '2023-03-15'])
print("=== Series with invalid dates ===")
print(mixed_dates)

# 2. errors='raise' (default) - throw exception
print("\n=== errors='raise' ===")
try:
    pd.to_datetime(mixed_dates, errors='raise')
except ValueError as e:
    print(f"Exception: {type(e).__name__}")

# 3. errors='coerce' - convert invalid dates to NaT
print("\n=== errors='coerce' ===")
result = pd.to_datetime(mixed_dates, errors='coerce')
print(result)

# 4. Keep the original data on failure
#    (replaces errors='ignore', which is deprecated since pandas 2.2)
print("\n=== Keep original data on failure ===")
try:
    result_keep = pd.to_datetime(mixed_dates)
except ValueError:
    result_keep = mixed_dates
print(result_keep)
print(f"Type: {result_keep.dtype}")
```

**Output:**

```
=== Series with invalid dates ===
0    2023-01-01
1    2023-02-30
2       invalid
3    2023-03-15
dtype: object

=== errors='raise' ===
Exception: ValueError

=== errors='coerce' ===
0   2023-01-01
1          NaT
2          NaT
3   2023-03-15
dtype: datetime64[ns]

=== Keep original data on failure ===
0    2023-01-01
1    2023-02-30
2       invalid
3    2023-03-15
dtype: object
Type: object
```

**Code Explanation:**

* With `errors='raise'`, the impossible date `2023-02-30` stops the conversion with a `ValueError`. Other pandas versions may report a subclass of `ValueError` instead.
* `errors='coerce'` converts invalid dates and unparseable values to NaT (Not a Time, the datetime equivalent of a missing value).
* This is very useful when dealing with messy data, because one bad value no longer aborts the whole conversion.
* `errors='ignore'` returned the input unchanged when parsing failed. It is deprecated since pandas 2.2, so the example catches the exception explicitly instead.

* * *

## Notes

> **Important Note:**
>
> * With the default nanosecond resolution, representable dates run from roughly 1677-09-21 to 2262-04-11. Values outside this range raise an `OutOfBoundsDatetime` exception under `errors='raise'`.
> * The `format` parameter states the date format explicitly, so Pandas does not have to infer it. It is recommended when processing large amounts of data to improve performance.
> * A converted Series can use the `.dt` accessor for rich date-time operations.
> * `utc=True` needs no extra installation. pandas 1.x and 2.x install `pytz` automatically as a dependency, so timezone handling does not require a separate install.
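The notes above can be tied together in one short sketch. It is an illustration rather than a recommended recipe: the sample strings, the variable names (`raw`, `fmt`, `parts`, `local`), and the target zone `'Asia/Shanghai'` are assumptions made up for this example. It parses with an explicit `format`, coerces a malformed entry to NaT, reads components through the `.dt` accessor, and produces timezone-aware values with `utc=True`.

```python
import pandas as pd

# Hypothetical log timestamps, made up for illustration; the last entry is malformed.
raw = pd.Series(['2023-01-15 08:30:00', '2023-06-30 17:45:00', 'not a date'])
fmt = '%Y-%m-%d %H:%M:%S'

# An explicit format skips format inference; errors='coerce' maps the bad entry to NaT.
parsed = pd.to_datetime(raw, format=fmt, errors='coerce')

# The .dt accessor exposes components of a datetime Series (NaT yields missing values).
parts = pd.DataFrame({
    'year': parsed.dt.year,
    'month': parsed.dt.month,
    'weekday': parsed.dt.day_name(),
})
print(parts)

# utc=True returns timezone-aware values; naive inputs are interpreted as UTC.
parsed_utc = pd.to_datetime(raw, format=fmt, errors='coerce', utc=True)

# Convert to a named zone for display; 'Asia/Shanghai' is an arbitrary choice.
local = parsed_utc.dt.tz_convert('Asia/Shanghai')
print(local)
```

Because the row with NaT is kept rather than dropped, the result stays aligned with the original Series, which makes it easy to locate the bad inputs afterwards with `parsed.isna()`.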