Pandas Excel
Pandas provides rich Excel file operation functions, helping us conveniently read and write `.xls` and `.xlsx` files, supporting complex operations such as multiple sheets, indexing, and column selection, making it an essential tool in data analysis.
| **Operation** | **Method** | **Description** |
| --- | --- | --- |
| Read Excel file | `pd.read_excel()` | Read Excel file and return DataFrame |
| Write DataFrame to Excel | `DataFrame.to_excel()` | Write DataFrame to Excel file |
| Load Excel file | `pd.ExcelFile()` | Load Excel file and access multiple sheets |
| Write multiple sheets using ExcelWriter | `pd.ExcelWriter()` | Write multiple DataFrames to different sheets in the same Excel file |
### `pd.read_excel() - Read Excel File`
pd.read_excel() method is used to read data from an Excel file and load it as a DataFrame. It supports reading .xls and .xlsx format files.
The syntax is as follows:
pandas.read_excel(io, sheet_name=0, *, header=0, names=None, index_col=None, usecols=None, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, parse_dates=False, date_parser=, date_format=None, thousands=None, decimal='.', comment=None, skipfooter=0, storage_options=None, dtype_backend=, engine_kwargs=None)
Parameter Description:
io: This is a required parameter that specifies the path or file object of the Excel file to read.
sheet_name=0: Specifies the worksheet name or index to read. Default is 0, which means the first worksheet.
header=0: Specifies the row to use as column names. Default is 0, which means the first row.
names=None: A list to specify column names. If provided, it will override the column names in the file.
index_col=None: Specifies the column to use as row index. Can be the column name or number.
usecols=None: Specifies the columns to read. Can be a list of column names or column indices.
dtype=None: Specifies the data type of columns. Can be in dictionary format, with keys as column names and values as data types.
engine=None: Specifies the parsing engine. Default is None, pandas will automatically select.
converters=None: A dictionary of functions for converting data.
true_values=None: Specifies values that should be treated as boolean True.
false_values=None: Specifies values that should be treated as boolean False.
skiprows=None: Specifies the number of rows to skip or a list of rows to skip.
nrows=None: Specifies the number of rows to read.
na_values=None: Specifies values that should be treated as missing values.
keep_default_na=True: Specifies whether to parse default missing values (such as NaN) as NA.
na_filter=True: Specifies whether to convert data to NA.
verbose=False: Specifies whether to output detailed progress information.
parse_dates=False: Specifies whether to parse dates.
date_parser=: Function used to parse dates.
date_format=None: Specifies the date format.
thousands=None: Specifies the thousands separator.
decimal='.': Specifies the decimal character.
comment=None: Specifies the comment character.
skipfooter=0: Specifies the number of rows to skip at the end of the file.
storage_options=None: A dictionary of parameters for cloud storage.
dtype_backend=: Specifies the data type backend.
engine_kwargs=None: An additional dictionary of parameters to pass to the engine.
This article uses tutorial_pandas_data.xlsx as an example, you can download tutorial_pandas_data.xlsx to test.
Example
import pandas as pd
# Read data.xlsx file
df = pd.read_excel('tutorial_pandas_data.xlsx')
# Print the read DataFrame
print(df)
The output is:
Table 1 Unnamed: 1 Unnamed: 2
0 Name Age City
1 Alice 25 New York
2 Bob 30 Los Angeles
3 Charlie 35
YouTip