Pandas pd.read_html() Function

read_html() is a function in the pandas library used to parse HTML tables.
It can read table data from web pages or HTML files and convert it into DataFrames.

Tables on web pages are an important data source: much public data (stock quotes, statistics, and so on) is published as HTML tables. By default, read_html() parses HTML with lxml and falls back to BeautifulSoup4 with html5lib if lxml fails. It can extract every table on a page or only the tables you select.

Basic Syntax and Parameters
Syntax Format
pandas.read_html(io, match='.+', flavor=None, header=None, index_col=None,
                 skiprows=None, attrs=None, parse_dates=False, thousands=',',
                 decimal='.', converters=None, ...)
Parameter Description
Parameter	Type	Description	Default Value
io	str, path object, file-like object	URL, path to an HTML file, or file-like object containing HTML (passing a literal HTML string is deprecated since pandas 2.1; wrap it in io.StringIO)	Required
match	str, compiled regex	Only tables whose text matches this regular expression are returned	'.+'
flavor	str	Parser to use: 'lxml', 'html5lib', or 'bs4'; None tries 'lxml' first, then 'bs4' with html5lib	None
header	int, list of int	Row number(s) to use as column names	None
index_col	int, list of int	Column(s) to use as the row index	None
skiprows	int, list of int, slice	Number of rows to skip, or the positions of specific rows to skip	None
attrs	dict	HTML attributes used to identify the table, e.g. {'id': 'employees'}	None
parse_dates	bool, list	Whether to parse date columns	False
thousands	str	Thousands separator removed when parsing numbers	','
decimal	str	Character treated as the decimal point	'.'
Return Value
Return type: list of DataFrames
Returns a list of DataFrames.
The list contains one DataFrame for each table that satisfies match and attrs; by default, that is every table on the page.
If no table matches, read_html() raises a ValueError instead of returning an empty list.
Examples

The following examples cover the most common uses of read_html().

Example 1: Reading Tables from a Local HTML File

First, create an HTML file containing two tables, then use read_html() to read it.

Instance
import pandas as pd

# Create an HTML file that contains two tables
html_content = '''
<html>
<body>
  <h2>Employee Information Table</h2>
  <table>
    <tr><th>Name</th><th>Age</th><th>City</th><th>Salary</th></tr>
    <tr><td>Tom</td><td>28</td><td>Beijing</td><td>8000</td></tr>
    <tr><td>Jerry</td><td>35</td><td>Shanghai</td><td>12000</td></tr>
    <tr><td>Mike</td><td>42</td><td>Guangzhou</td><td>15000</td></tr>
    <tr><td>Lucy</td><td>26</td><td>Shenzhen</td><td>7000</td></tr>
  </table>

  <h2>Department List</h2>
  <table>
    <tr><th>Department</th><th>Count</th></tr>
    <tr><td>Technical Department</td><td>50</td></tr>
    <tr><td>Sales Department</td><td>30</td></tr>
  </table>
</body>
</html>
'''

# Write the HTML to a file
with open('tables.html', 'w', encoding='utf-8') as f:
    f.write(html_content)

# Read every table with read_html
# io: path to the HTML file (required)
tables = pd.read_html('tables.html')

# Inspect the result
print(f"Found a total of {len(tables)} tables")
print()

# Iterate over all tables
for i, df in enumerate(tables):
    print(f"--- Table {i+1} ---")
    print(df)
    print()

Expected Output:

Found a total of 2 tables

--- Table 1 ---
    Name  Age       City  Salary
0    Tom   28    Beijing    8000
1  Jerry   35   Shanghai   12000
2   Mike   42  Guangzhou   15000
3   Lucy   26   Shenzhen    7000

--- Table 2 ---
             Department  Count
0  Technical Department     50
1      Sales Department     30

Code Analysis:

read_html() returns a list of DataFrames, one per table.
By default, all tables are read.
The first row is automatically used as the column names because its cells are th tags.
Example 2: Using attrs and match to Filter Tables

When a page has several tables, you can select the ones you need by HTML attribute or by text content.

Instance
import pandas as pd

# Create an HTML file whose tables carry id/class attributes
html_with_attrs = '''
<html>
<body>
  <table id="employees" class="data-table">
    <tr><th>name</th><th>age</th></tr>
    <tr><td>Tom</td><td>28</td></tr>
    <tr><td>Jerry</td><td>35</td></tr>
  </table>

  <table class="data-table">
    <tr><th>product</th><th>price</th></tr>
    <tr><td>A</td><td>100</td></tr>
    <tr><td>B</td><td>200</td></tr>
  </table>

  <table>
    <tr><td>Total</td></tr>
  </table>
</body>
</html>
'''

with open('tables_attrs.html', 'w', encoding='utf-8') as f:
    f.write(html_with_attrs)

# Example 2a: filter by the id attribute with attrs
# Read the table with id="employees"
tables_by_id = pd.read_html('tables_attrs.html', attrs={'id': 'employees'})
print("Filter by id:")
print(tables_by_id[0])
print()

# Example 2b: filter by the class attribute with attrs
# Read all tables with class="data-table"
tables_by_class = pd.read_html('tables_attrs.html', attrs={'class': 'data-table'})
print(f"Filter by class (found {len(tables_by_class)} tables):")
for i, df in enumerate(tables_by_class):
    print(f"Table {i+1}:")
    print(df)
    print()

# Example 2c: use match to select tables containing specific text
# match is a regular expression tested against the text inside each table
tables_by_text = pd.read_html('tables_attrs.html', match='Tom')
print("Tables containing 'Tom':")
print(tables_by_text)

Expected Output:

Filter by id:
    name  age
0    Tom   28
1
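The parameter table lists header, index_col, skiprows, and thousands, which neither example exercises, and the return-value notes say a ValueError is raised when nothing matches. The sketch below shows these together. It assumes pandas with lxml installed; the table contents (a title row plus hypothetical Region and Revenue columns) are made-up sample data, and the HTML string is wrapped in io.StringIO because pandas 2.1 and later deprecate passing literal HTML text to read_html().

```python
from io import StringIO

import pandas as pd

# Hypothetical sample page: a title row to skip, a header row,
# and numbers that use "," as the thousands separator.
html = """
<table id="sales">
  <tr><td>Quarterly report</td><td></td></tr>
  <tr><td>Region</td><td>Revenue</td></tr>
  <tr><td>North</td><td>1,200</td></tr>
  <tr><td>South</td><td>3,450</td></tr>
</table>
"""

tables = pd.read_html(
    StringIO(html),          # wrap literal HTML in a file-like object
    flavor="lxml",           # assumption: lxml is installed
    attrs={"id": "sales"},   # select the table by its id attribute
    skiprows=1,              # drop the "Quarterly report" title row
    header=0,                # first remaining row becomes the column names
    index_col=0,             # use the Region column as the row index
    thousands=",",           # "1,200" is parsed as the integer 1200
)
df = tables[0]
print(df)

# When no table matches, read_html raises ValueError instead of returning [].
try:
    pd.read_html(StringIO(html), match="Nonexistent", flavor="lxml")
    raised = False
except ValueError as err:
    raised = True
    print(f"No match: {err}")
```

In this sketch, skiprows removes the title row first, so header=0 picks up the Region/Revenue row. Passing flavor="lxml" keeps a failed match from being retried with the bs4/html5lib parser, which would otherwise require those packages to be installed.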
