Pandas Index

The index is a core component of pandas data structures: it determines how data is labeled and accessed. pandas supports several index types, from the simple RangeIndex to the hierarchical MultiIndex. This section covers how to use the main index types.

Basic Concepts of Index

An index is similar to a primary key in a database or the row numbers in Excel: it labels each row of data. A DataFrame has a row index (index) and a column index (columns).

Example

import pandas as pd

# Create a DataFrame with the default index (a RangeIndex)
df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu"],
    "Age": [25, 30, 28],
    "City": ["Beijing", "Shanghai", "Guangzhou"]
})

print("DataFrame info:")
print(f"Row index: {df.index.tolist()}")
print(f"Column index: {df.columns.tolist()}")
print("\nData:")
print(df)

RangeIndex

RangeIndex is the default integer index. Like Python's range(n), it starts at 0 and increments by 1.

Creation and Usage

Example

import pandas as pd

# Create a RangeIndex explicitly
idx = pd.RangeIndex(start=0, stop=10, step=1)
print(f"RangeIndex: {idx}")
print(f"Type: {type(idx)}")

# A DataFrame created without an explicit index gets a RangeIndex
df = pd.DataFrame({"A": [1, 2, 3]})
print(f"\nDefault index type: {type(df.index)}")

# reset_index() also produces a RangeIndex
df_reset = df.reset_index()
print(f"Index type after reset: {type(df_reset.index)}")

Characteristics of RangeIndex

  • Minimal memory usage (only start, stop, and step are stored)
  • Labels match integer positions by default
  • Easy conversion to other index types
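
The memory claim above can be checked directly. The following is a minimal sketch, assuming NumPy is available alongside pandas; the size n and the variable names are illustrative only. It compares a RangeIndex with an integer Index holding the same labels, and shows a one-call conversion to string labels.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # illustrative size

# A RangeIndex only keeps start, stop, and step
range_idx = pd.RangeIndex(n)

# An integer Index built from an array stores every label
int_idx = pd.Index(np.arange(n))

range_bytes = range_idx.memory_usage()
int_bytes = int_idx.memory_usage()
print(f"RangeIndex:    {range_bytes} bytes")
print(f"Integer Index: {int_bytes} bytes")

# Converting to another index type is a single call
str_idx = range_idx[:3].astype(str)
print(str_idx.tolist())
```

The exact byte counts depend on the pandas version and platform, so only the relative difference is meaningful.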

Index Type Conversion

An index can be converted to other containers such as a Python list or a NumPy array, and a DataFrame's index can be reset to a default RangeIndex.

Converting to Other Index Types

Example

import pandas as pd

# Create an example DataFrame with integer labels
df = pd.DataFrame({"Value": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# Inspect the index and its type
print("Original index:", df.index)
print("Index type:", type(df.index))

# Convert to a Python list
idx_list = df.index.tolist()
print(f"As list: {idx_list}")

# Convert to a NumPy array
idx_array = df.index.values
print(f"As array: {idx_array}")

# Reset the index to a RangeIndex
df = df.reset_index(drop=True)
print(f"Reset to RangeIndex: {df.index}")

Setting Custom Index

You can use one or more DataFrame columns as the row index.

Using set_index

Example

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    "Student ID": ["S001", "S002", "S003", "S004"],
    "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "Score": [85, 92, 78, 90]
})

print("Original data:")
print(df)
print()

# Set the "Student ID" column as the index
df1 = df.set_index("Student ID")
print("Student ID as index:")
print(df1)
print()

# Set several columns as the index (creates a MultiIndex)
df2 = df.set_index(["Student ID", "Name"])
print("Multiple columns as index:")
print(df2)

Using index Parameter When Creating

Example

import pandas as pd

# Specify the index directly when creating the DataFrame
df = pd.DataFrame(
    {"Name": ["Zhang San", "Li Si", "Wang Wu"], "Age": [25, 30, 28]},
    index=["A001", "A002", "A003"]
)

print(df)
print(f"\nIndex: {df.index.tolist()}")

# Use a DatetimeIndex
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df_date = pd.DataFrame({"Value": [100, 200, 300]}, index=dates)

print("\nUsing a date index:")
print(df_date)
print(f"Index type: {type(df_date.index)}")

Index Operations

Resetting Index

Example

import pandas as pd

# Create a DataFrame with a custom index
df = pd.DataFrame(
    {"Name": ["Zhang San", "Li Si"], "Score": [85, 92]},
    index=["A001", "A002"]
)

# Reset to the default RangeIndex; the old index becomes a column
df_reset = df.reset_index()
print("Reset index:")
print(df_reset)

# drop=True discards the old index instead of keeping it as a column
df_reset2 = df.reset_index(drop=True)
print("\nReset and drop the original index:")
print(df_reset2)

Reindexing

Example

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[1, 2, 3])

# Reindex to a new set of labels; labels missing from the original get NaN
df_reindex = df.reindex([1, 2, 3, 4, 5])
print("Reindex (missing labels filled with NaN):")
print(df_reindex)

# Use fill_value to fill the missing rows
df_reindex2 = df.reindex([1, 2, 3, 4, 5], fill_value=0)
print("\nReindex (missing labels filled with 0):")
print(df_reindex2)

Index Level Operations

Example

import pandas as pd

# Create a MultiIndex DataFrame
df = pd.DataFrame({
    "Chinese": [85, 92, 78],
    "Math": [90, 88, 95]
}, index=pd.MultiIndex.from_tuples(
    [("Grade 10", "Class A"), ("Grade 10", "Class B"), ("Grade 11", "Class A")],
    names=["Grade", "Class"]
))

print("Multi-level index DataFrame:")
print(df)
print()

# Get the values of each index level
print(f"Outer index (Grade): {df.index.get_level_values(0).tolist()}")
print(f"Inner index (Class): {df.index.get_level_values(1).tolist()}")
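
Beyond reading level values, a few other level operations are commonly useful. The sketch below is illustrative rather than exhaustive; it rebuilds a small two-level frame (the data and variable names are assumptions made for the example) and shows selection by outer level, a cross-section on the inner level, swapping levels, and moving one level back into a column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Chinese": [85, 92, 78], "Math": [90, 88, 95]},
    index=pd.MultiIndex.from_tuples(
        [("Grade 10", "Class A"), ("Grade 10", "Class B"), ("Grade 11", "Class A")],
        names=["Grade", "Class"],
    ),
)

# Select all rows whose outer level is "Grade 10"
grade10 = df.loc["Grade 10"]
print(grade10)

# Cross-section on the inner level: every "Class A" row
class_a = df.xs("Class A", level="Class")
print(class_a)

# Swap the order of the two levels
swapped = df.swaplevel("Grade", "Class")
print(list(swapped.index.names))

# Move a single level back into an ordinary column
flat = df.reset_index(level="Class")
print(flat)
```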

Index Attributes and Methods

| Attribute/Method | Description | Example |
|---|---|---|
| index.tolist() | Convert to a Python list | df.index.tolist() |
| index.values | Convert to a NumPy array | df.index.values |
| index.unique() | Get unique values | df.index.unique() |
| index.is_unique | Check whether the index is unique | df.index.is_unique |
| index.astype() | Convert the index type | df.index.astype(str) |

Index Attribute Usage Examples

Example

import pandas as pd

# Create an example DataFrame
df = pd.DataFrame(
    {"Value": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"]
)

# Inspect index attributes
print(f"Index is unique? {df.index.is_unique}")
print(f"Index length: {len(df.index)}")
print(f"Index dtype: {df.index.dtype}")

# Convert to string type
str_index = df.index.astype(str)
print(f"Dtype after conversion: {str_index.dtype}")

Practical: Using Index to Improve Query Efficiency

In real data analysis, choosing an appropriate index can significantly speed up repeated label-based queries.

Example

import pandas as pd

# Simulated business data
df = pd.DataFrame({
    "Order ID": range(1000),
    "Customer ID": [f"C{i % 100:03d}" for i in range(1000)],
    "Product": [f"Product{i % 20}" for i in range(1000)],
    "Amount": [round(i * 1.5, 2) for i in range(1000)]
})

# Set the frequently queried columns as the index; sorting it avoids
# slow lookups on an unsorted MultiIndex
df_indexed = df.set_index(["Customer ID", "Product"]).sort_index()

# Look up by index label (similar to a database primary-key query)
result = df_indexed.loc[("C001", "Product1")]
print("Orders for customer C001 and product Product1:")
print(result)

# Total spending per customer (sum only the numeric Amount column)
print("\nTotal spending per customer:")
customer_total = df.groupby("Customer ID")["Amount"].sum()
print(customer_total.head(10))

Important Notes

1. Index values should be unique

pandas does not enforce uniqueness. If index values are not unique, certain operations (such as a loc lookup) return multiple matching rows.
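
A minimal sketch of what a duplicated label does to a lookup, and one way to detect and remove duplicates. The data is made up for illustration, and keeping the first occurrence is just one possible policy.

```python
import pandas as pd

# "S001" appears twice in the index
df = pd.DataFrame({"Score": [85, 92, 78]}, index=["S001", "S001", "S002"])

print(df.index.is_unique)

# A loc lookup on a duplicated label returns every matching row
dup = df.loc["S001"]
print(dup)

# List the labels that occur more than once
dup_labels = df.index[df.index.duplicated()].tolist()
print(dup_labels)

# Keep only the first row for each label
deduped = df[~df.index.duplicated(keep="first")]
print(deduped)
```

Checking df.index.is_unique right after loading data is a cheap way to catch this before it affects later lookups.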

2. Indexes can have names

Naming an index improves code readability: df.index.name = "Student ID"
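
As a short illustration (the sample data and the alternative name "ID" are assumptions for the example), the index name shows up in printed output and becomes the column label when the index is reset:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Zhang San", "Li Si"], "Score": [85, 92]},
    index=["S001", "S002"],
)

# Name the index in place
df.index.name = "Student ID"
print(df)

# The name becomes the column label after reset_index()
restored = df.reset_index()
print(restored.columns.tolist())

# rename_axis returns a renamed copy and leaves df unchanged
renamed = df.rename_axis("ID")
print(renamed.index.name)
```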

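3. Indexes follow data operations

Slicing, filtering, and other operations preserve the original index labels, so a filtered result may have gaps in its index. Operations between two objects align on index labels, not on position, so pay attention to the correspondence between index and data.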
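
The following sketch, using made-up data, shows both effects: a filter keeps the original labels, and arithmetic between two Series matches rows by label.

```python
import pandas as pd

df = pd.DataFrame({"Score": [85, 92, 78, 90]})

# Filtering keeps the original labels, leaving gaps
high = df[df["Score"] >= 90]
print(high.index.tolist())

# iloc uses position, loc uses label
first_by_position = high.iloc[0]["Score"]
by_label = high.loc[1, "Score"]
print(first_by_position, by_label)

# Arithmetic aligns on labels; unmatched labels produce NaN
a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])
total = a + b
print(total)
```

If positional labels are wanted after filtering, reset_index(drop=True) restores a gap-free RangeIndex.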

The index is central to pandas lookup performance. Setting a frequently queried column as the index can significantly speed up lookups, but building and sorting an index has its own cost, so weigh how often you query against how often the data changes.
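
A rough sketch of the trade-off, using simulated data and illustrative names: a boolean mask rescans the column on every query, while an indexed, sorted frame pays the setup cost once and then answers lookups by label. Both approaches return the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer ID": [f"C{i % 100:03d}" for i in range(1000)],
    "Amount": [round(i * 1.5, 2) for i in range(1000)],
})

# Column scan: every row is compared on every query
by_mask = df[df["Customer ID"] == "C042"]

# Index lookup: set and sort the index once, then query by label
indexed = df.set_index("Customer ID").sort_index()
by_index = indexed.loc["C042"]

print(indexed.index.is_monotonic_increasing)
print(len(by_mask), len(by_index))
```

Actual speedups depend on data size, index type, and how many lookups are performed, so timing both approaches on your own data (for example with timeit) is the reliable way to decide.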
