
Pandas Loc Iloc

Data selection is one of the most common operations in Pandas. Understanding the differences between `loc`, `iloc`, `at`, and `iat`, and when to use each, will help you process data more efficiently.

* * *

## Difference Between loc and iloc

| Feature | loc | iloc |
| --- | --- | --- |
| Indexing Method | Label-based | Integer position-based |
| Slicing | Includes end position | Does not include end position |
| Single Value | Returns scalar | Returns scalar |
| Recommended Scenario | When explicit index labels exist | When selecting by position |

Example:

```python
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu", "Qi Qian"],
    "Age": [25, 30, 28, 35, 22],
    "City": ["Beijing", "Shanghai", "Guangzhou", "Shenzhen", "Hangzhou"]
}, index=[1, 3, 5, 7, 9])  # Note: the index is not continuous

print("DataFrame:")
print(df)
print()

# loc: label-based indexing (includes end position)
print("df.loc[1:5] (label slicing, includes 5):")
print(df.loc[1:5])
print()

# iloc: position-based indexing (does not include end position)
print("df.iloc[0:2] (position slicing, does not include 2):")
print(df.iloc[0:2])
```

* * *

## Usage of loc

### Selecting Rows

Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "Age": [25, 30, 28, 35],
    "City": ["Beijing", "Shanghai", "Guangzhou", "Shenzhen"]
}, index=["a", "b", "c", "d"])

# Select a single row (returns a Series)
print("Select one row:")
print(df.loc["a"])
print()

# Select multiple rows
print("Select multiple rows:")
print(df.loc[["a", "c"]])
print()

# Slice selection (includes both start and end labels)
print("Slice selection:")
print(df.loc["a":"c"])
```

### Selecting Columns

Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu"],
    "Age": [25, 30, 28],
    "City": ["Beijing", "Shanghai", "Guangzhou"]
}, index=["a", "b", "c"])

# Select a single column
print("Select single column:")
print(df.loc[:, "Name"])
print()

# Select multiple columns
print("Select multiple columns:")
print(df.loc[:, ["Name", "City"]])
print()

# Slice selection for columns (includes both "Name" and "City")
print("Slice selection for columns:")
print(df.loc[:, "Name":"City"])
```

### Selecting Specific Rows and Columns (Recommended Approach)

Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "Age": [25, 30, 28, 35],
    "City": ["Beijing", "Shanghai", "Guangzhou", "Shenzhen"]
}, index=["a", "b", "c", "d"])

# Select a single value by row label and column label
print("Select single value:")
print(df.loc["a", "Name"])        # Returns "Zhang San"
print(type(df.loc["a", "Name"]))  # Type is str
print()

# Select multiple rows and columns
print("Select subset:")
print(df.loc[["a", "c"], ["Name", "City"]])
print()

# Conditional selection with a boolean mask built from the "Age" column
print("Rows where age > 28:")
print(df.loc[df["Age"] > 28])
```

* * *

## Usage of iloc

### Selection by Position

Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "Age": [25, 30, 28, 35],
    "City": ["Beijing", "Shanghai", "Guangzhou", "Shenzhen"]
})

# Select row 0
print("Select row 0:")
print(df.iloc[0])
print()

# Select first 3 rows
print("Select first 3 rows:")
print(df.iloc[:3])
print()

# Select specific rows
print("Select specific rows:")
print(df.iloc[[0, 2, 3]])
print()

# Negative indexing (counts from the end)
print("Select last row:")
print(df.iloc[-1])
print()

print("Select last 3 rows:")
print(df.iloc[-3:])
```

### Selecting Columns

Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu"],
    "Age": [25, 30, 28],
    "City": ["Beijing", "Shanghai", "Guangzhou"]
})

# Select column 0
print("Column 0:")
print(df.iloc[:, 0])
print()

# Select columns 1 and 2
print("Select multiple columns:")
print(df.iloc[:, [1, 2]])
print()

# Slice selection (columns 0 and 1; position 2 is excluded)
print("Slice selection for columns:")
print(df.iloc[:, 0:2])
```

* * *

## at and iat (Getting Single Values)

`at` and `iat` are accessors designed specifically for getting and setting single values. For scalar access they are faster than `loc` and `iloc`.
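Since `at` and `iat` can set values as well as get them, here is a minimal sketch of single-value assignment. The small DataFrame is hypothetical sample data made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame(
    {"Name": ["Zhang San", "Li Si", "Wang Wu"], "Age": [25, 30, 28]},
    index=["a", "b", "c"],
)

# at: set a single value by row label and column label
df.at["b", "Age"] = 31

# iat: set a single value by integer row and column position
df.iat[2, 1] = 29  # row position 2 is "c", column position 1 is "Age"

print(df.at["b", "Age"])
print(df.iat[2, 1])
```

Both accessors take exactly one row key and one column key, so they cannot be used with slices or lists.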
Example:

```python
import time

import numpy as np
import pandas as pd

# Create a larger DataFrame for performance testing
df = pd.DataFrame(np.random.randn(1000, 10),
                  columns=[f"col_{i}" for i in range(10)])

# at: get a single value (label-based indexing)
print(f"at: {df.at[0, 'col_0']}")

# iat: get a single value (position-based indexing)
print(f"iat: {df.iat[0, 0]}")

# Performance comparison: repeat the same scalar lookup n times
n = 10000

start = time.perf_counter()
for _ in range(n):
    _ = df.iloc[0, 0]
print(f"iloc time: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
for _ in range(n):
    _ = df.iat[0, 0]
print(f"iat time: {time.perf_counter() - start:.4f}s")
```

* * *

## Conditional Selection

Filtering data with boolean conditions is one of the most common operations.

Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "Age": [25, 30, 28, 35],
    "City": ["Beijing", "Shanghai", "Guangzhou", "Beijing"],
    "Salary": [12000, 15000, 11000, 18000]
})

# Single condition
print("Employees with age > 28:")
print(df[df["Age"] > 28])
print()

# Multiple conditions (combine with & | ~ and wrap each condition in parentheses)
print("Beijing and salary > 12000:")
print(df[(df["City"] == "Beijing") & (df["Salary"] > 12000)])
print()

# isin filtering
print("City is Beijing or Shanghai:")
print(df[df["City"].isin(["Beijing", "Shanghai"])])
print()

# String contains
print("Name contains 'San':")
print(df[df["Name"].str.contains("San")])
```

* * *

## Notes

**1. Slicing and the end position**

`loc` slicing includes the end position, while `iloc` slicing does not.

**2. Non-existent index will raise an error**

When using `loc`, a label that does not exist in the index raises a `KeyError`. If you need to tolerate missing labels, use `reindex` instead, which returns rows filled with `NaN` for them.

**3. Prefer using `loc`**

`loc` is more readable and less error-prone, so prefer it unless you need to select by position.

> `at`/`iat` are the fastest ways to access single values, while `loc`/`iloc` are used for selecting multiple values. Choosing the appropriate method can improve code performance and readability.

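The missing-label behavior described in Note 2 can be sketched as follows. The DataFrame and the `wanted` list are hypothetical, and the `Index.intersection` option is an additional approach not mentioned in the notes.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame(
    {"Name": ["Zhang San", "Li Si", "Wang Wu"], "Age": [25, 30, 28]},
    index=["a", "b", "c"],
)

wanted = ["a", "c", "z"]  # "z" is not in the index

# loc raises KeyError when any requested label is missing
try:
    df.loc[wanted]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Option 1: reindex keeps every requested label and fills missing rows with NaN
reindexed = df.reindex(wanted)
print(reindexed)

# Option 2: keep only the labels that actually exist, then use loc as usual
existing = df.index.intersection(wanted)
subset = df.loc[existing]
print(subset)
```

Which option fits depends on whether the missing labels should appear in the result as empty rows or be dropped silently.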