Pandas Apply
`apply`, `map`, and `applymap` are three pandas methods for data transformation. They perform flexible element-wise or row/column-wise operations on a DataFrame or Series.
* * *
## Series.map
`map` is a method of Series, used to transform each element in a Series.
### Basic Usage
## Example
import pandas as pd
# Create Series
s = pd.Series([1, 2, 3, 4, 5])
print("Original data:")
print(s)
print()
# Use function
print("Each element * 2:")
print(s.map(lambda x: x * 2))
print()
# Use dictionary mapping
mapping = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}
print("Use dictionary mapping:")
print(s.map(mapping))
print()
# Use Series mapping
mapping_series = pd.Series(["A", "B", "C", "D", "E"], index=[1, 2, 3, 4, 5])
print("Use Series mapping:")
print(s.map(mapping_series))
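
One detail the dictionary example does not show is what happens when a value has no matching key. The following is a minimal sketch, assuming a plain `dict` as the mapping and using `"unknown"` as an arbitrary placeholder label chosen for illustration: unmatched values come back as `NaN`, and `fillna` can supply a default afterwards.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 6])
mapping = {1: "A", 2: "B", 3: "C"}  # note: no entry for 6

# Values without a matching key are mapped to NaN
mapped = s.map(mapping)

# Replace the NaN produced by the missing key with a default label
# ("unknown" is an arbitrary placeholder for this example)
with_default = mapped.fillna("unknown")

print(mapped)
print(with_default)
```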
### Handling Missing Values
## Example
import pandas as pd
import numpy as np
s = pd.Series([1, 2, np.nan, 4, 5])
print("Contains NaN:")
print(s)
print()
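# By default (na_action=None) map also calls the function on NaN,
# so the lambda handles NaN explicitly and returns -1 for it
print("map processing (NaN replaced with -1):")
print(s.map(lambda x: x * 2 if pd.notna(x) else -1))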
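
To make `map` leave missing values alone instead of handling them inside the function, `Series.map` accepts `na_action="ignore"`. The sketch below contrasts the two behaviors; the string-formatting lambda is only an illustrative choice that makes it easy to see whether the function was called on `NaN`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5])

# Default: the function receives NaN like any other value
labeled_default = s.map(lambda x: f"value={x}")

# na_action="ignore": NaN is propagated and the function is not called on it
labeled_ignore = s.map(lambda x: f"value={x}", na_action="ignore")

print(labeled_default)
print(labeled_ignore)
```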
* * *
## DataFrame.applymap
`applymap` is a method of DataFrame that applies a function to each element. Note: pandas 2.1 deprecated `applymap` in favor of the equivalent `DataFrame.map`, so prefer `DataFrame.map` on pandas 2.1 and later.
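
For code that must run on both older and newer pandas releases, one option is a small compatibility wrapper. This is only a sketch: `elementwise` is a hypothetical helper name, not part of pandas, and it assumes that `DataFrame.map` is available from pandas 2.1 onward.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def elementwise(frame, func):
    """Hypothetical helper: prefer DataFrame.map, fall back to applymap.

    Caveat: on older pandas, a column literally named "map" would make
    the hasattr check misleading, so this sketch assumes no such column.
    """
    if hasattr(frame, "map"):
        return frame.map(func)
    return frame.applymap(func)


doubled = elementwise(df, lambda x: x * 2)
print(doubled)
```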
## Example
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9]
})
print("Original data:")
print(df)
print()
# Multiply each element by 2
print("Each element * 2:")
print(df.applymap(lambda x: x * 2))
print()
# Keep 2 decimal places
print("Keep 2 decimal places:")
print(df.applymap(lambda x: round(x, 2)))
> applymap operates element-wise, which may be slower for large datasets. If you only need to operate on numeric columns, consider vectorized operations or apply with the axis parameter.
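
As an illustration of the vectorized alternative, both element-wise operations used in this section have direct equivalents that avoid calling a Python function per cell. The sample values below are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1.236, 2.341],
    "B": [3.458, 4.562],
})

# Arithmetic operators work on the whole DataFrame at once
doubled = df * 2

# DataFrame.round rounds every numeric column in one call
rounded = df.round(2)

print(doubled)
print(rounded)
```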
* * *
## DataFrame.apply
`apply` is the most flexible method, allowing functions to be applied along an axis.
### Apply by Column
## Example
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": [100, 200, 300, 400, 500]
})
print("Original data:")
print(df)
print()
# Default axis=0, apply by column
print("Sum of each column:")
print(df.apply(sum))
print()
print("Maximum of each column:")
print(df.apply(max))
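
Column-wise `apply` is not limited to aggregations. When the function returns a Series of the same length as the input column, the result is a DataFrame of the same shape. A minimal sketch, using mean-centering as an arbitrary example transformation:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": [100, 200, 300, 400, 500],
})

# Each column is passed to the function as a Series;
# subtracting the column mean returns a Series of the same length
centered = df.apply(lambda col: col - col.mean())

print(centered)
```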
### Apply by Row
## Example
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "C": [100, 200, 300]
})
print("Original data:")
print(df)
print()
# axis=1, apply by row
print("Sum of each row:")
print(df.apply(sum, axis=1))
print()
# Maximum minus minimum of each row
print("Range of each row:")
print(df.apply(lambda x: x.max() - x.min(), axis=1))
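
Row-wise `apply` calls the function once per row, which tends to be slow on large frames. For simple reductions like the two above, the built-in DataFrame methods accept `axis=1` directly. A sketch of the equivalent calls:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "C": [100, 200, 300],
})

# Same result as df.apply(sum, axis=1), without a Python call per row
row_sum = df.sum(axis=1)

# Same result as the max-minus-min lambda, computed column-wise in bulk
row_range = df.max(axis=1) - df.min(axis=1)

print(row_sum)
print(row_range)
```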
### Applying Multiple Aggregation Functions
## Example
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30]
})
# Apply multiple functions at once
print("Sum and mean at the same time:")
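# String names select the pandas implementations; recent pandas versions
# may warn when builtin sum or np.mean is passed in a list instead
print(df.apply(["sum", "mean"]))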
print()
# Return multiple values
result = df.apply(lambda x: pd.Series({
    "sum": x.sum(),
    "mean": x.mean(),
    "max": x.max()
}, index=["sum", "mean", "max"]))
print("Return multiple values:")
print(result)
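
The same kind of summary can also be written with `DataFrame.agg`, which takes function names as strings. The sketch below shows two forms: a list of names applied to every column, and a dictionary that picks one function per column (the column-to-function pairing is an arbitrary choice for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
})

# A list of names: one result row per function, one column per input column
summary = df.agg(["sum", "mean", "max"])

# A dict: a different aggregation for each column, returned as a Series
per_column = df.agg({"A": "sum", "B": "mean"})

print(summary)
print(per_column)
```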
* * *
## Series.apply
A Series also has an `apply` method. For element-wise functions it behaves much like `map`, but it can additionally forward extra positional and keyword arguments to the function.
## Example
import pandas as pd
import numpy as np
s = pd.Series([1, 4, 9, 16, 25])
print("Original data:")
print(s)
print()
# Square root
print("Square root:")
print(s.apply(np.sqrt))
print()
# Conditional return
print("Conditional judgment:")
print(s.apply(lambda x: "large" if x > 10 else "small"))
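
Two related points may be worth a sketch here. `Series.apply` can pass extra arguments to the function through `args=` and keyword arguments, and the conditional labeling above can also be done without a per-element function by using `np.where`. The function `scale` and its parameters are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9, 16, 25])


def scale(x, factor, offset=0):
    """Hypothetical example function with extra parameters."""
    return x * factor + offset


# factor is passed positionally via args, offset as a keyword argument
scaled = s.apply(scale, args=(2,), offset=1)

# Vectorized alternative to the "large"/"small" lambda
labels = pd.Series(np.where(s > 10, "large", "small"), index=s.index)

print(scaled)
print(labels)
```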
* * *
## Performance Comparison
## Example
import pandas as pd
import numpy as np
import time
# Create large dataset
n = 100000
s = pd.Series(np.random.randn(n))
# Test map vs apply
func = lambda x: x * 2 + 1
start = time.time()
result1 = s.map(func)
map_time = time.time() - start
start = time.time()
result2 = s.apply(func)
apply_time = time.time() - start
# Vectorization (fastest)
start = time.time()
result3 = s * 2 + 1
vec_time = time.time() - start
print(f"map time: {map_time:.4f}s")
print(f"apply time: {apply_time:.4f}s")
print(f"vectorization time: {vec_time:.4f}s")
print("\nConclusion: Prefer vectorized operations for best performance")
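
A single `time.time()` measurement can be noisy. As a hedged variation on the comparison above, the sketch below uses the standard library's `timeit.repeat` and keeps the best of a few runs. `best_of` is a hypothetical helper name, and the actual numbers depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(0).standard_normal(100_000))
func = lambda x: x * 2 + 1


def best_of(stmt, repeat=3):
    """Hypothetical helper: fastest of `repeat` single runs, in seconds."""
    return min(timeit.repeat(stmt, number=1, repeat=repeat))


timings = {
    "map": best_of(lambda: s.map(func)),
    "apply": best_of(lambda: s.apply(func)),
    "vectorized": best_of(lambda: s * 2 + 1),
}

for name, seconds in timings.items():
    print(f"{name} time: {seconds:.4f}s")
```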
* * *
## Practical Application: Data Transformation
## Example
import pandas as pd
import numpy as np
# Create sample DataFrame
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "age": [25, 30, 28, 35],
    "salary": [12000, 15000, 11000, 18000],
    "department": ["Tech", "Sales", "Tech", "Operations"]
})
print("Original data:")
print(df)
print()
# Use apply for row-level calculations
def calculate(row):
    """Calculate annual income and after-tax salary"""
    # Use the salary column only; the row also contains non-numeric fields
    annual = row["salary"] * 12
    tax = annual * 0.1 if annual > 120000 else annual * 0.05
    after_tax = annual - tax
    return pd.Series({
        "annual_income": annual,
        "tax": tax,
        "after_tax": after_tax
    })
result = df.apply(calculate, axis=1)
df_result = pd.concat([df, result], axis=1)
print("Calculation results:")
print(df_result)
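
The same calculation can be expressed without a row-wise function. The following is a sketch under the simplified tax rule used in this example (10% when annual income exceeds 120000, otherwise 5%), with `np.where` choosing the rate per row and `DataFrame.assign` adding the new columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "salary": [12000, 15000, 11000, 18000],
})

# Column arithmetic replaces the per-row function call
annual = df["salary"] * 12

# Pick the tax rate per row according to the example's simplified rule
rate = np.where(annual > 120000, 0.10, 0.05)
tax = annual * rate

df_result = df.assign(
    annual_income=annual,
    tax=tax,
    after_tax=annual - tax,
)

print(df_result)
```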
* * *
## Choosing Between Them
| Method | Applicable Objects | Scenarios | Performance |
| --- | --- | --- | --- |
| `map` | Series | Element-wise conversion, dictionary mapping | Fast |
| `applymap` (`DataFrame.map` in pandas 2.1+) | DataFrame | Element-wise conversion, typically on non-numeric columns | Slow |
| `apply` | Series/DataFrame | Row/column aggregation, custom functions | Medium |
> Don't use apply/map when vectorized operations (direct operators) can be used; don't use apply when map can be used.