
Pandas Apply

`apply`, `map`, and `applymap` are the three pandas methods for data transformation. They apply a function element-wise, or along rows or columns, to a Series or DataFrame.

* * *

## Series.map

`map` is a Series method that transforms each element of the Series. The mapping can be a function, a dictionary, or another Series.

### Basic Usage

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4, 5])
print("Original data:")
print(s)
print()

# Map with a function
print("Each element * 2:")
print(s.map(lambda x: x * 2))
print()

# Map with a dictionary (values are looked up by key)
mapping = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}
print("Use dictionary mapping:")
print(s.map(mapping))
print()

# Map with a Series (values are looked up by index label)
mapping_series = pd.Series(["A", "B", "C", "D", "E"], index=[1, 2, 3, 4, 5])
print("Use Series mapping:")
print(s.map(mapping_series))
```

### Handling Missing Values

By default `map` passes NaN to the function like any other value. Either handle it inside the function, or pass `na_action="ignore"` to leave NaN untouched.

```python
import pandas as pd
import numpy as np

s = pd.Series([1, 2, np.nan, 4, 5])
print("Contains NaN:")
print(s)
print()

# Handle NaN explicitly inside the function: replace it with -1
print("map processing (NaN replaced with -1):")
print(s.map(lambda x: x * 2 if pd.notna(x) else -1))
print()

# Leave NaN as NaN without calling the function on it
print("map processing (NaN skipped):")
print(s.map(lambda x: x * 2, na_action="ignore"))
```

* * *

## DataFrame.applymap

`applymap` is a DataFrame method that applies a function to each element. (Note: `applymap` is deprecated since pandas 2.1 in favor of `DataFrame.map`, which behaves the same way.)

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9]
})
print("Original data:")
print(df)
print()

# Multiply each element by 2
# (on pandas 2.1+, df.map(...) is the equivalent call)
print("Each element * 2:")
print(df.applymap(lambda x: x * 2))
print()

# Round each element to 2 decimal places
print("Keep 2 decimal places:")
print(df.applymap(lambda x: round(x, 2)))
```

> `applymap` calls the function once per element, so it can be slow on large datasets. If you only need to operate on numeric columns, consider vectorized operations or `apply` with the `axis` parameter.

* * *

## DataFrame.apply

`apply` is the most flexible of the three: it applies a function along an axis, passing each column (`axis=0`, the default) or each row (`axis=1`) to the function as a Series.
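To make the axis behavior concrete, here is a minimal sketch of what the function receives in each case. The helper `describe` and the row labels `x`, `y`, `z` are hypothetical and exist only for illustration; the helper reports the index labels of the Series that `apply` hands to it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]}, index=["x", "y", "z"])

def describe(values):
    # Hypothetical helper: join the index labels of the Series passed in
    return ",".join(str(label) for label in values.index)

# axis=0 (default): one call per column, each Series is indexed by row labels
by_column = df.apply(describe)

# axis=1: one call per row, each Series is indexed by column labels
by_row = df.apply(describe, axis=1)

print(by_column)
print(by_row)
```

With `axis=0` the result has one entry per column, and with `axis=1` one entry per row.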
### Apply by Column

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": [100, 200, 300, 400, 500]
})
print("Original data:")
print(df)
print()

# Default axis=0: the function is called once per column
print("Sum of each column:")
print(df.apply(sum))
print()

print("Maximum of each column:")
print(df.apply(max))
```

### Apply by Row

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "C": [100, 200, 300]
})
print("Original data:")
print(df)
print()

# axis=1: the function is called once per row
print("Sum of each row:")
print(df.apply(sum, axis=1))
print()

# Maximum minus minimum of each row
print("Range of each row:")
print(df.apply(lambda x: x.max() - x.min(), axis=1))
```

### Aggregating with Multiple Functions

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30]
})

# Apply multiple functions at once; string names avoid the
# warnings newer pandas versions emit for callables like sum and np.mean
print("Sum and mean at the same time:")
print(df.apply(["sum", "mean"]))
print()

# Return multiple values per column by returning a Series
result = df.apply(lambda x: pd.Series({
    "sum": x.sum(),
    "mean": x.mean(),
    "max": x.max()
}))
print("Return multiple values:")
print(result)
```

* * *

## Series.apply

A Series also has an `apply` method. It works much like `map` for element-wise functions but is more flexible.
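One part of that flexibility is that `Series.apply` forwards extra positional and keyword arguments to the function. The sketch below assumes a hypothetical helper, `scale_and_shift`, written only to show the `args` parameter and keyword forwarding.

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16, 25])

def scale_and_shift(x, factor, offset=0):
    # Hypothetical helper used only for illustration
    return x * factor + offset

# args supplies extra positional arguments; extra keywords are passed through
shifted = s.apply(scale_and_shift, args=(2,), offset=1)
print(shifted)
```

Each element is passed as `x`, with `factor=2` and `offset=1` supplied on every call.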
```python
import pandas as pd
import numpy as np

s = pd.Series([1, 4, 9, 16, 25])
print("Original data:")
print(s)
print()

# Square root
print("Square root:")
print(s.apply(np.sqrt))
print()

# Conditional return value
print("Conditional judgment:")
print(s.apply(lambda x: "large" if x > 10 else "small"))
```

* * *

## Performance Comparison

```python
import time

import numpy as np
import pandas as pd

# Create a large dataset
n = 100000
s = pd.Series(np.random.randn(n))

# Compare map and apply on the same function
func = lambda x: x * 2 + 1

start = time.perf_counter()
result1 = s.map(func)
map_time = time.perf_counter() - start

start = time.perf_counter()
result2 = s.apply(func)
apply_time = time.perf_counter() - start

# Vectorized arithmetic (fastest)
start = time.perf_counter()
result3 = s * 2 + 1
vec_time = time.perf_counter() - start

print(f"map time: {map_time:.4f}s")
print(f"apply time: {apply_time:.4f}s")
print(f"vectorization time: {vec_time:.4f}s")
print("\nConclusion: Prefer vectorized operations for best performance")
```

* * *

## Practical Application: Data Transformation

```python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "age": [25, 30, 28, 35],
    "salary": [12000, 15000, 11000, 18000],
    "department": ["Tech", "Sales", "Tech", "Operations"]
})
print("Original data:")
print(df)
print()

# Use apply for row-level calculations
def calculate(row):
    """Calculate annual income, tax, and after-tax income for one row."""
    annual = row["salary"] * 12  # monthly salary to annual income
    tax = annual * 0.1 if annual > 120000 else annual * 0.05
    after_tax = annual - tax
    return pd.Series({
        "annual_income": annual,
        "tax": tax,
        "after_tax": after_tax
    })

result = df.apply(calculate, axis=1)
df_result = pd.concat([df, result], axis=1)
print("Calculation results:")
print(df_result)
```

* * *

## Choosing Between Them

| Method | Applicable Objects | Scenarios | Performance |
| --- | --- | --- | --- |
| `map` | Series | Element-wise conversion, dictionary mapping | Fast |
| `applymap` (`DataFrame.map` in pandas 2.1+) | DataFrame | Element-wise conversion (non-numeric columns) | Slow |
| `apply` | Series/DataFrame | Row/column aggregation, custom functions | Medium |

> Don't use `apply` or `map` when a vectorized operation (direct operators) can do the job; don't use `apply` when `map` is enough.
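As an illustration of that advice, the row-wise salary calculation can also be written with column arithmetic. This is a minimal sketch rather than the only way to do it: it reuses the same illustrative tax rule (10% when annual income exceeds 120000, otherwise 5%) and uses `np.where` for the condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
    "salary": [12000, 15000, 11000, 18000],
})

# Whole-column arithmetic instead of one Python call per row
annual = df["salary"] * 12

# Pick the tax rate per element: 10% above 120000, otherwise 5%
rate = np.where(annual > 120000, 0.10, 0.05)

df["annual_income"] = annual
df["tax"] = annual * rate
df["after_tax"] = annual - df["tax"]
print(df)
```

No Python function is called per row here, which is why this form tends to scale better than `apply` with `axis=1`.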