
Pandas Series Str Len

`Series.str.len()` is a Pandas method for calculating string length. In data processing we often need to know how long strings are, for example to validate input length, filter texts of a specific length, or analyze the distribution of text lengths. `str.len()` returns the number of characters in each string element.

**Word Definition**: `len` is an abbreviation for "length", meaning the number of characters in a string.

* * *

## Basic Syntax and Parameters

`str.len()` is a method of the Series string accessor, so you first need a Series containing strings and then call it through the `.str` accessor.

### Syntax Format

```python
Series.str.len()
```

### Parameter Description

* **Parameters**: None. The method takes no arguments and can be called directly.

### Function Description

* **Return Value**: A Series with the character count of each element. The dtype is `int64` when every element has a length; if any element yields NaN, the result is `float64`.
* **Effect**: Counts the characters in each string element of the Series, including spaces and special characters.
* **Note**: Missing values (NaN/None) and non-string scalars such as numbers return NaN instead of raising an error. Sequence or collection elements such as lists, tuples, and dictionaries return their length.

* * *

## Examples

The following examples go from simple to complex to cover the usage of `str.len()`.

### Example 1: Basic Usage - Calculating String Length

```python
import pandas as pd

# Create a Series with strings of different lengths
s = pd.Series(['hello', 'tutorial', 'python', 'ai', 'data'])

# Use str.len() to calculate the length of each string
result = s.str.len()

print("Original Series:")
print(s)
print("\nString Length:")
print(result)
```

**Output Result:**

```
Original Series:
0       hello
1    tutorial
2      python
3          ai
4        data
dtype: object

String Length:
0    5
1    8
2    6
3    2
4    4
dtype: int64
```

**Code Explanation:**

1. `s.str.len()` calculates the number of characters in each string.
2. 'hello' has 5 characters, so it returns 5.
3. 'tutorial' has 8 characters, so it returns 8.
4. An empty string `''` would return a length of 0.

### Example 2: String Length Including Spaces

`str.len()` counts all characters, including spaces.

```python
import pandas as pd

# Create a Series with spaces
s = pd.Series(['hello world', 'tutorial python', 'a b c', ' space '])

# Calculate string lengths
result = s.str.len()

print("Original Series:")
print(s)
print("\nString Length (including spaces):")
print(result)
```

**Output Result:**

```
Original Series:
0        hello world
1    tutorial python
2              a b c
3             space 
dtype: object

String Length (including spaces):
0    11
1    15
2     5
3     7
dtype: int64
```

**Code Explanation:**

* 'hello world' has 11 characters, including the space in the middle.
* 'tutorial python' has 15 characters: 8 + 1 space + 6.
* ' space ' has 7 characters: the five letters plus one space before and one after.
* Spaces count as characters just like letters.

### Example 3: Filtering Strings of Specific Length

`str.len()` is often combined with boolean indexing to filter data.

```python
import pandas as pd

# Create a product name Series
products = pd.Series(['iPhone', 'Samsung Galaxy', 'MacBook Pro', 'iPad', 'Dell XPS'])

# Filter products with names longer than 8 characters
long_names = products[products.str.len() > 8]

# Filter products with names of 5 characters or fewer
short_names = products[products.str.len() <= 5]

print("All Products:")
print(products)
print("\nProducts with names longer than 8:")
print(long_names)
print("\nProducts with names less than or equal to 5:")
print(short_names)
```

**Output Result:**

```
All Products:
0            iPhone
1    Samsung Galaxy
2       MacBook Pro
3              iPad
4          Dell XPS
dtype: object

Products with names longer than 8:
1    Samsung Galaxy
2       MacBook Pro
dtype: object

Products with names less than or equal to 5:
3    iPad
dtype: object
```

**Code Explanation:**

* `products.str.len() > 8` returns a Boolean Series indicating whether each length is greater than 8.
* Boolean indexing keeps only the product names that meet the condition.
* The name lengths are 6, 14, 11, 4 and 8, so 'Dell XPS' (exactly 8) is excluded by `> 8`, and 'iPhone' (6) is excluded by `<= 5`.
* This is a common operation in data filtering.

### Example 4: Statistical Distribution of Lengths

`str.len()` can be used for statistical analysis.
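Because the result of `str.len()` is an ordinary numeric Series, the usual summary tools apply to it. As a minimal sketch (the strings and the bin edges `[0, 5, 10]` are arbitrary, chosen only for illustration), `describe()` summarizes the lengths in one call, and `pd.cut()` combined with `value_counts()` shows how they are distributed across ranges:

```python
import pandas as pd

# Made-up strings, used only to illustrate the summary tools
texts = pd.Series(['a', 'abc', 'abcd', 'abcdefgh', 'abcdefghij'])
lengths = texts.str.len()  # 1, 3, 4, 8, 10

# describe() bundles count, mean, std, min, quartiles and max
summary = lengths.describe()
print(summary)

# Bucket the lengths into (0, 5] and (5, 10] to see the distribution
buckets = pd.cut(lengths, bins=[0, 5, 10])
print(buckets.value_counts().sort_index())
```

The example that follows computes the individual statistics one at a time instead.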
```python
import pandas as pd

# Create a Series of sentences
sentences = pd.Series([
    'Hello',
    'Hello world',
    'Pandas is powerful',
    'Data science with machine learning',
    'AI'
])

# Calculate the length of each sentence
lengths = sentences.str.len()

print("Sentence List:")
print(sentences)
print("\nCharacter Length of Each Sentence:")
print(lengths)
print("\nStatistical Information:")
print(f"Shortest Length: {lengths.min()}")
print(f"Longest Length: {lengths.max()}")
print(f"Average Length: {lengths.mean():.2f}")
print(f"Total Length: {lengths.sum()}")
```

**Output Result:**

```
Sentence List:
0                                 Hello
1                           Hello world
2                    Pandas is powerful
3    Data science with machine learning
4                                    AI
dtype: object

Character Length of Each Sentence:
0     5
1    11
2    18
3    34
4     2
dtype: int64

Statistical Information:
Shortest Length: 2
Longest Length: 34
Average Length: 14.00
Total Length: 70
```

**Code Explanation:**

* `lengths.min()` returns the shortest string length.
* `lengths.max()` returns the longest string length.
* `lengths.mean()` returns the average length.
* `lengths.sum()` returns the total number of characters.

### Example 5: Handling Mixed Data Types

For non-string elements such as numbers, `str.len()` returns NaN rather than raising an error; it does not measure their string representation.

```python
import pandas as pd
import numpy as np

# Create a Series with mixed types
s = pd.Series(['hello', 12345, 'tutorial', np.nan, 'py'])

# Calculate lengths
result = s.str.len()

print("Original Series:")
print(s)
print("\nString Length:")
print(result)
```

**Output Result:**

```
Original Series:
0       hello
1       12345
2    tutorial
3         NaN
4          py
dtype: object

String Length:
0    5.0
1    NaN
2    8.0
3    NaN
4    2.0
dtype: float64
```

**Code Explanation:**

* The integer 12345 is not a string, so `str.len()` returns NaN for it instead of the length of '12345'. To measure the length of its text form, convert the values with `astype(str)` first.
* NaN values also return NaN (no error is raised).
* The result dtype is float64 because NaN is a floating-point value.

### Example 6: Combining with split to Count Words

You can combine `str.len()` with `str.split()` to count words.
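This combination works because `.str.len()` is not limited to strings: according to the pandas documentation, it returns the length of any sequence or collection element, such as a list. A minimal sketch on a hand-built Series of lists (illustrative data only):

```python
import pandas as pd

# Hand-built Series whose elements are lists, like the output of str.split()
tokens = pd.Series([['hello', 'world'], ['pandas'], []])

# .str.len() returns the length of each list element: 2, 1, 0
counts = tokens.str.len()
print(counts)
```

`str.split()` produces exactly this kind of Series of lists, which is what the word-count example relies on.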
```python
import pandas as pd

# Create a Series with sentences
sentences = pd.Series([
    'hello world',
    'tutorial python tutorial',
    'pandas data analysis',
    'machine learning'
])

# Split first, then calculate length to get the word count
word_counts = sentences.str.split().str.len()

print("Sentence List:")
print(sentences)
print("\nWord Counts:")
print(word_counts)
```

**Output Result:**

```
Sentence List:
0                 hello world
1    tutorial python tutorial
2        pandas data analysis
3            machine learning
dtype: object

Word Counts:
0    2
1    3
2    3
3    2
dtype: int64
```

**Code Explanation:**

* `str.split()` splits each sentence on whitespace into a list of words.
* `.str.len()` then returns the length of each list, i.e. the number of words.
* This is handy for counting words in text.

* * *

## Notes

* `str.len()` counts characters, not bytes.
* Each Chinese character counts as one character.
* Whitespace such as spaces and tabs is included in the count.
* NaN values return NaN without raising an error, and the result dtype becomes float64.
* Non-string elements such as numbers also return NaN; they are not converted to strings. Use `astype(str)` first if you need the length of their text form.
* The method returns a new Series and does not modify the original data.
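Two of the notes above can be checked directly. The following is a minimal sketch with made-up data. Encoding to UTF-8 with `str.encode()` before calling `str.len()` gives byte counts instead of character counts. Numbers return NaN unless converted with `astype(str)`, and masking with `where(notna())` is one illustrative way (not the only one) to keep missing values as NaN after that conversion.

```python
import numpy as np
import pandas as pd

# Note 1: characters vs bytes (illustrative data)
words = pd.Series(['data', '数据'])
char_counts = words.str.len()                      # counts characters: 4, 2
byte_counts = words.str.encode('utf-8').str.len()  # counts UTF-8 bytes: 4, 6
print(pd.DataFrame({'text': words, 'chars': char_counts, 'bytes': byte_counts}))

# Note 2: non-string elements give NaN, not the length of their text form
mixed = pd.Series(['hello', 12345, np.nan])
print(mixed.str.len())                             # 5.0, NaN, NaN

# Convert explicitly to measure the printed form, then put NaN back
# (in many pandas versions astype(str) turns NaN into 'nan', length 3)
repr_lengths = mixed.astype(str).str.len().where(mixed.notna())
print(repr_lengths)                                # 5.0, 5.0, NaN
```

Each CJK character here takes 3 bytes in UTF-8, which is why the byte count of '数据' is three times its character count.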