Pandas Series Str Split
`Series.str.split()` is a pandas method that splits each string in a Series around a delimiter.

In data processing, we often need to break a string into parts, such as splitting a sentence into words or splitting comma-separated values into a list. `str.split()` does this for every element of a Series, based on a specified delimiter.

**Word Definition**: `split` means "to separate or divide", here dividing a string into multiple parts.

* * *

## Basic Syntax and Parameters

`str.split()` is a string accessor method of Series, so you first need a Series containing strings, and then call the method through the `.str` accessor.

### Syntax Format

    Series.str.split(pat=None, n=-1, expand=False)

### Parameter Description

| Parameter | Type | Required | Description | Default Value |
| --- | --- | --- | --- | --- |
| pat | str | Optional | Delimiter; can be a string or a regular expression. If omitted, strings are split on whitespace. | None (whitespace) |
| n | int | Optional | Maximum number of splits. -1 means no limit, so the string is split at every delimiter. | -1 |
| expand | bool | Optional | Whether to expand the result into a DataFrame. If False, a Series of lists is returned. | False |

### Function Description

* **Return Value**: By default, returns a Series of lists, where each list holds the substrings produced by splitting one string.
  When `expand=True`, it returns a DataFrame instead.
* **Effect**: Splits each string into multiple parts based on the specified delimiter.
* **Note**: Whitespace is the default delimiter, but a custom delimiter can be specified with `pat`.

* * *

## Examples

The following examples walk through `str.split()` from simple to more complex usage.

### Example 1: Basic Usage - Split by Whitespace

    import pandas as pd

    # Create a Series containing sentences
    s = pd.Series(['hello world', 'tutorial python', 'pandas data analysis'])

    # Split by whitespace (default behavior)
    result = s.str.split()

    print("Original Series:")
    print(s)
    print("\nResult after splitting:")
    print(result)

**Output Result:**

    Original Series:
    0             hello world
    1         tutorial python
    2    pandas data analysis
    dtype: object

    Result after splitting:
    0              [hello, world]
    1          [tutorial, python]
    2    [pandas, data, analysis]
    dtype: object

**Code Analysis:**

1. `s.str.split()` splits each string on whitespace (spaces, tabs, etc.) by default.
2. It returns a Series of lists, where each list contains the substrings of one row.
3. 'pandas data analysis' is split into three parts.

### Example 2: Split by Specified Delimiter

You can specify a custom delimiter via the `pat` parameter.

    import pandas as pd

    # Create a Series containing comma-separated values
    s = pd.Series(['apple,banana,orange', 'dog,cat,bird', 'red,green,blue'])

    # Split by comma
    result = s.str.split(',')

    print("Original Series:")
    print(s)
    print("\nResult after splitting:")
    print(result)

**Output Result:**

    Original Series:
    0    apple,banana,orange
    1           dog,cat,bird
    2         red,green,blue
    dtype: object

    Result after splitting:
    0    [apple, banana, orange]
    1           [dog, cat, bird]
    2         [red, green, blue]
    dtype: object

**Code Analysis:**

* `s.str.split(',')` splits each string at every comma.
* This is a common way to process comma-separated values stored in a single column.

### Example 3: Limit the Number of Splits

You can limit the number of splits using the `n` parameter.

    import pandas as pd

    s = pd.Series(['a,b,c,d,e', '1,2,3,4,5'])

    # At most 2 splits (up to 3 parts)
    result_2 = s.str.split(',', n=2)

    # At most 1 split (up to 2 parts)
    result_1 = s.str.split(',', n=1)

    print("Original Series:")
    print(s)
    print("\nAt most 2 splits:")
    print(result_2)
    print("\nAt most 1 split:")
    print(result_1)

**Output Result:**

    Original Series:
    0    a,b,c,d,e
    1    1,2,3,4,5
    dtype: object

    At most 2 splits:
    0    [a, b, c,d,e]
    1    [1, 2, 3,4,5]
    dtype: object

    At most 1 split:
    0    [a, b,c,d,e]
    1    [1, 2,3,4,5]
    dtype: object

**Code Analysis:**

* `n` is the maximum number of splits, not the number of parts. `n=2` splits at the first two delimiters and yields up to 3 parts, with the last part containing all the remaining content.
* `n=1` splits only at the first delimiter, resulting in 2 parts.

### Example 4: Expand into DataFrame

When `expand=True`, the split results are expanded into a DataFrame.

    import pandas as pd

    # Create a Series containing comma-separated values
    s = pd.Series(['apple,banana,orange', 'dog,cat,bird', 'red,green,blue'])

    # Expand into a DataFrame
    result = s.str.split(',', expand=True)

    print("Original Series:")
    print(s)
    print("\nExpand into a DataFrame:")
    print(result)
    print("\nDataFrame Type:", type(result))

**Output Result:**

    Original Series:
    0    apple,banana,orange
    1           dog,cat,bird
    2         red,green,blue
    dtype: object

    Expand into a DataFrame:
           0       1       2
    0  apple  banana  orange
    1    dog     cat    bird
    2    red   green    blue

    DataFrame Type: <class 'pandas.core.frame.DataFrame'>

**Code Analysis:**

* `expand=True` expands the split results into a DataFrame.
* Each column holds the substrings found at one position after the split.
* This is useful when each string has a fixed, record-like structure.

### Example 5: Split Using Regular Expressions

`split()` also supports regular expressions as delimiters.

    import pandas as pd

    # Create a Series whose strings use different delimiters
    s = pd.Series(['hello-world', 'tutorial_python', 'pandas#tutorial'])

    # Split by regular expression (matches any one of -, _, #)
    result = s.str.split(r'[-_#]')

    print("Original Series:")
    print(s)
    print("\nSplit by regular expression:")
    print(result)

**Output Result:**

    Original Series:
    0        hello-world
    1    tutorial_python
    2    pandas#tutorial
    dtype: object

    Split by regular expression:
    0        [hello, world]
    1    [tutorial, python]
    2    [pandas, tutorial]
    dtype: object

**Code Analysis:**

* `r'[-_#]'` is a regular expression that matches any one of the characters '-', '_', or '#'.
* A single pattern can therefore handle several different delimiters at once.

### Example 6: Processing Real-World Data

In practical applications, `split()` is often used in combination with other operations, such as renaming the resulting columns.

    import pandas as pd

    # Simulated log lines
    logs = pd.Series([
        '2024-01-01 10:30:45 ERROR Connection failed',
        '2024-01-01 10:31:12 INFO User logged in',
        '2024-01-01 10:32:00 WARNING Memory usage high'
    ])

    # Split on runs of whitespace, at most 3 times,
    # so the message text stays in a single column
    log_parts = logs.str.split(r'\s+', n=3, expand=True)

    print("Original log:")
    print(logs)
    print("\nDataFrame after splitting:")
    print(log_parts)

    # Rename columns
    log_parts.columns = ['date', 'time', 'level', 'message']
    print("\nResult after renaming:")
    print(log_parts)

**Output Result:**

    Original log:
    0      2024-01-01 10:30:45 ERROR Connection failed
    1          2024-01-01 10:31:12 INFO User logged in
    2    2024-01-01 10:32:00 WARNING Memory usage high
    dtype: object

    DataFrame after splitting:
                0         1        2                  3
    0  2024-01-01  10:30:45    ERROR  Connection failed
    1  2024-01-01  10:31:12     INFO     User logged in
    2  2024-01-01  10:32:00  WARNING  Memory usage high

    Result after renaming:
             date      time    level            message
    0  2024-01-01  10:30:45    ERROR  Connection failed
    1  2024-01-01  10:31:12     INFO     User logged in
    2  2024-01-01  10:32:00  WARNING  Memory usage high

**Code Analysis:**

* `r'\s+'` matches one or more whitespace characters. Because the date, the time, and the message are also separated by spaces, `n=3` stops after three splits so that the whole message lands in the last column.
* `expand=True` expands the split results into a DataFrame with four columns.
* After renaming the columns, each part can be accessed by name.

* * *

## Notes

* `str.split()` splits on whitespace by default.
* When `expand=False`, it returns a Series of lists.
* When `expand=True`, it returns a DataFrame whose number of columns depends on the longest split result.
* If a string does not contain the delimiter, the returned list has only one element (the original string).
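
The last two notes are easier to see on rows of uneven length. The following is a minimal sketch; the sample Series and the variable names `lists` and `frame` are made up for illustration and are not part of the examples above.

```python
import pandas as pd

# Hypothetical sample data: rows hold different numbers of comma-separated fields
s = pd.Series(['a,b,c', 'd,e', 'f'])

# expand=False: each row becomes a list; a string without the delimiter
# yields a one-element list holding the original string
lists = s.str.split(',')
print(lists.tolist())

# expand=True: the column count follows the longest split result,
# and shorter rows are padded with missing values
frame = s.str.split(',', expand=True)
print(frame.shape)

# Count the padded (missing) cells in each row
print(frame.isna().sum(axis=1).tolist())
```

Here the row `'f'` has no comma, so it stays a single-element list, while the expanded frame gets three columns because the longest row has three fields. The padded cells should display as `None` or `NaN` depending on the pandas version and dtype, which is why the sketch checks them with `isna()` instead of comparing against a specific value.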