Pandas Series.str.contains() Function
Series.str.contains() is a function in Pandas used to check whether strings contain a specified substring.
In data processing, we often need to filter and sort based on text content, such as finding records containing specific keywords or filtering texts that match certain patterns. The contains() function can check if each string element contains a specified substring or regular expression pattern.
The name says what the method does: "contains" means "to include", so it checks whether each string includes the specified content.
Basic Syntax and Parameters
str.contains() is a method of the Series string accessor, so you first need a Series containing strings and then call the method through .str.
Syntax Format
    Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

Parameter Description

| Parameter | Type | Required | Description | Default Value |
|---|---|---|---|---|
| pat | str | Required | The pattern to search for, either a plain string or a regular expression. | - |
| case | bool | Optional | Whether matching is case-sensitive. | True |
| flags | int | Optional | Flags from the re module, such as re.IGNORECASE, applied when regex=True. | 0 |
| na | object | Optional | The value to use in the result for missing elements. With the default, missing elements stay missing (NaN). | None |
| regex | bool | Optional | Whether to treat pat as a regular expression. If False, pat is matched as a literal string. | True |
Function Description
- Return Value: Returns a boolean Series indicating whether each element contains the specified pattern.
- Effect: Checks each string element in the Series and returns True or False for it; missing elements are handled according to the na parameter.
- Note: By default, pat is treated as a regular expression, so it can describe patterns more complex than a fixed substring.
Examples
Let's go through a series of examples, from simple to complex, to cover the main uses of str.contains().
Example 1: Basic Usage - Check if Substring is Included
Example

    import pandas as pd

    # Create a Series with text
    s = pd.Series(['apple', 'banana', 'grape', 'pineapple', 'orange'])

    # Check if each element contains 'ap'
    result = s.str.contains('ap')

    print("Original Series:")
    print(s)
    print("\nContains 'ap':")
    print(result)

Output Result:

    Original Series:
    0        apple
    1       banana
    2        grape
    3    pineapple
    4       orange
    dtype: object

    Contains 'ap':
    0     True
    1    False
    2     True
    3     True
    4    False
    dtype: bool

Code Explanation:

- s.str.contains('ap') checks if each string contains the substring 'ap'.
- 'apple' contains 'ap', returns True.
- 'banana' does not contain 'ap', returns False.
- 'grape' contains 'ap', returns True.
- 'pineapple' contains 'ap', returns True.
- 'orange' does not contain 'ap', returns False.
Example 2: Using Regular Expressions for Matching
contains() treats the pattern as a regular expression by default, so it can match more complex patterns.
Example

    import pandas as pd

    # Create a Series with text
    s = pd.Series(['hello123', 'world456', 'tutorial789', 'python', 'test123'])

    # Match strings made of letters followed by digits using a regex
    result = s.str.contains(r'^[a-zA-Z]+\d+$')

    print("Original Series:")
    print(s)
    print("\nPattern matching letters followed by digits:")
    print(result)

Output Result:

    Original Series:
    0       hello123
    1       world456
    2    tutorial789
    3         python
    4        test123
    dtype: object

    Pattern matching letters followed by digits:
    0     True
    1     True
    2     True
    3    False
    4     True
    dtype: bool

Code Explanation:

- r'^[a-zA-Z]+\d+$' is a regular expression that matches strings starting with letters and ending with digits.
- ^ indicates the start of the string, $ indicates the end of the string.
- [a-zA-Z]+ matches one or more letters, and \d+ matches one or more digits.
- 'python' does not contain digits, so it does not match.
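Because the pattern is a regular expression, several keywords can be checked in one call by joining them with the alternation operator |. The sketch below is only an illustration: the titles and the keyword list are made up, and re.escape is used on the assumption that the keywords should be matched literally, even when they contain regex metacharacters such as +.

```python
import re
import pandas as pd

# Hypothetical sample data for illustration
titles = pd.Series(["red apple", "green pear", "yellow banana", "C++ primer"])

# Hypothetical keyword list; re.escape keeps characters such as '+' literal
keywords = ["apple", "banana", "C++"]
pattern = "|".join(re.escape(k) for k in keywords)

# True where a title contains at least one of the keywords
matches = titles.str.contains(pattern)
print(titles[matches])
```

Without re.escape, a keyword such as 'C++' would be interpreted as regex syntax instead of plain text.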
Example 3: Case-Insensitive Matching
By setting case=False, you can achieve case-insensitive matching.
Example

    import pandas as pd

    # Create a Series with different cases of text
    s = pd.Series(['Apple', 'APPLE', 'apple', 'Banana', 'APPLE'])

    # Case-sensitive matching (default)
    result_case = s.str.contains('APPLE')

    # Case-insensitive matching
    result_nocase = s.str.contains('APPLE', case=False)

    print("Original Series:")
    print(s)
    print("\nCase-sensitive matching 'APPLE':")
    print(result_case)
    print("\nCase-insensitive matching 'APPLE':")
    print(result_nocase)

Output Result:

    Original Series:
    0     Apple
    1     APPLE
    2     apple
    3    Banana
    4     APPLE
    dtype: object

    Case-sensitive matching 'APPLE':
    0    False
    1     True
    2    False
    3    False
    4     True
    dtype: bool

    Case-insensitive matching 'APPLE':
    0     True
    1     True
    2     True
    3    False
    4     True
    dtype: bool

Code Explanation:

- case=True (default) is case-sensitive, so only elements containing exactly 'APPLE' match.
- case=False is case-insensitive, so 'Apple', 'APPLE', and 'apple' all match.
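When the pattern is a regular expression, the same effect can also be obtained through the flags parameter, which accepts flags from Python's re module. The following is a minimal sketch under that assumption, comparing the two approaches on made-up data.

```python
import re
import pandas as pd

s = pd.Series(["Apple", "APPLE", "apple", "Banana"])

# Case-insensitive matching through the case parameter
by_case = s.str.contains("apple", case=False)

# Case-insensitive matching through a regex flag
by_flag = s.str.contains("apple", flags=re.IGNORECASE)

# Both approaches should mark the same elements
print(by_case.tolist() == by_flag.tolist())
```

case=False is usually the simpler choice; flags is mainly of interest when other regex flags are needed as well.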
Example 4: Handling Missing Values
You can specify how to handle NaN values using the na parameter.
Example

    import pandas as pd
    import numpy as np

    # Create a Series with missing values
    s = pd.Series(['apple', 'banana', np.nan, 'grape', None])

    # Default handling of missing values (they stay missing)
    result_default = s.str.contains('ap')

    # Treat missing values as False
    result_na_false = s.str.contains('ap', na=False)

    print("Original Series:")
    print(s)
    print("\nDefault handling (returns NaN):")
    print(result_default)
    print("\nTreat NaN as False:")
    print(result_na_false)

Output Result:

    Original Series:
    0     apple
    1    banana
    2       NaN
    3     grape
    4      None
    dtype: object

    Default handling (returns NaN):
    0     True
    1    False
    2      NaN
    3     True
    4      NaN
    dtype: object

    Treat NaN as False:
    0     True
    1    False
    2    False
    3     True
    4    False
    dtype: bool

Code Explanation:

- By default, NaN and None are reported as missing rather than as True or False. On an object-dtype Series like this one, the result therefore has dtype object instead of bool, and the exact way missing entries are printed can vary between pandas versions.
- Setting na=False treats missing values as non-matching, so the result is a purely boolean Series.
- This is useful in actual data processing to avoid issues caused by NaN during filtering.
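The filtering issue mentioned above can be sketched as follows. On an object-dtype Series, a mask that still contains NaN may be rejected by boolean indexing, so passing na=False before filtering is a common way to avoid the problem. The data here is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(["apple", "banana", np.nan, "grape", None])

# Build a mask without missing values by treating them as non-matching
mask = s.str.contains("ap", na=False)

# The mask can now be used safely for boolean indexing
filtered = s[mask]
print(filtered)
```

Using na=True instead would keep the rows with missing text, which can be useful when they need to be reviewed rather than dropped.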
Example 5: Filtering Data
contains() is often combined with boolean indexing to filter data.
Example

    import pandas as pd

    # Create a simulated product data Series
    products = pd.Series([
        'iPhone 14 Pro',
        'Samsung Galaxy S23',
        'iPhone 13',
        'Google Pixel 7',
        'iPad Pro',
        'MacBook Air',
        'Dell XPS 15'
    ])

    # Filter products containing 'iPhone'
    iphone_products = products[products.str.contains('iPhone')]

    # Filter products not containing a lowercase 'i' (matching is case-sensitive)
    no_i_products = products[~products.str.contains('i')]

    print("All products:")
    print(products)
    print("\nProducts containing 'iPhone':")
    print(iphone_products)
    print("\nProducts not containing lowercase 'i':")
    print(no_i_products)

Output Result:

    All products:
    0         iPhone 14 Pro
    1    Samsung Galaxy S23
    2             iPhone 13
    3        Google Pixel 7
    4              iPad Pro
    5           MacBook Air
    6           Dell XPS 15
    dtype: object

    Products containing 'iPhone':
    0    iPhone 14 Pro
    2        iPhone 13
    dtype: object

    Products not containing lowercase 'i':
    1    Samsung Galaxy S23
    6           Dell XPS 15
    dtype: object

Code Explanation:

- products[products.str.contains('iPhone')] keeps only the products containing 'iPhone'.
- ~products.str.contains('i') uses ~ to negate the mask, keeping only the products that do not contain a lowercase 'i'. 'Google Pixel 7' and 'MacBook Air' are excluded because 'Pixel' and 'Air' contain an 'i'.
- This is a common operation in data filtering.
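The same pattern applies to a column of a DataFrame, which is how text filtering usually appears in practice. The sketch below uses a hypothetical table; the column names product and price and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical product table; names and prices are made up
df = pd.DataFrame({
    "product": ["iPhone 14 Pro", "Samsung Galaxy S23", "iPad Pro", "Dell XPS 15"],
    "price": [999, 799, 1099, 1499],
})

# Keep rows whose product name contains 'iPhone' or 'iPad'
apple_rows = df[df["product"].str.contains("iPhone|iPad")]
print(apple_rows)
```

The mask is built from one column but selects whole rows, so the other columns of the matching rows are kept as well.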
Notes
- str.contains() defaults to using regular expressions (regex=True).
- If you want to match a normal string (not as a regular expression), set regex=False.
- By default, it is case-sensitive; use case=False for case-insensitive matching.
- By default, NaN values return NaN; use the na parameter to specify the return value.
- This function returns a boolean Series, which can be directly used for boolean indexing to filter data.
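The difference between regex=True and regex=False matters as soon as the search text contains regex metacharacters. A minimal sketch with made-up version strings, where the dot in '1.5' is either a wildcard or a literal character:

```python
import pandas as pd

s = pd.Series(["v1.5", "v105", "v2.0"])

# As a regex, '.' matches any character, so 'v105' also matches
as_regex = s.str.contains("1.5")

# As a literal string, only an actual '1.5' matches
as_literal = s.str.contains("1.5", regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```

When the search text comes from user input or from data, regex=False is often the safer choice because no character is given special meaning.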