## Python `re.sub()` Method
The `re.sub()` function in Python's `re` (regular expression) module is used to **replace occurrences** of a pattern within a string.
It searches the target string for all substrings that match the specified regular expression pattern and replaces them with a replacement string or the output of a replacement function.
> **Etymology**: The `sub` in `re.sub` stands for **substitute**.
---
## Syntax and Parameters
### Syntax
```python
re.sub(pattern, repl, string, count=0, flags=0)
```
### Parameter Descriptions
* **`pattern`**: The regular expression pattern to search for.
* **`repl`**: The replacement. This can be either a string or a callable function.
* **`string`**: The original input string to search and modify.
* **`count`**: The maximum number of pattern occurrences to replace. The default value is `0`, which means all occurrences are replaced. Pass it by keyword (`count=1`); passing `count` and `flags` positionally is deprecated as of Python 3.13.
* **`flags`**: Optional regex flags (such as `re.IGNORECASE`, `re.MULTILINE`, etc.) to modify matching behavior.
### Return Value
* Returns a new string with the matched patterns replaced. If the pattern is not found, the original string is returned unchanged.
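The `flags` parameter and the "no match" behavior described above can be sketched as follows. The sample sentence is made up for illustration; `re.IGNORECASE` is the standard flag for case-insensitive matching.

```python
import re

text = 'Python is fun. PYTHON is fast. python is everywhere.'

# flags is passed by keyword so it cannot be mistaken for count
result = re.sub(r'python', 'Java', text, flags=re.IGNORECASE)
print(result)  # Java is fun. Java is fast. Java is everywhere.

# No match: the returned string is equal to the input
unchanged = re.sub(r'ruby', 'Java', text)
print(unchanged == text)  # True
```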
---
## Code Examples
### Example 1: Basic Replacement
This example demonstrates how to replace a plain text substring with another string.
```python
import re
text = "Python is good, Python is great"
# Replace 'Python' with 'Java'
result = re.sub(r'Python', 'Java', text)
print(result)
```
**Expected Output:**
```text
Java is good, Java is great
```
---
### Example 2: Limiting Replacement Count
By using the `count` parameter, you can control how many matches are replaced.
```python
import re
text = "Python is good, Python is great"
# Replace only the first occurrence of 'Python'
result = re.sub(r'Python', 'Java', text, count=1)
print(result)
```
**Expected Output:**
```text
Java is good, Python is great
```
---
### Example 3: Replacing with Regular Expressions
You can use regular expression patterns to match dynamic content, such as masking sensitive digits.
```python
import re
text = "My phone number is 138-1234-5678"
# Replace the middle 4 digits of the phone number with ****
result = re.sub(r'\d{4}', '****', text, count=1)
print(result)
```
**Expected Output:**
```text
My phone number is 138-****-5678
```
---
### Example 4: Using Backreferences (Group Referencing)
You can capture groups in your pattern and reference them in the replacement string by number (`\1`, `\2`, or the unambiguous form `\g<1>`) or by name (`\g<name>`).
```python
import re
text = "2024-04-02"
# Swap the order of Year, Month, and Day (YYYY-MM-DD to DD/MM/YYYY)
result = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)
print(result)
```
**Expected Output:**
```text
02/04/2024
```
---
### Example 5: Using a Function as the Replacement
If `repl` is a function, it is called for every non-overlapping occurrence of `pattern`. The function takes a single match object argument and must return a replacement string.
```python
import re
text = "3 apples, 5 bananas, 8 oranges"
# Increment each number found in the string by 1
def add_one(match):
    # match.group() retrieves the matched string
    return str(int(match.group()) + 1)
result = re.sub(r'\d+', add_one, text)
print(result)
```
**Expected Output:**
```text
4 apples, 6 bananas, 9 oranges
```
---
## Considerations and Best Practices
1. **Raw Strings (`r'...'`)**: Always use raw strings for regular expression patterns (e.g., `r'\d+'`) to prevent Python's string parser from misinterpreting backslashes as escape characters.
2. **Immutability**: Strings in Python are immutable. `re.sub()` does not modify the original string in place; instead, it returns a brand-new string.
3. **Performance**: If you need to perform the same substitution repeatedly in a loop, consider compiling the regular expression first using `re.compile()` and then calling the `sub()` method on the compiled pattern object:
```python
import re
text = "3 apples, 5 bananas, 8 oranges"
pattern = re.compile(r'\d+')
result = pattern.sub('NUMBER', text)
```
4. **`re.subn()` Alternative**: If you need to know how many substitutions were actually made, use `re.subn()`. It returns a tuple containing `(new_string, number_of_substitutions_made)`.
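As a short sketch of the last point, `subn()` is available both as `re.subn()` and as a method on a compiled pattern. The sample text and the placeholder `'N'` are chosen for illustration.

```python
import re

text = '3 apples, 5 bananas, 8 oranges'
pattern = re.compile(r'\d+')

# subn() returns a (new_string, number_of_substitutions_made) tuple
new_text, n = pattern.subn('N', text)
print(new_text)  # N apples, N bananas, N oranges
print(n)         # 3
```

A count of `0` is a simple way to detect that the pattern did not match at all.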