Regexp Assertions
## Regular Expressions - Assertions \n\n### \n\n---\n\n## Regular Expressions - Assertions\n\nAssertions are metacharacters in regular expressions used to specify the position of a match. They do not match any actual characters but instead match the positions between characters.\n\nSuppose we want to find all numbers that follow a price in a long text, rather than finding all numbers. A regular expression might match every number, but by using assertions, you can precisely specify: **I only want to match numbers that immediately follow a price**.\n\nThis article will guide you through the four core types of assertions in regular expressions: positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind.\n\n---\n\n### Characteristics of Assertions\n\n1. **Zero Width**: They do not occupy any character positions.\n2. **Condition Checking**: They only check whether certain conditions are met.\n3. **No Impact on Matching Results**: They serve solely as constraints for matching.\n\n---\n\n## Examples\n\n// Example: Match "foo" followed by "bar"\n\n```javascript\nconst regex = /foo(?=bar)/;\nconsole.log(regex.test("foobar")); // true\nconsole.log(regex.test("food")); // false\n\n* * *\n\n## Types of Assertions\n\nRegular expressions divide assertions into two main categories, further subdivided into four types:\n\n| Assertion Type | Regular Syntax | Alternate Name | Check Direction | Expected Condition | Plain Language Explanation |\n|----------------------|----------------|----------------|-------------------|--------------------|----------------------------|\n| **Positive Lookahead** | `(?=pattern)` | Positive Lookahead | Rightward (Forward) | **Presence** of `pattern` | I'm looking for a position where the **right side must** be... |\n| **Negative Lookahead** | `(?!pattern)` | Negative Lookahead | Rightward (Forward) | **Absence** of `pattern` | I'm looking for a position where the **right side must not** be... |\n| **Positive Lookbehind** | `(?<=pattern)` | Positive Lookbehind | Leftward (Backward) | **Presence** of `pattern` | I'm looking for a position where the **left side must** be... |\n| **Negative Lookbehind** | `(? {\n console.log(`Password '${pwd}' is valid: ${validatePassword(pwd)}`);\n});\n// Output:\n// Password 'Weak' is valid: false\n// Password 'strong123' is valid: false\n// Password 'STRONG123' is valid: false\n// Password 'Strong123' is valid: true\n\n#### Example: Python\n\n```python\nimport re\ndef validate_password(password):\n # Use multiple positive lookaheads to separately check each condition\n pattern = r'^(?=.*)(?=.*)(?=.*d).{8,}$'\n # ^ Start of string\n # (?=.*) Asserts that there must be at least one uppercase letter to the right\n # (?=.*) Similarly, asserts that there must be at least one lowercase letter to the right\n # (?=.*d) Asserts that there must be at least one digit to the right\n # .{8,} Main expression: matches any character at least 8 times (minimum length requirement)\n # $ End of string\n if re.match(pattern, password):\n return True\n else:\n return False\n# Test data\npasswords = ["Weak", "strong123", "STRONG123", "Strong123"]\nfor pwd in passwords:\n print(f"Password '{pwd}' is valid: {validate_password(pwd)}")\n# Output:\n# Password 'Weak' is valid: False # Insufficient length and missing digit\n# Password 'strong123' is valid: False # Missing uppercase letter\n# Password 'STRONG123' is valid: False # Missing lowercase letter\n# Password 'Strong123' is valid: True # Meets all requirements\n\n* * *\n\n## Negative Lookahead `(?!...)`\n\n**Negative lookahead** is the opposite of positive lookahead. It matches a position where, after it (to the right), a specified pattern `...` **must not** immediately follow.\n\n### Syntax and Parameters\n\n- **Syntax**: `(?!pattern)`\n- **Purpose**: Checks whether the current position's right side **does not match** `pattern`. If it doesn't match, the assertion succeeds.\n- **Key Feature**: Also zero-width.\n\n### Code Example: Finding Words Not Ending in "ing"\n\nIn a sentence, find all words that do not end with "ing".\n\n#### Example: JavaScript\n\n// Example 1: Negative lookahead\nconst text1 = "playing swimming run walk jumping sing";\nconst pattern1 = /bw+(?<!ing)b/g;\nconsole.log(text1.match(pattern1));\n// ['run', 'walk', 'sing']\n\n// Example 2: Match words not ending in "q" (more semantically correct)\nconst text2 = "I like faq apple Iraq you banana q";\nconst pattern2 = /bw*[^qW]b/g;\n// Meaning:\n// [^qW] The last character of the word is not "q"\n// This is more aligned with the real-world requirement of "not ending in q"\nconsole.log(text2.match(pattern2));\n// ['I', 'like', 'apple', 'you', 'banana']\n\n// Example 3: Match filenames not ending in ".js"\nconst files = "index.js app.ts config.json main.js readme.md";\nconst pattern3 = /bw+.(?!jsb)w+b/g;\n// Meaning: After the dot, it cannot be "js"\nconsole.log(files.match(pattern3));\n// ['app.ts', 'config.json', 'readme.md']\n* * *\n\n## Positive Lookbehind `(?<=...)`\n\n**Positive lookbehind** is used to match a position where, before it (to the left), a specified pattern `...` must immediately precede.\n\n_Note: Not all programming languages' regex engines support lookbehind. JavaScript fully supports it starting from ES2018, while Python's `re` module supports it._\n\n### Syntax and Parameters\n\n- **Syntax**: `(?<=pattern)`\n- **Purpose**: Checks whether the current position's left side matches `pattern`. If it does, the assertion succeeds.\n- **Key Feature**: Zero-width. The `pattern` must have a fixed length (cannot use quantifiers like `*` or `+`, which may vary in some implementations).\n\n### Code Example: Extracting Amounts After Currency Symbols\n\nExtract the numbers following dollar or pound symbols, excluding the symbols themselves.\n\n#### Example: JavaScript\n\n// Example 1: Match amounts after `$` or `Β£`\nconst text1 = "Price: $199, Β£89, Β₯1200";\nconst pattern1 =/(?<=$|Β£)d+/g;\n// Meaning:\n// (?<=$|Β£) The left side must be either `$` or `Β£`\n// d+ Matches the number itself\nconsole.log(text1.match(pattern1));\n// ['199', '89']\n\n// Example 2: Match numbers after a colon\nconst text2 = "port:8080 pid:1234 uid:1000";\nconst pattern2 =/(?<=:)d+/g;\n// The left side must be a colon\nconsole.log(text2.match(pattern2));\n// ['8080', '1234', '1000']\n\n// Example 3: Match minor version numbers within a major version number\nconst text3 = "v1.2 v2.15 v10.3";\nconst pattern3 =/(?<=vd+.)d+/g;\n// The left side must be "v" plus a number plus a period\nconsole.log(text3.match(pattern3));\n// ['2', '15', '3']\n\n// Example 4: Match usernames after "@" (excluding "@")\nconst text4 = "hello @alice and @bob_smith";\nconst pattern4 =/(?<=@)w*/g;\nconsole.log(text4.match(pattern4));\n// ['alice', 'bob_smith']\n\n// Example 5: Match chapter numbers in Chinese texts like "Chapter X"\nconst text5 = "Chapter 1 Chapter 12 Chapter 3";\nconst pattern5 =/(?<=Line)d+(?=Chapter)/g;\n// Both sides are constrained simultaneously, making it clearer\nconsole.log(text5.match(pattern5));\n// ['1', '12', '3']\n* * *\n\n## Negative Lookbehind `(?<!...)`\n\n**Negative lookbehind** is the reverse of positive lookbehind. It matches a position where, before it (to the left), a specified pattern `...` **must not** immediately precede.\n\n### Syntax and Parameters\n\n- **Syntax**: `(?<!pattern)`\n- **Purpose**: Checks whether the current position's left side **does not match** `pattern`.\n- **Key Feature**: Zero-width, typically requiring `pattern` to have a fixed length.\n\n### Code Example: Finding Non-Negative Integers\n\nIn a text, match all integers that are not negative (i.e., numbers without a leading `-` sign).\n\n#### Example: JavaScript\n\nconst text = "Today's temperature is -5 degrees, tomorrow it will rise to 3 degrees, the day after tomorrow the temperature is -1 degree, and the indoor temperature remains at 22 degrees.";\n// Negative lookbehind\n// Matches numbers, but requires no leading minus sign\nconst pattern =/(?<!-)bd+b/g;\nconst matches = text.match(pattern);\nconsole.log("Non-negative integers:", matches);\n// Output: ['3', '22']\n// -5 and -1 fail the assertion because they have a leading minus sign\n* * *\n\n## Comprehensive Applications and Flowchart\n\nTo better understand how these four types of assertions work, we can use the following flowchart to illustrate a complex matching process involving multiple assertions:\n\n!(#)\n\n**Flowchart Explanation**: This diagram shows how the regular expression engine processes assertions during matching. At its core, an **assertion is an independent checkpoint**. At the current position where the main expression attempts to match, the engine checks the context to the left or right based on the type of assertion. Only when all assertion conditions are satisfied can the match proceed further and "consume" characters; otherwise, the engine backtracks to try other possibilities or declares the match unsuccessful.\n\n- Positive lookahead\n- Negative lookahead\n- Positive lookbehind\n- Negative lookbehind\n- Comprehensive practical application\n- Interactive testing\n\nText: The price is 100 yuan.\n\nRegular Expression: `/d+(?=Yuan)/g`\n\nMatch numbers that are immediately followed by "yuan".\n\nMatch successful β ["100"]\n\nResult: The price is 100 yuan.\n\nText: Windows 10 and Linux systems.\n\nRegular Expression: `/w+(?=d+)/g`\n\nMatch words that are immediately followed by a number.\n\nMatch successful β ["windows1"]\n\nResult: Windows 10 and Linux systems.\n\nText: Hello world.\n\nRegular Expression: `/w+(?=s)/g`\n\nMatch words that are immediately followed by a space.\n\nMatch successful β ["hello"]\n\nResult: Hello world.
YouTip