

# Regular Expressions – Metacharacters

Metacharacters in regular expressions are characters with special meanings. They do not represent literal values but are used to control the matching pattern.

* * *

## Basic Metacharacters

`.` (Dot)

* Matches any single character except the newline character (`\n`)
* Example: `a.b` matches "aab", "a1b", "a b", etc.

`^` (Caret)

* Matches the start position of a string
* Example: `^abc` matches strings that start with "abc"

`$` (Dollar Sign)

* Matches the end position of a string
* Example: `xyz$` matches strings that end with "xyz"

`\` (Backslash)

* Escape character, makes the following character lose its special meaning
* Example: `\.` matches a literal dot instead of any character

* * *

## Character Class Metacharacters

`[]` (Square Brackets)

* Defines a character set, matches any one character within the set
* Example: `[aeiou]` matches any one vowel

`[^]` (Negated Character Class)

* Matches any character not in the square brackets
* Example: `[^0-9]` matches any non-digit character

`-` (Hyphen)

* Represents a range within a character class
* Example: `[a-z]` matches any lowercase letter

* * *

## Quantifier Metacharacters

`*` (Asterisk)

* Matches the preceding sub-expression zero or more times
* Example: `ab*c` matches "ac", "abc", "abbc", etc.

`+` (Plus Sign)

* Matches the preceding sub-expression one or more times
* Example: `ab+c` matches "abc", "abbc" but not "ac"

`?` (Question Mark)

* Matches the preceding sub-expression zero or one time
* Example: `colou?r` matches "color" and "colour"

`{n}` (Curly Braces)

* Matches exactly n times
* Example: `a{3}` matches "aaa"

`{n,}`

* Matches at least n times
* Example: `a{2,}` matches "aa", "aaa", etc.

`{n,m}`

* Matches between n and m times
* Example: `a{2,4}` matches "aa", "aaa", "aaaa"

* * *

## Grouping and Alternation Metacharacters

`()` (Parentheses)

* Defines a sub-expression or capturing group
* Example: `(ab)+` matches "ab", "abab", etc.

`|` (Vertical Bar)

* Represents an "OR" relationship
* Example: `cat|dog` matches "cat" or "dog"

* * *

## Special Character Class Metacharacters

`\d`

* Matches any digit, equivalent to `[0-9]`

`\D`

* Matches any non-digit, equivalent to `[^0-9]`

`\w`

* Matches any word character (letter, digit, underscore), equivalent to `[a-zA-Z0-9_]`

`\W`

* Matches any non-word character, equivalent to `[^a-zA-Z0-9_]`

`\s`

* Matches any whitespace character (space, tab, newline, etc.)

`\S`

* Matches any non-whitespace character

* * *

## Boundary Matching Metacharacters

`\b`

* Matches a word boundary
* Example: `\bcat\b` matches "cat" but not "category"

`\B`

* Matches a non-word boundary
* Example: `\Bcat\B` matches "cat" in "scattered" but not the standalone "cat"

* * *

## Other Metacharacters

`\n`

* Matches a newline character

`\t`

* Matches a tab character

`\r`

* Matches a carriage return character

`\f`

* Matches a form feed character

`\v`

* Matches a vertical tab character

* * *

## Greedy vs. Non-Greedy Quantifiers

By default, quantifiers (`*`, `+`, `?`, `{}`) are greedy and match as many characters as possible.
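
The default greedy behavior can be illustrated with a minimal JavaScript sketch. The sample strings and variable names here are assumptions chosen for illustration, not part of the original tutorial.

```javascript
// Hypothetical sample input, chosen only for illustration.
const sample = "<b>bold</b>";

// Greedy: .+ consumes as much as it can, then backtracks just far enough
// for the final ">" to match, so the match spans the whole string.
const greedyMatch = sample.match(/<.+>/)[0];
console.log(greedyMatch); // "<b>bold</b>"

// A bounded quantifier is greedy too: {2,4} takes four "a"s when available.
const boundedMatch = "aaaaa".match(/a{2,4}/)[0];
console.log(boundedMatch); // "aaaa"
```

In both cases the engine prefers the longest repetition that still lets the rest of the pattern succeed.
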
Adding `?` after a quantifier makes it non-greedy (lazy):

* `*?`: Zero or more times, but as few as possible
* `+?`: One or more times, but as few as possible
* `??`: Zero or one time, but as few as possible
* `{n,m}?`: Between n and m times, but as few as possible

Example: `<.*?>` matches HTML tags without crossing tag boundaries

* * *

## Lookahead and Lookbehind

`(?=...)` (Positive Lookahead)

* Matches a position followed by a specific pattern
* Example: `Windows(?=95|98)` matches "Windows" followed by 95 or 98

`(?!...)` (Negative Lookahead)

* Matches a position not followed by a specific pattern
* Example: `Windows(?!95|98)` matches "Windows" not followed by 95 or 98

`(?<=...)` (Positive Lookbehind)

* Matches a position preceded by a specific pattern
* Example: `(?<=95|98)Windows` matches "Windows" preceded by 95 or 98

`(?<!...)` (Negative Lookbehind)

* Matches a position not preceded by a specific pattern
* Example: `(?<!95|98)Windows` matches "Windows" not preceded by 95 or 98

### Example

Next, we analyze a regular expression for matching email addresses:

    // "example.com" is a placeholder domain used for illustration.
    var str = "abcd test@example.com 1234";
    // Local part, "@", domain name, then a top-level domain of 2 to 6 letters.
    var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
    document.write(str.match(patt1));

The matched text is: test@example.com

The following table contains a complete list of metacharacters and their behavior in regular expression context:

| Character | Description |
| --- | --- |
| `\` | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, `n` matches the character "n", while `\n` matches a newline character. The sequence `\\` matches `\` and `\(` matches "(". |
| `^` | Matches the start position of the input string. If the RegExp object's Multiline property is set, `^` also matches the position after `\n` or `\r`. |
| `$` | Matches the end position of the input string. If the RegExp object's Multiline property is set, `$` also matches the position before `\n` or `\r`. |
| `*` | Matches the preceding sub-expression zero or more times. For example, `zo*` matches "z" and "zoo". `*` is equivalent to `{0,}`. |
| `+` | Matches the preceding sub-expression one or more times. For example, `zo+` matches "zo" and "zoo", but not "z". `+` is equivalent to `{1,}`. |
| `?` | Matches the preceding sub-expression zero or one time. For example, `do(es)?` matches "do" or "does". `?` is equivalent to `{0,1}`. |
| `{n}` | n is a non-negative integer. Matches exactly n times. For example, `o{2}` does not match the "o" in "Bob", but matches the two o's in "food". |
| `{n,}` | n is a non-negative integer. Matches at least n times. For example, `o{2,}` does not match the "o" in "Bob", but matches all o's in "foooood". `o{1,}` is equivalent to `o+`. `o{0,}` is equivalent to `o*`. |
| `{n,m}` | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, `o{1,3}` matches the first three o's in "fooooood". `o{0,1}` is equivalent to `o?`. Note that there cannot be a space between the comma and the two numbers. |
| `?` | When this character follows any other quantifier (`*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`), the match is non-greedy. Non-greedy mode matches as few characters as possible, while the default greedy mode matches as many as possible. For example, for the string "oooo", `o+?` matches a single "o", while `o+` matches all o's. |
| `.` | Matches any single character except newline characters (`\n`, `\r`). To match any character including `\n`, use a pattern like `(.\|\n)`. |
| `(pattern)` | Matches pattern and captures the match. The captured match can be accessed from the resulting Matches collection, using the SubMatches collection in VBScript and the $0…$9 properties in JScript. To match parentheses characters, use `\(` or `\)`. |
| `(?:pattern)` | Matches pattern but does not capture the match, i.e., it is a non-capturing match and is not stored for later use. This is useful when combining parts of a pattern with the "or" character (`\|`). For example, `industr(?:y\|ies)` is a more concise expression than `industry\|industries`. |
| `(?=pattern)` | Positive lookahead, matches the search string at any point where a string matching pattern begins. This is a non-capturing match, i.e., the match is not stored for later use. For example, `Windows(?=95\|98\|NT\|2000)` matches "Windows" in "Windows2000", but not in "Windows3.1". Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead. |
| `(?!pattern)` | Negative lookahead, matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, i.e., the match is not stored for later use. For example, `Windows(?!95\|98\|NT\|2000)` matches "Windows" in "Windows3.1", but not in "Windows2000". Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead. |
| `(?<=pattern)` | Positive lookbehind, similar to positive lookahead but in the opposite direction. For example, `(?<=95\|98\|NT\|2000)Windows` matches "Windows" in "2000Windows", but not in "3.1Windows". |
| `(?<!pattern)` | Negative lookbehind, similar to negative lookahead but in the opposite direction. For example, `(?<!95\|98\|NT\|2000)Windows` matches "Windows" in "3.1Windows", but not in "2000Windows". |
| `x\|y` | Matches x or y. For example, `z\|food` matches "z" or "food". `(z\|f)ood` matches "zood" or "food". |
| `[xyz]` | Character set. Matches any one of the contained characters. For example, `[abc]` matches "a" in "plain". |
| `[^xyz]` | Negated character set. Matches any character not contained. For example, `[^abc]` matches "p", "l", "i", "n" in "plain". |
| `[a-z]` | Character range. Matches any character in the specified range. For example, `[a-z]` matches any lowercase letter from "a" to "z". |
| `[^a-z]` | Negated character range. Matches any character not in the specified range. For example, `[^a-z]` matches any character not between "a" and "z". |
| `\b` | Matches a word boundary, i.e., the position between a word and a space. For example, `er\b` matches "er" in "never", but not in "verb". |
| `\B` | Matches a non-word boundary. `er\B` matches "er" in "verb", but not in "never". |
| `\cx` | Matches the control character indicated by x. For example, `\cM` matches a Control-M or carriage return. The value of x must be A-Z or a-z. Otherwise, c is treated as a literal "c" character. |
| `\d` | Matches a digit character. Equivalent to `[0-9]`. |
| `\D` | Matches a non-digit character. Equivalent to `[^0-9]`. |
| `\f` | Matches a form feed character. Equivalent to `\x0c` and `\cL`. |
| `\n` | Matches a newline character. Equivalent to `\x0a` and `\cJ`. |
| `\r` | Matches a carriage return character. Equivalent to `\x0d` and `\cM`. |
| `\s` | Matches any whitespace character, including space, tab, form feed, etc. Equivalent to `[ \f\n\r\t\v]`. |
| `\S` | Matches any non-whitespace character. Equivalent to `[^ \f\n\r\t\v]`. |
| `\t` | Matches a tab character. Equivalent to `\x09` and `\cI`. |
| `\v` | Matches a vertical tab character. Equivalent to `\x0b` and `\cK`. |
| `\w` | Matches a word character (letter, digit, underscore). Equivalent to `[A-Za-z0-9_]`. |
| `\W` | Matches a non-word character. Equivalent to `[^A-Za-z0-9_]`. |
| `\xn` | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, `\x41` matches "A". `\x041` is equivalent to `\x04` followed by "1". ASCII encoding can be used in regular expressions. |
| `\num` | Matches num, where num is a positive integer. A backreference to the captured match. For example, `(.)\1` matches two consecutive identical characters. |
| `\n` | Identifies an octal escape value or a backreference. If `\n` is preceded by at least n captured sub-expressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value. |
| `\nm` | Identifies an octal escape value or a backreference. If `\nm` is preceded by at least nm captured sub-expressions, nm is a backreference. If `\nm` is preceded by at least n captures, n is a backreference followed by the literal m. If none of the previous conditions are met, and n and m are octal digits (0-7), `\nm` matches the octal escape value nm. |
| `\nml` | Matches the octal escape value nml if n is an octal digit (0-3), and m and l are octal digits (0-7). |
| `\un` | Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, `\u00A9` matches the copyright symbol (©). |
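
Several of the table entries above can be tried directly in JavaScript. The sketch below is a minimal illustration that reuses the table's own patterns. The word list, the "bookkeeper" sample, and all variable names are assumptions introduced for the example, and lookbehind is assumed to be available (it needs an ES2018-capable engine).

```javascript
// Non-capturing group: alternation without storing the group.
const industry = /industr(?:y|ies)/;
const industryHits = ["industry", "industries", "industrial"]
  .filter((word) => industry.test(word));
console.log(industryHits); // ["industry", "industries"]

// Positive lookahead: "Windows" matches only when a listed version follows;
// the version itself is not part of the match.
const lookahead = /Windows(?=95|98|NT|2000)/;
const aheadMatch = "Windows2000".match(lookahead);
const aheadMiss = "Windows3.1".match(lookahead);
console.log(aheadMatch[0]); // "Windows"
console.log(aheadMiss);     // null

// Positive lookbehind: "Windows" matches only when a listed version precedes it.
const lookbehind = /(?<=95|98|NT|2000)Windows/;
const behindHit = lookbehind.test("2000Windows");
const behindMiss = lookbehind.test("3.1Windows");
console.log(behindHit, behindMiss); // true false

// Backreference: \1 repeats whatever group 1 captured,
// so (.)\1 finds two consecutive identical characters.
const doubled = "bookkeeper".match(/(.)\1/g);
console.log(doubled); // ["oo", "kk", "ee"]
```

Because lookahead and lookbehind are zero-width, the matched text in both cases is just "Windows"; the version number only constrains where the match may occur.
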