Regular Expressions - Examples
The simplest form of a regular expression is a single literal character that matches itself in the search string.
For example, a single-character pattern such as a always matches the letter a wherever it appears in the search string. Several single characters can be combined into a longer literal pattern:
/abc/
This regular expression matches the string "abc" wherever it appears in the text. It succeeds only if the complete sequence "abc" is present.
Please note that this is a very simple example because the string "abc" itself contains no special metacharacters or patterns. Therefore, you can use it directly as a regular expression without needing additional escaping or quantifiers.
Wildcard .
The dot . matches any single character except the newline characters \n and \r. The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:
/a.c/
To match a string that contains a filename where the dot . is part of the input string, place a backslash character before the dot in the regular expression. For example, the following regular expression matches filename.ext:
/filename\.ext/
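The difference between the wildcard dot and an escaped dot can be checked with a short script. This is a minimal sketch that assumes a JavaScript runtime with console.log (for example Node.js); the variable names are illustrative only.

```javascript
// Unescaped dot: matches any single character except a line terminator.
const anyChar = /a.c/;
console.log(anyChar.test("abc"));   // true
console.log(anyChar.test("a#c"));   // true
console.log(anyChar.test("a\nc"));  // false: the dot does not match a newline
console.log(anyChar.test("ac"));    // false: one character is required between a and c

// Escaped dot: matches only a literal period.
const literalDot = /filename\.ext/;
console.log(literalDot.test("filename.ext"));      // true
console.log(literalDot.test("filenameXext"));      // false
console.log(/filename.ext/.test("filenameXext"));  // true: the unescaped dot accepts X
```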
Quantifier *
Matches the preceding element zero or more times:
/ab*c/
This expression can match strings like "ac", "abc", "abbc", "abbbc", and so on.
Quantifier +
Matches the preceding element one or more times:
/ab+c/
This expression can match strings like "abc", "abbc", "abbbc", and so on, but not "ac".
Quantifier ?
Matches the preceding element zero or one time:
/colou?r/
This expression can match "color" or "colour".
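The three quantifiers can be compared side by side. The sketch below assumes a JavaScript runtime with console.log; the anchors ^ and $ are an addition so that each test covers the whole string, and the variable names are illustrative.

```javascript
// Anchors (^ and $) are added here so that each test covers the whole string.
const star = /^ab*c$/;         // a, then zero or more b, then c
const plus = /^ab+c$/;         // a, then one or more b, then c
const optional = /^colou?r$/;  // the u may appear zero or one time

for (const s of ["ac", "abc", "abbbc"]) {
  console.log(s, star.test(s), plus.test(s));
}
// ac true false
// abc true true
// abbbc true true

console.log(optional.test("color"), optional.test("colour"), optional.test("colouur"));
// true true false
```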
These are some common examples of single-character pattern matching, which can be used for flexible matching and searching of strings with different patterns.
Please note that the syntax of regular expressions may vary between programming languages and tools, so please refer to the relevant documentation when using them in practice.
A Reasonable Username Regular Expression
Usernames can contain the following types of characters:
- The 26 English letters in uppercase and lowercase, represented as a-zA-Z.
- Digits, represented as 0-9.
- The underscore, represented as _.
- The hyphen, represented as -.
A username consists of one or more letters, digits, underscores, and hyphens, so the + quantifier is needed to indicate 1 or more occurrences.
Based on these conditions, the username expression can be:
[a-zA-Z0-9_-]+
Example
var str = "abc123-_def";
// Letters, digits, underscores, and hyphens, one or more times
var patt = /[a-zA-Z0-9_-]+/;
document.write(str.match(patt));
The following marked text is the matched expression:
abc123-_def
If the hyphen is not needed, it becomes:
[a-zA-Z0-9_]+
Example
var str = "abc123def";
var str2 = "abc123_def";
// Letters, digits, and underscores, one or more times
var patt = /[a-zA-Z0-9_]+/;
document.write(str.match(patt));
document.write(str2.match(patt));
The following marked text is the matched expression:
abc123def
abc123_def
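The examples above find the first run of allowed characters inside a string. To check that an entire string is a valid username, one option is to anchor the pattern with ^ and $. The following is a sketch under that assumption; isValidUsername is a hypothetical helper name, and console.log is used in place of document.write so that it also runs outside a browser.

```javascript
// Anchored version: the whole string must consist of allowed characters.
const USERNAME = /^[a-zA-Z0-9_-]+$/;

// Hypothetical helper, not part of any library.
function isValidUsername(name) {
  return USERNAME.test(name);
}

console.log(isValidUsername("abc123-_def"));  // true
console.log(isValidUsername("abc 123"));      // false: a space is not allowed
console.log(isValidUsername(""));             // false: + requires at least one character

// Without anchors, match() returns only the first allowed run.
console.log("abc 123".match(/[a-zA-Z0-9_-]+/)[0]);  // abc
```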
The following regular expression is used to match iframe tags:
/<iframe(([\s\S])*?)<\/iframe>/
Matching other tags can be done by replacing iframe.
Match a div tag with id="mydiv":
/<div id="mydiv"(([\s\S])*?)<\/div>/
Match all img tags:
Example
/<img.*?src="(.*?)".*?\/?>/gi
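As an illustration of how these tag patterns behave, the sketch below collects the src values of img tags and matches an iframe block that spans several lines. The sample markup and variable names are made up for the example. Patterns like these work on simple, predictable markup; for arbitrary HTML a real parser is usually more reliable.

```javascript
// Sample markup (invented for this example).
const html = '<p><img src="a.png" alt="A"><IMG class="x" src="b.jpg" /></p>';

// g finds every img tag, i ignores case; group 1 captures the src value.
const imgPattern = /<img.*?src="(.*?)".*?\/?>/gi;
const sources = Array.from(html.matchAll(imgPattern), (m) => m[1]);
console.log(sources);  // [ 'a.png', 'b.jpg' ]

// [\s\S] matches any character including newlines, so the tag body may span lines.
const page = 'before <iframe src="x.html">\nfallback text\n</iframe> after';
const iframePattern = /<iframe(([\s\S])*?)<\/iframe>/;
const found = page.match(iframePattern);
console.log(found[0] === '<iframe src="x.html">\nfallback text\n</iframe>');  // true
```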
To create a list of matching character groups, place one or more individual characters inside square brackets [ and ]. When characters are enclosed in brackets, the list is called a "bracket expression." As in any other location, ordinary characters inside brackets represent themselves; that is, they match themselves once in the input text. Most special characters lose their meaning when they appear inside a bracket expression. However, there are some exceptions:
- If the ] character is not the first item, it ends the list. To match the ] character in the list, place it first, immediately after the opening [. (JavaScript does not follow this convention; there, escape it as \] instead.)
- The \ character continues to function as an escape character. To match the \ character, use \\.
Characters enclosed in a bracket expression match only a single character at that position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:
/Chapter [12345]/
Please note that the position of the word Chapter and the following space is fixed relative to the characters inside the brackets. The bracket expression specifies only the set of characters that match the single character position immediately following the word Chapter and the space. This is the ninth character position.
To use a range instead of individual characters to represent a matching character group, use a hyphen - to separate the starting character and the ending character of the range. The character values of the individual characters determine the relative order within the range. The following regular expression includes a range expression, which is equivalent to the list shown in the brackets above.
/Chapter [1-5]/
When specifying a range in this way, both the starting value and the ending value are included in the range. Note that it is also important that, in Unicode sort order, the starting value must come before the ending value.
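A quick check that the explicit list and the range behave the same way. This is a minimal sketch assuming a JavaScript runtime with console.log; the sample titles are invented.

```javascript
const list = /Chapter [12345]/;  // explicit list of digits
const range = /Chapter [1-5]/;   // equivalent range

for (const title of ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter  1"]) {
  console.log(JSON.stringify(title), list.test(title), range.test(title));
}
// "Chapter 1" true true
// "Chapter 5" true true
// "Chapter 6" false false
// "Chapter  1" false false   (the ninth character is a second space, not a digit)
```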
To include a hyphen in a bracket expression, use one of the following methods:
- Escape it with a backslash:
[\-]
- Place the hyphen at the beginning or end of the bracket list. The following expressions match all lowercase letters and hyphens:
[-a-z]
[a-z-]
- Create a range where the starting character value is less than the hyphen, and the ending character value is equal to or greater than the hyphen. The following two regular expressions both meet this requirement:
[!--]
[!-~]
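The three ways of including a hyphen can be tried out as follows. This is a sketch; the anchors and the + quantifier are additions so that each test covers a whole sample string, and the variable names are illustrative.

```javascript
const escaped = /^[a-z\-]+$/;   // hyphen escaped with a backslash
const leading = /^[-a-z]+$/;    // hyphen placed first
const trailing = /^[a-z-]+$/;   // hyphen placed last
const printable = /^[!-~]+$/;   // range from ! to ~, which includes the hyphen

for (const re of [escaped, leading, trailing]) {
  console.log(re.test("well-known"), re.test("Well-known"));
}
// true false   (printed three times: the uppercase W is outside a-z)

console.log(printable.test("Well-known"), printable.test("well known"));
// true false   (a space sorts before !, so it is outside the range)
```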
To find all characters not in the list or range, place a caret ^ at the beginning of the list. If the caret character appears anywhere else in the list, it matches itself. The following regular expression matches any digit and character other than 1, 2, 3, 4, or 5:
/Chapter [^12345]/
In the example above, the expression matches any digit and character other than 1, 2, 3, 4, or 5 at the ninth position. Thus, for example, Chapter 7 is a match, and Chapter 9 is also a match.
The expression above can be represented using a hyphen -:
/Chapter [^1-5]/
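The behaviour of the negated list can be verified with a few inputs. This is a minimal sketch assuming a JavaScript runtime with console.log; the sample strings are invented.

```javascript
const notOneToFive = /Chapter [^1-5]/;

console.log(notOneToFive.test("Chapter 7"));  // true
console.log(notOneToFive.test("Chapter 3"));  // false
console.log(notOneToFive.test("Chapter X"));  // true: any character other than 1-5 qualifies
console.log(notOneToFive.test("Chapter "));   // false: some character must be present

// A caret that is not first in the list is an ordinary character.
console.log(/[a^b]/.test("^"));  // true
```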
A typical use of bracket expressions is to specify a match for any uppercase or lowercase letter or any digit. The following expression specifies such a match:
/[A-Za-z0-9]/
Alternation uses the | character to allow a choice between two or more alternative options. For example, the chapter title regular expression can be expanded to return matches broader than the chapter title range. However, this is not as simple as you might think. Alternation matches the largest expression on either side of the | character.
You might think that the following expression matches Chapter or Section at the beginning and end of a line, followed by one or two digits:
/^Chapter|Section [1-9][0-9]{0,1}$/
Unfortunately, the regular expression above either matches the word Chapter at the beginning of the line or matches the word Section at the end of the line followed by any digits. If the input string is Chapter 22, the expression above only matches the word Chapter. If the input string is Section 22, the expression matches Section 22.
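The pitfall is easy to reproduce. In this sketch the pattern is written with the digit classes spelled out as [1-9][0-9]{0,1}, meaning one or two digits; the sample strings are invented.

```javascript
// The alternation splits the whole pattern into two halves:
//   ^Chapter   OR   Section [1-9][0-9]{0,1}$
const loose = /^Chapter|Section [1-9][0-9]{0,1}$/;

console.log("Chapter 22".match(loose)[0]);            // Chapter
console.log("Section 22".match(loose)[0]);            // Section 22
console.log(loose.test("Chapter headings are fun"));  // true: only ^Chapter is required
console.log(loose.test("See Section 3"));             // true: Section need not start the line
```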
To make the regular expression easier to control, you can use parentheses to limit the scope of the alternation, that is, ensure it applies only to the two words Chapter and Section. However, parentheses are also used to create subexpressions and may capture them for later use, as discussed in the section on backreferences. By adding parentheses at the appropriate position in the regular expression above, the expression can match Chapter 1 or Section 3.
The following regular expression uses parentheses to group Chapter and Section so that the expression works correctly:
/^(Chapter|Section) [1-9][0-9]{0,1}$/
Although these expressions work normally, the parentheses around Chapter|Section will also capture either of the two matching words for later use. Since there is only one set of parentheses in the expression above, there is only one captured "submatch."
In the example above, you only need to use parentheses to group the choice between the words Chapter and Section. To prevent the match from being saved for future use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
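The difference between a capturing and a non-capturing group shows up in the match result. This is a sketch assuming a JavaScript runtime with console.log; the pattern spells out the digit classes as [1-9][0-9]{0,1}, and the variable names are illustrative.

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

// The result array holds the full match followed by one entry per capturing group.
const a = "Section 3".match(capturing);
console.log(a[0], "|", a[1], "|", a.length);  // Section 3 | Section | 2

const b = "Section 3".match(nonCapturing);
console.log(b[0], "|", b.length);             // Section 3 | 1

// Both versions accept and reject the same strings.
console.log(capturing.test("Chapter 22"), nonCapturing.test("Chapter 22"));    // true true
console.log(capturing.test("Chapter 100"), nonCapturing.test("Chapter 100"));  // false false
```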
In addition to the ?: metacharacter, two other non-capturing metacharacters create what are called "lookahead" matches. Positive lookahead is specified using ?=, which matches the search string at the starting point of the match of the regular expression pattern in the parentheses. Negative lookahead is specified using ?!, which matches the search string at the starting point of a string that does not match the regular expression pattern.
For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Furthermore, suppose you need to update the document to change all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression (an example of positive lookahead) matches the word Windows only where it is followed by 95, 98, or NT; the lookahead includes the space that separates the two words:
/Windows(?= 95 | 98 | NT )/
After finding a match, the search for the next match continues immediately after the matched text (excluding the characters in the lookahead). For example, if the expression above matches Windows 98, the search will continue after Windows rather than after 98.
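Positive and negative lookahead can be compared on the same text. In this sketch the lookahead includes the space after Windows, which assumes the version is written as a separate word; the sample sentence and variable names are invented.

```javascript
const text = "Windows 3.1, Windows 95 and Windows NT are mentioned here.";

// Positive lookahead: Windows only where 95, 98, or NT follows.
const positive = /Windows(?= 95 | 98 | NT )/g;
console.log(text.match(positive));  // [ 'Windows', 'Windows' ]
console.log(Array.from(text.matchAll(positive), (m) => m.index));  // [ 13, 28 ]

// Negative lookahead: Windows only where none of those versions follows.
const negative = /Windows(?! 95 | 98 | NT )/g;
console.log(Array.from(text.matchAll(negative), (m) => m.index));  // [ 0 ]
```

The matched text is only Windows in each case: the characters inside the lookahead are checked but not consumed.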
Below are some regular expression examples:
| Regular Expression | Description |
|---|---|
| `/\b([a-z]+) \1\b/gi` | A word repeated consecutively. |
| `/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/` | Matches a URL parsed into protocol, domain, port, and relative path. |
| `/^(?:Chapter\|Section) [1-9][0-9]{0,1}$/` | Locates the position of chapters. |
| `/[-a-z]/` | The 26 letters from a to z plus a hyphen. |
| `/ter\b/` | Can match chapter, but not terminal. |
| `/\Bapt/` | Can match chapter, but not aptitude. |
| `/Windows(?=95 \|98 \|NT )/` | Can match Windows95 or Windows98 or WindowsNT. After finding a match, the next search starts after Windows. |
| `/^\s*$/` | Matches an empty line. |
| `/\d{2}-\d{5}/` | Validates an ID number consisting of two digits, a hyphen, and five digits. |
| `<[a-zA-Z]+.*?>([\s\S]*?)</[a-zA-Z]*?>` | Matches HTML tags. |
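A few of these patterns can be exercised directly. This is a sketch assuming a JavaScript runtime with console.log; the sample sentence and the example.com URL are placeholders chosen for the illustration.

```javascript
// Repeated word: \1 refers back to whatever group 1 captured.
const repeated = /\b([a-z]+) \1\b/gi;
console.log("Is is the cost of of gasoline going up up?".match(repeated));
// [ 'Is is', 'of of', 'up up' ]

// URL parts: protocol, domain, optional port, path.
const urlPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
const [, protocol, domain, port, path] =
  "http://www.example.com:80/html/html-tutorial.html".match(urlPattern);
console.log(protocol, domain, port, path);
// http www.example.com :80 /html/html-tutorial.html

// Word boundaries: \b is a boundary, \B is a non-boundary.
console.log(/ter\b/.test("chapter"), /ter\b/.test("terminal"));  // true false
console.log(/\Bapt/.test("chapter"), /\Bapt/.test("aptitude"));  // true false
```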
| Regular Expression | Description |
|---|---|
| `hello` | Matches {hello} |
| `gray\|grey` | Matches {gray, grey} |
| `gr(a\|e)y` | Matches {gray, grey} |
| `gr[ae]y` | Matches {gray, grey} |
| `b[aeiou]bble` | Matches {babble, bebble, bibble, bobble, bubble} |
| `[b-chm-pP]at\|ot` | Matches {bat, cat, hat, mat, nat, oat, pat, Pat, ot} |
| `colou?r` | Matches {color, colour} |
| `rege(x(es)?\|xps?)` | Matches {regex, regexes, regexp, regexps} |
| `go*gle` | Matches {ggle, gogle, google, gooogle, goooogle, ...} |
| `go+gle` | Matches {gogle, google, gooogle, goooogle, ...} |
| `g(oog)+le` | Matches {google, googoogle, googoogoogle, googoogoogoogle, ...} |
| `z{3}` | Matches {zzz} |
| `z{3,6}` | Matches {zzz, zzzz, zzzzz, zzzzzz} |
| `z{3,}` | Matches {zzz, zzzz, zzzzz, ...} |
| `[Bb]rainf\*\*k` | Matches {Brainf**k, brainf**k} |
| `\d` | Matches {0,1,2,3,4,5,6,7,8,9} |
| `1\d{10}` | Matches 11 digits starting with 1 |
| `[2-9]\|[12]\d\|3[0-6]` | Matches integers in the range 2 to 36 |
| `Hello\nworld` | Matches Hello followed by a newline, followed by world |
| `\d+(\.\d\d)?` | Matches a positive integer or a floating-point number with two decimal places |
| `[^*@#]` | Matches any character except the three special symbols *, @, # |
| `//[^\r\n]*` | Matches comments starting with // |
| `^dog` | Matches text starting with "dog" |
| `dog$` | Matches text ending with "dog" |
| `^dog$` | Matches exactly "dog" |
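Several rows of this table can be spot-checked with a small table-driven script. This is a sketch: the anchors ^ and $ and the extra grouping parentheses are additions so that the whole input must match, and the sample inputs are invented.

```javascript
// Each entry: [pattern, input, expected result of test()].
const cases = [
  [/^gr[ae]y$/, "grey", true],
  [/^[b-chm-pP]at$/, "oat", true],
  [/^[b-chm-pP]at$/, "rat", false],
  [/^go*gle$/, "ggle", true],
  [/^go+gle$/, "ggle", false],
  [/^g(oog)+le$/, "googoogle", true],
  [/^z{3,6}$/, "zzzzzzz", false],           // seven z characters is one too many
  [/^[Bb]rainf\*\*k$/, "brainf**k", true],  // \* matches a literal asterisk
  [/^1\d{10}$/, "13800138000", true],
  [/^([2-9]|[12]\d|3[0-6])$/, "36", true],
  [/^([2-9]|[12]\d|3[0-6])$/, "37", false],
  [/^\d+(\.\d\d)?$/, "3.14", true],
  [/^\d+(\.\d\d)?$/, "3.1", false],         // exactly two decimals are required
  [/^dog$/, "hotdog", false],
];

const failures = cases.filter(([re, input, expected]) => re.test(input) !== expected);
console.log(failures.length);  // 0
```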