
# Perl Regular Expressions

A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace matching substrings, or to extract substrings that meet certain conditions.

Perl's regular expression support is extremely capable, and many languages have modeled their own regex support on it.

Perl has three main regex-related operators: matching, substitution, and translation:

* Matching: `m//` (can be abbreviated as `//`, omitting the `m`)
* Substitution: `s///`
* Translation: `tr///`

These three operators are generally used together with **=~** or **!~**. `=~` tests for a match, while `!~` tests for a non-match.

* * *

## Matching Operator

The matching operator `m//` tests whether a string matches a regular expression. For example, to match "run" in the scalar `$bar`, the code is as follows:

## Example

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $bar = "I am runoob site. welcome to runoob site.";
if ($bar =~ /run/) {
    print "First match\n";
} else {
    print "First non-match\n";
}

$bar = "run";
if ($bar =~ /run/) {
    print "Second match\n";
} else {
    print "Second non-match\n";
}
```

Executing the above program, the output is:

```
First match
Second match
```

### Pattern Matching Modifiers

Pattern matching has some common modifiers, as shown in the table below:

| Modifier | Description |
| --- | --- |
| i | Ignore case in the pattern |
| m | Multiline mode: `^` and `$` match at the start and end of each line |
| o | Compile the pattern only once |
| s | Single-line mode, where `.` matches `\n` (by default, it does not) |
| x | Ignore whitespace in the pattern |
| g | Global match: find all matches |
| cg | With `g`, keep the current search position after a failed match, so that a later match can resume from it |

* * *

## Regular Expression Variables

After a successful match, Perl stores the results in three special variables:

* **`` $` ``:** The part of the string before the match
* **`$&`:** The matched string
* **`$'`:** The remaining string after the match

If you put these three variables together, you get the original string back. Example:

## Example

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "welcome to runoob site.";
$string =~ m/run/;
print "String before match: $`\n";
print "Matched string: $&\n";
print "String after match: $'\n";
```

Executing the above program, the output is:

```
String before match: welcome to 
Matched string: run
String after match: oob site.
```

* * *

## Substitution Operator

The substitution operator `s///` is an extension of the matching operator. It replaces the text matched by a pattern with a new string. The basic format is:

```perl
s/PATTERN/REPLACEMENT/;
```

`PATTERN` is the matching pattern, and `REPLACEMENT` is the replacement string.

For example, we replace "google" with "tutorial" in the following string:

## Example

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "welcome to google site.";
$string =~ s/google/tutorial/;

print "$string\n";
```

Executing the above program, the output is:

```
welcome to tutorial site.
```

### Substitution Operation Modifiers

The substitution operation modifiers are shown in the table below:

| Modifier | Description |
| --- | --- |
| i | Makes the regex case-insensitive, so "a" and "A" are treated as the same. |
| m | By default, `^` and `$` match only at the start and end of the entire string. With `m`, they also match at the start and end of each line within the string. |
| o | The pattern is compiled only once. |
| s | By default, `.` matches any character except a newline. With `s`, it matches any character, including the newline. |
| x | Whitespace in the pattern is ignored unless it is escaped. |
| g | Replace all matches, not just the first. |
| e | Evaluate the replacement string as a Perl expression. |

* * *

## Translation Operator

The translation operator `tr///` replaces characters one for one: each character in the search list is replaced by the character at the same position in the replacement list.

The following are modifiers related to the translation operator:

| Modifier | Description |
| --- | --- |
| c | Complement the search list: translate all characters *not* found in it |
| d | Delete characters that are found in the search list but have no corresponding replacement |
| s | Squeeze runs of the same translated character in the output into a single character |

The following example converts all lowercase letters in the variable `$string` to uppercase:

## Example

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'welcome to tutorial site.';
$string =~ tr/a-z/A-Z/;

print "$string\n";
```

Executing the above program, the output is:

```
WELCOME TO TUTORIAL SITE.
```

The following example uses `/s` to squeeze runs of repeated characters in the variable `$string`:

## Example

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'runoob';
$string =~ tr/a-z/a-z/s;

print "$string\n";
```

Executing the above program, the output is:

```
runob
```

More examples:

```perl
# tr/// takes character lists and ranges, not regex classes such as \d,
# so digits are written as 0-9.
$string =~ tr/0-9/ /c;     # Replace every non-digit character with a space
$string =~ tr/\t //d;      # Delete tabs and spaces
$string =~ tr/0-9/ /cs;    # Replace each run of non-digit characters with a single space
```

* * *

## More Regular Expression Rules

| Expression | Description |
| --- | --- |
| `.` | Matches any character except a newline |
| `x?` | Matches 0 or 1 occurrence of x |
| `x*` | Matches 0 or more occurrences of x, as many as possible (greedy) |
| `x+` | Matches 1 or more occurrences of x, as many as possible (greedy) |
| `.*` | Matches 0 or more occurrences of any character |
| `.+` | Matches 1 or more occurrences of any character |
| `{m}` | Matches exactly m occurrences of the preceding element |
| `{m,n}` | Matches at least m and at most n occurrences of the preceding element |
| `{m,}` | Matches m or more occurrences of the preceding element |
| `[]` | Matches any one character within the brackets |
| `[^]` | Matches any one character not within the brackets |
| `[0-9]` | Matches any digit character |
| `[a-z]` | Matches any lowercase letter |
| `[^0-9]` | Matches any non-digit character |
| `[^a-z]` | Matches any character that is not a lowercase letter |
| `^` | Matches the beginning of the string |
| `$` | Matches the end of the string |
| `\d` | Matches a digit character, same as `[0-9]` |
| `\d+` | Matches one or more digits, same as `[0-9]+` |
| `\D` | Non-digit, same as `[^0-9]` |
| `\D+` | One or more non-digits, same as `[^0-9]+` |
| `\w` | Word character (letter, digit, or underscore), same as `[a-zA-Z0-9_]` |
| `\w+` | Same as `[a-zA-Z0-9_]+` |
| `\W` | Non-word character, same as `[^a-zA-Z0-9_]` |
| `\W+` | Same as `[^a-zA-Z0-9_]+` |
| `\s` | Whitespace, same as `[ \n\t\r\f]` |
| `\s+` | Same as `[ \n\t\r\f]+` |
| `\S` | Non-whitespace, same as `[^ \n\t\r\f]` |
| `\S+` | Same as `[^ \n\t\r\f]+` |
| `\b` | Matches a word boundary (between a word character and a non-word character) |
| `\B` | Matches a non-word boundary |
| `a\|b\|c` | Matches a, or b, or c |
| `abc` | Matches the literal string "abc" |
| `(pattern)` | Captures the matched text. The text matched by the first `()` is available as `$1` (or `\1` inside the pattern), the text matched by the second `()` as `$2` or `\2`, and so on. |
| `/pattern/i` | The `i` modifier ignores case, so upper- and lowercase letters are treated as the same when matching. |
| `\` | To match a special character such as `*` literally, put a `\` before it to remove its special meaning. |

### More References

Perl Regular Expressions: [https://perldoc.perl.org/perlre#Regular-Expressions](https://perldoc.perl.org/perlre#Regular-Expressions)
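## Example

To tie together the operators and modifiers covered above, here is a minimal sketch that combines capture groups, the `g`, `e`, `x`, and `i` modifiers, and `tr///` used as a counter. The sample string and the variable names (`$log`, `@counts`, `$doubled`, and so on) are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, used only for illustration.
my $log = "2024-01-15 error=3 warn=12";

# Capture groups: each () stores its match in $1, $2, $3, ...
my ($year, $month, $day);
if ($log =~ /(\d{4})-(\d{2})-(\d{2})/) {
    ($year, $month, $day) = ($1, $2, $3);
}

# With /g in list context, a match returns every captured value.
my @counts = $log =~ /=(\d+)/g;

# /e evaluates the replacement as Perl code; /g applies it to every match.
# Copying into $doubled first leaves $log unchanged.
(my $doubled = $log) =~ s/=(\d+)/"=" . ($1 * 2)/ge;

# /x allows whitespace and comments inside the pattern; /i ignores case.
my $has_error = $log =~ / ERROR \s* = \s* \d+   # key, equals sign, digits
                        /xi ? 1 : 0;

# tr/// returns the number of characters it matched. With an empty
# replacement list and no modifiers it counts without changing the string.
my $digits = ($log =~ tr/0-9//);

print "date: $year/$month/$day\n";   # date: 2024/01/15
print "counts: @counts\n";           # counts: 3 12
print "doubled: $doubled\n";         # doubled: 2024-01-15 error=6 warn=24
print "has error: $has_error\n";     # has error: 1
print "digits: $digits\n";           # digits: 11
```

Checking the result of the match before reading `$1`, `$2`, and `$3` matters because these variables keep their values from the last successful match when a later match fails.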