Scala Regular Expressions
Scala supports regular expressions through the **Regex** class in the scala.util.matching package. The following example uses a regular expression to find the word **Scala**:
## Example

    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = "Scala".r
        val str = "Scala is Scalable and cool"
        println(pattern findFirstIn str)
      }
    }

Running the code above produces the following output:

    $ scalac Test.scala
    $ scala Test
    Some(Scala)
The example calls the r method, which Scala makes available on strings, to construct a Regex object.
It then uses the findFirstIn method to find the first match, which is returned as an Option (here Some(Scala)).
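Because findFirstIn returns an Option, the caller usually has to handle both the match and the no-match case. The following is a minimal sketch of one way to do that; the names `FindFirstDemo` and `describeFirst` are made up for illustration and are not part of the Scala library.

```scala
object FindFirstDemo {
  // Hypothetical helper: describes the first occurrence of "Scala" in the text.
  def describeFirst(text: String): String = {
    val pattern = "Scala".r
    pattern.findFirstIn(text) match {
      case Some(m) => s"found: $m" // a match exists
      case None    => "no match"   // the pattern does not occur in the text
    }
  }

  def main(args: Array[String]): Unit = {
    println(describeFirst("Scala is Scalable and cool")) // found: Scala
    println(describeFirst("Java is cool"))               // no match
  }
}
```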
If you need to view all matching items, you can use the findAllIn method.
You can use the mkString() method to join the matched strings into a single string, and you can use a pipe (|) inside the pattern to specify alternatives:
## Example

    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = new Regex("(S|s)cala") // The first letter can be uppercase S or lowercase s
        val str = "Scala is scalable and cool"
        println((pattern findAllIn str).mkString(",")) // Join the returned results with a comma
      }
    }

Running the code above produces the following output:

    $ scalac Test.scala
    $ scala Test
    Scala,scala
If you need to replace the matched text with a specified string, you can use the **replaceFirstIn()** method to replace the first match, or the **replaceAllIn()** method to replace all matches. The following example uses replaceFirstIn:
## Example

    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = "(S|s)cala".r
        val str = "Scala is scalable and cool"
        println(pattern.replaceFirstIn(str, "Java"))
      }
    }

Running the code above produces the following output:

    $ scalac Test.scala
    $ scala Test
    Java is scalable and cool
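The text mentions replaceAllIn but only demonstrates replaceFirstIn. As a rough sketch of the difference, the same pattern and input can be passed to replaceAllIn instead; the names `ReplaceAllDemo` and `replaceEvery` are illustrative only.

```scala
object ReplaceAllDemo {
  // Hypothetical helper: replaces every match of (S|s)cala with "Java".
  def replaceEvery(text: String): String = {
    val pattern = "(S|s)cala".r
    pattern.replaceAllIn(text, "Java")
  }

  def main(args: Array[String]): Unit = {
    // Both "Scala" and the "scala" inside "scalable" are replaced.
    println(replaceEvery("Scala is scalable and cool")) // Java is Javable and cool
  }
}
```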
* * *
## Regular Expressions
Scala's regular expressions inherit the syntax rules of Java, and Java mostly uses the rules of the Perl language.
The following table lists some commonly used regular expression rules:
| Expression | Matching Rule |
| --- | --- |
| ^ | Matches the position at the beginning of the input string. |
| $ | Matches the position at the end of the input string. |
| . | Matches any single character except line terminators such as "\n" and "\r". |
| [...] | Character set. Matches any one of the contained characters. For example, "[abc]" matches "a" in "plain". |
| [^...] | Negated character set. Matches any character not contained. For example, "[^abc]" matches "p", "l", "i", "n" in "plain". |
| \\A | Matches the position at the beginning of the input string (not affected by the multiline option) |
| \\z | End of the input string (similar to $, but not affected by the multiline option) |
| \\Z | End of the input string, or the position just before a final line terminator (not affected by the multiline option) |
| re* | Repeats zero or more times |
| re+ | Repeats one or more times |
| re? | Repeats zero or one time |
| re{n} | Repeats exactly n times |
| re{n,} | Repeats n or more times |
| re{n,m} | Repeats n to m times |
| a\|b | Matches a or b |
| (re) | Matches re, and captures the text into an automatically numbered group |
| (?:re) | Matches re, does not capture the matched text, and does not assign a group number to this group |
| (?>re) | Atomic (non-backtracking) subexpression |
| \\w | Matches letters, digits, or underscores |
| \\W | Matches any character that is not a letter, digit, or underscore |
| \\s | Matches any whitespace character, equivalent to [ \t\n\x0B\f\r] |
| \\S | Matches any character that is not a whitespace character |
| \\d | Matches digits, similar to [0-9] |
| \\D | Matches any non-digit character |
| \\G | End of the previous match, that is, where the current search begins |
| \\n | Newline character |
| \\b | Matches a word boundary position |
| \\B | Matches a position that is not the beginning or end of a word |
| \\t | Tab character |
| \\Q | Start quote: **\Q(a+b)*3\E** can match the text "(a+b)*3". |
| \\E | End quote: **\Q(a+b)*3\E** can match the text "(a+b)*3". |
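To make a few of these rules concrete, here is a small sketch that combines anchors, \\d, the {n} quantifier, capturing groups, and a non-capturing group. It relies on the fact that a Scala Regex can be used as an extractor in a pattern match, binding one variable per capturing group. The date format and the names `GroupDemo` and `parseDate` are assumptions chosen for illustration.

```scala
object GroupDemo {
  // Three capturing groups (year, month, day). The separator is a
  // non-capturing group, so it does not bind a variable in the match below.
  private val datePattern = "^(\\d{4})(?:-|/)(\\d{2})(?:-|/)(\\d{2})$".r

  // Hypothetical helper: returns (year, month, day) when the text has the expected shape.
  def parseDate(text: String): Option[(Int, Int, Int)] = text match {
    case datePattern(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    println(parseDate("2024-01-15")) // Some((2024,1,15))
    println(parseDate("2024/01/15")) // Some((2024,1,15))
    println(parseDate("2024-1-15"))  // None, because the month needs two digits
  }
}
```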
* * *
## Regular Expression Examples
| Example | Description |
| --- | --- |
| . | Matches any single character except line terminators such as "\n" and "\r". |
| [Rr]uby | Matches "Ruby" or "ruby" |
| rub[ye] | Matches "ruby" or "rube" |
| [aeiou] | Matches lowercase vowels: aeiou |
| [0-9] | Matches any digit, similar to [0123456789] |
| [a-z] | Matches any ASCII lowercase letter |
| [A-Z] | Matches any ASCII uppercase letter |
| [a-zA-Z0-9] | Matches digits, uppercase and lowercase letters |
| [^aeiou] | Matches characters other than aeiou |
| [^0-9] | Matches characters other than digits |
| \\d | Matches digits, similar to: [0-9] |
| \\D | Matches non-digits, similar to: [^0-9] |
| \\s | Matches whitespace, similar to: [ \t\r\n\f] |
| \\S | Matches non-whitespace, similar to: [^ \t\r\n\f] |
| \\w | Matches letters, digits, underscores, similar to: [A-Za-z0-9_] |
| \\W | Matches non-letters, non-digits, non-underscores, similar to: [^A-Za-z0-9_] |
| ruby? | Matches "rub" or "ruby": y is optional |
| ruby* | Matches "rub" plus 0 or more y's. |
| ruby+ | Matches "rub" plus 1 or more y's. |
| \\d{3} | Matches exactly 3 digits. |
| \\d{3,} | Matches 3 or more digits. |
| \\d{3,5} | Matches 3, 4, or 5 digits. |
| \\D\\d+ | No group: + repeats \d |
| (\\D\\d)+ | Grouped: + repeats \D\d pair |
| ([Rr]uby(, )?)+ | Matches "Ruby", "Ruby, ruby, ruby", etc. |
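A quick way to check entries like the ones described in this table is to run them against sample input. The sketch below does that for a handful of patterns; `PatternTableDemo`, `fullMatch`, and `findAll` are hypothetical helper names, and the sample strings are made up.

```scala
object PatternTableDemo {
  // True when the entire input matches the pattern (not just a substring).
  def fullMatch(pattern: String, input: String): Boolean =
    pattern.r.pattern.matcher(input).matches()

  // All non-overlapping matches of the pattern, in order of appearance.
  def findAll(pattern: String, input: String): List[String] =
    pattern.r.findAllIn(input).toList

  def main(args: Array[String]): Unit = {
    println(fullMatch("[Rr]uby", "Ruby"))             // true
    println(fullMatch("rub[ye]", "rube"))             // true
    println(fullMatch("ruby?", "rub"))                // true: y is optional
    println(fullMatch("ruby+", "rub"))                // false: at least one y is required
    println(findAll("[^aeiou]", "plain"))             // List(p, l, n)
    println(findAll("\\d{3,5}", "12 345 6789012"))    // List(345, 67890)
  }
}
```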
Note that each escape sequence in the tables above is written with two backslashes. This is because in Java and Scala string literals the backslash is itself an escape character, so you need to write \\ in the string to get a single backslash in the pattern. See the following example:
## Example

    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        // "abl", then a or e, then one or more digits
        val pattern = new Regex("abl[ae]\\d+")
        val str = "ablaw is able1 and cool"
        println((pattern findAllIn str).mkString(","))
      }
    }

Running the code above produces the following output:

    $ scalac Test.scala
    $ scala Test
    able1
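As an alternative to doubling every backslash, Scala's triple-quoted string literals do not process escape sequences, so a pattern can be written with single backslashes. The sketch below compares the two spellings of a pattern such as abl[ae]\d+; the object and method names are illustrative.

```scala
import scala.util.matching.Regex

object RawStringDemo {
  // Ordinary string literal: the backslash must be doubled.
  val escaped: Regex = "abl[ae]\\d+".r
  // Triple-quoted literal: backslashes are kept as written.
  val raw: Regex = """abl[ae]\d+""".r

  // Hypothetical helper: collects every match of the regex in the text.
  def findAll(regex: Regex, text: String): List[String] =
    regex.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(escaped.regex == raw.regex)              // true: both describe the same pattern
    println(findAll(raw, "ablaw is able1 and cool")) // List(able1)
  }
}
```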