Julia Regexes
Regular expressions describe a pattern for matching strings. They can be used to check whether a string contains a certain substring, to replace a matched substring, or to extract substrings that meet certain conditions.
Julia has Perl-compatible regular expressions (regexes).
Unlike Perl, Julia has no m//, s///, or tr/// operators and no **=~** or **!~** binding operators. A regular expression is an ordinary value of type Regex, and it is used through functions:
* Matching: occursin (yes/no test), match (first match), and eachmatch (every match)
* Substitution: replace
In Julia, regular expressions are input with the r prefix:
## Example
julia> re = r"^\s*(?:#|$)"
r"^\s*(?:#|$)"
julia> typeof(re)
Regex
To check if a regular expression matches a string, use occursin:
## Example
julia> occursin(r"^\s*(?:#|$)", "not a comment")
false
julia> occursin(r"^\s*(?:#|$)", "# a comment")
true
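Because occursin returns a plain Bool, it combines naturally with functions such as filter. The following is a minimal sketch, assuming a hypothetical list of source lines named lines; it keeps only the lines that are neither blank nor comments.

```julia
# Hypothetical sample input; any collection of strings would work.
lines = ["# header", "x = 1", "", "   # indented comment", "y = 2  # trailing"]

# Matches lines that are blank or whose first non-space character is '#'.
comment_or_blank = r"^\s*(?:#|$)"

# Keep only the lines in which the pattern does not occur.
code_lines = filter(line -> !occursin(comment_or_blank, line), lines)

println(code_lines)
```

Note that the last sample line is kept: the pattern is anchored at the start of the line, so a trailing comment after code does not count.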
As you can see, **occursin** only returns true or false, indicating whether the given regular expression appears in that string. However, usually we don't just want to know if the string matches, but also want to understand how it matches. To capture matching information, you can use the match function instead:
## Example
julia> match(r"^\s*(?:#|$)", "not a comment")
julia> match(r"^\s*(?:#|$)", "# a comment")
RegexMatch("#")
If the regular expression does not match the given string, match returns nothing, a special value that doesn't print anything in the interactive prompt. Besides not printing, it is a completely normal value that can be tested in code:
## Example
m = match(r"^\s*(?:#|$)", line)
if m === nothing
println("not a comment")
else
println("blank or comment")
end
If the regular expression matches, the return value of match is a RegexMatch object. These objects record how the expression matched, including the substring matched by the pattern and any captured substrings. The example above only captured the matched portion of the substring, but perhaps we want to capture any non-empty text after the comment character. We can do this:
## Example
julia> m = match(r"^\s*(?:#\s*(.*?)\s*$|$)", "# a comment ")
RegexMatch("# a comment ", 1="a comment")
When calling match, you can optionally specify the index to start searching from. For example:
## Example
julia> m = match(r"[0-9]", "aaaa1aaaa2aaaa3", 1)
RegexMatch("1")
julia> m = match(r"[0-9]", "aaaa1aaaa2aaaa3", 6)
RegexMatch("2")
julia> m = match(r"[0-9]", "aaaa1aaaa2aaaa3", 11)
RegexMatch("3")
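The start index makes it possible to walk through a string one match at a time. The sketch below uses a hypothetical helper, collect_digits, that restarts the search just past each match; it assumes the pattern never matches an empty string, since otherwise the loop would not advance. The standard eachmatch iterator does the same job and is shown for comparison.

```julia
# Collect every digit in `s` by restarting the search just past each match.
# Assumes the pattern cannot match an empty string.
function collect_digits(s::String)
    found = String[]
    pos = firstindex(s)
    while pos <= ncodeunits(s)
        m = match(r"[0-9]", s, pos)
        m === nothing && break
        push!(found, m.match)
        # m.offset is where the match starts; step past the matched text.
        pos = m.offset + ncodeunits(m.match)
    end
    return found
end

digits_found = collect_digits("aaaa1aaaa2aaaa3")

# The built-in eachmatch iterator yields the same matches without manual indexing.
same = [m.match for m in eachmatch(r"[0-9]", "aaaa1aaaa2aaaa3")]

println(digits_found)
println(digits_found == same)
```

In everyday code eachmatch is usually the simpler choice; the manual loop is mainly useful when the next search position depends on something other than the end of the previous match.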
You can extract the following information from a RegexMatch object:
* The entire matched substring: m.match
* Captured substrings as a string array: m.captures
* The offset at the start of the entire match: m.offset
* The offsets of captured substrings as a vector: m.offsets
When a capture does not match, m.captures contains nothing instead of a substring at that position; additionally, the offset in m.offsets is 0 (recall that Julia's indexing starts at 1, so a zero offset for a string is invalid). Here are two somewhat contrived examples:
## Example
julia> m = match(r"(a|b)(c)?(d)","acd")
RegexMatch("acd", 1="a", 2="c", 3="d")
julia> m.match
"acd"
julia> m.captures
3-element Vector{Union{Nothing, SubString{String}}}:
"a"
"c"
"d"
julia> m.offset
1
julia> m.offsets
3-element Vector{Int64}:
1
2
3
julia> m = match(r"(a|b)(c)?(d)","ad")
RegexMatch("ad", 1="a", 2=nothing, 3="d")
julia> m.match
"ad"
julia> m.captures
3-element Vector{Union{Nothing, SubString{String}}}:
"a"
nothing
"d"
julia> m.offset
1
julia> m.offsets
3-element Vector{Int64}:
1
0
2
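Since an unmatched group appears as nothing in m.captures, code that processes captures should check for it before using the value. The following is a minimal sketch with a hypothetical helper, describe_captures, that pairs each capture with its offset and reports groups that did not take part in the match.

```julia
# Describe each capture group of a match, marking groups that did not participate.
function describe_captures(m::RegexMatch)
    descriptions = String[]
    for (i, (cap, off)) in enumerate(zip(m.captures, m.offsets))
        if cap === nothing
            # For an unmatched group, the corresponding offset is 0.
            push!(descriptions, "group $i: no match")
        else
            push!(descriptions, "group $i: \"$cap\" at index $off")
        end
    end
    return descriptions
end

m = match(r"(a|b)(c)?(d)", "ad")
foreach(println, describe_captures(m))
```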
It is convenient to have captures returned as an array so that you can use destructuring syntax to bind them to local variables. For convenience, the RegexMatch object implements an iterator method that passes through to the captures field, so you can directly destructure the match object:
## Example
julia> first, second, third = m; first
"a"
Access to captures can also be achieved by indexing the RegexMatch object with the number or name of the capture group:
## Example
julia> m = match(r"(?<hour>\d+):(?<minute>\d+)", "12:45")
RegexMatch("12:45", hour="12", minute="45")
julia> m[:minute]
"45"
julia> m[2]
"45"
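When using replace, you can reference captures in the replacement string: write the replacement as a substitution string with the s prefix and use \n to refer to the n-th capture group. Group 0 refers to the entire match, and named capture groups can be referenced with \g<groupname>.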
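As a brief sketch of capture references in replace, the snippet below uses substitution strings (string literals with the s prefix) together with the pattern => replacement pair syntax. The variable names and sample inputs are only illustrative.

```julia
# \1 and \2 refer to the first and second capture groups.
swapped = replace("first second", r"(\w+) (\w+)" => s"\2 \1")

# \0 refers to the entire match.
doubled = replace("a", r"." => s"\0\0")

# Named groups are referenced with \g<name>.
reordered = replace("12:45", r"(?<hour>\d+):(?<minute>\d+)" => s"\g<minute>:\g<hour>")

println(swapped)
println(doubled)
println(reordered)
```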