
Julia Regexes

Regular expressions describe a pattern for string matching. They can be used to check whether a string contains a certain substring, to replace a matched substring, or to extract substrings that meet certain conditions.

Julia has Perl-compatible regular expressions (regexes), as provided by the PCRE library. Unlike Perl, Julia has no m//, s///, or tr/// forms and no **=~** or **!~** operators. A regex is an ordinary value that you pass to functions such as **occursin**, **match**, and **replace**.

In Julia, regular expressions are written as string literals with the r prefix:

## Example

```julia
julia> re = r"^\s*(?:#|$)"
r"^\s*(?:#|$)"

julia> typeof(re)
Regex
```

To check whether a regular expression matches a string, use occursin:

## Example

```julia
julia> occursin(r"^\s*(?:#|$)", "not a comment")
false

julia> occursin(r"^\s*(?:#|$)", "# a comment")
true
```

As you can see, **occursin** only returns true or false, indicating whether the given regular expression occurs in the string. Usually, however, we want to know not just whether a string matched, but also how it matched. To capture this information, use the match function instead:

## Example

```julia
julia> match(r"^\s*(?:#|$)", "not a comment")

julia> match(r"^\s*(?:#|$)", "# a comment")
RegexMatch("#")
```

If the regular expression does not match the given string, match returns nothing, a special value that does not print anything at the interactive prompt. Other than not printing, it is a completely normal value that can be tested in code:

## Example

```julia
m = match(r"^\s*(?:#|$)", line)
if m === nothing
    println("not a comment")
else
    println("blank or comment")
end
```

If the regular expression matches, the return value of match is a RegexMatch object. These objects record how the expression matched, including the substring matched by the pattern and any captured substrings.
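The two checks described above can be combined in a short script. This is a minimal sketch: the helper names `is_blank_or_comment` and `describe` and the sample lines are made up for illustration, and the pattern is assumed to be the blank-or-comment regex used in this section, written with `\s`.

```julia
# Hypothetical helper (not part of Julia): true for blank lines and `#` comments.
is_blank_or_comment(line) = occursin(r"^\s*(?:#|$)", line)

# Hypothetical helper: classify a line using `match` and a `nothing` check.
function describe(line)
    m = match(r"^\s*(?:#|$)", line)
    return m === nothing ? "code" : "blank or comment"
end

# Made-up sample input.
lines = ["# a comment", "   ", "x = 1", ""]

flags = [is_blank_or_comment(l) for l in lines]
labels = [describe(l) for l in lines]

for (l, label) in zip(lines, labels)
    println(repr(l), " => ", label)
end
```

Using occursin is enough when only a yes/no answer is needed; match is the better choice when the matched text is used afterwards.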
The example above records only the text matched by the whole pattern, but perhaps we want to capture any non-blank text after the comment character. We can do this with a capture group:

## Example

```julia
julia> m = match(r"^\s*(?:#\s*(.*?)\s*$|$)", "# a comment ")
RegexMatch("# a comment ", 1="a comment")
```

When calling match, you can optionally specify the index at which to start the search. For example:

## Example

```julia
julia> m = match(r"[0-9]", "aaaa1aaaa2aaaa3", 1)
RegexMatch("1")

julia> m = match(r"[0-9]", "aaaa1aaaa2aaaa3", 6)
RegexMatch("2")

julia> m = match(r"[0-9]", "aaaa1aaaa2aaaa3", 11)
RegexMatch("3")
```

You can extract the following information from a RegexMatch object:

* The entire matched substring: m.match
* The captured substrings as an array of strings: m.captures
* The offset at which the whole match begins: m.offset
* The offsets of the captured substrings as a vector: m.offsets

When a capture group does not match, m.captures contains nothing at that position instead of a substring, and the corresponding entry in m.offsets is 0 (recall that Julia's indexing starts at 1, so a zero offset into a string is invalid). Here are two somewhat contrived examples:

## Example

```julia
julia> m = match(r"(a|b)(c)?(d)", "acd")
RegexMatch("acd", 1="a", 2="c", 3="d")

julia> m.match
"acd"

julia> m.captures
3-element Vector{Union{Nothing, SubString{String}}}:
 "a"
 "c"
 "d"

julia> m.offset
1

julia> m.offsets
3-element Vector{Int64}:
 1
 2
 3

julia> m = match(r"(a|b)(c)?(d)", "ad")
RegexMatch("ad", 1="a", 2=nothing, 3="d")

julia> m.match
"ad"

julia> m.captures
3-element Vector{Union{Nothing, SubString{String}}}:
 "a"
 nothing
 "d"

julia> m.offset
1

julia> m.offsets
3-element Vector{Int64}:
 1
 0
 2
```

It is convenient to have captures returned as an array so that you can use destructuring syntax to bind them to local variables.
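The start index and the fields of a RegexMatch can be combined to walk through a string match by match. The sketch below is illustrative only: `digits_in` and `captures_or` are hypothetical helpers, the digit pattern `r"[0-9]"` is an assumption chosen for the demonstration, and the loop assumes the pattern never matches an empty string.

```julia
# Hypothetical helper: collect every digit in `s` by restarting the search
# just past the previous match. Assumes the pattern never matches an empty
# string; otherwise the loop would not advance.
function digits_in(s::String)
    found = String[]
    idx = 1
    while true
        m = match(r"[0-9]", s, idx)
        m === nothing && break
        push!(found, m.match)
        idx = m.offset + ncodeunits(m.match)  # first index after this match
    end
    return found
end

# Hypothetical helper: turn unmatched captures (`nothing`) into a placeholder.
function captures_or(m::RegexMatch, placeholder::String)
    return [c === nothing ? placeholder : String(c) for c in m.captures]
end

found = digits_in("aaaa1aaaa2aaaa3")
caps = captures_or(match(r"(a|b)(c)?(d)", "ad"), "<none>")

println(found)
println(caps)
```

For everyday code, Julia's built-in eachmatch iterates over all matches and is likely the simpler choice; the manual loop is shown only to make the role of the start index and m.offset visible.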
For convenience, the RegexMatch object implements iteration that passes through to the captures field, so you can destructure the match object directly:

## Example

```julia
julia> first, second, third = m; first
"a"
```

Captures can also be accessed by indexing the RegexMatch object with the number or name of the capture group:

## Example

```julia
julia> m = match(r"(?<hour>\d+):(?<minute>\d+)", "12:45")
RegexMatch("12:45", hour="12", minute="45")

julia> m[:minute]
"45"

julia> m[2]
"45"
```

When using replace, you can reference captures in the replacement string: write \n to refer to the n-th capture group and prefix the replacement string with s.
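As a minimal sketch of referencing captures during replacement, the snippet below uses a replacement string with the `s` prefix, where `\1` and `\2` refer to numbered groups and `\g<name>` refers to a named group. The time string and the variable names are made up for illustration.

```julia
# Swap the two fields by referring to capture groups by number.
swapped = replace("12:45", r"(\d+):(\d+)" => s"\2:\1")

# Refer to named capture groups with \g<name>.
labeled = replace("12:45", r"(?<hour>\d+):(?<minute>\d+)" => s"\g<hour>h \g<minute>m")

# Index a match by group name or by group number.
m = match(r"(?<hour>\d+):(?<minute>\d+)", "12:45")
minutes = parse(Int, m[:minute])

println(swapped)
println(labeled)
println(minutes)
```

Named groups tend to keep the replacement readable when a pattern has more than two or three captures.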