Julia Strings

A string is a finite sequence composed of zero or more characters. It is the data type used in programming languages to represent text.

In Julia, single quotes ' are typically used to create individual characters, while double quotes " or triple quotes """ are used to create strings. For example:

    c = 'x'
    str = "RUNOOB"
    str2 = """Contains "quote" characters"""

String Type Characteristics

  • The built-in concrete type for strings (and string literals) in Julia is String.
  • All string types in Julia are subtypes of the abstract type AbstractString.
  • Julia has a first-class type for single characters: Char is a built-in subtype of AbstractChar, a 32-bit primitive type that can represent any Unicode character (its representation is based on the UTF-8 encoding).
  • Julia strings are immutable: the value of an AbstractString object cannot be changed.
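
The characteristics listed above can be checked directly. The following is a small sketch; the variable names `s`, `err`, and `t` are only illustrative.

```julia
s = "hello"

println(typeof(s))                 # String
println(String <: AbstractString)  # true
println(Char <: AbstractChar)      # true
println(sizeof(Char))              # 4 bytes, i.e. 32 bits

# Strings are immutable: assigning to an index raises an error.
err = try
    s[1] = 'H'
catch e
    e
end
println(err isa MethodError)       # true

# "Changing" a string means building a new one.
t = 'H' * s[2:end]
println(t)                         # Hello
```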

Characters

Individual characters are represented by Char values.

Char is a 32-bit primitive type that can be converted to its corresponding integer value (Unicode code point):

    julia> c = 'x'
    'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)

    julia> typeof(c)
    Char

    julia> c = Int('x')
    120

    julia> typeof(c)
    Int64

We can also convert integer values to Char:

    julia> Char(97)
    'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

    julia> Char(120)
    'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)

Char values can be compared and used in limited arithmetic operations:

    julia> 'A' < 'a'
    true

    julia> 'A' <= 'a' <= 'Z'
    false

    julia> 'A' <= 'X' <= 'Z'
    true

    julia> 'x' - 'a'
    23

    julia> 'A' + 1
    'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
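
Character comparison and arithmetic can be combined into small helpers. The sketch below is illustrative only: `alphabet_position` and `to_upper_ascii` are hypothetical names, and the helpers assume plain ASCII letters (the built-in `uppercase` function is the general tool).

```julia
# Position of a lowercase ASCII letter in the alphabet (1-based),
# computed by subtracting Char values.
alphabet_position(c::Char) = c - 'a' + 1

# Shift a lowercase ASCII letter to uppercase by adding the (negative)
# offset between 'A' and 'a'; other characters are returned unchanged.
to_upper_ascii(c::Char) = 'a' <= c <= 'z' ? c + ('A' - 'a') : c

println(alphabet_position('x'))  # 24
println(to_upper_ascii('x'))     # X
println(to_upper_ascii('!'))     # !
```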

Strings

Strings in Julia can be declared using double quotes " or triple double quotes """. If you need to use quotes within a string, you can use triple double quotes as shown below:

    julia> str = "RUNOOB"
    "RUNOOB"

    julia> str2 = """Contains "quote" characters"""
    "Contains \"quote\" characters"

If a string literal is too long for one line, a backslash immediately before the line break continues it on the next line without inserting a newline:

    julia> "This is a long \
           line"
    "This is a long line"

Julia strings can be indexed like arrays to read specific characters or substrings. Indexing starts at 1 (which can also be written as begin) and ends at end:

    julia> str = "RUNOOB"
    "RUNOOB"

    julia> str[begin]
    'R': ASCII/Unicode U+0052 (category Lu: Letter, uppercase)

    julia> str[1]
    'R': ASCII/Unicode U+0052 (category Lu: Letter, uppercase)

    julia> str[2]
    'U': ASCII/Unicode U+0055 (category Lu: Letter, uppercase)

    julia> str[end]
    'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)

    julia> str[end-1]
    'O': ASCII/Unicode U+004F (category Lu: Letter, uppercase)

We can use range indexing to extract substrings:

    julia> str = "RUNOOB"
    "RUNOOB"

    julia> str[begin:end]
    "RUNOOB"

    julia> str[begin:end-1]
    "RUNOO"

    julia> str[2:5]
    "UNOO"

Additionally, str[k] and str[k:k] produce different results. The former retrieves the character at the specified index (a Char), while the latter returns a substring containing only that one character:

    julia> str[6]
    'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)

    julia> str[6:6]
    "B"

Range slicing can also be done with **SubString**, which creates a view into the original string instead of copying it:

    julia> str = "long string"
    "long string"

    julia> substr = SubString(str, 1, 4)
    "long"

    julia> typeof(substr)
    SubString{String}
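
To make the difference between the two slicing forms concrete, here is a small sketch comparing range indexing with SubString. The variable names are illustrative.

```julia
str = "long string"

copy_slice = str[1:4]              # range indexing returns a new String
view_slice = SubString(str, 1, 4)  # SubString refers back to the parent string

println(typeof(copy_slice))        # String
println(typeof(view_slice))        # SubString{String}
println(copy_slice == view_slice)  # true: the contents are equal

# Convert a SubString into an independent String when needed.
owned = String(view_slice)
println(typeof(owned))             # String
```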

String Concatenation

We can use the string() function to concatenate multiple strings:

    julia> greet = "Hello"
    "Hello"

    julia> whom = "world"
    "world"

    julia> string(greet, ", ", whom, ".\n")
    "Hello, world.\n"
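
Besides string(), Julia also concatenates strings with the * operator and repeats them with ^. A brief sketch, with illustrative variable names:

```julia
greet = "Hello"
whom = "world"

a = string(greet, ", ", whom, ".")
b = greet * ", " * whom * "."      # * concatenates strings
println(a == b)                    # true

println("ab" ^ 3)                  # ababab

# string() also converts non-string arguments before joining them.
println(string("n = ", 42, ", x = ", 1.5))  # n = 42, x = 1.5
```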

Interpolation

Concatenating strings can sometimes be cumbersome. To avoid verbose calls to string or repeated use of the * operator (which also concatenates strings), Julia allows interpolation into string literals using $, as in Perl:

    julia> "$greet, $whom.\n"
    "Hello, world.\n"

This is more readable and convenient, and it is equivalent to the string concatenation above: the system rewrites this single string literal into the call string(greet, ", ", whom, ".\n").

After $, the shortest complete expression is treated as the value to be inserted into the string. Therefore, you can use parentheses to insert any expression into a string:

    julia> "1 + 2 = $(1 + 2)"
    "1 + 2 = 3"

Both concatenation and interpolation call string to convert objects to string form. However, string actually just returns the output of print, so new types should add print or show methods rather than string methods.
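
As a sketch of that advice, the example below defines a show method for a hypothetical type named `Point` (not part of Julia or of this tutorial) and confirms that string and interpolation pick it up.

```julia
# A hypothetical type used only for illustration.
struct Point
    x::Int
    y::Int
end

# Extend Base.show; print, string, and interpolation all fall back to it.
Base.show(io::IO, p::Point) = print(io, "(", p.x, ", ", p.y, ")")

p = Point(1, 2)
println(string(p))   # (1, 2)
println("p = $p")    # p = (1, 2)
```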

Most non-AbstractString objects are converted to strings closely matching their textual representation:

    julia> v = [1,2,3]
    3-element Vector{Int64}:
     1
     2
     3

    julia> "v: $v"
    "v: [1, 2, 3]"

string is the identity for AbstractString and AbstractChar values, so they are inserted into strings as themselves without quotes or escaping:

    julia> c = 'x'
    'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)

    julia> "hi, $c"
    "hi, x"

To include literal $ in a string literal, escape it with a backslash:

    julia> print("I have \$100 in my account.\n")
    I have $100 in my account.

Unicode and UTF-8

Julia fully supports Unicode characters and strings.

In character literals, Unicode code points can be written with the \u and \U escape sequences, and all standard C escape sequences are available as well. The same escapes can be used in string literals:

    julia> s = "\u2200 x \u2203 y"
    "∀ x ∃ y"

Whether these Unicode characters are displayed as escapes or special characters depends on your terminal's locale settings and Unicode support. String literals are encoded in UTF-8. UTF-8 is a variable-length encoding, meaning not all characters are encoded using the same number of bytes (code units). In UTF-8, ASCII characters (those below 0x80(128)) are encoded using single bytes as in ASCII; characters 0x80 and above use up to 4 bytes for encoding.
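
One consequence of the variable-length encoding is that string indices count bytes rather than characters, so not every integer is a valid index. A small sketch using the same string:

```julia
s = "\u2200 x \u2203 y"   # the string "∀ x ∃ y"

println(length(s))        # 7 characters
println(ncodeunits(s))    # 11 bytes: ∀ and ∃ take 3 bytes each
println(isvalid(s, 1))    # true: a character starts at byte 1
println(isvalid(s, 2))    # false: byte 2 is in the middle of ∀

# eachindex yields only the valid character indices.
for i in eachindex(s)
    println(i, " => ", s[i])
end
```

Indexing at an invalid position such as s[2] raises an error, so iterate with eachindex (or over the characters directly) instead of counting from 1 to length(s).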

Triple-Quoted String Literals

Triple quotes """...""" provide convenience for creating longer and more complex strings. Line breaks, quotes, and indentation can be used freely inside triple quotes without special handling:

    julia> str = """
               Hello,
               world.
             """
    "  Hello,\n  world.\n"
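
Triple-quoted literals strip the indentation shared by all lines, including the line holding the closing delimiter, which lets the text follow the indentation of the surrounding code. A minimal sketch, where `usage_text` is a hypothetical function name:

```julia
function usage_text()
    # The closing """ is indented 8 spaces, so 8 spaces are removed
    # from every line; the extra 2 spaces before "tool" are kept.
    msg = """
        Usage:
          tool [options]
        """
    return msg
end

print(usage_text())
```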

String Comparison

We can compare strings in lexicographic (dictionary) order using the standard comparison operators:

    julia> "abracadabra" < "xylophone"
    true

    julia> "abracadabra" == "xylophone"
    false

    julia> "Hello, world." != "Goodbye, world."
    true

    julia> "1 + 2 = 3" == "1 + 2 = $(1 + 2)"
    true
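
Because the comparison is based on code points, uppercase ASCII letters sort before lowercase ones. The sketch below shows the effect on sorting; the `words` vector is illustrative.

```julia
words = ["pear", "apple", "Zebra"]

# Default ordering compares code points, so "Zebra" comes first.
println(sort(words))                  # ["Zebra", "apple", "pear"]

# Sorting by a lowercased key gives a case-insensitive order.
println(sort(words, by = lowercase))  # ["apple", "pear", "Zebra"]

# cmp returns -1, 0, or 1 for less than, equal, or greater than.
println(cmp("abc", "abd"))            # -1
println(cmp("abc", "abc"))            # 0
```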

You can use the findfirst and findlast functions to search for the index of a specific character. Both return nothing (which the REPL does not display) when there is no match:

    julia> findfirst(isequal('o'), "xylophone")
    4

    julia> findlast(isequal('o'), "xylophone")
    7

    julia> findfirst(isequal('z'), "xylophone")

You can use the findnext and findprev functions, which take a third argument, to start the search from a given index:

    julia> findnext(isequal('o'), "xylophone", 1)
    4

    julia> findnext(isequal('o'), "xylophone", 5)
    7

    julia> findprev(isequal('o'), "xylophone", 5)
    4

    julia> findnext(isequal('o'), "xylophone", 8)
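
Chaining findfirst and findnext gives every occurrence of a character. The following is one possible sketch; `find_all` is a hypothetical helper, not a Base function.

```julia
# Collect the indices of every occurrence of character c in s.
function find_all(c::Char, s::AbstractString)
    positions = Int[]
    i = findfirst(isequal(c), s)
    while i !== nothing
        push!(positions, i)
        # Resume the search at the start of the following character.
        i = findnext(isequal(c), s, nextind(s, i))
    end
    return positions
end

println(find_all('o', "xylophone"))  # [4, 7]
println(isempty(find_all('z', "xylophone")))  # true
```

Using nextind rather than i + 1 keeps the search valid for strings containing multi-byte characters.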

You can use the occursin function to check if a substring exists in a string:

    julia> occursin("world", "Hello, world.")
    true

    julia> occursin("o", "Xylophon")
    true

    julia> occursin("a", "Xylophon")
    false

    julia> occursin('o', "Xylophon")
    true

The last example shows that occursin can also be used to search for character literals.

There are also two convenient string functions: repeat and join:

    julia> repeat(".:Z:.", 10)
    ".:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:..:Z:."

    julia> join(["apples", "bananas", "pineapples"], ", ", " and ")
    "apples, bananas and pineapples"

Other useful functions include:

  • firstindex(str) - Gives the smallest (byte) index that can be used to index into str (for strings this is always 1, but not necessarily for other containers).
  • lastindex(str) - Gives the largest (byte) index that can be used to index into str.
  • length(str) - Number of characters in str.
  • length(str, i, j) - Number of valid character indices from i to j in str.
  • ncodeunits(str) - Number of code units in the string.
  • codeunit(str, i) - Gives the code unit value at index i in string str.
  • thisind(str, i) - Given an arbitrary index into a string, finds the first index of the character that the index points into.
  • nextind(str, i, n=1) - Finds the start of the nth character after index i.
  • prevind(str, i, n=1) - Finds the start of the nth character before index i.
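
The functions in the list above are easiest to understand on a string that contains a multi-byte character. A short sketch, using an illustrative three-character string:

```julia
str = "a\u00f1b"   # "añb": 'ñ' occupies two bytes (indices 2 and 3)

println(firstindex(str))     # 1
println(lastindex(str))      # 4: 'b' starts at byte 4
println(length(str))         # 3 characters
println(length(str, 1, 2))   # 2 characters start within indices 1 to 2
println(ncodeunits(str))     # 4 bytes in total
println(codeunit(str, 1))    # 97, the byte value of 'a'
println(thisind(str, 3))     # 2: byte 3 belongs to the character at index 2
println(nextind(str, 2))     # 4
println(prevind(str, 4))     # 2
```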

Raw String Literals

Raw strings without interpolation and escaping can be expressed using the non-standard string literal form raw"...". Raw string literals generate ordinary String objects that contain the enclosed content exactly as entered, with no interpolation or unescaping. This is useful for strings containing code or markup in other languages where $ or \ are special characters.

The exception is that quotes must still be escaped. For example, raw"\"" is equivalent to "\"". To make it possible to express all strings, backslashes must then also be escaped, but only when they appear immediately before a quote character:

    julia> println(raw"\\ \\\"")
    \\ \"

Note that the first two backslashes are displayed literally because they are not before a quote. However, the next backslash character escapes the following backslash; and since these backslashes appear before a quote, the final backslash escapes a quote.
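
In everyday use, raw strings are handy for text that is full of backslashes or dollar signs. A brief sketch; the file path is made up for illustration:

```julia
# Backslashes that do not precede a quote are kept exactly as written.
path = raw"C:\Users\name\notes.txt"
println(path)                            # C:\Users\name\notes.txt

# No interpolation happens inside a raw string.
println(raw"cost: $5" == "cost: \$5")    # true

# Escape sequences are not processed either.
println(length(raw"\n"))                 # 2: a backslash and the letter n
println(length("\n"))                    # 1: a single newline character
```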
