Data types are a broad system used to declare variables and functions of different types.

The type of a variable determines how much storage it occupies and how the stored bit pattern is interpreted.

R has three fundamental data types:

- Numeric
- Logical
- Text

Numeric constants can be written in two forms:

| Form | Examples |
|---|---|
| General form | 123, -0.125 |
| Scientific notation | 1.23e2, -1.25E-1 |

The logical type is called Boolean in many other programming languages. Its only constant values are TRUE and FALSE.

Note: R is case-sensitive. true or True cannot stand in for TRUE.

The most intuitive data type is text, which other languages usually call a string. Many languages require string constants to be enclosed in double quotes. In R, text constants can be enclosed in either single or double quotes, for example:

Example

    > 'abc' == "abc"
    [1] TRUE
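The three fundamental types and the two numeric notations described above can be checked with a short sketch. The variable names here are illustrative only; class() is the base R function that reports the type of a value.

```r
# The general form and scientific notation denote the same number.
x <- 123
y <- 1.23e2
same_value <- x == y
print(same_value)

# class() reports the basic type of each kind of constant.
types <- c(class(123), class(TRUE), class("abc"))
print(types)   # "numeric" "logical" "character"

# The choice of quote style does not change the value of a text constant.
quotes_equal <- 'abc' == "abc"
print(quotes_equal)
```

Note that R reports the text type as "character".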
R does not require you to declare a name and data type before using a variable, as some statically typed languages do. Whenever you assign to a new name with an assignment operator, you are defining a new variable:

Example

    a = 1
    b <- TRUE
    b = "abc"
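As a small sketch of what the assignments above imply, the same name can be rebound to a value of a different type, and class() shows the change. The names class_before and class_after are illustrative only.

```r
b <- TRUE
class_before <- class(b)   # "logical"

b <- "abc"                 # same name, now bound to a text value
class_after <- class(b)    # "character"

# At the top level, `=` and `<-` both perform assignment.
a = 1
a2 <- 1
same <- identical(a, a2)

print(c(class_before, class_after))
print(same)
```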
Grouped by object type, R has the following six types (each is introduced in detail later):

- Vector
- List
- Matrix
- Array
- Factor
- Data Frame

Vector

Vector types are provided in the standard libraries of general-purpose programming languages such as Java, Rust, and C#, because vectors are indispensable in mathematical operations. The most common are two-dimensional vectors, which are used in any planar coordinate system.

From a data structure perspective, a vector is a linear list and can be thought of as an array.

Having vectors as a built-in type makes vector operations in R straightforward:

Example

    > a = c(3, 4)
    > b = c(5, 0)
    > a + b
    [1] 8 4
c() is the function that creates vectors.

Here, adding two two-dimensional vectors yields a new two-dimensional vector (8, 4). An operation between a two-dimensional vector and a three-dimensional vector has no mathematical meaning. R will still run it, but it issues a warning.

Make a habit of avoiding this situation.
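The following sketch illustrates the mismatched-length case. R reuses ("recycles") the elements of the shorter vector and warns because 3 is not a multiple of 2. Here tryCatch() is used only to capture the warning text, and the variable names are illustrative.

```r
a <- c(1, 2)
b <- c(10, 20, 30)

# Capture the warning message instead of letting it print.
msg <- tryCatch(
  {
    a + b
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# The result is still computed, as c(1, 2, 1) + c(10, 20, 30).
result <- suppressWarnings(a + b)
print(result)   # 11 22 31
```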
Each element of a vector can be accessed individually by index:

Example

    > a = c(10, 20, 30, 40, 50)
    > a[2]
    [1] 20

Note: An index in R is not an offset but a position, so indexing starts from 1!
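A brief sketch of what 1-based indexing means at the edges of a vector. The variable names are illustrative only.

```r
a <- c(10, 20, 30, 40, 50)

first  <- a[1]           # 10: the first element is at position 1
last   <- a[length(a)]   # 50: the last element is at position length(a)
zero   <- a[0]           # position 0 selects nothing (an empty vector)
beyond <- a[6]           # a position past the end yields NA

print(first)
print(last)
print(length(zero))
print(beyond)
```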
R can also conveniently extract part of a vector:

Example

    > a[1:4]         # Extract items 1 to 4, inclusive
    [1] 10 20 30 40
    > a[c(1, 3, 5)]  # Extract items 1, 3, and 5
    [1] 10 30 50
    > a[c(-1, -5)]   # Drop items 1 and 5
    [1] 20 30 40

These three forms of partial extraction are the most commonly used.

Vectors support operations with scalars:

Example

    > c(1.1, 1.2, 1.3) - 0.5
    [1] 0.6 0.7 0.8
    > a = c(1, 2)
    > a ^ 2
    [1] 1 4

The common mathematical functions mentioned earlier, such as sqrt and exp, can also be applied to a vector element by element.
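For instance, a minimal sketch of sqrt and exp applied to whole vectors (variable names are illustrative):

```r
v <- c(1, 4, 9)

roots <- sqrt(v)            # square root of each element: 1 2 3
e_powers <- exp(c(0, 1))    # e^0 and e^1

print(roots)
print(e_powers)
```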
As a linear list structure, a vector should come with the usual list-processing functions, and R does provide them:

Vector sorting:

Example

    > a = c(1, 3, 5, 2, 4, 6)
    > sort(a)
    [1] 1 2 3 4 5 6
    > rev(a)
    [1] 6 4 2 5 3 1
    > order(a)
    [1] 1 4 2 5 3 6
    > a[order(a)]
    [1] 1 2 3 4 5 6

The order() function returns the index vector that sorts the input: element i of the result is the position, in the original vector, of the i-th smallest value.
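One common use of order() is to rearrange one vector according to the values of another. The sketch below assumes two hypothetical parallel vectors, scores and labels, and uses the decreasing argument of order().

```r
scores <- c(72, 95, 88)
labels <- c("a", "b", "c")   # hypothetical labels paired with the scores

# Indices that sort scores from largest to smallest.
idx <- order(scores, decreasing = TRUE)
print(idx)      # 2 3 1

# Apply the same ordering to the parallel vector.
ranked <- labels[idx]
print(ranked)   # "b" "c" "a"
```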
Vector Statistics

R has a comprehensive set of statistical functions:

| Function Name | Meaning |
|---|---|
| sum | Sum |
| mean | Mean |
| var | Variance |
| sd | Standard Deviation |
| min | Minimum |
| max | Maximum |
| range | Range (a two-element vector holding the minimum and maximum) |

Vector statistics example:

Example

    > sum(1:5)
    [1] 15
    > sd(1:5)
    [1] 1.581139
    > range(1:5)
    [1] 1 5
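The remaining functions in the table can be tried the same way. This sketch also checks that sd() is the square root of var(); both use the sample (n - 1) definition. Variable names are illustrative.

```r
x <- 1:5

m <- mean(x)    # 3
v <- var(x)     # 2.5 (sample variance, dividing by n - 1)
s <- sd(x)      # sqrt(2.5)
lo <- min(x)    # 1
hi <- max(x)    # 5

print(c(m, v, s, lo, hi))
print(isTRUE(all.equal(s, sqrt(v))))
```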
Vector Generation

Vectors can be created with the c() function, or with the min:max operator, which generates a sequence of consecutive numbers.

To generate an arithmetic sequence with a step other than 1, use the seq function:

    > seq(1, 9, 2)
    [1] 1 3 5 7 9

seq can also generate an arithmetic sequence from m to n of a given length; you only need to specify m, n, and the length of the sequence:

    > seq(0, 1, length.out=3)
    [1] 0.0 0.5 1.0

rep stands for repeat and generates a sequence of repeated values:

    > rep(0, 5)
    [1] 0 0 0 0 0
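A few more variations on these generators, as a sketch: the by, times, and each arguments are part of base R's seq() and rep(), while the variable names are illustrative.

```r
evens <- seq(2, 10, by = 2)           # 2 4 6 8 10
countdown <- 5:1                      # the colon operator can also count down
pattern <- rep(c(1, 2), times = 3)    # repeat the whole vector: 1 2 1 2 1 2
blocks <- rep(c(1, 2), each = 2)      # repeat each element: 1 1 2 2

print(evens)
print(countdown)
print(pattern)
print(blocks)
```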
NA and NULL often come up when working with vectors. Here is what the two terms mean and how they differ:

- NA means "missing"; NULL means "non-existent".
- NA acts as a placeholder: there is no value at that position, but the position itself exists.
- NULL means the data does not exist at all.

Example illustration:

Example

    > length(c(NA, NA, NULL))
    [1] 2
    > c(NA, NA, NULL, NA)
    [1] NA NA NA

As the output shows, NULL contributes nothing to a vector: c() simply drops it.
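In practice, the distinction shows up when testing for and working around missing values. A minimal sketch using the base functions is.na(), is.null(), and the na.rm argument of sum() (variable names are illustrative):

```r
x <- c(1, NA, 3)

missing_at <- is.na(x)                 # FALSE TRUE FALSE
total <- sum(x)                        # NA: a missing value propagates
total_known <- sum(x, na.rm = TRUE)    # 4: missing values are skipped

null_len <- length(NULL)               # 0: NULL holds no positions at all
is_null <- is.null(NULL)               # TRUE

print(missing_at)
print(total)
print(total_known)
print(null_len)
```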
Logical Type

Logical vectors typically arise from logical operations on vectors, for example:

Example

    > c(11, 12, 13) > 12
    [1] FALSE FALSE  TRUE

The which function is a very common tool for processing logical vectors. It returns the indices of the TRUE elements, so it can be used to pick out the positions of the data we need:

Example

    > a = c(11, 12, 13)
    > b = a > 12
    > print(b)
    [1] FALSE FALSE  TRUE
    > which(b)
    [1] 3

For example, suppose we need to select the values greater than or equal to 60 and less than 70 from a linear list:

Example

    > vector = c(10, 40, 78, 64, 53, 62, 69, 70)
    > print(vector[which(vector >= 60 & vector < 70)])
    [1] 64 62 69
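As a sketch of an alternative to the which() form above, a logical vector can also be used directly as an index. For data without NA values the two give the same result. The name scores is illustrative.

```r
scores <- c(10, 40, 78, 64, 53, 62, 69, 70)
in_range <- scores >= 60 & scores < 70

by_which <- scores[which(in_range)]   # index by position
by_mask  <- scores[in_range]          # index by logical vector

# TRUE counts as 1 when summed, so this is the number of matches.
count <- sum(in_range)

print(by_which)
print(by_mask)
print(count)
```

The two forms differ only when the condition produces NA: which() skips those positions, while a logical index returns NA for them.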
Related functions include all and any:

Example

    > all(c(TRUE, TRUE, TRUE))
    [1] TRUE
    > all(c(TRUE, TRUE, FALSE))
    [1] FALSE
    > any(c(TRUE, FALSE, FALSE))
    [1] TRUE
    > any(c(FALSE, FALSE, FALSE))
    [1] FALSE

all() checks whether every element of a logical vector is TRUE; any() checks whether at least one element is TRUE.
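These functions are usually combined with a comparison rather than given literal TRUE and FALSE values. A small sketch with a hypothetical scores vector:

```r
scores <- c(64, 62, 69)

all_pass <- all(scores >= 60)   # every score is at least 60
any_top  <- any(scores >= 90)   # no score reaches 90

print(all_pass)
print(any_top)
```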
String

The string data type itself is not complicated; here we focus on the functions that operate on strings:

Example

    > toupper("Hello")   # Convert to uppercase
    [1] "HELLO"
    > tolower("Hello")   # Convert to lowercase
    [1] "hello"
    > nchar("中文", type="bytes")   # Count length in bytes
    [1] 4
    > nchar("中文", type="chars")   # Count number of characters
    [1] 2
    > substr("123456789", 1, 5)    # Extract substring, from position 1 to 5
    [1] "12345"
    > substring("1234567890", 5)   # Extract substring, from position 5 to the end
    [1] "567890"
    > as.numeric("12")      # Convert string to number
    [1] 12
    > as.character(12.34)   # Convert number to string
    [1] "12.34"
    > strsplit("2019;10;1", ";")   # Split string by delimiter
    [[1]]
    [1] "2019" "10"   "1"

    > gsub("/", "-", "2019/10/1")  # Replace in string
    [1] "2019-10-1"

The byte count above comes from a Windows computer using the GBK encoding, in which one Chinese character occupies two bytes. On a computer using UTF-8, a single Chinese character occupies 3 bytes.
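The UTF-8 figure can be checked on any platform with a sketch like the following. It writes the two characters of "中文" as Unicode escapes and uses enc2utf8() to make sure the string is held as UTF-8 before counting. The variable names are illustrative, and the GBK count is not reproduced here because it depends on the system's encoding support.

```r
# "\u4e2d\u6587" is "中文" written with Unicode escapes.
x <- enc2utf8("\u4e2d\u6587")

n_chars <- nchar(x, type = "chars")   # 2 characters
n_bytes <- nchar(x, type = "bytes")   # 6 bytes: 3 per character in UTF-8

print(n_chars)
print(n_bytes)
```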
R supports regular expressions. By default it uses POSIX extended syntax, and Perl-compatible syntax is available by passing perl = TRUE:

Example

    > gsub("[[:alpha:]]+", "$", "Two words")
    [1] "$ $"
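A short sketch of related pattern functions from base R: gsub() with perl = TRUE, sub() (which replaces only the first match), and grepl() (which tests for a match). The input strings are made up for illustration.

```r
# Perl-compatible syntax: \d matches a digit.
digits_masked <- gsub("\\d+", "#", "a1b22c333", perl = TRUE)
print(digits_masked)   # "a#b#c#"

# sub() replaces only the first match; gsub() replaces all of them.
first_only <- sub("[[:alpha:]]+", "$", "Two words")
print(first_only)      # "$ words"

# grepl() returns TRUE or FALSE for each input string.
has_digit <- grepl("[0-9]", c("abc", "a1"))
print(has_digit)       # FALSE TRUE
```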
For more string content, refer to: R Language String Introduction.
Matrix

R provides a matrix type for linear algebra work. The data structure resembles a two-dimensional array in other languages, but R supports matrix operations at the language level.

First, let's look at how to create a matrix:

Example

    > vector = c(1, 2, 3, 4, 5, 6)
    > matrix(vector, 2, 3)
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6

The contents of the matrix are supplied as a vector, followed by the number of rows and the number of columns.

The values in the vector fill the matrix column by column. To fill row by row instead, set the byrow argument:

Example

    > matrix(vector, 2, 3, byrow=TRUE)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
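To make the two fill orders concrete, the sketch below builds both matrices with named arguments, inspects their shape with dim(), and shows that a row-wise fill equals the transpose of a column-wise fill with the dimensions swapped. Variable names are illustrative.

```r
v <- c(1, 2, 3, 4, 5, 6)

by_col <- matrix(v, nrow = 2, ncol = 3)                 # default: column by column
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)   # row by row

dims <- dim(by_col)   # number of rows, then number of columns
print(dims)

# Filling a 3 x 2 matrix by column and transposing it gives the row-wise 2 x 3 fill.
same <- identical(by_row, t(matrix(v, nrow = 3, ncol = 2)))
print(same)
```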
Each value in the matrix can be accessed directly:

Example

    > m1 = matrix(vector, 2, 3, byrow=TRUE)
    > m1[1,1]   # Row 1, Column 1
    [1] 1
    > m1[1,3]   # Row 1, Column 3
    [1] 3

Each row and column of a matrix in R can be given a name. Names are assigned in bulk with a string vector:

Example

    > colnames(m1) = c("x", "y", "z")
    > rownames(m1) = c("a", "b")
    > m1
      x y z
    a 1 2 3
    b 4 5 6
    > m1["a", ]
    x y z
    1 2 3

Arithmetic on matrices works much as it does on vectors: a matrix can be combined with a scalar, or element by element with another matrix of the same size.
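A minimal sketch of these element-by-element operations (variable names are illustrative). Note that `*` between two matrices multiplies corresponding elements; it is not the matrix product.

```r
m <- matrix(c(1, 2, 3, 4), 2, 2)   # columns are (1, 2) and (3, 4)

doubled <- m * 2         # every element times a scalar
summed <- m + m          # element-by-element sum of two same-size matrices
squared_elems <- m * m   # element-by-element product

print(doubled)
print(squared_elems)
```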
Matrix multiplication:

Example

    > m1 = matrix(c(1, 2), 1, 2)
    > m2 = matrix(c(3, 4), 2, 1)
    > m1 %*% m2
         [,1]
    [1,]   11

Inverse matrix:

Example

    > A = matrix(c(1, 3, 2, 4), 2, 2)
    > solve(A)
         [,1] [,2]
    [1,] -2.0  1.0
    [2,]  1.5 -0.5

The solve() function solves systems of linear equations. Its basic usage is solve(A, b), where A is the coefficient matrix of the system and b is the vector or matrix of right-hand sides. When b is omitted, as above, solve(A) returns the inverse of A.
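As a sketch of the two-argument form, the system x + 2y = 5, 3x + 4y = 11 can be solved with the same coefficient matrix A. The right-hand side b is made up for illustration.

```r
A <- matrix(c(1, 3, 2, 4), 2, 2)   # rows are (1, 2) and (3, 4)
b <- c(5, 11)

x <- solve(A, b)   # solution of A %*% x = b
print(x)           # 1 2

# Multiplying back should reproduce b (up to floating-point error).
check <- as.vector(A %*% x)
print(check)

# A times its inverse should be the identity matrix.
inv_check <- A %*% solve(A)
print(inv_check)
```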
The apply() function treats each row or each column of a matrix as a vector and applies a function to it:

Example

    > (A = matrix(c(1, 3, 2, 4), 2, 2))
         [,1] [,2]
    [1,]    1    2
    [2,]    3    4
    > apply(A, 1, sum)   # Second argument 1: apply sum() to each row
    [1] 3 7
    > apply(A, 2, sum)   # Second argument 2: apply sum() to each column
    [1] 4 6
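apply() is not limited to sum(). The sketch below passes max() and a small anonymous function, and compares the row sums with base R's rowSums(). Variable names are illustrative.

```r
A <- matrix(c(1, 3, 2, 4), 2, 2)   # rows are (1, 2) and (3, 4)

row_max <- apply(A, 1, max)        # largest value in each row: 2 4

# Any function of a vector can be applied, including one defined inline.
col_spread <- apply(A, 2, function(col) max(col) - min(col))   # 2 2

# For plain sums, rowSums() gives the same values as apply(A, 1, sum).
same_rows <- all(apply(A, 1, sum) == rowSums(A))

print(row_max)
print(col_spread)
print(same_rows)
```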
For more matrix content, refer to: R Matrix.