R CSV File Input
As a professional statistical tool, R would be of little use if data could only be entered and exported by hand. R therefore supports reading data in bulk from mainstream tabular storage formats such as CSV, Excel, and XML.
### CSV Table Interaction
CSV (Comma-Separated Values, sometimes also called character-separated values because the delimiter does not have to be a comma) is a very popular tabular file format, well suited to storing small to medium-sized datasets.
Because most software supports this file format, it is commonly used for data storage and exchange.
A CSV file is plain text with an extremely simple format: data is stored one record per line, each record is split into fields by a delimiter, and every record has the same sequence of fields.
Here is a simple `sites.csv` file (stored in the same directory as the test program):
```csv
id,name,url,likes
1,Google,www.google.com,111
2,,www.,222
3,Taobao,www.taobao.com,333
```
CSV uses commas to separate columns. If a field itself contains a comma, the whole field must be enclosed in double quotes.
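As a minimal sketch of the quoting rule, the snippet below parses a small inline CSV in which one field contains a comma. The sample rows and the variable names `csv_text` and `quoted` are made up for illustration; the `text` argument of `read.csv()` is used only to avoid creating a file.

```r
# Hypothetical sample data: the name in the first record contains a comma,
# so the whole field is wrapped in double quotes.
csv_text <- paste(
  "id,name,likes",
  "1,\"Smith, John\",10",
  "2,Jane,20",
  sep = "\n"
)

# Parse the inline text instead of a file on disk
quoted <- read.csv(text = csv_text)

print(ncol(quoted))                  # still three columns
print(as.character(quoted$name[1]))  # the comma stays inside the field
```

Without the double quotes, the first record would be split into four fields instead of three.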
**Note:** Pay attention to the encoding when saving text that contains non-English characters. Since UTF-8 is in common use on many computers, I saved the file as UTF-8.
**Note:** The CSV file should end with a line break after its last record; otherwise, reading it produces a warning message:
Warning message: In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'sites.csv'
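One way to get a file that ends with a line break is to create it from R. The sketch below writes the sample rows with `writeLines()`, which terminates every line, including the last, and then checks that reading the file raises no warning. The temporary directory and the names `csv_path` and `sites` are assumptions made for this illustration; in the tutorial the file simply sits next to the script.

```r
# Hypothetical location: a temp directory stands in for the script's folder
csv_path <- file.path(tempdir(), "sites.csv")

# writeLines() ends every line, including the last one, with a newline
writeLines(c(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,,www.,222",
  "3,Taobao,www.taobao.com,333"
), csv_path)

# Turn any warning into an error to confirm the file reads cleanly
sites <- withCallingHandlers(
  read.csv(csv_path),
  warning = function(w) stop("unexpected warning: ", conditionMessage(w))
)
print(nrow(sites))
```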
### Reading CSV Files
Next, we can use the `read.csv()` function to read data from a CSV file:
## Example
```r
data <- read.csv("sites.csv", encoding = "UTF-8")
print(data)
```
If the `encoding` argument is not set, `read.csv()` assumes the operating system's default text encoding. On a Chinese version of Windows whose default system encoding has not been changed, that is typically GBK. To prevent garbled text, keep the encoding of your files consistent wherever possible. (`read.csv()` also accepts a `fileEncoding` argument, which declares the encoding of the file so that R can convert it while reading.)
Executing the above code produces the following output:
id name url likes
1 1 Google www.google.com 111
2 2 www. 222
3 3 Taobao www.taobao.com 333
The `read.csv()` function returns a data frame, which allows us to conveniently perform statistical processing on the data. In the following example, we check the number of rows and columns:
## Example
```r
data <- read.csv("sites.csv", encoding = "UTF-8")
print(is.data.frame(data))  # Check if it is a data frame
print(ncol(data))           # Number of columns
print(nrow(data))           # Number of rows
```
Executing the above code produces the following output:
TRUE
4
3
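Beyond `ncol()` and `nrow()`, base R offers a few other ways to inspect a data frame. The sketch below assumes the same sample rows, supplied inline through the `text` argument so that it runs without `sites.csv` on disk; the variable name `sites` is chosen for this illustration.

```r
# Inline copy of the sample data, so the sketch runs without a file
sites <- read.csv(text = paste(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,,www.,222",
  "3,Taobao,www.taobao.com,333",
  sep = "\n"
))

print(dim(sites))    # number of rows, then number of columns
print(names(sites))  # column names taken from the header line
str(sites)           # compact summary of each column and its type
```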
The following example finds the maximum value in the `likes` field of the data frame:
## Example
```r
data <- read.csv("sites.csv", encoding = "UTF-8")
# Find the maximum value in 'likes'
like <- max(data$likes)
print(like)
```
Executing the above code produces the following output:
333
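`max()` returns only the value. To see which record it belongs to, one option is `which.max()`, which gives the position of the first maximum. The sketch below again uses an inline copy of the sample rows; the names `sites` and `top` are assumptions made for illustration.

```r
# Inline copy of the sample data, so the sketch runs without a file
sites <- read.csv(text = paste(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,,www.,222",
  "3,Taobao,www.taobao.com,333",
  sep = "\n"
))

# which.max() returns the index of the largest value;
# using it as a row index retrieves the whole record
top <- sites[which.max(sites$likes), ]
print(top)

# Other summary functions work on the column the same way
print(min(sites$likes))
print(mean(sites$likes))
```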
We can also filter rows by condition, much like an SQL `WHERE` clause, using the `subset()` function.
The following example finds the data where `likes` is 222:
## Example
```r
data <- read.csv("sites.csv", encoding = "UTF-8")
# Data where 'likes' is 222
retval <- subset(data, likes == 222)
print(retval)
```
Executing the above code produces the following output:
id name url likes
2 2 www. 222
**Note:** Equality conditions use `==`, not a single `=`.
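The same filter can also be written with base R logical indexing, which may help clarify what `subset()` does. This is a sketch on an inline copy of the sample rows; the names `sites`, `by_subset`, and `by_index` are chosen for illustration.

```r
# Inline copy of the sample data, so the sketch runs without a file
sites <- read.csv(text = paste(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,,www.,222",
  "3,Taobao,www.taobao.com,333",
  sep = "\n"
))

# subset() evaluates the condition using the data frame's column names
by_subset <- subset(sites, likes == 222)

# Logical indexing spells the column out and keeps rows where it is TRUE
by_index <- sites[sites$likes == 222, ]

print(by_subset)
print(by_index)
```

One difference to keep in mind: `subset()` drops rows where the condition is `NA`, while plain logical indexing returns a row of `NA` values for them.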
Multiple conditions are combined with the `&` operator. The following example finds data where `likes` is greater than 1 and `name` is "":
## Example
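```r
data <- read.csv("sites.csv", encoding = "UTF-8")
# Data where 'likes' is greater than 1 and 'name' is ""
retval <- subset(data, likes > 1 & name == "")
print(retval)
```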
Executing the above code produces the following output:
id name url likes
2 2 www. 222
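Conditions can also be joined with `|` (logical OR), and the `select` argument of `subset()` limits which columns are returned, roughly like the column list of an SQL `SELECT`. The thresholds and the names `sites` and `either` below are made up for this sketch.

```r
# Inline copy of the sample data, so the sketch runs without a file
sites <- read.csv(text = paste(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,,www.,222",
  "3,Taobao,www.taobao.com,333",
  sep = "\n"
))

# Rows where 'likes' is above 300 OR 'name' is "Google",
# keeping only the 'name' and 'likes' columns
either <- subset(sites, likes > 300 | name == "Google",
                 select = c(name, likes))
print(either)
```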
### Saving to a CSV File
R can use the `write.csv()` function to save data to a CSV file.
Continuing from the previous example, we save the data where `likes` is 222 to a file named `.csv`:
## Example
```r
data <- read.csv("sites.csv", encoding = "UTF-8")
# Data where 'likes' is 222
retval <- subset(data, likes == 222)
# Write to a new file
write.csv(retval, ".csv")
newdata <- read.csv(".csv")
print(newdata)
```
Executing the above code produces the following output:
X id name url likes
1 2 2 www. 222
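The `X` column holds the row names of `retval`, which `write.csv()` writes as an unnamed first column by default; when the file is read back, `read.csv()` names that column `X`. It can be removed by setting the parameter `row.names = FALSE`: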
## Example
```r
data <- read.csv("sites.csv", encoding = "UTF-8")
# Data where 'likes' is 222
retval <- subset(data, likes == 222)
# Write to a new file
write.csv(retval, ".csv", row.names = FALSE)
newdata <- read.csv(".csv")
print(newdata)
```
Executing the above code produces the following output:
id name url likes
1 2 www. 222
After execution, the `.csv` file is generated in the working directory.
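To see what `write.csv()` actually puts on disk, the sketch below writes a small data frame to a temporary file, prints the raw lines, and reads the file back. The data frame `original`, the path `out_path`, and the name `restored` are hypothetical and exist only for this illustration.

```r
# Hypothetical two-row data frame used only for this round trip
original <- data.frame(id = c(1L, 2L), likes = c(111L, 222L))

# A temporary file keeps the working directory clean
out_path <- tempfile(fileext = ".csv")
write.csv(original, out_path, row.names = FALSE)

# Inspect the raw text that was written
print(readLines(out_path))

# Read it back and confirm that no extra 'X' column appears
restored <- read.csv(out_path)
print(names(restored))
```

Writing to a path from `tempfile()` is a design choice for experiments like this one; for real output, pass the file name you want to keep.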