## R Programming Examples: A Comprehensive Developer's Guide
R is a powerful, open-source programming language and environment designed specifically for statistical computing, data analysis, and high-quality data visualization.
This guide provides practical R examples organized by core programming concepts, data structures, functions, data manipulation, statistical analysis, and data visualization. Whether you are a beginner learning basic syntax or an experienced developer who needs a quick reference, these examples will help you write clean, efficient, and idiomatic R code.
---
## 1. Basic Syntax and Control Flow
These foundational examples cover basic operations, conditional statements, and loops in R.
### Hello World and Basic Arithmetic
```R
# Print a string to the console
print("Hello, World!")
# Basic arithmetic operations
a <- 15
b <- 4
sum_val <- a + b # Addition
diff_val <- a - b # Subtraction
prod_val <- a * b # Multiplication
div_val <- a / b # Division
mod_val <- a %% b # Modulo (remainder)
exp_val <- a ^ b # Exponentiation
# cat() does not append a newline, so end the output with "\n"
cat("Sum:", sum_val, "\nModulo:", mod_val, "\nExponent:", exp_val, "\n")
```
### Conditional Statements (if-else)
```R
score <- 85
if (score >= 90) {
  print("Grade: A")
} else if (score >= 80) {
  print("Grade: B")
} else {
  print("Grade: C")
}
# Vectorized conditional using ifelse()
status <- ifelse(score >= 60, "Pass", "Fail")
print(status)
```
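The `ifelse()` call above receives a single value, so its vectorized behavior is not visible. A minimal sketch with a hypothetical `scores` vector shows the element-wise evaluation, and uses base R's `cut()` as one possible way to reproduce the same A/B/C thresholds without a chain of `if`/`else if`:

```R
# Hypothetical scores, for illustration only
scores <- c(95, 72, 58, 84)

# ifelse() tests the condition for every element in one call
statuses <- ifelse(scores >= 60, "Pass", "Fail")
print(statuses)

# cut() maps numeric ranges to labels; right = FALSE makes each
# interval closed on the left, matching the >= comparisons above
grades <- cut(scores,
              breaks = c(-Inf, 80, 90, Inf),
              labels = c("C", "B", "A"),
              right = FALSE)
print(as.character(grades))
```

Note that `if` itself expects a single logical value, which is why vector inputs need `ifelse()` or a similar vectorized function.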
### Loops (for, while)
```R
# For loop iterating over a sequence
for (i in 1:5) {
  cat("Iteration:", i, "\n")
}
# While loop
counter <- 1
while (counter <= 3) {
  cat("Counter is at:", counter, "\n")
  counter <- counter + 1
}
```
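A common pitfall with `for` loops is iterating over `1:length(x)` when `x` may be empty, because `1:0` counts down and yields two values. The sketch below, which assumes a hypothetical empty vector named `items`, shows how `seq_along()` avoids this:

```R
items <- character(0)  # hypothetical empty input

# 1:length(items) is 1:0, i.e. the vector c(1, 0), so a loop would run twice
risky_index <- 1:length(items)

# seq_along() returns an empty sequence, so the loop body is skipped
safe_index <- seq_along(items)
iterations <- 0
for (i in safe_index) {
  iterations <- iterations + 1
}

print(length(risky_index))
print(iterations)
```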
---
## 2. Core Data Structures
R features several built-in data structures optimized for scientific computing.
### Vectors
Vectors are the most basic data structure in R and must contain elements of the same type.
```R
# Create a numeric vector
numeric_vector <- c(1.2, 3.4, 5.6)
# Create a sequence of integers
sequence_vector <- 1:10
# Vector operations (vectorized by default)
doubled_vector <- numeric_vector * 2
print(doubled_vector)
```
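Because a vector can hold only one type, R silently coerces mixed inputs to the most general type involved. The following sketch uses made-up values to illustrate that rule:

```R
# Mixing numeric, character, and logical values: everything becomes character
mixed <- c(1, "two", TRUE)
print(class(mixed))
print(mixed)

# Mixing logical and numeric values: TRUE/FALSE become 1/0
logical_and_numeric <- c(TRUE, FALSE, 2.5)
print(class(logical_and_numeric))
print(logical_and_numeric)
```

Checking `class()` or `str()` after building a vector is a cheap way to catch unintended coercion.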
### Lists
Lists can contain elements of different types, including other lists or vectors.
```R
# Create a list with mixed data types
employee_record <- list(
  id = 101,
  name = "Alice",
  skills = c("R", "Python", "SQL"),
  active = TRUE
)
# Accessing list elements
print(employee_record$name)
print(employee_record$skills[1]) # Access first skill
```
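List indexing has two forms that are easy to confuse: single brackets return a sub-list, while double brackets (or `$`) return the element itself. A short sketch with a hypothetical `record` list:

```R
# Hypothetical list, for illustration only
record <- list(id = 101, skills = c("R", "Python", "SQL"))

sub_list <- record["skills"]    # single bracket: a list of length 1
skills   <- record[["skills"]]  # double bracket: the character vector itself

print(class(sub_list))
print(class(skills))

# Chain a second index to reach one element of the inner vector
first_skill <- record[["skills"]][1]
print(first_skill)
```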
### Matrices
Matrices are two-dimensional arrays where all elements must be of the same data type.
```R
# Create a 3x3 matrix
matrix_data <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
print(matrix_data)
# Accessing elements: matrix[row, column]
element_2_3 <- matrix_data[2, 3] # Row 2, Column 3
row_1 <- matrix_data[1, ] # Entire first row
```
### Data Frames
Data frames are two-dimensional, tabular structures where columns can contain different data types. This is the standard structure for datasets.
```R
# Create a data frame
df <- data.frame(
  EmployeeID = c(1, 2, 3),
  Name = c("John", "Emma", "Luke"),
  Salary = c(55000, 62000, 48000),
  stringsAsFactors = FALSE
)
# View structure and summary
str(df)
summary(df)
# Subset data frame
high_earners <- df[df$Salary > 50000, ]
print(high_earners)
```
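Two other everyday data frame operations are adding a derived column and sorting rows. The sketch below uses a hypothetical `staff` data frame and an assumed 10% bonus rate purely for illustration:

```R
# Hypothetical data, for illustration only
staff <- data.frame(
  Name = c("John", "Emma", "Luke"),
  Salary = c(55000, 62000, 48000),
  stringsAsFactors = FALSE
)

# Add a derived column (the 10% rate is an assumption)
staff$Bonus <- staff$Salary * 0.10

# Sort rows by Salary, highest first; order() returns row positions
staff_sorted <- staff[order(-staff$Salary), ]
print(staff_sorted)
```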
---
## 3. Functions and Functional Programming
Functions in R are first-class citizens. You can pass them as arguments, return them from other functions, and assign them to variables.
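As a small illustration of these three properties, the sketch below defines two hypothetical helpers, `apply_twice()` and `make_multiplier()`; neither is part of base R:

```R
# Pass a function as an argument
apply_twice <- function(f, x) {
  f(f(x))
}
result <- apply_twice(sqrt, 16)  # sqrt(sqrt(16))
print(result)

# Return a function from another function (a closure)
make_multiplier <- function(factor) {
  function(x) x * factor
}

# Assign the returned function to a variable and call it
triple <- make_multiplier(3)
print(triple(5))
```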
### Defining a Custom Function
```R
# Function to calculate the area of a circle
calculate_circle_area <- function(radius = 1) {
  if (radius < 0) {
    stop("Radius cannot be negative!")
  }
  area <- pi * radius^2
  return(area)
}
# Call the function
print(calculate_circle_area(5))
```
### The `apply` Family
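Explicit loops are often unnecessary for element-wise or row/column transformations in R. The `apply` family of functions expresses these operations more concisely. Note that `apply()` and `sapply()` still loop internally, so the main gain is readability rather than raw speed.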
```R
# Create a sample matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
# Calculate sum of each row (MARGIN = 1)
row_sums <- apply(mat, 1, sum)
# Calculate mean of each column (MARGIN = 2)
col_means <- apply(mat, 2, mean)
# sapply() on a vector to return a simplified vector
squared_values <- sapply(1:5, function(x) x^2)
print(squared_values)
```
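For the specific cases shown above, base R also offers alternatives worth knowing: `rowSums()` and `colMeans()` are dedicated vectorized functions, and `vapply()` is a stricter variant of `sapply()` that checks the type and length of each result. A brief sketch using the same matrix shape:

```R
mat <- matrix(1:12, nrow = 3, ncol = 4)

# Dedicated functions for row sums and column means
row_totals <- rowSums(mat)
col_avgs <- colMeans(mat)
print(row_totals)
print(col_avgs)

# vapply() declares the expected result: one numeric value per element
squared <- vapply(1:5, function(x) x^2, numeric(1))
print(squared)
```

`vapply()` raises an error if any result does not match the declared template, which makes it a safer choice inside functions and packages.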
---
## 4. Data Manipulation (dplyr)
The `dplyr` package (part of the `tidyverse`) is one of the most widely used tools for data manipulation in R.
```R
# Install and load dplyr (uncomment if not installed)
# install.packages("dplyr")
library(dplyr)
# Sample dataset
sales_data <- data.frame(
Region = c("North", "South", "North", "East", "South", "East"),
Product = c("A", "B", "B", "A", "A", "B"),
Revenue = c(1200, 1500, 800, 2000, 1100, 950)
)
# Using the pipe operator (%>%) to chain operations
summary_report <- sales_data %>%
  filter(Revenue > 900) %>%         # Filter rows
  group_by(Region) %>%              # Group by Region
  summarise(
    Total_Revenue = sum(Revenue),   # Aggregate data
    Avg_Revenue = mean(Revenue),
    Transaction_Count = n()
  ) %>%
  arrange(desc(Total_Revenue))      # Sort results
print(summary_report)
```
---
## 5. Statistical Analysis
R was designed by statisticians, so common descriptive statistics, models, and tests are available in the base distribution without extra packages.
### Descriptive Statistics
```R
# Generate a random sample of 100 numbers from a normal distribution
set.seed(42) # Set seed for reproducibility
data_sample <- rnorm(100, mean = 50, sd = 10)
# Calculate basic statistics
mean_val <- mean(data_sample)
median_val <- median(data_sample)
sd_val <- sd(data_sample)
quantiles <- quantile(data_sample, probs = c(0.25, 0.5, 0.75))
cat("Mean:", mean_val, "\nSD:", sd_val, "\n")
```
### Linear Regression
```R
# Using the built-in 'mtcars' dataset
# Predict Miles Per Gallon (mpg) based on Horsepower (hp) and Weight (wt)
linear_model <- lm(mpg ~ hp + wt, data = mtcars)
# Print the model summary
summary(linear_model)
# Make a prediction for a new car
new_car <- data.frame(hp = 110, wt = 2.8)
predicted_mpg <- predict(linear_model, newdata = new_car)
cat("Predicted MPG:", predicted_mpg, "\n")
```
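Beyond printing the summary, individual pieces of a fitted model can be extracted programmatically. The sketch below refits the same `mpg ~ hp + wt` model so that it runs on its own, and uses the standard accessors `coef()` and `confint()` plus the `r.squared` component of the summary object:

```R
# Refit the same model on the built-in mtcars dataset
fit <- lm(mpg ~ hp + wt, data = mtcars)

# Named vector of estimated coefficients
coefs <- coef(fit)
print(coefs)

# 95% confidence intervals for each coefficient
intervals <- confint(fit, level = 0.95)
print(intervals)

# Proportion of variance in mpg explained by the model
r_squared <- summary(fit)$r.squared
cat("R-squared:", round(r_squared, 3), "\n")
```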
---
## 6. Data Visualization
R offers two primary plotting systems: Base R (built-in) and `ggplot2` (declarative, grammar of graphics).
### Base R Plotting
```R
# Simple scatter plot
plot(mtcars$wt, mtcars$mpg,
     main = "Weight vs. MPG",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles Per Gallon",
     col = "blue",
     pch = 19)
# Add a trend line
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)
```
### Advanced Plotting with ggplot2
```R
# Install and load ggplot2 (uncomment if not installed)
# install.packages("ggplot2")
library(ggplot2)
# Create a styled scatter plot with a regression line
ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "darkgray") +
  labs(
    title = "Fuel Efficiency vs. Vehicle Weight",
    subtitle = "Grouped by Number of Cylinders",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon (MPG)",
    color = "Cylinders"
  ) +
  theme_minimal()
```
---
## Considerations and Best Practices
When writing R code, keep the following professional guidelines in mind:
1. **Use the Arrow Operator (`<-`) for Assignment**: While `=` works in most contexts, `<-` is the idiomatic standard in R for variable assignment. Reserve `=` for naming arguments in function calls.
2. **Leverage Vectorization**: Avoid writing explicit `for` loops whenever possible. R is optimized for vectorized operations (e.g., `x + 1` instead of looping through each element of `x`).
3. **Manage Missing Data (`NA`)**: R uses `NA` to represent missing values. Many functions will return `NA` if the input contains missing values unless you explicitly pass `na.rm = TRUE` (e.g., `mean(x, na.rm = TRUE)`).
4. **Prefer the Tidyverse**: For data manipulation and visualization, packages like `dplyr`, `tidyr`, and `ggplot2` provide a more consistent, readable, and maintainable syntax than base R.
5. **Set Seeds for Reproducibility**: When using functions that involve randomness (like `rnorm`, `sample`, or machine learning algorithms), always use `set.seed()` to ensure your results can be exactly replicated.
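Guidelines 2, 3, and 5 can be illustrated with a few lines of base R. The values below are made up for the example:

```R
# Hypothetical vector with one missing value
x <- c(4, 8, NA, 15)

# Vectorization: one expression operates on every element (NA stays NA)
x_plus_one <- x + 1
print(x_plus_one)

# Missing data: mean() returns NA unless missing values are removed
mean_with_na <- mean(x)
mean_without_na <- mean(x, na.rm = TRUE)
missing_count <- sum(is.na(x))
cat("Mean:", mean_with_na, "| Mean (na.rm):", mean_without_na,
    "| Missing:", missing_count, "\n")

# Reproducibility: the same seed yields the same random draws
set.seed(42)
first_draw <- rnorm(3)
set.seed(42)
second_draw <- rnorm(3)
print(identical(first_draw, second_draw))
```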