R Programming Questions

Last updated on May 16, 2023

Question 1 : Write an R script to do the following:
a) simulate a sample of 100 random data points from a normal distribution with mean 100 and
standard deviation 5 and store the result in a vector.
b) visualize the vector created above using different plots.
c) test the hypothesis that the mean equals 100.
d) use wilcox test to test the hypothesis that mean equals 90

The following R script accomplishes these tasks:

# Set the seed for reproducibility
set.seed(123)

# a) Simulate a sample of 100 random data points from a normal distribution
data <- rnorm(100, mean = 100, sd = 5)

# b) Visualize the vector using different plots
# Histogram
hist(data, main = "Histogram of Data", xlab = "Value", ylab = "Frequency")

# Boxplot
boxplot(data, main = "Boxplot of Data", ylab = "Value")

# c) Test the hypothesis that the mean equals 100
t.test(data, mu = 100)

# d) Use the Wilcoxon signed-rank test with a hypothesized location of 90
# (the test addresses the median, which equals the mean for a symmetric distribution)
wilcox.test(data, mu = 90)

In this script, we first use the rnorm function to generate a sample of 100 random data points from a normal distribution with a mean of 100 and a standard deviation of 5. We store the result in the data vector.
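As a quick sanity check on part a), the sample mean and standard deviation should land near, but not exactly on, 100 and 5. The sketch below assumes the same seed as the script; the names sim and se are introduced here only for illustration.

```r
set.seed(123)
sim <- rnorm(100, mean = 100, sd = 5)

length(sim)   # number of simulated points
mean(sim)     # sample mean, close to 100
sd(sim)       # sample standard deviation, close to 5

# The standard error of the mean is sd / sqrt(n), here roughly 5 / 10 = 0.5,
# so a sample mean within about 1 unit of 100 is unsurprising.
se <- sd(sim) / sqrt(length(sim))
se
```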

Then, we proceed to visualize the data vector using different plots. The script includes a histogram and a boxplot to provide a visual representation of the data.
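Both plots are drawn from a handful of summary numbers, and it can help to look at those numbers directly. A minimal sketch, assuming the same simulated sample (the names sim, h and b are illustrative):

```r
set.seed(123)
sim <- rnorm(100, mean = 100, sd = 5)

# hist() can return the bin edges and counts without drawing anything
h <- hist(sim, plot = FALSE)
h$breaks   # bin edges
h$counts   # observations per bin

# boxplot.stats() returns the five numbers a boxplot draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot.stats(sim)
b$stats
b$out      # points that would be plotted as outliers
```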

Next, we test the hypothesis that the mean of the data equals 100 using the t.test function. This function performs a one-sample t-test, comparing the mean of the data to the specified null hypothesis value of 100.
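The printed output of t.test is a summary of an object whose parts can be read individually. The sketch below, under the same simulation assumptions, pulls out those parts and reproduces the t statistic by hand; res and t_manual are illustrative names.

```r
set.seed(123)
sim <- rnorm(100, mean = 100, sd = 5)

res <- t.test(sim, mu = 100)

res$statistic  # t statistic
res$parameter  # degrees of freedom (n - 1)
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for the mean

# The t statistic is (sample mean - hypothesized mean) / standard error
t_manual <- (mean(sim) - 100) / (sd(sim) / sqrt(length(sim)))
all.equal(unname(res$statistic), t_manual)
```

A p-value above the chosen significance level (commonly 0.05) means the data give no evidence against a mean of 100.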

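Finally, we use the wilcox.test function to perform a one-sample Wilcoxon signed-rank test. Strictly, this test concerns the median rather than the mean: it asks whether the data are distributed symmetrically around the hypothesized value of 90. Because a normal distribution is symmetric, its median and mean coincide, so the test serves the purpose of part d).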
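To see what the Wilcoxon test estimates, conf.int = TRUE can be added so that the result also carries a location estimate and its confidence interval. A minimal sketch under the same simulation assumptions (w is an illustrative name):

```r
set.seed(123)
sim <- rnorm(100, mean = 100, sd = 5)

w <- wilcox.test(sim, mu = 90, conf.int = TRUE)

w$p.value    # very small here, since almost all values lie above 90
w$estimate   # estimated (pseudo)median of the sample
w$conf.int   # confidence interval for the (pseudo)median
```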

Feel free to run this script in an R environment to see the results and explore the visualizations.

Question 2 : Use the algae data set from package DMwR to complete the following tasks.
a) create a graph that you find adequate to show the distribution of the values of algae a6.
b) show the distribution of the values of size 3.
c) check visually if oPO4 follows a normal distribution.
d) produce a graph that allows you to understand how the values of NO3 are distributed
across the sizes of river.
e) using a graph check if the distribution of algae a1 varies with the speed of the river.
f) visualize the relationship between the frequencies of algae a1 and a6. Give the
appropriate graph title, x-axis and y-axis title.

To complete these tasks, first install and load the DMwR package, which provides the algae data set. The following R script accomplishes the tasks:

# Install and load the required package (install.packages only needs to run once).
# Note: DMwR has been archived on CRAN; if it cannot be installed, its successor
# DMwR2 also provides the algae data set.
install.packages("DMwR")
library(DMwR)

# Load the data set (the object is named algae, in lowercase)
data(algae)

# a) Create a graph to show the distribution of the values of algae a6
hist(algae$a6, main = "Distribution of Algae a6", xlab = "Algae a6 Value", ylab = "Frequency")

# b) Show the distribution of size (a factor with levels small, medium, large)
barplot(table(algae$size), main = "Distribution of River Size", xlab = "River Size", ylab = "Number of Samples")

# c) Check visually if oPO4 follows a normal distribution
hist(algae$oPO4, main = "Distribution of oPO4", xlab = "oPO4 Value", ylab = "Frequency")
qqnorm(algae$oPO4, main = "Normal Q-Q Plot of oPO4")
qqline(algae$oPO4)

# d) Produce a graph to understand how the values of NO3 are distributed across the sizes of the river
boxplot(NO3 ~ size, data = algae, main = "Distribution of NO3 across River Sizes", xlab = "River Size", ylab = "NO3 Value")

# e) Check if the distribution of algae a1 varies with the speed of the river
boxplot(a1 ~ speed, data = algae, main = "Distribution of Algae a1 across River Speeds", xlab = "River Speed", ylab = "Algae a1 Value")

# f) Visualize the relationship between the frequencies of algae a1 and a6
plot(algae$a1, algae$a6, main = "Relationship between Algae a1 and a6", xlab = "Algae a1", ylab = "Algae a6")

In this script, we install the DMwR package, load it into the R session, and then load the data set using the data function. Note that the data set object is named algae, in lowercase.

For task a), we create a histogram to show the distribution of the values of algae a6 using the hist function.

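For task b), note that size in the algae data is a categorical variable with the levels small, medium and large, so a filter such as size == 3 matches no rows. Reading the task as showing the distribution of size itself, the appropriate graph is a bar plot of the level counts, barplot(table(algae$size)).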

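For task c), a histogram gives a first impression of the shape of oPO4, but a normal Q-Q plot (qqnorm followed by qqline) is the more direct visual check: points that bend away from the reference line indicate a departure from normality.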
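Reading a Q-Q plot is easier with a reference case next to it. The sketch below uses simulated data rather than the algae data (normal_x and skewed_x are illustrative names): the normal sample hugs the line, while the right-skewed sample curves away from it at the upper end.

```r
set.seed(1)
normal_x <- rnorm(200)            # drawn from a normal distribution
skewed_x <- rexp(200, rate = 1)   # strongly right-skewed

# Two Q-Q plots side by side; restore the plot layout afterwards
old_par <- par(mfrow = c(1, 2))
qqnorm(normal_x, main = "Normal sample")
qqline(normal_x)
qqnorm(skewed_x, main = "Skewed sample")
qqline(skewed_x)
par(old_par)
```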

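For task d), we use a boxplot of NO3 grouped by size, the column that records the river size, to see how the NO3 values are distributed across the different sizes of river.

For task e), we use a boxplot of a1 grouped by speed, the column that records the river speed, to check whether the distribution of algae a1 varies with the speed of the river.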
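The formula interface used for grouped boxplots, y ~ group, can be tried on a small example first. The data frame below is hypothetical toy data, not taken from the algae data set; tapply gives the same comparison in numbers.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  speed = factor(c("low", "low", "medium", "medium", "high", "high"),
                 levels = c("low", "medium", "high")),
  a1    = c(2, 5, 10, 14, 30, 41)
)

# One box per level of speed
boxplot(a1 ~ speed, data = toy, xlab = "Speed", ylab = "a1")

# The same comparison in numbers: median of a1 within each group
group_medians <- tapply(toy$a1, toy$speed, median)
group_medians
```

If the boxes sit at clearly different heights, or differ markedly in spread, the distribution varies with the grouping variable.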

For task f), we create a scatter plot to visualize the relationship between the frequencies of algae a1 and a6. The graph title, x-axis, and y-axis titles are provided.

You can run this script in an R environment with the DMwR package installed to see the graphs and explore the distributions of the variables in the algae data set.

Question 3 : Read the file Coweeta.CSV and write an R script to do the following:
a) count the number of observations per species.
b) take a subset of the data including only those species with at least 10 observations.
c) make a scatter plot of biomass versus height, with the symbol colour varying by species,
and use filled squares for the symbols. Also add a title to the plot, in italics.
d) log-transform biomass, and redraw the plot.

To accomplish the tasks using the Coweeta.CSV file in R, you can follow this R script:

# Read the CSV file (adjust the path to where the file is stored)
data <- read.csv("Coweeta.CSV")

# a) Count the number of observations per species
species_counts <- table(data$species)
print(species_counts)

# b) Subset the data to include only species with at least 10 observations
filtered_data <- subset(data, species %in% names(species_counts[species_counts >= 10]))

# Convert species to a factor so it can be mapped to colours;
# this also drops the levels of the species that were filtered out
filtered_data$species <- factor(filtered_data$species)

# c) Scatter plot of biomass versus height: filled squares (pch = 15),
# colour by species, italic title (font.main = 3)
plot(filtered_data$height, filtered_data$biomass, pch = 15, col = as.integer(filtered_data$species), main = "Scatter Plot of Biomass versus Height", font.main = 3, xlab = "Height", ylab = "Biomass")

# d) Log-transform biomass and redraw the plot
filtered_data$log_biomass <- log(filtered_data$biomass)
plot(filtered_data$height, filtered_data$log_biomass, pch = 15, col = as.integer(filtered_data$species), main = "Scatter Plot of Log-Transformed Biomass versus Height", font.main = 3, xlab = "Height", ylab = "Log-Transformed Biomass")

In this script, we first read the Coweeta.CSV file using the read.csv function and store the data in the data variable.

For task a), we use the table function to count the number of observations per species. The result is stored in the species_counts variable.
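To illustrate what table returns, here is a small sketch on made-up species labels (hypothetical values, not from the Coweeta file):

```r
# Hypothetical species labels for illustration only
species <- c("oak", "pine", "oak", "maple", "oak", "pine")

counts <- table(species)
counts

# Sort from most to least frequent
sort(counts, decreasing = TRUE)

# Look up the count for one species by name
counts[["oak"]]
```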

For task b), we subset the data to include only those species with at least 10 observations using the subset function and the %in% operator. The resulting filtered data is stored in the filtered_data variable.
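The same filtering pattern can be traced on a toy data frame, here with a threshold of 2 instead of 10 so that the effect is visible. All names and values below are hypothetical.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  species = c("oak", "pine", "oak", "maple", "oak", "pine"),
  height  = c(12, 20, 15, 9, 14, 22)
)

counts <- table(toy$species)
keep   <- names(counts[counts >= 2])    # species with at least 2 rows
kept   <- subset(toy, species %in% keep)

kept
table(kept$species)
```

The single maple row is dropped, while all oak and pine rows are kept.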

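For task c), we create a scatter plot of biomass versus height using the plot function. We use filled squares (pch = 15) as symbols and vary the symbol color by species through the col argument, which needs numbers or valid color names; since read.csv in R 4.0 and later reads text columns as character strings, species is converted to a factor first. The title is set with the main argument and rendered in italics. Note that italic() is only defined inside a plotmath expression, so a bare call such as main = italic("...") fails, whereas font.main = 3 or main = expression(italic("...")) works. The x-axis label is set as “Height”, and the y-axis label is set as “Biomass”.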
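How a factor turns into colors can be seen in a small sketch. The data frame is hypothetical toy data; the legend call is an optional addition that the task does not require.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  species = factor(c("oak", "oak", "pine", "pine", "maple", "maple")),
  height  = c(12, 15, 20, 22, 9, 11),
  biomass = c(30, 45, 80, 95, 12, 18)
)

# as.integer() turns the factor levels into 1, 2, 3, ...,
# which plot() uses as indices into the current color palette
cols <- as.integer(toy$species)

plot(toy$height, toy$biomass, pch = 15, col = cols,
     main = "Biomass versus height", font.main = 3,
     xlab = "Height", ylab = "Biomass")

# One legend entry per level, in the same color order
legend("topleft", legend = levels(toy$species),
       col = seq_along(levels(toy$species)), pch = 15)
```

With more species than palette entries the colors repeat, so a longer palette may be needed for the full data.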

For task d), we log-transform the biomass values by applying the log function to the biomass variable in the filtered_data dataset. We then redraw the scatter plot using the log-transformed biomass values, following the same steps as in task c).
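There are two common ways to put biomass on a log scale, sketched below on hypothetical toy values: transform the values themselves, or keep the original units and make the axis logarithmic with log = "y".

```r
# Hypothetical toy data for illustration only
toy <- data.frame(height  = c(9, 12, 15, 20, 22),
                  biomass = c(12, 30, 45, 80, 95))

# Option 1: transform the values; the axis shows log units
plot(toy$height, log(toy$biomass), pch = 15,
     xlab = "Height", ylab = "log(Biomass)")

# Option 2: keep the original units and make the y-axis logarithmic
plot(toy$height, toy$biomass, pch = 15, log = "y",
     xlab = "Height", ylab = "Biomass (log scale)")

# log() is the natural logarithm by default; exp() reverses it
back <- exp(log(toy$biomass))
all.equal(back, toy$biomass)
```

Both plots show the same pattern; the second keeps tick labels in the original units, which can be easier to read.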

Make sure to update the file path in the read.csv function to the correct location of the Coweeta.CSV file on your system. You can run this script in an R environment to perform the specified tasks and visualize the scatter plots.
