Question 1 (a): What value will be stored in the variable “X”?
X <- vector("complex", 3)
The variable “X” will hold a complex vector of length 3. Because vector() fills a new atomic vector with the zero value of its mode, every element is 0+0i, so printing X gives:
[1] 0+0i 0+0i 0+0i
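A quick check of this behaviour, as a minimal sketch. The extra modes shown (numeric, logical, character) are not part of the question and are included only to illustrate the same zero fill.

```r
# vector() creates an atomic vector of the given mode,
# filled with that mode's "zero" value
X <- vector("complex", 3)
print(X)

# The same default fill applies to other atomic modes
print(vector("numeric", 2))
print(vector("logical", 2))
print(vector("character", 2))
```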
Question 1 (b): Write an R statement to extract the rows from a data frame “df” that do not have missing values.
The R statement would be:
df_complete <- df[complete.cases(df), ]
The complete.cases() function returns a logical vector that is TRUE for each row with no missing values. The [ operator then subsets the data frame, keeping only the rows for which that vector is TRUE.
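The statement above can be tried on a small example. This is a minimal sketch: the data frame contents below are made up for illustration and are not part of the question.

```r
# Hypothetical data frame with missing values in two different rows
df <- data.frame(id    = 1:4,
                 score = c(10, NA, 30, 40),
                 grade = c("A", "B", NA, "D"))

keep <- complete.cases(df)   # TRUE only where every column is non-missing
print(keep)

df_complete <- df[keep, ]    # keep the rows flagged TRUE
print(df_complete)
```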
Question 1 (c): Write the output for statements 1 and 2 in the following R script.
y <- c(2, 1, 5, 7, 8, 3, 2, 4, 5)
length(y) <- 4
print(y) # statement 1
length(y) <- 6
print(y) # statement 2
Output of Statement 1:
[1] 2 1 5 7
Output of Statement 2:
[1]  2  1  5  7 NA NA
Question 1 (d): For the given factor f <- factor(c("abc", "abc", "cab", "bac", "abc", "cab", "cab")), what will table(f) return?
Output:
f
abc bac cab
  3   1   3
Question 1 (e): What are the two compulsory files in a package directory structure?
The two compulsory files in a package directory structure are:
- DESCRIPTION: This file contains information about the package such as its name, version, author, description, and dependencies.
- NAMESPACE: This file contains information about the package’s namespace, which determines which functions, variables, and other objects are visible to the user when the package is loaded.
Question 1 (f): What is the difference between the functions “read.csv” and “read.csv2”?
Both read.csv and read.csv2 read delimited text files into a data frame; they differ only in the default values of the sep and dec arguments.
read.csv assumes that the field separator is a comma (,) and the decimal mark is a period (.).
read.csv2 assumes that the field separator is a semicolon (;) and the decimal mark is a comma (,).
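The difference can be seen by reading the same records written in each convention. This is a minimal sketch: the sample text is made up, and the text argument is used only to avoid creating files.

```r
# The same two records written in each convention (made-up sample data)
comma_text     <- "item,price\npen,1.5\nbook,12.25"
semicolon_text <- "item;price\npen;1,5\nbook;12,25"

a <- read.csv(text = comma_text)        # defaults: sep = ",", dec = "."
b <- read.csv2(text = semicolon_text)   # defaults: sep = ";", dec = ","

print(a)
print(b)

# Both calls should parse the price column to the same numbers
print(all.equal(a$price, b$price))
```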
Question 2: Consider “Student” table in a MySQL database ‘db1’:
Student(roll_no, name, city, course)
Write an R script to perform the following tasks:
(i) Load relevant packages to connect with the database.
(ii) Establish the connection with the ‘db1’ database.
(iii) Display all tables of the database ‘db1’.
(iv) Display the total number of students from the ‘Student’ table.
(v) Close the database connection.
R script:
# (i) Load relevant packages
library(RMySQL)
# (ii) Establish connection with the 'db1' database
# ('username', 'password' and 'localhost' are placeholders for the actual credentials and host)
con <- dbConnect(MySQL(), user = 'username', password = 'password', dbname = 'db1', host = 'localhost')
# (iii) Display all tables of the database 'db1'
tables <- dbListTables(con)
print(tables)
# (iv) Display the total number of students from the 'Student' table
query <- "SELECT COUNT(*) FROM Student"
result <- dbGetQuery(con, query)
print(result)
# (v) Close the database connection
dbDisconnect(con)
Question 3 (a): Write output for the following command:
switch(5 %/% 2, sum(2:8), summary(c('a', 'b')), sample(10, 5))
The command prints the summary of the character vector c('a', 'b'):
   Length     Class      Mode
        2 character character
Explanation:
When the first argument of switch() is numeric, it is used as an index into the remaining arguments, and only the selected argument is evaluated. Here the index is 5 %/% 2, the integer division of 5 by 2, which is 2.
Let's break down the alternatives in the switch() call:
sum(2:8): the first alternative. It would give 2 + 3 + 4 + 5 + 6 + 7 + 8 = 35, but the index is 2, not 1, so it is never evaluated.
summary(c('a', 'b')): the second alternative. The index is 2, so this expression is selected. For a character vector, summary() reports the length (2), the class (character), and the mode (character).
sample(10, 5): the third alternative. It would draw 5 numbers without replacement from 1 to 10, but the index is not 3, so it is never evaluated.
Overall, the output of the switch() command is the result of summary(c('a', 'b')). It is the same on every run, because sample() is never called.
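The selection rule for a numeric first argument can be checked directly. This is a minimal sketch: the string labels and the stop() calls are illustrative stand-ins, not part of the question.

```r
idx <- 5 %/% 2   # integer division
print(idx)

# With a numeric first argument, switch() picks the alternative at that position.
# The strings are placeholder labels for the three original expressions.
chosen <- switch(idx, "sum(2:8)", "summary(c('a', 'b'))", "sample(10, 5)")
print(chosen)

# Only the selected alternative is evaluated, so the stop() calls never run
res <- switch(idx,
              stop("first alternative evaluated"),
              summary(c("a", "b")),
              stop("third alternative evaluated"))
print(res)
```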
Question 3 (b): Given a list L as:
L <- list( a=2, b=3, twin= c(2, 2), trip= c(2, 2, 2) )
what will be the output of the following R statements?
(i) unlist(L)
(ii) lapply(L, length)
(iii) sapply(L, length)
Output:
# (i)
a b twin1 twin2 trip1 trip2 trip3
2 3 2 2 2 2 2
# (ii)
$a
[1] 1
$b
[1] 1
$twin
[1] 2
$trip
[1] 3
# (iii)
a b twin trip
1 1 2 3
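The three calls differ mainly in the type of object they return, which the following minimal sketch makes explicit. The variable names flat, by_list and by_vec are introduced here for illustration only.

```r
L <- list(a = 2, b = 3, twin = c(2, 2), trip = c(2, 2, 2))

flat    <- unlist(L)          # named numeric vector; multi-element entries get numbered names
by_list <- lapply(L, length)  # always returns a list
by_vec  <- sapply(L, length)  # simplifies the result to a named integer vector

print(names(flat))
print(class(by_list))
print(class(by_vec))
```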
Question 4: Consider the following data frame ‘df’.
SNo | Value | Class |
1 | 98 | A |
2 | 21 | B |
3 | 67 | C |
4 | 23 | A |
5 | 11 | A |
6 | 12 | C |
7 | 34 | C |
8 | 56 | B |
9 | 78 | A |
10 | 90 | C |
11 | 12 | C |
Write an R script to perform the following:
(i) Display the rows of “df” where Class is “A”
(ii) Display the total values for each class.
(iii) Create a suitable plot to show the statistical summary of all values with respect to their class.
# Load the dataset into a data frame
df <- data.frame(SNo = 1:11,
                 Value = c(98, 21, 67, 23, 11, 12, 34, 56, 78, 90, 12),
                 Class = c("A", "B", "C", "A", "A", "C", "C", "B", "A", "C", "C"))
# (i) Display the rows of "df" where Class is "A"
subset(df, Class == "A")
# (ii) Display the total values for each class
aggregate(df$Value, by = list(df$Class), FUN = sum)
# (iii) Create a suitable plot to show the statistical summary of all values with respect to their class
boxplot(df$Value ~ df$Class, col = "lightblue", main = "Boxplot of Values by Class")
For better understanding, the output of the above code is:
# Output of (i)
SNo Value Class
1 1 98 A
4 4 23 A
5 5 11 A
9 9 78 A
# Output of (ii)
Group.1 x
1 A 210
2 B 77
3 C 215
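The per-class totals can be cross-checked with tapply(), an alternative to aggregate() that returns a named vector. This is a minimal sketch; the name totals is introduced here for illustration.

```r
df <- data.frame(SNo = 1:11,
                 Value = c(98, 21, 67, 23, 11, 12, 34, 56, 78, 90, 12),
                 Class = c("A", "B", "C", "A", "A", "C", "C", "B", "A", "C", "C"))

# Sum of Value within each Class, as a named vector
totals <- tapply(df$Value, df$Class, sum)
print(totals)

# Sanity check: the per-class totals must add up to the overall total
print(sum(totals) == sum(df$Value))
```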
Question 5 (a): Given a data frame “rect” containing the length and height of five rectangles and a function “rect_area” to compute the area of rectangles as:
rect <- data_frame(L = c(10, 5.5, 6, 7.8, 9.7), B = c(6, 4, 1.2, 3, 4))
rect_area <- function(a, b) { a*b }
Write an R statement to create a package called “my_area” to compute the area of rectangles using given data frame and function.
For the “R/rect_area.R” file:
rect_area <- function(a, b) { a * b }
For the “man/rect_area.Rd” file:
\name{rect_area}
\title{Compute the area of a rectangle}
\description{
This function computes the area of a rectangle given its length and height.
}
\usage{
rect_area(a, b)
}
\arguments{
\item{a}{length of rectangle}
\item{b}{height of rectangle}
}
\value{
The area of the rectangle.
}
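The two files above can also be generated with a single call to package.skeleton() from the utils package. The following is a minimal sketch under stated assumptions: the package is named "myarea" rather than "my_area", because R package names may contain only letters, digits and dots; the package is written to a temporary directory; the objects are held in a separate environment (pkg_env, a name introduced here); and data.frame() is used so that no extra package is needed.

```r
# Keep the two objects in their own environment so the sketch
# does not depend on what is in the workspace
pkg_env <- new.env()
pkg_env$rect <- data.frame(L = c(10, 5.5, 6, 7.8, 9.7),
                           B = c(6, 4, 1.2, 3, 4))
pkg_env$rect_area <- function(a, b) { a * b }

# "myarea" and the temporary directory are assumptions of this sketch
pkg_parent <- tempdir()
package.skeleton(name = "myarea",
                 list = c("rect", "rect_area"),
                 environment = pkg_env,
                 path = pkg_parent,
                 force = TRUE)

# List what was generated (DESCRIPTION, NAMESPACE, R/, data/, man/, ...)
print(list.files(file.path(pkg_parent, "myarea"), recursive = TRUE))
```

The generated man/*.Rd files are templates; their sections still have to be filled in by hand, for example with the documentation text shown above.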
Question 5 (b): For the given matrices ‘x’ and ‘y’,
x <- matrix(rep(1:3, each = 2), nrow = 3, ncol = 2)
y <- matrix(rep(1:3, length.out = 6), nrow = 2, ncol = 3)
What will be the output of:
(i) x%*%y
(ii) x*t(y)
Output:
# matrix() fills column by column, so x has rows (1, 2), (1, 3), (2, 3) and y has rows (1, 3, 2), (2, 1, 3)
# Output for x %*% y
     [,1] [,2] [,3]
[1,]    5    5    8
[2,]    7    6   11
[3,]    8    9   13
# Output for x * t(y)
     [,1] [,2]
[1,]    1    4
[2,]    3    3
[3,]    4    9
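Both results follow from the column-wise fill order of matrix(), which the following minimal sketch makes visible by printing the operands first. The names prod_mat and elem_mat are introduced here for illustration, and length.out is written with its dot as R requires.

```r
x <- matrix(rep(1:3, each = 2), nrow = 3, ncol = 2)
y <- matrix(rep(1:3, length.out = 6), nrow = 2, ncol = 3)

print(x)   # filled column by column from 1 1 2 2 3 3
print(y)   # filled column by column from 1 2 3 1 2 3

prod_mat <- x %*% y    # (3 x 2) %*% (2 x 3): matrix product, 3 x 3
elem_mat <- x * t(y)   # both operands are 3 x 2: element-wise product

print(prod_mat)
print(elem_mat)
```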
Question 6: Consider the following dataset that shows the number of times each task is performed by either P1, P2 or jointly by P1 and P2:
Task/Person | P1 | P2 | Jointly |
Laundry | 56 | 34 | 4 |
Meal | 24 | 10 | 4 |
Cleaning | 53 | 23 | 20 |
Dishes | 32 | 56 | 40 |
Finances | 13 | 23 | 70 |
Driving | 10 | 78 | 0 |
Holidays | 0 | 4 | 0 |
Write R script to:
(i) Find the tasks which are performed more by the P1 than the P2.
(ii) Display the tasks that are jointly performed by P1 and P2.
(iii) Give a suitable plot to show the frequency of each task performed by P1 and P2. Give appropriate labels and legends.
# create the dataset
task_person <- data.frame(Task_Person = c("Laundry", "Meal", "Cleaning", "Dishes", "Finances", "Driving", "Holidays"),
                          P1 = c(56, 24, 53, 32, 13, 10, 0),
                          P2 = c(34, 10, 23, 56, 23, 78, 4),
                          Jointly = c(4, 4, 20, 40, 70, 0, 0))
# (i) display the tasks which are performed more by P1 than by P2
task_person[task_person$P1 > task_person$P2, "Task_Person"]
# (ii) display the tasks that are jointly performed by P1 and P2
task_person[task_person$Jointly > 0, "Task_Person"]
# (iii) create a plot to show the frequency of each task performed by P1 and P2
library(ggplot2)
library(reshape2)
# reshape the P1 and P2 columns into long format (Jointly is left out, since the plot is about P1 and P2)
task_person_long <- melt(task_person, id.vars = "Task_Person", measure.vars = c("P1", "P2"))
# create the grouped bar plot with axis labels, a legend title, and a centred plot title
ggplot(data = task_person_long, aes(x = Task_Person, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Task", y = "Frequency", fill = "Person") +
  ggtitle("Frequency of Tasks Performed by P1 and P2") +
  theme(plot.title = element_text(hjust = 0.5))
Question 7 (a): Write an R script to read a file “my_file.txt”:
(i) Headers as in input file,
(ii) Separator as newline character,
(iii) Indicate blank rows as missing values,
(iv) Quoting strings as ‘ ‘.
# Read file with headers, newline separator, blank values as missing, and ' as the quote character
my_data <- read.table("my_file.txt",
                      header = TRUE,
                      sep = "\n",
                      na.strings = "",
                      quote = "'")
Question 7 (b): What will be the output of “f(5)”? Function “f” is defined as follows:
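f <- function(x) {
  f <- function(x) {
    print(x^2)
  }
  f(x) + 1
}
Inside the outer function, the name f refers to the inner function, so f(x) calls the inner f. It prints [1] 25, and because print() returns its argument invisibly, the outer function returns 25 + 1 = 26. At the console, the output of f(5) is therefore:
[1] 25
[1] 26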
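What the call prints and what it returns can be separated with capture.output(). This is a minimal sketch; the names printed and result are introduced here for illustration.

```r
f <- function(x) {
  f <- function(x) {
    print(x^2)   # prints x^2, then returns it invisibly
  }
  f(x) + 1       # the inner f shadows the outer one inside this body
}

# Collect what print() writes while keeping the returned value
printed <- capture.output(result <- f(5))
print(printed)
print(result)
```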