What will be the value of each variable after each statement in the following program?
mass <- 47.5
age <- 122
mass <- mass * 2.3
age <- age - 20
Solution
mass = 47.5
age = 122
mass = 47.5 * 2.3 = 109.25
age = 122 - 20 = 102
Run the code from the previous challenge, and write a command to check whether mass is larger than age.
mass <- 47.5
age <- 122
mass <- mass * 2.3
age <- age - 20
Solution
mass > age
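Running the statements and printing the results is a quick way to check both answers. The sketch below is a minimal illustration; the name mass_is_larger is introduced here only for the example and is not part of the lesson.

```r
mass <- 47.5
age <- 122
mass <- mass * 2.3   # mass is now 47.5 * 2.3
age <- age - 20      # age is now 122 - 20

print(mass)
print(age)

# Store the result of the comparison so it can be reused later
mass_is_larger <- mass > age
print(mass_is_larger)
```

The comparison returns a single logical value, which can be stored in a variable like any other value.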
Which of the following are valid R variable names?
min_height
max.height
_age
.mass
MaxLength
min-length
2widths
celsius2kelvin
Solution
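min_height, max.height, MaxLength, and celsius2kelvin are valid variable names.
.mass is also valid, but the leading full stop creates a hidden variable. We're not covering this here, so it is best not to start a variable name with a full stop.
_age, min-length, and 2widths are not valid: a name cannot start with an underscore or a digit, and it cannot contain a hyphen.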
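One way to check the answers without trial and error is to use make.names(), which rewrites a string into a syntactically valid name. The sketch below assumes that a string returned unchanged was already valid; the names candidates and is_valid are introduced here for illustration.

```r
# Candidate names from the challenge above
candidates <- c("min_height", "max.height", "_age", ".mass",
                "MaxLength", "min-length", "2widths", "celsius2kelvin")

# make.names() rewrites a string into a syntactically valid name,
# so a string that comes back unchanged was already valid
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)
```

Note that this only tests syntax: .mass passes the check even though it creates a hidden variable.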
Given what you now know about type conversion, look at the class of data in nordic_2$lifeExp (using str(nordic_2$lifeExp)) and compare it with nordic$lifeExp. Why are these columns different classes?
Solution
The data in nordic_2$lifeExp is stored as a factor rather than as numeric values. This is because of the “or” character string in the third data point: a column that contains any text cannot be numeric, so R reads the whole column as text and stores it as a factor. “Factor” is R’s special term for categorical data. We will be working more with factor data later in this workshop.
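The effect of a single text entry on a whole column can be reproduced with a small vector. The values below are hypothetical stand-ins, since the contents of nordic_2 and the way it was read in are not shown here.

```r
# Hypothetical values: the third entry mixes text with numbers
life_exp_raw <- c("77.2", "80.0", "79.0 or 83", "78.9")

# Because one entry is not a plain number, the whole vector stays as text
class(life_exp_raw)

# Converting text to a factor stores each distinct string as a category
life_exp_factor <- factor(life_exp_raw)
class(life_exp_factor)
levels(life_exp_factor)

# as.numeric() warns and returns NA for the entry it cannot parse;
# the warning is suppressed here to keep the output short
life_exp_numeric <- suppressWarnings(as.numeric(life_exp_raw))
print(life_exp_numeric)
```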
There are several subtly different ways to call variables, observations and elements from data frames:
1. nordic[1]
2. nordic[[1]]
3. nordic$country
4. nordic["country"]
5. nordic[1, 1]
6. nordic[, 1]
7. nordic[1, ]
Try out these examples and explain what is returned by each one.
Hint: Use the function class() to examine what is returned in each case.
Solution
1. nordic[1]: We can think of a data frame as a list of vectors. The single bracket [1] returns the first slice of the list, as another list. In this case it is the first column of the data frame, returned as a one-column data frame.
2. nordic[[1]]: The double bracket [[1]] returns the contents of the list item. In this case it is the contents of the first column, a vector of type factor.
3. nordic$country: This example uses the $ character to address items by name. country is the first column of the data frame, again a vector of type factor.
4. nordic["country"]: Here we are using a single bracket ["country"], replacing the index number with the column name. Like example 1, the returned object is a list.
5. nordic[1, 1]: This example uses a single bracket, but this time we provide row and column coordinates. The returned object is the value in row 1, column 1. Because the column is a factor, the value is stored internally as an integer code, and R displays the label “Denmark” associated with that code.
6. nordic[, 1]: Like the previous example, we use single brackets and provide row and column coordinates. The row coordinate is not specified, so R interprets the missing value as all the elements in this column vector.
7. nordic[1, ]: Again we use the single bracket with row and column coordinates. The column coordinate is not specified. The return value is a one-row data frame (a list) containing all the values in the first row.
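Following the hint, the seven forms can be compared side by side with class(). The data frame below is a hypothetical stand-in for nordic with made-up values, built with stringsAsFactors = TRUE so that the country column is a factor as described in the solution.

```r
# Hypothetical stand-in for the nordic data frame; the real values may differ
nordic <- data.frame(country = c("Denmark", "Sweden", "Norway"),
                     lifeExp = c(77.2, 80.0, 79.0),
                     stringsAsFactors = TRUE)

class(nordic[1])          # data.frame: a one-column slice
class(nordic[[1]])        # factor: the contents of the first column
class(nordic$country)     # factor: the same column, addressed by name
class(nordic["country"])  # data.frame: like example 1
class(nordic[1, 1])       # factor: a single value from the column
class(nordic[, 1])        # factor: the whole first column
class(nordic[1, ])        # data.frame: the first row
```

The pattern to notice is that single brackets with one index keep the data frame structure, while [[ ]], $, and a row-and-column pair that selects a single column return the column's own type.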
You can create a new data frame right from within R with the following syntax:
df <- data.frame(id = c("1", "b", "c"), age = c("12", "18", "21"), likesPizza = c(TRUE, TRUE, FALSE), stringsAsFactors = FALSE)
Make a data frame that holds the following information for yourself:
- first name
- last name
- lucky number
Then use rbind to add a new entry.
Finally, use cbind to add a column with each person’s answer to the question, “Do you like pizza?”
Solution
df <- data.frame(first = c("Grace"),
                 last = c("Hopper"),
                 lucky_number = c(0),
                 stringsAsFactors = FALSE)
df <- rbind(df, list("Marie", "Curie", 238) )
df <- cbind(df, likesPizza = c(TRUE, TRUE))
Given the following code:
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
print(x)
Come up with at least 3 different commands that will produce the following output:
b c d
6.2 7.1 4.8
Solution
x[2:4]
x[-c(1,5)]
x[c("b", "c", "d")]
x[c(2,3,4)]
- & AND, returns TRUE if left and right are TRUE
- | OR, returns TRUE if left or right (or both) are TRUE
- all compares all elements within a vector, returns TRUE if every element is TRUE
- any compares all elements within a vector, returns TRUE if one or more elements are TRUE
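(&& and || are meant for single values. Older versions of R compared only the first element of a vector and ignored the rest; R 4.3.0 and later raise an error when either side has more than one element.)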
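The difference between the element-wise operators and the summarising functions is easiest to see on a short vector. The sketch below reuses the values of x from the surrounding challenges; the thresholds are chosen only for illustration.

```r
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)

# & and | work element by element and return one value per element
x > 5 & x < 7
x < 5 | x > 7

# all() and any() collapse a logical vector to a single TRUE or FALSE
all(x > 4)
any(x > 7)
```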
Given the following code:
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
print(x)
Write a subsetting command to return the values in x that are greater than 4 and less than 7.
Solution
x_subset <- x[x<7 & x>4]
print(x_subset)
- is.na will return all positions in a vector, matrix, or data frame containing NA (or NaN)
- likewise, is.nan and is.infinite will do the same for NaN and Inf
- is.finite will return all positions in a vector, matrix, or data frame that do not contain NA, NaN or Inf
- na.omit will filter out all missing values from a vector
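These functions return a logical value for every element, which can then be used for subsetting or passed to which() to get positions. The vector v below is a hypothetical example, not data from the lesson.

```r
# Hypothetical vector mixing ordinary numbers with special values
v <- c(1.5, NA, NaN, Inf, -Inf, 3)

is.na(v)        # TRUE for NA and NaN
is.nan(v)       # TRUE only for NaN
is.infinite(v)  # TRUE for Inf and -Inf
is.finite(v)    # TRUE only for ordinary numbers

# which() turns the logical result into positions
which(is.na(v))

# na.omit() drops NA and NaN but keeps infinite values
cleaned <- na.omit(v)
length(cleaned)
```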
Fix each of the following common data frame subsetting errors:
- Extract observations collected for the year 1957:
gapminder[gapminder$year = 1957, ]
- Extract all columns except 1 through to 4:
gapminder[, -1:4]
- Extract the rows where the life expectancy is longer than 80 years:
gapminder[gapminder$lifeExp > 80]
- Extract the first row, and the fourth and fifth columns (lifeExp and gdpPercap):
gapminder[1, 4, 5]
Solution
1. gapminder[gapminder$year == 1957, ]
2. gapminder[,-c(1:4)]
3. gapminder[gapminder$lifeExp > 80,]
4. gapminder[1, c(4, 5)]
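The second fix is the least obvious of the four, because -1:4 looks like it should negate the whole range. The sketch below shows what R actually builds in each case, using a small hypothetical data frame called toy in place of gapminder.

```r
# Hypothetical five-column data frame standing in for gapminder
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8, e = 9:10)

# Unary minus binds tighter than the colon, so -1:4 runs from -1 up to 4
-1:4

# Wrapping the range in c() or parentheses negates every index instead
-c(1:4)

# Dropping columns 1 to 4 leaves only column e; drop = FALSE keeps a data frame
remaining <- toy[, -c(1:4), drop = FALSE]
names(remaining)
```

Because -1:4 mixes negative and positive indices, R refuses to use it for subsetting, which is the error the challenge asks you to fix.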
Write a single command (which can span multiple lines and includes pipes) that will produce a dataframe that has the African values for lifeExp, country and year, but not for other Continents.
Solution
year_country_lifeExp_Africa <- gapminder %>%
  filter(continent == "Africa") %>%
  select(year, country, lifeExp)
Modify the example so that the figure shows the distribution of gdp per capita (gdpPercap), rather than life expectancy.
Solution
ggplot(data = gapminder, aes(x = gdpPercap)) +
  geom_histogram()
Subset the gapminder data to include only data points collected since 1990. Write out the new subset to a file.
Solution
gapminder_after_1990 <- filter(gapminder, year > 1990)
write.csv(gapminder_after_1990,
          file = "gapminder-after-1990.csv",
          row.names = FALSE)

