x <- c(8, 1.2, 33, 14)
Subsetting
Learning Objectives
- How to subset atomic vectors and lists.
- Chapters 6 and 7 of HOPR
- Chapter 4 from Advanced R
Subsetting an Atomic Vector
Subsetting is extracting elements from an object.
- Subset because you only want some elements of a vector.
- Subset so you can assign new elements to that subset.
There are six ways to subset an atomic vector:
- Integer Subsetting:
Put integers in brackets and it will extract those elements. R starts counting at 1.
x[1]
[1] 8
x[c(1, 3)]
[1] 8 33
iset <- c(1, 3)
x[iset]
[1] 8 33
This can be used for sorting.
order(x)
[1] 2 1 4 3
x[order(x)]
[1] 1.2 8.0 14.0 33.0
You can use duplicate integers to extract elements more than once.
x[c(2, 2, 2)]
[1] 1.2 1.2 1.2
- Negative Integer Subsetting:
Negative integers return all elements except those at the given positions.
x[-1]
[1] 1.2 33.0 14.0
x[c(-1, -3)]
[1] 1.2 14.0
x[-c(1, 3)]
[1] 1.2 14.0
- Logical Vector Subsetting:
Wherever there is a TRUE, R will return the corresponding element.
x[c(TRUE, FALSE, TRUE, FALSE)]
[1] 8 33
- No Subsetting:
Empty brackets will return the original object.
x[]
[1] 8.0 1.2 33.0 14.0
- Zero Subsetting:
Using 0 in a bracket will return a zero-length vector.
x[0]
numeric(0)
- Names Subsetting:
If a vector has names, then you can subset using those names in quotes.
names(x) <- c("a", "b", "c", "d")
x["a"]
a 
8 
x[c("a", "c")]
 a  c 
 8 33 
x[c("a", "a")]
a a 
8 8 
If you know what names you want to remove, use setdiff().
setdiff(names(x), "a")
[1] "b" "c" "d"
x[setdiff(names(x), "a")]
   b    c    d 
 1.2 33.0 14.0 
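In practice, the logical vector is usually computed from the vector itself rather than typed by hand. The following is a minimal sketch that reuses the named values of x from above; the threshold 5 and the names being dropped are arbitrary choices for illustration.

```r
x <- c(a = 8, b = 1.2, c = 33, d = 14)

# A comparison returns a logical vector the same length as x
big <- x > 5

# Subsetting with it keeps the elements where the condition is TRUE
x[big]

# which() converts the logical vector to integer positions
which(big)

# %in% builds a logical vector from the names, and ! negates it,
# so this drops the elements named "a" and "c"
x[!(names(x) %in% c("a", "c"))]
```

Unlike setdiff(), the %in% approach works on a logical vector, so it can be combined with other conditions using & or |.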
Exercise: Explain the output of the following:
y <- 1:9
y[c(TRUE, TRUE, FALSE)]
[1] 1 2 4 5 7 8
y[TRUE]
[1] 1 2 3 4 5 6 7 8 9
y[FALSE]
integer(0)
Exercise: Explain the output of the following:
y <- c(1, 2)
y[c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)]
[1] 1 2 NA NA
Exercise: Show all the ways to extract the second element of the following vector:
y <- c(af = 3, bd = 6, dd = 2)
Double brackets enforce that you extract exactly one element. This is useful in places where you know that you should only subset one element (such as inside for-loops).
x <- runif(100)
sval <- 0
for (i in seq_along(x)) {
  sval <- sval + x[[i]]
}
Double brackets remove attributes of the vector (even names).
x <- c(a = 1, b = 2)
x[1]
a 
1 
x[[1]]
[1] 1
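The "exactly one element" guarantee is easiest to see with an invalid index. This is a small sketch, not part of the original notes; tryCatch() is used only so the script keeps running and the error messages can be inspected, and the variable names oob and two are hypothetical.

```r
x <- c(a = 1, b = 2)

# Single brackets silently return NA for an out-of-range index
x[5]

# Double brackets throw an error instead; tryCatch() captures the message
oob <- tryCatch(x[[5]], error = function(e) conditionMessage(e))
oob

# Double brackets also refuse more than one index on an atomic vector
two <- tryCatch(x[[c(1, 2)]], error = function(e) conditionMessage(e))
two
```

Failing loudly is usually what you want inside a loop, since a silent NA can propagate through later calculations.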
List subsetting
If you subset a list using single brackets, you will get a sublist. You can use integers, negative integers, logicals, and names as before.
x <- list(a = 1:3, b = "hello", c = 4:6)
str(x)
List of 3
 $ a: int [1:3] 1 2 3
 $ b: chr "hello"
 $ c: int [1:3] 4 5 6
x[1]
$a
[1] 1 2 3
x[c(1, 3)]
$a
[1] 1 2 3
$c
[1] 4 5 6
x[-1]
$b
[1] "hello"
$c
[1] 4 5 6
x[c(TRUE, FALSE, FALSE)]
$a
[1] 1 2 3
x["a"]
$a
[1] 1 2 3
x[c("a", "c")]
$a
[1] 1 2 3
$c
[1] 4 5 6
Double brackets extract a single element.
x[[1]]
[1] 1 2 3
x[["a"]]
[1] 1 2 3
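The difference between a sublist and an element matters as soon as you pass the result to another function. A brief sketch using the same list as above (the variable name err is hypothetical):

```r
x <- list(a = 1:3, b = "hello", c = 4:6)

# Single brackets keep the list wrapper, even for one element
is.list(x[1])
length(x[1])

# Double brackets return the element itself, here an integer vector
is.list(x[[1]])
sum(x[[1]])

# sum() cannot work on a list, so the single-bracket version fails
err <- tryCatch(sum(x[1]), error = function(e) conditionMessage(e))
err
```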
A shorthand for using names inside double brackets is to use dollar signs.
x$a
[1] 1 2 3
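One difference between the two forms is worth knowing: on lists, the dollar sign does partial name matching, while double brackets require the exact name by default. A short sketch with a hypothetical list lst:

```r
lst <- list(alpha = 1:3, beta = "hello")

# $ accepts an unambiguous prefix of a name
lst$al

# [[ requires the full name by default and returns NULL here
lst[["al"]]

# exact = FALSE makes [[ match partially, as $ does
lst[["al", exact = FALSE]]
```

Relying on partial matching makes code fragile, so it is generally safer to spell names out in full.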
Exercise: Why does this not work? Suggest a correction.
var <- "a"
x$var
NULL
Data Frame Subsetting
Data frame subsetting behaves both like lists and like matrices.
df <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 4:6)
It behaves like a list for $, [[, and [ if you only provide one index. The columns are the elements of the list.
df$a
[1] 1 2 3
df[1]
  a
1 1
2 2
3 3
df[[1]]
[1] 1 2 3
df[c(1, 3)]
  a c
1 1 4
2 2 5
3 3 6
It behaves like a matrix if you provide two indices.
df[1:2, 2]
[1] "a" "b"
You can keep the data frame structure by using drop = FALSE.
df[1:2, 2, drop = FALSE]
  b
1 a
2 b
It is common to filter rows using matrix indexing.
df[df$a < 3, ]
  a b c
1 1 a 4
2 2 b 5
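Row filters and column selections can be combined in a single call. The sketch below reuses the same df; the particular conditions are arbitrary and only meant to illustrate the pattern.

```r
df <- data.frame(a = 1:3,
                 b = c("a", "b", "c"),
                 c = 4:6)

# Combine conditions with & (and) or | (or); each side is a logical vector
df[df$a < 3 & df$c > 4, ]

# Filter rows and pick columns by name in one call
df[df$a >= 2, c("a", "c")]

# With a single column, drop = FALSE keeps the result a data frame
df[df$a >= 2, "b", drop = FALSE]
```

Note the trailing comma in the first call: without it, R would treat the logical vector as a column selector rather than a row filter.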
Hadley’s Advanced R Exercises
Fix each of the following common data frame subsetting errors:
mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]
Why does the following code yield five missing values? (Hint: why is it different from x[NA_real_]?)
x <- 1:5
x[NA]
[1] NA NA NA NA NA
What does upper.tri() return? How does subsetting a matrix with it work?
x <- outer(1:5, 1:5, FUN = "*")
x[upper.tri(x)]
[1] 2 3 6 4 8 12 5 10 15 20
Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?
An lm object is a list-like object. Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Then extract the R squared from the model summary (summary(mod)).
Subassignment
All subsetting operators can be used to assign subsets of a vector new values. This is called subassignment.
x <- 1:5
x[[2]] <- 200
x
[1]   1 200   3   4   5
x[c(1, 3)] <- 0
x
[1]   0 200   0   4   5
x[x == 0] <- NA_real_
x
[1]  NA 200  NA   4   5
y <- list(a = 1:3, b = "hello", c = 4:6)
y$a <- "no way"
y
$a
[1] "no way"
$b
[1] "hello"
$c
[1] 4 5 6
Remove a list element with NULL.
y[[1]] <- NULL
y
$b
[1] "hello"
$c
[1] 4 5 6
y$b <- NULL
y
$c
[1] 4 5 6
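Because assigning NULL deletes an element, storing an actual NULL value inside a list needs a different form. A minimal sketch, using a fresh copy of the list y from above:

```r
y <- list(a = 1:3, b = "hello", c = 4:6)

# Assigning NULL with [[ or $ deletes the element
y[["a"]] <- NULL
length(y)

# Assigning list(NULL) with single brackets keeps the element, set to NULL
y["b"] <- list(NULL)
length(y)
names(y)
is.null(y[["b"]])
```

The single-bracket form works because it replaces a sublist with another sublist of the same length, so the slot named b survives.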
Exercises
These are just meant to buff up your base R skills. Consider the data from the {Sleuth3} package that contains information on sex and salary at a bank. Try to use just base R methods.
library(Sleuth3)
data("case0102")
sal <- case0102
What is the salary of the person in the 51st row? Use two different subsetting strategies to get this.
What is the mean salary of Males?
How many Females are in the data?
How many Females make over $6000?