Subsetting is extracting elements from an object.
There are six ways to subset an atomic vector.
Put positive integers in brackets to extract the elements at those positions. R starts counting at 1.
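The code behind the output that follows is not shown here. A minimal sketch, assuming an example vector `x` whose values are inferred from the printed output (the name `ind` is also an assumption):

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

x[1]          # first element
x[c(1, 3)]    # first and third elements

ind <- c(1, 3)
x[ind]        # same result, using a stored index vector
```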
## [1] 8
## [1] 8 33
## [1] 8 33
This can be used for sorting.
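One way to do this, sketched under the assumption that the example vector is `c(8, 1.2, 33, 14)`, is to index with the result of `order()`, which returns the positions that put the vector in ascending order:

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

order(x)      # positions of the elements from smallest to largest
x[order(x)]   # the vector rearranged into ascending order
```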
## [1] 2 1 4 3
## [1] 1.2 8.0 14.0 33.0
You can use duplicate integers to extract elements more than once.
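For instance, assuming the same illustrative vector as the printed output suggests:

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

x[c(2, 2, 2)]   # the second element, repeated three times
```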
## [1] 1.2 1.2 1.2
Putting negative integers in the brackets instead returns all elements except those at the given positions.
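A short sketch of this, again assuming the vector `c(8, 1.2, 33, 14)`:

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

x[-1]           # everything except the first element
x[c(-1, -3)]    # everything except the first and third elements
x[-c(1, 3)]     # same thing, negating the whole index vector
```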
## [1] 1.2 33.0 14.0
## [1] 1.2 14.0
## [1] 1.2 14.0
Logical vectors return the element wherever there is a TRUE.
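As an illustration, assuming the vector `c(8, 1.2, 33, 14)`, a logical index of the same length keeps only the positions marked `TRUE`:

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

x[c(TRUE, FALSE, TRUE, FALSE)]   # keeps the first and third elements
```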
## [1] 8 33
Empty brackets will return the original object.
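A one-line sketch, assuming the same illustrative vector:

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

x[]   # nothing inside the brackets: all elements come back
```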
## [1] 8.0 1.2 33.0 14.0
Using 0 in a bracket will return a zero-length vector.
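For example, with the assumed vector `c(8, 1.2, 33, 14)`:

```r
# Assumed example vector, inferred from the printed output
x <- c(8, 1.2, 33, 14)

x[0]           # a numeric vector with no elements
length(x[0])   # its length is zero
```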
## numeric(0)
If a vector has names, then you can subset using those names in quotes.
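A sketch of name-based subsetting, assuming the vector carries the names `a` through `d` (which is consistent with the printed output):

```r
# Assumed named vector, inferred from the printed output
x <- c(a = 8, b = 1.2, c = 33, d = 14)

x["a"]            # one element, by name
x[c("a", "c")]    # several elements, by name
x[c("a", "a")]    # names can be repeated, just like integers
```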
## a
## 8
## a c
## 8 33
## a a
## 8 8
If you know what names you want to remove, use setdiff().
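Negative indices do not work with names, so one approach is to compute the names to keep. A minimal sketch, assuming a vector named `a` through `d` and that `"a"` is the name to drop (`keep` is a name chosen here for illustration):

```r
# Assumed named vector, inferred from the printed output
x <- c(a = 8, b = 1.2, c = 33, d = 14)

keep <- setdiff(names(x), "a")   # all names except "a"
keep
x[keep]                          # subset by the remaining names
```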
## [1] "b" "c" "d"
## b c d
## 1.2 33.0 14.0
Exercise: Explain the output of the following:
## [1] 1 2 4 5 7 8
## [1] 1 2 3 4 5 6 7 8 9
## integer(0)
Exercise: Explain the output of the following:
## [1] 1 2 NA NA
Exercise: Show all the ways to extract the second element of the following vector:
Double brackets enforce that you extract exactly one element. This is useful in places where you know that you should subset only one element (such as in for-loops).
Double brackets remove attributes of the vector (even names).
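The difference can be sketched with a small named vector. The vector `x` and the name `res` below are assumptions for illustration, with `try()` used only so the error does not stop the script:

```r
# Assumed small named vector for illustration
x <- c(a = 1, b = 2)

x[1]     # single brackets keep the name
x[[1]]   # double brackets drop the name

# Asking for more than one element with double brackets is an error
res <- try(x[[c(1, 2)]], silent = TRUE)
inherits(res, "try-error")
```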
## a
## 1
## [1] 1
If you subset a list using single brackets, you will get a sublist. You can use integers, negative integers, logicals, and names as before.
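A sketch of these cases, assuming a three-element list whose contents are inferred from the `str()`-style output that follows:

```r
# Assumed example list, inferred from the printed output
x <- list(a = 1:3, b = "hello", c = 4:6)
str(x)

x[1]                       # sublist with the first element
x[c(1, 3)]                 # sublist with the first and third elements
x[-1]                      # sublist without the first element
x[c(TRUE, FALSE, FALSE)]   # logical index
x["a"]                     # by name
x[c("a", "c")]             # by several names
```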
## List of 3
## $ a: int [1:3] 1 2 3
## $ b: chr "hello"
## $ c: int [1:3] 4 5 6
## $a
## [1] 1 2 3
## $a
## [1] 1 2 3
##
## $c
## [1] 4 5 6
## $b
## [1] "hello"
##
## $c
## [1] 4 5 6
## $a
## [1] 1 2 3
## $a
## [1] 1 2 3
## $a
## [1] 1 2 3
##
## $c
## [1] 4 5 6
Using double brackets extracts a single element rather than a sublist.
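For example, assuming the list `list(a = 1:3, b = "hello", c = 4:6)`:

```r
# Assumed example list, inferred from the printed output
x <- list(a = 1:3, b = "hello", c = 4:6)

x[[1]]      # the first element itself, an integer vector
x[["a"]]    # the same element, by name
```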
## [1] 1 2 3
## [1] 1 2 3
A shorthand for using names inside double brackets is to use dollar signs.
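With the same assumed list, the shorthand looks like this:

```r
# Assumed example list, inferred from the printed output
x <- list(a = 1:3, b = "hello", c = 4:6)

x$a   # equivalent to x[["a"]]
```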
## [1] 1 2 3
Exercise: Why does this not work? Suggest a correction.
## NULL
Data frame subsetting behaves both like lists and like matrices.
It behaves like a list for $, [[, and [ if you only provide one index. The columns are the elements of the list.
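A sketch of the list-like behavior, assuming a small data frame `df` whose columns are inferred from the printed output:

```r
# Assumed example data frame, inferred from the printed output
df <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 4:6)

df$a          # a column as a vector
df[1]         # a one-column data frame
df[[1]]       # the first column as a vector
df[c(1, 3)]   # a data frame with the first and third columns
```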
## [1] 1 2 3
## a
## 1 1
## 2 2
## 3 3
## [1] 1 2 3
## a c
## 1 1 4
## 2 2 5
## 3 3 6
It behaves like a matrix if you provide two indices.
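For instance, with an assumed data frame `df` (columns inferred from the printed output), the first index selects rows and the second selects columns:

```r
# Assumed example data frame, inferred from the printed output
df <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 4:6)

df[1:2, 2]   # rows 1 and 2 of column 2, simplified to a vector
```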
## [1] "a" "b"
You can keep the data frame structure by using drop = FALSE.
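A sketch with the same assumed data frame:

```r
# Assumed example data frame, inferred from the printed output
df <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 4:6)

df[1:2, 2, drop = FALSE]   # still a data frame, with one column
```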
## b
## 1 a
## 2 b
It is common to filter rows using matrix indexing.
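One way this might look, assuming the example data frame and a condition on column `a` (the condition itself is a guess that reproduces the printed rows):

```r
# Assumed example data frame, inferred from the printed output
df <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 4:6)

# A logical vector in the row position keeps the rows where it is TRUE;
# leaving the column position empty keeps all columns
df[df$a < 3, ]
```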
## a b c
## 1 1 a 4
## 2 2 b 5
Fix each of the following common data frame subsetting errors:
Why does the following code yield five missing values? (Hint: why is it different from x[NA_real_]?)
## [1] NA NA NA NA NA
What does upper.tri() return? How does subsetting a matrix with it work?
## [1] 2 3 6 4 8 12 5 10 15 20
Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?
An lm object is a list-like object. Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Then extract the R squared from the model summary (summary(mod)).
All subsetting operators can be used to assign subsets of a vector new values. This is called subassignment.
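A sketch of subassignment on a vector and on a list. The objects `x` and `lst` are assumptions, chosen so that the results agree with the printed output:

```r
# Assumed example vector
x <- 1:5
x[[2]] <- 200        # replace a single element
x
x[c(1, 3)] <- 0      # replace several elements at once
x
x[c(1, 3)] <- NA     # missing values can be assigned the same way
x

# Assumed example list
lst <- list(a = 1:3, b = "hello", c = 4:6)
lst$a <- "no way"    # replace a list element by name
lst
```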
## [1] 1 200 3 4 5
## [1] 0 200 0 4 5
## [1] NA 200 NA 4 5
## $a
## [1] "no way"
##
## $b
## [1] "hello"
##
## $c
## [1] 4 5 6
Remove a list element with NULL.
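For example, assuming a list `lst` like the one in the printed output, assigning `NULL` to an element deletes it:

```r
# Assumed example list
lst <- list(a = "no way", b = "hello", c = 4:6)

lst$a <- NULL        # drop element "a"
lst
lst[["b"]] <- NULL   # double brackets work too
lst
```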
## $b
## [1] "hello"
##
## $c
## [1] 4 5 6
## $c
## [1] 4 5 6
These are just meant to buff up your Base R skills. Consider the data from the {Sleuth3} package that contains information on sex and salary at a bank. Try to use just base R methods.
What is the salary of the person in the 51st row? Use two different subsetting strategies to get this.
What is the mean salary of Males?
How many Females are in the data?
How many Females make over $6000?