NULL: Absence of a vector.
Four basic types:
Logical: TRUE or FALSE.
Integer: A whole number with L behind it (for “long integer”): -1L, 0L, 1L, 2L, 3L, etc…
Double: A decimal number: 1, 1.0, 1.01, etc… Inf, -Inf, and NaN are also doubles.
Character: A string in quotes: "1", "one", "1 won one", etc…
You create vectors with c() for “combine”.
There are no scalars in R. A “scalar” is just a vector of length 1.
## [1] TRUE
Integers and doubles are together called “numerics”.
You can determine the type with typeof().
## [1] "logical"
## [1] "integer"
## [1] "double"
## [1] "character"
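Output like the above can be reproduced with a short sketch. The variable names here are illustrative and not taken from the original notes.

```r
# One vector of each basic type (names are illustrative)
lgl <- c(TRUE, FALSE)
int <- c(1L, 2L)
dbl <- c(1, 2.5)
chr <- c("one", "two")

typeof(lgl)  # "logical"
typeof(int)  # "integer"
typeof(dbl)  # "double"
typeof(chr)  # "character"

# A "scalar" is a vector of length 1
identical(1, c(1))  # TRUE
length(1)           # 1
```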
The special values Inf, -Inf, and NaN are doubles.
## [1] "double"
Determine the length of a vector using length().
## [1] 2
Missing values are represented by NA. NA is technically a logical value.
## [1] "logical"
This rarely matters because logicals get coerced to other types when needed.
## [1] "integer"
## [1] "double"
## [1] "character"
But if you need missing values of other types, you can use NA_integer_, NA_real_, and NA_character_.
Never use == when testing for missingness. It will return NA since it is always unknown if two unknowns are equal. Use is.na().
## [1] NA NA
## [1] TRUE FALSE
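As a minimal sketch of the points about NA, assuming an illustrative vector x with one missing value:

```r
x <- c(NA, 1)  # illustrative vector, not from the original notes

x == NA        # NA NA: comparing with NA is always unknown
is.na(x)       # TRUE FALSE

# NA is logical by default; typed versions exist for the other types
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
```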
You can check the type with is.logical(), is.integer(), is.double(), and is.character().
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
Attempting to combine vectors of different types coerces them to the same type. The order of preference is character > double > integer > logical.
## [1] "integer"
## [1] "double"
## [1] "character"
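The coercion order can be checked directly. This is a small sketch with made-up values:

```r
typeof(c(1L, TRUE))  # "integer": the logical is coerced to integer
typeof(c(1, 1L))     # "double": the integer is coerced to double
typeof(c("a", 1))    # "character": the double is coerced to character

# Mixing all four types yields a character vector
mixed <- c(TRUE, 2L, 3.5, "a")
mixed                # "TRUE" "2" "3.5" "a"
```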
Exercise (from Advanced R): Predict the output:
Exercise (from Advanced R): Explain these results:
## [1] TRUE
## [1] TRUE
## [1] FALSE
Attributes are meta information applied to atomic vectors.
Many common objects (like matrices, arrays, factors, date-times) are just atomic vectors with special attributes.
You get and set attributes with attr().
a <- 1:3
attr(a, "x") <- "abcdef" # sets x attribute of vector a to be "abcdef"
attr(a, "x") # retrieve the x attribute of vector a
## [1] "abcdef"
You can see all attributes of a vector with attributes().
## $x
## [1] "abcdef"
##
## $y
## [1] 4 5 6
You can set many attributes at the same time with structure().
## $x
## [1] "abcdef"
##
## $y
## [1] 4 5 6
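A sketch of how attr(), attributes(), and structure() fit together. The attribute y holding 4:6 is an assumption chosen to match the printed output.

```r
a <- 1:3
attr(a, "x") <- "abcdef"
attr(a, "y") <- 4:6      # assumed second attribute
attributes(a)            # a list with elements $x and $y

# structure() sets several attributes in one call
b <- structure(1:3, x = "abcdef", y = 4:6)
attributes(b)            # same name-value pairs as attributes(a)
```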
Attributes are name-value pairs, and all of these attributes are associated with an object. Below, the vector c(1, 2, 3) points to attributes x and y that each have their own values.
Most operations drop most attributes.
## NULL
## NULL
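For instance, subsetting and summing both drop a custom attribute. This sketch uses an illustrative vector:

```r
a <- structure(1:3, x = "abcdef")

attributes(a[1])    # NULL: subsetting drops the x attribute
attributes(sum(a))  # NULL: so does sum()
```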
Exception: two attributes are typically kept: names and dim.
Names are a character vector the same length as the atomic vector. Each name corresponds to a single element.
You could set names using attr(), but you should not.
## $names
## [1] "a" "b" "c"
Names are so special that there are special ways to create them and view them.
## [1] "a" "b" "c"
## [1] "a" "b" "c"
The proper way to think about names is like this:
But each name corresponds to a specific element, so Hadley does it like this:
Names stay with single bracket subsetting (not double bracket subsetting).
## [1] "a"
## [1] "a" "b"
## NULL
Names can be used for subsetting (more in Chapter 4).
## [1] 1
You can remove names with unname().
## [1] 1 2 3
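The behavior of names described above can be sketched as follows. The vectors x and y are illustrative.

```r
x <- c(a = 1, b = 2, c = 3)   # name elements at creation
y <- 1:3
names(y) <- c("a", "b", "c")  # or assign names afterwards

names(x[1])    # "a": single brackets keep names
names(x[1:2])  # "a" "b"
names(x[[1]])  # NULL: double brackets drop names
x[["a"]]       # 1: subsetting by name
unname(x)      # 1 2 3
```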
The class of an object is an important attribute that controls R’s S3 system for object-oriented programming.
The class of an object determines its behavior when you pass that object to a generic function such as print() or summary().
You can create your own S3 classes (Chapter 13).
Here, we will talk about some S3 classes that come with R by default.
You can determine the class of an object with class(), and you can remove the class with unclass().
Factors, Dates, and POSIXct (date-times)
A factor is an integer vector with class attribute factor, and levels attribute describing the possible levels.
## [1] a b b a
## Levels: a b
## [1] "integer"
## [1] "factor"
## $levels
## [1] "a" "b"
##
## $class
## [1] "factor"
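A small sketch that would print results like those above, assuming a factor built from the values a, b, b, a:

```r
f <- factor(c("a", "b", "b", "a"))

typeof(f)   # "integer"
class(f)    # "factor"
levels(f)   # "a" "b"
unclass(f)  # the integer codes 1 2 2 1, with the levels attribute kept
```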
R also does some stuff under the hood for encoding factors (i.e. has a lot of methods specifically for factors).
Factors are R’s way of storing categorical variables, and are useful when a variable only has a certain number of possible values.
Learn more about factors here.
A Date is a double vector with class attribute Date.
## [1] "double"
## $class
## [1] "Date"
## [1] "Date"
Let’s look at the underlying double for today:
## [1] 19622
## [1] 0
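The underlying double counts days since 1970-01-01. The sketch below uses fixed dates rather than today's date so the results do not change from day to day.

```r
d <- as.Date("1970-01-02")

typeof(d)   # "double"
class(d)    # "Date"
unclass(d)  # 1: one day after 1970-01-01
unclass(as.Date("1970-01-01"))  # 0
```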
Date-time classes are called either POSIXct (Portable Operating System Interface in Unix, Calendar Time) or POSIXlt (Portable Operating System Interface in Unix, Local Time).
POSIXct shows up more often. It is a double representing the number of seconds since the beginning of 1970.
## [1] "double"
## [1] "POSIXct" "POSIXt"
## [1] 1.695e+09
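A sketch with a fixed timestamp makes the seconds-since-1970 encoding easy to see. The time zone is set to UTC here as an assumption so the result does not depend on the local machine.

```r
ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")

typeof(ct)      # "double"
class(ct)       # "POSIXct" "POSIXt"
as.numeric(ct)  # 60: one minute after the start of 1970
```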
POSIXlt is a named list of vectors with elements representing seconds, minutes, hours, days of the month, months, years, weekdays, etc…
ltvec <- as.POSIXlt(x = c("1980-10-10 01:11:01",
                          "1970-01-11 10:15:22",
                          "2010-05-30 20:01:18"))
typeof(ltvec)
## [1] "list"
## $sec
## [1] 1 22 18
##
## $min
## [1] 11 15 1
##
## $hour
## [1] 1 10 20
##
## $mday
## [1] 10 11 30
##
## $mon
## [1] 9 0 4
##
## $year
## [1] 80 70 110
##
## $wday
## [1] 5 0 0
##
## $yday
## [1] 283 10 149
##
## $isdst
## [1] 1 0 1
##
## $zone
## [1] "EDT" "EST" "EDT"
##
## $gmtoff
## [1] NA NA NA
##
## attr(,"tzone")
## [1] "" "EST" "EDT"
## attr(,"balanced")
## [1] TRUE
You mostly interact with these date-time objects through the {lubridate} package, but base R has its own interfaces (which I think are more difficult to use).
Learn more about dates and date-times here.
Exercise (from Advanced R): table() will take as input a vector or vectors and count how many observations have each value. What sort of object does table() return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?
In many applications, you will want to create empty vectors or vectors filled with missing values.
Create an empty vector with vector().
## character(0)
## numeric(0)
## integer(0)
## logical(0)
Shorthand for this is
## character(0)
## numeric(0)
## integer(0)
## logical(0)
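Both forms can be sketched as below. The exact calls in the original notes are not shown, so treat these as one plausible way to get the printed results.

```r
# Explicit form: give the mode and a length of 0
vector(mode = "character", length = 0)  # character(0)
vector(mode = "double", length = 0)     # numeric(0)
vector(mode = "integer", length = 0)    # integer(0)
vector(mode = "logical", length = 0)    # logical(0)

# Shorthand constructors default to length 0
character()
double()
integer()
logical()
```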
Empty vectors often show up in defaults that are returned when folks ask for something of length 0.
E.g., if you are simulating something, you might return a vector of length 0 if the user asks for 0 elements.
You often want to create an empty vector that you then fill in with values. I like to fill this vector with missing values, so that I know I made a mistake if they are not all replaced.
E.g. in a for-loop, you often fill in the elements of a vector. Let’s suppose we are evaluating the performance of the mean in a simulation study.
nsim <- 1000 ## number of simulations
nsamp <- 10 ## sample size
mvec <- rep(NA_real_, length.out = nsim)
true_mean <- 0
for (i in seq_len(nsim)) {
  mvec[[i]] <- mean(rnorm(n = nsamp, mean = true_mean))
}
mean((mvec - true_mean)^2) ## mean squared error
## [1] 0.1024
If you are filling in the values of a matrix, you need to be able to create a matrix with missing values.
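One way to do this is sketched below, assuming a small 3 by 2 matrix and a made-up fill rule:

```r
nr <- 3  # number of rows (illustrative)
nc <- 2  # number of columns (illustrative)
mat <- matrix(NA_real_, nrow = nr, ncol = nc)

for (i in seq_len(nr)) {
  for (j in seq_len(nc)) {
    mat[i, j] <- i * j  # placeholder computation
  }
}

anyNA(mat)  # FALSE once every cell has been filled in
```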
Lists are like vectors except each element can be of any type.
You create lists with list().
You can view a list with str().
## List of 3
## $ a : int [1:3] 1 2 3
## $ log_val: logi TRUE
## $ :List of 1
## ..$ c: num 10
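A list with the structure printed above could be built like this. The object name lst is an assumption.

```r
# Elements of different types, including a nested list
lst <- list(a = 1:3, log_val = TRUE, list(c = 10))
str(lst)

length(lst)  # 3
names(lst)   # "a" "log_val" "": the third element is unnamed
```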
c() will combine lists into a single list. If you use c() with a list and a vector, then it will first coerce the vector into a list in which each element of the vector becomes its own list element.
## [[1]]
## [1] 1 2
##
## [[2]]
## [1] "a" "b"
##
## [[3]]
## [1] TRUE FALSE
## [[1]]
## [1] 1 2
##
## [[2]]
## [1] "a" "b"
##
## [[3]]
## [1] "c"
##
## [[4]]
## [1] "d"
## [[1]]
## [1] "c"
##
## [[2]]
## [1] "d"
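The two cases can be sketched with an illustrative list l1:

```r
l1 <- list(c(1, 2), c("a", "b"))

# List + list: the elements are concatenated into one list of 3
length(c(l1, list(c(TRUE, FALSE))))  # 3

# List + vector: each element of the vector becomes its own list element
combined <- c(l1, c("c", "d"))
length(combined)                     # 4
combined[[3]]                        # "c"
```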
typeof() will return "list" and is.list() tests for a list.
## [1] "list"
## [1] TRUE
Use unlist() to remove the list structure.
## [[1]]
## [1] 1 2
##
## [[2]]
## [1] "a" "b"
## [1] "1" "2" "a" "b"
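A short sketch of these checks. Note that unlist() applies the usual coercion rules, so the numbers become strings here.

```r
lst <- list(c(1, 2), c("a", "b"))  # illustrative list

typeof(lst)   # "list"
is.list(lst)  # TRUE
unlist(lst)   # "1" "2" "a" "b"
```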
The dim attribute can be applied to lists.
## [,1] [,2]
## [1,] integer,2 numeric,4
## [2,] integer,8 character,2
## [1] 0.3428 0.3689 0.9490 0.6025
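A sketch of a list with a dim attribute. The contents are made up, and fixed numbers are used instead of random draws.

```r
lst_mat <- list(1:2, 1:8, c(0.5, 1.5), c("a", "b"))
dim(lst_mat) <- c(2, 2)  # now a 2 x 2 "list-matrix", filled column by column

lst_mat          # prints the type and length of each cell
lst_mat[[1, 2]]  # 0.5 1.5: row 1, column 2 is the third element
```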
Data frames are lists where each element is a vector of the same length, with attributes names, class, and row.names.
## [1] "list"
## $names
## [1] "a" "b"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 1 2 3
Above, the “names” attribute holds the column names, and you can get them with colnames().
## [1] "a" "b"
## [1] "a" "b"
The row.names attribute holds the row names, and you can obtain them with row.names() or rownames(). row.names() is specifically for data frames, whereas rownames() was designed for extracting dimnames and was also altered to work with data frames.
## [1] "1" "2" "3"
## [1] "1" "2" "3"
Those row names are automatically generated, but you can set them with rownames().
## a b
## h 4 A
## i 5 B
## j 6 C
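The data frame pieces above can be put together in one sketch. The column values are assumptions chosen to match the printed data frame.

```r
df <- data.frame(a = 4:6, b = c("A", "B", "C"))

typeof(df)             # "list"
names(attributes(df))  # includes names, class, and row.names
colnames(df)           # "a" "b"
rownames(df)           # "1" "2" "3": generated automatically

rownames(df) <- c("h", "i", "j")  # set row names explicitly
df
```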
Tibbles, from the package {tibble}, are tidyverse data frames. The main differences are:
Tibbles do not automatically coerce data (such as from strings to factors). Data frames used to do this in older versions of R.
data.frame(x = c("a", "b", "c"),
           stringsAsFactors = FALSE) ## needed to be safe for older versions of R
## x
## 1 a
## 2 b
## 3 c
## # A tibble: 3 × 1
## x
## <chr>
## 1 a
## 2 b
## 3 c
Tibbles do not change names if they happen to be non-syntactic (e.g. have spaces in them).
## hello.world
## 1 1
## 2 2
## 3 3
## # A tibble: 3 × 1
## `hello world`
## <dbl>
## 1 1
## 2 2
## 3 3
Tibbles will only recycle vectors of length 1.
## x y
## 1 1 1
## 2 2 2
## 3 3 1
## 4 4 2
## Error in `tibble::tibble()`:
## ! Tibble columns must have compatible sizes.
## • Size 4: Existing data.
## • Size 2: Column `y`.
## ℹ Only values of size one are recycled.
Tibbles do not reduce to vectors when you subset one column. Folks disagree on whether this is good or bad.
df <- data.frame(`hello world` = c(1, 2, 3))
tib <- tibble::tibble(`hello world` = c(1, 2, 3))
attributes(df[, 1])
## NULL
## $names
## [1] "hello world"
##
## $row.names
## [1] 1 2 3
##
## $class
## [1] "tbl_df" "tbl" "data.frame"
Data frames allow for row names, tibbles do not. Folks disagree on whether this is desirable (Hadley is extremely against it).
Tibbles print differently than data frames. Tibbles only print 10 rows and only the columns that will fit. But I actually prefer the data frame method, because pretty doesn’t matter when you are doing data analysis, and it’s better to see all columns.
Exercise: Based on our discussion of making zero-length vectors, create a data frame with zero rows and columns a and b. Both should be double columns.
Exercise: What does data.frame() do without any arguments?
Exercise: Use the row.names argument of data.frame() to create a data frame with 100 rows and no columns.
NULL is its own data type and always has length 0.
## [1] "NULL"
## [1] 0
NULL is used to represent an empty vector.
## NULL
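These properties of NULL can be checked directly:

```r
typeof(NULL)   # "NULL"
length(NULL)   # 0
c()            # NULL: combining nothing gives NULL
is.null(c())   # TRUE
```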
NULL is often used as a default argument in a function for complicated arguments. The function operates one way unless a user specifies something for that argument. Look at ?ashr::ash.workhorse for multiple examples.
E.g., let’s create a function wmean that calculates a weighted mean if weights are provided, and the sample mean otherwise.
wmean <- function(x, w = NULL) {
  if (is.null(w)) {
    w <- rep(1 / length(x), length.out = length(x))
  } else {
    w <- w / sum(w)
  }
  return(sum(x * w))
}
x <- c(1, 2, 3)
wmean(x)
## [1] 2
## [1] 1.429
There are two alternative strategies to this. First, use missingArg() to test if an argument is missing.
wmean <- function(x, w) {
  if (missingArg(w)) {
    w <- rep(1 / length(x), length.out = length(x))
  } else {
    w <- w / sum(w)
  }
  return(sum(x * w))
}
x <- c(1, 2, 3)
wmean(x)
## [1] 2
## [1] 1.429
This works because of lazy evaluation (which we will learn about later).
I don’t like this because it is confusing to the user, who thinks w is a required argument.
Second, you can include more complicated defaults.
wmean <- function(x, w = rep(1, length(x))) {
  w <- w / sum(w)
  return(sum(x * w))
}
x <- c(1, 2, 3)
wmean(x)
## [1] 2
## [1] 1.429
I don’t like this because default arguments are evaluated inside the function, but user-provided arguments are evaluated outside the function (more on this later). This can lead to strange results.
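A minimal sketch of one such strange result, using a hypothetical function f that is not from the original notes. The default for n is only evaluated when n is first used, and by then x has changed inside the function.

```r
# Hypothetical function: the default for n is evaluated lazily, inside f
f <- function(x, n = length(x)) {
  x <- c(x, 0)  # modify x before n is first used
  n
}

f(1:3)                   # 4: the default sees the modified x
f(1:3, n = length(1:3))  # 3: the supplied value is evaluated in the caller
```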
NULL is one of the ways R handles missingness. The others are NA and NaN.
NULL: An empty object. Can be thought of as a zero-length vector.
NA: A missing value. Can be used as an element of a vector.
NaN: Undefined numeric values, such as the output of 0/0.
typeof(): Determine the type of an object (character, double, integer, or logical).
attr(): Get or set an attribute.
attributes(): View all attributes.
structure(): Create an object with many attributes.
names(): Get or set the names attribute.
unname(): Remove the names attribute.
dim(): Get or set the dim attribute.
class(): Get or set the class attribute.
unclass(): Remove the class attribute.