Memory
Learning Objectives
- Names/Values
- Copy-on-modify
- Modify-by-reference
- Chapter 2 from Advanced R
Names and Values
Computer memory holds information (like numbers or strings) for immediate use. When you put information there, it is stored at some “address” on your computer, and you can retrieve it from that address.
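The following puts the vector `c(1, 2, 3)` in memory and binds the name `x` to it.

```r
x <- c(1, 2, 3)
```

The function `lobstr::obj_addr()` lets us see the address of this object (the exact address will be different each time you run the code).

```r
lobstr::obj_addr(x)
#> [1] "0x134494d68"
```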
When you assign `x` to a new name `y`, R creates a new name that points to the same object as `x`; the object itself is not copied.

```r
y <- x
lobstr::obj_addr(y)
#> [1] "0x134494d68"
```
If you modify `y`, R makes a copy of object 0x134494d68, modifies the copy, and points `y` to that new object. This is called copy-on-modify.

```r
y[[3]] <- 4
lobstr::obj_addr(y)
#> [1] "0x1448a9c18"
```
Copy-on-modify exists so that `x` does not change when you change `y`.

```r
x
#> [1] 1 2 3
```
You can use `tracemem()` to track whenever an object is copied.

```r
x <- c(1, 2, 3)
tracemem(x)
#> [1] "<0x147644038>"
y <- x
y[[3]] <- 4 ## copy made
#> tracemem[0x147644038 -> 0x14778f228]:
y[[2]] <- 1 ## no copy made, y modified in place
```
Note: `tracemem()` is attached to the object, not to the name `x`. So the following will not report a copy, because we bind the name `x` to a new, untracked object before `y` is modified.

```r
x <- c(1, 2, 3)
tracemem(x)
#> [1] "<0x13340ca48>"
x <- c(4, 5)
y <- x
y[[2]] <- 6
```
Note: `tracemem()` will report extra copies if you use it inside RStudio. That's because the Environment pane keeps its own references to objects.

A function argument points to the same object as the value passed in: the name `a` inside `f()` refers to the object bound to `x`.

```r
x <- c(1, 2, 3)
tracemem(x)
#> [1] "<0x134432a48>"
f <- function(a) {
  return(a)
}
z <- f(x) ## no copy made
```
`x` and `z` now point to the same object.

Exercise (from Advanced R): Explain the relationship between `a`, `b`, `c` and `d` in the following code:

```r
a <- 1:10
b <- a
c <- b
d <- 1:10
```

Verify your conclusions using `lobstr::obj_addr()`.
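The function `f()` above only returns its argument. As a small sketch of what happens when a function does modify its argument (this variant of `f()` is our own, not from Advanced R), copy-on-modify applies to `a` just as it did to `y`: the caller's `x` is left unchanged.

```r
# Our own variant of f() that modifies its argument before returning it
f <- function(a) {
  a[[1]] <- 100   # a shares x's object, so this modification makes a copy
  a
}

x <- c(1, 2, 3)
z <- f(x)
x  # still 1 2 3: the caller's object was not touched
z  # 100 2 3: a new object
```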
Exercise: When does the address of `x` change? Use `cat()` and `lobstr::obj_addr()` to verify your answer. Does `tracemem()` help you here? Why or why not?

```r
x <- c()
for (i in 1:10) {
  x[[i]] <- i
}
```
Exercise: When does the address of `x` change? Use `cat()` and `lobstr::obj_addr()` to verify your answer.

```r
x <- rep(x = NA_real_, length.out = 10)
for (i in 1:10) {
  x[[i]] <- i
}
```
Exercise: When does the address of `x` change? Use `cat()` and `lobstr::obj_addr()` to verify your answer.

```r
x <- vector(mode = "numeric", length = 10)
for (i in 1:10) {
  x[[i]] <- i
}
```
Understanding when an object is copied is important for performance. Making copies can be expensive if you are doing it a lot (like in a for-loop), making your code run much slower.
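To get a feel for the cost, here is a rough timing sketch (our own example; `grow()` and `prealloc()` are illustrative names) that compares building a vector with `x <- c(x, i)`, which creates a new, longer vector on every iteration, against filling a vector that was allocated once up front. The exact timings depend on your machine; only the relative difference matters.

```r
n <- 1e4

# Illustrative helper: grows x with c(), so every iteration allocates
# a new vector and copies the old elements into it
grow <- function(n) {
  x <- c()
  for (i in seq_len(n)) {
    x <- c(x, i)
  }
  x
}

# Illustrative helper: allocates x once, then fills it; since x has a
# single binding, each assignment can modify it in place
prealloc <- function(n) {
  x <- vector(mode = "integer", length = n)
  for (i in seq_len(n)) {
    x[[i]] <- i
  }
  x
}

system.time(a <- grow(n))
system.time(b <- prealloc(n))
identical(a, b)  # both approaches build the same vector
```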
Lists and Data Frames
Recall that a list is a vector that can have elements of any type.
To do this, the list’s name points to a vector of references, and these point to the objects.
```r
l1 <- list(1, 2, 3)
```
Copy-on-modify for a list copies only the vector of references, not the elements themselves, so it is much more memory efficient. This is called a shallow copy.
```r
l2 <- l1
l2[[3]] <- 4
```
`lobstr::ref()` lets you see the location of each component of a list.

```r
lobstr::ref(l1, l2)
#> █ [1:0x145401928] <list>
#> ├─[2:0x1452cb9e8] <dbl>
#> ├─[3:0x1452cb828] <dbl>
#> └─[4:0x1452cb668] <dbl>
#>
#> █ [5:0x146635c98] <list>
#> ├─[2:0x1452cb9e8]
#> ├─[3:0x1452cb828]
#> └─[6:0x14576c168] <dbl>
```
- Notice that the two lists are at different addresses, and their first two elements are at the same addresses, but their third elements are at different addresses.
NOTE: Older versions of R (before 3.1.0) always created deep copies, and so were less memory efficient.
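If you do not have lobstr installed, you can sketch the same check with base R. `tracemem()` returns the address of the object it marks, so the helper `addr()` below (our own name, not a base R or lobstr function) reads an address that way and then immediately calls `untracemem()`. This assumes your R was built with memory profiling (`capabilities("profmem")` is `TRUE`); otherwise `tracemem()` throws an error.

```r
# Hypothetical helper: read an object's address via tracemem(), then stop tracing it
addr <- function(obj) {
  address <- tracemem(obj)
  untracemem(obj)
  address
}

l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4

identical(addr(l1), addr(l2))            # FALSE: the vector of references was copied
identical(addr(l1[[1]]), addr(l2[[1]]))  # TRUE: element 1 is still shared
identical(addr(l1[[3]]), addr(l2[[3]]))  # FALSE: element 3 is a new object
```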
Data frames are lists of vectors (the columns).

```r
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
```
If you modify a column, only that column is copied and modified.

```r
d2 <- d1
d2[, 2] <- d2[, 2] * 2
lobstr::ref(d1, d2)
#> █ [1:0x13442d3c8] <df[,2]>
#> ├─x = [2:0x134336e28] <dbl>
#> └─y = [3:0x134336dd8] <dbl>
#>
#> █ [4:0x13423bb88] <df[,2]>
#> ├─x = [2:0x134336e28]
#> └─y = [5:0x1342935d8] <dbl>
```
If you modify a row, every column has to be copied, so the entire data frame is copied (much less efficient).

```r
d3 <- d1
d3[1, ] <- d3[1, ] * 3
lobstr::ref(d1, d3)
#> █ [1:0x13442d3c8] <df[,2]>
#> ├─x = [2:0x134336e28] <dbl>
#> └─y = [3:0x134336dd8] <dbl>
#>
#> █ [4:0x1341a98c8] <df[,2]>
#> ├─x = [5:0x134493d28] <dbl>
#> └─y = [6:0x134493cd8] <dbl>
```
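The same base R sketch works for data frames. With the helper `addr()` (again our own, built on `tracemem()` and assuming memory profiling is available), you can check which columns `d2` and `d3` still share with `d1`.

```r
# Hypothetical helper: read an object's address via tracemem(), then stop tracing it
addr <- function(obj) {
  address <- tracemem(obj)
  untracemem(obj)
  address
}

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

d2 <- d1
d2[, 2] <- d2[, 2] * 2   # modify a column

d3 <- d1
d3[1, ] <- d3[1, ] * 3   # modify a row

identical(addr(d1$x), addr(d2$x))  # TRUE: column x is shared
identical(addr(d1$y), addr(d2$y))  # FALSE: only column y was copied
identical(addr(d1$x), addr(d3$x))  # FALSE: modifying a row copied every column
```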
Character Vectors
A character vector is a vector of references to a global string pool.
```r
x <- c("a", "a", "abc", "d")
```

But Hadley usually draws this more simply, as if the strings lived inside the vector.
Use `lobstr::ref()` with `character = TRUE` to show these references.

```r
lobstr::ref(x, character = TRUE)
#> █ [1:0x134484e78] <chr>
#> ├─[2:0x15330c540] <string: "a">
#> ├─[2:0x15330c540]
#> ├─[3:0x134479278] <string: "abc">
#> └─[4:0x1544867f8] <string: "d">
```
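One way to see the effect of the global string pool without lobstr is base R's `object.size()`, which, as far as we know, counts each distinct string in a character vector only once. The sketch below (our own example) compares 1000 copies of one string with 1000 different strings; the exact byte counts will vary by platform.

```r
# 1000 elements that all reference the same string in the global pool
same <- rep("hello world, how are you", 1000)

# 1000 elements that each reference a different string
different <- paste("hello world, how are you", 1:1000)

# object.size() should count the shared string in `same` only once
object.size(same)
object.size(different)
```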
Exercise (from Advanced R): Why do you think `x` is copied here? (You will only see it copied twice if you use RStudio or, as in the output below, a rendered document.) Modify the code so that `x` is not copied.

```r
x <- c(1L, 2L, 3L)
tracemem(x)
#> [1] "<0x153581388>"
x[[3]] <- 4
#> tracemem[0x153581388 -> 0x144fac6c8]:
#> tracemem[0x144fac6c8 -> 0x1455ea2d8]:
```

```r
x <- c(1L, 2L, 3L)
tracemem(x)
#> [1] "<0x1472bf148>"
x[[3]] <- 4L
#> tracemem[0x1472bf148 -> 0x1473fb888]:
```
Exercise (from Advanced R): Sketch out the relationship between the following objects:

```r
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)
```
Object Size
You can tell how much memory an object takes up with `lobstr::obj_size()`.

```r
x <- 1:10
lobstr::obj_size(x)
#> 680 B
```
Functions also take up memory.

```r
lobstr::obj_size(mean)
#> 1.13 kB
lobstr::obj_size(lm)
#> 63.50 kB
```
Because several names (or list elements) can be bound to the same object, objects may take up less memory than you expect.
Exercise: Why does the following list not take up 3 times as much memory as `x`?

```r
x <- 1:10
y <- list(x, x, x)
lobstr::obj_size(x)
#> 680 B
lobstr::obj_size(y)
#> 760 B
```
Character strings may also take up a lot less memory than you expect.

```r
a <- "hello world, how are you"
b <- rep(a, 100)
lobstr::obj_size(a)
#> 136 B
lobstr::obj_size(b) ## not 100 times larger
#> 928 B
```
Newer versions of R (3.5.0 and later) can store sequences such as `1:10` compactly using “ALTREP”, short for “alternative representation”: instead of storing every element, R stores just enough information to regenerate the sequence. So the following are all the same size.

```r
lobstr::obj_size(1:10)
#> 680 B
lobstr::obj_size(1:100)
#> 680 B
lobstr::obj_size(1:1000000)
#> 680 B
```
Modify-in-place
The opposite of copy-on-modify is modify-in-place, where no new object is created when you modify it.

Modify-in-place occurs when an object has only a single name bound to it.

```r
v <- c(1, 2, 3)
v[[3]] <- 4
```
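A minimal base R sketch of this check, assuming memory profiling is available (`capabilities("profmem")` is `TRUE`) and that you run it in a plain R session rather than RStudio: `tracemem()` returns the address of the object bound to `v`, so you can compare the address before and after the modification.

```r
v <- c(1, 2, 3)
before <- tracemem(v)   # returns the address of v's object as a string, e.g. "<0x...>"
v[[3]] <- 4             # v is the only name bound to this object
after <- tracemem(v)
untracemem(v)

identical(before, after)  # TRUE when the vector was modified in place
```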
Exercise: Why is a copy made here?

```r
x <- 1:3
tracemem(x)
#> [1] "<0x1465a7158>"
x[[3]] <- 4
#> tracemem[0x1465a7158 -> 0x134272f48]:
#> tracemem[0x134272f48 -> 0x134408388]:
```
Modify-in-place also occurs in environments.
Environments are data structures that you can think of as an unordered list: a “bag of objects”.
Here, I create an environment and bind the names `e1` and `e2` to it.

```r
e1 <- rlang::env(a = 1, b = 2, c = 3)
e2 <- e1
```
If I change the environment through `e1`, the change is also visible through `e2`, because both names point to the same environment.

```r
e1$c <- 4
e2$c
#> [1] 4
```
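For comparison, here is a base R sketch (using `new.env()` instead of `rlang::env()`) that runs the same experiment on an environment and on a list. The environment is modified in place, so the change is visible through both names; the list is copied on modify, so `l1` keeps its old value.

```r
# Environments: both names refer to the same environment
e1 <- new.env()
e1$c <- 3
e2 <- e1
e2$c <- 4
e1$c  # 4

# Lists: modifying l2 triggers copy-on-modify, so l1 is unchanged
l1 <- list(c = 3)
l2 <- l1
l2$c <- 4
l1$c  # 3

identical(e1, e2)  # TRUE: identical() compares environments by identity
```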
We will learn more about environments in Chapter 7, where this will be very important.
Garbage Collection
R often creates objects which no longer have names bound to them.
```r
x <- 1:3
x <- 2:4  # the object 1:3 no longer has a name bound to it
rm(x)     # now 2:4 has no name either
```
R has a garbage collector that periodically deletes these objects to free up memory. It runs automatically whenever R needs more memory, so it is hard to predict exactly when a collection will happen.
This is only ever important to think about if you use C code in R without Rcpp.
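You can watch the garbage collector at work with base R's `reg.finalizer()`, which registers a function to run when an environment is garbage collected. In this sketch (our own example; `state` is just a hypothetical holder for a flag), removing the last name bound to the environment and then forcing a collection with `gc()` should run the finalizer.

```r
state <- new.env()         # hypothetical flag holder that the finalizer updates
state$finalized <- FALSE

x <- new.env()
invisible(reg.finalizer(x, function(e) state$finalized <- TRUE))

rm(x)            # the environment no longer has a name bound to it
invisible(gc())  # force a collection instead of waiting for R to run one
state$finalized  # TRUE once the environment has been collected
```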
New Functions
- `tracemem()`: Tracks an object so that a message is printed whenever it is copied.
- `untracemem()`: Untracks an object.
- `lobstr::ref()`: Displays a tree of object addresses.
- `lobstr::obj_addr()`: Gives the address (in memory) of the object that a name points to.
- `lobstr::obj_size()`: Gives the size (in memory) of an object.