Introduction to Object Oriented Programming

Author

David Gerard

Published

September 22, 2025

Learning Objectives

Overview of OOP.
Vocabulary of OOP.
R Base Types.
Introduction to OOP and Chapter 12 of Advanced R

Motivation

There are various strategies to programming that folks use.

You are mostly used to procedural programming where you list out a sequence of steps that are carried out in succession.

mean_vec <- rep(NA_real_, length.out = length(mtcars))
names(mean_vec) <- names(mtcars)
for (i in seq_along(mtcars)) {
  mean_vec[[i]] <- mean(mtcars[[i]])
}
mean_vec

     mpg      cyl     disp       hp     drat       wt     qsec       vs 
 20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
      am     gear     carb 
  0.4062   3.6875   2.8125

You have also been exposed to functional programming where you compose functions with other functions.

sapply(mtcars, mean)

     mpg      cyl     disp       hp     drat       wt     qsec       vs 
 20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
      am     gear     carb 
  0.4062   3.6875   2.8125

purrr::map_dbl(mtcars, mean) ## tidyverse version

     mpg      cyl     disp       hp     drat       wt     qsec       vs 
 20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
      am     gear     carb 
  0.4062   3.6875   2.8125

Object-oriented programming (OOP) is a different style of programming from what you are used to. It is centered around objects: each object belongs to a class, and the class determines what data the object holds and which functions operate on it.
R has three native object oriented programming systems (S3, S4, and RC for “reference classes”), and many other third-party packages have made their own object oriented systems ({R6} being the most popular).
- These systems are listed in increasing order of complexity, with S3 being “baby” OOP, S4 being “YA” OOP, and RC and R6 being “big boy” OOP.
- If you are extending {ggplot2} then you will learn about another OOP system specific to {ggplot2}: ggproto.

E.g.: To calculate the column means in S3 OOP, we would probably create a generic function for column means.

col_means <- function(x, ...) {
  UseMethod("col_means")
}

and then create a specific method for the data.frame class

col_means.data.frame <- function(x, ...) { ## keep `...` so the method matches the generic's arguments
  sapply(x, mean)
}

Finally, we would call the generic function on an object of class data.frame.

col_means(mtcars)

     mpg      cyl     disp       hp     drat       wt     qsec       vs 
 20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
      am     gear     carb 
  0.4062   3.6875   2.8125

E.g.: Calculating the column means in S4 OOP is very similar, just more formal:

setOldClass(Classes = "data.frame")
setGeneric(name = "col_means_s4", def = function(x) standardGeneric("col_means_s4"))

[1] "col_means_s4"

setMethod(f = "col_means_s4", 
          signature = "data.frame", 
          definition = function(x) {
            sapply(x, mean)
          }
)
col_means_s4(mtcars)

     mpg      cyl     disp       hp     drat       wt     qsec       vs 
 20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
      am     gear     carb 
  0.4062   3.6875   2.8125

E.g.: To calculate the column means in R6 OOP, we would probably create a new class that has the $col_means() method that we could call.

datFrame <- R6::R6Class(classname = "datFrame", public = list(
  df = NULL,
  initialize = function(df) {
    stopifnot(is.data.frame(df))
    self$df <- df
  },
  col_means = function() {
    sapply(self$df, mean)
  }
)
)

mtcars_df <- datFrame$new(df = mtcars)
mtcars_df$col_means()

     mpg      cyl     disp       hp     drat       wt     qsec       vs 
 20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
      am     gear     carb 
  0.4062   3.6875   2.8125

Because most R programmers do not come from an OOP background, you should mostly use S3 and S4 when writing OOP code in R. In this class, we’ll only spend time on S3, the most popular of these systems.
S3 and S4 use generic function OOP where the same function name is evaluated differently based on the class of the object.

E.g. that allows the output of summary() to differ between integers and factors.

x <- sample(1:10, size = 100, replace = TRUE)
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.00    5.00    5.58    8.00   10.00

y <- factor(x)
summary(y)

 1  2  3  4  5  6  7  8  9 10 
 7 10  9 12 14 10 10  8  8 12
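To see which method the generic picks in each case, here is a small sketch using getS3method() from {utils}. The toy vectors are made up for illustration, and the check assumes a plain R session where no loaded package has registered an integer-specific summary() method.

```r
x <- c(3L, 1L, 2L, 3L)  # toy integer vector (made up for illustration)
y <- factor(x)

## Look up the registered S3 methods for summary()
fac_method <- getS3method("summary", "factor")
int_method <- getS3method("summary", "integer", optional = TRUE)

is.function(fac_method)  ## a factor-specific method exists
is.null(int_method)      ## no integer-specific method, so the default is used

## Calling the generic gives the same result as calling the chosen method directly
identical(summary(y), summary.factor(y))
identical(summary(x), summary.default(x))
```

Calling methods directly, as in the last two lines, is only done here to show the equivalence. In practice you call the generic and let R choose.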

R6 and RC use encapsulated OOP where objects are the center of everything, holding fields (data) and methods (functions) that operate on those values. These are closest to what you would be used to if you are coming from an OOP language. Try not to use them.
E.g. in R we apply a function, like mean(), to a vector, like x. But an encapsulated object-oriented programming system would have the function mean() attached to the vector x. That’s one difference between R and Python.

R
```
x <- c(19, 22, 31)
mean(x) ## apply mean to x
```
```
[1] 24
```
Python
```
import numpy as np
x = np.array([19, 22, 31])
x.mean() ## mean belongs to x
```
```
np.float64(24.0)
```
S3 allows you to use functions like print() and summary() and plot() on outputs of your functions. You can also define your own “generics.”
S4 is similar to S3 but is more formal and strict. S4 is important to understand if you want to use or contribute to Bioconductor.
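As a rough sketch of the S3 point above, the function below returns an object with a custom class and defines a print() method for it. The names col_summary and print.col_summary are hypothetical and chosen only for this illustration.

```r
## Hypothetical function: returns a list tagged with the S3 class "col_summary"
col_summary <- function(df) {
  stopifnot(is.data.frame(df))
  out <- list(means = sapply(df, mean), n = nrow(df))
  class(out) <- "col_summary"
  out
}

## print() method for that class; called automatically when the object is printed
print.col_summary <- function(x, ...) {
  cat("Column means from", x$n, "rows\n")
  print(round(x$means, digits = 2))
  invisible(x)
}

res <- col_summary(mtcars)
res  ## auto-printing dispatches to print.col_summary()
```

Because print() is already a generic, no new generic had to be created. Only the method was needed.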

OOP Vocabulary

Polymorphism: Use the same function name for different types of input, but have the function evaluate differently based on the types of input.
An object is a specific instance of a class. E.g. below, x is an object of class factor.
```
x <- factor(c(119, 22, 31))
class(x)
```
```
[1] "factor"
```
A function for a specific class is a method.
- In R6, methods belong to objects, like the col_means() method for our R6 class above.
- In S3 and S4, methods are specific versions of generics. Like in S3, print.factor() is the print method for factor objects.
A field is data that belongs to an object. In our R6 example, we had the df field.
- In S3, fields are called attributes.
- In S4, fields are called slots.
Classes are defined in a hierarchy. So if a method does not exist for one class, it is searched for in the parent class. It is said that the child class inherits the behavior of the parent class.

E.g. tibbles inherit the behavior of data frames.

class(tibble::tibble(a = 1))

[1] "tbl_df"     "tbl"        "data.frame"

The process of finding the right method for an object, searching its classes in order, is called method dispatch.
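A minimal S3 sketch of inheritance and method dispatch follows. The generics speak() and describe() and the classes dog and animal are made up for this illustration.

```r
## Two hypothetical generics
speak <- function(x, ...) UseMethod("speak")
describe <- function(x, ...) UseMethod("describe")

## Methods for the parent class "animal"
speak.animal <- function(x, ...) "some animal noise"
describe.animal <- function(x, ...) paste("an animal named", x$name)

## The child class "dog" only overrides speak()
speak.dog <- function(x, ...) "woof"

## The class vector lists the child first, then the parent
rex <- structure(list(name = "Rex"), class = c("dog", "animal"))

speak(rex)     ## finds speak.dog() first
describe(rex)  ## no describe.dog(), so describe.animal() is used
inherits(rex, "animal")
```

The class vector is searched from left to right, which is why dog methods take priority over animal methods.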

`{sloop}`

The {sloop} package is an interface for exploring OOP systems.

sloop::otype() tells you which OOP system an object belongs to (S3, S4, R6, etc.).

sloop::otype(mtcars) ## Most R stuff is in S3.

[1] "S3"

data("USCounties", package = "Matrix") ## Efficient matrix computations package
sloop::otype(USCounties)

[1] "S4"

pb <- progress::progress_bar$new() ## progress bars for for-loops
sloop::otype(pb)

[1] "R6"

Base Types

S (the precursor to R) was first developed without an OOP system, so its only objects were “base types”. These don’t have basic OOP functionality like polymorphism or inheritance.
R users often call base types “objects” even though they aren’t OOP objects.
```
x <- 1:10
sloop::otype(x)
```
```
[1] "base"
```

In R, an OO object has a class attribute and a base type does not.

x <- 1:10
attr(x, "class")

NULL

y <- factor(x)
attr(y, "class")

[1] "factor"
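The same distinction can be checked with base R's is.object(), and unclass() shows what is left once the class attribute is removed. This is only a small sketch reusing the x and y from above.

```r
x <- 1:10
y <- factor(x)

is.object(x)          ## FALSE: no class attribute
is.object(y)          ## TRUE: has a class attribute
names(attributes(y))  ## the attributes of a factor include "levels" and "class"

z <- unclass(y)       ## strip the class attribute
is.object(z)          ## FALSE: back to a base type
typeof(z)             ## "integer"
```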

If an object has no class attribute, class() returns its implicit class. For a simple vector like x this matches typeof(), but not always: e.g. class(1) is "numeric" while typeof(1) is "double".
```
class(x)
```
```
[1] "integer"
```
```
typeof(x)
```
```
[1] "integer"
```
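The implicit class and the base type do not always agree. A quick sketch with a few base objects (the matrix result assumes R 4.0.0 or later):

```r
## Doubles have implicit class "numeric"
class(1)   ## "numeric"
typeof(1)  ## "double"

## Matrices get their implicit class from their dim attribute
m <- matrix(1:4, nrow = 2)
class(m)   ## "matrix" "array"
typeof(m)  ## "integer"

## All functions have implicit class "function"
class(sum)   ## "function"
typeof(sum)  ## "builtin"
```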

Every object, including OO objects, has a base type that can be seen by typeof().

typeof(y)

[1] "integer"

typeof(mtcars)

[1] "list"

typeof(USCounties)

[1] "S4"

typeof(pb)

[1] "environment"

There are 25 base types. From Hadley’s list, the important ones are:

Vector: NULL, logical, integer, double, character, list

typeof(NULL)

[1] "NULL"

typeof(TRUE)

[1] "logical"

typeof(1L)

[1] "integer"

typeof(1)

[1] "double"

typeof("1")

[1] "character"

typeof(list(1))

[1] "list"

Functions: closure (regular R functions), special and builtin (both are “primitive” functions implemented in C; special functions like `if` do not evaluate their arguments before being called, while builtin functions like sum() do)
```
typeof(mean)
```
```
[1] "closure"
```
```
typeof(`if`)
```
```
[1] "special"
```
```
typeof(sum)
```
```
[1] "builtin"
```
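A quick way to tell closures apart from the two C-level function types is base R's is.primitive(). A small sketch:

```r
is.primitive(mean)  ## FALSE: closure, written in R
is.primitive(`if`)  ## TRUE: special
is.primitive(sum)   ## TRUE: builtin
```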

Environments: environment

typeof(rlang::global_env())

[1] "environment"

S4 types: S4
```
typeof(USCounties)
```
```
[1] "S4"
```

Language types (used in metaprogramming): symbol, language, pairlist, and expression.

typeof(quote(a))

[1] "symbol"

typeof(quote(a + 1))

[1] "language"

typeof(formals(mean))

[1] "pairlist"

typeof(expression(a))

[1] "expression"

Exercise: What are the (i) type, (ii) OOP system, and (iii) class of each of the following objects?

x <- lubridate::make_date(year = c(1990, 2022), month = c(1, 2), day = c(30, 22))
y <- matrix(NA_real_, nrow = 10, ncol = 2)
z <- tibble::tibble(a = 1:3)
aa <- lm(mpg ~ wt, data = mtcars)
bb <- t.test(mpg ~ am, data = mtcars)
cc <- rTensor::as.tensor(array(1:30, dim = c(2, 3, 5)))

Exercise: Why do we get different results from summary() with the following code?

a <- lm(mpg ~ wt, data = mtcars)
b <- t.test(mpg ~ am, data = mtcars)
summary(a)


Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-4.543 -2.365 -0.125  1.410  6.873 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   37.285      1.878   19.86  < 2e-16
wt            -5.344      0.559   -9.56  1.3e-10

Residual standard error: 3.05 on 30 degrees of freedom
Multiple R-squared:  0.753, Adjusted R-squared:  0.745 
F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10

summary(b)

            Length Class  Mode     
statistic   1      -none- numeric  
parameter   1      -none- numeric  
p.value     1      -none- numeric  
conf.int    2      -none- numeric  
estimate    2      -none- numeric  
null.value  1      -none- numeric  
stderr      1      -none- numeric  
alternative 1      -none- character
method      1      -none- character
data.name   1      -none- character

Exercise: From the previous exercise, if we remove the class from a and b, what happens to the summary() call? What does this tell you about the summary() methods of the htest and lm classes?