add_two <- function(a, b) {
  return(a + b)
}
add_two(2, 4)
[1] 6
David Gerard
June 3, 2025
All functions are of the form
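A minimal sketch of this general form, using the placeholder names referred to in these notes (`name`, `arg1`, `arg2`, `arg3`, `result`). The default values 2 and 3 and the body are assumptions, chosen only so that the sketch runs:

```r
# Placeholder names: `name`, `arg1`, `arg2`, `arg3`, and `result` are
# illustrative; the defaults (2 and 3) and the body are assumed.
name <- function(arg1, arg2 = 2, arg3 = 3) {
  # code that uses the arguments to create the output
  result <- arg1 + arg2 + arg3
  return(result)
}

name(1)             # arg2 and arg3 fall back to their defaults
name(1, arg2 = 10)  # override one default by name
```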
You can choose any name you want, but it should be informative.
You choose the names of the arguments arg1, arg2, etc.
Arguments can have defaults by setting arg1 = default1, where default1 is whatever the default value of arg1 is. In the above example, arg1 has no default but arg2 and arg3 have defaults.
Your code creates some output, which I call result above.
You put the output in a return() call at the end of the function.
Steps to creating a function:
Coding standards
Example from our book follows.
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# rescale each column to lie between 0 and 1
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
How many inputs does each line have?
[1] 0.78057 0.08637 0.32873 0.88008 0.03995 0.26108 0.00000 1.00000 0.75874
[10] 0.93993
[1] 0.78057 0.08637 0.32873 0.88008 0.03995 0.26108 0.00000 1.00000 0.75874
[10] 0.93993
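Each rescaling line has a single input: the column being rescaled. One way to see this, sketched here with a hypothetical temporary variable `x` standing in for a column such as `df$a`, is to rewrite the computation in terms of `x` and then replace the repeated `min()` and `max()` calls with one call to `range()`. The seed and the simulated data are assumptions for illustration only:

```r
set.seed(1)        # arbitrary seed, only for reproducibility
x <- rnorm(10)     # stands in for one column, e.g. df$a

# Same computation as above, written in terms of the single input x
v1 <- (x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

# range() returns c(min, max), so the repeated calls collapse into one
rng <- range(x, na.rm = TRUE)
v2 <- (x - rng[1]) / (rng[2] - rng[1])

all.equal(v1, v2)  # the two versions agree
```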
# make it into a function and test it
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(0, 5, 10))
[1] 0.0 0.5 1.0
[1] 0.0 0.5 1.0
[1] 0.00 0.25 0.50 NA 1.00
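It helps to test a new function on a few inputs whose answers are easy to check by hand, including negative and missing values. The inputs below are assumptions, chosen because they give results of the same form as the outputs shown above:

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(-10, 0, 10))       # negative values are handled
rescale01(c(1, 2, 3, NA, 5))   # NA stays NA, the rest are rescaled
```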
Now, if we have a change in requirements, we only have to change it in one place. For instance, perhaps we want to handle columns that have Inf as one of the values.
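One possible revision, assuming that infinite values should be ignored when computing the minimum and maximum: the `finite = TRUE` argument of `range()` drops non-finite elements, so only the function body changes and every call site stays the same.

```r
# Revised rescale01(): finite = TRUE makes range() ignore Inf and -Inf
# (as well as NA) when computing the minimum and maximum.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10, Inf))  # finite values map to [0, 1]; Inf stays Inf
```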
Dos and don'ts of function naming:
# load data --------------------
Exercise: Write a function that calculates the \(z\)-scores of a numeric vector. The \(z\)-score takes each value, subtracts the mean, then divides by the standard deviation. It measures how many standard deviations above (or below) the mean a value is.
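One possible solution, as a sketch. The name `z_score()` is hypothetical, and dropping missing values with `na.rm = TRUE` is an assumed design choice rather than part of the exercise:

```r
# z_score() is a hypothetical name; na.rm = TRUE is an assumed choice.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

z_score(c(1, 2, 3))  # mean is 2 and sd is 1, so this gives -1, 0, 1
```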
Exercise: Write a function that takes a numeric vector as input and replaces all instances of -9 with NA.
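One possible solution, as a sketch. The name `na_if_minus9()` is hypothetical; the `!is.na(x)` guard keeps values that are already missing out of the logical index:

```r
# na_if_minus9() is a hypothetical name.
na_if_minus9 <- function(x) {
  x[!is.na(x) & x == -9] <- NA  # replace only the entries equal to -9
  x
}

na_if_minus9(c(1, -9, 3, NA, -9))  # the -9 entries become NA
```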
Exercise: Write a function that takes a numeric vector and returns the coefficient of variation (the standard deviation divided by the mean).
Exercise: Write a function that takes as input a vector and returns the number of missing values.
Exercise (from RDS): Given a vector of birth dates, write a function to compute the age in years.
Exercise: Re-write the function range(). Use functions: min(), max()
Exercise: Write both_na(), a function that takes two vectors of the same length and returns the number of positions that have an NA in both vectors. Useful functions: is.na(), sum(), logical operators.
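One possible solution, as a sketch. The length check with `stopifnot()` is an assumed extra safeguard, not something the exercise requires:

```r
both_na <- function(x, y) {
  stopifnot(length(x) == length(y))  # assumed safeguard: equal lengths
  sum(is.na(x) & is.na(y))           # TRUE counts as 1 when summed
}

both_na(c(1, NA, NA, 4), c(NA, NA, 3, NA))  # only position 2 is NA in both
```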
Exercise: Read the source code for each of the following three functions, describe what they do, and then brainstorm better names.