All functions are of the form
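A minimal sketch of that form, written to match the description below: arg1 has no default, arg2 and arg3 have defaults, and the output is called result. The function name add_three and the default values are hypothetical, chosen only for illustration.

```r
# General shape: name <- function(arguments) { body; return(result) }
# add_three and its default values are hypothetical.
add_three <- function(arg1, arg2 = 2, arg3 = 3) {
  # The body computes some output from the arguments.
  result <- arg1 + arg2 + arg3
  # The output goes in a return() call at the end of the function.
  return(result)
}

add_three(1)             # uses both defaults: 1 + 2 + 3 = 6
add_three(1, arg3 = 10)  # overrides one default by name: 1 + 2 + 10 = 13
```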
You can choose any name you want, but it should be informative.
You choose the names of the arguments arg1, arg2, etc.
You give an argument a default by setting arg1 = default1, where default1 is the default value of arg1. In the above example, arg1 has no default but arg2 and arg3 have defaults.
Your code creates some output, which I call result above.
You put the output in a return() call at the end of the function.
Steps to creating a function:
Coding standards
## [1] 6
Example from our book follows.
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# Rescale each column to lie between 0 and 1.
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
# Note the copy-and-paste slip in the next statement: the denominator
# subtracts min(df$a) where min(df$b) is intended.
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
How many inputs does each line have?
## [1] 0.0328336 0.4655800 1.0000000 0.5985189 0.0000000 0.0003578 0.5912672
## [8] 0.8056156 0.4987288 0.2805086
## [1] 0.0328336 0.4655800 1.0000000 0.5985189 0.0000000 0.0003578 0.5912672
## [8] 0.8056156 0.4987288 0.2805086
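Each of those statements depends on a single input: the column being rescaled. A sketch of the intermediate steps toward a function, assuming the input is first pulled into a temporary variable x. A small fixed vector stands in for df$a here so the result is reproducible; it is not the data that produced the output above.

```r
# Hypothetical fixed input standing in for a column such as df$a.
x <- c(2, 4, 6, 10)

# Step 1: the original expression, rewritten in terms of the single input x.
direct <- (x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

# Step 2: compute the range once instead of calling min() and max() repeatedly.
rng <- range(x, na.rm = TRUE)
via_range <- (x - rng[1]) / (rng[2] - rng[1])

# Both versions give the same rescaled values.
isTRUE(all.equal(direct, via_range))
```

Once the expression is written in terms of x alone, wrapping it in a function body is a mechanical step.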
# Make it into a function and test it.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(0, 5, 10))
## [1] 0.0 0.5 1.0
## [1] 0.0 0.5 1.0
## [1] 0.00 0.25 0.50 NA 1.00
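A few more calls exercise the function on negative values and on missing data. The specific inputs below are assumptions, chosen because they are consistent with the outputs shown above.

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Assumed test inputs; any vectors with the same relative spacing would do.
rescale01(c(-10, 0, 10))      # 0.0 0.5 1.0
rescale01(c(1, 2, 3, NA, 5))  # 0.00 0.25 0.50 NA 1.00
```

Because na.rm = TRUE is passed to range(), the missing value does not affect the minimum or maximum, and it stays NA in the result.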
Now, if we have a change in requirements, we only have to change it in one place. For instance, perhaps we want to handle columns that have Inf as one of the values.
## [1] 0 0 0 0 0 0 0 0 0 0 NaN
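# Assumed input, consistent with the outputs shown: ten finite values plus Inf.
x <- c(1:10, Inf)
rescale01 <- function(x) {
  # finite = TRUE makes range() ignore infinite values.
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(x)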
## [1] 0.0000 0.1111 0.2222 0.3333 0.4444 0.5556 0.6667 0.7778 0.8889 1.0000
## [11] Inf
Dos and don’ts of function naming:
# load data --------------------
Exercise: Write a function that calculates the \(z\)-scores of a numeric vector. The \(z\)-score takes each value, subtracts the mean, then divides by the standard deviation. It measures how many standard deviations above (or below) the mean a value is.
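In symbols, for a vector with values \(x_i\), sample mean \(\bar{x}\), and sample standard deviation \(s\) (notation introduced here for illustration), the definition above reads:

\[
z_i = \frac{x_i - \bar{x}}{s}
\]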
Exercise: Write a function that takes a numeric vector as input and replaces all instances of -9 with NA.
Exercise: Write a function that takes a numeric vector and returns the coefficient of variation (the standard deviation divided by the mean).
Exercise: Write a function that takes as input a vector and returns the number of missing values.
Exercise (from RDS): Given a vector of birth dates, write a function to compute the age in years.
Exercise: Rewrite the function range(). Use functions: min(), max().
Exercise: Write both_na(), a function that takes two vectors of the same length and returns the number of positions that have an NA in both vectors. Useful functions: is.na(), sum(), logical operators.
Exercise: Read the source code for each of the following three functions, describe what they do, and then brainstorm better names.