R CMD check
is a command-line tool that automatically checks a package for common
problems that are otherwise hard to spot by hand.
R CMD check
can be run from R via
devtools::check()
The check will return errors, warnings, and notes.
Errors: Serious problems you should fix right away.
Warnings: Likely problems that you should fix eventually.
Notes: Minor problems that may or may not be real problems. I would fix them anyway.
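The three kinds of results can also be inspected from R. The sketch below assumes that the object returned by devtools::check() carries character vectors named errors, warnings, and notes, as objects from the {rcmdcheck} package do. summarize_check() is a hypothetical helper, not part of either package, and fake_res is a stand-in so the example runs without a package on disk.

```r
# Hypothetical helper: count the problems in a check result.
# Assumes `res` has character vectors `errors`, `warnings`, and `notes`.
summarize_check <- function(res) {
  c(
    errors   = length(res$errors),
    warnings = length(res$warnings),
    notes    = length(res$notes)
  )
}

# Typical use from the package root (not run here):
# res <- devtools::check()
# summarize_check(res)

# A stand-in object with the same fields, for illustration only.
fake_res <- list(
  errors   = character(0),
  warnings = "checking Rd \\usage sections ... WARNING",
  notes    = c("first note", "second note")
)
summarize_check(fake_res)
```

A result with zero errors and zero warnings is the usual target. Notes still deserve a look.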
Workflow for package checking:
devtools::check()
I wouldn’t run devtools::check()
every time you make
a change. I would do it only a couple times a day. But the earlier you
do it, the easier it is to fix all issues.
If you run into an error/warning/note that you do not understand, go to the R packages page to understand it: https://r-pkgs.org/r-cmd-check.html
I won’t cover all of the checks, but I’ll list the ones that I most often need to fix.
In the “DESCRIPTION” file, your title should be in title case (“This is a Title”, not “This is a title”), and not end in a period. I would also go through this checklist for the DESCRIPTION file.
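One way to preview the title rules before running the full check is tools::toTitleCase(), which I believe is the function the check compares the title against. check_title() below is a hypothetical helper written for this example only; it covers just the two rules mentioned above.

```r
# Hypothetical helper: flag the two title problems described above.
check_title <- function(title) {
  problems <- character(0)
  # Compare the title with R's own idea of title case.
  if (!identical(title, tools::toTitleCase(title))) {
    problems <- c(problems, "not in title case")
  }
  # A title should not end in a period.
  if (grepl("\\.$", title)) {
    problems <- c(problems, "ends in a period")
  }
  problems
}

check_title("This is a Title")
check_title("simulate regression data.")
```

An empty result means neither problem was found. The real check may apply further rules.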
Every argument of a documented function needs to be documented, and there cannot be excess documentation.
If you have a typo in your {roxygen2}
documentation,
you will get this warning:
> checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'pkg'
‘arg’
Documented arguments not in \usage in documentation object 'pkg':
‘agr’
An undocumented function (where you don’t have
{roxygen2}
comments or where you added the
@noRd
tag) does not throw such a warning.
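As a minimal sketch of how this warning arises, consider the hypothetical function add_const() below. In the first version the @param tag is misspelled as agr, so the real argument arg ends up undocumented and agr is documented without appearing in the usage. The second version fixes the tag.

```r
# Version with the typo: `agr` is documented, but the argument is `arg`.
#' Add a constant to a vector
#'
#' @param x A numeric vector.
#' @param agr A number to add to each element of `x`.
#'
#' @return A numeric vector the same length as `x`.
#' @export
add_const_typo <- function(x, arg = 1) {
  x + arg
}

# Corrected version: every argument is documented exactly once.
#' Add a constant to a vector
#'
#' @param x A numeric vector.
#' @param arg A number to add to each element of `x`.
#'
#' @return A numeric vector the same length as `x`.
#' @export
add_const <- function(x, arg = 1) {
  x + arg
}
```

Both functions behave identically when called. Only the generated documentation differs, which is why the problem shows up in the check and not in your tests.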
Your package should not be too large.
R will tell you which folders have large files.
If you are including too much data, that will be an issue.
If R tells you that only libs
is too large then this
is usually OK:
> checking installed package size ... NOTE
installed size is 5.2Mb
sub-directories of 1Mb or more:
libs 4.4Mb
The above often occurs when you are using Rcpp.
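If the note points at a folder other than libs, it helps to see which files are responsible. largest_files() below is a hypothetical helper that ranks the files under a directory by size. It looks at the source directory, which is only a rough proxy for the installed size that the check reports.

```r
# Hypothetical helper: list the largest files under a directory.
largest_files <- function(path = ".", n = 10) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  out <- data.frame(
    file = files,
    size_mb = file.size(files) / 1024^2,  # bytes to megabytes
    stringsAsFactors = FALSE
  )
  # Largest files first.
  out <- out[order(out$size_mb, decreasing = TRUE), , drop = FALSE]
  utils::head(out, n)
}

# Typical use from the package root (not run here):
# largest_files(".")
```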
Failure to have a package installed: This happens sometimes when you work on multiple computers (or collaborate with others). You can install all dependencies of a package with
devtools::install_deps(dependencies = TRUE)
Failure to specify a package:
R will tell you if you try to use a function that you have not imported.
The note will look like this:
> checking R code for possible problems ... NOTE
simreg: no visible global function definition for ‘rnorm’
Undefined global functions or variables:
rnorm
Consider adding
importFrom("stats", "rnorm")
to your NAMESPACE file.
R comes with the following packages (from the R FAQ). Of these, only {base}, {datasets}, {graphics}, {grDevices}, {methods}, {stats}, and {utils} are attached at startup by default.
{base}
{compiler}
{datasets}
{grDevices}
{graphics}
{grid}
{methods}
{parallel}
{splines}
{stats}
{stats4}
{tcltk}
{tools}
{utils}
Unless a function is from {base}
, you will need to
specify that package using ::
. E.g.
stats::rnorm()
or utils::read.table()
. Most of
the issues I come across come from forgetting about the {stats}
package. You can see what functions are from {base}
with
library(help = "base")
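The two usual fixes for the rnorm note can be sketched as follows. draw_noise() and draw_noise2() are hypothetical functions used only for illustration. The second option relies on {roxygen2} writing the importFrom() line to NAMESPACE when the documentation is rebuilt.

```r
# Option 1: qualify the call with `::`.
draw_noise <- function(n, sigma = 1) {
  stats::rnorm(n = n, mean = 0, sd = sigma)
}

# Option 2: import the function with a {roxygen2} tag, so that the bare
# name `rnorm` is visible inside the package.
#' @importFrom stats rnorm
draw_noise2 <- function(n, sigma = 1) {
  rnorm(n = n, mean = 0, sd = sigma)
}

# To look up which attached package provides a function:
utils::find("rnorm")
```

Option 1 keeps the origin of each function visible where it is used. Option 2 is less to type when the same function is called many times.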
You can set up GitHub Actions so that it will run
R CMD check
on multiple virtual machines (Windows, Mac, or
Ubuntu) each time you push. This is really great for making sure your
package is robust and constantly being checked.
Automatic checking each time you make a change is called continuous integration.
In a package, run
usethis::use_github_action_check_standard()
Running this will create a new file in a hidden folder via the path “.github/workflows/R-CMD-check.yaml”. This YAML file contains instructions for setting up a virtual machine, installing R and your dependencies, and running R CMD check.
To use it, simply commit your files and push to GitHub, then wait for the checks to run. You can see their progress by clicking on the “Actions” tab on the GitHub page of your package.
It’s not too important to know what that file does, but there are some parts that you may need to edit.
You may comment out one of the operating systems for the check if
you know that the error is artificial. Use #
for comments
in a YAML file. Below, I comment out the Mac.
strategy:
  fail-fast: false
  matrix:
    config:
      - {os: windows-latest, r: 'release'}
      # - {os: macOS-latest, r: 'release'}
      - {os: ubuntu-20.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"}
      - {os: ubuntu-20.04, r: 'devel', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"}
Sometimes (but rarely) you need to fix the install code for the
dependencies. One time, {remotes}
was failing to install the
correct Bioconductor packages I needed, so I had to edit it this
way:
- name: Install dependencies
  run: |
    remotes::install_deps(dependencies = TRUE)
    remotes::install_cran("rcmdcheck")
    install.packages("BiocManager") # new line
    BiocManager::install("VariantAnnotation") # new line
  shell: Rscript {0}
You can see a variety of other YAML files at https://github.com/r-lib/actions/tree/v1/examples
Recall the simreg()
example from the Testing lecture. Use the edit-check
workflow to further develop your package with the following
capabilities:
In simreg()
, instead of simulating \(x\) from a standard normal, give the user
the ability to choose the variance of \(x\), which we will call \(\tau^2\).
It is probably difficult for the user to specify both \(\sigma^2\) (the residual variance) and \(\tau^2\) (the variance of the predictors). A better option would allow the user to provide more intuitive inputs. One possible input would be the proportion of variance explained (PVE), which we will define as \[ PVE = \frac{\beta_1^2\tau^2}{\beta_1^2\tau^2 + \sigma^2}. \] This follows from \[ var(y_i) = var(\beta_0 + \beta_1x_i + \epsilon_i) = \beta_1^2\tau^2 + \sigma^2, \] and so \(\beta_1^2\tau^2\) is how much of the variance in \(y\) is explained by the predictors.
Allow the user to set the PVE, the residual variance (\(\sigma^2\)), and the regression coefficient
(\(\beta_1\)). To do this, you should
create a new function called tau_from_pve()
which will
calculate the proper \(\tau^2\) given
the PVE, the residual variance, and the regression coefficient. Then you
can just use that \(\tau^2\) to
simulate \(x\).
It would probably be better to include many options to choose
\(x\). Create a new function called
simx()
that will generate \(x\) values under different conditions:
sample(x = c(a, b), size = n, replace = TRUE)
for
different numeric values of a
and b
.
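Before writing tau_from_pve(), it can help to confirm the PVE formula numerically. The sketch below is not a solution to the exercise. It simulates normal \(x\) with a chosen \(\tau^2\) and compares the formula with the share of variance observed in the simulated data. pve_formula() and pve_simulated() are hypothetical helper names, and the parameter values are arbitrary.

```r
# PVE according to the formula in the exercise.
pve_formula <- function(beta1, tau2, sigma2) {
  beta1^2 * tau2 / (beta1^2 * tau2 + sigma2)
}

# PVE estimated from simulated data: var(beta1 * x) / var(y).
pve_simulated <- function(n, beta0, beta1, tau2, sigma2) {
  x <- stats::rnorm(n, mean = 0, sd = sqrt(tau2))
  eps <- stats::rnorm(n, mean = 0, sd = sqrt(sigma2))
  y <- beta0 + beta1 * x + eps
  stats::var(beta1 * x) / stats::var(y)
}

set.seed(1)
pve_formula(beta1 = 2, tau2 = 0.5, sigma2 = 3)  # 2 / (2 + 3) = 0.4
pve_simulated(n = 1e5, beta0 = 1, beta1 = 2, tau2 = 0.5, sigma2 = 3)
```

With a large n the simulated value should land close to the formula. A comparison like this also makes a natural unit test once tau_from_pve() exists.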