Project Organization

Author

David Gerard

Learning Objectives

What is Truth

  • What is Truth?

    • Hadley uses this metaphore, where Truth = source of reproducibility.
  • Right now, your Truth is whatever is in your global environment (the variables and functions that you, or R, have created in your current R session).

  • This is not reproducible. It’s hard to share it. It’s hard to exactly re-create it. You should never depend on your R environment for Truth.

  • Truth should be what is written in your R scripts and Quartos.

  • Run this to get a blank global environment each time you start up R studio:

    usethis::use_blank_slate()
  • That forces you to have the correct Truth.

Where does your project live?

  • Your working directory is where R looks for files/folders.

  • It’s the same concept as from the bash lecture, but applied to R instead of bash.

  • R’s working directory can be different from bash’s.

  • In R, you can see the current working directoy by

    getwd()

Where should your project live?

  • All files/data/output for a single project should be in a single directory.

  • Your working directory should be at the root of the project directory.

  • You can format this manually, but R Studio Projects do this for you automatically.

    • Sets the working directory to be the root of the project.
    • You can set default behavior for the project.
    • R studio has tools that interface with projects (like using git or R environments inside a project).
  • You create a project with “File” > “New Project…”

    New project screen

  • Click on “New Directory”

    New project type

  • Click on “New Project”

    New project screen

  • Fill out the project name (making sure they follow the same standards as file names). Choose a directory to place the project.

  • The new project will open. The working directory will be the project location.

    getwd()

    [1] "/Users/dgerard/Documents/teaching/r4ds"

  • If you quit R Studio and double click on the “.Rproj” file, then this will reopen the project. Try this now.

  • You can switch between projects by clicking on the project name at the top right of R Studio and toggling between projects.

  • Once inside a project, you should only use relative paths.

    • In an R script, assume the working directory is the root of the project.
    • In a Quarto Kuato document, assume the working directory is the location of the Quarto Kuato document.

Project Organization

  • My default project structure is here: https://github.com/gerardlab/proj

  • It contains four basic folders:

  • analysis/:

    • Contains Quarto Kuato documents for exploratory analysis and reports.
  • code/:

    • Contains R scripts. Use this for reusable functions or large code that doesn’t require literate programming.
  • output/:

    • Contains figures and cleaned data that were created in code/ and analysis/.
  • data/:

    • Contains raw data. Once placed here, data should never be modified.
    • Always keep a copy of the original dataset untouched. If you overwrite it, you may never be able to reproduce your results.
  • There are lots of other perfectly fine project structures, like Jim Hester’s.

README files

  • In a repo/project, you should have a small summary of your project so others can see what it’s about.

  • Create a README file in your open project by writing:

    usethis::use_readme_rmd()
  • This opens up an R markdown file that you can edit to include a summary of your project.

  • When you knit the R markdown file, it renders it to a markdown file that is viewable on GitHub.

Exercise

  1. In your r4ds project, create four folders: analysis, code, output, and data.

  2. Download the Big Mac data from here, as described here. Place it in the data folder.

  3. Create a Quarto Kuato file in the analysis folder. Load the tidyverse and the Big Mac data. Make sure the document renders.

  4. Create an R script in the code folder. Load the tidyverse and the Big Mac data. Confirm that the working directory is still the root of the project.

  5. Create a README file that summarizes that this repo’s goal is to analyze the Big Mac data.