Project Organization
Learning Objectives
- R Studio Projects
- Chapter 6 of R for Data Science
- Most of my notes here are from Hadley. Thank you!
- My project template: https://github.com/gerardlab/proj
- Jim Hester’s project template: https://github.com/jimhester/analysis_framework
What is Truth?
- Hadley uses this metaphor, where Truth = the source of reproducibility.
Right now, your Truth is whatever is in your global environment (the variables and functions that you, or R, have created in your current R session).
This is not reproducible. It’s hard to share it. It’s hard to exactly re-create it. You should never depend on your R environment for Truth.
Truth should be what is written in your R scripts and Quarto documents.
Run this to get a blank global environment each time you start up R Studio:
usethis::use_blank_slate()
That forces you to have the correct Truth.
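As a minimal sketch of what "Truth lives in the script" means, the example below writes a tiny, made-up analysis script to a temporary file and runs it in a fresh, empty environment. The script contents and the names `x` and `x_mean` are hypothetical and chosen only for illustration.

```r
# Write a tiny analysis script to a temporary file (hypothetical contents).
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2, 4, 6)",
  "x_mean <- mean(x)"
), script)

# Run the script in a fresh, empty environment, mimicking a blank-slate session.
fresh <- new.env()
source(script, local = fresh)

# Everything the analysis needs was rebuilt from the script alone.
ls(fresh)
fresh$x_mean
```

If a result can only be obtained from objects already sitting in your global environment, and not by re-running the script like this, then the script is not yet the Truth.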
Where does your project live?
Your working directory is where R looks for files/folders.
It’s the same concept as from the bash lecture, but applied to R instead of bash.
R’s working directory can be different from bash’s.
In R, you can see the current working directory by running:
getwd()
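As a small illustration of how R resolves file names against the working directory, the sketch below compares the relative path `"."` with the directory reported by `getwd()`. The variable names are only for illustration.

```r
# Where R is currently looking for files.
wd <- getwd()

# A relative path is interpreted starting from the working directory,
# so "." and the working directory refer to the same location.
relative_path <- "."
absolute_path <- normalizePath(relative_path)

identical(absolute_path, normalizePath(wd))

# Files and folders that R can see without any path prefix.
list.files()
```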
Where should your project live?
All files/data/output for a single project should be in a single directory.
Your working directory should be at the root of the project directory.
You can set this up manually, but R Studio Projects do this for you automatically.
- Sets the working directory to be the root of the project.
- You can set default behavior for the project.
- R Studio has tools that interface with projects (like using git or R environments inside a project).
You create a project with “File” > “New Project…”
Click on “New Directory”
Click on “New Project”
Fill out the project name (making sure it follows the same naming standards as file names). Choose a directory to place the project.
The new project will open. The working directory will be the project location.
getwd()
[1] "/Users/dgerard/Documents/teaching/r4ds"
If you quit R Studio and double click on the “.Rproj” file, then this will reopen the project. Try this now.
You can switch between projects by clicking on the project name at the top right of R Studio and toggling between projects.
Once inside a project, you should only use relative paths.
- In an R script, assume the working directory is the root of the project.
- In a Quarto document, assume the working directory is the location of the Quarto document.
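The two rules above can be sketched with a throwaway project in a temporary directory. The folder name `demo_proj` and the file `toy.csv` are hypothetical, and `setwd()` is used here only to simulate the two starting points; in a real project you would not call `setwd()` yourself.

```r
# Build a throwaway project in a temporary directory (hypothetical layout).
proj <- file.path(tempdir(), "demo_proj")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "analysis"), showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(proj, "data", "toy.csv"), row.names = FALSE)

old_wd <- getwd()

# R script: the working directory is the project root.
setwd(proj)
from_script <- read.csv(file.path("data", "toy.csv"))

# Quarto document in analysis/: the working directory is analysis/,
# so step up one level to reach data/.
setwd(file.path(proj, "analysis"))
from_quarto <- read.csv(file.path("..", "data", "toy.csv"))

# Restore the original working directory.
setwd(old_wd)

identical(from_script, from_quarto)
```

Both relative paths reach the same file, and neither contains anything specific to your computer, so the project still works when it is moved or shared.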
Project Organization
My default project structure is here: https://github.com/gerardlab/proj
It contains four basic folders:
- analysis/: Contains Quarto documents for exploratory analysis and reports.
- code/: Contains R scripts. Use this for reusable functions or large code that doesn’t require literate programming.
- output/: Contains figures and cleaned data that were created in code/ and analysis/.
- data/: Contains raw data. Once placed here, data should never be modified.
  - Always keep a copy of the original dataset untouched. If you overwrite it, you may never be able to reproduce your results.
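A minimal sketch of this layout and of the "never modify raw data" rule, built in a temporary directory so it does not touch your real project. The folder `demo_structure`, the file names `raw.csv` and `cleaned.csv`, and the data values are all made up for illustration.

```r
# Create the four folders inside a throwaway project root (hypothetical location).
proj <- file.path(tempdir(), "demo_structure")
folders <- c("analysis", "code", "output", "data")
for (folder in folders) {
  dir.create(file.path(proj, folder), recursive = TRUE, showWarnings = FALSE)
}

# Pretend this is the raw data that was placed in data/ (made-up values).
raw_path <- file.path(proj, "data", "raw.csv")
write.csv(data.frame(id = 1:4, value = c(10, NA, 30, 40)), raw_path, row.names = FALSE)

# Read the raw data, clean it in memory, and save the result to output/,
# leaving the file in data/ exactly as it was.
raw <- read.csv(raw_path)
cleaned <- raw[!is.na(raw$value), ]
clean_path <- file.path(proj, "output", "cleaned.csv")
write.csv(cleaned, clean_path, row.names = FALSE)

list.files(proj, recursive = TRUE)
```

Because the cleaning step only reads from data/ and only writes to output/, you can delete everything in output/ and regenerate it from the raw data at any time.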
There are lots of other perfectly fine project structures, like Jim Hester’s.
README files
In a repo/project, you should have a small summary of your project so others can see what it’s about.
Create a README file in your open project by writing:
usethis::use_readme_rmd()
This opens an R Markdown file that you can edit to include a summary of your project.
When you knit the R Markdown file, it renders to a Markdown file that is viewable on GitHub.
Exercise
In your r4ds project, create four folders: analysis, code, output, and data.
Download the Big Mac data from here, as described here. Place it in the data folder.
Create a Quarto file in the analysis folder. Load the tidyverse and the Big Mac data. Make sure the document renders.
Create an R script in the code folder. Load the tidyverse and the Big Mac data. Confirm that the working directory is still the root of the project.
Create a README file that summarizes that this repo’s goal is to analyze the Big Mac data.