Learning Objectives

Initial Setup

Version Control:

Motivation 1: Change code without the fear of breaking it

  • You want to try out something new, but you aren’t sure if it will work.

  • Non-git solution: Copy the files

    • analysis.R
    • analysis2.R
    • analaysis3.R
    • analysis_final.R
    • analysis_final_final.R
    • analysis_absolute_final.R
    • analysis7.R
    • analysis8.R
  • Issues:

    • It is hard to remember how the files differ.
    • Which file produced which result?
  • Git lets you change files while keeping track of old versions, so you can revert to an old version if you decide the new changes don’t work.

Motivation 2: Easy Collaboration

  • In a group setting, your collaborators might suggest how to change your analysis/code.

  • First non-git solution: Email files back and forth.

  • Issues:

    • You have to manually incorporate changes.
    • Only one person can work on the code at a time (otherwise multiple changes might be incompatible).
  • Second non-git solution: Share a Dropbox or Google Drive folder (a “centralized” version control system).

  • Issues:

    • Again, only one person can work on the code at a time.
    • Less user-friendly for tracking changes.
  • Git lets each person work in their own local repository, and git can automatically combine everyone’s changes (as long as they don’t conflict).

Motivation 3: Great for job interviews

Basic Git

Initialize a repository

  • Git needs to be told that a folder is a repo. Otherwise, it won’t keep files under version control.

  • In this class, you won’t need to do this yourself (I’ll set up the repos), but in the real world you will. So we’ll go over how to do this on the terminal and on GitHub.

On the terminal

  • Don’t initialize a repo on your local machine for this lecture. These are just the steps you would follow if you needed to initialize one locally.

  • Use cd to enter the folder that you would like to keep under version control.

  • Then use git init:

    git init
  • This tells git that the folder is a repo. (Behind the scenes, git creates a hidden .git folder where it stores the history.)

  • Your files are not yet tracked. You’ll need to do the steps below to tell git which files to track. But at least git now knows that this is a repo where tracking is possible.
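The terminal steps above can be sketched as follows. This is a minimal sketch, assuming git is installed; it runs in a throwaway temporary directory, and the folder name my-analysis and the file analysis.R are hypothetical examples.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
mkdir my-analysis          # hypothetical project folder
cd my-analysis

# Tell git this folder is a repo (creates a hidden .git folder).
git init -q

# A new file is visible to git but not yet tracked.
echo 'x <- 1' > analysis.R
git status --short         # "??" marks analysis.R as untracked
```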

On GitHub

  • Git is a version control system; GitHub is a website that hosts git repositories. (So on your resume, say that you know git, not GitHub.)

  • You can create a git repo on GitHub (GitHub’s server is called the “remote”), then download (“clone”) the repo onto your computer (your computer is called the “local”).

  • On your GitHub homepage, click on “New”

    Create New Repo on GitHub

  • Fill out the form. The options are pretty self-explanatory, and GitHub does a good job of describing them. For this lecture, make sure that:

    • The repository name is “test”.
    • The repo is set to “Private”.
    • “Add a README file” is checked.

    Repo Options on GitHub

  • Click on “Create repository”.

Cloning

  • “Cloning” is a fancy way to say download from GitHub.

  • It also automatically connects your local copy to the remote copy.

  • Go to the repo you want to clone, then click on the “Code” button.

  • Make sure that “SSH” is highlighted.

  • Then click on the “Copy to clipboard” button to copy the link.

    Clone a Repo

  • In the terminal, navigate to where you want to download the repo, then clone it with git clone

    git clone git@github.com:dcgerard/test.git

    Make sure to change the link to what you copied (don’t use my link above).

  • Then move into your new repo

    ls
    cd test
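Cloning needs a remote to clone from. In the sketch below, which assumes no GitHub account or network access, a local bare repository (remote.git, a hypothetical name) stands in for the repo on GitHub, and a first commit mimics the one GitHub makes when you check the README box. With a real repo you would pass the SSH link you copied instead of remote.git. The name and email are placeholders.

```shell
cd "$(mktemp -d)"

# Mimic GitHub's repo: one commit that adds a README.
git init -q -b main seed
cd seed
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"   # placeholder email
printf '# test\n' > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# A bare copy (no working files) plays the role of GitHub's server.
git clone -q --bare seed remote.git

# Clone it, then move into the new repo.
git clone -q remote.git test
cd test
ls              # README.md
git remote -v   # the clone remembers where it came from, as "origin"
```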

Status

  • Use git status to see which files have been modified, which changes are staged, and which files are untracked.

    git status
  • Git should tell you that everything is up-to-date

    On branch main
    Your branch is up to date with 'origin/main'.
    
    nothing to commit, working tree clean
  • Edit the README.md file to include your name, so that it looks something like this:

    # test
    David Gerard
    
    Repo for trying out GitHub.

    Make sure to save your changes.

  • Now check the status again.

    git status
  • Git should be telling you that README.md has been modified, and the changes are not yet committed.

    On branch main
    Your branch is up to date with 'origin/main'.
    
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md
    
    no changes added to commit (use "git add" and/or "git commit -a")
  • Create a new, empty file called “empty.txt” with:

    touch empty.txt
  • Exercise: Check the status again. What do you notice?
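To reproduce this state outside the lecture repo, here is a minimal sketch that assumes a fresh local repo with a placeholder identity and no GitHub remote (so status will not mention origin/main). It uses git status --short, a compact form of the same information.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"

# Modify a tracked file and create a new, untracked one.
printf 'Your Name\n' >> README.md
touch empty.txt

# " M" = modified but not staged; "??" = untracked.
git status --short
```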

Staging

  • Use git add to add files to the stage.

    git add README.md
  • Always check which files have been added:

    git status
  • Useful flags for git add:

    • --all will stage all modified and untracked files.
    • --update will stage all modified files, but only if they are already being tracked.
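The difference between the two flags is easiest to see side by side. This is a minimal sketch, assuming a scratch repo with one committed file and a placeholder identity; in the short status format, a letter in the first column (M, A) means the change is staged, and "??" means untracked.

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
echo 'first line' > README.md
git add README.md && git commit -q -m "Initial commit"

echo 'second line' >> README.md   # tracked file, now modified
touch empty.txt                   # new, untracked file

# --update stages README.md but leaves empty.txt untracked.
git add --update
git status --short   # M  README.md / ?? empty.txt

# --all also stages the untracked file.
git add --all
git status --short   # M  README.md / A  empty.txt
```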

Committing

  • Use git commit to record the staged changes in the commit history.

    git commit -m "Add name to README.md."
  • Your message (written after the -m argument) should be concise, and describe what has been changed since the last commit.

  • If you forget to add a message, git will open your default text editor, where you can write a message, save the file, and exit. The commit happens after you exit the text editor.

  • If your default text editor is vim, press Esc, then type :wq and press Enter to save and exit.

  • git status should no longer have README.md as a modified file.

    git status
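A minimal sketch of staging, committing, and confirming that the working tree is clean. It assumes a scratch repo; git refuses to commit until it knows your name and email, so the sketch sets placeholder values for this repo only (on your own machine you would usually set them once with git config --global).

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"   # placeholder email

printf '# test\nYour Name\n' > README.md
git add README.md
git commit -q -m "Add name to README.md."

git status --short        # prints nothing: the working tree is clean
git log -1 --format=%s    # prints the latest commit message
```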

History of Changes

  • You can use git log to see the commits you have made.

    git log
  • There should be only two commits right now: one from GitHub and one from adding your name to README.md.

    commit 0301eeaf74062f0b80fdb3c27a60cc5ac6f28ca7 (HEAD -> main)
    Author: dcgerard <gerard.1787@gmail.com>
    Date:   Tue Nov 16 10:53:42 2021 -0500
    
        Add name to README.md
    
    commit fefbaffe03e0b074c33aa215d1135e6f8b68701d (origin/main, origin/HEAD)
    Author: David Gerard <gerard.1787@gmail.com>
    Date:   Tue Nov 16 10:04:47 2021 -0500
    
        Initial commit
  • Exercise: Add the following line of text to “empty.txt”

    blah blah blah

    Save the file. Now stage and commit the changes.
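When the full git log output gets long, git log --oneline shows one line per commit. This is a minimal sketch, assuming a scratch repo with two commits that mimic the ones above (identity and file contents are placeholders); git commit -a stages modified tracked files before committing.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"

printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"
printf 'Your Name\n' >> README.md
git commit -q -a -m "Add name to README.md"

# One line per commit, newest first: short hash and message.
git log --oneline
# Number of commits on the current branch.
git rev-list --count HEAD   # prints 2
```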

Looking at differences

  • Add the following lines of text to README.md

    Never and never, my girl riding far and near
    In the land of the hearthstone tales, and spelled asleep,
    Fear or believe that the wolf in a sheep white hood
    Loping and bleating roughly and blithely leap,
    My dear, my dear,
    Out of a lair in the flocked leaves in the dew dipped year
    To eat your heart in the house in the rosy wood.

    Then delete the line:

    Repo for trying out GitHub.
  • Use git diff to see the changes in all modified files that have not yet been staged.

    git diff
  • Lines starting with “+” are being added. Lines starting with “-” are being removed.

  • You can exit git diff by hitting q.

  • By default, git diff does not show changes that have already been staged. To see those, use git diff --staged.

    git diff
    git diff --staged
  • Exercise: Stage and commit your changes.
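The sketch below shows a change moving from git diff to git diff --staged once it is staged. It assumes a scratch repo with a placeholder identity and a two-line README; --stat summarizes which files changed, and git --no-pager prints the diff directly instead of opening the pager you would otherwise exit with q.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
printf '# test\nRepo for trying out GitHub.\n' > README.md
git add README.md && git commit -q -m "Initial commit"

# Replace the second line: one line removed, one line added.
printf '# test\nMy dear, my dear,\n' > README.md

git diff --stat              # unstaged: README.md is listed
git add README.md
git diff --stat              # prints nothing: no unstaged changes remain
git diff --staged --stat     # the same change, now staged
git --no-pager diff --staged # full diff with - and + lines
```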

Pushing

  • Use git push to push commits to GitHub.

    git push origin main

    Do this now.

  • “origin” is the name of the remote.

  • “main” is the name of the branch we are pushing to remote.

  • You can see what the remote is named by typing

    git remote -v
  • You can see which branch you are on with

    git branch
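Pushing can be tried without GitHub. This is a minimal sketch, assuming an empty local bare repository (remote.git, a hypothetical name) as the remote; git remote add registers it under the name origin, which git clone would otherwise have done for you. Identity values are placeholders.

```shell
base="$(mktemp -d)"
cd "$base"
# Empty bare repo standing in for GitHub.
git init -q --bare remote.git

git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
# Register the remote under the conventional name "origin".
git remote add origin "$base/remote.git"

printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"

git push -q origin main   # upload local main to origin's main
git remote -v             # origin, with its fetch and push URLs
git branch                # * main
```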

Pulling

  • If a colleague has pushed changes to GitHub, you’ll need to pull those changes onto your local machine before you can push anything to GitHub.

  • This is different from cloning. “Cloning” downloads a repo that wasn’t on your local machine. “Pulling” updates your local machine with the changes on the remote.

  • Use git pull to pull changes.

    git pull origin main
  • “origin” is the name of the remote.

  • “main” is the name of the branch on the remote that we are pulling from.

  • If there are no changes on the remote, you’ll get the following message

    From github.com:dcgerard/test
     * branch            main       -> FETCH_HEAD
    Already up to date.
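The sketch below plays both roles on one machine. It assumes a local bare repo (remote.git) in place of GitHub and two hypothetical clones, mine and colleague; names, emails, and file contents are placeholders. Because your copy has no new commits of its own, the pull simply fast-forwards it to the colleague's commit.

```shell
base="$(mktemp -d)"
cd "$base"

# Stand-in for GitHub: a bare repo holding one commit on main.
git init -q -b main seed
git -C seed config user.name "Your Name"          # placeholder identity
git -C seed config user.email "you@example.com"
printf '# test\n' > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed remote.git

# Two local copies: yours and a colleague's (hypothetical names).
git clone -q remote.git mine
git clone -q remote.git colleague

# The colleague commits and pushes first.
cd "$base/colleague"
git config user.name "Colleague"                  # placeholder identity
git config user.email "colleague@example.com"
printf 'Colleague was here\n' >> README.md
git commit -q -a -m "Add a line to README.md"
git push -q origin main

# You pull their commit onto your local machine.
cd "$base/mine"
git pull -q origin main
tail -n 1 README.md   # prints "Colleague was here"
```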

Branching

  • A branch is an “alternative universe” of your project, where you can experiment with new ideas (e.g. new data analyses, new data transformations, new statistical methods). After experimenting, you can then “merge” your changes back into the main branch.

  • Branching isn’t just for group collaborations; you can also use branching to collaborate with yourself, e.g., if you have a new idea you want to play with but do not want in main yet.

  • The “main” branch (the default in GitHub) is your best draft. You should consider anything in “main” as the best thing you’ve got.

  • The workflow using branches consists of:

    1. Create a branch with an informative name describing its goal(s).
    2. Add commits to this new branch.
    3. Merge the commits into main.

Create a branch

  • You create a branch with the name <branch> by

    git branch <branch>
  • Suppose we wanted to calculate some summary statistics, but we are not sure if we want to include these in the report. Let’s create a branch where we explore these summary statistics.

    git branch sumstat
  • You can see the list of branches (and the current branch) with

    git branch
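A minimal sketch of creating the sumstat branch in a scratch repo with a placeholder identity. Note that git branch only creates the branch; you stay on main. A branch also needs at least one commit to start from, which is why the sketch commits first.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"

# Create the branch. This does NOT switch to it.
git branch sumstat

# List branches; "*" marks the current one (still main).
git branch
```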

Move between branches

  • You switch between branches with:

    git checkout <branch>
  • Move to the sumstat branch with

    git checkout sumstat
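A minimal sketch of switching to sumstat and confirming where you are, assuming the same kind of scratch repo and placeholder identity as before.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"
git branch sumstat

# Switch to sumstat; new commits will now go on this branch.
git checkout -q sumstat
git rev-parse --abbrev-ref HEAD   # prints sumstat
```

As a shortcut, git checkout -b sumstat creates the branch and switches to it in one step; git 2.23 and later also provide git switch sumstat for switching.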

Edit Branch

  • When you are on a branch, you can edit and commit as usual.
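To see that commits on a branch leave main alone, here is a minimal sketch assuming a scratch repo with a placeholder identity; the file sumstat.R and its contents are hypothetical.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"
git branch sumstat
git checkout -q sumstat

# Edit and commit as usual (hypothetical file).
printf 'mean(x)\n' > sumstat.R
git add sumstat.R
git commit -q -m "Explore summary statistics"

# Commits on sumstat that are not on main: just the one above.
git log --oneline main..sumstat
git checkout -q main
ls                                # README.md (no sumstat.R on main)
```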

Push branch to GitHub

  • You can push your new branch to GitHub just like you can push your main branch to GitHub:

    git push origin <branch>
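A minimal sketch of publishing a branch, assuming a local bare repo (remote.git, hypothetical) in place of GitHub, a placeholder identity, and a hypothetical sumstat.R file. The optional -u flag records origin/sumstat as the branch's upstream.

```shell
base="$(mktemp -d)"
cd "$base"
git init -q --bare remote.git             # stand-in for GitHub
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
git remote add origin "$base/remote.git"
printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"
git push -q origin main

git checkout -q -b sumstat
printf 'mean(x)\n' > sumstat.R
git add sumstat.R && git commit -q -m "Explore summary statistics"

# -u also records origin/sumstat as this branch's upstream,
# so later a plain "git push" from sumstat knows where to go.
git push -q -u origin sumstat

# Branches that now exist on the remote: main and sumstat.
git ls-remote --heads origin
```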

Merge changes into main

  • Once you are satisfied with the changes in your new branch, you’ll want to merge them into the main branch. You can do this on GitHub by opening and merging a pull request. If you do, don’t forget to pull the merged changes from main back onto your local machine:

    git pull origin main
  • Alternatively, you can merge the changes on your local machine. First, check out the main branch:

    git checkout main
  • Then use git merge to merge the changes from sumstat into main:

    git merge sumstat
  • Don’t forget to push your changes to GitHub

    git push origin main
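An end-to-end sketch of the local route, assuming a local bare repo in place of GitHub and placeholder identity and file contents. Because main gained no new commits while sumstat was being developed, git merge can simply fast-forward main.

```shell
base="$(mktemp -d)"
cd "$base"
git init -q --bare remote.git             # stand-in for GitHub
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
git remote add origin "$base/remote.git"
printf '# test\n' > README.md
git add README.md && git commit -q -m "Initial commit"
git push -q origin main

# Explore on a branch (checkout -b creates and switches at once).
git checkout -q -b sumstat
printf 'mean(x)\n' > sumstat.R
git add sumstat.R && git commit -q -m "Explore summary statistics"

# Bring the branch's commits into main, then publish main.
git checkout -q main
git merge -q sumstat
git push -q origin main

# Optional: delete the merged branch locally.
git branch -d sumstat
```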

Resolving Merge Conflicts

  • If two branches change the same lines of the same file in different ways, git cannot merge them automatically.

  • Instead, it stops and reports a “merge conflict”, which you need to resolve by hand.

  • Instructions on resolving merge conflicts can be found here.
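A minimal sketch that provokes a conflict on purpose and resolves it, assuming a scratch repo with a placeholder identity; the branch name tagline and the README wording are hypothetical. Git marks the clashing region in the file with <<<<<<<, =======, and >>>>>>> lines; you replace that region with the text you want, stage the file, and commit to finish the merge.

```shell
cd "$(mktemp -d)"
git init -q -b main test && cd test
git config user.name "Your Name"          # placeholder identity
git config user.email "you@example.com"
printf 'Repo for trying out GitHub.\n' > README.md
git add README.md && git commit -q -m "Initial commit"

# Two branches change the same line in different ways.
git checkout -q -b tagline
printf 'Repo for learning git.\n' > README.md
git commit -q -a -m "Reword tagline"
git checkout -q main
printf 'Repo for practicing GitHub.\n' > README.md
git commit -q -a -m "Different tagline"

# The merge stops; README.md now contains conflict markers.
git merge tagline || echo "merge stopped: conflict in README.md"
cat README.md

# Resolve: write the version you want, then stage and commit it.
printf 'Repo for learning git and GitHub.\n' > README.md
git add README.md
git commit -q -m "Merge tagline, resolving README conflict"
```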

List of git commands

Vocabulary List (Blischak et al., 2016)