Git Large File Storage

Author

David Gerard

Published

June 3, 2025

Learning Objectives

Motivation

  • Git was designed to track small changes in small text files.

  • So by default, it is not well-equipped to handle larger files (e.g. greater than 100 MB).

  • But many datasets (in particular the ones you will use for the final project) are larger.

Installation and Usage

  • Git-LFS is an extension to git to version large files. You can install it and use it with the following steps:

  • Once per computer, install git-lfs:

    • Ubuntu : Open up the terminal and run:
      1. sudo apt-get install git-lfs
    • Mac OSX: Open up the terminal and run:
      1. brew update
      2. brew install git-lfs
    • Windows:
      1. Download the windows installer from: https://git-lfs.github.com/
      2. Run the windows installer
  • Once per repo, set up git-lfs:

    1. Open up the terminal.

    2. Navigate to the repo where you want to use git-lfs.

    3. Run git lfs install

    4. Select the types of files that you would like to place under git-lfs. These should be the large data files. For example, to place all CSV files under git-lfs, run:

      git lfs track "*.csv"
    5. Commit the hidden file “.gitattributes”

      git add .gitattributes
      git commit -m "Add .gitattributes"
  • Use git as you normally would.

  • If you accidently committed large files before running git lfs install, then you can retroactively update your repo to commit them with git-lfs by:

    git lfs migrate import --include="*.csv" --everything
  • The above code is for CSV files. You should change the above code based on the file type.

Paying for storage

  • GitHub charges money for storing large files.

  • I used an education discount to obtain free data storage for all of the repos in our class organization. So you won’t need to pay anything.

  • To store large files for your personal repos (outside of your classwork), click on your photo at the top right of the the GitHub website. Go to Settings > Billing > Add more data. You can then choose the data plan you want.