Git was designed to track small changes in small text files.
So by default, it is not well-equipped to handle larger files (e.g. greater than 100 MB).
But many datasets (in particular the ones you will use for the final project) are larger.
Git-LFS is an extension to git to version large files. You can install it and use it with the following steps:
Once per computer, install git-lfs.
On Ubuntu or Debian:
sudo apt-get install git-lfs
On macOS with Homebrew:
brew update
brew install git-lfs
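To confirm that the install worked, a quick check along these lines may help. This is a minimal sketch: check_git_lfs is a hypothetical helper name (not part of git-lfs), and it assumes the git-lfs binary ends up on your PATH.

```shell
# Hypothetical helper: report whether git-lfs is installed and which version.
check_git_lfs() {
    if command -v git-lfs >/dev/null 2>&1; then
        # Prints the installed git-lfs version string.
        git lfs version
    else
        echo "git-lfs not found on PATH" >&2
        return 1
    fi
}
```

Calling check_git_lfs in the terminal should print a version string if the install succeeded, and an error message otherwise.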
Once per repo, set up git-lfs:
Open up the terminal.
Navigate to the repo where you want to use git-lfs.
Run git lfs install
Select the types of files that you would like to place under git-lfs. These should be the large data files. For example, to place all CSV files under git-lfs, run:
git lfs track "*.csv"
Commit the hidden file “.gitattributes”, which records the tracked patterns:
git add .gitattributes
git commit -m "Add .gitattributes"
Use git as you normally would.
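One way to confirm that the per-repo setup took effect is to check that .gitattributes contains an LFS rule for your pattern. The sketch below assumes the rule has the form that git lfs track normally writes (the pattern followed by filter=lfs). The name lfs_rule_exists is a hypothetical helper, not part of git-lfs.

```shell
# Hypothetical helper: succeeds if the attributes file routes the given
# pattern through LFS. Assumes the rule looks like:
#   *.csv filter=lfs diff=lfs merge=lfs -text
lfs_rule_exists() {
    pattern="$1"
    attributes_file="${2:-.gitattributes}"
    # No attributes file means nothing is tracked yet.
    [ -f "$attributes_file" ] || return 1
    # -F treats the pattern as a fixed string, so "*" is not a regex wildcard.
    grep -qF -- "$pattern filter=lfs" "$attributes_file"
}
```

For example, running lfs_rule_exists "*.csv" && echo "CSV files are tracked" from the repo root reports whether the rule is in place. After you commit data files, git lfs ls-files lists the files that are actually stored through LFS.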
If you accidentally committed large files before setting up git-lfs (that is, before running git lfs install and git lfs track), you can retroactively update your repo to store them with git-lfs by running:
git lfs migrate import --include="*.csv" --everything
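The command above targets CSV files; change the --include pattern to match your file type. Because it rewrites your commit history, you will need to force-push if the old commits were already pushed.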
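If you are not sure which file types to put under git-lfs, listing the files above a size threshold can help. The following is a minimal sketch: find_large_files is a hypothetical helper name, the 100M default simply mirrors the 100 MB figure mentioned earlier, and the M size suffix assumes GNU or BSD find.

```shell
# Hypothetical helper: list files larger than a threshold, skipping Git's
# own data under .git. The size suffix (k, M, G) assumes GNU or BSD find.
find_large_files() {
    dir="${1:-.}"
    threshold="${2:-100M}"
    find "$dir" -type f -size "+$threshold" ! -path '*/.git/*'
}
```

Running find_large_files . 100M from the repo root prints the path of each file over the threshold. The extensions that show up are candidates for git lfs track and for the --include pattern.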
GitHub charges money for storing large files.
I used an education discount to obtain free data storage for all of the repos in our class organization. So you won’t need to pay anything.
To store large files for your personal repos (outside of your classwork), click on your photo at the top right of the GitHub website. Go to Settings > Billing > Add more data. You can then choose the data plan you want.