Our own Bitbucket Server and SourceTree provide professional teams with the ultimate Git solutions, including support for Git LFS. Last week, Git LFS reached a new milestone with the release of version 1.2. Looking at it now I think we’re all surprised how many things we squeezed in! LFS其实是git的一个扩展，并没有改变git的工作方式，有点像耍了个小花招，把指定需要lfs管理的文件替换成了一个指针文件交给git进行版本管理； 在pull/push等这些操作中，lfs又通过lfs服务器把这些文件的真身给下载或上传回来；.
Git is a fantastic choice for tracking the evolution of your code base and collaborating efficiently with your peers. But what happens when the repository you want to track is really really big?
In this post I’ll give you some techniques for dealing with it.
Two categories of big repositories
If you think about it there are broadly two major reasons for repositories growing massive:
- They accumulate a very very long history (the project grows over a very long period of time and the baggage accumulates)
- They include huge binary assets that need to be tracked and paired together with code.
…or it could be both.
Sometimes the second type of problem is compounded by the fact that old, deprecated binary artifacts are still stored in the repository. But there’s a moderately easy – if annoying – fix for that (see below).
The techniques and workarounds for each scenario are different, though sometimes complementary. So I’ll cover them separately.
Cloning repositories with a very long history
Even though threshold for a qualifying a repository as “massive” is pretty high, they’re still a pain to clone. And you can’t always avoid long histories. Some repos have to be kept in tact for legal or regulatory reasons.
Simple solution: git shallow clone
The first solution to a fast clone and saving developer’s and system’s time and disk space is to copy only recent revisions. Git’s shallow clone option allows you to pull down only the latest n commits of the repo’s history.
How do you do it? Just use the –depth option. For example:
git clone --depth [depth] [remote-url]
Imagine you accumulated ten or more years of project history in your repository. For example, we migrated Jira (an 11 year-old code base) to Git. The time savings for repos like this can add up and be very noticeable.
The full clone of Jira is 677MB, with the working directory being another 320MB, made up of more than 47,000+ commits. A shallow clone of the repo takes 29.5 seconds, compared to 4 minutes 24 seconds for a full clone with all the history. The benefit grows proportionately to how many binary assets your project has swallowed over time.
Tip:Build systems connected to your Git repo benefit from shallow clones, too!
Shallow clones used to be somewhat impaired citizens of the Git world as some operations were barely supported. But recent versions (1.9 and above) have improved the situation greatly, and you can properly pull and push to repositories even from a shallow clone now.
Surgical solution: git filter branch
For the huge repositories that have lots of binary cruft committed by mistake, or old assets not needed anymore, a great solution is to use git filter-branch. The command lets you walk through the entire history of the project filtering out, modifying, and skipping files according to predefined patterns.
It is a very powerful tool once you’ve identified where your repo is heavy. There are helper scripts available to identify big objects, so that part should be easy enough.
The syntax goes like this:
git filter-branch --tree-filter 'rm -rf [/path/to/spurious/asset/folder]'
git filter-branch has a minor drawback, though: once you use _filter-branch_, you effectively rewrite the entire history of your project. That is, all commit ids change. This requires every developer to re-clone the updated repository.
So if you’re planning to carry out a cleanup action using git filter-branch, you should alert your team, plan a short freeze while the operation is carried out, and then notify everyone that they should clone the repository again.
Tip: More on git filter-branch in this post about tearing apart your Git repo.
Alternative to git shallow-clone: clone only one branch
Since git 1.7.10, you can also limit the amount of history you clone by cloning a single branch, like so:
git clone [remote url] --branch [branch_name] --single-branch [folder]
This specific hack is useful when you’re working with long running and divergent branches, or if you have lots branches and only ever need to work with a few of them. If you only have a handful of branches with very few differences you probably won’t see a huge difference using this.
Managing repositories with huge binary assets
The second type of big repository is those with huge binary assets. This is something many different kinds of software (and non-software!) teams encounter. Gaming teams have to juggle around huge 3D models, web development teams might need to track raw image assets, CAD teams might need to manipulate and track the status of binary deliverables.
Git is not especially bad at handling binary assets, but it’s not especially good either. By default, Git will compress and store all subsequent full versions of the binary assets, which is obviously not optimal if you have many.
There are some basic tweaks that improve the situation, like running the garbage collection (‘git gc’), or tweaking the usage of delta commits for some binary types in .gitattributes.
But it’s important to reflect on the nature of your project’s binary assets, as that will help you determine the winning approach. For example, here are some points to consider:
- For binary files that change significantly – and not just some meta data headers – the delta compression is probably going to be useless. So use ‘delta off’ for those files to avoid the unnecessary delta compression work as part of the repack.
- In the scenario above, it’s likely that those files don’t zlib compress very well either so you could turn compression off with ‘core.compression 0’ or ‘core.loosecompression 0’. That’s a global setting that would negatively affect all the non-binary files that actually compress well so this makes sense if you split the binary assets into a separate repository.
- t’s important to remember that ‘git gc’ turns the “duplicated” loose objects into a single pack file. But again, unless the files compress in some way, that probably won’t make any significant difference in the resulting pack file.
- Explore the tuning of ‘core.bigFileThreshold’. Anything larger than 512MB won’t be delta compressed anyway (without having to set .gitattributes) so maybe that’s something worth tweaking.
Solution for big folder trees: git sparse-checkout
Git Lfs Lock Sourcetree
A mild help to the binary assets problem is Git’s sparse checkout option (available since Git 1.7.0). This technique allows to keep the working directory clean by explicitly detailing which folders you want to populate. Unfortunately, it does not affect the size of the overall local repository, but can be helpful if you have a huge tree of folders.
What are the involved commands? Here’s an example:
- Clone the full repository once: ‘git clone’
- Activate the feature: ‘git config core.sparsecheckout true’
- Add folders that are needed explicitly, ignoring assets folders:
- echo src/ › .git/info/sparse-checkout
- Read the tree as specified:
- git read-tree -m -u HEAD
After the above, you can go back to use your normal git commands, but your work directory will only contain the folders you specified above.
Solution for controlling when you update large files: submodules
Another way to handle huge binary asset folders is to split those into a separate repository and pull the assets in your main project using Git submodules. This gives you a way to control when you update the assets. See more on submodules in these posts: Git submodules core concept and tips, and alternatives to Git submodules.
[UPDATE] …or you can skip all that and use Git LFS
If you work with large files on a regular basis, the best solution might be to take advantage of the large file support (LFS) Atlassian co-developed with GitHub in 2015. (Yes, you read that right. We teamed up with GitHub on an open-source contribution to the Git project.)
Git LFS is an extension that stores pointers (naturally!) to large files in your repository, instead of storing the files themselves in there. The actual files are stored on a remote server. As you can imagine, this dramatically reduces the time it takes to clone your repo.
Bitbucket supports Git LFS, as does GitHub. So chances are, you already have access to this technology. It’s especially helpful for teams that include designers, videographers, musicians, or CAD users.
Don’t give up the fantastic capabilities of Git just because you have a big repository history or huge files. There are workable solutions to both problems.
Check out the other articles I linked to above for more info on submodules, project dependencies, and Git LFS. And for refreshers on commands and workflow, our Git microsite has loads of tutorials. Happy coding!
Opens the Repository Browser window.
Show the Repository Browser and select the Remote section.
Prompts to add selected repositories to the Repository Browser.
Toggle the current repository's sidebar (branches, tags, and more.)
Toggle the current repository's command history to view details.
Switch to the file browser and commit preparation area.
Switch to the primary commit history browser area.
Switch to the commit history search area.
Toggle the visibility of branch labels in commit history graphs for all repositories.
Toggle the visibility of tag labels in commit history graphs for all repositories.
Toggle the visibility of the Build Status column in the History view for the current repository.
Retrieve the current repository's branches, tags, and history from the remote server.
Start interactively updating a branch's content to align with another.
Connect another repository to act like a descendent of the current repository, instead of a remote dependency.
Remove the selected item(s) from the staging area prior to committing.
Git Lfs Tracking Sourcetree
Retrieve the current repository's state from the remote server.
Save an archival copy of the repository.
Configure and migrate the current repository to Bitbucket Cloud.
Configure settings for the current repository.
Reload the current repository's commit history graph, file status, and build status.
Reload the current repository's status from the remote server.
Go to current repository's File Status view, focus the message editor, and populate with a message template if available.
Begin reverting one or more of the current repository's in-progress changes.
Save the current repository's in-progress changes for later use.
Send local commits for the current repository to the remote server.
Retrieve any new commits for the current repository from the remote server.
Switch to a particular branch or revision in the current repository.
Create a new branch from the currently active branch.
Start the process of combining the target branch into the active one in the current repository.
Create a new tag for the active branch and commit in the current repository.
Connect a new remote server to the current repository.
Connect another repository as a dependency for the current repository.
Open the remote server's website (if available) for the current repository.
Start a pull request on the remote server for the current active branch.
Open the selected item(s) for the current repository.
Open a new Finder window with the selected item(s) for current repository highlighted.
Open a new Terminal window to the directory for the current repository.
Display a preview of the selected item(s)'s contents if possible.
Open the assigned application (see Preferences) to display the item's changes.
Save the current repository's in-progress changes as a file for transfer or later use.
Take the contents of a patch (file or text) and merge them into the current repository if possible.
Add the selected untracked item(s) to the current repository's index, making them available for commit.
Remove the selected item(s) from the current repository's index and on-disk; requires a commit to save.
Combines the Add and Remove actions (see their entries) into a single task for all the selected item(s).
Remove the selected item(s) from the current repository's index; requires a commit to complete.
Create entries (specific patterns, file types, or other designations) for items that should be ignored completely in the current repository.
Ensure the selected item(s) are included in the current commit and focus the message editor, populating with a message template if available.
Begin reverting in-progress changes for the selected item(s).
Begin reverting the selected item(s) to a specific point in their history.
Proceed to the next step of an in-progress action such as rebasing or grafting.
Stop an in-progress action such as rebasing or grafting.
Show a list of commits specific to the selected item in reverse chronological order.
Display the contents of the selected item, as lines, with the latest commit's information (author, date, metadata) alongside.
Create a copy of the selected item(s) in the current repository using appropriate source control mechanisms.
Move the selected item(s) from one location to another (including renaming) in the current repository using appropriate source control mechanisms.
Open the diff for the conflicted item(s) in the application specified in Preferences -> Diff.
Choose the contents of the current branch when there's a conflict.
Choose the contents of the branch that you are merging when there's a conflict.
Cancel an in-progress merge.
Update the selected item(s) to indicate conflicts were resolved manually.
Update the selected item(s) to indicate a conflict or other problem exists still.
Git Lfs Pull Sourcetree
Configure the necessary changes to enable Git LFS for the current repository.
Add or remove the selected item(s) from the Git LFS index and storage when pushing.
Fetch any LFS changes from the remote for the current repository and then checkout appropriate files.
Retrieve LFS changes from the remote for the current repository.
Retrieve large file content from the Git LFS storage endpoint to replace existing placeholder files in the current repository.
Remove LFS item(s) that have been checked out into the current repository. This frees up space if they are unused.
Configure Flow prefixes for the current repository.
Begin the next Flow action based on what's currently in progress for the current repository.
Create a branch with the feature prefix and specified name.
Wrap up the in-progress Flow feature branch.
Create a branch with the release prefix and specified name.
Wrap up the in-progress Flow release branch.
Create a branch with the hotfix prefix and specified name.
Wrap up the in-progress Flow hotfix branch.
Wraps the selected text in Markdown syntax for a header.
Inserts the Markdown syntax for a single line.
Wraps the selected text in Markdown syntax to be bold.
Wraps the selected text in Markdown syntax to be italicized.
Inserts the Markdown syntax for a bullet at the start of selected text.
Inserts the Markdown syntax for a number sequence at the start of selected text.
Wraps the selected text in Markdown syntax to be formatted as code.
Wraps the selected text in Markdown syntax for a link, focused to edit the URL portion.
Git Lfs Sourcetree
Git Lfs Not Supported Sourcetree
- New user? Check out Getting Started with Sourcetree.
- Need help? Join the Atlassian Community today!
- Created in January 2018 by Brian Ganninger.