Deleting files from Git commit history

If you accidentally commit a sensitive file or a huge unwanted file, you may want to remove it from every commit. Because git filter-branch has known issues, we will use the recommended alternative, git filter-repo, which is also much easier to use.

About stopping using filter-branch

Git itself recommends replacing filter-branch with filter-repo. Running git filter-branch prints this warning:

         git-filter-branch has a glut of gotchas generating mangled history
         rewrites.  Hit Ctrl-C before proceeding to abort, then use an
         alternative filtering tool such as 'git filter-repo'
         (https://github.com/newren/git-filter-repo/) instead.  See the
         filter-branch manual page for more details; to squelch this warning,
         set FILTER_BRANCH_SQUELCH_WARNING=1.

Delete files and/or folders from commit history

Deleting files or folders from the commit history with filter-repo is straightforward: pass each file or folder to delete with --path, and add --invert-paths. Be careful, because this rewrites history in a way that may not be recoverable. Make a copy of your repository before you run filter-repo.

Below is an example:

# Delete the build folder and the test.txt file from all commits
# You can specify a folder with or without the trailing slash.
$ git filter-repo --path build/ --path test.txt --invert-paths
Parsed 7 commits
New history written in 4.04 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 90c767d delete c.txt from lib of inc
Enumerating objects: 70, done.
Counting objects: 100% (70/70), done.
Delta compression using up to 8 threads
Compressing objects: 100% (55/55), done.
Writing objects: 100% (70/70), done.
Total 70 (delta 18), reused 0 (delta 0), pack-reused 0
Completely finished after 5.64 seconds.
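One way to confirm that the rewrite did what was intended is to ask Git whether any commit still touches the removed paths. The following is a minimal sketch; path_in_history is a hypothetical helper name, and it relies only on git log, not on filter-repo.

```shell
# path_in_history (hypothetical helper): succeeds if at least one commit on
# any ref still touches one of the given paths.
path_in_history() {
    # --all walks every ref; everything after "--" is treated as a path.
    [ -n "$(git log --all -n 1 --format=%H -- "$@")" ]
}

# Usage inside the rewritten repository:
#   path_in_history build/ test.txt && echo "still present" || echo "gone"
```

If the helper still reports a path as present, the path was probably spelled differently in older commits (for example after a rename).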

Note:

If your repository is not a fresh clone, you also need to pass the --force option; otherwise filter-repo aborts with the message below:

$ git filter-repo --path build/ --path test.txt  --invert-paths
Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
  (expected at most one entry in the reflog for HEAD)
Please operate on a fresh clone instead.  If you want to proceed
anyway, use --force.
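Rather than reaching for --force, it is often safer to do what the message suggests and work on a fresh clone, which also gives you the copy of the repository recommended earlier. The sketch below assumes a hypothetical helper name, fresh_clone_for_rewrite, and placeholder paths.

```shell
# fresh_clone_for_rewrite (hypothetical helper): clone SRC into DEST so the
# rewrite happens on a disposable copy and the original stays untouched.
fresh_clone_for_rewrite() {
    src=$1
    dest=$2
    # --no-local makes Git copy objects through the normal transport instead
    # of hard-linking them from the source repository.
    git clone --quiet --no-local "$src" "$dest"
}

# Usage (paths are placeholders):
#   fresh_clone_for_rewrite ~/work/myrepo /tmp/myrepo-rewrite
#   cd /tmp/myrepo-rewrite
#   git filter-repo --path build/ --path test.txt --invert-paths
```

Once the result looks right in the clone, you can push from there or replace the original repository with it.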

Without --invert-paths, filter-repo keeps only the files or folders passed with --path and deletes everything else:

# Keep only src/ folder and readme.txt but delete others
$ git filter-repo --path src/ --path readme.txt

You can even run two commands to combine the inclusion mode with the exclusion mode (--invert-paths).

# Keep the src/ folder and readme.md but exclude files under src/ named data
# --path-glob, glob of paths to include in filtered history.
#  (*) will match across multiple directories under src/ folder.
$ git filter-repo --path src/ --path readme.md
$ git filter-repo --path-glob 'src/*/data' --invert-paths
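Before running a glob-based deletion, it can help to preview which historical paths the pattern would hit. The sketch below is only an approximation: list_matching_paths is a hypothetical helper, it uses shell case patterns rather than filter-repo's own matcher, and it does not handle unusual file names that Git prints in quoted form.

```shell
# list_matching_paths (hypothetical helper): print every path touched by any
# commit whose name matches the given shell pattern.
list_matching_paths() {
    pattern=$1
    # --name-only with an empty format prints just the touched paths.
    git log --all --name-only --format= |
        sort -u |
        while IFS= read -r name; do
            [ -n "$name" ] || continue
            # In a case pattern, * also matches "/", so src/*/data matches
            # at any depth below src/, as the comment above describes.
            case $name in
                $pattern) printf '%s\n' "$name" ;;
            esac
        done
}

# Usage:
#   list_matching_paths 'src/*/data'
```

Keep the pattern quoted when calling the helper so the shell does not expand it against the working tree first.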

~~Not recommended: delete a huge folder from each commit with git filter-branch~~

git filter-branch runs the given command on each commit in the range you specify and generates new commits.

Before you start, keep in mind that this operation rewrites existing history. If the repository is public and someone has based work on the commits you want to rewrite, you had better not do this. If you have to, remember to notify them to run git pull --rebase.

Here is an example of how to remove, from every commit, a huge folder that was committed by accident.

Test the commands below in a temporary repository first to make sure they work properly with your Git version. In some cases the git filter-branch command below has been seen to remove the folder from the working tree as well, although it should not.

We do it in a new testing branch; once the result is what we want, we reset the prior branch to it.

# Do it in a new testing branch
$ git checkout -b test

# Remove 'build' folder from every commit on the new branch
# --index-filter, rewrite index without checking out
# -r, remove recursively in subfolders
# --cached, remove it from index but not include working tree
# --ignore-unmatch, ignore if files to be removed are absent in a commit
# --prune-empty, remove empty commits generated by 'git rm' command
# HEAD, execute the specified command for each commit reached from HEAD by parent link
$ git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch build' --prune-empty HEAD
Rewrite fee4b8ee9df321a877cd2663b20b293eea4a1f8c (1/2)rm 'build/main.app'
Rewrite 63f272ab5152c66693614efae77567799837c6e0 (2/2)
Ref 'refs/heads/test' was rewritten

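# The output is OK, so point the prior branch master at the rewritten history
# --soft leaves the index and working tree untouched, so 'build' stays on disk
# but is still staged; run 'git rm -r --cached build' before your next commit
$ git checkout master
$ git reset --soft test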

# Remove the test branch
$ git branch -d test

# Push it with force
$ git push --force origin master
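A plain --force overwrites whatever is on the remote, including commits a teammate pushed in the meantime. As a hedged alternative, the sketch below uses Git's --force-with-lease option, which refuses to overwrite the remote branch if it has moved since your last fetch or push. The helper name safer_force_push is hypothetical.

```shell
# safer_force_push (hypothetical helper): overwrite BRANCH on REMOTE only if
# the remote branch is still where our remote-tracking ref says it is.
safer_force_push() {
    remote=$1
    branch=$2
    git push --force-with-lease "$remote" "$branch"
}

# Usage:
#   safer_force_push origin master
```

If the push is rejected, fetch and inspect the new remote commits before deciding whether to rewrite again.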

If you changed commits that already exist in the remote repository, remember to tell the other members to run the command below:

# Tell others to run the command below if you changed commits in the remote repository.
$ git pull --rebase

Note:

  1. Without the --ignore-unmatch option, git rm fails when the files to be removed do not exist in a commit.
  2. The removed files stay on disk for a while. Git deletes them entirely only once the backup refs under refs/original/ and the reflog entries that still reference them are gone and garbage collection runs.
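If you do not want to wait for automatic garbage collection, the old objects can be purged by hand. The sketch below follows the general approach described in the filter-branch manual page for shrinking a repository; purge_rewritten_objects is a hypothetical helper name. It is destructive, so run it only after you are sure the rewritten history is correct.

```shell
# purge_rewritten_objects (hypothetical helper): drop the backups that
# filter-branch keeps, then delete every object that is no longer reachable.
purge_rewritten_objects() {
    # filter-branch saves the pre-rewrite refs under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while IFS= read -r ref; do
            git update-ref -d "$ref"
        done
    # Expire reflog entries that still point at the old commits...
    git reflog expire --expire=now --all
    # ...and prune unreachable objects immediately instead of waiting.
    git gc --quiet --prune=now
}

# Usage inside the rewritten repository:
#   purge_rewritten_objects
```

After this step the old commits can no longer be recovered from the local repository, so keep a separate backup if there is any doubt.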

Tip: To avoid adding unwanted files by accident, add them to .gitignore.
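As a small illustration of that tip, the sketch below appends patterns to .gitignore without creating duplicate lines. The helper name ignore_paths is hypothetical, and the patterns in the usage lines are taken from the examples in this article.

```shell
# ignore_paths (hypothetical helper): append each pattern to .gitignore in the
# current directory unless an identical line is already there.
ignore_paths() {
    for pattern in "$@"; do
        # -x matches the whole line, -F treats the pattern as a literal string.
        grep -qxF -e "$pattern" .gitignore 2>/dev/null ||
            printf '%s\n' "$pattern" >> .gitignore
    done
}

# Usage:
#   ignore_paths build/ '*.pem'
#   git check-ignore -v build/main.app user.pem
```

Note that .gitignore only affects untracked files; a file that is already tracked stays tracked until it is removed from the index.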

Other useful options:

# Execute the specified command for the last 5 commits
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch user.pem' HEAD~5..HEAD

# Execute the specified command for all branches
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch user.pem' -- --all

# Update tags when executing filter-branch, remember to push them to remotes afterwards
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch user.pem' --tag-name-filter cat HEAD

# Remove empty commits generated by 'git rm' command
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch user.pem' --prune-empty HEAD

Resources

git filter-repo

git filter-branch

  • git-filter-branch command

    Warning

    git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite (and can leave you with little time to investigate such problems since it has such abysmal performance). These safety and performance issues cannot be backward compatibly fixed and as such, its use is not recommended. Please use an alternative history filtering tool such as git filter-repo. If you still need to use git filter-branch, please carefully read SAFETY (and PERFORMANCE) to learn about the land mines of filter-branch, and then vigilantly avoid as many of the hazards listed there as reasonably possible.

  • github: removing sensitive data from history

  • git book: rewriting history