Tech Notes: Recovering without a reflog

July 20, 2015

I've been tinkering on a project for a few weeks and I decided it was time to publish it somewhere else, just to have the work in two locations in case something bad happened. Of course in doing that I caused something very bad to happen.

In the early stages of a project I often have useless git history so I first wanted to reimport the entire project as a single new commit. I ran these commands:

# Move the old git history aside.
mv .git oldgit
# Start up a new git history.
git init; git add .
# See what I'm about to check in.
git status

But with that I had unintentionally added all the files in the oldgit tree, which isn't what I wanted. So without thinking:

git reset --hard

This restored my tree to its initial state, that of one without any files, deleting everything including the oldgit directory! Typically you don't need to worry much about resetting in git because of the reflog, but there was no reflog here because I hadn't made any commits yet.
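For comparison, unstaging does not have to touch the working tree at all. The following is only a sketch of a gentler alternative, not what was run above: it assumes a fresh repository with no commits, uses made-up file names in a throwaway directory, and relies on git rm --cached, which removes entries from the index while leaving the files on disk.

```shell
# Sketch: undo a too-broad `git add .` without deleting anything on disk.
# Everything happens in a throwaway directory; file names are invented.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
git init -q .
mkdir oldgit
echo 'old history' > oldgit/config
echo 'source' > main.c
git add .

# Drop only the oldgit entries from the index; --cached leaves the files alone.
git rm -r -q --cached oldgit

# What remains staged after the cleanup.
staged=$(git ls-files)
```

Afterwards only main.c is staged, and oldgit/config is still present on disk.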

At first I thought I'd lost everything: there were no files, there was no master branch, .git/index was empty. I'd even gone as far as composing an email to the friends I'd demoed the project to, lamenting my mistake. But then I remembered that whenever something is added to the git index there's an associated object created under .git/objects, even if you never check it in. (Leftovers like these are part of why git gc exists: it finds unreferenced objects and deletes them.)
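That claim is easy to check in a scratch directory. The sketch below is illustrative only: demo.txt and its contents are made up, and it recomputes the blob's name with git hash-object rather than searching .git/objects for it.

```shell
# Sketch: a file that was staged but never committed leaves a blob behind.
# Runs in a throwaway directory; demo.txt is a made-up file name.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
git init -q .
echo 'precious work' > demo.txt
git add demo.txt

# With no commits yet, this removes demo.txt from the working tree...
git reset -q --hard

# ...but the blob written by `git add` is still in the object store.
# Recompute its name from the same content, then read it back.
blob=$(printf 'precious work\n' | git hash-object --stdin)
recovered=$(git cat-file -p "$blob")
```

After this runs, demo.txt is gone from disk, yet `$recovered` holds its contents.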

So here's how I recovered. The very first thing to do is to checkpoint where you are, in case something else goes wrong!

cd ..
cp -a myproject what-were-you-thinking
cd myproject

Then I extracted the contents of all the objects.

# Create the output directory, then dump every loose object under its hash.
mkdir -p obj
ls .git/objects/??/* | sed -e 's|^\.git/objects/\(..\)/|\1|' |
while read -r obj; do
    git cat-file -p "$obj" > "obj/$obj"
done

(After writing this post, Aristotle told me a better way to do this:

mkdir -p obj
git fsck --unreachable --no-reflogs --no-progress |
while read -r status objtype objname; do
    git cat-file "$objtype" "$objname" > "obj/$objname"
done

Note that my manual approach will miss packed objects, which wasn't an issue in this particular case but could be in other scenarios.)

At first I thought I'd just be able to look through these to find the git "commit" object for my old master branch in there, but recall that all I had run on this git repository was a single git add which just added all the files as blobs to the index. All of the objects were blobs, no git trees or commits.

The majority of these objects came from the git add of the files within the oldgit directory, so they were blobs whose contents were themselves git objects: in effect, encoded twice. I figured I'd need to identify all the ones containing source code and put them back by hand.

But thankfully, running file obj/* found one file that it identified as Git index, version 2, 20 entries. That was the .git/index from the old repository, before I ruined everything. So it was easy to just copy that over the new (empty) one.
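If file on your system doesn't recognize git indexes, the same check can be done by hand: an index file begins with the four-byte signature DIRC. Here is a minimal sketch, where is_git_index is a hypothetical helper name and the candidate files are fabricated in a throwaway directory purely for the demonstration.

```shell
# Sketch: spot a git index among extracted objects by its "DIRC" signature.
# is_git_index is a hypothetical helper, not a git command.
is_git_index() {
    [ "$(head -c 4 "$1")" = "DIRC" ]
}

# Build a throwaway directory with one real index and one ordinary file.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
git init -q .
echo 'hello' > a.txt
git add a.txt
mkdir obj
cp .git/index obj/candidate-1
echo 'not an index' > obj/candidate-2

# Collect the names of every candidate that looks like an index.
found=""
for f in obj/*; do
    if is_git_index "$f"; then
        found="$found$f"
    fi
done
```

Only obj/candidate-1 ends up in `$found`.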

cp obj/4a92e84106caea0835d918affe4735009b43d147 .git/index

With that, git status showed that it expected there to be 20 files, each in their original locations. So finally, for each file mentioned in the index, I could just check it out again. (Recall that "checkout" in git means "make the file on disk match the file in the index".)

git ls-files -z | xargs -0 git checkout --
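The whole index-swap trick can be rehearsed in a scratch directory before trying it on a damaged repository. This is a sketch under assumptions: the paths and file names are invented, the saved copy of .git/index stands in for the one fished out of obj/, and it uses git checkout-index -a, a plumbing command that writes out every entry in the index, instead of the checkout pipeline above.

```shell
# Sketch: wipe a never-committed tree, then restore it from a saved index.
# Paths and file names are invented for illustration.
scratch=$(mktemp -d)
mkdir "$scratch/repo"
cd "$scratch/repo" || exit 1
git init -q .
mkdir src
echo 'int main(void) { return 0; }' > src/main.c
echo 'notes' > 'read me.txt'
git add .

# Stand-in for the old index that was recovered from the object store.
cp .git/index "$scratch/saved-index"

# Reproduce the accident: files and index entries both disappear.
git reset -q --hard
after_reset=$(git ls-files)

# Put the saved index back and write every entry out to the working tree.
cp "$scratch/saved-index" .git/index
git checkout-index -a

restored=$(cat src/main.c)
```

This works only because the blobs the index points at are still in .git/objects, which is the same reason the recovery worked here.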

And my files were restored. And this time I will be more careful.