GitHub Repo Corruption - SHA1 Collision


After some back and forth with GitHub (and some troubleshooting help from ssmir), it turned out this problem was split between something I needed to solve and something GitHub needed to solve.

What needed to be solved on my end was this:

Hyperion:Convoy-clone saalon$ git fsck
warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
warning in tree 5db54a0cdcd5775c09365c19c061aff729579209: contains zero-padded file modes
broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
              to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18

Notice the broken link from a tree to a blob. A tree is Git's record of a folder and a blob is a file's contents, so this is saying there's a folder that should have a file in it, but the file isn't actually there. Someone added a file to their local repo and pushed it, but the file's contents never made it to the remote repo. Now everyone who pulls down the repo gets the same broken link.
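When the fsck output is long, it can help to pull out just the broken links and missing objects. The following is a minimal Python sketch; the regular expressions are an assumption based on the output format shown above (not a documented, stable git interface), and `summarize_fsck` is a hypothetical helper name.

```python
import re

# Patterns for two kinds of `git fsck` report, modeled on the output shown above.
# \s+ lets the "to ..." half of a broken-link report sit on the following line.
BROKEN_LINK = re.compile(
    r"broken link from\s+(\w+)\s+([0-9a-f]{40})\s+to\s+(\w+)\s+([0-9a-f]{40})"
)
MISSING = re.compile(r"missing (\w+) ([0-9a-f]{40})")


def summarize_fsck(output: str):
    """Return (broken_links, missing) parsed from `git fsck` text.

    broken_links: list of (parent_type, parent_sha, child_type, child_sha)
    missing:      list of (object_type, sha)
    """
    broken = [m.groups() for m in BROKEN_LINK.finditer(output)]
    missing = [m.groups() for m in MISSING.finditer(output)]
    return broken, missing


sample = """\
warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
              to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18
"""

broken, missing = summarize_fsck(sample)
for parent_type, parent, child_type, child in broken:
    print(f"{parent_type} {parent} points at missing {child_type} {child}")
print(missing)
```

The tree hash from the broken-link report is what you feed to `git ls-tree` next.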

The instructions here do a good job of explaining what to do if you hit this problem, but in the middle of the actual crisis I found them a little lacking in context. They gave a clear list of steps but not much of the why, at least not for someone who's still a little new to Git.

Basically, you need to figure out which file the missing blob is, track down the computer that last committed it, and go to work on that machine's local repo. That computer has both the tree's SHA1 link to the file and the contents of the file itself. Everyone else has a pile of broken.

So first, we need to find out which blobs (files) are in that tree. To do that, run git ls-tree on the tree hash from the broken-link line:

git ls-tree 6697c12387f8909cfe7250e9d5854fd6713d25c1

In my case, that listed only one file: the one that was corrupt. In your case, it might list a whole set of files, and you'll need to match each blob's SHA1 hash against the one in the broken-link error. Mine was this:

100644 blob 87859f196ec9266badac7b2b03e3397e398cdb18    short_description.html
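If the tree lists many files, the matching step can be scripted. This is a minimal Python sketch that assumes the default `git ls-tree` line layout of mode, type, hash, a tab, then the name; `find_entry` is a hypothetical helper, and the second listing entry below is made up purely to show the filtering.

```python
def find_entry(ls_tree_output: str, missing_sha: str):
    """Return (mode, type, sha, name) for the entry whose object hash matches
    missing_sha, or None if no entry in the listing refers to it."""
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        # Default `git ls-tree` line: "<mode> <type> <sha>\t<name>".
        # split(None, 3) also copes with the tab having been turned into spaces,
        # while keeping spaces inside the file name intact.
        mode, obj_type, sha, name = line.split(None, 3)
        if sha == missing_sha:
            return mode, obj_type, sha, name
    return None


listing = (
    "100644 blob 87859f196ec9266badac7b2b03e3397e398cdb18\tshort_description.html\n"
    # Hypothetical second entry, only to show that non-matching lines are skipped.
    "100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tlong_description.html\n"
)
print(find_entry(listing, "87859f196ec9266badac7b2b03e3397e398cdb18"))
```

Git may quote file names containing unusual characters in this output, so treat the returned name as a hint rather than an exact path.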

Notice that ls-tree doesn't tell you which directory the file is actually supposed to be in. That's kind of frustrating, but with a little detective work you can find it. The file might be uniquely named, in which case you can just search for the file name. Or you can look through your commit history to see when and where a file called short_description.html was added.

Here's the part the GitFaq wasn't entirely clear on. They say to recreate the file, then run this command:

git hash-object -w db/content/page_parts/venues/86/short_description.html 

But what is that doing?

Basically, when you run git hash-object, it returns the SHA1 hash for that file. And (here's the important part) with -w it also writes the file into the object database as a blob, and a blob was just what we were missing. Here's the part the FAQ isn't clear on, though: for this to work, the file needs to match the file that originally caused the problem byte for byte. In other words, if that short_description.html file had content in it, you can't just create a blank file and run hash-object. If you do, the new blob's SHA1 hash won't match the one git is missing, and the broken link will stay broken.
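To see why the content must match exactly, it helps to know what is being hashed: in a SHA-1 repository, a blob's ID is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the raw file contents. This Python sketch reproduces that for raw bytes; `git_blob_sha1` is a hypothetical name. One assumption to flag: when you hash a file by path, `git hash-object` may first apply configured conversions such as `core.autocrlf` (unless run with `--no-filters`), which this sketch does not model.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the blob ID git would assign to these exact bytes (SHA-1 repos)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_sha1(b""))          # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
print(git_blob_sha1(b"hello\n"))   # ce013625030ba8dba906f756967f9e9ca394464a
print(git_blob_sha1(b"hello"))     # a different ID: one missing newline is enough
```

Running this over a recreated file's bytes and comparing against the missing hash is a quick way to check a reconstruction before writing it with `git hash-object -w`.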

This is why you need to be on the offending machine's repo. Everyone else has the link but neither the file nor the blob. The offending machine (hopefully) still has the original file. In my case, it didn't (in my flailing, the file had been deleted inadvertently), but when I looked at the commit history on that box, the diff contained the content of the file that had been committed but never made it to GitHub. I copied that out, recreated the file, and ran hash-object. The next time I ran git fsck, the broken link was gone.

One note: technically, this problem can be fixed in someone else's repo, provided you can recreate the missing file. In my case, I recreated the file on the offending machine, had it e-mailed to me, and fixed the problem in a clean repo on a different system. The important thing is recreating the file exactly, so it generates the same SHA1 hash that the repo is missing.

As for the SHA1 collision problem I got when I tried to push to GitHub? This ugly sucker?

fatal: SHA1 COLLISION FOUND WITH 87859f196ec9266badac7b2b03e3397e398cdb18 !
error: unpack failed: index-pack abnormal exit

That was a problem on GitHub's side, and they needed to fix it.


Just a reminder: a small likelihood of something happening is not the same as it being impossible. Hash collisions can happen with Git's use of SHA-1. Once you actually have two files that collide, the likelihood is 100%, and at that point the theoretical odds are slim consolation. Add a space to one of them and you'll be fine, though.
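To put "small likelihood" in numbers, the standard birthday-bound approximation estimates the chance that any two of n distinct objects share a 160-bit hash, p ≈ 1 - exp(-n(n-1) / 2^161). This Python sketch assumes SHA-1 behaves like a random function, so it says nothing about deliberately crafted collisions; `collision_probability` is a hypothetical helper, not something git provides.

```python
import math


def collision_probability(n_objects: int, hash_bits: int = 160) -> float:
    """Birthday-bound estimate that at least two of n random hashes coincide.

    p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps tiny probabilities accurate.
    """
    x = n_objects * (n_objects - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-x)


for n in (10**6, 10**9, 2**80):
    print(f"{n:>28,d} objects -> p ~= {collision_probability(n):.3g}")
```

The estimate only becomes large near 2^80 objects, which is why accidental collisions are considered so unlikely; changing a file's content (even by one space) gives it a completely different hash.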


I ran into the same issue and ran:

git prune
git gc

which mentioned

error: bad ref for refs/remotes/origin/ticketName

so I removed the reference and that fixed the issue:

rm .git/refs/remotes/origin/ticketName
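A "bad ref" like this is often a loose ref file whose contents aren't a valid object hash (an empty file, for example). The following Python sketch reports such files so you can inspect them before deleting anything. It assumes loose refs live as plain files under `.git/refs`; it does not look at `.git/packed-refs`, and it does not check whether a well-formed hash actually points at an existing object (`git fsck` does that). `find_malformed_loose_refs` is a hypothetical helper, and the demo layout and placeholder hash are made up.

```python
import os
import re
import tempfile

SHA1_HEX = re.compile(r"^[0-9a-f]{40}$")


def find_malformed_loose_refs(git_dir: str):
    """List loose ref files under <git_dir>/refs whose contents are neither a
    40-hex SHA-1 nor a symbolic 'ref: ...' line (e.g. an empty file)."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(git_dir, "refs")):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                content = f.read().strip()
            if not (SHA1_HEX.match(content) or content.startswith("ref: ")):
                bad.append(os.path.relpath(path, git_dir).replace(os.sep, "/"))
    return sorted(bad)


# Demo on a throwaway directory standing in for a .git folder (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "refs", "heads"))
    os.makedirs(os.path.join(tmp, "refs", "remotes", "origin"))
    with open(os.path.join(tmp, "refs", "heads", "main"), "w") as f:
        f.write("0123456789abcdef0123456789abcdef01234567\n")  # placeholder hash
    # An empty remote-tracking ref, like a corrupted ticketName ref.
    open(os.path.join(tmp, "refs", "remotes", "origin", "ticketName"), "w").close()
    demo_result = find_malformed_loose_refs(tmp)

print(demo_result)  # ['refs/remotes/origin/ticketName']
```

In a real repo you would call it as `find_malformed_loose_refs(".git")`; the script only reports, so removing the file stays a manual step.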