Git Smart: git bisect

Here at Panda Strike, we are big fans of git. Anyone who has been writing software long enough to remember tools like CVS or Subversion will tell you that the day they were first introduced to git felt like a wish come true, and that version control would never be the same again.

While a basic knowledge of git (i.e. checkout, add, commit, push, and pull) is fairly common among developers these days, you still get a lot of puzzled faces and blank stares when you start talking about advanced topics like git bisect or git rebase. This blog series will shed some light on these topics of advanced gitology.

git bisect, or git can do THAT?

The first time you introduce a developer to the magic of git bisect, the reaction is usually a minute of contemplative silence, followed by an outburst of joy that could only be matched by someone who had been lost for weeks in the desert and finally found not just a source of fresh water, but an entire convenience store filled with Gatorade, coconut water, and all the soda you could think of.

The reasons for this, I believe, are twofold. First, the basic premise is so simple that you can explain it to a computer science freshman and they’ll GET it. Second, the power of it is so great that it can only be described as being handed a magical bug-finding sword (my metaphors are kinda running dry here).

Anyways, imagine the following scenario: you come into work on a Monday, bleary-eyed and still recuperating from the weekend (that fourth Bloody Mary on Sunday may have been one too many). You log into your workstation, only to find out that the build is broken. One of your colleagues had been working over the weekend, fixing that critical production bug that had the boss on edge last Friday afternoon, until this angel of a human being decided to take one for the team so you all could enjoy that football game.

And while that critical bug has certainly been fixed, some other, slightly less important thing is broken now, and you know it’s only going to be a matter of time until the product manager finds out, and he certainly won’t be happy to hear that. Of course, the selfless soul who sacrificed his weekend so you could get drunk needed a fair amount of experimentation to isolate the issue, as evidenced by a stream of commits with messages like “Trying again” and “12th attempt”.

So, what’s a Panda to do? Pandas don’t actually ever come to the office bleary-eyed on Mondays, because we don’t work in offices. Offices don’t have bamboo.
Also, we love our jobs, so we don’t have to drink 4 Bloody Marys to forget that tomorrow is Monday.

Enter git bisect: first, we have to tell git that we’re about to do a bisect. We do that by typing

git bisect start

Now, we want to inform git that the current commit is broken. We do this as follows:

git bisect bad

Still with me? Good. Now, we go back in time, and attempt to find a commit that was still working. It doesn’t have to be the last commit that was working, just any commit that was working and isn’t too old. So we just pick one from Friday afternoon, when you knew things were still in good shape. No worries if that’s 10 or 20 steps back. So we do this:

git checkout <some hash>
(run tests to confirm build is okay)
git bisect good
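
Put together, the steps so far look roughly like the session below. It is only a sketch, and it runs against a throwaway repository so nothing real gets touched: the file name app.txt, the 21 one-line commits, and the choice of commit 13 as the culprit are all made up for illustration. It also marks the good commit by passing its hash straight to git bisect good, which has the same effect as checking it out first.

```shell
#!/bin/sh
set -e

# Build a throwaway repository with 21 commits (every name here is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "panda@example.com"
git config user.name "Panda"

i=0
while [ "$i" -le 20 ]; do
    # "commit 13" and everything after it carry the made-up bug.
    if [ "$i" -ge 13 ]; then echo "fail $i" > app.txt; else echo "pass $i" > app.txt; fi
    git add app.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

tip=$(git rev-parse HEAD)
known_good=$(git rev-parse HEAD~20)

git bisect start
git bisect bad                  # the current commit is broken
git bisect good "$known_good"   # same as checking it out, then "git bisect good"

# git has now checked out a commit somewhere between the two; show which one.
git log -1 --format=%s
```

Once both ends are marked, git moves you to a commit in between and waits for your verdict.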

And now, the magic kicks in: git will attempt to find the first broken commit by splitting the interval in half and asking you whether the commit in the middle is broken or not. So let’s say you went 20 commits back (a.k.a. HEAD~20) to find some working code. Then the next commit git will ask you to test will be 10 commits back (a.k.a. HEAD~10) from the most recent version.

So you check the build again, run your tests (or whatever you have to do), and if you confirm that the bug is still present, you would do

git bisect bad

or, if the commit is bug-free,

git bisect good

After that, git will again split the upper or lower interval in half, check out the commit right in the middle, and ask you to run your tests again. In our example, if HEAD~10 was still bad, git will now check out HEAD~15.
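
To see the halving in action without touching a repository, here is a small simulation in plain shell. It is not how git implements bisect (real git also copes with branches and merges, so its picks can differ); it just replays the arithmetic on a straight line of commits. The function name simulate_bisect and its two arguments are invented for this sketch.

```shell
#!/bin/sh
# Simulate which commits a bisect would ask about on a linear history.
# Offsets count back from the newest commit: 0 is HEAD, 20 is HEAD~20.
simulate_bisect() {
    good=$1        # offset of the known-good commit, e.g. 20 for HEAD~20
    first_bad=$2   # offset of the commit that introduced the bug (what we are hunting)
    bad=0          # HEAD itself is known to be bad
    steps=0
    while [ $((good - bad)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        steps=$((steps + 1))
        # A commit is bad if it is the culprit or newer than the culprit.
        if [ "$mid" -le "$first_bad" ]; then
            echo "test HEAD~$mid: bad"
            bad=$mid
        else
            echo "test HEAD~$mid: good"
            good=$mid
        fi
    done
    echo "first bad commit: HEAD~$bad after $steps tests"
}

simulate_bisect 20 12
```

With the bug introduced at HEAD~12, the run tests HEAD~10 (bad), HEAD~15 (good), HEAD~12 (bad), and HEAD~13 (good), and then reports HEAD~12 after 4 tests.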

Anyone who remembers binary search from CS 101 can see how this will play out: we keep dividing the interval in half until there is only one commit left. By the logic of mathematics, that must be the first commit that introduced that particular issue. Basically, we just reduced the number of commits we have to test to find the one that broke the build from O(n) to O(log n). With 20 commits, that means testing four or five of them instead of up to 20.
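
If your check can be scripted, you can even hand the good/bad marking over to git. The sketch below uses git bisect run, which runs a command at every step and treats exit status 0 as good and statuses 1 through 127 (except 125, which means "skip this commit") as bad. As before, this is a toy setup under made-up assumptions: a throwaway repository, a file called app.txt, a bug planted in commit 13, and a grep standing in for your real test suite.

```shell
#!/bin/sh
set -e

# Throwaway repository: 21 commits, with a made-up bug introduced in "commit 13".
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "panda@example.com"
git config user.name "Panda"

i=0
while [ "$i" -le 20 ]; do
    if [ "$i" -ge 13 ]; then echo "fail $i" > app.txt; else echo "pass $i" > app.txt; fi
    git add app.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Shorthand: "git bisect start <bad> <good>" marks both ends in one go.
git bisect start HEAD HEAD~20

# Stand-in for a real test suite: succeeds only while app.txt says "pass".
git bisect run grep -q pass app.txt

# refs/bisect/bad is left pointing at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

# Leave bisect mode and return to where we started.
git bisect reset
```

In this toy repository the search ends on commit 13. Whether you bisect by hand or with a script, git bisect reset puts you back on the commit you started from once you are done.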

Once you have isolated the exact commit that broke the build, finding the exact line is only a matter of time. For this reason, I would recommend against the practice of squashing your merges. Squashing basically reduces an entire feature branch to a single (usually fairly large) commit. And while it keeps your history short and neat, at the same time, it makes git bisect rather useless. Hopefully, you’ve been keeping your commits small and focused, so you don’t have to wade through a 500-line diff. Unfortunately, git can’t help you with that.

Check out [Part 2](/posts/20141126-git-smart-git-rebase) to learn about `git rebase`.