sahf crash

September 01, 2009

The Linux Google Chrome build we pushed out to users last Friday was the first official 64-bit release. We then discovered that for some subset of users it failed to load any pages: the renderer would crash immediately. Amusingly, one of those users was my neighbor over the cubicle wall; I overheard him asking his officemate how to downgrade Chrome since it didn't work after the update.

Adam narrowed it down to two machines with identical software stacks (binaries, system libs), one of which would fail with an illegal instruction, and from there (I believe) the v8 team diagnosed it.

Here's the fix.

It turns out the v8 codegen uses the "archaic" sahf instruction in its "one major use": extracting floating point status flags for testing. But, as the TODO comments around the fixed code even mention, it's not available in 64-bit mode on some CPUs. So the fix is just to query the CPU flags and generate different code if the CPU doesn't support this instruction.
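To make "query the CPU flags" a bit more concrete, here is a minimal sketch of that kind of check. It is not V8's actual code. It assumes GCC or Clang (for the `<cpuid.h>` helper `__get_cpuid`) and relies on the documented CPUID bit for this feature: extended leaf 0x80000001, ECX bit 0, which reports whether LAHF/SAHF are usable in 64-bit mode. The names `ecx_reports_sahf`, `cpu_supports_sahf`, `choose_flag_strategy`, and the `flag_strategy` enum are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID extended leaf 0x80000001, ECX bit 0: LAHF/SAHF work in 64-bit mode. */
#define EXT_FEATURE_LEAF 0x80000001u
#define LAHF_SAHF_BIT    (1u << 0)

/* Pure bit test, kept separate so it can be checked without real hardware. */
static bool ecx_reports_sahf(uint32_t ecx) {
    return (ecx & LAHF_SAHF_BIT) != 0;
}

/* True if the running CPU reports SAHF support in 64-bit mode.
 * Conservatively returns false when the answer cannot be determined. */
static bool cpu_supports_sahf(void) {
#if defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(EXT_FEATURE_LEAF, &eax, &ebx, &ecx, &edx)) {
        return false;  /* extended leaf not available */
    }
    return ecx_reports_sahf(ecx);
#else
    return false;  /* not a 64-bit x86 build: never emit sahf */
#endif
}

/* Hypothetical stand-in for the code generator's decision. The fallback
 * is whatever other instruction sequence reads the flags; it is not
 * specified here. */
typedef enum { FLAGS_VIA_SAHF, FLAGS_VIA_FALLBACK } flag_strategy;

static flag_strategy choose_flag_strategy(bool has_sahf) {
    return has_sahf ? FLAGS_VIA_SAHF : FLAGS_VIA_FALLBACK;
}
```

A code generator would call something like `choose_flag_strategy(cpu_supports_sahf())` once at startup and emit the matching sequence from then on. Defaulting to the fallback when the check fails is deliberately conservative: slightly slower code beats an illegal-instruction crash.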

In conclusion, these are just the growing pains you'd expect from a new codegen. The more interesting questions, at least for me, are the following:

  1. Ubuntu's PPA has more users than our builds (at least according to popcon), and they've been pushing a 64-bit build for a while (at least a week). Why did nobody report problems there? Perhaps people saw it crash and just shrugged it off, saying "eh, it's unstable"?

  2. What's the proper way to roll back such a thing in the Debian packaging world? If we repackage version n-1 as version n+1, that means every user who never got the bad version n will do another huge download where the actual file content hasn't changed at all. (See also my previous lamentations about the limited packaging architecture used by Linux distros.)