Moonlight vs IcedTea

January 29, 2011

Moonlight is a reimplementation of Silverlight — a browser plugin for C#. IcedTea is a reimplementation of Java — and includes a browser plugin for Java. How ironic, then, that loading both plugins into the same browser would cause the browser to crash!

The crash wasn't in an obvious place (the stacks looked like somehow getenv was failing in IcedTea?) so nobody was clear on who to blame. Because the bug was an interaction between two separate plugin projects but caused a third project — the browser — to crash, blame fell through the cracks. It didn't help that it was difficult to reproduce. I had known about this bug for a while but hadn't done much on it; I CC'd myself on the various trackers and ignored it.

But then yesterday I discovered a new Mozilla bug where they were considering blacklisting Moonlight. (It seems reasonable enough: maybe Moonlight was corrupting the environment in such a way that when IcedTea loaded its getenv was failing?) And with that, I thought I may as well pop into the #moonlight IRC channel and ask them if they had any thoughts on the bug.

Then followed an epic debugging session, full of false starts and finger-pointing. Red herrings included the fact that I found I couldn't reproduce the problem on my Lucid machine, nor could I reproduce it on my Maverick machine unless I was specifically using Ubuntu's chromium-browser package and Ubuntu's moonlight (neither google-chrome nor a debug build of chromium, nor the Moonlight from go-mono.com could produce the crash).

But I now had the #moonlight guys hooked, and after some gdb-via-IRC, where they would say "ok, now disassemble $pc-32" and I'd paste the result, we found the problem! (I was in parallel attempting to rebuild IcedTea from source to get debugging symbols, but the JDK is enormous.) The fun of the complexity involved ("hey, pmap says that %rax is pointing into libmoon's pages!") and the happiness of finding the bug helped me overcome my general distaste for plugins.

So here's the bug. IcedTea had some code like this:

int plugin_debug = getenv ("ICEDTEAPLUGIN_DEBUG") != NULL;

This was in the global scope, so it is a static initializer that runs when the library is loaded. Meanwhile, Moonlight's plugin on startup would load some other libraries and import their symbols into the global scope (apparently necessary for some other technical reason). One of those symbols was a function named... plugin_debug(). Both symbols were exported under the same name, so if Moonlight loaded first, IcedTea's references to plugin_debug resolved to Moonlight's function rather than to its own variable. IcedTea would then attempt to write the result of that getenv call over the code of the Moonlight function, which is an access violation because code pages are not writable.

Who is to blame? In short, everyone. I blame Moonlight for stuffing a bunch of symbols into the global scope. I blame IcedTea both for using a static initializer rather than lazy initialization (anyone who loads the library has to do this extra, perhaps unnecessary, work) and for leaving off the simple static keyword on the above code snippet, which would've saved us from this bug. (The neighboring lines in the IcedTea source correctly had the annotation, so it was just a simple oversight. Or, as I wrote on their bug tracker, they should build with -fvisibility=hidden. This isn't the first time we've needed to fiddle with visibility for plugins...)
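
For the curious, here is roughly what I mean by "static plus lazy initialization". This is only a sketch, not the actual IcedTea fix: the function name plugin_debug_enabled is made up for illustration, and I'm assuming the flag only needs to be read once per process.

```cpp
#include <cstdlib>

// Hypothetical name, not taken from the IcedTea source.
// `static` gives the function internal linkage, so it is not exported
// and cannot be confused with a same-named symbol in another library.
static bool plugin_debug_enabled() {
    // Function-local static: getenv runs on the first call,
    // not at library load time, and the result is cached afterwards.
    static const bool enabled = std::getenv("ICEDTEAPLUGIN_DEBUG") != nullptr;
    return enabled;
}
```

Callers would then write something like if (plugin_debug_enabled()) { ... } instead of reading a global. One consequence of the caching is that changing the environment variable after the first call has no effect, which seems fine for a debug switch.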

And at the end, I myself am ironically to blame. We shouldn't be loading these plugins into the same process at all; it is an old bug that I've also been putting off (as described earlier). Though even had I fixed that bug, Mozilla would still be crashing on this, so I don't feel too bad that we tracked it down.

PS: I honestly don't work on plugins that much; it just happens that they are where all the war stories are. I will try to mix up the content for future posts.