Moonlight is a reimplementation of Silverlight — a browser plugin for C#. IcedTea is a reimplementation of Java — and includes a browser plugin for Java. How ironic, then, that loading both plugins into the same browser would cause the browser to crash!
The crash wasn't in an obvious place (the stacks looked like somehow
getenv was failing in IcedTea?) so nobody was clear on who to blame.
Because the bug was an interaction between two separate plugin
projects but caused a third project — the browser — to crash, blame
fell through the cracks. It didn't help that it was difficult to
reproduce. I had known about this bug for a while but hadn't done
much on it; I CC'd myself on the various trackers and ignored it.
But then yesterday I discovered a new Mozilla bug where they were
considering blacklisting Moonlight. (It seems reasonable enough:
maybe Moonlight was corrupting the environment in such a way that when
IcedTea loaded, its getenv was failing?) And with that, I thought I
may as well pop into the #moonlight IRC channel and ask them if they
had any thoughts on the bug.
Then followed an epic debugging session, full of false starts and
finger-pointing. Red herrings included the fact that I found I
couldn't reproduce the problem on my Lucid machine, nor could I
reproduce it on my Maverick machine unless I was specifically using
Ubuntu's chromium-browser package and Ubuntu's moonlight (neither
google-chrome nor a debug build of chromium, nor the Moonlight from
go-mono.com could produce the crash).
But I now had the #moonlight guys hooked, and after some
gdb-via-IRC, where they would tell me what to disassemble and
I'd paste the result, we found the problem! (I was in parallel
attempting to rebuild IcedTea from source to get debugging symbols,
but the JDK is enormous.) The fun of the complexity involved ("hey,
pmap says that %rax is pointing into libmoon's pages!") and the
happiness of finding the bug helped me overcome my general distaste
So here's the bug. IcedTea had some code like this:
int plugin_debug = getenv ("ICEDTEAPLUGIN_DEBUG") != NULL;
This was in the global scope, so it is a static initializer.
Meanwhile, Moonlight's plugin on startup would load some other
libraries and import their symbols into the global scope (apparently
necessary for some other technical reason). One of those symbols was
a function named... plugin_debug(). So if Moonlight loaded first,
IcedTea would attempt to write the result of that getenv call over
the code of the Moonlight function, which is an access violation due
to code pages being non-writeable.
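
To make the collision concrete, here is a toy model in C++ of a global symbol scope where the first definition of a name wins and the lookup never checks whether the name is a function or a variable. It is only a sketch of the lookup rule described above, not the real dynamic linker; the GlobalScope class, the Definition struct, and the library labels are all made up for illustration.

```cpp
#include <map>
#include <string>

// Toy model only: real symbol resolution is done by the dynamic linker.
enum class SymbolKind { Function, Variable };

struct Definition {
    std::string library;  // which library exported the name (illustrative label)
    SymbolKind kind;      // recorded here, but never consulted during lookup
};

class GlobalScope {
public:
    // Exports `name` into the shared scope and returns the definition that
    // references to `name` end up bound to. emplace() does not overwrite an
    // existing key, so whichever library defined the name first wins.
    const Definition& define(const std::string& name, const Definition& def) {
        return table_.emplace(name, def).first->second;
    }

private:
    std::map<std::string, Definition> table_;
};
```

In this model, if the Moonlight side defines plugin_debug as a Function first and IcedTea then defines it as a Variable, IcedTea's references come back bound to the Moonlight function, which mirrors why the store landed in a code page.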
Who is to blame? In short, everyone. I blame Moonlight for stuffing
a bunch of symbols into the global scope. I blame IcedTea both for
its static initializer (anyone who loads the library has to do this
extra, perhaps unnecessary work) rather than lazy initialization, as
well as for leaving off the simple static keyword on the above code
snippet that would've saved us from this bug. (The neighboring lines
in the IcedTea source correctly had the annotation; just a simple
oversight. Or, as I wrote on their bug tracker, they should build
with -fvisibility=hidden. This isn't the first time we've needed to
fiddle with visibility for plugins...)
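
For illustration, here is a minimal C++ sketch of the two IcedTea-side fixes mentioned above: internal linkage via static, plus lazy initialization. The function name plugin_debug_enabled is hypothetical and not what IcedTea actually uses; only the ICEDTEAPLUGIN_DEBUG variable comes from the original snippet.

```cpp
#include <cstdlib>

// Hypothetical replacement for the exported global. `static` gives the
// function internal linkage, so it is not exported from the library and
// cannot be confused with another library's plugin_debug symbol.
static bool plugin_debug_enabled() {
    // A function-local static is initialized on the first call rather than
    // at library load time, and the result is cached for later calls.
    static const bool enabled = std::getenv("ICEDTEAPLUGIN_DEBUG") != nullptr;
    return enabled;
}
```

Callers would check plugin_debug_enabled() instead of reading a global variable. Building with -fvisibility=hidden gives similar protection to every symbol at once, at the cost of explicitly marking the entry points the browser needs as visible.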
And at the end, I myself am ironically to blame. We shouldn't be loading these plugins into the same process at all; it is an old bug that I've also been putting off (as described earlier). Though even had I fixed that bug, Mozilla would still be crashing on this, so I don't feel too bad that we tracked it down.
PS: I honestly don't work on plugins that much; it just happens that they are where all the war stories are. I will try to mix up the content for future posts.