Plugin loading regression
I sat down to write a post about another project I've been working on, but it required so much backstory that I thought I'd split the backstory into a separate post. In brief, we had a regression related to how we load plugins, and that regression motivated the other project. Here's a description of the regression.
A browser needs to know which plugins are available, not only because pages can use them, but also because JavaScript can query the list of available plugins. Getting the list of available plugins requires scanning some directories, while finding out which mime types a given plugin file is capable of handling requires poking into the file. (On Linux, the full list of directories you ought to scan is hilariously long, in part due to gratuitous differences between Linux distros; I believe Windows is also disastrous in that you need to query the registry as well as hardcode the various paths to well-known plugins.)
Because of this extra work, we don't do this scan during startup.
Instead we load the plugin list lazily: some seconds after startup we
kick off a thread in the background to scan for plugin metadata, but
if a page happens to need the plugin list (for example, if it has an
<embed> tag) before that background scan completes, we block the
page and run the scan immediately. This means that in most cases the
data is already available, and in the rarer case where a page needs a
plugin early, only that page blocks on the list loading.
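The lazy-loading scheme described above can be sketched roughly as follows. This is an illustration only, not Chrome's actual code: the class name LazyPluginList, the scan callable, and the delay_seconds parameter are all hypothetical names invented for the example.

```python
import threading


class LazyPluginList:
    """Loads a plugin list at most once, in the background or on demand.

    Hypothetical sketch: `scan` stands in for the expensive directory and
    file scan, and `delay_seconds` for the "some seconds after startup".
    """

    def __init__(self, scan, delay_seconds=5.0):
        self._scan = scan
        self._lock = threading.Lock()  # serializes the one-time scan
        self._loaded = False
        self._plugins = []
        # Schedule the background scan; it simply calls get() later.
        self._timer = threading.Timer(delay_seconds, self.get)
        self._timer.daemon = True  # don't keep the process alive for it
        self._timer.start()

    def get(self):
        """Return the plugin list, scanning first if nobody has yet.

        A caller that arrives while the background scan is running waits
        on the lock; a caller that arrives before it starts runs the scan
        itself, immediately, on its own thread.
        """
        with self._lock:
            if not self._loaded:
                self._plugins = self._scan()
                self._loaded = True
            return self._plugins
```

Note that in this sketch anything that calls get() before the timer fires pays the full cost of the scan on its own thread, which is the property that makes an accidental early call expensive.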
Normally all of this extra machinery isn't especially needed; scanning
a few directories and files doesn't take that long, especially on
machines where the disk is warm because you're already running a
browser. But on Linux in particular it can be pretty bad. On Windows
and Mac, the list of mime types a given plugin file supports is
available as metadata in the file (a quick seek and read, presumably),
but for historical reasons the Linux plugin API differs in that
querying a plugin (a .so on Linux) as to which mime types it supports
requires actually dlopening the file and calling a function in it.
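To make the Linux query concrete, here is a rough sketch of what "dlopen the file and call a function in it" looks like. It assumes the plugin API in question is NPAPI, whose Unix plugins export NP_GetMIMEDescription returning a string of the form "type:extensions:description;..."; the post itself doesn't name the function, so treat that as my assumption. It uses Python's ctypes for brevity, whereas a browser would make the equivalent dlopen and dlsym calls from C++. The helper names are made up for the example.

```python
import ctypes


def parse_mime_description(description):
    """Split "type:ext1,ext2:description;..." into a list of dicts."""
    entries = []
    for entry in description.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Up to three colon-separated fields; missing trailing fields
        # are treated as empty strings.
        parts = entry.split(":", 2) + ["", ""]
        mime_type, extensions, text = parts[:3]
        entries.append({
            "mime_type": mime_type,
            "extensions": [e for e in extensions.split(",") if e],
            "description": text,
        })
    return entries


def query_plugin_mime_types(path):
    """Load the plugin at `path` and ask which mime types it handles."""
    try:
        # ctypes.CDLL calls dlopen(), which also loads the plugin's
        # dependent libraries and runs any initializers they contain.
        library = ctypes.CDLL(path)
        # Assumed entry point (NPAPI on Unix); see the note above.
        get_description = library.NP_GetMIMEDescription
    except (OSError, AttributeError):
        # Not loadable, or it doesn't export the expected symbol.
        return []
    get_description.restype = ctypes.c_char_p
    get_description.argtypes = []
    raw = get_description()
    if not raw:
        return []
    return parse_mime_description(raw.decode("utf-8", errors="replace"))
```

The important point is that both the CDLL call and the call into the plugin run plugin-controlled code inside the calling process, so a slow or crashing plugin takes the caller down with it.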
Calling dlopen means both loading the file and loading its dependent
libraries.
Within Google, which uses NFS, I clocked Flash at taking 10 seconds
to load due to a bug (which ended up as a Flash security
vulnerability, but that's another story). And once plugin
authors had an opportunity to run code, they started doing all sorts
of bad things, including making blocking IPCs to other running
processes. We've also had some plugins that cause us to crash here,
when we're just asking them for the list of mime types they cover.
While we already run plugins out of process to protect against their
crashes, that happens only after we've decided we need the plugin; the
Chrome architecture currently runs the "scan the system for which
plugins are available" code in the main process. The oldest bug I
have assigned to me is one to move this query into a separate
process rather than a separate thread.
In any case, to summarize, loading the plugin list can take a long time. The regression that prompted this post was simple but easy to miss: a new piece of code that ran during startup accidentally prodded the lazily-loaded plugin list, hanging the browser's startup on loading all these plugins.
Next, I'll discuss what lessons we learned from this.