Plugin loading regression

November 19, 2010

I sat down to write a post about another project I've been working on, but it required so much backstory that I thought I'd split the backstory into a separate post. In brief, we had a regression related to how we load plugins that motivated a separate project. Here's a description of the regression.

A browser needs to know which plugins are available, not only because pages can use them but also because JavaScript can query the list of available plugins. Getting that list requires scanning some directories, while finding out which mime types a given plugin file can handle requires poking into the file itself. (On Linux, the full list of directories you ought to scan is hilariously long, in part due to gratuitous differences between Linux distros; I believe Windows is also disastrous, in that you need to query the registry as well as hardcode the paths to various well-known plugins.)

Because of this extra work, we don't do this scan during startup. Instead we load the plugin list lazily: some seconds after startup we kick off a background thread to scan for plugin metadata, but if a page happens to need the plugin list (for example, if it has an <embed> tag) before that background scan completes, we block the page and run the scan immediately. This means that in most cases the data is already available by the time anything asks for it, and in the rarer case where a page needs a plugin early, only that page blocks on the list loading.
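To make the lazy-loading scheme concrete, here is a minimal sketch in Python (not Chrome's actual code, which is C++). The class name LazyPluginList, the scan callable, and the delay parameter are all hypothetical names chosen for illustration. A timer starts the scan some seconds after startup, and any caller that needs the list earlier either runs the scan itself or waits on the lock for the scan already in progress.

```python
import threading


class LazyPluginList:
    """Loads a plugin list once, in the background or on first demand.

    `scan` is a hypothetical callable standing in for the expensive
    directory walk; it must return the list of plugins.
    """

    def __init__(self, scan, delay_seconds=5.0):
        self._scan = scan
        self._lock = threading.Lock()  # serializes the scan and guards the cache
        self._plugins = None           # None means "not loaded yet"
        # Some seconds after startup, load the list in the background.
        self._timer = threading.Timer(delay_seconds, self.get)
        self._timer.daemon = True
        self._timer.start()

    def get(self):
        """Return the plugin list, blocking until it is available."""
        with self._lock:
            # If the background scan is running, we waited on the lock above.
            # If it has not started yet, we run the scan right here instead.
            if self._plugins is None:
                self._plugins = self._scan()
            return self._plugins
```

The important property is that get() blocks whoever calls it. That is fine for a single page that needs plugins early, and much less fine if the caller turns out to be startup code.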

Normally all of this extra machinery isn't especially needed; scanning a few directories and files doesn't take that long, especially on machines where the disk is warm and you're already running a browser. But on Linux in particular it can be pretty bad. On Windows and Mac, the list of mime types a given plugin file supports is available as metadata in the file (a quick seek and read, presumably), but for historical reasons the Linux plugin API differs: asking a plugin (a .so on Linux) which mime types it supports requires actually dlopening the file and calling a function in it.
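As a rough illustration of what that query involves, here is a sketch using Python's ctypes, whose CDLL constructor calls dlopen on Linux. It assumes the function in question is the NPAPI entry point NP_GetMIMEDescription and that it returns a semicolon-separated string of type:extensions:description entries; both are assumptions about the plugin API rather than something stated in this post, and the helper names are made up for the example.

```python
import ctypes


def parse_mime_description(description):
    """Parse 'type:ext1,ext2:Description;...' into a list of dicts.

    The format is an assumption about what NPAPI plugins return.
    """
    entries = []
    for item in description.split(";"):
        if not item.strip():
            continue  # tolerate empty segments and a trailing ';'
        parts = item.split(":", 2)  # the description itself may contain ':'
        extensions = parts[1].split(",") if len(parts) > 1 and parts[1] else []
        entries.append({
            "type": parts[0].strip(),
            "extensions": extensions,
            "description": parts[2] if len(parts) > 2 else "",
        })
    return entries


def query_plugin_mime_types(path):
    """Load the plugin at `path` and ask it which mime types it handles."""
    # CDLL dlopens the file: this loads dependent libraries too and runs
    # any initializer code the plugin author put in the library.
    library = ctypes.CDLL(path)
    get_description = library.NP_GetMIMEDescription  # assumed entry point
    get_description.restype = ctypes.c_char_p
    get_description.argtypes = []
    raw = get_description()
    return parse_mime_description(raw.decode("utf-8", "replace") if raw else "")
```

Note that nothing here is a cheap metadata read: by the time the function returns, arbitrary plugin code has already run inside the calling process.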

dlopen means both loading the file and loading dependent libraries. Within Google, which uses NFS, I clocked Flash at taking 10 seconds to load due to a bug (which ended up as a Flash security vulnerability, but that's another story). And once plugin authors had an opportunity to run code, they started doing all sorts of bad things, including making blocking IPCs to other running processes. We've also had some plugins that cause us to crash here, when we're just asking them for the list of mime types they cover. While we already run plugins out of process to protect against their crashes, that is only after we've decided we need the plugin; the Chrome architecture currently does the "scan the system for which plugins are available" code in the main process. The oldest bug I have assigned to me is one to move this query into a separate process rather than a separate thread.
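For a sense of what moving the query into a separate process buys you, here is a small Python sketch. It is not how Chrome does or would do it; the function name run_probe_isolated and the idea of passing the probe as source text are hypothetical, chosen to keep the example short. The probe runs in a child interpreter, so a crash or a hang in plugin code costs only that child.

```python
import subprocess
import sys


def run_probe_isolated(probe_source, plugin_path, timeout_seconds=5.0):
    """Run `probe_source` in a child Python process against `plugin_path`.

    Returns the child's stdout, or None if the child crashed, exited with
    an error, or took longer than `timeout_seconds`.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe_source, plugin_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None  # e.g. the plugin made a blocking call that never returned
    if result.returncode != 0:
        return None  # e.g. the plugin crashed while being queried
    return result.stdout
```

A thread gives you none of this: a crash in plugin code takes the whole process with it, and a hung thread can't be safely killed from outside.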

In any case, to summarize, loading the plugin list can take a long time. The regression that prompted this post was simple but easy to miss: a new piece of code that ran during startup accidentally prodded the lazily-loaded plugin list, hanging the browser's startup on loading all these plugins.

Next, I'll discuss what lessons we learned from this.