Chromium Notes: The zygote process and software updates

August 03, 2011

When you make a new tab, Chrome (usually) starts a new process for that tab. How is this done? It would seem natural to just fork(), but fork can't be used safely in the presence of threads. fork only duplicates the calling thread, but other threads may be holding locks (including e.g. inside glibc or in the allocator), and in the child those locks would never be released after the fork.
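To make the lock problem concrete, here is a small Python sketch (not Chrome code; the function name is made up for illustration). A worker thread takes a lock, the main thread forks while the lock is held, and the child then tries to take the same lock with a timeout. On CPython on Linux I would expect the child to give up, because the thread that could release the lock does not exist in the child.

```python
import os
import threading


def lock_state_in_forked_child():
    """Fork while another thread holds a lock; report what the child sees.

    Returns the child's exit code: 0 if it could take the lock, 3 if not.
    (Hypothetical helper, for illustration only.)
    """
    lock = threading.Lock()
    held = threading.Event()
    done = threading.Event()

    def holder():
        with lock:
            held.set()    # tell the main thread the lock is now held
            done.wait()   # keep holding it until the parent says stop

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here. The holder thread was
        # not copied, but the locked state of `lock` was.
        acquired = lock.acquire(timeout=0.5)
        os._exit(0 if acquired else 3)

    # Parent: collect the child's verdict, then let the worker finish.
    _, status = os.waitpid(pid, 0)
    done.set()
    worker.join()
    return os.WEXITSTATUS(status)
```

The timeout is only there so the demonstration terminates; a real allocator or libc lock has no timeout, so the child would simply hang. Newer Python versions may also warn about calling fork() from a multi-threaded process, which is the same hazard being flagged.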

If you are careful not to touch anything after a fork, it can be safe to immediately exec. This matches the process-launching model on Windows (which has no fork, only the equivalent of fork+exec), with the downside that it forces the overhead of startup again on each new process. (Code reference: LaunchProcess(), which also knows to e.g. use _exit instead of exit.)
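The fork-then-exec pattern looks roughly like the following Python sketch. This is not LaunchProcess() itself; `spawn` and `spawn_and_wait` are hypothetical names, and the exit code 127 for a failed exec is just a shell-style convention I picked.

```python
import os
import sys


def spawn(argv):
    """Fork, then immediately exec argv[0] in the child; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: do as little as possible between fork and exec.
        try:
            os.execv(argv[0], argv)
        except OSError:
            pass
        # exec failed. Leave via _exit rather than exit so we skip atexit
        # handlers and do not flush stdio buffers copied from the parent.
        os._exit(127)
    return pid


def spawn_and_wait(argv):
    """Run argv to completion and return its exit code (None if signalled)."""
    pid = spawn(argv)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None


if __name__ == "__main__":
    # Start a fresh interpreter; it pays its full startup cost again.
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

The cost mentioned above is visible here: every spawned process begins from scratch at the program's entry point, with nothing inherited from the parent's already-initialized state.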

Forking and execing ourselves is how we spawn subprocesses on Mac (I believe; there may be some trickery related to how app bundles work that complicates this). On Linux it is unfortunately more complicated. Updates on Linux are managed by a systemwide package manager that runs independently of other software, which effectively means at any point any file you rely upon can be silently clobbered. (This problem even affects single-process apps like Firefox; an update will clobber some JavaScript used in the UI and suddenly things will either crash or get weird.) In Chrome's case, if the Chrome binary itself is updated while the browser is running, processes spawned by the running Chrome would be the newer Chrome, which may have made an incompatible change to the interface between Chrome processes.

Instead, at startup, before we spawn any threads, we fork off a helper process. This process opens every file we might use and then waits for commands from the main process. When it's time to make a new subprocess we ask the helper, which forks itself again as the new child. By virtue of always forking from the same initial process, we guarantee that we are always running the same code; even if the files we opened are replaced by a system update our handle on them is the handle for the previous file. (That works as long as nobody overwrites the contents of the file we have open; thankfully, package updates write a new file and rename it over the old name, leaving our open copy the only remaining reference to the old file.)
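Here is a toy version of that design in Python, as a sketch under heavy assumptions: the real zygote is C++ and far more involved, and the socket protocol ("ready" / "fork"), the function names, and the single pre-opened resource file are all invented for illustration. The helper opens the file once, the "update" then replaces the file by rename, and a child forked afterwards still reads the old contents through the inherited descriptor.

```python
import os
import socket
import tempfile


def _zygote_main(sock, resource_path):
    fd = os.open(resource_path, os.O_RDONLY)   # opened once, up front
    sock.sendall(b"ready")
    while sock.recv(16) == b"fork":            # EOF (b"") ends the loop
        child = os.fork()
        if child == 0:
            # New child: report what the inherited descriptor contains.
            sock.sendall(os.pread(fd, 4096, 0))
            os._exit(0)
        os.waitpid(child, 0)


def start_zygote(resource_path):
    """Fork the helper (ideally before any threads exist)."""
    parent_sock, zygote_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_sock.close()
        try:
            _zygote_main(zygote_sock, resource_path)
        finally:
            os._exit(0)
    zygote_sock.close()
    if parent_sock.recv(16) != b"ready":
        raise RuntimeError("zygote failed to start")
    return parent_sock, pid


def spawn_via_zygote(sock):
    """Ask the helper for a new child; return what that child read."""
    sock.sendall(b"fork")
    return sock.recv(4096)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "resource.txt")
    with open(path, "wb") as f:
        f.write(b"version 1")
    sock, zygote_pid = start_zygote(path)

    # Simulate a package update: write a new file, rename it over the old.
    with open(path + ".new", "wb") as f:
        f.write(b"version 2")
    os.replace(path + ".new", path)

    seen_by_child = spawn_via_zygote(sock)
    with open(path, "rb") as f:
        on_disk = f.read()

    sock.close()                 # EOF tells the zygote to exit
    os.waitpid(zygote_pid, 0)
    return seen_by_child, on_disk
```

If the update had instead opened the existing file and overwritten it in place, the child would see the new bytes too, which is the caveat in the parenthetical above.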

(Code reference: ChildProcessLauncher's LaunchInternal(), the gory ifdef soup used when launching a subprocess. Truly some ugly code.)

This solution is both clever and an ugly hack. Any time someone adds code to Chrome that interacts with a file on disk, they must either know to open it preemptively or they will produce mysterious failures across updates (in practice, usually the latter; e.g. bug 35793: Devtools stop working when chrome gets updated). An interesting question to ask is: why is this not a problem on Windows and Mac?

On Windows, files are locked if any process is using them, which forces a design where updates install into a separate directory. But, annoyingness of locking aside, in fact I think that design is preferable. To start with, a given version of Chrome will know its files will remain unmolested by updates. Furthermore, when an update happens, the updater can write out a separate "update succeeded" sentinel after writing all the files out, making it impossible for an aborted update to leave both the previous and next version in a half-working state. (On Mac, we take a similar approach; I don't know enough about Macs to know whether the versioned directories within bundles make this magically work.)
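The versioned-directory-plus-sentinel idea can be sketched in a few lines of Python. To be clear, this is not how Chrome's updater is actually written; the directory layout, the `UPDATE_OK` file name, and both function names are assumptions made up for the example.

```python
import os
import tempfile

SENTINEL = "UPDATE_OK"  # hypothetical marker file name


def install_version(root, version, files, complete=True):
    """Write `files` (name -> bytes) into root/<version>/, sentinel last."""
    target = os.path.join(root, version)
    os.makedirs(target)
    for name, data in files.items():
        with open(os.path.join(target, name), "wb") as f:
            f.write(data)
    if complete:  # complete=False simulates an update aborted midway
        with open(os.path.join(target, SENTINEL), "w") as f:
            f.write("ok")


def newest_complete_version(root):
    """Return the highest version whose sentinel exists, or None."""
    complete = [
        v for v in os.listdir(root)
        if os.path.exists(os.path.join(root, v, SENTINEL))
    ]
    return max(
        complete,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        default=None,
    )


def demo():
    root = tempfile.mkdtemp()
    install_version(root, "13.0.1", {"chrome.bin": b"old"})
    # The next update dies before the sentinel is written.
    install_version(root, "14.0.0", {"chrome.bin": b"new"}, complete=False)
    return newest_complete_version(root)
```

Because the launcher only considers directories that contain the sentinel, the half-written 14.0.0 directory is ignored and the running 13.0.1 files are never touched.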

With all this in mind you might reasonably ask why Linux needs to be special: why we waste memory on this zygote process launcher and have extra buggy codepaths just to support an inferior update model. (Note that by using .deb files we also lose our tiny incremental updates.)

And to that I can only answer with the thinking we had at the time. First, we wanted to be good citizens on Linux; one distinction between "lame port of a Windows app" and "real Linux software" is exactly whether you distribute as a tarball or as a package. Second, and more importantly, we knew that regardless of what we did for Google Chrome, the Linux distros would attempt to stuff Chromium into their package manager even when they know it breaks the app, much like they've done to Firefox. Now that I've summarized it in these terms it sounds a little depressing, but there it is; with ChromeOS, where we control the stack, we have more intelligent updates.