Keyboard input across processes

February 26, 2009

The API of "The Web" is effectively the behavior of Internet Explorer. WebKit knows how to translate native keyboard events into Windows-like ones for JavaScript. (Gtk and Mac are similar to each other but different from Windows.) But this happens at the wrong layer for us: we get the keyboard event in the browser process (which doesn't run WebKit), and then the salient details are proxied over to the renderer process (which does). By the time WebKit gets the event, it's too late for its code to query whether the control key is pressed. (And it's the renderer; because of the sandbox, it shouldn't have access to the keyboard state anyway.)

A platform-agnostic "web"-format keyboard event also happens to fit nicely with our cross-platform goals: the platform-specific browser code converts the native event into a WebKeyboardEvent, and then all platforms can share the same serialization code. Avi is refactoring our event-forwarding code to make this picture more of a reality, since it is particularly critical for the Mac. He's been having trouble landing it, though; r10464 is a good picture of one of his attempts.
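To make the idea concrete, here is a minimal sketch in Python (not the actual Chromium code). The field names, the modifier flag values, the wire format, and the `from_native_key` helper are all hypothetical; the only facts borrowed from the real world are that Windows virtual key codes for letters and digits match their uppercase ASCII values. The point is that modifier state is captured once, in the browser process, and travels with the event.

```python
import struct
from dataclasses import dataclass

# Hypothetical modifier bit flags (illustrative values, not WebKit's).
MOD_SHIFT = 1 << 0
MOD_CTRL = 1 << 1

KEY_DOWN = 0
KEY_UP = 1


@dataclass(frozen=True)
class WebKeyboardEvent:
    """Platform-agnostic event, loosely modeled on the one named in the post."""
    event_type: int  # KEY_DOWN or KEY_UP
    key_code: int    # Windows-style virtual key code
    modifiers: int   # modifier bits captured in the browser process


# Hypothetical wire format: little-endian u8, u16, u8 (4 bytes, no padding).
_WIRE_FORMAT = "<BHB"


def from_native_key(char: str, ctrl: bool, shift: bool, pressed: bool) -> WebKeyboardEvent:
    """Stand-in for the platform-specific conversion step (letters/digits only)."""
    if len(char) != 1 or not char.isalnum() or not char.isascii():
        raise ValueError("this sketch only maps ASCII letters and digits")
    modifiers = (MOD_CTRL if ctrl else 0) | (MOD_SHIFT if shift else 0)
    return WebKeyboardEvent(
        event_type=KEY_DOWN if pressed else KEY_UP,
        key_code=ord(char.upper()),  # VK codes for A-Z and 0-9 equal ASCII
        modifiers=modifiers,
    )


def serialize(event: WebKeyboardEvent) -> bytes:
    """Shared by every platform: runs after the native conversion."""
    return struct.pack(_WIRE_FORMAT, event.event_type, event.key_code, event.modifiers)


def deserialize(data: bytes) -> WebKeyboardEvent:
    """Renderer side: rebuilds the event without touching keyboard state."""
    return WebKeyboardEvent(*struct.unpack(_WIRE_FORMAT, data))
```

In this sketch the renderer never asks "is control down?"; it reads the answer out of the `modifiers` field it was sent.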

The other interesting twist about input is keyboard accelerators. We provide the shortcut Ctrl-B to hide and show the bookmarks bar, but if you're typing into a web app that is simulating a document editor, perhaps the app wants Ctrl-B to toggle bold. In the spirit of our slogan, "content, not chrome", the web app gets precedence.

This means that keyboard accelerators are first forwarded to the web content, which decides whether or not it wants to process the key; if it doesn't, they are forwarded back to the browser so the browser can do the default handling. These two handoffs are asynchronous (we wouldn't want the browser to block while a potentially malicious page hangs instead of processing keys), so the browser effectively gets a message out of the blue from the renderer saying "process this key as if the user just typed it". This is a security hole, since a compromised renderer could send that message for keys the user never pressed, though it's predicated on arbitrary code execution in the renderer.
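The round trip can be sketched as two message queues. This is a toy model in Python, not Chromium's IPC: the `Browser` and `Renderer` classes, the string key names, and the `pump` methods are all made up for illustration. Note that this version deliberately shows the naive design, in which the browser acts on whatever key the renderer hands back.

```python
from collections import deque


class Renderer:
    """Toy renderer: the page consumes some keys and returns the rest."""

    def __init__(self, keys_page_handles):
        self.keys_page_handles = set(keys_page_handles)
        self.inbox = deque()

    def pump(self, browser):
        # Runs whenever the renderer gets around to it; the browser never waits.
        while self.inbox:
            key = self.inbox.popleft()
            if key not in self.keys_page_handles:
                browser.inbox.append(key)  # "process this key" message


class Browser:
    """Toy browser: forwards keys first, handles accelerators later."""

    def __init__(self, accelerators):
        self.accelerators = dict(accelerators)  # key name -> action name
        self.inbox = deque()
        self.performed = []

    def on_native_key(self, key, renderer):
        # First handoff: post the key and return immediately (asynchronous).
        renderer.inbox.append(key)

    def pump(self):
        # Second handoff: naive version, trusts the key named by the renderer.
        while self.inbox:
            key = self.inbox.popleft()
            action = self.accelerators.get(key)
            if action is not None:
                self.performed.append(action)
```

With a page that handles Ctrl+B, the bookmarks bar accelerator never fires, while Ctrl+T comes back from the renderer and opens a tab.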

The message, of course, isn't really out of the blue; we know which keys we had previously forwarded to the renderer. So r10563 keeps a queue of keyboard events we'd previously sent to the renderer, and then only trusts our browser-side list of which keystrokes actually happened when it hears back from the renderer that a key was unprocessed.
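A rough sketch of that idea in Python follows. It is not the code from r10563; the class name, method names, and string key names are hypothetical, and it assumes the renderer acknowledges every forwarded key, in order, saying whether the page handled it. The browser ignores whatever key the renderer claims and consults only its own queue.

```python
from collections import deque


class PendingKeyQueue:
    """Browser-side record of keys forwarded to the renderer (illustrative)."""

    def __init__(self, accelerators):
        self.accelerators = dict(accelerators)  # key name -> action name
        self.pending = deque()

    def forward_to_renderer(self, key):
        # Remember what the user really typed before handing it off.
        self.pending.append(key)

    def on_renderer_ack(self, handled_by_page, claimed_key=None):
        """Return the accelerator action to run, or None.

        `claimed_key` is what the renderer says the key was; it is
        intentionally never used to choose the action.
        """
        if not self.pending:
            return None  # unsolicited message: no real keystroke behind it
        key = self.pending.popleft()  # trust our own list, not the renderer
        if handled_by_page:
            return None
        return self.accelerators.get(key)
```

Under these assumptions, a compromised renderer can at most decline to handle a key the user actually pressed; it cannot conjure an accelerator the user never typed.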