URL copy and paste

April 14, 2010

The URL bar ("omnibox"), despite its appearance, is a weird hybrid of a text box and something else. URLs are what computers understand while words (search terms) are what humans understand. The omnibox attempts to span the two simultaneously, transiently switching between searches and URLs, prefetching DNS as you type.

Here's an interesting instance of this tension. Load a URL that contains some Japanese text as a query parameter. (Don't speak Japanese? Here's an example that would come up — a Google search for some Japanese text.) The URL bar shows the text as you'd expect, despite it heading to the server as a bunch of percent-escaped bytes. So what happens when you try to copy and paste this URL?
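To make the two forms concrete, here's a small Python sketch of the same text as displayed and as sent. The sample string is my own stand-in, and this is just the standard library's escaping, not the browser's actual code.

```python
from urllib.parse import quote, unquote

# Hypothetical query text; any non-ASCII string behaves the same way.
display_text = "日本語"

# What heads to the server: the UTF-8 bytes, each written as %XX.
wire_text = quote(display_text)

# What the URL bar shows: the escapes decoded back into characters.
shown_text = unquote(wire_text)

print(wire_text)   # %E6%97%A5%E6%9C%AC%E8%AA%9E
print(shown_text)  # 日本語
```

Three characters on screen turn into nine escaped bytes on the wire.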

Well, there are two use cases for copy and paste. If I select a word out of my URL bar and paste it into this post, I expect the word to come through — the human interpretation of the URL. But then if I want to share this URL with a friend, I want to select the entire URL and paste it in its machine-readable form, with the escapes — the machine interpretation of the URL.

So that is what we do: selecting a portion of the URL copies it in the human-readable form, while selecting the entire URL copies it in its underlying machine representation. This seemed entirely crazy to me when I first learned about it, but I had never noticed it until it was pointed out to me, and I've since confirmed that Firefox does it too.
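The rule is simple enough to sketch in a few lines of Python. The function name, the example URL, and the set of characters left unescaped are all my own assumptions for illustration; the real omnibox code is surely more careful about which parts of a URL it escapes.

```python
from urllib.parse import quote

# Assumption: leave URL delimiters and existing escapes alone, so only
# the non-ASCII text gets percent-escaped.
URL_DELIMITERS = ":/?#[]@!$&'()*+,;=%"

def text_for_clipboard(display_url: str, start: int, end: int) -> str:
    """Return what a copy of display_url[start:end] puts on the clipboard."""
    selected = display_url[start:end]
    if start == 0 and end == len(display_url):
        # Whole URL selected: hand out the machine-readable form.
        return quote(selected, safe=URL_DELIMITERS)
    # Partial selection: hand out exactly what the user sees.
    return selected

# Hypothetical URL as shown in the URL bar.
shown = "http://example.com/search?q=日本語"
word_start = shown.index("日本語")

whole_copy = text_for_clipboard(shown, 0, len(shown))
word_copy = text_for_clipboard(shown, word_start, word_start + 3)
```

Here `whole_copy` comes out with the query escaped, while `word_copy` is just the three characters.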

(Why not just always use the human-readable form everywhere? I dunno. I guess you'd need to mandate UTF-8 only, otherwise it'd be unclear what sequence of bytes a given string of Unicode produces, which I think is probably ok because that's what browsers currently assume; but even then I worry about normalization forms and sites that use non-UTF-8 byte strings in their query parameters. I wonder what we and other browsers do with Big5 in a URL?)
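Both worries are easy to demonstrate. In this Python sketch (the sample strings are mine, and I'm using the standard library's codecs rather than anything a browser does), the same visible text escapes to different bytes depending on the encoding, and even under UTF-8 it depends on the normalization form.

```python
import unicodedata
from urllib.parse import quote

# Same two characters, two encodings, two different byte sequences.
text = "中文"
as_utf8 = quote(text, encoding="utf-8")
as_big5 = quote(text, encoding="big5")

# Same visible "é", two normalization forms, two different byte sequences.
composed = unicodedata.normalize("NFC", "\u00e9")    # one code point
decomposed = unicodedata.normalize("NFD", "\u00e9")  # "e" + combining accent
composed_escaped = quote(composed)
decomposed_escaped = quote(decomposed)

print(as_utf8, as_big5)
print(composed_escaped, decomposed_escaped)  # %C3%A9 e%CC%81
```

So a human-readable URL only pins down the bytes once you've also agreed on an encoding and a normalization form.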

This is the reason I'm not too concerned with the news that we're dropping the http:// prefix from URLs using that protocol. It's already what browsers assume for input without a prefix, since users never type it anyway (though our Strict Transport Security support lets sites opt into HTTPS instead). And copy and paste should work as before since we already have special behavior for it. (What about X selections? I think the fact that they currently behave differently is just a bug. I'm a little surprised it's not going down the same code path.)
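As a rough Python sketch of why hiding the prefix is harmless: the displayed text and the underlying URL are already separate things, so one can lose the prefix without the other changing. Both function names are hypothetical, and this ignores everything else the omnibox does with input (searches, other schemes, Strict Transport Security).

```python
HTTP_PREFIX = "http://"

def url_from_input(typed: str) -> str:
    """Assume http:// when the user typed no scheme at all."""
    if "://" in typed:
        return typed
    return HTTP_PREFIX + typed

def text_for_display(url: str) -> str:
    """Hide the http:// prefix; leave every other scheme visible."""
    if url.startswith(HTTP_PREFIX):
        return url[len(HTTP_PREFIX):]
    return url

# Hypothetical input: the underlying URL keeps its scheme even though
# the displayed text does not.
underlying = url_from_input("example.com/page")
shown = text_for_display(underlying)
```

A copy of the whole URL can then be served from `underlying` rather than `shown`, which is the same kind of special case as the escaping one.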