Mime sniffing

January 31, 2009

Browsers sniff mime types of HTTP responses, initially because page authors frequently don't get them right* and now because browsers have done it historically.

The worst instance related to mime sniffing is an old IE bug. As I understand it, their sniffer tried some image formats and then HTML; when they later added PNG sniffing, it went onto the sniff list after HTML, either by mistake or to maintain compatibility with pages that were already being sniffed as HTML. The result is that even valid PNG images can be sniffed as HTML, converting a user-uploadable image into a JavaScript (XSS) vector. The Chromium mime sniffer's comments (which are quite readable, and tabulate various browsers' behaviors) describe this as a "dangerous mime type".
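To make the ordering problem concrete, here is a toy sketch in Python. It is not IE's or Chromium's actual code: the function names, the HTML heuristic, and the 256-byte window are all made up for illustration. It only shows how checking for HTML before PNG lets a file that starts with a PNG signature get classified as HTML.

```python
# Toy model of an order-dependent sniffer. Everything here is hypothetical;
# real browser sniffers use far more elaborate rules.

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the standard 8-byte PNG file signature


def looks_like_html(data: bytes) -> bool:
    # Made-up heuristic: any of a few tags within the first 256 bytes.
    head = data[:256].lower()
    return any(tag in head for tag in (b"<html", b"<script", b"<body"))


def looks_like_png(data: bytes) -> bool:
    return data.startswith(PNG_SIGNATURE)


def sniff(data: bytes, checks) -> str:
    # The first matching check wins, so the order of `checks` matters.
    for mime, check in checks:
        if check(data):
            return mime
    return "application/octet-stream"


# PNG appended after HTML (the buggy order described above) vs. PNG first.
HTML_FIRST = [("text/html", looks_like_html), ("image/png", looks_like_png)]
PNG_FIRST = [("image/png", looks_like_png), ("text/html", looks_like_html)]

# Starts with a PNG signature but carries markup early in the file.
payload = PNG_SIGNATURE + b"....<script>alert(1)</script>"

html_first_result = sniff(payload, HTML_FIRST)
png_first_result = sniff(payload, PNG_FIRST)
```

With the HTML check first, the payload comes back as text/html; with the PNG check first, the same bytes come back as image/png.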

But there are plenty of other ways that sniffing can screw you as a site author. Your only defenses if you're building a site are:

I believe this bug is why you cannot view images attached to Gmail messages directly: if you click "view image" in Gmail you instead get an HTML page with an <img> tag, and if you right-click on that image and pick "view image" you'll get it served with a Content-Disposition: attachment header.

To solve this mess, IE introduced the X-Content-Type-Options: nosniff header, which means "don't sniff the mime type". It looks like a reasonable workaround to me: it lets new pages opt into sane behavior without breaking old ones. Chromium added support for it.
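For a site author, opting in just means sending the header alongside an explicit Content-Type. A minimal sketch as a Python WSGI app follows; the app name and the response body are placeholders of mine, not anything from a real server.

```python
# Minimal WSGI app that declares its type explicitly and opts out of sniffing.
# `app` and the body are illustrative placeholders.

def app(environ, start_response):
    body = b"hello\n"
    headers = [
        # Always say what the content is...
        ("Content-Type", "text/plain; charset=utf-8"),
        # ...and ask the browser not to second-guess it.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library should be enough.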

It sounded good to developers of a Google-internal HTTP server as well; they added it by default to all responses. And then the bug reports started coming in: "Why does my page render in all browsers but Chromium?" It turned out many of these sites were sending no Content-Type header, which, when coupled with the nosniff header, meant Chromium would pick the default of application/octet-stream, triggering a download box.

The fix (r8559) is to match IE in this corner case and default to text/plain instead; I made wisecracks about adding an X-Content-Type-Options-Options: no-really-none-of-these-mime-shenanigans header. Adam (master of content-type sniffing, and I believe editor of the HTML5 sniffing spec) also wrote r8257, which collects stats (aggregated, anonymized, and only from users who have opted in) on what fraction of the pages we would normally have sniffed were instead blocked by the header.
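The corner case is easier to see as a decision function. This is a simplified sketch in Python, not Chromium's real logic: effective_mime_type, toy_sniff, and the missing_default parameter are names I made up, and real browsers decide whether to sniff based on many more conditions.

```python
from typing import Callable, Optional


def toy_sniff(declared: Optional[str], body: bytes) -> str:
    # Hypothetical stand-in for a real sniffer: treat anything that starts
    # with "<" as HTML, otherwise trust the declared type if there is one.
    if body.lstrip().startswith(b"<"):
        return "text/html"
    return declared or "application/octet-stream"


def effective_mime_type(
    declared: Optional[str],
    nosniff: bool,
    body: bytes,
    sniff: Callable[[Optional[str], bytes], str] = toy_sniff,
    missing_default: str = "text/plain",
) -> str:
    if nosniff:
        # Sniffing is off, so a missing Content-Type needs some default.
        # application/octet-stream here gives the download box; text/plain
        # is the behavior the fix adopts.
        return declared if declared else missing_default
    return sniff(declared, body)


page = b"<html><body>hi</body></html>"

before_fix = effective_mime_type(None, True, page,
                                 missing_default="application/octet-stream")
after_fix = effective_mime_type(None, True, page)
without_header = effective_mime_type(None, False, page)
```

Here before_fix is the download-triggering application/octet-stream, after_fix is text/plain, and without_header falls through to the sniffer.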

* In fairness, the greater problem is that page authors sometimes don't control HTTP headers. They're frequently defined by server configuration, which often requires root on the server or at least a lot more technical know-how than "click on the upload button in your website creator program".