February 17, 2010

What does the reload button do? Well, it reloads the page, obviously. Or at least that was as far as I had thought about it when I set out to fix the age-old "Chrome should support shift-reload" bug and discovered there's plenty of subtlety in even a simple-sounding feature.

RFC 2616 (the HTTP 1.1 spec) has a section 14.9.4 titled "Cache Revalidation and Reload Controls" specifically discussing the various meanings "reload" could have. But rather than dive into that, another way of looking at it is to think about it from the user's perspective.

One use case is: I'm looking at a live blog or a stock ticker site and I think they may have updated something on the page. I want to go out to the site and ask it if it has anything new, but I don't want to redownload the stuff (like, say, the site logo) that I already have. That is, I use HTTP's existing conditional-request support (If-Modified-Since, or If-None-Match with an ETag) and just ask the site (and any caching layers in between me and the site) to check whether there's anything new for me and show it to me if so, and the same for subresources.
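To make that concrete, here is a minimal sketch of what such a "check for anything new" request could look like, following what RFC 2616 section 14.9.4 calls a specific end-to-end revalidation. The function names are made up for illustration, and this is not necessarily what Chrome or WebKit actually sends.

```python
def revalidation_headers(etag=None, last_modified=None):
    """Build request headers for a plain reload of a cached resource.

    Hypothetical helper: `etag` and `last_modified` are the validators
    saved from the response that is currently in the cache.
    """
    # max-age=0 asks every cache on the path to revalidate its copy
    # with the origin server instead of answering from storage.
    headers = {"Cache-Control": "max-age=0"}
    if etag:
        # An ETag from the cached response is echoed back in If-None-Match.
        headers["If-None-Match"] = etag
    if last_modified:
        # Last-Modified from the cached response goes in If-Modified-Since.
        headers["If-Modified-Since"] = last_modified
    return headers


def body_after_revalidation(status, cached_body, response_body):
    """Pick which body to show once the server has answered."""
    # 304 Not Modified carries no body: the cached copy is still good.
    return cached_body if status == 304 else response_body
```

If the server answers 304, nothing is redownloaded and the cached logo gets reused; any other status means the response body replaces it.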

In a perfect world where caching is done correctly, I think that is all you would ever need. But that leads to the other use case, which is: something has gone wrong in the caching, and I want to flush all caches (both local and held by proxy servers). Here, you request the resources along with the appropriate HTTP headers to instruct everyone along the line to please not use their caches.
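As a rough sketch of the "please don't use your caches" request, RFC 2616 section 14.9.4 describes an end-to-end reload as one carrying the no-cache directive, with Pragma: no-cache added for HTTP/1.0 caches. The helper below is hypothetical and only illustrates the header set; I'm not claiming it matches what any browser does internally.

```python
# Conditional headers would let a server or proxy answer "304, keep
# your copy", which is exactly what a forced reload wants to avoid.
CONDITIONAL_HEADERS = {"if-none-match", "if-modified-since"}


def force_reload_headers(base_headers=None):
    """Return request headers for a cache-bypassing reload.

    Hypothetical helper: `base_headers` is whatever the request would
    otherwise have sent.
    """
    headers = {
        name: value
        for name, value in (base_headers or {}).items()
        if name.lower() not in CONDITIONAL_HEADERS
    }
    # HTTP/1.1 caches: do not serve this request from a stored copy.
    headers["Cache-Control"] = "no-cache"
    # HTTP/1.0 caches do not understand Cache-Control, so add Pragma too.
    headers["Pragma"] = "no-cache"
    return headers
```

Note that the validators are dropped on purpose: the point is to get a full, fresh copy rather than permission to keep the old one.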

These two cases are intended to correspond in Chrome to the refresh button versus holding shift and hitting the refresh button. And though I fixed the bug, my work was just to plumb the shift state down to a flag passed to WebKit, and Darin suggested there are still WebKit bugs in this area.

How do we compare to other browsers, at least? I don't know; there is more depth here than I have fully thought through. For example, suppose a server does the common JS/image caching trick where it provides headers that say the resources never expire and then just uses a new URL when it wants to change them. When I do a normal refresh, should I even go out to the network to see if those were modified? How about when I just hit enter in the URL bar of the currently-displayed page: should I just load everything from on-disk cache? (I believe that's what we'd do when you hit the back button.) Which behavior should the meta refresh tag trigger? And most importantly, how can a normal human being understand all this?
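For reference, the "never expires" trick boils down to a freshness check like the sketch below. It assumes the server expresses the lifetime with a max-age directive (it could also use an Expires header, which this ignores), and the function names are invented for illustration.

```python
import re


def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control value, or None if absent."""
    match = re.search(r"max-age=(\d+)", cache_control or "")
    return int(match.group(1)) if match else None


def is_fresh(cache_control, age_seconds):
    """True if a cached entry may be used without touching the network."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age
```

A resource served with "max-age=31536000" (one year) stays fresh for any realistic browsing session, so the only way the server can push a change is to reference it under a new URL. Whether a plain reload should still revalidate such an entry is exactly the open question.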

From this I mostly conclude that web stuff is always more subtle than you'd expect. I came into this thinking that reload meant "go fetch the resources again as if you hadn't fetched them before", but learned that neither plain reload (which will allow servers to tell you to reuse your existing cached entries) nor shift-reload (which includes extra headers to try to get intermediate caches to drop their copy, something that even a first-time load won't do) does that. And also I learned not to trust browsers when they claim they've reloaded something.