January 17th, 2014
One of the challenges of web development is figuring out how best to use the browser’s download cache. When you surf the ‘net, if anyone uses that term any more, webpages and images aren’t downloaded every time you access them. The browser saves that extra work, time, and bandwidth by caching resources – images, pages, scripts, and styles – it’s already downloaded.
But what happens when you DO want the browser to download a new version? Say you updated an image and want to make sure that the user has the latest content. How do you instruct the browser to download the new version?
One common approach is to attach a query string to the end of each asset’s URL that somehow indicates its content. Say you have ‘main-image.jpg’. Instead of referencing it in the HTML as
<img src="/images/main-image.jpg">, you would use something like
<img src="/images/main-image.jpg?v1">, where
v1 indicates that this is the first version of main-image.jpg. If you change main-image.jpg but don’t rename the file, you can just change the query string to
v2, then v3, and so on. The browser treats each distinct URL as a separate resource, so it will download the fresh version the next time a page references it.
Many web frameworks have some sort of “asset pipeline” where these query strings are automagically added to the end of your static file references. I’m now running this site on Jekyll, which has a number of plugins to do just that. I’m a little wary of adding this extra complexity to my project; I’ve been burned by poorly written plugins in many languages, particularly in Ruby, and I don’t want to take on more trouble than the feature is worth. So, instead of updating the references to content, as above, I’ve configured my server to instruct the browser to cache content for a limited amount of time.
Here are the relevant bits from my nginx config:
This tells the browser that after a week, images should be considered stale and checked for updates. Everything else should be cached for up to four hours, which means that any page you’re looking at may be up to four hours old, depending on when you last loaded it. The value “public” for the Pragma and Cache-Control headers indicates that the content isn’t user-specific and may be stored in a shared public cache. This makes it more likely that the data will be available in a cache for subsequent requests.
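The config itself isn’t reproduced above, so what follows is only a minimal sketch of rules that would match this description, not the exact config running on this site. The list of image extensions and the two location patterns are assumptions; expires and add_header are standard nginx directives.

```nginx
# Sketch only: the extension list and location patterns are assumed,
# not copied from the real server config.

# Images: considered fresh for one week.
location ~* \.(jpg|jpeg|png|gif|ico)$ {
    expires 7d;                        # emits Expires and Cache-Control: max-age=604800
    add_header Pragma public;
    add_header Cache-Control "public";
}

# Everything else: considered fresh for four hours.
location / {
    expires 4h;                        # emits Expires and Cache-Control: max-age=14400
    add_header Pragma public;
    add_header Cache-Control "public";
}
```

With rules like these, nginx picks the regex location for image requests and falls back to the prefix location for everything else, so each response carries exactly one of the two lifetimes.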
This is a quick and dirty way to ensure that users (like you!) will get the freshest content that this site has to offer!