HTTP Made Simple, Part 5: Caching And Compression

Here’s what we’ve learned so far: HTTP gives us key-value store semantics, with meaningful verbs (GET, PUT, POST, DELETE) for reading and writing resources, and headers for carrying metadata about requests and responses.

In this installment, we’re going to look at how HTTP helps us optimize performance. After all, network requests are considerably slower than reading and writing from disk or memory. We want to do everything we can to speed things up. As it turns out, HTTP can help us do that.

One of the primary mechanisms HTTP provides for doing this is caching. It’s a measure of how confused things have gotten that most of the search results for “HTTP caching” concern turning caching off. And, indeed, we see this often with clients, because, for historical reasons, browsers are very aggressive about caching HTML, CSS, JavaScript, and image assets. But that shouldn’t discourage you from using HTTP caching for your APIs.

Networks Are Slow

Let’s first consider the problem: network requests are slow. The threshold of human cognition is between eighty and a hundred milliseconds. That’s the point where humans will notice a delay. That’s, effectively, our budget for providing a response to an action. If I click a link, and it renders in less than eighty milliseconds, it will seem instantaneous to me. More than a hundred milliseconds and we have a person glaring at their phone.

Latency

Suppose I want to request an image from a server that’s five hundred miles away from me. Even traveling at the speed of light (the theoretical maximum), the round trip is going to take over five milliseconds. That’s the absolute minimum, not accounting for processing time on the server, network delays, or rendering the image. And it’s already five percent of our human cognition budget.

Now, five hundred miles is relatively close. But what if I’m talking to a server across the country, three thousand miles away? Now we’re looking at more than 32 milliseconds. And, again, this is just the round trip for the network. That’s nearly a third of our human cognition budget, right there.

Bandwidth

Another dimension of the problem of making network requests is bandwidth. I can only move so much data so fast. If I have a 20MB/s connection, moving just 200KB is going to consume ten percent of our budget, all by itself. Even worse, we don’t always know what kind of bandwidth the client will have. This is especially true for mobile applications. What if the bandwidth drops to 1MB/s? Now we’ve completely blown our budget!
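To make the arithmetic concrete, here’s a quick sketch that reproduces the numbers above; the speed-of-light figure, distances, and connection speeds are the same ones used in the text.

```typescript
// Back-of-the-envelope numbers from the text.
const LIGHT_MILES_PER_MS = 186.282; // ~186,282 miles per second

function roundTripMs(distanceMiles: number): number {
  return (2 * distanceMiles) / LIGHT_MILES_PER_MS;
}

function transferMs(sizeKB: number, bandwidthMBps: number): number {
  return sizeKB / bandwidthMBps; // 1 MB/s moves 1 KB per millisecond
}

console.log(roundTripMs(500).toFixed(1));  // ~5.4ms for a 500-mile server
console.log(roundTripMs(3000).toFixed(1)); // ~32.2ms across the country
console.log(transferMs(200, 20)); // 10ms: ten percent of a 100ms budget
console.log(transferMs(200, 1));  // 200ms: the budget is blown
```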

Put simply, network requests can take an otherwise performant application and slow it to a crawl. And it’s all just physics, basically. There’s nothing we can do about it.

Well, almost nothing.

Caching

The good news is that we know how to do this! It’s called caching. The bad news is that caching is hard. (There are two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.) The better news is that HTTP has done most of the work for us by providing a comprehensive caching model, built on top of the key-value store semantics we’ve already reviewed. We know we can safely cache the value of keys we GET, and when we need to GET them again, we can return the value from cache.

Cache Invalidation

So how does HTTP help us with cache invalidation? HTTP allows the server to provide caching guidance via metadata (headers) in the response to a GET request. We use the cache-control HTTP header for this. For any given resource, we can tell the client how long it may reuse the cached response (with the max-age directive) and whether it must revalidate with the server before reusing it (with directives like no-cache and must-revalidate).

Put another way, we can invalidate a cached value either periodically or when it changes. The client must also invalidate the cache on DELETE or POST requests, because we know the value may have been updated or may no longer exist. (This is one example of why meaningful verbs are useful.) This makes it easy for us to avoid costly network requests or, at the very least, minimize the amount of data we’re moving.
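As a sketch of what that guidance looks like in practice (using Node’s built-in http module; the /catalog route and the sixty-second lifetime are illustrative assumptions, not a prescription):

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method === "GET" && req.url === "/catalog") {
    // Clients may reuse this response for sixty seconds; after that,
    // they must revalidate it with the server before using it again.
    res.setHeader("Cache-Control", "max-age=60, must-revalidate");
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify({ products: [] })); // placeholder body
    return;
  }
  res.statusCode = 404;
  res.end();
});

server.listen(8080);
```

With that one header in place, a compliant client won’t re-request the resource at all for a minute, and will revalidate cheaply afterwards.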

But, wait! There’s more!

Caching Proxies

It isn’t just our client that can cache responses. So can intermediary servers. In that case, even if the client doesn’t have a response cached, we might still be able to avoid that cross-country network call and hit a server that’s closer to home and already has the content cached. As we saw earlier, if we can make a request to a server that’s just five hundred miles away, instead of three thousand, we can free up as much as 27ms (nearly 34%) of our human cognition budget.

HTTP Caching Headers

HTTP accomplishes all of this with a handful of fairly simple headers: cache-control, last-modified, and etag. (Wondering why expires isn’t on this list? The cache-control header effectively supersedes it, unless you’re dealing with very old clients or proxies.) The client can check whether a resource has changed using if-modified-since and if-none-match. If it hasn’t changed, the server returns a 304 Not Modified, and the client can use the cached version of the resource.
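Here’s a minimal sketch of that handshake, doing by hand what the browser normally does for us. The fetchIfChanged helper is hypothetical; cache: "no-store" bypasses the browser’s own cache so the raw 304 is visible to our code.

```typescript
// Revalidate a resource by hand with if-none-match.
// Returns null when the server says our cached copy is still good.
async function fetchIfChanged(url: string, etag?: string) {
  const res = await fetch(url, {
    cache: "no-store", // skip the browser cache so we see the raw 304
    headers: etag ? { "If-None-Match": etag } : {},
  });
  if (res.status === 304) {
    return null; // unchanged: keep using the copy we already have
  }
  return { etag: res.headers.get("ETag"), body: await res.json() };
}
```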

Example: Product Catalog

Let’s suppose we have a product catalog consisting of the names, serial numbers, descriptions, and so forth, for thousands and thousands of products. We want to load this into our client application so that we can implement an auto-complete feature. The thing is, sometimes it changes, and we need to make sure that we push those changes out to the client.

We can’t load it when someone starts typing into an auto-complete field because the catalog is too large and might take a while to load. Instead, we load it on startup and then periodically check the server for updates. All we need to do, for this to work, is stamp the responses with the appropriate caching headers. HTTP (and the browser) will take care of the rest.

If we want an up-to-the-minute product catalog, we can just make a request every sixty seconds. But… that’s crazy, right?

Well, no, it isn’t, not if the product catalog is updated on the server side relatively infrequently (or, even if it is, the server can add a max-age directive to slow things down). In that case, most of those calls will just return a 304 with an empty body. That adds almost no overhead, so we can have our product catalog and load it, too. (I’ll be here all week.)

And the browser does most of the work for us! (In practice, there are sometimes variances between browsers in how they handle caching. Shocking, I know. So make sure to test your assumptions carefully on the browsers you want to support.) The only thing we did that was atypical of an ordinary, non-cached request was to check whether the JSON body had changed before parsing it. That seems like a pretty big win.
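For the curious, the polling pattern might look roughly like this; the /catalog endpoint and the Product shape are assumptions. The browser’s cache turns most of these requests into cheap revalidations, and we compare the etag before paying to re-parse the JSON:

```typescript
interface Product {
  title: string;
  price: number;
  description: string;
}

let lastEtag: string | null = null;
let catalog: Product[] = [];

async function refreshCatalog(): Promise<void> {
  // The browser consults its cache and revalidates for us; most of
  // these calls resolve from cache or as an empty-body 304 upstream.
  const res = await fetch("/catalog");
  const etag = res.headers.get("ETag");
  if (etag !== null && etag === lastEtag) {
    return; // body unchanged: skip the cost of re-parsing the JSON
  }
  lastEtag = etag;
  catalog = await res.json();
}

refreshCatalog();
setInterval(refreshCatalog, 60_000); // check every sixty seconds
```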

Compression

Even when we have to load data across the network, we can compress it. The client asks for this simply by including an accept-encoding header listing the compressed encodings it understands, typically gzip or compress. The server, seeing the header, can compress the response body and set the content-encoding header accordingly. But, again, since the browser already knows what to do with a compressed response, our client code doesn’t have to change to take advantage of this.
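Here’s a sketch of the server side of that negotiation in Node (gzipSync keeps the example short; a real server would likely stream the compression instead):

```typescript
import { createServer } from "node:http";
import { gzipSync } from "node:zlib";

const server = createServer((req, res) => {
  const body = Buffer.from(JSON.stringify({ products: [] })); // placeholder
  res.setHeader("Content-Type", "application/json");
  res.setHeader("Vary", "Accept-Encoding"); // tell caches the body varies

  // Only compress if the client told us it understands gzip.
  const accepts = String(req.headers["accept-encoding"] ?? "");
  if (accepts.includes("gzip")) {
    res.setHeader("Content-Encoding", "gzip");
    res.end(gzipSync(body));
  } else {
    res.end(body);
  }
});

server.listen(8080);
```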

Large JSON responses also tend to compress well, especially if they contain lots of similar objects. Our imaginary product catalog consists of a list of products with many of the same fields, like title, price, and description, so it would probably compress well. In practice, this often means loading at least five times faster. In our example above, where moving just 200KB consumed 10% of our latency budget, we’re now down to a much more reasonable 2%, even in the worst case.

Paginating (Subdividing) Resources

We can also break up resources into smaller pieces. This is often known as pagination, but it’s useful even when we’re not, strictly speaking, loading pages that someone is scrolling through. In the example above, if our product catalog is quite large, we might want to avoid loading it all at once. On a bandwidth-constrained connection (quite common in mobile scenarios), a 10MB file could easily take ten seconds to load. That’s way beyond our cognitive threshold budget. (You might think: yeah, but we’re loading that in the background. That makes it even worse. Because the execution context of a browser session is single-threaded, someone using your app will just see inexplicable pauses whenever your product catalog changes.)

We can simply take the resource URL and add a parameter to it, indicating which part of the resource we want. (Remember, using query parameters allows us to build URLs from other URLs without violating the principle of opacity.) This approach lets us keep the same caching strategy we were using before. Because each URL is different, everything just works as it did before. HTTP doesn’t care. We simply need to change our server to handle the new request parameter.
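For example (the page parameter name and the empty-page termination are assumptions; your API might use offset/limit or cursors instead):

```typescript
type Product = { title: string; price: number; description: string };

// Fetch the catalog one page at a time. Each page has its own URL,
// so each page is cached, and revalidated, independently.
async function fetchAllPages(baseUrl: string): Promise<Product[]> {
  const products: Product[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(`${baseUrl}?page=${page}`);
    const batch: Product[] = await res.json();
    if (batch.length === 0) break; // an empty page means we're done
    products.push(...batch);
  }
  return products;
}
```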

Write-Through Caching

We can also cache the responses to PUT requests. Provided we stamp the responses with the appropriate caching headers, the client is free to cache the response just as it would with a GET. And, again, most modern browsers will do this automatically for us.
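Since browser behavior here varies, here’s an application-level sketch of the same write-through idea; the /products/:id endpoint is hypothetical, and we assume the server echoes back the stored representation:

```typescript
// A tiny application-level write-through cache: when we PUT an update,
// we keep the server's response body so a later read needs no network call.
const localCache = new Map<string, unknown>();

async function updateProduct(id: string, update: object) {
  const res = await fetch(`/products/${id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(update),
  });
  const body = await res.json(); // assumes the server echoes the stored state
  localCache.set(`/products/${id}`, body); // write through to the local cache
  return body;
}
```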

Exercises For The Reader

We didn’t talk much about the server side of the performance equation. That’s partly because there are a lot of possible ways to approach the problem. There are proxy servers, like Varnish or Squid, that can do a lot of the work for you. Exercise for the reader: how does the design of HTTP help make this use of proxies possible? You can use CDNs, of course, which serve the dual purpose of reducing latency and acting as a proxy cache. There are also simple design patterns you can use in your own application code.

But even if all you do on the server is stamp responses with max-age and etag headers, and then check the etag before sending a response (sending a 304 if the if-none-match header matches the etag you would have returned), you can still see significant performance gains.
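As a closing sketch, here’s what that minimal server-side strategy might look like in Node; the hash-derived etag is one common choice, not the only one:

```typescript
import { createServer } from "node:http";
import { createHash } from "node:crypto";

const server = createServer((req, res) => {
  const body = JSON.stringify({ products: [] }); // placeholder catalog
  // A strong etag derived from the body: same bytes, same etag.
  const etag = `"${createHash("sha1").update(body).digest("hex")}"`;

  res.setHeader("ETag", etag);
  res.setHeader("Cache-Control", "max-age=60");

  if (req.headers["if-none-match"] === etag) {
    res.statusCode = 304; // the client's copy is still good: send no body
    res.end();
    return;
  }

  res.setHeader("Content-Type", "application/json");
  res.end(body);
});

server.listen(8080);
```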