HTTP Made Simple, Part 2: Method Safety and Idempotence

Jan 11, 2014 · Dan Yoder

In part 1, we said that HTTP views the world as a distributed key-value store. The URLs are the keys and the values are resources. Resources, in turn, are actually dictionaries of different representations, or formats, of the resource. For example, a video resource might have different encodings, each of which is a representation that can be accessed if you know its media type. In Part 2, we’re going to begin to explore this idea in more detail, starting with the verb we’ve conspicuously ignored up until now: POST.

A side note on the key-value analogy: you can make a pretty strong argument that the HTTP model is actually a distributed hash table (DHT). The protocol’s designers likely didn’t think in those terms, but the similarities are profound. (The first documented use of the term was in 1986, in reference to Linda, so it’s possible those ideas influenced HTTP somehow.) DHT schemes usually rely on a two-tier system for resolving keys: a key is hashed to a secondary server, which can actually resolve the reference. This precisely mirrors what HTTP does with URLs, which include a host component. Effectively, HTTP leverages TCP/IP and DNS to resolve a reference to a specific server, which partly explains why it scales so well. It’s an extremely clever idea that we tend to take entirely for granted today. HTTP lacks the recovery capabilities associated with DHTs, but even there, redirects arguably perform a similar function.
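The two-tier lookup behind the DHT comparison can be sketched in a few lines. This is a toy model, not how HTTP is implemented: the hypothetical `SERVERS` table stands in for DNS and TCP/IP routing, and the host names and paths are made up for illustration. The host component picks the server; the path is the key that server resolves.

```python
from urllib.parse import urlsplit

# Hypothetical "servers": each host name maps to its own local key-value store.
# In real HTTP, DNS and TCP/IP do the job this dict does.
SERVERS = {
    "example.com": {"/videos/42": "a video resource"},
    "example.org": {"/people/alice": "a person resource"},
}

def resolve(url):
    """Two-tier lookup: the host picks the server, the path picks the value."""
    parts = urlsplit(url)
    store = SERVERS[parts.hostname]  # tier 1: which server owns this key?
    return store[parts.path]         # tier 2: that server resolves the key

print(resolve("http://example.com/videos/42"))  # a video resource
```

An unknown host or path simply raises `KeyError` here; a real client would get a DNS failure or a 404 instead.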

Which One Of These Methods Does Not Belong?

The core operations for a key-value store are get, put, and delete. As you’d expect, each of these corresponds to a well-defined HTTP verb. And by well-defined, I mean that they’re more than just window dressing to indicate intent. For example, a client might cache the response to a GET request precisely because GET is defined to allow that.
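To make the mapping concrete, here is a minimal sketch that models the store as a Python dict keyed by URL. The function names and the sample URL are hypothetical; the point is only that each core verb corresponds to one ordinary dictionary operation.

```python
# Hypothetical in-memory store: keys are URLs, values are representations.
store = {}

def get(url):
    """GET: read the value stored under url (None if absent). Never modifies the store."""
    return store.get(url)

def put(url, value):
    """PUT: store value under url, replacing whatever was there before."""
    store[url] = value

def delete(url):
    """DELETE: remove url from the store; deleting a missing key is a no-op here."""
    store.pop(url, None)

put("/people/alice", {"name": "Alice"})
print(get("/people/alice"))  # {'name': 'Alice'}
delete("/people/alice")
print(get("/people/alice"))  # None
```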

But HTTP includes a fourth verb, POST, which provides for cases where strict key-value store semantics don’t suffice. Rather than take the pedantic tack of insisting that everything fit into a single abstraction, HTTP gives you POST as a fallback. Unfortunately, for historical reasons, this led developers to misunderstand and overuse POST, which, in turn, contributed heavily to the confusion that surrounds HTTP to this day.

The Supporting Cast

So that accounts for the existence of POST. What about PATCH, HEAD, OPTIONS, and so forth? When looking at all these methods, it’s easy to lose sight of the underlying abstraction that HTTP provides. It’s important to understand that these other methods exist largely in support of GET, PUT, and DELETE.

`PATCH`: Small Alterations

Let’s start with PATCH, which is a variation on PUT. For large resources, we might not want to update the entire resource. We can specify byte ranges using HTTP headers, but sometimes even this isn’t enough. Sometimes we want to provide a logical description of an update, such as “update the name and date of birth,” which doesn’t correspond to any byte range. For these cases, we can use PATCH.
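PATCH itself (RFC 5789) doesn’t say what a patch document looks like; the format is identified by the request’s media type. As one illustration, here is a minimal sketch of JSON Merge Patch (RFC 7396) semantics, where the patch lists only the fields to change and a JSON `null` (Python `None`) removes a field. Choosing this format, and the function name `merge_patch`, are assumptions for the example, not something HTTP requires.

```python
def merge_patch(target, patch):
    """Apply a patch using JSON Merge Patch (RFC 7396) semantics; returns a new value."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target outright
    # Copy so the original resource is left untouched.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)  # merge nested objects
    return result

person = {"name": "Alice", "dob": "1990-01-01", "email": "alice@example.com"}
print(merge_patch(person, {"name": "Alice Smith", "dob": "1991-02-03"}))
# {'name': 'Alice Smith', 'dob': '1991-02-03', 'email': 'alice@example.com'}
```

The patch describes the change logically (“set name and dob”), so the client never needs to know how the resource is laid out in bytes.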

`HEAD`: Tell Me About Yourself

The HEAD method is an analogous variation on GET. We might not want to GET an entire resource; we might just be interested in information about the resource. For example, does it even exist? When was it last modified? The HEAD method works just like GET, except it doesn’t actually return the resource, just the headers, or metadata, about the resource.
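One way to see the relationship: a HEAD handler can simply run the GET handler and throw the body away, so the status and headers are guaranteed to match. This is a sketch under assumptions: the `(status, headers, body)` tuple shape and the `resource` dict fields are hypothetical, not any particular framework’s API.

```python
from email.utils import formatdate

def handle_get(resource):
    """Hypothetical GET handler returning (status, headers, body)."""
    body = resource["body"].encode("utf-8")
    headers = {
        "Content-Type": resource["media_type"],
        "Content-Length": str(len(body)),
        "Last-Modified": formatdate(resource["mtime"], usegmt=True),
    }
    return 200, headers, body

def handle_head(resource):
    """HEAD: same status and headers as GET, but no body."""
    status, headers, _body = handle_get(resource)
    return status, headers, b""

article = {"body": "Hello, world", "media_type": "text/plain; charset=utf-8", "mtime": 0}
status, headers, body = handle_head(article)
print(status, headers["Content-Length"], body)  # 200 12 b''
```

Note that `Content-Length` still reports the size of the body a GET would return, which is exactly the kind of metadata a client might use HEAD to learn.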

`OPTIONS`: Reflecting On What We Can Do

The OPTIONS method provides for limited reflection on an HTTP server. Recall that URLs, which are the keys in our key-value store, have a host component that tells us which server can resolve the URL. We can also simply ask a given host to tell us about a resource, or even about the server itself. In practice, OPTIONS isn’t used much, except for CORS preflight requests, which browsers send to ask a server whether a cross-origin request is allowed.
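A server typically answers OPTIONS by listing the methods a resource supports in the `Allow` header, and a request target of `*` asks about the server as a whole. The sketch below assumes a hypothetical per-resource method table, `ALLOWED`; a real server would derive this from its routing configuration.

```python
# Hypothetical table of which methods each resource supports.
ALLOWED = {
    "/videos/42": {"GET", "HEAD", "OPTIONS"},
    "/people/alice": {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"},
}

def handle_options(url):
    """OPTIONS: report supported methods via the Allow header."""
    if url == "*":
        # "*" asks about the server itself: here, the union of all resources' methods.
        methods = set().union(*ALLOWED.values())
    else:
        methods = ALLOWED.get(url)
        if methods is None:
            return 404, {}
    return 200, {"Allow": ", ".join(sorted(methods))}

print(handle_options("/videos/42"))  # (200, {'Allow': 'GET, HEAD, OPTIONS'})
```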

Choose Your Own Method

The protocol is extensible, so it’s possible to define other methods. For example, WebDAV adds COPY and MOVE methods, allowing you to copy or move a resource from one key (URL) to another. However, most of the time, you’re better off just sticking with the core methods because (a) their behavior is well-defined (see below) and (b) there’s a lot of software out there that takes advantage of this behavior.
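For comparison, here is a hypothetical sketch of the effect of a MOVE built only from the core operations, again modeling the store as a Python dict. WebDAV defines MOVE as a single request; this composite version takes three steps, and that difference is the trade-off to keep in mind.

```python
# Hypothetical store: keys are URLs, values are representations.
store = {"/drafts/post-1": "Hello, world"}

def move(src, dst):
    """Approximate a MOVE using only GET, PUT, and DELETE semantics.

    These are three separate steps, so this is not atomic: a failure after
    the PUT but before the DELETE would leave the value under both keys.
    """
    value = store[src]  # GET src (raises KeyError if absent, like a 404)
    store[dst] = value  # PUT dst
    del store[src]      # DELETE src

move("/drafts/post-1", "/posts/post-1")
print(store)  # {'/posts/post-1': 'Hello, world'}
```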

In the end, though, the real stars of the show are GET, PUT, and DELETE. The POST method is the workhorse that takes over where the key-value abstraction leaves off, making it possible to request that a server take an arbitrary action.

Safety and Idempotence: Not a Sex-Ed Class

As we discussed earlier, GET, PUT, and DELETE are well-defined, which makes it possible to reason about them. We know that GET isn’t going to delete anything. We know that DELETE will. Thus, we know that it’s safe to call GET, but not DELETE. It’s also not safe to call PUT, because the server will, if possible, replace the value of the given resource with whatever we send it.

Safe

However, we don’t know much at all about POST, because its behavior isn’t well-defined. The server might do any number of things, including creating, updating, or even deleting resources. It might debit a bank account or call you a taxi. None of these actions has anything to do with our key-value store. It’s basically a remote procedure call. As such, HTTP makes no guarantees about POST. Thus, we say POST, unlike GET, isn’t safe.
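To see why HTTP can promise nothing general about POST, consider this hypothetical POST handler that behaves like a remote procedure call: the request body names a procedure and its arguments, and the server runs it. The action table, state, and body shape are all made up for illustration.

```python
# Hypothetical server-side state that POST actions can touch.
accounts = {"alice": 100}
taxis = []

def debit(account, amount):
    accounts[account] -= amount

def call_taxi(address):
    taxis.append(address)

# Hypothetical action table: the POST body picks which procedure runs.
ACTIONS = {"debit": debit, "call_taxi": call_taxi}

def handle_post(body):
    """POST as a remote procedure call: the body, not HTTP, decides what happens."""
    procedure = ACTIONS[body["action"]]
    procedure(**body["args"])
    return 200

handle_post({"action": "debit", "args": {"account": "alice", "amount": 30}})
handle_post({"action": "call_taxi", "args": {"address": "1 Main St"}})
print(accounts, taxis)  # {'alice': 70} ['1 Main St']
```

Nothing at the HTTP level distinguishes a harmless action from one that moves money, so a client or intermediary has to treat every POST as potentially unsafe.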

Idempotent

If we can call a method repeatedly and not worry too much about it, that means it’s idempotent. This is good to know because it affects how you can use such methods. For example, you can write retry code fearlessly with idempotent methods. Even if it turns out your original request went through, redundant requests don’t really hurt anything.
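Here is a minimal sketch of that kind of retry logic. It assumes a hypothetical `send` callable that performs the request and raises Python’s built-in `ConnectionError` on failure; the method set follows the HTTP specification’s list of idempotent methods. Idempotent methods get several attempts, while anything else gets exactly one.

```python
import time

# Idempotent methods per the HTTP spec (RFC 7231, section 4.2.2).
IDEMPOTENT = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}

def send_with_retry(method, send, attempts=3, delay=0.5):
    """Call send() and retry on ConnectionError, but only for idempotent methods.

    `send` is a hypothetical zero-argument callable that performs the request.
    Non-idempotent methods such as POST get exactly one try, because a request
    that appeared to fail may still have gone through on the server.
    """
    tries = max(1, attempts) if method in IDEMPOTENT else 1
    for attempt in range(tries):
        try:
            return send()
        except ConnectionError:
            if attempt == tries - 1:
                raise  # out of tries: surface the failure to the caller
            time.sleep(delay)  # brief pause before retrying
```

With a `send` that fails twice and then succeeds, a PUT succeeds on its third attempt, while the same first failure on a POST is raised immediately rather than risking a duplicate side effect.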

Obviously, safe methods are all idempotent: it doesn’t matter how many times you call them, because they don’t change anything. PUT is idempotent, too: replacing a resource with the same value twice leaves it in the same state as replacing it once. Equally obviously, DELETE is idempotent, because you can only delete something once; after that, further deletes are redundant. But, again, POST isn’t idempotent, because we don’t actually know for sure what it’s doing. You might double the charges to someone’s bank account or call two taxis. So, when using POST, you have to be careful. This, again, is why it’s useful to prefer GET, PUT, and DELETE when you can.
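A quick way to check these claims is to compare the state left by one call with the state left by two. The sketch below does that for toy PUT, DELETE, and POST operations on a dict; the operations and URLs are hypothetical. It compares only the resulting state, not the responses, which matches how idempotence is defined: a second DELETE may well return 404, but the resource is just as gone.

```python
import copy

def same_after_repeat(op, initial):
    """True if applying op twice leaves the same state as applying it once."""
    once, twice = copy.deepcopy(initial), copy.deepcopy(initial)
    op(once)
    op(twice)
    op(twice)
    return once == twice

def put_alice(store):
    store["/people/alice"] = {"name": "Alice"}  # PUT: replace the value

def delete_alice(store):
    store.pop("/people/alice", None)  # DELETE: remove if present

def post_taxi(store):
    # Hypothetical POST: each call creates a new, server-chosen resource.
    store[f"/taxis/{len(store) + 1}"] = {"pickup": "1 Main St"}

print(same_after_repeat(put_alice, {}))                                       # True
print(same_after_repeat(delete_alice, {"/people/alice": {"name": "Alice"}}))  # True
print(same_after_repeat(post_taxi, {}))                                       # False
```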

`POST` as Create

For a long time, Web developers were obsessed with mapping HTTP methods to CRUD database operations. Obviously, since HTTP sees the Internet as a giant key-value store, this was a doomed effort: there’s no create, the C in CRUD. That’s by design, not accident, because, again, the underlying model is closer to a hash table than a relational database.

Now, there’s an entirely different question about why a key-value store is the right model. Or, put another way, why doesn’t HTTP have a first-class create method in the first place? The short answer is that it’s redundant, since you can already implicitly create a new resource with PUT (which can return a 201 Created).

The confusion was partly due to an idiomatic use of POST to create a new resource. This is a completely valid thing to do, but it’s important to understand that POST doesn’t actually mean create. Since HTTP doesn’t define its behavior, it can be used to create new resources, and often is. But you could also, if you knew the key (URL), simply PUT a value to it.
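The two ways of creating a resource differ mainly in who picks the key. In this hypothetical sketch, PUT writes to a URL the client already knows and reports 201 Created only when the key was new, while the common POST idiom lets the server choose the URL and report it in a `Location` header. Returning 200 on replacement is an assumption here; 204 No Content would also be reasonable.

```python
import itertools

# Hypothetical in-memory server state.
store = {}
_next_id = itertools.count(1)

def put(url, value):
    """PUT to a client-chosen key: 201 Created if new, 200 OK if replaced."""
    status = 200 if url in store else 201
    store[url] = value
    return status

def post_to_collection(collection, value):
    """A common POST idiom: the server chooses the new key and returns it in Location."""
    url = f"{collection}/{next(_next_id)}"
    store[url] = value
    return 201, {"Location": url}

print(put("/people/alice", {"name": "Alice"}))         # 201
print(put("/people/alice", {"name": "Alice"}))         # 200
print(post_to_collection("/people", {"name": "Bob"}))  # (201, {'Location': '/people/1'})
```

Repeating the PUT leaves the store unchanged, while repeating the POST would create `/people/2`, which is the idempotence difference from the previous section showing up again.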

Until Next Time…

This is a nice segue into our next topic, URLs. These are the keys in our global key-value store and, like much of the rest of HTTP, they’re surprisingly misunderstood.