HTTP Made Simple, Part 2: Method Safety And Idempotence
In Part 1, we said that HTTP views the world as a distributed key-value store.
A side note: you can make a pretty strong argument that the HTTP model is actually a distributed hash table. The protocol's designers likely didn't think in those terms (the first documented use of the term was in 1986 in reference to Linda, so it's possible those ideas influenced HTTP somehow), but the similarities are profound. DHT schemes usually rely on a two-tier system for resolving keys: a key is hashed to a secondary server, which can actually resolve the reference. This precisely mirrors what HTTP does with URLs, which include a host component. Effectively, HTTP leverages TCP/IP and DNS to resolve a reference to a specific server. This partly explains why it scales so well. It's an extremely clever idea that we tend to take entirely for granted today. HTTP lacks the recovery capabilities associated with DHTs, but, even there, redirects arguably perform a similar function.
The URLs are the keys and the values are resources. Resources, in turn, are actually dictionaries of different representations, or formats, for the resource. For example, a video resource might have different encodings, each of which is a representation that can be accessed if you know its media type. In Part 2, we're going to begin to explore this idea in more detail, starting with the verb we've conspicuously ignored up until now:
Which One Of These Methods Does Not Belong?
The core operations for a key-value store are get, put, and delete. As you'd expect, each of these corresponds to a well-defined HTTP verb. And by well-defined, I mean that they're more than just window-dressing to indicate intent. For example, a client might cache the response to a GET request exactly because it's defined to allow that.
But HTTP includes a fourth verb, POST, which provides for cases where strict key-value store semantics don't suffice. Rather than take the pedantic tack of insisting that everything fit into a single abstraction, HTTP gives you POST as a fallback. Unfortunately, for historical reasons, this led developers to misunderstand and overuse POST, which, in turn, contributed heavily to the confusion that surrounds HTTP to this day.
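To make the key-value analogy a bit more concrete, here is a minimal Python sketch of the three core operations against an in-memory dictionary. The class name ResourceStore and the example URL are made up for illustration; this is a toy model of the abstraction, not of how any real server works.

```python
class ResourceStore:
    """A toy stand-in for the web: URLs are keys, resources are values."""

    def __init__(self):
        self._resources = {}

    def get(self, url):
        # GET: read the value for a key; never changes the store.
        return self._resources.get(url)

    def put(self, url, value):
        # PUT: set the value for a key, replacing whatever was there.
        self._resources[url] = value

    def delete(self, url):
        # DELETE: remove the key; deleting a missing key is a no-op.
        self._resources.pop(url, None)


store = ResourceStore()
store.put("http://example.com/greeting", "hello")
print(store.get("http://example.com/greeting"))
store.delete("http://example.com/greeting")
print(store.get("http://example.com/greeting"))
```

Notice that nothing here corresponds to "do something arbitrary", which is the gap POST fills.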
The Supporting Cast
So that accounts for the existence of POST. What about PATCH, HEAD, OPTIONS, and so forth? It's easy when looking at all these methods to lose sight of the underlying abstraction that HTTP provides. It's important to understand that these other methods exist largely in support of the core methods.
PATCH: Small Alterations
Let's start with PATCH, which is a variation on PUT. For large resources, we might not want to update the entire resource. We can specify byte ranges using HTTP headers, but sometimes even this isn't enough. Sometimes we want to provide a logical description of an update, such as "update the name and date of birth," which doesn't strictly correspond to a byte range. For these cases, we can use PATCH.
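As a rough illustration of what a "logical" update might look like, the sketch below applies a partial change to a dictionary, loosely in the spirit of JSON Merge Patch. The function name apply_patch and the sample fields are assumptions for this example. HTTP itself doesn't prescribe a patch format; that's left to the media type of the request body.

```python
def apply_patch(resource, patch):
    """Return a copy of `resource` with the fields in `patch` applied.

    A value of None removes that field, a dict is merged recursively
    when the existing value is also a dict, and anything else replaces
    the old value.
    """
    result = dict(resource)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)
        else:
            result[key] = value
    return result


person = {
    "name": "Ada",
    "date_of_birth": "1815-12-01",
    "email": "ada@example.com",
}
# Only the fields we want to change are sent; email is left alone.
patch = {"name": "Ada Lovelace", "date_of_birth": "1815-12-10"}
print(apply_patch(person, patch))
```

The point is that the client describes the change rather than resending the whole resource, which is what a PUT would require.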
HEAD: Tell Me About Yourself
The HEAD method is an analogous variation on GET. We might not want to GET an entire resource; we might just be interested in information about the resource. For example, does it even exist? When was it last modified? The HEAD method works just like GET, except it doesn't actually return the resource, just the headers, or metadata, about the resource.
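Here's a small sketch of the difference using only Python's standard library. It starts a throwaway server on a local port and sends it both a HEAD and a GET. The handler, the body text, and the helper name fetch are all invented for this example; treat it as a demonstration of the idea rather than a recommended server setup.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello, world\n"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(method, port):
    """Send one request and return (status, headers, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/")
    response = conn.getresponse()
    body = response.read()
    headers = dict(response.getheaders())
    conn.close()
    return response.status, headers, body


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

head_status, head_headers, head_body = fetch("HEAD", port)
get_status, get_headers, get_body = fetch("GET", port)

server.shutdown()
server.server_close()

print("HEAD:", head_status, head_headers["Content-Length"], head_body)
print("GET: ", get_status, get_headers["Content-Length"], get_body)
```

Both responses should report the same Content-Length, but only the GET response actually carries the bytes.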
OPTIONS: Reflecting On What We Can Do
The OPTIONS method provides for limited reflection on an HTTP server. Recall that URLs, which are the keys in our key-value store, have a host component which tells us which server can resolve the URL. We can also simply ask a given host to tell us about a resource, or even about the server itself. In practice, OPTIONS isn't used much, except with CORS.
Choose Your Own Method
The protocol is extensible, so it's possible to define other methods. For example, WebDAV adds COPY and MOVE methods, allowing you to copy or move a resource from one key (URL) to another. However, most of the time, you're better off just sticking with the core methods because (a) their behavior is well-defined (see below) and (b) there's a lot of software out there that takes advantage of this behavior.
In the end, though, the real stars of the show are GET, PUT, DELETE, and POST. The POST method is the workhorse that takes over where the key-value abstraction leaves off, making it possible to request that a server take an arbitrary action.
Safety and Idempotence: Not a Sex-Ed Class
As we discussed earlier, GET, PUT, and DELETE are well-defined, which makes it possible to reason about them. We know that GET isn't going to delete anything. We know that DELETE will. Thus, we know that it's safe to call GET, but not DELETE. It's also not safe to call PUT, because the server will, if possible, replace the value of the given resource with whatever we send it.
However, we don't know much at all about POST, because its behavior isn't well-defined. The server might do any number of things, including creating, updating or even deleting resources. It might debit a bank account or call you a taxi, none of which has anything to do with our key-value store. It's basically a remote procedure call. As such, HTTP makes no guarantees about POST. Thus, we say that POST, unlike GET, isn't safe.
If we can call a method repeatedly and not worry too much about it, because the end result is the same as calling it once, that means it's idempotent. This is good to know because it affects how you might use a method. For example, you can write retry code fearlessly with idempotent methods. Even if it turns out your original request went through, redundant requests don't really hurt anything.
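A retry helper along these lines might look like the following sketch. The function send_with_retry, the send callable it takes, and the fake transport flaky_send are all hypothetical names for this example; the set of idempotent methods reflects how the HTTP specification classifies them.

```python
# Methods the HTTP specification defines as idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def send_with_retry(send, method, url, attempts=3):
    """Call send(method, url), retrying on ConnectionError only when the
    method is idempotent. `send` is any callable supplied by the caller."""
    tries = attempts if method in IDEMPOTENT_METHODS else 1
    for attempt in range(1, tries + 1):
        try:
            return send(method, url)
        except ConnectionError:
            if attempt == tries:
                raise  # out of attempts: let the caller decide


# A fake transport that fails twice before succeeding, for demonstration.
calls = []


def flaky_send(method, url):
    calls.append(method)
    if len(calls) < 3:
        raise ConnectionError("simulated network failure")
    return "200 OK"


print(send_with_retry(flaky_send, "PUT", "http://example.com/item"))
print(len(calls), "attempts were made")
```

With a non-idempotent method, the helper gives up after the first failure instead of guessing, since it can't know whether the first request actually went through.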
Obviously, safe methods are all idempotent. It doesn't matter how many times you call them because they're safe. PUT is idempotent too, because sending the same value twice leaves the resource in the same state as sending it once. Equally obviously, DELETE is also idempotent, because you can only delete something once, and after that, it's overkill. But, again, POST isn't idempotent, because we don't actually know for sure what it's doing. You might double the charges to someone's bank account or call two taxis. So, when using POST, you have to be careful. This, again, is why it's useful to prefer GET, PUT, and DELETE when you can.
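One way to see the difference is to compare the state of a store after calling an operation once with the state after calling it twice. The sketch below does that with a plain dictionary. The function names and the taxi URLs are invented for illustration, and the post function shows just one of the many things a real POST might do.

```python
def put(store, url, value):
    store[url] = value


def delete(store, url):
    store.pop(url, None)


def post(store, collection, value):
    # One of many things a POST might do: add a new entry under a
    # key the server chooses.
    url = f"{collection}/{len(store) + 1}"
    store[url] = value


def same_after_repeat(action):
    """True if calling `action` twice leaves the same state as calling it once."""
    once, twice = {}, {}
    action(once)
    action(twice)
    action(twice)
    return once == twice


print("PUT:   ", same_after_repeat(lambda s: put(s, "/taxis/1", "booked")))
print("DELETE:", same_after_repeat(lambda s: delete(s, "/taxis/1")))
print("POST:  ", same_after_repeat(lambda s: post(s, "/taxis", "booked")))
```

Repeating the PUT or the DELETE changes nothing, while repeating this particular POST books a second taxi.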
POST as Create
For a long time, Web developers were obsessed with mapping HTTP methods to CRUD database operations. Obviously, since HTTP sees the Internet as a giant key-value store, this was a doomed effort: there's no create, the C in CRUD. That's by design, not accident, because, again, the underlying model is closer to a hash table than a relational database.
Now, there's an entirely different question about why a key-value store is the right model. Or, put another way, why doesn't HTTP have a first-class create method in the first place? The short answer is that it's redundant since you can already implicitly create a new resource with PUT (which can return a 201 Created status).
The confusion was partly due to an idiomatic use of POST to create a new resource. This is a completely valid thing to do, but it's important to understand that POST doesn't actually mean create. Since HTTP doesn't define its behavior, it can be used to create new resources, and often is. But you could also, if you knew the key (URL), simply PUT a value to it.
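The two ways of creating a resource can be sketched side by side. In this toy Python version, put and post_create are hypothetical helpers operating on a plain dictionary, and the status codes are the ones a server would typically send back (204 would be equally valid where the sketch returns 200).

```python
import uuid


def put(store, url, value):
    """Create or replace the resource at a key the client already knows."""
    created = url not in store
    store[url] = value
    return 201 if created else 200  # 201 Created vs. 200 OK


def post_create(store, collection_url, value):
    """One idiomatic use of POST: the server picks the key and reports it back."""
    url = f"{collection_url}/{uuid.uuid4()}"
    store[url] = value
    return 201, url


store = {}
print(put(store, "/people/ada", {"name": "Ada"}))           # client chose the key
print(put(store, "/people/ada", {"name": "Ada Lovelace"}))  # same key: a replace
status, new_url = post_create(store, "/people", {"name": "Grace"})
print(status, new_url)                                      # server chose the key
```

The practical difference is who chooses the key: with PUT the client does, and with this use of POST the server does, which is why repeating the POST would create a second resource.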
Until Next Time…
This is a nice segue into our next topic, URLs. These are the keys in our global key-value store and, like much of the rest of HTTP, they're surprisingly misunderstood.