Fixing `max number of clients` Errors in AWS ElastiCache Redis

More is better, right?

More is not always better.

We recently saw an issue where the primary node of an AWS ElastiCache Redis cluster refused new connections and responded with this error: `ERR max number of clients reached`

CloudWatch showed that Redis had more than 60,000 current clients.
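To check the same number yourself, you can query CloudWatch directly. The sketch below makes several assumptions. It uses the `CurrConnections` metric in the `AWS/ElastiCache` namespace, dimensioned by `CacheClusterId` and `CacheNodeId`. It expects a client object shaped like a boto3 CloudWatch client (`boto3.client("cloudwatch")`). The helper name and the default node ID `0001` are hypothetical. Check these against your own cluster before relying on them.

```python
from datetime import datetime, timedelta, timezone


def peak_connections(cloudwatch, cache_cluster_id, cache_node_id="0001",
                     hours=24, period=300):
    """Return the highest CurrConnections value for one node, or None.

    Hypothetical helper. `cloudwatch` is assumed to be a boto3 CloudWatch
    client; any object with a compatible get_metric_statistics method works.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName="CurrConnections",
        Dimensions=[
            {"Name": "CacheClusterId", "Value": cache_cluster_id},
            # Assumed: a Redis node reports under node ID "0001".
            {"Name": "CacheNodeId", "Value": cache_node_id},
        ],
        StartTime=start,
        EndTime=end,
        Period=period,  # seconds per datapoint
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    return max(point["Maximum"] for point in datapoints)
```

For example, `peak_connections(boto3.client("cloudwatch"), "my-redis-001")` (hypothetical cluster ID) returns the peak over the last day. A value approaching 65,000 points to the problem described here.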

If this happens to you, here’s how to fix it.

Background

In this particular AWS environment, EC2 instances launch and terminate throughout the day in a way that leaves unused Redis connections open. Out of the box, ElastiCache Redis doesn’t close idle connections, so those connections never go away.

As a result, thousands of new connections opened but none ever closed. Once enough of them piled up to hit the connection limit, Redis stopped accepting new connections.

Two Immediate Solutions

If you’re just trying to put out the fire, there are two ways to immediately address this issue:

  1. restarting the primary node or
  2. failing over to another node in the replication group.

Restarting a node is trivial, but you’ll lose the contents of the cache in the reboot. Failing over is not trivial, so read the AWS documentation on failover within a Multi-AZ replication group before you try it.

ElastiCache Redis Configuration Changes

A node restart or failover solves the immediate problem, but for a more stable solution, you may want to change the default ElastiCache Redis configuration parameters.

Here are three related ElastiCache Redis configuration parameters. AWS doesn’t let you change the first one, maxclients, but you should be aware of it. You can configure the other two, timeout and tcp-keepalive, so you might want to experiment with them and choose the values that work best for you.

maxclients

The maxclients option sets the maximum number of client connections allowed at the same time. Once the limit is reached, Redis rejects every new connection: it replies with the error `ERR max number of clients reached` and closes the connection.

AWS sets this option to 65000. You can change it if you run Redis on your own server, but in ElastiCache it can’t be modified.

In addition, on the Linux hosts where ElastiCache runs Redis, AWS sets the following open-file limits for the rdsdb user (the account Redis runs as) in /etc/security/limits.conf:

```
#<domain>      <type>  <item>         <value>
#
rdsdb           soft    nofile          65536
rdsdb           hard    nofile          65536
```
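These numbers fit together. As we understand Redis’s documentation on client handling, Redis reserves 32 file descriptors for internal use on top of maxclients. That means a nofile limit of 65536 leaves room for 65504 clients, just above ElastiCache’s fixed 65000. The sketch below checks that arithmetic against limits.conf text you supply. The helper names are hypothetical, and the 32-descriptor reserve is an assumption taken from the Redis docs.

```python
# Assumed from Redis's client-handling docs: descriptors kept for internal use.
RESERVED_FDS = 32


def parse_nofile_limits(limits_text, user):
    """Return {"soft": n, "hard": n} nofile limits for `user` in limits.conf text."""
    limits = {}
    for raw in limits_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # not a <domain> <type> <item> <value> entry
        domain, kind, item, value = fields
        if domain == user and item == "nofile":
            limits[kind] = int(value)
    return limits


def client_capacity(nofile_soft_limit):
    """Clients Redis can serve once its reserved descriptors are set aside."""
    return nofile_soft_limit - RESERVED_FDS
```

With the rdsdb entries above, `client_capacity(65536)` is 65504, which is at least the 65000 maxclients value.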

Again, you can’t edit maxclients, but it’s useful to know about. ElastiCache disables the next two options by default, but we recommend that you enable them.

timeout

The timeout option closes a client connection after the client has been idle for N seconds; a value of 0 disables it. You can set this to 3600 seconds (one hour), or whatever works best for you.
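To see which connections a given timeout would affect, look at the `idle` field (seconds since the client’s last command) in the output of the Redis `CLIENT LIST` command. The sketch below parses that output as plain text, so it needs no Redis client library. The helper names and sample addresses are hypothetical. It is only an approximation, because Redis’s own timeout logic exempts some connection types, such as replicas.

```python
def parse_client_list(output):
    """Parse CLIENT LIST output into one dict of key=value fields per connection."""
    clients = []
    for line in output.strip().splitlines():
        fields = {}
        for token in line.split():
            key, _, value = token.partition("=")
            fields[key] = value
        clients.append(fields)
    return clients


def idle_clients(output, timeout_seconds):
    """Return addresses of connections idle longer than timeout_seconds."""
    return [
        client.get("addr", "?")
        for client in parse_client_list(output)
        if int(client.get("idle", "0")) > timeout_seconds
    ]
```

Counting the result for `timeout_seconds=3600` gives a rough idea of how many connections a one-hour timeout would close.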

tcp-keepalive

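The tcp-keepalive option makes Redis send periodic TCP keepalive probes to clients that have gone quiet. This lets it detect dead peers, such as a client on an EC2 instance that has terminated, and close their connections. The value is the probe period in seconds; you can set it to 60 seconds, or whatever works best for you.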
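ElastiCache doesn’t let you modify a default parameter group. To apply these settings, you create a custom parameter group, attach it to your cluster, and set the values there. The sketch below covers only that last step. It uses boto3’s `modify_cache_parameter_group` and assumes `elasticache` is `boto3.client("elasticache")`. The helper names are hypothetical, and the defaults simply mirror the 3600 and 60 second suggestions above.

```python
def idle_connection_parameters(timeout=3600, tcp_keepalive=60):
    """Build ParameterNameValues for the timeout and tcp-keepalive settings."""
    # Parameter values are passed to the ElastiCache API as strings.
    return [
        {"ParameterName": "timeout", "ParameterValue": str(timeout)},
        {"ParameterName": "tcp-keepalive", "ParameterValue": str(tcp_keepalive)},
    ]


def apply_idle_connection_settings(elasticache, parameter_group,
                                   timeout=3600, tcp_keepalive=60):
    """Set timeout and tcp-keepalive on a custom cache parameter group.

    Hypothetical helper; `elasticache` is assumed to be boto3.client("elasticache").
    """
    return elasticache.modify_cache_parameter_group(
        CacheParameterGroupName=parameter_group,
        ParameterNameValues=idle_connection_parameters(timeout, tcp_keepalive),
    )
```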