WebSocket Load Balancing Explained
Load balancing WebSockets is trickier than HTTP — persistent connections and stateful servers mean round-robin doesn't cut it. Here's what actually works.
Load balancing HTTP is straightforward. Requests are stateless and short-lived, so round-robin across your server fleet works fine. Each request can go to any server; none of them need to remember anything about previous requests.
WebSockets break all of these assumptions. Connections are persistent — lasting minutes or hours. They're stateful — a server holds a specific client's connection and can only deliver messages through that specific socket. And they're long-lived enough that connection distribution across servers drifts significantly over time.
Here's how WebSocket load balancing actually works.
Why Round-Robin Fails
Imagine two servers. A client connects to Server 1. Later, your application publishes a message that should reach that client. The publishing code fires a request to the load balancer. Round-robin sends it to Server 2. Server 2 has no connection to this client. The message is never delivered.
Client ──── WebSocket ──── Server 1 (has the connection)
✓ can deliver messages
Application publishes ──── Load Balancer ──── Server 2 (no connection)
✗ message dropped
With HTTP, every server can handle every request because requests carry all necessary state. With WebSockets, the state — the open socket — lives on one specific server.
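To make that concrete, here's a minimal sketch of the per-server state (the names are illustrative): each process can deliver only through the sockets it holds in its own memory.
// Local to this process; no other server can see or use these sockets.
const connectionsByUser = new Map(); // userId -> WebSocket, populated on connection

function deliverTo(userId, message) {
  const ws = connectionsByUser.get(userId);
  if (!ws) return false; // on Server 2 this lookup misses; the socket lives on Server 1
  ws.send(message);
  return true;
}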
Solution 1: Sticky Sessions
Sticky sessions (session affinity) tell the load balancer to always route a given client to the same backend server. Once a client is assigned to Server 1, every subsequent request from that client goes to Server 1.
nginx with IP hash
upstream ws_backends {
    ip_hash;
    server backend1.internal:8080;
    server backend2.internal:8080;
    server backend3.internal:8080;
}

server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/certs/server.pem;
    ssl_certificate_key /etc/ssl/private/server.key;

    location / {
        proxy_pass http://ws_backends;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
proxy_http_version 1.1 and the Upgrade/Connection headers are mandatory for WebSocket proxying. Without them, nginx will not forward the upgrade request correctly.
Cookie-based stickiness with HAProxy
IP hashing fails when many clients share an IP (corporate NAT, mobile carriers). Cookie-based stickiness is more reliable:
backend ws_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server ws1 10.0.0.1:8080 check cookie ws1
    server ws2 10.0.0.2:8080 check cookie ws2
    server ws3 10.0.0.3:8080 check cookie ws3
HAProxy injects a SERVERID cookie during the HTTP upgrade; subsequent requests carry the cookie, telling the load balancer which backend to use.
Downsides of sticky sessions
Uneven load. Servers that handle long-lived connections accumulate them over time. A server that got hit during a traffic peak stays overloaded even as traffic drops. Servers added later receive fewer connections than they should.
Failover. When a sticky server dies, all its connections are lost. Clients reconnect and the load balancer distributes them — but any in-memory subscription state is gone. Clients must re-subscribe.
Deployments. Rolling restarts require graceful connection draining. Without it, clients see abrupt disconnects on every deploy.
Solution 2: Stateless Servers with Redis Pub/Sub
The more robust approach eliminates stickiness entirely. Make your WebSocket servers stateless with respect to message delivery by using Redis as the fanout layer.
Every server subscribes in Redis to each channel it has local connections for. When any server needs to publish a message, it publishes to Redis; Redis delivers the message to every subscribed server, and each server fans it out to its local connections.
Server 1 has clients A, B subscribed to "chat:room-1"
Server 2 has clients C, D subscribed to "chat:room-1"
Publisher → redis PUBLISH "chat:room-1" message
↓
Server 1 listener → A, B receive message
Server 2 listener → C, D receive message
Now any server can handle any client. The load balancer can use pure round-robin. A dead server means its local clients reconnect to any healthy server — no sticky routing required.
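Here's roughly what the subscribing side looks like in Node.js with the ioredis client (the client library is an assumption; any Redis client with pub/sub support works the same way):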
import Redis from 'ioredis';

const redisSubscriber = new Redis(process.env.REDIS_URL);
// channelName -> Set of local WebSocket connections subscribed to it
const localSubscribers = new Map();

// A single 'message' listener covers every channel this server subscribes to
redisSubscriber.on('message', (channel, message) => {
  const channelName = channel.replace(/^app:/, ''); // strip the "app:" namespace prefix
  const localConns = localSubscribers.get(channelName) ?? new Set();
  for (const conn of localConns) {
    conn.send(message);
  }
});

function subscribeToChannel(channelName) {
  redisSubscriber.subscribe(`app:${channelName}`);
}
// Publishing — any server can do this
const redisPublisher = new Redis(process.env.REDIS_URL);
await redisPublisher.publish(`app:${channelName}`, JSON.stringify(payload));
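The snippets above assume something populates localSubscribers as clients subscribe. Here's a minimal sketch of that wiring, assuming the ws library and a JSON subscribe message of the shape { action: 'subscribe', channel: ... } (both illustrative choices, not fixed by anything above):
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  const joined = new Set(); // channels this particular socket subscribed to

  ws.on('message', (raw) => {
    const msg = JSON.parse(raw);
    if (msg.action !== 'subscribe') return;
    if (!localSubscribers.has(msg.channel)) {
      localSubscribers.set(msg.channel, new Set());
      subscribeToChannel(msg.channel); // first local subscriber: join the Redis channel
    }
    localSubscribers.get(msg.channel).add(ws);
    joined.add(msg.channel);
  });

  ws.on('close', () => {
    // Drop this socket from every channel it joined so fanout skips dead connections
    for (const channel of joined) {
      localSubscribers.get(channel)?.delete(ws);
    }
  });
});
When a channel's local set empties out, you could also unsubscribe from the Redis channel to keep the subscriber connection lean; that bookkeeping is omitted here.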
L4 vs L7 Load Balancing
Most HTTP load balancers operate at Layer 7 — they parse HTTP headers, route based on URL paths, and terminate TLS. This flexibility comes with overhead.
For WebSockets, once the connection is established, all communication is binary frames — not HTTP. An L7 load balancer adds parsing overhead for the upgrade but adds no value for the ongoing connection. L4 (TCP-level) load balancers are faster for this workload:
# HAProxy — L4 TCP mode for WebSockets
frontend websockets
    bind *:443 ssl crt /etc/ssl/certs/bundle.pem
    mode tcp
    default_backend ws_pool

backend ws_pool
    mode tcp
    balance leastconn
    option tcp-check
    server ws1 10.0.0.1:8080 check
    server ws2 10.0.0.2:8080 check
    server ws3 10.0.0.3:8080 check
leastconn is a better fit than roundrobin for WebSockets: it routes each new connection to the server with the fewest active connections, so distribution rebalances itself over time instead of drifting as long-lived connections accumulate.
Graceful Connection Draining
Rolling deployments disconnect clients unless you drain connections first:
- Mark as draining — stop accepting new WebSocket upgrades (return 503 for new requests)
- Notify clients — send a close frame with code 4000 ("server restarting")
- Clients reconnect — well-implemented clients treat code 4000 as a signal to reconnect immediately, not wait for exponential backoff
- Wait for drain — wait for connection count to reach zero or a timeout (30 seconds)
- Shut down — stop the process
process.on('SIGTERM', async () => {
  console.log('Draining connections...');
  server.close(); // stop accepting new HTTP connections
  for (const conn of activeConnections) {
    conn.close(4000, 'Server restarting');
  }
  await waitForConnectionsToClose({ timeout: 30_000 });
  process.exit(0);
});
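On the client side, the reconnect behavior from step 3 might look like the following sketch. The URL, the backoff numbers, and the subscribe message shape (matching the earlier wiring sketch) are illustrative assumptions:
// Reconnect immediately on a deliberate restart (close code 4000); back off on anything else.
const subscribedChannels = new Set(); // channels to restore after a reconnect
let backoffMs = 1000;

function connect() {
  const ws = new WebSocket('wss://ws.example.com');

  ws.onopen = () => {
    backoffMs = 1000; // reset backoff after a successful connection
    // Server-side subscription state was in memory and is gone, so re-subscribe.
    for (const channel of subscribedChannels) {
      ws.send(JSON.stringify({ action: 'subscribe', channel }));
    }
  };

  ws.onclose = (event) => {
    if (event.code === 4000) {
      connect(); // planned restart: reconnect right away
    } else {
      setTimeout(connect, backoffMs);
      backoffMs = Math.min(backoffMs * 2, 30_000);
    }
  };
}

connect();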
Health Checks
Expose a health endpoint that checks Redis connectivity — since Redis failure affects message delivery even if the WebSocket server process is healthy:
app.get('/health', async (req, res) => {
  try {
    await redis.ping();
    res.json({ status: 'ok', connections: activeConnections.size });
  } catch {
    res.status(503).json({ status: 'degraded', reason: 'redis unreachable' });
  }
});
OS-Level Tuning
For high connection counts, the OS needs tuning beyond the application:
# Raise file descriptor limit (each WebSocket = one fd)
sysctl -w fs.file-max=2000000
# TCP keepalive to detect dead connections
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=6
# Increase accept queue
sysctl -w net.core.somaxconn=65535
# Per-process limit
ulimit -n 1000000
The Managed Alternative
Getting load balancing right — sticky sessions or Redis fanout, L4 vs L7, graceful draining, health checks, OS tuning — is real infrastructure work. It's solvable, but it's time you're not spending on your product.
Managed WebSocket infrastructure like Apinator runs a distributed fleet of servers and handles connection routing, Redis fanout, and rolling deployments internally. Your application publishes events over HTTP. The connection and routing layer is not your problem.
For teams where WebSocket infrastructure is a means rather than an end, that trade-off is almost always worth making.