WebSocket Load Balancing Explained
Load balancing WebSockets is trickier than HTTP — persistent connections and stateful servers mean round-robin doesn't cut it. Here's what actually works.
Load balancing HTTP is straightforward. Requests are stateless and short-lived, so round-robin across your server fleet works fine. Each request can go to any server; none of them need to remember anything about previous requests.
WebSockets break all of these assumptions. Connections are persistent — lasting minutes or hours. They're stateful — a server holds a specific client's connection and can only deliver messages through that specific socket. And they're long-lived enough that connection distribution across servers drifts significantly over time.
Here's how WebSocket load balancing actually works.
Why Round-Robin Fails
Imagine two servers. A client connects to Server 1. Later, your application publishes a message that should reach that client. The publishing code fires a request to the load balancer. Round-robin sends it to Server 2. Server 2 has no connection to this client. The message is never delivered.
Client ──── WebSocket ──── Server 1 (has the connection)
✓ can deliver messages
Application publishes ──── Load Balancer ──── Server 2 (no connection)
✗ message dropped
With HTTP, every server can handle every request because requests carry all necessary state. With WebSockets, the state — the open socket — lives on one specific server.
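To make that concrete, here's a minimal sketch of the per-server state (the names are illustrative): each process can deliver only through the sockets it holds in its own memory.
// Local to this process; no other server can see or use these sockets.
const connectionsByUser = new Map(); // userId -> WebSocket, populated on connection

function deliverTo(userId, message) {
  const ws = connectionsByUser.get(userId);
  if (!ws) return false; // on Server 2 this lookup misses; the socket lives on Server 1
  ws.send(message);
  return true;
}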
Solution 1: Sticky Sessions
Sticky sessions (session affinity) tell the load balancer to always route a given client to the same backend server. Once a client is assigned to Server 1, every subsequent request from that client goes to Server 1.
nginx with IP hash
upstream ws_backends {
    ip_hash;
    server backend1.internal:8080;
    server backend2.internal:8080;
    server backend3.internal:8080;
}

server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/certs/server.pem;
    ssl_certificate_key /etc/ssl/private/server.key;

    location / {
        proxy_pass http://ws_backends;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
proxy_http_version 1.1 and the Upgrade/Connection headers are mandatory for WebSocket proxying. Without them, nginx will not forward the upgrade request correctly.
Cookie-based stickiness with HAProxy
IP hashing fails when many clients share an IP (corporate NAT, mobile carriers). Cookie-based stickiness is more reliable:
backend ws_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server ws1 10.0.0.1:8080 check cookie ws1
    server ws2 10.0.0.2:8080 check cookie ws2
    server ws3 10.0.0.3:8080 check cookie ws3
HAProxy injects a SERVERID cookie during the HTTP upgrade; subsequent requests carry the cookie, telling the load balancer which backend to use.
Downsides of sticky sessions
Uneven load. Servers that handle long-lived connections accumulate them over time. A server that got hit during a traffic peak stays overloaded even as traffic drops. Servers added later receive fewer connections than they should.
Failover. When a sticky server dies, all its connections are lost. Clients reconnect and the load balancer distributes them — but any in-memory subscription state is gone. Clients must re-subscribe.
Deployments. Rolling restarts require graceful connection draining. Without it, clients see abrupt disconnects on every deploy.
Solution 2: Stateless Servers with Redis Pub/Sub
The more robust approach eliminates stickiness entirely. Make your WebSocket servers stateless with respect to message delivery by using Redis as the fanout layer.
Every server subscribes in Redis to each channel it has local connections for. When any server needs to publish a message, it publishes to Redis; Redis delivers the message to every subscribed server, and each server fans it out to its local connections.
Server 1 has clients A, B subscribed to "chat:room-1"
Server 2 has clients C, D subscribed to "chat:room-1"
Publisher → redis PUBLISH "chat:room-1" message
↓
Server 1 listener → A, B receive message
Server 2 listener → C, D receive message
Now any server can handle any client. The load balancer can use pure round-robin. A dead server means its local clients reconnect to any healthy server — no sticky routing required.
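Here's roughly what the subscribing side looks like in Node.js with the ioredis client (the client library is an assumption; any Redis client with pub/sub support works the same way):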
import Redis from 'ioredis';

const redisSubscriber = new Redis(process.env.REDIS_URL);
// channelName -> Set of local WebSocket connections subscribed to it
const localSubscribers = new Map();

// A single 'message' listener covers every channel this server subscribes to
redisSubscriber.on('message', (channel, message) => {
  const channelName = channel.replace(/^app:/, ''); // strip the "app:" namespace prefix
  const localConns = localSubscribers.get(channelName) ?? new Set();
  for (const conn of localConns) {
    conn.send(message);
  }
});

function subscribeToChannel(channelName) {
  redisSubscriber.subscribe(`app:${channelName}`);
}
// Publishing — any server can do this
const redisPublisher = new Redis(process.env.REDIS_URL);
await redisPublisher.publish(`app:${channelName}`, JSON.stringify(payload));
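The snippets above assume something populates localSubscribers as clients subscribe. Here's a minimal sketch of that wiring, assuming the ws library and a JSON subscribe message of the shape { action: 'subscribe', channel: ... } (both illustrative choices, not fixed by anything above):
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  const joined = new Set(); // channels this particular socket subscribed to

  ws.on('message', (raw) => {
    const msg = JSON.parse(raw);
    if (msg.action !== 'subscribe') return;
    if (!localSubscribers.has(msg.channel)) {
      localSubscribers.set(msg.channel, new Set());
      subscribeToChannel(msg.channel); // first local subscriber: join the Redis channel
    }
    localSubscribers.get(msg.channel).add(ws);
    joined.add(msg.channel);
  });

  ws.on('close', () => {
    // Drop this socket from every channel it joined so fanout skips dead connections
    for (const channel of joined) {
      localSubscribers.get(channel)?.delete(ws);
    }
  });
});
When a channel's local set empties out, you could also unsubscribe from the Redis channel to keep the subscriber connection lean; that bookkeeping is omitted here.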
L4 vs L7 Load Balancing
Most HTTP load balancers operate at Layer 7 — they parse HTTP headers, route based on URL paths, and terminate TLS. This flexibility comes with overhead.
For WebSockets, once the connection is established, all communication is binary frames — not HTTP. An L7 load balancer adds parsing overhead for the upgrade but adds no value for the ongoing connection. L4 (TCP-level) load balancers are faster for this workload:
# HAProxy — L4 TCP mode for WebSockets
frontend websockets
    bind *:443 ssl crt /etc/ssl/certs/bundle.pem
    mode tcp
    default_backend ws_pool

backend ws_pool
    mode tcp
    balance leastconn
    option tcp-check
    server ws1 10.0.0.1:8080 check
    server ws2 10.0.0.2:8080 check
    server ws3 10.0.0.3:8080 check
leastconn is a better fit than roundrobin for WebSockets: it routes each new connection to the server with the fewest active connections, so distribution rebalances itself over time instead of drifting as long-lived connections accumulate.
Graceful Connection Draining
Rolling deployments disconnect clients unless you drain connections first:
- Mark as draining — stop accepting new WebSocket upgrades (return 503 for new requests)
- Notify clients — send a close frame with code 4000 ("server restarting")
- Clients reconnect — well-implemented clients treat code 4000 as a signal to reconnect immediately, not wait for exponential backoff
- Wait for drain — wait for connection count to reach zero or a timeout (30 seconds)
- Shut down — stop the process
process.on('SIGTERM', async () => {
  console.log('Draining connections...');
  server.close(); // stop accepting new HTTP connections
  for (const conn of activeConnections) {
    conn.close(4000, 'Server restarting');
  }
  await waitForConnectionsToClose({ timeout: 30_000 });
  process.exit(0);
});
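On the client side, the reconnect behavior from step 3 might look like the following sketch. The URL, the backoff numbers, and the subscribe message shape (matching the earlier wiring sketch) are illustrative assumptions:
// Reconnect immediately on a deliberate restart (close code 4000); back off on anything else.
const subscribedChannels = new Set(); // channels to restore after a reconnect
let backoffMs = 1000;

function connect() {
  const ws = new WebSocket('wss://ws.example.com');

  ws.onopen = () => {
    backoffMs = 1000; // reset backoff after a successful connection
    // Server-side subscription state was in memory and is gone, so re-subscribe.
    for (const channel of subscribedChannels) {
      ws.send(JSON.stringify({ action: 'subscribe', channel }));
    }
  };

  ws.onclose = (event) => {
    if (event.code === 4000) {
      connect(); // planned restart: reconnect right away
    } else {
      setTimeout(connect, backoffMs);
      backoffMs = Math.min(backoffMs * 2, 30_000);
    }
  };
}

connect();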
Health Checks
Expose a health endpoint that checks Redis connectivity — since Redis failure affects message delivery even if the WebSocket server process is healthy:
app.get('/health', async (req, res) => {
  try {
    await redis.ping();
    res.json({ status: 'ok', connections: activeConnections.size });
  } catch {
    res.status(503).json({ status: 'degraded', reason: 'redis unreachable' });
  }
});
OS-Level Tuning
For high connection counts, the OS needs tuning beyond the application:
# Raise file descriptor limit (each WebSocket = one fd)
sysctl -w fs.file-max=2000000
# TCP keepalive to detect dead connections
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=6
# Increase accept queue
sysctl -w net.core.somaxconn=65535
# Per-process limit
ulimit -n 1000000
The Managed Alternative
Getting load balancing right — sticky sessions or Redis fanout, L4 vs L7, graceful draining, health checks, OS tuning — is real infrastructure work. It's solvable, but it's time you're not spending on your product.
Managed WebSocket infrastructure like Apinator runs a distributed fleet of servers and handles connection routing, Redis fanout, and rolling deployments internally. Your application publishes events over HTTP. The connection and routing layer is not your problem.
For teams where WebSocket infrastructure is a means rather than an end, that trade-off is almost always worth making.