How to Scale WebSockets to 1 Million Connections

Scaling WebSockets beyond a single server requires careful architecture. Here's what actually breaks and how to fix it — from sticky sessions to Redis fanout and horizontal scaling.

Most scaling advice on the internet is written for HTTP APIs. Stateless requests, horizontal replicas, throw a load balancer in front, done. WebSockets are a different animal. The moment you need more than a few thousand concurrent connections, the assumptions that make HTTP easy to scale actively work against you.

This post walks through the real constraints — what breaks, why it breaks, and what you have to build to get past them.

Why WebSockets Don't Scale Like HTTP

HTTP is stateless by design. A request comes in, a server handles it, the connection closes. Any server in your pool can handle any request, because there's no persistent relationship between the client and a specific process.

WebSockets flip that model. A WebSocket connection is a long-lived TCP socket that stays open for the entire session — minutes, hours, sometimes days. The server process that accepted the handshake holds that connection in memory. It tracks subscriptions, presence state, authentication context, and message history. That state lives in RAM on one specific machine.
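
To make that concrete, here is roughly the kind of per-connection state a node ends up holding. This is a hypothetical Go sketch; the field names are illustrative rather than taken from any particular framework (only gorilla/websocket's Conn type is real):

package realtime

import (
    "time"

    "github.com/gorilla/websocket"
)

// ClientConn is the state the accepting process keeps in RAM for one socket.
// Illustrative only; real servers differ in what they track per connection.
type ClientConn struct {
    Socket        *websocket.Conn     // the long-lived TCP socket itself
    UserID        string              // authentication context from the handshake
    Subscriptions map[string]struct{} // channels this client has joined
    LastSeen      time.Time           // presence / heartbeat bookkeeping
    Outbox        chan []byte         // buffered outbound messages awaiting write
}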

The consequence is that WebSocket servers are inherently stateful. You can't route a returning client to any server in the pool and expect it to work. You also can't restart a process without disconnecting every client it's holding.

The Single-Server Ceiling

Before worrying about distribution, you'll hit limits on a single machine. A typical Node.js or Go process can hold somewhere between 10,000 and 100,000 concurrent WebSocket connections before memory becomes the bottleneck. The range is wide because it depends on how much per-connection state you're storing, your message rate, and whether your event loop or goroutine scheduler is struggling.
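
Rough arithmetic makes that range concrete. If each connection carries, say, 20 KB of buffers and application state (an assumed figure, not a measurement), 100,000 connections is about 2 GB of RAM before the rest of your heap; at 100 KB per connection the same count needs roughly 10 GB.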

What breaks first:

  • Memory: Each open socket consumes RAM for the send/receive buffers, your application state, and OS overhead. At scale, this adds up fast.
  • File descriptors: Every socket is a file descriptor. Linux processes have a default limit of 1,024 open file descriptors. You'll hit this long before you hit memory.
  • Event loop saturation: In single-threaded runtimes like Node.js, a spike in messages can block the loop and cause latency for every connected client.
  • CPU: Broadcasting a message to 50,000 subscribers means serializing the payload once and then writing it to 50,000 sockets. On a single core, that loop is a real bottleneck (see the sketch just after this list).
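
Here is what that last point looks like in code: a minimal single-node broadcast loop, sketched in Go with gorilla/websocket. The Room type and its subscriber set are illustrative; the cost structure is the point — one write per subscriber, all on the node that owns the sockets.

package realtime

import (
    "encoding/json"

    "github.com/gorilla/websocket"
)

// Room holds the sockets subscribed to one channel on this node.
type Room struct {
    subscribers map[*websocket.Conn]struct{} // 50,000 entries at the scale above
}

// Broadcast serializes the payload once, then pays one write per socket.
func (r *Room) Broadcast(msg any) {
    payload, err := json.Marshal(msg)
    if err != nil {
        return
    }
    for conn := range r.subscribers {
        // Each iteration is a syscall plus a copy into the socket buffer;
        // with 50,000 subscribers this loop alone can saturate a core.
        _ = conn.WriteMessage(websocket.TextMessage, payload)
    }
}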

The ceiling isn't a hard number — it's a combination of these constraints meeting your workload.

OS-Level Tuning: File Descriptors and Socket Limits

Before scaling out, you should max out what a single instance can do. The first thing to adjust is the file descriptor limit.

# Check current limits
ulimit -n

# Raise the limit for the current session
ulimit -n 1000000

# Persist it in /etc/security/limits.conf
echo "* soft nofile 1000000" >> /etc/security/limits.conf
echo "* hard nofile 1000000" >> /etc/security/limits.conf

You also need to tune kernel networking parameters:

# /etc/sysctl.conf
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
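
Reload the file so the settings take effect without a reboot:

sudo sysctl -p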

These changes let a single process accept far more connections, but they don't eliminate the per-process memory limit. At some point, you need more machines.

Sticky Sessions: Necessary but Not Sufficient

When you add a second server, you immediately face the statefulness problem. A client connected to Server A sends a message. If the next request (say, a channel auth call) lands on Server B, Server B has no idea who this client is.

The common first solution is sticky sessions — also called session affinity. The load balancer uses the client's IP address or a cookie to always route a given client to the same backend server.

# nginx upstream with ip_hash for sticky sessions
upstream websocket_backends {
    ip_hash;
    server backend1.internal:8080;
    server backend2.internal:8080;
    server backend3.internal:8080;
}

This works up to a point. The problem is that sticky sessions create uneven load distribution. If your users are concentrated behind a corporate NAT or a mobile carrier (which shares one IP across thousands of users), all of them land on the same server. One node gets overloaded while others sit idle.

More critically, sticky sessions do nothing to solve cross-node message delivery. If a client on Server A publishes a message to a channel, clients subscribed to that same channel on Server B never receive it. Your servers are isolated islands.

Redis Pub/Sub for Cross-Node Fanout

The standard solution for cross-node broadcasting is a shared message bus. Redis Pub/Sub is the most common choice.

The architecture looks like this:

  1. Every server node subscribes to the Redis topics corresponding to channels that have local subscribers.
  2. When a client publishes a message, the receiving server publishes it to Redis.
  3. Every other server that has subscribers for that channel receives the message from Redis and delivers it to local clients.

Client A (on Node 1) publishes to "chat:room-42"
  → Node 1 publishes to Redis topic "realtime:{app_id}:chat:room-42"
  → Node 2 receives from Redis, delivers to local subscribers
  → Node 3 receives from Redis, delivers to local subscribers

This is the pub/sub fanout pattern. It decouples message delivery from which server holds the connection.
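
A minimal sketch of that flow in Go, assuming github.com/redis/go-redis/v9. The local subscriber registry, locking, and slow-client policy are simplified placeholders; the shape of the publish and fanout paths is what matters:

package realtime

import (
    "context"

    "github.com/redis/go-redis/v9"
)

type Node struct {
    rdb   *redis.Client
    local map[string][]chan []byte // channel name -> send queues of local subscribers
}

// PublishFromClient handles a publish from a locally connected client.
// The node pushes the message through Redis so that every node, including
// this one, delivers it to its own local subscribers.
func (n *Node) PublishFromClient(ctx context.Context, channel string, payload []byte) error {
    return n.rdb.Publish(ctx, channel, payload).Err()
}

// RunFanout consumes messages from Redis and hands them to local clients.
func (n *Node) RunFanout(ctx context.Context, channels ...string) {
    pubsub := n.rdb.Subscribe(ctx, channels...)
    defer pubsub.Close()

    for msg := range pubsub.Channel() {
        for _, q := range n.local[msg.Channel] {
            select {
            case q <- []byte(msg.Payload): // enqueue for this socket's writer goroutine
            default: // slow client: drop, buffer, or disconnect per your policy
            }
        }
    }
}

Each node runs the fanout loop in its own goroutine; which Redis topics it actually subscribes to is where the next optimization comes in.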

A key optimization is ref-counted Redis subscriptions. You don't want every node subscribed to every Redis topic — that creates massive overhead at scale. Instead, each node subscribes to a Redis topic only when the first local client joins that channel, and unsubscribes when the last local client leaves. This keeps the Redis subscription count proportional to active channels per node, not total channels.
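
A sketch of that ref-counting, again assuming go-redis with a single shared PubSub connection per node; the counts map and method names are illustrative:

package realtime

import (
    "context"
    "sync"

    "github.com/redis/go-redis/v9"
)

type SubscriptionManager struct {
    mu     sync.Mutex
    counts map[string]int // channel -> number of local subscribers
    pubsub *redis.PubSub  // one shared Redis subscription connection for the node
}

// ClientJoined subscribes to the Redis topic when the first local client joins.
func (m *SubscriptionManager) ClientJoined(ctx context.Context, channel string) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.counts[channel]++
    if m.counts[channel] == 1 {
        return m.pubsub.Subscribe(ctx, channel)
    }
    return nil
}

// ClientLeft unsubscribes from the Redis topic when the last local client leaves.
func (m *SubscriptionManager) ClientLeft(ctx context.Context, channel string) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    if m.counts[channel] == 0 {
        return nil // defensive: never go negative
    }
    m.counts[channel]--
    if m.counts[channel] == 0 {
        delete(m.counts, channel)
        return m.pubsub.Unsubscribe(ctx, channel)
    }
    return nil
}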

Load Balancing: L4 vs L7

WebSocket connections start as HTTP and upgrade via the Upgrade: websocket header. This means your load balancer needs to understand the upgrade handshake.

Layer 4 (TCP) load balancing is simpler and faster — the load balancer forwards TCP packets without inspecting HTTP. It works fine for WebSockets, but you lose the ability to route based on HTTP headers or cookies, which makes sticky sessions harder to implement.

Layer 7 (HTTP) load balancing inspects the application protocol. nginx and HAProxy both support WebSocket proxying at L7 with sticky session support via cookies or headers.

frontend websocket
    bind *:443 ssl crt /etc/ssl/cert.pem
    mode http
    default_backend ws_servers

backend ws_servers
    mode http
    balance leastconn
    cookie SERVERID insert indirect nocache
    option http-server-close
    option forwardfor
    # after the upgrade HAProxy switches to tunnel mode; without this,
    # idle WebSockets are killed by the regular client/server timeouts
    timeout tunnel 1h
    server node1 10.0.0.1:8080 check cookie node1
    server node2 10.0.0.2:8080 check cookie node2
    server node3 10.0.0.3:8080 check cookie node3

leastconn is the right balancing algorithm for WebSockets: it sends each new connection to the server that currently holds the fewest open connections. Round-robin, by contrast, only balances arrivals and ignores how many connections each server is already holding, so after a node restart or uneven churn the load stays skewed.

For nginx, you need to pass the upgrade headers explicitly:

location /ws {
    proxy_pass http://websocket_backends;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;
}

The proxy_read_timeout setting is critical: the default is 60 seconds, and nginx closes a proxied connection if no data arrives from the upstream within that window. An idle WebSocket will hit that limit unless you raise the timeout or send periodic heartbeats over the socket.

What You're Actually Building

By the time you've handled all of the above, you've built:

  • OS-level tuning scripts per server type
  • A Redis cluster with pub/sub fanout logic
  • A load balancer configuration with sticky session handling
  • Connection draining logic for zero-downtime deploys
  • Presence state management across nodes (who's online, on which server)
  • Reconnection handling on the client when servers restart
  • Monitoring for connection counts, Redis subscription lag, and memory per node

This is a significant amount of infrastructure to own and operate. It's also the kind of infrastructure that fails in subtle ways under load — Redis fanout lag, sticky session expiry during reconnects, file descriptor leaks in long-running processes.

Managed Infrastructure Is the Pragmatic Choice

Most teams don't need to build this. The problem has been solved, and the operational cost of running it yourself rarely makes sense unless WebSocket infrastructure is your core product.

Apinator handles all of this — connection scaling, cross-region Redis fanout, presence state, load balancing, OS tuning, and reconnection. You publish events via a simple HTTP API and subscribe on the client with a small SDK. The infrastructure scales to millions of connections without any of the above configuration in your codebase.

If you're early in building a realtime feature, the time you'd spend on sticky sessions and Redis pub/sub is almost certainly better spent on the product itself.