How to Build a Live Collaborative Document Editor
Learn how to build a Google Docs-style collaborative editor with real-time cursor tracking, live edits, and presence indicators using WebSockets.
Collaborative editing feels like magic when it works well. Two people type into the same document at the same time, their cursors visible to each other, changes appearing instantly without either person losing their work. Building that experience involves solving a set of genuinely hard problems — and understanding which ones you actually need to solve for your use case.
What Makes Collaborative Editing Hard
The core difficulty is concurrent state mutation. When two users edit the same region of a document at the same time, their changes can conflict. If Alice deletes a paragraph while Bob is editing a sentence inside it, a naive system will apply Bob's edit at an offset that no longer points where he intended, corrupting the text.
There are several layers of complexity to manage:
- Conflict resolution: Deciding whose edit wins, or how to merge both.
- Presence: Knowing who else is in the document and where their cursor is.
- Ordering: Events arrive over the network in unpredictable order; the system must apply them consistently.
- Persistence vs. delivery: Real-time delivery (WebSockets) and durable storage (a database) are separate concerns that must stay in sync.
The gold standard for conflict resolution is Operational Transformation (OT), the algorithm behind Google Docs. OT transforms incoming operations against already-applied local operations so both sides converge to the same state. It is also notoriously difficult to implement correctly.
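To make "transforming one operation against another" concrete, here is a minimal sketch of the simplest possible case: two concurrent single-string inserts into plain text. The op shape (`{ index, text, siteId }`) and the function names are hypothetical, chosen only for illustration. Real OT implementations also have to handle deletes, formatting, and long operation histories, which is where most of the difficulty lies.

```javascript
// Transform insert `op` so it can be applied AFTER `applied` has already been
// applied to the same base document.
// Hypothetical op shape: { index, text, siteId }.
// siteId is only used to break ties when both ops insert at the same index.
function transformInsert(op, applied) {
  const appliedGoesFirst =
    applied.index < op.index ||
    (applied.index === op.index && applied.siteId < op.siteId);
  // If the other insert landed before ours, shift our index past its text.
  return appliedGoesFirst
    ? { ...op, index: op.index + applied.text.length }
    : op;
}

function applyInsert(doc, op) {
  return doc.slice(0, op.index) + op.text + doc.slice(op.index);
}

const base = "Hello world";
const alice = { index: 5, text: ",", siteId: "alice" };
const bob = { index: 11, text: "!", siteId: "bob" };

// Alice's replica: her own op first, then Bob's op transformed against hers.
const atAlice = applyInsert(applyInsert(base, alice), transformInsert(bob, alice));
// Bob's replica: his own op first, then Alice's op transformed against his.
const atBob = applyInsert(applyInsert(base, bob), transformInsert(alice, bob));

console.log(atAlice === atBob, atAlice);
```

Both replicas apply the two edits in a different order yet end up with the same text, which is the convergence property OT is designed to provide.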
For many real applications, a simpler model works fine. In a last-write-wins approach, each discrete operation (inserting text at position N, deleting N characters at position M) is applied in the order it arrives. This works well when conflicts are rare and edits are fine-grained. Tools like Quill.js and ProseMirror represent document changes as structured deltas or steps that compose cleanly, making last-write-wins viable for a large class of editors.
This post focuses on that simpler, practical architecture.
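As a rough illustration of "apply each operation in the order it arrives", the sketch below applies a sequence of deltas to a plain string. It assumes the retain/insert/delete op shape that Quill deltas use, restricted to plain text (no attributes or embeds). `applyDelta` here is a hypothetical helper written for this example, not the editor's own implementation.

```javascript
// Apply one delta (a list of retain/insert/delete ops) to a plain string.
// Assumption: ops carry only plain-text inserts, no formatting or embeds.
function applyDelta(text, ops) {
  let cursor = 0; // read position in the original text
  let result = "";
  for (const op of ops) {
    if (typeof op.retain === "number") {
      // Keep the next `retain` characters unchanged.
      result += text.slice(cursor, cursor + op.retain);
      cursor += op.retain;
    } else if (typeof op.insert === "string") {
      // New text is added without consuming any original characters.
      result += op.insert;
    } else if (typeof op.delete === "number") {
      // Skip over the deleted characters.
      cursor += op.delete;
    }
  }
  // Anything after the last op is implicitly retained.
  return result + text.slice(cursor);
}

// Last-write-wins: whatever arrives is applied to the current state, in order.
let doc = "Hello world";
const arrivals = [
  [{ retain: 5 }, { insert: "," }],
  [{ retain: 7 }, { delete: 5 }, { insert: "team" }],
];
for (const ops of arrivals) {
  doc = applyDelta(doc, ops);
}
console.log(doc);
```

Note that the second delta's offsets only make sense against the document as it stands after the first one. That dependency on arrival order is the trade-off this model accepts in exchange for simplicity.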
Architecture Overview
The system has three parts:
- A rich text editor in the browser (Quill, TipTap, CodeMirror, etc.) that produces a delta for every local change.
- A WebSocket channel that broadcasts those deltas to every other connected client in real time.
- A backend that persists the canonical document state to a database and publishes change events.
WebSockets handle delivery. The database handles persistence. These responsibilities must not be confused. A client that reconnects mid-session fetches the current document from the database, then subscribes to live events going forward.
For the WebSocket layer, we use Apinator, a managed WebSocket infrastructure platform that handles connection scaling, authentication, and cross-region fanout without requiring you to run your own WebSocket servers.
Setting Up the Presence Channel
Every document gets its own presence channel: presence-doc-{documentId}. Presence channels are authenticated and maintain a member list — exactly what you need to show avatars and colored cursors.
Server: Auth Endpoint
import Apinator from "@apinator/server";

const apinator = new Apinator({
  appId: process.env.APINATOR_APP_ID,
  key: process.env.APINATOR_KEY,
  secret: process.env.APINATOR_SECRET,
});

// POST /auth/channel
// Assumes `app` is an Express-style app with JSON body parsing enabled and
// `requireAuth` is your own middleware that sets req.user.
app.post("/auth/channel", requireAuth, (req, res) => {
  const { socket_id, channel_name } = req.body;
  const user = req.user;

  // Before authorizing, verify that this user is allowed to open the
  // document named in channel_name.
  const auth = apinator.authorizeChannel(socket_id, channel_name, {
    user_id: user.id,
    user_info: {
      name: user.name,
      avatar: user.avatarUrl,
      color: user.cursorColor, // assign a stable color per user
    },
  });

  res.json(auth);
});
Client: Joining the Document Channel
import { RealtimeClient } from "@apinator/sdk";

const client = new RealtimeClient({
  key: APINATOR_KEY,
  authEndpoint: "/auth/channel",
});

const docId = "doc_abc123";
const channel = client.subscribe(`presence-doc-${docId}`);

channel.bind("apinator:subscription_succeeded", (members) => {
  // members.me: the current user
  // members.each((member) => ...): all members, including self
  renderMemberList(members);
});

channel.bind("apinator:member_added", (member) => {
  addMemberAvatar(member.id, member.info);
});

channel.bind("apinator:member_removed", (member) => {
  removeMemberAvatar(member.id);
  removeCursor(member.id);
});
With presence in place, users see who else has the document open the moment they arrive.
Broadcasting Document Changes
When the local editor changes, send only the delta — not the full document. Deltas are small, composable, and easy to apply on the receiving end.
Client: Sending Changes
editor.on("text-change", (delta, _oldDelta, source) => {
  // Only broadcast changes the local user made, not ones we applied
  // from remote events (source === "api")
  if (source !== "user") return;

  fetch(`/api/documents/${docId}/changes`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ delta }),
  }).catch((err) => {
    // Network failure: the change did not reach the server
    console.error("Failed to send change", err);
  });
});
Server: Persisting and Publishing
// POST /api/documents/:id/changes
app.post("/api/documents/:id/changes", requireAuth, async (req, res) => {
  const { id } = req.params;
  const { delta } = req.body;
  const userId = req.user.id;

  try {
    // 1. Apply delta to stored document (compose with existing content)
    await db.documents.applyDelta(id, delta, userId);

    // 2. Publish the change to all connected clients
    await apinator.trigger(`presence-doc-${id}`, "doc.change", {
      delta,
      userId,
      timestamp: Date.now(),
    });

    res.json({ ok: true });
  } catch (err) {
    // Report failures so the client knows the request did not complete
    console.error("Failed to apply change", err);
    res.status(500).json({ ok: false });
  }
});
Client: Receiving and Applying Changes
channel.bind("doc.change", ({ delta, userId }) => {
  // Don't apply our own changes; the editor already has them
  if (userId === currentUserId) return;

  // Apply the incoming delta as an API (programmatic) change
  editor.updateContents(delta, "api");
});
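The "api" source flag is crucial. It tells the editor (and your own text-change handler) that this change came from a remote source, so the handler skips it instead of re-broadcasting it. Without the flag, every remote change would be sent back to the server and echoed to all clients indefinitely.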
Real-Time Cursor Tracking
Cursor positions are ephemeral and high-frequency — they should never touch your database. Apinator supports client events on presence channels, which let clients publish directly to the channel without going through your server. This is ideal for cursor positions.
Client: Publishing Cursor Position
editor.on("selection-change", (range) => {
  if (!range) return;

  // client- prefix enables direct client-to-client events
  channel.trigger("client-cursor", {
    index: range.index,
    length: range.length,
  });
});
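Selection changes can fire many times per second while someone types or drags a selection, so it is usually worth limiting how often cursor events go out. The sketch below is one way to do that with a leading-and-trailing throttle. The `throttle` helper is hypothetical and written for this example, and the injectable `now` and `schedule` parameters exist only so it can be exercised without real timers.

```javascript
// Leading + trailing throttle: the first call runs immediately, and the most
// recent call made during the quiet window runs when the window ends.
// `now` and `schedule` default to the real clock and setTimeout.
function throttle(fn, intervalMs, now = Date.now, schedule = setTimeout) {
  let lastSent = -Infinity;
  let pendingArgs = null;
  let timerSet = false;

  function flush() {
    timerSet = false;
    if (pendingArgs === null) return;
    const args = pendingArgs;
    pendingArgs = null;
    lastSent = now();
    fn(...args);
  }

  return function throttled(...args) {
    const elapsed = now() - lastSent;
    if (elapsed >= intervalMs && !timerSet) {
      // Outside the quiet window: send right away.
      lastSent = now();
      fn(...args);
      return;
    }
    // Inside the window: remember only the latest arguments.
    pendingArgs = args;
    if (!timerSet) {
      timerSet = true;
      schedule(flush, Math.max(0, intervalMs - elapsed));
    }
  };
}
```

In the handler above, the `channel.trigger("client-cursor", ...)` call could be wrapped once, for example with a 50 ms interval (an arbitrary starting point), and the wrapped function called from `selection-change`. Because the trailing call always carries the most recent range, the final cursor position still reaches other clients.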
Client: Rendering Remote Cursors
const remoteCursors = {}; // memberId -> cursor DOM element

channel.bind("client-cursor", (data, metadata) => {
  const memberId = metadata.user_id;
  const member = channel.members.get(memberId);

  // The member may have left before this event was processed
  if (!member) return;

  if (!remoteCursors[memberId]) {
    remoteCursors[memberId] = createCursorElement(member.info.color, member.info.name);
  }
  positionCursor(remoteCursors[memberId], editor, data.index);
});

function positionCursor(cursorEl, editor, index) {
  // getBounds returns pixel coordinates relative to the editor container
  const bounds = editor.getBounds(index);
  cursorEl.style.top = `${bounds.top}px`;
  cursorEl.style.left = `${bounds.left}px`;
}
Each member gets a colored cursor flag positioned using the editor's own coordinate system. When a member leaves (apinator:member_removed), you remove their cursor element immediately.
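One detail the handler above leaves open is that a remote cursor's index goes stale when text is inserted or deleted before it, at least until that user's next client-cursor event arrives. A possible way to handle this, sketched here on the assumption that you keep each member's last known index and that deltas are plain-text retain/insert/delete ops, is to shift the stored index through every incoming delta. `shiftCursor` is a hypothetical name. The quill-delta library offers a `transformPosition` method for the same purpose.

```javascript
// Map a cursor index from the document before a delta to the document after it.
// Assumption: ops are plain-text retain/insert/delete ops.
function shiftCursor(index, ops) {
  let offset = 0; // position reached so far in the updated document
  for (const op of ops) {
    if (offset > index) break; // remaining ops are all after the cursor
    if (typeof op.delete === "number") {
      // Only the deleted characters that sit before the cursor pull it left.
      index -= Math.min(op.delete, index - offset);
    } else if (typeof op.insert === "string") {
      // Text inserted at or before the cursor pushes it right.
      index += op.insert.length;
      offset += op.insert.length;
    } else if (typeof op.retain === "number") {
      offset += op.retain;
    }
  }
  return index;
}

// Cursor at index 8; a comma is inserted at index 5, so the cursor moves to 9.
console.log(shiftCursor(8, [{ retain: 5 }, { insert: "," }]));
```

After shifting, the stored index can be passed to `positionCursor` again so the flag follows the text it was attached to.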
Awareness vs. Persistence
It helps to be explicit about the boundary between these two concerns:
| Concern | Mechanism | Storage |
|---|---|---|
| Who is online | Presence channel membership | Ephemeral (Apinator) |
| Where their cursor is | Client events (client-cursor) | Ephemeral (Apinator) |
| Document content | REST API + database | Persistent (PostgreSQL, etc.) |
| Change history | Delta log in database | Persistent |
WebSockets are delivery infrastructure — fast, stateless, and volatile. When a user's connection drops and they reconnect, the presence channel tells you who is back, but the document content comes from the database. Your client should fetch the latest document snapshot on mount and on reconnect, then attach to the live channel for subsequent updates.
A minimal reconnection flow looks like this:
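client.connection.bind("connected", async () => {
  const { content, version } = await fetch(`/api/documents/${docId}`).then((r) => r.json());

  editor.setContents(content, "api");
  currentVersion = version;

  // Subscribe once the snapshot is loaded. A change saved between the fetch
  // and the subscription would be missed, so production code should buffer
  // live events or re-check the version after subscribing.
  subscribeToChannel(docId);
});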
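One way to avoid missing changes around a reconnect is to receive live events first, hold them while the snapshot loads, and then replay only the ones newer than the snapshot. The sketch below assumes each doc.change event carries a monotonically increasing `version` that matches the snapshot's `version`. The payload shown earlier includes only a timestamp, so that field is an assumption you would have to add on the server. `createChangeBuffer` is a hypothetical helper.

```javascript
// Holds live events while a snapshot is loading, then replays the newer ones.
// Assumption: every event has a numeric `version` comparable to the snapshot's.
function createChangeBuffer() {
  let buffered = [];
  let live = false;
  let apply = null;

  return {
    // Call for every incoming doc.change event.
    receive(event) {
      if (live) {
        apply(event);
      } else {
        buffered.push(event);
      }
    },
    // Call once the snapshot has been loaded into the editor.
    goLive(snapshotVersion, applyFn) {
      apply = applyFn;
      live = true;
      // Drop events the snapshot already includes; replay the rest in order.
      const pending = buffered
        .filter((event) => event.version > snapshotVersion)
        .sort((a, b) => a.version - b.version);
      buffered = [];
      pending.forEach((event) => apply(event));
    },
  };
}
```

A fresh buffer would be created on each reconnect, with `receive` bound to the channel before the snapshot fetch starts and `goLive` called after `editor.setContents`.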
Keeping It Simple
Production collaborative editors add more on top of this: version vectors to detect and reject out-of-order operations, undo stacks that are aware of remote edits, and offline queuing for clients that temporarily lose connectivity. But for a large class of applications — internal tools, shared notes, lightweight wikis — the architecture above is sufficient and far simpler to reason about.
The key insight is to let each layer do what it does best: your database owns the truth, your WebSocket layer owns the moment. Apinator handles the WebSocket infrastructure so you can focus on the editing logic rather than connection management, authentication, and scaling.
Start with a presence channel, broadcast deltas, render cursors, and reach for a full OT or CRDT (conflict-free replicated data type) library only when you genuinely need it.