How to Build a Live Collaborative Document Editor

Learn how to build a Google Docs-style collaborative editor with real-time cursor tracking, live edits, and presence indicators using WebSockets.

Collaborative editing feels like magic when it works well. Two people type into the same document at the same time, their cursors visible to each other, changes appearing instantly without either person losing their work. Building that experience involves solving a set of genuinely hard problems — and understanding which ones you actually need to solve for your use case.

What Makes Collaborative Editing Hard

The core difficulty is concurrent state mutation. When two users edit the same region of a document at the same time, their changes can conflict. If Alice deletes a paragraph while Bob is editing a sentence inside it, a naive system applies Bob's edit at an offset that no longer points at his sentence, and the text is corrupted.

There are several layers of complexity to manage:

  • Conflict resolution: Deciding whose edit wins, or how to merge both.
  • Presence: Knowing who else is in the document and where their cursor is.
  • Ordering: Events arrive over the network in unpredictable order; the system must apply them consistently.
  • Persistence vs. delivery: Real-time delivery (WebSockets) and durable storage (a database) are separate concerns that must stay in sync.

The gold standard for conflict resolution is Operational Transformation (OT), the algorithm behind Google Docs. OT transforms incoming operations against already-applied local operations so both sides converge to the same state. It is also notoriously difficult to implement correctly.
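
To make the idea concrete, here is a minimal sketch of the transform step for the simplest possible case: two concurrent inserts into a plain string. The operation shape ({ pos, text, site }) and the function names are assumptions made for illustration. Real OT systems also have to handle deletes, formatting, and more than two sites.

```javascript
// Apply an insert operation to a plain string.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after `applied`, a concurrent insert
// that this replica has already applied. Ties at the same position are
// broken by site id so both replicas make the same choice.
function transformInsert(op, applied) {
  const appliedComesFirst =
    applied.pos < op.pos || (applied.pos === op.pos && applied.site < op.site);
  return appliedComesFirst ? { ...op, pos: op.pos + applied.text.length } : op;
}

const base = "hello world";
const alice = { pos: 0, text: "X", site: "alice" };
const bob = { pos: 11, text: "Y", site: "bob" };

// Without transformation, each replica applies its own edit first and the
// two end up with different text.
const naiveAlice = applyInsert(applyInsert(base, alice), bob);
const naiveBob = applyInsert(applyInsert(base, bob), alice);

// With transformation, the remote edit is shifted past the local one and
// both replicas converge.
const otAlice = applyInsert(applyInsert(base, alice), transformInsert(bob, alice));
const otBob = applyInsert(applyInsert(base, bob), transformInsert(alice, bob));

console.log({ naiveAlice, naiveBob, otAlice, otBob });
```

Even this toy case shows why a full implementation is hard: every pair of operation types needs its own transform rule, and all of them must agree on both sides.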

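For many real applications, a simpler model works fine. In an arrival-order approach (often loosely called last-write-wins), each discrete operation (inserting text at position N, deleting N characters at position M) is applied in the order it arrives, with no transformation. This works acceptably when concurrent edits are rare and each operation is fine-grained, because the window in which two edits can interfere is small. The trade-off is that truly simultaneous edits can leave clients out of sync until they reload the document from the database. Editors such as Quill and ProseMirror represent document changes as structured deltas or steps that compose cleanly, which makes this model practical for a large class of editors.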

This post focuses on that simpler, practical architecture.

Architecture Overview

The system has three parts:

  1. A rich text editor in the browser (Quill, TipTap, CodeMirror, etc.) that produces a delta for every local change.
  2. A WebSocket channel that broadcasts those deltas to every other connected client in real time.
  3. A backend that persists the canonical document state to a database and publishes change events.

WebSockets handle delivery. The database handles persistence. These responsibilities must not be confused. A client that reconnects mid-session fetches the current document from the database, then subscribes to live events going forward.

For the WebSocket layer, we use Apinator, a managed WebSocket infrastructure platform that handles connection scaling, authentication, and cross-region fanout without requiring you to run your own WebSocket servers.

Setting Up the Presence Channel

Every document gets its own presence channel: presence-doc-{documentId}. Presence channels are authenticated and maintain a member list — exactly what you need to show avatars and colored cursors.

Server: Auth Endpoint

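import express from "express";
import Apinator from "@apinator/server";

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

// requireAuth (used below) is your own middleware that sets req.user

const apinator = new Apinator({
  appId: process.env.APINATOR_APP_ID,
  key: process.env.APINATOR_KEY,
  secret: process.env.APINATOR_SECRET,
});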

// POST /auth/channel
app.post("/auth/channel", requireAuth, (req, res) => {
  const { socket_id, channel_name } = req.body;
  const user = req.user;

  const auth = apinator.authorizeChannel(socket_id, channel_name, {
    user_id: user.id,
    user_info: {
      name: user.name,
      avatar: user.avatarUrl,
      color: user.cursorColor, // assign a stable color per user
    },
  });

  res.json(auth);
});
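
The endpoint above signs whatever channel name the client sends. In practice you would probably want to confirm that the name refers to a document the user is allowed to open before authorizing it. The sketch below shows one way to structure that check. parseDocChannel, canAccessDocument, checkChannelAccess, and the shape of the permissions map are all hypothetical and stand in for your own access-control logic.

```javascript
// Channel names follow the presence-doc-{documentId} convention.
const DOC_CHANNEL = /^presence-doc-([A-Za-z0-9_-]+)$/;

// Returns the document id embedded in a channel name, or null when the
// name does not follow the convention.
function parseDocChannel(channelName) {
  const match = DOC_CHANNEL.exec(String(channelName));
  return match ? match[1] : null;
}

// Hypothetical permission lookup: `permissions` maps a document id to the
// Set of user ids allowed to open it.
function canAccessDocument(permissions, userId, documentId) {
  const allowed = permissions.get(documentId);
  return Boolean(allowed && allowed.has(userId));
}

// Decide whether an auth request should be signed, and which HTTP status
// to return when it should not.
function checkChannelAccess(permissions, userId, channelName) {
  const documentId = parseDocChannel(channelName);
  if (documentId === null) return { ok: false, status: 400 };
  if (!canAccessDocument(permissions, userId, documentId)) {
    return { ok: false, status: 403 };
  }
  return { ok: true, documentId };
}
```

Inside the route handler, you would call checkChannelAccess before apinator.authorizeChannel and respond with the returned status when ok is false.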

Client: Joining the Document Channel

import { RealtimeClient } from "@apinator/sdk";

const client = new RealtimeClient({
  key: APINATOR_KEY,
  authEndpoint: "/auth/channel",
});

const docId = "doc_abc123";
const channel = client.subscribe(`presence-doc-${docId}`);

channel.bind("apinator:subscription_succeeded", (members) => {
  // members.me — current user
  // members.each((member) => ...) — all members including self
  renderMemberList(members);
});

channel.bind("apinator:member_added", (member) => {
  addMemberAvatar(member.id, member.info);
});

channel.bind("apinator:member_removed", (member) => {
  removeMemberAvatar(member.id);
  removeCursor(member.id);
});

With presence in place, users see who else has the document open the moment they arrive.
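
The handlers above call helpers such as renderMemberList and addMemberAvatar that are left to the application. As a rough sketch, they could sit on top of a small in-memory roster like the one below. createRoster and its methods are hypothetical names, and the DOM rendering itself is deliberately left out.

```javascript
// Keeps track of who is currently in the document.
function createRoster() {
  const members = new Map(); // memberId -> { name, avatar, color }

  return {
    add(id, info) {
      members.set(id, info);
    },
    // Returns true if the member was present.
    remove(id) {
      return members.delete(id);
    },
    // Members sorted by name so avatars keep a stable order as people
    // join and leave.
    list() {
      return [...members.entries()]
        .map(([id, info]) => ({ ...info, id }))
        .sort((a, b) => a.name.localeCompare(b.name));
    },
  };
}
```

The member_added and member_removed handlers would then call add and remove and re-render from list().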

Broadcasting Document Changes

When the local editor changes, send only the delta — not the full document. Deltas are small, composable, and easy to apply on the receiving end.

Client: Sending Changes

editor.on("text-change", (delta, _oldDelta, source) => {
  // Only broadcast changes the local user made, not ones we applied
  // from remote events (source === "api")
  if (source !== "user") return;

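  fetch(`/api/documents/${docId}/changes`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ delta }),
  })
    .then((res) => {
      // fetch only rejects on network failure, so check the status too
      if (!res.ok) throw new Error(`Change rejected: ${res.status}`);
    })
    .catch((err) => console.error("Failed to send change", err));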
});
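
One caveat with the handler above: every keystroke starts an independent request, and nothing guarantees that those requests reach the server in the order they were typed. A minimal sketch of one remedy is a queue that keeps a single request in flight at a time. createSendQueue is a hypothetical helper, and the error policy here (report and move on) is an assumption; a real client would more likely retry or reload the document after a failure.

```javascript
// Sends items one at a time, in the order they were enqueued.
// `send` is any function that takes an item and returns a promise.
function createSendQueue(send, onError = console.error) {
  const pending = [];
  let inFlight = false;

  function pump() {
    if (inFlight || pending.length === 0) return;
    inFlight = true;
    const item = pending.shift();

    let result;
    try {
      result = Promise.resolve(send(item));
    } catch (err) {
      result = Promise.reject(err); // treat a synchronous throw like a rejection
    }

    result
      .catch(onError)
      .finally(() => {
        inFlight = false;
        pump(); // start the next item, if any
      });
  }

  return {
    enqueue(item) {
      pending.push(item);
      pump();
    },
    // Items waiting plus the one currently being sent.
    size() {
      return pending.length + (inFlight ? 1 : 0);
    },
  };
}
```

In the text-change handler you would then call queue.enqueue(delta), with send wrapping the fetch call shown earlier.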

Server: Persisting and Publishing

// POST /api/documents/:id/changes
app.post("/api/documents/:id/changes", requireAuth, async (req, res) => {
  const { id } = req.params;
  const { delta } = req.body;
  const userId = req.user.id;

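  if (!delta || !Array.isArray(delta.ops)) {
    return res.status(400).json({ error: "Missing or malformed delta" });
  }

  try {
    // 1. Apply delta to stored document (compose with existing content)
    await db.documents.applyDelta(id, delta, userId);

    // 2. Publish the change to all connected clients
    await apinator.trigger(`presence-doc-${id}`, "doc.change", {
      delta,
      userId,
      timestamp: Date.now(),
    });

    res.json({ ok: true });
  } catch (err) {
    // Without this, a rejected promise in an async handler can go
    // unhandled (Express 4 does not catch it for you)
    console.error("Failed to apply change", err);
    res.status(500).json({ error: "Could not apply change" });
  }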
});
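
db.documents.applyDelta is not spelled out above. As one possible shape, the sketch below keeps an append-only delta log per document and bumps a version counter on every write, which also gives a reconnecting client a version to compare against. It is an in-memory stand-in with hypothetical names (createDocumentStore, applyDelta, changesSince, currentVersion), not a database layer.

```javascript
function createDocumentStore() {
  const docs = new Map(); // documentId -> { version, log }

  function getDoc(id) {
    if (!docs.has(id)) docs.set(id, { version: 0, log: [] });
    return docs.get(id);
  }

  return {
    // Append a delta to the document's log and return the version
    // number assigned to it.
    applyDelta(id, delta, userId) {
      const doc = getDoc(id);
      doc.version += 1;
      doc.log.push({ version: doc.version, delta, userId });
      return doc.version;
    },
    // All logged changes newer than `sinceVersion`, oldest first.
    changesSince(id, sinceVersion) {
      return getDoc(id).log.filter((entry) => entry.version > sinceVersion);
    },
    currentVersion(id) {
      return getDoc(id).version;
    },
  };
}
```

Including the returned version in the doc.change payload would let clients notice when they have missed an update.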

Client: Receiving and Applying Changes

channel.bind("doc.change", ({ delta, userId }) => {
  // Don't apply our own changes — the editor already has them
  if (userId === currentUserId) return;

  // Apply the incoming delta as an API (programmatic) change
  editor.updateContents(delta, "api");
});

The "api" source flag is crucial. It tells the editor (and your own text-change handler) that this change came from a remote source, which prevents it from being re-broadcast and creating an infinite feedback loop.
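
One caveat with filtering on userId: if the same user has the document open in two tabs, each tab discards the other's edits. A common workaround is to tag each change with a per-tab identifier instead. The sketch below assumes a hypothetical clientId field that the client adds to the POST body and the server copies into the doc.change payload. Neither exists in the code shown so far.

```javascript
// Generate an id that is unique per page load (per tab), not per user.
function createClientId() {
  const time = Date.now().toString(36);
  const random = Math.random().toString(36).slice(2, 10);
  return `${time}-${random}`;
}

// True when an incoming change originated in another tab or on another
// machine and should therefore be applied locally.
function isRemoteChange(event, localClientId) {
  return event.clientId !== localClientId;
}
```

The receive handler would then skip an event only when isRemoteChange returns false, regardless of which user sent it.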

Real-Time Cursor Tracking

Cursor positions are ephemeral and high-frequency — they should never touch your database. Apinator supports client events on presence channels, which let clients publish directly to the channel without going through your server. This is ideal for cursor positions.

Client: Publishing Cursor Position

editor.on("selection-change", (range) => {
  if (!range) return;

  // client- prefix enables direct client-to-client events
  channel.trigger("client-cursor", {
    index: range.index,
    length: range.length,
  });
});
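
selection-change can fire many times per second while a user drags a selection, and real-time platforms commonly rate-limit client events. A simple guard is to throttle cursor publishing. The helper below is a generic leading-edge throttle written for illustration; the interval you pick is up to you, and the now parameter exists only to make the function easy to test.

```javascript
// Returns a wrapper that calls `fn` at most once per `intervalMs`.
// Calls that arrive inside the window are dropped, and the wrapper
// returns false for them.
function throttle(fn, intervalMs, now = Date.now) {
  let last = -Infinity;
  return (...args) => {
    const t = now();
    if (t - last < intervalMs) return false;
    last = t;
    fn(...args);
    return true;
  };
}
```

You could wrap the channel.trigger call in a function and pass it through throttle with an interval such as 50 ms. Because calls inside the window are dropped, the last position of a burst can be lost; a trailing-edge variant would fix that at the cost of a timer.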

Client: Rendering Remote Cursors

const remoteCursors = {}; // memberId -> cursor DOM element

channel.bind("client-cursor", (data, metadata) => {
  const memberId = metadata.user_id;
  const member = channel.members.get(memberId);
  // The member may have left between sending the event and its arrival
  if (!member) return;

  if (!remoteCursors[memberId]) {
    remoteCursors[memberId] = createCursorElement(member.info.color, member.info.name);
  }

  positionCursor(remoteCursors[memberId], editor, data.index);
});

function positionCursor(cursorEl, editor, index) {
  const bounds = editor.getBounds(index);
  if (!bounds) return; // nothing to position if the editor has no bounds for this index
  cursorEl.style.top = `${bounds.top}px`;
  cursorEl.style.left = `${bounds.left}px`;
}

Each member gets a colored cursor flag positioned using the editor's own coordinate system. When a member leaves (apinator:member_removed), you remove their cursor element immediately.
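
Remote cursor indexes go stale as soon as text is inserted or deleted in front of them. One way to keep them in place between client-cursor events is to shift each stored index through every delta that gets applied. The sketch below does this for Quill-style ops ({ retain }, { insert }, { delete }). It is a simplified re-implementation written for illustration, and transformIndex is a hypothetical name; the quill-delta package ships a transformPosition method for the same purpose.

```javascript
// Length of a single delta op. Embeds (non-string inserts) count as 1.
function opLength(op) {
  if (typeof op.delete === "number") return op.delete;
  if (typeof op.retain === "number") return op.retain;
  return typeof op.insert === "string" ? op.insert.length : 1;
}

// Returns where `index` ends up after the delta described by `ops` has
// been applied to the document.
function transformIndex(ops, index) {
  let offset = 0; // how far into the document the delta has advanced
  for (const op of ops) {
    if (offset > index) break; // the rest of the delta is after the cursor
    const length = opLength(op);
    if (typeof op.delete === "number") {
      // Deleted text before the cursor pulls it left, but never past
      // the start of the deletion.
      index -= Math.min(length, index - offset);
      continue;
    }
    if (op.insert !== undefined) {
      index += length; // text inserted at or before the cursor pushes it right
    }
    offset += length;
  }
  return index;
}
```

In the doc.change handler you would run each stored cursor index through transformIndex(delta.ops, index) and then reposition the cursor element.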

Awareness vs. Persistence

It helps to be explicit about the boundary between these two concerns:

Concern               | Mechanism                     | Storage
----------------------|-------------------------------|------------------------------
Who is online         | Presence channel membership   | Ephemeral (Apinator)
Where their cursor is | Client events (client-cursor) | Ephemeral (Apinator)
Document content      | REST API + database           | Persistent (PostgreSQL, etc.)
Change history        | Delta log in database         | Persistent

WebSockets are delivery infrastructure: fast, but volatile, with no durable record of what was sent. When a user's connection drops and they reconnect, the presence channel tells you who is back, but the document content comes from the database. Your client should fetch the latest document snapshot on mount and on reconnect, then attach to the live channel for subsequent updates.

A minimal reconnection flow looks like this:

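let currentVersion = 0;

client.connection.bind("connected", async () => {
  try {
    const res = await fetch(`/api/documents/${docId}`);
    if (!res.ok) throw new Error(`Snapshot request failed: ${res.status}`);
    const { content, version } = await res.json();
    editor.setContents(content, "api");
    currentVersion = version;

    // Snapshot loaded; attach to the live channel for subsequent updates
    subscribeToChannel(docId);
  } catch (err) {
    console.error("Could not load document", err);
  }
});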
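
One subtlety in a fetch-then-subscribe flow: a change published after the snapshot is read but before the subscription is active shows up in neither. A common remedy is to subscribe first, buffer incoming events while the snapshot loads, and then replay only the events that are newer than the snapshot. The sketch below assumes each doc.change event carries a monotonically increasing version number, which the payload shown earlier does not include. createCatchUpBuffer is a hypothetical helper.

```javascript
// Buffers events until a snapshot is loaded, then replays the ones the
// snapshot does not already contain. `apply` is called for each event
// that should reach the editor.
function createCatchUpBuffer(apply) {
  let buffered = [];
  let live = false;
  let lastVersion = 0;

  function deliver(event) {
    if (event.version <= lastVersion) return; // already applied or in the snapshot
    apply(event);
    lastVersion = event.version;
  }

  return {
    // Call for every incoming doc.change event.
    receive(event) {
      if (live) deliver(event);
      else buffered.push(event);
    },
    // Call once the snapshot has been loaded into the editor.
    snapshotLoaded(snapshotVersion) {
      lastVersion = snapshotVersion;
      buffered.sort((a, b) => a.version - b.version).forEach(deliver);
      buffered = [];
      live = true;
    },
    // Call when the connection drops, before fetching a new snapshot.
    reset() {
      live = false;
      buffered = [];
    },
  };
}
```

With this in place, the channel handler calls receive for every event, and the reconnect handler calls snapshotLoaded(version) after editor.setContents.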

Keeping It Simple

Production collaborative editors add more on top of this: version vectors to detect and reject out-of-order operations, undo stacks that are aware of remote edits, and offline queuing for clients that temporarily lose connectivity. But for a large class of applications — internal tools, shared notes, lightweight wikis — the architecture above is sufficient and far simpler to reason about.

The key insight is to let each layer do what it does best: your database owns the truth, your WebSocket layer owns the moment. Apinator handles the WebSocket infrastructure so you can focus on the editing logic rather than connection management, authentication, and scaling.

Start with a presence channel, broadcast deltas, render cursors, and reach for a full OT or CRDT library only when you genuinely need it.