fix(collaboration): memory leaks, Vue stack overflow, and Liveblocks stability (SD-1924) #2030

Open
tupizz wants to merge 8 commits into main from tadeu/fix-collaboration-stability

@tupizz tupizz commented Feb 15, 2026

Closes IT-474

CleanShot.2026-02-15.at.19.56.51.mp4

Summary

Fixes multiple collaboration bugs causing typing lag, room corruption, Vue stack overflow crashes, and user color flickering when using external providers (Liveblocks).

How SuperDoc Collaboration Works

SuperDoc collaboration uses Y.js (a CRDT library) to synchronize document state between multiple users. Here's the step-by-step flow:

1. Initialization

When a user opens a collaborative document, SuperDoc receives a Y.js Doc and a provider (e.g. Liveblocks, Hocuspocus) through the modules.collaboration config option:

User app → new SuperDoc({ modules: { collaboration: { ydoc, provider } } })
  → SuperDoc.js stores ydoc/provider on the instance
  → SuperDoc creates a Vue app (exposes itself as $superdoc global property)
  → Editor (ProseMirror) is created with ySyncPlugin bound to the ydoc

2. Real-time Editing Sync

When a user types, changes flow through two parallel sync paths:

Keystroke → ProseMirror transaction
  ├── Path A: Y.js CRDT sync (character-level)
  │   ySyncPlugin intercepts the transaction
  │   → converts PM steps to Y.js operations
  │   → provider broadcasts to other clients
  │   → remote clients receive Y.js update
  │   → ySyncPlugin converts back to PM transaction
  │   → remote editor applies the change
  │
  └── Path B: DOCX XML sync (document-level, debounced 1s)
      ydoc 'afterTransaction' listener fires
      → debounced updateYdocDocxData() runs after 1000ms
      → full DOCX export → stored in ydoc meta map
      → new joiners reconstruct document from this DOCX data

3. Cursor Awareness

Each user's cursor position is shared via the Y.js awareness protocol:

User moves cursor → PresentationEditor.#updateLocalAwarenessCursor()
  → awareness.setLocalStateField('user', { name, email, color, cursor })
  → provider broadcasts awareness state
  → remote clients receive awareness update
  → RemoteCursorManager renders colored cursors on their DomPainter overlay

4. Vue Rendering Bridge

SuperDoc uses dual rendering (hidden ProseMirror + visible DomPainter). Vue manages the toolbar and UI state:

PM transaction → SuperDoc.vue onEditorSelectionChange()
  → updates reactive refs (selectionPosition, activeSelection, toolsMenuPosition)
  → Vue re-renders toolbar with current formatting state

What Was Broken and How Each Fix Addresses It

Fix 1: Y.js Observer Memory Leaks (collaboration.js)

Problem: The collaboration extension registered 4 observers/listeners without cleanup:

  • metaMap.observe() — media file sync
  • headerFooterMap.observe() — header/footer sync
  • ydoc.on('afterTransaction') — DOCX XML sync
  • debounce timer — pending 1s timeout

When editors were destroyed and recreated (HMR, route changes, document switches), these accumulated. Each leaked afterTransaction handler ran a full DOCX export on every Y.js transaction.

Fix: Added onDestroy() lifecycle hook. Observer references are stored in a module-level WeakMap<Editor, CleanupData> (not in reactive this.options) and properly cleaned up on editor destruction. The debounce utility now supports .cancel().
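
The cleanup pattern described in Fix 1 can be sketched roughly as follows. This is a minimal illustration, not the code in collaboration.js: `TinyObservable`, `attachCollaboration`, `destroyEditor`, and the `isPending()` helper are hypothetical names, and `TinyObservable` only imitates the observe/unobserve shape of a Y.js map.

```typescript
type Handler<T> = (value: T) => void;

// Hypothetical stand-in for a Y.js map: only observe/unobserve/emit.
class TinyObservable<T> {
  private handlers = new Set<Handler<T>>();
  observe(handler: Handler<T>): void { this.handlers.add(handler); }
  unobserve(handler: Handler<T>): void { this.handlers.delete(handler); }
  emit(value: T): void { Array.from(this.handlers).forEach((h) => h(value)); }
  get size(): number { return this.handlers.size; }
}

// Trailing-edge debounce whose pending timer can be cancelled.
function debounceWithCancel<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const run = (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  const isPending = (): boolean => timer !== undefined;
  return Object.assign(run, { cancel, isPending });
}

interface CleanupData {
  teardown: Array<() => void>;
}

// Keyed by editor instance, outside any reactive options object.
const cleanupByEditor = new WeakMap<object, CleanupData>();

function attachCollaboration(editor: object, metaMap: TinyObservable<string>, exportDocx: () => void) {
  const debouncedExport = debounceWithCancel(exportDocx, 1000);
  const onMeta: Handler<string> = () => debouncedExport();
  metaMap.observe(onMeta);
  cleanupByEditor.set(editor, {
    teardown: [() => metaMap.unobserve(onMeta), () => debouncedExport.cancel()],
  });
  return debouncedExport;
}

// Called from the editor's destroy path: unobserve, cancel timers, forget the entry.
function destroyEditor(editor: object): void {
  const data = cleanupByEditor.get(editor);
  if (!data) return;
  for (const fn of data.teardown) fn();
  cleanupByEditor.delete(editor);
}
```

Storing the teardown callbacks next to the registration keeps each observer paired with its exact handler reference, which is what an unobserve call needs in order to remove it.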

Fix 2: yUndoPlugin Observer Leak (Editor.ts)

Problem: #prepareDocumentForExport created a throwaway EditorState to transform the document for DOCX export. EditorState.create() calls Plugin.init() for every plugin, and yUndoPlugin.init() registers a persistent Y.UndoManager observer on the shared ydoc. These observers were never cleaned up because the throwaway state was immediately discarded.

Fix: Use new Transform(doc) directly instead of EditorState.create(). All methods used by prepareCommentsForExport (removeMark, insert, addMark, setNodeMarkup, delete, mapping.map) are Transform methods — no Transaction-specific APIs are needed.

Fix 3: Vue traverse Stack Overflow (SuperDoc.js)

Problem: SuperDoc.js stores this.ydoc and this.provider on the instance. The instance is exposed to Vue as a global property ($superdoc). Vue's reactivity system deep-traverses all properties to make them reactive. Y.js objects have deep circular internal references (_item → parent → doc → _store → items → ...) that cause infinite recursion → RangeError: Maximum call stack size exceeded.

Fix: Wrap all Y.js object assignments with markRaw() from Vue. This adds a __v_skip flag that tells Vue to never traverse the object. Applied to all 4 assignment paths (external provider, internal single-doc, internal multi-doc, internal superdoc sync).
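
To see why a skip flag helps, here is a toy traversal. It is not Vue's implementation: `markRawLike`, `countVisited`, and `buildChain` are hypothetical helpers, and the walk is iterative so the sketch itself cannot overflow. A recursive walk over the same structure would have to descend as deep as the chain is long.

```typescript
// Mirrors the "__v_skip" flag named above; everything else is illustrative.
const SKIP_FLAG = "__v_skip";

type Bag = { [key: string]: unknown };

function markRawLike<T extends object>(value: T): T {
  Object.defineProperty(value, SKIP_FLAG, { value: true, enumerable: false, configurable: true });
  return value;
}

// Returns how many objects a deep traversal has to visit.
function countVisited(root: unknown): number {
  const seen = new Set<object>();
  const stack: unknown[] = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (typeof current !== "object" || current === null) continue;
    if (seen.has(current)) continue;
    if ((current as Bag)[SKIP_FLAG] === true) continue; // flagged objects are never entered
    seen.add(current);
    for (const key of Object.keys(current)) stack.push((current as Bag)[key]);
  }
  return seen.size;
}

// A doubly linked chain standing in for the long item lists inside a Y.js document.
function buildChain(length: number): Bag {
  const head: Bag = { index: 0 };
  let tail = head;
  for (let i = 1; i < length; i++) {
    const next: Bag = { index: i, prev: tail };
    tail.next = next;
    tail = next;
  }
  return head;
}

const plainHolder = { ydoc: buildChain(50000) };
const rawHolder = { ydoc: markRawLike(buildChain(50000)) };

const plainVisited = countVisited(plainHolder); // the holder plus every chain node
const rawVisited = countVisited(rawHolder);     // the holder only
```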

Fix 4: User Color Flickering (SuperDoc.js)

Problem: Three competing color systems with no coordination:

  1. y-prosemirror's yCursorPlugin mutates user.color = '#ffa500' (orange) when no color is set in awareness state
  2. RemoteCursorAwareness uses getFallbackCursorColor(clientId) which assigns from a palette
  3. awarenessStatesToArray assigns from a shuffled palette via userColorMap

The external provider path in SuperDoc.js never set user.color before broadcasting awareness, so each system kept overwriting with different colors every render cycle.

Fix: Set this.config.user.color = this.colors[0] || '#4ECDC4' before calling setupAwarenessHandler, ensuring awareness state always has a stable color.

Fix 5: Liveblocks Room Corruption (App.tsx)

Problem: provider.on('sync') fires not only on initial connection but also on every reconnect. The original example code created a new SuperDoc instance on every sync event, resulting in duplicate editors writing to the same Y.js document — causing conflicting CRDT operations that permanently corrupted the Liveblocks room state (WebSocket code 1011).

Fix: Guard with if (superdocRef.current) return to ensure SuperDoc is only created once per component lifecycle.
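
A stripped-down sketch of that guard, outside React. `FakeProvider` and `createOnSync` are hypothetical; only the `if (ref.current) return` check comes from the fix described above.

```typescript
type SyncListener = () => void;

// Hypothetical minimal provider: only the 'sync' event matters here.
class FakeProvider {
  private listeners: SyncListener[] = [];
  on(_event: "sync", listener: SyncListener): void { this.listeners.push(listener); }
  emitSync(): void { this.listeners.forEach((listener) => listener()); }
}

interface Ref<T> { current: T | null }

function createOnSync<T>(provider: FakeProvider, ref: Ref<T>, create: () => T): void {
  provider.on("sync", () => {
    if (ref.current) return; // 'sync' fires again on reconnect; keep the first instance
    ref.current = create();
  });
}

let createdCount = 0;
const superdocRef: Ref<{ id: number }> = { current: null };
const fakeProvider = new FakeProvider();
createOnSync(fakeProvider, superdocRef, () => ({ id: ++createdCount }));

fakeProvider.emitSync(); // initial connection
fakeProvider.emitSync(); // reconnect
fakeProvider.emitSync(); // reconnect
```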

Fix 6: Typing Lag — Cursor Awareness Overhead (PresentationEditor.ts)

Problem: Every keystroke triggered #updateLocalAwarenessCursor() synchronously, which calls awareness.setLocalStateField(). With Liveblocks, each call takes ~190ms to encode and sync awareness state over WebSocket.

Fix: Debounce cursor awareness updates to 100ms. Rapid keystrokes batch into a single update, keeping typing responsive while maintaining real-time cursor sharing.

Fix 7: Typing Lag — Vue flushJobs Blocking (SuperDoc.vue)

Problem: Each ProseMirror transaction synchronously updated Vue reactive refs (selectionPosition, activeSelection, toolsMenuPosition). Each mutation triggered Vue's flushJobs microtask, which re-evaluated hundreds of components — blocking the main thread for ~300ms per keystroke.

Fix: Defer selection state updates to requestAnimationFrame. RAF fires before the next paint, so the toolbar still reflects correct state by the time the user sees the rendered frame. Pending RAFs are cancelled on new transactions and on component unmount.
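
One way to structure the deferred update is sketched below, with the frame functions injected so the logic can run outside a browser. `createSelectionScheduler` and `EditorLike` are hypothetical names; in SuperDoc.vue the equivalent logic lives in onEditorSelectionChange. The sketch cancels any pending frame before every early return and reads selection from the editor at frame time, which matches the review follow-ups later in this thread.

```typescript
interface EditorLike {
  state: { selection: { from: number; to: number } };
}

type RequestFrame = (callback: () => void) => number;
type CancelFrame = (id: number) => void;

function createSelectionScheduler(
  requestFrame: RequestFrame,
  cancelFrame: CancelFrame,
  processSelection: (editor: EditorLike) => void,
) {
  let pendingId: number | null = null;

  const cancelPending = (): void => {
    if (pendingId !== null) {
      cancelFrame(pendingId);
      pendingId = null;
    }
  };

  const onSelectionChange = (editor: EditorLike, skip = false): void => {
    cancelPending(); // always first, so an early return cannot leave a stale frame queued
    if (skip) return;
    pendingId = requestFrame(() => {
      pendingId = null;
      processSelection(editor); // reads editor.state when the frame runs
    });
  };

  // cancelPending is also what an unmount hook would call.
  return { onSelectionChange, cancelPending };
}
```

In a browser the scheduler would presumably be created with requestAnimationFrame and cancelAnimationFrame; tests can pass a fake queue instead.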

Fix 8: Repeated Full-Document Traversals (block-node.js)

Problem: The hasInitialized flag was only set to true when changes were detected. If the initial document had all valid sdBlockId values, the initialization traversal ran on every single transaction — potentially thousands of wasteful full-document walks.

Fix: Set hasInitialized = true unconditionally after the first appendTransaction call. The blockNodeInitialUpdate meta is only set when actual changes were made.
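
A reduced sketch of the flag handling, assuming a hypothetical `ensureBlockIds` callback that performs the full-document walk and reports whether it changed anything. The real plugin does more after initialization; only the flag logic is shown.

```typescript
interface InitResult {
  changed: boolean;
}

// Hypothetical stand-in for the initialization branch of appendTransaction.
function createBlockIdInitializer(ensureBlockIds: () => InitResult) {
  let hasInitialized = false;

  return function appendTransaction(): { blockNodeInitialUpdate: boolean } | null {
    if (hasInitialized) return null;
    hasInitialized = true;                 // set unconditionally after the first pass
    const { changed } = ensureBlockIds();  // the single full-document walk
    return changed ? { blockNodeInitialUpdate: true } : null; // meta only when something changed
  };
}
```

With the previous behavior, a document whose IDs were already valid would never flip the flag, so every call would repeat the walk.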

Fix 9: Liveblocks Example App (App.tsx, vite.config.js)

Problem: Multiple issues in the example app:

  • states.filter((s) => s.user) filtered ALL users because awarenessStatesToArray returns flat objects (no nested .user)
  • Badge rendering used u.user?.color instead of u.color
  • Index-based React keys caused unnecessary re-renders
  • No loading indicator while connecting
  • No Vite alias configuration for local development
  • Strict Mode cleanup didn't null refs — remount broke re-initialization

Fix: Extracted useSuperdocCollaboration custom hook, corrected property access to flat objects, stable clientId keys, proper cleanup for Strict Mode, hoisted static styles, added Vite alias config.
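
The corrected property access might look like the sketch below. The `FlatUserState` field names are assumptions based on the description above (flat objects with color and clientId); `toBadges` is a hypothetical helper, not part of the example app.

```typescript
// Assumed flat shape: fields sit on the state itself, with no nested `.user`.
interface FlatUserState {
  clientId: number;
  name: string;
  email: string;
  color: string;
}

interface Badge {
  key: string;   // stable key derived from clientId, not the array index
  label: string;
  color: string;
}

function toBadges(states: FlatUserState[]): Badge[] {
  return states
    .filter((state) => Boolean(state.name)) // filter on a field that exists on flat objects
    .map((state) => ({ key: String(state.clientId), label: state.name, color: state.color }));
}
```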

Test plan

  • All existing tests pass (pnpm test — 810+ tests across packages)
  • Pre-commit hooks pass (typecheck, format, lint, commitlint)
  • Verified in browser: no Vue stack overflow, stable user colors, no room corruption
  • Verified typing performance: 0.2-2ms dispatch latency, no degradation over 60s
  • Verified no DOM node count growth (stable ~1350-1380 nodes)
  • Code review: all changes verified safe by automated analysis (Transform API compatibility, lifecycle hook validity, markRaw safety, RAF edge cases)

Copilot AI review requested due to automatic review settings February 15, 2026 00:05
@tupizz tupizz force-pushed the tadeu/fix-collaboration-stability branch from cce2b91 to cf29539 Compare February 15, 2026 00:08

Copilot AI left a comment

Pull request overview

This PR addresses multiple collaboration-related stability and performance issues in SuperDoc/Super Editor (Y.js observer leaks, Vue reactivity stack overflow, cursor awareness overhead, and Liveblocks reconnect behavior), plus a few example-app fixes.

Changes:

  • Add cleanup paths for Y.js observers/listeners and avoid plugin-init observer leaks during export.
  • Reduce UI/perf regressions by marking Y.js objects as non-reactive, deferring selection updates to requestAnimationFrame, and debouncing local awareness cursor updates.
  • Fix repeated initialization/traversal behaviors and improve the Liveblocks example app reliability/config.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

File | Description
packages/superdoc/src/core/SuperDoc.js | Uses markRaw() for Y.js objects; assigns stable local user color for awareness.
packages/superdoc/src/SuperDoc.vue | Defers selection reactive updates via RAF; cancels RAF on unmount.
packages/superdoc/src/SuperDoc.test.js | Makes RAF synchronous in tests and restores mocks after each test.
packages/super-editor/src/extensions/collaboration/collaboration.js | Tracks Y.js observers/handlers and adds onDestroy() cleanup; debounce supports .cancel().
packages/super-editor/src/extensions/block-node/block-node.js | Ensures initialization traversal only happens once regardless of detected changes.
packages/super-editor/src/core/presentation-editor/PresentationEditor.ts | Debounces local awareness cursor updates; updates remote cursor refresh strategy after layout.
packages/super-editor/src/core/Editor.ts | Uses Transform directly for export prep to avoid plugin init/leaks.
examples/collaboration/liveblocks/vite.config.js | Adds local alias + fs allow-list for resolving built superdoc assets.
examples/collaboration/liveblocks/src/App.tsx | Prevents duplicate SuperDoc creation on reconnect; fixes awareness state rendering and adds “Connecting…” UI.
Comments suppressed due to low confidence (1)

packages/superdoc/src/SuperDoc.vue:287

  • When returning early (e.g., skipSelectionUpdate or viewing mode), any previously scheduled requestAnimationFrame callback is left pending and can still call processSelectionChange, re-applying selection state after it was intentionally skipped/reset. Cancel selectionUpdateRafId at the start of this handler (before the early-return branches) so stale selection updates can’t run.
const onEditorSelectionChange = ({ editor, transaction }) => {
  if (skipSelectionUpdate.value) {
    // When comment is added selection will be equal to comment text
    // Should skip calculations to keep text selection for comments correct
    skipSelectionUpdate.value = false;

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cce2b919a8

Comment on lines 304 to 306
if (selectionUpdateRafId != null) {
cancelAnimationFrame(selectionUpdateRafId);
}

P2: Cancel queued selection RAF before early-return paths

The new RAF batching only cancels a pending callback after the viewing-mode checks, so if a selection update is queued in editing mode and the app switches to viewing mode before the next frame, the stale callback still runs processSelectionChange and can repopulate selection/tool state right after resetSelection(). This creates intermittent stale selection UI in viewing mode; cancellation should happen before these early returns.

Contributor

a pending RAF from a previous call can survive the viewing-mode early returns on lines 291/296. if the mode switches between frames, resetSelection() runs but then the old RAF fires and repopulates selection state.

move the cancelAnimationFrame block above the early returns so it always clears.

Contributor Author

Fixed in 75b85ca: cancelAnimationFrame is now called at the very top of onEditorSelectionChange, before any early returns. This prevents a queued RAF callback from surviving viewing-mode switches and repopulating stale selection state.

Comment on lines 472 to 474
if (!this.config.user.color) {
this.config.user.color = this.colors[0] || '#4ECDC4';
}

P2: Derive external-user default cursor color per user

This assigns the same fallback color to every external-collaboration user when user.color is unset (and colors is often empty by default), so all participants broadcast identical cursor colors. Because remote cursor rendering prefers awState.user.color over per-client fallback assignment, collaborators become visually indistinguishable instead of receiving unique fallback colors.

…stability

- Fix Y.js observer leaks in collaboration extension by adding onDestroy
  lifecycle hook with proper cleanup for media map, header/footer map,
  and afterTransaction listeners via module-level WeakMap
- Fix yUndoPlugin observer leak in #prepareDocumentForExport by using
  Transform directly instead of creating a throwaway EditorState
- Fix Vue traverse stack overflow by wrapping Y.js objects (ydoc, provider)
  with markRaw() before storing on the SuperDoc instance
- Fix user color blinking by assigning a stable color on the external
  provider path before awareness broadcast
- Fix Liveblocks room corruption by guarding against duplicate SuperDoc
  creation on provider reconnect (sync event fires on every reconnect)
- Debounce local cursor awareness updates (100ms) to avoid ~190ms
  Liveblocks overhead per keystroke
- Defer Vue selection state updates to RAF to prevent ~300ms flushJobs
  blocking per keystroke
- Fix block-node hasInitialized flag to prevent repeated full-document
  traversals on every transaction
- Fix debounce utility: use fn(...args) instead of fn.apply(this, args)
  and add .cancel() support for proper cleanup
- Refactor Liveblocks example: extract useSuperdocCollaboration hook,
  hoist static styles, fix Strict Mode cleanup, correct awareness
  state property access
@tupizz tupizz force-pushed the tadeu/fix-collaboration-stability branch from 2d8a9d9 to 0c45bcb Compare February 15, 2026 22:48
@tupizz tupizz self-assigned this Feb 15, 2026
@linear

linear bot commented Feb 15, 2026

The Liveblocks example aliases superdoc to the local dist build. Since
y-prosemirror is bundled into superdoc's ES chunks (not externalized),
its `import "yjs"` resolves from packages/superdoc/node_modules — a
different physical copy than the example's own node_modules/yjs.

Two copies of yjs breaks Y.js constructor instanceof checks, producing
invalid CRDT operations that Liveblocks rejects with WebSocket code 1011.

Adding resolve.dedupe forces Vite to resolve all yjs imports from a
single location regardless of the importer's filesystem position.
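
The instanceof failure can be reproduced without any bundler. In this sketch each call to the hypothetical `loadPackageCopy` stands in for loading one physical copy of a package; the class is named `Doc` only for illustration and has nothing to do with the real yjs implementation.

```typescript
// Each call evaluates the class declaration again, producing a distinct constructor.
function loadPackageCopy() {
  class Doc {
    readonly clientID = 1;
  }
  const isDoc = (value: unknown): boolean => value instanceof Doc;
  return { Doc, isDoc };
}

const copyFromExample = loadPackageCopy(); // stands in for the example's own copy
const copyFromBundle = loadPackageCopy();  // stands in for the copy the bundled chunks resolve

const sharedDoc = new copyFromExample.Doc();

const sameCopyCheck = copyFromExample.isDoc(sharedDoc); // true: same constructor
const otherCopyCheck = copyFromBundle.isDoc(sharedDoc); // false: identical source, different constructor
```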
@caio-pizzol caio-pizzol changed the title fix(collaboration): memory leaks, Vue stack overflow, and Liveblocks stability fix(collaboration): memory leaks, Vue stack overflow, and Liveblocks stability (SD-1924) Feb 16, 2026
selectionUpdateRafId = requestAnimationFrame(() => {
selectionUpdateRafId = null;
processSelectionChange(editor, transaction);
});
Contributor

@caio-pizzol caio-pizzol Feb 16, 2026

the transaction object here gets captured when the event fires, but the RAF callback runs later.

by that time prosemirror might have already processed more keystrokes, so transaction.selection could be pointing at an old position. since editor.state.selection always reflects the latest state, safer to just use that and drop transaction from the deferred path.

Contributor Author

Fixed in 75b85ca — the RAF callback now captures only editor (not transaction). By the time RAF fires, ProseMirror may have processed more keystrokes, making the captured transaction stale. processSelectionChange already reads editor.state.selection as the primary source, so dropping the transaction param is safe.

// (orange) as a default, causing color flickering between that default and
// the fallback colors used by RemoteCursorAwareness.
if (!this.config.user.color) {
this.config.user.color = this.colors[0] || '#4ECDC4';
Contributor

when no colors array is passed in config (the default), this.colors[0] is undefined so every user falls back to #4ECDC4. in a multi-user session all cursors end up the same teal. the flickering fix makes sense, but the fallback could pick different colors per user -- something like hashing the user name or rotating through a default palette.

Contributor Author

Fixed in 6b03467 — now uses a 24-color hex palette as default fallback (when no custom colors config is passed). Colors are assigned deterministically via djb2 hash of the user's email/name, reducing collision probability from 100% (all got colors[0]) to ~4% for any two users. Also fixed awarenessStatesToArray to prefer the user's pre-assigned color from awareness state instead of overriding it with undefined from the empty palette.
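
A deterministic assignment along those lines could look like this sketch. The palette here is illustrative and shorter than the 24 colors mentioned above (the actual values are not listed in this thread), and `pickUserColor` is a hypothetical name; only the djb2 hash and the prefer-existing-color rule come from the reply.

```typescript
// Illustrative hex palette; with 8 entries two users collide with probability 1/8,
// with 24 entries roughly 1/24 (about 4%).
const FALLBACK_PALETTE: readonly string[] = [
  "#e6194b", "#3cb44b", "#4363d8", "#f58231",
  "#911eb4", "#42d4f4", "#f032e6", "#469990",
];

// djb2: start at 5381, then hash = hash * 33 + charCode, kept as unsigned 32-bit.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

function pickUserColor(
  user: { email?: string; name?: string; color?: string },
  palette: readonly string[] = FALLBACK_PALETTE,
): string {
  if (user.color) return user.color; // an explicitly assigned color always wins
  const identity = user.email || user.name || "";
  return palette[djb2(identity) % palette.length];
}
```

Because the color depends only on the identity string, every client computes the same color for the same user without any coordination.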


const PUBLIC_KEY = import.meta.env.VITE_LIVEBLOCKS_PUBLIC_KEY as string;
- const ROOM_ID = (import.meta.env.VITE_ROOM_ID as string) || 'superdoc-room';
+ const ROOM_ID = (import.meta.env.VITE_ROOM_ID as string) || 'superdoc-markraw-v7';
Contributor

the default room changed from superdoc-room to superdoc-markraw-v7 -- looks like it was used during testing. anyone running the example without VITE_ROOM_ID set would connect to this old debug room. swap back to something generic?

Contributor Author

Good catch — the room ID was bumped during debugging to get a fresh room. It's now at superdoc-collab-v8 and configurable via VITE_ROOM_ID env var so it doesn't need to change again.

// remote cursors appear offset by the number of characters the local user typed.
if (this.#remoteCursorManager?.hasRemoteCursors()) {
- this.#scheduleRemoteCursorReRender();
+ this.#remoteCursorManager.markDirty();
Contributor

nothing calls scheduleReRender() anymore after switching to markDirty() + scheduleUpdate(). the method and its #pendingReRenderCallback plumbing are now unused. clean up or keep intentionally for future use?

Contributor Author

Removed in 75b85ca — deleted scheduleReRender(), #pendingReRenderCallback, setReRenderCallback() from RemoteCursorManager.ts and the setReRenderCallback wiring from PresentationEditor.ts. Confirmed no callers existed.

const perfLog = (...args: unknown[]): void => {
if (!layoutDebugEnabled) return;
- console.log(...args);
+ console.warn(...args);
Contributor

this is debug perf logging gated behind an env var. console.warn makes every metric show as a yellow triangle in devtools mixed with real warnings. should stay console.log.

Contributor Author

Fixed in 75b85ca — changed console.warn to console.log in perfLog. These are debug performance metrics gated behind SD_DEBUG_LAYOUT, not actionable warnings.

user: { name: userName, email: `${userName.toLowerCase().replace(' ', '-')}@example.com` },
modules: {
collaboration: { ydoc, provider },
},
Contributor

(window as any).editor = editor is handy for debugging but this is the example people copy-paste. drop it or wrap in if (import.meta.env.DEV).

Contributor Author

Fixed in 75b85ca — now gated behind import.meta.env.DEV so it's tree-shaken out of production builds.

…rder

Four fixes for Liveblocks 1011 connection errors:

1. Fix destroy order: unmount app (editors) BEFORE destroying ydoc/provider.
   Previously, #cleanupCollaboration() destroyed the ydoc while editors were
   still alive — pending debounced writes could fire against a destroyed ydoc,
   corrupting the room state. Now editors are destroyed first, triggering each
   extension's onDestroy() which cancels timers and unobserves Y.js maps.

2. Reduce DOCX sync debounce from 1s to 30s. The actual document content
   syncs in real-time via y-prosemirror's XmlFragment. The DOCX blob in the
   Y.Map is only supplementary data for new joiners' converter setup. Writing
   it every 1s generates large Y.js updates (full DOCX XML serialization)
   that accumulate as Y.Map tombstones, gradually growing the room's stored
   data until Liveblocks rejects connections.

3. Add ydoc.isDestroyed guards in updateYdocDocxData and pushHeaderFooterToYjs
   to prevent writes to a destroyed ydoc. Also re-check after the async
   exportDocx call since the ydoc may have been destroyed mid-export.

4. Force single yjs copy via Vite alias instead of resolve.dedupe (which
   doesn't work for files outside the project root).
- Fix stale transaction in RAF: capture only editor, not transaction,
  in the selection change RAF callback since ProseMirror may process
  more keystrokes before RAF fires
- Cancel pending RAF before early returns to prevent stale callbacks
  from repopulating selection state after mode switches
- Use hash-based color assignment so different users get different
  cursor colors from the palette instead of all getting colors[0]
- Change perfLog from console.warn to console.log since these are
  debug metrics, not warnings
- Remove dead scheduleReRender/setReRenderCallback code from
  RemoteCursorManager (never invoked)
- Gate window.editor assignment behind import.meta.env.DEV
y-prosemirror's cursor plugin only supports hex color format.
The previous approach using HSL caused "unsupported color format"
warnings and broken cursor rendering.

Replace with a 24-color hex palette (down from HSL's 360 hues but
still reduces collision probability to ~4% vs 12.5% with 8 colors).

Also fix awarenessStatesToArray to prefer the user's pre-assigned
color from awareness state instead of overriding with the palette
color (which was undefined when config.colors was empty).
Changed the default ROOM_ID from 'superdoc-collab-v8' to 'superdoc-room' to align with updated naming conventions in the Liveblocks collaboration example.
The Liveblocks awareness object exposes clientID on awareness.doc.clientID
instead of awareness.clientID (standard Yjs). This caused the local client
filter in normalizeAwarenessStates to fail (clientID was undefined), so the
user saw their own remote cursor label — which updated with 100ms debounce
lag, creating a stale/mispositioned cursor overlay.

Fix: Fall back to awareness.doc?.clientID when awareness.clientID is
undefined. Also use immediate rendering for selection updates to reduce
the race window where remote edits can cancel pending selection renders.
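
The fallback can be sketched as below. `AwarenessLike`, `resolveLocalClientId`, and `remoteStates` are hypothetical names with a reduced state shape; the actual normalizeAwarenessStates may differ in detail.

```typescript
// Reduced awareness shape: standard Yjs exposes clientID directly,
// while the Liveblocks object described above exposes it on doc.
interface AwarenessLike {
  clientID?: number;
  doc?: { clientID?: number };
  getStates(): Map<number, { user?: { name: string } }>;
}

function resolveLocalClientId(awareness: AwarenessLike): number | undefined {
  return awareness.clientID ?? awareness.doc?.clientID;
}

// Returns every state except the local client's own.
function remoteStates(awareness: AwarenessLike): Array<{ clientId: number; name: string }> {
  const localId = resolveLocalClientId(awareness);
  const result: Array<{ clientId: number; name: string }> = [];
  awareness.getStates().forEach((state, clientId) => {
    if (clientId === localId || !state.user) return;
    result.push({ clientId, name: state.user.name });
  });
  return result;
}
```

Without the fallback, `localId` is undefined for the Liveblocks shape, the equality check never matches, and the local user's own cursor is treated as remote.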
…mple

- Reintroduced the import of defineConfig in vite.config.js for proper configuration.
- Removed outdated aliases for superdoc/style.css and superdoc in Vite config.
- Updated default ROOM_ID from 'superdoc-collab-v8' to 'superdoc-room' to align with naming conventions.
@tupizz tupizz requested a review from caio-pizzol February 16, 2026 15:02
@github-actions
Contributor

⚠️ AI Risk Review — potential issues found

  • blockNodeInitialUpdate meta flag will never be set after first initialization due to hasInitialized guard being set unconditionally
  • Potential null-reference if component destroyed during 100ms cursor debounce timeout (low probability but possible)

Via L3 deep analysis · critical risk
