Summary
The hostedRemoteLiveLease check in server/canonical-document.ts:818-836 blocks /edit/v2 writes whenever a time-bucketed recentCollabSessionLease exists for the slug but no current-epoch live attachment is present. The lease persists for 5 minutes after every browser visit (DEFAULT_COLLAB_SESSION_TTL_SECONDS = 5 * 60), so any doc that was opened in a browser tab in the past 5 minutes — even after the tab closed and the WebSocket disconnected — is write-locked for agent edits during that window.
Comments via /ops and /bridge/comments work because they don't enforce strictLiveDoc. Only /edit/v2 (the canonical agent edit endpoint) trips the check.
Reproduce
- Start the server (any environment with isHostedRewriteEnvironment() returning true: Fly with PROOF_PUBLIC_BASE_URL set).
- Create a doc via POST /documents.
- Open /d/<slug>?token=<token> in a browser. The server records a recentCollabSessionLease (5-min TTL).
- Close the browser tab. Wait 5–10 seconds (long enough that the WebSocket has disconnected and active_collab_connections rows are gone).
- From an agent or curl, call POST /documents/<slug>/edit/v2 with a valid baseRevision + operations payload.
Expected: write succeeds. The browser is gone; no one is actively editing; the persisted state is authoritative.
Actual: 409 LIVE_DOC_UNAVAILABLE with error: "Live canonical document is unavailable on this hosted replica; retry after refreshing state". Retries within the 5-minute window keep failing the same way. The error's accompanying snapshot shows mutationReady: true, readSource: "projection", projectionFresh: true — i.e. the read side is perfectly healthy; only the write-side liveness check is denying.
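For agents that want to tell this failure apart from a genuinely unhealthy replica, the signature described above can be checked on the client. This is a minimal sketch under stated assumptions: the `snapshot` wrapper and the `EditFailure` type are hypothetical, and only `code`, `mutationReady`, `readSource`, and `projectionFresh` are names taken from the observed response.

```typescript
// Assumed response shape (hypothetical): the real payload layout may differ.
interface EditFailure {
  status: number;
  code?: string;
  snapshot?: {
    mutationReady?: boolean;
    readSource?: string;
    projectionFresh?: boolean;
  };
}

// True when the write was denied even though the read side reports healthy,
// which is the pattern this issue describes.
function looksLikeGhostLeaseDeny(res: EditFailure): boolean {
  return res.status === 409
    && res.code === 'LIVE_DOC_UNAVAILABLE'
    && res.snapshot?.mutationReady === true
    && res.snapshot?.projectionFresh === true;
}

// The response observed in this report.
const observed: EditFailure = {
  status: 409,
  code: 'LIVE_DOC_UNAVAILABLE',
  snapshot: { mutationReady: true, readSource: 'projection', projectionFresh: true },
};

// A denial where the read side is also not ready (not the ghost-lease pattern).
const unhealthy: EditFailure = {
  status: 409,
  code: 'LIVE_DOC_UNAVAILABLE',
  snapshot: { mutationReady: false, readSource: 'projection', projectionFresh: false },
};
```

A match only suggests the ghost-lease case; it cannot prove that no editor is mid-reconnect, since the client sees the same error in both situations.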
Root cause
server/ws.ts:167 computes breakdown.total = Math.max(exactEpochCount, documentLeaseBreakdown.exactEpochCount, recentLeaseCount). Once noteRecentCollabSessionLease records a 5-minute bucket entry (see server/collab.ts:10888 — fired on every collab-session lease admission, including normal browser visits via /d/:slug), recentLeaseCount = 1 for that 5 minutes.
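To make the inflation concrete, here is a small sketch of the `Math.max` expression quoted above. The `BreakdownInputs` type and `computeTotal` function are hypothetical names for illustration; only the three operands come from the server code.

```typescript
// Hypothetical container for the three operands of the Math.max call.
interface BreakdownInputs {
  exactEpochCount: number;              // live attachments on the current epoch
  documentLeaseExactEpochCount: number; // documentLeaseBreakdown.exactEpochCount
  recentLeaseCount: number;             // entries in the 5-minute recent-lease bucket
}

// Mirrors the expression quoted from server/ws.ts:167.
function computeTotal(inputs: BreakdownInputs): number {
  return Math.max(
    inputs.exactEpochCount,
    inputs.documentLeaseExactEpochCount,
    inputs.recentLeaseCount,
  );
}

// State shortly after the last browser tab closed: no live attachment and no
// document lease, but the recent-lease bucket still holds one entry.
const totalAfterTabClosed = computeTotal({
  exactEpochCount: 0,
  documentLeaseExactEpochCount: 0,
  recentLeaseCount: 1,
});

// State once the bucket entry has expired.
const totalAfterExpiry = computeTotal({
  exactEpochCount: 0,
  documentLeaseExactEpochCount: 0,
  recentLeaseCount: 0,
});
```

With these inputs `totalAfterTabClosed` is 1 and `totalAfterExpiry` is 0, so the recent-lease bucket alone is enough to keep `total > 0`.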
server/canonical-document.ts:826-836:
const hostedRemoteLiveLease = collabRuntimeEnabled
  && hostedRuntime
  && collabClientBreakdown.total > 0
  && collabClientBreakdown.exactEpochCount === 0;
if (strictLiveDocRequested && hostedRemoteLiveLease) {
  return {
    ok: false, status: 409, code: 'LIVE_DOC_UNAVAILABLE',
    error: 'Live canonical document is unavailable on this hosted replica; retry after refreshing state',
    retryWithState: `/api/agent/${args.slug}/state`,
  };
}
Combined: total > 0 (from recentLeaseCount) AND exactEpochCount === 0 (no live presence) → deny. The check cannot distinguish "active reconnect window — there's a live editor briefly disconnected" from "ghost lease — the editor closed their tab 30 seconds ago and isn't coming back."
waitForHostedLiveLeaseMaterialization upstream of the check doesn't help: it polls for exactEpochCount > 0 to materialize, which never happens if no one's actually reconnecting.
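The following sketch illustrates why a bounded poll cannot rescue the ghost-lease case. `pollForLiveAttachment` is a hypothetical, synchronous stand-in and not the signature of waitForHostedLiveLeaseMaterialization; it only models "poll until exactEpochCount > 0 or give up".

```typescript
// Hypothetical stand-in for the upstream wait: polls a reader until it reports
// a live attachment or the attempt budget runs out. No sleeping, for brevity.
function pollForLiveAttachment(
  readExactEpochCount: () => number,
  maxAttempts: number,
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (readExactEpochCount() > 0) return true;
  }
  return false;
}

// Ghost lease: nobody is reconnecting, so the count stays at 0 on every poll.
const ghostMaterialized = pollForLiveAttachment(() => 0, 50);

// Genuine reconnect: the editor reattaches by the third poll.
let polls = 0;
const reconnectMaterialized = pollForLiveAttachment(() => (++polls >= 3 ? 1 : 0), 50);
```

In the ghost case the poll exhausts its budget and returns false regardless of how large the budget is, so the request falls through to the deny.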
Suggested fix
Distinguish the two cases via documentLeaseExactCount (and possibly documentLeaseAnyEpochCount):
- Real active reconnect window: a documentLease exists (the lease was opened during an actual session). Continue to deny, because that lease holder is briefly disconnected.
- Ghost lease only: recentLeaseCount > 0 but documentLeaseExactCount === 0 AND documentLeaseAnyEpochCount === 0. Treat as no-presence and proceed with the persisted-handle write path.
Concretely, change hostedRemoteLiveLease to also require documentLeaseExactCount > 0 || documentLeaseAnyEpochCount > 0:
const hasRealLeasePresence = collabClientBreakdown.documentLeaseExactCount > 0
  || collabClientBreakdown.documentLeaseAnyEpochCount > 0;
const hostedRemoteLiveLease = collabRuntimeEnabled
  && hostedRuntime
  && collabClientBreakdown.total > 0
  && collabClientBreakdown.exactEpochCount === 0
  && hasRealLeasePresence;
This preserves the original intent (don't write while a real lease holder is mid-reconnect) but removes the false positive when total is being inflated purely by the 5-minute recentLeaseCount bucket.
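A side-by-side sketch of the current and proposed predicates over the three situations discussed above. The `CollabClientBreakdown` interface and the scenario values are assumptions chosen for illustration; the predicate bodies follow the snippets in this issue, with collabRuntimeEnabled and hostedRuntime taken as true.

```typescript
// Assumed subset of the breakdown fields referenced in this issue.
interface CollabClientBreakdown {
  total: number;
  exactEpochCount: number;
  documentLeaseExactCount: number;
  documentLeaseAnyEpochCount: number;
}

// Current check (collabRuntimeEnabled and hostedRuntime assumed true).
function currentCheck(b: CollabClientBreakdown): boolean {
  return b.total > 0 && b.exactEpochCount === 0;
}

// Proposed check: additionally require a real document lease.
function proposedCheck(b: CollabClientBreakdown): boolean {
  const hasRealLeasePresence = b.documentLeaseExactCount > 0
    || b.documentLeaseAnyEpochCount > 0;
  return b.total > 0 && b.exactEpochCount === 0 && hasRealLeasePresence;
}

// Illustrative values, not captured from a running server.
const scenarios: Record<string, CollabClientBreakdown> = {
  // A tab is attached on the current epoch.
  liveEditor: { total: 1, exactEpochCount: 1, documentLeaseExactCount: 1, documentLeaseAnyEpochCount: 1 },
  // The editor dropped briefly but still holds a document lease.
  reconnectWindow: { total: 1, exactEpochCount: 0, documentLeaseExactCount: 1, documentLeaseAnyEpochCount: 1 },
  // Only the 5-minute recent-lease bucket keeps total above zero.
  ghostLease: { total: 1, exactEpochCount: 0, documentLeaseExactCount: 0, documentLeaseAnyEpochCount: 0 },
};

// Each entry records whether the write would be denied under each check.
const results = Object.fromEntries(
  Object.entries(scenarios).map(([name, b]) => [
    name,
    { current: currentCheck(b), proposed: proposedCheck(b) },
  ]),
);
```

Under these assumed values the two checks agree for `liveEditor` (allow) and `reconnectWindow` (deny) and differ only for `ghostLease`, where the current check denies and the proposed one allows.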
Operational mitigations available today
- Set COLLAB_SESSION_TTL_SECONDS=30 (or some value shorter than the typical agent retry budget). Trade-off: the ghost-deny window shrinks, but so does the grace period for genuine reconnects.
- Keep a browser tab open on the doc while the agent edits. While a tab is attached, exactEpochCount > 0, so the check never fires.
- Wait 5 minutes after closing all browser tabs before agent-editing.
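To size the first and third mitigations, the sketch below estimates how long the ghost-deny window lasts for a given TTL. Both helpers are hypothetical: how the server actually parses COLLAB_SESSION_TTL_SECONDS is not shown in this issue, and because the lease is time-bucketed the real expiry may differ from this simple "last lease plus TTL" estimate.

```typescript
const DEFAULT_COLLAB_SESSION_TTL_SECONDS = 5 * 60;

// Hypothetical resolver: falls back to the default when the variable is
// unset or not a positive integer. The server's own parsing may differ.
function resolveTtlSeconds(env: Record<string, string | undefined>): number {
  const raw = env.COLLAB_SESSION_TTL_SECONDS;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_COLLAB_SESSION_TTL_SECONDS;
}

// Rough estimate of how much longer /edit/v2 would be denied after the last
// recorded lease, ignoring bucket-boundary effects.
function ghostDenyRemainingMs(
  lastLeaseAtMs: number,
  nowMs: number,
  ttlSeconds: number,
): number {
  return Math.max(0, lastLeaseAtMs + ttlSeconds * 1000 - nowMs);
}

// Ten seconds after the tab closed, with the default TTL and with TTL=30.
const remainingDefault = ghostDenyRemainingMs(0, 10_000, resolveTtlSeconds({}));
const remainingShort = ghostDenyRemainingMs(
  0,
  10_000,
  resolveTtlSeconds({ COLLAB_SESSION_TTL_SECONDS: '30' }),
);
```

An agent could use an estimate like this to decide whether waiting out the window fits inside its retry budget or whether it should give up and report the conflict.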
Environment
- Fork: Studio-Intrinsic/proof-service. Will PR our fix back to upstream.
- Host: Fly, scout-proof app, single machine, embedded collab runtime.
- Observed today via POST /documents/9rchcu7l/edit/v2 immediately after closing all browser sessions: deterministic 409 LIVE_DOC_UNAVAILABLE, snapshot in same response shows mutationReady: true, active_collab_connections row count = 0 via direct DB query.
Related
Same-day upstream issue: #49 — different bug (in-memory loadedDocDbMeta stale on read side); fix shipped today. This issue is the next-layer write-side deny that surfaces after the read side is healthy.
Tracker
Fork PR will be linked here once opened.