diff --git a/docs/guides/human-in-the-loop.mdx b/docs/guides/human-in-the-loop.mdx
index 31a5bf4..1ab0ea2 100644
--- a/docs/guides/human-in-the-loop.mdx
+++ b/docs/guides/human-in-the-loop.mdx
@@ -59,6 +59,26 @@ def run_with_approval(task: str, using):
 Prefer `resume_run(...)` over rerunning from scratch after approval. It preserves a continuous, append-only run history.
+## Resume integrity checks
+
+Before `ns.resume(...)` or `ns.resume_run(...)`, Noēsis validates checkpoint
+consistency against the current run artifacts:
+
+- causal anchor integrity: checkpoint `event_offset` and `last_event_id` must
+  still match `events.jsonl`
+- state integrity: current `state.json` hash must match checkpoint `state_hash`
+- artifact integrity: current artifact digest must match checkpoint
+  `artifact_manifest_hash` (via `manifest.json` when present, otherwise
+  deterministic digest fallback)
+
+If any check fails, resume fails closed with `CheckpointConsistencyError`.
+
+Operational implication:
+
+- do not edit run artifacts manually between checkpoint and resume
+- treat checkpoint artifacts as immutable runbook evidence
+- if drift occurs, start a new run instead of forcing continuation
+
 ## Policy-driven approval
 Create a policy that flags operations for approval:
diff --git a/docs/reference/events.mdx b/docs/reference/events.mdx
index 9e906ae..94fde88 100644
--- a/docs/reference/events.mdx
+++ b/docs/reference/events.mdx
@@ -624,6 +624,113 @@ Runtime events share `phase="runtime"` and use `event_type` as the stable subtyp
 The current lifecycle family includes `run.interrupt`, `run.checkpoint`, `run.resume`, and `run.state_projection`.
+#### `run.interrupt`
+
+Pause intent before checkpointing or other continuation flows.
+
+```json
+{
+  "phase": "runtime",
+  "event_type": "run.interrupt",
+  "payload": {
+    "kind": "run.interrupt",
+    "status": "interrupted",
+    "reason": "Awaiting human approval"
+  },
+  "caused_by": "evt_prev123"
+}
+```
+
+
+Always `run.interrupt`.
+
+
+
+Always `interrupted`.
+
+
+
+Optional operator- or policy-provided reason.
+
+
+#### `run.checkpoint`
+
+Checkpoint pointer emission for same-run continuation.
+
+```json
+{
+  "phase": "runtime",
+  "event_type": "run.checkpoint",
+  "payload": {
+    "kind": "run.checkpoint",
+    "status": "paused",
+    "checkpoint_id": "chk_0123456789ab",
+    "event_offset": 42,
+    "checkpoint_path": "checkpoints/chk_0123456789ab/checkpoint.json"
+  },
+  "caused_by": "evt_prev123"
+}
+```
+
+
+Always `run.checkpoint`.
+
+
+
+Always `paused`.
+
+
+
+Run-local checkpoint identifier.
+
+
+
+1-based event history anchor used by resume consistency checks.
+
+
+
+Relative path to persisted checkpoint artifact.
+
+
+#### `run.resume`
+
+Resume evidence emitted before continuation execution.
+
+```json
+{
+  "phase": "runtime",
+  "event_type": "run.resume",
+  "payload": {
+    "kind": "run.resume",
+    "status": "resuming",
+    "checkpoint_id": "chk_0123456789ab",
+    "event_offset": 42,
+    "resume_strategy": "same_run_id"
+  },
+  "caused_by": "evt_prev123"
+}
+```
+
+
+Always `run.resume`.
+
+
+
+Always `resuming`.
+
+
+
+Checkpoint being resumed.
+
+
+
+Copied from the resolved checkpoint anchor.
+
+
+
+Current value is `same_run_id`.
+
+
 `run.state_projection` is emitted whenever state is persisted so `state.json` outcome/link fields have explicit event evidence.
@@ -670,7 +777,8 @@ Trace-backed projection of persisted `state.json.links`.
 For runtime lifecycle/projection events, points to the latest prior event in the
-trace when available.
+trace when available. For `run.resume`, explicit anchors must match either the
+checkpoint's `last_event_id` or the run's current latest event ID.
 ### terminate
diff --git a/docs/reference/python-api.mdx b/docs/reference/python-api.mdx
index 6f699ad..a3b3111 100644
--- a/docs/reference/python-api.mdx
+++ b/docs/reference/python-api.mdx
@@ -207,7 +207,16 @@ Continuation contract:
 - Same run ID.
 - Append-only artifacts preserved.
-- Resume continues post-plan by default (no replan) with anchor validation.
+- Resume continues post-plan by default (no replan) with fail-closed anchor validation.
+
+Resume consistency checks (enforced before `run.resume` is emitted):
+
+- `event_offset` + `last_event_id` in checkpoint must still match current `events.jsonl`.
+- `state_hash` in checkpoint must match current `state.json`.
+- `artifact_manifest_hash` in checkpoint must match current artifact digest:
+  - if `manifest.json` exists, hash that file
+  - otherwise hash canonical artifact set (`events.jsonl` prefix at `event_offset`, `prompts.jsonl`, `state.json`, `summary.json`, `learn.jsonl`) when present.
+- explicit `caused_by=` for `ns.resume(...)` must match either checkpoint `last_event_id` or the current latest run event ID.
 Adapter continuity:
@@ -219,7 +228,11 @@ Adapter continuity:
 - `RunSealedError`: lifecycle writes and resume attempts are rejected once `final.json` seals the run.
 - `CheckpointNotFoundError`: `resume`/`resume_run` reference a checkpoint that does not exist.
 - `MissingCausalParentError`: checkpoint/interrupt cannot anchor to a causal parent event.
-- `CheckpointConsistencyError`: checkpoint anchor (`event_offset`, `last_event_id`, `state_hash`) no longer matches artifacts.
+- `CheckpointConsistencyError`: checkpoint anchor drift, including:
+  - event history mismatch (`event_offset` / `last_event_id`)
+  - `state.json` hash mismatch (`state_hash`)
+  - artifact digest mismatch (`artifact_manifest_hash`, including manifest presence/content drift)
+  - invalid explicit resume anchor (`caused_by` not allowed for this checkpoint/run head)
 - `RunLifecycleTransitionError`: lifecycle mutation violates the run state-machine contract.
 - `ResumeAdapterRequiredError`: `resume_run` requires explicit `using` for non-minimal checkpoints.
 - `ResumeAdapterMismatchError`: `resume_run` adapter does not match checkpoint adapter contract.
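
Reviewer note: the fail-closed resume checks this diff documents can be sketched in Python roughly as follows. This is a minimal illustration, not the Noēsis implementation. The helper names `sha256_file` and `verify_checkpoint`, the use of SHA-256 hex digests, the `id` field on each event record, and the dict-shaped checkpoint are all assumptions made for the example; only the field names `event_offset`, `last_event_id`, `state_hash`, and `artifact_manifest_hash`, the artifact file names, and the error name come from the diff. The deterministic digest fallback is left out because the diff does not say how the canonical artifact set is combined.

```python
import hashlib
import json
from pathlib import Path


class CheckpointConsistencyError(Exception):
    """Local stand-in for the error named in the docs (not imported from Noēsis)."""


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file's bytes (the hash algorithm is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_checkpoint(run_dir: Path, checkpoint: dict) -> None:
    """Raise CheckpointConsistencyError on the first mismatch (fail closed)."""
    # 1. Causal anchor: the event at the 1-based offset must carry last_event_id.
    text = (run_dir / "events.jsonl").read_text(encoding="utf-8")
    events = [line for line in text.splitlines() if line.strip()]
    offset = checkpoint["event_offset"]
    if offset < 1 or offset > len(events):
        raise CheckpointConsistencyError("event_offset is outside events.jsonl")
    anchored = json.loads(events[offset - 1])
    # Assumption: each event record exposes its identifier under "id".
    if anchored.get("id") != checkpoint["last_event_id"]:
        raise CheckpointConsistencyError("last_event_id does not match events.jsonl")

    # 2. State: the current state.json hash must equal the recorded state_hash.
    if sha256_file(run_dir / "state.json") != checkpoint["state_hash"]:
        raise CheckpointConsistencyError("state.json hash mismatch")

    # 3. Artifacts: hash manifest.json when it exists (fallback digest omitted).
    manifest = run_dir / "manifest.json"
    if manifest.exists():
        if sha256_file(manifest) != checkpoint["artifact_manifest_hash"]:
            raise CheckpointConsistencyError("artifact manifest hash mismatch")
```

A caller would run `verify_checkpoint(run_dir, checkpoint)` before attempting a resume and, on `CheckpointConsistencyError`, start a new run rather than forcing continuation, which matches the operational guidance in the guide hunk.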