20 changes: 20 additions & 0 deletions docs/guides/human-in-the-loop.mdx
@@ -59,6 +59,26 @@ def run_with_approval(task: str, using):
Prefer `resume_run(...)` over rerunning from scratch after approval. It preserves a continuous, append-only run history.
</Tip>

## Resume integrity checks

Before `ns.resume(...)` or `ns.resume_run(...)` continues a run, Noēsis validates
the checkpoint against the current run artifacts:

- causal anchor integrity: checkpoint `event_offset` and `last_event_id` must
  still match `events.jsonl`
- state integrity: current `state.json` hash must match checkpoint `state_hash`
- artifact integrity: current artifact digest must match checkpoint
  `artifact_manifest_hash` (computed from `manifest.json` when present,
  otherwise from a deterministic digest fallback)

If any check fails, resume fails closed with `CheckpointConsistencyError`.
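
The first two checks can be sketched as follows. This is a minimal illustration, not the Noēsis implementation: it assumes SHA-256 over the raw `state.json` bytes, assumes each line of `events.jsonl` carries its ID under an `id` key, and uses a local stand-in for `CheckpointConsistencyError`.

```python
import hashlib
import json


class CheckpointConsistencyError(Exception):
    """Local stand-in for the Noēsis error of the same name."""


def check_anchor_and_state(events_jsonl: str, state_bytes: bytes, checkpoint: dict) -> None:
    """Raise if the event anchor or the state hash drifted since the checkpoint."""
    lines = [line for line in events_jsonl.splitlines() if line.strip()]
    offset = checkpoint["event_offset"]  # 1-based position in the event history
    if not 1 <= offset <= len(lines):
        raise CheckpointConsistencyError("event_offset is outside events.jsonl")

    # Assumption: the event at the anchored position must carry last_event_id.
    anchored = json.loads(lines[offset - 1])
    if anchored.get("id") != checkpoint["last_event_id"]:
        raise CheckpointConsistencyError("last_event_id no longer matches events.jsonl")

    # Assumption: state_hash is a SHA-256 hex digest of the raw state.json bytes.
    if hashlib.sha256(state_bytes).hexdigest() != checkpoint["state_hash"]:
        raise CheckpointConsistencyError("state.json hash mismatch")
```

Because the function raises on the first mismatch and returns nothing on success, a caller cannot continue past a failed check by accident, which is the fail-closed behavior described above.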

Operational implications:

- do not edit run artifacts manually between checkpoint and resume
- treat checkpoint artifacts as immutable runbook evidence
- if drift occurs, start a new run instead of forcing continuation
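
One way to apply the last point in calling code is to treat a consistency failure as a signal to start over rather than retry. The sketch below is hypothetical: `resume` and `start_new` are caller-supplied callables (for example, thin wrappers around your own resume and run calls), and the exception class is a local stand-in for the error Noēsis raises.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class CheckpointConsistencyError(Exception):
    """Local stand-in; real code would catch the error raised by Noēsis."""


def resume_or_start_new(
    resume: Callable[[], T],
    start_new: Callable[[], T],
) -> Tuple[str, T]:
    """Try to resume; on checkpoint drift, start a fresh run instead."""
    try:
        return "resumed", resume()
    except CheckpointConsistencyError:
        # Drift detected: leave the old run's artifacts untouched as evidence
        # and begin a new run rather than forcing continuation.
        return "restarted", start_new()
```

Returning the outcome label alongside the result lets the caller log which path was taken without inspecting the run itself.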

## Policy-driven approval

Create a policy that flags operations for approval:
110 changes: 109 additions & 1 deletion docs/reference/events.mdx
@@ -624,6 +624,113 @@ Runtime events share `phase="runtime"` and use `event_type` as the stable subtyp
The current lifecycle family includes `run.interrupt`, `run.checkpoint`,
`run.resume`, and `run.state_projection`.

#### `run.interrupt`

Records pause intent before checkpointing or other continuation flows.

```json
{
  "phase": "runtime",
  "event_type": "run.interrupt",
  "payload": {
    "kind": "run.interrupt",
    "status": "interrupted",
    "reason": "Awaiting human approval"
  },
  "caused_by": "evt_prev123"
}
```

<ResponseField name="payload.kind" type="string" required>
Always `run.interrupt`.
</ResponseField>

<ResponseField name="payload.status" type="string" required>
Always `interrupted`.
</ResponseField>

<ResponseField name="payload.reason" type="string">
Optional operator- or policy-provided reason.
</ResponseField>

#### `run.checkpoint`

Checkpoint pointer emission for same-run continuation.

```json
{
  "phase": "runtime",
  "event_type": "run.checkpoint",
  "payload": {
    "kind": "run.checkpoint",
    "status": "paused",
    "checkpoint_id": "chk_0123456789ab",
    "event_offset": 42,
    "checkpoint_path": "checkpoints/chk_0123456789ab/checkpoint.json"
  },
  "caused_by": "evt_prev123"
}
```

<ResponseField name="payload.kind" type="string" required>
Always `run.checkpoint`.
</ResponseField>

<ResponseField name="payload.status" type="string" required>
Always `paused`.
</ResponseField>

<ResponseField name="payload.checkpoint_id" type="string" required>
Run-local checkpoint identifier.
</ResponseField>

<ResponseField name="payload.event_offset" type="integer" required>
1-based event history anchor used by resume consistency checks.
</ResponseField>

<ResponseField name="payload.checkpoint_path" type="string" required>
Relative path to persisted checkpoint artifact.
</ResponseField>
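
Because `event_offset` is 1-based, an offset of `42` covers the first 42 events, and the anchored event sits at list index `41`. The helper below illustrates that arithmetic. It is a sketch that assumes one JSON event per non-empty line of `events.jsonl`; `events_up_to_anchor` is a hypothetical name, not part of the Noēsis API.

```python
import json
from typing import Dict, List


def events_up_to_anchor(events_jsonl: str, event_offset: int) -> List[Dict]:
    """Return the events covered by a 1-based anchor (the first `event_offset` events)."""
    lines = [line for line in events_jsonl.splitlines() if line.strip()]
    if not 1 <= event_offset <= len(lines):
        raise ValueError(f"event_offset {event_offset} is outside 1..{len(lines)}")
    # A 1-based offset N maps to the slice [0:N]; the anchored event is index N - 1.
    return [json.loads(line) for line in lines[:event_offset]]
```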

#### `run.resume`

Resume evidence emitted before continuation execution.

```json
{
  "phase": "runtime",
  "event_type": "run.resume",
  "payload": {
    "kind": "run.resume",
    "status": "resuming",
    "checkpoint_id": "chk_0123456789ab",
    "event_offset": 42,
    "resume_strategy": "same_run_id"
  },
  "caused_by": "evt_prev123"
}
```

<ResponseField name="payload.kind" type="string" required>
Always `run.resume`.
</ResponseField>

<ResponseField name="payload.status" type="string" required>
Always `resuming`.
</ResponseField>

<ResponseField name="payload.checkpoint_id" type="string" required>
Checkpoint being resumed.
</ResponseField>

<ResponseField name="payload.event_offset" type="integer" required>
Copied from the resolved checkpoint anchor.
</ResponseField>

<ResponseField name="payload.resume_strategy" type="string" required>
Current value is `same_run_id`.
</ResponseField>
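
Since `run.resume` copies `checkpoint_id` and `event_offset` from the checkpoint it resolves, a trace can be audited by pairing each resume with an earlier `run.checkpoint`. The following sketch works on already-parsed event dictionaries shaped like the examples above; `unmatched_resumes` is a hypothetical helper, not a Noēsis function.

```python
from typing import Dict, List


def unmatched_resumes(events: List[Dict]) -> List[Dict]:
    """Return run.resume events that match no earlier run.checkpoint anchor."""
    anchors: Dict[str, int] = {}  # checkpoint_id -> event_offset
    problems: List[Dict] = []
    for event in events:
        if event.get("phase") != "runtime":
            continue
        payload = event.get("payload", {})
        if event.get("event_type") == "run.checkpoint":
            anchors[payload["checkpoint_id"]] = payload["event_offset"]
        elif event.get("event_type") == "run.resume":
            # A missing checkpoint yields None, which never equals an integer offset.
            if anchors.get(payload["checkpoint_id"]) != payload["event_offset"]:
                problems.append(event)
    return problems
```

An empty result means every resume in the trace points at a checkpoint that was emitted before it with the same offset.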

`run.state_projection` is emitted whenever state is persisted so `state.json`
outcome/link fields have explicit event evidence.

@@ -670,7 +777,8 @@ Trace-backed projection of persisted `state.json.links`.

<ResponseField name="caused_by" type="string">
For runtime lifecycle/projection events, points to the latest prior event in the
-trace when available.
+trace when available. For `run.resume`, explicit anchors must match either the
+checkpoint's `last_event_id` or the run's current latest event ID.
</ResponseField>
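
The `run.resume` anchor rule reduces to a membership test. The sketch below is illustrative only: `resume_anchor_allowed` is a hypothetical name, and treating an omitted `caused_by` as acceptable is an assumption drawn from the word "explicit" in the rule.

```python
from typing import Optional


def resume_anchor_allowed(
    caused_by: Optional[str],
    checkpoint_last_event_id: str,
    current_head_id: str,
) -> bool:
    """Check an explicit resume anchor against the two permitted event IDs."""
    if caused_by is None:
        # Assumption: with no explicit anchor, the runtime picks one itself.
        return True
    return caused_by in (checkpoint_last_event_id, current_head_id)
```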

### terminate
17 changes: 15 additions & 2 deletions docs/reference/python-api.mdx
@@ -207,7 +207,16 @@ Continuation contract:

- Same run ID.
- Append-only artifacts preserved.
-- Resume continues post-plan by default (no replan) with anchor validation.
+- Resume continues post-plan by default (no replan) with fail-closed anchor validation.

Resume consistency checks (enforced before `run.resume` is emitted):

- `event_offset` + `last_event_id` in checkpoint must still match current `events.jsonl`.
- `state_hash` in checkpoint must match current `state.json`.
- `artifact_manifest_hash` in checkpoint must match current artifact digest:
  - if `manifest.json` exists, hash that file
  - otherwise hash the canonical artifact set (`events.jsonl` prefix at `event_offset`, `prompts.jsonl`, `state.json`, `summary.json`, `learn.jsonl`) when present.
- explicit `caused_by=` for `ns.resume(...)` must match either checkpoint `last_event_id` or the current latest run event ID.
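
The artifact digest rule can be sketched as below. This is not the Noēsis algorithm: the use of SHA-256, the file ordering, and the name-plus-separator framing are all assumptions chosen to make the fallback deterministic, and `artifact_digest` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

# Files hashed in the fallback, after the events.jsonl prefix (order is an assumption).
CANONICAL_FILES = ("prompts.jsonl", "state.json", "summary.json", "learn.jsonl")


def artifact_digest(run_dir: Path, event_offset: int) -> str:
    """Hash manifest.json if it exists, otherwise the canonical artifact set."""
    manifest = run_dir / "manifest.json"
    if manifest.exists():
        return hashlib.sha256(manifest.read_bytes()).hexdigest()

    digest = hashlib.sha256()
    events = run_dir / "events.jsonl"
    if events.exists():
        # Only the first `event_offset` lines count, so events appended after
        # the checkpoint do not change the digest.
        prefix = events.read_bytes().splitlines(keepends=True)[:event_offset]
        digest.update(b"events.jsonl\0" + b"".join(prefix))
    for name in CANONICAL_FILES:
        path = run_dir / name
        if path.exists():  # absent files are skipped ("when present")
            digest.update(name.encode() + b"\0" + path.read_bytes())
    return digest.hexdigest()
```

Hashing only the event prefix is what allows an append-only history to keep growing while the checkpoint stays valid; an edit to any of the other files changes the digest and would fail the check.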

Adapter continuity:

@@ -219,7 +228,11 @@ Adapter continuity:
- `RunSealedError`: lifecycle writes and resume attempts are rejected once `final.json` seals the run.
- `CheckpointNotFoundError`: `resume`/`resume_run` reference a checkpoint that does not exist.
- `MissingCausalParentError`: checkpoint/interrupt cannot anchor to a causal parent event.
-- `CheckpointConsistencyError`: checkpoint anchor (`event_offset`, `last_event_id`, `state_hash`) no longer matches artifacts.
+- `CheckpointConsistencyError`: checkpoint anchor drift, including:
+  - event history mismatch (`event_offset` / `last_event_id`)
+  - `state.json` hash mismatch (`state_hash`)
+  - artifact digest mismatch (`artifact_manifest_hash`, including manifest presence/content drift)
+  - invalid explicit resume anchor (`caused_by` not allowed for this checkpoint/run head)
- `RunLifecycleTransitionError`: lifecycle mutation violates the run state-machine contract.
- `ResumeAdapterRequiredError`: `resume_run` requires explicit `using` for non-minimal checkpoints.
- `ResumeAdapterMismatchError`: `resume_run` adapter does not match checkpoint adapter contract.