[Bug] kernel: close artifact integrity gaps in trace, resume, and derivability #207

@saraeloop

Summary

Noēsis is now much closer to a defensible runtime kernel.

The highest-risk integrity gaps around canonical trace handling and checkpoint/resume continuity have been closed. The remaining work is narrower: finishing the authority contract between events.jsonl and persisted state.json, especially for metadata, rollup, and non-execution state slices.

This issue remains open to track that final authority-contract work separately from layout, CLI, and general cleanup.

Current status

Closed

  • events.jsonl corruption fails hard
  • resume/checkpoint validates anchored artifact integrity
  • plan is derivable from trace for current runtime semantics

Mostly closed

  • outcomes.actions is mostly derivable from trace
  • remaining caution: action timestamp ownership is still finalized in the event adapter

Still open

  • define the final authority contract for non-plan / non-action state slices
  • resolve or demote episode.started_at
  • resolve or demote episode.intuition_mode
  • resolve or demote outcomes.summary
  • resolve or demote outcomes.metrics
  • classify process, links, and similar fields as projection metadata or make them trace-backed
  • remove, demote, or fully implement dormant beliefs / memory state surfaces
  • add one final whole-state authority inventory test

Problems

1. events.jsonl corruption was not fail-fast

The canonical trace should be treated as authoritative evidence. This has now been fixed.

Resolved impact

  • corrupt traces are no longer silently accepted
  • replay and audit tooling no longer read malformed evidence as valid
  • append is rejected when the canonical trace is already damaged
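
The fail-fast behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Noēsis implementation: `TraceCorruptionError`, `read_trace`, and `append_event` are hypothetical names, and the sketch assumes one JSON object per newline-terminated line.

```python
import json
from pathlib import Path


class TraceCorruptionError(Exception):
    """Hypothetical typed error for a damaged canonical trace."""

    def __init__(self, path, line_no, reason):
        super().__init__(f"{path}:{line_no}: {reason}")
        self.line_no = line_no
        self.reason = reason


def read_trace(path):
    """Return every event in the trace, raising on the first malformed record."""
    events = []
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, raw in enumerate(fh, start=1):
            # A line without a trailing newline is treated as a partial write.
            if not raw.endswith("\n"):
                raise TraceCorruptionError(path, line_no, "truncated line")
            try:
                record = json.loads(raw)
            except json.JSONDecodeError as exc:
                raise TraceCorruptionError(
                    path, line_no, f"invalid JSON: {exc.msg}"
                ) from exc
            if not isinstance(record, dict):
                raise TraceCorruptionError(path, line_no, "record is not an object")
            events.append(record)
    return events


def append_event(path, event):
    """Append one event, but only after the existing trace validates."""
    if Path(path).exists():
        read_trace(path)  # raises TraceCorruptionError if already damaged
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
```

Re-reading the whole trace before each append is the simplest way to show the rule; a real runtime would more likely validate once on open and keep that result.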

2. resume/checkpoint consistency did not validate artifact_manifest_hash

Checkpoint metadata already recorded artifact digest information, but resume validation did not enforce it. This has now been fixed.

Resolved impact

  • artifact-set tampering between pause and resume is now rejected
  • checkpoint continuity is aligned with the checkpoint contract
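
A resume-time check of this kind might look like the sketch below. Only `artifact_manifest_hash` comes from this issue; the `artifacts` key, the hashing scheme (SHA-256 over a sorted name-to-digest map), and the names `ManifestMismatchError`, `compute_manifest_hash`, and `validate_resume` are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


class ManifestMismatchError(Exception):
    """Hypothetical typed error for artifact-set drift between pause and resume."""


def compute_manifest_hash(run_dir, artifact_names):
    """Hash a sorted {name: sha256(content)} manifest of the listed artifacts."""
    manifest = {}
    for name in sorted(artifact_names):
        try:
            data = (Path(run_dir) / name).read_bytes()
        except FileNotFoundError as exc:
            # A missing artifact is treated as a mismatch, not an I/O detail.
            raise ManifestMismatchError(f"missing artifact: {name}") from exc
        manifest[name] = hashlib.sha256(data).hexdigest()
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_resume(run_dir, checkpoint):
    """Refuse to resume when the artifacts no longer match the checkpoint."""
    expected = checkpoint["artifact_manifest_hash"]
    actual = compute_manifest_hash(run_dir, checkpoint["artifacts"])  # assumed key
    if actual != expected:
        raise ManifestMismatchError(
            f"artifact_manifest_hash mismatch: expected {expected}, got {actual}"
        )
```

Raising a dedicated exception type, instead of returning a boolean, keeps a manifest mismatch from being mistaken for an ordinary resume failure.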

3. state.json is not yet fully aligned with events.jsonl authority

This is the remaining open problem.

The runtime is now in much better shape:

  • plan is reconstructible from trace for current runtime semantics
  • outcomes.actions is mostly derivable from trace; action timestamp ownership is still finalized in the event adapter

But several persisted state.json slices are still richer than the event stream, projection-like, or insufficiently classified.

Open impact

  • events.jsonl is not yet the clearly enforced authority source for all persisted runtime state
  • some state fields still behave more like rollups, metadata, or dormant schema surface than canonical trace-backed state
  • ADR/runtime contract drift remains open for these remaining slices

Scope

  • Make trace corruption fail hard
  • Validate manifest integrity during resume/checkpoint consistency
  • Finish the authority contract between events.jsonl and state.json

Work items

Trace integrity

  • Treat malformed events.jsonl records as integrity failures
  • Remove silent-skip behavior for corrupted trace records
  • Reject appends to an already-corrupted canonical log
  • Add typed corruption signaling at the trace layer

Resume/checkpoint integrity

  • Validate artifact_manifest_hash during resume/checkpoint consistency checks
  • Fail resume when artifact manifest state does not match checkpoint evidence
  • Surface manifest mismatch as explicit integrity failure

Event authority vs state derivability

Plan / action path

  • Enrich action events with state-relevant action fields
  • Emit reconstructable structured plan evidence
  • Add plan-state reconstruction coverage from events.jsonl
  • Remove the remaining action timestamp ownership seam from the event adapter

Remaining state-authority slices

  • Decide which persisted state.json fields are truly trace-authoritative
  • Decide which fields are intentionally projection metadata / rollups
  • Canonically emit or explicitly demote episode.started_at
  • Canonically emit or explicitly demote episode.intuition_mode
  • Canonically emit or explicitly demote outcomes.summary
  • Canonically emit or explicitly demote outcomes.metrics
  • Classify process and links as trace-backed state or projection metadata
  • Remove, demote, or fully implement dormant beliefs / memory state surfaces
  • Enforce the chosen rule consistently in runtime behavior, docs, and ADRs

Acceptance criteria

  • Corrupted or malformed events.jsonl is surfaced as an integrity failure, not silently ignored
  • Resume/checkpoint consistency rejects mismatched artifact manifest state
  • events.jsonl and state.json have a clearly enforced authority contract
  • Every persisted state.json slice is explicitly classified as one of:
      • trace-authoritative and derivable from events.jsonl plus deterministic rules
      • projection / metadata / rollup state that is not part of the canonical trace-authority claim
  • Docs and ADR language match the enforced runtime contract
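
One possible shape for the whole-state authority inventory check is sketched below. The names `Authority`, `leaf_paths`, and `unclassified_slices` are hypothetical, and the inventory entries in the usage example are placeholders, not decisions recorded in this issue.

```python
from enum import Enum


class Authority(Enum):
    TRACE = "trace-authoritative"
    PROJECTION = "projection / metadata / rollup"


def leaf_paths(node, prefix=""):
    """Yield dotted paths for every leaf (non-dict or empty dict) in a nested dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from leaf_paths(value, path)
        else:
            yield path


def is_classified(path, inventory):
    """A path counts as classified if it or any ancestor is in the inventory."""
    parts = path.split(".")
    return any(".".join(parts[:i]) in inventory for i in range(1, len(parts) + 1))


def unclassified_slices(state, inventory):
    """Return the sorted state paths that the inventory does not cover."""
    return sorted(p for p in leaf_paths(state) if not is_classified(p, inventory))


# Placeholder assignments for illustration only.
example_inventory = {
    "plan": Authority.TRACE,
    "outcomes.actions": Authority.TRACE,
    "outcomes.summary": Authority.PROJECTION,
}
example_state = {
    "plan": {"steps": []},
    "outcomes": {"actions": [], "summary": "ok"},
    "beliefs": {},
}
print(unclassified_slices(example_state, example_inventory))  # ['beliefs']
```

An inventory test built this way fails whenever a new field is persisted to state.json without an explicit classification, which is the drift this issue is trying to prevent.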

Tests

  • Add corruption regression for malformed events.jsonl
  • Add regression for truncated or partially written trace lines
  • Add regression for append rejection when the existing trace is corrupt
  • Add resume/checkpoint regression for manifest-hash mismatch
  • Add reconstruction regression for the plan portion of state.json
  • Add or tighten reconstruction regression for outcomes.actions
  • Add one final whole-state authority inventory test
  • Add or update contract tests for the final chosen authority model

Notes

This issue is intentionally separate from layout, schema, and CLI cleanup because these are kernel trust concerns, not general backlog hygiene.

At this point, the remaining gaps are mostly metadata, rollup semantics, and dormant state surfaces, not the core plan/action execution path.

Labels

  • needs:artifacts (Missing ## Artifacts block)
  • needs:triage (Awaiting maintainer triage)
  • type:bug (Defects in behavior)
