feat(core): decision provenance + v0.7.0 release #12
Merged
Decision provenance: a signed, hash-chained DecisionRecord is captured at every decision boundary (pre_tool_use / stop / subagent_start) before the action executes, so a downstream verifier can prove the rationale was bound at decision time and not retrofitted. DecisionRecords live in the same AttestationChain as AttestationRecord; each subsequent attestation Evidence-links the decisions that preceded it. CaptureTier discriminates how much rationale was actually captured: production today is Tier C (Minimal); Tier B/A schemas are reserved for adapter-specific deliberation surfaces (Claude reasoning streams, OpenAI Responses reasoning content, etc.).

Verifiable end-to-end via `python -m agentegrity verify-decisions <chain.json>` or programmatically via `AttestationChain.verify_decision_links()`. Capture fails open: on exception, it logs a warning AND emits a structured `capture_failure` FrameworkEvent so monitoring can see the gap.

New symbols (all in `agentegrity` top-level): DecisionRecord, DecisionInput, RejectedAlternative, CaptureTier, ChainedRecord Protocol, Evidence (was previously package-internal), build_attestation_record helper, build_decision_record helper. _BaseAdapter and IntegrityMonitor gain optional signing_key= and record_decision(). AttestationChain gains to_json/from_json, verify_chain_detailed, verify_decision_links.

Backward-incompatible changes:
- AttestationRecord canonical payload now includes `record_kind`. Required so the heterogeneous chain can distinguish kinds under signature (otherwise a tamperer could flip a decision into an attestation post-signing).
- Evidence.content_hash is now a real SHA-256 of the canonical layer-result JSON. It was previously the process-salted Python hash(), which is non-deterministic across processes and non-portable, and which silently broke any cross-process tamper-evident verification. Three duplicated record-build paths (adapter base, monitor, SDK client) now share one build_attestation_record() helper.
- Chains serialized pre-v0.7 fail verify_chain() after upgrade, signed or not. The in-memory recomputed content_hash (now over the new canonical bytes) doesn't match the stored chain_previous references in subsequent records. There is no rescue migration script: re-build from a fresh root with the new code or pin to v0.6 for legacy verification.

Release machinery:
- pyproject 0.6.0 → 0.7.0; src/agentegrity/__init__.py __version__; README badge + roadmap (v0.7 entry, v0.6 demoted from (current), v0.8 forward-looking); spec/threat-model.md version + date; STATUS last-reviewed.
- 7 @agentegrity/* npm packages bumped + @agentegrity/client peer pin bumped where present.
- Repo references renamed cogensec/agentegrity-framework → cogensec/agentegrity across 19 files (GitHub rename completed by the maintainer; old URLs redirect for now).
- CHANGELOG [Unreleased] → [0.7.0] - 2026-06-08 with new compare footer.
- _ADAPTERS list in `python -m agentegrity` info output now shows autogen / agno / bedrock_agents alongside the original five.

Spec at spec/properties/decision-provenance.md. Three new glossary entries: Decision Record, Capture Tier, Decision Boundary.

Test impact: +66 tests (414 → 480). mypy clean across 39 source files. ruff clean.
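
To make the two hashing-related breaking changes concrete, here is a minimal standalone sketch of a hash chain whose canonical payload includes `record_kind` and whose content hash is SHA-256 over canonical JSON. It is not the agentegrity implementation: `canonical_bytes`, `make_record`, and this `verify_chain` are hypothetical names, the record layout is simplified, and HMAC-SHA256 stands in for whatever signature scheme the library actually uses (the description above does not say).

```python
import hashlib
import hmac
import json
from typing import Dict, List, Optional


def canonical_bytes(payload: Dict) -> bytes:
    """Deterministic JSON encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_hash(payload: Dict) -> str:
    """SHA-256 over canonical bytes; stable across processes, unlike built-in hash()."""
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


def sign(payload: Dict, key: bytes) -> str:
    # Assumption: HMAC-SHA256 as a stand-in signature scheme for illustration.
    return hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()


def make_record(record_kind: str, body: Dict, previous: Optional[str], key: bytes) -> Dict:
    """Build a record whose hashed and signed payload includes record_kind."""
    payload = {"record_kind": record_kind, "body": body, "chain_previous": previous}
    return {**payload, "content_hash": content_hash(payload), "signature": sign(payload, key)}


def verify_chain(records: List[Dict], key: bytes) -> bool:
    """Check each record's link, content hash, and signature in order."""
    previous = None
    for record in records:
        payload = {k: record[k] for k in ("record_kind", "body", "chain_previous")}
        if record["chain_previous"] != previous:
            return False
        if record["content_hash"] != content_hash(payload):
            return False
        if not hmac.compare_digest(record["signature"], sign(payload, key)):
            return False
        previous = record["content_hash"]
    return True


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical key, for illustration only
    decision = make_record("decision", {"boundary": "pre_tool_use"}, None, key)
    attestation = make_record(
        "attestation", {"evidence": [decision["content_hash"]]}, decision["content_hash"], key
    )
    print(verify_chain([decision, attestation], key))

    # Flipping the kind after signing changes the canonical bytes, so verification fails.
    flipped = dict(decision, record_kind="attestation")
    print(verify_chain([flipped, attestation], key))
```

Because `record_kind` is part of the bytes that are hashed and signed, relabeling a decision as an attestation invalidates both the content hash and the signature. The same mechanism is why a chain built over an older canonical payload cannot verify under a newer one: the recomputed hashes no longer match the stored `chain_previous` references.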