Reported by: Codex
Requested by: jmr.pineda
Priority: P1
Affected surfaces: Execution receipts, audit evidence, workflow detail views, MCP read models, operator trust surfaces, Central gateway audit
Constraints: Sensitive arguments and payloads may need redaction or hashing; evidence should be queryable without bloating timeline markdown; do not rely on raw provider transcripts as the main audit artifact.
Summary
SpecForge should persist structured evidence for model-facing execution-tool use and expose that evidence through operator-facing inspection surfaces so governed tool access remains auditable, explainable, and reviewable across both OSS-local execution tools and gateway-routed organization-grade tools.
Problem / opportunity
If SpecForge allows phases to use private-repo retrieval, RAG, CAG, or external connectors, operators need more than the final markdown artifact. They need to know which tools ran, why they were allowed, which sources were consulted, whether a request was handled locally or through a gateway, and what, at a high level, was returned. Without that evidence, tool calling weakens the harness story and makes later review, policy enforcement, compliance export, and commercial governance much harder.
Requested behavior
Extend execution receipts and related read models so every governed tool invocation leaves structured evidence. The product should also expose an initial inspection surface that shows the effective tool policy for a phase, the tool calls that occurred, the routing path taken, the sources reached, and any redaction, denial, or truncation decisions relevant to operator understanding.
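As an illustration of the inspection surface described above, the following is a minimal sketch of a read model that joins the effective tool policy for a phase with the recorded tool calls. The function name `build_inspection_view`, the dictionary keys, and the tool and source names are assumptions made for this example; they are not an existing SpecForge or MCP contract.

```python
from typing import Any, Dict, List

# Flags an operator would want surfaced next to a call (hypothetical names).
NOTICE_FLAGS = ("denied", "redacted", "truncated")


def build_inspection_view(
    phase: str,
    effective_policy: Dict[str, str],
    calls: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Summarize one phase execution for an operator-facing surface."""
    # Distinct sources reached across all calls, in a stable order.
    sources = sorted({s for c in calls for s in c.get("sources", [])})

    # Only calls with a redaction, denial, or truncation decision get a notice.
    notices = []
    for c in calls:
        flags = [f for f in NOTICE_FLAGS if c.get(f)]
        if flags:
            notices.append({"tool_id": c["tool_id"], "flags": flags})

    return {
        "phase": phase,
        "effective_policy": dict(effective_policy),
        "calls": [{"tool_id": c["tool_id"], "routing": c["routing"]} for c in calls],
        "sources_reached": sources,
        "notices": notices,
    }


view = build_inspection_view(
    phase="plan",
    effective_policy={"repo.search": "allow", "crm.lookup": "deny"},
    calls=[
        {"tool_id": "repo.search", "routing": "local",
         "sources": ["repo:main"], "truncated": True},
        {"tool_id": "crm.lookup", "routing": "gateway",
         "sources": [], "denied": True},
    ],
)
```

The view carries decisions that were recorded at call time rather than recomputing them from the current policy, so a later policy change would not rewrite what the operator sees for a past execution.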
Scope
- In scope: Tool-call evidence schema; normalized logging of tool ID, policy decision, routing path, arguments summary, source targets, output summary or hashes, duration, cost or budget impact, redaction flags, and denial reasons; MCP or UI inspection surface for the latest execution.
- Out of scope: Full enterprise analytics in the first cut; storing every raw payload forever; exposing sensitive content without policy-aware redaction.
Acceptance criteria
- SpecForge persists a structured evidence record for each governed execution-tool invocation.
- The evidence model can capture allowed, denied, redacted, truncated, failed, local, and gateway-routed tool outcomes without collapsing them into plain text.
- At least one operator-facing surface can inspect tool-use evidence for a phase execution together with the effective tool policy and routing context.
- The evidence model is suitable for later PR evidence packs, audit exports, and Central governance review without redesigning the core contract.
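One way to satisfy the "without collapsing them into plain text" criterion is to keep policy decision, routing path, and execution status as separate typed fields, with redaction and truncation as independent flags. The sketch below assumes this decomposition; all class and field names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PolicyDecision(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"


class RoutingPath(Enum):
    LOCAL = "local"
    GATEWAY = "gateway"


class ExecutionStatus(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    NOT_RUN = "not_run"  # the call was denied before execution


@dataclass(frozen=True)
class ToolOutcome:
    decision: PolicyDecision
    routing: RoutingPath
    status: ExecutionStatus
    redacted: bool = False
    truncated: bool = False
    denial_reason: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject combinations that would make the evidence self-contradictory.
        if self.decision is PolicyDecision.DENIED:
            if self.status is not ExecutionStatus.NOT_RUN:
                raise ValueError("a denied call must not have run")
            if not self.denial_reason:
                raise ValueError("a denied call needs a denial reason")
        elif self.status is ExecutionStatus.NOT_RUN:
            raise ValueError("an allowed call must record succeeded or failed")

    def labels(self) -> List[str]:
        """Flat labels for display; the typed fields stay the source of truth."""
        out = [self.decision.value, self.routing.value, self.status.value]
        if self.redacted:
            out.append("redacted")
        if self.truncated:
            out.append("truncated")
        return out


outcome = ToolOutcome(
    decision=PolicyDecision.ALLOWED,
    routing=RoutingPath.GATEWAY,
    status=ExecutionStatus.SUCCEEDED,
    redacted=True,
)
```

Keeping the axes independent means a single call can be, for example, allowed, gateway-routed, failed, and redacted at once, and each property remains separately queryable.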
Notes
- This work should align with the existing evidence and prompt-inspection direction already present in the harness roadmap.
- Useful persisted fields may include tool request ID, execution-tool ID, routing class, policy snapshot hash, source identifiers, argument digest, output digest, token impact, elapsed time, and operator-visible warnings.
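The persisted fields listed above could be modeled roughly as follows. This is a sketch under stated assumptions: the record name `ToolCallEvidence`, the `digest` helper, the choice of SHA-256 over canonical JSON, and the sample values are all illustrative and not a confirmed SpecForge schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Tuple


def digest(value: Any) -> str:
    """Hash canonical JSON so equal payloads produce equal digests."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ToolCallEvidence:
    tool_request_id: str
    execution_tool_id: str
    routing_class: str
    policy_snapshot_hash: str
    source_identifiers: Tuple[str, ...]
    argument_digest: str   # digest only; raw arguments are not stored
    output_digest: str     # digest only; raw output is not stored
    token_impact: int
    elapsed_ms: int
    warnings: Tuple[str, ...] = ()


# Hypothetical raw inputs that should never reach the persisted record.
raw_arguments = {"query": "billing retries", "api_key": "secret-token"}
raw_output = {"hits": 3, "snippets": ["..."]}

record = ToolCallEvidence(
    tool_request_id="req-0001",
    execution_tool_id="repo.search",
    routing_class="local",
    policy_snapshot_hash=digest({"repo.search": "allow"}),
    source_identifiers=("repo:main",),
    argument_digest=digest(raw_arguments),
    output_digest=digest(raw_output),
    token_impact=412,
    elapsed_ms=87,
    warnings=("output truncated to 3 hits",),
)

serialized = json.dumps(asdict(record), sort_keys=True)
```

A plain hash of a short or guessable argument can be reversed by trial, so a keyed hash (for example HMAC with a managed key) may be a better fit where arguments are low-entropy.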