qa-veritas/state-triage

State Triage

Deterministic reasoning around one capable agent.

A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.


Problem

The common pattern for "AI triage" is to wire a model directly to a pile of logs and hope. It fails in two predictable ways. The model miscounts — it reports "3 failures" when there were 5, because it eyeballed a wall of text instead of parsing it. And it chases the loudest error instead of the first one, so the conclusion is a downstream symptom, not the cause. A root-cause analysis you can't reproduce, built on numbers a model guessed, is not analysis. It's a confident narrative.

Core Idea

Split the investigation at the line between facts and judgment. Cheap, deterministic code establishes what happened — exact counts, statuses, the first error signature, the phase. Only then does a single capable agent reason about why. The model never re-derives what a script can compute reliably, and every finding is written into a typed, file-locked state document. The report is generated from that state, so it's reproducible: same state, same brief, every time.

The agent never counts. The counts are already in state.json.

Architecture Diagram

flowchart LR
    R[result + logs] --> P[Parse facts<br/><i>deterministic</i>]
    P --> C[Classify<br/>PASS/HANG/TEST/INFRA/<br/>FRAMEWORK/PRODUCT]
    C --> PL[Plan<br/>cheapest-first +<br/>stop conditions]
    PL --> I[Agent investigates<br/><i>judgment</i>]
    I --> S[Synthesize<br/>root-cause brief]
    P -.-> ST[(state.json<br/>typed, file-locked)]
    C -.-> ST
    PL -.-> ST
    I -.-> ST
    ST --> S

Concepts

  • Determinism around nondeterminism — a regex where a regex belongs; the model saved for judgment, never arithmetic.
  • First error, not loudest — causes precede cascades; the pipeline anchors on the first signal.
  • Failure taxonomy: PASS / HANG / TEST-BUG / INFRA / FRAMEWORK / PRODUCT, because a label is only useful if it changes what you do next.
  • State as the source of truth — typed sections (deterministic_findings, evidence_sources, triage, log_investigation, root_cause) under a file lock; the brief is a render of state, not a fresh model call.
  • Graceful degradation — no live access? The plan scopes itself to logs-only instead of failing.
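
The "state as the source of truth" idea can be sketched with the standard library alone. The section names below come from the list above; the lock-file strategy, the function names `file_lock` and `update_section`, and the timeout value are assumptions made for this example, not a description of how statetriage implements its lock.

```python
import json
import os
import time
from contextlib import contextmanager
from pathlib import Path

# Section names are from the README; everything else here is illustrative.
SECTIONS = (
    "deterministic_findings",
    "evidence_sources",
    "triage",
    "log_investigation",
    "root_cause",
)


@contextmanager
def file_lock(path: Path, timeout: float = 5.0):
    """Hold an exclusive lock by atomically creating '<path>.lock'."""
    lock = Path(str(path) + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another writer holds the lock.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {path}")
            time.sleep(0.05)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock)


def update_section(path: Path, section: str, data: dict) -> dict:
    """Read-modify-write one section of the state document under the lock."""
    if section not in SECTIONS:
        raise KeyError(f"unknown state section: {section}")
    with file_lock(path):
        state = json.loads(path.read_text()) if path.exists() else {}
        state[section] = data
        tmp = Path(str(path) + ".tmp")
        # sort_keys keeps the file byte-stable for identical content.
        tmp.write_text(json.dumps(state, indent=2, sort_keys=True))
        os.replace(tmp, path)  # atomic swap: readers never see a partial file
    return state
```

Rejecting unknown section names is the cheapest form of typing; a stricter version could validate the fields inside each section as well.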

Examples

Classification grounded in a parsed signature, not a vibe:

$ python -m statetriage classify --result examples/result.json
class:      INFRA
confidence: high
first_error: write rejected (FORBIDDEN/8/index read-only)
rationale:  Signature matches a storage flood-stage guard; the test
            asserted nothing — the environment rejected the write.

Quick Start

pip install -e .          # or: python -m statetriage --help

# Full pipeline: parse → classify → plan → write state → brief
python -m statetriage run --result examples/result.json --log examples/run.log --state out/state.json

python -m statetriage classify --result examples/result.json   # just the label
python -m statetriage brief --state out/state.json             # render brief from state

Python 3.10+, zero third-party runtime dependencies.

Why It Matters

For engineers: triage stops being a talent that lives in two senior people and becomes a repeatable system. The brief is defensible in a review because every number traces to a parse, not a guess.

For AI agents: this is the pattern for letting a model do the part it's good at (reasoning over evidence) while fencing off the part it's bad at (counting, exact recall). The typed state file also makes the investigation composable — another tool can read root_cause and act on it.
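
To make the "render of state" property concrete, here is a minimal sketch of a brief renderer written as a pure function. The section names are the ones listed under Concepts; the field names inside each section (`class`, `confidence`, `error_count`, `first_error`, `summary`) and the function name `render_brief` are assumptions for illustration only.

```python
def render_brief(state: dict) -> str:
    """Pure function of state: no clock, no randomness, no model call."""
    findings = state.get("deterministic_findings", {})
    triage = state.get("triage", {})
    root = state.get("root_cause", {})
    # Field names inside each section are hypothetical.
    rows = [
        ("class", triage.get("class", "unknown")),
        ("confidence", triage.get("confidence", "unknown")),
        ("errors", findings.get("error_count", "unknown")),
        ("first_error", findings.get("first_error", "unknown")),
        ("root_cause", root.get("summary", "not yet investigated")),
    ]
    width = max(len(name) for name, _ in rows) + 2  # room for ":" and a space
    out = []
    for name, value in rows:
        label = name + ":"
        out.append(f"{label:<{width}}{value}")
    return "\n".join(out) + "\n"
```

Because the function reads only its argument, rendering the same state twice yields the same text, and any other tool can read the same `root_cause` section without going through the renderer.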

Future Vision

  • A reference agent runner that consumes the plan and fills the log_investigation / root_cause sections.
  • A prior-incident matcher keyed by error signature — search memory before investigating fresh.
  • Signature packs loaded from YAML, so the classifier is data, not code.
  • A replay mode that re-runs synthesis over historical state to measure classifier drift.

Part of QA Veritas

QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:

Memory → Reasoning → Verification → Action

QA Veritas
├── Resource Ledger                    Memory       operational truth as a git tree
├── State Triage      ◀ you are here   Reasoning    deterministic triage around an agent
├── LogLens                            Reasoning    code-aware evidence from logs
├── Intent Verify                      Verification declarative intent → observable proof
├── Runbook Forge                      Runbooks     procedures derived from verified history
├── SkillPack                          Skills       progressive-disclosure agent capability
└── Future Agents                      Agents       narrow operators that compose the above
Layer          Component
Memory         Resource Ledger
Reasoning      State Triage (this repo) · LogLens
Verification   Intent Verify
Runbooks       Runbook Forge
Skills         SkillPack
Writing        Field notes & essays

Start at the platform overview. MIT licensed.