Deterministic reasoning around one capable agent.
A component of QA Veritas — an exploration of how AI agents reason about, verify, and operate complex systems.
The common pattern for "AI triage" is to wire a model directly to a pile of logs and hope. It fails in two predictable ways. The model miscounts — it reports "3 failures" when there were 5, because it eyeballed a wall of text instead of parsing it. And it chases the loudest error instead of the first one, so the conclusion is a downstream symptom, not the cause. A root-cause analysis you can't reproduce, built on numbers a model guessed, is not analysis. It's a confident narrative.
Split the investigation at the line between facts and judgment. Cheap, deterministic code establishes what happened — exact counts, statuses, the first error signature, the phase. Only then does a single capable agent reason about why. The model never re-derives what a script can compute reliably, and every finding is written into a typed, file-locked state document. The report is generated from that state, so it's reproducible: same state, same brief, every time.
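As a minimal sketch of the deterministic half, assume a hypothetical log format in which each test outcome appears as `PASS <name>` or `FAIL <name>` and each error line starts with `ERROR`. The function name `parse_facts` and the returned field names are illustrative assumptions, not the actual statetriage API.

```python
import re
from collections import Counter

# Hypothetical line formats; the real statetriage parser may differ.
STATUS_RE = re.compile(r"^(PASS|FAIL|SKIP)\s+(\S+)")
ERROR_RE = re.compile(r"^ERROR\s+(.*)")


def parse_facts(log_text: str) -> dict:
    """Compute counts and the first error by parsing, never by estimating."""
    counts = Counter()
    first_error = None
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        status = STATUS_RE.match(line)
        if status:
            counts[status.group(1)] += 1
            continue
        error = ERROR_RE.match(line)
        if error and first_error is None:
            # Anchor on the earliest error, not the most frequent one.
            first_error = {"line": line_no, "signature": error.group(1).strip()}
    return {"counts": dict(counts), "first_error": first_error}


log = """PASS test_login
ERROR write rejected (index read-only)
FAIL test_upload
ERROR timeout waiting for ack
ERROR timeout waiting for ack
FAIL test_sync
"""
facts = parse_facts(log)
print(facts["counts"])       # {'PASS': 1, 'FAIL': 2}
print(facts["first_error"])  # {'line': 2, 'signature': 'write rejected (index read-only)'}
```

In this sample the timeout appears twice and is the louder error, but the parser records the read-only rejection on line 2 because it came first.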
The agent never counts. The counts are already in `state.json`.
```mermaid
flowchart LR
    R[result + logs] --> P[Parse facts<br/><i>deterministic</i>]
    P --> C[Classify<br/>PASS/HANG/TEST/INFRA/<br/>FRAMEWORK/PRODUCT]
    C --> PL[Plan<br/>cheapest-first +<br/>stop conditions]
    PL --> I[Agent investigates<br/><i>judgment</i>]
    I --> S[Synthesize<br/>root-cause brief]
    P -.-> ST[(state.json<br/>typed, file-locked)]
    C -.-> ST
    PL -.-> ST
    I -.-> ST
    ST --> S
```
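The Plan step in the flow above (cheapest-first, with stop conditions) could be sketched as follows. Everything here is an assumption made for illustration: the `Step` fields, the relative cost values, and the `execute` / `is_conclusive` callbacks are hypothetical and do not describe statetriage's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    cost: int          # relative cost estimate; the unit is an assumption
    needs_live: bool   # True if the step requires live system access


def build_plan(steps: list[Step], live_access: bool) -> list[Step]:
    """Order steps cheapest-first, dropping any that need unavailable access."""
    usable = [s for s in steps if live_access or not s.needs_live]
    return sorted(usable, key=lambda s: s.cost)


def run_plan(plan, execute, is_conclusive):
    """Run steps in order and stop at the first conclusive finding."""
    findings = []
    for step in plan:
        finding = execute(step)
        findings.append((step.name, finding))
        if is_conclusive(finding):  # stop condition
            break
    return findings


steps = [
    Step("query live cluster health", cost=5, needs_live=True),
    Step("grep log around first error", cost=1, needs_live=False),
    Step("diff against last passing run", cost=3, needs_live=False),
]

# Without live access the plan scopes itself to the logs-only steps.
plan = build_plan(steps, live_access=False)
print([s.name for s in plan])

# Hypothetical executor: the cheapest step already yields a conclusive finding.
findings = run_plan(
    plan,
    execute=lambda s: "conclusive" if s.cost == 1 else "inconclusive",
    is_conclusive=lambda f: f == "conclusive",
)
print(findings)
```

Because the first step is conclusive, the more expensive diff never runs; the stop condition is what keeps the investigation cheap.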
- Determinism around nondeterminism: a regex where a regex belongs; the model saved for judgment, never arithmetic.
- First error, not loudest: causes precede cascades; the pipeline anchors on the first signal.
- Failure taxonomy: `PASS / HANG / TEST-BUG / INFRA / FRAMEWORK / PRODUCT`, because a label is only useful if it changes what you do next.
- State as the source of truth: typed sections (`deterministic_findings`, `evidence_sources`, `triage`, `log_investigation`, `root_cause`) under a file lock; the brief is a render of state, not a fresh model call.
- Graceful degradation: no live access? The plan scopes itself to logs-only instead of failing.
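The "state as the source of truth" principle can be illustrated with a small sketch. It assumes a sidecar lock file created with `O_EXCL` and an atomic rename for writes; the real tool may lock differently. The section names come from the list above, while the function names and the `class` / `failures` fields are hypothetical.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path

SECTIONS = ("deterministic_findings", "evidence_sources", "triage",
            "log_investigation", "root_cause")


@contextmanager
def file_lock(path: Path, timeout: float = 5.0):
    """Advisory lock via an exclusively created sidecar file (an assumption)."""
    lock_path = path.with_name(path.name + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {path}")
            time.sleep(0.05)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def write_section(path: Path, section: str, data: dict) -> None:
    """Replace one known section of the state file while holding the lock."""
    if section not in SECTIONS:
        raise KeyError(f"unknown section: {section}")
    with file_lock(path):
        state = json.loads(path.read_text()) if path.exists() else {}
        state[section] = data
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(state, indent=2, sort_keys=True))
        os.replace(tmp, path)  # atomic rename on the same filesystem


def render_brief(path: Path) -> str:
    """Pure function of state: no model call, so same state gives same brief."""
    state = json.loads(path.read_text())
    triage = state.get("triage", {})
    facts = state.get("deterministic_findings", {})
    return (f"class: {triage.get('class', 'UNKNOWN')}\n"
            f"failures: {facts.get('failures', 'n/a')}\n")


state_path = Path(tempfile.mkdtemp()) / "state.json"
write_section(state_path, "deterministic_findings", {"failures": 5})
write_section(state_path, "triage", {"class": "INFRA"})
brief = render_brief(state_path)
print(brief)
```

Rendering twice from the same file returns the same string, which is the reproducibility property the brief relies on.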
Classification grounded in a parsed signature, not a vibe:
```
$ python -m statetriage classify --result examples/result.json
class: INFRA
confidence: high
first_error: write rejected (FORBIDDEN/8/index read-only)
rationale: Signature matches a storage flood-stage guard; the test
           asserted nothing — the environment rejected the write.
```
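One way to ground a label in a parsed signature is an ordered table of patterns, sketched below. The patterns, rationale strings, and the `UNCLASSIFIED` fallback are hypothetical and chosen only to echo the output above; they are not statetriage's actual signature set.

```python
import re

# Hypothetical signature table: (pattern, class, rationale).
# Order matters: the first matching pattern wins.
SIGNATURES = [
    (re.compile(r"index read-only|no space left on device", re.I), "INFRA",
     "storage guard rejected the write"),
    (re.compile(r"connection refused", re.I), "INFRA",
     "a service dependency was unreachable"),
]


def classify(first_error: str) -> dict:
    """Label a run from its first error signature, without a model call."""
    for pattern, label, rationale in SIGNATURES:
        if pattern.search(first_error):
            return {"class": label, "confidence": "high", "rationale": rationale}
    # No known signature: report that explicitly instead of guessing a class.
    return {"class": "UNCLASSIFIED", "confidence": "low",
            "rationale": "no known signature matched"}


result = classify("write rejected (FORBIDDEN/8/index read-only)")
print(result["class"], result["confidence"])  # INFRA high
```

An unmatched signature falls through to a low-confidence result, which leaves the judgment to the agent instead of forcing a label.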
```bash
pip install -e .  # or: python -m statetriage --help

# Full pipeline: parse → classify → plan → write state → brief
python -m statetriage run --result examples/result.json --log examples/run.log --state out/state.json

python -m statetriage classify --result examples/result.json  # just the label
python -m statetriage brief --state out/state.json            # render brief from state
```

Python 3.10+, zero third-party runtime dependencies.
For engineers: triage stops being a talent that lives in two senior people and becomes a repeatable system. The brief is defensible in a review because every number traces to a parse, not a guess.
For AI agents: this is the pattern for letting a model do the part it's good at (reasoning over evidence) while fencing off the part it's bad at (counting, exact recall). The typed state file also makes the investigation composable: another tool can read `root_cause` and act on it.
- A reference agent runner that consumes the plan and fills the `log_investigation` / `root_cause` sections.
- A prior-incident matcher keyed by error signature: search memory before investigating fresh.
- Signature packs loaded from YAML, so the classifier is data, not code.
- A `replay` mode that re-runs synthesis over historical state to measure classifier drift.
QA Veritas explores AI-Native Verification Engineering — practical patterns for a future where humans and AI agents operate complex systems together. Every component serves one loop:
Memory → Reasoning → Verification → Action
```
QA Veritas
├── Resource Ledger               Memory        operational truth as a git tree
├── State Triage  ◀ you are here  Reasoning     deterministic triage around an agent
├── LogLens                       Reasoning     code-aware evidence from logs
├── Intent Verify                 Verification  declarative intent → observable proof
├── Runbook Forge                 Runbooks      procedures derived from verified history
├── SkillPack                     Skills        progressive-disclosure agent capability
└── Future Agents                 Agents        narrow operators that compose the above
```
| Layer | Component |
|---|---|
| Memory | Resource Ledger |
| Reasoning | State Triage (this repo) · LogLens |
| Verification | Intent Verify |
| Runbooks | Runbook Forge |
| Skills | SkillPack |
| Writing | Field notes & essays |
Start at the platform overview. MIT licensed.