Provenance

An autonomous incident-response coordinator where every finding traces back to the specific arrow that produced it. Submission to the Find Evil! hackathon.

Built on weft for typed arrow composition and the SANS SIFT Workstation for the underlying forensics tooling.

The thesis

Existing AI-assisted DFIR tooling (Protocol SIFT and similar) gives an LLM agent a generic execute_shell_cmd plus a long system prompt and hopes it stays on the rails. The hackathon brief flags this as the source of the autonomy gap: hallucination, evidence spoliation risk, loops that don't terminate.

Provenance replaces that surface with a typed action space and a critic-mediated policy:

Typed arrows wrap each SIFT tool. Inputs and outputs are Go structs, not raw shell text. An agent cannot construct invalid command lines, scan outside the case directory, or run a Windows plugin on a Linux image — those misconfigurations are not expressible.
Multi-agent loop with specialist proposers (memory / disk / network) and an independent critic that scores proposals before any arrow runs. Specialists are stateless across iterations; the evidence bag is the only memory.
Deterministic coordinator in Go (no LLM in the loop body) enforces termination, audit-trail emission, and proposal selection.
Functional seams for every behavioral choice. Selection policy, critic invocation strategy, proposal collection, and termination conditions are all swappable closures.
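
The loop described above can be sketched in a few dozen lines of Go. This is a minimal illustration, not the repository's coordinator: every type and function name here (EvidenceBag, Proposal, Proposer, Critic, Executor, runLoop) is a hypothetical stand-in, and the scoring values are made up. It only shows the shape: stateless proposers read the bag, the critic scores each proposal before anything runs, and plain Go decides what executes and when to stop.

```go
package main

import "fmt"

// NOTE: every type and function here is a hypothetical stand-in for
// illustration; none of it is copied from the repository.

// EvidenceBag is the only state carried between iterations.
type EvidenceBag map[string]string

// Proposal is a specialist's request to run one named arrow.
type Proposal struct {
	Specialist string
	ArrowName  string
}

type Proposer func(bag EvidenceBag) []Proposal
type Critic func(bag EvidenceBag, p Proposal) float64
type Executor func(p Proposal) (field, value string)

// runLoop collects proposals, lets the critic score each one, runs only the
// best proposal at or above threshold, and stops after maxIters iterations
// or as soon as nothing clears the threshold.
func runLoop(bag EvidenceBag, proposers []Proposer, critic Critic,
	execute Executor, threshold float64, maxIters int) []Proposal {
	var ran []Proposal
	for i := 0; i < maxIters; i++ {
		var best Proposal
		bestScore, found := 0.0, false
		for _, propose := range proposers {
			for _, p := range propose(bag) {
				s := critic(bag, p)
				if s >= threshold && (!found || s > bestScore) {
					best, bestScore, found = p, s, true
				}
			}
		}
		if !found {
			break // no acceptable proposal: terminate
		}
		field, value := execute(best)
		bag[field] = value
		ran = append(ran, best)
	}
	return ran
}

// demo wires two stub specialists that stop proposing once their field
// is present in the bag.
func demo() []Proposal {
	memory := func(bag EvidenceBag) []Proposal {
		if _, ok := bag["processes"]; ok {
			return nil
		}
		return []Proposal{{"memory", "pslist"}}
	}
	disk := func(bag EvidenceBag) []Proposal {
		if _, ok := bag["yara_hits"]; ok {
			return nil
		}
		return []Proposal{{"disk", "yara"}}
	}
	critic := func(_ EvidenceBag, p Proposal) float64 {
		if p.ArrowName == "pslist" {
			return 0.9
		}
		return 0.6
	}
	execute := func(p Proposal) (string, string) {
		if p.ArrowName == "pslist" {
			return "processes", "stub process list"
		}
		return "yara_hits", "stub yara report"
	}
	return runLoop(EvidenceBag{}, []Proposer{memory, disk}, critic, execute, 0.5, 10)
}

func main() {
	for i, p := range demo() {
		fmt.Printf("step %d: %s proposed %s\n", i+1, p.Specialist, p.ArrowName)
	}
}
```

In this sketch the iteration cap and the threshold are the only termination rules; the real coordinator exposes termination as a swappable seam.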

Quick start

make all          # tidy, build, test
make test         # ~25 unit tests, no SIFT install required
make test-verbose # show every test by name
make cover        # coverage report
make run          # run the mcp-server binary

The tests require nothing beyond Go and this repository — every SIFT tool invocation is unit-tested via a fake runner.RunFunc against captured fixture output. The production binary (bin/mcp-server) expects the real SIFT tools (vol, yara, ...) on PATH and is intended to be deployed inside the SIFT Workstation VM.
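
As a rough sketch of how a fake runner keeps tests free of any SIFT install: the RunFunc signature below is an assumption (the real one lives in runner/runner.go and may differ), newFake and listPIDs are hypothetical helpers, and the fixture text is a simplified made-up format rather than real Volatility output.

```go
package main

import (
	"bufio"
	"bytes"
	"context"
	"fmt"
	"strconv"
	"strings"
)

// RunFunc is an ASSUMED signature for the subprocess seam.
type RunFunc func(ctx context.Context, name string, args ...string) ([]byte, error)

// call records one invocation so a test can assert on the command line.
type call struct {
	Name string
	Args []string
}

// newFake (hypothetical) returns a RunFunc that never spawns a process:
// it records the call and replies with canned fixture output.
func newFake(output string, calls *[]call) RunFunc {
	return func(_ context.Context, name string, args ...string) ([]byte, error) {
		*calls = append(*calls, call{Name: name, Args: append([]string(nil), args...)})
		return []byte(output), nil
	}
}

// listPIDs (hypothetical) runs a process listing through the seam and
// parses the first column of each row as a PID.
func listPIDs(ctx context.Context, run RunFunc, image string) ([]int, error) {
	out, err := run(ctx, "vol", "-f", image, "windows.pslist")
	if err != nil {
		return nil, err
	}
	var pids []int
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		pid, err := strconv.Atoi(fields[0])
		if err != nil {
			continue // header or non-data row
		}
		pids = append(pids, pid)
	}
	return pids, sc.Err()
}

// demo exercises listPIDs against a simplified, made-up fixture.
func demo() ([]int, []call, error) {
	fixture := "PID ImageFileName\n4 System\n612 lsass.exe\n"
	var calls []call
	pids, err := listPIDs(context.Background(), newFake(fixture, &calls), "/cases/demo/mem.raw")
	return pids, calls, err
}

func main() {
	pids, calls, err := demo()
	fmt.Println(pids, calls, err)
}
```

Because the fake records its arguments, a test can check both the parsed result and the exact command line that would have been executed.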

Requires Go 1.23 or newer.

Layout

provenance/
├── runner/                 functional seam for subprocess execution
│   └── runner.go             RunFunc, RealRun, NewFake
├── sift/                   typed SIFT arrows wrapping forensics tools
│   ├── types.go              Process, Timeline, YaraReport, ...
│   ├── pslist.go             Volatility 3 pslist (Win/Linux dispatch)
│   ├── yara.go               YARA scan with case-root path validation
│   ├── triage.go             composed Par(memory, disk) -> synthesis arrow
│   └── fixtures/             captured tool outputs for tests
├── coord/                  the multi-agent coordinator
│   ├── types.go              EvidenceBag, Proposal, Verdict, enums
│   ├── predicate.go          DSL parser + field registry + evaluator
│   ├── seams.go              four behavioral seams + defaults
│   └── coordinator.go        the loop, exposed as a weft.Arrow
├── mcpadapter/             generic weft.Arrow -> MCP tool wrapper
│   └── register.go           one function: RegisterArrow[In, Out]
└── cmd/mcp-server/         production binary serving MCP over stdio

The functional seams (one principle, applied everywhere)

Every test/production boundary in this repo is a function type, not an interface. Function types cannot be type-asserted back to a concrete implementation — there is no concrete type behind the seam, only a closure. This makes the boundary opaque and prevents the class of bug where someone reaches through the seam to bypass it.

The seams:

| Seam | Type | Where |
| --- | --- | --- |
| Subprocess runner | `RunFunc` | `runner/` |
| Trace emission | `TraceTap` | `coord/types.go` |
| Proposal collection | `CollectProposalsFn` | `coord/seams.go` |
| Critic invocation | `InvokeCriticFn` | `coord/seams.go` |
| Selection policy | `SelectFn` | `coord/seams.go` |
| Termination check | `TerminationCondition` | `coord/seams.go` |
| Arrow execution | `ArrowExecutor` | `coord/types.go` |

Each ships with a sensible default; each can be replaced field-by-field on the Coordinator struct before calling Run().
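
A small sketch of what replacing a seam field-by-field might look like. The SelectFn name comes from the table above, but its signature, the Coordinator field name, and the constructor are all assumptions made for illustration.

```go
package main

import "fmt"

// Proposal carries a critic score; the field names are hypothetical.
type Proposal struct {
	ArrowName string
	Score     float64
}

// SelectFn is named after the seam in the table; this signature is ASSUMED.
type SelectFn func(scored []Proposal) (Proposal, bool)

// Coordinator holds the seam as a plain function-typed field.
type Coordinator struct {
	Select SelectFn
}

// highestScore is the default policy in this sketch: pick the top score.
func highestScore(scored []Proposal) (Proposal, bool) {
	if len(scored) == 0 {
		return Proposal{}, false
	}
	best := scored[0]
	for _, p := range scored[1:] {
		if p.Score > best.Score {
			best = p
		}
	}
	return best, true
}

// NewCoordinator (hypothetical) installs the default for every seam.
func NewCoordinator() *Coordinator {
	return &Coordinator{Select: highestScore}
}

// firstAbove builds an alternative policy as a closure: the first proposal
// scoring at least min wins. There is no concrete type to assert back to.
func firstAbove(min float64) SelectFn {
	return func(scored []Proposal) (Proposal, bool) {
		for _, p := range scored {
			if p.Score >= min {
				return p, true
			}
		}
		return Proposal{}, false
	}
}

// demo returns the arrow chosen by the default and by the override.
func demo() (string, string) {
	scored := []Proposal{{"yara", 0.6}, {"pslist", 0.9}}
	c := NewCoordinator()
	def, _ := c.Select(scored)
	c.Select = firstAbove(0.5) // replace one seam, leave the rest alone
	alt, _ := c.Select(scored)
	return def.ArrowName, alt.ArrowName
}

func main() {
	def, alt := demo()
	fmt.Println("default:", def, "override:", alt)
}
```

A test would swap the field in the same way, which is why no mock interface or dependency-injection framework is needed at the boundary.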

Architectural guardrails

These read-only and integrity properties are enforced before any subprocess is spawned or any LLM is called:

Volatility plugin dispatch is profile-driven Go code. A MemoryImage.Profile starting with "Win" selects windows.pslist; anything else selects linux.pslist. No agent-controlled string could pick the wrong plugin.
YARA target paths must resolve inside CaseRoot after cleaning and filepath.Rel normalization. ../../etc/passwd is rejected before exec.Command is called. CaseRoot must be absolute.
Predicate DSL parses at proposal-receipt time. Malformed predicates and references to unknown fields are caught before the critic or the executor sees them.
The coordinator loop is deterministic Go. Termination conditions, proposal validation, and audit-trail emission never run through an LLM, so prompt injection cannot coax any of them off the rails.
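
The first two guardrails can be sketched as follows. This is an illustration under assumptions, not the code in sift/: the function names pluginFor and resolveInCaseRoot are hypothetical, the example paths assume a Unix-like filesystem, and this sketch does not resolve symlinks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// pluginFor (hypothetical name) picks the Volatility 3 plugin from the
// image profile. The agent never supplies the plugin string itself.
func pluginFor(profile string) string {
	if strings.HasPrefix(profile, "Win") {
		return "windows.pslist"
	}
	return "linux.pslist"
}

// resolveInCaseRoot (hypothetical name) returns the cleaned absolute path
// for target, or an error if it would land outside caseRoot.
func resolveInCaseRoot(caseRoot, target string) (string, error) {
	if !filepath.IsAbs(caseRoot) {
		return "", errors.New("case root must be absolute")
	}
	root := filepath.Clean(caseRoot)
	full := target
	if !filepath.IsAbs(full) {
		full = filepath.Join(root, full)
	}
	full = filepath.Clean(full)

	// Rel expresses full relative to root; a leading ".." component means
	// the target escapes the case directory.
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("target %q escapes case root %q", target, root)
	}
	return full, nil
}

func main() {
	fmt.Println(pluginFor("Win10x64_19041"))
	for _, t := range []string{"disk/evidence.bin", "../../etc/passwd"} {
		p, err := resolveInCaseRoot("/cases/demo", t)
		fmt.Printf("%q -> %q, err=%v\n", t, p, err)
	}
}
```

Both checks run before any command is built, so a rejected request never reaches the subprocess seam.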

The audit trail (where the name comes from)

Every ExecutionStep records the proposing specialist, the arrow that ran, the rationale and predicate that were supplied, which bag fields the step populated, and timing data. Findings carry an Evidence slice naming the arrows that contributed their supporting data. A judge running a demo can trace any "high severity" claim through findings -> arrows -> trace entries -> raw tool invocations.
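
To make the tracing path concrete, here is a hedged sketch of walking from a finding back to the steps that support it. ExecutionStep, Finding, and Evidence are named in the text, but the field layout below is assumed, traceFinding is a hypothetical helper, and the predicate strings are placeholders rather than real DSL syntax.

```go
package main

import (
	"fmt"
	"time"
)

// ExecutionStep mirrors the fields described above; the layout is ASSUMED.
type ExecutionStep struct {
	Specialist string
	ArrowName  string
	Rationale  string
	Predicate  string
	Populated  []string // bag fields this step filled in
	Duration   time.Duration
}

// Finding names the arrows that contributed its supporting data.
type Finding struct {
	Severity string
	Claim    string
	Evidence []string // arrow names
}

// traceFinding (hypothetical helper) returns, in execution order, every
// trace entry whose arrow is listed in the finding's Evidence.
func traceFinding(f Finding, trace []ExecutionStep) []ExecutionStep {
	wanted := make(map[string]bool, len(f.Evidence))
	for _, arrow := range f.Evidence {
		wanted[arrow] = true
	}
	var steps []ExecutionStep
	for _, s := range trace {
		if wanted[s.ArrowName] {
			steps = append(steps, s)
		}
	}
	return steps
}

// demo builds a two-step trace and traces one finding through it.
func demo() []ExecutionStep {
	trace := []ExecutionStep{
		{"memory", "pslist", "enumerate processes first", "placeholder predicate",
			[]string{"processes"}, 120 * time.Millisecond},
		{"disk", "yara", "scan staged binaries", "placeholder predicate",
			[]string{"yara_hits"}, 340 * time.Millisecond},
	}
	finding := Finding{
		Severity: "high",
		Claim:    "example claim backed by a YARA hit",
		Evidence: []string{"yara"},
	}
	return traceFinding(finding, trace)
}

func main() {
	for _, s := range demo() {
		fmt.Printf("%s ran %s (%v): %s\n", s.Specialist, s.ArrowName, s.Duration, s.Rationale)
	}
}
```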

What's wired vs. what's stubbed

| Layer | State |
| --- | --- |
| Typed arrows (sift/) | Real |
| Subprocess seam | Real |
| MCP tool exposure | Real |
| Coordinator loop | Real |
| Predicate DSL | Real |
| Termination conditions | Real |
| Specialist agents | Stubs |
| Critic agent | Stub |
| Arrow executor | Stub |

The stubs are deterministic Go that exercise every code path in the loop. Phase 2 of the build replaces them with LLM-driven specialists and critic, plus a real executor that dispatches ArrowName to the concrete sift.* arrows.

Production roadmap

For the hackathon, this is plain Go. The long-term shape adds three things:

Temporal would handle durable execution, retry-aware activities, workflow replay as native audit trail, and signals/queries for human-in-the-loop interaction. The coordinator's ArrowExecutor seam is the natural lift point — each arrow execution becomes a Temporal activity, the coordinator loop becomes the workflow body.
Learned coordinator policy. The proposal/verdict/outcome triples produced by every run are training data. A future version replaces the heuristic critic with a model trained on the trace corpus to predict the next-best arrow given a bag state.
Cross-case calibration. Specialists carry forward calibration scores between cases — agents whose expectations match reality more often get their proposals weighted higher in selection.

License

MIT. See LICENSE.
