Code is cheap. Trusted change is not.
AI agents write code fast. The bottleneck is knowing whether to merge it. PR count goes up. Confidence doesn't. The tooling, CI systems, and review infrastructure were all built for a developer writing code in an editor. Developers have already moved past that.
We build what's missing. Deterministic checks. Structured evidence at every stage. CI-native gates that tell you what actually passed before a human opens the PR.
Rust-first. Schema-validated. Actively publishing to crates.io.
We trust receipts, not agents.
The governed SDLC separates author from critic. The agent writing code is not the agent deciding if it passed. Every stage produces structured evidence: build receipts, gate results, mutation scores, requirement traceability. The human reviews artifacts, not chat transcripts.
Signal → Plan → Build (Author ⇄ Critic) → Review → Gate → Deploy → Wisdom
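The author/critic split and per-stage receipts can be sketched as a check over a run's receipts. This is a minimal illustration under stated assumptions: the `Stage`, `Receipt`, and `GateError` types and the `receipt` and `check_run` helpers are hypothetical, not the receipt schema used by demo-swarm or flow-studio. The check rejects a run whose receipts are out of pipeline order, that contains a failing stage, or in which a stage was judged by the same agent that authored it.

```rust
// Hypothetical types for illustration only; not the receipt schema of any repo listed here.

/// Pipeline stages in order; the derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stage {
    Signal,
    Plan,
    Build,
    Review,
    Gate,
    Deploy,
    Wisdom,
}

/// One stage's evidence: who produced the work, who judged it, and the verdict.
#[derive(Debug)]
struct Receipt {
    stage: Stage,
    author: String,
    critic: String,
    passed: bool,
}

#[derive(Debug, PartialEq)]
enum GateError {
    OutOfOrder(Stage),
    SelfReview(Stage),
    Failed(Stage),
}

/// Hypothetical constructor to keep examples short.
fn receipt(stage: Stage, author: &str, critic: &str, passed: bool) -> Receipt {
    Receipt {
        stage,
        author: author.to_string(),
        critic: critic.to_string(),
        passed,
    }
}

/// Accepts a run only if stages appear in pipeline order, no stage was judged
/// by its own author, and every stage passed.
/// Assumes one final receipt per stage; a real Author/Critic build loop may
/// emit several Build iterations before the final one.
fn check_run(receipts: &[Receipt]) -> Result<(), GateError> {
    let mut last: Option<Stage> = None;
    for r in receipts {
        // Reject a stage that does not come strictly after the previous one.
        if last.map_or(false, |prev| r.stage <= prev) {
            return Err(GateError::OutOfOrder(r.stage));
        }
        // The agent that wrote the change may not be the one that passes it.
        if r.author == r.critic {
            return Err(GateError::SelfReview(r.stage));
        }
        if !r.passed {
            return Err(GateError::Failed(r.stage));
        }
        last = Some(r.stage);
    }
    Ok(())
}
```

For example, `check_run(&[receipt(Stage::Build, "author-agent", "critic-agent", true)])` returns `Ok(())`, while the same receipt with `"author-agent"` as both author and critic returns `Err(GateError::SelfReview(Stage::Build))`.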
Agents game metrics. They write benchmarks that test struct creation instead of real workloads. They delete tests to stay green. Oppositional validation and mutation-on-diff catch what coverage alone misses. The talk covers where this breaks and what we do about it.
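Mutation-on-diff can be illustrated by scoring only the mutants that land on lines the change touched, alongside a check for tests that disappeared between base and head. A minimal sketch, assuming a hypothetical `Mutant` record from whatever mutation tool you run and a map of changed lines per file already parsed from the diff; `diff_mutation_score` and `deleted_tests` are illustrative names, not an existing API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical mutant record; real mutation tools report richer data.
struct Mutant {
    file: &'static str,
    line: u32,
    killed: bool, // true if at least one test failed against this mutant
}

/// Mutation score over mutants that land on lines the diff changed.
/// Returns None when no mutant touches the diff.
fn diff_mutation_score(mutants: &[Mutant], changed: &HashMap<&str, HashSet<u32>>) -> Option<f64> {
    let in_diff: Vec<&Mutant> = mutants
        .iter()
        .filter(|m| changed.get(m.file).map_or(false, |lines| lines.contains(&m.line)))
        .collect();
    if in_diff.is_empty() {
        return None;
    }
    let killed = in_diff.iter().filter(|m| m.killed).count();
    Some(killed as f64 / in_diff.len() as f64)
}

/// Test names present on the base branch but missing on the head branch.
fn deleted_tests(base: &[&str], head: &[&str]) -> Vec<String> {
    let head: HashSet<&str> = head.iter().copied().collect();
    base.iter()
        .filter(|t| !head.contains(**t))
        .map(|t| t.to_string())
        .collect()
}
```

Returning `None` rather than `0.0` when no mutant hits the diff keeps "nothing to measure" distinct from "every mutant survived", which a gate would likely want to treat differently.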
| Repo | What it does |
|---|---|
| demo-swarm | SDLC template producing review-ready PRs with adversarial build loops and build receipts |
| flow-studio | Runner + UI to execute and visualize swarm runs (flows, receipts, transcripts) with stepwise backends |
| agent-backplane | Backplane for Agent SDKs |
| xchecker | Spec pipelines (requirements → design → tasks) with lockfiles, safe fixups, and versioned receipts |
| cockpitctl | PR glass cockpit: compiles CI sensor receipts into a deterministic merge surface |
| rust-as-spec | AC-first, policy-gated, Nix-pinned, ADR-tracked Rust service starter |
Small, focused tools. Each one answers a specific question about the change and produces a receipt. cockpitctl compiles them into a merge surface.
| Repo | What it checks |
|---|---|
| covguard | Coverage gates with structured output |
| perfgate | Perf budgets and baseline diffs for CI/PR bots |
| lintdiff | Diff-scoped lint enforcement |
| diffguard | Diff-scoped governance linting for PR automation |
| depguard | Dependency manifest hygiene for Rust workspaces |
| semverguard | Semver checks across workspaces with filtering, scoping, and receipts |
| buildfix | Build contract repair |
| builddiag | Build contract validation for Rust repos |
| shipper | Resumable, backoff-aware publishing reliability layer for Rust workspaces |
| shiplog | Shipping packet generator: compiles GitHub activity into self-review packets with receipts |
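To make "compiles them into a merge surface" concrete, here is a minimal sketch of folding per-sensor outcomes into one deterministic view. The `Outcome` enum and the `compile`, `mergeable`, and `render` functions are hypothetical; this is not cockpitctl's actual receipt format or algorithm. Required sensors that never report show up as missing, a sensor that reports twice keeps its worst outcome, and a `BTreeMap` keeps the rendered surface sorted by sensor name regardless of which CI job finished first.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-sensor outcome, ordered from best to worst.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Outcome {
    Pass,
    Warn,
    Fail,
}

/// Folds sensor reports into a surface keyed by sensor name.
/// `None` marks a required sensor that never reported.
fn compile(required: &[&str], reports: &[(&str, Outcome)]) -> BTreeMap<String, Option<Outcome>> {
    let mut surface: BTreeMap<String, Option<Outcome>> =
        required.iter().map(|name| (name.to_string(), None)).collect();
    for &(name, outcome) in reports {
        let slot = surface.entry(name.to_string()).or_insert(None);
        // If a sensor reports more than once, keep its worst outcome.
        let prev = *slot;
        *slot = Some(prev.map_or(outcome, |p| p.max(outcome)));
    }
    surface
}

/// Mergeable only if every sensor on the surface reported and none failed.
fn mergeable(surface: &BTreeMap<String, Option<Outcome>>) -> bool {
    surface
        .values()
        .all(|o| matches!(o, Some(Outcome::Pass | Outcome::Warn)))
}

/// Renders one line per sensor; BTreeMap iteration is sorted by key, so the
/// output does not depend on the order reports arrived in.
fn render(surface: &BTreeMap<String, Option<Outcome>>) -> String {
    surface
        .iter()
        .map(|(name, outcome)| match outcome {
            Some(o) => format!("{name}: {o:?}"),
            None => format!("{name}: missing"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Treating an unreported required sensor as blocking, rather than as a pass, is what keeps a skipped CI job from looking green.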
| Repo | What it does |
|---|---|
| tokmd | Code intelligence for humans, machines, and LLMs: receipts, metrics, and insights from your codebase |
| adze | Rust-native grammar toolchain with GLR-capable parsing and typed extraction. Tree-sitter interoperable |
| perl-lsp | Rust Perl Language Server (LSP) + parser toolkit (tree-sitter + pure Rust) |
| BitNet-rs | Rust inference engine for 1-bit BitNet LLMs (GGUF + llama.cpp compatible) |
Every enterprise AI engagement starts with getting data out of systems built before the architect was born.
| Repo | What it parses |
|---|---|
| copybook-rs | COBOL copybooks: EBCDIC/ASCII to JSON and Parquet with byte-identical round trips |
| pst-rs | Outlook PST/OST: full encryption support, attachment streaming, JSON/EML/MBOX export. Python and C bindings |
| hl7v2-rs | HL7 v2: parser, validator, generator. 18 microcrates, conformance profiles, HTTP/REST with Prometheus metrics |
| Repo | What it does |
|---|---|
| OpenRacing | Rust 1 kHz force-feedback + telemetry stack for sim racing wheels |
| OpenFlight | Rust real-time input + device layer for flight simulation. 250 Hz axis pipeline |
| slower-whisper | Local-first conversation ETL with streaming, topic segmentation, and schema-versioned JSON |
| uselesskey | Deterministic cryptographic key and certificate fixtures for Rust tests |