Smitebot: Automated Fuzzing Campaign Manager

### Problem

Running a smite campaign today requires significant manual coordination at every stage. A typical 8-core LND campaign involves:

- Constructing AFL++ command lines with correct `-M`/`-S` flags for main and secondary runners
- Managing tmux sessions and monitoring externally via `afl-whatsup`
- After stopping: walking each runner's `queue/` directory and invoking `afl-cmin -X` manually
- Running `coverage-report.sh` with no baseline or diff against prior runs
- Finding crash files with no systematic path to reproduction, minimization, or deduplication

Several of these fail silently: a missing `-X` flag skips Nyx mode entirely, truncated `covcounters.*` files under high parallel load produce misleading coverage numbers, and a missed `queue/` subdirectory silently discards corpus inputs. There is also no institutional memory across campaigns: no reliable way to know whether a new campaign covered more ground than the last, or whether a new crash is genuinely novel.

### Solution

Smitebot is a Rust CLI tool that automates the full fuzzing lifecycle — from multi-core AFL++ campaign launch and clean shutdown, to corpus minimization, crash triage with deduplication, and historical coverage diffs. The same 11-step manual workflow becomes three commands: `smitebot doctor`, `smitebot start`, `smitebot stop`.

The design is inspired by what [syzbot](https://syzkaller.appspot.com/) provides for the Linux kernel — a state-aware orchestration layer that makes fuzzing campaigns reproducible, auditable, and accessible.

---

## Architecture

Smitebot is added as a new crate to the existing workspace. It is an **orchestration layer, not a rewrite** — existing scripts (`coverage-report.sh`, `symbolize-crash.sh`, `setup-nyx.sh`) encode subtle, validated, target-specific behavior and are called as subprocesses. Smitebot adds state, coordination, and the missing pipeline pieces around them.

<img width="834" height="434" alt="Image" src="https://github.com/user-attachments/assets/372e9f74-0043-499d-9138-adb7d2803db6" />

- `smitebot/src/main.rs`: clap command dispatch
- `smitebot/src/config.rs`: TOML schema and validation
- `smitebot/src/state.rs`: campaign state machine + JSON persistence
- `smitebot/src/campaign.rs`: start / stop / status orchestration
- `smitebot/src/process.rs`: AFL++ process supervision, wraps `ManagedProcess`
- `smitebot/src/corpus.rs`: merge and minimize
- `smitebot/src/triage.rs`: crash discovery, reproduce, afl-tmin, dedup
- `smitebot/src/coverage.rs`: `coverage-report.sh` invocation, snapshot, diff
- `smitebot/src/doctor.rs`: prerequisite checks
- `smitebot/src/docker.rs`: image build wrappers

New dependencies: `clap`, `toml`, `serde_json`. Reused from workspace: `log`, `thiserror`, `serde`, `nix`.

Builds directly on PR [#32](https://github.com/morehouse/smite/pull/32): `ManagedProcess::process_group(0)` is the foundation for the `stop` command.

---

## Core Concepts

### Campaign State

This is the foundational piece absent from the current codebase. Before smitebot can reliably stop a campaign, diff a coverage report, or triage crashes in context, it needs persistent knowledge of what is running.

State is written to `~/.smitebot/runs/<campaign-id>/state.json`. Key fields: `campaign_id`, `config_hash` (SHA-256 of the config snapshot), the `state` enum (`created/starting/running/stopping/stopped/failed`), per-runner `pid`/`pgid`, and artifact directory pointers. All writes are atomic (temp file + rename), so a killed smitebot never leaves corrupted state.
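
A minimal, `std`-only sketch of the two guarantees above: rejecting illegal state transitions and replacing `state.json` atomically. The transition table, the `.json.tmp` suffix, and the names `CampaignState`, `can_transition_to`, and `write_atomic` are assumptions for illustration; JSON serialization (via `serde_json` in the real crate) is left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Campaign lifecycle states, mirroring the `state` enum in `state.json`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CampaignState {
    Created,
    Starting,
    Running,
    Stopping,
    Stopped,
    Failed,
}

impl CampaignState {
    /// True if moving from `self` to `next` is legal (hypothetical table).
    fn can_transition_to(self, next: CampaignState) -> bool {
        matches!(
            (self, next),
            (Self::Created, Self::Starting)
                | (Self::Starting, Self::Running)
                | (Self::Starting, Self::Failed) // rollback after a launch failure
                | (Self::Running, Self::Stopping)
                | (Self::Running, Self::Failed)
                | (Self::Stopping, Self::Stopped)
                | (Self::Stopping, Self::Failed)
        )
    }
}

/// Replace `path` with `contents` atomically: write a sibling temp file,
/// fsync it, then rename it over the target. rename(2) within a single
/// filesystem is atomic, so readers see either the old or the new file.
/// (For full durability of the rename itself, the parent directory could
/// also be fsynced; that step is omitted here.)
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(contents)?;
        f.sync_all()?; // data must be on disk before the rename
    }
    fs::rename(&tmp, path)
}
```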

### Commands

**`smitebot doctor`**
Prerequisite checks before any campaign: x86_64 arch, `/dev/kvm` accessible, Docker daemon reachable, `afl-fuzz`/`afl-cmin`/`afl-tmin`/`afl-whatsup` on PATH, AFL++ Nyx mode built, `libnyx.so` on `LD_LIBRARY_PATH`, VMware backdoor enabled, smite scripts present and executable. `--json` flag for CI use.
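
Two of the `doctor` checks could be sketched with `std` alone: resolving the AFL++ tools on `PATH` the way a shell would, and probing `/dev/kvm`. This is a hedged sketch; `find_in_path`, `missing_afl_tools`, and `kvm_accessible` are hypothetical names, and the PATH value is passed in explicitly so the check is testable.

```rust
use std::env;
use std::ffi::OsStr;
use std::fs::OpenOptions;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Search a PATH-style value for an executable regular file named `name`.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| is_executable(candidate))
}

/// True if `p` is a regular file with at least one execute bit set.
fn is_executable(p: &Path) -> bool {
    match p.metadata() {
        Ok(m) => m.is_file() && m.permissions().mode() & 0o111 != 0,
        Err(_) => false,
    }
}

/// Return the AFL++ tools from the doctor list that are not on PATH.
fn missing_afl_tools(path_var: &OsStr) -> Vec<&'static str> {
    ["afl-fuzz", "afl-cmin", "afl-tmin", "afl-whatsup"]
        .iter()
        .copied()
        .filter(|tool| find_in_path(tool, path_var).is_none())
        .collect()
}

/// True if the current user can open /dev/kvm for reading and writing.
fn kvm_accessible() -> bool {
    OpenOptions::new().read(true).write(true).open("/dev/kvm").is_ok()
}
```

With `--json`, the same results would simply be serialized instead of printed.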

**`smitebot start / stop / status`**
TOML config consistent with `aflr_cfg_smite_lnd_encrypted_bytes.toml`. Each runner spawned via `ManagedProcess::spawn()` with `process_group(0)` — pgid recorded to state, cleanup reaches grandchildren. Start rolls back all runners if any fail within 3 seconds of launch.
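
Building the per-runner `afl-fuzz` arguments can be sketched as a pure function, which makes the `-M`/`-S` layout unit-testable without launching anything. This follows the description above (`-X` on every runner, runner 0 as `-M`, the rest as `-S`); the instance names `main`/`secondaryNN` and the function name are hypothetical, and AFL++'s Nyx-mode documentation should be checked for any extra multi-instance requirements.

```rust
use std::path::Path;

/// Build afl-fuzz arguments for `cores` runners. Runner 0 is the main
/// instance (`-M`), the others are secondaries (`-S`). `-X` selects Nyx
/// mode, in which the target after `--` is the Nyx sharedir.
fn afl_fuzz_args(cores: usize, input: &Path, output: &Path, sharedir: &Path) -> Vec<Vec<String>> {
    (0..cores)
        .map(|i| {
            let (role, name) = if i == 0 {
                ("-M", "main".to_string())
            } else {
                ("-S", format!("secondary{:02}", i))
            };
            vec![
                "-X".to_string(),
                "-i".to_string(),
                input.display().to_string(),
                "-o".to_string(),
                output.display().to_string(),
                role.to_string(),
                name,
                "--".to_string(),
                sharedir.display().to_string(),
            ]
        })
        .collect()
}
```

Each inner vector would then be passed to the spawn path alongside `process_group(0)`.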

For Nyx campaigns, `stop` is process-group termination only: SIGTERM, a timed wait, then SIGKILL via `killpg`. CLN graceful stop (`lightning-cli stop` via `docker exec`) is scoped exclusively to the coverage/replay workflow, where CLN runs as a real long-lived host process and must exit cleanly for its profraw files to flush correctly. This matches the existing `Drop` impl in `cln.rs`. `status` reads `fuzzer_stats` from each runner's output directory (`execs_done`, `execs_per_sec`, `corpus_count`, `saved_crashes`, `saved_hangs`) and prints a combined summary.
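
The SIGTERM, timed wait, SIGKILL sequence can be sketched with `std` alone. This is a minimal sketch rather than smitebot's code: it declares libc's `killpg` directly (std already links libc on Linux) where the real crate would call `nix::sys::signal::killpg`, the constants are the Linux values, and `stop_group` plus the 50 ms poll interval are hypothetical.

```rust
use std::io;
use std::thread::sleep;
use std::time::{Duration, Instant};

// Linux signal and errno numbers; nix's typed Signal/Errno would replace these.
const SIGKILL: i32 = 9;
const SIGTERM: i32 = 15;
const ESRCH: i32 = 3;

unsafe extern "C" {
    // libc's killpg(3).
    fn killpg(pgrp: i32, sig: i32) -> i32;
}

/// Send `sig` to every process in group `pgid` (signal 0 only probes).
fn signal_group(pgid: i32, sig: i32) -> io::Result<()> {
    // SAFETY: killpg takes plain integers and has no memory-safety preconditions.
    if unsafe { killpg(pgid, sig) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

fn is_esrch(e: &io::Error) -> bool {
    e.raw_os_error() == Some(ESRCH)
}

/// True while any process (including an unreaped zombie) is in the group.
fn group_alive(pgid: i32) -> bool {
    match signal_group(pgid, 0) {
        Ok(()) => true,
        Err(e) => !is_esrch(&e), // EPERM still means the group exists
    }
}

/// SIGTERM the group, poll until `grace` expires, then SIGKILL.
/// Returns Ok(true) if SIGKILL had to be sent.
fn stop_group(pgid: i32, grace: Duration) -> io::Result<bool> {
    match signal_group(pgid, SIGTERM) {
        Err(ref e) if is_esrch(e) => return Ok(false), // already gone
        Err(e) => return Err(e),
        Ok(()) => {}
    }
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if !group_alive(pgid) {
            return Ok(false);
        }
        sleep(Duration::from_millis(50));
    }
    match signal_group(pgid, SIGKILL) {
        Ok(()) => Ok(true),
        Err(ref e) if is_esrch(e) => Ok(false), // exited right at the deadline
        Err(e) => Err(e),
    }
}
```

Probing with signal 0 rather than waiting on a `Child` handle matters because, after a restart, smitebot only has the pgid recorded in `state.json`. When smitebot is still the parent, the runner must also be reaped (for example by a thread blocked in `Child::wait`), otherwise its zombie keeps the group visible to `killpg`.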

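
Aggregating `fuzzer_stats` for `status` could look like the sketch below. It assumes the file's usual `key : value` layout and takes each runner's file contents as a string (reading `<afl_out>/<runner>/fuzzer_stats` is left out); `Summary` and `summarize` are hypothetical names.

```rust
use std::collections::HashMap;

/// Parse one runner's `fuzzer_stats` file ("key : value" per line).
fn parse_fuzzer_stats(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Campaign-wide view across all runners.
#[derive(Debug, Default)]
struct Summary {
    execs_done: u64,
    execs_per_sec: f64,
    corpus_count: u64,
    saved_crashes: u64,
    saved_hangs: u64,
}

/// Combine every runner's stats into one summary.
fn summarize<'a>(runners: impl IntoIterator<Item = &'a str>) -> Summary {
    let mut s = Summary::default();
    for text in runners {
        let stats = parse_fuzzer_stats(text);
        let int = |key: &str| stats.get(key).and_then(|v| v.parse::<u64>().ok()).unwrap_or(0);
        s.execs_done += int("execs_done");
        s.saved_crashes += int("saved_crashes");
        s.saved_hangs += int("saved_hangs");
        // Runners sync inputs from each other, so summing queue sizes would
        // double count; the largest single queue is a lower bound on the union.
        s.corpus_count = s.corpus_count.max(int("corpus_count"));
        s.execs_per_sec += stats
            .get("execs_per_sec")
            .and_then(|v| v.parse::<f64>().ok())
            .unwrap_or(0.0);
    }
    s
}
```
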
**`smitebot corpus merge / minimize`**
`merge` walks `$afl_out/*/queue/` via `std::fs`, collects all inputs except `README.txt`, copies with normalized names to a merged directory. `minimize` invokes `afl-cmin -X -i <merged> -o <minimized> -- <sharedir>`.
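
A `std`-only sketch of `merge`, assuming a `<runner>_<n>` naming scheme (hypothetical; the text above only says names are normalized). It skips `README.txt` and any subdirectories, such as the `.state/` directory AFL++ keeps inside `queue/`, and sorts both runners and inputs so repeated merges are deterministic.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy every input from `<afl_out>/*/queue/` into `merged`, renaming each
/// to `<runner>_<n>` so same-named entries from different runners cannot
/// collide. Returns the number of files copied.
fn merge_queues(afl_out: &Path, merged: &Path) -> io::Result<usize> {
    fs::create_dir_all(merged)?;
    let mut runners: Vec<_> = fs::read_dir(afl_out)?
        .filter_map(|e| e.ok())
        .map(|e| e.path())
        .filter(|p| p.join("queue").is_dir())
        .collect();
    runners.sort(); // deterministic output order
    let mut copied = 0;
    for runner in runners {
        let runner_name = runner.file_name().unwrap().to_string_lossy().into_owned();
        let mut inputs: Vec<_> = fs::read_dir(runner.join("queue"))?
            .filter_map(|e| e.ok())
            .map(|e| e.path())
            // Regular files only (skips `.state/`), and never README.txt.
            .filter(|p| p.is_file() && p.file_name().map_or(false, |n| n != "README.txt"))
            .collect();
        inputs.sort();
        for (n, input) in inputs.iter().enumerate() {
            fs::copy(input, merged.join(format!("{}_{:06}", runner_name, n)))?;
            copied += 1;
        }
    }
    Ok(copied)
}
```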

Kept as separate commands intentionally: merge is always fast and safe (pure collection, no binary required); minimize requires the target binary and Nyx sharedir and is run deliberately.
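
Because only `minimize` depends on the Nyx sharedir, it can validate that up front before building the `afl-cmin -X -i <merged> -o <minimized> -- <sharedir>` invocation. A minimal sketch; `minimize_command` is a hypothetical helper, and spawning the command and reporting before/after counts are left out.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build the afl-cmin invocation, refusing to do so without a sharedir.
fn minimize_command(merged: &Path, minimized: &Path, sharedir: &Path) -> io::Result<Command> {
    if !sharedir.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("Nyx sharedir not found: {}", sharedir.display()),
        ));
    }
    let mut cmd = Command::new("afl-cmin");
    cmd.arg("-X") // Nyx mode, same as the fuzzing runs
        .arg("-i")
        .arg(merged)
        .arg("-o")
        .arg(minimized)
        .arg("--")
        .arg(sharedir);
    Ok(cmd)
}
```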

**`smitebot crashes triage`**

<img width="855" height="452" alt="Image" src="https://github.com/user-attachments/assets/5d413d39-bafb-4716-8759-761924377b9f" />

Pipeline per crash input:
1. **Discover** — walk `$afl_out/*/crashes/` across all runners
2. **Reproduce** — `docker run` in local mode with `SMITE_INPUT` set, retry ×3, classify `Reproducible`/`Flaky`
3. **Minimize** — `afl-tmin -X` for reproducible crashes
4. **Symbolize** — target-specific: CLN/LDK via `llvm-symbolizer`; LND goroutine stacks already symbolic; Eclair JVM output already symbolic
5. **Fingerprint** — SHA-256 over `target:scenario:signal:frame_0:frame_1:frame_2`
6. **Deduplicate** — check against `~/.smitebot/crashes/known.json`
7. **Report** — per-bug-group directory with raw crash, minimized input, symbolized log, metadata JSON
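
Steps 5 and 6 can be sketched as building the canonical key and filtering out known keys. The SHA-256 step is omitted because `std` has no SHA-256 and the listed dependencies do not include a hashing crate; the in-memory `HashSet` stands in for `known.json`. The sketch assumes frames are symbolized function names rather than raw addresses, which would vary between runs. Function names and sample values are hypothetical.

```rust
use std::collections::HashSet;

/// Canonical key `target:scenario:signal:frame_0:frame_1:frame_2`.
/// Stacks shorter than three frames are padded with empty fields.
fn fingerprint_key(target: &str, scenario: &str, signal: &str, frames: &[&str]) -> String {
    let top: Vec<&str> = (0..3).map(|i| frames.get(i).copied().unwrap_or("")).collect();
    format!("{}:{}:{}:{}", target, scenario, signal, top.join(":"))
}

/// Keep only keys not already in `known`, recording them as it goes, so
/// duplicates within the same batch are also dropped.
fn partition_novel(known: &mut HashSet<String>, keys: Vec<String>) -> Vec<String> {
    keys.into_iter().filter(|k| known.insert(k.clone())).collect()
}
```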

**`smitebot coverage`**
Wraps `coverage-report.sh` as a subprocess, preserving all existing target-specific behavior. Adds timestamped snapshots after each run and a diff against the prior snapshot. Parsing uses machine-readable formats only: `coverage.txt` (LND), which the existing pipeline already produces via covdata merge and which smitebot snapshots after each run to compute per-file deltas directly; `llvm-cov export --format=text` (CLN/LDK); and `jacoco.csv` (Eclair). No HTML scraping.
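
A sketch of the LND per-file delta, assuming `coverage.txt` uses Go's text coverage profile format (a `mode:` line followed by `file:startLine.startCol,endLine.endCol numStmts count` lines) with each block listed once. Function names and the sample paths are hypothetical; snapshot storage and the CLN/LDK/Eclair parsers are left out.

```rust
use std::collections::BTreeMap;

/// Per-file (covered, total) statement counts from a Go text profile.
fn file_coverage(profile: &str) -> BTreeMap<String, (u64, u64)> {
    let mut per_file: BTreeMap<String, (u64, u64)> = BTreeMap::new();
    for line in profile.lines() {
        if line.starts_with("mode:") || line.trim().is_empty() {
            continue;
        }
        let mut parts = line.split_whitespace();
        let (block, stmts, count) = match (parts.next(), parts.next(), parts.next()) {
            (Some(b), Some(s), Some(c)) => (b, s, c),
            _ => continue, // malformed line
        };
        // The file path ends at the last ':' before the position range.
        let file = match block.rsplit_once(':') {
            Some((f, _)) => f,
            None => continue,
        };
        let stmts: u64 = match stmts.parse() {
            Ok(n) => n,
            Err(_) => continue,
        };
        let entry = per_file.entry(file.to_string()).or_insert((0, 0));
        entry.1 += stmts;
        if count.parse::<u64>().map_or(false, |c| c > 0) {
            entry.0 += stmts;
        }
    }
    per_file
}

/// Change in covered statements per file; files missing from one snapshot
/// count as zero there. Unchanged files are omitted.
fn coverage_delta(
    old: &BTreeMap<String, (u64, u64)>,
    new: &BTreeMap<String, (u64, u64)>,
) -> BTreeMap<String, i64> {
    let mut delta = BTreeMap::new();
    for file in old.keys().chain(new.keys()) {
        let before = old.get(file).map_or(0, |c| c.0) as i64;
        let after = new.get(file).map_or(0, |c| c.0) as i64;
        if before != after {
            delta.insert(file.clone(), after - before);
        }
    }
    delta
}
```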

### Testing Strategy

Three layers:

- **Unit tests** — config parsing, state machine transitions, fingerprint hashing, corpus file collection. No external dependencies, runs in CI.
- **Integration tests** — lifecycle commands tested with `sleep infinity` runners spawned via `process_group(0)`, matching the exact spawn path used for real fuzzers. Synthetic `state.json` files injected to test reconcile and rollback paths independently. Grandchild absence in `/proc` asserted after `smitebot stop`.
- **Fixture tests** — corpus and triage tests run against pre-populated `queue/` and `crashes/` trees. No live fuzzer required.
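
The `/proc` assertion in the integration layer could rest on a helper like the one below, which lists live members of a process group by parsing `/proc/<pid>/stat`. It is a Linux-only sketch; `live_pids_in_group` is a hypothetical name, and zombies are ignored because an exited process whose parent has not reaped it still appears in `/proc`.

```rust
use std::fs;
use std::io;

/// Live (non-zombie) processes whose process group is `pgid`.
fn live_pids_in_group(pgid: i32) -> io::Result<Vec<i32>> {
    let mut pids = Vec::new();
    for entry in fs::read_dir("/proc")? {
        let entry = entry?;
        let pid = match entry.file_name().to_str().and_then(|s| s.parse::<i32>().ok()) {
            Some(p) => p,
            None => continue, // not a process directory
        };
        // The process may exit between read_dir and this read.
        let stat = match fs::read_to_string(entry.path().join("stat")) {
            Ok(s) => s,
            Err(_) => continue,
        };
        // comm is parenthesized and may contain spaces, so parse after the
        // last ')': the fields are then state, ppid, pgrp, ...
        let rest = match stat.rfind(')') {
            Some(i) => &stat[i + 1..],
            None => continue,
        };
        let fields: Vec<&str> = rest.split_whitespace().collect();
        let dead = matches!(fields.first(), Some(&"Z") | Some(&"X"));
        if !dead && fields.get(2).and_then(|f| f.parse::<i32>().ok()) == Some(pgid) {
            pids.push(pid);
        }
    }
    Ok(pids)
}
```

Killing only the direct child leaves a backgrounded grandchild alive in the group, which is exactly the case this assertion is meant to catch after `smitebot stop`.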

CLN graceful stop (`docker exec lightning-cli stop`) is tested manually against a live container and gated behind `--integration` so it does not block CI on hosts without Docker.

---

## Implementation Plan

Milestones are structured as vertical slices — each delivers a working system for a narrow set of functionality. After Milestone 2 is complete, most subsequent milestones can be developed in parallel.

- [ ] **Milestone 1: Foundation** — Crate scaffolding, TOML config schema, `smitebot doctor`, CI pipeline (`fmt`, `clippy`, `test`)
- [ ] **Milestone 2: Campaign Lifecycle** — `smitebot start` (single + multi-core), `stop` (with CLN graceful stop), `status` with `fuzzer_stats` aggregation
- [ ] **Milestone 3: Reliability** — Stale-state reconcile, rollback on partial start, integration tests for lifecycle commands
- [ ] **Milestone 4: Corpus Management** — `smitebot corpus merge` and `smitebot corpus minimize` with before/after reporting
- [ ] **Milestone 5: Crash Triage** — Discovery, reproduction, `afl-tmin` minimization, symbolization (CLN + LND), fingerprinting, dedup store
- [ ] **Milestone 6: Crash Triage (LDK + Eclair)** — Symbolization for remaining targets, report bundle generation, triage integration tests
- [ ] **Milestone 7: Coverage** — `smitebot coverage` with snapshot storage and LND diff parsing
- [ ] **Milestone 8: Polish** — End-to-end demo across all four targets, operator documentation, UX cleanup
- [ ] **Milestone 9+ (Optional):** Daemon mode with scheduled campaigns and crash alerts; per-target coverage diff for CLN/LDK/Eclair formats

For a deeper overview of Smitebot, see my full proposal [here](https://docs.google.com/document/d/1aQCQq4jqdyx1dLqhPcBw-jN8FwgndsZNm1eoI8XQ-U4/edit?usp=sharing).

I'm happy to discuss any part of the design before work starts, and I would appreciate your thoughts and a review of the plan.
