Problem
Running a smite campaign today requires significant manual coordination at every stage. A typical 8-core LND campaign involves:
Constructing AFL++ command lines with correct -M/-S flags for main and secondary runners
Managing tmux sessions and monitoring externally via afl-whatsup
After stopping: walking each runner's queue/ directory and invoking afl-cmin -X manually
Running coverage-report.sh with no baseline or diff against prior runs
Finding crash files with no systematic path to reproduction, minimization, or deduplication
Several of these fail silently: a missing -X flag skips Nyx mode entirely, truncated covcounters.* files under high parallel load produce misleading coverage numbers, and a missed queue/ subdirectory silently discards corpus inputs. There is also no institutional memory across campaigns: no reliable way to know whether a new campaign covered more ground than the last, or whether a new crash is genuinely novel.
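To make the first manual step concrete, here is a minimal sketch of how the per-runner afl-fuzz arguments could be generated so that the -M/-S and -X flags are never forgotten. The function names, the runner naming scheme, and the example paths are hypothetical; only the flags themselves come from the description above.

```rust
/// Build the afl-fuzz argument list for runner `index` of a campaign.
/// Runner 0 is the main instance (-M); every other runner is a secondary (-S).
fn runner_args(index: usize, input_dir: &str, output_dir: &str, sharedir: &str) -> Vec<String> {
    // Hypothetical naming scheme: "main", then "secondary1", "secondary2", ...
    let (role_flag, name) = if index == 0 {
        ("-M", "main".to_string())
    } else {
        ("-S", format!("secondary{}", index))
    };
    vec![
        "-X".to_string(), // Nyx mode flag, always emitted so it cannot be dropped
        role_flag.to_string(),
        name,
        "-i".to_string(),
        input_dir.to_string(),
        "-o".to_string(),
        output_dir.to_string(),
        "--".to_string(),
        sharedir.to_string(),
    ]
}

/// Argument lists for every runner of a `cores`-wide campaign.
fn campaign_args(cores: usize, input_dir: &str, output_dir: &str, sharedir: &str) -> Vec<Vec<String>> {
    (0..cores)
        .map(|i| runner_args(i, input_dir, output_dir, sharedir))
        .collect()
}
```

Generating every command line from one function means a flag is either present for all runners or absent for all of them, which is easier to catch than a single hand-typed omission.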
Solution
Smitebot is a Rust CLI tool that automates the full fuzzing lifecycle — from multi-core AFL++ campaign launch and clean shutdown, to corpus minimization, crash triage with deduplication, and historical coverage diffs. The same 11-step manual workflow becomes three commands: smitebot doctor, smitebot start, smitebot stop.
The design is inspired by what syzbot provides for the Linux kernel — a state-aware orchestration layer that makes fuzzing campaigns reproducible, auditable, and accessible.
Architecture
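Smitebot is added as a new crate to the existing workspace. It is an orchestration layer, not a rewrite: existing scripts (coverage-report.sh, symbolize-crash.sh, setup-nyx.sh) encode subtle, validated, target-specific behavior and are called as subprocesses. Smitebot adds state, coordination, and the missing pipeline pieces around them.
smitebot/src/main.rs: clap command dispatch
smitebot/src/config.rs: TOML schema and validation
smitebot/src/state.rs: campaign state machine + JSON persistence
smitebot/src/campaign.rs: start / stop / status orchestration
smitebot/src/process.rs: AFL++ process supervision, wraps ManagedProcess
smitebot/src/corpus.rs: merge and minimize
smitebot/src/triage.rs: crash discovery, reproduce, afl-tmin, dedup
smitebot/src/coverage.rs: coverage-report.sh invocation, snapshot, diff
smitebot/src/doctor.rs: prerequisite checks
smitebot/src/docker.rs: image build wrappers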
New dependencies: clap, toml, serde_json. Reused from workspace: log, thiserror, serde, nix.
Builds directly on PR #32 — ManagedProcess::process_group(0) is the foundation for the stop command.
Core Concepts
Campaign State
This is the foundational piece absent from the current codebase. Before smitebot can reliably stop a campaign, diff a coverage report, or triage crashes in context, it needs persistent knowledge of what is running.
State is written to ~/.smitebot/runs/<campaign-id>/state.json. Key fields: campaign_id, config_hash (SHA-256 of config snapshot), state enum (created/starting/running/stopping/stopped/failed), per-runner pid/pgid, and artifact directory pointers. All writes are atomic (temp file + rename), so a killed smitebot never leaves corrupted state.
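The temp-file-plus-rename write described above can be sketched with the standard library alone. The function name and the ".json.tmp" suffix are assumptions for illustration, and the JSON is passed in as a ready-made string (the real crate would presumably serialize it with serde_json).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `json` to `path` so that readers see either the old file or the
/// new one, never a partially written file.
fn write_state_atomic(path: &Path, json: &str) -> io::Result<()> {
    // The temp file lives next to the target so the rename stays on one
    // filesystem: "state.json" becomes "state.json.tmp".
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(json.as_bytes())?;
        // Flush file contents to disk before the rename makes them visible.
        file.sync_all()?;
    }
    // On POSIX systems rename replaces the destination atomically.
    fs::rename(&tmp, path)
}
```

If smitebot is killed before the rename, only the temp file is affected and the previous state.json stays intact.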
Commands
smitebot doctor
Prerequisite checks before any campaign: x86_64 arch, /dev/kvm accessible, Docker daemon reachable, afl-fuzz/afl-cmin/afl-tmin/afl-whatsup on PATH, AFL++ Nyx mode built, libnyx.so on LD_LIBRARY_PATH, VMware backdoor enabled, smite scripts present and executable. --json flag for CI use.
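A few of these checks can be sketched with the standard library alone. The Check struct, the function names, and the check labels are hypothetical; the Docker, Nyx, libnyx.so, and VMware backdoor checks are left out because they depend on details not covered here.

```rust
use std::env;
use std::ffi::OsStr;
use std::fs::OpenOptions;
use std::path::PathBuf;

/// Result of a single prerequisite check.
struct Check {
    name: String,
    ok: bool,
}

/// Look for `name` in each directory of a PATH-style variable.
/// Only checks that a regular file exists, not that it is executable.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Run the architecture, /dev/kvm, and AFL++ tool checks.
fn run_checks(path_var: &OsStr) -> Vec<Check> {
    let mut checks = vec![
        Check {
            name: "x86_64 arch".to_string(),
            ok: env::consts::ARCH == "x86_64",
        },
        Check {
            // "Accessible" is interpreted here as openable for read and write.
            name: "/dev/kvm accessible".to_string(),
            ok: OpenOptions::new().read(true).write(true).open("/dev/kvm").is_ok(),
        },
    ];
    for tool in ["afl-fuzz", "afl-cmin", "afl-tmin", "afl-whatsup"].iter() {
        checks.push(Check {
            name: format!("{} on PATH", tool),
            ok: find_in_path(tool, path_var).is_some(),
        });
    }
    checks
}
```

Passing the PATH value in as a parameter, rather than reading it inside the function, keeps the check testable against a fixture directory.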
smitebot start / stop / status
TOML config consistent with aflr_cfg_smite_lnd_encrypted_bytes.toml. Each runner spawned via ManagedProcess::spawn() with process_group(0) — pgid recorded to state, cleanup reaches grandchildren. Start rolls back all runners if any fail within 3 seconds of launch.
Nyx campaign stop is process group termination only: SIGTERM → timed wait → SIGKILL via killpg. CLN graceful stop (lightning-cli stop via docker exec) is scoped exclusively to the coverage/replay workflow, where CLN runs as a real long-lived host process and must exit cleanly for profraw files to flush correctly. This matches the existing Drop impl in cln.rs.
Status reads fuzzer_stats from each runner's output directory (execs_done, execs_per_sec, corpus_count, saved_crashes, saved_hangs) and prints a combined summary.
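The status aggregation could look roughly like the sketch below, which parses the "key : value" lines of each fuzzer_stats file and sums the five fields named above. The Summary struct and function names are hypothetical, and summing corpus_count across runners is a simplifying assumption (synced inputs are counted more than once).

```rust
use std::collections::HashMap;

/// Parse the "key : value" lines of one fuzzer_stats file.
fn parse_fuzzer_stats(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| {
            // Split on the first ':' only, since values may contain colons.
            let mut parts = line.splitn(2, ':');
            let key = parts.next()?.trim();
            let value = parts.next()?.trim();
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

/// Combined view over all runners of a campaign.
#[derive(Debug, Default, PartialEq)]
struct Summary {
    execs_done: u64,
    execs_per_sec: f64,
    corpus_count: u64,
    saved_crashes: u64,
    saved_hangs: u64,
}

/// Sum the selected fields over the contents of several fuzzer_stats files.
/// Missing or unparsable fields count as zero.
fn aggregate(stats_files: &[&str]) -> Summary {
    let mut total = Summary::default();
    for text in stats_files {
        let stats = parse_fuzzer_stats(text);
        let int = |key: &str| {
            stats.get(key).and_then(|v| v.parse::<u64>().ok()).unwrap_or(0)
        };
        total.execs_done += int("execs_done");
        total.corpus_count += int("corpus_count");
        total.saved_crashes += int("saved_crashes");
        total.saved_hangs += int("saved_hangs");
        total.execs_per_sec += stats
            .get("execs_per_sec")
            .and_then(|v| v.parse::<f64>().ok())
            .unwrap_or(0.0);
    }
    total
}
```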
smitebot corpus merge / minimize
merge walks $afl_out/*/queue/ via std::fs, collects all inputs except README.txt, and copies them with normalized names to a merged directory. minimize invokes afl-cmin -X -i <merged> -o <minimized> -- <sharedir>.
Kept as separate commands intentionally: merge is always fast and safe (pure collection, no binary required); minimize requires the target binary and Nyx sharedir and is run deliberately.
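A minimal sketch of the merge step, assuming a hypothetical naming scheme of "<runner>-<counter>" for the normalized file names. It visits every runner directory, so no queue/ subdirectory is skipped by accident.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy every queue input from every runner under `afl_out` into `merged`.
/// Returns the number of files copied.
fn merge_queues(afl_out: &Path, merged: &Path) -> io::Result<usize> {
    fs::create_dir_all(merged)?;
    let mut copied = 0;
    for runner in fs::read_dir(afl_out)? {
        let runner = runner?;
        let queue = runner.path().join("queue");
        if !queue.is_dir() {
            continue; // not a runner output directory
        }
        let runner_name = runner.file_name().to_string_lossy().into_owned();
        for entry in fs::read_dir(&queue)? {
            let entry = entry?;
            let path = entry.path();
            // Skip subdirectories (such as .state) and the README.
            if !path.is_file() || entry.file_name() == "README.txt" {
                continue;
            }
            // The global counter keeps names unique across runners.
            let dest = merged.join(format!("{}-{:06}", runner_name, copied));
            fs::copy(&path, &dest)?;
            copied += 1;
        }
    }
    Ok(copied)
}
```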
smitebot crashes triage
Pipeline per crash input:
Discover: walk $afl_out/*/crashes/ across all runners
Reproduce: docker run in local mode with SMITE_INPUT set, retry ×3, classify Reproducible/Flaky
Minimize: afl-tmin -X for reproducible crashes
Symbolize: llvm-symbolizer; LND goroutine stacks already symbolic; Eclair JVM output already symbolic
Fingerprint: SHA-256 over target:scenario:signal:frame_0:frame_1:frame_2
Deduplicate: check against ~/.smitebot/crashes/known.json
Report: per-bug-group directory with raw crash, minimized input, symbolized log, metadata JSON
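The fingerprint and deduplication steps can be illustrated as follows. To keep the sketch dependency-free it hashes with FNV-1a, which is only a stand-in: the proposal specifies SHA-256, and the known-crash store is modeled here as an in-memory set rather than known.json. The CrashInfo struct and function names are hypothetical.

```rust
use std::collections::HashSet;

/// The fields that identify a crash for deduplication purposes.
struct CrashInfo<'a> {
    target: &'a str,
    scenario: &'a str,
    signal: &'a str,
    frames: &'a [&'a str], // symbolized stack, innermost frame first
}

/// Build the string target:scenario:signal:frame_0:frame_1:frame_2.
/// Only the top three frames are used; missing frames become empty fields.
fn fingerprint_input(crash: &CrashInfo) -> String {
    let mut parts = vec![crash.target, crash.scenario, crash.signal];
    for i in 0..3 {
        parts.push(crash.frames.get(i).copied().unwrap_or(""));
    }
    parts.join(":")
}

/// 64-bit FNV-1a, used here ONLY as a stand-in for SHA-256.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in data {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Record the crash fingerprint; returns true if it was not seen before.
fn is_new(known: &mut HashSet<u64>, crash: &CrashInfo) -> bool {
    known.insert(fnv1a64(fingerprint_input(crash).as_bytes()))
}
```

Because only the top three frames enter the fingerprint, two crashes that differ deeper in the stack fall into the same bug group.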
smitebot coverage
Wraps coverage-report.sh as a subprocess, preserving all existing target-specific behavior. Adds timestamped snapshots after each run and a diff against the prior snapshot. Parsing uses machine-readable formats: coverage.txt (LND), llvm-cov export --format=text (CLN/LDK), and jacoco.csv (Eclair). The LND coverage.txt is already produced by the existing pipeline via covdata merge; smitebot snapshots this file after each run and computes per-file deltas across snapshots directly. No HTML scraping.
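The per-file delta computation can be sketched independently of the input format. The snippet below assumes each snapshot has already been parsed into a map from file name to coverage percentage; that representation and the function name are assumptions, and the parsing of coverage.txt itself is omitted.

```rust
use std::collections::BTreeMap;

/// Per-file coverage change between two snapshots, in percentage points.
/// Files new in `curr` are diffed against 0; files missing from `curr`
/// show up with a negative delta equal to their previous coverage.
fn coverage_deltas(
    prev: &BTreeMap<String, f64>,
    curr: &BTreeMap<String, f64>,
) -> BTreeMap<String, f64> {
    let mut deltas = BTreeMap::new();
    for (file, pct) in curr {
        let before = prev.get(file).copied().unwrap_or(0.0);
        deltas.insert(file.clone(), *pct - before);
    }
    for (file, pct) in prev {
        // Only fills in files that disappeared from the current snapshot.
        deltas.entry(file.clone()).or_insert(-*pct);
    }
    deltas
}
```

A BTreeMap keeps the output sorted by file name, so successive diffs are stable and easy to compare.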
Testing Strategy
Three layers:
Unit tests — config parsing, state machine transitions, fingerprint hashing, corpus file collection. No external dependencies, runs in CI.
Integration tests — lifecycle commands tested with sleep infinity runners spawned via process_group(0), matching the exact spawn path used for real fuzzers. Synthetic state.json files injected to test reconcile and rollback paths independently. Grandchild absence in /proc asserted after smitebot stop.
Fixture tests — corpus and triage tests run against pre-populated queue/ and crashes/ trees. No live fuzzer required.
CLN graceful stop (docker exec lightning-cli stop) is tested manually against a live container and gated behind --integration so it does not block CI on hosts without Docker.
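The lifecycle integration test described above (a sleep runner in its own process group, then SIGTERM, timed wait, SIGKILL) might look like the sketch below. It is a std-only approximation: signals are sent by shelling out to the kill builtin of sh, whereas the real crate would presumably call killpg through the nix dependency, and it waits on a Child handle, which a separate stop invocation working from state.json would not have. The function names are hypothetical.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, ExitStatus, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Spawn a stand-in runner as leader of a new process group (pgid == pid).
/// A finite sleep is used so a failed test cannot leave a process forever.
fn spawn_runner() -> io::Result<Child> {
    Command::new("sleep")
        .arg("30")
        .process_group(0)
        .stdout(Stdio::null())
        .spawn()
}

/// Send `signal` (e.g. "TERM") to every process in group `pgid`.
fn signal_group(pgid: u32, signal: &str) -> io::Result<()> {
    // A negative pid addresses the whole group; "--" ends option parsing.
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -s {} -- -{}", signal, pgid))
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "kill failed"))
    }
}

/// SIGTERM the group, wait up to `grace`, then SIGKILL if still running.
fn stop_group(child: &mut Child, grace: Duration) -> io::Result<ExitStatus> {
    let pgid = child.id(); // valid because the child leads its own group
    signal_group(pgid, "TERM")?;
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        sleep(Duration::from_millis(50));
    }
    signal_group(pgid, "KILL")?;
    child.wait()
}
```

A test built on this would spawn the runner, call stop_group with a short grace period, and assert that the returned status reports termination by signal rather than a normal exit code.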
Implementation Plan
Milestones are structured as vertical slices — each delivers a working system for a narrow set of functionality. After Milestone 2 is complete, most subsequent milestones can be developed in parallel.
Milestone 1: Foundation — Crate scaffolding, TOML config schema, smitebot doctor, CI pipeline (fmt, clippy, test)
Milestone 2: Campaign Lifecycle — smitebot start (single + multi-core), stop (with CLN graceful stop), status with fuzzer_stats aggregation
Milestone 3: Reliability — Stale-state reconcile, rollback on partial start, integration tests for lifecycle commands
Milestone 4: Corpus Management — smitebot corpus merge and smitebot corpus minimize with before/after reporting
Milestone 5: afl-tmin minimization, symbolization (CLN + LND), fingerprinting, dedup store
Milestone 6: smitebot coverage with snapshot storage and LND diff parsing
If you are interested in a deeper overview of Smitebot, you can check out my proposal here.
I'm happy to discuss any part of the design before work starts. I would love to hear your thoughts on the plan.