smite-ir: add program minimizers and wire them into AFL++ custom trim #54
erickcestari wants to merge 6 commits into
Maybe it's a good idea to draft this until #53 gets merged? That way we can avoid any accidental merges or merge conflicts in the meantime.
Sure, I also need to rebase it. It will probably only be merged after the first milestone is finished.
morehouse left a comment
I think the iterator pattern doesn't fit very well with the duplicate-load and nop minimizers: both could be fully minimized in a single call, and we wouldn't ever expect the post_trim call to report failure for them. Both could also be easily optimized to linear algorithms.
Since the goal is to actually trim/reduce inputs, I think we should also consider deleting the nops we insert -- currently the nops continue to take up space in the input.
80c2f9c to 6d5eee6
6d5eee6 to df29319
I've refactored almost all of the code. I've ensured that all pipeline trimming occurs once at
2579ee6 to 872a31c
morehouse left a comment
It would be good to do an e2e test with this to make sure our custom trim actually runs and is useful.
return 0;
}
1
We may also need to update last_sequence so we can tell if trim is actually being used by AFL.
19a5aaa to 8217cd7
8217cd7 to d8be817
f127082 to dbeb4b7
I'm starting to second-guess this new e2e test. It's become quite contrived, almost to the point that it will pass by construction. The way we define the custom coverage metric almost guarantees AFL will consider trimming a success. We already have good unit tests that cover a significant portion of what the e2e test covers, so it seems like the e2e test brings in a lot of machinery for the marginal improvement to coverage. What I really want to know is whether the minimizers are useful against a real target. I propose that we drop the e2e test and run a short LDK or LND fuzzing campaign with
dbeb4b7 to 7dcb760
`has_side_effects` returns `true` for operations that have I/O side effects (`SendMessage` and `RecvAcceptChannel`) and therefore cannot be dropped by DCE or deduplicated by CSE. Used by both minimizers introduced in the next commit. Also derive `Hash` on `Operation` and `AcceptChannelField` so CSE can key its canonical map on `(operation, canonicalized_inputs)`.
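A minimal sketch of what such a predicate could look like. The `Operation` enum here is a simplified, hypothetical subset: only the variant names come from this PR, and the payload types are assumptions made for illustration. The helper `canonical_key_demo` is likewise hypothetical and only shows why deriving `Hash` matters for a map keyed on `(operation, inputs)`.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical subset of the IR's operations.
/// Payload types are assumptions; only the variant names appear in the PR.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Operation {
    LoadAmount(u64),
    LoadPrivateKey([u8; 32]),
    DerivePoint,
    SendMessage,
    RecvAcceptChannel,
}

impl Operation {
    /// True for operations that perform I/O and therefore must never be
    /// dropped by DCE or merged by CSE.
    pub fn has_side_effects(&self) -> bool {
        matches!(self, Operation::SendMessage | Operation::RecvAcceptChannel)
    }
}

/// Deriving `Hash` (together with `Eq`) lets `(operation, inputs)` act as a
/// `HashMap` key. Returns the number of distinct expressions recorded.
pub fn canonical_key_demo() -> usize {
    let mut canonical: HashMap<(Operation, Vec<usize>), usize> = HashMap::new();
    canonical.insert((Operation::LoadAmount(7), vec![]), 0);
    // Identical expression: the lookup hits, so no second entry is created.
    canonical
        .entry((Operation::LoadAmount(7), vec![]))
        .or_insert(1);
    canonical.len()
}
```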
Introduces the `Minimizer` trait (mirroring the `Mutator` trait shape)
and two implementations that shrink an IR program in place:

    fn minimize(&self, program: &mut Program) -> bool;

The bool reports whether the program was modified, so callers can
skip an `==` walk over every instruction.
- `DeadCodeEliminator` keeps an instruction if it has side effects or
is referenced by a later kept instruction. A reverse pass marks
liveness; a forward pass consumes the program and rewrites the
surviving instructions' inputs to their new indices.
- `CommonSubexpressionEliminator` merges instructions that compute
the same expression. A single forward pass canonicalizes inputs as
it goes and dedupes via a `HashMap` keyed on
`(operation, canonicalized_inputs)`. SSA guarantees inputs are
already canonicalized by the time we reach each instruction, so
the merge is transitive: two compute ops whose inputs collapsed to
the same canonical loads are themselves recognized as equivalent.
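The dead-code pass described in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `Op`, `Instruction`, and the free function `eliminate_dead_code` are hypothetical stand-ins for the real `Operation`, `Program`, and `DeadCodeEliminator`, and inputs are assumed to be indices of earlier instructions.

```rust
/// Hypothetical, simplified IR: `inputs` hold indices of earlier instructions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    LoadAmount(u64),
    DerivePoint,
    SendMessage,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Instruction {
    pub op: Op,
    pub inputs: Vec<usize>,
}

fn has_side_effects(op: &Op) -> bool {
    matches!(op, Op::SendMessage)
}

/// Removes instructions whose results are never observed.
/// Returns true if the program was modified.
pub fn eliminate_dead_code(program: &mut Vec<Instruction>) -> bool {
    // Reverse pass: an instruction is live if it has side effects or if a
    // later live instruction reads it.
    let mut live = vec![false; program.len()];
    for i in (0..program.len()).rev() {
        if has_side_effects(&program[i].op) {
            live[i] = true;
        }
        if live[i] {
            for &input in &program[i].inputs {
                live[input] = true;
            }
        }
    }
    if live.iter().all(|&l| l) {
        return false; // nothing to remove
    }

    // Forward pass: consume the program, keep live instructions, and rewrite
    // their inputs to the new indices.
    let mut new_index = vec![usize::MAX; program.len()];
    let mut kept = Vec::new();
    for (old, mut inst) in std::mem::take(program).into_iter().enumerate() {
        if !live[old] {
            continue;
        }
        for input in &mut inst.inputs {
            *input = new_index[*input];
        }
        new_index[old] = kept.len();
        kept.push(inst);
    }
    *program = kept;
    true
}
```

Because inputs always point backwards, the reverse pass has already visited every consumer of an instruction by the time it reaches that instruction, so a whole dead chain disappears in one call.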
Both transforms are safe in IR semantics (don't change observable
behaviour modulo `SendMessage`/`RecvAcceptChannel` side-effects), so
they don't take an oracle.
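For the CSE pass, a comparable sketch is shown below. As before, `Op`, `Instruction`, and `eliminate_common_subexpressions` are hypothetical simplifications rather than the PR's types, and this version assumes duplicates are deleted outright instead of being left behind for DCE.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified IR: `inputs` hold indices of earlier instructions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Op {
    LoadAmount(u64),
    DerivePoint,
    SendMessage,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Instruction {
    pub op: Op,
    pub inputs: Vec<usize>,
}

fn has_side_effects(op: &Op) -> bool {
    matches!(op, Op::SendMessage)
}

/// Merges instructions that compute the same expression.
/// Returns true if the program was modified.
pub fn eliminate_common_subexpressions(program: &mut Vec<Instruction>) -> bool {
    let original_len = program.len();
    // remap[old] = index in the output that now holds the value of `old`.
    let mut remap: Vec<usize> = Vec::with_capacity(original_len);
    let mut canonical: HashMap<(Op, Vec<usize>), usize> = HashMap::new();
    let mut kept: Vec<Instruction> = Vec::new();

    for mut inst in std::mem::take(program) {
        // Canonicalize inputs first. Inputs point backwards, so each one
        // already has a remap entry.
        for input in &mut inst.inputs {
            *input = remap[*input];
        }
        if !has_side_effects(&inst.op) {
            let key = (inst.op.clone(), inst.inputs.clone());
            if let Some(&existing) = canonical.get(&key) {
                // Same expression seen before: redirect users, drop this one.
                remap.push(existing);
                continue;
            }
            canonical.insert(key, kept.len());
        }
        remap.push(kept.len());
        kept.push(inst);
    }
    *program = kept;
    program.len() != original_len
}
```

Inputs are only rewritten to a different value after some earlier instruction was merged away, so a change in length is enough to report whether anything was modified.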
Wires the `DeadCodeEliminator` and `CommonSubexpressionEliminator` minimizers into AFL++'s custom-mutator trim ABI as a single composed pass.

Both are deterministic in-process transforms safe in IR semantics, so we run them once during `afl_custom_init_trim`, serialize the result into `out_buf`, and offer it to AFL as a single candidate.

`afl_custom_init_trim` returns `1` if either minimizer reports a change (or `0` if both no-op'd; AFL skips trim entirely). `afl_custom_trim` hands back the pre-serialized buffer. `afl_custom_post_trim` returns `1` unconditionally to terminate AFL's `while (stage_cur < stage_max)` loop after the single iteration.

AFL itself decides whether to persist the trimmed bytes based on its coverage-cksum check; we don't need to track partial state across iterations because there's only one.
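The control flow of that single-candidate protocol can be modelled in safe Rust as below. This is a sketch only: the real hooks are `extern "C"` functions working on raw buffers, while `TrimState`, `run_trim`, and `strip_zeros` are hypothetical names. `strip_zeros` stands in for "decode, run DCE+CSE, re-encode", and `run_trim` is a simplified model of AFL's side of the loop.

```rust
/// State shared by the three trim hooks (hypothetical model).
#[derive(Default)]
pub struct TrimState {
    candidate: Vec<u8>,
}

impl TrimState {
    /// Models `afl_custom_init_trim`: minimize once, cache the result, and
    /// return the number of trim steps (1 if something changed, else 0).
    pub fn init_trim(
        &mut self,
        input: &[u8],
        minimize: impl Fn(&[u8]) -> Option<Vec<u8>>,
    ) -> i32 {
        match minimize(input) {
            Some(smaller) => {
                self.candidate = smaller;
                1
            }
            None => {
                self.candidate.clear();
                0
            }
        }
    }

    /// Models `afl_custom_trim`: hand back the pre-serialized candidate.
    pub fn trim(&self) -> &[u8] {
        &self.candidate
    }

    /// Models `afl_custom_post_trim`: always report step 1, which equals
    /// `stage_max` and so ends the loop after one iteration.
    pub fn post_trim(&mut self, _success: bool) -> i32 {
        1
    }
}

/// Stand-in minimizer for the demo: drops zero bytes, `None` if unchanged.
pub fn strip_zeros(input: &[u8]) -> Option<Vec<u8>> {
    let out: Vec<u8> = input.iter().copied().filter(|&b| b != 0).collect();
    if out.len() == input.len() {
        None
    } else {
        Some(out)
    }
}

/// Simplified model of AFL's trim loop. Returns the bytes AFL would keep and
/// how many candidate executions it performed.
pub fn run_trim(
    state: &mut TrimState,
    input: &[u8],
    minimize: impl Fn(&[u8]) -> Option<Vec<u8>>,
    same_coverage: impl Fn(&[u8]) -> bool,
) -> (Vec<u8>, u32) {
    let stage_max = state.init_trim(input, minimize);
    let mut stage_cur = 0;
    let mut current = input.to_vec();
    let mut executions = 0;
    while stage_cur < stage_max {
        let candidate = state.trim().to_vec();
        executions += 1;
        // AFL keeps the candidate only if its coverage checksum matches.
        let success = same_coverage(&candidate);
        if success {
            current = candidate;
        }
        stage_cur = state.post_trim(success);
    }
    (current, executions)
}
```

In this model an already-minimal input costs zero extra executions, and a reducible one costs exactly one, whether or not AFL accepts the candidate.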
Minimal AFL++ harness binary that decodes a postcard-encoded `Program`, validates it, and publishes coverage manually to `__afl_area_ptr`. The e2e test for the custom mutator's trim pipeline drives `afl-fuzz` against this binary.

The bitmap must be bit-identical across DCE/CSE-trimmed variants of the same program (so AFL's trim cksum accepts shrunk candidates) yet vary under our mutators (so AFL queues new entries). Any compiler-inserted edge whose hit count tracks `program.instructions.len()` fails the first half: DCE/CSE move the count across AFL's hit-count buckets and the cksum mismatches. `postcard::from_bytes` and `Program::validate` both contain such loops, and rustc doesn't expose a SanitizerCoverage allowlist to exclude them.

So the harness is built with `RUSTFLAGS=-Cllvm-args=-sanitizer-coverage-level=0` and publishes coverage manually: for each instruction reachable (via `inputs`) from a side-effect root (`SendMessage`, `RecvAcceptChannel`), mark a slot derived from a content hash of `(operation, hashes of inputs)`. Because the hash folds input content (not indices), DCE renumbering doesn't change it; CSE merges duplicates whose hashes were already equal; `OperationParamMutator` shifts an operation's hash; `InputSwapMutator` rewires an edge and shifts the consumer's hash.

This also encodes a broader smite design principle: coverage is driven only by side-effecting work. Pure setup instructions that never feed a Send/Recv produce zero coverage and AFL never queues them. The fuzzing signal lines up with the minimizer's notion of "useful work", the same reachability DCE uses, so trimming can't change coverage.

The crate is workspace-excluded so AFL's link-arg insertions don't leak into the rest of the workspace. We deliberately don't use the `afl` crate: its `fuzz!` macro forces persistent + shmem delivery, which hangs during AFL's calibration when SanitizerCoverage is off. The harness calls `__afl_manual_init` and reads stdin instead.
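One way to picture the content-hash coverage scheme is the sketch below. It is an illustration under assumptions, not the harness itself: `Op`, `Instruction`, `coverage`, and `slot` are hypothetical names, `DefaultHasher` is an arbitrary choice of hash, and the result is returned as a set of hashes instead of being written into `__afl_area_ptr`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified IR: `inputs` hold indices of earlier instructions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Op {
    LoadAmount(u64),
    DerivePoint,
    SendMessage,
}

#[derive(Debug, Clone)]
pub struct Instruction {
    pub op: Op,
    pub inputs: Vec<usize>,
}

/// Content hash per instruction: folds the operation and the *hashes* of its
/// inputs, never their indices, so renumbering does not change the result.
fn content_hashes(program: &[Instruction]) -> Vec<u64> {
    let mut hashes: Vec<u64> = Vec::with_capacity(program.len());
    for inst in program {
        let mut h = DefaultHasher::new();
        inst.op.hash(&mut h);
        for &input in &inst.inputs {
            hashes[input].hash(&mut h);
        }
        hashes.push(h.finish());
    }
    hashes
}

/// Content hashes of every instruction reachable from a side-effect root.
/// A set is used, so duplicates with equal hashes collapse to one entry.
pub fn coverage(program: &[Instruction]) -> BTreeSet<u64> {
    let hashes = content_hashes(program);
    let mut reachable = vec![false; program.len()];
    for i in (0..program.len()).rev() {
        if matches!(program[i].op, Op::SendMessage) {
            reachable[i] = true;
        }
        if reachable[i] {
            for &input in &program[i].inputs {
                reachable[input] = true;
            }
        }
    }
    (0..program.len())
        .filter(|&i| reachable[i])
        .map(|i| hashes[i])
        .collect()
}

/// Maps a content hash to a bitmap slot (the map size is an assumption).
pub fn slot(hash: u64, map_size: usize) -> usize {
    (hash % map_size as u64) as usize
}
```

Under this model a dead instruction contributes nothing, a duplicate contributes a hash that is already present, and changing an operation's parameter changes the hash of that instruction and of everything that consumes it.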
Drives the real `afl-fuzz` binary against the `smite-ir-e2e-test` harness with our cdylib loaded as `AFL_CUSTOM_MUTATOR_LIBRARY`, then asserts every hook we export is actually used in a real fuzzing run. All signals come from AFL's own `AFL_DEBUG=1` output, so the cdylib stays instrumentation-free.

Five signals checked:

1. `Found 'afl_custom_<name>'` lines at startup for all six hooks we export (init/fuzz/deinit bundled as `afl_custom_mutator`, plus describe, init_trim, trim, post_trim, splice_optout).
2. Queue filenames carry `smite-ir:<last_sequence>` from `afl_custom_describe`. Both branches of `mutate_stacked` must surface: `fresh` (regenerate) and one of `op-param` / `input-swap` (stacked mutation).
3. `[Custom Trimming] START` lines confirm `afl_custom_init_trim` is invoked.
4. `START: Max 1` confirms the DCE+CSE pipeline shrank at least one input. The seed corpus mixes a DCE-reducible program (dead `LoadAmount` appended) and a CSE-reducible one (duplicate `LoadPrivateKey` injected mid-program) so both minimizer paths can fire.
5. `[Custom Trimming] SUCCESS` confirms AFL persisted at least one trimmed candidate, i.e. the trimmed bytes' coverage cksum matched the original. Verifies DCE+CSE preserve coverage end-to-end -- relies on the harness's DCE/CSE-invariant signal.

The harness is built with `RUSTFLAGS=-Cllvm-args=-sanitizer-coverage-level=0` from this test (cargo-afl appends user RUSTFLAGS to its own, and LLVM honors the last `-Cllvm-args=` seen). `AFL_MAP_SIZE` + `AFL_SKIP_BIN_CHECK` are set because sancov is off so `__afl_final_loc` is 0 and AFL wouldn't otherwise know the binary is fuzzable.

Marked `#[ignore]` so `cargo test` skips it by default; spawns afl-fuzz for ~30s. Skips cleanly if `cargo-afl` isn't on `PATH`. Working files land in `/tmp/smite-e2e/` so they survive a panic for post-mortem.
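The five checks could be expressed as a small log-scraping helper along these lines. This is a sketch under assumptions: `missing_signals` is a hypothetical function, the needles are taken verbatim from the list above, and the exact layout of AFL's debug output and queue filenames beyond those strings is not assumed.

```rust
/// Returns a label for every expected signal that is absent from the AFL
/// debug log or the queue file listing. An empty result means all five
/// checks passed.
pub fn missing_signals(afl_log: &str, queue_files: &[&str]) -> Vec<String> {
    let mut missing = Vec::new();

    // 1. Every exported hook must be reported as found at startup.
    const HOOKS: [&str; 6] = [
        "mutator",
        "describe",
        "init_trim",
        "trim",
        "post_trim",
        "splice_optout",
    ];
    for hook in HOOKS {
        let needle = format!("Found 'afl_custom_{}'", hook);
        if !afl_log.contains(&needle) {
            missing.push(format!("hook:{}", hook));
        }
    }

    // 2. Queue filenames must carry a `smite-ir:` description covering both
    //    the regenerate branch and at least one stacked mutation.
    let described = |tag: &str| {
        queue_files.iter().any(|name| {
            name.split_once("smite-ir:")
                .map_or(false, |(_, rest)| rest.contains(tag))
        })
    };
    if !described("fresh") {
        missing.push("describe:fresh".to_string());
    }
    if !(described("op-param") || described("input-swap")) {
        missing.push("describe:stacked".to_string());
    }

    // 3-5. Trim was invoked, offered one candidate, and AFL accepted one.
    for needle in [
        "[Custom Trimming] START",
        "START: Max 1",
        "[Custom Trimming] SUCCESS",
    ] {
        if !afl_log.contains(needle) {
            missing.push(format!("log:{}", needle));
        }
    }
    missing
}
```

Returning the list of missing signals rather than a bare boolean keeps a failing run readable, since the assertion message can name exactly which hook or log line never showed up.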
Runs the smite-ir-mutator e2e test on PRs and pushes to master that touch the AFL-relevant crates (smite-ir, smite-ir-mutator, smite-ir-e2e-test, workspace manifests, or the workflow itself). Installs `cargo-afl` (cached across runs), then runs the `#[ignore]` test with `--ignored`.

Kept as a separate workflow rather than a step in `rust.yml` because the AFL toolchain install + harness build adds several minutes; the fast Rust gate stays fast.

On failure, tars `/tmp/smite-e2e/` (seeds, queue, AFL stdout/stderr) and uploads it as an artifact -- AFL queue filenames contain colons, which actions/upload-artifact rejects, so the tarball is required.
7dcb760 to 8611863
When AFL finds an interesting input it gets handed to us for trimming before going back into the corpus. Keeps things tidy and makes crashes much easier to read.
There's a small `Minimizer` trait, same shape as `Mutator`, one method on a unit struct, and two implementations:

- `DeadCodeEliminator` walks instructions in reverse with a refcount and drops anything that's both unreferenced and removable (i.e. not a network I/O op). The reverse direction lets chains collapse in one pass: once we drop a consumer its producer's count falls to zero and it goes too.
- `CommonSubexpressionEliminator` is keyed on `(operation, canonicalized_inputs)`. Canonicalizing each instruction's inputs through the running remap before the lookup makes the merge transitive: two duplicate `LoadAmount`s collapse, then the `DerivePoint`s consuming them collapse, and so on.

Both transforms are safe in IR semantics; the only ops we can't touch are `SendMessage` and `RecvAcceptChannel`, because they actually do network I/O. One `Operation::has_side_effects` predicate gates both passes (same set, same reason).

`smite-ir-mutator` wires this into AFL's custom-trim ABI. Because both minimizers are deterministic and run to completion in-process, the shim composes them into a single candidate during `init_trim`, serializes it once, and hands it to AFL as a single round-trip: no phase machine, no per-phase fallback, no iterative feedback loop. If the trim is a no-op (program already minimal), `init_trim` returns 0 and AFL skips trim entirely.

End-to-end coverage
A new `smite-ir-e2e-test` crate is a minimal AFL++ harness that decodes a postcard-encoded `Program`, validates it, and publishes coverage manually to `__afl_area_ptr`. A new `#[ignore]`d e2e test under `smite-ir-mutator` spawns `afl-fuzz` against this binary with our cdylib loaded as `AFL_CUSTOM_MUTATOR_LIBRARY` and asserts every hook we export is actually used in a real fuzzing run.

The bitmap has to be bit-identical across DCE/CSE-trimmed variants of the same program (so AFL's trim cksum accepts the shrunk candidate) yet vary under our mutators (so AFL queues new entries). Any compiler-inserted edge whose hit count tracks `program.instructions.len()` fails the first half: DCE/CSE move the count across AFL's hit-count buckets and the cksum mismatches. `postcard::from_bytes` and `Program::validate` both contain such loops.

So the harness is built with `RUSTFLAGS=-Cllvm-args=-sanitizer-coverage-level=0` and publishes coverage manually: for each instruction reachable (via `inputs`) from a side-effect root (`SendMessage`, `RecvAcceptChannel`), mark a slot derived from a content hash of `(operation, hashes of inputs)`. Because the hash folds input content (not indices), DCE renumbering doesn't change it; CSE merges duplicates whose hashes were already equal; `OperationParamMutator` shifts an operation's hash; `InputSwapMutator` rewires an edge and shifts the consumer's hash.

This also encodes a broader smite design principle: coverage is driven only by side-effecting work. Pure setup instructions that never feed a Send/Recv produce zero coverage and AFL never queues them. The fuzzing signal lines up with the minimizer's notion of "useful work" so trimming can't change coverage.
The e2e test asserts five signals, all scraped from AFL's own `AFL_DEBUG=1` output so the cdylib stays instrumentation-free:

1. `Found 'afl_custom_<name>'` lines at startup for all six exported hooks.
2. `smite-ir:<last_sequence>` from `afl_custom_describe`, with both branches of `mutate_stacked` surfacing (`fresh` and one of `op-param` / `input-swap`).
3. `[Custom Trimming] START` confirms `afl_custom_init_trim` is invoked.
4. `START: Max 1` confirms the DCE+CSE pipeline shrank at least one input. The seed corpus mixes a DCE-reducible program (dead `LoadAmount` appended) and a CSE-reducible one (duplicate `LoadPrivateKey` injected) so both minimizer paths can fire.
5. `[Custom Trimming] SUCCESS` confirms AFL persisted at least one trimmed candidate: the trimmed bytes' coverage cksum matched the original. Verifies DCE+CSE preserve coverage end-to-end, not just that we offered a smaller candidate.

An `afl-e2e.yml` workflow runs this test on PRs that touch the AFL-relevant crates. `cargo-afl` is pinned via `CARGO_AFL_VERSION` and cached on that key, so bumping the env var invalidates the cache and selects the new version.