[CHAOSPLT-984][FUZZ] Setup fuzzer to use internal infra #436
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #436 +/- ##
=======================================
Coverage 85.23% 85.23%
=======================================
Files 186 186
Lines 9387 9387
Branches 4170 4170
=======================================
Hits 8001 8001
Misses 552 552
Partials 834 834
Flags with carried forward coverage won't be shown.
The manual trigger is useful for manual testing
Benchmarks clang
Benchmark execution time: 2025-09-30 09:52:21
Comparing candidate commit ebfb278 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.
Benchmarks gcc
Benchmark execution time: 2025-09-30 09:51:17
Comparing candidate commit ebfb278 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.
Benchmarks clang-pgo
Benchmark execution time: 2025-09-30 10:06:40
Comparing candidate commit ebfb278 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.
Anilm3
left a comment
First batch, there's a lot to go through so expect a few...
Some files should be removed from the PR:
- default.profraw
- fuzzer/e2e/corpus/f6523aa50a5372e0f6916c334d31d4ce6e73d520
- fuzzer/global/corpus/3f786850e387550fdab836ed7e6dc881de23001b
…f into edouard/new-fuzzer-setup
Anilm3
left a comment
I'm struggling a bit at the moment as AFL refuses to build with LTO, I'll continue once I manage to get past this hurdle.
@codex review
Codex Review: Here are some suggestions.
// Main macro that implements the correct AFL++ persistent mode pattern
#define AFL_FUZZ_TARGET(name, fuzz_func) \
    int main(int argc, char **argv) \
    { \
        /* Handle command line arguments for standalone mode */ \
        if (argc > 1) { \
            return ddwaf_afl::run_standalone(name, fuzz_func, argc, argv); \
        } \
        \
        /* AFL++ persistent mode loop - must be in main function */ \
        /* This runs up to AFL_LOOP_ITERATIONS iterations per process for better performance */ \
        while (__AFL_LOOP(AFL_LOOP_ITERATIONS)) { \
            if (!ddwaf_afl::run_afl_iteration(fuzz_func)) { \
[P0] Include AFL runtime so __AFL_LOOP resolves
The new afl_wrapper.hpp macros call __AFL_LOOP in the generated main function, but the header doesn’t include the AFL++ definitions (afl/afl-fuzz.h) nor provide any fallback. When the fuzzers are compiled this identifier is undefined, so every target fails at compile time before any fuzzing can run. The wrapper should include the AFL++ header or provide its own declaration for __AFL_LOOP (and other AFL symbols) to keep the build usable.
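For context on the comment above: the AFL++ compiler wrappers (afl-clang-fast, afl-clang-lto) predefine __AFL_LOOP as a macro, so the identifier only goes missing when the harness is built with a plain compiler. A minimal sketch of a guard for that case follows. It is an assumption about how a wrapper could be made portable, not the code in this PR; afl_fallback, demo_target and drive are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// afl-clang-fast and afl-clang-lto predefine __AFL_LOOP as a macro.
// Under any other compiler, fall back to a loop that runs exactly once so
// the same source still builds (for example for local debugging).
#ifndef __AFL_LOOP
namespace afl_fallback {
inline bool run_once()
{
    static bool done = false;
    if (done) {
        return false;
    }
    done = true;
    return true;
}
} // namespace afl_fallback
#define __AFL_LOOP(max_iterations) (afl_fallback::run_once())
#endif

// Hypothetical stand-in for a LLVMFuzzerTestOneInput-style target.
inline int demo_target(const std::uint8_t *bytes, std::size_t size)
{
    (void)bytes;
    (void)size;
    return 0;
}

// Drives the target the way a persistent-mode main() would and returns
// the number of iterations that were executed.
inline std::size_t drive(const std::vector<std::uint8_t> &input)
{
    std::size_t iterations = 0;
    while (__AFL_LOOP(1000)) {
        demo_target(input.data(), input.size());
        ++iterations;
    }
    return iterations;
}
```

With the fallback active, the loop body runs a single time per process, which mirrors running the target once on one input.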
Anilm3
left a comment
Last batch of comments, I think after that it should be ready :-)
}

// Utility to split input data into multiple parts (useful for complex fuzzers)
class InputSplitter {
nit: Please use snake case.
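To illustrate the naming the nit asks for, here is a rough snake_case sketch of such a splitter. The real class's interface is not visible in the excerpt, so the constructor, next() and empty() below are assumptions made for the example only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical snake_case counterpart of the InputSplitter utility.
// It hands out consecutive delimiter-separated parts of the fuzz input.
class input_splitter {
public:
    input_splitter(const std::uint8_t *data, std::size_t size)
        : data_(reinterpret_cast<const char *>(data), size)
    {}

    // Returns the bytes up to the next delimiter and advances past it.
    // Once the input is exhausted, an empty view is returned.
    std::string_view next(char delimiter = '\0')
    {
        const std::size_t end = std::min(data_.find(delimiter), data_.size());
        const std::string_view part = data_.substr(0, end);
        data_.remove_prefix(std::min(end + 1, data_.size()));
        return part;
    }

    bool empty() const { return data_.empty(); }

private:
    std::string_view data_;
};
```

The views returned by next() point into the original buffer, so they are only valid while that buffer is alive.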
@@ -0,0 +1,61 @@
#!/bin/bash
Any concern with moving fuzzer/docker into docker/libddwaf/fuzzer?
with:
  name: afl-binaries
  path: /tmp/afl-package/
  retention-days: 1
Any chance we could use the fuzzer docker image instead?
git clone --recursive https://github.com/airbus-seclab/afl-cov-fast.git /opt/afl-cov-fast
cd /opt/afl-cov-fast
git checkout 7a96b578bb227e874bf75f8cb759e8ac2b180453
pip3 install -r requirements.txt
Same here on using the docker image?
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *bytes, size_t size)
const std::vector<std::string_view> dialects = {
- const std::vector<std::string_view> dialects = {
+ const std::array<std::string_view, 8> dialects = {
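A fixed-size std::array avoids the heap allocation a std::vector performs, and its size is known at compile time. The sketch below shows how such a table might be indexed by the first input byte. The dialect names and the select_dialect helper are illustrative assumptions; the eight entries of the actual list are not shown in the excerpt.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Illustrative names only, not the list used by the fuzzer in this PR.
constexpr std::array<std::string_view, 4> example_dialects = {
    "mysql", "postgresql", "sqlite", "generic"};

// Hypothetical result type: the chosen dialect plus the remaining bytes.
struct dialect_choice {
    std::string_view dialect;
    const std::uint8_t *payload;
    std::size_t size;
};

// Uses the first input byte to pick a dialect and treats the rest of the
// input as the statement to tokenize. An empty input selects the first entry.
inline dialect_choice select_dialect(const std::uint8_t *bytes, std::size_t size)
{
    if (size == 0) {
        return {example_dialects[0], bytes, 0};
    }
    const std::size_t index = bytes[0] % example_dialects.size();
    return {example_dialects[index], bytes + 1, size - 1};
}
```

Taking the index modulo the array size keeps every first byte valid, so the fuzzer never wastes an input on an out-of-range selector.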
What
This PR updates some of the fuzzers (and adds one new fuzzer called e2e, which aims to improve on the global fuzzer) and enables them to run in our internal CI.
Wrapping with AFL++
In order to benefit from much improved fuzzing capabilities, I've added a macro-based wrapper from libFuzzer to AFL++. With minimal changes to the codebase, this allows using both AFL++ and libFuzzer (or any other fuzzer using the "libFuzzer interface") if necessary.
This is mostly the only thing needed for wrapping with AFL++:
AFL_FUZZ_TARGET_WITH_INIT(name, LLVMFuzzerTestOneInput, LLVMFuzzerInitialize)
CI jobs
We now have 2 different CI systems:
GitHub
I've slightly modified the build process in GitHub Actions: it builds everything first, then starts all fuzzers.
Because of a small tweak in the sql_tokenizer fuzzer (adding the selection of the SQL flavor directly in the input), I have removed a few duplicated runs of the corresponding fuzzer.
GitLab
The GitLab CI is only there to build the binaries. We run the fuzzers on our own dedicated infrastructure. The CI setup allows us to build on the master branch every day, and then start a "long lasting" (1h for now, could be changed) fuzzing campaign every day.
Corpus management
I believe the current in-repo corpus management is not optimal for short fuzzing runs in CI. The high number of items in the corpus makes it not human friendly, but it is not "fuzzer friendly" either, as it contains a lot of duplicated code paths.
In a subsequent PR, I'll push a large minimization of the inputs, and remove the .gitignore files. This will reduce the number of inputs while dramatically increasing the number of code paths covered.
If you are interested in downloading the API's inputs, we have the fuzzydog input get libddwaf-e2e <id> command to pull inputs from our internal API (WIP for zip file).
Bug reporting
Our internal infra is integrated with Datadog's Error Tracking, logs, metrics and WF automation. If there's a bug to report, it'll run a workflow that enriches the report, proposes a fix and sends all that to your Slack channel.
Note on the global fuzzer
The global fuzzer uses a randomized configuration, which makes the corpus impossible to reuse.
The e2e fuzzer closes this gap by hardcoding a config. The e2e fuzzer reaches 50%+ of the codebase, which isn't fantastic, but considering some "unneeded coverage" (in tests, debug mode and such) it isn't too bad.
I've added a point in the next steps to fix the randomized config of the global fuzzer.
UBSan fixes
I've fixed a couple of UBSan triggers in the previous fuzz harness. They were triggered by the following snippet:
Replacing it with a memcpy fixes the issues.
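The offending snippet is not reproduced in this description, so the following is only a generic illustration of the kind of pattern a memcpy usually replaces: reading a wider integer out of the fuzz input through a pointer cast. The read_u32 helper is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// A typical UBSan trigger (shown as a comment only): casting an arbitrary
// byte pointer to a wider type and dereferencing it can be a misaligned load.
//     auto value = *reinterpret_cast<const std::uint32_t *>(bytes);

// memcpy-based read: well defined for any alignment. Compilers usually
// lower it to a single load on targets that allow unaligned access.
inline bool read_u32(const std::uint8_t *bytes, std::size_t size, std::uint32_t &out)
{
    if (size < sizeof(out)) {
        return false; // not enough input left
    }
    std::memcpy(&out, bytes, sizeof(out));
    return true;
}
```

The value is read in the host's byte order, which matches what the cast-based version would have produced.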
Next steps
I've spent a bunch of time trying to get this repository onboarded to our fuzzing infra, improving code coverage and fixing bugs in our C++ support, but haven't reached a "final state" yet. So there are a few things that I'd like to go back to in the near future (JIRA card link to be added):
- global fuzzer to not depend on randomized ruleset, which kills the coverage guidance efficiency across runs.
- e2e and global fuzzer, but keep the local fuzzer short lived, as their state space is relatively small. Currently all will run for 1h every day.
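One possible direction for the first item, sketched under assumptions: generate the ruleset from a PRNG with a fixed seed instead of a fresh random seed, so every run sees the same configuration and corpus entries stay meaningful across runs. The example_ruleset type and generate_ruleset function are hypothetical and do not reflect the global fuzzer's actual code.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical stand-in for a generated ruleset.
struct example_ruleset {
    std::array<std::uint32_t, 4> rule_ids;
};

// Builds the ruleset from a caller-provided seed. std::mt19937_64 produces
// the same sequence for the same seed, so a fixed seed yields a fixed
// ruleset from one fuzzing run to the next.
inline example_ruleset generate_ruleset(std::uint64_t seed)
{
    std::mt19937_64 rng(seed);
    example_ruleset ruleset{};
    for (auto &id : ruleset.rule_ids) {
        id = static_cast<std::uint32_t>(rng() % 1000);
    }
    return ruleset;
}
```

Keeping the seed as a parameter still allows occasional runs with a different configuration, as long as the seed is recorded alongside the corpus it produced.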