[CHAOSPLT-984][FUZZ] Setup fuzzer to use internal infra#436

Closed
edznux-dd wants to merge 12 commits into master from edouard/new-fuzzer-setup

@edznux-dd
Collaborator

@edznux-dd edznux-dd commented Jul 31, 2025

What

This PR updates some of the fuzzers, adds a new fuzzer called e2e (which aims to improve on the global fuzzer), and enables them to run in our internal CI.

Wrapping with AFL++

To benefit from much-improved fuzzing capabilities, I've added a macro-based wrapper from libFuzzer to AFL++. With minimal changes to the codebase, this allows us to use both AFL++ and libFuzzer if necessary (or any other fuzzer that uses the "libFuzzer interface").

This is essentially all that is needed to wrap a target with AFL++:

AFL_FUZZ_TARGET_WITH_INIT(name, LLVMFuzzerTestOneInput, LLVMFuzzerInitialize)
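
As a rough illustration of what "the libFuzzer interface" means here, the sketch below forwards one input to a pair of libFuzzer-style entry points. It is not the afl_wrapper.hpp from this PR: `run_one_input`, `count_bytes` and `bytes_seen` are hypothetical names, and the AFL++ persistent-mode loop is only mentioned in a comment because it requires building with an AFL++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Signatures of the two libFuzzer entry points a wrapper forwards to.
using fuzz_fn = int (*)(const std::uint8_t *data, std::size_t size);
using init_fn = int (*)(int *argc, char ***argv);

// Hypothetical helper (not part of this PR): run the optional initializer,
// then hand one input to the fuzz function. A real wrapper would initialize
// once at startup and, under AFL++ persistent mode, call `fuzz` from inside
// `while (__AFL_LOOP(n)) { ... }`.
inline int run_one_input(fuzz_fn fuzz, init_fn init, std::string_view input)
{
    assert(fuzz != nullptr);
    if (init != nullptr) {
        int argc = 0;
        char **argv = nullptr;
        if (init(&argc, &argv) != 0) {
            return -1; // initialization failed, skip the input
        }
    }
    return fuzz(reinterpret_cast<const std::uint8_t *>(input.data()), input.size());
}

// Toy target following the libFuzzer signature; it only counts bytes.
static std::size_t bytes_seen = 0;
inline int count_bytes(const std::uint8_t * /*data*/, std::size_t size)
{
    bytes_seen += size;
    return 0;
}
```

Calling `run_one_input(count_bytes, nullptr, "abc")` feeds three bytes to the toy target, which is roughly what a standalone (non-AFL++) run of a wrapped harness does for each input file.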

CI jobs

We now have two different CI systems:

  • GitHub workflow: just like before, it now runs the fuzzers for 60 seconds on every push.
  • GitLab CI: runs on our internal runner, simply builds the binaries, and then uploads them to our fuzzing infrastructure. This allows us to reach our internal API and trigger new runs.

GitHub

I've slightly modified the build process in GitHub Actions: it builds everything first, then starts all fuzzers.
Because of a small tweak in the sql_tokenizer fuzzer (the SQL flavor is now selected directly from the input), I have removed a few duplicated runs of the corresponding fuzzer.
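
One way to select the flavor from the input is to reserve the first byte as a dialect index and treat the remaining bytes as the SQL statement. The sketch below assumes that layout; the `dialects` list and the `split_input` helper are hypothetical and do not reflect the actual harness or libddwaf's dialect names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>

// Hypothetical dialect list; the names are placeholders for illustration.
constexpr std::array<std::string_view, 4> dialects = {"generic", "mysql", "pgsql", "sqlite"};

// Use the first input byte to pick a dialect and return the remaining bytes
// as the statement to tokenize. An empty input falls back to the first dialect.
inline std::pair<std::string_view, std::string_view> split_input(
    const std::uint8_t *data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size == 0) {
        return {dialects[0], std::string_view{}};
    }
    const std::string_view dialect = dialects[data[0] % dialects.size()];
    const std::string_view statement{reinterpret_cast<const char *>(data + 1), size - 1};
    return {dialect, statement};
}
```

With this layout a single corpus covers every dialect, because the fuzzer can mutate the selector byte like any other byte, which is why separate per-dialect runs become redundant.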

GitLab

The GitLab CI is only there to build the binaries. We run the fuzzers on our own dedicated infrastructure. The CI setup allows us to build the master branch every day and then start a "long-lasting" (1h for now, could be changed) fuzzing campaign every day.

Corpus management

I believe the current in-repo corpus management is not optimal for short fuzzing runs in CI. The high number of items in the corpus makes it unfriendly to humans, and it is not "fuzzer friendly" either, as many inputs exercise the same code paths.
In a subsequent PR, I'll push a large minimization of the inputs and remove the .gitignore files. This will reduce the number of inputs while dramatically increasing the number of code paths covered.

If you are interested in downloading inputs from the API, we have the fuzzydog input get libddwaf-e2e <id> command to pull inputs from our internal API (WIP for zip files).

Bug reporting

Our internal infra is integrated with Datadog's Error Tracking, logs, metrics and Workflow Automation. If there's a bug to report, it runs a workflow that enriches the report, proposes a fix and sends all of that to your Slack channel.

Note on the global fuzzer

The global fuzzer uses a randomized configuration, which makes the corpus impossible to reuse.
The e2e fuzzer closes this gap by hardcoding a config. The e2e fuzzer reaches 50%+ of the codebase, which isn't fantastic, but considering that some of the missing coverage is "unneeded" (tests, debug mode and such), it isn't too bad.

I've added an item to the next steps to fix the randomized config of the global fuzzer.

UBSan fixes

I've fixed a couple of UBSan triggers in the previous fuzz harnesses. They were triggered by the following snippet:

    const auto param_size = *reinterpret_cast<const std::size_t *>(data);

Replacing it with a memcpy fixes the issue:

    std::size_t param_size;
    std::memcpy(&param_size, data, sizeof(std::size_t));
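
For context, the fuzzer's input buffer carries no alignment guarantee for `std::size_t`, so dereferencing a cast pointer can be a misaligned load, which is undefined behavior and what UBSan reports. A self-contained sketch of the memcpy approach follows; the `read_size` helper and its bounds check are additions for illustration, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Read a std::size_t from the start of a byte buffer without assuming any
// alignment. Returns std::nullopt when the buffer is too short.
inline std::optional<std::size_t> read_size(const std::uint8_t *data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size < sizeof(std::size_t)) {
        return std::nullopt;
    }
    std::size_t value;
    // memcpy copies byte by byte, so it is well defined for any source address.
    std::memcpy(&value, data, sizeof(value));
    return value;
}
```

Compilers typically lower a fixed-size memcpy like this to a single load, so the fix should not cost anything measurable in the harness.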

Next steps

I've spent a bunch of time trying to get this repository onboarded to our fuzzing infra, improving code coverage, and fixing bugs in our C++ support, but haven't reached a "final state" yet. So there are a few things that I'd like to come back to in the near future (JIRA card link to be added):

  • Use a base image in CI instead of rebuilding AFL++ every time. It's slow and costly for no reason.
  • Change the bug report channel back to your team's own (this requires some improvements in our crash reporter).
  • Add a small script utility to pull the list of inputs from our infrastructure and push them (minimized) into the repository.
  • Fix the global fuzzer so it doesn't depend on a randomized ruleset, which kills coverage-guidance efficiency across runs.
  • Generate a "complete" (all settings) config that can be used by both the global and e2e fuzzers. Generating one that is easy to maintain, exhaustive, and "optimized" in size (to avoid the very large cost of initializing the WAF) is non-trivial.
  • Add a map of "fuzzer => duration" to allow daily Xh runs for the e2e and global fuzzers while keeping the local fuzzers short-lived, as their state space is relatively small. Currently all of them run for 1h every day.

@codecov-commenter

codecov-commenter commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.23%. Comparing base (b9b159e) to head (ebfb278).
⚠️ Report is 26 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #436   +/-   ##
=======================================
  Coverage   85.23%   85.23%           
=======================================
  Files         186      186           
  Lines        9387     9387           
  Branches     4170     4170           
=======================================
  Hits         8001     8001           
  Misses        552      552           
  Partials      834      834           
Flag Coverage Δ
waf_test 85.23% <ø> (ø)

@github-actions

github-actions Bot commented Jul 31, 2025

Artifact Size Comparison 📦

| Artifact | Previous Release | This PR | Difference |
|---|---|---|---|
| darwin-arm64::libddwaf.a | 89560336 | 90005160 | 0.00% |
| darwin-arm64::libddwaf.a.stripped | 4545616 | 4579568 | 0.00% |
| darwin-arm64::libddwaf.dylib | 1960784 | 1963024 | 0.00% |
| darwin-universal::libddwaf.a | 180532808 | 181418872 | 0.00% |
| darwin-universal::libddwaf.a.stripped | 9690824 | 9765392 | 0.00% |
| darwin-universal::libddwaf.dylib | 4139856 | 4142096 | 0.00% |
| darwin-x86_64::libddwaf.a | 90972424 | 91413664 | 0.00% |
| darwin-x86_64::libddwaf.a.stripped | 5145160 | 5185776 | 0.00% |
| darwin-x86_64::libddwaf.dylib | 2147776 | 2150896 | 0.00% |
| linux-aarch64::libddwaf.a | 72647358 | 73157156 | 0.00% |
| linux-aarch64::libddwaf.a.stripped | 11778162 | 11850392 | 0.00% |
| linux-aarch64::libddwaf.so | 2453368 | 2463248 | 0.00% |
| linux-armv7::libddwaf.a | 64279684 | 64716944 | 0.00% |
| linux-armv7::libddwaf.a.stripped | 10783356 | 10851172 | 0.00% |
| linux-armv7::libddwaf.so | 2138996 | 2148636 | 0.00% |
| linux-i386::libddwaf.a | 62427430 | 62841140 | 0.00% |
| linux-i386::libddwaf.a.stripped | 9321362 | 9378668 | 0.00% |
| linux-i386::libddwaf.so | 2382908 | 2392756 | 0.00% |
| linux-x86_64::libddwaf.a | 73129494 | 73643724 | 0.00% |
| linux-x86_64::libddwaf.a.stripped | 11599138 | 11669544 | 0.00% |
| linux-x86_64::libddwaf.so | 2649632 | 2660048 | 0.00% |
| windows-arm64::ddwaf.dll | 4769280 | 4788224 | 0.00% |
| windows-arm64::ddwaf.lib | 11698 | 11698 | 0.00% |
| windows-arm64::ddwaf_static.lib | 57431616 | 57913412 | 0.00% |
| windows-win32::ddwaf.dll | 3356160 | 3368448 | 0.00% |
| windows-win32::ddwaf.lib | 11922 | 11922 | 0.00% |
| windows-win32::ddwaf_static.lib | 49113786 | 49534220 | 0.00% |
| windows-x64::ddwaf.dll | 4088832 | 4101632 | 0.00% |
| windows-x64::ddwaf.lib | 11698 | 11698 | 0.00% |
| windows-x64::ddwaf_static.lib | 56948354 | 57409498 | 0.00% |
| windows-x86_64::libddwaf.a | 6464848 | 6539844 | 0.01% |
| windows-x86_64::libddwaf.dll | 18860246 | 18890803 | 0.00% |
| windows-x86_64::libddwaf.dll.a | 31948 | 31948 | 0.00% |

@edznux-dd edznux-dd force-pushed the edouard/new-fuzzer-setup branch 2 times, most recently from 8a5ba98 to 2b75fe2 Compare August 7, 2025 18:15
@edznux-dd edznux-dd force-pushed the edouard/new-fuzzer-setup branch from 71c4b9a to 5a78dc2 Compare August 8, 2025 08:31
The manual trigger is useful for manual testing
@edznux-dd edznux-dd marked this pull request as ready for review August 8, 2025 08:52
@edznux-dd edznux-dd requested a review from a team as a code owner August 8, 2025 08:52
@pr-commenter

pr-commenter Bot commented Aug 8, 2025

Benchmarks clang

Benchmark execution time: 2025-09-30 09:52:21

Comparing candidate commit ebfb278 in PR branch edouard/new-fuzzer-setup with baseline commit b9b159e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

@pr-commenter

pr-commenter Bot commented Aug 8, 2025

Benchmarks gcc

Benchmark execution time: 2025-09-30 09:51:17

Comparing candidate commit ebfb278 in PR branch edouard/new-fuzzer-setup with baseline commit b9b159e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

@pr-commenter

pr-commenter Bot commented Aug 8, 2025

Benchmarks clang-pgo

Benchmark execution time: 2025-09-30 10:06:40

Comparing candidate commit ebfb278 in PR branch edouard/new-fuzzer-setup with baseline commit b9b159e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

Collaborator

@Anilm3 Anilm3 left a comment

First batch, there's a lot to go through so expect a few...

Some files should be removed from the PR:

  • default.profraw
  • fuzzer/e2e/corpus/f6523aa50a5372e0f6916c334d31d4ce6e73d520
  • fuzzer/global/corpus/3f786850e387550fdab836ed7e6dc881de23001b

Comment thread .github/workflows/fuzz.yml Outdated
Comment thread .github/workflows/fuzz.yml Outdated
Comment thread .github/workflows/fuzz.yml Outdated
Comment thread .github/workflows/fuzz.yml Outdated
Comment thread .gitlab/fuzzing.yml Outdated
Comment thread fuzzer/cmake/embed_resources.cmake
Comment thread fuzzer/CMakeLists.txt
Comment thread fuzzer/cmdi_detector/src/main.cpp Outdated
Comment thread fuzzer/sql_tokenizer/src/main.cpp Outdated
Comment thread fuzzer/CMakeLists.txt Outdated
Collaborator

@Anilm3 Anilm3 left a comment

I'm struggling a bit at the moment as AFL refuses to build with LTO, I'll continue once I manage to get past this hurdle.

Comment thread fuzzer/sql_tokenizer/src/main.cpp Outdated
Comment thread fuzzer/sqli_detector/src/main.cpp Outdated
Comment thread fuzzer/shi_detector_array/src/main.cpp Outdated
Comment thread fuzzer/shi_detector_array/src/main.cpp Outdated
Comment thread fuzzer/global/scripts/build_corpus.py
@Anilm3
Collaborator

Anilm3 commented Sep 8, 2025

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Codex Review: Here are some suggestions.

Comment on lines +91 to +103
    // Main macro that implements the correct AFL++ persistent mode pattern
    #define AFL_FUZZ_TARGET(name, fuzz_func) \
        int main(int argc, char **argv) \
        { \
            /* Handle command line arguments for standalone mode */ \
            if (argc > 1) { \
                return ddwaf_afl::run_standalone(name, fuzz_func, argc, argv); \
            } \
            \
            /* AFL++ persistent mode loop - must be in main function */ \
            /* This runs up to AFL_LOOP_ITERATIONS iterations per process for better performance */ \
            while (__AFL_LOOP(AFL_LOOP_ITERATIONS)) { \
                if (!ddwaf_afl::run_afl_iteration(fuzz_func)) { \

[P0] Include AFL runtime so __AFL_LOOP resolves

The new afl_wrapper.hpp macros call __AFL_LOOP in the generated main function, but the header doesn’t include the AFL++ definitions (afl/afl-fuzz.h) nor provide any fallback. When the fuzzers are compiled this identifier is undefined, so every target fails at compile time before any fuzzing can run. The wrapper should include the AFL++ header or provide its own declaration for __AFL_LOOP (and other AFL symbols) to keep the build usable.

Collaborator

@Anilm3 Anilm3 left a comment

Last batch of comments, I think after that it should be ready :-)

Comment thread fuzzer/common/utils.hpp
}

// Utility to split input data into multiple parts (useful for complex fuzzers)
class InputSplitter {
Collaborator

nit: Please use snake case.

Comment thread fuzzer/docker/build.sh
@@ -0,0 +1,61 @@
#!/bin/bash
Collaborator

@Anilm3 Anilm3 Sep 30, 2025

Any concern with moving fuzzer/docker into docker/libddwaf/fuzzer?

    with:
      name: afl-binaries
      path: /tmp/afl-package/
      retention-days: 1
Collaborator

Any chance we could use the fuzzer docker image instead?

Comment thread .gitlab/fuzzing.yml
    git clone --recursive https://github.com/airbus-seclab/afl-cov-fast.git /opt/afl-cov-fast
    cd /opt/afl-cov-fast
    git checkout 7a96b578bb227e874bf75f8cb759e8ac2b180453
    pip3 install -r requirements.txt
Collaborator

Same here on using the docker image?

Comment thread fuzzer/CMakeLists.txt
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *bytes, size_t size)
const std::vector<std::string_view> dialects = {
Collaborator

Suggested change
    - const std::vector<std::string_view> dialects = {
    + const std::array<std::string_view, 8> dialects = {

@edznux-dd edznux-dd closed this Oct 3, 2025
@Anilm3 Anilm3 deleted the edouard/new-fuzzer-setup branch October 28, 2025 09:38
@Anilm3 Anilm3 restored the edouard/new-fuzzer-setup branch October 28, 2025 09:38

3 participants