[CHAOSPLT-984][FUZZ] Setup fuzzer to use internal infra by edznux-dd · Pull Request #436 · DataDog/libddwaf

edznux-dd · 2025-07-31T10:47:41Z

What

This PR updates some of the fuzzers, adds a new fuzzer called e2e (which aims to improve on the global fuzzer), and enables them to run in our internal CI.

Wrapping with AFL++

To benefit from AFL++'s much improved fuzzing capabilities, I've added a macro-based wrapper that adapts our libFuzzer harnesses to AFL++. With minimal changes to the codebase, this lets us use AFL++, libFuzzer, or any other fuzzer that supports the libFuzzer interface.

This is essentially the only change needed to wrap a harness with AFL++:

    AFL_FUZZ_TARGET_WITH_INIT(name, LLVMFuzzerTestOneInput, LLVMFuzzerInitialize)
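For illustration, here is a minimal sketch of how such a wrapper could be built. It is not the PR's afl_wrapper.hpp. The `afl_sketch` namespace, its helpers, the `AFL_SKETCH_TARGET_WITH_INIT` macro and the 10000 iteration count are all hypothetical, and the sketch assumes a POSIX target. It relies on `__AFL_HAVE_MANUAL_CONTROL` and `__AFL_LOOP`, which AFL++'s `afl-cc` compiler wrappers define. With any other compiler it falls back to a single pass over stdin, so the target still builds without AFL++ (the failure mode raised in the Codex review further down). Unlike the PR's macro, it takes no `name` argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>
#include <unistd.h> // POSIX read(); this sketch assumes a POSIX target

namespace afl_sketch { // hypothetical namespace, not the PR's afl_wrapper.hpp

// Same shape as LLVMFuzzerTestOneInput.
using fuzz_fn = int (*)(const std::uint8_t *, std::size_t);

// Loads one input file into `out`; returns false if it cannot be opened.
inline bool read_file(const char *path, std::vector<std::uint8_t> &out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// Replays each file named on the command line, like a libFuzzer binary given
// explicit inputs. Returns 0 on success, 1 if a file cannot be read.
inline int run_standalone(fuzz_fn fuzz, int argc, char **argv)
{
    std::vector<std::uint8_t> buf;
    for (int i = 1; i < argc; ++i) {
        if (!read_file(argv[i], buf)) {
            std::fprintf(stderr, "cannot read %s\n", argv[i]);
            return 1;
        }
        fuzz(buf.data(), buf.size());
    }
    return 0;
}

// Reads all of stdin, which is where AFL++ delivers test cases by default.
inline std::vector<std::uint8_t> read_stdin()
{
    std::vector<std::uint8_t> data;
    std::uint8_t chunk[4096];
    ssize_t n = 0;
    while ((n = read(0, chunk, sizeof(chunk))) > 0) {
        data.insert(data.end(), chunk, chunk + n);
    }
    return data;
}

} // namespace afl_sketch

#if defined(__AFL_HAVE_MANUAL_CONTROL)
// Built with afl-cc: persistent mode, many test cases per process.
#define AFL_SKETCH_TARGET_WITH_INIT(fuzz, init)                  \
    int main(int argc, char **argv)                              \
    {                                                            \
        init(&argc, &argv);                                      \
        if (argc > 1) {                                          \
            return afl_sketch::run_standalone(fuzz, argc, argv); \
        }                                                        \
        while (__AFL_LOOP(10000)) {                              \
            auto input = afl_sketch::read_stdin();               \
            fuzz(input.data(), input.size());                    \
        }                                                        \
        return 0;                                                \
    }
#else
// Any other compiler: one pass over stdin keeps the target buildable.
#define AFL_SKETCH_TARGET_WITH_INIT(fuzz, init)                  \
    int main(int argc, char **argv)                              \
    {                                                            \
        init(&argc, &argv);                                      \
        if (argc > 1) {                                          \
            return afl_sketch::run_standalone(fuzz, argc, argv); \
        }                                                        \
        auto input = afl_sketch::read_stdin();                   \
        return fuzz(input.data(), input.size());                 \
    }
#endif
```

A harness would end with `AFL_SKETCH_TARGET_WITH_INIT(LLVMFuzzerTestOneInput, LLVMFuzzerInitialize)` and be compiled with `afl-clang-fast++` for AFL++ runs. Passing file paths on the command line replays them once, which helps when reproducing a crash. Persistent mode is what makes the AFL++ loop fast, but it assumes the harness leaves no state behind between iterations.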

CI jobs

We now have two different CI systems:

GitHub workflows: as before, they run each fuzzer for 60 seconds on every push.
GitLab CI: runs on our internal runners, builds the binaries and uploads them to our fuzzing infrastructure. Running there lets us reach our internal API and trigger new runs.

Github

I've slightly modified the build process in GitHub Actions: it now builds all fuzzers first, then starts them.
Because the sql_tokenizer fuzzer now reads the SQL dialect directly from its input, I removed a few duplicated runs of that fuzzer.
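Here is a minimal sketch of carrying the dialect in the input. It assumes the first byte selects the dialect and the remaining bytes are the query. The dialect names and their order, `tokenizer_input` and `split_tokenizer_input` are placeholders, not the harness's actual code. The fixed-size `std::array` mirrors the reviewer's suggestion further down.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Placeholder dialect names and order; libddwaf's real list may differ.
constexpr std::array<std::string_view, 8> dialects = {
    "generic", "mysql", "pgsql", "sqlite", "oracle", "doctrine", "hsqldb", "mssql"};

struct tokenizer_input {
    std::string_view dialect; // selected by the first input byte
    std::string_view query;   // everything after the first byte
};

// Splits a fuzzer input into (dialect, query). Taking the first byte modulo the
// table size maps every byte value to a valid dialect, so no execution is wasted
// on an out-of-range selector. Empty inputs are rejected.
inline std::optional<tokenizer_input> split_tokenizer_input(const std::uint8_t *data, std::size_t size)
{
    if (size == 0) {
        return std::nullopt;
    }
    std::string_view dialect = dialects[data[0] % dialects.size()];
    std::string_view query{reinterpret_cast<const char *>(data + 1), size - 1};
    return tokenizer_input{dialect, query};
}
```

Because the dialect is part of each corpus entry, one fuzzer run covers every dialect, and an interesting input keeps its dialect when it is replayed later.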

Gitlab

The GitLab CI only builds the binaries; the fuzzers themselves run on our own dedicated infrastructure. The CI setup builds the master branch every day and then starts a long-running fuzzing campaign (1h for now; this can be changed).

Corpus management

I believe the current in-repo corpus is not well suited to short fuzzing runs in CI. Its large number of inputs makes it hard for humans to work with, and it is not "fuzzer friendly" either, because many inputs exercise the same code paths.
In a follow-up PR, I'll push a heavily minimized corpus and remove the .gitignore files. This will reduce the number of inputs while dramatically increasing the number of code paths covered.
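To illustrate what minimizing by code path means, here is a sketch of the selection rule afl-cmin applies: for every coverage edge, keep the smallest input that reaches it, and drop inputs that are never the smallest for any edge. The `corpus_entry` type and its precomputed `edges` set are assumptions. In practice, coverage comes from running each input through an instrumented binary, which is what afl-cmin does with afl-showmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical corpus entry: file name, size in bytes, coverage edges it hits.
struct corpus_entry {
    std::string name;
    std::size_t size;
    std::set<std::uint32_t> edges;
};

// For each edge, keep the smallest input covering it (ties: first seen wins).
// Returns the names of the retained inputs in sorted order.
std::vector<std::string> minimize(const std::vector<corpus_entry> &corpus)
{
    std::map<std::uint32_t, const corpus_entry *> best;
    for (const auto &entry : corpus) {
        for (auto edge : entry.edges) {
            auto it = best.find(edge);
            if (it == best.end() || entry.size < it->second->size) {
                best[edge] = &entry;
            }
        }
    }
    std::set<std::string> kept;
    for (const auto &kv : best) {
        kept.insert(kv.second->name);
    }
    return std::vector<std::string>(kept.begin(), kept.end());
}
```

Inputs that add no new edge disappear, even if their bytes differ. AFL++ ships this as `afl-cmin`, and libFuzzer offers a comparable `-merge=1` mode.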

If you want to download inputs from the API, the `fuzzydog input get libddwaf-e2e <id>` command pulls them from our internal API (zip download is WIP).

Bug reporting

Our internal infrastructure is integrated with Datadog Error Tracking, Logs, Metrics and Workflow Automation. When a bug is found, a workflow enriches the report, proposes a fix and sends everything to your Slack channel.

Note on the `global` fuzzer

The global fuzzer uses a randomized configuration, which makes its corpus impossible to reuse across runs.
The e2e fuzzer closes this gap by using a hardcoded config. It reaches 50%+ coverage of the codebase. That isn't fantastic, but it isn't too bad given that some code is not meant to be covered by fuzzing (tests, debug mode and such).

I've added an item to the next steps to fix the global fuzzer's randomized config.

UBSan fixes

I've fixed a couple of UBSan reports in the previous fuzz harnesses. They were triggered by the following snippet:

    const auto param_size = *reinterpret_cast<const std::size_t *>(data);

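Replacing the cast with a memcpy fixes the issue. `data` comes straight from the fuzzer and has no alignment guarantee, so dereferencing it as a `std::size_t *` is a misaligned (and type-punned) load. memcpy has no alignment requirement: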

    std::size_t param_size;
    std::memcpy(&param_size, data, sizeof(std::size_t));
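The same fix generalizes into a small helper. The helper also checks that the input is long enough before reading, since a short fuzzer input would otherwise be over-read. `consume` is a hypothetical name, not a function from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Reads a T from the front of the fuzzer buffer and advances past it.
// memcpy works at any alignment, unlike dereferencing a reinterpret_cast pointer.
template <typename T>
std::optional<T> consume(const std::uint8_t *&data, std::size_t &size)
{
    static_assert(std::is_trivially_copyable_v<T>, "memcpy requires a trivially copyable type");
    if (size < sizeof(T)) {
        return std::nullopt; // input too short: reject instead of over-reading
    }
    T value;
    std::memcpy(&value, data, sizeof(T));
    data += sizeof(T);
    size -= sizeof(T);
    return value;
}
```

A harness would then start with something like `auto param_size = consume<std::size_t>(data, size); if (!param_size) return 0;`.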

Next steps

I've spent a fair amount of time onboarding this repository to our fuzzing infrastructure, improving code coverage and fixing bugs in our C++ support, but haven't reached a "final state" yet. There are a few things I'd like to come back to in the near future (JIRA card link to be added):

Use a base image in CI instead of rebuilding AFL++ every time; rebuilding is slow and needlessly costly.
Change the bug report channel back to your team's own (this requires some improvements to our crash reporter).
Add a small utility script to pull the list of inputs from our infrastructure and push them (minimized) into the repository.
Fix the global fuzzer so it no longer depends on a randomized ruleset, which defeats coverage guidance across runs.
Generate a "complete" (all settings) config that both the global and e2e fuzzers can use. Generating one that is easy to maintain, exhaustive and size-optimized (to avoid the very high cost of WAF initialization) is non-trivial.
Add a "fuzzer => duration" map to allow daily Xh runs for the e2e and global fuzzers while keeping the local fuzzers short-lived, as their state space is relatively small. Currently, all fuzzers run for 1h every day.

codecov-commenter · 2025-07-31T11:01:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.23%. Comparing base (b9b159e) to head (ebfb278).
⚠️ Report is 26 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #436   +/-   ##
=======================================
  Coverage   85.23%   85.23%           
=======================================
  Files         186      186           
  Lines        9387     9387           
  Branches     4170     4170           
=======================================
  Hits         8001     8001           
  Misses        552      552           
  Partials      834      834

Flag	Coverage Δ
waf_test	`85.23% <ø> (ø)`

Flags with carried forward coverage won't be shown.


github-actions · 2025-07-31T12:40:21Z

Artifact Size Comparison 📦

Artifact	Previous Release	This PR	Difference
darwin-arm64::libddwaf.a	89560336	90005160	0.00%
darwin-arm64::libddwaf.a.stripped	4545616	4579568	0.00%
darwin-arm64::libddwaf.dylib	1960784	1963024	0.00%
darwin-universal::libddwaf.a	180532808	181418872	0.00%
darwin-universal::libddwaf.a.stripped	9690824	9765392	0.00%
darwin-universal::libddwaf.dylib	4139856	4142096	0.00%
darwin-x86_64::libddwaf.a	90972424	91413664	0.00%
darwin-x86_64::libddwaf.a.stripped	5145160	5185776	0.00%
darwin-x86_64::libddwaf.dylib	2147776	2150896	0.00%
linux-aarch64::libddwaf.a	72647358	73157156	0.00%
linux-aarch64::libddwaf.a.stripped	11778162	11850392	0.00%
linux-aarch64::libddwaf.so	2453368	2463248	0.00%
linux-armv7::libddwaf.a	64279684	64716944	0.00%
linux-armv7::libddwaf.a.stripped	10783356	10851172	0.00%
linux-armv7::libddwaf.so	2138996	2148636	0.00%
linux-i386::libddwaf.a	62427430	62841140	0.00%
linux-i386::libddwaf.a.stripped	9321362	9378668	0.00%
linux-i386::libddwaf.so	2382908	2392756	0.00%
linux-x86_64::libddwaf.a	73129494	73643724	0.00%
linux-x86_64::libddwaf.a.stripped	11599138	11669544	0.00%
linux-x86_64::libddwaf.so	2649632	2660048	0.00%
windows-arm64::ddwaf.dll	4769280	4788224	0.00%
windows-arm64::ddwaf.lib	11698	11698	0.00%
windows-arm64::ddwaf_static.lib	57431616	57913412	0.00%
windows-win32::ddwaf.dll	3356160	3368448	0.00%
windows-win32::ddwaf.lib	11922	11922	0.00%
windows-win32::ddwaf_static.lib	49113786	49534220	0.00%
windows-x64::ddwaf.dll	4088832	4101632	0.00%
windows-x64::ddwaf.lib	11698	11698	0.00%
windows-x64::ddwaf_static.lib	56948354	57409498	0.00%
windows-x86_64::libddwaf.a	6464848	6539844	0.01%
windows-x86_64::libddwaf.dll	18860246	18890803	0.00%
windows-x86_64::libddwaf.dll.a	31948	31948	0.00%


pr-commenter · 2025-08-08T09:15:24Z

Benchmarks clang

Benchmark execution time: 2025-09-30 09:52:21

Comparing candidate commit ebfb278 in PR branch edouard/new-fuzzer-setup with baseline commit b9b159e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

pr-commenter · 2025-08-08T09:16:05Z

Benchmarks gcc

Benchmark execution time: 2025-09-30 09:51:17

Comparing candidate commit ebfb278 in PR branch edouard/new-fuzzer-setup with baseline commit b9b159e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

pr-commenter · 2025-08-08T09:29:27Z

Benchmarks clang-pgo

Benchmark execution time: 2025-09-30 10:06:40

Comparing candidate commit ebfb278 in PR branch edouard/new-fuzzer-setup with baseline commit b9b159e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

Anilm3

First batch, there's a lot to go through so expect a few...

Some files should be removed from the PR:

default.profraw
fuzzer/e2e/corpus/f6523aa50a5372e0f6916c334d31d4ce6e73d520
fuzzer/global/corpus/3f786850e387550fdab836ed7e6dc881de23001b


Anilm3

I'm struggling a bit at the moment as AFL refuses to build with LTO; I'll continue once I get past this hurdle.

Anilm3 · 2025-09-08T07:54:50Z

@codex review

chatgpt-codex-connector

Codex Review: Here are some suggestions.


chatgpt-codex-connector · 2025-09-08T08:01:01Z

+// Main macro that implements the correct AFL++ persistent mode pattern
+#define AFL_FUZZ_TARGET(name, fuzz_func)                                                           \
+    int main(int argc, char **argv)                                                                \
+    {                                                                                              \
+        /* Handle command line arguments for standalone mode */                                    \
+        if (argc > 1) {                                                                            \
+            return ddwaf_afl::run_standalone(name, fuzz_func, argc, argv);                         \
+        }                                                                                          \
+                                                                                                   \
+        /* AFL++ persistent mode loop - must be in main function */                                \
+        /* This runs up to AFL_LOOP_ITERATIONS iterations per process for better performance */    \
+        while (__AFL_LOOP(AFL_LOOP_ITERATIONS)) {                                                  \
+            if (!ddwaf_afl::run_afl_iteration(fuzz_func)) {                                        \


[P0] Include AFL runtime so __AFL_LOOP resolves

The new afl_wrapper.hpp macros call __AFL_LOOP in the generated main function, but the header doesn’t include the AFL++ definitions (afl/afl-fuzz.h) nor provide any fallback. When the fuzzers are compiled this identifier is undefined, so every target fails at compile time before any fuzzing can run. The wrapper should include the AFL++ header or provide its own declaration for __AFL_LOOP (and other AFL symbols) to keep the build usable.


Anilm3

Last batch of comments, I think after that it should be ready :-)

Anilm3 · 2025-09-30T09:20:05Z

+}
+
+// Utility to split input data into multiple parts (useful for complex fuzzers)
+class InputSplitter {


nit: Please use snake case.
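For illustration only, a snake_case version of such a utility might look like the following sketch. The delimiter-based format, the default delimiter and the `next_part` method are assumptions. The diff excerpt does not show how the PR's `InputSplitter` actually splits its input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical splitter: cuts one fuzzer input on a delimiter so a single test
// case can carry several independent parts (e.g. a config and a request).
class input_splitter {
public:
    input_splitter(const std::uint8_t *data, std::size_t size, std::string_view delimiter = "\n---\n")
        : remaining_{reinterpret_cast<const char *>(data), size}, delimiter_{delimiter}
    {}

    // Returns the next part as a view into the original buffer, or
    // std::nullopt once every part has been returned.
    std::optional<std::string_view> next_part()
    {
        if (done_) {
            return std::nullopt;
        }
        auto pos = remaining_.find(delimiter_);
        if (delimiter_.empty() || pos == std::string_view::npos) {
            done_ = true; // last part: whatever follows the final delimiter
            return remaining_;
        }
        auto part = remaining_.substr(0, pos);
        remaining_.remove_prefix(pos + delimiter_.size());
        return part;
    }

private:
    std::string_view remaining_;
    std::string_view delimiter_;
    bool done_ = false;
};
```

Each call returns a view into the fuzzer's buffer, so no bytes are copied. The last part may be empty when the input ends with a delimiter.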

Anilm3 · 2025-09-30T09:21:09Z

@@ -0,0 +1,61 @@
+#!/bin/bash


Any concern with moving fuzzer/docker into docker/libddwaf/fuzzer?

Anilm3 · 2025-09-30T09:22:37Z

+        with:
+          name: afl-binaries
+          path: /tmp/afl-package/
+          retention-days: 1


Any chance we could use the fuzzer docker image instead?

Anilm3 · 2025-09-30T09:22:49Z

+    git clone --recursive https://github.com/airbus-seclab/afl-cov-fast.git /opt/afl-cov-fast
+    cd /opt/afl-cov-fast
+    git checkout 7a96b578bb227e874bf75f8cb759e8ac2b180453
+    pip3 install -r requirements.txt


Same here on using the docker image?

Anilm3 · 2025-09-30T09:27:55Z

 }

-extern "C" int LLVMFuzzerTestOneInput(const uint8_t *bytes, size_t size)
+const std::vector<std::string_view> dialects = {


Suggested change

-const std::vector<std::string_view> dialects = {
+const std::array<std::string_view, 8> dialects = {

edznux-dd force-pushed the edouard/new-fuzzer-setup branch 2 times, most recently from 8a5ba98 to 2b75fe2, August 7, 2025 18:15

Update fuzzers, add support for internal fuzzing infra

5a78dc2

edznux-dd force-pushed the edouard/new-fuzzer-setup branch from 71c4b9a to 5a78dc2, August 8, 2025 08:31

Run on schedule + merge and manually if the user want

db6dafe

The manual trigger is useful for manual testing

edznux-dd marked this pull request as ready for review August 8, 2025 08:52

edznux-dd requested a review from a team as a code owner August 8, 2025 08:52

Merge branch 'master' into edouard/new-fuzzer-setup

f9f6266

Anilm3 reviewed Aug 15, 2025


edznux-dd added 4 commits August 22, 2025 17:48

WIP PR feedbacks

5dd8308

Merge branch 'edouard/new-fuzzer-setup' of github.com:DataDog/libddwaf into edouard/new-fuzzer-setup

25d54d7

move cmake upward

ac08e78

WIP, fixing CI steps

3a3863b

Anilm3 reviewed Sep 3, 2025


Comment thread fuzzer/sql_tokenizer/src/main.cpp Outdated

Comment thread fuzzer/sqli_detector/src/main.cpp Outdated

Comment thread fuzzer/shi_detector_array/src/main.cpp Outdated

Comment thread fuzzer/shi_detector_array/src/main.cpp Outdated

Comment thread fuzzer/global/scripts/build_corpus.py

PR comments

18ab95b

chatgpt-codex-connector Bot reviewed Sep 8, 2025


Anilm3 added 3 commits September 8, 2025 09:15

Fix format

de4da91

Merge branch 'master' into edouard/new-fuzzer-setup

1ea40da

Merge branch 'master' into edouard/new-fuzzer-setup

ce07f6a

Anilm3 requested changes Sep 30, 2025


Merge branch 'master' into edouard/new-fuzzer-setup

ebfb278

edznux-dd closed this Oct 3, 2025

edznux-dd mentioned this pull request Oct 3, 2025

[CHAOSPLT-984] Add fuzzer, v2 #465

Merged

Anilm3 deleted the edouard/new-fuzzer-setup branch October 28, 2025 09:38

Anilm3 restored the edouard/new-fuzzer-setup branch October 28, 2025 09:38
