Note: This repo uses a whitelist-based .gitignore: everything is ignored by default, and only explicitly allowed files are tracked. If you add new source files or directories you want to commit, you must explicitly allowlist them in .gitignore with a ! rule.
AAU project on runtime software attacks — research, PoC, detection, and evaluation.
The detector is a C-FLAT-style control-flow integrity monitor that runs entirely in userspace. It has two halves:
Offline — tools/build_cfg.py. Disassembles the victim ELF's .text (pyelftools + capstone), recovers basic blocks, direct/conditional-branch edges, and the set of legal indirect-call targets (functions the program calls directly or takes the address of — coarse-grained forward-edge CFI, not "any function entry"), and writes a flat text <victim>.cfg (plus a <victim>.dot call graph for eyeballing).
Online — detector/tracer. Runs the victim under ptrace single-step and validates every taken control-flow transfer inside .text against that CFG, plus a shadow call stack for returns, while folding each transfer's destination into a cumulative non-cryptographic hash (FNV-1a 64-bit) — a path "attestation token" printed at exit.
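As a rough illustration of the offline half, the sketch below recovers basic-block starts and direct edges from an already-decoded instruction list. It is not the code in tools/build_cfg.py: the `Insn` tuple, the `kind` labels, and the `recover_blocks` name are hypothetical, and the real tool gets its instructions from pyelftools and capstone rather than from a hand-written list. Whether the real CFG stores fall-through edges is also an assumption here.

```python
from typing import NamedTuple, Optional

class Insn(NamedTuple):
    addr: int               # instruction address (A64 instructions are 4 bytes)
    kind: str               # hypothetical labels: "other", "b", "bcond", "ret"
    target: Optional[int]   # direct branch target, if any

def recover_blocks(insns):
    """Return (sorted block starts, set of (branch_site, destination) edges)."""
    addrs = {i.addr for i in insns}
    leaders = {insns[0].addr}            # the entry point starts a block
    edges = set()
    for i in insns:
        nxt = i.addr + 4
        if i.kind in ("b", "bcond"):
            if i.target in addrs:
                leaders.add(i.target)    # a branch target starts a block
                edges.add((i.addr, i.target))
            if nxt in addrs:
                leaders.add(nxt)         # the instruction after a branch starts a block
                if i.kind == "bcond":    # fall-through edge (assumed to be recorded)
                    edges.add((i.addr, nxt))
        elif i.kind == "ret" and nxt in addrs:
            leaders.add(nxt)             # code after a return starts a block
    return sorted(leaders), edges

# Tiny hand-made program: a conditional skip over one instruction.
prog = [
    Insn(0x1000, "other", None),
    Insn(0x1004, "bcond", 0x100C),
    Insn(0x1008, "other", None),
    Insn(0x100C, "ret", None),
]
blocks, edges = recover_blocks(prog)
print([hex(b) for b in blocks], sorted(edges))
```

Indirect-call targets are deliberately left out of this sketch; they need the address-taken analysis described above.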
Flow per traced process:
- Fork + PTRACE_TRACEME: child execs the victim, parent becomes the tracer; load <victim>.cfg.
- One-shot BRK at main: PTRACE_CONT past ld.so and __libc_start_main (single-stepping them is prohibitively slow on a Pi, and isn't what we attest). When the BRK fires, the original instruction is restored and the legitimate return-into-libc address is pre-pushed to the shadow stack.
- Single-step loop from main: peek and decode the AArch64 instruction at the current PC before stepping (after a taken branch, post-step PC is the target, not PC+4); then on a non-sequential post-step PC, validate the transfer:
  - bl / blr whose target leaves .text → a library call (PLT → libc): plant a one-shot BRK at the return site and PTRACE_CONT over libc (C-FLAT attests the app, not libc), then resume stepping.
  - bl / blr into .text → destination must be a legal call target (cfg_is_call_target); push the return site onto the shadow stack.
  - ret → destination must equal the shadow-stack top.
  - direct / conditional branch (b, b.<cc>, cbz/cbnz, tbz/tbnz) taken → (branch_site → destination) must be a known CFG edge.
  - br Xn into .text → destination must be a known basic-block start or call target (conservative: catches JOP gadget chains and wild jumps).
  - any violation → [!!! ATTACK DETECTED] <kind> at 0x… : <reason> to stderr, PTRACE_KILL the tracee, exit 2.
- Clean detach: when main returns and the shadow stack drains, detach and let glibc cleanup run unobserved.
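The transfer kinds in the single-step loop can be told apart from the 32-bit instruction word alone. The sketch below is a small Python model of that decode step, using the fixed A64 encodings of the plain (non-pointer-authenticated) forms. The `classify` and `direct_target` names and the returned labels are hypothetical; the tracer's actual decoder may be organised differently.

```python
# (mask, value, label) patterns for A64 control-flow instructions.
PATTERNS = [
    (0xFC000000, 0x14000000, "b"),
    (0xFC000000, 0x94000000, "bl"),
    (0xFF000010, 0x54000000, "b.cond"),
    (0x7E000000, 0x34000000, "cbz/cbnz"),
    (0x7E000000, 0x36000000, "tbz/tbnz"),
    (0xFFFFFC1F, 0xD61F0000, "br"),
    (0xFFFFFC1F, 0xD63F0000, "blr"),
    (0xFFFFFC1F, 0xD65F0000, "ret"),
]

def classify(insn: int) -> str:
    """Return the control-flow kind of a 32-bit A64 instruction word."""
    for mask, value, label in PATTERNS:
        if (insn & mask) == value:
            return label
    return "other"

def direct_target(pc: int, insn: int) -> int:
    """Target of a b/bl at address pc: sign-extended imm26, scaled by 4."""
    imm26 = insn & 0x03FFFFFF
    if imm26 & 0x02000000:        # sign bit of the 26-bit immediate
        imm26 -= 1 << 26
    return pc + (imm26 << 2)

print(classify(0xD65F03C0))       # encoding of "ret" (x30)
print(classify(0xD63F0100))       # encoding of "blr x8"
print(hex(direct_target(0x1000, 0x94000002)))
```

Decoding before the step matters because, once a branch has been taken, the PC no longer points at the instruction that caused the transfer.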
On every exit (clean or aborted) the tracer prints [attestation] cfg-hash = 0x… over the executed transfers, plus a one-line counter summary (steps / calls / libcalls / rets / branches / alerts).
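For reference, a cumulative FNV-1a 64-bit hash over transfer destinations can be sketched as follows. The offset basis and prime are the standard FNV-1a 64-bit constants; feeding each destination as 8 little-endian bytes is an assumption of this sketch, not something the tracer is confirmed to do, and the function names are hypothetical.

```python
FNV_OFFSET = 0xCBF29CE484222325   # standard FNV-1a 64-bit offset basis
FNV_PRIME = 0x100000001B3         # standard FNV 64-bit prime
MASK64 = (1 << 64) - 1

def fnv1a64(data: bytes, h: int = FNV_OFFSET) -> int:
    """Fold bytes into h: xor the byte first, then multiply (FNV-1a order)."""
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h

def path_hash(destinations) -> int:
    """Cumulative hash over a sequence of transfer destinations.

    Assumption: each destination is hashed as 8 little-endian bytes.
    """
    h = FNV_OFFSET
    for dest in destinations:
        h = fnv1a64(dest.to_bytes(8, "little"), h)
    return h

# Hypothetical destination addresses, for illustration only.
token = path_hash([0x400680, 0x4006A4, 0x400690])
print(f"[attestation] cfg-hash = {token:#018x}")
```

Because every step depends on the previous state, the same set of edges taken in a different order gives a different token.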
This is the L1 (control-flow) axis only. Because direct ret-overwrites and ROP both forge a ret, the shadow stack catches them; because the CFG model rejects an indirect call/branch to a target the program never legitimately uses, JOP-style chains that never touch a ret are caught too. Provenance of data and bounds of objects are not checked yet — see the roadmap.
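The two checks named here, the shadow stack for returns and the legal-target set for indirect calls, can be modelled in a few lines. This is a simplified Python model for illustration, not the tracer's C code; the `Monitor` class, its method names, and the addresses are hypothetical.

```python
class Violation(Exception):
    """Raised when a transfer breaks the L1 policy."""

class Monitor:
    def __init__(self, call_targets):
        self.call_targets = set(call_targets)  # legal call destinations
        self.shadow = []                       # shadow call stack of return sites

    def on_call(self, site, dest):
        if dest not in self.call_targets:
            raise Violation(
                f"blr at {site:#x} : destination {dest:#x} is not a legal call target")
        self.shadow.append(site + 4)           # return site follows the 4-byte call

    def on_ret(self, site, dest):
        expected = self.shadow[-1] if self.shadow else None
        if expected != dest:
            want = "empty shadow stack" if expected is None else f"{expected:#x}"
            raise Violation(f"ret at {site:#x} : expected {want}, got {dest:#x}")
        self.shadow.pop()

def caught(action):
    """Run action and report whether it raised a Violation."""
    try:
        action()
    except Violation:
        return True
    return False

m = Monitor(call_targets={0x2000})
m.on_call(0x1000, 0x2000)                          # benign call
benign_ok = not caught(lambda: m.on_ret(0x2010, 0x1004))

m.on_call(0x1000, 0x2000)
rop_caught = caught(lambda: m.on_ret(0x2010, 0x3000))     # forged return address
jop_caught = caught(lambda: m.on_call(0x1008, 0x3004))    # pivot to a gadget
print(benign_ok, rop_caught, jop_caught)
```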
What attacks are we able to detect:
- Buffer overflows / code injection: attacks/01-stack-bof/ (caught at the hijacked ret)
- Return-Oriented Programming: attacks/02-rop/ (3-gadget chain, caught at the first hijacked ret)
- Jump-Oriented Programming: attacks/03-jop/ (blr-pivot chain, caught at the pivot: not a legal call target)
- Function reuse: whole-function gadgets reached via legal CFG edges (not detected: the L1 model accepts these)
- Data-only attacks: attacks/04-data-only/ is a PoC of the gap. A surgical 36-byte stack overflow flips an adjacent is_admin flag, so the legitimate if (u.is_admin) admin_panel() branch fires under attacker data. Every transfer is in the CFG and consistent with the shadow stack, so L1 reports clean. Needs L2 (data provenance) to flag that is_admin's value came from untrusted bytes.
- Non-control-data overflows: the same attacks/04-data-only/ PoC viewed from the other side. The alternative defence is L3 (object bounds): prevent the write past name[32] in the first place.
The attestation hash printed by the detector does differ between the benign and 04-data-only runs (the sequence of legal edges taken is different); a C-FLAT-style verifier with a known-good baseline would catch the divergence. The current detector emits the hash but doesn't compare against a baseline, which is why 04-data-only is listed as an open gap rather than a caught attack.
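A verifier of the kind described could be as small as the sketch below: pull the hash out of the tracer's output and compare it with a stored known-good value. This is not part of the repo. The function names, the sample hash values, and the assumption that the digits after 0x are plain hexadecimal are all hypothetical.

```python
import re

# Matches the attestation line format shown in this README.
HASH_RE = re.compile(r"\[attestation\] cfg-hash = 0x([0-9a-fA-F]+)")

def extract_hash(tracer_output: str):
    """Return the attestation hash as an int, or None if no line matches."""
    match = HASH_RE.search(tracer_output)
    return int(match.group(1), 16) if match else None

def matches_baseline(tracer_output: str, baseline: int) -> bool:
    """True only if a hash was printed and it equals the known-good baseline."""
    observed = extract_hash(tracer_output)
    return observed is not None and observed == baseline

# Made-up values for illustration; real baselines would come from benign runs.
BASELINE = 0x1234ABCD5678EF00
benign_out = "[attestation] cfg-hash = 0x1234abcd5678ef00\n"
attack_out = "[attestation] cfg-hash = 0x0badc0de0badc0de\n"
print(matches_baseline(benign_out, BASELINE), matches_baseline(attack_out, BASELINE))
```

A single baseline only works when the benign path is deterministic; inputs that legitimately take different branches would need one baseline per accepted path.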
Next axes (not yet implemented): L2 data-provenance tracking and L3 object-bounds checking.
This repo uses a Nix flake to provide a reproducible dev shell.
Prerequisites: Nix with flakes enabled. If you don't have Nix, install it via nixos.org:
sh <(curl --proto '=https' --tlsv1.2 -L https://nixos.org/nix/install) --daemon

Go into /etc/nix/nix.conf and add the following lines:
experimental-features = nix-command flakes
trusted-users = root <your-username>
After editing the file, restart the Nix daemon:
sudo systemctl restart nix-daemon
Then enter the dev shell:

nix develop --command $SHELL

Press y when asked whether to trust the pwngdb cachix cache. Otherwise Nix builds pwngdb from source, which is very slow; with the cache trusted, the prebuilt binary is downloaded directly.
This drops you into a shell with all tools and packages listed in the flake.nix available. Exit with Ctrl+D or exit.
To check that you are inside the Nix shell, run:

echo $IN_NIX_SHELL

You should get impure (any non-empty result is fine).
The dev shell automatically disables all GCC/linker security features so that binaries compiled inside it are vulnerable by design. No extra flags are needed — just compile normally:
gcc vuln.c -o vuln
checksec vuln   # everything should show as disabled

Disabled features: stack canary, PIE, NX (executable stack), RELRO, FORTIFY_SOURCE, and control-flow enforcement (CET).
If you have direnv installed, the shell activates automatically when you cd into the repo. Give direnv permissions first:

direnv allow

If you want to remove these permissions:

direnv disallow

Inside the dev shell, build everything, recover a victim's CFG, and run the tracer manually against an attack:
make build
# recover the static CFG once per victim (writes attacks/01-stack-bof/victim.cfg)
python3 tools/build_cfg.py attacks/01-stack-bof/victim
# benign run: payload is a clean line of text
echo "hello" | ./detector/tracer attacks/01-stack-bof/victim
# attack run: payload is the exploit's stdout
python3 attacks/01-stack-bof/exploit.py | ./detector/tracer attacks/01-stack-bof/victim

The tracer takes the victim path as argv and reads the victim's stdin
from the pipe. It expects <victim>.cfg next to the binary (override
with --cfg PATH); make test regenerates these automatically, but a
manual run needs build_cfg.py first. Exit codes: 0 clean, 1
tracer error, 2 attack detected. On detection it prints
[!!! ATTACK DETECTED] <kind> at 0x… : <reason> to stderr (e.g.
ret at 0x… : expected 0x…, got 0x…, or blr at 0x… : destination 0x… is not a legal call target); on every exit it prints
[attestation] cfg-hash = 0x….
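When scripting around the tracer, the exit codes above can be mapped to verdicts with a small wrapper. The following is a sketch under assumptions, not a tool from this repo: the `verdict` and `run_tracer` names are hypothetical, and the default tracer path simply mirrors the manual commands shown earlier.

```python
import subprocess

# Exit codes as documented in this README.
VERDICTS = {0: "clean", 1: "tracer error", 2: "attack detected"}

def verdict(returncode: int) -> str:
    """Translate a tracer exit code into a human-readable verdict."""
    return VERDICTS.get(returncode, f"unexpected exit code {returncode}")

def run_tracer(victim: str, payload: bytes, tracer: str = "./detector/tracer"):
    """Run the tracer on a victim, feeding payload on stdin.

    Returns (verdict, stderr text). Needs a built tracer and a <victim>.cfg
    next to the binary, so it is not invoked in this example.
    """
    proc = subprocess.run([tracer, victim], input=payload, capture_output=True)
    return verdict(proc.returncode), proc.stderr.decode(errors="replace")

print(verdict(0), "|", verdict(2))
```

For example, `run_tracer("attacks/01-stack-bof/victim", b"hello\n")` would correspond to the benign manual run.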
If you are not already in the dev shell, prefix each command with
nix develop -c (e.g. nix develop -c make build).
The repo has a top-level Makefile that delegates to each component
(detector/, every attacks/*/). The test harness exercises every
attack against the current detector and reports pass/fail per case.
# Enter the Nix shell if you are not already in one.
# To check, run:
#   echo $IN_NIX_SHELL
# A non-empty result means you are in a Nix shell; otherwise run:
#   nix develop -c $SHELL
make          # build detector + every attack
make test     # build (if needed) and run the matrix
make clean    # clean every component

The harness regenerates each <victim>.cfg first (via build_cfg.py,
echoing its function / BB / edge / indirect-call-target counts), then runs
the matrix. Expected make test output:
[harness] [build_cfg] 6 functions, 34 basic blocks, 16 edges, 2 indirect-call targets -> victim.cfg, victim.dot
[harness] [build_cfg] 8 functions, 38 basic blocks, 16 edges, 2 indirect-call targets -> victim.cfg, victim.dot
[harness] [build_cfg] 8 functions, 39 basic blocks, 16 edges, 3 indirect-call targets -> victim.cfg, victim.dot
[ ok ] 01-stack-bof :: benign ( 12ms) exit=0
[ ok ] 01-stack-bof :: attack ( 1089ms) exit=2
[ ok ] 02-rop :: benign ( 36ms) exit=0
[ ok ] 02-rop :: attack ( 1084ms) exit=2
[ ok ] 03-jop :: benign ( 12ms) exit=0
[ ok ] 03-jop :: attack ( 1063ms) exit=2
6 passed, 0 failed
Benign cases must exit 0 with an [attestation] cfg-hash line and no
alert; attack cases must exit 2 with [!!! ATTACK DETECTED].
(Attack runs that the detector catches are ~100× slower than benign runs: the victim is single-stepped up to the hijack point.) The shell exit code is 0 on success, 1 if any case fails.
- Create attacks/NN-name/ with victim.c, exploit.py, Makefile (mirror the layout of attacks/01-stack-bof/).
- Append two entries (one benign, one attack) to the TESTS list in tools/run_tests.py.
- make test will build the new attack, recover its CFG, and run it against the detector automatically.
python3 tools/run_tests.py -v # dump tracer stderr on every case
python3 tools/run_tests.py -k 01-stack-bof   # only run cases matching substring

tools/run_tests.py is the correctness gate (pass/fail, one run per case).
tools/run_eval.py is the separate measurement harness that produces the
numbers in the report: it repeats each case N times, writes per-run CSVs to
eval/raw/<date>/, and renders an eval/summary-<date>.md with detection
rate, false-positive rate, wall-clock overhead, and time-to-detection.
make build # tracer + victims must exist first
python3 tools/run_eval.py # N=30, date = today, out -> eval/
python3 tools/run_eval.py --n 50 --measure-pwn-startup
python3 tools/run_eval.py --attacks 01-stack-bof,02-rop   # subset

For each attack it measures three conditions (victim alone with no tracer and benign
input, tracer+victim benign, and tracer+victim attack) and answers four
research questions: RQ-1 detection rate, RQ-2 false-positive rate,
RQ-3 overhead ratio wall(tracer+victim) / wall(victim alone), and
RQ-4 time-to-detection. Attack payloads are pre-generated once and reused
across the N runs (the exploits are deterministic, no PIE), so pwntools
startup is excluded from the timing; it is measured separately with
--measure-pwn-startup.
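The metrics reduce to simple arithmetic over the per-run records. The sketch below shows one way to compute them. It is not tools/run_eval.py: the function names and sample numbers are made up, and using the median for the overhead ratio is an assumption (the harness may aggregate differently).

```python
from statistics import median

ATTACK_EXIT = 2   # tracer exit code for "attack detected"

def alert_rate(exit_codes) -> float:
    """Fraction of runs in which the tracer raised an alert."""
    return sum(1 for code in exit_codes if code == ATTACK_EXIT) / len(exit_codes)

def overhead_ratio(traced_ms, alone_ms) -> float:
    """wall(tracer+victim) / wall(victim alone), using medians (assumption)."""
    return median(traced_ms) / median(alone_ms)

# Made-up sample data, for illustration only.
attack_exits = [2, 2, 2, 2]          # tracer+victim, attack input
benign_exits = [0, 0, 0, 0]          # tracer+victim, benign input
traced_ms = [20.0, 24.0, 22.0]       # tracer+victim benign wall clock
alone_ms = [10.0, 10.0, 10.0]        # victim alone wall clock

detection_rate = alert_rate(attack_exits)        # RQ-1
false_positive_rate = alert_rate(benign_exits)   # RQ-2
overhead = overhead_ratio(traced_ms, alone_ms)   # RQ-3
print(f"{detection_rate:.0%} {false_positive_rate:.0%} {overhead:.2f}x")
```

The same `alert_rate` serves both RQ-1 and RQ-2; only the input condition differs.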
Latest run (eval/summary-2026-05-19.md, N=50, Raspberry Pi 4 / Cortex-A72,
ondemand governor):
| Attack | Detection | FPR | Overhead |
|---|---|---|---|
| 01-stack-bof | 100% (50/50) | 0% | 2.41x |
| 02-rop | 100% (50/50) | 0% | 2.65x |
| 03-jop | 100% (50/50) | 0% | 2.23x |
| 04-data-only | 0% (expected: L1 gap) | 0% | 3.13x |
The 04-data-only row is a documented gap, not a failure: a non-control-data
overwrite that the L1 control-flow detector cannot observe (see the roadmap).
Benign overhead comes from single-step ptrace on a short program (~2–3x wall clock,
absolute cost ~10 ms); the attack runs that the detector catches finish in well
under 10 ms of tracer time.