dragonfly-clf/audit-to-foundry

Audit-to-Foundry Validation Pipeline

This repository is a research artifact for turning natural-language smart contract audit findings into executable Foundry validation tests. Given an audit finding and the corresponding contract project, the pipeline extracts vulnerability-relevant code context, prompts an LLM to generate a Foundry test, executes the generated test, and optionally applies a differential validation stage when the finding can be checked through before/after state observations.

The goal is not to replace human auditors or automate offensive exploitation. The goal is to make audit claims, privileged-risk scenarios, and security assumptions more reproducible, inspectable, and falsifiable through executable artifacts.

Project Highlights

  • Research focus: converting audit-report findings into runnable Foundry validation tests
  • Ethereum relevance: improves reproducibility, reviewability, and falsifiability of audit claims
  • Implemented artifact: code pipeline, curated examples, and runnable generated tests are already included in the repository
  • Evidence included: benchmark-style cases and real audit-report cases are both documented under example/README.md

Why This Matters for Ethereum

Ethereum security work depends heavily on audit reports, but audit findings are usually published as natural-language claims rather than runnable evidence. That makes it difficult to:

  • reproduce a reported issue quickly
  • compare findings across tools and auditors
  • falsify a claim when the described exploit path is incomplete
  • build evaluation datasets for future security research

This project explores a workflow where an audit finding is treated as a hypothesis to be operationalized into code. For Ethereum researchers and reviewers, that means a report can be transformed into something closer to executable evidence rather than remaining only a narrative conclusion.

Core Contributions

  1. Audit finding to executable test generation
    The pipeline converts audit-report excerpts into Foundry .t.sol tests that import and interact with the original target contract.

  2. Static-analysis-assisted context extraction
    Slither-based analysis and custom helpers recover functions, call relationships, public state, and source snippets relevant to the finding.

  3. Dynamic execution, iterative repair, and differential validation
    Generated tests are executed in Foundry, repaired using compiler/runtime feedback, and optionally extended with before/after observations when the vulnerability supports differential checking.
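The before/after idea behind differential validation can be sketched in a few lines of Python. This is an illustration only, not the repository's implementation: the function names, the observation keys, and the choice to express expectations as signs (+1, -1, 0) are all assumptions made for this example.

```python
from typing import Dict, List


def diff_observations(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the non-zero change (after - before) for every observed key.

    Assumption of this sketch: a key missing from one snapshot counts as 0.
    """
    keys = set(before) | set(after)
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    return {k: d for k, d in deltas.items() if d != 0}


def _sign(value: int) -> int:
    """Map an integer to -1, 0, or +1."""
    return (value > 0) - (value < 0)


def check_expected(deltas: Dict[str, int], expected_signs: Dict[str, int]) -> List[str]:
    """Return the keys whose observed direction of change differs from the expectation."""
    return [k for k, s in expected_signs.items() if _sign(deltas.get(k, 0)) != s]
```

For a reentrancy-style finding, a hypothetical expectation would be `{"victim_balance": -1, "attacker_balance": 1}`. An empty list from `check_expected` means the observed state change matches the claim. A non-empty list names the observations that contradict it.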

What Is Implemented in This Repo

The current repository contains the following real components:

  • scripts/main.py: Main research pipeline entrypoint for preprocessing, generation, execution, and differential validation.
  • scripts/MySlither/: Slither-based code analysis utilities used to recover CFG and call-graph related context.
  • scripts/tools/: Prompt assembly, Foundry project helpers, test correction helpers, statistics utilities, and smaller support scripts.
  • scripts/Preprocess/: Preprocessing utilities for report filtering and metadata extraction. Some of these scripts remain experimental and are not part of the main reproduction path.
  • scripts/example/: Example report and generated-test pairs used in prompting.
  • example/: Curated runnable artifacts for both benchmark-style and audit-report-driven cases.

The repository is intentionally kept as a research codebase under scripts/ rather than repackaged as a production-grade Python library.

Curated Examples And Generated Artifacts

To make the repository easier to review, it also includes a curated artifact gallery under example/README.md. This directory contains ready-to-inspect Foundry projects generated or organized from:

  • SmartBugs benchmark-style cases
  • real audit-report findings

These examples matter because they show that the project goes beyond describing a pipeline architecture. It already produces concrete, runnable outputs that a reviewer can inspect directly:

  • vulnerable source projects
  • generated validation tests
  • verification-oriented logs or execution records

For Ethereum Foundation-style review, this matters because it strengthens three signals at once:

  • the work addresses a real Ethereum security problem
  • the outputs are concrete rather than aspirational
  • the artifact is inspectable without reconstructing the entire experiment stack first

If you want the fastest route through the repository, start with:

  1. example/smartbugs/reentrance/README.md
  2. example/smartbugs/phishable/README.md
  3. example/audit_reports/equality_1/README.md
  4. example/audit_reports/1-shf_1/README.md

Repository Status

This repository should be read as an open research artifact:

  • appropriate for understanding the pipeline design
  • appropriate for reproducing or extending experiment-oriented workflows
  • appropriate for reviewing the implementation choices behind the artifact
  • not presented as a production-ready tool

Several components still reflect research iteration rather than hardened software packaging. This is intentional and documented rather than hidden.

How To Run

Requirements

  • Python 3.10+
  • Foundry
  • Slither
  • An OpenAI-compatible API endpoint
  • RPC endpoints if fork-based execution is required by the generated tests

Install Python dependencies:

pip install -r requirements.txt

Configure Environment

Use .env.example as the template for your local environment configuration and provide values for the required variables:

  • OPENAI_API_KEY
  • OPENAI_BASE_URL
  • LLM_MODEL
  • RPC_ENDPOINTS_FILE
  • AUDIT_REPORT_DIR
  • CONTRACT_PROJECT_DIR
  • PIPELINE_OUTPUT_DIR
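Before a long run, it can help to confirm that these variables are actually set. The following is a minimal sketch rather than part of the repository. The variable names come from the list above, while the helper name `missing_env_vars` and the choice to treat every variable as mandatory are assumptions.

```python
import os

# Names taken from the list above. Treating all of them as mandatory is an
# assumption of this sketch (RPC_ENDPOINTS_FILE may only matter for fork-based runs).
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "LLM_MODEL",
    "RPC_ENDPOINTS_FILE",
    "AUDIT_REPORT_DIR",
    "CONTRACT_PROJECT_DIR",
    "PIPELINE_OUTPUT_DIR",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the process environment; pass a dict to check
    a configuration without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments checks the current process environment. An empty list means every listed variable has a non-blank value.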

The main script also supports command-line overrides:

python3 scripts/main.py \
  --reports-dir /path/to/audit_reports \
  --contracts-dir /path/to/contract_projects \
  --output-dir /path/to/output \
  --model your-model-name \
  --rpc-file /path/to/rpc_endpoints.json \
  --example no_differential_\$solar_1

Run python3 scripts/main.py --help for the full argument list and default behavior.

Expected Inputs

The current artifact assumes:

  • a directory of audit-report files
  • a directory of target contract projects
  • preprocessing artifacts under scripts/Preprocess/save/
  • example prompt pairs under scripts/example/
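A quick pre-flight check of this layout might look like the sketch below. It is a hypothetical helper, not a script shipped with the repository. It only verifies that the four directories exist and says nothing about their contents.

```python
from pathlib import Path


def check_inputs(reports_dir, contracts_dir, repo_root="."):
    """Return a description of each expected directory that is missing.

    The two repository-relative paths mirror the list above; the function
    name and the label strings are assumptions made for this example.
    """
    root = Path(repo_root)
    expected = {
        "audit reports": Path(reports_dir),
        "contract projects": Path(contracts_dir),
        "preprocessing artifacts": root / "scripts" / "Preprocess" / "save",
        "example prompt pairs": root / "scripts" / "example",
    }
    return [f"{label}: {path}" for label, path in expected.items() if not path.is_dir()]
```

An empty return value means all four locations are present. Otherwise each entry names the missing input and the path that was checked.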

The preprocessing utilities in scripts/Preprocess/ help prepare these inputs, but they should be treated as artifact support scripts rather than a polished CLI suite.

Pipeline Overview

At a high level, the implemented workflow is:

  1. Select findings of interest from audit-report inputs.
  2. Extract relevant Solidity code context and public state.
  3. Build prompts for Foundry validation test generation.
  4. Execute generated tests and iteratively repair them with runtime feedback.
  5. Apply differential validation when the finding supports before/after comparison.
  6. Save generated tests, logs, and verification artifacts under the configured output directory.
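Steps 3 and 4 form a generate, execute, repair loop, which can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/main.py. The names `generate_and_repair`, `RepairResult`, and `run_forge_test`, the callback signatures, and the default of three attempts are all hypothetical. The only external command used is `forge test --match-path`, run from the Foundry project directory.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RepairResult:
    passed: bool
    attempts: int
    test_source: Optional[str]
    last_log: Optional[str]


def run_forge_test(project_dir: str, test_path: str) -> Tuple[bool, str]:
    """Run one Foundry test file and return (passed, combined output)."""
    proc = subprocess.run(
        ["forge", "test", "--match-path", test_path],
        cwd=project_dir,
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr


def generate_and_repair(
    generate: Callable[[str, Optional[str], Optional[str]], str],
    run_test: Callable[[str], Tuple[bool, str]],
    prompt: str,
    max_attempts: int = 3,
) -> RepairResult:
    """Generate a test, run it, and feed failure logs back until it passes.

    generate(prompt, previous_source, feedback) returns new test source;
    run_test(source) returns (passed, log). Both are supplied by the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    source: Optional[str] = None
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(prompt, source, feedback)
        passed, log = run_test(source)
        if passed:
            return RepairResult(True, attempt, source, log)
        feedback = log  # compiler or runtime output drives the next attempt
    return RepairResult(False, max_attempts, source, feedback)
```

Passing the generator and the runner in as callables keeps the loop independent of any particular LLM client. A `run_test` callback would typically write the source to a `.t.sol` file and then call `run_forge_test`.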

Example Coverage

The curated examples currently demonstrate two kinds of evidence:

  • benchmark-aligned vulnerability reproduction for canonical bugs such as reentrancy and tx.origin misuse
  • real audit-finding operationalization for report-driven cases such as privileged fee misconfiguration and authorized-user asset movement

This distinction is useful because benchmark cases show the pipeline can handle widely recognized vulnerability patterns, while audit-report cases show the more ambitious research contribution: turning natural-language findings, including owner-controlled or authorized-caller risk scenarios, into executable validation tests.

Limitations

The current artifact has clear limits, and stating them is part of an honest evaluation of the repository:

  • It works best for findings that can be reproduced dynamically.
  • Differential validation is not applicable to every vulnerability class.
  • Reproducibility still depends on local environment setup, dataset preparation, and toolchain availability.
  • Some helper scripts remain experimental and are preserved mainly for transparency and extension.
  • Test quality depends on model behavior, prompt quality, contract complexity, and whether the report contains enough operational detail.
  • Some report-driven cases describe privileged-risk or owner-abuse scenarios rather than permissionless exploits; the examples are documented accordingly and should be read with those trust assumptions in mind.

Safety

This repository is intended for:

  • defensive validation of smart contract audit findings
  • benchmarking and reproducibility research
  • audit artifact inspection

It is not intended as:

  • a general-purpose offensive exploitation toolkit
  • a guarantee of real-world exploitability
  • a substitute for manual review by experienced auditors

Any reproduction involving deployed systems, live infrastructure, or forked chain state should be done only with appropriate authorization.

Open-Source Checklist

Before publishing or sharing derived work, review:

  • local path assumptions
  • RPC endpoint files
  • API credentials and .env handling
  • dataset-specific identifiers that should not be made public
  • generated logs and artifacts that may contain environment-specific details

This repository now uses requirements.txt and environment-based configuration rather than requiring inline credential edits in source files.
