Exploit Pattern RAG

SCOUT's Exploit Pattern RAG is the AEG-first knowledge layer for curated exploit pattern reuse, where AEG means automatic exploit generation. It is not a raw public-PoC retrieval system.

SCOUT does not retrieve raw public PoCs for copy-based exploitation. Instead, it can use public PoC metadata as a high-value seed for deriving normalized exploit patterns, and then adapt those patterns against evidence recovered from the target firmware.

Current architecture

Curated, retrievable cards live under:

```text
data/exploit_references/patterns/<pattern-id>/
  exploit.json
  pattern.md
  poc_sample.py
```

The runtime loader, retriever, and contamination guard live in `src/aiedge/exploit_rag/`. `exploit_autopoc` consumes that package and injects only the top-ranked pattern context into the lab-only PoC prompt.
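As a minimal sketch of the card layout above, the following Python checks that each `<pattern-id>` directory holds the three expected files. `audit_card_layout` is a hypothetical helper for illustration, not the loader in `src/aiedge/exploit_rag/`.

```python
from pathlib import Path

# File names come from the card layout above; the helper itself is hypothetical.
REQUIRED_CARD_FILES = ("exploit.json", "pattern.md", "poc_sample.py")


def audit_card_layout(patterns_root: Path) -> dict[str, list[str]]:
    """Map each <pattern-id> directory to the required files it is missing."""
    report: dict[str, list[str]] = {}
    if not patterns_root.is_dir():
        return report
    for card_dir in sorted(p for p in patterns_root.iterdir() if p.is_dir()):
        report[card_dir.name] = [
            name for name in REQUIRED_CARD_FILES if not (card_dir / name).is_file()
        ]
    return report


# Example: print(audit_card_layout(Path("data/exploit_references/patterns")))
```

An empty list for a pattern ID means the card directory is structurally complete; it says nothing about whether the card is promoted or validated.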

SCOUT runs also include an exploit_intel stage. It reads stages/cve_scan/cve_matches.json, fetches metadata-only public PoC/advisory intelligence for the top CVEs, and writes AutoPoC-ready seeds to:

```text
stages/exploit_intel/exploit_intel.json
stages/exploit_intel/autopoc_seeds.json
```

exploit_autopoc loads those seeds as additional candidates, but they remain bounded by the normal exploit profile gate, authorization gate, RAG contamination rules, PoC runner, reproducibility, and false-positive controls.
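The bounding rule can be sketched as a fail-closed gate chain that never consults where a candidate came from, so an `exploit_intel` seed gets no shortcut past the checks a curated card faces. The `Candidate` type, the `source` labels, and the flag names are hypothetical; only the pre-run gates are modeled, and the PoC runner, reproducibility, and false-positive controls would follow in a real pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Candidate:
    pattern_id: str
    source: str  # hypothetical labels: "curated_card" or "exploit_intel_seed"
    flags: dict[str, bool] = field(default_factory=dict)


# Hypothetical pre-run gates; a flag that was never set counts as a failure.
GATES: list[tuple[str, Callable[[Candidate], bool]]] = [
    ("exploit_profile", lambda c: c.flags.get("profile_allows", False)),
    ("authorization", lambda c: c.flags.get("authorized", False)),
    ("rag_contamination", lambda c: c.flags.get("contamination_clean", False)),
]


def admit(candidate: Candidate) -> tuple[bool, Optional[str]]:
    """Return (admitted, first failing gate). `source` is deliberately ignored."""
    for name, gate in GATES:
        if not gate(candidate):
            return False, name
    return True, None
```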

The current public corpus is intentionally small and curated. Candidate ingestion expands the upstream pool, but AutoPoC still retrieves only promoted pattern cards.

Each promoted card may also carry validation_evidence entries. These record whether a pattern has vulnerable/control pair evidence (synthetic_pair or real_firmware_pair) and keep SCOUT from treating metadata-only pattern reuse as an AEG platform proof.
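A minimal sketch of how `validation_evidence` might be read, assuming each entry is a JSON object with a `kind` field (a hypothetical schema). Unknown kinds are ignored, so a card with no recognized pair evidence stays at the hypothetical `metadata_only` level.

```python
from typing import Any

# Weakest to strongest; "metadata_only" is a hypothetical label for no pair evidence.
EVIDENCE_RANK = {"metadata_only": 0, "synthetic_pair": 1, "real_firmware_pair": 2}


def strongest_evidence(card: dict[str, Any]) -> str:
    best = "metadata_only"
    for entry in card.get("validation_evidence", []):
        kind = entry.get("kind")
        if kind in EVIDENCE_RANK and EVIDENCE_RANK[kind] > EVIDENCE_RANK[best]:
            best = kind
    return best


def has_pair_evidence(card: dict[str, Any]) -> bool:
    # Metadata-only reuse never counts as vulnerable/control pair evidence.
    return strongest_evidence(card) != "metadata_only"
```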

PoC-in-GitHub seed ingestion

PoC-in-GitHub is valuable because it maps CVEs to public proof-of-concept repositories at scale. SCOUT uses it as an upstream metadata source:

```text
PoC-in-GitHub CVE JSON
  -> unreviewed candidate seed
  -> draft pattern card (scripts/draft_exploit_pattern_card.py)
  -> human reviewer / curated extractor
  -> normalized retrievable exploit pattern card
  -> AutoPoC retrieval
```

The importer deliberately does not clone repositories, execute PoC code, or make raw PoC source retrievable. Candidate JSON can be converted into a non-retrievable draft card, but a human reviewer must promote it into a curated pattern card before AutoPoC can use it.
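The metadata-only rule can be sketched as an allowlist projection. This assumes a PoC-in-GitHub per-CVE file is a JSON array of GitHub repository objects; the field allowlist, the `status` label, and `to_candidate_seed` are illustrative assumptions, not the importer's actual schema. Nothing here fetches, clones, or runs repository contents.

```python
import json
from typing import Any

# Illustrative allowlist of GitHub repository metadata fields (assumption).
METADATA_FIELDS = ("full_name", "html_url", "description", "stargazers_count",
                   "created_at", "pushed_at")


def to_candidate_seed(cve_id: str, poc_in_github_json: str) -> dict[str, Any]:
    """Project a per-CVE repository list down to metadata; no repo contents are touched."""
    repos = json.loads(poc_in_github_json)
    return {
        "cve": cve_id,
        "status": "unreviewed_candidate",  # hypothetical label
        "repos": [
            {key: repo[key] for key in METADATA_FIELDS if key in repo}
            for repo in repos
        ],
    }
```

An allowlist rather than a denylist keeps newly added upstream fields (for example clone URLs) out of the candidate by default.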

Seed firmware-relevant candidates with:

```bash
# Use the curated firmware/network-appliance CVE seed list.
python scripts/import_poc_in_github_candidates.py --dry-run

# Import one explicit CVE into data/exploit_references/candidates/poc_in_github/.
python scripts/import_poc_in_github_candidates.py --cve CVE-2024-1781
```

Default seed list:

```text
data/exploit_references/firmware_seed_cves.json
```

Candidate output:

```text
data/exploit_references/candidates/poc_in_github/cve-*.json
```
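The candidate file name appears to be the lowercased CVE ID, judging from the `cve-*.json` glob and the `cve-2024-1781.json` path passed to the draft script later in this section. A hypothetical helper that validates the ID and builds that path:

```python
import re
from pathlib import Path

# CVE IDs are CVE-<year>-<sequence of 4 or more digits>.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")
CANDIDATE_DIR = Path("data/exploit_references/candidates/poc_in_github")


def candidate_path(cve_id: str) -> Path:
    """Map CVE-2024-1781 to .../poc_in_github/cve-2024-1781.json (hypothetical helper)."""
    normalized = cve_id.strip().upper()
    if not CVE_RE.match(normalized):
        raise ValueError(f"not a CVE identifier: {cve_id!r}")
    return CANDIDATE_DIR / f"{normalized.lower()}.json"
```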

By default, candidates are enriched with NVD metadata that Aqua Security's vuln-list-update generates into aquasecurity/vuln-list, when that metadata is available. The enrichment contributes the summary, CWE, CVSS, CPE, and advisory references; it does not make public exploit code retrievable.

For air-gapped or reproducible labs, point SCOUT at a local vuln-list checkout that was populated by vuln-list-update:

```bash
export AIEDGE_VULN_LIST_DIR=/opt/vuln-list
python scripts/import_poc_in_github_candidates.py --cve CVE-2023-1389 --vuln-list-dir "$AIEDGE_VULN_LIST_DIR"
```
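One plausible way to resolve the local checkout is sketched below. The precedence (an explicit `--vuln-list-dir` value wins over `AIEDGE_VULN_LIST_DIR`), the choice to return `None` so enrichment is skipped when the directory does not exist, and the helper name are all assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_vuln_list_dir(cli_value: Optional[str],
                          env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return a local vuln-list checkout, or None to skip enrichment (assumed behavior)."""
    raw = cli_value or env.get("AIEDGE_VULN_LIST_DIR")
    if not raw:
        return None
    path = Path(raw).expanduser()
    # An explicit but missing directory does not fall back to the env var.
    return path if path.is_dir() else None
```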

Runtime knobs for the integrated stage:

```bash
# Disable external-intel enrichment entirely.
export AIEDGE_EXPLOIT_INTEL_ENABLED=0

# Limit the number of cve_scan CVEs enriched per run.
export AIEDGE_EXPLOIT_INTEL_MAX_CVES=12

# Toggle individual sources.
export AIEDGE_EXPLOIT_INTEL_POC_IN_GITHUB=1
export AIEDGE_EXPLOIT_INTEL_VULN_LIST_UPDATE=1
```
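A sketch of parsing these knobs into one immutable config object. The defaults (every source on, a cap of 12 CVEs), the set of strings treated as "off", and the rule that `AIEDGE_EXPLOIT_INTEL_ENABLED=0` overrides the per-source toggles are assumptions for illustration; the 12 shown in the shell example is only an example setting, not a documented default.

```python
import os
from dataclasses import dataclass
from typing import Mapping

OFF_VALUES = {"0", "false", "no", "off"}  # assumption


def _flag(env: Mapping[str, str], name: str, default: bool = True) -> bool:
    raw = env.get(name, "").strip().lower()
    return default if not raw else raw not in OFF_VALUES


@dataclass(frozen=True)
class ExploitIntelConfig:
    enabled: bool
    max_cves: int
    poc_in_github: bool
    vuln_list_update: bool

    def active_sources(self) -> list[str]:
        if not self.enabled:  # master switch wins (assumption)
            return []
        sources = []
        if self.poc_in_github:
            sources.append("poc_in_github")
        if self.vuln_list_update:
            sources.append("vuln_list_update")
        return sources


def load_config(env: Mapping[str, str] = os.environ,
                default_max_cves: int = 12) -> ExploitIntelConfig:
    max_cves = int(env.get("AIEDGE_EXPLOIT_INTEL_MAX_CVES", default_max_cves))
    return ExploitIntelConfig(
        enabled=_flag(env, "AIEDGE_EXPLOIT_INTEL_ENABLED"),
        max_cves=max(0, max_cves),
        poc_in_github=_flag(env, "AIEDGE_EXPLOIT_INTEL_POC_IN_GITHUB"),
        vuln_list_update=_flag(env, "AIEDGE_EXPLOIT_INTEL_VULN_LIST_UPDATE"),
    )
```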

Draft a review artifact from a candidate:

```bash
python scripts/draft_exploit_pattern_card.py data/exploit_references/candidates/poc_in_github/cve-2024-1781.json
```

Draft output:

```text
data/exploit_references/drafts/<pattern-id>/
  exploit.json   # promotion.status=draft_requires_human_review
  pattern.md     # reviewer checklist, no raw PoC source
```
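Because drafts carry `promotion.status=draft_requires_human_review`, a retriever can exclude them with an exact status match. This sketch assumes the status is a nested `promotion.status` string in `exploit.json`; the promoted status value is not named in this document, so it is passed in as a parameter, and any card without a matching status is skipped.

```python
import json
from pathlib import Path
from typing import Any, Iterator

DRAFT_STATUS = "draft_requires_human_review"


def retrievable_cards(root: Path, promoted_status: str) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (pattern_id, card) only for cards whose status matches exactly."""
    if promoted_status == DRAFT_STATUS:
        raise ValueError("drafts are never retrievable")
    for exploit_json in sorted(root.glob("*/exploit.json")):
        card = json.loads(exploit_json.read_text(encoding="utf-8"))
        status = (card.get("promotion") or {}).get("status")
        if status == promoted_status:
            yield exploit_json.parent.name, card
```

Since this is a generator, the draft-status guard fires on the first iteration rather than at call time.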

Promotion contract

A public PoC candidate can become a retrievable SCOUT AEG pattern only after the reviewer extracts target-independent structure:

- family, entry channel, bridge channel, trigger model, and sink
- source-to-sink reasoning
- non-destructive verification tactics
- preconditions, adaptation rules, and forbidden reuse constraints

Do not promote target-specific endpoints, credentials, target hosts, payload literals, or vendor-specific magic constants as reusable tactics.
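A coarse pre-review lint for the contract can be sketched as a required-field check plus a scan for target-specific literals. The key names and regular expressions are hypothetical heuristics; they will miss vendor-specific magic constants and other subtle leaks, so they complement the reviewer rather than replace them.

```python
import re
from typing import Any

# Hypothetical key names for the structure the promotion contract requires.
REQUIRED_KEYS = ("family", "entry_channel", "bridge_channel", "trigger_model", "sink",
                 "source_to_sink", "verification_tactics", "preconditions",
                 "adaptation_rules", "forbidden_reuse")

# Coarse heuristics for literals that must not become reusable tactics.
FORBIDDEN_PATTERNS = {
    "ipv4_host": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"\bhttps?://", re.IGNORECASE),
    "credential": re.compile(r"\b(?:password|passwd|pwd|token|api[_-]?key)\s*[:=]",
                             re.IGNORECASE),
}


def contract_violations(card: dict[str, Any]) -> list[str]:
    """Return missing-field and suspicious-literal findings (empty means clean)."""
    problems = [f"missing:{key}" for key in REQUIRED_KEYS if not card.get(key)]
    text = " ".join(str(card.get(key, "")) for key in REQUIRED_KEYS)
    for label, pattern in FORBIDDEN_PATTERNS.items():
        if pattern.search(text):
            problems.append(f"literal:{label}")
    return problems
```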

Check the current evidence state with:

```bash
python scripts/check_exploit_pattern_evidence.py
```

Use stricter release checks when appropriate:

```bash
# Require every curated card to have vulnerable/control pair evidence.
python scripts/check_exploit_pattern_evidence.py --require-all

# Require at least one real firmware known-vulnerable/patched pattern.
python scripts/check_exploit_pattern_evidence.py --require-real-firmware-pair
```
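The extra pass/fail logic the two strict flags add can be sketched as follows, assuming a hypothetical `validation_evidence` schema with a `kind` field per entry. The function name and messages are illustrative; the real script's output format and exit codes are not described here.

```python
from typing import Any

PAIR_KINDS = {"synthetic_pair", "real_firmware_pair"}


def evidence_failures(cards: dict[str, dict[str, Any]],
                      require_all: bool = False,
                      require_real_firmware_pair: bool = False) -> list[str]:
    """Return failure messages; an empty list means the selected checks pass."""
    kinds = {
        pid: {entry.get("kind") for entry in card.get("validation_evidence", [])}
        for pid, card in cards.items()
    }
    failures: list[str] = []
    if require_all:
        for pid in sorted(kinds):
            if not kinds[pid] & PAIR_KINDS:
                failures.append(f"{pid}: no vulnerable/control pair evidence")
    if require_real_firmware_pair and not any(
        "real_firmware_pair" in card_kinds for card_kinds in kinds.values()
    ):
        failures.append("no card carries real_firmware_pair evidence")
    return failures
```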

Record new pair evidence only after both sides have completed SCOUT run directories:

```bash
# Dry-run: validate the known-vulnerable run passes and the patched/control run fails closed.
python scripts/record_pattern_pair_evidence.py cgi_param_cmd_injection \
  --kind real_firmware_pair \
  --vulnerable-run-dir aiedge-runs/<known-vulnerable-run> \
  --control-run-dir aiedge-runs/<patched-control-run> \
  --artifact docs/pov/<stable-pair-evidence>.json \
  --vulnerable-firmware-sha256 <sha256> \
  --control-firmware-sha256 <sha256> \
  --cve CVE-YYYY-NNNN

# Apply only after the dry-run evidence JSON is reviewed.
python scripts/record_pattern_pair_evidence.py cgi_param_cmd_injection \
  --kind real_firmware_pair \
  --vulnerable-run-dir aiedge-runs/<known-vulnerable-run> \
  --control-run-dir aiedge-runs/<patched-control-run> \
  --evidence-id <stable-pair-id> \
  --artifact docs/pov/<stable-pair-evidence>.json \
  --vulnerable-firmware-sha256 <sha256> \
  --control-firmware-sha256 <sha256> \
  --cve CVE-YYYY-NNNN \
  --apply
```

The recorder refuses to count missing control artifacts as evidence and also rejects controls that fail only an FPR/non-dynamic check. At least one dynamic proof check (autopoc_runner_pass, poc_validation_reproducible, or verified_chain_pass) must fail on the patched/control side. For real_firmware_pair, it additionally requires a stable artifact reference, both firmware SHA-256 values, and either a CVE or target-family label.
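The acceptance rule just described can be sketched as a function that returns rejection reasons (an empty list means accept). The three dynamic check names come from the rule itself; reading "the known-vulnerable run passes" as "every dynamic check passed", and treating a dynamic check that is absent from the control results as not failed, are assumptions that keep the sketch fail-closed. `pair_rejections` is a hypothetical name.

```python
import re
from typing import Mapping, Optional

DYNAMIC_CHECKS = ("autopoc_runner_pass", "poc_validation_reproducible", "verified_chain_pass")
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def pair_rejections(kind: str,
                    vulnerable: Optional[Mapping[str, bool]],
                    control: Optional[Mapping[str, bool]],
                    artifact: Optional[str] = None,
                    vulnerable_sha256: Optional[str] = None,
                    control_sha256: Optional[str] = None,
                    cve: Optional[str] = None,
                    target_family: Optional[str] = None) -> list[str]:
    reasons: list[str] = []
    if vulnerable is None or not all(vulnerable.get(c) is True for c in DYNAMIC_CHECKS):
        reasons.append("vulnerable run did not pass every dynamic check")
    if control is None:
        reasons.append("control artifacts missing")
    elif not any(control.get(c) is False for c in DYNAMIC_CHECKS):
        # Failing only FPR or other non-dynamic checks is not fail-closed proof.
        reasons.append("control did not fail any dynamic proof check")
    if kind == "real_firmware_pair":
        if not artifact:
            reasons.append("missing stable artifact reference")
        for label, digest in (("vulnerable", vulnerable_sha256), ("control", control_sha256)):
            if not digest or not SHA256_RE.match(digest.lower()):
                reasons.append(f"missing or malformed {label} firmware SHA-256")
        if not (cve or target_family):
            reasons.append("need a CVE or target-family label")
    return reasons
```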

As of this update, the original generic cards retain synthetic vulnerable/control pair evidence produced by `scripts/run_aeg_synthetic_pair.py`. `netgear_passwordrecovered_auth_bypass` carries the first real known-vulnerable/patched firmware pair evidence, for Netgear R7000 CVE-2017-5521. Release-level AEG claims should continue to cite the stable pair artifact in `docs/pov/netgear-r7000-cve-2017-5521_real_pair.json` and rerun `python scripts/check_exploit_pattern_evidence.py --require-real-firmware-pair`.

For platform-level readiness, run the integrated fail-closed audit instead of checking card counters alone:

```bash
./scout aeg-readiness --out docs/pov/aeg_platform_readiness.json
```

That audit ties the pattern-card aggregate to the stable real-firmware pair report, checks SHA/pattern-family binding, and verifies the vulnerable/pass vs patched/dynamic-fail-closed separation captured by the committed evidence.
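A minimal sketch of the binding part of such an audit compares the card's recorded evidence entry with the stable pair report and treats any missing key as unbound. The key names are hypothetical, and this is not the implementation behind `./scout aeg-readiness`.

```python
from typing import Any, Mapping

# Hypothetical keys that must agree between the card entry and the pair report.
BOUND_KEYS = ("pattern_family", "vulnerable_firmware_sha256", "control_firmware_sha256")


def binding_errors(card_evidence: Mapping[str, Any], pair_report: Mapping[str, Any]) -> list[str]:
    """Fail closed: a key that is absent on either side counts as unbound."""
    errors: list[str] = []
    for key in BOUND_KEYS:
        left, right = card_evidence.get(key), pair_report.get(key)
        if left is None or right is None:
            errors.append(f"{key}: missing")
        elif left != right:
            errors.append(f"{key}: card={left!r} report={right!r}")
    return errors
```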

E2E validation before platform claims

Pattern-card and retriever tests are necessary but insufficient. A SCOUT AEG claim requires a completed lab run that passes the dynamic/FP gate in docs/aeg_e2e_validation.md: AutoPoC runner pass, reproducible poc_validation, verified_chain isolation, run-level FPR ceiling, and no high/critical FP verdict for the AEG finding.
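The gate can be sketched as a list of named failures. Field names are hypothetical, the FPR ceiling is a parameter because its value lives in `docs/aeg_e2e_validation.md`, and "no high/critical FP verdict" is interpreted here as "no false-positive verdict at high or critical severity on the AEG finding".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabRun:
    autopoc_runner_pass: bool
    poc_validation_reproducible: bool
    verified_chain_isolated: bool
    run_fpr: float
    fp_verdict_severities: tuple[str, ...] = ()  # FP verdicts on the AEG finding


def e2e_gate_failures(run: LabRun, fpr_ceiling: float) -> list[str]:
    """Empty list means the run may back an AEG claim under this sketch."""
    failures: list[str] = []
    if not run.autopoc_runner_pass:
        failures.append("autopoc_runner_failed")
    if not run.poc_validation_reproducible:
        failures.append("poc_validation_not_reproducible")
    if not run.verified_chain_isolated:
        failures.append("verified_chain_not_isolated")
    if run.run_fpr > fpr_ceiling:
        failures.append("run_fpr_above_ceiling")
    if any(s.lower() in {"high", "critical"} for s in run.fp_verdict_severities):
        failures.append("high_or_critical_fp_verdict")
    return failures
```

Reporting every failing condition, instead of stopping at the first, gives reviewers the full picture in one pass.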

For internal red-team operation, promoted patterns may feed controlled weaponization packages only after the same pair evidence is bound to firmware hashes, target preconditions, private package hashes, cleanup evidence, and scope metadata. See controlled_weaponization_layer.md.

Safety boundary

Allowed:

- fetch PoC-in-GitHub JSON metadata
- record candidate repo metadata and CVE context
- derive target-independent exploit structure during a separate curation step
- use promoted patterns as high-level tactics for private, scope-bound controlled weaponization packages after AEG pair evidence passes

Forbidden in the SCOUT ingestion/retrieval path:

- cloning public PoC repositories automatically
- executing public PoC code
- placing raw PoC source in the AutoPoC prompt
- copying reference endpoints, credentials, payload literals, or target hosts into generated PoCs
