SCOUT's Exploit Pattern RAG is the AEG-first knowledge layer for reusing curated exploit patterns. It is not a raw public-PoC retrieval system: SCOUT does not retrieve raw public PoCs for copy-based exploitation.
Instead, it can use public PoC metadata as a high-value seed to derive normalized exploit patterns, and then adapt those patterns against evidence recovered from the target firmware.
Curated, retrievable cards live under:
data/exploit_references/patterns/<pattern-id>/
  exploit.json
  pattern.md
  poc_sample.py
The runtime loader/retriever/contamination guard lives in src/aiedge/exploit_rag/. exploit_autopoc consumes the package and injects only the top-ranked pattern context into the lab-only PoC prompt.
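As a rough illustration of the card layout above, the following sketch walks a pattern directory and collects each card's `exploit.json` and `pattern.md`. It is not the loader in src/aiedge/exploit_rag/: the function name `load_pattern_cards` and the returned dictionary shape are assumptions made for this example, and `poc_sample.py` is deliberately left unread.

```python
import json
from pathlib import Path

# Directory taken from the layout described above.
PATTERN_ROOT = Path("data/exploit_references/patterns")


def load_pattern_cards(root: Path = PATTERN_ROOT) -> dict[str, dict]:
    """Return {pattern_id: card} for every directory that holds exploit.json.

    Hypothetical helper: the real loader may validate fields, apply
    contamination rules, or read additional files.
    """
    cards: dict[str, dict] = {}
    if not root.is_dir():
        return cards
    for card_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        manifest = card_dir / "exploit.json"
        if not manifest.is_file():
            continue  # incomplete card: skip it rather than guess its content
        notes = card_dir / "pattern.md"
        cards[card_dir.name] = {
            "manifest": json.loads(manifest.read_text(encoding="utf-8")),
            "notes": notes.read_text(encoding="utf-8") if notes.is_file() else "",
        }
    return cards
```

Keying the result by directory name keeps the `<pattern-id>` from the path as the single identifier for a card.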
SCOUT runs also include an exploit_intel stage. It reads
stages/cve_scan/cve_matches.json, fetches metadata-only public PoC/advisory
intelligence for the top CVEs, and writes AutoPoC-ready seeds to:
stages/exploit_intel/exploit_intel.json
stages/exploit_intel/autopoc_seeds.json
exploit_autopoc loads those seeds as additional candidates, but they remain
bounded by the normal exploit profile gate, authorization gate, RAG
contamination rules, PoC runner, reproducibility, and false-positive controls.
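The exploit_intel stage is controlled by the `AIEDGE_EXPLOIT_INTEL_*` environment variables listed with the runtime knobs in this document. A minimal sketch of how such knobs could be parsed is shown here; the `ExploitIntelKnobs` type, the `read_knobs` function, the accepted "off" spellings, and every default value are assumptions for illustration, not SCOUT's actual behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping

ASSUMED_DEFAULT_MAX_CVES = 12  # assumption: mirrors the example value in this document


@dataclass(frozen=True)
class ExploitIntelKnobs:
    enabled: bool
    max_cves: int
    poc_in_github: bool
    vuln_list_update: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Treat 0/false/no/off/empty as disabled; anything else as enabled."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"0", "false", "no", "off", ""}


def read_knobs(env: Mapping[str, str] = os.environ) -> ExploitIntelKnobs:
    try:
        max_cves = int(env.get("AIEDGE_EXPLOIT_INTEL_MAX_CVES", ASSUMED_DEFAULT_MAX_CVES))
    except ValueError:
        max_cves = ASSUMED_DEFAULT_MAX_CVES  # unparsable value: fall back
    return ExploitIntelKnobs(
        enabled=_flag(env, "AIEDGE_EXPLOIT_INTEL_ENABLED", True),
        max_cves=max(0, max_cves),
        poc_in_github=_flag(env, "AIEDGE_EXPLOIT_INTEL_POC_IN_GITHUB", True),
        vuln_list_update=_flag(env, "AIEDGE_EXPLOIT_INTEL_VULN_LIST_UPDATE", True),
    )
```

Passing the environment as a mapping keeps the parser testable without mutating `os.environ`.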
The current public corpus is intentionally small and curated. Candidate ingestion expands the upstream pool, but AutoPoC still retrieves only promoted pattern cards.
Each promoted card may also carry validation_evidence entries. These record
whether a pattern has vulnerable/control pair evidence (synthetic_pair or
real_firmware_pair) and keep SCOUT from treating metadata-only pattern reuse as
an AEG platform proof.
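To make the distinction concrete, the sketch below counts a card's pair evidence by kind and flags cards that have none. The `validation_evidence` key and the two kind names come from the text above; the per-entry `kind` field (mirroring the recorder's `--kind` flag) and both function names are assumptions.

```python
# Evidence kinds named in this document.
PAIR_KINDS = ("synthetic_pair", "real_firmware_pair")


def summarize_evidence(card: dict) -> dict[str, int]:
    """Count validation_evidence entries per recognized pair kind."""
    counts = {kind: 0 for kind in PAIR_KINDS}
    for entry in card.get("validation_evidence", []):
        kind = entry.get("kind")  # assumed field name
        if kind in counts:
            counts[kind] += 1
    return counts


def is_metadata_only(card: dict) -> bool:
    """True when the card carries no vulnerable/control pair evidence at all."""
    return sum(summarize_evidence(card).values()) == 0
```

A card for which `is_metadata_only` returns `True` would be usable as a retrieval tactic but should not be cited as AEG proof.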
PoC-in-GitHub is valuable because it maps CVEs to public proof-of-concept repositories at scale. SCOUT uses it as an upstream metadata source:
PoC-in-GitHub CVE JSON
-> unreviewed candidate seed
-> draft pattern card (`scripts/draft_exploit_pattern_card.py`)
-> human reviewer / curated extractor
-> normalized retrievable exploit pattern card
-> AutoPoC retrieval
The importer deliberately does not clone repositories, execute PoC code, or make raw PoC source retrievable. Candidate JSON can be converted into a non-retrievable draft card, but a human reviewer must promote it into a curated pattern card before AutoPoC can use it.
Seed firmware-relevant candidates with:
# Use the curated firmware/network-appliance CVE seed list.
python scripts/import_poc_in_github_candidates.py --dry-run
# Import one explicit CVE into data/exploit_references/candidates/poc_in_github/.
python scripts/import_poc_in_github_candidates.py --cve CVE-2024-1781
Default seed list:
data/exploit_references/firmware_seed_cves.json
Candidate output:
data/exploit_references/candidates/poc_in_github/cve-*.json
Candidates are enriched by default with NVD metadata generated by Aqua Security's
vuln-list-update and published in aquasecurity/vuln-list, when that metadata is
available. The enrichment contributes summary, CWE, CVSS, CPE, and advisory
references; it does not make public exploit code retrievable.
For air-gapped or reproducible labs, point SCOUT at a local vuln-list checkout
that was populated by vuln-list-update:
export AIEDGE_VULN_LIST_DIR=/opt/vuln-list
python scripts/import_poc_in_github_candidates.py --cve CVE-2023-1389 --vuln-list-dir "$AIEDGE_VULN_LIST_DIR"
Runtime knobs for the integrated stage:
# Disable external-intel enrichment entirely.
export AIEDGE_EXPLOIT_INTEL_ENABLED=0
# Limit the number of cve_scan CVEs enriched per run.
export AIEDGE_EXPLOIT_INTEL_MAX_CVES=12
# Toggle individual sources.
export AIEDGE_EXPLOIT_INTEL_POC_IN_GITHUB=1
export AIEDGE_EXPLOIT_INTEL_VULN_LIST_UPDATE=1
Draft a review artifact from a candidate:
python scripts/draft_exploit_pattern_card.py data/exploit_references/candidates/poc_in_github/cve-2024-1781.json
Draft output:
data/exploit_references/drafts/<pattern-id>/
  exploit.json   # promotion.status=draft_requires_human_review
  pattern.md     # reviewer checklist, no raw PoC source
A public PoC candidate can become a retrievable SCOUT AEG pattern only after the reviewer extracts target-independent structure:
- family, entry channel, bridge channel, trigger model, and sink
- source-to-sink reasoning
- non-destructive verification tactics
- preconditions, adaptation rules, and forbidden reuse constraints
Do not promote target-specific endpoints, credentials, target hosts, payload literals, or vendor-specific magic constants as reusable tactics.
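A reviewer could back this rule with a simple lint over draft card text. The sketch below flags a few obvious kinds of target-specific literal (IPv4 addresses, URLs, credential-style assignments). It is a heuristic illustration only: the pattern set and the name `find_target_specific_literals` are assumptions, it does not detect vendor-specific magic constants, and version strings such as `1.2.3.4` will be reported as addresses.

```python
import re

# Heuristic patterns; deliberately over-inclusive so a human makes the final call.
TARGET_SPECIFIC_PATTERNS = {
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "credential_assignment": re.compile(
        r"\b(?:password|passwd|pwd|token|secret)\s*[=:]\s*\S+", re.IGNORECASE
    ),
}


def find_target_specific_literals(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for literals that should not be promoted."""
    hits: list[tuple[str, str]] = []
    for label, pattern in TARGET_SPECIFIC_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits
```

An empty result does not mean a card is safe to promote; it only means none of these coarse patterns matched.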
Check the current evidence state with:
python scripts/check_exploit_pattern_evidence.py
Use stricter release checks when appropriate:
# Require every curated card to have vulnerable/control pair evidence.
python scripts/check_exploit_pattern_evidence.py --require-all
# Require at least one real firmware known-vulnerable/patched pattern.
python scripts/check_exploit_pattern_evidence.py --require-real-firmware-pair
Record new pair evidence only after both sides have completed SCOUT run directories:
# Dry-run: validate the known-vulnerable run passes and the patched/control run fails closed.
python scripts/record_pattern_pair_evidence.py cgi_param_cmd_injection \
--kind real_firmware_pair \
--vulnerable-run-dir aiedge-runs/<known-vulnerable-run> \
--control-run-dir aiedge-runs/<patched-control-run> \
--artifact docs/pov/<stable-pair-evidence>.json \
--vulnerable-firmware-sha256 <sha256> \
--control-firmware-sha256 <sha256> \
--cve CVE-YYYY-NNNN
# Apply only after the dry-run evidence JSON is reviewed.
python scripts/record_pattern_pair_evidence.py cgi_param_cmd_injection \
--kind real_firmware_pair \
--vulnerable-run-dir aiedge-runs/<known-vulnerable-run> \
--control-run-dir aiedge-runs/<patched-control-run> \
--evidence-id <stable-pair-id> \
--artifact docs/pov/<stable-pair-evidence>.json \
--vulnerable-firmware-sha256 <sha256> \
--control-firmware-sha256 <sha256> \
--cve CVE-YYYY-NNNN \
--apply
The recorder refuses to count missing control artifacts as evidence and also rejects controls that fail only an FPR/non-dynamic check. At least one dynamic proof check (autopoc_runner_pass, poc_validation_reproducible, or verified_chain_pass) must fail on the patched/control side.
For real_firmware_pair, it additionally requires a stable artifact reference, both firmware SHA-256 values, and either a CVE or target-family label.
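The acceptance rules above can be sketched as a pure function that returns the reasons a pair would be rejected. This is not the code in scripts/record_pattern_pair_evidence.py: the function name, its parameters, the check-result mappings, and the reading that the vulnerable side "passes" only when all three dynamic checks are true are assumptions made for this example.

```python
import re
from typing import Mapping, Optional

# Check names come from the recorder description above.
DYNAMIC_PROOF_CHECKS = (
    "autopoc_runner_pass",
    "poc_validation_reproducible",
    "verified_chain_pass",
)
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def pair_evidence_problems(
    kind: str,
    vulnerable_checks: Mapping[str, bool],
    control_checks: Optional[Mapping[str, bool]],
    *,
    artifact: Optional[str] = None,
    vulnerable_sha256: Optional[str] = None,
    control_sha256: Optional[str] = None,
    cve: Optional[str] = None,
    target_family: Optional[str] = None,
) -> list[str]:
    """Return reasons to reject the pair; an empty list means it is acceptable."""
    problems: list[str] = []
    # Assumption: the vulnerable side passes when every dynamic check is True.
    if not all(vulnerable_checks.get(n) is True for n in DYNAMIC_PROOF_CHECKS):
        problems.append("vulnerable run did not pass every dynamic proof check")
    if control_checks is None:
        problems.append("control artifacts are missing")
    elif not any(control_checks.get(n) is False for n in DYNAMIC_PROOF_CHECKS):
        # A control that fails only an FPR/non-dynamic check lands here.
        problems.append("control run did not fail any dynamic proof check")
    if kind == "real_firmware_pair":
        if not artifact:
            problems.append("stable artifact reference is required")
        for label, digest in (("vulnerable", vulnerable_sha256), ("control", control_sha256)):
            if not digest or not SHA256_RE.match(digest.lower()):
                problems.append(f"{label} firmware SHA-256 is missing or malformed")
        if not (cve or target_family):
            problems.append("a CVE or target-family label is required")
    return problems
```

Requiring an explicit `False` on the control side, rather than a missing key, keeps absent results from being counted as a fail-closed control.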
As of this update, the original generic cards retain synthetic vulnerable/control pair evidence through scripts/run_aeg_synthetic_pair.py, and netgear_passwordrecovered_auth_bypass carries the first real known-vulnerable/patched firmware pair evidence for Netgear R7000 CVE-2017-5521. Release-level AEG claims should continue to cite the stable pair artifact in docs/pov/netgear-r7000-cve-2017-5521_real_pair.json and rerun python scripts/check_exploit_pattern_evidence.py --require-real-firmware-pair.
For platform-level readiness, run the integrated fail-closed audit instead of checking card counters alone:
./scout aeg-readiness --out docs/pov/aeg_platform_readiness.json
That audit ties the pattern-card aggregate to the stable real-firmware pair report, checks SHA/pattern-family binding, and verifies the vulnerable/pass vs patched/dynamic-fail-closed separation captured by the committed evidence.
Pattern-card and retriever tests are necessary but insufficient. A SCOUT AEG claim requires a completed lab run that passes the dynamic/FP gate in docs/aeg_e2e_validation.md: AutoPoC runner pass, reproducible poc_validation, verified_chain isolation, run-level FPR ceiling, and no high/critical FP verdict for the AEG finding.
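The five gate items listed above can be read as a checklist over one completed lab run. The sketch below encodes that reading; the authoritative definition is docs/aeg_e2e_validation.md. The `LabRunSummary` fields, the `aeg_claim_blockers` name, and the interpretation of the FP-verdict item are all assumptions, and the FPR ceiling is supplied by the caller because this document does not state its value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabRunSummary:
    # Hypothetical field names standing in for the documented gate items.
    autopoc_runner_pass: bool
    poc_validation_reproducible: bool
    verified_chain_isolated: bool
    run_fpr: float
    fp_verdict_severity: Optional[str] = None  # None when no FP verdict exists


def aeg_claim_blockers(run: LabRunSummary, fpr_ceiling: float) -> list[str]:
    """Return the gate items that block an AEG claim for this run."""
    blockers: list[str] = []
    if not run.autopoc_runner_pass:
        blockers.append("AutoPoC runner did not pass")
    if not run.poc_validation_reproducible:
        blockers.append("poc_validation was not reproducible")
    if not run.verified_chain_isolated:
        blockers.append("verified_chain isolation not established")
    if run.run_fpr > fpr_ceiling:
        blockers.append("run-level FPR exceeds the ceiling")
    if (run.fp_verdict_severity or "").lower() in {"high", "critical"}:
        blockers.append("high/critical FP verdict on the AEG finding")
    return blockers
```

Returning the full list of blockers, instead of stopping at the first failure, gives a reviewer the complete picture from a single run.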
For internal red-team operation, promoted patterns may feed controlled weaponization packages only after the same pair evidence is bound to firmware hashes, target preconditions, private package hashes, cleanup evidence, and scope metadata. See controlled_weaponization_layer.md.
Allowed:
- fetch PoC-in-GitHub JSON metadata
- record candidate repo metadata and CVE context
- derive target-independent exploit structure during a separate curation step
- use promoted patterns as high-level tactics for private, scope-bound controlled weaponization packages after AEG pair evidence passes
Forbidden in the SCOUT ingestion/retrieval path:
- cloning public PoC repositories automatically
- executing public PoC code
- placing raw PoC source in the AutoPoC prompt
- copying reference endpoints, credentials, payload literals, or target hosts into generated PoCs