Releases: R00T-Kim/SCOUT
SCOUT v3.0.0-rc1 — Hybrid Analysis Engine
SCOUT v3.0.0-rc1 marks the transition from a binary-centric firmware scanner to a Hybrid Analysis Engine.
This release adds first-class shell script analysis to the existing firmware analysis pipeline, closing a major blind spot in firmware auditing where high-level script logic previously remained under-analyzed compared to ELF binaries.
Highlights
- Integrated `ScriptAnalyzer` into the main SCOUT pipeline
- Expanded analysis coverage from ELF binaries to shell scripts
- Added heuristic detection for insecure `eval`, backticks, and unquoted variable usage
- Updated inventory logic to recursively collect shell scripts
- Unified script findings with the existing report pipeline
- Preserved report usability by avoiding raw heuristic match bloat
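The three heuristics named above (insecure `eval`, backticks, unquoted variables) can be illustrated with a small line-based scanner. This is only a sketch under stated assumptions: the regular expressions, the finding labels, and the `scan_script` function are hypothetical and are not taken from SCOUT's `ScriptAnalyzer`.

```python
import re

# Hypothetical patterns; SCOUT's real ScriptAnalyzer rules are not shown in these notes.
_EVAL = re.compile(r"\beval\b")
_BACKTICK_VAR = re.compile(r"`[^`]*\$[^`]*`")
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"|\'[^\']*\'')
_VAR = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?")


def scan_script(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for lines matching a heuristic."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, comments, and the shebang
        if _EVAL.search(stripped) and "$" in stripped:
            findings.append((lineno, "insecure_eval"))
        if _BACKTICK_VAR.search(stripped):
            findings.append((lineno, "backtick_with_variable"))
        # Drop quoted segments; any variable that remains is unquoted.
        if _VAR.search(_QUOTED.sub("", stripped)):
            findings.append((lineno, "unquoted_variable"))
    return findings


sample = '#!/bin/sh\neval "$CMD"\nout=`ls $DIR`\nrm -rf $TARGET\necho "$SAFE"\n'
for hit in scan_script(sample):
    print(hit)
```

A scanner this simple is noisy by design, which is consistent with the decision below to keep raw matches out of the main dossier.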
Validation
Tested on TP-Link ER605 firmware.
- Processed 1,334 shell scripts
- Manually reviewed TOP 20 script findings
- Confirmed stable merge into unified `report.json`
- Verified high-impact findings including `ipsec` and `acme.sh` command-injection candidates
Design Constraint
SCOUT must process 1,000+ scripts without causing report bloat or analyst fatigue.
Rejected Design
Raw heuristic matches are not stored directly in the main dossier.
Reason:
- High false-positive noise
- Poor signal-to-noise ratio
- Increased report size
- Reduced analyst usability
- Unnecessary performance drag
Confidence
High.
Scope Risk
Broad.
This release changes SCOUT's analysis scope from binary-focused firmware analysis to hybrid binary/script firmware analysis.
Not Tested
- Highly obfuscated custom shell loaders
- Vendor-specific script packing schemes
- Large-scale regression across multiple firmware families
Versioning
- Git tag: `v3.0.0-rc1`
- Python package version: `3.0.0rc1`
v2.8.0 — Exploit Pattern RAG
v2.8.0 introduces Exploit Pattern RAG, a major upgrade to the AutoPoC generation engine.
Key Features
- Knowledge Base: Centralized repository for structured exploit patterns in `data/exploit_references/` (standard-compliant JSON metadata).
- Scoring Retriever: Multi-axis matching engine selecting best-fit patterns based on candidate-target alignment.
- Adaptation Engine: Prompts LLMs to adapt tactical patterns instead of raw code, reducing hallucination.
- Contamination Guard: Automatically detects and blocks target-specific artifact leaks from references into generated PoCs.
- User-Centric Documentation: Reorganized READMEs to emphasize practical use cases, unique advantages, and quick-start guides.
- Zero-Dependency Mandate: Restored pure Python stdlib compatibility by removing external YAML requirements.
See CHANGELOG.md for full details.
SCOUT v2.7.2 — Phase 2C++ detection engine integrity patch
Closes two follow-ups from the v2.4.0 external review (docs/upgrade_plz.md) that had been partially addressed by v2.4.1 but left cosmetic residues in the tree. No pair-eval scorecard movement expected — v2.7.1's 2/5 PASS remains the figure of record.
Changed
Phase 2C++.1 — DECOMPILED_COLOCATED_CAP = 0.45 promoted to a named constant (9488b8b)
The decompiled_colocated taint method previously hardcoded a 0.50 ceiling inline. confidence_caps.py now exposes a 5-tier cap ladder:
| Cap | Value | Evidence level |
|---|---|---|
| `SYMBOL_COOCCURRENCE_CAP` | 0.40 | Symbols co-occur; no code path confirmed |
| `DECOMPILED_COLOCATED_CAP` | 0.45 (new) | Body-text co-occurrence; inline CALLs visible |
| `STATIC_CODE_VERIFIED_CAP` | 0.55 | Decompiled code inspected + LLM taint trace |
| `STATIC_ONLY_CAP` | 0.60 | Static-reference observation ceiling |
| `PCODE_VERIFIED_CAP` | 0.75 | P-code SSA dataflow confirmed |
Consumer impact: decompiled_colocated traces drop 0.50 → 0.45 (-0.05). ROC thresholds previously pinned at 0.50 should be retuned to 0.45 to preserve pre-v2.7.1 recall on that evidence class. priority_score weights and cve_scan's STATIC_CODE_VERIFIED_CAP=0.55 unchanged.
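To make the consumer impact concrete, the sketch below applies the cap ladder to a raw confidence value. The five constants and their values come from the table above; the `METHOD_CAPS` mapping, its keys other than `decompiled_colocated`, and the `cap_confidence` helper are assumptions for illustration and may not match `confidence_caps.py`.

```python
# Values from the release notes' cap ladder.
SYMBOL_COOCCURRENCE_CAP = 0.40
DECOMPILED_COLOCATED_CAP = 0.45
STATIC_CODE_VERIFIED_CAP = 0.55
STATIC_ONLY_CAP = 0.60
PCODE_VERIFIED_CAP = 0.75

# Hypothetical method-name -> cap mapping (only decompiled_colocated is named in the notes).
METHOD_CAPS = {
    "symbol_cooccurrence": SYMBOL_COOCCURRENCE_CAP,
    "decompiled_colocated": DECOMPILED_COLOCATED_CAP,
    "static_code_verified": STATIC_CODE_VERIFIED_CAP,
    "pcode_verified": PCODE_VERIFIED_CAP,
}


def cap_confidence(method: str, raw: float) -> float:
    """Clamp a raw confidence to the ceiling for its evidence class."""
    return min(raw, METHOD_CAPS[method])


capped = cap_confidence("decompiled_colocated", 0.50)
print(capped)          # 0.45
print(capped >= 0.50)  # False: a threshold pinned at 0.50 now drops this trace
print(capped >= 0.45)  # True: retuning the threshold to 0.45 keeps it
```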
Fixed
Phase 2C++.2 — legacy addr_diff > 16 residues removed (36ea517)
Commit 3352783 (v2.4.1, 2026-04-11, 41 minutes after the v2.4.0 P-code taint engine landed) replaced the primary CALL-matching path with callee-name resolution, but left two residues:
- `src/aiedge/ghidra_analysis.py`: a standalone `trace_pcode_forward()` helper inside `_PYGHIDRA_SCRIPT` with its own `diff > 16` gate. Never invoked (the inline Strategy 1 loop at line 525-587 has always been the real path).
- `src/aiedge/ghidra_scripts/pcode_taint.py`: an `else: addr_diff = abs(...)` fallback guarded by `if source_api_name:`. `run()` always passes `source_api_name=source_api` (line 291-294), so the fallback was unreachable at runtime.
Both are now physically removed. _trace_forward_pcode()'s source_api_name parameter is required (no default), formalising the invariant that has held for 13 days. No runtime behaviour change — the production paths have done callee-name matching since v2.4.1.
New guard-rail tests in tests/test_ghidra_dead_code_removed.py pin the removal so grep-based review no longer finds a false-positive match for the offset heuristic.
Why no re-measurement
- Gap B was runtime-effective since v2.4.1.
- Gap C's new ceiling only binds on `decompiled_colocated`, which is emitted solely by the pyghidra fallback (`ghidra_analysis.py:609`). Environments with Ghidra 12 + `analyzeHeadless` on PATH exercise the primary script path and rarely hit the fallback.
- Gate 1/2/3 FAIL is driven by `findings.py`'s single-synthesis selection bottleneck; a 0.05 shift on a rarely-emitted method cannot cross those thresholds.
The full rationale, including git-blame trace and the deferred Gap A (interprocedural taint) decision point, is in docs/v2.7.2_release_plan.md.
Verification
- `pytest -q` full regression green
- `ruff check src/ tests/` clean
- `pyright src/` 0 errors
- `scripts/check_doc_consistency.py` OK
Pivot Option D unchanged
v2.7.2 is a half-day hygiene release, not a behavioural pivot. The compliance_report stage and four standard mappings shipped in v2.7.0 are unchanged. Phase 3'.2 CRA Audit SaaS v0.0 internal alpha still starts in 2026-05.
v2.7.1 — Phase 2C+.4 vendor corpus expansion (1/5 → 2/5 PASS)
Phase 2C+.4 vendor corpus expansion — quantitative refinement of v2.7.0's scenario C
v2.7.1 closes Phase 2C+.4 by extending the pair-eval corpus from 7 to 12 vendor/model pairs: D-Link DIR-859 (CVE-2019-17621), D-Link DIR-878 (vendor advisory), ASUS RT-AC68U (CVE-2020-15498), Linksys WRT1900AC v2 (progression), and Linksys EA6700 (progression). Phase 2D' Entry Gate scorecard transitions from 1/5 PASS → 2/5 PASS.
Phase 2D' Entry Gate scorecard (FINAL, WRT1900AC v2 ok measurement of record)
| Gate | Threshold | v2.7.0 (7-pair) | v2.7.1 FINAL (12-pair) | Verdict |
|---|---|---|---|---|
| 1 Recall | ≥ 0.40 | 0.1429 | 0.1667 (+17% rel) | ❌ FAIL |
| 2 Tier variation | ≥ 2 nonzero TP tiers | 1 (`symbol_only`) | 1 (back-slide) | ❌ FAIL |
| 3 Finding diversity | < 0.50 | 1.000 | 0.917 | ❌ FAIL |
| 4 Dedicated rerun | ≥ 1/N | 14/14 (Codex) | 14/14 + 12/12 (`--no-llm`) | ✅ PASS |
| 5 Corpus | ≥ 10 | 7 | 12 | ✅ PASS |
The +1 net gain over v2.7.0 (1/5 → 2/5) comes from Gate 5 (Corpus), which clears by manifest registration alone. Gate 1 absolute recall improves but stays well below threshold; the new TP/FP pair (DIR-859 vuln + patched both hit `aiedge.findings.web.exec_sink_overlap`) corroborates the v2.7.0 diagnosis that `findings.py`'s single-synthesis-finding selection is the structural Gate 1/3 limit.
Honest figure-of-record protocol
An intermediate 1st-pass measurement under partial WRT1900AC v2 extractions (1200-second budget) transiently showed Gate 2 PASS due to `aiedge.findings.analysis_incomplete` populating the `unknown` tier. The 2400-second budget rerun ok-state measurement reverts Gate 2 to FAIL — partial-extraction artifacts can falsely populate Gate-2 tiers, so the ok-state measurement is the figure of record. See `docs/v2.7.1_release_plan.md` for the full measurement history and Gate Diagnosis Matrix.
Pivot Option D unchanged
v2.7.1 is a quantitative refinement of v2.7.0's scenario C, not a re-pivot. The compliance-led identity remains primary. The `compliance_report` stage and four standard mappings (CRA Annex I / FDA Section 524B / ISO/SAE 21434 / UN R155) shipped in v2.7.0 are unchanged.
Notable fixes
- `scripts/score_pair_corpus.py` graceful-skip for missing pair runs — no more `StopIteration` crashes when scoring corpus growth or partial-coverage measurements; missing rows are recorded as `vulnerable_status="missing"` / `patched_status="missing"` and excluded from recall/FPR denominators.
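The graceful-skip behaviour can be sketched as follows. The `vulnerable_status` / `patched_status` fields and the `"missing"` value come from the note above; the `vulnerable_hit` / `patched_hit` fields, the `score_pairs` function, and the output shape are hypothetical, and the real `scripts/score_pair_corpus.py` may differ.

```python
def score_pairs(pairs: list[dict]) -> dict:
    """Compute recall and FPR, leaving missing runs out of the denominators."""
    vuln_rows = [p for p in pairs if p["vulnerable_status"] != "missing"]
    patched_rows = [p for p in pairs if p["patched_status"] != "missing"]

    true_positives = sum(1 for p in vuln_rows if p.get("vulnerable_hit", False))
    false_positives = sum(1 for p in patched_rows if p.get("patched_hit", False))

    return {
        "pairs": len(pairs),
        "missing": len(pairs) - min(len(vuln_rows), len(patched_rows)),
        # None instead of a crash or a division by zero when nothing was measured.
        "recall": true_positives / len(vuln_rows) if vuln_rows else None,
        "fpr": false_positives / len(patched_rows) if patched_rows else None,
    }


corpus = [
    {"pair": "A", "vulnerable_status": "ok", "patched_status": "ok",
     "vulnerable_hit": True, "patched_hit": False},
    {"pair": "B", "vulnerable_status": "ok", "patched_status": "ok",
     "vulnerable_hit": False, "patched_hit": False},
    # Registered in the manifest but not yet run: recorded, not scored.
    {"pair": "C", "vulnerable_status": "missing", "patched_status": "missing"},
]
print(score_pairs(corpus))
```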
Known issues (out of v2.7.x scope)
- DIR-878 partial extraction — SHRS-encrypted inner `.bin` not yet decrypted. `vendor_decrypt.py` extension is a follow-on task.
- Gate 1/2/3 structural limit — `findings.py` single-synthesis-finding selection is the root cause; resolution belongs to the external detection-engine track, not v2.7.x scope.
Full changelog
See `CHANGELOG.md` for the complete `[2.7.1]` section.
SCOUT v2.7.0 — Phase 2C+ close-out + scenario C sealed
Phase 2C+ close-out release. Pivot 2026-04-19 roadmap's detection-strengthening insert is merged (LATTE backward slicing, LARA pattern-based source identification, sink coverage expansion, finding-diversity release gate), with a follow-up ascii_strings wire-through fix that resurrected the inert LARA axis. The compliance-led track ships its Phase 3'.1 suite: four per-standard mappings (CRA Annex I / FDA Section 524B / ISO/SAE 21434 / UN R155) and the new compliance_report pipeline stage.
Reviewer-lane Official Measurement (14/14, Codex LATTE-on, 2026-04-20 13:33 KST)
| Gate | Threshold | Result | Verdict |
|---|---|---|---|
| 1 Detection recall | ≥ 0.40 | 0.1429 (identical to baseline) | FAIL |
| 2 Evidence tier variation | ≥ 2 nonzero TP tiers | 1 | FAIL |
| 3 Finding diversity | < 0.5 | 1.000 (14/14 on single synthesis finding id) | FAIL |
| 4 Dedicated rerun | ≥ 1/N success | 14/14 | PASS |
| 5 Pair corpus size | ≥ 10 | 7 | FAIL |
scripts/release_gate.sh → RELEASE_GOVERNANCE=FAIL. score_pair_corpus.py --pairs benchmarks/pair-eval/pairs.json → {"pairs": 7, "recall": 0.14285714, "fpr": 0.14285714}.
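As a worked check of the scorecard arithmetic, the sketch below encodes the five thresholds from the table and counts passes. The `evaluate_gates` function and its parameter names are assumptions for illustration; it is not the logic of `scripts/release_gate.sh`.

```python
def evaluate_gates(recall: float, tp_tiers: int, diversity: float,
                   rerun_successes: int, corpus_pairs: int) -> dict[str, bool]:
    """Apply the Phase 2D' Entry Gate thresholds from the scorecard."""
    return {
        "1 recall": recall >= 0.40,
        "2 tier variation": tp_tiers >= 2,
        "3 finding diversity": diversity < 0.50,
        "4 dedicated rerun": rerun_successes >= 1,
        "5 corpus size": corpus_pairs >= 10,
    }


# v2.7.0 official measurement (7-pair corpus).
v270 = evaluate_gates(0.1429, 1, 1.000, 14, 7)
# v2.7.1 figure of record (12-pair corpus), from the v2.7.1 scorecard.
v271 = evaluate_gates(0.1667, 1, 0.917, 14, 12)

print(sum(v270.values()), "of 5 PASS")  # 1 of 5 PASS
print(sum(v271.values()), "of 5 PASS")  # 2 of 5 PASS
```

The only gate that changes between the two calls is corpus size, which matches the release notes' statement that the net gain comes from Gate 5 alone.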
Scenario C Sealing
The 2C+ workstream (LARA source expansion 0 → 21-86 hits/run, LATTE slicing opt-in, sink coverage 28 → 51+, diversity gate enforcement) did not move Gate 1 or Gate 3 because findings.py's primary-finding selection always routes vulnerability evidence through the single synthesis-stage id aiedge.findings.web.exec_sink_overlap. Per the pivot document's scenario C, option D is adopted: Phase 2D' is deferred and SCOUT fully pivots to the compliance-led identity.
Gate-by-gate remediation paths (root cause / required work / track / timeline) are fixed in docs/v2.7.0_release_plan.md "Gate Diagnosis Matrix":
- Gate 1 recall — external track (detection-engine redesign, 6-12mo+)
- Gate 2 tier variation — external track (P-code engine robustness)
- Gate 3 diversity — Phase 3' research track (Option C evidence-level metric redefinition)
- Gate 4 rerun — DONE (becomes the operational backbone for Phase 3'.2 CRA Audit SaaS job queue)
- Gate 5 corpus — v2.7.1 scope (2C+.4 vendor-extraction expansion, 1-2 weeks)
What's New
- LATTE backward slicing (Phase 2C+.1): opt-in via `AIEDGE_LATTE_SLICING=1`, 32 tests
- LARA URI/CGI/config-key source identification (Phase 2C+.2): 50 patterns, with `ascii_strings` wire-through fix on D-Link httpd (0 → 10-33 matches per firmware)
- Sink coverage 28 → 51+ (Phase 2C+.3): full CWE taxonomy (78/22/426/732/377/250/269/454)
- Finding diversity gate (Phase 2C+.5): `PAIR_EVAL_DIVERSITY` release sub-gate + pair-eval timeout diagnostic
- Compliance mapping suite (Phase 3'.1 B-1~B-4): four standards + `compliance_report` stage (43rd pipeline stage)
- Reviewer-lane instrumentation scripts: sequential launcher, watcher handoff, codex/claude LATTE-on launchers
Next Steps (v2.7.1)
- 2C+.4 Vendor extraction chain expansion (DIR-859 / RT-AC68U / WRT1900ACS / DIR-878 + 1) → corpus 7 → 10+, Gate 5 resolved
- 3'.1 step B-5 release tag bundling
- Phase 3'.2 CRA Audit SaaS v0.0 internal alpha kickoff (see `wiki/projects/scout-cra-audit-saas-scope.md`)
Full changelog: CHANGELOG.md
🤖 Release prepared with Claude Code
v2.6.1
[2.6.1] — 2026-04-17
Phase 2C close-out release. This point release rolls up the post-v2.6.0 foundation hardening work, publishes the fresh corpus refresh baseline, and documents the semantic / driver caveats that were previously implicit.
Added
- Fresh corpus refresh baseline (`docs/carry_over_benchmark_v2.6.md`, `benchmark-results/2c6-fresh-full-final/aggregate.json`, `scripts/aggregate_corpus_metrics.py`). The 1,123-target refresh is now published as a best-view aggregate across the fresh rerun waves. Final outcome: 1110 success / 4 partial / 9 fatal; successful runs are `extraction=ok` 1110/1110, `inventory=sufficient` 1110/1110, `nonzero findings` 1110/1110, `nonzero CVE` 1089/1110.
- LLM driver degradation matrix (`docs/llm_driver_degradation_matrix.md`). Documents the actual contract differences between Codex CLI, Claude API, Claude Code CLI, and Ollama, especially around system-prompt delivery and temperature handling.
- Confidence semantic break note (`docs/confidence_semantic_break_v2.6.md`). Makes the v2.5.x → v2.6+ shift explicit: `confidence` is now evidence-only; `priority_score` / `priority_inputs` carry ranking semantics.
Changed
- README / README.ko baseline messaging. Tier 1 hero numbers now point at the fresh v2.6.1 corpus refresh, while Tier 2 remains explicitly carry-over until the pair-eval lane lands. The over-broad "False negative rate ≈ 0%" phrasing is replaced with a pending pair-eval note.
- Analyst copilot wording. Public docs now split the surface into `Explainability surface`, `Analyst-in-the-loop channel`, and `Autonomous reasoning (future)` instead of presenting all LLM-related behavior as one undifferentiated capability.
- Release governance helper (`scripts/release.sh`). The helper is upgraded from a README-only version bumper into a release close-out utility that can synchronize pyproject, README badges, and CHANGELOG headers in dry-run/apply modes.
Fixed
- Synthesis finding reasoning trail inheritance (`findings.py`). Top-level synthesis findings such as `aiedge.findings.web.exec_sink_overlap` now inherit matched downstream evidence lineage instead of relying only on the stage-level aggregate summary. Matching prefers run-relative binary path, falls back to binary SHA-256, emits a `findings/synthesis_match` summary entry, and appends a deterministic top-K sample of representative downstream trail entries.
- SBOM stage silent schema mismatch (`sbom.py`). Vendor-stock firmware no longer silently returns 0 components because of stale `inventory.file_list` / `string_hits` assumptions. The stage now walks `inventory.roots` directly and falls back to direct binary reads via `_extract_ascii_runs`.
- Relative `runs_root` handling in `create_run()` (`run.py`). `runs_root` is resolved before path derivation so relative output roots still wire absolute firmware paths into extraction; regression coverage lives in `tests/test_create_run_relative_runs_root.py`.
Verification
- `python3 -m py_compile scripts/aggregate_corpus_metrics.py`
- `python3 scripts/check_doc_consistency.py`
- fresh corpus aggregate regenerated from `benchmark-results/2c6-fresh-full-v2*` waves
- representative firmware smoke coverage retained from 2C.1–2C.5 (R7000 lineage / SBOM pilot / verified-chain provenance)
v2.6.0 — Phase 2B: Analyst Copilot + DAG Parallel + Calibration
Phase 2B Release
SCOUT v2.6.0 delivers three axes of change that position it as a single-firmware analyst copilot: performance (DAG parallelization PoC), analyst UX (reasoning trail + MCP override loop), and honest confidence calibration (detection vs priority separation).
Merged via PR #6 (rebase) as 6 atomic commits — any one could ship independently.
1. DAG Parallelization PoC (PR #10)
- New `stage_dag.py` with manual `STAGE_DEPS` (42 entries) + Kahn `topo_levels()` (15 levels, max-width 7)
- `run_stages_parallel()`: ThreadPoolExecutor level-wise submit, skip-on-failed-dep, `fail_fast=True/False` modes. Sequential `run_stages()` unchanged
- New CLI flag: `--experimental-parallel [N]` (default 4 workers) on both `analyze` and `stages` subcommands
- `ProgressTracker(out_of_order=True)` for completion-order rendering in parallel mode
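The level-wise scheduling described above can be sketched with Kahn's algorithm and a thread pool. This is a minimal illustration, assuming a toy five-stage dependency map; the function bodies, the status strings, and the example graph are hypothetical, the real `STAGE_DEPS` has 42 entries, and the `fail_fast` modes are omitted here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def topo_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Kahn's algorithm, grouped into levels whose stages can run together."""
    indegree = {stage: len(d) for stage, d in deps.items()}
    dependents: dict[str, list[str]] = {stage: [] for stage in deps}
    for stage, stage_deps in deps.items():
        for dep in stage_deps:
            dependents[dep].append(stage)

    levels = []
    level = sorted(s for s, n in indegree.items() if n == 0)
    while level:
        levels.append(level)
        ready = []
        for stage in level:
            for nxt in dependents[stage]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        level = sorted(ready)
    if sum(len(lv) for lv in levels) != len(deps):
        raise ValueError("dependency cycle detected")
    return levels


def run_stages_parallel(deps: dict[str, set[str]], run: Callable[[str], None],
                        max_workers: int = 4) -> dict[str, str]:
    """Submit each level to the pool; skip stages whose dependencies did not succeed."""
    status: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in topo_levels(deps):
            futures = {}
            for stage in level:
                if all(status.get(dep) == "ok" for dep in deps[stage]):
                    futures[stage] = pool.submit(run, stage)
                else:
                    status[stage] = "skipped"
            for stage, future in futures.items():
                try:
                    future.result()
                    status[stage] = "ok"
                except Exception:
                    status[stage] = "failed"
    return status


# Toy dependency map (hypothetical); every dependency must also be a key.
example_deps = {
    "extraction": set(),
    "inventory": {"extraction"},
    "sbom": {"inventory"},
    "findings": {"inventory"},
    "report": {"sbom", "findings"},
}
print(topo_levels(example_deps))
```

Grouping by level keeps the scheduler simple at the cost of some idle workers: a stage waits for its whole level to finish even if its own dependencies completed earlier.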
```bash
# Opt-in parallel execution
./scout analyze firmware.bin --experimental-parallel 4
```
2. Reasoning Trail Persistence (PR #11 + PR #13)
New reasoning_trail.py module captures structured ReasoningEntry records for LLM-driven finding adjustments. The adversarial_triage debate loop now records advocate / critic / decision entries with llm_model and 200-char raw_response_excerpt. The fp_verification pattern matcher records sanitizer / non-propagating / sysfile hits with per-pattern delta.
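The excerpt cap can be enforced inside the record itself so that no call site can store a longer string. The sketch below assumes a plain dataclass; `llm_model` and `raw_response_excerpt` are named in the notes, while the `stage` and `role` fields and the `EXCERPT_LIMIT` constant are hypothetical and the real `ReasoningEntry` may carry different fields.

```python
from dataclasses import dataclass

EXCERPT_LIMIT = 200  # characters, per the release notes


@dataclass
class ReasoningEntry:
    stage: str                 # hypothetical field, e.g. "adversarial_triage"
    role: str                  # hypothetical field, e.g. "advocate" / "critic" / "decision"
    llm_model: str
    raw_response_excerpt: str

    def __post_init__(self) -> None:
        # Truncate at construction time so every caller gets the same cap.
        if len(self.raw_response_excerpt) > EXCERPT_LIMIT:
            self.raw_response_excerpt = self.raw_response_excerpt[:EXCERPT_LIMIT]


entry = ReasoningEntry(
    stage="adversarial_triage",
    role="critic",
    llm_model="example-model",
    raw_response_excerpt="x" * 1000,
)
print(len(entry.raw_response_excerpt))  # 200
```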
All three analyst surfaces expose the trail:
- Web viewer: collapsible `<details>` section with CSS styling
- Analyst markdown report: numbered "Reasoning Trail (N steps)" subsection per finding
- TUI finding detail: `render_finding_detail_with_trail()` (`AIEDGE_TUI_ASCII`-compatible)
SARIF properties bag gains scout_reasoning_trail.
3. MCP Analyst Tools (PR #12)
4 new MCP tools for analyst-driven feedback:
| Tool | Purpose |
|---|---|
| `scout_get_finding_reasoning` | Fetch full reasoning trail for a finding |
| `scout_inject_hint` | Push an analyst hint into the feedback registry |
| `scout_override_verdict` | Force-set a finding verdict (confirmed / false_positive / wont_fix / needs_info) |
| `scout_filter_by_category` | Filter findings by vulnerability / misconfiguration / pipeline_artifact |
terminator_feedback.py extended with add_analyst_hint / get_analyst_hints / set_verdict_override (fcntl.flock-safe, assert_under_dir enforced). The adversarial_triage advocate prompt now reads analyst hints from AIEDGE_FEEDBACK_DIR and prefixes them priority-sorted — opt-in; byte-identical behavior when env var unset.
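A lock-protected hint registry along these lines could look like the sketch below. It is an assumption-heavy illustration: the function names match the notes, but the signatures, the `analyst_hints.jsonl` file name, the record layout, and the path check standing in for `assert_under_dir` are all hypothetical. `fcntl` is available on Unix-like systems only.

```python
import fcntl
import json
from pathlib import Path


def _hints_path(feedback_dir: str) -> Path:
    """Resolve the registry file and refuse paths that escape the feedback dir."""
    root = Path(feedback_dir).resolve()
    target = (root / "analyst_hints.jsonl").resolve()  # hypothetical file name
    if not target.is_relative_to(root):  # stand-in for assert_under_dir
        raise ValueError(f"{target} is outside {root}")
    return target


def add_analyst_hint(feedback_dir: str, finding_id: str, hint: str, priority: int = 0) -> None:
    """Append one hint under an exclusive advisory lock."""
    target = _hints_path(feedback_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            record = {"finding_id": finding_id, "hint": hint, "priority": priority}
            fh.write(json.dumps(record) + "\n")
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def get_analyst_hints(feedback_dir: str) -> list[dict]:
    """Read all hints under a shared lock, highest priority first."""
    target = _hints_path(feedback_dir)
    if not target.exists():
        return []
    with open(target, "r", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)
        try:
            hints = [json.loads(line) for line in fh if line.strip()]
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return sorted(hints, key=lambda h: -h["priority"])
```

Sorting on read mirrors the "priority-sorted" prefixing described above without requiring writers to keep the file ordered.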
4. Detection vs Priority Calibration (PR #15)
Closes external reviewer critique that EPSS-additive confidence made SCOUT's confidence field look like a ranking heuristic instead of a true-positive probability.
New scoring.py with:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityInputs:
    detection_confidence: float
    epss_score: float | None
    epss_percentile: float | None
    reachability: str | None
    backport_present: bool
    cvss_base: float | None


def compute_priority_score(inputs: PriorityInputs) -> float:
    # Weights: detection 50% / EPSS 25% / reach 15% / CVSS 10%
    # Backport: -0.20 penalty
    ...  # body not shown in these release notes
```
`cve_scan.py:1140-1170` refactored: `confidence` is now strictly capped at `STATIC_CODE_VERIFIED_CAP=0.55` (static evidence only). EPSS / reachability / backport / CVSS now feed `priority_score` instead.
New doc: docs/scoring_calibration.md with a before/after worked example.
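One way to read the stated weights (detection 50%, EPSS 25%, reachability 15%, CVSS 10%, backport penalty 0.20) is as a clamped weighted sum. The sketch below is not the `scoring.py` implementation: the reachability labels and their numeric mapping, the treatment of missing inputs as neutral or zero, the CVSS normalisation by 10, and the final clamp to [0, 1] are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriorityInputs:
    detection_confidence: float
    epss_score: Optional[float]
    epss_percentile: Optional[float]
    reachability: Optional[str]
    backport_present: bool
    cvss_base: Optional[float]


# Hypothetical reachability labels and weights; the real values are not in the notes.
_REACHABILITY = {"reachable": 1.0, "unknown": 0.5, "unreachable": 0.0}


def compute_priority_score(inputs: PriorityInputs) -> float:
    epss = inputs.epss_score if inputs.epss_score is not None else 0.0
    reach = _REACHABILITY.get(inputs.reachability or "unknown", 0.5)
    cvss = (inputs.cvss_base or 0.0) / 10.0  # scale CVSS 0-10 down to 0-1

    score = (
        0.50 * inputs.detection_confidence
        + 0.25 * epss
        + 0.15 * reach
        + 0.10 * cvss
    )
    if inputs.backport_present:
        score -= 0.20  # backported packages are less likely to be exploitable
    return max(0.0, min(1.0, score))


example = PriorityInputs(
    detection_confidence=0.55, epss_score=0.20, epss_percentile=0.90,
    reachability="reachable", backport_present=False, cvss_base=9.8,
)
# 0.50*0.55 + 0.25*0.20 + 0.15*1.0 + 0.10*0.98 = 0.573
print(round(compute_priority_score(example), 3))
```

Keeping `detection_confidence` as only one term of the sum is what lets `confidence` stay capped at 0.55 while the ranking still reflects exploitability signals.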
5. Extraction Failure Analyst Guidance (PR #14)
When extraction fails, SCOUT now emits a structured guidance block pointing the analyst at concrete next steps:
Detected encryption (entropy 7.95/8.0). Possible vendor decryption needed.
Suggested actions:
1. Check vendor_decrypt.py for known vendor formats and add a handler for this firmware.
2. Provide a pre-extracted rootfs: ./scout analyze firmware.bin --rootfs /path/to/extracted
3. Try binwalk v3 entropy mode / alternative extractor: binwalk --entropy firmware.bin
4. File an issue with the first 4 KB hex dump: xxd firmware.bin | head -64
Hint: docs/runbook.md#extraction-failure
New docs/runbook.md#extraction-failure section with symptoms/causes/remediation table.
Verification
| Metric | v2.5.0 | v2.6.0 |
|---|---|---|
| pytest | 865 | 1027 passed, 1 skipped (+162) |
| pyright | 0 errors | 0 errors, 0 warnings (baseline preserved) |
| ruff | clean | clean |
| CI | 5/5 green | 5/5 green |
New test distribution: reasoning_trail 20 · extraction_guidance 18 · mcp_analyst_tools 33 · stage_dag 14 · run_stages_parallel 14 · scoring 19 · reasoning_trail_viewer 44
R7000 smoke test (PR #15): 3 findings, all carrying priority_score + priority_inputs. cve_confidence_above_0.55_cap = 0 (detection cap enforced). priority_bucket_counts = {critical: 0, high: 0, medium: 3, low: 0}.
Design Invariants Preserved
- Additive-only on `findings.py` (PR #7a pattern continues for `category`, now `reasoning_trail`, `priority_score`, `priority_inputs`). No report schema version bump. All 7 downstream consumers untouched.
- Sequential `run_stages()` bit-identical to pre-PR state.
- `StageContext` frozen invariant preserved (thread-safe sharing without locks).
- All file writes continue through `assert_under_dir()`.
- v2.5.0 LLM driver contracts (system_prompt, temperature, 5-stage parser) unchanged.
- 200-char `raw_response_excerpt` cap enforced at `ReasoningEntry.__post_init__` (cannot be bypassed by call sites).
Deferred to Phase 2C
- Wall-clock comparison sequential vs `--experimental-parallel 4` on R7000 (PoC merged, real firmware measurement pending)
- Analyst hint injection loop smoke with real LLM driver + `AIEDGE_FEEDBACK_DIR`
- Benchmark freeze rule enforcement (`check_baseline_metadata` wiring requires fresh corpus baseline)
Upgrade Notes
Breaking: confidence field semantics changed for CVE findings — it is now strictly detection evidence (≤0.55). Consumers previously ranking by confidence should migrate to priority_score. See docs/scoring_calibration.md for the full contract.
Everything else is additive; existing consumers that ignore the new fields continue to work unchanged.
Full diff: v2.5.0...v2.6.0
PR #6: #6
🤖 Generated with Claude Code
SCOUT v2.5.0 — Structured LLM + EPSS + Observability + CRA Compliance
Strategic Roadmap Phase 1 implementation. Based on 30+ academic papers (LATTE, Operation Mango, HouseFuzz, VulnSage) and competitive analysis (Theori Xint, FirmAgent, EU CRA).
Highlights
LLM structured output (parse failure 100% → 0%)
- New `llm_prompts.py`: 7 role-based system prompts (ADVOCATE/CRITIC/TAINT/CLASSIFIER/REPAIR/SYNTHESIS) + temperature constants
- LLMDriver Protocol: `system_prompt` + `temperature` parameters wired into all 4 drivers (Codex, Claude API, Claude Code CLI, Ollama)
- 5-stage JSON parser: preamble strip → fence → raw → brace-counting → common error fix; optional `required_keys` schema validation
- All LLM-using stages updated: `adversarial_triage`, `taint_propagation`, `semantic_classifier`
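The five parser stages listed above can be read as an ordered list of candidate strings, each tried with `json.loads` until one yields a usable object. The sketch below is one plausible interpretation, not SCOUT's parser: what each stage does in detail, the trailing-comma repair chosen as the "common error fix", and the `parse_llm_json` name are assumptions.

```python
import json
import re

_TICKS = "`" * 3  # built at runtime so this example contains no literal fence
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def _first_balanced_object(text: str):
    """Return the first brace-balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


def parse_llm_json(text: str, required_keys=None):
    stripped = text.strip()
    candidates = []
    brace = stripped.find("{")
    if brace > 0:
        candidates.append(stripped[brace:])          # 1. preamble strip
    fenced = _FENCE.search(stripped)
    if fenced:
        candidates.append(fenced.group(1).strip())   # 2. fenced block
    candidates.append(stripped)                      # 3. raw text
    balanced = _first_balanced_object(stripped)
    if balanced:
        candidates.append(balanced)                  # 4. brace counting
        candidates.append(_TRAILING_COMMA.sub(r"\1", balanced))  # 5. common error fix

    for candidate in candidates:
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if not isinstance(obj, dict):
            continue
        if required_keys and not set(required_keys).issubset(obj):
            continue  # optional schema validation
        return obj
    return None


reply = "Sure, here is the verdict:\n" + _TICKS + 'json\n{"verdict": "fp", "score": 0.2,}\n' + _TICKS
print(parse_llm_json(reply, required_keys=["verdict"]))
```

In the example reply, the first four stages all fail on the trailing comma and only the repair stage succeeds, which is the kind of case a layered parser is meant to absorb.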
Sink expansion (taint_propagation.py)
- `_SINK_SYMBOLS`: 11 → 28 (memcpy, memmove, strcat, strncpy, gets, vsprintf, printf family, scanf family, dlopen, realpath)
- `_FORMAT_STRING_SINKS` + `_is_format_string_variable()` helper for variable-controlled format string detection
EPSS scoring (cve_scan.py)
- FIRST.org API integration with batched queries (30 IDs/request)
- Per-run + cross-run cache
- Confidence adjustment by EPSS percentile (≥0.10: +0.10, ≥0.01: +0.05, <0.001: -0.05)
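The batching and the percentile-based adjustment can be sketched without any network call. The batch size of 30 and the three thresholds come from the list above; the function names and the choice to leave confidence unchanged between 0.001 and 0.01 are assumptions, since the notes do not cover that band. Note that v2.6.0 later moved EPSS out of `confidence` and into `priority_score`.

```python
def batch_cve_ids(cve_ids: list[str], size: int = 30) -> list[list[str]]:
    """Split CVE IDs into request-sized batches (30 IDs per request in v2.5.0)."""
    return [cve_ids[i:i + size] for i in range(0, len(cve_ids), size)]


def epss_confidence_delta(percentile: float) -> float:
    """Map an EPSS percentile to the v2.5.0 confidence adjustment."""
    if percentile >= 0.10:
        return 0.10
    if percentile >= 0.01:
        return 0.05
    if percentile < 0.001:
        return -0.05
    return 0.0  # assumption: the unlisted 0.001-0.01 band leaves confidence unchanged


ids = [f"CVE-2024-{n:04d}" for n in range(65)]  # placeholder IDs for illustration
print([len(batch) for batch in batch_cve_ids(ids)])  # [30, 30, 5]
print(epss_confidence_delta(0.50), epss_confidence_delta(0.0005))  # 0.1 -0.05
```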
Observability
- Separate `parse_failures` vs `llm_call_failures` counters in `adversarial_triage` and `fp_verification`
- LLM failure classification helpers: `quota_exhausted`, `driver_unavailable`, `driver_nonzero_exit`
CI/CD & Compliance
- New `.github/actions/scout-scan/`: composite GitHub Action with SARIF upload to GitHub Security tab
- `docs/cra_compliance_mapping.md`: EU Cyber Resilience Act Annex I 12 essential requirements mapped to SCOUT outputs
- `docs/strategic_roadmap_2026.md`: 3-Phase plan (v2.5 → v3.0 → v4.0)
Bug Fixes
- CVE scan signature-only path: removed early `return` so signature-only matches go through the same enrichment / finding-candidate pipeline as NVD matches
- CVE scan `comp` variable bug: backport confidence adjustment now uses per-match component metadata instead of leaked outer loop variable
- Semantic classifier batch size: reduced from 50 → 15 functions per LLM call to prevent JSON schema loss in long contexts
R7000 Verification (Netgear, 31MB, codex driver, 2026-04-13)
| Metric | Pre-v2.5 (1211 run) | v2.5.0 (1320 run) |
|---|---|---|
| `adversarial_triage parse_failures` | 100/100 | 0/100 |
| `adversarial_triage parsed_ok` | 0/100 | 100/100 |
| `fp_verification unverified` | 97/100 | 0/100 |
| `fp_verification true_positives` | 1 | 57 |
| `fp_verification false_positives` | 2 | 43 |
| `cve_scan EPSS enriched` | 0/23 | 23/23 |
Adversarial debate: 100 debated → 99 downgraded (FP) + 1 maintained (TP)
Run: `aiedge-runs/2026-04-12_1320_sha256-b28bf08e9d2c`
Stats
- 21 files changed
- 3,695 insertions / 792 deletions
- 7 new files
See CHANGELOG.md for full details.
SCOUT v2.4.1
Terminator Re-evaluation Fixes
After v2.4.0, Terminator identified 3 issues. All addressed:
Confidence Calibration
- `decompiled_colocated`: 0.60 → 0.45 (0.50 for high-risk sinks)
- Separate caps per method: pcode_colocated 0.65, decompiled_colocated 0.50, decompiled_interprocedural 0.60
addr_diff Removal
- Replaced fragile `addr_diff > 16` address matching with callee name resolution via `resolve_call_target()`
- Robust against compiler optimizations and instruction alignment differences
Interprocedural Taint (Strategy 4)
- Cross-function source→sink detection using xref call graph
- Caller with source API calls callee with sink API → `decompiled_interprocedural` trace
- 1-hop depth limit to control false positives
- Verified: `fread` → `vsprintf` across `FUN_00012514` → `FUN_00011fe0` in RT-AX88U
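The 1-hop rule can be illustrated on a toy call graph: a trace is emitted when a function that calls a source API also calls a function that itself calls a sink API. The data layout, the `interprocedural_traces` function, and the trace dictionary shape are assumptions for illustration; only the function names, the API pair, and the `decompiled_interprocedural` label come from the notes.

```python
def interprocedural_traces(calls: dict[str, set[str]],
                           sources: set[str], sinks: set[str]) -> list[dict]:
    """Find caller -> callee pairs where the caller reads a source and the callee hits a sink.

    `calls` maps each function to the names it calls (APIs and local functions).
    Depth is limited to one hop: callees of callees are not followed.
    """
    traces = []
    for caller in sorted(calls):
        caller_sources = calls[caller] & sources
        if not caller_sources:
            continue
        for callee in sorted(calls[caller]):
            callee_sinks = calls.get(callee, set()) & sinks
            for source in sorted(caller_sources):
                for sink in sorted(callee_sinks):
                    traces.append({
                        "method": "decompiled_interprocedural",
                        "caller": caller,
                        "callee": callee,
                        "source": source,
                        "sink": sink,
                    })
    return traces


# Call graph shaped like the RT-AX88U case from the release notes.
graph = {
    "FUN_00012514": {"fread", "FUN_00011fe0"},
    "FUN_00011fe0": {"vsprintf"},
}
for trace in interprocedural_traces(graph, sources={"fread"}, sinks={"vsprintf"}):
    print(trace)
```

Stopping at one hop trades recall for precision: deeper chains are missed, but each reported pair has a direct call edge behind it.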
| Metric | v2.4.0 | v2.4.1 |
|---|---|---|
| Total taint (RT-AX88U) | 15 | 16 |
| Interprocedural traces | 0 | 1 |
| decompiled_colocated conf | 0.60 | 0.45 |
🤖 Generated with Claude Code
SCOUT v2.4.0
Detection Engine Upgrade
Driven by Terminator's evaluation of ASUS RT-AX88U findings — "framework is top-tier, detection core needs depth" — this release significantly upgrades SCOUT's vulnerability detection capabilities.
Highlights
- Ghidra P-code Taint Analysis: 3-strategy dataflow tracing replaces symbol co-occurrence: P-code SSA forward taint → P-code colocated → decompiled body analysis
- 3-Tier Confidence System: `PCODE_VERIFIED_CAP = 0.75` joins existing co-occurrence (0.40) and code-verified (0.55) tiers
- 4 New Rule Families: SQL injection, format string, path traversal, SSRF detection (9 regex patterns across PHP/Python/C/shell)
- CGI Handler Detection: Ghidra string_refs extraction of `do_*_cgi` function names as source endpoints
- SBOM Backport Detection: opkg patch revision parsing with -0.30 confidence for backported packages
- Handoff Schema: `firmware_handoff.json` now includes adversarial triage schema reference for downstream consumers
Verified
| Metric | v2.3.0 | v2.4.0 | Change |
|---|---|---|---|
| Taint results (RT-AX88U) | 10 | 15 | +50% |
| Max confidence | 0.40 | 0.60 | +50% |
| Ghidra-verified traces | 0 | 5 | New |
| Sanitizer detection | N/A | 2 detected | New |
| Detection rule families | 5 | 9 | +80% |
New Files
src/aiedge/ghidra_scripts/pcode_taint.py— P-code SSA forward/backward taint analysis
🤖 Generated with Claude Code