Releases: R00T-Kim/SCOUT

SCOUT v3.0.0-rc1 — Hybrid Analysis Engine

20 May 03:09

SCOUT v3.0.0-rc1 marks the transition from a binary-centric firmware scanner to a Hybrid Analysis Engine.

This release adds first-class shell script analysis to the existing firmware analysis pipeline, closing a major blind spot in firmware auditing where high-level script logic previously remained under-analyzed compared to ELF binaries.

Highlights

  • Integrated ScriptAnalyzer into the main SCOUT pipeline
  • Expanded analysis coverage from ELF binaries to shell scripts
  • Added heuristic detection for insecure eval, backticks, and unquoted variable usage
  • Updated inventory logic to recursively collect shell scripts
  • Unified script findings with the existing report pipeline
  • Preserved report usability by avoiding raw heuristic match bloat
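
The three heuristics named above (insecure eval, backticks, unquoted variables) can be approximated with line-level regular expressions. The following is a minimal sketch only: the rule names, the regexes, and `scan_script` are hypothetical and are not taken from ScriptAnalyzer itself.

```python
import re

# Hypothetical heuristics; the real ScriptAnalyzer rules are not shown in these notes.
EVAL_WITH_EXPANSION = re.compile(r"\beval\b[^#\n]*\$")       # eval fed by a variable
BACKTICK = re.compile(r"`[^`\n]+`")                          # legacy command substitution
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"|\'[^\']*\'')         # quoted spans to ignore
VARIABLE = re.compile(r"\$\{?[A-Za-z_]\w*\}?")               # $name or ${name}


def scan_script(text):
    """Return (line number, rule name) pairs for each heuristic hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, comments, and the shebang
        if EVAL_WITH_EXPANSION.search(line):
            hits.append((lineno, "eval_with_expansion"))
        if BACKTICK.search(line):
            hits.append((lineno, "backtick_substitution"))
        # A variable that survives after quoted spans are removed is unquoted.
        if VARIABLE.search(QUOTED.sub("", line)):
            hits.append((lineno, "unquoted_variable"))
    return hits
```

Line-level matching of this kind is noisy by design (plain assignments such as `x=$y` are flagged too), which is consistent with the decision to keep raw matches out of the main dossier.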

Validation

Tested on TP-Link ER605 firmware.

  • Processed 1,334 shell scripts
  • Manually reviewed the top 20 script findings
  • Confirmed stable merge into unified report.json
  • Verified high-impact findings including ipsec and acme.sh command-injection candidates

Design Constraint

SCOUT must process 1,000+ scripts without causing report bloat or analyst fatigue.

Rejected Design

Raw heuristic matches are not stored directly in the main dossier.

Reason:

  • High false-positive noise
  • Poor signal-to-noise ratio
  • Increased report size
  • Reduced analyst usability
  • Unnecessary performance drag

Confidence

High.

Scope Risk

Broad.

This release changes SCOUT's analysis scope from binary-focused firmware analysis to hybrid binary/script firmware analysis.

Not Tested

  • Highly obfuscated custom shell loaders
  • Vendor-specific script packing schemes
  • Large-scale regression across multiple firmware families

Versioning

  • Git tag: v3.0.0-rc1
  • Python package version: 3.0.0rc1

v2.8.0 — Exploit Pattern RAG

18 May 05:50

v2.8.0 introduces Exploit Pattern RAG, a major upgrade to the AutoPoC generation engine.

Key Features

  1. Knowledge Base: Centralized repository for structured exploit patterns in data/exploit_references/, described by standard-compliant JSON metadata.
  2. Scoring Retriever: Multi-axis matching engine selecting best-fit patterns based on candidate-target alignment.
  3. Adaptation Engine: Prompts LLMs to adapt tactical patterns instead of raw code, reducing hallucination.
  4. Contamination Guard: Automatically detects and blocks target-specific artifact leaks from references into generated PoCs.
  5. User-Centric Documentation: Reorganized READMEs to emphasize practical use cases, unique advantages, and quick-start guides.
  6. Zero-Dependency Mandate: Restored pure Python stdlib compatibility by removing external YAML requirements.
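
As an illustration of the Contamination Guard idea, the sketch below extracts target-specific literals from a reference pattern and reports any that reappear verbatim in a generated PoC. Everything here is an assumption made for illustration: the artifact categories, the regexes, and the function names are hypothetical and do not describe SCOUT's actual guard.

```python
import re

# Hypothetical artifact categories; the real guard's rules are not documented here.
ARTIFACT_PATTERNS = (
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 literals
    re.compile(r"https?://[^\s\"']+"),            # URLs
    re.compile(r"0x[0-9a-fA-F]{6,}"),             # hard-coded addresses
)


def extract_artifacts(text):
    """Collect literals that look specific to one target."""
    found = set()
    for pattern in ARTIFACT_PATTERNS:
        found.update(pattern.findall(text))
    return found


def leaked_artifacts(reference, generated_poc):
    """Return reference artifacts that appear verbatim in the generated PoC."""
    return sorted(a for a in extract_artifacts(reference) if a in generated_poc)
```

A non-empty result would be the signal to block or regenerate the PoC.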

See CHANGELOG.md for full details.

SCOUT v2.7.2 — Phase 2C++ detection engine integrity patch

24 Apr 05:21

Closes two follow-ups from the v2.4.0 external review (docs/upgrade_plz.md) that had been partially addressed by v2.4.1 but left cosmetic residues in the tree. No pair-eval scorecard movement expected — v2.7.1's 2/5 PASS remains the figure of record.

Changed

Phase 2C++.1 — DECOMPILED_COLOCATED_CAP = 0.45 promoted to a named constant (9488b8b)

The decompiled_colocated taint method previously hardcoded a 0.50 ceiling inline. confidence_caps.py now exposes a 5-tier cap ladder:

| Cap | Value | Evidence level |
|---|---|---|
| SYMBOL_COOCCURRENCE_CAP | 0.40 | Symbols co-occur; no code path confirmed |
| DECOMPILED_COLOCATED_CAP | 0.45 (new) | Body-text co-occurrence; inline CALLs visible |
| STATIC_CODE_VERIFIED_CAP | 0.55 | Decompiled code inspected + LLM taint trace |
| STATIC_ONLY_CAP | 0.60 | Static-reference observation ceiling |
| PCODE_VERIFIED_CAP | 0.75 | P-code SSA dataflow confirmed |

Consumer impact: decompiled_colocated traces drop 0.50 → 0.45 (-0.05). ROC thresholds previously pinned at 0.50 should be retuned to 0.45 to preserve pre-v2.7.1 recall on that evidence class. priority_score weights and cve_scan's STATIC_CODE_VERIFIED_CAP=0.55 unchanged.
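
The consumer impact can be seen in a small sketch. The cap names and values come from the ladder above; the `METHOD_CAPS` mapping, the method labels other than `decompiled_colocated`, and both helper functions are assumptions for illustration, not the contents of confidence_caps.py.

```python
SYMBOL_COOCCURRENCE_CAP = 0.40
DECOMPILED_COLOCATED_CAP = 0.45
STATIC_CODE_VERIFIED_CAP = 0.55
STATIC_ONLY_CAP = 0.60
PCODE_VERIFIED_CAP = 0.75

# Hypothetical mapping from taint-method label to its ceiling.
METHOD_CAPS = {
    "symbol_cooccurrence": SYMBOL_COOCCURRENCE_CAP,
    "decompiled_colocated": DECOMPILED_COLOCATED_CAP,
    "pcode_verified": PCODE_VERIFIED_CAP,
}


def capped_confidence(raw, method):
    """Clamp a raw confidence to [0, cap]; unknown methods get the lowest cap."""
    cap = METHOD_CAPS.get(method, SYMBOL_COOCCURRENCE_CAP)
    return min(max(raw, 0.0), cap)


def passes_threshold(raw, method, threshold):
    """True when the capped confidence still clears a consumer's threshold."""
    return capped_confidence(raw, method) >= threshold
```

Under these assumptions a `decompiled_colocated` trace can no longer clear a threshold pinned at 0.50, but does clear one retuned to 0.45.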

Fixed

Phase 2C++.2 — legacy addr_diff > 16 residues removed (36ea517)

Commit 3352783 (v2.4.1, 2026-04-11, 41 minutes after the v2.4.0 P-code taint engine landed) replaced the primary CALL-matching path with callee-name resolution, but left two residues:

  1. src/aiedge/ghidra_analysis.py — a standalone trace_pcode_forward() helper inside _PYGHIDRA_SCRIPT with its own diff > 16 gate. Never invoked (the inline Strategy 1 loop at lines 525-587 has always been the real path).
  2. src/aiedge/ghidra_scripts/pcode_taint.py — an else: addr_diff = abs(...) fallback guarded by if source_api_name:. run() always passes source_api_name=source_api (lines 291-294), so the fallback was unreachable at runtime.

Both are now physically removed. _trace_forward_pcode()'s source_api_name parameter is required (no default), formalising the invariant that has held for 13 days. No runtime behaviour change — the production paths have done callee-name matching since v2.4.1.

New guard-rail tests in tests/test_ghidra_dead_code_removed.py pin the removal so grep-based review no longer finds a false-positive match for the offset heuristic.

Why no re-measurement

  • Gap B was runtime-effective since v2.4.1.
  • Gap C's new ceiling only binds on decompiled_colocated, which is emitted solely by the pyghidra fallback (ghidra_analysis.py:609). Environments with Ghidra 12 + analyzeHeadless on PATH exercise the primary script path and rarely hit the fallback.
  • Gate 1/2/3 FAIL is driven by findings.py's single-synthesis selection bottleneck — a 0.05 shift on a rarely-emitted method cannot cross those thresholds.

The full rationale, including git-blame trace and the deferred Gap A (interprocedural taint) decision point, is in docs/v2.7.2_release_plan.md.

Verification

  • pytest -q full regression green
  • ruff check src/ tests/ clean
  • pyright src/ 0 errors
  • scripts/check_doc_consistency.py OK

Pivot Option D unchanged

v2.7.2 is a half-day hygiene release, not a behavioural pivot. The compliance_report stage and four standard mappings shipped in v2.7.0 are unchanged. Phase 3'.2 CRA Audit SaaS v0.0 internal alpha still starts in 2026-05.

v2.7.1 — Phase 2C+.4 vendor corpus expansion (1/5 → 2/5 PASS)

22 Apr 14:27

Phase 2C+.4 vendor corpus expansion — quantitative refinement of v2.7.0's scenario C

v2.7.1 closes Phase 2C+.4 by extending the pair-eval corpus from 7 to 12 vendor/model pairs: D-Link DIR-859 (CVE-2019-17621), D-Link DIR-878 (vendor advisory), ASUS RT-AC68U (CVE-2020-15498), Linksys WRT1900AC v2 (progression), and Linksys EA6700 (progression). Phase 2D' Entry Gate scorecard transitions from 1/5 PASS → 2/5 PASS.

Phase 2D' Entry Gate scorecard (FINAL, WRT1900AC v2 ok measurement of record)

| Gate | Threshold | v2.7.0 (7-pair) | v2.7.1 FINAL (12-pair) | Verdict |
|---|---|---|---|---|
| 1 Recall | ≥ 0.40 | 0.1429 | 0.1667 (+17% rel) | ❌ FAIL |
| 2 Tier variation | ≥ 2 nonzero TP tiers | 1 (symbol_only) | 1 (back-slide) | ❌ FAIL |
| 3 Finding diversity | < 0.50 | 1.000 | 0.917 | ❌ FAIL |
| 4 Dedicated rerun | ≥ 1/N | 14/14 (Codex) | 14/14 + 12/12 (--no-llm) | ✅ PASS |
| 5 Corpus | ≥ 10 | 7 | 12 | ✅ PASS |

The +1 net gain over v2.7.0 (1/5 → 2/5) comes from Gate 5 (Corpus), which clears by manifest registration alone. Gate 1 absolute recall improves but stays well below threshold; the new TP/FP pair (DIR-859 vuln + patched both hit `aiedge.findings.web.exec_sink_overlap`) corroborates the v2.7.0 diagnosis that `findings.py`'s single-synthesis-finding selection is the structural Gate 1/3 limit.

Honest figure-of-record protocol

An intermediate 1st-pass measurement under partial WRT1900AC v2 extractions (1200-second budget) transiently showed Gate 2 PASS due to `aiedge.findings.analysis_incomplete` populating the `unknown` tier. The 2400-second budget rerun ok-state measurement reverts Gate 2 to FAIL — partial-extraction artifacts can falsely populate Gate-2 tiers, so the ok-state measurement is the figure of record. See `docs/v2.7.1_release_plan.md` for the full measurement history and Gate Diagnosis Matrix.

Pivot Option D unchanged

v2.7.1 is a quantitative refinement of v2.7.0's scenario C, not a re-pivot. The compliance-led identity remains primary. The `compliance_report` stage and four standard mappings (CRA Annex I / FDA Section 524B / ISO/SAE 21434 / UN R155) shipped in v2.7.0 are unchanged.

Notable fixes

  • `scripts/score_pair_corpus.py` graceful-skip for missing pair runs — no more `StopIteration` crashes when scoring corpus growth or partial-coverage measurements; missing rows are recorded as `vulnerable_status="missing"` / `patched_status="missing"` and excluded from recall/FPR denominators.
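
A minimal sketch of the graceful-skip behaviour described above, assuming each pair row carries the two status fields plus hypothetical `vulnerable_hit` / `patched_hit` booleans. The row shape and the `score_pairs` helper are illustrative and are not the actual `scripts/score_pair_corpus.py` code.

```python
def score_pairs(rows):
    """Compute recall and FPR, excluding rows whose run is recorded as missing."""
    vulnerable = [r for r in rows if r["vulnerable_status"] != "missing"]
    patched = [r for r in rows if r["patched_status"] != "missing"]
    true_positives = sum(1 for r in vulnerable if r["vulnerable_hit"])
    false_positives = sum(1 for r in patched if r["patched_hit"])
    return {
        "pairs": len(rows),
        # None instead of a ZeroDivisionError when every run is missing.
        "recall": true_positives / len(vulnerable) if vulnerable else None,
        "fpr": false_positives / len(patched) if patched else None,
    }
```

Filtering first and dividing afterwards keeps a missing run from silently counting as a miss or as a clean patched result.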

Known issues (out of v2.7.x scope)

  • DIR-878 partial extraction — SHRS-encrypted inner `.bin` not yet decrypted. `vendor_decrypt.py` extension is a follow-on task.
  • Gate 1/2/3 structural limit — `findings.py` single-synthesis-finding selection is the root cause; resolution belongs to the external detection-engine track, not v2.7.x scope.

Full changelog

See `CHANGELOG.md` for the complete `[2.7.1]` section.

SCOUT v2.7.0 — Phase 2C+ close-out + scenario C sealed

20 Apr 04:47

Phase 2C+ close-out release. Pivot 2026-04-19 roadmap's detection-strengthening insert is merged (LATTE backward slicing, LARA pattern-based source identification, sink coverage expansion, finding-diversity release gate), with a follow-up ascii_strings wire-through fix that resurrected the inert LARA axis. The compliance-led track ships its Phase 3'.1 suite: four per-standard mappings (CRA Annex I / FDA Section 524B / ISO/SAE 21434 / UN R155) and the new compliance_report pipeline stage.

Reviewer-lane Official Measurement (14/14, Codex LATTE-on, 2026-04-20 13:33 KST)

| Gate | Threshold | Result | Verdict |
|---|---|---|---|
| 1 Detection recall | ≥ 0.40 | 0.1429 (identical to baseline) | FAIL |
| 2 Evidence tier variation | ≥ 2 nonzero TP tiers | 1 | FAIL |
| 3 Finding diversity | < 0.5 | 1.000 (14/14 on single synthesis finding id) | FAIL |
| 4 Dedicated rerun | ≥ 1/N success | 14/14 | PASS |
| 5 Pair corpus size | ≥ 10 | 7 | FAIL |

scripts/release_gate.sh → RELEASE_GOVERNANCE=FAIL. score_pair_corpus.py --pairs benchmarks/pair-eval/pairs.json → {"pairs": 7, "recall": 0.14285714, "fpr": 0.14285714}.

Scenario C Sealing

The 2C+ workstream (LARA source expansion 0 → 21-86 hits/run, LATTE slicing opt-in, sink coverage 28 → 51+, diversity gate enforcement) did not move Gate 1 or Gate 3 because findings.py's primary-finding selection always routes vulnerability evidence through the single synthesis-stage id aiedge.findings.web.exec_sink_overlap. Per the pivot document's scenario C, option D is adopted: Phase 2D' is deferred and SCOUT fully pivots to the compliance-led identity.
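
One reading of the Gate 3 figure that is consistent with the reported 1.000 (14/14 on a single finding id) is the share of runs whose primary finding is the most common id. The sketch below encodes that reading as an assumption; the actual gate definition lives in the repository, and `dominant_finding_share` is a hypothetical name.

```python
from collections import Counter


def dominant_finding_share(primary_ids):
    """Fraction of runs whose primary finding id is the single most common one."""
    if not primary_ids:
        return 0.0
    most_common_count = Counter(primary_ids).most_common(1)[0][1]
    return most_common_count / len(primary_ids)
```

With every run routed through `aiedge.findings.web.exec_sink_overlap` the share stays at 1.0 regardless of how many sources or sinks are added upstream, which matches the diagnosis above.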

Gate-by-gate remediation paths (root cause / required work / track / timeline) are fixed in docs/v2.7.0_release_plan.md "Gate Diagnosis Matrix":

  • Gate 1 recall: external track (detection-engine redesign, 6-12mo+)
  • Gate 2 tier variation: external track (P-code engine robustness)
  • Gate 3 diversity: Phase 3' research track (Option C evidence-level metric redefinition)
  • Gate 4 rerun: DONE (becomes the operational backbone for Phase 3'.2 CRA Audit SaaS job queue)
  • Gate 5 corpus: v2.7.1 scope (2C+.4 vendor-extraction expansion, 1-2 weeks)

What's New

  • LATTE backward slicing (Phase 2C+.1) — opt-in via AIEDGE_LATTE_SLICING=1, 32 tests
  • LARA URI/CGI/config-key source identification (Phase 2C+.2) — 50 patterns, with ascii_strings wire-through fix on D-Link httpd (0 → 10-33 matches per firmware)
  • Sink coverage 28 → 51+ (Phase 2C+.3) — full CWE taxonomy (78/22/426/732/377/250/269/454)
  • Finding diversity gate (Phase 2C+.5) — PAIR_EVAL_DIVERSITY release sub-gate + pair-eval timeout diagnostic
  • Compliance mapping suite (Phase 3'.1 B-1~B-4) — four standards + compliance_report stage (43rd pipeline stage)
  • Reviewer-lane instrumentation scripts — sequential launcher, watcher handoff, codex/claude LATTE-on launchers

Next Steps (v2.7.1)

  • 2C+.4 Vendor extraction chain expansion (DIR-859 / RT-AC68U / WRT1900ACS / DIR-878 + 1) → corpus 7 → 10+, Gate 5 resolved
  • 3'.1 step B-5 release tag bundling
  • Phase 3'.2 CRA Audit SaaS v0.0 internal alpha kickoff (see wiki/projects/scout-cra-audit-saas-scope.md)

Full changelog: CHANGELOG.md

🤖 Release prepared with Claude Code

v2.6.1

17 Apr 08:14

[2.6.1] — 2026-04-17

Phase 2C close-out release. This point release rolls up the post-v2.6.0 foundation hardening work, publishes the fresh corpus refresh baseline, and documents the semantic / driver caveats that were previously implicit.

Added

  • Fresh corpus refresh baseline (docs/carry_over_benchmark_v2.6.md, benchmark-results/2c6-fresh-full-final/aggregate.json, scripts/aggregate_corpus_metrics.py). The 1,123-target refresh is now published as a best-view aggregate across the fresh rerun waves. Final outcome: 1110 success / 4 partial / 9 fatal; successful runs are extraction=ok 1110/1110, inventory=sufficient 1110/1110, nonzero findings 1110/1110, nonzero CVE 1089/1110.
  • LLM driver degradation matrix (docs/llm_driver_degradation_matrix.md). Documents the actual contract differences between Codex CLI, Claude API, Claude Code CLI, and Ollama, especially around system-prompt delivery and temperature handling.
  • Confidence semantic break note (docs/confidence_semantic_break_v2.6.md). Makes the v2.5.x → v2.6+ shift explicit: confidence is now evidence-only; priority_score / priority_inputs carry ranking semantics.

Changed

  • README / README.ko baseline messaging. Tier 1 hero numbers now point at the fresh v2.6.1 corpus refresh, while Tier 2 remains explicitly carry-over until the pair-eval lane lands. The over-broad "False negative rate ≈ 0%" phrasing is replaced with a pending pair-eval note.
  • Analyst copilot wording. Public docs now split the surface into Explainability surface, Analyst-in-the-loop channel, and Autonomous reasoning (future) instead of presenting all LLM-related behavior as one undifferentiated capability.
  • Release governance helper (scripts/release.sh). The helper is upgraded from a README-only version bumper into a release close-out utility that can synchronize pyproject, README badges, and CHANGELOG headers in dry-run/apply modes.

Fixed

  • Synthesis finding reasoning trail inheritance (findings.py). Top-level synthesis findings such as aiedge.findings.web.exec_sink_overlap now inherit matched downstream evidence lineage instead of relying only on the stage-level aggregate summary. Matching prefers run-relative binary path, falls back to binary SHA-256, emits a findings/synthesis_match summary entry, and appends a deterministic top-K sample of representative downstream trail entries.
  • SBOM stage silent schema mismatch (sbom.py). Vendor-stock firmware no longer silently returns 0 components because of stale inventory.file_list / string_hits assumptions. The stage now walks inventory.roots directly and falls back to direct binary reads via _extract_ascii_runs.
  • Relative runs_root handling in create_run() (run.py). runs_root is resolved before path derivation so relative output roots still wire absolute firmware paths into extraction; regression coverage lives in tests/test_create_run_relative_runs_root.py.

Verification

  • python3 -m py_compile scripts/aggregate_corpus_metrics.py
  • python3 scripts/check_doc_consistency.py
  • fresh corpus aggregate regenerated from benchmark-results/2c6-fresh-full-v2* waves
  • representative firmware smoke coverage retained from 2C.1–2C.5 (R7000 lineage / SBOM pilot / verified-chain provenance)

v2.6.0 — Phase 2B: Analyst Copilot + DAG Parallel + Calibration

13 Apr 10:12

Phase 2B Release

SCOUT v2.6.0 delivers three axes of change that position it as a single-firmware analyst copilot: performance (DAG parallelization PoC), analyst UX (reasoning trail + MCP override loop), and honest confidence calibration (detection vs priority separation).

Merged via PR #6 (rebase) as 6 atomic commits — any one could ship independently.

1. DAG Parallelization PoC (PR #10)

  • New stage_dag.py with manual STAGE_DEPS (42 entries) + Kahn topo_levels() (15 levels, max-width 7)
  • run_stages_parallel() — ThreadPoolExecutor level-wise submit, skip-on-failed-dep, fail_fast=True/False modes. Sequential run_stages() unchanged
  • New CLI flag: --experimental-parallel [N] (default 4 workers) on both analyze and stages subcommands
  • ProgressTracker(out_of_order=True) for completion-order rendering in parallel mode

    # Opt-in parallel execution
    ./scout analyze firmware.bin --experimental-parallel 4
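
The level-wise scheme described above (Kahn-style levels, one thread-pool submit per level, skip when a dependency failed) can be sketched as follows. This is an illustrative sketch under assumptions, not the stage_dag.py implementation: `run_levels`, the status strings, and the shape of the dependency dict are hypothetical, and every dependency is assumed to appear as a key.

```python
from concurrent.futures import ThreadPoolExecutor


def topo_levels(deps):
    """Group stages into levels via Kahn's algorithm; deps maps stage -> prerequisites."""
    indegree = {stage: len(reqs) for stage, reqs in deps.items()}
    dependents = {stage: [] for stage in deps}
    for stage, reqs in deps.items():
        for req in reqs:
            dependents[req].append(stage)
    level = sorted(s for s, n in indegree.items() if n == 0)
    levels = []
    while level:
        levels.append(level)
        ready = []
        for stage in level:
            for child in dependents[stage]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        level = sorted(ready)
    if sum(len(lv) for lv in levels) != len(deps):
        raise ValueError("dependency cycle detected")
    return levels


def run_levels(deps, run_stage, max_workers=4):
    """Run each level in parallel; skip stages whose dependencies did not succeed."""
    status = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in topo_levels(deps):
            runnable = []
            for stage in level:
                if all(status.get(req) == "ok" for req in deps[stage]):
                    runnable.append(stage)
                else:
                    status[stage] = "skipped"
            futures = {stage: pool.submit(run_stage, stage) for stage in runnable}
            for stage, future in futures.items():
                try:
                    future.result()
                    status[stage] = "ok"
                except Exception:
                    status[stage] = "failed"
    return status
```

Waiting for a whole level before starting the next one is simpler than fine-grained scheduling, at the cost of idle workers when one stage in a level is much slower than its siblings.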

2. Reasoning Trail Persistence (PR #11 + PR #13)

New reasoning_trail.py module captures structured ReasoningEntry records for LLM-driven finding adjustments. The adversarial_triage debate loop now records advocate / critic / decision entries with llm_model and 200-char raw_response_excerpt. The fp_verification pattern matcher records sanitizer / non-propagating / sysfile hits with per-pattern delta.

All three analyst surfaces expose the trail:

  • Web viewer: collapsible <details> section with CSS styling
  • Analyst markdown report: numbered "Reasoning Trail (N steps)" subsection per finding
  • TUI finding detail: render_finding_detail_with_trail() (AIEDGE_TUI_ASCII-compatible)

SARIF properties bag gains scout_reasoning_trail.

3. MCP Analyst Tools (PR #12)

4 new MCP tools for analyst-driven feedback:

| Tool | Purpose |
|---|---|
| scout_get_finding_reasoning | Fetch full reasoning trail for a finding |
| scout_inject_hint | Push an analyst hint into the feedback registry |
| scout_override_verdict | Force-set a finding verdict (confirmed / false_positive / wont_fix / needs_info) |
| scout_filter_by_category | Filter findings by vulnerability / misconfiguration / pipeline_artifact |

terminator_feedback.py extended with add_analyst_hint / get_analyst_hints / set_verdict_override (fcntl.flock-safe, assert_under_dir enforced). The adversarial_triage advocate prompt now reads analyst hints from AIEDGE_FEEDBACK_DIR and prefixes them priority-sorted — opt-in; byte-identical behavior when env var unset.

4. Detection vs Priority Calibration (PR #15)

Closes external reviewer critique that EPSS-additive confidence made SCOUT's confidence field look like a ranking heuristic instead of a true-positive probability.

New scoring.py with:

from dataclasses import dataclass

@dataclass(frozen=True)
class PriorityInputs:
    detection_confidence: float
    epss_score: float | None
    epss_percentile: float | None
    reachability: str | None
    backport_present: bool
    cvss_base: float | None

def compute_priority_score(inputs: PriorityInputs) -> float:
    # Weights: detection 50% / EPSS 25% / reach 15% / CVSS 10%
    # Backport: -0.20 penalty
    ...  # body not shown in these notes

cve_scan.py:1140-1170 refactored: confidence now strictly capped at STATIC_CODE_VERIFIED_CAP=0.55 (static evidence only). EPSS / reachability / backport / CVSS now feed priority_score instead.
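
A minimal sketch of how the weights in the `compute_priority_score` comments could combine. The weights and the backport penalty come from the notes; everything else is an assumption made for illustration: the `REACHABILITY_WEIGHT` mapping and its labels, treating missing inputs as 0, normalising CVSS by 10, and clamping to [0, 1]. See docs/scoring_calibration.md for the actual contract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityInputs:
    detection_confidence: float
    epss_score: float | None
    epss_percentile: float | None
    reachability: str | None
    backport_present: bool
    cvss_base: float | None


# Hypothetical mapping from reachability label to a [0, 1] factor.
REACHABILITY_WEIGHT = {"reachable": 1.0, "unknown": 0.5, "unreachable": 0.0}


def compute_priority_score(inputs: PriorityInputs) -> float:
    """Weighted blend: detection 50% / EPSS 25% / reach 15% / CVSS 10%."""
    epss = inputs.epss_score or 0.0                   # assumption: missing EPSS counts as 0
    reach = REACHABILITY_WEIGHT.get(inputs.reachability, 0.0)
    cvss = (inputs.cvss_base or 0.0) / 10.0           # assumption: CVSS scaled to [0, 1]
    score = (
        0.50 * inputs.detection_confidence
        + 0.25 * epss
        + 0.15 * reach
        + 0.10 * cvss
    )
    if inputs.backport_present:
        score -= 0.20                                 # backport penalty from the notes
    return max(0.0, min(1.0, score))                  # assumption: clamp to [0, 1]
```

Because detection confidence is capped at 0.55 for static evidence, the detection term contributes at most 0.275 here; the remaining inputs only affect ranking, never the confidence field itself.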

New doc: docs/scoring_calibration.md with a before/after worked example.

5. Extraction Failure Analyst Guidance (PR #14)

When extraction fails, SCOUT now emits a structured guidance block pointing the analyst at concrete next steps:

Detected encryption (entropy 7.95/8.0). Possible vendor decryption needed.
Suggested actions:
  1. Check vendor_decrypt.py for known vendor formats and add a handler for this firmware.
  2. Provide a pre-extracted rootfs: ./scout analyze firmware.bin --rootfs /path/to/extracted
  3. Try binwalk v3 entropy mode / alternative extractor: binwalk --entropy firmware.bin
  4. File an issue with the first 4 KB hex dump: xxd firmware.bin | head -64
Hint: docs/runbook.md#extraction-failure

New docs/runbook.md#extraction-failure section with symptoms/causes/remediation table.

Verification

| Metric | v2.5.0 | v2.6.0 |
|---|---|---|
| pytest | 865 | 1027 passed, 1 skipped (+162) |
| pyright | 0 errors | 0 errors, 0 warnings (baseline preserved) |
| ruff | clean | clean |
| CI | 5/5 green | 5/5 green |

New test distribution: reasoning_trail 20 · extraction_guidance 18 · mcp_analyst_tools 33 · stage_dag 14 · run_stages_parallel 14 · scoring 19 · reasoning_trail_viewer 44

R7000 smoke test (PR #15): 3 findings, all carrying priority_score + priority_inputs. cve_confidence_above_0.55_cap = 0 (detection cap enforced). priority_bucket_counts = {critical: 0, high: 0, medium: 3, low: 0}.

Design Invariants Preserved

  • Additive-only on findings.py (PR #7a pattern continues for category, now reasoning_trail, priority_score, priority_inputs). No report schema version bump. All 7 downstream consumers untouched.
  • Sequential run_stages() bit-identical to pre-PR state.
  • StageContext frozen invariant preserved (thread-safe sharing without locks).
  • All file writes continue through assert_under_dir().
  • v2.5.0 LLM driver contracts (system_prompt, temperature, 5-stage parser) unchanged.
  • 200-char raw_response_excerpt cap enforced at ReasoningEntry.__post_init__ (cannot be bypassed by call sites).
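
The last invariant can be illustrated with a frozen dataclass that enforces the cap in `__post_init__`, so no call site can store a longer excerpt. This is a sketch under assumptions: the `stage` field, the frozen flag, and the choice to truncate rather than reject are hypothetical, and the real `ReasoningEntry` has more fields than shown.

```python
from dataclasses import dataclass

MAX_EXCERPT_CHARS = 200


@dataclass(frozen=True)
class ReasoningEntry:
    stage: str                  # hypothetical field for illustration
    llm_model: str
    raw_response_excerpt: str

    def __post_init__(self):
        # Enforced at construction time; frozen dataclasses need object.__setattr__.
        if len(self.raw_response_excerpt) > MAX_EXCERPT_CHARS:
            object.__setattr__(
                self,
                "raw_response_excerpt",
                self.raw_response_excerpt[:MAX_EXCERPT_CHARS],
            )
```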

Deferred to Phase 2C

  • Wall-clock comparison sequential vs --experimental-parallel 4 on R7000 (PoC merged, real firmware measurement pending)
  • Analyst hint injection loop smoke with real LLM driver + AIEDGE_FEEDBACK_DIR
  • Benchmark freeze rule enforcement (check_baseline_metadata wiring requires fresh corpus baseline)

Upgrade Notes

Breaking: confidence field semantics changed for CVE findings — it is now strictly detection evidence (≤0.55). Consumers previously ranking by confidence should migrate to priority_score. See docs/scoring_calibration.md for the full contract.

Everything else is additive; existing consumers that ignore the new fields continue to work unchanged.


Full diff: v2.5.0...v2.6.0
PR #6: #6

🤖 Generated with Claude Code

SCOUT v2.5.0 — Structured LLM + EPSS + Observability + CRA Compliance

13 Apr 04:11

Strategic Roadmap Phase 1 implementation. Based on 30+ academic papers (LATTE, Operation Mango, HouseFuzz, VulnSage) and competitive analysis (Theori Xint, FirmAgent, EU CRA).

Highlights

LLM structured output (parse failure 100% → 0%)

  • New llm_prompts.py: 7 role-based system prompts (ADVOCATE/CRITIC/TAINT/CLASSIFIER/REPAIR/SYNTHESIS) + temperature constants
  • LLMDriver Protocol: system_prompt + temperature parameters wired into all 4 drivers (Codex, Claude API, Claude Code CLI, Ollama)
  • 5-stage JSON parser: preamble strip → fence → raw → brace-counting → common error fix; optional required_keys schema validation
  • All LLM-using stages updated: adversarial_triage, taint_propagation, semantic_classifier
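
The five parser stages listed above can be sketched as a chain of increasingly tolerant attempts. This is an illustrative sketch, not the SCOUT parser: `parse_llm_json` and its helpers are hypothetical names, and the only "common error" repaired here is a trailing comma.

```python
import json
import re

FENCE_MARK = "`" * 3  # built at runtime so this example needs no literal fence
FENCE = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def _first_balanced_object(text):
    """Brace counting: return the first balanced {...} span, ignoring braces in strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


def _fix_common_errors(text):
    """Drop trailing commas before a closing brace or bracket (string-unaware)."""
    return re.sub(r",\s*([}\]])", r"\1", text)


def parse_llm_json(response, required_keys=()):
    """Try each stage in order; return the first dict containing all required keys."""
    candidates = []
    brace = response.find("{")
    candidates.append(response[brace:] if brace != -1 else response)  # 1. preamble strip
    fenced = FENCE.search(response)
    if fenced:
        candidates.append(fenced.group(1))                            # 2. fence
    candidates.append(response)                                       # 3. raw
    balanced = _first_balanced_object(response)
    if balanced:
        candidates.append(balanced)                                   # 4. brace counting
        candidates.append(_fix_common_errors(balanced))               # 5. common error fix
    for candidate in candidates:
        try:
            parsed = json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict) and all(k in parsed for k in required_keys):
            return parsed
    return None
```

Returning `None` rather than raising lets a caller count the response as a parse failure separately from an LLM call failure.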

Sink expansion (taint_propagation.py)

  • _SINK_SYMBOLS: 11 → 28 (memcpy, memmove, strcat, strncpy, gets, vsprintf, printf family, scanf family, dlopen, realpath)
  • _FORMAT_STRING_SINKS + _is_format_string_variable() helper for variable-controlled format string detection

EPSS scoring (cve_scan.py)

  • FIRST.org API integration with batched queries (30 IDs/request)
  • Per-run + cross-run cache
  • Confidence adjustment by EPSS percentile (≥0.10: +0.10, ≥0.01: +0.05, <0.001: -0.05)
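
The adjustment bands and the 30-ID batching can be sketched as below. The thresholds are copied verbatim from the bullet above; treating the 0.001 to 0.01 range as "no adjustment", the function names, and the shape of the input are assumptions. No network call is made, and the FIRST.org request itself is out of scope for this sketch.

```python
def epss_confidence_delta(epss):
    """Map an EPSS value to the confidence adjustment bands listed above."""
    if epss is None:
        return 0.0
    if epss >= 0.10:
        return 0.10
    if epss >= 0.01:
        return 0.05
    if epss < 0.001:
        return -0.05
    return 0.0  # assumption: values between 0.001 and 0.01 leave confidence unchanged


def chunked(cve_ids, size=30):
    """Yield CVE IDs in batches of at most `size` (30 IDs per request in the notes)."""
    for start in range(0, len(cve_ids), size):
        yield cve_ids[start:start + size]
```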

Observability

  • Separate parse_failures vs llm_call_failures counters in adversarial_triage and fp_verification
  • LLM failure classification helpers: quota_exhausted, driver_unavailable, driver_nonzero_exit

CI/CD & Compliance

  • New .github/actions/scout-scan/: composite GitHub Action with SARIF upload to GitHub Security tab
  • docs/cra_compliance_mapping.md: EU Cyber Resilience Act Annex I 12 essential requirements mapped to SCOUT outputs
  • docs/strategic_roadmap_2026.md: 3-Phase plan (v2.5 → v3.0 → v4.0)

Bug Fixes

  • CVE scan signature-only path: removed early return so signature-only matches go through the same enrichment / finding-candidate pipeline as NVD matches
  • CVE scan comp variable bug: backport confidence adjustment now uses per-match component metadata instead of leaked outer loop variable
  • Semantic classifier batch size: reduced from 50 → 15 functions per LLM call to prevent JSON schema loss in long contexts

R7000 Verification (Netgear, 31MB, codex driver, 2026-04-13)

| Metric | Pre-v2.5 (1211 run) | v2.5.0 (1320 run) |
|---|---|---|
| adversarial_triage parse_failures | 100/100 | 0/100 |
| adversarial_triage parsed_ok | 0/100 | 100/100 |
| fp_verification unverified | 97/100 | 0/100 |
| fp_verification true_positives | 1 | 57 |
| fp_verification false_positives | 2 | 43 |
| cve_scan EPSS enriched | 0/23 | 23/23 |

Adversarial debate: 100 debated → 99 downgraded (FP) + 1 maintained (TP)

Run: `aiedge-runs/2026-04-12_1320_sha256-b28bf08e9d2c`

Stats

  • 21 files changed
  • 3,695 insertions / 792 deletions
  • 7 new files

See CHANGELOG.md for full details.

SCOUT v2.4.1

11 Apr 07:55

Terminator Re-evaluation Fixes

After v2.4.0, Terminator identified 3 issues. All addressed:

Confidence Calibration

  • decompiled_colocated: 0.60 → 0.45 (0.50 for high-risk sinks)
  • Separate caps per method: pcode_colocated 0.65, decompiled_colocated 0.50, decompiled_interprocedural 0.60

addr_diff Removal

  • Replaced fragile addr_diff > 16 address matching with callee name resolution via resolve_call_target()
  • Robust against compiler optimizations and instruction alignment differences

Interprocedural Taint (Strategy 4)

  • Cross-function source→sink detection using xref call graph
  • Caller with source API calls callee with sink API → decompiled_interprocedural trace
  • 1-hop depth limit to control false positives
  • Verified: fread→vsprintf across FUN_00012514→FUN_00011fe0 in RT-AX88U
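
The 1-hop rule under Strategy 4 can be sketched over a plain call-graph dict. This is an illustration only: the API sets are examples (fread and vsprintf appear in these notes, the rest are assumptions), and `interprocedural_traces` is a hypothetical helper, not the Ghidra-backed implementation.

```python
# Example API sets; the real source and sink lists are larger.
SOURCE_APIS = {"fread", "recv", "getenv"}
SINK_APIS = {"vsprintf", "strcat", "gets"}


def interprocedural_traces(calls):
    """calls maps a function name to the set of names it calls (APIs and functions)."""
    traces = []
    for caller, callees in sorted(calls.items()):
        sources = sorted(callees & SOURCE_APIS)
        if not sources:
            continue
        for callee in sorted(callees):
            if callee not in calls:
                continue  # an API or unknown symbol, not an analysed function
            # 1-hop limit: only the direct callee's own calls are inspected.
            for sink in sorted(calls[callee] & SINK_APIS):
                for source in sources:
                    traces.append({
                        "method": "decompiled_interprocedural",
                        "caller": caller,
                        "callee": callee,
                        "source": source,
                        "sink": sink,
                    })
    return traces
```

Stopping at one hop means a sink two calls away is missed, which is the trade-off the release accepts to control false positives.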

| Metric | v2.4.0 | v2.4.1 |
|---|---|---|
| Total taint (RT-AX88U) | 15 | 16 |
| Interprocedural traces | 0 | 1 |
| decompiled_colocated conf | 0.60 | 0.45 |

🤖 Generated with Claude Code

SCOUT v2.4.0

11 Apr 07:32

Detection Engine Upgrade

Driven by Terminator's evaluation of ASUS RT-AX88U findings — "framework is top-tier, detection core needs depth" — this release significantly upgrades SCOUT's vulnerability detection capabilities.

Highlights

  • Ghidra P-code Taint Analysis: 3-strategy dataflow tracing replaces symbol co-occurrence: P-code SSA forward taint → P-code colocated → decompiled body analysis
  • 3-Tier Confidence System: PCODE_VERIFIED_CAP = 0.75 joins existing co-occurrence (0.40) and code-verified (0.55) tiers
  • 4 New Rule Families: SQL injection, format string, path traversal, SSRF detection (9 regex patterns across PHP/Python/C/shell)
  • CGI Handler Detection: Ghidra string_refs extraction of do_*_cgi function names as source endpoints
  • SBOM Backport Detection: opkg patch revision parsing with -0.30 confidence for backported packages
  • Handoff Schema: firmware_handoff.json now includes adversarial triage schema reference for downstream consumers

Verified

| Metric | v2.3.0 | v2.4.0 | Change |
|---|---|---|---|
| Taint results (RT-AX88U) | 10 | 15 | +50% |
| Max confidence | 0.40 | 0.60 | +50% |
| Ghidra-verified traces | 0 | 5 | New |
| Sanitizer detection | N/A | 2 detected | New |
| Detection rule families | 5 | 9 | +80% |

New Files

  • src/aiedge/ghidra_scripts/pcode_taint.py — P-code SSA forward/backward taint analysis

🤖 Generated with Claude Code