20 May 03:09

R00T-Kim

3842fe9

SCOUT v3.0.0-rc1 — Hybrid Analysis Engine Pre-release

SCOUT v3.0.0-rc1 marks the transition from a binary-centric firmware scanner to a Hybrid Analysis Engine.

This release adds first-class shell script analysis to the existing firmware analysis pipeline, closing a major blind spot in firmware auditing where high-level script logic previously remained under-analyzed compared to ELF binaries.

Highlights

Integrated ScriptAnalyzer into the main SCOUT pipeline
Expanded analysis coverage from ELF binaries to shell scripts
Added heuristic detection for insecure eval, backticks, and unquoted variable usage
Updated inventory logic to recursively collect shell scripts
Unified script findings with the existing report pipeline
Preserved report usability by avoiding raw heuristic match bloat
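The three heuristic classes named above (eval, backticks, unquoted variables) can be sketched as line-oriented regex checks. This is a minimal illustration under stated assumptions, not SCOUT's ScriptAnalyzer. The rule ids, patterns, `Match` type, and `scan_script` helper are hypothetical. A regex cannot handle heredocs, nested quoting, or safe contexts such as plain assignments, which is one reason raw matches are noisy.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics; real shell parsing (heredocs, nested quoting) is out of scope.
_EVAL = re.compile(r"\beval\b")
_BACKTICK = re.compile(r"`[^`]*`")
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"' + r"|'[^']*'")
_VAR = re.compile(r"\$(?:\{?[A-Za-z_][A-Za-z0-9_]*\}?|[0-9])")


@dataclass(frozen=True)
class Match:
    rule: str
    line_no: int
    text: str


def scan_script(source: str) -> list[Match]:
    """Return raw (unfiltered) heuristic matches for one shell script."""
    matches: list[Match] = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Drop quoted segments so echo "eval" and "$VAR" do not count.
        unquoted = _QUOTED.sub("", stripped)
        if _EVAL.search(unquoted):
            matches.append(Match("shell.eval", line_no, stripped))
        # Backticks still execute inside double quotes, so check the raw line.
        if _BACKTICK.search(stripped):
            matches.append(Match("shell.backtick", line_no, stripped))
        if _VAR.search(unquoted):
            matches.append(Match("shell.unquoted_var", line_no, stripped))
    return matches
```

For example, `eval "$CMD"` yields one `shell.eval` match, and `rm -rf $TARGET` yields one `shell.unquoted_var` match. `echo "$SAFE"` yields nothing.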

Validation

Tested on TP-Link ER605 firmware.

Processed 1,334 shell scripts
Manually reviewed the top 20 script findings
Confirmed stable merge into unified report.json
Verified high-impact findings including ipsec and acme.sh command-injection candidates

Design Constraint

SCOUT must process 1,000+ scripts without causing report bloat or analyst fatigue.

Rejected Design

Storing raw heuristic matches directly in the main dossier was rejected.

Reason:

High false-positive noise
Poor signal-to-noise ratio
Increased report size
Reduced analyst usability
Unnecessary performance drag
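One way to meet the constraint is to keep raw matches out of the dossier and emit only a bounded summary per script and rule. The sketch below is an assumption-laden illustration, not SCOUT's report schema. The `Match` fields, `summarize`, and `max_samples` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Match(NamedTuple):
    # Hypothetical raw heuristic match: script path, rule id, line number.
    path: str
    rule: str
    line_no: int


def summarize(matches: list[Match], max_samples: int = 3) -> list[dict]:
    """Collapse raw matches into one finding per (script, rule) pair.

    Only a count and the first few line numbers survive, so report size grows
    with the number of distinct (script, rule) pairs, not with raw matches.
    """
    grouped: dict[tuple[str, str], list[int]] = defaultdict(list)
    for m in matches:
        grouped[(m.path, m.rule)].append(m.line_no)
    findings = []
    for (path, rule), lines in sorted(grouped.items()):  # deterministic order
        lines.sort()
        findings.append({
            "path": path,
            "rule": rule,
            "match_count": len(lines),
            "sample_lines": lines[:max_samples],
        })
    return findings
```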

Confidence

High.

Scope Risk

Broad.

This release changes SCOUT's analysis scope from binary-focused firmware analysis to hybrid binary/script firmware analysis.

Not Tested

Highly obfuscated custom shell loaders
Vendor-specific script packing schemes
Large-scale regression across multiple firmware families

Versioning

Git tag: v3.0.0-rc1
Python package version: 3.0.0rc1

18 May 05:50

R00T-Kim

v2.8.0

eaaf33c

v2.8.0 — Exploit Pattern RAG Latest

v2.8.0 introduces Exploit Pattern RAG, a major upgrade to the AutoPoC generation engine.

Key Features

Knowledge Base: Centralized repository of structured exploit patterns in data/exploit_references/, stored as standard-compliant JSON metadata.
Scoring Retriever: Multi-axis matching engine selecting best-fit patterns based on candidate-target alignment.
Adaptation Engine: Prompts LLMs to adapt tactical patterns instead of raw code, reducing hallucination.
Contamination Guard: Automatically detects and blocks target-specific artifact leaks from references into generated PoCs.
User-Centric Documentation: Reorganized READMEs to emphasize practical use cases, unique advantages, and quick-start guides.
Zero-Dependency Mandate: Restored pure Python stdlib compatibility by removing external YAML requirements.
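The retriever and the contamination guard can each be sketched in a few lines of stdlib Python. Everything below is an assumption for illustration. The axis names, weights, pattern fields, and artifact regexes are hypothetical and do not describe the actual pattern schema under data/exploit_references/.

```python
import re

# Hypothetical matching axes and weights for candidate-target alignment.
AXIS_WEIGHTS = {"cwe": 0.4, "sink": 0.3, "arch": 0.2, "protocol": 0.1}


def score_pattern(candidate: dict, pattern: dict) -> float:
    """Sum the weights of the axes on which the pattern matches the candidate."""
    return sum(
        weight
        for axis, weight in AXIS_WEIGHTS.items()
        if candidate.get(axis) is not None and candidate.get(axis) == pattern.get(axis)
    )


def retrieve(candidate: dict, patterns: list[dict], k: int = 1) -> list[dict]:
    """Return the k best-scoring patterns (ties keep knowledge-base order)."""
    return sorted(patterns, key=lambda p: score_pattern(candidate, p), reverse=True)[:k]


# Hypothetical artifact shapes that indicate a reference target leaked into a PoC.
_ARTIFACT_RES = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),               # IPv4 addresses
    re.compile(r"\b[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\b"),  # MAC addresses
]


def leaked_artifacts(reference: str, generated_poc: str) -> set[str]:
    """Return artifacts found in the reference pattern that reappear in the PoC."""
    ref_artifacts = {m.group(0) for rx in _ARTIFACT_RES for m in rx.finditer(reference)}
    return {a for a in ref_artifacts if a in generated_poc}
```

A guard built this way would reject a generated PoC whenever `leaked_artifacts` is non-empty and ask the LLM to parameterise the target instead.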

See CHANGELOG.md for full details.

24 Apr 05:21

R00T-Kim

v2.7.2

0acc1f1

SCOUT v2.7.2 — Phase 2C++ detection engine integrity patch

Closes two follow-ups from the v2.4.0 external review (docs/upgrade_plz.md) that had been partially addressed by v2.4.1 but left cosmetic residues in the tree. No pair-eval scorecard movement expected — v2.7.1's 2/5 PASS remains the figure of record.

Changed

Phase 2C++.1 — DECOMPILED_COLOCATED_CAP = 0.45 promoted to a named constant (9488b8b)

The decompiled_colocated taint method previously hardcoded a 0.50 ceiling inline. confidence_caps.py now exposes a 5-tier cap ladder:

Cap	Value	Evidence level
`SYMBOL_COOCCURRENCE_CAP`	0.40	Symbols co-occur; no code path confirmed
`DECOMPILED_COLOCATED_CAP`	0.45 (new)	Body-text co-occurrence; inline CALLs visible
`STATIC_CODE_VERIFIED_CAP`	0.55	Decompiled code inspected + LLM taint trace
`STATIC_ONLY_CAP`	0.60	Static-reference observation ceiling
`PCODE_VERIFIED_CAP`	0.75	P-code SSA dataflow confirmed

Consumer impact: decompiled_colocated traces drop from 0.50 to 0.45 (-0.05). Because those traces can no longer reach 0.50, ROC thresholds previously pinned at 0.50 should be retuned to 0.45 to preserve pre-v2.7.1 recall on that evidence class. priority_score weights and cve_scan's STATIC_CODE_VERIFIED_CAP=0.55 are unchanged.
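A cap ladder like this is typically applied as a per-method ceiling on a raw score. The sketch below uses the constant names and values from the table. The method-to-cap mapping for every method other than decompiled_colocated, and the `capped_confidence` helper itself, are assumptions for illustration rather than the contents of confidence_caps.py.

```python
SYMBOL_COOCCURRENCE_CAP = 0.40
DECOMPILED_COLOCATED_CAP = 0.45
STATIC_CODE_VERIFIED_CAP = 0.55
STATIC_ONLY_CAP = 0.60
PCODE_VERIFIED_CAP = 0.75

# Hypothetical method -> cap mapping. Only "decompiled_colocated" is stated in the notes.
METHOD_CAPS = {
    "symbol_cooccurrence": SYMBOL_COOCCURRENCE_CAP,
    "decompiled_colocated": DECOMPILED_COLOCATED_CAP,
    "pcode_ssa": PCODE_VERIFIED_CAP,
}


def capped_confidence(method: str, raw_score: float) -> float:
    """Clamp a raw score into [0, cap] for the method's evidence level.

    Unknown methods fall back to the most conservative cap.
    """
    cap = METHOD_CAPS.get(method, SYMBOL_COOCCURRENCE_CAP)
    return max(0.0, min(raw_score, cap))
```

With this shape, `capped_confidence("decompiled_colocated", x)` never exceeds 0.45 for any `x`. That is why a threshold left at 0.50 would silently drop the whole evidence class.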

Fixed

Phase 2C++.2 — legacy addr_diff > 16 residues removed (36ea517)

Commit 3352783 (v2.4.1, 2026-04-11, 41 minutes after the v2.4.0 P-code taint engine landed) replaced the primary CALL-matching path with callee-name resolution, but left two residues:

src/aiedge/ghidra_analysis.py: a standalone trace_pcode_forward() helper inside _PYGHIDRA_SCRIPT with its own diff > 16 gate. It was never invoked; the inline Strategy 1 loop at lines 525-587 has always been the real path.
src/aiedge/ghidra_scripts/pcode_taint.py: an else: addr_diff = abs(...) fallback guarded by if source_api_name:. Because run() always passes source_api_name=source_api (lines 291-294), the fallback was unreachable at runtime.

Both are now physically removed. _trace_forward_pcode()'s source_api_name parameter is required (no default), formalising the invariant that has held for 13 days. No runtime behaviour change — the production paths have done callee-name matching since v2.4.1.

New guard-rail tests in tests/test_ghidra_dead_code_removed.py pin the removal so grep-based review no longer finds a false-positive match for the offset heuristic.
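A guard-rail test of this kind can be as small as a regex scan over the affected source files. The sketch below is hypothetical. The helper names, the regex, and the file handling are assumptions, not the contents of tests/test_ghidra_dead_code_removed.py. It fails if an offset-distance gate such as `addr_diff > 16` or `diff > 16` reappears.

```python
import re
from pathlib import Path

# Hypothetical guard: matches offset gates such as "addr_diff > 16" or "diff > 16".
_OFFSET_GATE = re.compile(r"\b(?:addr_)?diff\s*>\s*16\b")


def offset_gate_hits(text: str) -> list[int]:
    """Return 1-based line numbers that still contain the offset heuristic."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if _OFFSET_GATE.search(line)]


def assert_offset_gate_removed(paths: list[Path]) -> None:
    """Raise AssertionError naming every file that still contains the gate."""
    offenders = {}
    for p in paths:
        hits = offset_gate_hits(p.read_text(encoding="utf-8"))
        if hits:
            offenders[str(p)] = hits
    assert not offenders, f"offset heuristic reintroduced: {offenders}"
```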

Why no re-measurement

Gap B (the addr_diff residues) has been runtime-effective since v2.4.1, when the production paths switched to callee-name matching.
Gap C's new ceiling only binds on decompiled_colocated, which is emitted solely by the pyghidra fallback (ghidra_analysis.py:609). Environments with Ghidra 12 + analyzeHeadless on PATH exercise the primary script path and rarely hit the fallback.
Gate 1/2/3 FAIL is driven by findings.py's single-synthesis selection bottleneck — a 0.05 shift on a rarely-emitted method cannot cross those thresholds.

The full rationale, including git-blame trace and the deferred Gap A (interprocedural taint) decision point, is in docs/v2.7.2_release_plan.md.

Verification

pytest -q full regression green
ruff check src/ tests/ clean
pyright src/ 0 errors
scripts/check_doc_consistency.py OK

Pivot Option D unchanged

v2.7.2 is a half-day hygiene release, not a behavioural pivot. The compliance_report stage and four standard mappings shipped in v2.7.0 are unchanged. Phase 3'.2 CRA Audit SaaS v0.0 internal alpha is still scheduled to start in May 2026.

22 Apr 14:27

R00T-Kim

v2.7.1

b524ebe

v2.7.1 — Phase 2C+.4 vendor corpus expansion (1/5 → 2/5 PASS)

Phase 2C+.4 vendor corpus expansion — quantitative refinement of v2.7.0's scenario C

v2.7.1 closes Phase 2C+.4 by extending the pair-eval corpus from 7 to 12 vendor/model pairs: D-Link DIR-859 (CVE-2019-17621), D-Link DIR-878 (vendor advisory), ASUS RT-AC68U (CVE-2020-15498), Linksys WRT1900AC v2 (progression), and Linksys EA6700 (progression). The Phase 2D' Entry Gate scorecard moves from 1/5 PASS to 2/5 PASS.

Phase 2D' Entry Gate scorecard (FINAL, WRT1900AC v2 ok measurement of record)

Gate	Threshold	v2.7.0 (7-pair)	v2.7.1 FINAL (12-pair)	Verdict
1 Recall	≥ 0.40	0.1429	0.1667 (+17% rel)	❌ FAIL
2 Tier variation	≥ 2 nonzero TP tiers	1 (`symbol_only`)	1 (back-slide)	❌ FAIL
3 Finding diversity	< 0.50	1.000	0.917	❌ FAIL
4 Dedicated rerun	≥ 1/N	14/14 (Codex)	14/14 + 12/12 (`--no-llm`)	✅ PASS
5 Corpus	≥ 10	7	12	✅ PASS

The +1 net gain over v2.7.0 (1/5 → 2/5) comes from Gate 5 (Corpus), which clears by manifest registration alone. Gate 1 absolute recall improves but stays well below threshold; the new TP/FP pair (DIR-859 vuln + patched both hit `aiedge.findings.web.exec_sink_overlap`) corroborates the v2.7.0 diagnosis that `findings.py`'s single-synthesis-finding selection is the structural Gate 1/3 limit.

Honest figure-of-record protocol

An intermediate first-pass measurement with partial WRT1900AC v2 extractions (1200-second budget) transiently showed Gate 2 PASS because `aiedge.findings.analysis_incomplete` populated the `unknown` tier. The ok-state measurement from the 2400-second-budget rerun reverts Gate 2 to FAIL. Because partial-extraction artifacts can falsely populate Gate-2 tiers, the ok-state measurement is the figure of record. See `docs/v2.7.1_release_plan.md` for the full measurement history and Gate Diagnosis Matrix.

Pivot Option D unchanged

v2.7.1 is a quantitative refinement of v2.7.0's scenario C, not a re-pivot. The compliance-led identity remains primary. The `compliance_report` stage and four standard mappings (CRA Annex I / FDA Section 524B / ISO/SAE 21434 / UN R155) shipped in v2.7.0 are unchanged.

Notable fixes

`scripts/score_pair_corpus.py` graceful-skip for missing pair runs — no more `StopIteration` crashes when scoring corpus growth or partial-coverage measurements; missing rows are recorded as `vulnerable_status="missing"` / `patched_status="missing"` and excluded from recall/FPR denominators.
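The graceful-skip rule can be sketched as follows: a side of a pair whose run is missing is dropped from that side's denominator instead of raising. The row fields mirror the `vulnerable_status` / `patched_status` names above. The other status values (`"hit"`, `"miss"`) and the `score_pairs` helper are assumptions, not the schema of scripts/score_pair_corpus.py.

```python
def score_pairs(rows: list[dict]) -> dict:
    """Compute recall over vulnerable runs and FPR over patched runs.

    Status values other than "missing" ("hit", "miss") are hypothetical.
    Sides marked "missing" are excluded from the relevant denominator.
    """
    vuln = [r["vulnerable_status"] for r in rows if r["vulnerable_status"] != "missing"]
    patched = [r["patched_status"] for r in rows if r["patched_status"] != "missing"]
    recall = sum(s == "hit" for s in vuln) / len(vuln) if vuln else 0.0
    fpr = sum(s == "hit" for s in patched) / len(patched) if patched else 0.0
    return {"recall": recall, "fpr": fpr}
```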

Known issues (out of v2.7.x scope)

DIR-878 partial extraction — SHRS-encrypted inner `.bin` not yet decrypted. `vendor_decrypt.py` extension is a follow-on task.
Gate 1/2/3 structural limit — `findings.py` single-synthesis-finding selection is the root cause; resolution belongs to the external detection-engine track, not v2.7.x scope.

Full changelog

See `CHANGELOG.md` for the complete `[2.7.1]` section.

20 Apr 04:47

R00T-Kim

v2.7.0

4a32d69

SCOUT v2.7.0 — Phase 2C+ close-out + scenario C sealed

Phase 2C+ close-out release. The detection-strengthening insert from the 2026-04-19 pivot roadmap is merged (LATTE backward slicing, LARA pattern-based source identification, sink coverage expansion, finding-diversity release gate), together with a follow-up ascii_strings wire-through fix that revived the previously inert LARA axis. The compliance-led track ships its Phase 3'.1 suite: four per-standard mappings (CRA Annex I / FDA Section 524B / ISO/SAE 21434 / UN R155) and the new compliance_report pipeline stage.

Reviewer-lane Official Measurement (14/14, Codex LATTE-on, 2026-04-20 13:33 KST)

Gate	Threshold	Result	Verdict
1 Detection recall	≥ 0.40	0.1429 (identical to baseline)	FAIL
2 Evidence tier variation	≥ 2 nonzero TP tiers	1	FAIL
3 Finding diversity	< 0.5	1.000 (14/14 on single synthesis finding id)	FAIL
4 Dedicated rerun	≥ 1/N success	14/14	PASS
5 Pair corpus size	≥ 10	7	FAIL

scripts/release_gate.sh → RELEASE_GOVERNANCE=FAIL. score_pair_corpus.py --pairs benchmarks/pair-eval/pairs.json → {"pairs": 7, "recall": 0.14285714, "fpr": 0.14285714}.

Scenario C Sealing

The 2C+ workstream (LARA source expansion 0 → 21-86 hits/run, LATTE slicing opt-in, sink coverage 28 → 51+, diversity gate enforcement) did not move Gate 1 or Gate 3 because findings.py's primary-finding selection always routes vulnerability evidence through the single synthesis-stage id aiedge.findings.web.exec_sink_overlap. Per the pivot document's scenario C, option D is adopted: Phase 2D' is deferred and SCOUT fully pivots to the compliance-led identity.
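The reported Gate 3 values (1.000 for 14/14 runs on one id, 0.917 for the 12-pair corpus) are consistent with a dominant-id share: the fraction of runs whose primary finding id equals the single most common id. That definition is inferred from the numbers, not taken from the gate's documentation, and `dominant_id_share` is a hypothetical name.

```python
from collections import Counter


def dominant_id_share(primary_finding_ids: list[str]) -> float:
    """Share of runs whose primary finding id is the most common one.

    Assumed definition, inferred from the reported Gate 3 values.
    1.0 means every run collapsed onto a single id; the gate requires < 0.50.
    """
    if not primary_finding_ids:
        return 0.0
    _, top_count = Counter(primary_finding_ids).most_common(1)[0]
    return top_count / len(primary_finding_ids)
```

Under this reading, as long as findings.py routes vulnerability evidence through one synthesis id, the share stays near 1.0 no matter how many sources or sinks the upstream stages add.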

Gate-by-gate remediation paths (root cause / required work / track / timeline) are recorded in docs/v2.7.0_release_plan.md "Gate Diagnosis Matrix":

Gate 1 recall — external track (detection-engine redesign, 6-12mo+)
Gate 2 tier variation — external track (P-code engine robustness)
Gate 3 diversity — Phase 3' research track (Option C evidence-level metric redefinition)
Gate 4 rerun — DONE (becomes the operational backbone for Phase 3'.2 CRA Audit SaaS job queue)
Gate 5 corpus — v2.7.1 scope (2C+.4 vendor-extraction expansion, 1-2 weeks)

What's New

LATTE backward slicing (Phase 2C+.1) — opt-in via AIEDGE_LATTE_SLICING=1, 32 tests
LARA URI/CGI/config-key source identification (Phase 2C+.2) — 50 patterns, with ascii_strings wire-through fix on D-Link httpd (0 → 10-33 matches per firmware)
Sink coverage 28 → 51+ (Phase 2C+.3) — full CWE taxonomy (78/22/426/732/377/250/269/454)
Finding diversity gate (Phase 2C+.5) — PAIR_EVAL_DIVERSITY release sub-gate + pair-eval timeout diagnostic
Compliance mapping suite (Phase 3'.1 B-1~B-4) — four standards + compliance_report stage (43rd pipeline stage)
Reviewer-lane instrumentation scripts — sequential launcher, watcher handoff, codex/claude LATTE-on launchers
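Pattern-based source identification over binary strings can be sketched in two steps. First, extract printable ASCII runs, as `strings` does (this is the kind of input the ascii_strings wire-through supplies). Then match those runs against URI, CGI-handler, and config-key patterns. The three regexes and the function names below are hypothetical stand-ins, not SCOUT's 50 LARA patterns.

```python
import re

# Printable ASCII runs of at least 4 bytes, similar to the default of `strings`.
_ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical source patterns (the real rule set is much larger).
SOURCE_PATTERNS = {
    "uri": re.compile(r"^/[\w./-]*\.(?:cgi|asp|php|htm)$"),
    "cgi": re.compile(r"^do_\w+_cgi$"),
    "config_key": re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$"),
}


def ascii_strings(blob: bytes) -> list[str]:
    """Extract printable ASCII runs from raw binary content."""
    return [m.group(0).decode("ascii") for m in _ASCII_RUN.finditer(blob)]


def identify_sources(blob: bytes) -> dict[str, list[str]]:
    """Group extracted strings by the first source pattern they match."""
    hits: dict[str, list[str]] = {kind: [] for kind in SOURCE_PATTERNS}
    for s in ascii_strings(blob):
        for kind, rx in SOURCE_PATTERNS.items():
            if rx.match(s):
                hits[kind].append(s)
                break  # one category per string
    return hits
```

If the string list handed to a step like `identify_sources` were empty, every category would stay empty. That is one plausible way an axis can look "inert" until its input is wired through.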

Next Steps (v2.7.1)

2C+.4 Vendor extraction chain expansion (DIR-859 / RT-AC68U / WRT1900ACS / DIR-878 + 1) → corpus 7 → 10+, Gate 5 resolved
3'.1 step B-5 release tag bundling
Phase 3'.2 CRA Audit SaaS v0.0 internal alpha kickoff (see wiki/projects/scout-cra-audit-saas-scope.md)

Full changelog: CHANGELOG.md

🤖 Release prepared with Claude Code

17 Apr 08:14

R00T-Kim

v2.6.1

bd9065c

v2.6.1

[2.6.1] — 2026-04-17

Phase 2C close-out release. This point release rolls up the post-v2.6.0 foundation hardening work, publishes the fresh corpus refresh baseline, and documents the semantic / driver caveats that were previously implicit.

Added

Fresh corpus refresh baseline (docs/carry_over_benchmark_v2.6.md, benchmark-results/2c6-fresh-full-final/aggregate.json, scripts/aggregate_corpus_metrics.py). The 1,123-target refresh is now published as a best-view aggregate across the fresh rerun waves. Final outcome: 1110 success / 4 partial / 9 fatal; successful runs are extraction=ok 1110/1110, inventory=sufficient 1110/1110, nonzero findings 1110/1110, nonzero CVE 1089/1110.
LLM driver degradation matrix (docs/llm_driver_degradation_matrix.md). Documents the actual contract differences between Codex CLI, Claude API, Claude Code CLI, and Ollama, especially around system-prompt delivery and temperature handling.
Confidence semantic break note (docs/confidence_semantic_break_v2.6.md). Makes the v2.5.x → v2.6+ shift explicit: confidence is now evidence-only; priority_score / priority_inputs carry ranking semantics.
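A best-view aggregate across rerun waves can be read as follows: for each target, keep its best outcome across all waves. The status ordering (success > partial > fatal) and both helpers are assumptions for illustration. The actual logic lives in scripts/aggregate_corpus_metrics.py and may differ.

```python
from collections import Counter

# Assumed ranking of run outcomes: a later success supersedes an earlier failure.
_RANK = {"fatal": 0, "partial": 1, "success": 2}


def best_view(waves: list[dict[str, str]]) -> dict[str, str]:
    """Merge per-wave {target: status} maps, keeping each target's best status."""
    best: dict[str, str] = {}
    for wave in waves:
        for target, status in wave.items():
            if target not in best or _RANK[status] > _RANK[best[target]]:
                best[target] = status
    return best


def outcome_counts(waves: list[dict[str, str]]) -> Counter:
    """Count final outcomes across targets after best-view merging."""
    return Counter(best_view(waves).values())
```

Because each target contributes exactly one final status, the outcome buckets partition the corpus, as in 1110 + 4 + 9 = 1,123.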

Changed

README / README.ko baseline messaging. Tier 1 hero numbers now point at the fresh v2.6.1 corpus refresh, while Tier 2 remains explicitly carry-over until the pair-eval lane lands. The over-broad "False negative rate ≈ 0%" phrasing is replaced with a pending pair-eval note.
Analyst copilot wording. Public docs now split the surface into Explainability surface, Analyst-in-the-loop channel, and Autonomous reasoning (future) instead of presenting all LLM-related behavior as one undifferentiated capability.
Release governance helper (scripts/release.sh). The helper is upgraded from a README-only version bumper into a release close-out utility that can synchronize pyproject, README badges, and CHANGELOG headers in dry-run/apply modes.

Fixed

Synthesis finding reasoning trail inheritance (findings.py). Top-level synthesis findings such as aiedge.findings.web.exec_sink_overlap now inherit matched downstream evidence lineage instead of relying only on the stage-level aggregate summary. Matching prefers run-relative binary path, falls back to binary SHA-256, emits a findings/synthesis_match summary entry, and appends a deterministic top-K sample of representative downstream trail entries.
SBOM stage silent schema mismatch (sbom.py). Vendor-stock firmware no longer silently returns 0 components because of stale inventory.file_list / string_hits assumptions. The stage now walks inventory.roots directly and falls back to direct binary reads via _extract_ascii_runs.
Relative runs_root handling in create_run() (run.py). runs_root is resolved before path derivation so relative output roots still wire absolute firmware paths into extraction; regression coverage lives in tests/test_create_run_relative_runs_root.py.
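The lineage match described for findings.py (run-relative path first, SHA-256 fallback, deterministic top-K sample) can be sketched as follows. The field names (`path`, `sha256`, `id`, `confidence`), the sort key that makes the sample deterministic, and `match_trail` itself are assumptions for illustration.

```python
def match_trail(synthesis: dict, downstream: list[dict], top_k: int = 3) -> list[dict]:
    """Pick downstream evidence entries that belong to a synthesis finding.

    Hypothetical field names. Matching prefers the run-relative binary path
    and falls back to the binary SHA-256 only when no entry shares the path.
    """
    by_path = [d for d in downstream if synthesis.get("path") and d.get("path") == synthesis["path"]]
    matched = by_path or [
        d for d in downstream if synthesis.get("sha256") and d.get("sha256") == synthesis["sha256"]
    ]
    # Deterministic sample: order by (-confidence, id) so reruns pick the same entries.
    matched.sort(key=lambda d: (-d.get("confidence", 0.0), d.get("id", "")))
    return matched[:top_k]
```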

Verification

python3 -m py_compile scripts/aggregate_corpus_metrics.py
python3 scripts/check_doc_consistency.py
fresh corpus aggregate regenerated from benchmark-results/2c6-fresh-full-v2* waves
representative firmware smoke coverage retained from 2C.1–2C.5 (R7000 lineage / SBOM pilot / verified-chain provenance)

13 Apr 10:12

R00T-Kim

v2.6.0

9b7ecf6

v2.6.0 — Phase 2B: Analyst Copilot + DAG Parallel + Calibration

Phase 2B Release

SCOUT v2.6.0 delivers three axes of change that position it as a single-firmware analyst copilot: performance (DAG parallelization PoC), analyst UX (reasoning trail + MCP override loop), and honest confidence calibration (detection vs priority separation).

Merged via PR #6 (rebase) as 6 atomic commits — any one could ship independently.

1. DAG Parallelization PoC (PR #10)

New stage_dag.py with manual STAGE_DEPS (42 entries) + Kahn topo_levels() (15 levels, max-width 7)
run_stages_parallel() — ThreadPoolExecutor level-wise submit, skip-on-failed-dep, fail_fast=True/False modes. Sequential run_stages() unchanged
New CLI flag: --experimental-parallel [N] (default 4 workers) on both analyze and stages subcommands
ProgressTracker(out_of_order=True) for completion-order rendering in parallel mode

# Opt-in parallel execution
./scout analyze firmware.bin --experimental-parallel 4
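The level-wise scheduling described above can be sketched with Kahn's algorithm plus a thread pool. The stage names and dependency map in the usage below are hypothetical (the real STAGE_DEPS has 42 entries). `run_levels` is a simplified stand-in for `run_stages_parallel()`: it always skips dependents of failed stages and has no fail_fast switch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def topo_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Kahn's algorithm, emitting one list per level of mutually independent stages.

    Every stage, including roots, must appear as a key in deps.
    """
    indegree = {stage: len(parents) for stage, parents in deps.items()}
    children: dict[str, list[str]] = {stage: [] for stage in deps}
    for stage, parents in deps.items():
        for parent in parents:
            children[parent].append(stage)
    level = sorted(s for s, d in indegree.items() if d == 0)
    levels: list[list[str]] = []
    while level:
        levels.append(level)
        nxt = []
        for stage in level:
            for child in children[stage]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    nxt.append(child)
        level = sorted(nxt)
    if sum(len(lv) for lv in levels) != len(deps):
        raise ValueError("dependency cycle detected")
    return levels


def run_levels(deps: dict[str, set[str]], run: Callable[[str], bool], workers: int = 4) -> dict[str, str]:
    """Hypothetical simplification: run each level in parallel, skipping stages with failed deps."""
    status: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in topo_levels(deps):
            ready = [s for s in level if all(status[p] == "ok" for p in deps[s])]
            for s in level:
                if s not in ready:
                    status[s] = "skipped"
            for s, ok in zip(ready, pool.map(run, ready)):
                status[s] = "ok" if ok else "failed"
    return status
```

Because each level is waited on before the next is submitted, a stage can only start after all of its dependencies have finished. Maximum parallelism is therefore bounded by the widest level (7 in the real DAG).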

2. Reasoning Trail Persistence (PR #11 + PR #13)

New reasoning_trail.py module captures structured ReasoningEntry records for LLM-driven finding adjustments. The adversarial_triage debate loop now records advocate / critic / decision entries with llm_model and 200-char raw_response_excerpt. The fp_verification pattern matcher records sanitizer / non-propagating / sysfile hits with per-pattern delta.

All three analyst surfaces expose the trail:

Web viewer — collapsible <details> section with CSS styling
Analyst markdown report — numbered "Reasoning Trail (N steps)" subsection per finding
TUI finding detail — render_finding_detail_with_trail() (AIEDGE_TUI_ASCII-compatible)

SARIF properties bag gains scout_reasoning_trail.
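The 200-character cap on raw_response_excerpt (listed under Design Invariants) is easiest to guarantee inside the record type itself. The sketch below assumes a frozen dataclass. Apart from llm_model and raw_response_excerpt, the field names are hypothetical. Truncating an over-long excerpt, rather than rejecting it, is also an assumption.

```python
from dataclasses import dataclass

EXCERPT_CAP = 200  # characters


@dataclass(frozen=True)
class ReasoningEntry:
    # Field names other than llm_model / raw_response_excerpt are hypothetical.
    role: str        # e.g. "advocate", "critic", "decision"
    delta: float     # confidence adjustment attributed to this step
    llm_model: str = ""
    raw_response_excerpt: str = ""

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so use object.__setattr__.
        # Every constructor call (including dataclasses.replace) runs this hook.
        if len(self.raw_response_excerpt) > EXCERPT_CAP:
            object.__setattr__(self, "raw_response_excerpt", self.raw_response_excerpt[:EXCERPT_CAP])
```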

3. MCP Analyst Tools (PR #12)

4 new MCP tools for analyst-driven feedback:

Tool	Purpose
`scout_get_finding_reasoning`	Fetch full reasoning trail for a finding
`scout_inject_hint`	Push an analyst hint into the feedback registry
`scout_override_verdict`	Force-set a finding verdict (confirmed / false_positive / wont_fix / needs_info)
`scout_filter_by_category`	Filter findings by `vulnerability` / `misconfiguration` / `pipeline_artifact`

terminator_feedback.py is extended with add_analyst_hint / get_analyst_hints / set_verdict_override (fcntl.flock-safe, assert_under_dir enforced). The adversarial_triage advocate prompt now reads analyst hints from AIEDGE_FEEDBACK_DIR and prefixes them to the prompt in priority order. The behavior is opt-in: output is byte-identical when the env var is unset.
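A flock-protected hint store with a directory guard might look like the sketch below. It is Unix-only (`fcntl`). `assert_under_dir`, `add_analyst_hint`, `hint_prefix`, `advocate_prompt`, the `hints.jsonl` file name, and the priority convention (higher first) are all hypothetical re-creations for illustration, not the code in terminator_feedback.py.

```python
import fcntl
import json
import os
from pathlib import Path

HINTS_FILE = "hints.jsonl"  # hypothetical file name


def assert_under_dir(path: Path, root: Path) -> Path:
    """Resolve path and refuse anything that escapes root (e.g. via '..')."""
    resolved = path.resolve()
    if not resolved.is_relative_to(root.resolve()):
        raise ValueError(f"{resolved} escapes {root}")
    return resolved


def add_analyst_hint(feedback_dir: Path, finding_id: str, hint: str, priority: int = 0) -> None:
    """Append one hint as a JSON line while holding an exclusive lock."""
    target = assert_under_dir(feedback_dir / HINTS_FILE, feedback_dir)
    with open(target, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # serialize concurrent writers
        try:
            fh.write(json.dumps({"finding_id": finding_id, "hint": hint, "priority": priority}) + "\n")
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def hint_prefix(feedback_dir: Path, finding_id: str) -> str:
    """Build a prompt prefix from a finding's hints, highest priority first."""
    path = assert_under_dir(feedback_dir / HINTS_FILE, feedback_dir)
    if not path.exists():
        return ""
    with open(path, encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)  # shared lock while reading
        try:
            rows = [json.loads(line) for line in fh if line.strip()]
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    hints = sorted((r for r in rows if r["finding_id"] == finding_id), key=lambda r: -r["priority"])
    return "".join(f"Analyst hint: {r['hint']}\n" for r in hints)


def advocate_prompt(base_prompt: str, finding_id: str) -> str:
    """Return base_prompt unchanged unless AIEDGE_FEEDBACK_DIR is set."""
    feedback_dir = os.environ.get("AIEDGE_FEEDBACK_DIR")
    if not feedback_dir:
        return base_prompt
    return hint_prefix(Path(feedback_dir), finding_id) + base_prompt
```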

4. Detection vs Priority Calibration (PR #15)

Addresses an external reviewer's critique that EPSS-additive confidence made SCOUT's confidence field look like a ranking heuristic rather than a true-positive probability.

New scoring.py with:

from dataclasses import dataclass

@dataclass(frozen=True)
class PriorityInputs:
    detection_confidence: float
    epss_score: float | None
    epss_percentile: float | None
    reachability: str | None
    backport_present: bool
    cvss_base: float | None

def compute_priority_score(inputs: PriorityInputs) -> float:
    # Weights: detection 50% / EPSS 25% / reach 15% / CVSS 10%
    # Backport: -0.20 penalty
    ...  # body omitted in these notes; see scoring.py

cve_scan.py:1140-1170 refactored: confidence now strictly capped at STATIC_CODE_VERIFIED_CAP=0.55 (static evidence only). EPSS / reachability / backport / CVSS now feed priority_score instead.

New doc: docs/scoring_calibration.md with a before/after worked example.

5. Extraction Failure Analyst Guidance (PR #14)

When extraction fails, SCOUT now emits a structured guidance block pointing the analyst at concrete next steps:

Detected encryption (entropy 7.95/8.0). Possible vendor decryption needed.
Suggested actions:
  1. Check vendor_decrypt.py for known vendor formats and add a handler for this firmware.
  2. Provide a pre-extracted rootfs: ./scout analyze firmware.bin --rootfs /path/to/extracted
  3. Try binwalk v3 entropy mode / alternative extractor: binwalk --entropy firmware.bin
  4. File an issue with the first 4 KB hex dump: xxd firmware.bin | head -256
Hint: docs/runbook.md#extraction-failure

New docs/runbook.md#extraction-failure section with symptoms/causes/remediation table.
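The "entropy 7.95/8.0" figure in the guidance block is presumably Shannon entropy in bits per byte, where 8.0 means uniformly distributed byte values, which is typical of encrypted or compressed data. A minimal calculation follows. The 7.9 threshold in `looks_encrypted` is an assumption, not SCOUT's cut-off.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.9) -> bool:
    """Heuristic with an assumed threshold: near-maximal entropy suggests encryption or compression."""
    return shannon_entropy(data) >= threshold
```

High entropy alone cannot distinguish encryption from compression, so a check like this can only point the analyst at vendor_decrypt.py as a next step.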

Verification

Metric	v2.5.0	v2.6.0
pytest	865	1027 passed, 1 skipped (+162)
pyright	0 errors	0 errors, 0 warnings (baseline preserved)
ruff	clean	clean
CI	5/5 green	5/5 green

New test distribution: reasoning_trail 20 · extraction_guidance 18 · mcp_analyst_tools 33 · stage_dag 14 · run_stages_parallel 14 · scoring 19 · reasoning_trail_viewer 44

R7000 smoke test (PR #15): 3 findings, all carrying priority_score + priority_inputs. cve_confidence_above_0.55_cap = 0 (detection cap enforced). priority_bucket_counts = {critical: 0, high: 0, medium: 3, low: 0}.

Design Invariants Preserved

Additive-only changes to findings.py: the PR #7a pattern used for category now extends to reasoning_trail, priority_score, and priority_inputs. No report schema version bump. All 7 downstream consumers untouched.
Sequential run_stages() bit-identical to pre-PR state.
StageContext frozen invariant preserved (thread-safe sharing without locks).
All file writes continue through assert_under_dir().
v2.5.0 LLM driver contracts (system_prompt, temperature, 5-stage parser) unchanged.
200-char raw_response_excerpt cap enforced at ReasoningEntry.__post_init__ (cannot be bypassed by call sites).

Deferred to Phase 2C

Wall-clock comparison sequential vs --experimental-parallel 4 on R7000 (PoC merged, real firmware measurement pending)
Analyst hint injection loop smoke with real LLM driver + AIEDGE_FEEDBACK_DIR
Benchmark freeze rule enforcement (check_baseline_metadata wiring requires fresh corpus baseline)

Upgrade Notes

Breaking: confidence field semantics changed for CVE findings — it is now strictly detection evidence (≤0.55). Consumers previously ranking by confidence should migrate to priority_score. See docs/scoring_calibration.md for the full contract.

Everything else is additive; existing consumers that ignore the new fields continue to work unchanged.

Full diff: v2.5.0...v2.6.0
PR #6: #6

🤖 Generated with Claude Code

13 Apr 04:11

R00T-Kim

v2.5.0

8a2105c

SCOUT v2.5.0 — Structured LLM + EPSS + Observability + CRA Compliance

Strategic Roadmap Phase 1 implementation. Based on 30+ academic papers (LATTE, Operation Mango, HouseFuzz, VulnSage) and competitive analysis (Theori Xint, FirmAgent, EU CRA).

Highlights

LLM structured output (parse failure 100% → 0%)

New llm_prompts.py: 7 role-based system prompts (including ADVOCATE/CRITIC/TAINT/CLASSIFIER/REPAIR/SYNTHESIS) + temperature constants
LLMDriver Protocol: system_prompt + temperature parameters wired into all 4 drivers (Codex, Claude API, Claude Code CLI, Ollama)
5-stage JSON parser: preamble strip → fence → raw → brace-counting → common error fix; optional required_keys schema validation
All LLM-using stages updated: adversarial_triage, taint_propagation, semantic_classifier

Sink expansion (`taint_propagation.py`)

_SINK_SYMBOLS: 11 → 28 (memcpy, memmove, strcat, strncpy, gets, vsprintf, printf family, scanf family, dlopen, realpath)
_FORMAT_STRING_SINKS + _is_format_string_variable() helper for variable-controlled format string detection

EPSS scoring (`cve_scan.py`)

FIRST.org API integration with batched queries (30 IDs/request)
Per-run + cross-run cache
Confidence adjustment by EPSS percentile (≥0.10: +0.10, ≥0.01: +0.05, <0.001: -0.05)

Observability

Separate parse_failures vs llm_call_failures counters in adversarial_triage and fp_verification
LLM failure classification helpers: quota_exhausted, driver_unavailable, driver_nonzero_exit

CI/CD & Compliance

New .github/actions/scout-scan/: composite GitHub Action with SARIF upload to GitHub Security tab
docs/cra_compliance_mapping.md: EU Cyber Resilience Act Annex I 12 essential requirements mapped to SCOUT outputs
docs/strategic_roadmap_2026.md: 3-Phase plan (v2.5 → v3.0 → v4.0)

Bug Fixes

CVE scan signature-only path: removed early return so signature-only matches go through the same enrichment / finding-candidate pipeline as NVD matches
CVE scan comp variable bug: backport confidence adjustment now uses per-match component metadata instead of leaked outer loop variable
Semantic classifier batch size: reduced from 50 → 15 functions per LLM call to prevent JSON schema loss in long contexts

R7000 Verification (Netgear, 31MB, codex driver, 2026-04-13)

Metric	Pre-v2.5 (1211 run)	v2.5.0 (1320 run)
`adversarial_triage` parse_failures	100/100	0/100
`adversarial_triage` parsed_ok	0/100	100/100
`fp_verification` unverified	97/100	0/100
`fp_verification` true_positives	1	57
`fp_verification` false_positives	2	43
`cve_scan` EPSS enriched	0/23	23/23

Adversarial debate: 100 debated → 99 downgraded (FP) + 1 maintained (TP)

Run: `aiedge-runs/2026-04-12_1320_sha256-b28bf08e9d2c`

Stats

21 files changed
3,695 insertions / 792 deletions
7 new files

See CHANGELOG.md for full details.

11 Apr 07:55

R00T-Kim

v2.4.1

3352783

SCOUT v2.4.1

Terminator Re-evaluation Fixes

After v2.4.0, Terminator identified 3 issues. All addressed:

Confidence Calibration

decompiled_colocated: 0.60 → 0.45 (0.50 for high-risk sinks)
Separate caps per method: pcode_colocated 0.65, decompiled_colocated 0.50, decompiled_interprocedural 0.60

addr_diff Removal

Replaced fragile addr_diff > 16 address matching with callee name resolution via resolve_call_target()
Robust against compiler optimizations and instruction alignment differences

Interprocedural Taint (Strategy 4)

Cross-function source→sink detection using xref call graph
Caller with source API calls callee with sink API → decompiled_interprocedural trace
1-hop depth limit to control false positives
Verified: fread→vsprintf across FUN_00012514→FUN_00011fe0 in RT-AX88U
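Strategy 4's one-hop rule can be sketched over a call graph annotated with the APIs each function calls directly. The graph shape and the source/sink API sets are illustrative assumptions. Only the RT-AX88U function names and the fread→vsprintf pair come from the notes above.

```python
SOURCE_APIS = {"fread", "recv", "getenv"}      # hypothetical subset
SINK_APIS = {"vsprintf", "sprintf", "system"}  # hypothetical subset


def interprocedural_traces(
    calls_api: dict[str, set[str]], calls_fn: dict[str, set[str]]
) -> list[tuple[str, str, str, str]]:
    """Find caller->callee pairs where the caller reads a source and the callee hits a sink.

    Only direct callees are considered (1-hop), which bounds false positives.
    Returns (caller, source_api, callee, sink_api) tuples in sorted order.
    """
    traces = []
    for caller in sorted(calls_fn):
        sources = sorted(calls_api.get(caller, set()) & SOURCE_APIS)
        for callee in sorted(calls_fn[caller]):
            sinks = sorted(calls_api.get(callee, set()) & SINK_APIS)
            traces.extend((caller, src, callee, snk) for src in sources for snk in sinks)
    return traces
```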

Metric	v2.4.0	v2.4.1
Total taint (RT-AX88U)	15	16
Interprocedural traces	0	1
decompiled_colocated conf	0.60	0.45

🤖 Generated with Claude Code

11 Apr 07:32

R00T-Kim

v2.4.0

5e09a00

SCOUT v2.4.0

Detection Engine Upgrade

Driven by Terminator's evaluation of ASUS RT-AX88U findings — "framework is top-tier, detection core needs depth" — this release significantly upgrades SCOUT's vulnerability detection capabilities.

Highlights

Ghidra P-code Taint Analysis — 3-strategy dataflow tracing replaces symbol co-occurrence: P-code SSA forward taint → P-code colocated → decompiled body analysis
3-Tier Confidence System — PCODE_VERIFIED_CAP = 0.75 joins existing co-occurrence (0.40) and code-verified (0.55) tiers
4 New Rule Families — SQL injection, format string, path traversal, SSRF detection (9 regex patterns across PHP/Python/C/shell)
CGI Handler Detection — Ghidra string_refs extraction of do_*_cgi function names as source endpoints
SBOM Backport Detection — opkg patch revision parsing with -0.30 confidence for backported packages
Handoff Schema — firmware_handoff.json now includes adversarial triage schema reference for downstream consumers

Verified

Metric	v2.3.0	v2.4.0	Change
Taint results (RT-AX88U)	10	15	+50%
Max confidence	0.40	0.60	+50%
Ghidra-verified traces	0	5	New
Sanitizer detection	N/A	2 detected	New
Detection rule families	5	9	+80%

New Files

src/aiedge/ghidra_scripts/pcode_taint.py — P-code SSA forward/backward taint analysis

🤖 Generated with Claude Code

Releases: R00T-Kim/SCOUT