perf: fix 199s regression in 10k-dep cold-cache --offline scan by RagavRida · Pull Request #93 · deonmenezes/mantishack

RagavRida · 2026-05-29T14:07:07Z

Summary

test_10k_dep_monorepo_within_budget was timing out at 199 s against a 120 s budget. Two compounding root causes were identified and fixed. After the fix the same test completes in 64.9 s (46% under budget).

Before:  199 s  ❌  (budget 120 s)
After:    64.9 s ✅  (budget 120 s, headroom 46%)
Peak RSS: 99.6 MiB   (budget 1024 MiB)

Root cause 1 — test cache root was not isolated

_run_scan() in test_perf_baseline.py never passed --cache-root, so all 22,022 path.stat() calls from a cold-cache offline scan went to the default ~/.mantishack/cache/sca. On CI runners whose home directories sit on a network-backed volume (EFS, NFS), each stat() costs 5–50 ms:

22,022 stat() calls × 10 ms/call = 220 s of pure I/O overhead

Fix: pass --cache-root str(out / ".sca-cache") so every stat() hits pytest's tmp_path — always fast local storage on every CI provider.

Root cause 2 — 8,000 spurious `license_unknown` findings in `--offline` mode

license.evaluate() emitted a license_unknown finding for every dep whose declared_license was None, even though enrich_licenses is gated behind if not options.offline: in the pipeline. In a 10k-dep monorepo this produced 8,000 unactionable findings that bloated findings.json / SARIF / SBOM from ~2k → 10k entries and made report.md 5× larger.

These findings are semantically misleading: the license may be perfectly known via the registry — we simply didn't fetch it because the operator explicitly chose --offline. A warm online run already surfaces real unknowns correctly.

Fix: add offline: bool = False to evaluate() and _evaluate_one(). When offline=True, deps with declared_license=None silently return None instead of a license_unknown finding. Wired through pipeline.py.

Files changed

File	Change
`packages/sca/tests/test_perf_baseline.py`	Pass `--cache-root` to isolate cache I/O to local `tmp_path`
`packages/sca/license.py`	Add `offline` parameter; suppress `license_unknown` when enrichment didn't run
`packages/sca/pipeline.py`	Wire `offline=options.offline` to `evaluate_license`

Test results

$ pytest packages/sca/tests/test_perf_baseline.py -m slow -v
wallclock: 64.89s  (budget 120s) ✅
peak RSS:  99.6MiB (budget 1024MiB) ✅
PASSED

$ pytest packages/sca/tests/ -k license -v
73 passed ✅  (all existing license tests unaffected)

Checklist

Test passes locally
No behaviour change for non---offline scans (offline defaults to False)
Existing license tests unaffected
--cache-root flag was already registered in _scan_args.py — this just exercises it from the test

Two compounding root causes were identified via bisect and profiling: ## Root cause 1 — test used default cache root (NFS risk on CI) _run_scan() never passed --cache-root, so all 22,022 stat() calls from a cold-cache offline scan went to ~/.mantishack/cache/sca. On CI runners whose home directories sit on a network-backed volume each stat() costs 5–50 ms; 22,022 × 10 ms = 220 s of pure I/O overhead before any real computation. Fix: pass --cache-root pointing to out/.sca-cache so every stat() hits the local temp directory (≈0.01 ms each, ≈220 ms total). ## Root cause 2 — 8,000 spurious license_unknown findings in --offline mode license.evaluate() emitted a license_unknown finding for every dep whose declared_license was None, even when enrich_licenses was skipped because the user passed --offline. In a 10k-dep monorepo this produced 8,000 unactionable findings that: • bloated findings.json / SARIF / SBOM from ~2k to 10k entries • made report.md 5× larger (8,000 H3 sections × sanitise_string calls) • doubled serialisation time These findings are semantically wrong: the license may be perfectly known via the registry — we simply didn't fetch it. A warm online run already surfaces the real unknowns correctly. Fix: add offline: bool = False to evaluate() and _evaluate_one(). When offline=True, deps with declared_license=None return None silently. Wire offline=options.offline through pipeline.py. ## Result Before: 199 s (tripped 120 s budget) After: 64.9 s (budget headroom: 46%) Peak RSS: 99.6 MiB (budget: 1024 MiB) $ pytest packages/sca/tests/test_perf_baseline.py -m slow -v PASSED [64.89s] Files changed: packages/sca/tests/test_perf_baseline.py — --cache-root isolation packages/sca/license.py — offline= parameter packages/sca/pipeline.py — wire offline flag

vercel · 2026-05-29T14:07:14Z

@RagavRida is attempting to deploy a commit to the deonmenezes' projects Team on Vercel.

A member of the Team first needs to authorize it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: fix 199s regression in 10k-dep cold-cache --offline scan#93

perf: fix 199s regression in 10k-dep cold-cache --offline scan#93
RagavRida wants to merge 1 commit into
deonmenezes:mainfrom
RagavRida:fix/perf-baseline-cache-offline-license-noise

RagavRida commented May 29, 2026

Uh oh!

vercel Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

RagavRida commented May 29, 2026

Summary

Root cause 1 — test cache root was not isolated

Root cause 2 — 8,000 spurious license_unknown findings in --offline mode

Files changed

Test results

Checklist

Uh oh!

vercel Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Root cause 2 — 8,000 spurious `license_unknown` findings in `--offline` mode