Meta-analyzer silently drops LLM-confirmed findings when the model returns end_line (security: false negatives, e.g. dropped CVEs)
Summary
LLMMetaAnalyzer.apply_filter (src/skillspector/nodes/meta_analyzer.py) matches each static finding against the LLM's confirmation using a key that includes end_line. When the LLM confirms a finding as a real vulnerability but returns a non-null end_line (e.g. end_line == start_line) while the static finding carries end_line = None, none of the three lookup keys match and the confirmed finding is silently dropped (continue).
Because the meta-analyzer is a drop-by-default whitelist filter, the failure mode is a false negative: real, LLM-confirmed findings — including live OSV/CVE supply-chain findings — disappear from the report, and the skill's risk score can collapse from CRITICAL to SAFE.
Impact / Severity
- Security-relevant false negative. A skill with known-vulnerable dependencies scored 100 / CRITICAL in static-only mode but 0 / SAFE once LLM analysis was enabled, because all 7 supply-chain findings (5 of them live OSV CVEs, e.g. PyYAML==5.1) were dropped by the filter even though the LLM confirmed 6 of them with is_vulnerability=True, confidence≈1.0.
- This is not a model-quality problem: the LLM classified correctly. It is a key-matching defect in apply_filter that is triggered by a legitimate, schema-valid response shape.
Environment
- SkillSpector v2.1.4 (commit cff7ecc)
- Python 3.13
- Triggered when the configured LLM populates end_line in MetaAnalyzerFinding. Observed with DeepSeek via the OpenAI-compatible path (deepseek-chat); any model that fills end_line for single-line findings will trigger it. Stock OpenAI models tend to leave end_line unset, which is why this has stayed latent.
Root cause
MetaAnalyzerFinding.end_line is optional (the schema explicitly allows the model to provide it). Static analyzers commonly emit findings with end_line = None (single-line or dependency findings).
In apply_filter, the confirmation index is built with end_line (current meta_analyzer.py:301-312):

    if start_line is not None:
        end_line = item.get("end_line")
        confirmed_granular[(file_path, pattern_id, int(start_line),
                            int(end_line) if end_line is not None else None)] = enrichment
    else:
        confirmed_coarse[(file_path, pattern_id)] = enrichment

and looked up with three keys (meta_analyzer.py:316-326):

    exact_key      = (f.file, f.rule_id, f.start_line, f.end_line)  # static end_line is None
    start_only_key = (f.file, f.rule_id, f.start_line, None)        # forces None
    coarse_key     = (f.file, f.rule_id)                            # only used if LLM omitted start_line
    if exact_key in confirmed_granular: ...
    elif start_only_key in confirmed_granular: ...
    elif coarse_key in confirmed_coarse: ...
    else:
        continue  # <-- finding dropped

Concrete mismatch for a SC4 finding on line 4:
| source | tuple |
| --- | --- |
| static finding | ("requirements.txt", "SC4", 4, None) |
| stored in confirmed_granular (LLM filled end_line=4) | ("requirements.txt", "SC4", 4, 4) |
| exact_key lookup | (..., 4, None) → miss |
| start_only_key lookup | (..., 4, None) → miss |
| coarse_key lookup | confirmed_coarse is empty (LLM provided start_line) → miss |
The three branches cover "LLM end_line equals static end_line" and "LLM omitted end_line", but not "LLM provided an end_line while the static finding's is None".
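The mismatch can be shown without SkillSpector at all. The sketch below is a simplified stand-in for the two indexes, not the project's code: the dictionary layout follows the snippets above, while the `llm_item` keys and the `"enrichment"` placeholder value are assumptions made for illustration.

```python
# Simplified stand-ins for the indexes built inside apply_filter.
confirmed_granular = {}
confirmed_coarse = {}

# The LLM confirmed SC4 on line 4 and filled end_line=4 (schema-valid).
llm_item = {"file": "requirements.txt", "pattern_id": "SC4",
            "start_line": 4, "end_line": 4}
confirmed_granular[(llm_item["file"], llm_item["pattern_id"],
                    llm_item["start_line"], llm_item["end_line"])] = "enrichment"

# The static finding for the same line carries end_line=None.
file, rule_id, start_line, end_line = ("requirements.txt", "SC4", 4, None)

exact_key = (file, rule_id, start_line, end_line)
start_only_key = (file, rule_id, start_line, None)
coarse_key = (file, rule_id)

hits = {
    "exact": exact_key in confirmed_granular,
    "start_only": start_only_key in confirmed_granular,
    "coarse": coarse_key in confirmed_coarse,
}
print(hits)  # {'exact': False, 'start_only': False, 'coarse': False}
```

Because the static end_line is None, exact_key and start_only_key are the same tuple, so the first two branches are effectively a single lookup for this class of finding.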
Minimal reproduction (no API key / network)
apply_filter can be exercised directly with a simulated LLM response:

    from skillspector.models import Finding
    from skillspector.nodes.meta_analyzer import LLMMetaAnalyzer

    class _FakeBatch:
        def __init__(self, file_path):
            self.file_path = file_path

    def make_finding(rule_id, start_line):
        return Finding(rule_id=rule_id, message=f"Vuln ({rule_id})", severity="CRITICAL",
                       confidence=0.9, file="requirements.txt",
                       start_line=start_line, end_line=None, remediation="")

    findings = [make_finding("SC4", 4), make_finding("SC4", 5)]
    llm_items = [
        {"pattern_id": "SC4", "is_vulnerability": True, "confidence": 1.0,
         "start_line": 4, "end_line": 4, "_file": "requirements.txt"},
        {"pattern_id": "SC4", "is_vulnerability": True, "confidence": 1.0,
         "start_line": 5, "end_line": 5, "_file": "requirements.txt"},
    ]
    batch_results = [(_FakeBatch("requirements.txt"), llm_items)]

    analyzer = LLMMetaAnalyzer.__new__(LLMMetaAnalyzer)  # skip __init__ (no LLM needed)
    kept = analyzer.apply_filter(findings, batch_results)
    print(f"confirmed={sum(i['is_vulnerability'] for i in llm_items)} kept={len(kept)}")

Output on v2.1.4:

    confirmed=2 kept=0   # both LLM-confirmed findings dropped

Expected:

    confirmed=2 kept=2

Suggested fix
Add an end_line-agnostic fallback keyed by (file, rule_id, start_line). It only relaxes the line-matching; the is_vulnerability / confidence >= 0.6 gating upstream is unchanged, so it cannot resurrect findings the LLM rejected (verified: a finding the LLM marked is_vulnerability=False stays dropped).

         confirmed_granular: dict[tuple[str, str, int, int | None], _enrichment] = {}
    +    # end_line-agnostic index: some models populate end_line==start_line while
    +    # static findings carry end_line=None, which made all three lookups miss
    +    # and silently drop confirmed findings.
    +    confirmed_by_start: dict[tuple[str, str, int], _enrichment] = {}
         confirmed_coarse: dict[tuple[str, str], _enrichment] = {}
    @@
                 ] = enrichment
    +            confirmed_by_start[(file_path, pattern_id, int(start_line))] = enrichment
             else:
                 confirmed_coarse[(file_path, pattern_id)] = enrichment
    @@
         coarse_key = (f.file, f.rule_id)
    +    start_key = (f.file, f.rule_id, f.start_line) if f.start_line is not None else None
         if exact_key in confirmed_granular:
             expl, rem, conf = confirmed_granular[exact_key]
         elif start_only_key in confirmed_granular:
             expl, rem, conf = confirmed_granular[start_only_key]
    +    elif start_key is not None and start_key in confirmed_by_start:
    +        expl, rem, conf = confirmed_by_start[start_key]
         elif coarse_key in confirmed_coarse:
             expl, rem, conf = confirmed_coarse[coarse_key]
         else:
             continue

After the fix the reproduction prints confirmed=2 kept=2, and the end-to-end supply-chain scan returns to 100 / CRITICAL (the LLM-rejected typosquatting finding correctly stays dropped, so the result is the 6 confirmed findings, not a blanket pass-through).
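For readers without a checkout, the effect of the fallback can be approximated in isolation. This is a minimal sketch under stated assumptions, not the patched apply_filter: `StaticFinding`, `keep_confirmed`, and the `start_fallback` switch are hypothetical names, sets replace the enrichment dictionaries, and the gating is inlined from the description above (is_vulnerability and confidence >= 0.6).

```python
from typing import NamedTuple, Optional


class StaticFinding(NamedTuple):
    """Hypothetical stand-in for skillspector's Finding (only the key fields)."""
    file: str
    rule_id: str
    start_line: Optional[int]
    end_line: Optional[int]


def keep_confirmed(findings, llm_items, start_fallback):
    """Keep findings the LLM confirmed; optionally match on start_line alone."""
    granular, by_start, coarse = set(), set(), set()
    for item in llm_items:
        # Gating: rejected or low-confidence items never enter any index.
        if not item["is_vulnerability"] or item["confidence"] < 0.6:
            continue
        file, rule = item["file"], item["pattern_id"]
        start, end = item.get("start_line"), item.get("end_line")
        if start is not None:
            granular.add((file, rule, start, end))
            by_start.add((file, rule, start))  # end_line-agnostic index
        else:
            coarse.add((file, rule))

    kept = []
    for f in findings:
        matched = (
            (f.file, f.rule_id, f.start_line, f.end_line) in granular
            or (f.file, f.rule_id, f.start_line, None) in granular
            or (start_fallback and f.start_line is not None
                and (f.file, f.rule_id, f.start_line) in by_start)
            or (f.file, f.rule_id) in coarse
        )
        if matched:
            kept.append(f)
    return kept


findings = [StaticFinding("requirements.txt", "SC4", n, None) for n in (4, 5, 6)]
llm_items = [
    {"file": "requirements.txt", "pattern_id": "SC4", "is_vulnerability": True,
     "confidence": 1.0, "start_line": 4, "end_line": 4},
    {"file": "requirements.txt", "pattern_id": "SC4", "is_vulnerability": True,
     "confidence": 1.0, "start_line": 5, "end_line": 5},
    {"file": "requirements.txt", "pattern_id": "SC4", "is_vulnerability": False,
     "confidence": 1.0, "start_line": 6, "end_line": 6},  # rejected by the LLM
]

before = keep_confirmed(findings, llm_items, start_fallback=False)
after = keep_confirmed(findings, llm_items, start_fallback=True)
print(len(before), [f.start_line for f in after])  # 0 [4, 5]
```

The rejected finding on line 6 stays dropped in both modes because it never reaches an index, which is the property that keeps the fallback from becoming a blanket pass-through.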
Suggested hardening (optional, separate from the fix)
Given that this is a security tool, consider making high-assurance static findings (OSV/CVE supply-chain, secrets, dangerous-code/AST) non-suppressible by the LLM filter — i.e. the LLM may add or annotate findings but never remove a deterministic static finding. That would bound the blast radius of any future matching defect to "extra noise" rather than "dropped CVE".
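One possible shape for such a floor is sketched below. Everything in it is an assumption made for illustration: the `NON_SUPPRESSIBLE_PREFIXES` tuple, the idea that supply-chain rule IDs can be recognized by an "SC" prefix (as SC4 suggests), the `apply_with_floor` wrapper, and the dict-based findings. A real implementation would more likely key on an explicit category or analyzer field.

```python
# Hypothetical: rule-ID prefixes whose findings the LLM filter may not remove.
NON_SUPPRESSIBLE_PREFIXES = ("SC",)


def is_pinned(finding):
    """True for findings that must survive regardless of the LLM verdict."""
    return finding["rule_id"].startswith(NON_SUPPRESSIBLE_PREFIXES)


def apply_with_floor(findings, llm_filter):
    """Run llm_filter only on suppressible findings; keep pinned ones as-is."""
    suppressible = [f for f in findings if not is_pinned(f)]
    survivor_ids = {id(f) for f in llm_filter(suppressible)}
    # Rebuild in the original order so report ordering is unchanged.
    return [f for f in findings if is_pinned(f) or id(f) in survivor_ids]


findings = [
    {"rule_id": "SC4", "file": "requirements.txt", "start_line": 4},
    {"rule_id": "X1", "file": "skill.py", "start_line": 10},  # made-up rule ID
]

# Worst case: a filter with a matching defect that drops everything it sees.
result = apply_with_floor(findings, llm_filter=lambda fs: [])
print([f["rule_id"] for f in result])  # ['SC4']
```

With this arrangement a defective filter can only lose suppressible findings; the deterministic supply-chain finding is still reported.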
Suggested regression test
A unit test asserting apply_filter keeps a confirmed finding when the LLM returns end_line != None and the static finding has end_line == None would lock this in. The minimal reproduction above can serve as the basis.