fix(meta-analyzer): LLM-confirmed findings dropped when model returns end_line

# Meta-analyzer silently drops LLM-confirmed findings when the model returns `end_line` (security: false negatives, e.g. dropped CVEs)

## Summary

`LLMMetaAnalyzer.apply_filter` (`src/skillspector/nodes/meta_analyzer.py`) matches each static finding against the LLM's confirmation using a key that includes `end_line`. When the LLM **confirms** a finding as a real vulnerability but returns a non-null `end_line` (e.g. `end_line == start_line`) while the static finding carries `end_line = None`, **none of the three lookup keys match and the confirmed finding is silently dropped** (`continue`).

Because the meta-analyzer is a drop-by-default whitelist filter, the failure mode is a **false negative**: real, LLM-confirmed findings — including live OSV/CVE supply-chain findings — disappear from the report, and the skill's risk score can collapse from CRITICAL to SAFE.

## Impact / Severity

- **Security-relevant false negative.** A skill with known-vulnerable dependencies scored `100 / CRITICAL` in static-only mode but `0 / SAFE` once LLM analysis was enabled, because all 7 supply-chain findings (5 of them live OSV CVEs, e.g. `PyYAML==5.1`) were dropped by the filter even though the LLM confirmed 6 of them with `is_vulnerability=True, confidence≈1.0`.
- This is **not** a model-quality problem — the LLM classified correctly. It is a key-matching defect in `apply_filter` that is triggered by a legitimate, schema-valid response shape.

## Environment

- SkillSpector `v2.1.4` (commit `cff7ecc`)
- Python 3.13
- Triggered when the configured LLM populates `end_line` in `MetaAnalyzerFinding`. Observed with DeepSeek via the OpenAI-compatible path (`deepseek-chat`); any model that fills `end_line` for single-line findings will trigger it. Stock OpenAI models tend to leave `end_line` unset, which is why this has stayed latent.

## Root cause

`MetaAnalyzerFinding.end_line` is optional (the schema explicitly allows the model to provide it). Static analyzers commonly emit findings with `end_line = None` (single-line or dependency findings).

In `apply_filter`, the confirmation index is built **with** `end_line` (current `meta_analyzer.py:301-312`):

```python
if start_line is not None:
    end_line = item.get("end_line")
    confirmed_granular[(file_path, pattern_id, int(start_line),
                        int(end_line) if end_line is not None else None)] = enrichment
else:
    confirmed_coarse[(file_path, pattern_id)] = enrichment
```

and looked up with three keys (`meta_analyzer.py:316-326`):

```python
exact_key      = (f.file, f.rule_id, f.start_line, f.end_line)   # static end_line is None
start_only_key = (f.file, f.rule_id, f.start_line, None)         # forces None
coarse_key     = (f.file, f.rule_id)                             # only used if LLM omitted start_line
if   exact_key      in confirmed_granular: ...
elif start_only_key in confirmed_granular: ...
elif coarse_key     in confirmed_coarse:   ...
else:
    continue   # <-- finding dropped
```

Concrete mismatch for a `SC4` finding on line 4:

| source | tuple |
|---|---|
| static finding | `("requirements.txt", "SC4", 4, None)` |
| stored in `confirmed_granular` (LLM filled `end_line=4`) | `("requirements.txt", "SC4", 4, 4)` |
| `exact_key` lookup | `(..., 4, None)` → miss |
| `start_only_key` lookup | `(..., 4, None)` → miss |
| `coarse_key` lookup | `confirmed_coarse` is empty (LLM provided `start_line`) → miss |

The three branches cover "LLM `end_line` equals static `end_line`" and "LLM omitted `end_line`", but **not** "LLM provided an `end_line` while the static finding's is `None`".
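The mismatch in the table can be checked in isolation with plain dictionaries. This is a simplified stand-in for the lookup, not the project's code: the string `"enrichment"` is a placeholder for the real enrichment value, and the variable names simply mirror the excerpt above.

```python
# Simplified stand-in for the lookup in apply_filter (illustrative only).
file_path, rule_id, start_line = "requirements.txt", "SC4", 4

# Index as built from the LLM response: the model filled end_line=4.
llm_end_line = 4
confirmed_granular = {(file_path, rule_id, start_line, llm_end_line): "enrichment"}
confirmed_coarse = {}  # only populated when the LLM omits start_line

# Keys derived from the static finding, whose end_line is None.
static_end_line = None
exact_key = (file_path, rule_id, start_line, static_end_line)
start_only_key = (file_path, rule_id, start_line, None)
coarse_key = (file_path, rule_id)

hits = {
    "exact": exact_key in confirmed_granular,
    "start_only": start_only_key in confirmed_granular,
    "coarse": coarse_key in confirmed_coarse,
}
print(hits)  # {'exact': False, 'start_only': False, 'coarse': False}
```

When the static `end_line` is `None`, `exact_key` and `start_only_key` are the same tuple, so the second branch adds no coverage for this case.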

## Minimal reproduction (no API key / network)

`apply_filter` can be exercised directly with a simulated LLM response:

```python
from skillspector.models import Finding
from skillspector.nodes.meta_analyzer import LLMMetaAnalyzer


class _FakeBatch:
    def __init__(self, file_path):
        self.file_path = file_path


def make_finding(rule_id, start_line):
    return Finding(rule_id=rule_id, message=f"Vuln ({rule_id})", severity="CRITICAL",
                   confidence=0.9, file="requirements.txt",
                   start_line=start_line, end_line=None, remediation="")


findings = [make_finding("SC4", 4), make_finding("SC4", 5)]
llm_items = [
    {"pattern_id": "SC4", "is_vulnerability": True, "confidence": 1.0,
     "start_line": 4, "end_line": 4, "_file": "requirements.txt"},
    {"pattern_id": "SC4", "is_vulnerability": True, "confidence": 1.0,
     "start_line": 5, "end_line": 5, "_file": "requirements.txt"},
]
batch_results = [(_FakeBatch("requirements.txt"), llm_items)]

analyzer = LLMMetaAnalyzer.__new__(LLMMetaAnalyzer)  # skip __init__ (no LLM needed)
kept = analyzer.apply_filter(findings, batch_results)
print(f"confirmed={sum(i['is_vulnerability'] for i in llm_items)} kept={len(kept)}")
```

Output on `v2.1.4`:

```
confirmed=2 kept=0      # both LLM-confirmed findings dropped
```

Expected:

```
confirmed=2 kept=2
```

## Suggested fix

Add an `end_line`-agnostic fallback keyed by `(file, rule_id, start_line)`. It only relaxes the line-matching; the `is_vulnerability` / `confidence >= 0.6` gating upstream is unchanged, so it cannot resurrect findings the LLM rejected (verified: a finding the LLM marked `is_vulnerability=False` stays dropped).

```diff
         confirmed_granular: dict[tuple[str, str, int, int | None], _enrichment] = {}
+        # end_line-agnostic index: some models populate end_line==start_line while
+        # static findings carry end_line=None, which made all three lookups miss
+        # and silently drop confirmed findings.
+        confirmed_by_start: dict[tuple[str, str, int], _enrichment] = {}
         confirmed_coarse: dict[tuple[str, str], _enrichment] = {}
@@
                     ] = enrichment
+                    confirmed_by_start[(file_path, pattern_id, int(start_line))] = enrichment
                 else:
                     confirmed_coarse[(file_path, pattern_id)] = enrichment
@@
             coarse_key = (f.file, f.rule_id)
+            start_key = (f.file, f.rule_id, f.start_line) if f.start_line is not None else None
             if exact_key in confirmed_granular:
                 expl, rem, conf = confirmed_granular[exact_key]
             elif start_only_key in confirmed_granular:
                 expl, rem, conf = confirmed_granular[start_only_key]
+            elif start_key is not None and start_key in confirmed_by_start:
+                expl, rem, conf = confirmed_by_start[start_key]
             elif coarse_key in confirmed_coarse:
                 expl, rem, conf = confirmed_coarse[coarse_key]
             else:
                 continue
```

After the fix the reproduction prints `confirmed=2 kept=2`, and the end-to-end supply-chain scan returns to `100 / CRITICAL` (the LLM-rejected typosquatting finding correctly stays dropped, so the result is the 6 confirmed findings, not a blanket pass-through).
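The claim that the fallback cannot resurrect rejected findings can be illustrated with a small, self-contained model of the patched lookup. This is a sketch under assumptions, not the real `apply_filter`: `build_indexes` and `is_kept` are hypothetical helpers, findings are reduced to plain tuples, and the confidence threshold is left out so that only the `is_vulnerability` gate is modeled.

```python
def build_indexes(llm_items):
    """Index confirmed LLM items (simplified model of the patched code)."""
    granular, by_start, coarse = {}, {}, {}
    for item in llm_items:
        if not item["is_vulnerability"]:
            continue  # rejected items never enter any index
        file_path, pattern_id = item["_file"], item["pattern_id"]
        start, end = item.get("start_line"), item.get("end_line")
        if start is not None:
            granular[(file_path, pattern_id, start, end)] = item
            # end_line-agnostic index added by the fix
            by_start[(file_path, pattern_id, start)] = item
        else:
            coarse[(file_path, pattern_id)] = item
    return granular, by_start, coarse


def is_kept(finding, indexes):
    """finding is a (file, rule_id, start_line, end_line) tuple."""
    granular, by_start, coarse = indexes
    file_path, rule_id, start, end = finding
    return (
        (file_path, rule_id, start, end) in granular
        or (file_path, rule_id, start, None) in granular
        or (start is not None and (file_path, rule_id, start) in by_start)
        or (file_path, rule_id) in coarse
    )


llm_items = [
    # confirmed, with end_line filled in by the model
    {"_file": "requirements.txt", "pattern_id": "SC4", "is_vulnerability": True,
     "start_line": 4, "end_line": 4},
    # rejected by the model
    {"_file": "requirements.txt", "pattern_id": "SC4", "is_vulnerability": False,
     "start_line": 5, "end_line": 5},
]
findings = [
    ("requirements.txt", "SC4", 4, None),
    ("requirements.txt", "SC4", 5, None),
]

indexes = build_indexes(llm_items)
kept = [f for f in findings if is_kept(f, indexes)]
print(kept)  # [('requirements.txt', 'SC4', 4, None)]
```

Because rejected items are filtered out before any index is built, widening the lookup only affects findings the model already confirmed.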

## Suggested hardening (optional, separate from the fix)

Given that this is a security tool, consider making high-assurance static findings (OSV/CVE supply-chain, secrets, dangerous-code/AST) **non-suppressible** by the LLM filter — i.e. the LLM may add or annotate findings but never remove a deterministic static finding. That would bound the blast radius of any future matching defect to "extra noise" rather than "dropped CVE".
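One possible shape for such a floor is sketched below. Everything here is hypothetical: the `StaticFinding` dataclass, the `NON_SUPPRESSIBLE_PREFIXES` tuple, and the assumption that supply-chain rule IDs share an `SC` prefix are illustrative choices, not SkillSpector's actual model or rule taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticFinding:
    rule_id: str
    file: str
    start_line: int


# Hypothetical: rule-id prefixes treated as high-assurance and deterministic.
NON_SUPPRESSIBLE_PREFIXES = ("SC",)


def apply_filter_with_floor(findings, llm_confirmed):
    """Keep a finding if the LLM confirmed it OR its rule is non-suppressible."""
    kept = []
    for f in findings:
        if f in llm_confirmed or f.rule_id.startswith(NON_SUPPRESSIBLE_PREFIXES):
            kept.append(f)
    return kept


findings = [
    StaticFinding("SC4", "requirements.txt", 4),  # supply-chain finding
    StaticFinding("X1", "SKILL.md", 10),          # made-up heuristic rule
]

# Simulate a matching defect: nothing the LLM said lines up with any finding.
kept = apply_filter_with_floor(findings, llm_confirmed=set())
print([f.rule_id for f in kept])  # ['SC4']
```

With the floor in place, a total lookup failure still drops the heuristic finding but leaves the supply-chain finding in the report.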

## Suggested regression test

A unit test asserting that `apply_filter` keeps a confirmed finding when the LLM returns a non-null `end_line` and the static finding has `end_line = None` would lock this in. The minimal reproduction above can serve as the basis.

