Redact high-entropy substrings from SAST snippets before sending to remote LLM #3

@nirmalgupta

Severity: LOW
Category: Data Exposure (triage path only)
Source: Initial security review (pre-public)

Risk

`secscan/triage.py` (`_finding_brief`) includes `extra["snippet"]` (truncated to 200 chars) in the JSON sent to Ollama for SAST findings. The snippet is the matched code region from Semgrep.

If a user has a hardcoded credential in their source AND triage is enabled AND Ollama is hosted off-host (not the default `host.docker.internal`), the credential reaches the model host. This is not a flaw in the deterministic path — the deterministic path uses `masked_preview` for secrets and never sends raw snippets. It's a triage-only leak vector for an edge configuration.

Fix options (pick one)

Option A — Network policy (simplest):

  • In `Triage.__init__`, if `cfg.base_url` does not resolve to a loopback / private address, log a warning and refuse to send. Document that triage is intended for a local Ollama only.
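
A minimal sketch of the address check in Option A, using only the Python standard library. The name `is_local_base_url` and the injectable `resolve` parameter are hypothetical, not existing `secscan` code. How `Triage` would call it and log the warning is left open.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _default_resolve(host):
    """Return every IP address string the hostname resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_local_base_url(base_url, resolve=_default_resolve):
    """True only if every address behind base_url is loopback or private.

    Hypothetical helper: fails closed (returns False) when the URL has no
    host or the host cannot be resolved.
    """
    host = urlsplit(base_url).hostname
    if not host:
        return False
    try:
        # IP literals need no DNS lookup.
        addrs = [ipaddress.ip_address(host)]
    except ValueError:
        try:
            addrs = [ipaddress.ip_address(a) for a in resolve(host)]
        except (OSError, ValueError):
            return False
    return bool(addrs) and all(a.is_loopback or a.is_private for a in addrs)
```

Requiring every resolved address to be local (rather than any one of them) is a deliberately conservative choice here: a hostname that resolves to both a private and a public address is treated as remote.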

Option B — Entropy filter:

  • Before sending, walk strings in `safe["extra"]` and `safe["message"]` and replace any contiguous substring matching:
    • high Shannon entropy (≥ 4.0 bits per character over a run of ≥ 20 chars) → `REDACTED:high-entropy`
    • a few well-known token shapes (AKIA…, ghp_/gho_/github_pat_, sk_live_, eyJ… JWTs, `-----BEGIN .* KEY-----` blocks) → `REDACTED:looks-like-token`
  • Add a test using a fixture finding whose snippet contains a fake AKIA-style and ghp-style string and assert they do not appear in the bytes posted to Ollama.
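
One way Option B could look, as a hedged sketch rather than the project's implementation. The function names (`redact_text`, `redact_value`, `sanitize_finding`), the exact regex lengths, and the character class used to find candidate runs are all assumptions. Only the thresholds, token prefixes, and replacement strings come from the list above.

```python
import math
import re
from collections import Counter

# Token shapes named in this issue; the length bounds are assumptions.
TOKEN_SHAPES = re.compile(
    # PEM block; matches to end of text if the snippet was truncated.
    r"-----BEGIN [A-Z ]*KEY-----[\s\S]*?(?:-----END [A-Z ]*KEY-----|\Z)"
    r"|AKIA[0-9A-Z]{16}"
    r"|(?:ghp_|gho_|github_pat_)[A-Za-z0-9_]{20,}"
    r"|sk_live_[A-Za-z0-9]{16,}"
    r"|eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"
)
# Contiguous runs of token-like characters, at least 20 long.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(s):
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def redact_text(text, threshold=4.0):
    """Replace known token shapes first, then high-entropy runs."""
    text = TOKEN_SHAPES.sub("REDACTED:looks-like-token", text)

    def _maybe(match):
        run = match.group(0)
        return "REDACTED:high-entropy" if shannon_entropy(run) >= threshold else run

    return CANDIDATE.sub(_maybe, text)


def redact_value(value):
    """Walk nested dicts/lists and redact every string found."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return {k: redact_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_value(v) for v in value]
    return value


def sanitize_finding(safe):
    """Return a copy of the brief with `extra` and `message` redacted."""
    out = dict(safe)
    for key in ("extra", "message"):
        if key in out:
            out[key] = redact_value(out[key])
    return out
```

Token shapes are replaced before the entropy pass so that a recognised token gets the more specific `REDACTED:looks-like-token` label. A fixture-based test along the lines of the last bullet could build a finding whose snippet contains a fake `AKIA…` and `ghp_…` string, run it through `sanitize_finding`, and assert that neither string appears in the serialized request body.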

Option C — Belt and braces: do both.

Acceptance

  • A test asserts no high-entropy substring or known-token-shape leaves `Triage` over HTTP
  • Either remote-Ollama is refused, or snippets are redacted before send (or both)
  • Existing triage tests still pass
  • README `triage` section notes that snippets are sanitized / that remote Ollama is unsupported
