
research: Rich token-label schema (sink / source / sanitizer / evidence / pre-emission) #9

Description

@peaktwilight

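Current span labels are binary (y_tok ∈ {0,1}: whether or not the token is inside the vulnerable line). The proposed label schema (per the GPT-suggested format and the standard vulnerability-analysis taxonomy) is multi-label per token, with one binary label per class, so a single token can carry several labels: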

```json
{
  "is_completion_vulnerable": true,
  "is_functional": true,
  "cwe": "CWE-089",
  "token_labels": {
    "evidence": [...],
    "sink": [...],
    "source": [...],
    "sanitizer": [...],
    "vulnerable_line": [...]
  },
  "label_confidence": "dynamic_oracle+human_verified"
}
```
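A minimal sketch of converting data into and out of this schema, under two assumptions the issue does not state: each `token_labels` list holds 0-based token indices into the completion, and a legacy row's binary `y_tok` maps only onto the `vulnerable_line` head (the other heads stay empty until annotated). `Record`, `migrate_binary_row` and `to_multi_hot` are hypothetical names.

```python
from typing import Dict, List, TypedDict

# Head names copied from the "token_labels" keys of the schema above.
HEADS = ["evidence", "sink", "source", "sanitizer", "vulnerable_line"]


class Record(TypedDict):
    is_completion_vulnerable: bool
    is_functional: bool
    cwe: str
    # ASSUMPTION: each head lists 0-based indices of positively labelled tokens.
    token_labels: Dict[str, List[int]]
    label_confidence: str


def migrate_binary_row(y_tok: List[int], cwe: str, is_functional: bool,
                       label_confidence: str) -> Record:
    """Lift a legacy binary row (y_tok[i] == 1 inside the vulnerable line)
    into the multi-head schema. Only vulnerable_line is recoverable from
    y_tok; the other heads start empty until they are annotated."""
    token_labels: Dict[str, List[int]] = {head: [] for head in HEADS}
    token_labels["vulnerable_line"] = [i for i, y in enumerate(y_tok) if y == 1]
    return Record(
        # ASSUMPTION: a row is vulnerable iff it has any vulnerable-line token.
        is_completion_vulnerable=bool(token_labels["vulnerable_line"]),
        is_functional=is_functional,
        cwe=cwe,
        token_labels=token_labels,
        label_confidence=label_confidence,
    )


def to_multi_hot(record: Record, num_tokens: int) -> Dict[str, List[int]]:
    """Expand index lists into one 0/1 target vector per head. Heads are
    independent (multi-label), so a token may be 1 for several heads."""
    targets: Dict[str, List[int]] = {}
    for head in HEADS:
        vec = [0] * num_tokens
        for i in record["token_labels"].get(head, []):
            if not 0 <= i < num_tokens:
                raise ValueError(f"{head}: token index {i} out of range")
            vec[i] = 1
        targets[head] = vec
    return targets


row = migrate_binary_row([0, 0, 1, 1, 0], cwe="CWE-089", is_functional=True,
                         label_confidence="dynamic_oracle+human_verified")
targets = to_multi_hot(row, num_tokens=5)
# targets["vulnerable_line"] == [0, 0, 1, 1, 0]; every other head is all zeros
```

Keeping indices in the stored record and expanding to dense vectors only at training time keeps the JSON compact and lets a head stay empty for rows that have not been annotated yet.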
| Label type | Meaning | Example |
|---|---|---|
| Sink | token participates in a dangerous operation | `eval`, `subprocess.run(..., shell=True)` |
| Source | token carries attacker-controlled data | `request.args["q"]`, `req.body.name` |
| Missing-sanitizer | marks the absence of required sanitizing code | SQL query built without parameterization |
| Evidence | token a reviewer or tool used to reach the verdict | `strcpy`, raw SQL interpolation |
| Pre-emission risk | hidden state **before** the vulnerable token appears that predicts it | the token right before the model emits a sink |
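The table defines pre-emission risk only by example. If those targets are derived from the sink head instead of being annotated separately (an assumption), one option is to mark the `k` tokens before each sink onset, i.e. the first token of each contiguous sink span, so that `k = 1` reproduces "the token right before the model emits a sink". `pre_emission_labels` and `k` are hypothetical.

```python
from typing import List


def pre_emission_labels(sink: List[int], k: int = 1) -> List[int]:
    """HYPOTHETICAL derivation of pre-emission targets from sink targets.

    Marks the k positions immediately before every sink onset. The hidden
    state at position t produces the logits for token t + 1, so with k=1
    the marked state is the one that is about to emit the sink."""
    if k < 1:
        raise ValueError("k must be >= 1")
    n = len(sink)
    pre = [0] * n
    for o in range(n):
        # An onset is a sink token whose predecessor is not a sink token.
        is_onset = sink[o] == 1 and (o == 0 or sink[o - 1] == 0)
        if not is_onset:
            continue
        for t in range(max(0, o - k), o):
            pre[t] = 1
    return pre


# sink spans start at positions 3 and 6
labels = pre_emission_labels([0, 0, 0, 1, 1, 0, 1], k=1)
# labels == [0, 0, 1, 0, 0, 1, 0]
```

Larger `k` gives an earlier warning at the cost of noisier targets; a sink at position 0 has no preceding token, so it yields no pre-emission label.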

Why this matters: the streaming probe should fire on pre-emission risk tokens, not just on the vulnerable line itself. That's the actual "warn before the unsafe character is finalized" signal we want. Source/sink/sanitizer heads give us a richer UI ("this is about to be SQL injection because line 5 has a source and line 7 has a sink and no sanitizer between them").
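The richer UI message can be produced from per-line head predictions with a purely positional check. This sketch assumes each line has already been tagged with the set of classes its tokens fired on, and that the `sanitizer` head marks sanitizers that are present. It is not data-flow analysis, so it does not check whether the source actually reaches the sink. `unsanitized_flows` and `explain` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def unsanitized_flows(line_classes: Dict[int, Set[str]]) -> List[Tuple[int, int]]:
    """Return (source_line, sink_line) pairs where the source is on or before
    the sink line and no sanitizer line lies strictly between them."""
    sources = sorted(ln for ln, c in line_classes.items() if "source" in c)
    sinks = sorted(ln for ln, c in line_classes.items() if "sink" in c)
    sanitizers = [ln for ln, c in line_classes.items() if "sanitizer" in c]
    flows = []
    for src in sources:
        for snk in sinks:
            # <= allows one line to hold both source and sink.
            if src <= snk and not any(src < s < snk for s in sanitizers):
                flows.append((src, snk))
    return flows


def explain(line_classes: Dict[int, Set[str]], risk_name: str) -> Optional[str]:
    """Render the first unsanitized flow as a UI message, or None."""
    flows = unsanitized_flows(line_classes)
    if not flows:
        return None
    src, snk = flows[0]
    return (f"this is about to be {risk_name} because line {src} has a source "
            f"and line {snk} has a sink and no sanitizer between them")


msg = explain({5: {"source"}, 7: {"sink", "evidence"}}, "SQL injection")
# msg == "this is about to be SQL injection because line 5 has a source and
#         line 7 has a sink and no sanitizer between them"
```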

DoD:

  1. Migrate the dataset format from the single binary label to the multi-head schema above (CyberSecEval + SVEN rows).
  2. Train a multi-head probe (one logit per label type) with a span-max loss per head.
  3. Make the UI show which class of risk each red span belongs to.
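The issue does not define the span-max loss in item 2 precisely. One plausible reading, sketched here: each positive span of a head is supervised only through its highest-scoring token (BCE with target 1 on the span's max logit), tokens outside every span get ordinary per-token BCE with target 0, and the per-head losses are summed. All function names are hypothetical, and the pure-Python arithmetic stands in for what would be tensor ops (e.g. `torch.nn.functional.softplus`) in a real, differentiable training loop.

```python
import math
from typing import Dict, List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def spans(target: List[int]) -> List[Tuple[int, int]]:
    """Contiguous runs of 1s as half-open [start, end) intervals."""
    out, start = [], None
    for i, y in enumerate(target + [0]):  # trailing 0 closes a final run
        if y == 1 and start is None:
            start = i
        elif y == 0 and start is not None:
            out.append((start, i))
            start = None
    return out


def span_max_loss(logits: List[float], target: List[int]) -> float:
    """ASSUMED span-max BCE for one head, averaged over all loss terms."""
    if len(logits) != len(target):
        raise ValueError("logits and target must have the same length")
    terms = []
    for s, e in spans(target):
        # -log sigmoid(max z): only the strongest token in the span is pushed up.
        terms.append(softplus(-max(logits[s:e])))
    for z, y in zip(logits, target):
        if y == 0:
            # -log(1 - sigmoid(z)): every negative token is pushed down.
            terms.append(softplus(z))
    return sum(terms) / len(terms) if terms else 0.0


def multi_head_loss(logits: Dict[str, List[float]],
                    targets: Dict[str, List[int]]) -> float:
    """Sum of independent per-head losses (one logit per head per token)."""
    return sum(span_max_loss(logits[h], targets[h]) for h in targets)
```

Because only the max logit inside a span is penalized, the probe may fire on any token of the span, including the first one, instead of being forced high on every token as per-token BCE on positives would require; that looseness seems compatible with the goal of firing early.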

Metadata


Labels: `path:token-level` (Token-level probe path: per-token spans, value head, BCE + span-max), `research` (Research / experiments / paper-tracking)
