Current span labels are binary (y_tok ∈ {0,1}: inside the vulnerable line or not). The proper label schema (per the GPT-suggested format and standard vuln-analysis taxonomy) is multi-class per token:
{
"is_completion_vulnerable": true,
"is_functional": true,
"cwe": "CWE-089",
"token_labels": {
"evidence": [...],
"sink": [...],
"source": [...],
"sanitizer": [...],
"vulnerable_line": [...]
},
"label_confidence": "dynamic_oracle+human_verified"
}
| Label type |
Meaning |
Example |
| Sink |
token participates in dangerous operation |
eval, subprocess.run(..., shell=True) |
| Source |
token carries attacker-controlled data |
request.args["q"], req.body.name |
| Missing-sanitizer |
absence of code |
SQL query without parameterization |
| Evidence |
token a reviewer/tool used to decide |
strcpy, raw SQL interpolation |
| Pre-emission risk |
hidden state BEFORE vuln appears, predicts future vuln |
token right before model emits sink |
Why this matters: the streaming probe should fire on pre-emission risk tokens, not just on the vulnerable line itself. That's the actual "warn before the unsafe character is finalized" signal we want. Source/sink/sanitizer heads give us a richer UI ("this is about to be SQL injection because line 5 has a source and line 7 has a sink and no sanitizer between them").
DoD:
- Migrate dataset format from single
label to the multi-head schema above (CyberSecEval + SVEN rows).
- Train a multi-head probe (one logit per label type) with span-max loss per head.
- UI shows which class of risk each red span is.
Current span labels are binary (
y_tok ∈ {0,1}: inside the vulnerable line or not). The proper label schema (per the GPT-suggested format and standard vuln-analysis taxonomy) is multi-class per token:{ "is_completion_vulnerable": true, "is_functional": true, "cwe": "CWE-089", "token_labels": { "evidence": [...], "sink": [...], "source": [...], "sanitizer": [...], "vulnerable_line": [...] }, "label_confidence": "dynamic_oracle+human_verified" }eval,subprocess.run(..., shell=True)request.args["q"],req.body.namestrcpy, raw SQL interpolationWhy this matters: the streaming probe should fire on pre-emission risk tokens, not just on the vulnerable line itself. That's the actual "warn before the unsafe character is finalized" signal we want. Source/sink/sanitizer heads give us a richer UI ("this is about to be SQL injection because line 5 has a source and line 7 has a sink and no sanitizer between them").
DoD:
labelto the multi-head schema above (CyberSecEval + SVEN rows).