research: Rich token-label schema (sink / source / sanitizer / evidence / pre-emission)

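Current span labels are binary (`y_tok ∈ {0,1}`: whether a token is inside the vulnerable line). The proper label schema (per the GPT-suggested format and the standard vuln-analysis taxonomy) is multi-label per token: each label type is its own binary head, so one token can be, for example, both a source and evidence: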

```json
{
  "is_completion_vulnerable": true,
  "is_functional": true,
  "cwe": "CWE-089",
  "token_labels": {
    "evidence": [...],
    "sink": [...],
    "source": [...],
    "sanitizer": [...],
    "vulnerable_line": [...]
  },
  "label_confidence": "dynamic_oracle+human_verified"
}
```
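A minimal sketch of DoD step 1, assuming each `token_labels` list holds token indices into the completion and that current rows carry a per-token `y_tok` list plus a row-level `label`. Only `vulnerable_line` can be recovered from the binary labels; the other heads start empty and need annotation. `migrate_row`, `to_targets`, `HEADS`, and the `"binary_span_only"` confidence tag are hypothetical names, not existing code.

```python
from typing import Any

# Hypothetical head order for the per-token target matrix (matches the JSON keys above).
HEADS = ["evidence", "sink", "source", "sanitizer", "vulnerable_line"]


def migrate_row(old: dict[str, Any]) -> dict[str, Any]:
    """Convert an assumed binary-labelled row into the multi-head schema.

    Assumes `old` has `y_tok` (0/1 per token) and a row-level `label`.
    Only `vulnerable_line` is recoverable; other heads start empty.
    """
    vulnerable_idx = [i for i, y in enumerate(old["y_tok"]) if y == 1]
    token_labels: dict[str, list[int]] = {head: [] for head in HEADS}
    token_labels["vulnerable_line"] = vulnerable_idx
    return {
        "is_completion_vulnerable": bool(old["label"]),
        "is_functional": old.get("is_functional"),  # None if the old row lacks it
        "cwe": old.get("cwe"),
        "token_labels": token_labels,
        # Hypothetical tag: marks rows whose only supervision is the old binary span.
        "label_confidence": old.get("label_confidence", "binary_span_only"),
    }


def to_targets(row: dict[str, Any], n_tokens: int) -> list[list[int]]:
    """Build an n_tokens x len(HEADS) multi-hot target matrix."""
    targets = [[0] * len(HEADS) for _ in range(n_tokens)]
    for h, head in enumerate(HEADS):
        for i in row["token_labels"].get(head, []):
            targets[i][h] = 1  # a token may be positive for several heads
    return targets
```

Because the target is multi-hot rather than one-hot, a token that is both a source and evidence simply has two 1s in its row.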

| Label type | Meaning | Example |
|---|---|---|
| **Sink** | token is part of a dangerous operation | `eval`, `subprocess.run(..., shell=True)` |
| **Source** | token carries attacker-controlled data | `request.args["q"]`, `req.body.name` |
| **Missing-sanitizer** | marks the *absence* of code: the sanitizer that should sit between source and sink is not there | SQL query built without parameterization |
| **Evidence** | token a reviewer or tool relied on to reach the verdict | `strcpy`, raw SQL interpolation |
| **Pre-emission risk** | position whose hidden state, before the vulnerable token is emitted, already predicts the upcoming vulnerability | token right before the model emits the sink |
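Pre-emission labels can be derived from sink annotations. A sketch, assuming sink labels are token indices and using a hypothetical `window` hyperparameter: in a causal LM the hidden state at position `s - 1` is what produces token `s`, so the positions just before each sink span are the ones whose states should already carry the risk.

```python
def pre_emission_labels(sink_idx: list[int], n_tokens: int, window: int = 8) -> list[int]:
    """Mark up to `window` positions before each sink span as pre-emission risk.

    `window` is a hypothetical hyperparameter. Sink tokens themselves are
    never marked, so the label only covers not-yet-emitted positions.
    """
    sink_set = set(sink_idx)
    labels = [0] * n_tokens
    for s in sorted(sink_set):
        if s - 1 in sink_set:
            continue  # not the first token of a contiguous sink span
        for i in range(max(0, s - window), s):
            if i not in sink_set:
                labels[i] = 1
    return labels
```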

**Why this matters:** the streaming probe should fire on **pre-emission risk** tokens, not just on the vulnerable line itself. That's the actual "warn before the unsafe character is finalized" signal we want. Source/sink/sanitizer heads give us a richer UI ("this is about to be SQL injection because line 5 has a source and line 7 has a sink and no sanitizer between them").
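The UI explanation in the paragraph above could be produced from line-level head outputs with a simple positional rule. This sketch assumes the heads have already been aggregated to line numbers; it checks ordering only and does no dataflow analysis, so it can miss flows through variables and may pair unrelated lines. `Finding`, `unsanitized_flows`, and `explain` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    source_line: int
    sink_line: int


def unsanitized_flows(source_lines: list[int], sink_lines: list[int],
                      sanitizer_lines: list[int]) -> list[Finding]:
    """Pair each sink with the nearest earlier source if no sanitizer lies strictly between."""
    findings = []
    for sink in sorted(sink_lines):
        earlier = [s for s in source_lines if s < sink]
        if not earlier:
            continue
        # A sanitizer between the nearest source and the sink also lies between
        # any earlier source and the sink, so checking the nearest one suffices.
        src = max(earlier)
        if not any(src < z < sink for z in sanitizer_lines):
            findings.append(Finding(src, sink))
    return findings


def explain(f: Finding, risk: str = "SQL injection") -> str:
    return (f"this is about to be {risk} because line {f.source_line} has a source "
            f"and line {f.sink_line} has a sink and no sanitizer between them")
```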

**DoD:**
1. Migrate dataset format from single `label` to the multi-head schema above (CyberSecEval + SVEN rows).
2. Train a multi-head probe (one logit per label type) with span-max loss per head.
3. UI shows the risk class of each red span.
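One reading of "span-max loss per head" (DoD step 2) is a multiple-instance-style objective: for each head, apply binary cross-entropy to the maximum logit over the labelled span (target 1), or, when the head has no labelled tokens, to the maximum logit over the whole sequence (target 0). The sketch assumes a linear probe on frozen hidden states and uses plain Python lists for clarity; a real trainer would use a tensor library. All function names are hypothetical.

```python
import math


def probe_logits(hidden: list[list[float]], weights: list[list[float]],
                 bias: list[float]) -> list[list[float]]:
    """Linear multi-head probe. hidden: T x d, weights: d x H, bias: H -> T x H logits."""
    return [
        [sum(h_t[k] * weights[k][j] for k in range(len(h_t))) + bias[j]
         for j in range(len(bias))]
        for h_t in hidden
    ]


def bce_with_logit(z: float, y: float) -> float:
    """Numerically stable binary cross-entropy computed on a raw logit."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def span_max_loss(logits: list[list[float]], spans: list[list[int]]) -> float:
    """Mean over heads of BCE on the max logit; spans[j] = token indices for head j."""
    n_heads = len(logits[0])
    total = 0.0
    for j in range(n_heads):
        if spans[j]:
            z = max(logits[t][j] for t in spans[j])  # some labelled token should fire
            total += bce_with_logit(z, 1.0)
        else:
            z = max(row[j] for row in logits)  # no token should fire
            total += bce_with_logit(z, 0.0)
    return total / n_heads
```

Taking the max means a head is rewarded as soon as any token in its span fires, which suits streaming use where one early alarm per span is enough.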
