research: Token-level training with span-max loss from Obeso et al. §3

## What

Move from snippet-level logistic regression to the proper token-level + span-max formulation described in Obeso et al. §3 (https://arxiv.org/abs/2509.03531).

The current baseline trains sklearn `LogisticRegression` on snippet-level labels (one label per snippet, with the last-token activation as the feature vector) and reaches AUC 0.846. The paper's formulation instead trains a small PyTorch head with the loss:

```
L = (1-ω) · Σ w_i · BCE(y_i, p_i) + ω · Σ BCE(y_s, max p_i for i in s)
```

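where `p_i` is the probe's predicted probability for token `i`, `y_i` is its token label, `y_s` is the label of span `s`, and `w_i` is the token weight: entity tokens are up-weighted by `α = 10`. `ω` is annealed `0 → 1` over training (token-level supervision early, span-level later).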
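
To pin down the arithmetic of the loss above, here is a minimal plain-Python reference that a PyTorch implementation could be unit-tested against. It is a sketch, not the code in `src/train_probe_spanmax.py`: the function names, the `(start, end)` span encoding, the weight of 1 for non-entity tokens, and the linear shape of the `ω` schedule are all assumptions made for illustration. It uses plain sums as written in the formula, so any normalisation the paper applies is not reproduced here.

```python
import math


def bce(y, p, eps=1e-7):
    """Binary cross-entropy for one label y in {0, 1} and one probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def omega_schedule(step, total_steps):
    """Anneal ω from 0 to 1. The linear shape is an assumption."""
    if total_steps <= 1:
        return 1.0
    return min(max(step / (total_steps - 1), 0.0), 1.0)


def span_max_loss(probs, token_labels, entity_mask, spans, span_labels,
                  omega, alpha=10.0):
    """L = (1-ω) · Σ w_i · BCE(y_i, p_i) + ω · Σ BCE(y_s, max p_i for i in s).

    probs, token_labels, entity_mask: per-token lists of equal length.
    spans: list of (start, end) token index pairs, end exclusive (assumed encoding).
    span_labels: one label per span.
    """
    # Token-level term: entity tokens get weight alpha, others weight 1 (assumed).
    token_term = 0.0
    for p, y, is_entity in zip(probs, token_labels, entity_mask):
        w = alpha if is_entity else 1.0
        token_term += w * bce(y, p)

    # Span-level term: each span is scored by its highest token probability.
    span_term = 0.0
    for (start, end), y_s in zip(spans, span_labels):
        span_term += bce(y_s, max(probs[start:end]))

    return (1.0 - omega) * token_term + omega * span_term


# Two tokens, the first an entity token; one positive span covering both.
example_loss = span_max_loss(
    probs=[0.5, 0.5],
    token_labels=[1, 0],
    entity_mask=[True, False],
    spans=[(0, 2)],
    span_labels=[1],
    omega=omega_schedule(step=5, total_steps=11),
)
```

At `ω = 0` only the weighted token term contributes, and at `ω = 1` only the span-max term does, which matches the "token-level early, span-level later" description.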

## Why

Token-level supervision should give us calibrated per-token scores (which the UI already pretends to render) instead of broadcasting a single snippet label across every token. The span-max term then re-aligns the loss with the snippet-level objective we actually care about at inference.

## Status / context

- `src/train_probe_spanmax.py` — partial implementation of the loss + training loop
- `src/extract_token_activations.py` — per-token + span activation extractor (done)
- Integration gap: wire the activation extraction pipeline end-to-end and report AUC

## Definition of done

- Span-max probe trained on the merged CyberSecEval + SVEN dataset
- AUC ≥ 0.86 token-level, ≥ 0.88 example-level
- The trained span-max probe replaces `data/probe.npz`
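
The two AUC targets above are computed over different units, which the following sketch makes concrete. It assumes the example-level score is the maximum token probability in the example (mirroring the span-max term), and the `(token_probs, token_labels, example_label)` tuple layout is hypothetical. The pairwise AUC here is quadratic and meant only for illustration; for real evaluation a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def token_and_example_auc(examples):
    """examples: list of (token_probs, token_labels, example_label) tuples."""
    tok_labels, tok_scores = [], []
    ex_labels, ex_scores = [], []
    for token_probs, token_labels, example_label in examples:
        # Token-level: every token is its own scored item.
        tok_scores.extend(token_probs)
        tok_labels.extend(token_labels)
        # Example-level: assumed to be the max token probability.
        ex_scores.append(max(token_probs))
        ex_labels.append(example_label)
    return roc_auc(tok_labels, tok_scores), roc_auc(ex_labels, ex_scores)


# Toy data: one positive example and one negative example with a
# high-scoring false-positive token.
toy = [
    ([0.1, 0.9], [0, 1], 1),
    ([0.2, 0.95], [0, 0], 0),
]
token_auc, example_auc = token_and_example_auc(toy)
```

In the toy data a single confident false-positive token lowers the token-level AUC only partly but flips the example-level ranking entirely, which is why the two thresholds are tracked separately.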
