What
Move from snippet-level logistic regression to the proper token-level + span-max formulation described in Obeso et al. §3 (https://arxiv.org/abs/2509.03531).
The current baseline trains sklearn LogisticRegression on snippet-level labels (one label per snippet, taken at the last-token activation) and reaches AUC 0.846. The paper's formulation trains a small PyTorch head with:
L = (1-ω) · Σ w_i · BCE(y_i, p_i) + ω · Σ BCE(y_s, max p_i for i in s)
where ω is annealed 0 → 1 over training (token-level supervision early, span-level later), and entity tokens are up-weighted by α = 10.
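A minimal sketch of the loss above, for reference while finishing src/train_probe_spanmax.py. It is not the code in that file. The function names, tensor layout (one flattened batch, with a span index per token), and the linear shape of the ω schedule are all assumptions. The sums are left unnormalised to match the formula as written, and the paper may normalise differently. Because the sigmoid is monotone, the max over probabilities in a span is taken as the max over logits.

```python
import torch
import torch.nn.functional as F


def omega_schedule(step: int, total_steps: int) -> float:
    """Anneal omega from 0 to 1 over training (linear shape is an assumption)."""
    if total_steps <= 1:
        return 1.0
    return min(1.0, max(0.0, step / (total_steps - 1)))


def spanmax_loss(logits, token_labels, entity_mask, span_ids, span_labels,
                 omega, alpha=10.0):
    """Token-level weighted BCE blended with span-max BCE.

    logits:       (T,) float, per-token probe logits
    token_labels: (T,) float in {0, 1}
    entity_mask:  (T,) bool, True on entity tokens
    span_ids:     (T,) long, span index of each token (hypothetical layout)
    span_labels:  (S,) float in {0, 1}; every span is assumed non-empty
    """
    # Token term: entity tokens get weight alpha, all other tokens weight 1.
    weights = 1.0 + (alpha - 1.0) * entity_mask.to(logits.dtype)
    token_term = F.binary_cross_entropy_with_logits(
        logits, token_labels, weight=weights, reduction="sum"
    )

    # Span term: BCE between the span label and the max token score in the span.
    span_max = torch.stack(
        [logits[span_ids == s].max() for s in range(span_labels.shape[0])]
    )
    span_term = F.binary_cross_entropy_with_logits(
        span_max, span_labels, reduction="sum"
    )

    return (1.0 - omega) * token_term + omega * span_term
```

In a training loop this would be called once per step with `omega = omega_schedule(step, total_steps)`, so early steps are dominated by the token term and late steps by the span term.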
Why
Token-level supervision should give us calibrated per-token scores (which the UI already pretends to render) instead of broadcasting a single snippet label across every token. The span-max term then re-aligns the loss with the snippet-level objective we actually care about at inference.
Status / context
- src/train_probe_spanmax.py: partial implementation of the loss + training loop
- src/extract_token_activations.py: per-token + span activation extractor (done)
- Integration gap: wire the activation extraction pipeline end-to-end and report AUC
Definition of done
- Span-max probe trained on the merged CyberSecEval + SVEN dataset
- AUC ≥ 0.86 token-level, ≥ 0.88 example-level
- Replaces data/probe.npz
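To make the two AUC targets above concrete, here is a rough sketch of how both numbers could be computed from per-token probe scores. The function name and input layout are hypothetical. Aggregating an example's score as the max over its token scores is an assumption chosen to mirror the span-max term; the issue does not specify the aggregation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def token_and_example_auc(token_scores, token_labels, example_ids, example_labels):
    """Return (token-level AUC, example-level AUC).

    token_scores:   per-token probe scores, length T
    token_labels:   per-token 0/1 labels, length T
    example_ids:    id of the example each token belongs to, length T
    example_labels: dict mapping example id -> 0/1 label (hypothetical layout)
    """
    token_scores = np.asarray(token_scores, dtype=float)
    token_labels = np.asarray(token_labels)
    example_ids = np.asarray(example_ids)

    # Token-level: every token is one scored item.
    token_auc = roc_auc_score(token_labels, token_scores)

    # Example-level: score each example by its highest-scoring token (assumed).
    ids = np.unique(example_ids)
    ex_scores = np.array([token_scores[example_ids == i].max() for i in ids])
    ex_labels = np.array([example_labels[i] for i in ids])
    example_auc = roc_auc_score(ex_labels, ex_scores)

    return token_auc, example_auc
```

The definition of done would then be checked as `token_auc >= 0.86 and example_auc >= 0.88` on the held-out split of the merged dataset.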