
research: Token-level training with span-max loss from Obeso et al. §3 #1

Description

@peaktwilight

What

Move from snippet-level logistic regression to the proper token-level + span-max formulation described in Obeso et al. §3 (https://arxiv.org/abs/2509.03531).

The current baseline trains sklearn LogisticRegression on snippet-level labels (one label per snippet, taken at the last-token activation) and reaches AUC 0.846. The paper's formulation trains a small PyTorch head with:

L = (1 - ω) · Σ_i w_i · BCE(y_i, p_i) + ω · Σ_s BCE(y_s, max_{i∈s} p_i)

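where p_i is the head's predicted probability for token i, y_i and y_s are the token and span labels, ω is annealed 0 → 1 over training (token-level supervision early, span-level later), and entity tokens are up-weighted through w_i with α = 10.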
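A minimal PyTorch sketch of this loss, assuming padded batches of shape (B, T) and spans given as a boolean (B, S, T) token-membership mask. The tensor layout, the argument names, and the weighting w_i = α on entity tokens and 1 elsewhere are assumptions for illustration, not the paper's code or the current src/train_probe_spanmax.py. Because sigmoid is monotonic, max_{i∈s} p_i = sigmoid(max_{i∈s} logit_i), so the span term takes the max in logit space and uses the numerically stable BCE-with-logits.

```python
import torch
import torch.nn.functional as F


def span_max_loss(
    token_logits: torch.Tensor,  # (B, T) raw head outputs; p_i = sigmoid(logit_i)
    token_labels: torch.Tensor,  # (B, T) float in {0, 1}: y_i
    token_mask: torch.Tensor,    # (B, T) bool, True for real (non-padding) tokens
    entity_mask: torch.Tensor,   # (B, T) bool, True for entity tokens (assumed input)
    span_mask: torch.Tensor,     # (B, S, T) bool, True if token t belongs to span s
    span_labels: torch.Tensor,   # (B, S) float in {0, 1}: y_s
    span_valid: torch.Tensor,    # (B, S) bool, True for real, non-empty spans
    omega: float,                # ω in [0, 1], annealed by the caller
    alpha: float = 10.0,         # entity up-weight α
) -> torch.Tensor:
    # Token term: Σ_i w_i · BCE(y_i, p_i), with w_i = α on entity tokens and 1 elsewhere
    # (assumed weighting); padding tokens get weight 0.
    weights = (1.0 + (alpha - 1.0) * entity_mask.float()) * token_mask.float()
    token_bce = F.binary_cross_entropy_with_logits(
        token_logits, token_labels, reduction="none"
    )
    token_term = (weights * token_bce).sum()

    # Span term: Σ_s BCE(y_s, max_{i∈s} p_i). The max is taken over logits, which is
    # equivalent because sigmoid is monotonic; tokens outside a span are masked out.
    very_negative = torch.finfo(token_logits.dtype).min
    per_span = token_logits.unsqueeze(1).expand(span_mask.shape)  # (B, S, T)
    span_logits = per_span.masked_fill(~span_mask, very_negative).amax(dim=-1)  # (B, S)
    span_term = F.binary_cross_entropy_with_logits(
        span_logits[span_valid], span_labels[span_valid], reduction="sum"
    )

    return (1.0 - omega) * token_term + omega * span_term
```

The sums follow the formula literally, so the loss scales with the number of tokens and spans per batch; dividing each term by its count is a common alternative if training turns out to be sensitive to batch composition.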

Why

Token-level supervision should give us calibrated per-token scores (which the UI already pretends to render) instead of broadcasting a single snippet label across every token. The span-max term then re-aligns the loss with the snippet-level objective we actually care about at inference.
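At inference, the same readout gives both outputs: per-token probabilities for the UI and a snippet score equal to the max over the snippet's tokens. A sketch assuming a linear head (hypothetical; the head architecture in the repo may differ) applied to one snippet's (T, d) activations; score_snippet is an illustrative name.

```python
import torch


@torch.no_grad()
def score_snippet(head: torch.nn.Module, acts: torch.Tensor, mask: torch.Tensor):
    """Per-token probabilities and span-max snippet score for one snippet.

    head: hypothetical probe head mapping (T, d) activations to (T, 1) logits.
    acts: (T, d) per-token activations; mask: (T,) bool, True for tokens to score.
    """
    token_probs = torch.sigmoid(head(acts).squeeze(-1))  # (T,), rendered per token in the UI
    # Snippet score = max_{i∈s} p_i over scored tokens; masked tokens get 0 so they never win.
    snippet_score = token_probs.masked_fill(~mask, 0.0).max()
    return token_probs, snippet_score
```

Because the span term in training uses this same max, the snippet score at inference is the quantity the loss optimized, rather than a separate last-token readout.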

Status / context

  • src/train_probe_spanmax.py — partial implementation of the loss + training loop
  • src/extract_token_activations.py — per-token + span activation extractor (done)
  • Integration gap: wire the activation extraction pipeline end-to-end and report AUC
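One way the training loop could anneal ω, sketched under assumptions: a linear 0 → 1 ramp over all optimizer steps (the paper's exact schedule shape may differ), batches as dicts with a hypothetical "acts" key of shape (B, T, d), and the loss passed in as a callable loss_fn(logits, batch, omega) so a span-max loss can be plugged in. omega_at and train_head are hypothetical names, not functions in src/train_probe_spanmax.py.

```python
from typing import Callable, Sequence

import torch


def omega_at(step: int, total_steps: int) -> float:
    """Linear ω schedule: 0 at the first step, 1 at the last (assumed shape)."""
    if total_steps <= 1:
        return 1.0
    return min(1.0, step / (total_steps - 1))


def train_head(
    head: torch.nn.Module,
    batches: Sequence[dict],
    loss_fn: Callable[[torch.Tensor, dict, float], torch.Tensor],
    epochs: int = 1,
    lr: float = 1e-3,
) -> torch.nn.Module:
    """Train a probe head, annealing ω from token-level to span-level supervision."""
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    total_steps = epochs * len(batches)
    step = 0
    for _ in range(epochs):
        for batch in batches:
            omega = omega_at(step, total_steps)
            logits = head(batch["acts"]).squeeze(-1)  # (B, T, d) -> (B, T)
            loss = loss_fn(logits, batch, omega)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
    return head
```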

Definition of done

  • Span-max probe trained on the merged CyberSecEval + SVEN dataset
  • AUC ≥ 0.86 token-level, ≥ 0.88 example-level
  • Replaces data/probe.npz
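For the AUC targets, token-level AUC can be computed over all valid tokens pooled across snippets, and example-level AUC over one score per snippet. The sketch below assumes the example score is the span-max readout (max token probability) and uses a rank-based (Mann-Whitney) AUC so it only needs NumPy; sklearn.metrics.roc_auc_score should give the same numbers. evaluate and its input layout are hypothetical.

```python
from typing import Sequence

import numpy as np


def auc(scores, labels) -> float:
    """ROC AUC via the Mann-Whitney U statistic, with tied scores given average ranks."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).astype(bool).ravel()
    n_pos, n_neg = int(labels.sum()), int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    first_rank = np.cumsum(counts) - counts + 1  # 1-based rank of each tie group's first item
    avg_rank = first_rank + (counts - 1) / 2.0   # average rank within the tie group
    ranks = avg_rank[inverse.ravel()]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def evaluate(
    token_probs: Sequence[np.ndarray],
    token_labels: Sequence[np.ndarray],
    snippet_labels: Sequence[int],
    token_target: float = 0.86,
    example_target: float = 0.88,
) -> dict:
    """Token-level AUC over pooled tokens and example-level AUC over span-max snippet scores.

    token_probs / token_labels: one 1-D array per snippet, valid tokens only (hypothetical layout).
    """
    token_auc = auc(np.concatenate(token_probs), np.concatenate(token_labels))
    example_scores = [float(np.max(p)) for p in token_probs]  # span-max readout per snippet
    example_auc = auc(example_scores, snippet_labels)
    return {
        "token_auc": token_auc,
        "example_auc": example_auc,
        "meets_target": token_auc >= token_target and example_auc >= example_target,
    }
```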

Metadata

Assignees

No one assigned

Labels

path:token-level (Token-level probe path: per-token spans, value head, BCE + span-max), research (Research / experiments / paper-tracking)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
