What
Replicate the current probe pipeline on the Gemma 4 26B-A4B variant. The current probe is trained on E2B activations; 26B-A4B (Apache-2.0, on-device runnable) should give a stronger signal but requires re-extracting activations on the larger model.
Why
Linear probe accuracy generally improves with base-model quality: a stronger backbone tends to give a more linearly separable "vulnerable" direction. 26B-A4B is the natural next step that keeps us inside the on-device licensing story.
Plan
- Re-run src/extract_token_activations.py against 26B-A4B on the same CyberSecEval + SVEN dataset
- Train per-layer probes, sweep layers
- Compare per-layer AUC vs. the E2B baseline
- Ship the best 26B-A4B probe variant alongside the E2B one (user selectable in the UI)
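
The comparison step in the plan can be sketched as below. This is only an illustration, not the project's evaluation code: the helper names (`roc_auc`, `best_layer`, `auc_delta`) and the `{layer_index: auc}` dictionary shape are assumptions. Because E2B and 26B-A4B presumably have different layer counts, the sketch compares the best layer of each model rather than matching layers by index.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels are 0/1 (1 = vulnerable); scores are probe outputs.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def best_layer(auc_by_layer: Dict[int, float]) -> Tuple[int, float]:
    """Return (layer_index, auc) for the layer with the highest AUC."""
    layer = max(auc_by_layer, key=auc_by_layer.get)
    return layer, auc_by_layer[layer]


def auc_delta(baseline: Dict[int, float], candidate: Dict[int, float]) -> float:
    """Best-layer AUC of the candidate model minus best-layer AUC of the baseline."""
    return best_layer(candidate)[1] - best_layer(baseline)[1]
```

In use, each model's layer sweep would fill one dictionary with `roc_auc` values computed on the same held-out split, and `auc_delta(e2b_aucs, a4b_aucs)` would give the single number to report.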
Definition of done
- Per-layer AUC numbers on the same dataset for 26B-A4B
- Best 26B-A4B probe ships alongside the E2B probe (UI dropdown to switch)
- AUC delta reported in docs/ or the project README