research: Replication on Gemma 4 26B-A4B #8

Description

@peaktwilight

What

Replicate the current probe pipeline on the Gemma 4 26B-A4B variant. The current probe is trained on E2B activations; 26B-A4B (Apache-2.0, runnable on-device) is expected to give a stronger signal, but it requires re-extracting activations from the larger model.

Why

Linear probe performance generally improves with base-model quality: a stronger backbone is expected to give a more linearly separable "vulnerable" direction. 26B-A4B is the natural next step that keeps us inside the on-device licensing story.

Plan

  1. Re-run src/extract_token_activations.py against 26B-A4B on the same CyberSecEval + SVEN dataset
  2. Train per-layer probes, sweep layers
  3. Compare per-layer AUC vs. the E2B baseline
  4. Ship the best 26B-A4B probe variant alongside the E2B one (user selectable in the UI)
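
Steps 2 and 3 can be sketched roughly as below. This is a minimal, dependency-free illustration and not the repo's training code: the difference-of-means direction is a stand-in for whatever probe the pipeline actually trains, and `acts_by_layer` (a dict mapping layer index to one feature vector per example), `sweep_layers`, and the other helper names are hypothetical.

```python
import random
from statistics import fmean


def roc_auc(labels, scores):
    """Rank-based ROC AUC: P(positive outscores negative), ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(X, y):
    """Difference-of-means linear probe (assumption: stand-in for the real probe)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    dim = len(X[0])
    return [fmean(x[d] for x in pos) - fmean(x[d] for x in neg) for d in range(dim)]


def project(X, w):
    """Score each example by its dot product with the probe direction."""
    return [sum(a * b for a, b in zip(x, w)) for x in X]


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split indices per class so both splits contain both classes."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def sweep_layers(acts_by_layer, labels, train_frac=0.8, seed=0):
    """Train one probe per layer and return {layer: held-out AUC}."""
    train, test = stratified_split(labels, train_frac, seed)
    y_train = [labels[i] for i in train]
    y_test = [labels[i] for i in test]
    aucs = {}
    for layer, X in acts_by_layer.items():
        w = fit_direction([X[i] for i in train], y_train)
        aucs[layer] = roc_auc(y_test, project([X[i] for i in test], w))
    return aucs
```

Using the same split seed and the same examples for both models keeps the per-layer AUCs comparable between E2B and 26B-A4B.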

Definition of done

  • Per-layer AUC numbers on the same dataset for 26B-A4B
  • Best 26B-A4B probe ships alongside the E2B probe (UI dropdown to switch)
  • AUC delta reported in docs/ or the project README
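
For the last item, one possible way to render the comparison for docs/ or the README is sketched below. The issue does not define how the delta is computed, so this assumes best layer versus best layer (the two models need not have the same number of layers). `best_layer` and `auc_delta_report` are hypothetical names, and the inputs are `{layer: AUC}` dicts produced by whatever sweep is actually run.

```python
def best_layer(aucs):
    """Return (layer, auc) for the highest-AUC layer in a {layer: auc} dict."""
    layer = max(aucs, key=aucs.get)
    return layer, aucs[layer]


def auc_delta_report(baseline, candidate,
                     baseline_name="E2B", candidate_name="26B-A4B"):
    """Render a small markdown table plus the best-vs-best AUC delta."""
    b_layer, b_auc = best_layer(baseline)
    c_layer, c_auc = best_layer(candidate)
    delta = c_auc - b_auc
    lines = [
        "| Model | Best layer | AUC |",
        "| --- | --- | --- |",
        f"| {baseline_name} | {b_layer} | {b_auc:.3f} |",
        f"| {candidate_name} | {c_layer} | {c_auc:.3f} |",
        "",
        f"AUC delta ({candidate_name} - {baseline_name}): {delta:+.3f}",
    ]
    return "\n".join(lines)
```

The function only formats numbers it is given; the actual AUC values have to come from the per-layer sweep on the shared dataset.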
