
[Question] Canonical SAEs missing matching explanations (Gemma-2-2B) #549

@SleepyLan

Description

I am using SAELens with the Gemma-2-2B canonical SAEs (e.g., release = "gemma-scope-2b-pt-res-canonical", sae_id = "layer_10/width_16k/average_l0_77").
For this SAE, sae.W_dec.shape[1] reports 2304, which I am reading as the feature count, but the explanations file I found on Neuronpedia corresponds to the full-width 16K SAE (≈16384 rows).
As a result, I cannot line up the canonical SAE's feature IDs with the explanation rows: the IDs do not match, and many indices are out of bounds.

Steps to reproduce

  1. Load a canonical SAE in SAELens:
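    from sae_lens import SAE
    sae, cfg, sparsity = SAE.from_pretrained(
        release="gemma-scope-2b-pt-res-canonical",
        sae_id="layer_10/width_16k/average_l0_77",
        device="cuda",
    )
    # SAELens lays W_dec out as (d_sae, d_in), so shape[1] is the input width
    print("SAE feature dim:", sae.W_dec.shape[1])  # shows 2304
    # Print both axes to see the feature count (axis 0) next to it
    print("W_dec shape:", tuple(sae.W_dec.shape))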
  2. Download explanations from Neuronpedia at
    gemma-2-2b/10-gemmascope-res-16k/explanations
    → file has ~16384 rows (full-width features).
  3. When trying to align these explanations with canonical SAE features, I get mismatched IDs and index-out-of-bounds errors.
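
A quick sanity check on step 1, sketched under the assumption that W_dec follows the SAELens layout (d_sae, d_in): in that layout axis 0 counts features and axis 1 is the residual-stream width, which is 2304 for Gemma-2-2B. The helper describe_decoder is hypothetical and works on a plain shape tuple, so it runs without downloading weights. The 16384 is only what "width_16k" in the SAE id suggests, not a measured value.

```python
def describe_decoder(w_dec_shape, n_explanation_rows):
    """Interpret a decoder shape assumed to be (d_sae, d_in)."""
    d_sae, d_in = w_dec_shape
    return {
        "n_features": d_sae,  # axis 0: number of SAE features
        "d_in": d_in,         # axis 1: width of the model activations
        "rows_match_features": n_explanation_rows == d_sae,
    }

# Assumed shape for a width_16k residual SAE on Gemma-2-2B (illustrative)
report = describe_decoder((16384, 2304), n_explanation_rows=16384)
print(report)
```

With a loaded SAE, the first argument would be tuple(sae.W_dec.shape). If rows_match_features comes back true, the explanations file may already be aligned with the SAE's feature indices.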

Expected behavior

  • Is there an explanations file that matches the canonical SAE (2304 rows, aligned with canonical feature IDs)?
  • If not, could canonical SAE explanations be exported and published?
  • Alternatively, clear documentation on which SAEs have canonical explanations vs. only full-width ones would help.

Environment

  • SAE release: gemma-scope-2b-pt-res-canonical
  • SAE id: layer_10/width_16k/average_l0_77
  • SAELens version / commit hash (please fill in)
  • Explanations source: Neuronpedia (gemma-2-2b/10-gemmascope-res-16k)

Could you please clarify:

  • Is there a way to obtain canonical SAE explanations (aligned with 2304 features)?
  • Or instructions on how to map/crop full-width explanations down to the canonical subset?
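
For the mapping question, a minimal sketch of index-based alignment. It assumes each explanation record carries an "index" field (possibly stored as a string) and a "description" field. Both field names are assumptions about the export format, and align_explanations is a hypothetical helper. It keys explanations by feature index rather than by row position, and collects out-of-range indices instead of raising.

```python
def align_explanations(records, n_features):
    """Map feature index -> description; collect out-of-range indices."""
    by_index = {}
    out_of_range = []
    for rec in records:
        idx = int(rec["index"])  # assumed field; may be stored as a string
        if 0 <= idx < n_features:
            # Keep the first explanation seen for each feature
            by_index.setdefault(idx, rec["description"])
        else:
            out_of_range.append(idx)
    return by_index, out_of_range

# Toy records standing in for a downloaded explanations file
records = [
    {"index": "0", "description": "toy explanation A"},
    {"index": "5", "description": "toy explanation B"},
    {"index": "20000", "description": "toy explanation C"},
]
aligned, skipped = align_explanations(records, n_features=16384)
print(aligned.get(5), skipped)
```

Features with no record are simply absent from the mapping, so aligned.get(i) returns None for them rather than failing.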

Thanks a lot! SAELens + Neuronpedia have been super valuable for the interpretability community! 🙏
