TMLR 2026 | Mechanistic interpretability: attention-head binding (EB*) as a marker of concept emergence. 7 models, 5 architectures (Pythia 160M–2.8B, OLMo-1B, CRFM GPT-2, SmolLM3-3B, Qwen2.5-1.5B), 41 terms.
nlp accessibility transformers language-models pythia interpretability few-shot-learning training-dynamics attention-heads mechanistic-interpretability qwen tmlr olmo smollm3 tmlr-2026 concept-emergence crfm
Updated Jun 9, 2026 - Python