evo2-sae: probing primitives (eval metrics + ActivationBuffer) by polinabinder1 · Pull Request #1629 · NVIDIA-BioNeMo/bionemo-recipes

polinabinder1 · 2026-06-11T19:46:33Z

Summary

SAE probing primitives (eval metrics + ActivationBuffer) for Evo2 — scoring metrics + per-feature annotation, all pure functions of codes + labels. Lives in the evo2 recipe at evo2_sae.eval.probing — moved out of the shared sae library because it's evo2-specific (the shared sae.eval keeps loss_recovered / sparsity / dead_latents for esm2/codonfm).

Stacked on #1622 (uses the evo2_sae package). Base of #1636 (probe harness); #1630 supplies the eval labels.

Contents — `evo2_sae.eval.probing`

ActivationBuffer (codes + optional dense twin + per-token labels + instance ids)
AUROC: auroc_all, auroc_vec, best_single_train_test
decoders: fit_logreg / fit_softmax / macro_auroc / decode_eval
domain_f1 (precision-per-nt, recall-per-instance)
annotate_features (per-feature best concept by AUROC → the annotation table)
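
The "precision-per-nt, recall-per-instance" pairing in domain_f1 can be illustrated with a minimal sketch for one feature at one fixed threshold. The function name `domain_f1_single`, its arguments, and the use of NumPy are assumptions made for illustration; the module's own implementation is described as chunked over features with a threshold sweep, and its exact conventions may differ.

```python
import numpy as np

def domain_f1_single(acts, concept_mask, inst_ids, threshold):
    """Hypothetical single-feature, single-threshold domain F1.

    acts:         [N] float activations of one feature
    concept_mask: [N] bool, True where the token belongs to the concept
    inst_ids:     [N] int, instance id per token, -1 outside any instance
    """
    fires = acts > threshold
    # Precision is per token (nucleotide): of the tokens that fire, how many lie in the concept.
    n_fire = fires.sum()
    precision = (fires & concept_mask).sum() / n_fire if n_fire else 0.0
    # Recall is per instance: an instance counts as found if the feature fires on any of its tokens.
    in_instance = inst_ids >= 0
    instances = np.unique(inst_ids[in_instance])
    found = np.unique(inst_ids[fires & in_instance])
    recall = len(found) / len(instances) if len(instances) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

Scoring recall per instance means a feature that marks only part of each gene or exon is not penalized for the tokens it skips, while per-token precision still penalizes firing outside the concept.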

How to use

from evo2_sae.eval.probing import auroc_all, annotate_features
au  = auroc_all(codes, labels)                                   # [F, L]
ann = annotate_features(codes, labels, names, min_auroc=0.85)    # [{feature_id, label, auroc}]
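
As a rough illustration of what the annotation step does with the [F, L] matrix, the sketch below picks each feature's best label and keeps it if the AUROC clears the threshold. `annotate_from_auroc` is a hypothetical helper, not the module's function, and whether the real threshold comparison is `>=` or `>` is an assumption.

```python
import numpy as np

def annotate_from_auroc(au, names, min_auroc=0.85):
    """Hypothetical: au is an [F, L] AUROC matrix, names holds the L label names."""
    best_label = au.argmax(axis=1)                        # best concept per feature
    best_auroc = au[np.arange(au.shape[0]), best_label]   # its AUROC
    return [
        {"feature_id": int(f), "label": names[int(l)], "auroc": float(a)}
        for f, (l, a) in enumerate(zip(best_label, best_auroc))
        if a >= min_auroc                                 # assumed inclusive threshold
    ]
```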

Tests

No dedicated CI lane (deferred — see #1622). Run via the recipe:

cd interpretability/sparse_autoencoders/recipes/evo2
bash .ci_build.sh && source .ci_test_env.sh        # or: PYTHONPATH=src:../../sae/src
pytest tests/test_probing.py

12 passed (CPU, no model): AUROC vs a pairwise-definition oracle, domain_f1 vs a hand-computed reference, best_single winner's-curse flip, decode_eval separability, annotate_features best-concept, buffer roundtrip, tie-correct (average) ranks, degenerate-label / tie / sparse edge cases, and standardize's zero-variance floor.
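
A pairwise-definition oracle of the kind mentioned above can be written in a few lines of plain Python. This is a sketch of the general technique, not the test suite's actual reference function; the name `auroc_pairwise` is hypothetical, and returning 0.5 for a label with no positives or no negatives follows the convention described in this PR's commit messages.

```python
def auroc_pairwise(scores, labels):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return 0.5  # degenerate label (never or always positive)
    # Each (positive, negative) pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The oracle is quadratic in the number of tokens, so it is only suitable as a test reference, but it has no ranking step that could share a bug with the vectorized implementation.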

Why hand-rolled (not sklearn / torchmetrics) — checked, not a win

GPU-vectorized over the whole ~32k-feature dictionary in one pass; the library options are CPU and per-(scores, label), so a 32k-feature dictionary becomes a 32k-iteration CPU loop. Function by function:

auroc_all — no library computes a vectorized [features × labels] AUROC matrix on GPU. Kept.
domain_f1, best_single_train_test, annotate_features — bespoke (instance-F1, winner's-curse, per-feature assignment); no library equivalent.
fit_logreg / fit_softmax / decode_eval — the only sklearn-replaceable code, but they fit on the [N≈50k, F≈32k] SAE-code matrix, exactly where CodonFM hit the sklearn.LogisticRegression scaling wall and had to subsample to ≤5k features. Swapping reintroduces that coverage loss + a runtime dep. Net regression.
ActivationBuffer / split_indices / standardize — np.savez + tiny helpers; nothing to gain.

Conclusion: the module stays torch + numpy-only. Each metric is a standard formula (Mann–Whitney rank-AUROC, Adam BCE/softmax, instance-F1) vectorized for full-dictionary GPU scale, and each is validated against an independent reference in the tests.
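
To make the "one pass over the dictionary" argument concrete, here is a CPU NumPy sketch of a Mann-Whitney rank-AUROC matrix: once every feature column is ranked, the sum of ranks over the positives for all (feature, label) pairs is a single matrix product. The names `average_ranks` and `auroc_matrix` are hypothetical, and the per-column ranking loop is a simplification; the module itself is described as torch-based, chunked, and GPU-vectorized.

```python
import numpy as np

def average_ranks(col):
    """1-based ranks of a 1-D array, with ties sharing their average rank."""
    _, inverse, counts = np.unique(col, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)          # highest rank in each tie group
    first = last - counts + 1         # lowest rank in each tie group
    return ((first + last) / 2.0)[inverse]

def auroc_matrix(codes, labels):
    """codes: [N, F] float, labels: [N, L] bool -> [F, L] AUROC."""
    ranks = np.stack(
        [average_ranks(codes[:, f]) for f in range(codes.shape[1])], axis=1
    )                                              # [N, F]
    y = labels.astype(np.float64)                  # [N, L]
    n_pos = y.sum(axis=0)                          # [L]
    n_neg = y.shape[0] - n_pos                     # [L]
    rank_sum = ranks.T @ y                         # [F, L] sum of ranks over positives
    u = rank_sum - n_pos * (n_pos + 1) / 2.0       # Mann-Whitney U statistic
    denom = n_pos * n_neg
    valid = denom > 0
    out = np.full(u.shape, 0.5)                    # degenerate labels default to 0.5
    out[:, valid] = u[:, valid] / denom[valid]
    return out
```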

coderabbitai · 2026-06-11T19:46:47Z

📝 Walkthrough

Walkthrough

This PR adds a comprehensive SAE feature-probing evaluation module (probing.py) to enable model-agnostic interpretation of learned features through metrics, classifiers, and annotation tools, along with an ActivationBuffer artifact for persistence and a full test suite validating correctness across all components.

Changes

SAE Probing Evaluation Suite

Layer / File(s)	Summary
ActivationBuffer data structure and persistence `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py` (lines 1–65), `bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py` (lines 123–142)	Dataclass storing SAE feature codes, per-token boolean labels and names, optional dense residuals, and concept-to-instance id mappings; `.save()` serializes to typed `.npz` with per-concept instance arrays; `.load()` reconstructs the dataclass; `.name_idx` property maps label names to column indices.
Dataset utilities and standardization `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py` (lines 73–84)	`split_indices` performs deterministic train/test splitting via seeded `torch.randperm`; `standardize` computes mean and std on training rows with epsilon-clamped std normalization.
AUROC computation and best-feature selection `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py` (lines 86–145), `bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py` (lines 37–71)	`auroc_all` computes full [feature, label] AUROC matrix via chunked rank-statistics; `auroc_vec` handles single-vector AUROC with degenerate-case handling; `best_single_train_test` selects best feature on training set and reports test AUROC without winner's-curse bias; test oracle `_auroc_ref` validates against brute-force reference.
Feature concept annotation via AUROC thresholding `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py` (lines 147–174), `bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py` (lines 110–121)	`annotate_features` derives per-feature best-label annotations by selecting max AUROC across labels and filtering by configurable AUROC threshold; excludes low-information features.
Linear classifier training and macro-AUROC evaluation `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py` (lines 176–226), `bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py` (lines 89–108)	`fit_logreg` trains binary logistic regression; `fit_softmax` trains multinomial softmax; both use Adam with BCE-with-logits and cross-entropy respectively; `macro_auroc` computes macro one-vs-rest AUROC; `decode_eval` orchestrates training and dual metric reporting for test accuracy and macro AUROC.
Domain-adjusted F1 with instance-aware thresholding `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py` (lines 228–270), `bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py` (lines 73–87)	`domain_f1` computes threshold-swept per-feature F1 by normalizing activations per-feature, remapping instance ids, aggregating per-instance firing via `index_reduce_`, combining precision from concept masks with recall from instance aggregation, and selecting best F1 threshold per feature in chunked passes.
Module public API and test setup `bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/__init__.py` (lines 25–71), `bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py` (lines 1–35)	Imports and re-exports all `probing.py` utilities in `__all__` for public access; test module imports and validates all components.
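
The `best_single_train_test` row in the table above describes selecting the best feature on the training split and reporting its AUROC on the test split. A minimal NumPy sketch of that idea follows; the function names and the pairwise AUROC helper are assumptions for illustration, not the module's code.

```python
import numpy as np

def _auroc(scores, y):
    """Tie-aware AUROC for one score vector via pairwise comparison."""
    pos, neg = scores[y], scores[~y]
    if len(pos) == 0 or len(neg) == 0:
        return 0.5
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def best_single(codes, y, train_idx, test_idx):
    """Pick the feature with the highest train AUROC; report train and test AUROC."""
    train_auc = np.array(
        [_auroc(codes[train_idx, f], y[train_idx]) for f in range(codes.shape[1])]
    )
    best = int(train_auc.argmax())
    test_auc = _auroc(codes[test_idx, best], y[test_idx])
    return best, float(train_auc[best]), test_auc
```

With tens of thousands of candidate features, the maximum train AUROC is optimistically biased because some feature will look good by chance; scoring the selected feature on held-out rows removes that selection bias.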

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A warren of metrics, now bundled with care,
AUROC and F1 floating through air,
Buffers that save what the features unfold,
Linear probes seeking wisdom untold,
Domain-aware thresholds, adaptive and keen—
The richest of probing suites ever been seen! 🌟

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 91.30% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title accurately captures the main change: introducing probing primitives (eval metrics and ActivationBuffer) for SAE evaluation, which aligns with the core additions across three files.
Description check	✅ Passed	The PR description provides a clear summary, usage examples, comprehensive explanation of included components, and justification for implementation choices. However, the template requires explicit marking of change type and CI configuration labels.

copy-pr-bot · 2026-06-11T19:51:35Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

polinabinder1 · 2026-06-12T04:36:42Z

@coderabbitai review

coderabbitai · 2026-06-12T04:36:46Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

🧹 Nitpick comments (2)

bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py (2)

54-65: 💤 Low value

allow_pickle=True poses a deserialization risk if loading untrusted files.

This is acceptable for internal artifacts but worth documenting. If these buffers might come from external sources, consider validating provenance or using a safer serialization format.

     @classmethod
     def load(cls, path: str) -> "ActivationBuffer":
-        """Load an ActivationBuffer from an .npz written by save()."""
+        """Load an ActivationBuffer from an .npz written by save().
+
+        Warning:
+            Uses allow_pickle=True; only load files from trusted sources.
+        """
         z = np.load(path, allow_pickle=True)
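
For context on the "safer serialization format" option, the sketch below shows one way an `.npz` round trip can avoid pickle entirely: string metadata is stored as a fixed-width unicode array rather than an object array, so `np.load(..., allow_pickle=False)` succeeds. The helpers `save_buffer` and `load_buffer` and the three fields are hypothetical; whether ActivationBuffer's real fields (for example the per-concept instance arrays) can all be stored this way is not shown in this thread.

```python
import io
import numpy as np

def save_buffer(fobj, codes, labels, names):
    # Names go in as a fixed-width unicode array (dtype '<U...'), not dtype=object,
    # so the archive contains no pickled payload.
    np.savez(fobj, codes=codes, labels=labels, names=np.asarray(names, dtype=str))

def load_buffer(fobj):
    # allow_pickle=False refuses object arrays instead of unpickling them.
    with np.load(fobj, allow_pickle=False) as z:
        return z["codes"], z["labels"], [str(n) for n in z["names"]]

# Round trip through an in-memory file.
buf = io.BytesIO()
save_buffer(buf, np.eye(2), np.array([[True], [False]]), ["exon"])
buf.seek(0)
codes, labels, names = load_buffer(buf)
```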

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py`
around lines 54 - 65, The load method in ActivationBuffer uses np.load(...,
allow_pickle=True) which is unsafe for untrusted files; change load to avoid
allow_pickle=True by default (use allow_pickle=False) or add an explicit
parameter (e.g., allow_pickle: bool = False) and fail with a clear error if
pickled objects are required, and update the ActivationBuffer.load docstring to
document the deserialization risk and the need to validate provenance when
loading external files; ensure references to ActivationBuffer.load and the local
variable z are used to implement and surface the safer behavior.

243-245: 💤 Low value

Consider adding a comment explaining the +2 sizing for the remap tensor.

The +2 accounts for 0-indexing and ensures negative indexing (-1) wraps to a valid buffer position. While correct, this is subtle:

-    remap = torch.full((int(inst_ids.max().item()) + 2,), -1, device=dev, dtype=torch.long)
+    # +2: one for 0-indexing, one so that -1 wraps to a valid (unused) slot
+    remap = torch.full((int(inst_ids.max().item()) + 2,), -1, device=dev, dtype=torch.long)
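
The sentinel behavior can be checked with a small NumPy stand-in for the torch code (the values of `inst_ids` here are made up for illustration). Negative indices wrap around, so with a table of size max_id + 1 an index of -1 would silently read the slot of the largest real id; the extra slot gives -1 a position that is never written.

```python
import numpy as np

inst_ids = np.array([7, 7, -1, 3, -1, 9])      # -1 marks "no instance"
uniq = np.unique(inst_ids[inst_ids >= 0])      # [3, 7, 9]
max_id = int(inst_ids.max())

# Size max_id + 2: indices 0..max_id for real ids, plus one spare slot at the end.
remap = np.full(max_id + 2, -1, dtype=np.int64)
remap[uniq] = np.arange(len(uniq))             # 3 -> 0, 7 -> 1, 9 -> 2

# Indexing with -1 wraps to the spare last slot, which still holds the -1 sentinel.
dense = remap[inst_ids]

# For comparison: with only max_id + 1 slots, -1 aliases the largest id's slot.
short = np.full(max_id + 1, -1, dtype=np.int64)
short[uniq] = np.arange(len(uniq))
aliased = short[inst_ids]
```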

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py`
around lines 243 - 245, Add an inline comment above the remap creation
explaining why the size is int(inst_ids.max().item()) + 2: we need +1 for
0-based indexing of the maximum id and an extra slot so that using -1 as a
sentinel (when indexing remap with potentially -1 inst_ids) will wrap to a valid
buffer position instead of raising an out-of-bounds error; reference the remap
tensor and the subsequent remap[uniq.long()] / remap[inst_ids.long()] usage (and
the torch.full default -1) so readers understand the sentinel handling.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 23ddf87a-6a45-46a2-8264-db968ee016e5

📥 Commits

Reviewing files that changed from the base of the PR and between e407165 and 79df727.

📒 Files selected for processing (3)

bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/__init__.py
bionemo-recipes/interpretability/sparse_autoencoders/sae/src/sae/eval/probing.py
bionemo-recipes/interpretability/sparse_autoencoders/sae/tests/test_probing.py

polinabinder1 · 2026-06-12T05:26:27Z

Addressed the two nitpicks in 57837ec7: documented the allow_pickle=True trust caveat on ActivationBuffer.load, and added a comment explaining the +2 remap-tensor sizing (index-by-max-id + sentinel headroom). Tests still green (6 passed).

Re-lands #1629 (sae.eval.probing: AUROC / domain-F1 / linear probes + ActivationBuffer) onto the post-#1633 top-level layout, and adds a dedicated CPU workflow (ubuntu-latest, no model/GPU) that runs the model-agnostic probing tests. Separate from the evo2 GPU lane; the tensor-parallel sae tests (torchrun/multi-GPU) are out of scope here. Validated: tests/test_probing.py -> 6 passed (CPU). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

Re-lands #1630 on the post-#1633 layout, on top of the rebased #1629: the DNA label producers (scripts/{labelers,annot_tracks,euk_windows}.py) that emit per-token concept labels (genes/exons/motifs) to fill #1629's ActivationBuffer, + biopython dep (genetic code in labelers.py). Validated: tests/{test_labelers,test_annot_tracks}.py -> 8 passed (CPU). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

Re-lands #1636 on the post-#1633 layout, on top of rebased #1630: the harness/CLI (scripts/{evo2_buffer,probe,probe_loss_recovered}.py) that runs the model to build an ActivationBuffer (#1629) from #1630's labels and emits the probing metrics. Syntax-checked; the GPU extract->score smoke is a follow-up (no unit tests in this PR yet). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

copy-pr-bot · 2026-06-23T19:57:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

auroc_all / auroc_vec / best_single / macro_auroc ranked via argsort().argsort(), giving tied values arbitrary distinct ranks. SAE codes are sparse (heavy zero-mass), so that biased the AUROC on the real data distribution — and the oracle test only covered randn (no ties). Switch to average (Mann-Whitney) ranks via a vectorized searchsorted helper (keeps the all-features-at-once speed that motivates hand-rolling), make the oracle tie-aware, and add sparse-tie + constant-feature tests. Also documents why these metrics are hand-rolled. tests/test_probing.py -> 8 passed (CPU). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>
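
The tie bias described in this commit message is easy to reproduce on a single sparse vector. The sketch below contrasts double-argsort ranks with average ranks computed from two `searchsorted` calls; the function names are hypothetical and the code is a single-vector NumPy illustration, whereas the PR's helper is described as vectorized across all features.

```python
import numpy as np

def ordinal_ranks(x):
    """argsort().argsort(): tied values receive arbitrary distinct ranks (1-based)."""
    return x.argsort(kind="stable").argsort(kind="stable") + 1

def average_ranks(x):
    """Average (Mann-Whitney) ranks via two searchsorted calls on the sorted values."""
    s = np.sort(x)
    lo = np.searchsorted(s, x, side="left")    # number of values strictly below x
    hi = np.searchsorted(s, x, side="right")   # number of values <= x
    return (lo + hi + 1) / 2.0                 # mean of the 1-based ranks lo+1 .. hi

def rank_auroc(ranks, y):
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# A sparse code vector: mostly zeros, one positive sitting among the tied zeros.
x = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
y = np.array([False, False, True, False, True])

biased = rank_auroc(ordinal_ranks(x), y)    # depends on where the tied positive happens to sit
correct = rank_auroc(average_ranks(x), y)   # matches the pairwise definition
```

Here the pairwise definition gives 4.5 / 6 = 0.75, which the average-rank version reproduces, while the double-argsort version reports 5 / 6 because the tied positive is ranked above two of the tied negatives.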

…er None paths

- a never/always-firing concept -> AUROC 0.5 (the valid-mask branch; realistic for rare concepts)
- auroc_vec directly (was only tested transitively via best_single) on tied scores
- ActivationBuffer with no dense twin / no instances (the Optional -> None save/load paths)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

standardize z-scores SAE codes for the linear/codon probes, where ~20% of latents are dead (constant 0). Add a direct test that the 1e-6 std floor keeps those columns finite (no NaN into the logreg fit) and that mean/std use the train rows only (no test-set leakage). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>
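
The two properties tested here (a floored standard deviation and train-only statistics) can be sketched as follows. The signatures are assumptions for illustration, and NumPy's seeded generator stands in for the seeded `torch.randperm` that the review summary attributes to `split_indices`.

```python
import numpy as np

def split_indices(n, test_frac=0.2, seed=0):
    """Deterministic train/test split from a seeded permutation (hypothetical signature)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_frac))
    return perm[n_test:], perm[:n_test]            # train_idx, test_idx

def standardize(x, train_idx, eps=1e-6):
    """z-score every column using statistics from the training rows only."""
    mu = x[train_idx].mean(axis=0)
    # The floor keeps dead (constant) columns finite instead of dividing by zero.
    sd = np.maximum(x[train_idx].std(axis=0), eps)
    return (x - mu) / sd
```

Because the mean and standard deviation never see the test rows, an extreme test value cannot shift the scaling applied to the training data.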

…ed sae lib); drop CI lane

These probing primitives (eval metrics + ActivationBuffer) are evo2-specific, so move them from the shared sae library into the evo2_sae recipe package:

* sae/src/sae/eval/probing.py -> recipes/evo2/src/evo2_sae/eval/probing.py
* new recipes/evo2/src/evo2_sae/eval/__init__.py (re-exports the probing API)
* sae/src/sae/eval/__init__.py reverted (no longer exports probing; stays shared for esm2/codonfm)
* sae/tests/test_probing.py -> recipes/evo2/tests/test_probing.py (import evo2_sae.eval.probing)

Remove .github/workflows/unit-tests-sae.yaml (defer CI; run tests via the recipe's .ci_build.sh + pytest). Re-parented onto #1622 so the evo2_sae package is available.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

…; drop CI lane Relocate #1636's probe harness from scripts/ into the evo2_sae.eval.probing package (alongside the #1629 primitives, now the package __init__): scripts/{labelers,evo2_buffer,annot_tracks,euk_windows,probe,probe_loss_recovered}.py -> src/evo2_sae/eval/probing/*.py Fix imports to package-relative (from . import labelers; from .evo2_buffer import ...) and pull the primitives from evo2_sae.eval.probing; loss_recovered stays in the shared sae lib. Re-point the tests at the package (drop the sys.path-into-scripts/ hack). Remove the CPU CI lane (defer; run via .ci_build.sh + pytest). Reparented onto the moved #1629. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>

polinabinder1 mentioned this pull request Jun 11, 2026

evo2 SAE steering: one clamp (sae.steering) for engine, CLI + harness #1634

Closed

polinabinder1 changed the title ~~sae: shared interpretability primitives (probing + steering)~~ sae: shared probing primitives (eval metrics + ActivationBuffer) Jun 11, 2026

polinabinder1 mentioned this pull request Jun 12, 2026

evo2 SAE eval: label producers + probing harness (on #1629) #1636

Open

coderabbitai Bot reviewed Jun 12, 2026

polinabinder1 marked this pull request as ready for review June 12, 2026 05:32

polinabinder1 requested review from jstjohn, jwilber, pstjohn, savitha-eng and trvachov as code owners June 12, 2026 05:32

polinabinder1 force-pushed the pbinder/sae-interp-primitives branch from 57837ec to 13a0690 Compare June 23, 2026 06:06

polinabinder1 and others added 3 commits June 24, 2026 04:13

root and others added 2 commits June 24, 2026 04:13

polinabinder1 force-pushed the pbinder/sae-interp-primitives branch from 26dd036 to 73c261f Compare June 24, 2026 04:16

polinabinder1 changed the base branch from main to pbinder/evo2-sae-serve June 24, 2026 04:16

polinabinder1 force-pushed the pbinder/sae-interp-primitives branch from 73c261f to 26dd036 Compare June 24, 2026 04:19

polinabinder1 changed the base branch from pbinder/evo2-sae-serve to main June 24, 2026 04:19

polinabinder1 force-pushed the pbinder/sae-interp-primitives branch from 26dd036 to 73c261f Compare June 24, 2026 04:24

polinabinder1 changed the base branch from main to pbinder/evo2-sae-serve June 24, 2026 04:24

polinabinder1 changed the title ~~sae: shared probing primitives (eval metrics + ActivationBuffer)~~ evo2-sae: probing primitives (eval metrics + ActivationBuffer) Jun 24, 2026

polinabinder1 wants to merge 5 commits into pbinder/evo2-sae-serve from pbinder/sae-interp-primitives
