TemporalSAE: don't apply decoding bias if weights are tied and bias wasn't applied at encoding #703

Open
danra wants to merge 2 commits into decoderesearch:main from danra:temporal_fix_b_dec
danra commented Jun 12, 2026

Contributor

Description

TemporalSAE previously added the decoder bias unconditionally when decoding. The bias should be added only when the encoder/decoder weights are untied or, if they are tied, when the bias was also subtracted at encoding time.
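To illustrate why the bias handling has to match on both sides, here is a minimal sketch using a toy tied-weight linear autoencoder written with plain Python lists. It is not the TemporalSAE implementation: `matvec`, `round_trip`, and the `add_b_dec_on_decode` argument are hypothetical names introduced for this example, and the real model has more structure than a single linear map.

```python
def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]


def round_trip(x, W, b_dec, apply_b_dec_to_input, add_b_dec_on_decode):
    """Toy tied-weight autoencoder: encode with W, decode with its transpose.

    Hypothetical stand-in for illustration only; no nonlinearity is applied.
    """
    W_T = [list(col) for col in zip(*W)]
    # Optionally subtract the decoder bias before encoding.
    enc_in = [xi - bi for xi, bi in zip(x, b_dec)] if apply_b_dec_to_input else x
    z = matvec(enc_in, W)
    out = matvec(z, W_T)
    # Optionally add the decoder bias after decoding.
    if add_b_dec_on_decode:
        out = [oi + bi for oi, bi in zip(out, b_dec)]
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
x, b_dec = [3.0, 4.0], [10.0, 20.0]

# Bias subtracted at encoding and added back at decoding: input recovered.
consistent = round_trip(x, identity, b_dec, True, True)
# Bias never subtracted but still added at decoding: output shifted by b_dec.
shifted = round_trip(x, identity, b_dec, False, True)
# Bias neither subtracted nor added: input recovered.
fixed = round_trip(x, identity, b_dec, False, False)
```

With identity weights, `consistent` and `fixed` both return the input, while `shifted` is offset by `b_dec`, which is the mismatch the description refers to.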

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and tests

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

Copilot AI review requested due to automatic review settings June 12, 2026 00:07

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Expands TemporalSAE’s bias-handling logic to support a new apply_b_dec_to_input mode (especially relevant when weights are tied) and updates tests to cover the new configuration combinations.

Changes:

  • Conditioned adding b_dec in TemporalSAE.decode() and TemporalSAE.forward() based on tied_weights + apply_b_dec_to_input.
  • Broadened TemporalSAE unit tests to parametrize over apply_b_dec_to_input (and tied_weights for decode).
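The four flag combinations that the parametrized tests would need to cover can be enumerated as a small truth table. This is a sketch only: `adds_b_dec` is a hypothetical helper that reuses the boolean expression from the diff, and the actual tests in `tests/saes/test_temporal_sae.py` may be organized differently.

```python
from itertools import product


def adds_b_dec(tied_weights: bool, apply_b_dec_to_input: bool) -> bool:
    """Same boolean expression as the condition in the diff (hypothetical helper)."""
    return (not tied_weights) or apply_b_dec_to_input


# Map every (tied_weights, apply_b_dec_to_input) pair to whether b_dec is added.
table = {
    (tied, apply): adds_b_dec(tied, apply)
    for tied, apply in product([False, True], repeat=2)
}

for (tied, apply), added in table.items():
    print(f"tied_weights={tied!s:5} apply_b_dec_to_input={apply!s:5} -> add b_dec: {added}")
```

Only the combination of tied weights with `apply_b_dec_to_input=False` skips the bias; the other three add it.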

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/saes/test_temporal_sae.py | Expands test parametrization to cover apply_b_dec_to_input (and tied_weights for decode). |
| sae_lens/saes/temporal_sae.py | Makes decoder bias application conditional based on config flags. |

Comment on lines +328 to +329
    if not self.cfg.tied_weights or self.cfg.apply_b_dec_to_input:
        sae_out = sae_out + self.b_dec

Comment on lines +328 to +329
    if not self.cfg.tied_weights or self.cfg.apply_b_dec_to_input:
        sae_out = sae_out + self.b_dec

Comment on lines +352 to +354
    x_recons = torch.matmul(z_novel + z_pred, self.W_dec)
    if not self.cfg.tied_weights or self.cfg.apply_b_dec_to_input:
        x_recons = x_recons + self.b_dec

      # Decode novel codes
      sae_out = torch.matmul(feature_acts, self.W_dec)
    - sae_out = sae_out + self.b_dec
    + if not self.cfg.tied_weights or self.cfg.apply_b_dec_to_input:

2 participants