
Clarifications on attention mask shapes in attn_encoder.py and use of noisy x0_sample in train.py #2

@yueyin19960520


Hi! Thanks for open-sourcing FlowER. I’ve been reading through the code and have two small questions, one about masking and one about the training target. 🙏

1) Attention mask shapes and which mask is used

File: FlowER/model/attn_encoder.py

  • Around line 93, the comment says the mask has shape (batch, query_len, key_len).
  • Later, around lines 154–155, the scores are passed through masked_fill with a mask that, per the comment, has shape (B, 1, 1, T_values); the comment suggests this shape is there only so the code can keep running.
  • In forward() (around lines 336 and 339), it looks like the code uses MASK (from around line 316) rather than MATRIX_MASKS (from around line 318).

This feels a bit inconsistent: the documented shape is (B, Q, K), while the applied mask seems to use a broadcast-friendly (B, 1, 1, T_values). Also, forward() appears to use MASK instead of MATRIX_MASKS.
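
To make the shape question concrete, here is a minimal sketch of the two conventions. It is not FlowER's code: it uses NumPy, with np.where standing in for masked_fill, and the sizes, the `lengths` array, and all variable names are made up for illustration. It assumes attention scores of shape (B, H, Q, K) and a mask that only hides padded key positions.

```python
import numpy as np

B, H, Q, K = 2, 4, 3, 5  # batch, heads, query length, key length (made-up sizes)
rng = np.random.default_rng(0)
scores = rng.standard_normal((B, H, Q, K))

# Hypothetical number of real keys per batch item; positions >= length are padding.
lengths = np.array([5, 3])
key_is_pad = np.arange(K)[None, :] >= lengths[:, None]  # (B, K), True = padding

# Convention A: padding mask shaped (B, 1, 1, K), broadcast over heads and queries.
pad_mask = key_is_pad[:, None, None, :]
masked_a = np.where(pad_mask, -np.inf, scores)  # analogous to masked_fill(mask, -inf)

# Convention B: full mask shaped (B, Q, K). It needs an explicit head axis,
# because (B, Q, K) does not broadcast against (B, H, Q, K) in general.
full_mask = np.broadcast_to(key_is_pad[:, None, :], (B, Q, K))
masked_b = np.where(full_mask[:, None, :, :], -np.inf, scores)

# For a pure key-padding mask, both conventions hide the same entries.
same = np.array_equal(masked_a, masked_b)
print(same)
```

If the mask is purely a key-padding mask, the two conventions should be interchangeable as above; they would differ only if the (B, Q, K) mask also encodes per-query structure, which is part of what I am trying to confirm.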

Questions / suggestions:

  • In forward(), should MATRIX_MASKS be used instead of MASK, or is the current use of MASK intentional?

Note: I realize this may not affect final outputs because padding is masked later anyway; I’m mainly looking to understand the intended convention and avoid confusion for future readers. I’m happy to open a small PR to standardize comments/names if that helps.


2) Why use the noisy x0_sample when computing ut?

File: FlowER/train.py

  • Around line 191, ut is computed as:
      ut = flow.compute_conditional_vector_field(x0_sample, x1)

Questions

  • What’s the rationale for using the noisy x0_sample here instead of the clean x0? Is this for regularization/noise conditioning (e.g., to stabilize training or match the objective), or to ensure an unbiased estimate under the training distribution? If there’s a relevant paper/section that motivates this choice, a pointer would be great.
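
To make the question concrete, here is a minimal sketch of one possible reading. It rests on assumptions I have not verified against FlowER: that xt is built by linear interpolation between x0_sample and x1, and that the conditional vector field is x1 minus the source point. The noise scale `sigma`, the `interpolate` helper, and the random data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1                       # hypothetical noise scale
x0 = rng.standard_normal((4, 3))  # clean source points (made-up data)
x1 = rng.standard_normal((4, 3))  # target points (made-up data)
x0_sample = x0 + sigma * rng.standard_normal(x0.shape)  # noisy draw around x0


def interpolate(a, b, t):
    """Assumed linear path from a (at t=0) to b (at t=1)."""
    return (1.0 - t) * a + t * b


t, dt = 0.3, 1e-6
xt = interpolate(x0_sample, x1, t)

# Finite-difference velocity of the path that xt actually lies on.
velocity = (interpolate(x0_sample, x1, t + dt) - xt) / dt

ut_noisy = x1 - x0_sample  # target built from the same sample as xt
ut_clean = x1 - x0         # target built from the clean x0

err_noisy = np.abs(velocity - ut_noisy).max()
err_clean = np.abs(velocity - ut_clean).max()
print(err_noisy, err_clean)
```

Under these assumptions, the target built from x0_sample matches the velocity of the path xt is on, while the target built from the clean x0 is off by the sampled noise. If that is the intended reasoning, a confirmation would be enough; if the path or the vector field is defined differently in FlowER, I would appreciate a correction.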
