Title: Clarifications on attention mask shapes in attn_encoder.py and use of noisy x0_sample in train.py
Body:
Hi! Thanks for open-sourcing FlowER — I’ve been reading through the code and had two small questions about masking and the training target. 🙏
1) Attention mask shapes and which mask is used
File: `FlowER/model/attn_encoder.py`
- Around line 93, a comment says the mask has shape `(batch, query_len, key_len)`.
- Around lines 154–155, `scores` are `masked_fill`'d with a mask whose shape (per the comment) is `(B, 1, 1, T_values)`, apparently just to let the code keep processing.
- In `forward()` (around lines 336 and 339), the code appears to use `MASK` (from around line 316) rather than `MATRIX_MASKS` (from around line 318).
This feels a bit inconsistent: the documented shape is `(B, Q, K)`, while the applied mask uses the broadcast-friendly `(B, 1, 1, T_values)`. Also, `forward()` appears to use `MASK` instead of `MATRIX_MASKS`.
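For concreteness, here is a minimal PyTorch sketch (all shapes hypothetical, not taken from `attn_encoder.py`) of why the two shapes can carry the same padding information, which is part of why I suspect the outputs agree either way:

```python
import torch

B, H, Q, K = 2, 4, 5, 5                  # hypothetical: batch, heads, query_len, key_len
scores = torch.randn(B, H, Q, K)         # raw attention scores

# A (B, 1, 1, K) padding mask broadcasts over the head and query dims,
# masking the same padded key positions in every head and every query row.
pad_mask = torch.zeros(B, 1, 1, K, dtype=torch.bool)
pad_mask[..., -1] = True                 # pretend the last key token is padding

masked_a = scores.masked_fill(pad_mask, float("-inf"))

# The documented (B, Q, K) shape carries the same information; it just
# needs an explicit head dim before it can broadcast against the scores.
qk_mask = pad_mask.squeeze(1).expand(B, Q, K)
masked_b = scores.masked_fill(qk_mask.unsqueeze(1), float("-inf"))

assert torch.equal(masked_a, masked_b)
```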
Questions / suggestions:
- In `forward()`, should `MATRIX_MASKS` be used instead of `MASK`, or is the current use of `MASK` intentional?
Note: I realize this may not affect final outputs because padding is masked later anyway; I’m mainly looking to understand the intended convention and avoid confusion for future readers. I’m happy to open a small PR to standardize comments/names if that helps.
2) Why use noisy `x0_sample` in the `ut` computation?
File: `FlowER/train.py`
- Around line 191, `ut` is computed as:

```python
ut = flow.compute_conditional_vector_field(x0_sample, x1)
```
Questions:
- What’s the rationale for using the noisy `x0_sample` here instead of the clean `x0`? Is this for regularization/noise conditioning (e.g., to stabilize training or match the objective), or to ensure an unbiased estimate under the training distribution? If there’s a relevant paper/section that motivates this choice, a pointer would be great.
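For context, my mental model is the standard conditional flow-matching setup (Lipman et al., "Flow Matching for Generative Modeling"), where a linear path gives a constant target field. This is a hedged sketch with hypothetical names, not FlowER's actual implementation:

```python
import torch

# Hedged sketch of a conditional flow-matching target under the linear path
# x_t = (1 - t) * x0 + t * x1; `cfm_target` is a hypothetical stand-in,
# not FlowER's compute_conditional_vector_field.
def cfm_target(x0, x1):
    # d/dt [(1 - t) * x0 + t * x1] = x1 - x0, independent of t
    return x1 - x0

x1 = torch.randn(4, 8)            # data endpoint
x0_sample = torch.randn(4, 8)     # source sample drawn from the noise/prior side
t = torch.rand(4, 1)

xt = (1 - t) * x0_sample + t * x1     # interpolant fed to the model
ut = cfm_target(x0_sample, x1)        # regression target for v_theta(xt, t)

# My guess: ut must use the same x0_sample that built xt, or the target
# would no longer be the velocity of the path actually passing through xt.
```

If that consistency requirement is the intended reason, even a one-line comment in `train.py` would clear it up.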