Title: Clarifications on attention mask shapes in attn_encoder.py and use of noisy x0_sample in train.py
Body:
Hi! Thanks for open-sourcing FlowER — I’ve been reading through the code and had two small questions about masking and the training target. 🙏
1) Attention mask shapes and which mask is used
File: `FlowER/model/attn_encoder.py`
- Around line 93, a comment says the mask has shape `(batch, query_len, key_len)`.
- Around lines 154–155, `scores` are `masked_fill`'d with a mask whose shape (per the comment) is `(B, 1, 1, T_values)`, apparently just to let the code keep processing.
- In `forward()` (around lines 336 and 339), the code appears to use `MASK` (from around line 316) rather than `MATRIX_MASKS` (from around line 318).
This feels a bit inconsistent: the documented shape is `(B, Q, K)`, while the applied mask uses the broadcast-friendly `(B, 1, 1, T_values)`. Also, `forward()` appears to use `MASK` instead of `MATRIX_MASKS`.
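For concreteness, here is a minimal PyTorch sketch (all shapes hypothetical, not taken from `attn_encoder.py`) of why the two shapes can carry the same padding information, which is part of why I suspect the outputs agree either way:

```python
import torch

B, H, Q, K = 2, 4, 5, 5                  # hypothetical: batch, heads, query_len, key_len
scores = torch.randn(B, H, Q, K)         # raw attention scores

# A (B, 1, 1, K) padding mask broadcasts over the head and query dims,
# masking the same padded key positions in every head and every query row.
pad_mask = torch.zeros(B, 1, 1, K, dtype=torch.bool)
pad_mask[..., -1] = True                 # pretend the last key token is padding

masked_a = scores.masked_fill(pad_mask, float("-inf"))

# The documented (B, Q, K) shape carries the same information; it just
# needs an explicit head dim before it can broadcast against the scores.
qk_mask = pad_mask.squeeze(1).expand(B, Q, K)
masked_b = scores.masked_fill(qk_mask.unsqueeze(1), float("-inf"))

assert torch.equal(masked_a, masked_b)
```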
Questions / suggestions:
- In `forward()`, should `MATRIX_MASKS` be used instead of `MASK`, or is the current use of `MASK` intentional?
Note: I realize this may not affect final outputs because padding is masked later anyway; I’m mainly looking to understand the intended convention and avoid confusion for future readers. I’m happy to open a small PR to standardize comments/names if that helps.
2) Why use noisy `x0_sample` in the `ut` computation?
File: `FlowER/train.py`
- Around line 191, `ut` is computed as:

```python
ut = flow.compute_conditional_vector_field(x0_sample, x1)
```
Questions:
- What’s the rationale for using the noisy `x0_sample` here instead of the clean `x0`? Is this for regularization/noise conditioning (e.g., to stabilize training or match the objective), or to ensure an unbiased estimate under the training distribution? If there’s a relevant paper/section that motivates this choice, a pointer would be great.
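For context, my mental model is the standard conditional flow-matching setup (Lipman et al., "Flow Matching for Generative Modeling"), where a linear path gives a constant target field. This is a hedged sketch with hypothetical names, not FlowER's actual implementation:

```python
import torch

# Hedged sketch of a conditional flow-matching target under the linear path
# x_t = (1 - t) * x0 + t * x1; `cfm_target` is a hypothetical stand-in,
# not FlowER's compute_conditional_vector_field.
def cfm_target(x0, x1):
    # d/dt [(1 - t) * x0 + t * x1] = x1 - x0, independent of t
    return x1 - x0

x1 = torch.randn(4, 8)            # data endpoint
x0_sample = torch.randn(4, 8)     # source sample drawn from the noise/prior side
t = torch.rand(4, 1)

xt = (1 - t) * x0_sample + t * x1     # interpolant fed to the model
ut = cfm_target(x0_sample, x1)        # regression target for v_theta(xt, t)

# My guess: ut must use the same x0_sample that built xt, or the target
# would no longer be the velocity of the path actually passing through xt.
```

If that consistency requirement is the intended reason, even a one-line comment in `train.py` would clear it up.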