Summary
Every loader (`scannet_loader`, `ca_loader`, `aria_loader`) and the reference `run_boxer.py` resize input images to `boxernet.hw × boxernet.hw` via anisotropic stretch, scaling `K` accordingly so that `fx *= hw/orig_w` and `fy *= hw/orig_h`. When the input image is not already square, this produces an anisotropic `K` with `fy/fx = orig_w/orig_h`.
For ScanNet (1296×968, aspect 1.34), the resulting anisotropy (`fy/fx ≈ 1.34`) is within BoxerNet's training distribution and works fine. For wider cameras — notably modern phones (the iPhone main rear camera records 1920×1080, aspect 1.78) — the resulting `K` has `fy/fx ≈ 1.78`, noticeably outside the anisotropy range BoxerNet saw during training.
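For concreteness, the anisotropy introduced by the stretch resize depends only on the source aspect ratio (the target side `hw` cancels out). A quick sketch:

```python
def stretch_anisotropy(orig_w, orig_h):
    """fy/fx after fx *= hw/orig_w and fy *= hw/orig_h (hw cancels),
    assuming square pixels (fx == fy) at native resolution."""
    return orig_w / orig_h

print(round(stretch_anisotropy(1296, 968), 2))   # ScanNet → 1.34
print(round(stretch_anisotropy(1920, 1080), 2))  # iPhone 16:9 → 1.78
```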
Why this matters
BoxerNet's 2D→3D lifting relies on `K` to turn pixel coordinates into rays in the camera frame. A `K` more anisotropic than anything seen in training biases the angular footprint BoxerNet infers from a 2D bbox along the "stretched" axis. The effect is systematic: all boxes get a consistent direction/distance bias.
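To make the mechanism concrete, here is a minimal sketch (not BoxerNet's actual lifting code) of the standard pixel-to-ray unprojection, `K⁻¹ [u, v, 1]ᵀ`. With a stretched `K` (`fy/fx ≈ 1.78`, values hypothetical), a bbox that is square in pixels subtends unequal angles along x and y:

```python
import numpy as np

def pixel_to_ray(K, u, v):
    """Unproject a pixel to a unit ray in camera frame via K^-1 [u, v, 1]^T."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

# Hypothetical stretched K with fy/fx = 534/300 = 1.78: a 100-px-square bbox
# centered on the principal point subtends a wider angle in x than in y.
K = np.array([[300.0,   0.0, 259.0],
              [  0.0, 534.0, 259.0],
              [  0.0,   0.0,   1.0]])
ang_x = np.arccos(pixel_to_ray(K, 209, 259) @ pixel_to_ray(K, 309, 259))
ang_y = np.arccos(pixel_to_ray(K, 259, 209) @ pixel_to_ray(K, 259, 309))
# ang_x > ang_y even though the bbox is square in pixel space
```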
Proposed fix
Add an optional pad-to-square preprocessing mode to the loaders:
- Pad the original image with zeros (or the dataset's background color) to a `max(orig_h, orig_w) × max(orig_h, orig_w)` square.
- Offset `cx`/`cy` by the pad amounts.
- Uniformly (isotropically) resize the padded square to `hw × hw`.
Result: `fx = fy` for any camera with square pixels, regardless of source aspect ratio. Black-bar padding is a well-represented augmentation in DINOv3 pretraining, so BoxerNet's backbone should handle it gracefully.
Reference implementation
The core of `_build_datum` in a downstream iPhone-video adapter where we've been running this:
```python
import cv2

orig_h, orig_w = img_rgb.shape[:2]
side = max(orig_h, orig_w)
pad_top = (side - orig_h) // 2
pad_left = (side - orig_w) // 2
img_square = cv2.copyMakeBorder(img_rgb,
                                pad_top, side - orig_h - pad_top,
                                pad_left, side - orig_w - pad_left,
                                cv2.BORDER_CONSTANT, value=0)
img = cv2.resize(img_square, (hw, hw), interpolation=cv2.INTER_AREA)

# K: start from K at native resolution, shift cx/cy into the padded square,
# then uniform-resize side → hw. Result: fx == fy for square-pixel cameras.
K_padded = K_native.copy()
K_padded[0, 2] += pad_left
K_padded[1, 2] += pad_top
s = hw / side
K_boxer = K_padded.copy()
K_boxer[[0, 0, 1, 1], [0, 2, 1, 2]] *= s
```
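A quick sanity check of the intrinsics arithmetic with hypothetical iPhone-like numbers (the `K_native` values and `hw = 518` are made up for illustration; only the `fx == fy` invariant matters):

```python
import numpy as np

K_native = np.array([[1450.0,    0.0, 960.0],   # hypothetical 1920×1080 intrinsics
                     [   0.0, 1450.0, 540.0],
                     [   0.0,    0.0,   1.0]])
orig_h, orig_w, hw = 1080, 1920, 518
side = max(orig_h, orig_w)
pad_top, pad_left = (side - orig_h) // 2, (side - orig_w) // 2
K_padded = K_native.copy()
K_padded[0, 2] += pad_left
K_padded[1, 2] += pad_top
s = hw / side
K_boxer = K_padded.copy()
K_boxer[[0, 0, 1, 1], [0, 2, 1, 2]] *= s
assert K_boxer[0, 0] == K_boxer[1, 1]   # isotropic, unlike the stretch path
```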
I'd gate it behind a `--pad-to-square` flag (default off, preserving current behavior). If the maintainers agree on the motivation, I can open a PR that threads the flag through each loader.
What I have / don't have
Have: a geometric argument plus anecdotal evidence on iPhone 16:9 footage that 3D boxes went from noticeably drifted to substantially tighter when we switched from stretch to pad (mean center-to-cloud distance 0.57 m → 0.28 m; but that run also included a known-K override and a change in SDP source, so pad-to-square alone is not cleanly isolated).
Don't have: a clean A/B on a CA-1M or ScanNet sequence, which is what would actually convince me. Happy to run one if the maintainers point at a held-out eval split.