
Add diagnose_boxer_output.py: geometric quality metrics without ground truth (#7)

Open
zzhang001 wants to merge 1 commit into facebookresearch:main from zzhang001:tools-diagnose-boxer-output


@zzhang001

Summary

This PR adds a new top-level script, diagnose_boxer_output.py, that computes three orthogonal geometric self-consistency metrics per 3D box. All three rely only on data Boxer's existing pipeline already produces, so no new ground truth is required. That makes the tool useful for custom datasets and in-the-wild captures where GT 3D labels aren't available.

The three metrics

| Metric | What it does | What it catches |
| --- | --- | --- |
| A. `dist_to_cloud` | Distance from the box center to the nearest scene-cloud point (uses `ctx.sdp_global` or `ctx.uid_to_p3` depending on the dataset). | Box floats in empty space. |
| B. `iou2d` | Reproject the box's 8 corners into the image with the same (K, pose) the loader supplied, take the axis-aligned 2D bbox, and compute IoU with the OWL prompt bbox from `owl_2dbbs.csv`. | K / pose / 3D-position inconsistency (the projected box misses where OWL saw the object). |
| C. `depth_gap` | Compare the camera-frame Z of the 8 corners with the q10..q90 Z range of SDP points inside the OWL 2D bbox footprint. | "Right angular direction, wrong distance": the failure mode that systematic K bias or per-segment scale drift produces, which A and B both tend to miss. |
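Metric B reduces to a reprojection plus an axis-aligned IoU. A minimal sketch of the idea (function names and the world-to-camera pose convention are my assumptions here, not Boxer's actual API):

```python
import numpy as np

def project_corners(corners_w, T_cam_from_world, K):
    """Project world-frame box corners (N,3) to pixel coords (N,2).
    Assumes T_cam_from_world is a 4x4 world->camera transform."""
    pts = T_cam_from_world[:3, :3] @ corners_w.T + T_cam_from_world[:3, 3:4]
    uv = K @ pts                      # (3,N) homogeneous pixel coords
    return (uv[:2] / uv[2]).T         # perspective divide

def iou2d(corners_px, owl_bbox):
    """IoU of the corners' axis-aligned bbox with an OWL bbox (x0,y0,x1,y1)."""
    x0, y0 = corners_px.min(axis=0)
    x1, y1 = corners_px.max(axis=0)
    bx0, by0, bx1, by1 = owl_bbox
    iw = max(0.0, min(x1, bx1) - max(x0, bx0))
    ih = max(0.0, min(y1, by1) - max(y0, by0))
    inter = iw * ih
    union = (x1 - x0) * (y1 - y0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

A low `iou2d` flags that K, the pose, and the 3D position disagree about where the object should appear in the image.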

The metrics are orthogonal: any one can pass while the others fail. The table in the docstring of diagnose_boxer_output.py lists which failure modes each metric detects, so users know which one to consult for a given symptom.
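Metric C, the check A and B tend to miss, can be sketched as a robust range comparison. This is a simplification; how the actual script selects SDP points and handles quantiles may differ, and `depth_gap` here is an illustrative name:

```python
import numpy as np

def depth_gap(corner_z, sdp_z_in_bbox):
    """Gap between the box's camera-frame Z extent and the robust
    (q10..q90) Z range of SDP points projecting into the OWL 2D bbox.
    0.0 means the ranges overlap; a positive value means
    'right angular direction, wrong distance'."""
    z_lo, z_hi = float(np.min(corner_z)), float(np.max(corner_z))
    q10, q90 = np.percentile(sdp_z_in_bbox, [10, 90])
    if z_hi < q10:          # box entirely in front of the scene surface
        return float(q10 - z_hi)
    if z_lo > q90:          # box entirely behind it
        return float(z_lo - q90)
    return 0.0              # ranges overlap: depth is consistent
```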

Use cases

  • Debugging a new loader: does my 3D output even live in the scene? (A)
  • Smoke-testing custom K/pose inputs: are my inputs self-consistent? (B)
  • Filtering before fusion: drop clearly-bad per-frame boxes before fuse_obbs_from_csv clusters them, producing a cleaner scene graph. (--filter)
  • Light regression checks on in-house sequences: track mean/median of the three metrics across code changes to catch silent quality drops.
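The pre-fusion filtering use case reduces to a per-box threshold test. A minimal sketch of what `--filter` might do, under the assumption that a NaN metric (e.g. ScanNet without a scene cloud) never disqualifies a box; `keep_box` and the metric-dict keys are illustrative, not the script's actual internals:

```python
import math

def keep_box(metrics, max_dist_to_cloud=0.5, min_iou2d=0.1, max_depth_gap=0.5):
    """Keep a box unless a computable (non-NaN) metric fails its threshold.
    Default thresholds mirror the CLI example in this PR description."""
    d, iou, gap = (metrics.get(k, float("nan"))
                   for k in ("dist_to_cloud", "iou2d", "depth_gap"))
    if not math.isnan(d) and d > max_dist_to_cloud:
        return False
    if not math.isnan(iou) and iou < min_iou2d:
        return False
    if not math.isnan(gap) and gap > max_depth_gap:
        return False
    return True
```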

CLI

Modeled on view_fusion.py / view_tracker.py:

```bash
python diagnose_boxer_output.py --input nym10_gen1
python diagnose_boxer_output.py --input nym10_gen1 --filter
python diagnose_boxer_output.py --input nym10_gen1 --filter \
    --max_dist_to_cloud 0.5 --min_iou2d 0.1 --max_depth_gap 0.5
```

Outputs (under `<output_dir>/<seq>/`):

  • diagnose_by_box.csv — per-box metrics
  • diagnose_summary.json — aggregate distributions (mean, median, p90, p99, max)
  • <write_name>_3dbbs_filtered.csv — bad boxes dropped (only with --filter)

The filtered CSV has the same schema as <write_name>_3dbbs.csv, so it drops into utils.fuse_3d_boxes.fuse_obbs_from_csv unchanged.
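The aggregate side can be sketched as a small percentile summary per metric column; `summarize` is a hypothetical helper, and the real diagnose_summary.json schema may differ:

```python
import numpy as np

def summarize(values):
    """Aggregate one metric column into distribution stats
    (mean, median, p90, p99, max), ignoring NaNs such as the
    ScanNet no-scene-cloud case."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]               # drop metrics that weren't computable
    return {
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "p90": float(np.percentile(v, 90)),
        "p99": float(np.percentile(v, 99)),
        "max": float(v.max()),
    }
```

Tracking these numbers across code changes is what makes the light-regression-check use case above work.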

Implementation notes

  • Uses build_seq_ctx to pull pose / K / SDP per dataset, so Aria and CA-1M work out of the box. ScanNet falls back to NaN for A/C when no pre-scanned cloud is in the context (B still works).
  • Recovers the BoxerNet-output ↔ OWL-prompt-bbox correspondence that's lost after run_boxer's thresh_3d filter by matching on label + nearest-projected-center — same heuristic a human would use.
  • ~420 lines total including docstrings. Zero new dependencies.
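The correspondence-recovery heuristic from the second note can be sketched roughly as follows; `match_boxes_to_prompts` and the tuple layouts are illustrative, not the script's actual data structures:

```python
import math

def match_boxes_to_prompts(box_centers, prompt_centers):
    """For each (label, u, v) projected box center, pick the index of the
    same-label OWL prompt whose bbox center is nearest in pixel space.
    Returns box_index -> prompt_index (None when no prompt shares the label)."""
    matches = {}
    for bi, (label, u, v) in enumerate(box_centers):
        best, best_d2 = None, math.inf
        for pi, (plabel, pu, pv) in enumerate(prompt_centers):
            if plabel != label:
                continue                      # labels must agree first
            d2 = (u - pu) ** 2 + (v - pv) ** 2
            if d2 < best_d2:
                best, best_d2 = pi, d2        # nearest projected center wins
        matches[bi] = best
    return matches
```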

Backward compatibility

Purely additive: no existing file is modified except the README, which gets a new `Demo #3.5` section between Offline Fusion (#3) and Online Tracker (#4).

Open questions for the reviewer

  • Location. I put the file at the repo root to match view_fusion.py / view_tracker.py. If you'd prefer tools/ or scripts/, say the word.
  • ScanNet scene cloud. The pre-scanned mesh isn't in build_seq_ctx for ScanNet — should I add a path that loads it, or is metric-B-only output enough for the ScanNet case?
  • CLI shape. Mirroring view_fusion.py felt most natural; some users might prefer accepting explicit CSV/PLY paths for portability.

Context / provenance

This originated as pipeline/diagnostic.py in a downstream VGGT-SLAM → Boxer adapter for iPhone video (https://github.com/zzhang001/box). The metrics caught a per-submap scale-drift issue there that wasn't visible in 3D-only inspection, so I figured it'd be useful for other Boxer users hitting similar "is my custom input OK?" questions. Related upstream issue on aspect-ratio handling: #6.
