
Add diagnose_boxer_output.py: geometric quality metrics without ground truth (#7)

Open
zzhang001 wants to merge 1 commit into facebookresearch:main from zzhang001:tools-diagnose-boxer-output


@zzhang001

Summary

This PR adds a new top-level script, diagnose_boxer_output.py, that computes three orthogonal geometric self-consistency metrics per 3D box. All three rely only on data Boxer's existing pipeline already produces, so no new ground truth is required. That makes the tool useful for custom datasets and in-the-wild captures where GT 3D labels aren't available.

The three metrics

| Metric | What it does | What it catches |
| --- | --- | --- |
| A. `dist_to_cloud` | Distance from the box center to the nearest scene-cloud point (uses `ctx.sdp_global` or `ctx.uid_to_p3` depending on the dataset). | Box floats in empty space. |
| B. `iou2d` | Reproject the box's 8 corners into the image with the same (K, pose) the loader supplied, take the axis-aligned 2D bbox, and compute IoU with the OWL prompt bbox from `owl_2dbbs.csv`. | K / pose / 3D-position inconsistency (the projected box misses where OWL saw the object). |
| C. `depth_gap` | Compare the camera-frame Z of the 8 corners with the q10..q90 Z range of SDP points inside the OWL 2D bbox footprint. | "Right angular direction, wrong distance": the failure mode that systematic K bias or per-segment scale drift produces, which A and B both tend to miss. |
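Metric B reduces to a reprojection plus an axis-aligned IoU. A minimal sketch of the idea (function names and the world-to-camera pose convention are my assumptions here, not Boxer's actual API):

```python
import numpy as np

def project_corners(corners_w, T_cam_from_world, K):
    """Project world-frame box corners (N,3) to pixel coords (N,2).
    Assumes T_cam_from_world is a 4x4 world->camera transform."""
    pts = T_cam_from_world[:3, :3] @ corners_w.T + T_cam_from_world[:3, 3:4]
    uv = K @ pts                      # (3,N) homogeneous pixel coords
    return (uv[:2] / uv[2]).T         # perspective divide

def iou2d(corners_px, owl_bbox):
    """IoU of the corners' axis-aligned bbox with an OWL bbox (x0,y0,x1,y1)."""
    x0, y0 = corners_px.min(axis=0)
    x1, y1 = corners_px.max(axis=0)
    bx0, by0, bx1, by1 = owl_bbox
    iw = max(0.0, min(x1, bx1) - max(x0, bx0))
    ih = max(0.0, min(y1, by1) - max(y0, by0))
    inter = iw * ih
    union = (x1 - x0) * (y1 - y0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

A low `iou2d` flags that K, the pose, and the 3D position disagree about where the object should appear in the image.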

The metrics are orthogonal: any one can pass while the others fail. The table in the docstring of diagnose_boxer_output.py lists which failure modes each metric detects, so users know which one to consult for a given symptom.
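Metric C, the check A and B tend to miss, can be sketched as a robust range comparison. This is a simplification; how the actual script selects SDP points and handles quantiles may differ, and `depth_gap` here is an illustrative name:

```python
import numpy as np

def depth_gap(corner_z, sdp_z_in_bbox):
    """Gap between the box's camera-frame Z extent and the robust
    (q10..q90) Z range of SDP points projecting into the OWL 2D bbox.
    0.0 means the ranges overlap; a positive value means
    'right angular direction, wrong distance'."""
    z_lo, z_hi = float(np.min(corner_z)), float(np.max(corner_z))
    q10, q90 = np.percentile(sdp_z_in_bbox, [10, 90])
    if z_hi < q10:          # box entirely in front of the scene surface
        return float(q10 - z_hi)
    if z_lo > q90:          # box entirely behind it
        return float(z_lo - q90)
    return 0.0              # ranges overlap: depth is consistent
```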

Use cases

  • Debugging a new loader: does my 3D output even live in the scene? (A)
  • Smoke-testing custom K/pose inputs: are my inputs self-consistent? (B)
  • Filtering before fusion: drop clearly-bad per-frame boxes before fuse_obbs_from_csv clusters them, producing a cleaner scene graph. (--filter)
  • Light regression checks on in-house sequences: track mean/median of the three metrics across code changes to catch silent quality drops.
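The pre-fusion filtering use case reduces to a per-box threshold test. A minimal sketch of what `--filter` might do, under the assumption that a NaN metric (e.g. ScanNet without a scene cloud) never disqualifies a box; `keep_box` and the metric-dict keys are illustrative, not the script's actual internals:

```python
import math

def keep_box(metrics, max_dist_to_cloud=0.5, min_iou2d=0.1, max_depth_gap=0.5):
    """Keep a box unless a computable (non-NaN) metric fails its threshold.
    Default thresholds mirror the CLI example in this PR description."""
    d, iou, gap = (metrics.get(k, float("nan"))
                   for k in ("dist_to_cloud", "iou2d", "depth_gap"))
    if not math.isnan(d) and d > max_dist_to_cloud:
        return False
    if not math.isnan(iou) and iou < min_iou2d:
        return False
    if not math.isnan(gap) and gap > max_depth_gap:
        return False
    return True
```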

CLI

Modeled on view_fusion.py / view_tracker.py:

```bash
python diagnose_boxer_output.py --input nym10_gen1
python diagnose_boxer_output.py --input nym10_gen1 --filter
python diagnose_boxer_output.py --input nym10_gen1 --filter \
    --max_dist_to_cloud 0.5 --min_iou2d 0.1 --max_depth_gap 0.5
```

Outputs (under `<output_dir>/<seq>/`):

  • diagnose_by_box.csv — per-box metrics
  • diagnose_summary.json — aggregate distributions (mean, median, p90, p99, max)
  • <write_name>_3dbbs_filtered.csv — bad boxes dropped (only with --filter)

The filtered CSV has the same schema as <write_name>_3dbbs.csv, so it drops into utils.fuse_3d_boxes.fuse_obbs_from_csv unchanged.
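The aggregate side can be sketched as a small percentile summary per metric column; `summarize` is a hypothetical helper, and the real diagnose_summary.json schema may differ:

```python
import numpy as np

def summarize(values):
    """Aggregate one metric column into distribution stats
    (mean, median, p90, p99, max), ignoring NaNs such as the
    ScanNet no-scene-cloud case."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]               # drop metrics that weren't computable
    return {
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "p90": float(np.percentile(v, 90)),
        "p99": float(np.percentile(v, 99)),
        "max": float(v.max()),
    }
```

Tracking these numbers across code changes is what makes the light-regression-check use case above work.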

Implementation notes

  • Uses build_seq_ctx to pull pose / K / SDP per dataset, so Aria and CA-1M work out of the box. ScanNet falls back to NaN for A/C when no pre-scanned cloud is in the context (B still works).
  • Recovers the BoxerNet-output ↔ OWL-prompt-bbox correspondence that's lost after run_boxer's thresh_3d filter by matching on label + nearest-projected-center — same heuristic a human would use.
  • ~420 lines total including docstrings. Zero new dependencies.
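The correspondence-recovery heuristic from the second note can be sketched roughly as follows; `match_boxes_to_prompts` and the tuple layouts are illustrative, not the script's actual data structures:

```python
import math

def match_boxes_to_prompts(box_centers, prompt_centers):
    """For each (label, u, v) projected box center, pick the index of the
    same-label OWL prompt whose bbox center is nearest in pixel space.
    Returns box_index -> prompt_index (None when no prompt shares the label)."""
    matches = {}
    for bi, (label, u, v) in enumerate(box_centers):
        best, best_d2 = None, math.inf
        for pi, (plabel, pu, pv) in enumerate(prompt_centers):
            if plabel != label:
                continue                      # labels must agree first
            d2 = (u - pu) ** 2 + (v - pv) ** 2
            if d2 < best_d2:
                best, best_d2 = pi, d2        # nearest projected center wins
        matches[bi] = best
    return matches
```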

Backward compatibility

Purely additive: no existing file is modified except the README, which gets a new `Demo #3.5` section between Offline Fusion (#3) and Online Tracker (#4).

Open questions for the reviewer

  • Location. I put the file at the repo root to match view_fusion.py / view_tracker.py. If you'd prefer tools/ or scripts/, say the word.
  • ScanNet scene cloud. The pre-scanned mesh isn't in build_seq_ctx for ScanNet — should I add a path that loads it, or is metric-B-only output enough for the ScanNet case?
  • CLI shape. Mirroring view_fusion.py felt most natural; some users might prefer accepting explicit CSV/PLY paths for portability.

Context / provenance

This originated as pipeline/diagnostic.py in a downstream VGGT-SLAM → Boxer adapter for iPhone video (https://github.com/zzhang001/box). The metrics caught a per-submap scale-drift issue there that wasn't visible in 3D-only inspection, so I figured it'd be useful for other Boxer users hitting similar "is my custom input OK?" questions. Related upstream issue on aspect-ratio handling: #6.
