Add diagnose_boxer_output.py: geometric quality metrics without ground truth #7
Open
zzhang001 wants to merge 1 commit into facebookresearch:main from
Conversation
Add diagnose_boxer_output.py: geometric quality metrics without ground truth
Three orthogonal per-box self-consistency metrics for Boxer 3D output,
useful for debugging custom loaders / in-the-wild captures that lack
3D ground-truth annotations.
A. dist_to_cloud — distance from the box center to the nearest scene-
cloud point (uses ctx.sdp_global or ctx.uid_to_p3 depending on the
dataset). Catches boxes placed in empty space.
B. iou2d — reproject the box's 8 corners with the same (K, pose) the
loader supplied, take the axis-aligned 2D bbox, compute IoU with
the OWL prompt bbox from owl_2dbbs.csv. Catches K / pose / 3D
position inconsistency.
C. depth_gap — camera-frame Z of the box's 8 corners vs. the q10..q90
Z range of SDP points whose projection lands inside the OWL 2D
bbox. Catches "right angular direction, wrong distance" — a
   failure mode that systematic K bias or per-segment scale drift
produces, which A and B both tend to miss.
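In code, the three checks might look like the following minimal sketch. The array shapes, helper names, and the camera-to-world pose convention `T_wc` are assumptions for illustration, not the script's actual code: `corners` is the box's (8, 3) world-frame corner array, `cloud` the (N, 3) scene points, and `owl_box = (x0, y0, x1, y1)` the OWL prompt bbox in pixels.

```python
import numpy as np

def dist_to_cloud(center, cloud):
    """Metric A: distance from the box center to the nearest cloud point."""
    return np.linalg.norm(cloud - center, axis=1).min()

def project(points_w, K, T_wc):
    """Project world-frame points into pixels; also return camera-frame Z."""
    T_cw = np.linalg.inv(T_wc)                          # world -> camera
    p_c = T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:4]     # (3, N)
    uv = (K @ p_c)[:2] / p_c[2]                         # pinhole projection
    return uv.T, p_c[2]

def iou2d(corners, K, T_wc, owl_box):
    """Metric B: IoU of the projected-corner AABB with the OWL 2D bbox."""
    uv, _ = project(corners, K, T_wc)
    bx = (uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max())
    ix0, iy0 = max(bx[0], owl_box[0]), max(bx[1], owl_box[1])
    ix1, iy1 = min(bx[2], owl_box[2]), min(bx[3], owl_box[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(bx) + area(owl_box) - inter)

def depth_gap(corners, cloud, K, T_wc, owl_box):
    """Metric C: shortfall between the corners' camera-frame Z range and
    the q10..q90 Z band of cloud points projecting inside the OWL bbox."""
    uv, z = project(cloud, K, T_wc)
    inside = ((uv[:, 0] >= owl_box[0]) & (uv[:, 0] <= owl_box[2]) &
              (uv[:, 1] >= owl_box[1]) & (uv[:, 1] <= owl_box[3]) & (z > 0))
    if not inside.any():
        return float("nan")                             # no evidence
    lo, hi = np.quantile(z[inside], [0.1, 0.9])
    _, cz = project(corners, K, T_wc)
    # zero when the corner Z range overlaps the band, else the distance to it
    return max(0.0, lo - cz.max(), cz.min() - hi)

# Tiny self-check: a unit cube 5 m in front of an identity-pose camera.
corners = np.array([[x, y, z] for x in (-.5, .5) for y in (-.5, .5)
                    for z in (4.5, 5.5)])
K = np.array([[100., 0., 50.], [0., 100., 50.], [0., 0., 1.]])
T_wc = np.eye(4)
uv, _ = project(corners, K, T_wc)
owl = (uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max())
iou2d(corners, K, T_wc, owl)   # -> 1.0 when box and prompt agree exactly
```

The self-check uses the cube's own projection as the "OWL" box, so B is perfect and C is zero; in the real pipeline the two boxes come from independent paths, which is what makes the comparison informative.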
Each metric is orthogonal: one can pass and the others fail. Combining
them as `--filter` thresholds drops clearly-bad per-frame boxes before
`fuse_3d_boxes.fuse_obbs_from_csv` clusters them, producing a cleaner
scene graph.
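A minimal sketch of how the three thresholds might combine into one keep/drop decision, assuming per-box rows keyed by the metric names. The NaN policy shown (a missing metric never rejects a box on its own) is an assumption, not necessarily the script's behavior:

```python
import math

def keep_box(row, max_dist=0.5, min_iou=0.1, max_gap=0.5):
    """Apply the three thresholds; a NaN metric (e.g. ScanNet without a
    pre-scanned cloud) counts as no evidence and never rejects a box."""
    checks = [
        (row["dist_to_cloud"], lambda v: v <= max_dist),
        (row["iou2d"],         lambda v: v >= min_iou),
        (row["depth_gap"],     lambda v: v <= max_gap),
    ]
    return all(ok(v) for v, ok in checks if not math.isnan(v))

rows = [
    {"dist_to_cloud": 0.1, "iou2d": 0.4, "depth_gap": 0.0},  # passes all
    {"dist_to_cloud": 2.0, "iou2d": 0.4, "depth_gap": 0.0},  # floats in space
    {"dist_to_cloud": float("nan"), "iou2d": 0.3,            # B-only pass
     "depth_gap": float("nan")},
]
kept = [r for r in rows if keep_box(r)]  # drops only the second row
```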
Outputs (in `<output_dir>/<seq>/`):
- diagnose_by_box.csv per-box metrics
- diagnose_summary.json aggregate distributions
- <write_name>_3dbbs_filtered.csv with --filter
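For illustration, the aggregate distributions could be computed along these lines; `summarize` and the exact statistics (mean, median, p90, p99, max over finite values) are a sketch, not the script's API:

```python
import json
import numpy as np

def summarize(values):
    """Aggregate one metric column, ignoring NaN fallbacks."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]
    return {
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "p90": float(np.percentile(v, 90)),
        "p99": float(np.percentile(v, 99)),
        "max": float(v.max()),
    }

metrics = {
    "dist_to_cloud": [0.1, 0.2, float("nan"), 1.5],
    "iou2d": [0.8, 0.4, 0.05, 0.6],
}
summary = {name: summarize(col) for name, col in metrics.items()}
print(json.dumps(summary, indent=2))
```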
Follows the CLI shape of view_fusion.py / view_tracker.py:
python diagnose_boxer_output.py --input <seq>
  python diagnose_boxer_output.py --input <seq> --filter \
--max_dist_to_cloud 0.5 --min_iou2d 0.1 --max_depth_gap 0.5
Uses build_seq_ctx to pull pose / K / SDP per dataset, so Aria and
CA-1M work out of the box; ScanNet falls back to NaN for A/C when no
pre-scanned cloud is in the context (IoU still works).
README: add a 'Demo #3.5' section pointing at the tool between the
Offline Fusion demo and the Online Tracker demo.
Summary
A new top-level script `diagnose_boxer_output.py` that computes three orthogonal geometric self-consistency metrics per 3D box. All three use data that Boxer's existing pipeline already produces (no new ground truth required), which makes the tool useful for custom datasets and in-the-wild captures where GT 3D labels aren't available.

The three metrics

- A. `dist_to_cloud`: distance from the box center to the nearest scene-cloud point (uses `ctx.sdp_global` or `ctx.uid_to_p3` depending on the dataset). Catches boxes placed in empty space.
- B. `iou2d`: reproject the box's 8 corners with the same `(K, pose)` the loader supplied, take the axis-aligned 2D bbox, IoU with the OWL prompt bbox from `owl_2dbbs.csv`. Catches `K` / pose / 3D-position inconsistency (projected box misses where OWL saw it).
- C. `depth_gap`: camera-frame Z of the box's 8 corners vs. the `q10..q90` of SDP Z inside the OWL 2D bbox footprint.

Each metric is orthogonal: one can pass and the others fail. The table in the docstring of `diagnose_boxer_output.py` lists which failure modes each detects, so users know which to consult for which symptom.

Use cases

- Debugging `K` / pose inputs: are my inputs self-consistent? (B)
- Dropping clearly-bad per-frame boxes before `fuse_obbs_from_csv` clusters them, producing a cleaner scene graph. (`--filter`)

CLI
Modeled on `view_fusion.py` / `view_tracker.py`:

```bash
python diagnose_boxer_output.py --input nym10_gen1
python diagnose_boxer_output.py --input nym10_gen1 --filter
python diagnose_boxer_output.py --input nym10_gen1 --filter \
--max_dist_to_cloud 0.5 --min_iou2d 0.1 --max_depth_gap 0.5
```
Outputs (under `<output_dir>/<seq>/`):

- `diagnose_by_box.csv`: per-box metrics
- `diagnose_summary.json`: aggregate distributions (mean, median, p90, p99, max)
- `<write_name>_3dbbs_filtered.csv`: bad boxes dropped (only with `--filter`)

The filtered CSV has the same schema as `<write_name>_3dbbs.csv`, so it drops into `utils.fuse_3d_boxes.fuse_obbs_from_csv` unchanged.

Implementation notes
- Uses `build_seq_ctx` to pull pose / K / SDP per dataset, so Aria and CA-1M work out-of-the-box. ScanNet falls back to NaN for A/C when no pre-scanned cloud is in the context (B still works).
- Applies the `thresh_3d` filter by matching on label + nearest-projected-center (the same heuristic a human would use).

Backward compatibility
Purely additive — no existing file modified except the README, which gets a new `Demo #3.5` section between Offline Fusion (#3) and Online Tracker (#4).
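The label + nearest-projected-center match-back in the implementation notes could look roughly like this; the function name and the detection-tuple layout are illustrative, not the script's actual API:

```python
import math

def match_box(label, center_uv, detections):
    """Re-associate a box with its source detection: same label,
    nearest projected center. `detections` is a list of
    (label, (u, v)) tuples, e.g. parsed from the 2D bbox CSV."""
    same = [(i, math.hypot(center_uv[0] - u, center_uv[1] - v))
            for i, (lbl, (u, v)) in enumerate(detections) if lbl == label]
    if not same:
        return None                       # no same-label detection at all
    return min(same, key=lambda t: t[1])[0]

dets = [("chair", (100.0, 120.0)), ("table", (300.0, 200.0)),
        ("chair", (105.0, 118.0))]
match_box("chair", (104.0, 119.0), dets)  # -> 2 (the nearer chair)
```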
Open questions for the reviewer
- Top-level placement next to `view_fusion.py` / `view_tracker.py`. If you'd prefer `tools/` or `scripts/`, say the word.
- No pre-scanned cloud comes out of `build_seq_ctx` for ScanNet: should I add a path that loads it, or is metric-B-only output enough for the ScanNet case?
- Taking `--input <seq>` like `view_fusion.py` felt most natural; some users might prefer accepting explicit CSV/PLY paths for portability.

Context / provenance
This originated as `pipeline/diagnostic.py` in a downstream VGGT-SLAM → Boxer adapter for iPhone video (https://github.com/zzhang001/box). The metrics caught a per-submap scale-drift issue there that wasn't visible in 3D-only inspection, so I figured it'd be useful for other Boxer users hitting similar "is my custom input OK?" questions. Related upstream issue on aspect-ratio handling: #6.