Background:
I'm adapting Boxer to indoor LiDAR scans (pinhole RGB tiles + dense visible point clouds, not Aria). Geometry is verified: the point cloud projects correctly onto the image, and 2D detections look good. I use rotated0=True for Aria compatibility.
Core issue:
On some frames, adding SDP from dense LiDAR causes 3D boxes to shift or get filtered out. Most frames work fine with SDP, but a subset of frames degrade noticeably, while --no_sdp stays stable across all frames.
I suspect sdp_to_patches() assumes sparse Aria MPS feature points. Dense scan points (walls, floors, ceilings) dominate per-patch median depth, giving BoxerNet a misleading depth prior. I tried edge-guided, nearest-depth, and patch-balanced sampling — none improved over simple random downsampling, and all hurt previously good frames.
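To illustrate the suspected failure mode, here is a minimal sketch of a per-patch median-depth aggregation. This is not Boxer's actual sdp_to_patches() (the function name, patch size, and aggregation are assumptions for illustration); it just shows how a handful of dense background points landing in the same patch as a few object feature points pulls the median toward the background depth:

```python
# Hypothetical sketch: why dense scan points can skew a per-patch depth prior.
# Assumes points are already projected to pixel coords (u, v) with depth z.
# The 16-px patch grid and median aggregation are assumptions, not Boxer's code.
import numpy as np

def per_patch_median_depth(uv, z, img_hw, patch=16):
    """Median depth per patch from projected points (illustrative only)."""
    H, W = img_hw
    gh, gw = H // patch, W // patch
    med = np.full((gh, gw), np.nan)
    pi = (uv[:, 1] // patch).astype(int)  # patch row of each point
    pj = (uv[:, 0] // patch).astype(int)  # patch column of each point
    ok = (pi >= 0) & (pi < gh) & (pj >= 0) & (pj < gw)
    for i, j in {(a, b) for a, b in zip(pi[ok], pj[ok])}:
        sel = ok & (pi == i) & (pj == j)
        med[i, j] = np.median(z[sel])
    return med

# Two sparse "object" points at 2 m, twenty dense "wall" points at 5 m,
# all falling in the same patch: the median reports the wall, not the object.
uv = np.array([[2.0, 2.0], [3.0, 3.0]] + [[5.0, 5.0]] * 20)
z = np.array([2.0, 2.0] + [5.0] * 20)
med = per_patch_median_depth(uv, z, (32, 32), patch=16)
```

With sparse Aria MPS feature points, the two 2 m points would likely be the only samples in that patch, so the prior would read 2 m instead of 5 m.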
Questions:
- Does sdp_to_patches() assume Aria-style sparse SDP distribution?
- Any guidance on converting dense scans to Boxer-compatible SDP?
Example of a bad case:

Below is the corresponding point cloud, downsampled to ≤ 3,000 points (the raw scan contains tens of thousands). Increasing the limit to 10,000 points actually makes the 3D predictions worse:
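For reference, the simple random downsampling I compared against is nothing more than a uniform subsample; the 3,000-point cap and the fixed seed below are illustrative values, not Boxer defaults:

```python
# Minimal sketch of the simple random downsampling described above.
# max_points and the rng seed are illustrative assumptions.
import numpy as np

def random_downsample(points, max_points=3000, seed=0):
    """Uniformly subsample a dense (N, 3) point cloud to at most max_points."""
    if len(points) <= max_points:
        return points
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[idx]
```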
