A deep line-feature matcher with a novel Expectation–Maximization (EM) assignment head, built on a frozen DeepLSD detector front-end and designed to feed a Structure-from-Motion (SfM) pipeline.
Most learned matchers (SuperGlue, LightGlue, GlueStick) end in an optimal-transport / dual-softmax assignment layer driven by appearance alone, deferring geometry to a downstream RANSAC. deep-lfm replaces that head with a differentiable EM module that alternates between estimating soft match responsibilities (E-step) and a two-view geometric model from points sampled along the lines (M-step). The converged responsibilities give calibrated, geometrically-consistent confidence — exactly what SfM needs — and the estimated geometry is a free two-view initializer.
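The E-step/M-step alternation can be illustrated with a deliberately small toy. The sketch below is not the repository's EM head. It swaps the two-view epipolar model for a plain 2D translation, omits the inner Sinkhorn normalization, and uses hypothetical names (`em_match`, `outlier_lik`, `sigma`). It only shows how responsibilities and geometry can refine each other when appearance is uninformative.

```python
import math

def em_match(pts_a, pts_b, appearance, sigma=1.0, outlier_lik=0.05, iters=5):
    """Toy EM loop: soft matches (E-step) <-> 2D translation (M-step).

    Hypothetical illustration only. A translation stands in for the
    two-view geometric model, and there is no Sinkhorn normalization.
    """
    t = (0.0, 0.0)  # initial geometry guess
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each candidate in B (plus a final
        # "no match" column) for every point in A, given the current t.
        resp = []
        for i, (ax, ay) in enumerate(pts_a):
            row = []
            for j, (bx, by) in enumerate(pts_b):
                d2 = (ax + t[0] - bx) ** 2 + (ay + t[1] - by) ** 2
                row.append(math.exp(appearance[i][j] - d2 / (2.0 * sigma ** 2)))
            row.append(outlier_lik)  # outlier / no-match state
            z = sum(row)
            resp.append([v / z for v in row])
        # M-step: responsibility-weighted mean displacement.
        wsum = sx = sy = 0.0
        for i, (ax, ay) in enumerate(pts_a):
            for j, (bx, by) in enumerate(pts_b):
                w = resp[i][j]
                wsum += w
                sx += w * (bx - ax)
                sy += w * (by - ay)
        t = (sx / wsum, sy / wsum)
    return resp, t

# Four points shifted by (0.5, -0.3), plus one point in A with no partner.
pts_a = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (5.0, 5.0), (9.0, -9.0)]
pts_b = [(x + 0.5, y - 0.3) for x, y in pts_a[:4]]
flat = [[0.0] * len(pts_b) for _ in pts_a]  # appearance says nothing
resp, t = em_match(pts_a, pts_b, flat)
```

With flat appearance scores, the first four rows of `resp` should peak on their true partner, the last row should fall almost entirely into the no-match column, and `t` should land close to the true shift.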
Research repository, trained from scratch. See
docs/DESIGN.md (design), docs/ROADMAP.md (phased status), docs/RESULTS.md (validated results), and docs/SETUP.md (real-data onboarding).
Phases 0–5 implemented and tested (38 tests). The full pipeline — detector → encoder →
attention backbone → EM head → eval — runs end-to-end with a mock detector and synthetic
data. Real-data training needs MegaDepth/ScanNet + DeepLSD weights wired in — follow
docs/SETUP.md.
Headline (training-free ablation, isolating the head): under match-ambiguous appearance,
OT F1 0.41 → EM 1.00, and the EM head returns a relative pose for free. See docs/RESULTS.md.
pip install -e ".[dev,train,viz]"
python -m pytest -q # 38 tests
python scripts/train.py --config dev # synthetic + mock detector
python scripts/eval.py --mode ablation # OT-vs-EM table
- DeepLSD as a frozen line detector (optionally fine-tuned later). It provides geometry only; descriptors are learned here.
- Lines represented by K points sampled along the segment, pooled from a learned dense feature map — robust to fragmentation and the substrate for epipolar geometric scoring.
- Attention backbone (GlueStick-style self/cross attention) → context-aware line descriptors.
- EM assignment head: E-step responsibilities mix appearance + epipolar likelihood + an outlier/no-match state (with inner Sinkhorn for the one-to-one constraint); M-step re-estimates geometry by responsibility-weighted, differentiable solving. Unrolled and end-to-end trainable.
- SfM-aware outputs: confidence, multi-view tracks, two-view geometry, COLMAP-compatible export.
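As a rough illustration of the point-sampled line representation and the epipolar scoring mentioned in the list above, the sketch below samples K evenly spaced points along a segment and scores a candidate match by the mean point-to-epipolar-line distance under a given fundamental matrix. The function names, the even spacing, and the point-by-point pairing of samples are assumptions made for illustration; the repository's actual sampling and likelihood may differ.

```python
import math

def sample_points(p0, p1, k):
    """Return k evenly spaced points from p0 to p1 (inclusive), k >= 2."""
    return [
        (p0[0] + (p1[0] - p0[0]) * i / (k - 1),
         p0[1] + (p1[1] - p0[1]) * i / (k - 1))
        for i in range(k)
    ]

def epipolar_distance(F, x, x_prime):
    """Distance from x_prime (image B) to the epipolar line F @ x of x (image A)."""
    h = (x[0], x[1], 1.0)  # homogeneous coordinates
    a, b, c = (sum(F[r][m] * h[m] for m in range(3)) for r in range(3))
    return abs(a * x_prime[0] + b * x_prime[1] + c) / math.hypot(a, b)

def line_epipolar_score(F, seg_a, seg_b, k=5):
    """Mean epipolar distance over k paired samples; lower is more consistent."""
    pts_a = sample_points(seg_a[0], seg_a[1], k)
    pts_b = sample_points(seg_b[0], seg_b[1], k)
    return sum(epipolar_distance(F, p, q) for p, q in zip(pts_a, pts_b)) / k

# Fundamental matrix of a pure horizontal camera shift (rectified pair):
# epipolar lines are horizontal, so a correct match keeps the same y.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
seg_a = ((0.0, 0.0), (4.0, 2.0))
good = ((-1.0, 0.0), (3.0, 2.0))   # shifted along x only
bad = ((0.0, 1.0), (4.0, 3.0))     # shifted along y by 1
```

Here `line_epipolar_score(F, seg_a, good)` should be near zero and the score for `bad` near one pixel. Pairing samples by index assumes both segments cover the same extent of the underlying line, which fragmented detections will not satisfy in general.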
Ordered by expected payoff for closing the held-out gap to GlueStick (ETH3D AP
72.6; we are at 33.6 after homography pretraining). See docs/RESULTS.md for
the data behind each.
- Close the domain gap (biggest lever). The model overfits by ~10k steps on held-out
ETH3D because training is ScanNet-indoor only. Train/fine-tune on MegaDepth
(outdoor, matches ETH3D): fill in the stubbed build_manifest in
data/megadepth.py (COLMAP poses + .h5 depth); ~199 GB download.
- Harder homography pretext. The current pretext saturates by ~1k steps (F1 0.99) yet still gave +2–3 AP. Stronger warps + photometric augmentation and a larger/more diverse image set (COCO train, Oxford-Paris) should lift AP further.
- Pretrained descriptors. Bootstrap the line encoder from a pretrained dense backbone (SuperPoint/DISK) instead of learning the CNN from scratch.
- Fix overfitting directly. Augmentation, weight-decay/dropout sweep, early-stop at the held-out peak (~10k on full ScanNet).
- Matched-protocol GlueStick comparison. Run GlueStick through our ETH3D harness (or ours under glue-factory's protocol) for a true head-to-head — current AP is indicative only (different detector/GT pipeline).
- Robust M-step (RANSAC/IRLS). Pose AUC collapses under heavy mismatching
(no robust estimator today); see Limitations in docs/RESULTS.md.
- Orchestration resilience. The overnight multi-stage loop's wakeup chain broke after the fine-tune; prefer a single driver script for long pipelines.
- (Stretch) Joint points+lines. GlueStick's core strength; large scope and partially dilutes the EM-on-lines thesis — evaluate before committing.
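For the robust M-step item in the list above, one common option is iteratively reweighted least squares (IRLS). The sketch below is a toy under loud assumptions: it estimates a 2D translation rather than a relative pose, uses Cauchy weights with a hand-picked scale `c`, and all names are hypothetical. It only shows how reweighting can suppress gross mismatches that drag a plain weighted mean away.

```python
import math

def irls_translation(disps, prior=None, c=0.5, iters=10):
    """Robustly estimate a 2D shift from displacement vectors via IRLS.

    prior: optional per-sample weights (e.g. match responsibilities).
    Cauchy weight: w = 1 / (1 + (r / c)^2), with r the residual norm.
    """
    prior = prior or [1.0] * len(disps)
    w = list(prior)  # first pass is an ordinary weighted mean
    t = (0.0, 0.0)
    for _ in range(iters):
        wsum = sum(w)
        t = (sum(wi * d[0] for wi, d in zip(w, disps)) / wsum,
             sum(wi * d[1] for wi, d in zip(w, disps)) / wsum)
        # Down-weight samples that disagree with the current estimate.
        w = [p / (1.0 + (math.hypot(d[0] - t[0], d[1] - t[1]) / c) ** 2)
             for p, d in zip(prior, disps)]
    return t

inliers = [(0.48, -0.30), (0.52, -0.30), (0.50, -0.28), (0.50, -0.32),
           (0.49, -0.31), (0.51, -0.29), (0.50, -0.30), (0.50, -0.30)]
outliers = [(10.0, 10.0), (-8.0, 6.0)]  # gross mismatches
disps = inliers + outliers

# Unweighted mean for comparison.
plain = (sum(d[0] for d in disps) / len(disps),
         sum(d[1] for d in disps) / len(disps))
robust = irls_translation(disps)
```

On this data the plain mean is pulled far from the inlier shift of roughly (0.5, -0.3), while the IRLS estimate should stay close to it. A real M-step would replace the weighted mean with a weighted pose or fundamental-matrix solve.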
MIT (this repo). DeepLSD and any pretrained weights are governed by their own licenses — see docs/RELATED.md.