Navigate a robot through a maze using only camera images — no map, GPS, or odometry required.
Built for the Robot Vision course (ROB-GY 6203) at NYU. The system builds a topological graph offline from images collected during exploration. It then navigates in real time by matching the robot's live camera feed against the graph using learned visual features.
Phase 1 — Graph Building (Offline)
- Extract 512-D global descriptors from exploration images using CosPlace (ResNet backbone)
- Build a BallTree index for fast nearest-neighbor retrieval (~1-2ms per query)
- For each image, query top-5 neighbors by CosPlace distance
- Verify edges with SuperPoint keypoint detection + SuperGlue feature matching
- Accept edges only when BOTH appearance similarity (distance < 0.25) AND geometric consistency (≥80 RANSAC inliers) are satisfied
- Detect loop closures via temporal (≥50-frame gap) and spatial (≤5-hop proximity) checks
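The edge-acceptance rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's code. Descriptors are assumed to be L2-normalized and compared by cosine distance. A brute-force distance matrix stands in for the BallTree. `count_inliers` is a hypothetical callback standing in for SuperPoint + SuperGlue matching followed by RANSAC. The names `build_edges`, `max_desc_dist`, `min_inliers` and `min_frame_gap` are illustrative. Only the temporal loop-closure criterion is modeled; the hop-based spatial check is left out.

```python
from typing import Callable, Set, Tuple

import numpy as np

Edge = Tuple[int, int]


def build_edges(
    desc: np.ndarray,
    count_inliers: Callable[[int, int], int],
    k: int = 5,
    max_desc_dist: float = 0.25,
    min_inliers: int = 80,
    min_frame_gap: int = 50,
) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (edges, loop_closures) for frames indexed in capture order.

    desc: (N, D) global descriptors, one row per exploration frame.
    count_inliers: hypothetical stand-in for SuperPoint + SuperGlue matching
        plus RANSAC; returns the geometric inlier count for frames i and j.
    """
    # L2-normalize so that cosine distance = 1 - dot product (assumed metric).
    d = desc / np.linalg.norm(desc, axis=1, keepdims=True)
    dist = 1.0 - d @ d.T  # brute-force stand-in for the BallTree query
    np.fill_diagonal(dist, np.inf)  # a frame is never its own neighbor

    edges: Set[Edge] = set()
    loops: Set[Edge] = set()
    for i in range(len(d)):
        # Top-k appearance neighbors of frame i.
        for j in map(int, np.argsort(dist[i])[:k]):
            edge = (min(i, j), max(i, j))
            if edge in edges:
                continue
            # Dual check: appearance AND geometry must both pass.
            # The cheap distance test short-circuits the expensive matcher.
            if dist[i, j] < max_desc_dist and count_inliers(i, j) >= min_inliers:
                edges.add(edge)
                # Temporal loop-closure criterion: endpoints far apart in time.
                if edge[1] - edge[0] >= min_frame_gap:
                    loops.add(edge)
    return edges, loops
```

For a real exploration set, `sklearn.neighbors.BallTree(desc).query(desc, k=6)` yields the same neighbor ranking for L2-normalized descriptors, because Euclidean and cosine distance order them identically. It does this without materializing the full N×N matrix. The returned values are Euclidean distances, so any threshold has to be expressed in those units.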
Phase 2 — Navigation (Online)
- Localize: Match the live FPV frame against the graph using CosPlace retrieval + SuperGlue verification
- Plan: Run A* to find the shortest path from the current node to the goal
- Execute: Move toward the next waypoint, re-localizing and re-planning every frame
- Recover: Stuck detection (same location for >10 frames) triggers alternating random turns; search mode rotates up to 6 times to find the target
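The plan and recover steps can be illustrated with the standard library alone. This sketch assumes the graph is an adjacency dict of edge costs, with undirected edges stored in both directions. It uses a zero heuristic, which makes A* behave like Dijkstra. That is a safe default when nodes carry no metric coordinates, but it is an assumption, not necessarily what `autonomous_navigator.py` does. `StuckRecovery`, its `patience` and `max_turn_steps` parameters, and the random turn-step range are hypothetical.

```python
import heapq
import itertools
import random
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: edge cost}


def astar(
    graph: Graph,
    start: Hashable,
    goal: Hashable,
    h: Callable[[Hashable], float] = lambda n: 0.0,
) -> Optional[List[Hashable]]:
    """Shortest path from start to goal, or None if goal is unreachable."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    frontier = [(h(start), next(tie), 0.0, start)]
    came_from: Dict[Hashable, Optional[Hashable]] = {start: None}
    best = {start: 0.0}
    while frontier:
        _, _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        if g > best[node]:
            continue  # stale entry superseded by a cheaper one
        for nbr, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best.get(nbr, float("inf")):
                best[nbr] = new_g
                came_from[nbr] = node
                heapq.heappush(frontier, (new_g + h(nbr), next(tie), new_g, nbr))
    return None


class StuckRecovery:
    """Hypothetical helper: flags 'stuck' after the same node is reported for
    more than `patience` consecutive frames, alternating the turn direction."""

    def __init__(self, patience: int = 10, max_turn_steps: int = 8,
                 seed: Optional[int] = None):
        self.patience = patience
        self.max_turn_steps = max_turn_steps
        self.rng = random.Random(seed)
        self.last_node: Optional[Hashable] = None
        self.count = 0
        self.next_dir = "left"

    def update(self, node: Hashable) -> Optional[Tuple[str, int]]:
        """Return (direction, random step count) when stuck, else None."""
        if node == self.last_node:
            self.count += 1
        else:
            self.last_node, self.count = node, 1
        if self.count <= self.patience:
            return None
        self.count = 0  # give the recovery turn time to take effect
        direction = self.next_dir
        self.next_dir = "right" if direction == "left" else "left"
        return direction, self.rng.randint(1, self.max_turn_steps)
```

In an online loop of the kind described above, each frame would pass the localized node to `update()` and call `astar()` from that node to the goal. A non-`None` result from `update()` would override the planned move with a recovery turn.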
```
├── autonomous_navigator.py  # Main navigation pipeline (CosPlace + SuperGlue + A*)
├── baseline.py              # Enhanced baseline with graph construction
├── baseline_lv1.py          # Level 1 baseline implementation
├── player.py                # Game interface and keyboard controls
├── environment.yaml         # Conda environment
└── requirements.txt         # Python dependencies
```
```bash
conda env create -f environment.yaml
conda activate game

# Keyboard exploration
python player.py

# Autonomous navigation
python autonomous_navigator.py
```

- Dual verification threshold: A single criterion (appearance or geometry alone) failed frequently. Requiring both eliminates false matches from similar-looking corridors and from accidental geometric alignment.
- Learned descriptors over hand-crafted ones: CosPlace outperformed SIFT+VLAD in both discrimination and speed (1-2 ms vs. 50-100 ms per query).
- Waypoint locking: A 5-hop threshold prevents oscillation between nearby graph nodes during navigation.
- Graph topology tuning: `knn_max_dist=0.30` and `geo_edge_min_inliers=80` balanced connectivity against false edges.
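The README does not spell out how waypoint locking works, so the following is only one plausible reading. It is written as a hypothetical `WaypointLock` helper with a hypothetical `within_hops` BFS check. The next node on the planned path is locked as the waypoint and kept until the robot reaches it, even if localization flickers between neighboring nodes. The lock is released only when the robot localizes more than `lock_hops` (5 here) hops away from the waypoint.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional

Adjacency = Dict[Hashable, List[Hashable]]


def within_hops(adj: Adjacency, src: Hashable, dst: Hashable, limit: int) -> bool:
    """Breadth-first search: is dst reachable from src in at most `limit` hops?"""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == limit:
            continue  # do not expand beyond the hop budget
        for nbr in adj.get(node, ()):
            if nbr == dst:
                return True
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return False


class WaypointLock:
    """Hypothetical waypoint lock with a hop-based release threshold."""

    def __init__(self, adj: Adjacency, lock_hops: int = 5):
        self.adj = adj
        self.lock_hops = lock_hops
        self.waypoint: Optional[Hashable] = None

    def step(self, current: Hashable, path: List[Hashable]) -> Hashable:
        """Return the waypoint to steer toward; `path` starts at `current`."""
        locked = self.waypoint is not None and self.waypoint != current
        if locked and within_hops(self.adj, current, self.waypoint, self.lock_hops):
            return self.waypoint  # ignore small localization jitter
        # Reached the waypoint or drifted too far: lock the next node on the path.
        self.waypoint = path[1] if len(path) > 1 else path[0]
        return self.waypoint
```

The design choice is hysteresis. Keeping a waypoint is cheap and switching it is costly. As a result, small localization errors between adjacent nodes do not flip the steering target every frame, while a large jump (a genuine relocalization) still triggers a fresh waypoint.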
- Successfully navigated to multiple target locations in the maze
- Loop closure detection identified shortcuts during navigation
- Stuck recovery reliably escaped dead ends with alternating turn strategy
- Graph-based A* planning outperformed sequential exploration
Python · PyTorch · CosPlace · SuperPoint + SuperGlue · OpenCV · NetworkX · scikit-learn (BallTree) · NumPy · pygame
Tarunkumar Palanivelan · Vivekananda Swamy Mattam · Nishanth Pushparaju · Leo Kong
