vis_nav_AI4CE

Autonomous topological navigation in a maze simulator using CosPlace + SuperGlue recall-and-verify, with real-time A* re-planning. Toggle autopilot mid-game.

An autonomous player for the ai4ce/vis_nav_game maze simulator. The agent drives itself to a target image using three pieces: a topological map built from its own exploration footage, two pretrained vision models working as a recall-and-verify pair, and an A* planner that re-runs every frame. The a key toggles autopilot during navigation; once it is on, localization, planning, and steering all run automatically in real time.

This is the working version used in our demo. Drop your exploration data in data/images_subsample/, press a, and watch the player solve the maze.

Demo

Sped-up demo (~2 min): https://youtu.be/3dnE2oHW1Zc
Full team demo video: https://youtu.be/shXDClZoEfM

What's in here

An autonomous mode you can toggle mid-game. Press a during the navigation phase and the player takes over. Press a again to take it back. State resets cleanly on each toggle so you can hand control back and forth.

A real-time debug overlay. Two extra OpenCV windows open in autopilot: Auto: Current Location (the database frame the player thinks it is at) and Auto: Next Target (the database frame it is steering toward). You can watch the planner re-route in real time when localization jumps.

Inspection keys. While the game is running:

| Key | What it does |
| --- | --- |
| `↑ ↓ ← →` | Manual movement (forward, backward, turn left, turn right) |
| `Space` | `CHECKIN` (claim you have reached the target) |
| `Esc` | `QUIT` the current phase |
| `a` | Toggle autopilot (navigation phase only) |
| `g` | Show the current target image |
| `q` | Show the database frame closest to your current FPV |
| `l` | Print a loop-closure summary to the terminal |
| `p` | Recompute and visualize the A* path |

Two pretrained models, used for what each is good at. CosPlace (ResNet-18, 512-D) gives fast top-K recall over the database. SuperPoint + SuperGlue verifies each candidate with keypoint matching and an essential-matrix inlier count. CosPlace alone is fast but aliases on repeated wall textures. SuperGlue alone is too slow to run against the whole database every frame. Together they are both fast and precise.

A topological graph with geometry-verified shortcuts. The map is a networkx graph over exploration frames. Every consecutive pair gets a sequence edge so there is always some path. On top of that, every node gets up to five extra edges to its CosPlace nearest neighbours, but only if SuperGlue returns at least 50 inliers between the two views. Those verified shortcuts are what let the planner take diagonal hops across the maze instead of replaying the recording in reverse.

Idle compression. When two consecutive frames have nearly identical descriptors (the player was sitting still during exploration), the edge between them gets weight 0.01 instead of 1.0. A* steps through those for free, so idle gaps in the recording do not bloat the planned path.
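A minimal sketch of the sequence-edge weighting, assuming the descriptors are rows of an L2-normalized NumPy array and that "nearly identical" means a Euclidean distance below a threshold. `IDLE_THRESHOLD` is a hypothetical value (the README does not give the real one), and `sequence_edges` is an illustrative name, not necessarily the player's function. It returns a plain `(u, v, weight)` edge list.

```python
import numpy as np

# Hypothetical cutoff: the README does not state the real idle threshold.
IDLE_THRESHOLD = 0.05
IDLE_WEIGHT = 0.01   # near-duplicate consecutive frames (player sat still)
STEP_WEIGHT = 1.0    # ordinary consecutive frames


def sequence_edges(descs: np.ndarray, idle_threshold: float = IDLE_THRESHOLD):
    """Return (i, i + 1, weight) for every consecutive pair of frames."""
    edges = []
    for i in range(len(descs) - 1):
        # A near-zero distance between consecutive descriptors marks an idle gap.
        dist = float(np.linalg.norm(descs[i] - descs[i + 1]))
        weight = IDLE_WEIGHT if dist < idle_threshold else STEP_WEIGHT
        edges.append((i, i + 1, weight))
    return edges


if __name__ == "__main__":
    d = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(sequence_edges(d))  # [(0, 1, 0.01), (1, 2, 1.0)]
```

With networkx, `G.add_weighted_edges_from(sequence_edges(descs))` stores each weight under the attribute name `weight`, which is the attribute the planner reads.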

Multi-crop descriptors. Each frame's CosPlace embedding is the average of three crops (full, left half, right half), L2-normalized after averaging. This tolerates small left/right viewpoint shifts so the agent does not have to be perfectly centered on a stored frame to localize.
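The averaging-then-normalizing order can be sketched as follows. `embed` is a hypothetical stand-in for the CosPlace forward pass (including whatever resizing the real model needs); only the crop layout (full, left half, right half) and the "normalize after averaging" rule come from the text above.

```python
from typing import Callable

import numpy as np


def multicrop_descriptor(img: np.ndarray,
                         embed: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Average the embeddings of three crops, then L2-normalize the mean.

    img: H x W x C image array.
    embed: hypothetical stand-in for CosPlace; maps a crop to a 1-D vector.
    """
    w = img.shape[1]
    crops = [img, img[:, : w // 2], img[:, w // 2:]]  # full, left half, right half
    mean = np.mean([embed(c) for c in crops], axis=0)
    # Normalize *after* averaging: the mean of unit vectors is generally
    # shorter than 1, so normalizing only the inputs would not give a unit
    # descriptor and would break the Euclidean/cosine equivalence.
    return mean / np.linalg.norm(mean)
```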

Graceful degradation. If SuperGlue is not installed, the player detects this at import time, logs a warning, and runs in CosPlace-only mode. Localization gets less precise, geometry-verified edges drop out of the graph, but the rest of the pipeline still works end-to-end.
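One way to implement the import-time check, as a sketch. The two candidate directories are a hypothetical reading of "next to this repo or one directory up" from the Setup section; `Matching` from `models/matching.py` is the entry point the upstream SuperGluePretrainedNetwork repository exposes. `try_import_superglue` is an illustrative name.

```python
import logging
import sys
from pathlib import Path

log = logging.getLogger(__name__)
SUPERGLUE_DIR = "SuperGluePretrainedNetwork"


def try_import_superglue(repo_root: Path):
    """Return SuperGlue's Matching class, or None to run CosPlace-only."""
    # Hypothetical search order: a sibling of this repo, then one level higher.
    for cand in (repo_root.parent / SUPERGLUE_DIR,
                 repo_root.parent.parent / SUPERGLUE_DIR):
        if cand.is_dir() and str(cand) not in sys.path:
            sys.path.insert(0, str(cand))
    try:
        from models.matching import Matching
    except ImportError:  # also catches a missing torch dependency
        log.warning("SuperGlue not found; running in CosPlace-only mode.")
        return None
    return Matching
```

Callers then branch on `Matching is None` once, rather than wrapping every verification call in its own try/except.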

A stuck detector. If the localized index does not change for ten frames in a row, the autopilot forces a LEFT turn to break symmetry. This stops the agent from oscillating in front of a dead-end view that keeps localizing to the same node.
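A minimal sketch of the stuck detector, assuming "does not change for ten frames in a row" means ten consecutive ticks localize to the same index. Resetting the counter after it fires, and the `reset()` hook for autopilot toggles, are assumptions for illustration.

```python
from typing import Optional


class StuckDetector:
    """Signal a forced LEFT turn when localization sits on one node too long."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.last_idx: Optional[int] = None
        self.count = 0

    def reset(self) -> None:
        """Clear state, e.g. when autopilot is toggled (assumption)."""
        self.last_idx, self.count = None, 0

    def update(self, idx: int) -> bool:
        """Record this tick's localized index; return True if stuck."""
        if idx == self.last_idx:
            self.count += 1
        else:
            self.last_idx, self.count = idx, 1
        if self.count >= self.patience:
            self.count = 0  # give the forced turn a fresh window to take effect
            return True
        return False
```

In the autopilot loop, a `True` from `update` would short-circuit steering and return the game's LEFT action for that tick.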

How it works

The player factors a hard metric navigation problem into three easier sub-problems and uses well-tested CV primitives for each.

Stage 1: Build the map (`pre_nav_compute`)

When the exploration phase ends, the framework calls pre_navigation(). The player loads CosPlace, scans data/images_subsample/ in natural sort order, and computes a 512-D descriptor for every image. Those go into a (N, 512) matrix and a BallTree for fast K-NN lookup. Because the descriptors are L2-normalized, Euclidean distance in the tree is monotonic in cosine distance, so top-K BallTree retrieval is equivalent to top-K cosine NN. SuperPoint and SuperGlue load lazily with the indoor weights.
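The equivalence follows from the identity ‖a − b‖² = 2 − 2 a·b for unit vectors: a smaller Euclidean distance always means a larger cosine similarity. The check below uses random unit vectors rather than real CosPlace descriptors and brute-force search instead of a BallTree; both helper names are illustrative.

```python
import numpy as np


def topk_euclidean(db: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows of db closest to q in Euclidean distance."""
    d = np.linalg.norm(db - q, axis=1)
    return np.argsort(d, kind="stable")[:k]


def topk_cosine(db: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows of db with the highest cosine similarity to q."""
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims, kind="stable")[:k]


rng = np.random.default_rng(0)
db = rng.normal(size=(200, 512))
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit rows, like the (N, 512) matrix
q = rng.normal(size=512)
q /= np.linalg.norm(q)
print(topk_euclidean(db, q, 5), topk_cosine(db, q, 5))  # same five indices
```

With scikit-learn, `BallTree(db).query(q[None, :], k=5)` returns `(distances, indices)` with the indices in the same nearest-first order.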

Stage 2: Build the graph (`build_graph_structure`)

Two kinds of edges go in:

Sequence edges. Every pair (i, i+1) of consecutive frames gets an edge. Idle-streak pairs get weight 0.01; everything else gets weight 1.0. This guarantees a spanning path through the graph regardless of what the verifier accepts later.
CosPlace KNN edges. For each node, look up its five CosPlace nearest neighbours. If the descriptor distance is under 0.4 and SuperGlue returns at least 50 inliers between the two views, add the edge. Otherwise skip it. A pairwise cache makes sure each (i, j) pair only goes through SuperGlue once.
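A sketch of the gated shortcut edges with a pairwise cache, using the thresholds stated above (five neighbours, distance under 0.4, at least 50 inliers). `knn` and `count_inliers` are hypothetical inputs: a precomputed neighbour list (for example from a BallTree query) and a stand-in for the SuperGlue + essential-matrix check. The shortcut edge weight of 1.0 is an assumption; the README does not state it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

K = 5              # CosPlace neighbours considered per node
MAX_DIST = 0.4     # descriptor-distance gate
MIN_INLIERS = 50   # SuperGlue inlier gate


def shortcut_edges(
    knn: Sequence[Sequence[Tuple[int, float]]],
    count_inliers: Callable[[int, int], int],
    shortcut_weight: float = 1.0,  # assumption: not specified in the README
) -> List[Tuple[int, int, float]]:
    """Keep a KNN edge only if both the descriptor and geometry checks pass.

    knn[i]: (j, distance) pairs for node i, nearest first (may include i itself).
    count_inliers(i, j): hypothetical verifier returning the inlier count.
    """
    cache: Dict[Tuple[int, int], int] = {}
    accepted: Dict[Tuple[int, int], float] = {}
    for i, neighbours in enumerate(knn):
        for j, dist in list(neighbours)[:K]:
            if j == i or dist >= MAX_DIST:
                continue
            key = (min(i, j), max(i, j))  # (i, j) and (j, i) share one cache entry
            if key not in cache:
                cache[key] = count_inliers(*key)  # SuperGlue runs once per pair
            if cache[key] >= MIN_INLIERS:
                accepted[key] = shortcut_weight
    return [(i, j, w) for (i, j), w in sorted(accepted.items())]
```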

Stage 3: Pin the goal

The framework hands the player four target images (front, right, back, left of the goal). The player runs the front view through the same recall-and-verify pipeline used for self-localization and stores the matched database index as self.goal. From this point on, navigation is "reach node self.goal in graph G" rather than free-form metric navigation. That reduction is the trick that makes a topological player solve a metric maze.
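The recall-and-verify step shared by goal pinning and self-localization can be sketched as a re-ranking of CosPlace candidates by inlier count, with the top-5 shortlist and 30-inlier fallback described in Stage 4. `count_inliers` is a hypothetical stand-in for SuperGlue matching between the query view and one database frame; treating "clears 30 inliers" as `>= 30` is an assumption.

```python
from typing import Callable, Sequence

TOP_K = 5
MIN_VERIFY_INLIERS = 30


def localize(candidates: Sequence[int],
             count_inliers: Callable[[int], int],
             min_inliers: int = MIN_VERIFY_INLIERS) -> int:
    """Re-rank CosPlace candidates (best first) by SuperGlue inliers."""
    shortlist = list(candidates)[:TOP_K]
    best_idx, best_inliers = shortlist[0], -1
    for idx in shortlist:
        n = count_inliers(idx)
        if n > best_inliers:
            best_idx, best_inliers = idx, n
    # No candidate is geometrically convincing: trust CosPlace's top-1.
    return best_idx if best_inliers >= min_inliers else shortlist[0]
```

For goal pinning, the same call runs once on the front target view and its result becomes `self.goal`.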

Stage 4: Drive (`auto_act`)

Every frame, while autopilot is on:

Localize. CosPlace returns the top 5 candidates from the database; SuperGlue picks the one with the most inliers. If no candidate clears 30 inliers, fall back to CosPlace top-1.
Check for arrival. If the localized index is within one of the goal index, return CHECKIN.
Plan. nx.astar_path(G, current_idx, goal, weight="weight"). This re-runs every frame, so any localization jump self-corrects on the next planning step.
Step. Take path[1] as the next waypoint.
Steer. Match SuperGlue keypoints between the current FPV and the waypoint image, take the horizontal centroid of the matched keypoints in the FPV, and:
- centroid left of center by 30 px → LEFT
- centroid right of center by 30 px → RIGHT
- otherwise → FORWARD
- fewer than 8 matches → FORWARD (don't spin blindly).
Detect stuck. If the localized index does not change for ten frames, force a LEFT to escape symmetry.
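The arrival check and planning step can be sketched without networkx. Since the call above passes no heuristic, `nx.astar_path` uses a zero heuristic and behaves like Dijkstra, which is what this stdlib version implements. The adjacency-dict graph and the string `"CHECKIN"` are stand-ins for the networkx graph and the game's action type.

```python
import heapq
from typing import Dict, List, Optional, Tuple, Union

Graph = Dict[int, List[Tuple[int, float]]]


def to_adjacency(edges: List[Tuple[int, int, float]]) -> Graph:
    """Build an undirected adjacency dict from (u, v, weight) edges."""
    adj: Graph = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def shortest_path(adj: Graph, src: int, dst: int) -> Optional[List[int]]:
    """Dijkstra, i.e. A* with a zero heuristic."""
    dist = {src: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None


def plan_step(adj: Graph, current: int, goal: int) -> Union[str, int, None]:
    """CHECKIN within one index of the goal, else the next waypoint (path[1])."""
    if abs(current - goal) <= 1:
        return "CHECKIN"
    path = shortest_path(adj, current, goal)
    return path[1] if path and len(path) > 1 else None
```

Because `plan_step` is called every tick from whatever node localization reports, a bad localization only costs one step before the next plan corrects it.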

The steering rule deliberately ignores essential-matrix yaw and CosPlace score even though both are available. We tried richer estimators (align_step_to_next, geometric_servo_step) and the centroid rule won on the narrow-FOV camera. Simpler controllers degrade more gracefully when matches get noisy.
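A sketch of the centroid rule with the 30 px dead band and 8-match floor from Stage 4. `matched_x` holds the FPV x-coordinates of keypoints SuperGlue matched to the waypoint image; the returned strings stand in for the game's action values, and strict inequalities at the dead-band edges are an assumption.

```python
import numpy as np

DEADBAND_PX = 30
MIN_MATCHES = 8


def steer(matched_x: np.ndarray, image_width: int) -> str:
    """Turn toward the side of the FPV where the waypoint's features appear."""
    if len(matched_x) < MIN_MATCHES:
        return "FORWARD"  # too few matches: don't spin blindly
    offset = float(np.mean(matched_x)) - image_width / 2
    if offset < -DEADBAND_PX:
        return "LEFT"
    if offset > DEADBAND_PX:
        return "RIGHT"
    return "FORWARD"
```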

BreadCrumbs

A few details that took a while to get right and would silently break the pipeline if missed:

cv2.recoverPose returns a {0, 255} mask, not {0, 1}. The inlier count uses (mask_pose > 0).sum() instead of mask_pose.sum(), so counts are not 255× inflated and the inlier threshold actually means what you think it means.
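Image filenames are sorted with natsorted, not Python's default string sort, so 9.jpg < 10.jpg < 100.jpg instead of the lexicographic 10.jpg < 100.jpg < 9.jpg, which would scramble the sequence edges. Zero-padded names like 0009.jpg sort correctly either way; natsorted also keeps the order right when padding is missing or inconsistent.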
CosPlace descriptors are L2-normalized after the multi-crop average, not before, so the BallTree's Euclidean distance stays monotonic in cosine distance.
Models load lazily (CosPlace and SuperGlue both). A manual-play session does not pay the GPU cost, and the SuperGlue path is conditional on the import succeeding.
Pygame's QUIT event is checked before any autopilot logic, so closing the window always works even when the player is mid-step.
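Two of these pitfalls are easy to reproduce without OpenCV or natsort. The array below simulates a `recoverPose` output mask with 255 marking inliers, and `natural_key` is an illustrative stdlib substitute for `natsort.natsorted`, not the player's code.

```python
import re

import numpy as np

# 1) Inlier counting on a simulated recoverPose mask (255 = inlier).
mask_pose = np.array([[255], [0], [255], [255]], dtype=np.uint8)
inflated = int(mask_pose.sum())       # 765: each inlier counted as 255
inliers = int((mask_pose > 0).sum())  # 3: one per inlier


# 2) Natural sort: digit runs compare as integers, not character by character.
def natural_key(name: str):
    """Split name into text and integer chunks for sorting."""
    return [int(tok) if tok.isdecimal() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


files = ["10.jpg", "9.jpg", "100.jpg"]
lexicographic = sorted(files)             # ['10.jpg', '100.jpg', '9.jpg']
natural = sorted(files, key=natural_key)  # ['9.jpg', '10.jpg', '100.jpg']
```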

Architecture

Two diagrams: what runs once when exploration ends, and what runs every game tick during navigation. Editable source: docs/architecture.drawio (open in draw.io).

Build the map (once, when EXPLORATION ends):

Per-frame autopilot loop (every tick during NAVIGATION):

Repository layout

.
├── baseline_lv1.py          the player (single file, ~2,500 lines)
├── player.py                upstream keyboard player, unmodified
├── environment.yaml         conda environment
├── requirements.txt         pip-only fallback
└── docs/
    ├── architecture.svg     rendered architecture
    ├── architecture.drawio  editable source
    └── lv1_demo.gif         demo loop

Setup

conda env create -f environment.yaml
conda activate vis_nav
pip install git+https://github.com/ai4ce/vis_nav_game_public.git

For the optional SuperGlue verifier, clone SuperGluePretrainedNetwork either next to this repo or one directory up. The player searches both locations at import time and falls back to CosPlace-only if neither exists.

Running

python baseline_lv1.py

The first time you run, the framework starts in the exploration phase. Drive the maze manually with the arrow keys. When you press Esc to end exploration, the player builds the descriptor database and the graph (this takes a minute on a GPU; longer on CPU), then the navigation phase begins. Press a to hand over to the autopilot.

Place your exploration data under data/images_subsample/ (frames named 0001.jpg, 0002.jpg, …) and your startup.json at the repo root before running. Both are gitignored.

Acknowledgements

ai4ce/vis_nav_game for the simulator and the baseline scaffolding.
CosPlace (Berton et al.) for the place-recognition descriptors.
SuperPoint + SuperGlue (Magic Leap) for the keypoint extractor and matcher.
The AI4CE course staff and our teammates.

Author

Nishant Pushparaju · nishantpushparaju@gmail.com · github.com/Nishant-ZFYII
