This repo prepares a Jetson vehicle runtime that sends camera frames to a remote RTX 4070 inference service running waynechu/NaVIDA, then publishes safe geometry_msgs/msg/Twist commands to the chassis.
- Jetson camera node publishes /navida/camera/image_raw.
- Jetson controller posts recent camera history + current image + instruction to your remote inference endpoint.
- The 4070 service runs the NaVIDA backend and returns action chunks.
- Jetson executes the first safe action chunk as a bounded /cmd_vel pulse.
- serial_twistctl subscribes to /cmd_vel and writes STM32 serial commands like vcx=0.200,wc=0.800.
The copied chassis bridge lives under ros2_ws/src/sensor_drivers/serial_twistctl, with its local serial dependency in ros2_ws/src/sensor_drivers/serial.
python3 -m pip install -e ".[dev]"
python3 scripts/run_inference_server.py --backend mock --host 127.0.0.1 --port 50051

In another shell:
python3 scripts/run_http_client.py
python3 scripts/run_ros2_node.py
python3 -m pytest -q

Run this on your remote GPU inference host:
git clone https://github.com/Yangbadger222/VLN.git
cd VLN
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
python -m pip install -e ".[server]"
python scripts/run_inference_server.py --backend hf --host 0.0.0.0 --port 50051 --model-id waynechu/NaVIDA --device cuda --load-in-4bit

Health check:
curl http://127.0.0.1:50051/health

Use --backend mock first if CUDA/model dependencies are not ready yet.
Run this on your Jetson vehicle host:
git clone https://github.com/Yangbadger222/VLN.git
cd VLN
python3 -m pip install --user -U "pip>=24" "setuptools>=68,<80" wheel
python3 -m pip install --user --no-build-isolation -e .
sudo apt update
sudo apt install -y python3-opencv
cd ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --symlink-install
source install/setup.bash
ros2 launch navida_vehicle navida_jetson.launch.py \
inference_url:=http://REMOTE_INFERENCE_HOST:50051/v1/infer \
serial_port:=/dev/serial_twistctl \
instruction:="Go to the target object in front of you. Approach slowly and stop when close." \
history_size:=4

camera_device now defaults to auto, which probes /dev/video0 through /dev/video5 and picks the first device that can return a frame. Override it explicitly with camera_device:=/dev/video4 when you already know the correct capture node.
The Jetson-side bridge now JPEG-compresses ROS image frames before posting them to the 4070. It also sends a rolling history of prior compressed frames (history_size, default 4) plus the current frame so NaVIDA can reason over observation history instead of a single still image. inference_timeout_s defaults to 20.0 and can be raised further during first on-car tests.
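The rolling history described above can be sketched with a bounded deque. This is a minimal illustration, not the repo's implementation: the class name `FrameHistory`, the method `build_payload`, and the payload field names (`instruction`, `history`, `image`) are all assumptions, and the frames are taken to be already JPEG-compressed bytes.

```python
import base64
from collections import deque


class FrameHistory:
    """Keeps the most recent JPEG-compressed frames, oldest first (hypothetical helper)."""

    def __init__(self, history_size: int = 4) -> None:
        # deque(maxlen=...) silently drops the oldest frame once it is full.
        self._frames = deque(maxlen=history_size)

    def build_payload(self, current_jpeg: bytes, instruction: str) -> dict:
        """Return a JSON-serializable request body, then remember the current frame."""
        payload = {
            # Field names are assumptions, not the service's documented schema.
            "instruction": instruction,
            "history": [base64.b64encode(f).decode("ascii") for f in self._frames],
            "image": base64.b64encode(current_jpeg).decode("ascii"),
        }
        # The current frame becomes part of the history for the next request.
        self._frames.append(current_jpeg)
        return payload
```

With `history_size=4`, the fifth and later requests carry exactly four prior frames plus the current one, so request size stays bounded no matter how long the vehicle runs.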
Jetson uses ROS 2 Humble's colcon-core, which currently requires setuptools<80. Keep the user-level setuptools>=68,<80 pin above; it supports editable installs without breaking colcon build.
If /dev/serial_twistctl does not exist yet, launch with the actual device, for example serial_port:=/dev/ttyUSB0. If turning is reversed, add angular_z_scale:=-1.0.
- /navida/camera/image_raw: sensor_msgs/msg/Image, published by navida_vehicle camera_publisher.
- /cmd_vel: geometry_msgs/msg/Twist, published by navida_vehicle remote_controller.
- serial_twistctl_node subscribes to /cmd_vel and sends serial chassis commands.
- max_linear_x: 0.2
- max_angular_z: 0.45
- command_timeout_s: 0.5
- every non-stop command is followed by an explicit zero Twist after step_duration_s
- visual_servo_enabled: true
- target_forward_speed: 0.1
- target_max_angular_z: 0.25
- target_stop_area: 0.3
- history_size: 4
- inference failure immediately publishes zero Twist
Tune these in ros2_ws/src/navida_vehicle/config/navida_jetson.yaml or through launch arguments.
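As a rough illustration of how the speed clamps and the explicit zero Twist combine, the sketch below turns a requested velocity into a bounded pulse. The names `Limits`, `clamp`, and `bounded_pulse` are hypothetical, and the timing between the two commands (step_duration_s) is deliberately left to the caller; only the two limit values come from the defaults listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_linear_x: float = 0.2    # default from the list above
    max_angular_z: float = 0.45  # default from the list above


def clamp(value: float, limit: float) -> float:
    """Clip value into the symmetric range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def bounded_pulse(linear_x: float, angular_z: float, limits: Limits = Limits()) -> list:
    """Return the (linear_x, angular_z) commands for one pulse.

    A non-stop command is followed by an explicit zero command;
    a stop command is sent on its own.
    """
    cmd = (clamp(linear_x, limits.max_linear_x), clamp(angular_z, limits.max_angular_z))
    stop = (0.0, 0.0)
    if cmd == stop:
        return [stop]
    return [cmd, stop]
```

For example, `bounded_pulse(1.0, -2.0)` yields the clamped command `(0.2, -0.45)` followed by `(0.0, 0.0)`. Returning plain tuples keeps the sketch independent of ROS; a real node would copy them into Twist messages.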
By default, the 4070 service asks NaVIDA to use historical observations plus the current observation and return compact action chunks:
{"actions":[{"action":"forward","repeat":1}]}

The Jetson executes action chunks through the existing speed clamps and pulse-stop safety layer. This is the primary runtime path for paper-faithful VLN/VLA behavior.
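A client might normalize the returned chunks before executing them. The sketch below is an assumption about how that could look, not the controller's actual parser: `parse_action_chunks` is a hypothetical name, and it accepts both the object form shown here and the bare-string form (`"actions":["forward"]`) that appears in the target metadata example.

```python
import json


def parse_action_chunks(response_text: str) -> list:
    """Normalize the "actions" list into (action, repeat) tuples."""
    data = json.loads(response_text)
    chunks = []
    for item in data.get("actions", []):
        if isinstance(item, str):
            # Bare-string form, e.g. "forward": treat as a single repeat.
            chunks.append((item, 1))
        elif isinstance(item, dict) and "action" in item:
            # Object form; a missing or non-positive repeat is treated as 1 (assumption).
            repeat = int(item.get("repeat", 1))
            chunks.append((str(item["action"]), max(1, repeat)))
        # Any other shape is skipped rather than guessed at.
    return chunks
```

Since only the first safe chunk is executed per cycle, a caller would typically take `chunks[0]` when the list is non-empty and publish a stop otherwise.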
For semantic goal debugging such as "go to the box/chair/door", you can opt into open-vocabulary detector fallback:
python scripts/run_inference_server.py --backend hf --host 0.0.0.0 --port 50051 \
--model-id waynechu/NaVIDA --device cuda --load-in-4bit \
--target-detector-model-id google/owlvit-base-patch32 \
--target-detector-fallback

When fallback is enabled and target metadata is present, Jetson can use visual servoing metadata:
{"target":{"visible":true,"center_x":0.50,"area":0.10,"confidence":0.80},"actions":["forward"]}

When this metadata is present, Jetson uses center_x and area to turn slowly toward the target, drive when centered, and stop when the target is close. If metadata is missing, the controller falls back to action chunks.
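The turn, drive, and stop behavior can be sketched as a small decision function. Several details here are assumptions rather than facts about the repo: the name `visual_servo_command`, the `center_tolerance` deadband, the proportional turn gain, treating center_x and area as fractions of the image in [0, 1], treating a non-visible target like missing metadata, and the sign convention that positive angular_z turns left (the usual ROS convention). The speed and area defaults are the ones listed in the safety settings.

```python
def visual_servo_command(target, forward_speed=0.1, max_angular_z=0.25,
                         stop_area=0.3, center_tolerance=0.1):
    """Return (linear_x, angular_z) for one step, or None to fall back to action chunks."""
    # Missing metadata (or, by assumption, a non-visible target): no servo command.
    if not target or not target.get("visible"):
        return None
    # Target fills enough of the image: treat it as close and stop.
    if target["area"] >= stop_area:
        return (0.0, 0.0)
    # Horizontal offset from image center; positive means the target is to the right.
    offset = target["center_x"] - 0.5
    if abs(offset) <= center_tolerance:
        return (forward_speed, 0.0)  # centered: drive forward slowly
    # Turn in place toward the target. Assumes positive angular_z turns left,
    # so a target on the right needs a negative value.
    turn = -2.0 * max_angular_z * offset
    turn = max(-max_angular_z, min(max_angular_z, turn))
    return (0.0, turn)
```

If the robot turns away from the target in practice, the sign assumption is the first thing to check; the angular_z_scale:=-1.0 launch argument addresses the same symptom at the chassis level.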