In warehouse palletizing, robots handle boxes blindly — following fixed rules regardless of contents, condition, or fragility. Damaged boxes get stacked, fragile items get crushed, and when something goes wrong, there's no explanation why.
We built an end-to-end palletizing system powered by NVIDIA Cosmos Reason2 8B, fine-tuned with LoRA on synthetic data. Given only a camera image, the model:
- Infers box contents without barcodes or labels
- Detects damage and routes unsafe boxes to human inspection
- Decides placement parameters — position, speed, grip strength — based on weight and fragility reasoning
Every decision includes a full chain-of-thought trace, making the system fully auditable.
Our system leverages the complete NVIDIA ecosystem: Cosmos Reason2 for visual reasoning, Isaac Sim for simulation and synthetic data, cuRobo for GPU-accelerated motion planning, vLLM for inference, and Jetson Thor for edge deployment. Four containerized services form a continuous control loop from perception to execution.
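The sketch below traces one cycle of that loop with curl. Only the service ports come from the service table later in this README; every endpoint path here is a hypothetical placeholder for illustration, not the project's documented API.

```bash
# One perception -> reasoning -> execution cycle (endpoint paths are hypothetical)
frame=$(curl -s localhost:8100/sim/camera/frame)    # sim-server: camera image from Isaac Sim
decision=$(curl -s -X POST localhost:8000/api/decide \
  -H 'Content-Type: application/json' \
  -d "{\"image\": \"$frame\"}")                     # app-server: Cosmos Reason2 reasoning via vLLM
curl -s -X POST localhost:8000/api/place \
  -H 'Content-Type: application/json' \
  -d "$decision"                                    # app-server: cuRobo motion planning + execution
```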
Prerequisites:

| Requirement | Version |
|---|---|
| NVIDIA Driver | 585+ |
| CUDA | 12.8+ |
| nvidia-container-toolkit | installed and configured |
| Docker | with Compose V2 |
Supported hardware:

| Hardware | Architecture |
|---|---|
| NVIDIA RTX 4090 | Ada Lovelace |
| NVIDIA H200 | Hopper |
| NVIDIA RTX PRO 6000 / Jetson Thor | Blackwell |
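A quick way to sanity-check these prerequisites from the host (the CUDA image tag below is illustrative; any CUDA 12.8+ base image works for the toolkit check):

```bash
nvidia-smi                # reports driver version (need 585+) and CUDA version (need 12.8+)
docker compose version    # confirms Compose V2 is available
# Verify nvidia-container-toolkit can expose GPUs inside containers
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
```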
Install uv (Python package manager):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

1. Configure environment variables

```bash
cp docker/.env.example docker/.env
```

Edit docker/.env to set your configuration. Key variables:
| Variable | Description | Default |
|---|---|---|
| `HF_TOKEN` | HuggingFace token (required for gated models) | — |
| `INFERENCE_MODEL` | Model ID (`nvidia/Cosmos-Reason2-2B` or `nvidia/Cosmos-Reason2-8B`) | `nvidia/Cosmos-Reason2-8B` |
| `LORA_ADAPTER_PATH` | LoRA adapter path inside container (e.g. `/adapters/cosmos-reason2-8b`) | — |
| `LORA_MODEL` | LoRA model name for vLLM (e.g. `palletize`) | — |
| `VLLM_MAX_MODEL_LEN` | Max model context length | 5120 |
| `VLLM_GPU_MEMORY_UTILIZATION` | Fraction of GPU memory for vLLM | 0.5 |
| `SIM_GPU_DEVICE` / `INFERENCE_GPU_DEVICE` / `CUROBO_GPU_DEVICE` | GPU device IDs (for multi-GPU setups) | 0 |
| `HF_CACHE_DIR` | Host directory for HuggingFace model cache | Docker volume |
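For reference, a minimal docker/.env sketch using the variables above; the token and paths are placeholders, so adjust them to your setup:

```
# docker/.env -- illustrative values only
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
INFERENCE_MODEL=nvidia/Cosmos-Reason2-8B
# Leave the LoRA variables unset to run the base model (see the note below)
LORA_ADAPTER_PATH=/adapters/cosmos-reason2-8b
LORA_MODEL=palletize
VLLM_MAX_MODEL_LEN=5120
VLLM_GPU_MEMORY_UTILIZATION=0.5
SIM_GPU_DEVICE=0
INFERENCE_GPU_DEVICE=0
CUROBO_GPU_DEVICE=0
# Mount your host HuggingFace cache to avoid re-downloading models on rebuilds
HF_CACHE_DIR=~/.cache/huggingface
```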
The first Docker build can take 30+ minutes because it compiles CUDA extensions and downloads model weights. We recommend pre-downloading the Cosmos Reason2 models and mounting your host HuggingFace cache into Docker by setting `HF_CACHE_DIR` (e.g. `~/.cache/huggingface`), so models are not re-downloaded on every container rebuild.
```bash
# Pre-download the model (pick one)
hf download nvidia/Cosmos-Reason2-2B
hf download nvidia/Cosmos-Reason2-8B
```

2. Download LoRA adapters
Note: The VLM prompt was recently redesigned with a new reasoning structure and output format. LoRA weights trained on the previous prompt are not compatible and will produce degraded results. We are currently retraining the adapters and will publish updated weights as soon as they are ready. In the meantime, the system runs correctly without a LoRA adapter using the base model.
```bash
make adapters
```

This downloads the fine-tuned LoRA adapters into adapters/2B and adapters/8B. Requires uv (see Prerequisites above). Skips automatically if the adapters are already present.
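Once the stack is running (step 3 below), you can check which model variant is actually being served. This assumes the inference-server exposes vLLM's OpenAI-compatible API directly on port 8200; with vLLM, LoRA adapters registered at startup appear in the model list under their `LORA_MODEL` name:

```bash
# List served models; a loaded LoRA adapter shows up alongside the base model
curl -s localhost:8200/v1/models

# Smoke-test a completion against the adapter by name; with no adapter
# configured, pass the base INFERENCE_MODEL ID as "model" instead
curl -s localhost:8200/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "palletize", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 16}'
```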
3. Launch the system
```bash
make docker-up     # build and start all 4 services
make docker-logs   # follow logs
make docker-down   # stop and remove containers
```

4. Access the UI
Open http://localhost:3000 in your browser.
| Service | Port | Health Check |
|---|---|---|
| app-server | 8000 | `curl localhost:8000/api/health` |
| sim-server | 8100 | `curl localhost:8100/sim/health` |
| inference-server | 8200 | `curl localhost:8200/health` |
| frontend | 3000 | `curl localhost:3000/api/status` |
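To confirm all four services came up after make docker-up, you can loop over the health endpoints above; this is just a convenience sketch, not a Makefile target:

```bash
# Print the HTTP status code for each service's health endpoint
for ep in 8000/api/health 8100/sim/health 8200/health 3000/api/status; do
  printf '%-28s' "localhost:$ep"
  curl -s -o /dev/null -w '%{http_code}\n' "localhost:$ep"
done
```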
Test mode (no HuggingFace token needed, uses a tiny model):

```bash
make docker-test
```

To set up the project locally for development:
```bash
make init
```

This creates the virtual environment, installs all dependencies, and downloads the LoRA adapters.
Kyungchan Son, Minsoo Song, Yujeong Jeong, Yuri Rocha — Doosan Robotics
This is not an official Doosan Robotics product. It is a proof-of-concept created for the NVIDIA Cosmos Cookoff hackathon.