
Add PAI-Bench-C reproduction guide for Cosmos3 #224

Open: trungtpham wants to merge 1 commit into NVIDIA:main from trungtpham:features/paibenchc-reproduce

trungtpham (Contributor) commented Jun 22, 2026

Adds an end-to-end recipe for reproducing PAI-Bench Conditional Generation (PAI-Bench-C) results with Cosmos3 using the native Cosmos Framework PyTorch entrypoint.

New files under evaluation/cosmos3/generator/paibench_c/:

  • README.md: reference scores table, sampling settings, dataset layout, step-by-step generation and evaluation commands.
  • assets/prompts.json: 600 task entries, each with the fully upsampled opus JSON caption used in the internal evaluation run, the control-signal paths, and a shared negative prompt. All 600 tasks use the exact prompts from the published benchmark.
  • run_with_cosmos_framework.ipynb: self-contained notebook covering demo mode (1–N tasks, single modality) and optional full sweeps across all four modalities (edge, blur, depth, seg). Includes a demo evaluation step that runs compute_metrics.py on generated outputs and prints the metrics JSON.
  • .gitignore: excludes runtime artifacts (outputs/, .cache/, dataset clone, executed notebooks).
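A minimal Python sketch of the demo-mode selection described above (the first N tasks, a single modality), for readers who want to script it outside the notebook. The prompts.json schema used here (top-level `negative_prompt` and `tasks`, per-task `id`, `caption`, `controls`) and the helper names are hypothetical stand-ins, not the actual file layout or notebook code; only the four modality names come from this PR.

```python
from dataclasses import dataclass

# The four control modalities named in this PR.
MODALITIES = ("edge", "blur", "depth", "seg")


@dataclass
class DemoJob:
    task_id: str
    caption: str
    control_path: str
    negative_prompt: str


def select_demo_jobs(prompts: dict, modality: str, num_tasks: int) -> list[DemoJob]:
    """Pick the first `num_tasks` tasks for one control modality.

    Hypothetical schema: {"negative_prompt": str,
                          "tasks": [{"id", "caption", "controls": {modality: path}}]}
    """
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality {modality!r}; expected one of {MODALITIES}")
    if num_tasks < 1:
        raise ValueError("num_tasks must be >= 1")
    negative = prompts["negative_prompt"]  # one negative prompt shared by every task
    return [
        DemoJob(t["id"], t["caption"], t["controls"][modality], negative)
        for t in prompts["tasks"][:num_tasks]
    ]


if __name__ == "__main__":
    example = {
        "negative_prompt": "blurry, distorted",
        "tasks": [
            {"id": "task_000", "caption": "A robot arm stacks blocks.",
             "controls": {m: f"controls/{m}/task_000.mp4" for m in MODALITIES}},
            {"id": "task_001", "caption": "A car drives through rain.",
             "controls": {m: f"controls/{m}/task_001.mp4" for m in MODALITIES}},
        ],
    }
    for job in select_demo_jobs(example, "depth", 1):
        print(job.task_id, job.control_path)  # task_000 controls/depth/task_000.mp4
```

Validating the modality up front means a typo fails immediately rather than after a long generation run.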

PAI-Bench-C Reproducibility
[Figures (2): PAI-Bench-C reproducibility results]

trungtpham force-pushed the features/paibenchc-reproduce branch 14 times, most recently from 80a3c53 to 4981e86 (June 22, 2026 18:20)
qmiao-hub requested a review from yaoxu-crypto (June 22, 2026 18:44)
Review thread (outdated): evaluation/cosmos3/generator/paibench_c/README.md
Review thread (outdated): evaluation/cosmos3/generator/paibench_c/run_with_cosmos_framework.ipynb
Review thread (outdated): evaluation/cosmos3/generator/paibench_c/README.md
Review thread (outdated): evaluation/cosmos3/generator/paibench_c/README.md
trungtpham force-pushed the features/paibenchc-reproduce branch 4 times, most recently from b0233cc to f86e7ff (June 25, 2026 22:42)
trungtpham (Contributor, Author) commented:

PAI-Bench-C results on the 4 demo samples, run by @trungtpham:
[Image: PAI-Bench-C demo results]

trungtpham (Contributor, Author) commented:

@lfengad, could you please help review this PR? Thanks!

Adds a self-contained end-to-end script and notebook for reproducing
Cosmos3 PAI-Bench-C results using the public physical-ai-bench library.

Key fixes:
- Remove the hardcoded --dp-shard-size/--dp-replicate-size/--cp-size/--cfgp-size
  flags from the generation torchrun command; let --parallelism-preset=latency
  auto-shard so Cosmos3-Super (32B) fits across multiple GPUs without OOM.
- Include checkpoint slug in output path (demo-Cosmos3-Nano/, Cosmos3-Super/,
  etc.) so Nano and Super runs no longer overwrite each other.
- Add SKIP_GEN=1 flag to skip generation and evaluate existing videos.
- Strengthen the paibench venv health check: detect broken Python symlinks
  caused by stale NFS file handles and auto-rebuild the venv.
- Add explicit UV_PROJECT_ENVIRONMENT to all paibench uv/pip calls so uv
  never auto-discovers a stale .venv from the working directory.
- Compute GT seg/depth once (for the first modality) and reuse the cache for
  the remaining modalities.
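The fixes above can be sketched in Python as follows. This is an illustrative sketch, not the script's actual code: every function name is hypothetical, the checkpoint slug is assumed to be the last path component of the checkpoint name, and the GT cache layout (`<cache>/<kind>/<task_id>.npy`) is assumed. `SKIP_GEN` and `UV_PROJECT_ENVIRONMENT` are the names given in the commit message; the latter is a documented uv environment variable that sets the project virtual-environment path.

```python
import os
from pathlib import Path
from typing import Callable


def output_dir(root: Path, checkpoint: str, demo: bool) -> Path:
    """Per-checkpoint output dir, e.g. outputs/demo-Cosmos3-Nano (hypothetical layout)."""
    # Assumption: slug = last path component, "nvidia/Cosmos3-Nano" -> "Cosmos3-Nano".
    slug = Path(checkpoint.rstrip("/")).name
    return root / (f"demo-{slug}" if demo else slug)


def skip_generation() -> bool:
    """SKIP_GEN=1 means: skip generation and only evaluate existing videos."""
    return os.environ.get("SKIP_GEN", "0") == "1"


def venv_is_healthy(venv: Path) -> bool:
    """False if the venv interpreter is missing, a dangling symlink, or unreadable."""
    python = venv / "bin" / "python"
    try:
        # is_file() follows symlinks, so a link whose target vanished reports False.
        return python.is_file()
    except OSError:
        # Some errors (e.g. ESTALE from a stale NFS handle) may be raised instead.
        return False


def paibench_env(venv: Path) -> dict[str, str]:
    """Environment for paibench uv calls, pinned to an explicit venv."""
    env = dict(os.environ)
    # Overrides uv's default project venv location (./.venv) for project commands.
    env["UV_PROJECT_ENVIRONMENT"] = str(venv)
    return env


def get_or_compute_gt(cache_root: Path, task_id: str, kind: str,
                      compute: Callable[[Path], None]) -> Path:
    """Compute GT seg/depth for a task once; later modalities reuse the cached file."""
    path = cache_root / kind / f"{task_id}.npy"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        compute(path)  # caller-supplied function that writes the GT file to `path`
    return path
```

A caller would rebuild the venv whenever `venv_is_healthy` returns False. Treating any `OSError` as unhealthy keeps a stale-handle failure from crashing the run before that rebuild can happen.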
trungtpham force-pushed the features/paibenchc-reproduce branch from f86e7ff to cf2e416 (June 25, 2026 23:42)