add mcap dataset support #10
Pull request overview
Adds MCAP as an additional dataset export format to the SLAM dataset-generation pipeline, allowing datasets to be inspected as typed, time-aligned episode recordings (e.g., in Foxglove Studio), while retaining the existing Zarr replay-buffer export.
Changes:
- Added `07_generate_mcap_dataset.py` to write one `.mcap` file per episode with robot state topics and JPEG-compressed camera images (`foxglove.CompressedImage`).
- Renamed/retitled the step-7 Zarr generation script to `07_generate_zarr_dataset.py` and updated usage docs accordingly.
- Updated `dataset_generation_pipeline.py` and `README.md` to support `--format {mcap,zarr}` and document both dataset formats; added the `mcap` dependency.
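To show what one image message might look like on the wire, here is a minimal stdlib sketch of a JSON-encoded `foxglove.CompressedImage` payload. It assumes the exporter uses JSON message encoding, under which Foxglove carries the `data` bytes field as a base64 string; the helper names and the `camera0` frame id are hypothetical and not taken from the PR.

```python
import base64
import json


def to_foxglove_time(t_sec: float) -> dict:
    """Split a float timestamp in seconds into Foxglove's {sec, nsec} form."""
    total_ns = round(t_sec * 1e9)
    sec, nsec = divmod(total_ns, 1_000_000_000)
    return {"sec": sec, "nsec": nsec}


def compressed_image_message(jpeg_bytes: bytes, t_sec: float, frame_id: str) -> bytes:
    """Serialize one foxglove.CompressedImage message as JSON (hypothetical helper).

    Under JSON encoding, the bytes field `data` is represented as a base64 string.
    """
    msg = {
        "timestamp": to_foxglove_time(t_sec),
        "frame_id": frame_id,
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
        "format": "jpeg",
    }
    return json.dumps(msg).encode("utf-8")
```

With the `mcap` package, a payload like this would be passed as the `data` argument of `Writer.add_message`, alongside an integer nanosecond `log_time`.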
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/scripts_slam_pipeline/07_generate_zarr_dataset.py | Updates usage text to match the renamed Zarr dataset generator entrypoint. |
| scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py | New MCAP exporter producing per-episode MCAP files with JSON-encoded schemas and JPEG images. |
| scripts/dataset_generation_pipeline.py | Adds a `--format`/`-f` flag and routes step 7 to the MCAP (default) or Zarr generator. |
| pyproject.toml | Adds mcap as a runtime dependency. |
| README.md | Documents dataset format selection and output structure for MCAP vs Zarr. |
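The step-7 routing described for `dataset_generation_pipeline.py` amounts to a lookup from the `--format` value to a script. A hedged `argparse` sketch follows; the real pipeline may parse its arguments differently, and `STEP7_SCRIPTS`, `parse_args`, and `step7_script` are hypothetical names.

```python
import argparse

# Hypothetical mapping from the --format value to the step-7 script named in the PR.
STEP7_SCRIPTS = {
    "mcap": "07_generate_mcap_dataset.py",
    "zarr": "07_generate_zarr_dataset.py",
}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="SLAM dataset generation pipeline (sketch)")
    parser.add_argument(
        "-f",
        "--format",
        choices=sorted(STEP7_SCRIPTS),
        default="mcap",  # MCAP is the default according to the PR description
        help="dataset format produced in step 7",
    )
    return parser.parse_args(argv)


def step7_script(fmt: str) -> str:
    """Return the script that step 7 should run for the chosen format."""
    return STEP7_SCRIPTS[fmt]
```

Declaring `choices` lets argparse reject anything other than `mcap` or `zarr` before step 7 runs.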
lukeschmitt-tr left a comment
Please provide us with an example MCAP file.
```python
"-or",
"--out_res",
type=str,
default="224,224",
```
Will this crop the images before they are saved? We should store the native resolution and crop during replay.
Added full resolution by default in 9e49bd5.
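Keeping native resolution in the dataset moves cropping to replay time. One way that could look is sketched below, assuming `--out_res` is a `"W,H"` string and that replay applies a center crop to the target aspect ratio followed by a resize. The PR does not specify the replay-time crop policy, and `parse_out_res` and `center_crop_box` are hypothetical helpers.

```python
def parse_out_res(out_res: str) -> tuple[int, int]:
    """Parse a "W,H" string such as the --out_res default "224,224" (W,H order assumed)."""
    w, h = (int(v) for v in out_res.split(","))
    return w, h


def center_crop_box(native_w: int, native_h: int, out_w: int, out_h: int) -> tuple[int, int, int, int]:
    """Largest centered crop of the native frame with the target aspect ratio.

    Returns (left, top, right, bottom); the crop would then be resized to out_w x out_h.
    """
    target = out_w / out_h
    if native_w / native_h > target:
        # Frame is wider than the target: trim the left and right edges.
        crop_w, crop_h = round(native_h * target), native_h
    else:
        # Frame is taller than (or equal to) the target: trim top and bottom.
        crop_w, crop_h = native_w, round(native_w / target)
    left = (native_w - crop_w) // 2
    top = (native_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```

For a hypothetical 2704x2028 frame and `out_res="224,224"`, `center_crop_box` returns `(338, 0, 2366, 2028)`: a 2028x2028 square trimmed equally from both sides, which would then be resized to 224x224.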
The branch was updated from d1f96ef to fd6687f, then from fd6687f to 9e49bd5.
```python
import av
from fractions import Fraction

# w, h, and cfg come from the surrounding exporter code (not shown in this excerpt).
encoder = av.CodecContext.create("libx264", "w")
encoder.width = w
encoder.height = h
encoder.pix_fmt = "yuv420p"
encoder.time_base = Fraction(1, int(cfg.video_fps))
encoder.options = {
    "preset": "ultrafast",
    "crf": "20",
    "tune": "zerolatency",
    "x264-params": "bframes=0:repeat-headers=1",
}
```
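With `time_base = Fraction(1, int(cfg.video_fps))`, a frame's presentation time is `pts * time_base` seconds, whereas MCAP message times are integer nanoseconds. A sketch of that conversion follows, assuming `pts` advances by one per frame and a hypothetical per-episode start time `start_ns`; `frame_pts_to_ns` is a hypothetical helper.

```python
from fractions import Fraction


def frame_pts_to_ns(pts: int, time_base: Fraction, start_ns: int = 0) -> int:
    """Convert a frame's PTS (in time_base units) to an absolute nanosecond timestamp.

    MCAP log_time/publish_time are unsigned 64-bit nanosecond counts.
    """
    offset_ns = pts * time_base * 1_000_000_000  # exact rational arithmetic
    return start_ns + round(offset_ns)


fps = 60  # example rate; the exporter reads it from cfg.video_fps
time_base = Fraction(1, fps)
# Assumes one frame per PTS tick, i.e. frame index == pts.
timestamps = [frame_pts_to_ns(i, time_base, start_ns=1_000) for i in range(4)]
```

Computing each timestamp directly from `pts` with exact `Fraction` arithmetic avoids accumulating rounding error across a long episode. `bframes=0` keeps decode order equal to presentation order, so encoded packets come out in presentation order and can be stamped in sequence.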
```
INFO: Done! 2 videos used in total!
############### 07_generate_dataset (mcap) ###############
INFO: Collected 2 episodes, 1 grippers, 1 cameras.
INFO: Writing 2 episode MCAP files to example_gopro13_dataset/dataset_mcap
```
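The log shows two collected episodes becoming two MCAP files. If episode boundaries are held as cumulative end indices, like the `episode_ends` array of a Zarr replay buffer (an assumption about how this exporter stores them), splitting the data into per-episode files could be sketched as follows. `episode_slices`, `episode_paths`, and the `episode_XXXX.mcap` naming are hypothetical.

```python
from pathlib import Path


def episode_slices(episode_ends):
    """Turn cumulative end indices, e.g. [120, 300], into per-episode slices."""
    start = 0
    slices = []
    for end in episode_ends:
        slices.append(slice(start, end))
        start = end
    return slices


def episode_paths(out_dir, n_episodes):
    """Hypothetical one-file-per-episode naming inside the output directory."""
    return [Path(out_dir) / f"episode_{i:04d}.mcap" for i in range(n_episodes)]
```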

TL;DR
Adds MCAP as a supported dataset output format.
What changed?
- `07_generate_mcap_dataset.py` writes one `.mcap` file per episode to a `dataset_mcap/` directory. Each file contains time-aligned robot state messages and JPEG-compressed camera images using the `foxglove.CompressedImage` schema.
- `07_generate_replay_buffer.py` has been renamed to `07_generate_zarr_dataset.py` to better reflect its purpose.
- `dataset_generation_pipeline.py` now accepts a `--format`/`-f` flag (`mcap` or `zarr`, defaulting to `mcap`) that selects which generation script to invoke in step 7.
- A `## Dataset Formats` section in the README documents both formats, their output structure, and the data they contain.
- `mcap>=1.3.1` has been added as a dependency.

How to test?
Run the pipeline with each format and verify the outputs:
Open a generated `.mcap` file in Foxglove Studio to verify that the robot state topics and camera image streams are correctly populated and time-aligned.
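Time alignment can also be checked programmatically rather than only by eye. The sketch below measures the worst gap between each image timestamp and its nearest robot-state timestamp. It assumes both lists hold nanosecond `log_time` values (for example gathered via `mcap.reader.make_reader(f).iter_messages(topics=[...])`); the topic names and any acceptable tolerance are up to the caller and are not specified in the PR.

```python
import bisect


def max_alignment_gap_ns(image_ts, state_ts):
    """Largest distance from any image timestamp to its nearest state timestamp.

    Both inputs are nanosecond timestamps; state_ts must be sorted ascending.
    """
    if not state_ts:
        raise ValueError("no robot state timestamps to align against")
    worst = 0
    for t in image_ts:
        i = bisect.bisect_left(state_ts, t)
        candidates = []
        if i < len(state_ts):
            candidates.append(abs(state_ts[i] - t))  # nearest state at or after t
        if i > 0:
            candidates.append(abs(state_ts[i - 1] - t))  # nearest state before t
        worst = max(worst, min(candidates))
    return worst
```

For image times at roughly 60 Hz against state times at 100 Hz, the worst gap stays under half the state period, which is a reasonable sanity bound for a nearest-neighbour alignment.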