sync: gitlab/main -> github/main #25

Merged: NINGBENZHE merged 2 commits into main from sync/from-gitlab on Apr 30, 2026

@Yangruipis (Collaborator)

Routine internal -> external sync.

@Yangruipis requested a review from NINGBENZHE as a code owner April 29, 2026 13:44
李佳兴 and others added 2 commits April 30, 2026 16:13
# ⭐ Feature

## Add PrefetchBuffer for streaming multimodal datasets

- Implement `PrefetchBuffer` with `set_index_order` pattern (inspired by AReaL)
- Background thread uses `ThreadPoolExecutor` for parallel video decoding (PyAV releases GIL)
- Flow-controlled cache with `max_cached` bound and `_space_available` Event
- Add `--prefetch-chunk-size` and `--prefetch-max-cached` CLI arguments
- Wire arguments through `data_source.py` into `StreamingDataset`
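
The flow-control idea in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's `PrefetchBuffer`: the class name `PrefetchBufferSketch`, the `load_fn` callback, the `num_workers` parameter, and the `get` method are all hypothetical. It assumes the indices are unique and that the consumer requests items in the same order it passed to `set_index_order`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence


class PrefetchBufferSketch:
    """Illustrative bounded prefetch cache (hypothetical, not the PR's class)."""

    def __init__(self, load_fn: Callable[[int], Any], chunk_size: int = 4,
                 max_cached: int = 8, num_workers: int = 4) -> None:
        self._load_fn = load_fn
        self._max_cached = max_cached
        # A chunk larger than the cache bound could never be admitted.
        self._chunk_size = min(chunk_size, max_cached)
        self._num_workers = num_workers
        self._order: List[int] = []
        self._cache: Dict[int, Any] = {}
        self._lock = threading.Lock()
        self._item_ready = threading.Condition(self._lock)
        self._space_available = threading.Event()
        self._space_available.set()

    def set_index_order(self, indices: Sequence[int]) -> None:
        """Tell the buffer which indices will be requested, and in what order."""
        self._order = list(indices)
        threading.Thread(target=self._producer, daemon=True).start()

    def _producer(self) -> None:
        with ThreadPoolExecutor(max_workers=self._num_workers) as pool:
            for start in range(0, len(self._order), self._chunk_size):
                chunk = self._order[start:start + self._chunk_size]
                # Flow control: block until the chunk fits under max_cached.
                while True:
                    self._space_available.wait()
                    with self._lock:
                        if len(self._cache) + len(chunk) <= self._max_cached:
                            break
                        self._space_available.clear()
                # Load the chunk in parallel (e.g. video decoding in practice).
                results = list(pool.map(self._load_fn, chunk))
                with self._item_ready:
                    self._cache.update(zip(chunk, results))
                    self._item_ready.notify_all()

    def get(self, index: int) -> Any:
        """Block until `index` has been prefetched, then evict and return it."""
        with self._item_ready:
            while index not in self._cache:
                self._item_ready.wait()
            item = self._cache.pop(index)
        self._space_available.set()  # Signal the producer that a slot is free.
        return item
```

With `chunk_size=2` and `max_cached=4`, the producer loads two chunks ahead and then pauses until the consumer calls `get`, which keeps memory bounded regardless of dataset length.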

## Add SGLang engine profiling support

- Add 7 profiling CLI arguments (`--sglang-profile`, `--sglang-profile-output-dir`, etc.)
- Implement `_start_sglang_profile` / `_stop_sglang_profile` in sglang_rollout
- Support `num_steps` auto-stop, per-stage profiling, stack and shape recording
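
As a rough illustration of what starting a profile with these options might involve, the sketch below builds a JSON payload and posts it over HTTP. Everything here is an assumption made for illustration: the helper names, the payload field names (chosen to mirror the options listed above), and the idea that the engine exposes a start endpoint at some URL. None of it is taken from the PR's `_start_sglang_profile`.

```python
import json
from typing import Any, Dict, Optional
from urllib import request


def build_start_profile_payload(output_dir: str,
                                num_steps: Optional[int] = None,
                                profile_by_stage: bool = False,
                                with_stack: bool = False,
                                record_shapes: bool = False) -> Dict[str, Any]:
    """Assemble a profiling request body (field names are assumptions)."""
    payload: Dict[str, Any] = {
        "output_dir": output_dir,
        "profile_by_stage": profile_by_stage,
        "with_stack": with_stack,
        "record_shapes": record_shapes,
    }
    # With num_steps set, the profiler is expected to stop on its own;
    # without it, the caller must issue an explicit stop request later.
    if num_steps is not None:
        payload["num_steps"] = num_steps
    return payload


def post_json(url: str, payload: Dict[str, Any], timeout: float = 30.0) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A caller would pass the engine's base URL plus whatever start path the deployed server actually exposes to `post_json`; that path is deliberately left out here because it is not stated in this PR.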

## Add pipeline batch encoding for multimodal generate

- Add `--mm-encode-batch-groups-size` argument for encode-generate overlap
- Split samples into batches: encode batch N, send to SGLang, encode batch N+1 concurrently
- Implement `_encode_multimodal_inputs` for base64 encoding of images/videos/audio
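
The encode-generate overlap described above can be sketched as a two-stage pipeline. This is a simplified illustration, not the PR's implementation: `encode_batch`, `pipelined_generate`, and the `send_fn` callback (a stand-in for the request to SGLang) are hypothetical names, and the samples are assumed to be raw `bytes`.

```python
import base64
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def encode_batch(samples: Sequence[bytes]) -> List[str]:
    """Base64-encode raw media bytes into ASCII strings."""
    return [base64.b64encode(s).decode("ascii") for s in samples]


def pipelined_generate(samples: Sequence[bytes], batch_size: int,
                       send_fn: Callable[[List[str]], List[Any]]) -> List[Any]:
    """Encode batch N+1 in a worker thread while batch N is being sent."""
    batches = [samples[i:i + batch_size]
               for i in range(0, len(samples), batch_size)]
    outputs: List[Any] = []
    if not batches:
        return outputs
    with ThreadPoolExecutor(max_workers=1) as encoder:
        pending = encoder.submit(encode_batch, batches[0])
        for next_batch in list(batches[1:]) + [None]:
            encoded = pending.result()          # Wait for batch N.
            if next_batch is not None:
                # Start encoding batch N+1 before sending batch N.
                pending = encoder.submit(encode_batch, next_batch)
            outputs.extend(send_fn(encoded))    # Overlaps with the encode.
    return outputs
```

The overlap only pays off when `send_fn` spends its time waiting on I/O, as a network call to an inference engine would; for CPU-bound work in both stages the GIL would limit the gain.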

---

# 📝 Documentation

## Sync new parameters to configuration docs

- Add prefetch parameters to Dataset section (en/zh)
- Add `--mm-encode-batch-groups-size` to Multimodal Data section (en/zh)
- Add 7 SGLang profiling parameters to SGLang Engine Parameters section (en/zh)

---

# 🐛 Bug Fix

## Fix torch.cuda patch broken by device abstraction refactor

- The device abstraction commit (632b29c5) replaced the hardcoded
  `torch.cuda.get_device_properties` / `torch.cuda.get_device_capability`
  patch targets with `torch.{device_utils.get_device_name()}.*`
- When no accelerator is available, `get_device_name()` returns `"cpu"`,
  so the patches targeted `torch.cpu.*` instead of `torch.cuda.*`
- Megatron's `validate_args` always calls `torch.cuda.*` internally,
  so the patches must target `torch.cuda` regardless of the device abstraction
- Restore the hardcoded `torch.cuda.*` patch targets, with an explanatory comment
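
The failure mode can be reproduced in miniature without PyTorch. In the sketch below, `fake_torch`, `validate_args_like`, and `run_with_patch` are hypothetical stand-ins: the first mimics a module with `cuda` and `cpu` namespaces, and the second mimics code that, like Megatron's `validate_args` as described above, always reads from the `cuda` namespace.

```python
from types import SimpleNamespace
from typing import Tuple
from unittest import mock

# Stand-in for the torch module; the real one is not needed to show the bug.
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(get_device_capability=lambda: (0, 0)),
    cpu=SimpleNamespace(get_device_capability=lambda: (0, 0)),
)


def validate_args_like() -> Tuple[int, int]:
    """Stand-in for code that always queries the cuda namespace."""
    return fake_torch.cuda.get_device_capability()


def run_with_patch(backend_name: str) -> Tuple[int, int]:
    """Patch get_device_capability on the named backend, then run the check."""
    target = getattr(fake_torch, backend_name)
    with mock.patch.object(target, "get_device_capability",
                           return_value=(9, 0)):
        return validate_args_like()


# Patching the namespace chosen by device detection ("cpu") has no effect,
# because the code under test never looks there.
print(run_with_patch("cpu"))   # (0, 0)
# Patching the namespace the code actually calls takes effect.
print(run_with_patch("cuda"))  # (9, 0)
```

The patch target has to match the namespace the callee reads from, not the namespace the caller would pick for the current device, which is why the hardcoded target is the correct one here.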

@NINGBENZHE merged commit 01973f3 into main Apr 30, 2026
5 checks passed
@Yangruipis deleted the sync/from-gitlab branch April 30, 2026 08:36