sync: gitlab/main -> github/main by Yangruipis · Pull Request #21 · redai-infra/Relax

Yangruipis · 2026-04-28T03:23:03Z

Routine internal -> external sync.

# 🐛 Bug Fix

## Correct judge prompt spelling in DeepEyes reward

- replace `Judgement` and `Judement` with `Judgment` in few-shot examples and prompt text
- update the prompt suffix to request `Judgment:` consistently
- keep response parsing backward compatible with both `Judgment:` and `Judgement:` labels
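The backward-compatible parsing described above could look roughly like the sketch below. The function name `extract_judgment` and the "take the last label, read to end of line" rule are assumptions made for illustration; the actual parser in the DeepEyes reward code may differ.

```python
import re
from typing import Optional

# Accepts both "Judgment:" (requested by the updated prompt) and the older
# "Judgement:" spelling. The misspelling "Judement" is deliberately not matched.
_LABEL_RE = re.compile(r"Judge?ment:[ \t]*([^\n]*)")


def extract_judgment(response: str) -> Optional[str]:
    """Return the text after the last judgment label, or None if no label exists.

    Hypothetical helper: name and behavior are illustrative, not taken from the PR.
    """
    matches = _LABEL_RE.findall(response)
    if not matches:
        return None
    return matches[-1].strip()
```

Matching the optional `e` in a single pattern keeps older judge outputs parseable without maintaining two code paths.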

# ⭐ Feature

## Add unified device abstraction layer (`relax/utils/device.py`)

- Introduce `AcceleratorType` enum: CUDA, NPU, XPU, PPU, ROCM, CPU
- Auto-detect hardware via `_detect_accelerator()` with priority-based probing
- Support `RELAX_DEVICE_TYPE` env var override for debugging
- Provide 25+ thin-wrapper APIs: `current_device()`, `set_device()`, `synchronize()`, `empty_cache()`, `Stream()`, `Event()`, `stream_context()`, `is_initialized()`, etc.
- Map distributed backends: CUDA→nccl, NPU→hccl, XPU→xccl, PPU→eccl
- Map Ray resource names: CUDA/ROCm→GPU, NPU→NPU, XPU→XPU
- Map visible-devices env vars per accelerator type
- Abstract NUMA affinity with graceful degradation for non-CUDA backends

---

# ♻️ Refactor

## Replace hardcoded `torch.cuda.*` calls across 20+ files

- Replace `torch.cuda.current_device()` → `device_utils.current_device()`
- Replace `torch.cuda.set_device()` → `device_utils.set_device()`
- Replace `torch.cuda.synchronize()` → `device_utils.synchronize()`
- Replace `torch.cuda.empty_cache()` → `device_utils.empty_cache()`
- Replace `torch.cuda.Stream/Event` → `device_utils.Stream()/Event()`
- Replace `torch.cuda.mem_get_info()` → `device_utils.mem_get_info()`
- Replace `torch.cuda.device_count()` → `device_utils.device_count()`
- Replace `torch.device("cuda:...")` → `device_utils.make_current_torch_device()`
- Replace `device="cuda"` → `device=device_utils.get_device_name()`
- Replace `"nccl"` backend → `device_utils.get_dist_backend()`
- Replace `"GPU"` Ray resource → `device_utils.get_ray_accelerator_name()`
- Replace `CUDA_VISIBLE_DEVICES` → `device_utils.get_visible_devices_env_var()`
- Wrap CUDA-specific memory profiling APIs with `hasattr` guards
- Add CUDA-only annotation to `int4_qat/setup.py` kernel build script
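To make the device abstraction above concrete, here is a minimal, torch-free sketch of priority-based detection with an env var override. Only the enum member names, the `RELAX_DEVICE_TYPE` variable, and the listed backend and Ray mappings come from the description. The enum string values, the injected `probes` argument, the CPU fallback, and the public function name `detect_accelerator` are assumptions for illustration and are not the contents of `relax/utils/device.py`.

```python
import os
from enum import Enum
from typing import Callable, Mapping, Optional, Sequence, Tuple


class AcceleratorType(Enum):
    # Member names follow the PR description; the string values are assumed.
    CUDA = "cuda"
    NPU = "npu"
    XPU = "xpu"
    PPU = "ppu"
    ROCM = "rocm"
    CPU = "cpu"


# Only the mappings stated in the description; other types are left unmapped.
DIST_BACKENDS = {
    AcceleratorType.CUDA: "nccl",
    AcceleratorType.NPU: "hccl",
    AcceleratorType.XPU: "xccl",
    AcceleratorType.PPU: "eccl",
}
RAY_RESOURCE_NAMES = {
    AcceleratorType.CUDA: "GPU",
    AcceleratorType.ROCM: "GPU",
    AcceleratorType.NPU: "NPU",
    AcceleratorType.XPU: "XPU",
}

Probe = Tuple[AcceleratorType, Callable[[], bool]]


def detect_accelerator(
    probes: Sequence[Probe],
    env: Optional[Mapping[str, str]] = None,
) -> AcceleratorType:
    """Pick an accelerator: env override first, then probes in priority order."""
    env = os.environ if env is None else env
    override = env.get("RELAX_DEVICE_TYPE")
    if override:
        # Raises ValueError for an unknown value, which surfaces typos early.
        return AcceleratorType(override.strip().lower())
    for accel, is_available in probes:
        if is_available():
            return accel
    # Assumed fallback when no probe succeeds.
    return AcceleratorType.CPU
```

Passing the probes in as callables keeps the sketch runnable without any accelerator library installed; a real implementation would presumably probe the framework directly.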

# ⭐ Feature

## Add Qwen3.5-9B single-node fully-async training script

- Add `run_deepeyes_qwen35_9B_async.sh` for 8xGPU fully-async DeepEyes training
- Resource layout: actor(4) + rollout(2) + reference(1) + actor_fwd(1)
- Use `--use-dynamic-batch-size` and `--no-rope-fusion` per latest Qwen3.5 conventions

---

# 🐛 Bug Fix

## Add 0-1000 normalized bbox coordinate conversion

- Qwen-VL/Qwen2-VL/Qwen3-VL output 0-1000 normalized coords but `_maybe_resize_bbox` treated them as absolute pixels
- Add coordinate conversion step before clamping in `_maybe_resize_bbox`
- Add `normalize_bbox` parameter to `DeepeyesEnv` (default True) for model-specific control
- Qwen2.5-VL users can set `normalize_bbox: false` since it outputs absolute pixel coords
- Wire `normalize_bbox` through `build_env` from custom config
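The bbox fix above amounts to scaling 0-1000 coordinates to pixels before clamping. A minimal sketch follows, assuming an `(x1, y1, x2, y2)` box layout, scaling by image width and height, and rounding to integers. The helper name `to_pixel_bbox` is hypothetical and this is not the body of `_maybe_resize_bbox`.

```python
from typing import Sequence, Tuple


def to_pixel_bbox(
    bbox: Sequence[float],
    width: int,
    height: int,
    normalize_bbox: bool = True,
) -> Tuple[int, int, int, int]:
    """Convert a model-emitted bbox to clamped absolute pixel coordinates.

    Hypothetical helper: with normalize_bbox=True the input is treated as
    0-1000 normalized coordinates; with False it is treated as pixels already.
    """
    x1, y1, x2, y2 = bbox
    if normalize_bbox:
        # Multiply before dividing to avoid avoidable float error.
        x1, x2 = x1 * width / 1000, x2 * width / 1000
        y1, y2 = y1 * height / 1000, y2 * height / 1000
    # Clamp to the image bounds after conversion, not before.
    x1, x2 = (min(max(v, 0), width) for v in (x1, x2))
    y1, y2 = (min(max(v, 0), height) for v in (y1, y2))
    return int(round(x1)), int(round(y1)), int(round(x2)), int(round(y2))
```

On a 2000x1000 image, `to_pixel_bbox((100, 200, 500, 1000), 2000, 1000)` returns `(200, 200, 1000, 1000)`, whereas treating the same values as pixels would crop a much smaller region.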


…sion

# 🐛 Bug Fix

## Fix IndexError on 1D tensor transpose in fully-async weight sync

- Qwen3.5 Bridge outputs expert gate_up_proj as 2D [2*H, D] (cat, no transpose), unlike Qwen3-VL which outputs 3D [2, D_out, D_in] (stack + transpose)
- Qwen3.5 ExpertMLPDownProjMapping inherits AutoMapping (no transpose), unlike Qwen3-VL which overrides megatron_to_hf with transpose
- Add ndim-based branching in _convert_to_hf_bridge post-processing: 3D → Qwen3-VL path (undo transpose + index), 2D → Qwen3.5 path (chunk)
- Detect bridge_expert_transposes_down at init time via __dict__ introspection to decide whether down_proj needs un-transpose

---

# ✅ Tests

## Add Qwen3.5 Bridge expert weight conversion tests

- Add TestQwen35BridgeMappingOutput: verify 2D cat output, no-transpose down_proj, and megatron_to_hf override detection
- Add TestQwen35PostProcessingCorrectness: end-to-end gate_up split and down_proj passthrough correctness
- Update _apply_expert_postprocessing helper to accept bridge_expert_transposes_down param
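The ndim-based branching can be illustrated without tensors by using nested lists. This is a rough sketch of the idea only: the helper names are hypothetical, and the exact axes that the Qwen3-VL path transposes are an assumption rather than something taken from `_convert_to_hf_bridge`.

```python
from typing import Any, List, Tuple

Matrix = List[List[float]]


def _ndim(x: Any) -> int:
    """Nesting depth of a rectangular nested list (stand-in for tensor.ndim)."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        x = x[0]
    return depth


def _transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def split_gate_up(weight: Any) -> Tuple[Matrix, Matrix]:
    """Split a fused gate_up weight into (gate, up) based on its rank.

    3D [2, D_out, D_in]: stacked layout, so index each half and undo the
    transpose (assumed to swap the last two axes).
    2D [2*H, D]: concatenated layout, so chunk into two halves along axis 0.
    """
    rank = _ndim(weight)
    if rank == 3:
        return _transpose(weight[0]), _transpose(weight[1])
    if rank == 2:
        half = len(weight) // 2
        return weight[:half], weight[half:]
    raise ValueError(f"expected a 2D or 3D gate_up weight, got {rank}D")
```

Branching on rank avoids the failure mode in the title: indexing a 2D concatenated weight as if it were stacked yields a 1D row, which cannot then be transposed.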

Yangruipis · 2026-04-28T03:24:06Z

Closes #18 #17

NINGBENZHE and others added 6 commits April 28, 2026 11:08

fix: lost multimodal data

9c46ffa

fix(distributed): add retry and port-release delay for NCCL process group recreation

e725fa5

Yangruipis requested review from Aurelius84, NINGBENZHE and yxyOo as code owners April 28, 2026 03:23

NINGBENZHE approved these changes Apr 28, 2026

View reviewed changes

Yangruipis merged commit 207ace5 into main Apr 28, 2026
5 checks passed

Yangruipis deleted the sync/from-gitlab branch April 28, 2026 07:22

Yangruipis merged 6 commits into main from sync/from-gitlab
