sync: gitlab/main -> github/main#21
Merged
# 🐛 Bug Fix
## Correct judge prompt spelling in DeepEyes reward
- Replace `Judgement` and `Judement` with `Judgment` in the few-shot examples and prompt text
- Update the prompt suffix to request `Judgment:` consistently
- Keep response parsing backward compatible with both `Judgment:` and `Judgement:` labels
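A minimal sketch of backward-compatible label parsing for the judge fix. `parse_judgment` is a hypothetical helper, not the function in the DeepEyes reward code. It assumes the verdict sits on the same line as the label and that the last label in a response is the final verdict. The optional `e` in the regex accepts both `Judgment:` and the legacy `Judgement:`, while the prompt-only typo `Judement` is not accepted.

```python
import re
from typing import Optional

# "Judge?ment" matches both "Judgment" and the legacy "Judgement".
# [ \t]* (not \s*) keeps the capture on the same line as the label.
_JUDGMENT_RE = re.compile(r"Judge?ment:[ \t]*(.*)")


def parse_judgment(response: str) -> Optional[str]:
    """Return the text after the last Judgment:/Judgement: label, or None."""
    matches = _JUDGMENT_RE.findall(response)
    if not matches:
        return None
    # Assumption: if the model repeats the label, the last one is the verdict.
    return matches[-1].strip()
```

With this approach, new prompts can request `Judgment:` while older checkpoints that still emit `Judgement:` continue to score.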
# ⭐ Feature
## Add unified device abstraction layer (`relax/utils/device.py`)
- Introduce `AcceleratorType` enum: CUDA, NPU, XPU, PPU, ROCM, CPU
- Auto-detect hardware via `_detect_accelerator()` with priority-based probing
- Support `RELAX_DEVICE_TYPE` env var override for debugging
- Provide 25+ thin-wrapper APIs: `current_device()`, `set_device()`, `synchronize()`, `empty_cache()`, `Stream()`, `Event()`, `stream_context()`, `is_initialized()`, etc.
- Map distributed backends: CUDA→nccl, NPU→hccl, XPU→xccl, PPU→eccl
- Map Ray resource names: CUDA/ROCm→GPU, NPU→NPU, XPU→XPU
- Map visible-devices env vars per accelerator type
- Abstract NUMA affinity with graceful degradation for non-CUDA backends
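The detection and mapping scheme above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `relax/utils/device.py`. The lowercase enum values, the injectable `probes` list, and the `DIST_BACKEND` / `RAY_RESOURCE` tables are hypothetical. Unlike the real `get_dist_backend()` and `get_ray_accelerator_name()`, the functions here take the accelerator type explicitly. The ROCm and CPU backend entries are not stated in the PR and are assumptions: ROCm builds of PyTorch expose RCCL under the `nccl` name, and `gloo` is the usual CPU backend.

```python
import os
from enum import Enum
from typing import Callable, Mapping, Optional, Sequence, Tuple


class AcceleratorType(str, Enum):
    # Lowercase string values are an assumption of this sketch.
    CUDA = "cuda"
    NPU = "npu"
    XPU = "xpu"
    PPU = "ppu"
    ROCM = "rocm"
    CPU = "cpu"


# CUDA/NPU/XPU/PPU entries come from the PR notes; ROCM and CPU are assumptions.
DIST_BACKEND = {
    AcceleratorType.CUDA: "nccl",
    AcceleratorType.NPU: "hccl",
    AcceleratorType.XPU: "xccl",
    AcceleratorType.PPU: "eccl",
    AcceleratorType.ROCM: "nccl",
    AcceleratorType.CPU: "gloo",
}

# Ray resource names from the PR notes; other types have no entry in this sketch.
RAY_RESOURCE = {
    AcceleratorType.CUDA: "GPU",
    AcceleratorType.ROCM: "GPU",
    AcceleratorType.NPU: "NPU",
    AcceleratorType.XPU: "XPU",
}

Probe = Tuple[AcceleratorType, Callable[[], bool]]


def detect_accelerator(
    probes: Sequence[Probe],
    env: Mapping[str, str] = os.environ,
) -> AcceleratorType:
    """Return the RELAX_DEVICE_TYPE override if set, else the first probe that succeeds, else CPU."""
    override = env.get("RELAX_DEVICE_TYPE")
    if override:
        return AcceleratorType(override.strip().lower())
    for accel, is_available in probes:  # probes are ordered by priority
        try:
            if is_available():
                return accel
        except Exception:
            # A probe for a missing vendor extension must not abort detection.
            continue
    return AcceleratorType.CPU


def get_dist_backend(accel: AcceleratorType) -> str:
    return DIST_BACKEND[accel]


def get_ray_accelerator_name(accel: AcceleratorType) -> Optional[str]:
    return RAY_RESOURCE.get(accel)
```

In real code a probe entry might be `(AcceleratorType.CUDA, torch.cuda.is_available)`. Injecting probes keeps the priority order explicit and lets the sketch run on machines without any vendor extension installed.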
---
# ♻️ Refactor
## Replace hardcoded `torch.cuda.*` calls across 20+ files
- Replace `torch.cuda.current_device()` → `device_utils.current_device()`
- Replace `torch.cuda.set_device()` → `device_utils.set_device()`
- Replace `torch.cuda.synchronize()` → `device_utils.synchronize()`
- Replace `torch.cuda.empty_cache()` → `device_utils.empty_cache()`
- Replace `torch.cuda.Stream/Event` → `device_utils.Stream()/Event()`
- Replace `torch.cuda.mem_get_info()` → `device_utils.mem_get_info()`
- Replace `torch.cuda.device_count()` → `device_utils.device_count()`
- Replace `torch.device("cuda:...")` → `device_utils.make_current_torch_device()`
- Replace `device="cuda"` → `device=device_utils.get_device_name()`
- Replace `"nccl"` backend → `device_utils.get_dist_backend()`
- Replace `"GPU"` Ray resource → `device_utils.get_ray_accelerator_name()`
- Replace `CUDA_VISIBLE_DEVICES` → `device_utils.get_visible_devices_env_var()`
- Wrap CUDA-specific memory profiling APIs with `hasattr` guards
- Add CUDA-only annotation to `int4_qat/setup.py` kernel build script
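The `hasattr` guard pattern for CUDA-specific memory profiling can be sketched like this. `maybe_record_memory_history` is a hypothetical helper. In real code `memory_module` would be something like `torch.cuda.memory`, whose private `_record_memory_history` API is CUDA-only. On other backends the sketch logs a warning and skips profiling instead of raising `AttributeError`.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def maybe_record_memory_history(memory_module: Any, **kwargs: Any) -> bool:
    """Start allocator history recording only if the backend exposes the API."""
    if not hasattr(memory_module, "_record_memory_history"):
        # Non-CUDA backends: degrade gracefully instead of crashing the run.
        logger.warning("memory history recording is not supported on this backend; skipping")
        return False
    memory_module._record_memory_history(**kwargs)
    return True
```

Returning a boolean lets callers skip the matching snapshot or dump step when recording never started.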
---
# ⭐ Feature
## Add Qwen3.5-9B single-node fully-async training script
- Add `run_deepeyes_qwen35_9B_async.sh` for 8xGPU fully-async DeepEyes training
- Resource layout: actor(4) + rollout(2) + reference(1) + actor_fwd(1)
- Use `--use-dynamic-batch-size` and `--no-rope-fusion` per the latest Qwen3.5 conventions
---
# 🐛 Bug Fix
## Add 0-1000 normalized bbox coordinate conversion
- Qwen-VL/Qwen2-VL/Qwen3-VL output 0-1000 normalized coords, but `_maybe_resize_bbox` treated them as absolute pixels
- Add a coordinate conversion step before clamping in `_maybe_resize_bbox`
- Add a `normalize_bbox` parameter to `DeepeyesEnv` (default True) for model-specific control
- Qwen2.5-VL users can set `normalize_bbox: false`, since that model outputs absolute pixel coords
- Wire `normalize_bbox` through `build_env` from the custom config
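A minimal sketch of the convert-then-clamp order described in the bbox fix. `maybe_resize_bbox` is a hypothetical stand-in for `_maybe_resize_bbox`. It assumes boxes are `[x1, y1, x2, y2]` and that clamping targets `[0, width]` and `[0, height]`. The real method may clamp to `width - 1` or round to integers.

```python
from typing import List, Sequence


def maybe_resize_bbox(
    bbox: Sequence[float],
    width: int,
    height: int,
    normalize_bbox: bool = True,
) -> List[float]:
    """Map [x1, y1, x2, y2] to pixel space, then clamp to the image bounds."""
    x1, y1, x2, y2 = bbox
    if normalize_bbox:
        # Qwen-VL style outputs live on a 0-1000 grid: scale each axis to pixels.
        x1, x2 = x1 * width / 1000, x2 * width / 1000
        y1, y2 = y1 * height / 1000, y2 * height / 1000
    # Clamp after conversion so out-of-range model outputs stay inside the image.
    x1, x2 = (min(max(v, 0), width) for v in (x1, x2))
    y1, y2 = (min(max(v, 0), height) for v in (y1, y2))
    return [x1, y1, x2, y2]
```

For a 640x480 image, the normalized box `[100, 200, 500, 1000]` maps to `[64.0, 96.0, 320.0, 480.0]`. Clamping the raw values first would have left the box nearly unchanged and mislocated.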
…sion
# 🐛 Bug Fix
## Fix IndexError on 1D tensor transpose in fully-async weight sync
- Qwen3.5 Bridge outputs expert `gate_up_proj` as 2D `[2*H, D]` (cat, no transpose), unlike Qwen3-VL, which outputs 3D `[2, D_out, D_in]` (stack + transpose)
- Qwen3.5 `ExpertMLPDownProjMapping` inherits `AutoMapping` (no transpose), unlike Qwen3-VL, which overrides `megatron_to_hf` with a transpose
- Add ndim-based branching in `_convert_to_hf_bridge` post-processing: 3D → Qwen3-VL path (undo transpose + index), 2D → Qwen3.5 path (chunk)
- Detect `bridge_expert_transposes_down` at init time via `__dict__` introspection to decide whether `down_proj` needs to be un-transposed
---
# ✅ Tests
## Add Qwen3.5 Bridge expert weight conversion tests
- Add `TestQwen35BridgeMappingOutput`: verify 2D cat output, no-transpose `down_proj`, and `megatron_to_hf` override detection
- Add `TestQwen35PostProcessingCorrectness`: end-to-end `gate_up` split and `down_proj` passthrough correctness
- Update the `_apply_expert_postprocessing` helper to accept a `bridge_expert_transposes_down` param
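The ndim-based branching and `__dict__` override check from the weight-sync fix can be sketched as follows. This is a shape-level illustration, not `_convert_to_hf_bridge` itself. The mapping classes are hypothetical stubs standing in for the Bridge mappings. NumPy arrays stand in for torch tensors so the sketch runs without a GPU stack; real code would use `torch.chunk` and `Tensor.transpose`. The assumption that "undo transpose" means swapping the last two axes before indexing is taken from the commit wording, not from the source.

```python
import numpy as np


class AutoMappingStub:
    """Hypothetical stand-in for a generic mapping: megatron_to_hf is a passthrough."""

    def megatron_to_hf(self, weight):
        return weight


class Qwen3VLDownProjStub(AutoMappingStub):
    """Hypothetical stand-in for a mapping that overrides megatron_to_hf with a transpose."""

    def megatron_to_hf(self, weight):
        return weight.T


class Qwen35DownProjStub(AutoMappingStub):
    """Hypothetical stand-in for a mapping that inherits the passthrough unchanged."""


def bridge_transposes_down(mapping_cls: type) -> bool:
    # Only attributes defined on the class itself live in its __dict__,
    # so an inherited megatron_to_hf does not count as an override.
    return "megatron_to_hf" in mapping_cls.__dict__


def split_gate_up(weight: np.ndarray):
    """Split a bridge gate_up_proj output into (gate, up), branching on ndim."""
    if weight.ndim == 3:
        # Qwen3-VL style [2, D_out, D_in]: undo the transpose, then index.
        restored = np.swapaxes(weight, -1, -2)
        return restored[0], restored[1]
    if weight.ndim == 2:
        # Qwen3.5 style [2*H, D]: gate and up were concatenated along dim 0.
        gate, up = np.split(weight, 2, axis=0)
        return gate, up
    raise ValueError(f"unexpected gate_up_proj rank {weight.ndim}")


def postprocess_down(weight: np.ndarray, transposes_down: bool) -> np.ndarray:
    # Un-transpose only when the bridge mapping applied a transpose.
    return weight.T if transposes_down else weight
```

Computing `bridge_transposes_down` once at init time, as the commit describes, avoids repeating the class introspection on every weight sync.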
NINGBENZHE approved these changes on Apr 28, 2026.
Routine internal -> external sync.