Background
After #1310 introduced the colocated CUDA-IPC weight transfer path, the awex Megatron adapter hardcodes fp8_direct_convert=False and quantization_config=None when iterating HF parameters. As a result, a Megatron model trained with FP8EngineConfig is silently dequantized to BF16 before being transferred to the inference engine through awex, even though the legacy xccl path already supports FP8 end-to-end.
Evidence
- Hardcoded downgrade: areal/experimental/weight_update/awex/megatron_adapter.py:288-289
      gathered = all_gather_param(
          mcore_name,
          param,
          fp8_direct_convert=False,  # forces BF16
          quantization_config=None,  # disables FP8 quantize_params
          duplicated_param_names=...,
      )
- The xccl path uses the same all_gather_param + convert_to_hf but plumbs self.fp8_direct_convert and self.quantization_config through (megatron_engine.py:1471-1486, 1511-1521), with FP8 gather wired in megatron_utils/megatron.py:86-122 (_all_gather_fp8_tensor_and_concat).
- Repo-wide grep for fp8|float8 under areal/experimental/weight_update/ returns only the two hardcoded lines. The receiving SGLang adapter has no FP8 awareness either.
Impact for users running Megatron+FP8 with awex colocated mode:
- Silent correctness gap: no error or warning; users may believe FP8 is end-to-end while rollout always sees BF16 weights.
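- Bandwidth regression vs xccl: weight transfer moves roughly 2× the bytes (BF16 at 2 bytes per element vs FP8 at 1 byte per element, plus comparatively small scale tensors).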
- Inference cannot leverage FP8 acceleration because it never receives FP8 weights and scales.
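To put a rough number on the bandwidth point, the sketch below estimates the bytes moved for a single weight matrix in BF16 versus FP8 with its accompanying scale tensor. The 128×128 scaling block, the FP32 scale dtype, and the matrix shape are assumptions chosen for illustration; none of them come from the awex or Megatron code.

```python
import math


def transfer_bytes(rows, cols, fmt, block=128):
    """Estimate bytes sent for one [rows, cols] weight.

    Assumption (not taken from the codebase): FP8 weights carry one FP32
    scale per block x block tile, stored in a separate *.weight_scale_inv
    tensor.
    """
    if fmt == "bf16":
        return rows * cols * 2  # 2 bytes per element
    if fmt == "fp8":
        scale_elems = math.ceil(rows / block) * math.ceil(cols / block)
        return rows * cols * 1 + scale_elems * 4  # 1-byte weights + FP32 scales
    raise ValueError(f"unknown format: {fmt}")


rows, cols = 4096, 11008  # hypothetical MLP projection shape
bf16 = transfer_bytes(rows, cols, "bf16")
fp8 = transfer_bytes(rows, cols, "fp8")
print(f"bf16={bf16} bytes, fp8={fp8} bytes, ratio={bf16 / fp8:.3f}")
```

Under these assumptions the scale tensor adds only a few kilobytes per matrix, so the ratio stays very close to 2.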
Potential Solution
awex matches sender/receiver by parameter name through _iter_hf_params() → ParameterMeta → TransferPlan. Enabling FP8 introduces an extra *.weight_scale_inv tensor per quantized weight, which BOTH ends must declare.
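The toy example below illustrates why both ends must declare the scale tensors. The ParameterMeta dataclass here is a hypothetical stand-in with made-up fields, not the real awex class, and the parameter name and shapes are invented; the point is only that a name-based match leaves the sender's *.weight_scale_inv entry without a counterpart when the receiver is FP8-unaware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterMeta:  # hypothetical stand-in, not the real awex class
    name: str
    dtype: str
    shape: tuple


def unmatched_names(sender, receiver):
    """Return (only_on_sender, only_on_receiver) as sorted name lists."""
    s = {m.name for m in sender}
    r = {m.name for m in receiver}
    return sorted(s - r), sorted(r - s)


def dtype_mismatches(sender, receiver):
    """Return names declared on both sides but with different dtypes."""
    r = {m.name: m.dtype for m in receiver}
    return sorted(m.name for m in sender if m.name in r and r[m.name] != m.dtype)


w = "model.layers.0.mlp.down_proj.weight"  # made-up parameter name
sender = [
    ParameterMeta(w, "float8_e4m3fn", (4096, 11008)),
    ParameterMeta(w + "_scale_inv", "float32", (32, 86)),
]
receiver = [ParameterMeta(w, "bfloat16", (4096, 11008))]  # FP8-unaware side

only_sender, only_receiver = unmatched_names(sender, receiver)
print("only on sender:", only_sender)
print("only on receiver:", only_receiver)
print("dtype mismatches:", dtype_mismatches(sender, receiver))
```

A check along these lines, run while building the plan, would also turn the current silent downgrade into an explicit error.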
Stage 1 - Sender (Megatron adapter)
In awex/megatron_adapter.py:_iter_hf_params, replace hardcoded values with the engine's existing fields (self._engine.fp8_direct_convert, self._engine.quantization_config, self._engine.hf_config). Pattern mirrors the xccl path in megatron_engine.py:1471-1521.
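A minimal sketch of the Stage 1 plumbing follows. The all_gather_param function below is a stub that only echoes its arguments, MegatronAdapterSketch is a hypothetical class, and the quantization_config value is made up; only the two field names on the engine come from this issue. The real helper's signature may differ in detail.

```python
from types import SimpleNamespace


def all_gather_param(name, param, *, fp8_direct_convert, quantization_config,
                     duplicated_param_names=None):
    """Stub standing in for the real helper; it only echoes its arguments."""
    return {
        "name": name,
        "fp8_direct_convert": fp8_direct_convert,
        "quantization_config": quantization_config,
    }


class MegatronAdapterSketch:  # hypothetical, not the real awex adapter
    def __init__(self, engine):
        self._engine = engine

    def gather(self, mcore_name, param):
        # Read the FP8 settings from the engine instead of hardcoding
        # fp8_direct_convert=False / quantization_config=None.
        return all_gather_param(
            mcore_name,
            param,
            fp8_direct_convert=self._engine.fp8_direct_convert,
            quantization_config=self._engine.quantization_config,
            duplicated_param_names=None,
        )


# Fake engine carrying the two fields named in this issue (values made up).
engine = SimpleNamespace(
    fp8_direct_convert=True,
    quantization_config={"quant_method": "fp8"},
)
out = MegatronAdapterSketch(engine).gather("decoder.layers.0.weight", param=None)
print(out)
```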
Stage 2 - Receiver (SGLang adapter)
awex/sglang_adapter.py must mirror the FP8-aware iteration so that the inference side declares matching ParameterMeta for *.weight_scale_inv tensors, allowing the TransferPlan to align names across both ends.
Stage 3 - Tests
Add tests/experimental/weight_update/test_colocate_fp8.py: Megatron FP8EngineConfig enabled → awex colocated transfer → SGLang. Compare post-transfer weights (after dequant) against the xccl path baseline.
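The comparison step of such a test could look roughly like the sketch below, written with plain Python lists so it runs without a GPU. It assumes the convention that multiplying a quantized block by its weight_scale_inv entry recovers the weight, which should be checked against the real checkpoint format; the block size, values, and helper names are all made up for illustration.

```python
import math


def dequant(q, scale_inv, block):
    """Block-wise dequantization: w[i][j] = q[i][j] * scale_inv[i//block][j//block].

    Assumes multiplying by weight_scale_inv recovers the original weight.
    """
    return [
        [q[i][j] * scale_inv[i // block][j // block] for j in range(len(q[0]))]
        for i in range(len(q))
    ]


def allclose(a, b, rel_tol=1e-6, abs_tol=1e-6):
    """Element-wise closeness check for two equally shaped nested lists."""
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# Toy stand-ins for what each path delivers (values are made up).
block = 2
q_awex = [[1.0, 2.0, 4.0, 8.0], [3.0, 5.0, 6.0, 7.0]]
scale_inv_awex = [[0.5, 0.25]]  # one scale per 2x2 tile
w_xccl = [[0.5, 1.0, 1.0, 2.0], [1.5, 2.5, 1.5, 1.75]]  # pretend baseline

w_awex = dequant(q_awex, scale_inv_awex, block)
assert allclose(w_awex, w_xccl), "awex FP8 weights diverge from xccl baseline"
```

In the real test the two inputs would be read back from the SGLang side after each transfer path, and the tolerance would need to reflect FP8 rounding rather than the tight default used here.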
Reference implementation: the xccl path (MegatronEngine._collect_param and _impl_update_weight_from_distributed) already does the equivalent work end-to-end, so Stage 1 is largely a matter of plumbing existing fields through.
Additional Information
Scope constraint: PR #1310 limits colocated mode to TP=PP=EP=1. This work should follow the same scope; broader parallelism support depends on the awex roadmap.
Open question: how should the SGLang side discover FP8 configuration — parse from the running SGLang server, or be told via the gateway / KV store metadata?
Related: