[Feature] Support FP8 weight transfer in awex colocated (CUDA-IPC) path #1359

## Checklist

- [x] This feature will maintain backward compatibility with the current APIs in
  `areal/api/`. If not, please raise a refactor issue first.

## Background

After #1310 introduced the colocated CUDA-IPC weight transfer path, the awex Megatron adapter hardcodes `fp8_direct_convert=False` and `quantization_config=None` when iterating HF parameters. As a result, a Megatron model trained with `FP8EngineConfig` is silently dequantized to BF16 before its weights are transferred to the inference engine through awex, even though the legacy xccl path already supports FP8 end-to-end.

**Evidence**

- Hardcoded downgrade: `areal/experimental/weight_update/awex/megatron_adapter.py:288-289`

  ```python
  gathered = all_gather_param(
      mcore_name,
      param,
      fp8_direct_convert=False,    # forces BF16
      quantization_config=None,    # disables FP8 quantize_params
      duplicated_param_names=...,
  )
  ```

- The xccl path uses the same `all_gather_param` + `convert_to_hf` calls but plumbs `self.fp8_direct_convert` and `self.quantization_config` through (`megatron_engine.py:1471-1486`, `1511-1521`), with the FP8 gather implemented in `megatron_utils/megatron.py:86-122` (`_all_gather_fp8_tensor_and_concat`).

- A repo-wide grep for `fp8|float8` under `areal/experimental/weight_update/` returns only the two hardcoded lines above. The receiving SGLang adapter has no FP8 awareness either.

**Impact** for users running Megatron+FP8 with awex colocated mode:

1. Silent correctness gap: there is no error or warning, so users may believe FP8 is end-to-end while rollout always sees BF16 weights.
2. Bandwidth regression vs. xccl: weight transfer moves roughly 2× the bytes (BF16 at 2 bytes per element vs. FP8 at 1 byte per element plus the small scale tensors).
3. Inference cannot leverage FP8 acceleration because it never receives FP8 weights and scales.
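To put the bandwidth point in numbers, here is a small sketch that computes the per-weight transfer size for BF16 versus FP8. It assumes blockwise FP8 with one float32 `weight_scale_inv` entry per 128×128 tile, a common layout for blockwise FP8 checkpoints; the real block size comes from the model's `quantization_config` and may differ. `transfer_bytes` is a hypothetical helper, not AReaL code.

```python
import math

# Assumptions (not taken from the AReaL code): blockwise FP8 with 128x128 tiles
# and one float32 scale per tile.
BF16_BYTES = 2
FP8_BYTES = 1
SCALE_BYTES = 4


def transfer_bytes(rows: int, cols: int, block: int = 128) -> tuple[int, int]:
    """Return (bf16_bytes, fp8_bytes_including_scales) for one 2-D weight."""
    bf16 = rows * cols * BF16_BYTES
    num_scales = math.ceil(rows / block) * math.ceil(cols / block)
    fp8 = rows * cols * FP8_BYTES + num_scales * SCALE_BYTES
    return bf16, fp8


if __name__ == "__main__":
    bf16, fp8 = transfer_bytes(4096, 4096)
    print(f"BF16: {bf16} B, FP8 + scales: {fp8} B, ratio: {bf16 / fp8:.4f}")
```

For a 4096×4096 projection the scales add 1,024 floats (4 KiB) on top of 16 MiB of FP8 data, so the ratio lands just under 2×.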

## Potential Solution

awex matches sender and receiver by parameter name through `_iter_hf_params()` → `ParameterMeta` → `TransferPlan`. Enabling FP8 introduces an extra `*.weight_scale_inv` tensor per quantized weight, which BOTH ends must declare.
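The name-matching requirement can be illustrated with a minimal sketch: each side derives the set of names it would declare, and any name present on only one side cannot be paired in a transfer plan. The helpers `declared_names` and `unmatched` are hypothetical, and the set of quantized weights is passed in explicitly because which modules get FP8 scales is model-specific (norms, for example, usually stay in higher precision).

```python
def declared_names(param_names, quantized, fp8_enabled):
    """Names one side would declare; FP8 adds a *_scale_inv companion per quantized weight."""
    names = set(param_names)
    if fp8_enabled:
        # "x.weight" + "_scale_inv" -> "x.weight_scale_inv"
        names |= {f"{n}_scale_inv" for n in param_names if n in quantized}
    return names


def unmatched(sender, receiver):
    """Names declared by only one side; these cannot be aligned in a transfer plan."""
    return sender - receiver, receiver - sender


params = ["model.layers.0.mlp.down_proj.weight", "model.layers.0.input_layernorm.weight"]
quantized = {"model.layers.0.mlp.down_proj.weight"}

sender = declared_names(params, quantized, fp8_enabled=True)
receiver = declared_names(params, quantized, fp8_enabled=False)  # FP8-unaware receiver
only_sender, only_receiver = unmatched(sender, receiver)
```

Here `only_sender` contains the `down_proj.weight_scale_inv` name, which is exactly the gap Stage 2 has to close on the SGLang side.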

**Stage 1 - Sender (Megatron adapter)**

In `awex/megatron_adapter.py:_iter_hf_params`, replace the hardcoded values with the engine's existing fields (`self._engine.fp8_direct_convert`, `self._engine.quantization_config`, `self._engine.hf_config`). The pattern mirrors the xccl path in `megatron_engine.py:1471-1521`.
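A minimal sketch of the Stage 1 change, with hypothetical stand-ins so it runs on its own: `EngineFields` mimics the engine attributes named above and the local `all_gather_param` stub only echoes its FP8 arguments (the real function gathers tensors). The real method presumably also forwards `hf_config` and the same FP8 settings to `convert_to_hf`, as the xccl path does; that step is omitted here.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Optional, Tuple


@dataclass
class EngineFields:
    """Hypothetical stand-in for the MegatronEngine fields named in Stage 1."""
    fp8_direct_convert: bool = False
    quantization_config: Optional[dict] = None
    hf_config: Any = None


def all_gather_param(name, param, *, fp8_direct_convert, quantization_config,
                     duplicated_param_names=None):
    """Stub for the real all_gather_param: records the FP8 settings it received."""
    return {
        "name": name,
        "param": param,
        "fp8_direct_convert": fp8_direct_convert,
        "quantization_config": quantization_config,
    }


class MegatronAdapterSketch:
    def __init__(self, engine: EngineFields):
        self._engine = engine

    def _iter_hf_params(self, named_params: Iterable[Tuple[str, Any]]) -> Iterator[dict]:
        for mcore_name, param in named_params:
            # Previously hardcoded to False / None, which forced BF16.
            # Forward the engine's own settings instead, as the xccl path does.
            yield all_gather_param(
                mcore_name,
                param,
                fp8_direct_convert=self._engine.fp8_direct_convert,
                quantization_config=self._engine.quantization_config,
            )
```

With a default `EngineFields()` the behavior is unchanged (BF16), so non-FP8 runs keep working; the example `quantization_config` dict in the test below is illustrative only.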

**Stage 2 - Receiver (SGLang adapter)**

`awex/sglang_adapter.py` must mirror the FP8-aware iteration so that the inference side declares matching `ParameterMeta` entries for the `*.weight_scale_inv` tensors, allowing the `TransferPlan` to align names across both ends.
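On the receiver side, the extra metas can in principle be derived from the parameter shapes plus the block size in the quantization config. The sketch below is hypothetical (`receiver_metas` is not an existing function); it assumes the HF-style `weight_block_size` key with a 128×128 default, uses illustrative dtype strings, and takes the set of quantized names explicitly.

```python
import math
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]


def receiver_metas(
    param_shapes: Dict[str, Shape],
    quantization_config: Optional[dict],
    quantized: set,
) -> Dict[str, Tuple[Shape, str]]:
    """Map each name the receiver should declare to (shape, dtype)."""
    if quantization_config is None:
        # No FP8 config: declare every tensor as BF16, as the current path does.
        return {name: (shape, "bfloat16") for name, shape in param_shapes.items()}

    block_rows, block_cols = quantization_config.get("weight_block_size", [128, 128])
    metas = {}
    for name, shape in param_shapes.items():
        if name in quantized:
            rows, cols = shape
            metas[name] = (shape, "float8_e4m3fn")
            # One scale per (block_rows x block_cols) tile, rounded up at the edges.
            metas[f"{name}_scale_inv"] = (
                (math.ceil(rows / block_rows), math.ceil(cols / block_cols)),
                "float32",
            )
        else:
            metas[name] = (shape, "bfloat16")
    return metas
```

Deriving shapes locally keeps the receiver from needing the sender's tensors up front; it only needs the same config and the same rule for which weights are quantized.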

**Stage 3 - Tests**

Add `tests/experimental/weight_update/test_colocate_fp8.py`: Megatron with `FP8EngineConfig` enabled → awex colocated transfer → SGLang. Compare post-transfer weights (after dequantization) against the xccl path baseline.
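The comparison step of that test could look roughly like the sketch below. It uses plain Python lists so it runs without a GPU (a real test would work on tensors read back from SGLang), and it assumes the usual blockwise convention where the dequantized weight is `weight * weight_scale_inv` broadcast per tile. All helper names are hypothetical. The default `atol=0.0` assumes both paths run the same quantization code and should match exactly; loosen it if that assumption does not hold.

```python
from typing import List

Matrix = List[List[float]]


def dequant_blockwise(q: Matrix, scale_inv: Matrix, block: int) -> Matrix:
    """Blockwise dequantization: w[i][j] = q[i][j] * scale_inv[i // block][j // block]."""
    return [
        [q[i][j] * scale_inv[i // block][j // block] for j in range(len(q[i]))]
        for i in range(len(q))
    ]


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    """Largest elementwise absolute difference between two same-shaped matrices."""
    assert len(a) == len(b) and all(len(x) == len(y) for x, y in zip(a, b)), "shape mismatch"
    return max((abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)), default=0.0)


def assert_matches_baseline(q, scale_inv, baseline, block=128, atol=0.0):
    """Fail if the awex-transferred weight, once dequantized, drifts from the xccl baseline."""
    diff = max_abs_diff(dequant_blockwise(q, scale_inv, block), baseline)
    assert diff <= atol, f"max |awex - xccl| = {diff} exceeds atol={atol}"
```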

Reference implementation: the xccl path (`MegatronEngine._collect_param` and `_impl_update_weight_from_distributed`) already does the equivalent work end-to-end, so Stage 1 is largely a matter of plumbing existing fields through.

## Additional Information

Scope constraint: PR #1310 limits colocated mode to TP=PP=EP=1. This work should follow the same scope; broader parallelism support depends on the awex roadmap.
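Since the Impact list calls out silent behavior as the main problem, one option is to have the FP8 path fail loudly when asked to run outside this scope. A minimal sketch, with a hypothetical function name and error type:

```python
def check_colocated_fp8_scope(tp: int, pp: int, ep: int) -> None:
    """Reject parallel layouts outside the colocated-mode scope set by #1310."""
    if (tp, pp, ep) != (1, 1, 1):
        raise NotImplementedError(
            "awex colocated FP8 transfer supports TP=PP=EP=1 only, "
            f"got TP={tp}, PP={pp}, EP={ep}"
        )
```

Raising rather than falling back to BF16 keeps the scope limit visible to the user instead of reproducing the silent downgrade this issue describes.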

Open question: how should the SGLang side discover the FP8 configuration: parse it from the running SGLang server, or receive it via the gateway / KV store metadata?
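For discussion, here is a minimal sketch of the second option, with a plain dict standing in for the KV store. The key name and JSON encoding are hypothetical, not an existing awex or AReaL convention.

```python
import json
from typing import Optional

# Hypothetical key; awex/AReaL may use a different naming scheme.
FP8_CONFIG_KEY = "awex/weight_update/quantization_config"


def publish_quantization_config(kv: dict, quantization_config: Optional[dict]) -> None:
    """Sender side: store the config (or null for BF16) as JSON."""
    kv[FP8_CONFIG_KEY] = json.dumps(quantization_config)


def discover_quantization_config(kv: dict) -> Optional[dict]:
    """Receiver side: read the config; a missing key or a null value both mean BF16."""
    raw = kv.get(FP8_CONFIG_KEY)
    return None if raw is None else json.loads(raw)
```

Publishing from the sender keeps both ends derived from the one config that actually decides whether to quantize; parsing the running SGLang server avoids a new metadata channel but would rely on the server having been launched with a matching quantization setting.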

**Related**:
- #1302 - parent low-precision training direction
- #1310 - introduced the awex colocated path
