Description
Ray Direct Transport (RDT) supports pluggable tensor transports. Upstream Ray provides:
- Collective transports (Gloo, NCCL): require create_collective_group
- Point-to-point transports (NIXL): one-sided RDMA P2P, typically without a collective group
On Ascend, #9 / PR #21 add HCCL as the collective RDT transport (NCCL/Gloo analogue). This issue requests HiXL as a separate point-to-point RDT tensor transport (NIXL analogue).
| | HCCL (#9) | HiXL (this issue) |
|---|---|---|
| Ray analogue | NCCL / Gloo | NIXL |
| Model | Collective; create_collective_group required | P2P, one-sided RDMA |
| API | @ray.method(tensor_transport="HCCL") + HCCL group | @ray.method(tensor_transport="hixl"); no group for typical P2P transfers |
HiXL references
- Repository: CANN/hixl, the Ascend one-sided communication library (Huawei Xfer Library)
Proposed scope: implement and register a HiXL-backed HiXLTensorTransport in ray-ascend (e.g. via register_tensor_transport("hixl", ["npu"], HiXLTensorTransport)), covering memory registration, one-sided P2P transfer, lifecycle/cleanup, docs, and a minimal two-actor example that does not call create_collective_group. The docs should also explain when to use hixl versus HCCL.
Related: #9, PR #21
Use Case
- Dynamic actor-to-actor tensor handoff: Prefill → Decode, pipeline stage handoffs, or ad-hoc weight shards between actors that are not in a fixed HCCL collective group.
- KV cache / activation transfer: low-latency, low-copy P2P moves aligned with Ascend inference stacks that already use HiXL (e.g. KV pool / PD-disaggregation paths; see also the vLLM-Ascend KV pool guide).
- Parity with Ray RDT on GPU: the same choice as upstream, with HCCL for collective/group workloads (#9) and HiXL for NIXL-style P2P when object-store serialization is too expensive.
- ray.get across actors: one-sided P2P fits fetch patterns where the caller is not part of an HCCL collective group, a case collective RDT transports handle poorly.
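The rule of thumb running through these use cases (HCCL when both endpoints already belong to a common collective group, HiXL otherwise) can be written down as a small helper. This is a hypothetical sketch for documentation purposes only: choose_tensor_transport and the group-membership mapping are invented names, not part of ray-ascend or Ray, and a real decision may also weigh tensor size and whether the workload is genuinely collective.

```python
from typing import Dict, Set


def choose_tensor_transport(src: str, dst: str,
                            hccl_groups: Dict[str, Set[str]]) -> str:
    """Return "HCCL" if src and dst share an HCCL collective group, else "hixl".

    hccl_groups maps a group name to the set of actor names in that group
    (a hypothetical bookkeeping structure, for illustration only).
    """
    for members in hccl_groups.values():
        if src in members and dst in members:
            # Both endpoints are already in a collective group, so the
            # collective transport (NCCL/Gloo analogue) is available.
            return "HCCL"
    # No shared group (dynamic handoff, ray.get from an outside caller):
    # fall back to one-sided P2P (NIXL analogue), which needs no group.
    return "hixl"


groups = {"tp0": {"worker0", "worker1"}}
same_group = choose_tensor_transport("worker0", "worker1", groups)
cross_group = choose_tensor_transport("prefill0", "decode3", groups)
```

Here same_group is "HCCL" because both workers sit in tp0, while the Prefill → Decode pair shares no group and gets "hixl".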