
flagos-ai/FlagCX


About

FlagCX is part of FlagOS, a unified, open-source AI system software stack that aims to foster an open technology ecosystem by seamlessly integrating various models, systems, and chips. Under the principle of "develop once, migrate across various chips", FlagOS aims to unlock the full computational potential of hardware, break down the barriers between different chip software stacks, and effectively reduce migration costs.

FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.

FlagCX leverages native collective communication libraries to provide full single-chip communication support across platforms. Beyond its native x-CCL integrations, FlagCX introduces original device-buffer IPC and device-buffer RDMA technologies, enabling high-performance P2P operations for both cross-chip and single-chip scenarios. These mechanisms can be seamlessly combined with native x-CCL backends to deliver optimized performance for cross-chip collective communications.
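To give a concrete feel for how these P2P paths surface to applications, below is a minimal sketch of a two-rank send/recv issued through torch.distributed. It assumes the FlagCX Torch plugin described later in this README is installed, is importable as a Python module named flagcx, and registers a backend string "flagcx"; the module name, the backend string, and the "cuda" device string are assumptions for illustration, not confirmed API details.

```python
# Minimal two-rank P2P sketch. Launch with: torchrun --nproc_per_node=2 p2p_demo.py
# Assumptions: the FlagCX Torch plugin is importable as `flagcx` and registers a
# backend named "flagcx"; the device string "cuda" applies to NVIDIA-like stacks
# and may differ on other chips.
import torch
import torch.distributed as dist

import flagcx  # noqa: F401  (hypothetical module name; registers the backend)

dist.init_process_group(backend="flagcx")
rank = dist.get_rank()
tensor = torch.arange(4, dtype=torch.float32, device="cuda")

if rank == 0:
    dist.send(tensor, dst=1)          # P2P send over the device-buffer path
else:
    recv = torch.empty_like(tensor)
    dist.recv(recv, src=0)            # P2P recv from rank 0
    print(f"rank {rank} received:", recv)

dist.destroy_process_group()
```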

Backend Support

The following table summarizes the currently supported communication backends and their corresponding capabilities.

| Backend | NCCL | IXCCL | CNCL | MCCL | XCCL | DUCCL | HCCL | MUSACCL | RCCL | TCCL |
|---|---|---|---|---|---|---|---|---|---|---|
| Mode | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero |
| send | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| recv | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| broadcast | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| gather | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ☓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| scatter | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| reduce | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| allreduce | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| allgather | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| reducescatter | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| alltoall | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| alltoallv | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| group ops | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |

Note that the Homo and Hetero modes refer to communication within homogeneous clusters and across heterogeneous clusters, respectively. All native collective communication libraries can be referenced through the links below (in alphabetical order):

  • CNCL, Cambricon Communications Library.
  • DUCCL, DU Collective Communications Library.
  • HCCL, Ascend Communications Library.
  • IXCCL, Iluvatar Corex Collective Communications Library.
  • MCCL, Metax Collective Communications Library.
  • MUSACCL, Musa Collective Communications Library.
  • NCCL, NVIDIA Collective Communications Library.
  • RCCL, ROCm Communication Collectives Library.
  • TCCL, TsingMicro Communication Collectives Library.
  • XCCL, Kunlunxin XPU Collective Communications Library.

Additionally, FlagCX supports three collective communication libraries for host-side communication:

  • BOOTSTRAP: Host-side communication library built using the FlagCX bootstrap component.
  • GLOO: Gloo Collective Communications Library.
  • MPI: Message Passing Interface (MPI) standard.

Application Integration

FlagCX integrates with upper-layer applications such as PyTorch and PaddlePaddle. The table below lists the frameworks supported by FlagCX and their related communication operations, where the batch_XXX and XXX_coalesced ops refer to the usage of group primitives.

Framework PyTorch PaddlePaddle
send
recv
all_gather
all_gather_into_tensor_coalesced ✓ (in order, no aggregation)
all_reduce
all_reduce_coalesced ✓ (in order, no aggregation)
all_to_all
all_to_all_single
barrier
batch_isend_irecv
broadcast
gather
reduce
reduce_scatter
reduce_scatter_tensor_coalesced ✓ (in order, no aggregation)
scatter

Note that PyTorch support is enabled via the FlagCX Torch plugin, which provides native integration with the PyTorch distributed backend. This plugin has undergone comprehensive validation across diverse communication backends and hardware platforms, ensuring robust functionality, consistent performance, and compatibility in multi-chip heterogeneous environments.
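As one illustration of the group primitives mentioned above (batch_isend_irecv and the *_coalesced ops), the hedged sketch below batches a pair of point-to-point operations into a single group through torch.distributed. As before, the flagcx module name, the "flagcx" backend string, and the "cuda" device string are assumptions that may need adjusting for your platform.

```python
# Hedged sketch of a batched isend/irecv exchange (a "group ops" use case):
# each rank sends to its right neighbour and receives from its left neighbour
# in one batch. Assumes the FlagCX plugin registers the "flagcx" backend.
import torch
import torch.distributed as dist

import flagcx  # noqa: F401  (hypothetical module name)

dist.init_process_group(backend="flagcx")
rank, world = dist.get_rank(), dist.get_world_size()

send_buf = torch.full((8,), float(rank), device="cuda")
recv_buf = torch.empty(8, device="cuda")

ops = [
    dist.P2POp(dist.isend, send_buf, (rank + 1) % world),
    dist.P2POp(dist.irecv, recv_buf, (rank - 1) % world),
]
for req in dist.batch_isend_irecv(ops):  # issued as one grouped operation
    req.wait()

dist.destroy_process_group()
```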

FlagCX Backend NCCL IXCCL CNCL MCCL XCCL DUCCL HCCL MUSACCL RCCL
PyTorch Support

Tip

To enable heterogeneous cross-chip communication using the PyTorch DDP FlagCX backend, it is recommended to use identical PyTorch versions across all nodes. Mismatched versions may lead to initialization failures during process group setup.
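For reference, a hedged DDP sketch consistent with the tip above is shown below: every node runs the same script on the same PyTorch version, and only the rendezvous variables set by the launcher differ per process. The "flagcx" backend string, the flagcx module name, and the CUDA device calls are assumptions, not confirmed details.

```python
# Hedged DDP sketch: identical script and PyTorch version on every node; RANK,
# WORLD_SIZE, MASTER_ADDR, and LOCAL_RANK come from the launcher (e.g. torchrun).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

import flagcx  # noqa: F401  (hypothetical module name)

dist.init_process_group(backend="flagcx")   # backend string assumed
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)           # replace with the vendor device API off NVIDIA

model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 1024, device="cuda")
loss = model(x).square().mean()             # gradients are all-reduced via FlagCX
loss.backward()
opt.step()

dist.destroy_process_group()
```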

Quick Start

Please check the guides on building and testing the software:

Training Models

After building and testing FlagCX, you can start training models with upper-layer deep learning frameworks such as PyTorch or PaddlePaddle, using FlagCX as the communication backend. We provide detailed user guides for both homogeneous and heterogeneous training across different hardware platforms. Please refer to the docs below:

Contribution

  • We warmly welcome community contributions to help expand and strengthen the validation matrix.

  • Join our Discussion Channel

    开源小助手 (Open-Source Assistant)

License

This project is licensed under the Apache License (Version 2.0).