FlagCX is part of FlagOS, a unified, open-source AI system software stack that aims to foster an open technology ecosystem by seamlessly integrating various models, systems, and chips. Guided by the principle of "develop once, migrate across various chips," FlagOS seeks to unlock the full computational potential of hardware, break down the barriers between chip-specific software stacks, and effectively reduce migration costs.
FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.
FlagCX leverages native collective communication libraries to provide full single-chip communication support across platforms. Beyond its native x-CCL integrations, FlagCX introduces original device-buffer IPC and device-buffer RDMA technologies, enabling high-performance P2P operations for both cross-chip and single-chip scenarios. These mechanisms can be seamlessly combined with native x-CCL backends to deliver optimized performance for cross-chip collective communications.
The following table summarizes the currently supported communication backends and their corresponding capabilities.
| Backend | NCCL | IXCCL | CNCL | MCCL | XCCL | DUCCL | HCCL | MUSACCL | RCCL | TCCL |
|---|---|---|---|---|---|---|---|---|---|---|
| Mode | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero |
| send | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| recv | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| broadcast | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| gather | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ☓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| scatter | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| reduce | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| allreduce | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| allgather | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| reducescatter | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| alltoall | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| alltoallv | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
| group ops | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ |
Note that the Homo and Hetero modes refer to communication within homogeneous and across heterogeneous clusters, respectively. The native collective communication libraries are listed below (in alphabetical order):
- CNCL, Cambricon Communications Library.
- DUCCL, DU Collective Communications Library.
- HCCL, Ascend Communications Library.
- IXCCL, Iluvatar Corex Collective Communications Library.
- MCCL, Metax Collective Communications Library.
- MUSACCL, Musa Collective Communications Library.
- NCCL, NVIDIA Collective Communications Library.
- RCCL, ROCm Communication Collectives Library.
- TCCL, TsingMicro Communication Collectives Library.
- XCCL, Kunlunxin XPU Collective Communications Library.
Additionally, FlagCX supports three collective communication libraries for host-side communication:
- BOOTSTRAP: Host-side communication library built using the FlagCX bootstrap component.
- GLOO: Gloo Collective Communications Library.
- MPI: Message Passing Interface (MPI) standard.
FlagCX integrates with upper-layer applications such as PyTorch and PaddlePaddle.
The table below lists the frameworks supported by FlagCX and their related communication operations, where the batch_XXX and XXX_coalesced ops refer to the usage of group primitives (a short usage sketch follows the table).
| Framework | PyTorch | PaddlePaddle |
|---|---|---|
| send | ✓ | ✓ |
| recv | ✓ | ✓ |
| all_gather | ✓ | ✓ |
| all_gather_into_tensor_coalesced | ✓ (in order, no aggregation) | ☓ |
| all_reduce | ✓ | ✓ |
| all_reduce_coalesced | ✓ (in order, no aggregation) | ☓ |
| all_to_all | ✓ | ✓ |
| all_to_all_single | ✓ | ✓ |
| barrier | ✓ | ✓ |
| batch_isend_irecv | ✓ | ✓ |
| broadcast | ✓ | ✓ |
| gather | ✓ | ✓ |
| reduce | ✓ | ✓ |
| reduce_scatter | ✓ | ✓ |
| reduce_scatter_tensor_coalesced | ✓ (in order, no aggregation) | ☓ |
| scatter | ✓ | ✓ |
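For illustration, below is a minimal sketch of driving the batched group primitives from PyTorch. It assumes a process group has already been initialized with the FlagCX backend (see the initialization sketch further below) and uses only standard `torch.distributed` calls; the `ring_exchange` helper itself is hypothetical.

```python
import torch
import torch.distributed as dist

def ring_exchange(tensor: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: send `tensor` to the next rank and receive one
    from the previous rank via batch_isend_irecv, which FlagCX serves
    through its group primitives."""
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    send_to = (rank + 1) % world_size
    recv_from = (rank - 1) % world_size

    recv_buf = torch.empty_like(tensor)
    ops = [
        dist.P2POp(dist.isend, tensor, send_to),
        dist.P2POp(dist.irecv, recv_buf, recv_from),
    ]
    reqs = dist.batch_isend_irecv(ops)
    for req in reqs:
        req.wait()
    return recv_buf
```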
Note that PyTorch support is enabled via the FlagCX Torch plugin, which provides native integration with the PyTorch distributed backend. This plugin has undergone comprehensive validation across diverse communication backends and hardware platforms, ensuring robust functionality, consistent performance, and compatibility in multi-chip heterogeneous environments.
| FlagCX Backend | NCCL | IXCCL | CNCL | MCCL | XCCL | DUCCL | HCCL | MUSACCL | RCCL |
|---|---|---|---|---|---|---|---|---|---|
| PyTorch Support | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Tip: To enable heterogeneous cross-chip communication using the PyTorch DDP FlagCX backend, it is recommended to use identical PyTorch versions across all nodes. Mismatched versions may lead to initialization failures during process group setup.
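As a hedged sketch (not taken verbatim from the FlagCX documentation), process-group initialization with the plugin follows the standard `torch.distributed` setup. The backend name `"flagcx"` and the device selection below are assumptions; consult the plugin's build and usage guides for the exact registration details.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def init_flagcx_ddp(model: torch.nn.Module) -> DDP:
    # Assumption: the FlagCX Torch plugin is installed and registers a
    # backend named "flagcx" with torch.distributed.
    # Rank/world-size are read from the environment (e.g. set by torchrun).
    dist.init_process_group(backend="flagcx")

    # Device placement depends on the vendor runtime; CUDA is used here
    # purely as an illustration.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device("cuda", local_rank)
    model = model.to(device)
    return DDP(model, device_ids=[local_rank])
```

When launched with a standard launcher such as torchrun, DDP gradient synchronization then goes through FlagCX on each node.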
Please check the guides on building and testing the software:
After building and testing FlagCX, you can start training models with upper-layer deep learning frameworks such as PyTorch or PaddlePaddle, using FlagCX as the communication backend. We provide detailed user guides for both homogeneous and heterogeneous training across different hardware platforms. Please refer to the docs below:
-
We warmly welcome community contributions to help expand and strengthen the validation matrix.
-
Join our Discussion Channel
This project is licensed under the Apache License (Version 2.0).
