[Performance] enable openYuanrong RDMA support #98

@tianyi-ge

Description

Background

  1. On 910B machines, an additional RoCE NIC can be installed alongside the NPU-side RoCE. openYuanrong's RDMA transport can therefore be used whenever the hardware supports it.
  2. The openYuanrong KV client already supports UCX-based host RDMA transfer. Users only need to enable it in the startup config and pass CPU tensors to mset.

Changes

  1. In config.yaml, consider adding a new option, enable_rdma, which defaults to false. When it is enabled, append --enable_rdma true to the dscli startup command and set UCX_TLS=rc_x in its environment to force RDMA transfer.
  2. For debugging, users can set UCX_LOG_FILE at a higher level (e.g. in the environment when Ray starts up).
  3. The config.yaml comments must explain the option and link to the RDMA best-practices doc (https://pages.openeuler.openatom.cn/openyuanrong-datasystem/docs/zh-cn/latest/best_practices/best_practices_for_rdma.html).
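
The first two changes could be wired up roughly as follows. This is only a sketch under stated assumptions: the helper name apply_rdma_option, the shape of the parsed config, and the base command are hypothetical and do not come from the existing code. Only the enable_rdma option, the --enable_rdma true flag, and UCX_TLS=rc_x are taken from this issue.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def apply_rdma_option(
    config: Mapping[str, object],
    base_cmd: List[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (cmd, env) for the dscli launch, adjusted for enable_rdma.

    Hypothetical helper: `config` is assumed to be the already-parsed
    config.yaml, and `base_cmd` the existing dscli startup command.
    """
    cmd = list(base_cmd)
    # Copy the caller's environment so variables set at a higher level,
    # such as UCX_LOG_FILE, are passed through unchanged.
    env = dict(os.environ if base_env is None else base_env)

    # enable_rdma is false by default, as proposed in the issue.
    if bool(config.get("enable_rdma", False)):
        cmd += ["--enable_rdma", "true"]
        env["UCX_TLS"] = "rc_x"  # force the RDMA transport
    return cmd, env
```

The returned pair can be handed to whatever launches dscli today, for example subprocess.Popen(cmd, env=env). Because the environment is copied rather than rebuilt, a UCX_LOG_FILE exported when Ray starts would reach the dscli process without extra configuration.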
