UCX slower than TCP Socket #305

@luweizheng

I previously opened a thread in ucx-py. Since then I have replaced ucx-py with ucxx, which resolved the blocking issue. In terms of performance, however, UCX is still slower than TCP sockets. Over the past few days I have done some analysis and profiling, and I found that ucx's await endpoint.recv() consumes a lot of CPU time.

I currently maintain two repositories (xorbits/xoscar). They are data science tools similar to Dask that scale workloads such as pandas to a cluster. xoscar is the underlying actor framework, responsible for inter-process communication, serialization, and more, so it is somewhat similar to Dask's distributed package. All communication and resource management is handled by xoscar. As in Dask, xorbits has a supervisor for management and workers for computation. When a compute node starts xorbits/xoscar, xorbits sends messages such as that node's heartbeat to the supervisor through xoscar's communication mechanism. These management messages are mostly small transfers and occur at short intervals (a few times per second). Big transfers include shuffling dataframes. In summary, there are small transfers for management messages and big transfers for data shuffling.

In xoscar, actors communicate with each other through Channels, and we have implemented a UCXChannel.
The ucx code is on this page.

I used py-spy to profile xorbits/xoscar/ucxx and found that the await endpoint.recv() line consumes a lot of CPU time. The flame graph is shown below.

[Image: py-spy flame graph]
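One way to cross-check the flame graph is to compare wall-clock time with process CPU time around a single awaited call. The sketch below is only an illustration: idle_wait and polling_wait are hypothetical stand-ins for a receive that sleeps until data arrives and one that keeps the event loop spinning, and neither calls ucxx. Note that time.process_time() counts CPU time for the whole process, so other tasks running concurrently would be included in the figure.

```python
import asyncio
import time


async def measure(awaitable):
    """Return (result, wall_seconds, cpu_seconds) for one awaited call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = await awaitable
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


async def idle_wait(delay):
    # Hypothetical stand-in: a receive that sleeps until data arrives.
    await asyncio.sleep(delay)
    return "idle"


async def polling_wait(delay):
    # Hypothetical stand-in: a receive that yields to the loop repeatedly
    # instead of sleeping, so the process stays busy while waiting.
    deadline = time.perf_counter() + delay
    while time.perf_counter() < deadline:
        await asyncio.sleep(0)
    return "polling"


async def main():
    for make in (idle_wait, polling_wait):
        name, wall, cpu = await measure(make(0.2))
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

If the CPU time of a receive is close to its wall time while no data is arriving, the wait itself is burning CPU rather than the data handling.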

@pentschev mentioned in the previous thread that it's not surprising for TCP sockets to be faster than UCX. So what I want to confirm is whether there is a good solution for scenarios like mine, where small and big transfers are mixed, that reduces the CPU load of await endpoint.recv(). Or do I need to change the current design so that small transfers go through TCP and dataframe shuffling goes through UCX?
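To make the second option concrete, here is a minimal sketch of size-based routing. Everything in it is an assumption for illustration: RecordingChannel is a hypothetical in-memory stand-in rather than xoscar's actual Channel classes, SizeRoutedChannel is not an existing xoscar or ucxx API, and the 64 KiB cutoff is an arbitrary placeholder that would need to be tuned by measurement.

```python
import asyncio
from dataclasses import dataclass, field

# Assumed cutoff in bytes; not a measured or recommended value.
SMALL_MESSAGE_LIMIT = 64 * 1024


@dataclass
class RecordingChannel:
    """Hypothetical in-memory channel that only records what was sent."""

    name: str
    sent: list = field(default_factory=list)

    async def send(self, payload: bytes) -> None:
        self.sent.append(payload)


class SizeRoutedChannel:
    """Send small payloads on one channel and large payloads on another."""

    def __init__(self, small, large, limit=SMALL_MESSAGE_LIMIT):
        self._small = small
        self._large = large
        self._limit = limit

    def pick(self, payload: bytes):
        # Payloads up to and including the limit count as small.
        return self._small if len(payload) <= self._limit else self._large

    async def send(self, payload: bytes) -> str:
        channel = self.pick(payload)
        await channel.send(payload)
        return channel.name  # name of the channel that carried the payload


async def main():
    tcp = RecordingChannel("tcp")  # would wrap the TCP socket channel
    ucx = RecordingChannel("ucx")  # would wrap the UCX channel
    router = SizeRoutedChannel(tcp, ucx)
    print(await router.send(b"heartbeat"))        # small management message
    print(await router.send(bytes(1024 * 1024)))  # 1 MiB shuffle-sized payload


if __name__ == "__main__":
    asyncio.run(main())
```

With a split like this, the frequent heartbeat-style messages would never reach the UCX endpoint, at the cost of keeping two connections per peer.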

Labels: improvement, ucxx