This repository was archived by the owner on Sep 18, 2025. It is now read-only.

UCXUnreachable when running a benchmark with UCX_TLS=sm #1006

@rkooo567

I tried running the send_recv benchmark with:

UCX_TLS=sm python send_recv.py --server-dev 1 --client-dev 1 --object_type numpy --reuse-alloc --n-bytes 1024

The client seems unable to reach the server for some reason:

Server Running at 172.31.8.189:60173
Client connecting to server at 172.31.8.189:60173
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/work/ucx/ucx-py/ucp/benchmarks/send_recv.py", line 95, in client
    loop.run_until_complete(client.run())
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/site-packages/ucp/benchmarks/backends/ucp_async.py", line 117, in run
    ep = await ucp.create_endpoint(self.server_address, self.port)
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/site-packages/ucp/core.py", line 1004, in create_endpoint
    return await _get_ctx().create_endpoint(
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/site-packages/ucp/core.py", line 316, in create_endpoint
    peer_info = await exchange_peer_info(
  File "/home/ubuntu/.conda/envs/ucx/lib/python3.9/site-packages/ucp/core.py", line 54, in exchange_peer_info
    await comm.stream_recv(endpoint, peer_info_arr, peer_info_arr.nbytes)
ucp._libs.exceptions.UCXUnreachable: <stream_recv>: 
Traceback (most recent call last):
  File "/home/ubuntu/work/ucx/ucx-py/ucp/benchmarks/send_recv.py", line 395, in <module>
    main()
  File "/home/ubuntu/work/ucx/ucx-py/ucp/benchmarks/send_recv.py", line 387, in main
    assert not p2.exitcode
AssertionError

When I enable debug logs, I also see:

[1698940358.556109] [devbox:356803:0]          ucp_ep.c:3315 UCX  DEBUG ep 0x7f6202366000: calling user error callback 0x7f6203889e40 with arg 0x7f6203557820 and status Destination is unreachable
[1698940358.556225] [devbox:356803] UCXPY  DEBUG Error callback for endpoint 0x7f6202366000 called with status -6: Destination is unreachable
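For anyone triaging similar logs, here is a minimal sketch that pulls the endpoint, status code, and message out of the UCXPY error-callback line above. The regular expression and the helper name `parse_error_callback` are assumptions inferred from that single sample line, not part of UCX-Py, so other UCX or UCX-Py versions may format the line differently.

```python
import re

# Pattern for the UCXPY error-callback debug line quoted above.
# The field layout is inferred from one sample line, so treat it as an assumption.
UCXPY_ERROR_RE = re.compile(
    r"UCXPY\s+DEBUG\s+Error callback for endpoint (?P<ep>0x[0-9a-f]+) "
    r"called with status (?P<status>-?\d+): (?P<message>.+)$"
)


def parse_error_callback(line):
    """Return (endpoint, status, message) for a matching line, else None."""
    match = UCXPY_ERROR_RE.search(line)
    if match is None:
        return None
    return (
        match.group("ep"),
        int(match.group("status")),
        match.group("message").strip(),
    )


# The exact log line from this report.
sample = (
    "[1698940358.556225] [devbox:356803] UCXPY  DEBUG Error callback for "
    "endpoint 0x7f6202366000 called with status -6: Destination is unreachable"
)
print(parse_error_callback(sample))
```

Filtering a full debug log through this helper makes it easier to see which endpoint failed first and whether every failure carries the same status.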

Has anyone seen a similar issue before?
