This repository was archived by the owner on Dec 31, 2025. It is now read-only.

Communication and compute on separate Streams do not overlap #64

@garrett361

Description

Cross-posting this issue from ipex, in case the torch-ccl team is not aware of it.

Key issues:

  • Compute and collective communications do not overlap on Intel GPU devices.
  • Collectives block the host thread rather than launching a kernel and returning immediately, as they do on NVIDIA devices.
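One way to quantify the second issue without a profiler is to time how long the host spends inside the launch call versus how long it takes until the operation has actually finished. A minimal, framework-agnostic sketch (the helper name `launch_vs_completion` is hypothetical, not part of torch-ccl or PyTorch):

```python
import time
from typing import Callable, Tuple, TypeVar

H = TypeVar("H")  # whatever handle the launch call returns


def launch_vs_completion(
    launch: Callable[[], H], wait: Callable[[H], None]
) -> Tuple[float, float]:
    """Return (seconds until `launch` returns, seconds until `wait` finishes).

    Hypothetical helper: a non-blocking launch should give a first value much
    smaller than the second; a blocking launch makes the two roughly equal.
    """
    start = time.perf_counter()
    handle = launch()  # should return almost immediately if the launch is async
    launch_s = time.perf_counter() - start
    wait(handle)  # block the host until the operation is really done
    total_s = time.perf_counter() - start
    return launch_s, total_s
```

With `torch.distributed`, one might pass `lambda: dist.all_reduce(t, async_op=True)` as `launch` and a `wait` that calls `work.wait()` followed by a device synchronize (for example `torch.xpu.synchronize()` or `torch.cuda.synchronize()`). If `launch_s` is close to `total_s`, the collective is blocking the host thread.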

The PyTorch profiler traces highlight the issues (copied from the other thread):

A100 Trace

[Image: nvidia_a100_trace]

Non-blocking kernel launch and comms/compute overlap.

Intel Max 1550 Trace

[Image: intel_1550_trace]

Blocking kernel launch and no comms/compute overlap.
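The overlap (or lack of it) visible in these traces can also be measured numerically from an exported trace. The sketch below assumes the Chrome trace JSON written by `torch.profiler.profile(...).export_chrome_trace(path)`, with device kernels recorded as complete events (`"ph": "X"`, `ts`/`dur` in microseconds) tagged `"cat": "kernel"`. That category string and the `is_comm` name predicate are assumptions and may differ for XPU traces; all helper names here are hypothetical.

```python
import json
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that touch or overlap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_time(a: List[Interval], b: List[Interval]) -> float:
    """Total time covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def comm_compute_overlap(
    trace: Dict, is_comm: Callable[[str], bool], category: str = "kernel"
) -> Tuple[float, float]:
    """Return (overlapped time, total comm time) in the trace's time unit."""
    comm: List[Interval] = []
    compute: List[Interval] = []
    for ev in trace.get("traceEvents", []):
        # Keep only complete ("X") events in the assumed device-kernel category.
        if ev.get("ph") != "X" or ev.get("cat") != category:
            continue
        interval = (ev["ts"], ev["ts"] + ev["dur"])
        (comm if is_comm(ev["name"]) else compute).append(interval)
    comm_time = sum(end - start for start, end in merge_intervals(comm))
    return overlap_time(comm, compute), comm_time


def load_trace(path: str) -> Dict:
    """Load a Chrome-format trace file exported by the profiler."""
    with open(path) as f:
        return json.load(f)
```

A ratio `overlap / comm_time` near zero matches the serialized pattern described for the Max 1550 trace, while overlapping execution like that described for the A100 trace yields a larger ratio.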

See the other thread for more details.
