ncclAllReduce may require additional barriers #1587

@suranap

checkNCCL(ncclAllReduce(w_grad_ptr,

A thread on Zulip pointed out that calling NCCL from within a Legion task requires some additional care. Rohan spotted a problem in FlexFlow's use of ncclAllReduce. You may need to add a concurrent_task_barrier before and after the call, and call set_concurrent_barrier on the task. More info is in the comment for that barrier.
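
To make the suggested ordering concrete, here is a rough sketch of the barrier, collective, barrier pattern. It deliberately uses hypothetical stand-ins (`concurrent_task_barrier_stub`, `nccl_all_reduce_stub`, `all_reduce_with_barriers`) so that it compiles without Legion or NCCL. The stub signatures are assumptions for illustration and are not the real Legion or NCCL signatures. The set_concurrent_barrier step is not shown, since it would presumably happen where the task is set up rather than in the task body.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without Legion or NCCL.
// In real code these would be Legion's concurrent_task_barrier and NCCL's
// ncclAllReduce; the signatures below are NOT the real ones.
static std::vector<std::string> call_log;

// Stand-in for the barrier: it only records that it was called.
void concurrent_task_barrier_stub() { call_log.push_back("barrier"); }

// Stand-in for the collective: it only records the call, no reduction is done.
int nccl_all_reduce_stub(float *grad, int count) {
  (void)grad;
  (void)count;
  call_log.push_back("allreduce");
  return 0;  // 0 plays the role of a success status here
}

// Hypothetical task body showing the ordering suggested in this issue.
int all_reduce_with_barriers(float *grad, int count) {
  concurrent_task_barrier_stub();                  // barrier before the collective
  int status = nccl_all_reduce_stub(grad, count);  // the collective itself
  concurrent_task_barrier_stub();                  // barrier after the collective
  return status;
}
```

In the real fix, the two stub calls would be replaced by the actual Legion barrier and the existing checkNCCL(ncclAllReduce(...)) call. The point of the sketch is only the order of the three calls.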

Labels: bug
