
Pack parallel gradient allreduces #41

Open. XingyuZhang2018 wants to merge 1 commit into iPEPS-unified from codex/packed-parallel-allreduce.

Summary

  • Add a packed allreduce helper that groups gradient arrays by element type (eltype) and backend, then reduces each group with a single allreduce_p2p! call.
  • Use the packed helper in the reverse pass of parallel for gradients of non-split arguments, preserving dense-gradient semantics while issuing fewer Allreduce launches.
  • Add optional MPI tests that check packed-allreduce correctness.
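The packing idea can be sketched in Python. This is not the PR's code (the PR targets Julia and MPI), and the following are assumptions: Grad, packed_allreduce and simulated_allreduce are hypothetical names, ranks are simulated inside one process, simulated_allreduce stands in for one MPI sum-Allreduce (the role allreduce_p2p! plays per group in the PR), and every rank holds its gradients in the same order with the same shapes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Grad:
    eltype: str   # hypothetical stand-in for an array's element type
    backend: str  # hypothetical stand-in for where the array lives (e.g. CPU/GPU)
    data: list    # flattened array values


def simulated_allreduce(buffers):
    """Stand-in for one MPI Allreduce(SUM): every rank receives the elementwise sum."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]


def packed_allreduce(per_rank_grads):
    """Reduce all gradients with one simulated Allreduce per (eltype, backend) group."""
    nranks = len(per_rank_grads)
    # Group gradient positions by (eltype, backend); rank 0's layout is assumed shared.
    groups = defaultdict(list)
    for i, g in enumerate(per_rank_grads[0]):
        groups[(g.eltype, g.backend)].append(i)
    # Copy inputs so the caller's gradients are left untouched.
    result = [[Grad(g.eltype, g.backend, list(g.data)) for g in grads]
              for grads in per_rank_grads]
    launches = 0
    for idxs in groups.values():
        # Pack: concatenate this group's arrays into one contiguous buffer per rank.
        packed = [[x for i in idxs for x in grads[i].data] for grads in per_rank_grads]
        reduced = simulated_allreduce(packed)
        launches += 1
        # Unpack: slice the reduced buffer back into the individual gradients.
        for r in range(nranks):
            offset = 0
            for i in idxs:
                n = len(per_rank_grads[r][i].data)
                result[r][i].data = reduced[r][offset:offset + n]
                offset += n
    return result, launches


# Usage: three gradients per rank, falling into two (eltype, backend) groups.
rank0 = [Grad("Float64", "cpu", [1.0, 2.0]), Grad("ComplexF64", "cpu", [1 + 1j]),
         Grad("Float64", "cpu", [3.0])]
rank1 = [Grad("Float64", "cpu", [10.0, 20.0]), Grad("ComplexF64", "cpu", [2 - 1j]),
         Grad("Float64", "cpu", [30.0])]
out, launches = packed_allreduce([rank0, rank1])
print(launches, out[0][0].data, out[0][2].data)
```

Here the two Float64 gradients share one reduction, so the sketch issues two reductions instead of three; the saving grows with the number of gradients that share a type and backend. Grouping by both keys keeps each packed buffer homogeneous, so it can be sent as a single contiguous message.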

Test Plan

  • Packed allreduce on 2 ranks (launched via MPI.mpiexec() with -n 2): PASS
  • Optional MPI test file, single process: 55/55 pass
  • FLmap_parallel serial-vs-parallel gradient check: relative errors (0.0, 0.0, 0.0, 0.0)
  • Manual one-step C4v VUMPS trace: PASS
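One plausible way to compute a relative error for a serial-vs-parallel gradient check, sketched in Python. The PR does not state which norm it uses, so the Euclidean norm over flattened entries and the helper name relative_error are assumptions.

```python
import math


def relative_error(g_serial, g_parallel):
    """||g_parallel - g_serial|| / ||g_serial||, Euclidean norm over flattened entries (assumed metric)."""
    if len(g_serial) != len(g_parallel):
        raise ValueError("gradients must have the same number of entries")
    diff = math.sqrt(sum(abs(p - s) ** 2 for s, p in zip(g_serial, g_parallel)))
    ref = math.sqrt(sum(abs(s) ** 2 for s in g_serial))
    # Fall back to the absolute error when the reference gradient is zero.
    return diff / ref if ref > 0 else diff
```

Applied to each of four gradients, identical serial and parallel results give the tuple (0.0, 0.0, 0.0, 0.0).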