vccl doesn't get benefit in megatron training #22

@dmndxld

Hello! Thanks for the nice work. I've tested nccl-tests with vccl, and got good results.

Image

So I set up vccl with my Megatron v0.13 training, following the doc.

But I didn't see any improvement in my training. Could you tell me what I did wrong and how to fix it?

I use Slurm to launch the training job on 16 nodes, with TP=4 and PP=4.
Image

Image
