Feat: detect and slave processes #218

Merged
bpiwowar merged 4 commits into experimaestro:master from VictorMorand:feat-slave-processes
Apr 22, 2026
@VictorMorand
Contributor

Needed when using torch.distributed: the task launches several subprocesses (e.g. one per GPU), which causes two problems:

  • each process tries to acquire the lock
  • each process touches the donepath when terminating (even if the master process fails afterwards)

=> Implemented a taskglobals env.slave variable that detects whether the current process is a slave process, preventing it from doing what only the global zero process should do.
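
For illustration only, here is a minimal sketch of the idea, not the code merged in this PR. It assumes the subprocesses are started by a launcher such as torchrun, which sets the `RANK` environment variable, and treats any non-zero rank as a slave process. The names `is_slave_process` and `mark_done` are hypothetical and are not part of experimaestro.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def is_slave_process(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when RANK marks this process as a non-zero rank.

    Assumption: the launcher (e.g. torchrun) exports RANK. When RANK is
    missing or not an integer, the process is treated as the main one.
    """
    env = os.environ if environ is None else environ
    rank = env.get("RANK")
    if rank is None:
        return False
    try:
        return int(rank) != 0
    except ValueError:
        return False


def mark_done(donepath: Path, slave: bool) -> bool:
    """Touch the done marker only from the global zero process.

    Returns True if the marker was written, False if it was skipped.
    """
    if slave:
        # Slave processes must not signal completion on behalf of the task.
        return False
    donepath.touch()
    return True
```

The same guard would apply to lock acquisition: a slave process would skip it and leave it to the global zero process.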

@bpiwowar bpiwowar merged commit a53cf44 into experimaestro:master Apr 22, 2026
4 of 5 checks passed
@VictorMorand VictorMorand deleted the feat-slave-processes branch April 22, 2026 16:46