
Hydra Release 2021.11 #1

@knagrecha

Current Release:

  • Standard model-parallel sharding supported
  • Pilot-run style partitioning
  • Sharded-LRTF Scheduling
  • Standard linear execution patterns for forward/backward passes
  • Arbitrarily deep models can be trained on one GPU
  • Near-linear speedups in end-to-end runtime in the single-node, multi-GPU setting
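
For readers unfamiliar with the scheduling item above, here is a toy sketch of how shard-level scheduling could work. It assumes "LRTF" means longest-remaining-time-first, which this issue does not spell out, and it is not Hydra's actual scheduler. The function name `lrtf_schedule`, the input format, and the shard runtimes are all hypothetical. Each model is a list of shard runtimes that must run in order, and whenever a GPU is free, the idle model with the most remaining work gets to run its next shard.

```python
import heapq


def lrtf_schedule(shard_times, num_gpus):
    """Toy longest-remaining-time-first scheduler (illustrative only).

    shard_times: dict mapping a model name to its ordered shard runtimes.
    Returns (timeline, makespan), where timeline holds
    (start_time, gpu, model, shard_index) tuples.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")

    next_shard = {m: 0 for m in shard_times}
    remaining = {m: sum(t) for m, t in shard_times.items()}
    idle = {m for m, t in shard_times.items() if t}  # models not on a GPU
    free_gpus = list(range(num_gpus))
    running = []  # min-heap of (finish_time, gpu, model)
    now = 0.0
    timeline = []

    while idle or running:
        # Give each free GPU to the idle model with the most work left.
        while free_gpus and idle:
            model = max(idle, key=lambda m: (remaining[m], m))
            idle.remove(model)
            gpu = free_gpus.pop(0)
            idx = next_shard[model]
            timeline.append((now, gpu, model, idx))
            heapq.heappush(running, (now + shard_times[model][idx], gpu, model))

        # Advance time to the next shard completion.
        now, gpu, model = heapq.heappop(running)
        free_gpus.append(gpu)
        free_gpus.sort()
        remaining[model] -= shard_times[model][next_shard[model]]
        next_shard[model] += 1
        if next_shard[model] < len(shard_times[model]):
            idle.add(model)  # model still has shards to run

    return timeline, now


# Hypothetical workload: three models with made-up shard runtimes.
jobs = {"a": [3, 3], "b": [2], "c": [1, 1]}
timeline, makespan = lrtf_schedule(jobs, num_gpus=2)
```

In this sketch a model occupies at most one GPU at a time, so its shards run sequentially, while different models interleave across GPUs.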

Next Release TODOs (Target Completion: 2022.06):

  • Tensor Parallel Support
  • Recurrent/Autoregressive Network Support
  • Cluster-setting scheduler (prioritize turnaround time vs. makespan)
  • Disk spilling
  • Multi-node scaling
  • More examples
  • Additional documentation
  • More robust partitioner

Long-term TODOs:

  • (Potential) FlexFlow integration?
  • Pip package
  • AMD GPU support
  • Unit tests
  • Website
  • TensorFlow support
  • Data Parallel Support
  • Model Selection APIs
