This repository was archived by the owner on Apr 19, 2026. It is now read-only.

Distributed training with Kubernetes #17

@jlewi

Opening this issue to start a discussion about whether it would be worth investing in making it easy to run TensorFlow Agents on K8s.

For some inspiration, you can look at the TfJob CRD.

Some questions:

  1. Is there a need to be able to distribute the environments across multiple machines?
  2. What is the communication pattern between the simulations and the TensorFlow job?
    * Is data fetched from all simulations simultaneously?
    * Does each simulation need to be individually addressable?
