Support less-noisy reconfigure #90

@sjpb

With an image-based deploy, the current workflow for adding a node looks like this:

  1. Boot a new compute node. It will attempt to join the cluster, slurmctld will report that it has no NodeName entry for the node, and slurmd will exit.
  2. Run the role on the ENTIRE cluster, so that:
    • a new slurm.conf is generated that includes the new node
    • slurmctld and ALL slurmd daemons are restarted (including the new, failed one) in the correct order

Item 2 is really noisy because every compute node runs the whole Ansible role. It would be good if we could run only the steps that these cases actually need.

I think the cases to cover are:

  • Adding nodes with an appropriate image
  • Deleting nodes
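
To make the two cases concrete, here is a minimal sketch of how a wrapper could tell them apart by comparing the node names currently configured with the node names wanted. The function name and the example hostnames are hypothetical and not part of the role; it also assumes plain hostnames rather than Slurm hostlist expressions such as `compute-[0-3]`.

```python
def classify_node_changes(configured, desired):
    """Return (added, removed) node names, each as a sorted list.

    `configured` is the set of nodes in the current slurm.conf and
    `desired` is the set of nodes in the inventory. Both are assumed
    to be iterables of plain hostnames (hypothetical inputs).
    """
    configured, desired = set(configured), set(desired)
    added = sorted(desired - configured)    # nodes that need a new entry
    removed = sorted(configured - desired)  # nodes whose entry should go
    return added, removed


if __name__ == "__main__":
    added, removed = classify_node_changes(
        ["compute-0", "compute-1"],
        ["compute-1", "compute-2"],
    )
    print("added:", added)
    print("removed:", removed)
```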

We could probably do this using just the configure tag, but that needs testing and documenting.
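
As a starting point for that testing, the sketch below builds an `ansible-playbook` command line restricted to the `configure` tag. The playbook and inventory paths are hypothetical placeholders, and whether the `configure` tag alone is enough to regenerate slurm.conf and restart the daemons in the right order is exactly the untested assumption. The script only prints the command; it does not run it.

```python
import shlex


def configure_command(playbook, inventory, tags=("configure",)):
    """Build an ansible-playbook argv limited to the given tags.

    `playbook` and `inventory` are hypothetical paths supplied by the
    caller; `-i` and `--tags` are standard ansible-playbook options.
    """
    return ["ansible-playbook", "-i", inventory, "--tags", ",".join(tags), playbook]


if __name__ == "__main__":
    # Print the command for inspection instead of executing it.
    print(shlex.join(configure_command("site.yml", "inventory/hosts")))
```

Adding `--check` to the printed command when running it by hand is one way to see which tasks would change before touching a live cluster.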
