Skip to content

SelectTypeParameters=CR_Core_Memory; DefMemPerCPU#209

Open
elelaysh wants to merge 1 commit into
masterfrom
feat/cr_core_memory
Open

SelectTypeParameters=CR_Core_Memory; DefMemPerCPU#209
elelaysh wants to merge 1 commit into
masterfrom
feat/cr_core_memory

Conversation

@elelaysh
Copy link
Copy Markdown
Contributor

@elelaysh elelaysh commented Feb 25, 2026

With SelectTypeParameters=CR_Core_Memory, SLURM scheduler will consider memory
as a consumable resource:
a job will be scheduled to a node only if the node has sufficient remaining memory for it.

With default openhpc_cgroup_config, the job is constrained to not use more:

  • memory (ConstrainRAMSpace=yes)
  • swap (ConstrainSwapSpace=yes)

than requested. This is a change from previous SelectTypeParameters=CR_Core:
memory was not taken into account and a job's memory use was only limited
by the total memory and swap on the node at each time.

With SelectTypeParameters=CR_Core_Memory, DefMemPerCPU must be defined to prevent a job using a fraction of a node's CPUs from taking all the node's memory. We compute a default value as the minimal memory per thread accross the cluster but it can be changed in the appliance configuration.

Also create the ohpc_nodegroups_computed variable,
to share computed values per nodegroup between multiple places of the role.
If a nodegroup is in ohpc_nodegroups_computed it has at least one host. It stored

  • first_host: name of the first host in the nodegroup; use hostvars[computed.first_host]
    to access its hostvars
  • inventory_group_name: name of the inventory group for this nodegroup; use groups[computed.inventory_group_name] to access the host list

@elelaysh elelaysh requested a review from a team as a code owner February 25, 2026 08:33
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant improvement by making memory a consumable resource in Slurm, which will help prevent node oversubscription. The refactoring of nodegroup-specific computations into the ohpc_nodegroup_computed variable is a great step towards better code organization and reusability.

My review includes a few suggestions:

  • A high-severity suggestion to make the memory-per-CPU calculation more robust by using ansible_processor_vcpus and handling cases where this fact might be missing. This will prevent potential playbook failures.
  • A couple of medium-severity comments to address inconsistent indentation (mixing tabs and spaces) in the Jinja2 template, which will improve code readability and maintainability.

Comment thread vars/main.yml Outdated
Comment thread templates/slurm.conf.j2 Outdated
Comment thread templates/slurm.conf.j2 Outdated
@elelaysh elelaysh force-pushed the feat/cr_core_memory branch from 5a27e0d to 3babb40 Compare February 25, 2026 08:55
@elelaysh
Copy link
Copy Markdown
Contributor Author

Re ohpc_nodegroup_computed:

It is awkward to have the ohpc_nodegroup as a list (instead of a dict, because we can't have multiple nodegroups with same name, etc), but since it's an input parameter, it shouldn't be changed.

I had tried a version with all fields of ohpc_nodegroup copied to ohpc_nodegroup_computed, but it wasn't that useful in slurm.conf.j2 we need to iterate other empty nodegroups as well. I preferred ohpc_nodegroup_computed to only contain non-empty nodegroups, to have correct values for computed values ram_mb, def_mem_per_cpu(to later combine usingmin` without needing to filter).

If one would prefer to use ohpc_nodegroup_computed everywhere the code could be adapted.

Before, I tried computing the nodegroup and DefMemPerCPU for each host, but it was very awkward.

Also the naive approach would be to duplicate the whole block iterating over ohpc_nodegroup in slurm.conf.j2 to compute DefMemPerCPU. I hope having the ohpc_nodegroup_computed would be more useful.

We could also compute all slurm.conf attributes of each nodegroup in a variable, then just serialize it in slurm.conf.j2 (we can pick some attributes to put first the order of our choosing (NodeList, Features, State), then append the rest...

@elelaysh elelaysh changed the title SelectTypeParameters=CR_Core_Memory; define DefMemPerCPU SelectTypeParameters=CR_Core_Memory; DefMemPerCPU Feb 25, 2026
@elelaysh
Copy link
Copy Markdown
Contributor Author

ohpc_nodegroup_computed should be plural I guess

Copy link
Copy Markdown
Collaborator

@sjpb sjpb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR description should describe what DefMemPerCPU is set to

Comment thread vars/main.yml
Copy link
Copy Markdown
Collaborator

@sjpb sjpb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks good once merge conflict fixed

@elelaysh
Copy link
Copy Markdown
Contributor Author

elelaysh commented Mar 2, 2026

I think the PR description should describe what DefMemPerCPU is set to

Yes, good point

Also create the ohpc_nodegroups_computed variable,
to share computed values per nodegroup between multiple
places of the role.
If a nodegroup is in ohpc_nodegroups_computed it has
at least one host. It stores
 - first_host: name of the first host in the nodegroup;
   use `hostvars[computed.first_host]` to access its hostvars
 - inventory_group_name: name of the inventory group for this
   nodegroup; use `groups[computed.inventory_group_name]` to access
   the host list
 - ram_mb: memory per node in the nodegroup
 - def_mem_per_cpu: computed DefMemPerCPU for this nodegroup
   as the RAM/vCPU ratio.
@elelaysh elelaysh force-pushed the feat/cr_core_memory branch from 3babb40 to da6d88c Compare March 2, 2026 09:17
@elelaysh elelaysh requested a review from sjpb March 2, 2026 09:25
@elelaysh
Copy link
Copy Markdown
Contributor Author

elelaysh commented Mar 2, 2026

ohpc_nodegroup_computed should be plural I guess

Fixed it to be plural

@sjpb
Copy link
Copy Markdown
Collaborator

sjpb commented Apr 8, 2026

This looks ok now but I found the PR comment a bit hard to understand and it'll be a bit critical to understanding what we need to update for client sites. Here's my attempt, what do you think?


Changes how memory is assessed when scheduling and constrained for a job:

  1. The default for the slurm.conf parameter SelectTypeParameters is changed from CR_Core to CR_Core_Memory. Memory is now considered as a consumable resource and a job will be scheduled to a node only if the node has sufficent remaining memory for it. Previously memory was not taken into account when scheduling.

  2. Although the default cgroup configuration is not changed, the change to SelectTypeParameters means a job’s memory and swap use are now constrained to the requested values.

  3. The default memory request for a job is set, per allocated CPU, by the slurm.conf parameter DefMemPerCPU. This is set here to the minimum memory per thread for any node across the cluster.


To provide computed per-nodegroup values in multiple places this adds an internal variable ohpc_nodegroups_computed. If a nodegroup is in this mapping it has at least one host. It stores:

  • first_host: The inventory name of the first host in the nodegroup; use hostvars[computed.first_host]
    to access its hostvars.
  • inventory_group_name: The name of the inventory group for this nodegroup; use groups[computed.inventory_group_name] to access the host list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants