Conversation
skalyan left a comment:
What is the motivation for this change? Did we have any user requests?
Is the idea to simply hide Slurm env variables and add a wrapper?
@skalyan yeah, this is to enable: facebookresearch/matrix#105 (comment)
it gives the info for Slurm or local nodes
Diff context:

```python
            return int(os.environ.get("SLURM_CPUS_ON_NODE", 1))
        return int(max(os.cpu_count() or 0, 1))

    @lru_cache(maxsize=1)
    def get_gpus(self) -> int:
        if self.is_slurm_job():
            return int(os.environ.get("SLURM_GPUS_ON_NODE", 1))
        return sum(
```
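The final `return sum(` is cut off by the diff view, and the elided expression stays elided here. As a hedged illustration only, a non-Slurm GPU count could be computed by counting NVIDIA device files; this is an assumption about the missing body, not the PR's actual code:

```python
from pathlib import Path

def _local_gpu_count() -> int:
    # Hypothetical completion of the truncated sum(...): count
    # /dev/nvidia0, /dev/nvidia1, ... device files as a proxy for
    # the GPUs present on this machine.
    return sum(1 for _ in Path("/dev").glob("nvidia[0-9]*"))
```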
What is the intent of the ask?
Is it to know how many GPUs/CPUs are "present" on the node or "allocated" to this job?
allocated to the job if it's a Slurm job, otherwise what's present on the node
If I may, I think we should stick to a clear and limited contract for an API: if a given API returns the GPUs allocated to a job, let's stick to that. I don't see the benefit of both the allocated and the provisioned GPU count coming via the same API.
@skalyan I think this is fine for now; it matches how the other methods in this class behave. I'm open to changing position here as we see how it gets used.
Summary
Exposing a way to get the number of CPUs and GPUs from job information. Defaults to the local host if not running on a Slurm cluster.
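For readers without the full diff, a minimal self-contained sketch of the contract under discussion. Only `get_gpus`, `is_slurm_job`, and the `SLURM_*_ON_NODE` reads mirror the visible diff; `ClusterInfo`, the `SLURM_JOB_ID` check, and the `/dev/nvidia*` counting are placeholder assumptions:

```python
import os
from functools import lru_cache
from pathlib import Path


class ClusterInfo:  # placeholder name; the PR's real class is not shown here
    def is_slurm_job(self) -> bool:
        # Assumption: Slurm exports SLURM_JOB_ID for launched jobs.
        return "SLURM_JOB_ID" in os.environ

    @lru_cache(maxsize=1)
    def get_cpus(self) -> int:
        # CPUs allocated to this job under Slurm, else CPUs on the local host.
        if self.is_slurm_job():
            return int(os.environ.get("SLURM_CPUS_ON_NODE", 1))
        return int(max(os.cpu_count() or 0, 1))

    @lru_cache(maxsize=1)
    def get_gpus(self) -> int:
        # GPUs allocated to this job under Slurm, else GPUs on the local host.
        if self.is_slurm_job():
            return int(os.environ.get("SLURM_GPUS_ON_NODE", 1))
        # Assumption: the truncated sum(...) counts NVIDIA device files.
        return sum(1 for _ in Path("/dev").glob("nvidia[0-9]*"))
```

One design note on the visible diff: `functools.lru_cache` on an instance method holds a reference to `self`, which with `maxsize=1` is typically harmless for a long-lived singleton, presumably how this class is used.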
Test Plan
Works on Slurm and locally.
Local node with GPUs:
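(The captured output is truncated above.) A check along these lines, run once inside a Slurm allocation and once on a local GPU machine, would exercise both branches; `ClusterInfo` is the placeholder name from the sketch above:

```python
info = ClusterInfo()
print("slurm job:", info.is_slurm_job())
print("cpus:", info.get_cpus())  # Slurm: SLURM_CPUS_ON_NODE; local: os.cpu_count()
print("gpus:", info.get_gpus())  # Slurm: SLURM_GPUS_ON_NODE; local: device count
```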