Summary
The current implementation in AWS Research Engineering Studio (RES) appears to use psutil.cpu_percent() as a proxy for system idleness. This approach does not scale appropriately with multi-core systems and can incorrectly classify actively used instances as idle, particularly in data science workflows.
We'd like to use this feature properly, but the implementation doesn't fit well with our user workloads.
Problem Description
psutil.cpu_percent() reports CPU utilisation as a percentage of total system capacity, averaged across all cores.
On high core-count instances (e.g. 8, 16, 32 vCPUs), this leads to unintuitive and misleading behaviour:
- A single fully utilised core on a 16 vCPU instance results in ~6–10% reported usage.
- Moderate but legitimate workloads (e.g. 1–2 active cores) appear as low overall utilisation.
Example
On a 16 vCPU instance:
| Workload | Actual Activity | cpu_percent() |
|---|---|---|
| 1 core fully used | Active computation | ~6–10% |
| 2 cores fully used | Parallel work | ~12–20% |
| Fully idle | None | ~0–2% |
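The figures in the table follow from simple arithmetic. A minimal sketch, assuming each busy core is pegged at 100% and every other core is completely idle (the function name is illustrative and not part of psutil or RES):

```python
def aggregate_cpu_percent(busy_cores: int, total_cores: int) -> float:
    """System-wide CPU % when `busy_cores` are fully used and the rest are idle."""
    return 100.0 * busy_cores / total_cores


for busy in (0, 1, 2):
    pct = aggregate_cpu_percent(busy, 16)
    print(f"{busy} busy core(s) on 16 vCPUs -> {pct:.2f}%")
```

Observed values sit a little above these theoretical figures, presumably because of background activity on the remaining cores.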
Real-World Impact (Data Science / R Workloads)
In many data science use cases (especially R-based analytics):
- High vCPU-count instances are chosen for their RAM rather than their CPU capacity
- Processing occurs in bursty or phased patterns
This results in:
- Sustained meaningful work
- Low aggregate CPU percentage
In contrast, system load metrics behave differently:
import os
os.getloadavg()
Typical observed values:
cpu_percent() → ~8–12%
loadavg (1 min) → ~1.0–1.5
The load average correctly reflects that the system is not idle, whereas CPU percentage suggests it is.
Current Behaviour in RES
- CPU utilisation from psutil.cpu_percent() is compared directly against the idle threshold (fixed at instance creation)
- Default idle threshold: 30%, which is problematic on multi-core instances
- Idle timeout: default 1 year; we'd like 4h but can't safely set it given the current detection method
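To illustrate why a fixed 30% threshold is hard to live with, here is a rough sketch of how many fully busy cores an instance needs before it stops looking idle. It assumes an instance counts as idle whenever aggregate CPU % is strictly below the threshold; the actual comparison inside RES may differ, and the function name is hypothetical.

```python
import math


def busy_cores_needed(cpu_count: int, idle_threshold_percent: float = 30.0) -> int:
    """Smallest number of fully busy cores whose aggregate CPU % reaches the threshold.

    Assumption: "idle" means aggregate CPU % < idle_threshold_percent.
    """
    return math.ceil(idle_threshold_percent * cpu_count / 100.0)


for vcpus in (2, 8, 16, 32):
    needed = busy_cores_needed(vcpus)
    print(f"{vcpus:>2} vCPUs: {needed} fully busy core(s) needed to avoid 'idle'")
```

Under that assumption, a single-threaded job on a 16 vCPU instance never reaches the threshold, however long it runs.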
Consequence
The current global implementation is not compatible with instances of different sizes.
This is particularly problematic for:
- Long-running analyses
- Pipeline stages with intermittent CPU usage
- Interactive analytical sessions
Why This Approach Is Problematic
psutil.cpu_percent() measures:
Total CPU utilisation as a fraction of total available compute capacity
Whereas "idleness" in a multi-core environment should consider:
- Whether any meaningful work is occurring
- Queueing and scheduling pressure
- Per-process or per-core activity
Thus, CPU percentage alone is not a reliable indicator of idleness on modern multi-core systems.
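One way to look at per-core activity is psutil.cpu_percent(percpu=True), which returns one percentage per logical CPU. The sketch below takes such a list as input so that it runs without psutil; the 50% cut-off, the sample data and the function name are illustrative assumptions.

```python
def busy_core_count(per_core_percent: list[float], busy_cutoff: float = 50.0) -> int:
    """Count cores whose utilisation is at or above `busy_cutoff` percent."""
    return sum(1 for pct in per_core_percent if pct >= busy_cutoff)


# Hypothetical sample from a 16 vCPU instance running one single-threaded job.
sample = [100.0] + [1.0] * 15
average = sum(sample) / len(sample)

print(f"average across cores: {average:.1f}%")     # what an aggregate check sees
print(f"busy cores: {busy_core_count(sample)}")    # what a per-core check sees
```

One caveat: the scheduler can move a single thread between cores during the sampling interval, which spreads the load across several cores and weakens this signal.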
Suggested Improvements
✅ Option 1: Use Load Average
Leverage os.getloadavg():
- Load ≈ average number of runnable tasks (on Linux, tasks in uninterruptible sleep are also counted)
- Interpretable relative to CPU count
Example heuristic:
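import os

# 1-minute load average: roughly the number of tasks running or waiting to run
load1, _, _ = os.getloadavg()
# Always assign the flag so it is defined when the system is busy too
system_idle = load1 < 0.2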
✅ Option 2: Convert CPU % to Core-Equivalent Usage
Instead of raw percentage:
core_usage = (cpu_percent / 100.0) * cpu_count
This makes thresholds meaningful across instance sizes.
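A small worked example of the conversion, as a sketch only. In practice the inputs could come from psutil.cpu_percent(interval=1) and psutil.cpu_count(); here they are passed in directly. The half-core idle threshold and the function names are illustrative assumptions.

```python
def core_equivalents(cpu_percent: float, cpu_count: int) -> float:
    """Convert aggregate CPU % into the number of fully busy cores it represents."""
    return (cpu_percent / 100.0) * cpu_count


def is_idle(cpu_percent: float, cpu_count: int, idle_below_cores: float = 0.5) -> bool:
    """Idle if less than `idle_below_cores` worth of CPU is in use (assumed threshold)."""
    return core_equivalents(cpu_percent, cpu_count) < idle_below_cores


# The same single-threaded job looks very different in raw % on 2 vs 16 vCPUs,
# but amounts to one core-equivalent in both cases.
print(core_equivalents(50.0, 2), is_idle(50.0, 2))
print(core_equivalents(6.25, 16), is_idle(6.25, 16))
```

Expressing the threshold in cores rather than percent means the same setting behaves consistently on a 2 vCPU and a 32 vCPU instance.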
✅ Option 3: Multi-Signal Approach
Combine indicators:
- CPU usage (scaled)
- Load average
- Recent activity window (buffering spikes)
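The three signals could be combined roughly as follows. This is a sketch under stated assumptions, not RES code: the class name, both thresholds and the window length are hypothetical, and samples are passed in so that the example does not depend on psutil.

```python
from collections import deque


class IdleDetector:
    """Report idle only when every sample in a full window looks idle."""

    def __init__(self, cpu_count: int, window: int = 5,
                 max_idle_cores: float = 0.5, max_idle_load: float = 0.2) -> None:
        self.cpu_count = cpu_count
        self.max_idle_cores = max_idle_cores  # assumed threshold, in core-equivalents
        self.max_idle_load = max_idle_load    # assumed threshold, 1-minute load average
        self.samples: deque[bool] = deque(maxlen=window)

    def add_sample(self, cpu_percent: float, load1: float) -> None:
        """Record whether one (cpu_percent, load1) observation looks idle."""
        cores_in_use = (cpu_percent / 100.0) * self.cpu_count
        quiet = cores_in_use < self.max_idle_cores and load1 < self.max_idle_load
        self.samples.append(quiet)

    def is_idle(self) -> bool:
        """True only if the window is full and no sample showed activity."""
        return len(self.samples) == self.samples.maxlen and all(self.samples)


detector = IdleDetector(cpu_count=16, window=3)
for cpu, load in [(1.0, 0.05), (7.0, 1.1), (1.0, 0.10)]:  # one burst mid-window
    detector.add_sample(cpu, load)
print(detector.is_idle())
```

Requiring every sample in the window to be quiet means a single burst of work postpones the idle decision by a full window, which suits bursty or phased workloads.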
Recommendation
At minimum:
Replace or augment psutil.cpu_percent() with a load-based or core-normalised metric before applying idle thresholds.
Expected Outcome
Fixing this would:
- Make idle shutdown reliable
- Improve usability for data science workflows
- Better align RES behaviour with real-world compute patterns
- Reduce user frustration and job failures
Additional Context
This issue is especially visible in:
- R workloads
- Single-threaded Python tasks
- Memory-bound computations
- Interactive sessions (e.g. notebooks)
Screenshots/Video
Example of CPU percentage vs load average (can't copy text out of environment).
During exec (generating load):

Result:

Environment (please complete the following information):
- RES Version: 2026.03
- Software Stack AMI ID:
- Private prepped for closed environment: ami-0c26b557ed705377b
- Original: ami-067ea4effee56973f
- Software Stack OS: Ubuntu 24.04
Additional context
Enterprise support account