Limit physical CPU and memory consumed across all running sandbox containers

`sand` does not currently try to prevent you from running too many sandboxes.  It will happily grind your system to a halt if you create enough of them and they're all busy consuming CPU and memory pages.

Physical memory consumption is probably going to be a more immediate limiting factor than CPU usage, but we should track and limit both for currently running containers.

## Memory usage

- `--memory` is the guest's address space limit, not a carve-out from the host's physical RAM. 
- e.g. a container configured with `--memory 4g` that's only running a small Go binary might only have 400MB of real host RAM backing it. 
- summing across all `--memory` values used by running containers would be a large overestimate of the actual usage
- apple/container doesn't support memory ballooning
  - or more precisely, the Virtualization framework (on which apple/container depends) implements only partial support for memory ballooning. 
  - see this ["Releasing container memory to macOS" note](https://github.com/apple/container/blob/615b5a6ffe473afe9c02769903e1d7d5884f96e5/docs/technical-overview.md#releasing-container-memory-to-macos): "Currently, memory pages freed to the Linux operating system by processes running in the container's VM are not relinquished to the host. If you run many memory-intensive containers, you may need to occasionally restart them to reduce memory utilization." 
- memory overcommit might not be as catastrophic as CPU overcommit, given the compression and swap memory management features in macOS

### Proposal: allow memory overcommit up to some ratio (> 1) of available physical memory

E.g.:

```go
physicalRAM, err := unix.SysctlUint64("hw.memsize") // host physical RAM, in bytes
if err != nil {
	return err
}

var total uint64 // sum of --memory (in bytes) across all running sandboxes

softLimit := physicalRAM                 // 1.0x: warn when total >= softLimit
hardLimit := physicalRAM + physicalRAM/2 // 1.5x: reject when total >= hardLimit (or maybe 1.25x to be safe)
```

For now, punt on actively monitoring guest container memory usage (e.g. polling `/proc/meminfo`).
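As a rough sketch of how the soft/hard limits above could turn into an admission check: the names `Decision`, `checkMemory`, `softLimitPercent`, and `hardLimitPercent` are hypothetical and not part of `sand`, and the 100%/150% thresholds are just the example values from the proposal. The host RAM figure is passed in as a parameter (in practice it would come from `hw.memsize`), which keeps the logic testable without touching sysctl.

```go
package main

// Decision is the outcome of an admission check for a new sandbox.
// (Hypothetical type, for illustration only.)
type Decision int

const (
	Allow Decision = iota
	Warn
	Reject
)

// Assumed thresholds, as a percentage of physical RAM. Using integer
// percentages avoids mixing uint64 byte counts with floating point.
const (
	softLimitPercent = 100
	hardLimitPercent = 150
)

// checkMemory decides whether a sandbox asking for `requested` bytes of
// --memory may start, given the bytes already committed to running sandboxes.
// It only looks at configured limits, not at actual guest memory usage.
func checkMemory(physicalRAM, committed, requested uint64) Decision {
	total := committed + requested

	// Multiplying first keeps the limits exact; overflow is not a concern
	// for realistic RAM sizes.
	softLimit := physicalRAM * softLimitPercent / 100
	hardLimit := physicalRAM * hardLimitPercent / 100

	switch {
	case total >= hardLimit:
		return Reject
	case total >= softLimit:
		return Warn
	default:
		return Allow
	}
}
```

With 16 GiB of host RAM under these assumed thresholds, a new 4 GiB sandbox is allowed when 8 GiB is already committed, triggers a warning when 12 GiB is committed, and is rejected when 20 GiB is committed.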

## CPU usage

The number of "cpu"s allocated to a container is kind of tricky to think about:   
- `--cpus N` gives the VM N vCPUs, not dedicated physical cores. It's a *hint*, not a reservation.
- Each vCPU becomes a high QoS thread managed by the host macOS scheduler, just like any other process thread
- `--cpus N` sets the width of parallelism available to the guest, meaning it can run at most N threads simultaneously, but it doesn't cap the rate at which those vCPUs execute.
- If your container pegs all N vCPUs at 100%, the host will happily give it 100% of N physical cores' worth of time

We don't have very good controls over how much CPU a sandbox consumes:
- can't use cgroups to throttle CPU usage at the sandbox container level 
  - macOS doesn't support cgroups
  - cgroups work fine inside the container VM but they only control resources of processes relative to other processes in that same VM

### Assumptions
- We're trying to prevent worst-case overload, not average-case
- Coding agents tend to be bursty — they'll actually peg all their vCPUs during builds, test runs, etc.
- We'd rather reject a new sandbox preemptively than have everything grind to a halt because 8 agents all started `make -j8` simultaneously
- letting vCPU count exceed physical core count is technically possible, but doing so and then pegging them all at the same time is *bad* - containers AND host will grind to a halt 

### Proposal: Cap the sum of vCPUs currently allocated across all currently running sandbox containers

`sand new` then warns or rejects attempts to start new sandbox containers when doing so would exceed this cap.

We can make this cap a configurable setting and choose a default based on `unix.SysctlUint32("hw.perflevel0.physicalcpu")` (the number of Performance cores available to the host), minus 2 cores just to be safe. Side note: `hw.perflevel1.physicalcpu` is the number of Efficiency cores, but those are significantly slower than the Performance cores, so we should ignore them for this calculation.
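A minimal sketch of the default cap and the check `sand new` might run, under some assumptions: `defaultVCPUCap`, `canStart`, and `reservedCores` are hypothetical names, and the floor of 1 vCPU on machines with very few Performance cores is an assumption added here, not something decided above. The Performance core count is taken as a parameter; in practice it would be the value read from `hw.perflevel0.physicalcpu`.

```go
package main

// reservedCores is the number of Performance cores held back for the host
// ("shave 2 cores off" in the proposal).
const reservedCores = 2

// defaultVCPUCap derives a default cap on the total vCPUs across running
// sandboxes from the host's Performance core count. Efficiency cores are
// deliberately ignored.
//
// Assumption: the cap never drops below 1, so at least one single-vCPU
// sandbox can still start on a small machine.
func defaultVCPUCap(perfCores uint32) uint32 {
	if perfCores <= reservedCores {
		return 1
	}
	return perfCores - reservedCores
}

// canStart reports whether a sandbox requesting `requested` vCPUs fits under
// vcpuCap, given the vCPUs already allocated to running sandboxes.
func canStart(vcpuCap, allocated, requested uint32) bool {
	return allocated+requested <= vcpuCap
}
```

For example, on a host with 8 Performance cores the default cap would be 6. With 4 vCPUs already allocated, a 2-vCPU sandbox fits exactly, while a 4-vCPU sandbox would exceed the cap and be warned about or rejected.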



