sand does not currently try to prevent you from running too many sandboxes. It will happily grind your system to a halt if you create enough of them and they're all busy consuming CPU and memory pages.
Physical memory consumption is probably going to be a more immediate limiting factor than CPU usage, but we should track and limit both for currently running containers.
Memory usage
--memory is the guest's address space limit, not a carve-out from the host's physical RAM.
- e.g. a container configured with --memory 4g that's only running a small Go binary might only have 400MB of real host RAM backing it.
- summing across all --memory values used by running containers would be a large overestimate of the actual usage
- apple/container doesn't support memory ballooning
  - or more precisely, the Virtualization framework (on which apple/container depends) implements only partial support for memory ballooning.
  - see this "Releasing container memory to macOS" note: "Currently, memory pages freed to the Linux operating system by processes running in the container's VM are not relinquished to the host. If you run many memory-intensive containers, you may need to occasionally restart them to reduce memory utilization."
- memory overcommit might not be as catastrophic as CPU overcommit, given the compression and swap memory management features in macOS
Proposal: allow memory overcommit up to some ratio (> 1) of available physical memory
E.g.:
physicalRam, _ := unix.SysctlUint64("hw.memsize")
total := ... // sum of --memory across all running sandboxes
softLimit := physicalRam         // 1.0x: warn when total >= softLimit
hardLimit := physicalRam * 3 / 2 // 1.5x: reject when total >= hardLimit (or maybe * 5 / 4, i.e. 1.25x, to be safe)
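A minimal Go sketch of the overcommit check above, assuming the caller has already read hw.memsize and collected the --memory values (in bytes) of the running sandboxes. The Decision type, its constants, and checkMemory are hypothetical names, not existing sand code. The limits stay in integer math because Go won't multiply a uint64 by an untyped 1.5, and total counts the new sandbox's own request, which is an assumption about how sand new would apply the limits.

```go
package main

// Decision is the outcome of a (hypothetical) admission check for a new sandbox.
type Decision int

const (
	Allow  Decision = iota // total stays below the soft limit
	Warn                   // total reaches the soft limit but not the hard limit
	Reject                 // total reaches the hard limit
)

// checkMemory decides whether a new sandbox requesting `requested` bytes of
// --memory may start, given the --memory values (in bytes) of the sandboxes
// already running and the host's physical RAM (e.g. sysctl hw.memsize).
// Assumption: total includes the new sandbox's request, not just running ones.
func checkMemory(physicalRAM, requested uint64, running []uint64) Decision {
	total := requested
	for _, m := range running {
		total += m
	}

	// Integer ratios instead of floats: Go won't multiply a uint64 by 1.5.
	softLimit := physicalRAM         // 1.0x physical RAM
	hardLimit := physicalRAM * 3 / 2 // 1.5x; use * 5 / 4 for 1.25x

	switch {
	case total >= hardLimit:
		return Reject
	case total >= softLimit:
		return Warn
	default:
		return Allow
	}
}
```

On a 16 GiB host, for example, 8 GiB of total --memory is allowed, exactly 16 GiB triggers a warning, and 24 GiB (1.5x) is rejected.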
For now, punt on trying to actively monitor guest container memory usage (e.g. polling /proc/meminfo)
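If guest monitoring is revisited later, the parsing side is small. This hypothetical sketch turns the text of a guest's /proc/meminfo (however it gets fetched from the VM, not shown) into byte counts; the kernel labels sizes as kB but means 1024-byte units. parseMeminfo and guestUsedBytes are illustrative names. Given the ballooning note above, MemTotal minus MemAvailable only describes usage as the guest sees it: pages the guest has freed still count against the host, so this number can understate the real host RAM backing the VM.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// parseMeminfo parses the text of a Linux /proc/meminfo file into a map from
// field name (e.g. "MemTotal", "MemAvailable") to a value. Fields reported
// in kB are converted to bytes; unitless fields (e.g. HugePages_Total) are
// kept as raw counts.
func parseMeminfo(text string) (map[string]uint64, error) {
	fields := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // not a "Name: value" line
		}
		parts := strings.Fields(rest)
		if len(parts) == 0 {
			continue
		}
		v, err := strconv.ParseUint(parts[0], 10, 64)
		if err != nil {
			return nil, err
		}
		if len(parts) > 1 && parts[1] == "kB" {
			v *= 1024 // /proc/meminfo "kB" means KiB
		}
		fields[strings.TrimSpace(name)] = v
	}
	return fields, sc.Err()
}

// guestUsedBytes estimates memory in use inside the guest as
// MemTotal - MemAvailable. It does not measure host RAM backing the VM.
func guestUsedBytes(m map[string]uint64) (uint64, bool) {
	total, ok1 := m["MemTotal"]
	avail, ok2 := m["MemAvailable"]
	if !ok1 || !ok2 || avail > total {
		return 0, false
	}
	return total - avail, true
}
```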
CPU usage
The number of "cpu"s allocated to a container is kind of tricky to think about:
--cpus N gives the VM N vCPUs, not dedicated physical cores. It's a hint, not a reservation.
- Each vCPU becomes a high QoS thread managed by the host macOS scheduler, just like any other process thread
--cpus N sets the width of parallelism available to the guest, meaning it can run at most N threads simultaneously, but it doesn't cap the rate at which those vCPUs execute.
- If your container pegs all N vCPUs at 100%, the host will happily give it 100% of N physical cores' worth of time
We don't have very good controls over how much CPU a sandbox consumes
- can't use cgroups to throttle CPU usage at the sandbox container level
- macOS doesn't support cgroups
- cgroups work fine inside the container VM but they only control resources of processes relative to other processes in that same VM
Assumptions
- We're trying to prevent worst-case overload, not average-case
- Coding agents tend to be bursty — they'll actually peg all their vCPUs during builds, test runs, etc.
- We'd rather reject a new sandbox preemptively than have everything grind because 8 agents all started make -j8 simultaneously
- letting vCPU count exceed physical core count is technically possible, but doing so and then pegging them all at the same time is bad: both the containers AND the host will grind to a halt
Proposal: Cap the sum of vCPUs currently allocated across all currently running sandbox containers
sand new then warns or rejects attempts to start new sandbox containers when doing so would exceed this cap.
We can make this cap a configurable setting, and choose a default based on unix.SysctlUint32("hw.perflevel0.physicalcpu") (number of Performance cores available to the host) and shave 2 cores off of that just to be safe. Side note: hw.perflevel1.physicalcpu is the number of Efficiency cores, but those are significantly slower than the Performance cores so we should ignore them for this calculation.
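A minimal Go sketch of the proposed vCPU cap, assuming the Performance core count has already been read with unix.SysctlUint32("hw.perflevel0.physicalcpu") and that the --cpus value of each running sandbox is known. reservedPerfCores, defaultVCPUCap, effectiveVCPUCap, and exceedsVCPUCap are hypothetical names. The "configured value of 0 means use the default" convention and the floor of 1 are assumptions, not part of the proposal; whether a tripped check warns or rejects is left to the caller.

```go
package main

// reservedPerfCores is how many Performance cores the default cap holds back
// for the host ("shave 2 cores off of that just to be safe").
const reservedPerfCores = 2

// defaultVCPUCap derives the default cap on total vCPUs across running
// sandboxes from the host's Performance core count (e.g. sysctl
// hw.perflevel0.physicalcpu). Efficiency cores are deliberately not counted.
// Assumption: the cap never drops below 1, so a tiny host can still run a
// single-vCPU sandbox.
func defaultVCPUCap(perfCores uint32) int {
	c := int(perfCores) - reservedPerfCores
	if c < 1 {
		return 1
	}
	return c
}

// effectiveVCPUCap uses a configured cap if one is set (> 0) and falls back
// to the derived default otherwise. The config setting is hypothetical.
func effectiveVCPUCap(configured int, perfCores uint32) int {
	if configured > 0 {
		return configured
	}
	return defaultVCPUCap(perfCores)
}

// exceedsVCPUCap reports whether starting a sandbox with `requested` vCPUs
// (its --cpus value) would push the sum of vCPUs across all running
// sandboxes above limit.
func exceedsVCPUCap(limit, requested int, running []int) bool {
	total := requested
	for _, n := range running {
		total += n
	}
	return total > limit
}
```

For example, on a hypothetical host reporting 10 Performance cores, the default cap is 8 vCPUs: two sandboxes at --cpus 4 fit exactly, and starting a third would trip the check.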