fix(chart): raise backend.resources.limits.ephemeral-storage default to 16Gi #116
Open
dragonpaw wants to merge 2 commits into
Contributor
|
Thanks for the PR. The one required check failing here is Verify helm-docs output: the chart README needs regenerating after the values change. Could you run this from the repo root and commit the result? (helm-docs v1.14.2, matching
Heads up: the red SonarCloud Scan is a known non-blocking issue on fork PRs. GitHub withholds |
added 2 commits
June 3, 2026 16:37
…to 16Gi
Kubernetes `ephemeral-storage` accounting includes the container writable
layer, container logs, AND all emptyDir volumes (including
`/data/storage`, which `values.yaml` sets to `sizeLimit: 10Gi`). Streaming
format handlers (Incus, large OCI layer pushes) stage uploads on local
disk before finalizing the storage backend write. Any artifact larger
than the pod's `ephemeral-storage` limit triggers kubelet eviction:
    Pod ephemeral local storage usage exceeds the total limit of containers 1Gi
The client sees a TLS EOF mid-stream; AK never logs anything because the
pod was killed before the write completed. The deployment recreates the
pod and the cycle repeats on the next retry.
The previous default of `1Gi` was inconsistent with `persistence.size`
(also defaulted to `10Gi` in this same block) and with the chart's own
`scanWorkspace.size: 2Gi`. Bump to `16Gi` — matches the storage volume
sizing plus headroom for the writable layer and logs.
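For anyone pinned to a chart version that still ships the `1Gi` default, the same limit can presumably be raised with a values override. A minimal sketch, assuming the key nesting follows the dotted path in the commit title (`backend.resources.limits.ephemeral-storage`); the file name is hypothetical:

```yaml
# my-values.yaml (hypothetical file name)
# Assumes the chart nests the key exactly as the dotted path in the commit title.
backend:
  resources:
    limits:
      ephemeral-storage: 16Gi  # chart default before this change: 1Gi
```

Passing such a file with `helm install -f` or `helm upgrade -f` overrides the chart default without editing the chart itself.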
Reproduction: stock-chart install, GCS-backed storage, PUT a >1 GiB file
to any Incus repo. Pre-fix: TLS EOF, kubelet `Evicted`. Post-fix: HTTP
201, artifact persisted.
Validated on a self-hosted dev-shared-gke deploy: 3.4 GiB `.tar.zst`
upload completes in 205s @ 17.8 MiB/s.
23d9ffd to
024ba5d
Compare
Contributor
Author
|
Regenerated the chart README via |
Summary
Bumps the backend pod's `ephemeral-storage` limit from `1Gi` to `16Gi` so streaming format handlers (Incus, large OCI layer pushes) don't get kubelet-evicted on >1 GiB uploads.
The bug
Kubernetes `ephemeral-storage` accounting includes the container writable layer, container logs, and all emptyDir volumes. The chart's `/data/storage` mount is an emptyDir with `sizeLimit: 10Gi` (when `persistence.enabled: false`), and any single artifact streamed to disk by the backend goes into that mount as a temp file before being finalized. Streaming-format handlers like the Incus monolithic + chunked upload paths build the artifact entirely on disk via `tokio::fs` calls (they don't route through `StorageBackend::put_streaming`), so the pod's writable usage tracks the full artifact size in flight.
With the previous `ephemeral-storage: 1Gi` limit, any artifact >1 GiB triggers kubelet eviction:
    Pod ephemeral local storage usage exceeds the total limit of containers 1Gi
The client sees a TLS EOF mid-stream; the backend logs nothing because the pod was killed before the write completed. The deployment recreates the pod and the next retry hits the same wall.
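The interaction can be pictured with a pod spec fragment. This is an illustrative sketch, not the chart's rendered output: the pod name, container name, image, and volume name are hypothetical, while the mount path, `sizeLimit`, and the old limit come from the description above.

```yaml
# Illustrative pod spec fragment; names marked hypothetical are not from the chart.
apiVersion: v1
kind: Pod
metadata:
  name: backend-example              # hypothetical
spec:
  containers:
    - name: backend                  # hypothetical
      image: example/backend:latest  # hypothetical
      resources:
        limits:
          # Previous default. Usage counted against it: writable layer,
          # container logs, and everything written to emptyDir volumes.
          ephemeral-storage: 1Gi
      volumeMounts:
        - name: storage              # hypothetical
          mountPath: /data/storage
  volumes:
    - name: storage
      emptyDir:
        # The volume alone may grow to 10Gi, but the pod is evicted as soon
        # as its total ephemeral usage passes the 1Gi limit above.
        sizeLimit: 10Gi
```

With the limit set below the volume's `sizeLimit`, the `1Gi` limit is always the first threshold a large upload crosses.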
The previous default also disagreed with the chart's own `persistence.size: 10Gi` in the same block: anyone enabling persistence and assuming the limits matched would still hit the eviction.
Reproduction
Stock-chart install with `STORAGE_BACKEND` set to anything (`filesystem`, `gcs`, `s3`). Create an `incus` repo and PUT a >1 GiB file:
Post-fix: HTTP 201, artifact lands in storage backend, no eviction.
Validation
Self-hosted dev-shared-gke deploy. 3.4 GiB `.tar.zst` Incus upload:
`kubectl describe pod` shows no eviction events.
Related
artifact-keeper/artifact-keeper#1296: accept `.tar.zst` Incus filename (also needed to upload our Incus images, but a separate fix; this chart change is independently useful for OCI / any large-artifact format).
artifact-keeper/artifact-keeper#1297: incus handler uses a writable staging dir on GCS/S3 backends. The same scenario surfaces both bugs and this PR; the fixes could land in any order.
Note for reviewers
Longer-term, the chart could make this configurable per-format (Incus deployments need more headroom than a deploy serving only npm-sized artifacts), but defaulting closer to `persistence.size` is at minimum required for the chart's own opinionated storage layout to work end-to-end.
Closes #153