Skip to content

feat(inference): add spec.modelCache.claimName for user-owned cache PVCs#960

Merged
Defilan merged 1 commit into
defilantech:mainfrom
joryirving:feat/inference-model-cache-claimname
Jul 3, 2026
Merged

feat(inference): add spec.modelCache.claimName for user-owned cache PVCs#960
Defilan merged 1 commit into
defilantech:mainfrom
joryirving:feat/inference-model-cache-claimname

Conversation

@joryirving

Copy link
Copy Markdown
Collaborator

What

Adds optional InferenceService.spec.modelCache.claimName: when set, the named pre-existing, user-owned PVC is mounted as the writable model cache for that workload instead of the operator's shared/perService cache PVC.

Why

The cache backend is currently operator-global, so there is no way to give one workload (e.g. a large model pinned to a node) node-local cache storage while everything else rides the shared cache.

Fixes #928

How

  • modelCachePVCName returns the user claim when set, so all cache volume references (single-file, multi-file, invalid-fileset branches) and the same prep + downloader init containers use it; weights still land under <cacheKey>/ and the serving container mounts read-only.
  • ensureModelCachePVC never creates/adopts/GCs the user claim — it only verifies existence. A missing claim yields a Degraded condition plus a ModelCachePVCNotFound warning event (no silent fallback to the shared cache).
  • claimName + pre-staged pvc:// source: claimName is ignored (no download path) and a ModelCacheClaimIgnored warning event is emitted.
  • Unset claimName: behavior unchanged (global shared/perService mode).
  • Tests: builder units cover claim override in both modes, layout/init-container parity, and unchanged default; envtest covers no operator PVC created, no adoption/mutation of the user PVC, and clear error when missing. Verified with go test ./internal/controller/... ./api/... -count=1 and golangci-lint run (0 issues).

AI assistance: authored by Claude (Opus 4.8) operating under the maintainer's direction; reviewed before submitting.

Checklist

  • Tests added/updated
  • make test passes locally
  • make lint passes locally
  • Commit messages follow conventional commits
  • All commits are signed off (git commit -s) per DCO
  • AI assistance (if any) is disclosed above, per CONTRIBUTING.md
  • Documentation updated (if user-facing change)

Add an optional per-InferenceService modelCache.claimName field that
mounts a pre-existing, user-owned PVC as the writable model cache
volume instead of the operator's shared/perService cache PVC. The
existing model-cache-prep + model-downloader init containers run
against the claim unchanged, weights land under the usual <cacheKey>/
subdirectory, and the serving container mounts it read-only.

The operator never creates, adopts, or garbage-collects the user's
claim: ensureModelCachePVC only verifies it exists, and a missing
claim surfaces a Degraded condition plus a ModelCachePVCNotFound
warning event instead of silently falling back to the shared cache.
For pre-staged pvc:// model sources (read-only, no download) the
claimName is ignored and a ModelCacheClaimIgnored warning is emitted.

When claimName is unset, behavior is unchanged: the operator-global
shared/perService cache mode applies as before.

AI assistance: authored by Claude (Opus 4.8) operating under the
maintainer's direction; reviewed before submitting.

Fixes defilantech#928

Signed-off-by: Jory Irving <jory.irving@stackadapt.com>
@joryirving joryirving requested a review from Defilan as a code owner July 3, 2026 02:58
@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 39.47368% with 23 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
api/v1alpha1/zz_generated.deepcopy.go 0.00% 11 Missing and 1 partial ⚠️
internal/controller/inferenceservice_controller.go 0.00% 5 Missing and 1 partial ⚠️
internal/controller/model_storage.go 75.00% 4 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@Defilan

Defilan commented Jul 3, 2026

Copy link
Copy Markdown
Member

Review

One fix, then good to go. The riskiest concern is handled correctly: a user-owned claimName PVC gets only a Get and return nil in ensureModelCachePVC — never Create, never a controller owner-ref, never operator labels (internal/controller/model_storage.go:669-683), so it can't be garbage-collected with the InferenceService. Verified and tested. Precedence (claimName wins over mode before the shared/perService branch, model_storage.go:91-100), RBAC (existing persistentvolumeclaims get is sufficient), and CRD/deepcopy/chart sync all check out.

Silent drop when caching is disabled or the model has no cache key (should fix): useCache := effectiveModelCacheKey(model) != "" && r.ModelCachePath != "" (internal/controller/deployment_builder.go:249) gates the whole cache path. When it's false — chart modelCache.enabled: false, a file:///local source, or a remote single-file model before Status.CacheKey is populated — buildModelStorageConfig routes to buildEmptyDirStorageConfig (model_storage.go:369) which never consults claimName. So a user who sets spec.modelCache.claimName gets an emptyDir and re-downloads every restart, with no event, no condition, no signal. The PR deliberately added a ModelCacheClaimIgnored warning for the pvc://-source case (inferenceservice_controller.go:178) to avoid exactly this silent drop, but missed the symmetric one. Recommend emitting the same warning (or failing closed) when claimName != "" but caching is off / no cache key, so the behavior matches the PR's own "no silent fallback" promise — plus a test (neither new event is currently asserted).

Nice-to-haves: the missing-claim degraded condition uses the generic Reason: PhaseFailed rather than ModelCachePVCNotFound (the specific reason is only in the message/event); claimName is mutable on a running service with no immutability marker (no data loss since the old claim isn't GC'd, but worth a docs note). docs/MODEL-CACHE.md is otherwise accurate.

@Defilan Defilan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Approve. The riskiest part is handled correctly and tested: a user-owned claimName PVC gets only a Get and return nil in ensureModelCachePVC (never Create, no owner-ref, no operator labels), so it survives InferenceService deletion instead of being garbage-collected. Precedence (claimName wins before the shared/perService branch), RBAC (existing persistentvolumeclaims get suffices), and CRD/deepcopy/chart sync all check out.

Merging now. One non-blocking follow-up, filed as #965: the symmetric silent-drop when caching is off (claimName set but useCache false routes to buildEmptyDirStorageConfig, which ignores claimName) wants the same ModelCacheClaimIgnored warning you already added for the pvc:// case. Purely observability, no data-loss risk.

Thanks for this, the safe-by-default handling of user PVCs is exactly right.

@Defilan Defilan merged commit aab5a58 into defilantech:main Jul 3, 2026
23 checks passed
@github-actions github-actions Bot mentioned this pull request Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Per-InferenceService model cache PVC override (spec.modelCache.claimName)

2 participants