feat: Introduce v1alpha2 version of LlamaStackDistribution CRD #253

Open
VaishnaviHire wants to merge 11 commits into llamastack:main from VaishnaviHire:implement_run_config_schema

@VaishnaviHire VaishnaviHire commented Feb 23, 2026

This PR introduces the v1alpha2 API version for the LlamaStackDistribution CRD, enabling declarative, Kubernetes-native configuration of LlamaStack servers. Instead of requiring users to manually craft and supply a config.yaml via ConfigMap (as in v1alpha1), the operator now generates the server configuration automatically from structured CR fields (providers, resources, storage, networking). Both API versions are served concurrently with full conversion webhook support.

v1alpha2 Example

The v1alpha2 API replaces environment-variable-driven configuration with structured, declarative fields. All provider fields use typed []ProviderConfig slices with CEL validation.

Basic: single Ollama provider:

apiVersion: llamastack.io/v1alpha2
kind: LlamaStackDistribution
metadata:
  name: llamastackdistribution-v1alpha2-sample
spec:
  distribution:
    name: starter
  providers:
    inference:
      - provider: ollama
        endpoint: http://ollama-server-service.ollama-dist.svc.cluster.local:11434/v1
  resources:
    models:
      - name: "llama3.2:1b"
  networking:
    port: 8321
  workload:
    replicas: 1

Advanced: vLLM with secret refs, PostgreSQL storage, pgvector:

apiVersion: llamastack.io/v1alpha2
kind: LlamaStackDistribution
metadata:
  name: llamastack-vllm-pg
spec:
  distribution:
    name: starter
  providers:
    inference:
      - provider: vllm
        endpoint: http://vllm-service.vllm.svc.cluster.local:8000/v1
        secretRefs:
          api_key:
            name: vllm-creds
            key: token
    vectorIo:
      - provider: pgvector
        secretRefs:
          host:
            name: pg-credentials
            key: host
        settings:
          port: 5432
          db: llamastack
  resources:
    models:
      - name: llama3.2-8b
  storage:
    kv:
      type: redis
      endpoint: redis://redis-service.redis.svc.cluster.local:6379
    sql:
      type: postgres
      connectionString:
        name: pg-credentials
        key: dsn
  disabled:
    - safety
  workload:
    replicas: 2

Review Guide

This is a large PR (86 files, ~18k lines). The sections below group changes by area matching the commit structure. Each section is self-contained and can be reviewed independently.

1. v1alpha2 CRD Schema & Conversion (commit: e7cc5c2)

| File | What to review |
| --- | --- |
| api/v1alpha2/llamastackdistribution_types.go | New spec/status types. Key design: typed []ProviderConfig slices with CEL validation for provider ID uniqueness. OverrideConfig is mutually exclusive with providers/resources/storage/disabled. |
| api/v1alpha2/zz_generated.deepcopy.go | Auto-generated |
| api/v1alpha1/llamastackdistribution_conversion.go | Bidirectional v1alpha1 ↔ v1alpha2 conversion. Uses JSON blob annotations (annV1Alpha1Extras, annV1Alpha2Extras) for lossless round-trips in both directions. |
| api/v1alpha1/llamastackdistribution_conversion_test.go | Round-trip tests: providers, resources, storage, disabled, TLS, expose hostname, status fields |
| config/crd/bases/llamastack.io_llamastackdistributions.yaml | Generated CRD YAML with OpenAPI and CEL rules |
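
The annotation-based round-trip described for llamastackdistribution_conversion.go can be illustrated with a minimal sketch. The annotation key, the newOnlyFields type, and both helper names are hypothetical stand-ins and are not taken from the PR; the sketch only shows the general idea of stashing fields the older version cannot represent, and of clearing the annotation when there is nothing to stash so stale data does not survive a later conversion.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// extrasAnnotation is a hypothetical key; the PR uses its own constants
// (annV1Alpha1Extras, annV1Alpha2Extras).
const extrasAnnotation = "example.io/v1alpha2-extras"

// newOnlyFields stands in for v1alpha2 fields with no v1alpha1 equivalent.
type newOnlyFields struct {
    Disabled []string `json:"disabled,omitempty"`
}

// stashExtras stores the extra fields as a JSON blob in an annotation.
// It deletes the annotation when there is nothing to store.
func stashExtras(annotations map[string]string, extras newOnlyFields) error {
    if len(extras.Disabled) == 0 {
        delete(annotations, extrasAnnotation)
        return nil
    }
    blob, err := json.Marshal(extras)
    if err != nil {
        return fmt.Errorf("failed to marshal extras: %w", err)
    }
    annotations[extrasAnnotation] = string(blob)
    return nil
}

// restoreExtras reads the blob back when converting in the other direction.
func restoreExtras(annotations map[string]string) (newOnlyFields, error) {
    var extras newOnlyFields
    blob, ok := annotations[extrasAnnotation]
    if !ok {
        return extras, nil
    }
    if err := json.Unmarshal([]byte(blob), &extras); err != nil {
        return extras, fmt.Errorf("failed to unmarshal extras: %w", err)
    }
    return extras, nil
}

func main() {
    ann := map[string]string{}
    if err := stashExtras(ann, newOnlyFields{Disabled: []string{"safety"}}); err != nil {
        panic(err)
    }
    got, err := restoreExtras(ann)
    if err != nil {
        panic(err)
    }
    fmt.Println(got.Disabled)
}
```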

2. Validating Webhook (commit: a45c9f6)

| File | What to review |
| --- | --- |
| api/v1alpha2/llamastackdistribution_webhook.go | Validating webhook: provider ID uniqueness across all API types, distribution name validation, model provider reference checks |
| api/v1alpha2/llamastackdistribution_webhook_test.go | Unit tests: cross-slice collision detection, deriveProviderID behavior, unknown distribution rejection, edge cases |
| config/webhook/* | Webhook service, manifests, kustomize config |
| config/certmanager/* | Certificate and issuer for vanilla Kubernetes |
| config/crd/kustomization.yaml | Enabled webhook/cert-manager patches |
| config/default/kustomization.yaml | Enabled webhook and cert-manager components |
| config/default/manager_webhook_patch.yaml | Webhook port and TLS cert volume mount |
| main.go | Webhook registration |
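
As a rough illustration of cross-slice collision detection, the sketch below counts provider IDs across every API type and reports those used more than once. The providerConfig type is a simplified stand-in, and the fallback in deriveID (use the provider type when no explicit ID is set) is an assumption for the example, not necessarily how deriveProviderID behaves in the PR.

```go
package main

import (
    "fmt"
    "sort"
)

// providerConfig is a simplified stand-in for the CRD's ProviderConfig type.
type providerConfig struct {
    ID       string // optional explicit ID
    Provider string // provider type, e.g. "ollama"
}

// deriveID falls back to the provider type when no explicit ID is set.
// This fallback rule is an assumption made for the sketch.
func deriveID(pc providerConfig) string {
    if pc.ID != "" {
        return pc.ID
    }
    return pc.Provider
}

// findDuplicateIDs returns, in sorted order, every provider ID that appears
// more than once across all API types.
func findDuplicateIDs(providers map[string][]providerConfig) []string {
    counts := map[string]int{}
    for _, list := range providers {
        for _, pc := range list {
            counts[deriveID(pc)]++
        }
    }
    var dups []string
    for id, n := range counts {
        if n > 1 {
            dups = append(dups, id)
        }
    }
    sort.Strings(dups)
    return dups
}

func main() {
    providers := map[string][]providerConfig{
        "inference": {{Provider: "vllm"}},
        "vectorIo":  {{ID: "vllm", Provider: "pgvector"}},
    }
    fmt.Println(findDuplicateIDs(providers))
}
```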

3. Config Generation Pipeline (commit: 3feb572)

| File | What to review |
| --- | --- |
| pkg/config/config.go | Pipeline: resolve base config → expand providers → expand resources → apply storage → apply disabled APIs → clean registered_resources → override port → render YAML. Key: deep-copy safety, deterministic output |
| pkg/config/provider.go | Provider expansion: remote:: prefix, endpoint → base_url, sorted secret ref iteration, settings merge with override protection |
| pkg/config/resource.go | Model/tool/shield expansion with default provider resolution and provider existence validation |
| pkg/config/storage.go | KV (sqlite/redis) and SQL (sqlite/postgres) with secret env var mapping |
| pkg/config/secret_resolver.go | Resolves secretRefs maps to env vars (LLSD_<PROVIDER_ID>_<KEY>) and ${env.VAR_NAME} substitutions |
| pkg/config/resolver.go | Base config resolver: resolves embedded configs by distribution name |
| pkg/config/version.go | Config version detection (supports versions 1-2) |
| pkg/config/types.go | Shared types: BaseConfig, ProviderEntry, GeneratedConfig |
| pkg/config/config_test.go | Unit tests: determinism, provider/resource expansion, storage, secret resolution, disabled API cleanup, deep-copy safety |
| pkg/config/configs/*/config.yaml | Embedded base configs for starter, starter-gpu, postgres-demo distributions |
| distributions.json | Distribution metadata |
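
The secret resolution row maps each secretRefs entry to an env var named LLSD_<PROVIDER_ID>_<KEY> and a ${env.VAR_NAME} placeholder in the generated config. A minimal sketch of that mapping follows. The function names are hypothetical, and the normalization (upper-casing, then replacing anything outside A-Z and 0-9 with an underscore) is an assumption; the PR's GenerateEnvVarName may differ.

```go
package main

import (
    "fmt"
    "strings"
)

// envVarName builds an LLSD_<PROVIDER_ID>_<KEY> name. The sanitizing rule is
// an assumed normalization, not necessarily what GenerateEnvVarName does.
func envVarName(providerID, key string) string {
    sanitize := func(s string) string {
        return strings.Map(func(r rune) rune {
            if (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
                return r
            }
            return '_'
        }, strings.ToUpper(s))
    }
    return "LLSD_" + sanitize(providerID) + "_" + sanitize(key)
}

// placeholder returns the ${env.VAR_NAME} substitution written into config.yaml.
func placeholder(envName string) string {
    return "${env." + envName + "}"
}

func main() {
    name := envVarName("vllm", "api_key")
    fmt.Println(name, placeholder(name))
}
```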

4. Controller Integration (commit: bf90527)

| File | What to review |
| --- | --- |
| controllers/v1alpha2_config.go | v1alpha2 config handling: determines config source (override / generated / default), creates immutable ConfigMaps with content-hash naming, validates secret/ConfigMap refs, injects secret-backed env vars into pod spec, cleans up old ConfigMaps |
| controllers/llamastackdistribution_controller.go | Integration: calls handleV1Alpha2NativeConfig before standard reconcile. Dual status update path for v1alpha2 CRs |
| controllers/kubebuilder_rbac.go | RBAC markers: added secrets (get/list/watch) and configmaps (delete) |
| controllers/resource_helper.go | Deprecated startupScript. Sets RUN_CONFIG_PATH env var; image's built-in entrypoint.sh handles startup |
| controllers/resource_helper_test.go | Updated assertions: RUN_CONFIG_PATH instead of command/args overrides |
| controllers/suite_test.go | Envtest setup with webhook server |
| controllers/testing_support_test.go | Test constants and helpers |
| controllers/llamastackdistribution_controller_test.go | Envtest integration tests: config generation, ConfigMap creation, secret env var injection, status updates |
| config/rbac/role.yaml | Generated ClusterRole with secrets and configmaps-delete permissions |
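
Content-hash naming for immutable ConfigMaps can be sketched as below. The "-config-" infix, the 10-character hash prefix, and the function name are assumptions for illustration only; the point is that identical rendered content always yields the same name, while any change in content yields a new one.

```go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// configMapName derives a name from the CR name and a hash of the rendered
// config. The infix and hash length are assumptions, not the PR's scheme.
func configMapName(crName, renderedConfig string) string {
    sum := sha256.Sum256([]byte(renderedConfig))
    return crName + "-config-" + hex.EncodeToString(sum[:])[:10]
}

func main() {
    a := configMapName("llamastack-vllm-pg", "version: 2\n")
    b := configMapName("llamastack-vllm-pg", "version: 2\nport: 8321\n")
    fmt.Println(a)
    fmt.Println(b)
}
```

Because the name changes whenever the content does, a Deployment that references the ConfigMap by name is rolled only when the generated config actually differs.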

5. OpenShift Webhook Overlay (commit: e3f405b)

| File | What to review |
| --- | --- |
| config/openshift/kustomization.yaml | OpenShift overlay: replaces cert-manager with service-serving certificates |
| config/openshift/crd_ca_patch.yaml | CRD CA injection annotation |
| config/openshift/manager_webhook_patch.yaml | Manager cert volume mount |
| config/openshift/webhook_ca_patch.yaml | Webhook CA injection annotation |

6. E2E Tests (commit: c6d24e8)

| File | What to review |
| --- | --- |
| tests/e2e/creation_v1alpha2_test.go | v1alpha2 CR creation, ConfigMap generation, Ready phase, secret env var injection into Deployment |
| tests/e2e/conversion_test.go | Cross-version read (v1alpha1 as v1alpha2 and vice versa) |
| tests/e2e/webhook_validation_test.go | Webhook rejects: missing distribution, duplicate provider IDs, invalid provider references |
| tests/e2e/validation_test.go | CRD structure, webhook service/TLS, operator readiness |
| tests/e2e/creation_test.go | Updated v1alpha1 creation tests |
| tests/e2e/e2e_test.go | Test suite registration |
| tests/e2e/test_utils.go | Test helpers |
| .github/workflows/run-e2e-test.yml | CI workflow updates for v1alpha2 targets |

7. Documentation & Samples (commit: 3b6972b)

| File | What to review |
| --- | --- |
| docs/migration-v1alpha1-to-v1alpha2.md | Migration guide: field mapping tables, before/after examples, step-by-step migration |
| docs/api-overview.md | Full v1alpha2 API reference for both versions |
| README.md | Updated quick start with v1alpha2 examples using secretRefs and list syntax |
| config/samples/v1alpha1/* | Existing samples moved into versioned subdirectory |
| config/samples/v1alpha2/* | New v1alpha2 samples: basic, HA, vLLM+Postgres, networking |
| specs/002-operator-generated-config/* | Updated spec contracts and data model |

8. Build Tooling & Release (commit: b5a2df7)

| File | What to review |
| --- | --- |
| Makefile | Build target updates for v1alpha2 |
| .gitignore | Ignore patterns for generated artifacts |
| go.mod / go.sum | Dependency updates |
| release/operator.yaml | Regenerated release manifest with all v1alpha2 resources |

@VaishnaviHire VaishnaviHire marked this pull request as draft February 23, 2026 14:51
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch 4 times, most recently from 28c7f4a to afec277 February 27, 2026 08:03
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from b760e03 to 1843bd3 March 3, 2026 14:33
@VaishnaviHire VaishnaviHire changed the title from "[DRAFT] Implement run config schema" to "feat: Introduce v1alpha2 version of LlamaStackDistribution CRD" Mar 3, 2026
@VaishnaviHire VaishnaviHire marked this pull request as ready for review March 3, 2026 14:42
@VaishnaviHire
Collaborator Author

@Mergifyio rebase

mergify bot commented Mar 5, 2026

rebase

✅ Branch has been successfully rebased

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 1843bd3 to 31e0e3c March 5, 2026 15:08
Comment thread pkg/config/config.go Outdated
Comment thread pkg/config/config.go
Comment thread pkg/config/config.go Outdated
Comment thread controllers/v1alpha2_config.go
Comment thread controllers/v1alpha2_config.go
Comment thread controllers/v1alpha2_config.go Outdated
Comment thread config/samples/v1alpha1/llamastackdistribution.yaml
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch 2 times, most recently from 3fde13b to 673d4e7 March 6, 2026 17:21
@mfleader mfleader self-requested a review March 6, 2026 19:26
Comment thread api/v1alpha2/llamastackdistribution_webhook.go
Comment thread pkg/config/secret_resolver.go Outdated
logger := log.FromContext(ctx)

// Handle v1alpha2 native config generation before standard reconciliation.
v1a2Result, v1a2Err := r.handleV1Alpha2NativeConfig(ctx, key, instance)
Collaborator

Missing test coverage for FR-097 (preserve running Deployment on config generation failure).

Collaborator

Where is this covered? I don't see a test that creates a Deployment first, then fails config generation, then checks the Deployment is unchanged.

Comment thread controllers/v1alpha2_config.go
Comment thread controllers/v1alpha2_config.go
Comment thread pkg/config/secret_resolver.go Outdated
Comment thread pkg/config/resource.go
Comment thread pkg/config/resource.go
Comment thread api/v1alpha2/llamastackdistribution_webhook.go
Comment thread tests/e2e/webhook_validation_test.go Outdated
Comment thread tests/e2e/webhook_validation_test.go Outdated
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 673d4e7 to ac0683d March 9, 2026 09:35
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from fc77468 to 958b6f3 March 20, 2026 11:56
Collaborator

@eoinfennessy eoinfennessy left a comment

Review of config gen pipeline. There are some critical issues that need to be addressed.

Comment thread pkg/config/config.go Outdated
Comment thread pkg/config/config.go Outdated
Comment thread pkg/config/provider.go Outdated
Comment on lines +136 to +138
for k, v := range settingsMap {
    cfg[k] = v
}
Collaborator

It's possible that values from the settings map can override the endpoint, secret refs, and the API key.

Should we skip adding items that are already in cfg? And log a warning?

Collaborator

Or maybe add settings to the cfg map first and then add fields like base_url, secret_refs, and api_key?

We need to be careful that secret_refs can't override api_key too (which it can currently).
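
A minimal sketch of the first option raised in this thread (skip keys already present in cfg and report them so a warning can be logged). The function name and the example values are hypothetical; the real code would also need to decide where the warning is emitted.

```go
package main

import (
    "fmt"
    "sort"
)

// mergeSettings copies user settings into cfg but never overwrites a key that
// is already present (base_url, api_key, and so on). It returns the skipped
// keys in sorted order so the caller can log a warning.
func mergeSettings(cfg, settings map[string]interface{}) []string {
    var skipped []string
    for k, v := range settings {
        if _, exists := cfg[k]; exists {
            skipped = append(skipped, k)
            continue
        }
        cfg[k] = v
    }
    sort.Strings(skipped)
    return skipped
}

func main() {
    cfg := map[string]interface{}{
        "base_url": "http://vllm-service.vllm.svc.cluster.local:8000/v1",
        "api_key":  "${env.LLSD_VLLM_API_KEY}",
    }
    settings := map[string]interface{}{"api_key": "plaintext", "max_tokens": 4096}
    skipped := mergeSettings(cfg, settings)
    fmt.Println(skipped, cfg["api_key"], cfg["max_tokens"])
}
```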

Comment thread pkg/config/provider.go Outdated
Comment on lines +122 to +130
for key := range pc.SecretRefs {
    ident := providerID + ":" + key
    if sub, ok := substitutions[ident]; ok {
        cfg[key] = sub
    } else {
        envName := GenerateEnvVarName(providerID, key)
        cfg[key] = "${env." + envName + "}"
    }
}
Collaborator

The order of iteration is non-deterministic. This could potentially cause unnecessary Deployment updates.

We should sort the keys before iterating.
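
Go randomizes map iteration order, so the usual fix is to collect the keys, sort them, and range over the sorted slice. A small sketch, with secretRef as a simplified stand-in type and illustrative entries:

```go
package main

import (
    "fmt"
    "sort"
)

// secretRef is a simplified stand-in for the CRD's secret reference type.
type secretRef struct {
    Name string
    Key  string
}

// sortedKeys returns the map's keys in ascending order.
func sortedKeys(m map[string]secretRef) []string {
    keys := make([]string, 0, len(m))
    for k := range m {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    return keys
}

func main() {
    refs := map[string]secretRef{
        "password": {Name: "pg-credentials", Key: "password"},
        "host":     {Name: "pg-credentials", Key: "host"},
        "api_key":  {Name: "vllm-creds", Key: "token"},
    }
    // Iterating over sorted keys gives the same order on every reconcile.
    for _, k := range sortedKeys(refs) {
        fmt.Printf("%s -> %s/%s\n", k, refs[k].Name, refs[k].Key)
    }
}
```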

Comment thread pkg/config/secret_resolver.go Outdated
Comment on lines +101 to +107
for key, ref := range pc.SecretRefs {
    addSecretToResolution(resolution, secretRefEntry{
        ProviderID: providerID,
        Field:      key,
        SecretName: ref.Name,
        SecretKey:  ref.Key,
    })
Collaborator

The order of iteration is non-deterministic. This could potentially cause unnecessary Deployment updates.

We should sort the keys before iterating.

Comment thread pkg/config/config.go Outdated
Comment on lines +236 to +251
// apiNameToConfigKey maps CRD-style camelCase API names to config.yaml snake_case keys.
var apiNameToConfigKey = map[string]string{
    "vectorIo":     "vector_io",
    "toolRuntime":  "tool_runtime",
    "postTraining": "post_training",
    "datasetIo":    "datasetio",
}

// normalizeAPIName converts a CRD-style camelCase API name to the config.yaml
// snake_case key. Names already in snake_case pass through unchanged.
func normalizeAPIName(api string) string {
    if mapped, ok := apiNameToConfigKey[api]; ok {
        return mapped
    }
    return api
}
Collaborator

This is all effectively dead code because the disabled enum already specifies snake-case values.

Comment thread pkg/config/config.go
Comment on lines +371 to +382
// RenderConfigYAML serializes the config to deterministic YAML.
func RenderConfigYAML(config *BaseConfig) (string, error) {
    // Build an ordered map for deterministic output
    out := buildOrderedConfig(config)

    data, err := yaml.Marshal(out)
    if err != nil {
        return "", fmt.Errorf("failed to marshal config YAML: %w", err)
    }

    return string(data), nil
}
Collaborator

This function mutates the provided config, which is unconventional and unexpected for a render function.

Consider having buildOrderedConfig write to out["registered_resources"] instead of config.Extra["registered_resources"].
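
A sketch of what a non-mutating variant could look like. The baseConfig type here is a pared-down stand-in, and the extra registered parameter is a hypothetical way to pass in the derived resources; the actual BaseConfig and buildOrderedConfig signatures in the PR may differ.

```go
package main

import "fmt"

// baseConfig is a pared-down stand-in for BaseConfig in pkg/config/types.go.
type baseConfig struct {
    Version int
    Extra   map[string]interface{}
}

// buildOrderedConfig assembles the output map without touching config.
// Derived values such as registered_resources go into out, not config.Extra.
// Note that top-level values are copied shallowly, so nested maps are shared.
func buildOrderedConfig(config *baseConfig, registered map[string]interface{}) map[string]interface{} {
    out := make(map[string]interface{}, len(config.Extra)+2)
    for k, v := range config.Extra {
        out[k] = v
    }
    out["version"] = config.Version
    out["registered_resources"] = registered
    return out
}

func main() {
    cfg := &baseConfig{Version: 2, Extra: map[string]interface{}{"example_key": "example"}}
    out := buildOrderedConfig(cfg, map[string]interface{}{"models": []string{"llama3.2:1b"}})
    _, mutated := cfg.Extra["registered_resources"]
    fmt.Println(len(out), mutated)
}
```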

Comment thread pkg/config/types.go
EvalStore map[string]interface{} `json:"eval_store,omitempty" yaml:"eval_store,omitempty"`
DatasetIOStore map[string]interface{} `json:"datasetio_store,omitempty" yaml:"datasetio_store,omitempty"`
Server map[string]interface{} `json:"server,omitempty" yaml:"server,omitempty"`
ExternalProviders map[string]interface{} `json:"external_providers,omitempty" yaml:"external_providers,omitempty"`
Collaborator

This field is effectively unused because we never use it in buildOrderedConfig.

We should delete it to avoid confusion.

Comment thread pkg/config/resource.go Outdated
}

if mc.ContextLength != nil && *mc.ContextLength > 0 {
    if entry["provider_model_id"] == nil {
Collaborator

This check is not needed. It is always true.

Comment thread pkg/config/resource.go
Comment on lines +77 to +80
provider := mc.Provider
if provider == "" {
    provider = defaultProvider
}
Collaborator

We have no validation that the model's provider ID actually exists.

The model can be registered with a non-existent provider and no error is returned. The llama-stack server would fail at startup with a confusing error about an unknown provider.

We should consider adding validation for this at the admission layer if not too complex. Otherwise, we can validate here and return an error so the CR's status can reflect the issue to users.

Collaborator

@eoinfennessy eoinfennessy left a comment

Review of webhooks:

We need to comprehensively test all validation logic (webhook and CEL). Currently we don't do any testing of validation logic.

There are some problems with data loss and stale data in the conversion logic. I added a suggestion to fix this.

Comment thread api/v1alpha2/llamastackdistribution_webhook.go
Comment thread api/v1alpha1/llamastackdistribution_conversion.go
Comment thread api/v1alpha1/llamastackdistribution_conversion.go
@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 958b6f3 to 4021d2f March 23, 2026 07:00
@VaishnaviHire
Collaborator Author

@eoinfennessy I have addressed the comments and updated the commits. Please take a look.

@eoinfennessy
Collaborator

eoinfennessy commented Mar 23, 2026

@VaishnaviHire, thanks for addressing the comments. In future review cycles on this PR, please avoid squashing and force-pushing. Instead, please add new commits for each change. This makes it easier for me to review the changes that have been made between PR reviews, which is especially tricky in such a large PR.

mergify bot commented Mar 27, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @VaishnaviHire please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 27, 2026
Collaborator

@eoinfennessy eoinfennessy left a comment

I re-reviewed after the previous suggestions. Thanks for addressing these. Couple of things left from these reviews:

  1. CEL validation tests: Let's add envtest tests to ensure all of our complex CEL validation is actually working.
  2. Split TLSSpec for server and client: see my latest comment in the thread above discussing this
  3. Small bug remaining in resource.go (see below)

Comment thread pkg/config/resource.go
if provider == "" {
    return nil, fmt.Errorf("failed to expand model %q: no provider specified and no default inference provider found", mc.Name)
}
if mc.Provider != "" && !providerExists(provider, userProviders, base) {
Collaborator

The check mc.Provider != "" means: "only validate if the user explicitly set a provider." But this creates a blind spot — if the user omits the provider and the default is used, no existence check happens at all. The default provider could be stale or wrong, and the error would only surface at LlamaStack server startup with a confusing message about an unknown provider.

The fix is simply:

if !providerExists(provider, userProviders, base) {

This validates the provider regardless of whether it came from the user or from the default, which is what you'd want — if we resolved a provider name, we should verify it exists.
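
To make the effect of the fix concrete, here is a self-contained sketch. resolveModelProvider and the known set are hypothetical simplifications of the logic in resource.go; the existence check runs whether the provider came from the user or from the default.

```go
package main

import "fmt"

// resolveModelProvider picks the explicit provider or falls back to the
// default, then verifies that the result exists in either case.
func resolveModelProvider(modelName, explicit, defaultProvider string, known map[string]bool) (string, error) {
    provider := explicit
    if provider == "" {
        provider = defaultProvider
    }
    if provider == "" {
        return "", fmt.Errorf("failed to expand model %q: no provider specified and no default inference provider found", modelName)
    }
    if !known[provider] {
        return "", fmt.Errorf("failed to expand model %q: provider %q not found", modelName, provider)
    }
    return provider, nil
}

func main() {
    known := map[string]bool{"vllm": true}
    // The default is stale: it names a provider that is not configured.
    _, err := resolveModelProvider("llama3.2-8b", "", "ollama", known)
    fmt.Println(err)
}
```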

Collaborator

@eoinfennessy eoinfennessy left a comment

Review focussing on /controller. Mostly minor suggestions, but a couple of major things related to surfacing errors and status.

Comment on lines +195 to +199
// Handle v1alpha2 native config generation before standard reconciliation.
v1a2Result, v1a2Err := r.handleV1Alpha2NativeConfig(ctx, key, instance)
if v1a2Err != nil {
    logger.Error(v1a2Err, "failed to handle v1alpha2 native config")
}
Collaborator

FR-097 states:

If config generation or validation fails during a CR update, the operator MUST preserve the current running Deployment (image, ConfigMap, env vars) unchanged and set status condition ConfigGenerated=False with the failure reason. The running instance MUST NOT be disrupted.

Two gaps:

  1. ConfigGenerated=False is never set. When handleV1Alpha2NativeConfig fails, v1a2Result is nil, so finalizeReconciliation takes the else branch and calls updateStatus — which only writes v1alpha1 status fields. SetV1Alpha2Condition is only called inside persistV1Alpha2Status, which is only reached on success. The constant ReasonConfigGenFailed is declared but unused. The failure is logged to operator stdout but never surfaces in .status.conditions.

  2. No structural guarantee the Deployment is preserved. handleV1Alpha2NativeConfig mutates v1Instance in-place (setting UserConfig and appending env vars) after all fallible operations, then reconcileResources uses the same pointer to reconcile the Deployment. Today the mutation ordering is safe — mutations happen last, after all fallible steps. But this is an implicit invariant: if a fallible step is later added after the UserConfig assignment, reconcileResources would reconcile against a half-modified spec, potentially pointing the Deployment at a ConfigMap that doesn't exist.

Suggested approach: On handleV1Alpha2NativeConfig error, skip reconcileResources, persist ConfigGenerated=False with the failure reason, and return the error to requeue. This satisfies both halves of FR-097: the Deployment is untouched and the failure is visible in status.

Collaborator

There's also no test asserting that .status.conditions contains ConfigGenerated=False after a failed config generation.

// v1alpha2 Condition reasons.
const (
    ReasonConfigGenSucceeded = "ConfigGenerationSucceeded"
    ReasonConfigGenFailed    = "ConfigGenerationFailed"
Collaborator

This is unused

Comment on lines +232 to +234
if err := r.persistV1Alpha2Status(ctx, key, instance, v1a2Result); err != nil {
    logger.Error(err, "failed to update v1alpha2 status")
}
Collaborator

We should return the status update error to match v1alpha1 behaviour in the else block, and ensure the status is eventually consistent.


// updateStatus refreshes the LlamaStack status.
func (r *LlamaStackDistributionReconciler) updateStatus(ctx context.Context, instance *llamav1alpha1.LlamaStackDistribution, reconcileErr error) error {
// computeStatus computes all status fields on the in-memory v1alpha1 instance
Collaborator

We should remove the stale updateStatus comment above this line

Comment on lines +314 to +336
for _, envVar := range resolution.EnvVars {
    if envVar.ValueFrom == nil || envVar.ValueFrom.SecretKeyRef == nil {
        continue
    }

    secretName := envVar.ValueFrom.SecretKeyRef.Name
    secretKey := envVar.ValueFrom.SecretKeyRef.Key

    secret := &corev1.Secret{}
    if err := r.Get(ctx, types.NamespacedName{
        Name:      secretName,
        Namespace: namespace,
    }, secret); err != nil {
        if k8serrors.IsNotFound(err) {
            return fmt.Errorf("failed to find Secret %q in namespace %q (referenced by env var %s)", secretName, namespace, envVar.Name)
        }
        return fmt.Errorf("failed to get Secret %q: %w", secretName, err)
    }

    if _, ok := secret.Data[secretKey]; !ok {
        return fmt.Errorf("failed to find key %q in Secret %q in namespace %q", secretKey, secretName, namespace)
    }
}
Collaborator

We could maybe consider aggregating errors here to provide a better UX.
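
One way to aggregate is errors.Join from the standard library (Go 1.20 or later), which returns nil when given no errors. A minimal sketch, where the secrets map is a hypothetical stand-in for lookups against the API server:

```go
package main

import (
    "errors"
    "fmt"
)

// secretKeyRef is a simplified stand-in for a Secret name/key pair.
type secretKeyRef struct{ Name, Key string }

// validateRefs checks every reference and reports all problems at once
// instead of stopping at the first. secrets maps a Secret name to its data
// keys and stands in for real API server lookups.
func validateRefs(refs []secretKeyRef, secrets map[string]map[string]bool) error {
    var errs []error
    for _, ref := range refs {
        keys, ok := secrets[ref.Name]
        if !ok {
            errs = append(errs, fmt.Errorf("failed to find Secret %q", ref.Name))
            continue
        }
        if !keys[ref.Key] {
            errs = append(errs, fmt.Errorf("failed to find key %q in Secret %q", ref.Key, ref.Name))
        }
    }
    return errors.Join(errs...) // nil when errs is empty
}

func main() {
    secrets := map[string]map[string]bool{"pg-credentials": {"host": true}}
    refs := []secretKeyRef{
        {Name: "pg-credentials", Key: "dsn"},
        {Name: "vllm-creds", Key: "token"},
    }
    fmt.Println(validateRefs(refs, secrets))
}
```

With this shape, a user who has two broken references sees both in one status message rather than fixing them one reconcile at a time.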

Comment on lines +386 to +395
for i, mc := range spec.Resources.Models {
    if mc.Provider != "" {
        if _, ok := providerIDs[mc.Provider]; !ok {
            return fmt.Errorf(
                "resources.models[%d].provider: provider ID %q not found; available providers: %s",
                i, mc.Provider, strings.Join(sortedKeys(providerIDs), ", "),
            )
        }
    }
}
Collaborator

We could aggregate errors here too.

Comment on lines +495 to +498
} else {
    status.Conditions[i].Reason = reason
    status.Conditions[i].Message = message
}
Collaborator

There's no ObservedGeneration set on the condition. Without it, a client can't distinguish whether a ConfigGenerated=True condition was set for the current spec generation or a previous one. Consider setting condition.ObservedGeneration = instance.Generation to match the convention used by most Kubernetes controllers.
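
A sketch of stamping the generation on every condition write. The condition struct below mirrors only a few fields of metav1.Condition (which does carry an ObservedGeneration field), and setCondition is a hypothetical helper, not the PR's SetV1Alpha2Condition.

```go
package main

import "fmt"

// condition mirrors the fields of metav1.Condition that matter here.
type condition struct {
    Type               string
    Status             string
    Reason             string
    Message            string
    ObservedGeneration int64
}

// setCondition inserts or updates a condition by type and always stamps it
// with the generation of the spec that was just reconciled.
func setCondition(conds []condition, c condition, generation int64) []condition {
    c.ObservedGeneration = generation
    for i := range conds {
        if conds[i].Type == c.Type {
            conds[i] = c
            return conds
        }
    }
    return append(conds, c)
}

func main() {
    conds := setCondition(nil, condition{Type: "ConfigGenerated", Status: "True"}, 3)
    conds = setCondition(conds, condition{Type: "ConfigGenerated", Status: "False", Reason: "ConfigGenerationFailed"}, 4)
    fmt.Println(len(conds), conds[0].Status, conds[0].ObservedGeneration)
}
```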

Comment on lines +991 to +1003
namespace := createTestNamespace(t, "test-v1alpha2-secret-ref")
operatorNamespace := createTestNamespace(t, "test-v1alpha2-secret-op")
t.Setenv("OPERATOR_NAMESPACE", operatorNamespace.Name)

// Create operator config ConfigMap (required by NewLlamaStackDistributionReconciler)
opConfig := &corev1.ConfigMap{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "llama-stack-operator-config",
        Namespace: operatorNamespace.Name,
    },
    Data: map[string]string{},
}
require.NoError(t, k8sClient.Create(t.Context(), opConfig))
Collaborator

This is repeated 6 times. Consider writing a helper:

func setupV1Alpha2Env(t *testing.T, prefix string) (ns *corev1.Namespace, opNs *corev1.Namespace)

Comment on lines +1030 to +1037
clusterInfo := &cluster.ClusterInfo{
    OperatorNamespace:  operatorNamespace.Name,
    DistributionImages: map[string]string{"starter": testImage},
}
reconciler, err := controllers.NewLlamaStackDistributionReconciler(
    t.Context(), k8sClient, scheme.Scheme, clusterInfo,
)
require.NoError(t, err)
Collaborator

This is repeated 5 times. Consider a helper:

func newV1Alpha2Reconciler(t *testing.T, opNamespace string) *controllers.LlamaStackDistributionReconciler

Comment thread api/v1alpha2/llamastackdistribution_types.go Outdated
Comment on lines +185 to +187
agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
Collaborator

The agents API has been renamed to responses: llamastack/llama-stack#5195

We probably need to update this in all embedded configs

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 4021d2f to 43e2ce6 March 30, 2026 18:06
@VaishnaviHire
Collaborator Author

@eoinfennessy I have addressed the comments. Additionally, I added a deploy-time feature flag for v1alpha2: a v1alpha1-only overlay that deploys only the v1alpha1 CRD.

Add typed v1alpha2 API (ProvidersSpec, ModelConfig, ExposeConfig,
StorageSpec) with kubebuilder validation markers and CEL rules.
Implement lossless v1alpha1<->v1alpha2 conversion via JSON-blob
annotations for fields that have no v1alpha1 equivalent.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Implement admission webhook that validates distribution names against
the embedded registry, enforces unique provider IDs per category,
and checks model provider references. Wire up cert-manager and
webhook kustomize overlays.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Build the config generation pipeline that renders a complete
config.yaml from v1alpha2 spec fields (providers, resources,
storage). Includes distribution registry, provider expansion,
model/tool/shield resource resolution, storage configuration,
secret-ref placeholder injection, and disabled-API pruning.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Wire the config generation pipeline into the reconciliation loop.
Adds v1alpha2 config source detection, ConfigMap creation with
generated config.yaml, secret env-var injection into pod spec,
RBAC permissions for secrets and configmap deletion, and
controller-level integration tests.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Add kustomize overlay for OpenShift deployments that patches the
webhook configuration to use the service-serving-cert-signer CA
instead of cert-manager, along with SCC-compatible manager patches.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Add end-to-end tests covering v1alpha2 CR creation, conversion
round-trips, webhook validation rejection, secret env-var injection,
and TLS configuration. Refactor existing e2e tests into focused
test files with shared utilities.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Reorganize sample CRs into v1alpha1/ and v1alpha2/ subdirectories.
Add v1alpha2 sample CRs (vLLM+Postgres, HA, networking), API
overview, and v1alpha1-to-v1alpha2 migration guide. Update README
with v1alpha2 quick-start examples.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Update Makefile with webhook cert-manager targets, add go module
dependencies for the config pipeline and webhook infrastructure,
and regenerate the release operator manifest.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Add v1alpha1-only overlay for deployment. This allows the v1alpha2 api to
be incrementally enabled for GA releases.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

Remove support for eval, safety and related apis

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
Assisted-by : claude-4.6-opus

@VaishnaviHire VaishnaviHire force-pushed the implement_run_config_schema branch from 43e2ce6 to 2de6d1e March 30, 2026 18:44
@mergify mergify bot removed the needs-rebase label Mar 30, 2026
mergify bot commented Apr 2, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @VaishnaviHire please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 2, 2026