Skip to content

feat(dis-cache-operator): scaffold self-service Azure Managed Redis#3472

Open
arealmaas wants to merge 11 commits into
mainfrom
arealmaas/redis-operator-valkey
Open

feat(dis-cache-operator): scaffold self-service Azure Managed Redis#3472
arealmaas wants to merge 11 commits into
mainfrom
arealmaas/redis-operator-valkey

Conversation

@arealmaas

@arealmaas arealmaas commented May 19, 2026

Copy link
Copy Markdown
Contributor

Add dis-cache-operator (RFC 0014) — self-service Azure Managed Redis

This PR scaffolds a new operator, dis-cache-operator, that reconciles a Redis CR into Azure Managed Redis (Microsoft.Cache/redisEnterprise) via Azure Service Operator. It mirrors the proven patterns from dis-vault-operator (single-resource-per-CR, federated-identity-owned) and dis-pgsql-operator (private endpoint + shared private DNS + AKS VNet link).

See RFC 0014 for the full design.

The operator was initially scaffolded under the name dis-redis-operator; it was renamed to dis-cache-operator in this branch so the naming can later cover Valkey and other managed cache backends without another rename.

Feature Behavior (BDD)

Given a Redis custom resource in the team namespace that references a ready ApplicationIdentity via spec.identityRef,
When the operator reconciles the CR,
Then it computes a deterministic Azure cluster name from namespace + name + environment,
And it creates ASO RedisEnterprise (cluster) and RedisEnterpriseDatabase resources with accessKeysAuthentication=Disabled, TLS-only client protocol by default, and port 10000,
And it creates an ASO PrivateEndpoint targeting the cluster in the configured AKS data subnet,
And it get-or-creates the shared privatelink.redis.azure.net private DNS zone and the AKS VNet link to it (label-managed, not owner-referenced to any single CR),
And it publishes status conditions IdentityReady, ClusterReady, DatabaseReady, PrivateEndpointReady, PrivateDNSReady, AccessPolicyReady, and an aggregated Ready,
And it populates status.azureName, status.hostName, status.port, status.clusterResourceId, status.databaseResourceId, and status.ownerPrincipalId.

Given the referenced identity is not yet ready (missing status.principalId or Ready=True),
When the operator reconciles,
Then it sets IdentityReady=False with reason IdentityNotReady, emits dependent ASO conditions as IdentityNotReady (rather than reading stale state), leaves Azure resources untouched, and requeues after 5 seconds.

Given the Redis CR specifies spec.serviceAccountRef instead of spec.identityRef,
When the operator reconciles,
Then it resolves the principal from the workload-identity annotations (azure.workload.identity/client-id and dis.altinn.cloud/principal-id) on the referenced ServiceAccount.

Given the Redis CR is deleted,
When the operator observes the deletion,
Then Kubernetes garbage collection cascades deletion to the owner-referenced cluster, database, and private endpoint resources, while the shared DNS zone and VNet link remain (they outlive any single CR).

Given the PrivateDnsZonesVirtualNetworkLink is not yet ready,
When the operator computes PrivateDNSReady,
Then it requires both the zone and the VNet link to be Ready before reporting True, so applications never see a True DNS condition while name resolution from AKS still fails.

Given any of ClusterReady, DatabaseReady, PrivateEndpointReady, or PrivateDNSReady is not yet True,
When the reconcile loop completes,
Then the controller requeues after provisioningRequeueDelay instead of waiting only for an owned-resource watch event (the shared DNS zone is label-managed and never fires watches).

Note: AccessPolicyAssignment reconciliation is deferred to a follow-up PR (the upstream ASO type is not yet available in v2.17.0). The AccessPolicyReady condition reports Unknown / Pending until then.

ASCII Diagram

                       ┌──────────────────────────────┐
                       │ Team namespace               │
                       │                              │
                       │  Redis CR ──ref──> AppIdent. │
                       │     │                  │     │
                       │     │ (controller      │     │
                       │     │  resolves        │     │
                       │     │  principalId)    │     │
                       └─────┼──────────────────┼─────┘
                             │                  │
                             ▼                  │
            ┌────────────────────────────┐      │
            │ dis-cache-operator         │      │
            └─────┬───────────┬──────────┘      │
                  │           │                 │
       owns: ┌────▼───┐  ┌────▼──────┐          │
             │ ASO    │  │ ASO       │          │
             │ Cluster│  │ Database  │          │
             └────┬───┘  └─────┬─────┘          │
                  │            │                │
        ┌─────────▼──┐    ┌────▼─────────┐      │
        │ ASO        │    │ (future PR)  │      │
        │ Private    │    │ Access       │      │
        │ Endpoint   │    │ Policy       │      │
        └─────┬──────┘    │ Assignment   │      │
              │           └──────────────┘      │
              │                                 │
              ▼                                 │
   ┌─────────────────────────────────────┐      │
   │ shared (label-managed, namespace-   │      │
   │ scoped, not owner-ref'd to any CR): │      │
   │ - PrivateDnsZone:                   │      │
   │     privatelink.redis.azure.net     │      │
   │ - PrivateDnsZonesVirtualNetworkLink │      │
   │     → AKS VNet                      │      │
   └─────────────────┬───────────────────┘      │
                     │                          │
                     ▼                          │
          ┌──────────────────────┐              │
          │ Azure subscription   │              │
          │ ┌────────────────┐   │   ┌──────────▼───────────┐
          │ │ RedisEnterprise│◀──┼───│ Workload pod         │
          │ │ + Database     │   │   │ (TLS 10000, Entra    │
          │ │ + Priv. EP     │   │   │  token via federated │
          │ └────────────────┘   │   │  workload identity)  │
          └──────────────────────┘   └──────────────────────┘

Test plan

  • cd services/dis-cache-operator && make fmt vet test lint — all green locally
  • make manifests — CRD reflects the RedisPersistence aof/rdb mutual-exclusion XValidation rule
  • CI: dis-cache-lint-test.yml runs golangci-lint and make test on the new path filter
  • CI: dis-cache-release.yml builds the image on merge to main; release-please opens a dis-cache-v0.1.0 PR
  • Apply config/samples/redis_v1alpha1_redis.yaml against a Kind cluster; verify the CR is admitted and conditions surface as expected

Summary by CodeRabbit

  • New Features

    • Introduced a Redis cache operator CRD and controller to provision Azure Managed Redis (cluster, DB, private endpoint) via Kubernetes.
    • Supports application-identity and service-account auth; secure metrics over HTTPS with Prometheus scraping and network policy.
  • Documentation

    • Added operator README, contributor guide, changelog, and samples/quick-start.
  • Tests

    • Added unit and controller test suites and test helpers.
  • Chores

    • Added CI/CD workflows, Docker build, kustomize manifests, RBAC, network policies, and Makefile/tooling.

Review Change Stack

…perator (RFC 0014)

Adds a new operator that reconciles a Redis CR into Azure Managed Redis
(Microsoft.Cache/redisEnterprise) via ASO. Mirrors dis-vault-operator
(single-resource-per-CR, federated-identity-owned) and dis-pgsql-operator
(private endpoint + private DNS + AKS VNet link). Entra-only data-plane
access; no shared keys exposed. Shared privatelink.redis.azure.net zone
(get-or-create, label-managed). Access policy assignment is deferred to a
follow-up PR pending the ASO type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arealmaas arealmaas requested a review from a team as a code owner May 19, 2026 21:28
@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces dis-cache-operator: a Kubernetes operator that provisions Azure Managed Redis (Redis Enterprise) via a Redis CRD, handling identity resolution, deterministic naming, ASO resource creation (cluster, database, private endpoint, DNS), status aggregation, manifests, build tooling, and CI/CD workflows.

Changes

DIS Redis Cache Operator

Layer / File(s) Summary
API Contract & CRD Definition
services/dis-cache-operator/api/v1alpha1/*, services/dis-cache-operator/config/crd/*
Defines Redis CRD with enum-backed SKU/protocol/eviction specs, identity/persistence/modules, status conditions, generated deepcopy methods, and a full OpenAPI CRD manifest with validation.
Configuration, Validation & Identity Resolution
services/dis-cache-operator/internal/config/*, services/dis-cache-operator/internal/redis/*identity*.go
Operator runtime config validation (UUIDs, ARM IDs, subnet parsing) and owner identity resolution from ApplicationIdentity CRs or annotated ServiceAccounts with pending/requeue semantics; unit tests cover parsing and resolution.
Resource Builders & Deterministic Naming
services/dis-cache-operator/internal/redis/builders*.go, services/dis-cache-operator/internal/redis/naming*.go, services/dis-cache-operator/internal/redis/labels.go, services/dis-cache-operator/internal/redis/status.go
Deterministic Azure/Kubernetes naming (sanitization + stable hash), ASO resource builders for RedisEnterprise cluster/database, PrivateEndpoint, shared DNS/VNet link, and helpers to translate modules/persistence; builder tests included.
Controller Reconciliation & Status Management
services/dis-cache-operator/internal/controller/*
RedisReconciler implements fetch → identity resolve → CreateOrUpdate ASO resources → read ASO conditions → aggregate/update Redis status; watches ApplicationIdentity and ServiceAccount changes; network ensureers and resource sync helpers included.
Operator Entrypoint & Manager Setup
services/dis-cache-operator/cmd/main.go
Initializes scheme, parses CLI flags/environment, configures webhook and metrics TLS/auth, constructs controller manager with probes/leader election, registers RedisReconciler, and starts the manager.
Kubernetes Deployment Manifests & RBAC
services/dis-cache-operator/config/manager/*, services/dis-cache-operator/config/rbac/*, services/dis-cache-operator/config/default/*, services/dis-cache-operator/config/network-policy/*, services/dis-cache-operator/config/prometheus/*
Namespace and hardened Deployment, ServiceAccount, ClusterRole/Bindings for ASO and operator actions, leader-election RBAC, metrics auth/reader roles, HTTPS metrics Service (8443), Prometheus ServiceMonitor, NetworkPolicy restricting metrics access, and kustomize overlays/patches.
Kind Development Setup & Sample Resources
services/dis-cache-operator/config/kind/*, services/dis-cache-operator/config/samples/*
Kind overlay with manager patch for local testing, sample ApplicationIdentity and ServiceAccount (workload identity annotations), and sample Redis CRs for both ownership modes.
Build Automation: Makefile & Dockerfile
services/dis-cache-operator/Makefile, services/dis-cache-operator/Dockerfile
Makefile for workspace/tooling/codegen, fmt/vet/lint/test/e2e, Kind orchestration, build/publish (buildx), installer generation, tool installation, and CRD acquisition; Dockerfile is multi-stage producing a static distroless non-root manager binary.
CI/CD Workflows & Release Configuration
.github/workflows/dis-cache-lint-test.yml, .github/workflows/dis-cache-release.yml, .release-please-manifest.json, release-please-config.json, services/dis-cache-operator/go.mod
Lint/test workflow (golangci-lint, make test), release workflow (reusable build/scan, Flux OCI builds for latest and tagged releases), release-please package config and manifest entry (0.1.0), and go.mod declaring Go version and dependencies.
Documentation & Project Configuration
services/dis-cache-operator/README.md, services/dis-cache-operator/AGENTS.md, services/dis-cache-operator/PROJECT, services/dis-cache-operator/CHANGELOG.md, services/dis-cache-operator/hack/boilerplate.go.txt, services/dis-cache-operator/version.txt, .gitignore
README and contributor AGENTS guidance, Kubebuilder PROJECT, CHANGELOG header, license boilerplate, version set to 0.1.0, and .omc/ ignored in .gitignore.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • tjololo
  • bengtfredh
  • sduranc
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch arealmaas/redis-operator-valkey

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

🧹 Nitpick comments (3)
pr_description.md (1)

34-81: ⚡ Quick win

Add language identifier to fenced code block.

The ASCII diagram uses a fenced code block without a language specifier. While the content renders fine, adding a language identifier (e.g., text or ascii) improves tooling compatibility and silences the linter.

📝 Proposed fix
-```
+```text
                        ┌──────────────────────────────┐
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pr_description.md` around lines 34 - 81, The fenced ASCII diagram in
pr_description.md lacks a language identifier; update the opening
triple-backtick that precedes the diagram to include a language specifier (e.g.,
change ``` to ```text) so linters and tooling recognize it as plain text—look
for the fenced block containing the big ASCII diagram and modify its opening
fence accordingly.
services/dis-redis-operator/.omc/sessions/0f974381-8734-4756-95d3-bc013825cdcd.json (1)

1-8: 💤 Low value

Consider excluding AI session artifacts from version control.

This file appears to be a session artifact from an AI code generation tool (OMC/Claude Code). Development tool artifacts are typically excluded from source control via .gitignore to reduce repository noise and prevent accumulation of ephemeral session data.

If these artifacts serve a purpose (e.g., audit trail, reproducibility), please clarify; otherwise, consider adding .omc/ to .gitignore.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@services/dis-redis-operator/.omc/sessions/0f974381-8734-4756-95d3-bc013825cdcd.json`
around lines 1 - 8, This file is an ephemeral AI session artifact stored under
the .omc session output (e.g., the session file named
0f974381-8734-4756-95d3-bc013825cdcd.json); exclude these artifacts from version
control by adding the .omc/ directory to .gitignore, remove any
already-committed session files from the repo (unstage/remove them so they no
longer track), and commit the updated .gitignore; if you need to retain certain
session logs for auditing, move those specific files into a dedicated tracked
audit directory and document that policy so only intended files are committed.
services/dis-redis-operator/config/samples/kustomization.yaml (1)

1-4: ⚡ Quick win

Consider including the service-account-based Redis sample.

The review context indicates two sample Redis CRs exist (redis_v1alpha1_redis.yaml and redis_v1alpha1_service_account_redis.yaml), but only the first is listed in this kustomization. If both samples demonstrate different authentication patterns (ApplicationIdentity vs ServiceAccount) and have distinct names, including both would help users test each pattern.

If they would conflict or should be applied separately, this is fine as-is.

📝 Proposed addition if both samples should be included
 namespace: default
 resources:
 - redis_v1alpha1_redis.yaml
+- redis_v1alpha1_service_account_redis.yaml
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/dis-redis-operator/config/samples/kustomization.yaml` around lines 1
- 4, Update the kustomization resource list to include the service-account-based
sample by adding an entry for "redis_v1alpha1_service_account_redis.yaml"
alongside "redis_v1alpha1_redis.yaml" in the resources block of the
kustomization.yaml; ensure the CR filenames and metadata (names/namespaces) in
the manifest "redis_v1alpha1_service_account_redis.yaml" do not conflict with
the existing Redis CR (check metadata.name and metadata.namespace) so both can
be applied together, otherwise document/keep them separate and leave only the
intended sample referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/dis-redis-lint-test.yml:
- Around line 58-61: The CI step currently runs a mutating command ("go mod
tidy") before tests; replace that with non-mutating verification targets so CI
fails on drift instead of fixing files. In the "Running Tests" job step remove
"go mod tidy" and invoke the Make targets that validate dependencies and run
tests without modifying files (e.g., run "make verify-deps && make test-ci" or
your project's equivalent), ensuring the step keeps the existing "make test"
behavior replaced by the CI-safe "make test-ci" target.

In @.github/workflows/dis-redis-release.yml:
- Around line 48-49: The checkout step using
actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd should set
persist-credentials: false to avoid exposing the workflow token in
release/publish jobs; update the Checkout step where uses:
actions/checkout@de0fac2e... to include the persist-credentials: false input,
and make the same change for the other Checkout occurrence referenced (the one
also using actions/checkout in the same workflow).
- Around line 25-28: The workflow-wide permissions block currently grants
packages: write and id-token: write; change the global permissions block
(permissions) to minimal defaults (e.g., contents: read, packages: read or none,
id-token: none) and then grant packages: write and id-token: write only inside
the specific job(s) that perform publishing (e.g., the release/publish job) by
adding a permissions stanza under that job; update the permissions keys
(packages, id-token) accordingly so only publishing jobs get write access while
the global scope remains minimal.
- Around line 51-52: The second invocation of the action
./actions/flux/build-push-image in the build-release-flux-oci-release job is
missing required inputs (azure_subscription_id, azure_app_id, azure_tenant_id);
update the with block for that invocation to include the same three Azure
credential inputs used in the first invocation (azure_subscription_id,
azure_app_id, azure_tenant_id) so the action metadata requirements are satisfied
and the job can authenticate successfully.

In `@services/dis-redis-operator/api/v1alpha1/redis_types.go`:
- Around line 76-88: Add a kubebuilder XValidation annotation to
RedisPersistence to reject specs that set both AOF and RDB; specifically add a
comment annotation above the RedisPersistence type like //
+kubebuilder:validation:XValidation:rule="!(has(self.aof) && has(self.rdb))" and
a friendly message via // +kubebuilder:validation:XValidation:message="Only one
of 'aof' or 'rdb' may be set" so the API server will enforce the mutual
exclusivity for the AOF and RDB fields on the RedisPersistence struct.

In `@services/dis-redis-operator/config/manager/manager.yaml`:
- Around line 46-72: The pod spec currently sets
securityContext.readOnlyRootFilesystem: true but leaves volumeMounts and volumes
empty; add an emptyDir volume (name e.g. tmp-writable) to the pod spec's volumes
and add a matching volumeMount entry in the container's volumeMounts with
mountPath: /tmp and no readOnly flag so the container (and libraries used by
controller-runtime) have a writable /tmp; reference the existing keys
securityContext, volumeMounts, and volumes when making the change.

In `@services/dis-redis-operator/config/rbac/role.yaml`:
- Around line 69-80: The Role currently grants overly broad write permissions on
the source Redis CRs: remove the "create" and "delete" verbs from the verbs list
for the resource "redises" in the Role/ClusterRole defined in role.yaml so the
service account keeps read/watch and needed update/patch operations (e.g. keep
"get", "list", "watch", "patch", "update") but cannot create or delete
user-authored Redis CRs; update the verbs array under the "resources: - redises"
block accordingly.

In `@services/dis-redis-operator/Dockerfile`:
- Line 2: The Dockerfile's builder stage uses the mutable tag
"golang:1.26.3"—replace it with a digest-pinned image (e.g., the same
golang:1.26.3 image referenced by its sha256 digest) so builds are reproducible;
update the FROM line in the Dockerfile's builder stage (the "FROM golang:1.26.3
AS builder" statement) to use the digest form, obtaining the correct sha256 from
Docker Hub or by pulling the image locally, and follow the same digest-pinning
pattern used in the other services (lakmus, k6-image).

In `@services/dis-redis-operator/go.mod`:
- Around line 67-75: Update the vulnerable gRPC dependency in go.mod by changing
the google.golang.org/grpc version to at least v1.79.3 (replace the current
v1.78.0 reference) to remediate CVE-2026-33186, then run go get
google.golang.org/grpc@v1.79.3 and go mod tidy to ensure the lockfile and
indirect deps are updated; verify the service builds and tests pass (check for
any usages in your modules that import google.golang.org/grpc and update imports
if tooling reports mismatches).

In `@services/dis-redis-operator/internal/controller/redis_controller_network.go`:
- Around line 30-72: Both ensureSharedDNSZone and ensureSharedVNetLink are
create-only and skip reconciliation when the resource exists; change them to
upsert (create-or-update) so drift in spec/labels/annotations is reconciled. For
each function (ensureSharedDNSZone calling redispkg.BuildSharedPrivateDNSZone
and ensureSharedVNetLink calling redispkg.BuildSharedVNetLink) after r.Get(...)
finds an existing resource, compare and merge the desired fields (spec, labels,
annotations, relevant status-less fields) into the current object and call
r.Update (or use controllerutil.CreateOrUpdate) instead of returning nil; keep
the create path but handle apierrors.IsAlreadyExists gracefully and handle
update conflicts by retrying or returning the wrapped error. Ensure you update
the same resource types used (networkv1.PrivateDnsZone and
networkv1.PrivateDnsZonesVirtualNetworkLink) and preserve immutable fields (UID,
ResourceVersion, OwnerReferences) when copying values.

In `@services/dis-redis-operator/internal/controller/redis_controller_test.go`:
- Around line 9-14: Replace the tautological spec in the "smoke check:
reconciler scheme is registered" test with a real reconciler behavior assertion:
instantiate the controller's Reconcile logic (or the reconciler type used in
redis_controller_test.go), use a controller-runtime fake client to create a
Redis custom resource in the identity-pending state, call the reconciler's
Reconcile(ctx, request) and assert the returned Result requests a requeue (or
non-zero requeue duration) and that the object's status or conditions
transitioned as expected; update the test to refer to the reconciler type/method
used in this package (the reconciler struct and its Reconcile method) and the
Redis CR name/type to locate the code under test.

In `@services/dis-redis-operator/internal/controller/redis_controller.go`:
- Around line 165-167: The current reconcile early-return checks only
clusterReady and databaseReady (vars clusterReady, databaseReady) and stops
requeueing once they are true, which ignores private networking conditions;
update the readiness gate in the reconcile function so it also checks the
PrivateEndpoint and PrivateDNS condition objects (e.g., look up vars/conditions
named privateEndpointReady and privateDNSReady or their ConditionType values)
and include them in the OR expression that triggers ctrl.Result{RequeueAfter:
provisioningRequeueDelay}, nil so the controller continues periodic requeues
until all four conditions (clusterReady, databaseReady, PrivateEndpoint,
PrivateDNS) have Found==true and Status==metav1.ConditionTrue; ensure you
reference the same condition names used elsewhere in this file and preserve
existing requeue delay (provisioningRequeueDelay) behavior.

In `@services/dis-redis-operator/internal/redis/builders.go`:
- Around line 175-214: BuildPrivateEndpoint currently uses the
clusterKubernetesName parameter to populate PrivateLinkServiceReference.Name
without validation; ensure clusterKubernetesName is non-empty before composing
the pev1.PrivateLinkServiceConnection (e.g., validate at start of
BuildPrivateEndpoint that clusterKubernetesName != "" and return a descriptive
error if empty) so PrivateLinkServiceReference.Name is never left blank during
reconciliation.

In `@services/dis-redis-operator/internal/redis/identity.go`:
- Around line 62-77: In ActiveAuthReferenceName, add an explicit check that
rejects the case when both r.Spec.ServiceAccountRef and r.Spec.IdentityRef are
non-nil and return an error (e.g., "exactly one of identityRef or
serviceAccountRef must be set") before the existing switch so you don't silently
prefer ServiceAccountRef; keep the existing empty-name validation for
ServiceAccountRef.Name and IdentityRef.Name unchanged and mirror the same
invariant enforced by ResolveOwnerIdentity.

In `@services/dis-redis-operator/internal/redis/status.go`:
- Around line 50-88: AggregateReadyCondition currently returns Ready=True when
called with an empty conditions slice; add an explicit guard at the top of
AggregateReadyCondition to detect when len(conditions) == 0 and return a
non-ready condition (e.g., metav1.ConditionUnknown) by calling NewCondition with
redisv1alpha1.ConditionReady, the provided generation, metav1.ConditionUnknown
and a clear reason/message like "NoDependencies" / "no dependency conditions
present". Keep the subsequent hasFalse/hasUnknown logic unchanged so existing
cases still apply when conditions are provided.

In `@services/dis-redis-operator/Makefile`:
- Around line 221-227: The install/uninstall Makefile targets currently suppress
kustomize build failures via the "|| true" in the command assigned to variable
out, which hides real rendering errors; update the install and uninstall recipe
lines that set out (the ones referencing "$(KUSTOMIZE)" build config/crd) to
remove the "|| true" so the shell command fails on kustomize errors, and then
guard the subsequent apply/delete on whether $$out is empty (i.e. only skip when
build produces no output) so install/uninstall for targets install and uninstall
(and their use of KUSTOMIZE/KUBECTL) fail fast on build errors instead of
silently succeeding.

---

Nitpick comments:
In `@pr_description.md`:
- Around line 34-81: The fenced ASCII diagram in pr_description.md lacks a
language identifier; update the opening triple-backtick that precedes the
diagram to include a language specifier (e.g., change ``` to ```text) so linters
and tooling recognize it as plain text—look for the fenced block containing the
big ASCII diagram and modify its opening fence accordingly.

In
`@services/dis-redis-operator/.omc/sessions/0f974381-8734-4756-95d3-bc013825cdcd.json`:
- Around line 1-8: This file is an ephemeral AI session artifact stored under
the .omc session output (e.g., the session file named
0f974381-8734-4756-95d3-bc013825cdcd.json); exclude these artifacts from version
control by adding the .omc/ directory to .gitignore, remove any
already-committed session files from the repo (unstage/remove them so they no
longer track), and commit the updated .gitignore; if you need to retain certain
session logs for auditing, move those specific files into a dedicated tracked
audit directory and document that policy so only intended files are committed.

In `@services/dis-redis-operator/config/samples/kustomization.yaml`:
- Around line 1-4: Update the kustomization resource list to include the
service-account-based sample by adding an entry for
"redis_v1alpha1_service_account_redis.yaml" alongside
"redis_v1alpha1_redis.yaml" in the resources block of the kustomization.yaml;
ensure the CR filenames and metadata (names/namespaces) in the manifest
"redis_v1alpha1_service_account_redis.yaml" do not conflict with the existing
Redis CR (check metadata.name and metadata.namespace) so both can be applied
together, otherwise document/keep them separate and leave only the intended
sample referenced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c402d165-943c-4a8b-bceb-3e2b36b866b7

📥 Commits

Reviewing files that changed from the base of the PR and between d8a1765 and aeda999.

⛔ Files ignored due to path filters (2)
  • rfcs/0014-self-service-managed-redis.md is excluded by !rfcs/**
  • services/dis-redis-operator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (64)
  • .github/workflows/dis-redis-lint-test.yml
  • .github/workflows/dis-redis-release.yml
  • .release-please-manifest.json
  • pr_description.md
  • release-please-config.json
  • services/dis-redis-operator/.omc/sessions/0f974381-8734-4756-95d3-bc013825cdcd.json
  • services/dis-redis-operator/AGENTS.md
  • services/dis-redis-operator/CHANGELOG.md
  • services/dis-redis-operator/Dockerfile
  • services/dis-redis-operator/Makefile
  • services/dis-redis-operator/PROJECT
  • services/dis-redis-operator/README.md
  • services/dis-redis-operator/api/v1alpha1/groupversion_info.go
  • services/dis-redis-operator/api/v1alpha1/redis_types.go
  • services/dis-redis-operator/api/v1alpha1/zz_generated.deepcopy.go
  • services/dis-redis-operator/cmd/main.go
  • services/dis-redis-operator/config/crd/bases/redis.dis.altinn.cloud_redises.yaml
  • services/dis-redis-operator/config/crd/kustomization.yaml
  • services/dis-redis-operator/config/default/deploy_vars_patch.yaml
  • services/dis-redis-operator/config/default/kustomization.yaml
  • services/dis-redis-operator/config/default/manager_metrics_patch.yaml
  • services/dis-redis-operator/config/default/metrics_service.yaml
  • services/dis-redis-operator/config/kind/applicationidentities.yaml
  • services/dis-redis-operator/config/kind/kustomization.yaml
  • services/dis-redis-operator/config/kind/manager_kind_patch.yaml
  • services/dis-redis-operator/config/kind/serviceaccounts.yaml
  • services/dis-redis-operator/config/manager/kustomization.yaml
  • services/dis-redis-operator/config/manager/manager.yaml
  • services/dis-redis-operator/config/network-policy/allow-metrics-traffic.yaml
  • services/dis-redis-operator/config/network-policy/kustomization.yaml
  • services/dis-redis-operator/config/prometheus/kustomization.yaml
  • services/dis-redis-operator/config/prometheus/monitor.yaml
  • services/dis-redis-operator/config/rbac/kustomization.yaml
  • services/dis-redis-operator/config/rbac/leader_election_role.yaml
  • services/dis-redis-operator/config/rbac/leader_election_role_binding.yaml
  • services/dis-redis-operator/config/rbac/metrics_auth_role.yaml
  • services/dis-redis-operator/config/rbac/metrics_auth_role_binding.yaml
  • services/dis-redis-operator/config/rbac/metrics_reader_role.yaml
  • services/dis-redis-operator/config/rbac/role.yaml
  • services/dis-redis-operator/config/rbac/role_binding.yaml
  • services/dis-redis-operator/config/rbac/service_account.yaml
  • services/dis-redis-operator/config/samples/kustomization.yaml
  • services/dis-redis-operator/config/samples/redis_v1alpha1_redis.yaml
  • services/dis-redis-operator/config/samples/redis_v1alpha1_service_account_redis.yaml
  • services/dis-redis-operator/go.mod
  • services/dis-redis-operator/hack/boilerplate.go.txt
  • services/dis-redis-operator/internal/config/config.go
  • services/dis-redis-operator/internal/config/config_test.go
  • services/dis-redis-operator/internal/controller/redis_auth_watch.go
  • services/dis-redis-operator/internal/controller/redis_controller.go
  • services/dis-redis-operator/internal/controller/redis_controller_network.go
  • services/dis-redis-operator/internal/controller/redis_controller_role.go
  • services/dis-redis-operator/internal/controller/redis_controller_status.go
  • services/dis-redis-operator/internal/controller/redis_controller_test.go
  • services/dis-redis-operator/internal/controller/suite_test.go
  • services/dis-redis-operator/internal/redis/builders.go
  • services/dis-redis-operator/internal/redis/builders_test.go
  • services/dis-redis-operator/internal/redis/identity.go
  • services/dis-redis-operator/internal/redis/identity_test.go
  • services/dis-redis-operator/internal/redis/labels.go
  • services/dis-redis-operator/internal/redis/naming.go
  • services/dis-redis-operator/internal/redis/naming_test.go
  • services/dis-redis-operator/internal/redis/status.go
  • services/dis-redis-operator/version.txt

Comment thread .github/workflows/dis-cache-lint-test.yml
Comment thread .github/workflows/dis-redis-release.yml Outdated
Comment thread .github/workflows/dis-cache-release.yml
Comment thread .github/workflows/dis-cache-release.yml
Comment thread services/dis-cache-operator/api/v1alpha1/redis_types.go
Comment thread services/dis-cache-operator/internal/controller/redis_controller.go
Comment thread services/dis-cache-operator/internal/redis/builders.go
Comment thread services/dis-cache-operator/internal/redis/identity.go
Comment thread services/dis-cache-operator/internal/redis/status.go
Comment thread services/dis-redis-operator/Makefile Outdated
arealmaas and others added 2 commits May 19, 2026 19:01
- requeue gate now also waits on private endpoint and shared DNS readiness
- getSharedDNSReady checks both PrivateDnsZone and VNet link before reporting True
- skip stale ASO condition reporting while identity is pending; emit IdentityNotReady instead
- enforce mutual exclusivity of aof/rdb on RedisPersistence via CRD XValidation
- validate clusterKubernetesName in BuildPrivateEndpoint, mutual identity/SA refs in ActiveAuthReferenceName, and empty input in AggregateReadyCondition
- Makefile: stop swallowing kustomize errors in install/uninstall, add test-e2e with guaranteed cleanup, align GOLANGCI_LINT_VERSION with CI (v2.11.4)
- workflows: persist-credentials: false on checkouts, narrow global permissions with per-job write grants, replace mutating go mod tidy with make verify-deps
- redis_types.go: fix Enum marker syntax and restore resource path plural to unblock make manifests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…operator

Mechanical rename of the operator prefix from dis-redis to dis-cache:
- services/dis-redis-operator/ moved to services/dis-cache-operator/
- workflow files dis-redis-*.yml renamed to dis-cache-*.yml
- Go module path, image name, release tag prefix, Flux artifact name,
  leader-election ID, and all kustomize labels/selectors updated
- CRD Kind (Redis), API group (redis.dis.altinn.cloud), and internal/redis
  package are unchanged — only the operator name carries the new prefix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
services/dis-cache-operator/config/rbac/role.yaml (1)

74-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop create/delete on source Redis CRs.

Line 74 and Line 75 still allow creating/deleting user-authored redises, which is over-privileged for this controller pattern.

Suggested RBAC narrowing
 - apiGroups:
   - redis.dis.altinn.cloud
   resources:
   - redises
   verbs:
-  - create
-  - delete
   - get
   - list
   - patch
   - update
   - watch
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/dis-cache-operator/config/rbac/role.yaml` around lines 74 - 75, The
Role currently grants overly broad permissions by including the "create" and
"delete" verbs for the Redis custom resources; remove the "create" and "delete"
entries from the verbs list that reference the Redis CR (resource name
"redises") in role.yaml so the controller cannot create or delete user-authored
Redis CRs—keep only the necessary read/modify verbs (e.g., get, list, watch,
update, patch) required by the controller's reconcile operations.
services/dis-cache-operator/go.mod (1)

93-93: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Upgrade google.golang.org/grpc from v1.78.0 before release.

Line 93 pins a version with a reported critical auth-bypass advisory. Please bump to a patched release and re-resolve the module graph.

#!/bin/bash
set -euo pipefail

cd services/dis-cache-operator

echo "Current grpc version:"
GOWORK=off go list -m -f '{{.Path}} {{.Version}}' google.golang.org/grpc

echo "Available updates:"
GOWORK=off go list -m -u -json google.golang.org/grpc | sed -n '1,160p'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/dis-cache-operator/go.mod` at line 93, The go.mod pins
google.golang.org/grpc at v1.78.0 which has a critical advisory; update the
module to a patched release (bump the version for google.golang.org/grpc in
go.mod) and re-resolve the module graph by running Go module update commands
(e.g., use go get to fetch the newer grpc version and then run go mod tidy) so
the dependency tree is refreshed; target the latest patched release reported by
go list -m -u for google.golang.org/grpc and verify with GOWORK=off go list -m
-f '{{.Path}} {{.Version}}' google.golang.org/grpc that the version has been
updated.
🧹 Nitpick comments (1)
.github/workflows/dis-cache-lint-test.yml (1)

58-61: ⚡ Quick win

Prefer make test-ci in this workflow.

Line 61 currently runs make test; switching to make test-ci keeps this workflow on the CI-oriented path (DISREDIS_SKIP_ENVTEST=1) and avoids unnecessary envtest dependency in CI.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/dis-cache-lint-test.yml around lines 58 - 61, Replace the
CI test invocation to use the CI-optimized target: change the workflow step that
runs the Make target from "make test" to "make test-ci" so the job follows the
CI path (DISREDIS_SKIP_ENVTEST=1) and avoids pulling in envtest; locate the step
containing the two commands "make verify-deps" and "make test" and update the
second command to "make test-ci".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/dis-cache-lint-test.yml:
- Line 29: The checkout steps use
actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd (the two occurrences
of the checkout step) without disabling credential persistence; update both
checkout steps to include persist-credentials: false so the job does not
automatically expose the GITHUB_TOKEN to subsequent steps. Locate the two
occurrences of the actions/checkout usage in the workflow and add the
persist-credentials: false input to each step, keeping the existing version pin
intact.

In `@services/dis-cache-operator/Makefile`:
- Around line 199-202: The Makefile currently prefixes the build step with a
leading hyphen which causes make to ignore failures; edit the recipe so the line
containing "$(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag
${IMG} -f Dockerfile.cross ." is not prefixed with '-' (and likewise remove any
'-' before the matching "$(CONTAINER_TOOL) buildx rm dis-cache-operator-builder"
cleanup if present) so that buildx publish failures propagate and the target
fails instead of reporting success.

---

Duplicate comments:
In `@services/dis-cache-operator/config/rbac/role.yaml`:
- Around line 74-75: The Role currently grants overly broad permissions by
including the "create" and "delete" verbs for the Redis custom resources; remove
the "create" and "delete" entries from the verbs list that reference the Redis
CR (resource name "redises") in role.yaml so the controller cannot create or
delete user-authored Redis CRs—keep only the necessary read/modify verbs (e.g.,
get, list, watch, update, patch) required by the controller's reconcile
operations.

In `@services/dis-cache-operator/go.mod`:
- Line 93: The go.mod pins google.golang.org/grpc at v1.78.0 which has a
critical advisory; update the module to a patched release (bump the version for
google.golang.org/grpc in go.mod) and re-resolve the module graph by running Go
module update commands (e.g., use go get to fetch the newer grpc version and
then run go mod tidy) so the dependency tree is refreshed; target the latest
patched release reported by go list -m -u for google.golang.org/grpc and verify
with GOWORK=off go list -m -f '{{.Path}} {{.Version}}' google.golang.org/grpc
that the version has been updated.

---

Nitpick comments:
In @.github/workflows/dis-cache-lint-test.yml:
- Around line 58-61: Replace the CI test invocation to use the CI-optimized
target: change the workflow step that runs the Make target from "make test" to
"make test-ci" so the job follows the CI path (DISREDIS_SKIP_ENVTEST=1) and
avoids pulling in envtest; locate the step containing the two commands "make
verify-deps" and "make test" and update the second command to "make test-ci".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d008b9cb-d32f-4b98-84e3-fce749798707

📥 Commits

Reviewing files that changed from the base of the PR and between aeda999 and 1e78749.

⛔ Files ignored due to path filters (2)
  • rfcs/0014-self-service-managed-redis.md is excluded by !rfcs/**
  • services/dis-cache-operator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (64)
  • .github/workflows/dis-cache-lint-test.yml
  • .github/workflows/dis-cache-release.yml
  • .release-please-manifest.json
  • pr_description.md
  • release-please-config.json
  • services/dis-cache-operator/.omc/sessions/0f974381-8734-4756-95d3-bc013825cdcd.json
  • services/dis-cache-operator/AGENTS.md
  • services/dis-cache-operator/CHANGELOG.md
  • services/dis-cache-operator/Dockerfile
  • services/dis-cache-operator/Makefile
  • services/dis-cache-operator/PROJECT
  • services/dis-cache-operator/README.md
  • services/dis-cache-operator/api/v1alpha1/groupversion_info.go
  • services/dis-cache-operator/api/v1alpha1/redis_types.go
  • services/dis-cache-operator/api/v1alpha1/zz_generated.deepcopy.go
  • services/dis-cache-operator/cmd/main.go
  • services/dis-cache-operator/config/crd/bases/redis.dis.altinn.cloud_redises.yaml
  • services/dis-cache-operator/config/crd/kustomization.yaml
  • services/dis-cache-operator/config/default/deploy_vars_patch.yaml
  • services/dis-cache-operator/config/default/kustomization.yaml
  • services/dis-cache-operator/config/default/manager_metrics_patch.yaml
  • services/dis-cache-operator/config/default/metrics_service.yaml
  • services/dis-cache-operator/config/kind/applicationidentities.yaml
  • services/dis-cache-operator/config/kind/kustomization.yaml
  • services/dis-cache-operator/config/kind/manager_kind_patch.yaml
  • services/dis-cache-operator/config/kind/serviceaccounts.yaml
  • services/dis-cache-operator/config/manager/kustomization.yaml
  • services/dis-cache-operator/config/manager/manager.yaml
  • services/dis-cache-operator/config/network-policy/allow-metrics-traffic.yaml
  • services/dis-cache-operator/config/network-policy/kustomization.yaml
  • services/dis-cache-operator/config/prometheus/kustomization.yaml
  • services/dis-cache-operator/config/prometheus/monitor.yaml
  • services/dis-cache-operator/config/rbac/kustomization.yaml
  • services/dis-cache-operator/config/rbac/leader_election_role.yaml
  • services/dis-cache-operator/config/rbac/leader_election_role_binding.yaml
  • services/dis-cache-operator/config/rbac/metrics_auth_role.yaml
  • services/dis-cache-operator/config/rbac/metrics_auth_role_binding.yaml
  • services/dis-cache-operator/config/rbac/metrics_reader_role.yaml
  • services/dis-cache-operator/config/rbac/role.yaml
  • services/dis-cache-operator/config/rbac/role_binding.yaml
  • services/dis-cache-operator/config/rbac/service_account.yaml
  • services/dis-cache-operator/config/samples/kustomization.yaml
  • services/dis-cache-operator/config/samples/redis_v1alpha1_redis.yaml
  • services/dis-cache-operator/config/samples/redis_v1alpha1_service_account_redis.yaml
  • services/dis-cache-operator/go.mod
  • services/dis-cache-operator/hack/boilerplate.go.txt
  • services/dis-cache-operator/internal/config/config.go
  • services/dis-cache-operator/internal/config/config_test.go
  • services/dis-cache-operator/internal/controller/redis_auth_watch.go
  • services/dis-cache-operator/internal/controller/redis_controller.go
  • services/dis-cache-operator/internal/controller/redis_controller_network.go
  • services/dis-cache-operator/internal/controller/redis_controller_role.go
  • services/dis-cache-operator/internal/controller/redis_controller_status.go
  • services/dis-cache-operator/internal/controller/redis_controller_test.go
  • services/dis-cache-operator/internal/controller/suite_test.go
  • services/dis-cache-operator/internal/redis/builders.go
  • services/dis-cache-operator/internal/redis/builders_test.go
  • services/dis-cache-operator/internal/redis/identity.go
  • services/dis-cache-operator/internal/redis/identity_test.go
  • services/dis-cache-operator/internal/redis/labels.go
  • services/dis-cache-operator/internal/redis/naming.go
  • services/dis-cache-operator/internal/redis/naming_test.go
  • services/dis-cache-operator/internal/redis/status.go
  • services/dis-cache-operator/version.txt
💤 Files with no reviewable changes (26)
  • services/dis-cache-operator/version.txt
  • services/dis-cache-operator/config/crd/kustomization.yaml
  • services/dis-cache-operator/config/rbac/kustomization.yaml
  • services/dis-cache-operator/config/manager/kustomization.yaml
  • services/dis-cache-operator/internal/controller/suite_test.go
  • services/dis-cache-operator/hack/boilerplate.go.txt
  • services/dis-cache-operator/config/default/manager_metrics_patch.yaml
  • services/dis-cache-operator/config/rbac/metrics_auth_role_binding.yaml
  • services/dis-cache-operator/.omc/sessions/0f974381-8734-4756-95d3-bc013825cdcd.json
  • services/dis-cache-operator/config/samples/kustomization.yaml
  • services/dis-cache-operator/Dockerfile
  • services/dis-cache-operator/config/default/deploy_vars_patch.yaml
  • services/dis-cache-operator/CHANGELOG.md
  • services/dis-cache-operator/config/kind/applicationidentities.yaml
  • services/dis-cache-operator/config/network-policy/kustomization.yaml
  • services/dis-cache-operator/internal/controller/redis_controller_test.go
  • services/dis-cache-operator/config/rbac/metrics_auth_role.yaml
  • services/dis-cache-operator/internal/redis/naming_test.go
  • services/dis-cache-operator/api/v1alpha1/groupversion_info.go
  • services/dis-cache-operator/config/prometheus/kustomization.yaml
  • services/dis-cache-operator/config/rbac/metrics_reader_role.yaml
  • services/dis-cache-operator/internal/redis/naming.go
  • services/dis-cache-operator/internal/config/config_test.go
  • services/dis-cache-operator/internal/config/config.go
  • services/dis-cache-operator/config/kind/serviceaccounts.yaml
  • services/dis-cache-operator/internal/controller/redis_controller_role.go
✅ Files skipped from review due to trivial changes (9)
  • services/dis-cache-operator/config/rbac/leader_election_role_binding.yaml
  • services/dis-cache-operator/README.md
  • services/dis-cache-operator/config/rbac/role_binding.yaml
  • services/dis-cache-operator/config/kind/kustomization.yaml
  • services/dis-cache-operator/PROJECT
  • services/dis-cache-operator/AGENTS.md
  • services/dis-cache-operator/config/default/kustomization.yaml
  • services/dis-cache-operator/api/v1alpha1/zz_generated.deepcopy.go
  • pr_description.md

Comment thread .github/workflows/dis-cache-lint-test.yml
Comment thread services/dis-cache-operator/Makefile
Comment thread pr_description.md Outdated
@arealmaas arealmaas changed the title feat(dis-redis-operator): scaffold self-service Azure Managed Redis feat(dis-cache-operator): scaffold self-service Azure Managed Redis May 20, 2026
arealmaas and others added 6 commits May 20, 2026 10:18
…w fixes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- workflow: set persist-credentials: false on both checkout steps in
  dis-cache-lint-test.yml so GITHUB_TOKEN is not implicitly exposed to
  subsequent steps
- Makefile: drop the leading '-' from the docker-buildx push command so
  publish failures propagate (the '-' on the create/rm cleanup steps is
  retained intentionally — those legitimately tolerate failures)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- verify-deps cleanup now chmod -R u+w the temp module cache before rm,
  since go mod download writes files read-only and the trap-rm previously
  failed with Permission denied in CI
- add empty .trivyignore so the image-scan workflow can find the expected
  ignorefile (mirrors dis-vault-operator)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Dockerfile: pin golang:1.26.3 to its OCI image-index digest (sha256:6df14f4a...)
  so the builder image is reproducible, matching the lakmus pattern
- go.mod: bump google.golang.org/grpc v1.78.0 -> v1.79.3 to remediate
  CVE-2026-33186 (indirect, pulled by ASO/controller-runtime deps); cel.dev/expr
  carried up to v0.25.1 via go mod tidy

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arealmaas

Copy link
Copy Markdown
Contributor Author

Code review (functionality focus)

Reviewed against dis-vault-operator and dis-pgsql-operator patterns. Found 5 issues, in rough order of severity:

1. CRITICAL — PrivateEndpointsPrivateDnsZoneGroup is never created, so DNS resolution to the cluster will fail.

The RFC explicitly calls out creating a PrivateDnsZoneGroup per CR to bind the PE A-record into the shared privatelink.redis.azure.net zone. The K8s name helper PrivateDNSZoneGroupKubernetesName exists, but no controller path ever builds or upserts the resource. The PE is created and the shared zone + AKS VNet link are created, but without the zone group there is no A-record in the zone, so workloads in AKS resolving <cluster>.<region>.redis.azure.net will hit a CNAME to privatelink.redis.azure.net with no matching record — i.e. unreachable cluster. Note: PrivateEndpointsPrivateDnsZoneGroup already exists in ASO v2.17.0 under network/v1api20220701 (the package already imported as pev1), so no ASO bump is needed.

desiredPE, err := redispkg.BuildPrivateEndpoint(&redisObj, r.Config, desiredCluster.Name)
if err != nil {
return ctrl.Result{}, err
}
if err := r.upsertPrivateEndpoint(ctx, &redisObj, desiredPE); err != nil {
return ctrl.Result{}, err
}
if err := r.ensureSharedPrivateDNS(ctx, &redisObj); err != nil {
return ctrl.Result{}, err
}
}

}
// PrivateDNSZoneGroupKubernetesName returns the Kubernetes name used for the PrivateDnsZoneGroup CR.
func PrivateDNSZoneGroupKubernetesName(redisName string) string {
return DeterministicKubernetesName(redisName, privateDNSZoneGroupSuffix)
}

2. MAJOR — spec.version is accepted and validated by the CRD but never propagated to ASO.

RedisSpec.Version has a kubebuilder regex validator and appears in the CRD, but BuildASORedisEnterprise never reads it (the only Version reference there is MinimumTlsVersion). ASO's RedisEnterprise_Spec in v1api20250401 (v2.17.0) does not expose a Redis version field at all, so the CR field cannot currently be honored. Users will set spec.version: "7.4" and silently get whatever Azure picks. Either drop the field from v1alpha1 until ASO surfaces it, or document it as a no-op.

// Version is the Redis version (e.g. "7", "7.4"). Optional; defaults to the ASO default.
// +optional
// +kubebuilder:validation:Pattern="^[0-9]+(\\.[0-9]+)?$"
Version string `json:"version,omitempty"`

location := cfg.Location
skuName := cachev1.Sku_Name(r.Spec.SKU)
if r.Spec.SKU == "" {
skuName = cachev1.Sku_Name_Balanced_B0
}
ha := cachev1.ClusterProperties_HighAvailability_Enabled
if r.Spec.HighAvailability != nil && !*r.Spec.HighAvailability {
ha = cachev1.ClusterProperties_HighAvailability_Disabled
}
tls := cachev1.ClusterProperties_MinimumTlsVersion("1.2")
zones := []string{}
if r.Spec.HighAvailability == nil || *r.Spec.HighAvailability {
zones = []string{"1", "2", "3"}
}
tags := maps.Clone(r.Spec.Tags)
if len(tags) == 0 {
tags = nil
}
cluster := &cachev1.RedisEnterprise{
ObjectMeta: metav1.ObjectMeta{
Name: ClusterKubernetesName(r.Name),
Namespace: r.Namespace,
Labels: map[string]string{
ManagedResourceOwnerLabel: r.Name,
},
},
Spec: cachev1.RedisEnterprise_Spec{
AzureName: azureName,
Location: &location,
HighAvailability: &ha,
MinimumTlsVersion: &tls,
Owner: &genruntime.KnownResourceReference{
ARMID: fmt.Sprintf("/subscriptions/%s/resourceGroups/%s", cfg.SubscriptionID, cfg.ResourceGroup),
},
Sku: &cachev1.Sku{
Name: &skuName,
},
Tags: tags,
Zones: zones,
},
}
return cluster, nil
}

3. MAJOR — Access policy assignment deferred → deployed cluster is unauthenticatable, and Ready will never become True.

redis_controller_role.go is intentionally a placeholder. With AccessKeysAuthentication=Disabled (correct default) and no RedisEnterpriseDatabaseAccessPolicyAssignment, no Entra principal has data-plane access — the cluster cannot be used. Also, because AccessPolicyReady is hard-wired to Unknown and AggregateReadyCondition treats Unknown as DependenciesPending, the top-level Ready condition is permanently Unknown. Two follow-ups: (a) the in-code comment says ASO v2.18.0 will expose RedisEnterpriseDatabaseAccessPolicyAssignment in v1api20250401, but v2.18.0 only ships the legacy RedisAccessPolicyAssignment (for Microsoft.Cache/redis) — the enterprise variant is still upstream-pending. Worth noting in the RFC / PR description so consumers know v1alpha1 won't deliver a working Redis yet. (b) Consider gating cluster creation behind a feature flag until the access policy is implementable, so the operator can't ship a half-baked stack to a real cluster.

package controller
// Access policy assignment reconciliation lives here in a future PR. The Azure resource type
// `RedisEnterpriseDatabaseAccessPolicyAssignment` is not yet exposed by Azure Service Operator at
// the ASO version pinned for this slice (v2.17.0). When ASO publishes the type in v1api20250401
// (tracked upstream), wire reconciliation here mirroring the dis-vault-operator role assignment
// pattern: deterministic GUID for AzureName, owner-ref to the database, principalId from the
// resolved identity, and label-managed lifecycle for delete-and-recreate replacement.

))
// AccessPolicyAssignment is deferred to a follow-up PR (ASO type pending in upstream).
accessCond := applyCondition(redispkg.NewCondition(
redisv1alpha1.ConditionAccessPolicyReady,
redisObj.Generation,
metav1.ConditionUnknown,
"Pending",
"access policy assignment is not yet implemented in this slice",
))
applyCondition(redispkg.AggregateReadyCondition(
redisObj.Generation,
identityCond,
clusterCond,
databaseCond,
peCond,
dnsCond,
accessCond,
))

4. MAJOR — "Shared" DNS zone is actually per-namespace, but every K8s CR points at the same Azure ARM resource.

BuildSharedPrivateDNSZone(namespace, …) creates a PrivateDnsZone K8s resource in the Redis CR's namespace, while its ARM owner is /subscriptions/<sub>/resourceGroups/<DNSZoneResourceGroup> and its AzureName is privatelink.redis.azure.net. So team-a/ns-a and team-b/ns-b each get their own ASO PrivateDnsZone CR, but both manage the same Azure resource. The dis-vault-operator avoids this (no shared Azure resources) and dis-pgsql-operator avoids it by having unique per-DB zone names. Risks here are (a) ASO contention on the same Azure resource from multiple K8s CRs, and (b) if a deletion path is ever added, the first cleanup wipes the Azure resource still in use by other teams. If the intent is one Azure zone for the whole cluster, the K8s CR should live in a single fixed namespace (e.g. operator's own namespace) rather than being templated per-CR namespace.

// BuildSharedPrivateDNSZone returns the desired shared privatelink.redis.azure.net zone.
func BuildSharedPrivateDNSZone(namespace string, cfg config.OperatorConfig) *networkv1.PrivateDnsZone {
loc := "global"
return &networkv1.PrivateDnsZone{
ObjectMeta: metav1.ObjectMeta{
Name: RedisPrivateLinkZoneName,
Namespace: namespace,
Labels: map[string]string{
ManagedByLabel: ManagedByValue,
},
},
Spec: networkv1.PrivateDnsZone_Spec{
AzureName: RedisPrivateLinkZoneName,
Location: &loc,
Owner: &genruntime.KnownResourceReference{
ARMID: fmt.Sprintf("/subscriptions/%s/resourceGroups/%s", cfg.SubscriptionID, cfg.DNSZoneResourceGroup),
},
},
}
}
// BuildSharedVNetLink returns the desired AKS VNet link for the shared zone.
func BuildSharedVNetLink(namespace string, cfg config.OperatorConfig) *networkv1.PrivateDnsZonesVirtualNetworkLink {
loc := "global"
regFalse := false
linkName := SharedVNetLinkName(cfg.Environment)
return &networkv1.PrivateDnsZonesVirtualNetworkLink{
ObjectMeta: metav1.ObjectMeta{
Name: linkName,
Namespace: namespace,
Labels: map[string]string{
ManagedByLabel: ManagedByValue,
},
},
Spec: networkv1.PrivateDnsZonesVirtualNetworkLink_Spec{
AzureName: linkName,
Location: &loc,
RegistrationEnabled: &regFalse,
Owner: &genruntime.KnownResourceReference{
Name: RedisPrivateLinkZoneName,
},
VirtualNetwork: &networkv1.SubResource{
Reference: &genruntime.ResourceReference{
ARMID: cfg.AKSVNetID,
},
},
},
}
}

func (r *RedisReconciler) ensureSharedPrivateDNS(ctx context.Context, redisObj *redisv1alpha1.Redis) error {
logger := log.FromContext(ctx).WithValues("redis", types.NamespacedName{Namespace: redisObj.Namespace, Name: redisObj.Name})
if err := r.ensureSharedDNSZone(ctx, redisObj.Namespace, logger); err != nil {
return fmt.Errorf("ensure shared DNS zone: %w", err)
}
if err := r.ensureSharedVNetLink(ctx, redisObj.Namespace, logger); err != nil {
return fmt.Errorf("ensure shared VNet link: %w", err)
}
return nil
}

5. NIT — Dead code in builders.go.

PrivateDNSZoneGroupKubernetesName (declared, never called) and privateEndpointConnectionID = "redis-enterprise" (declared, never used; the live groupID is correctly "redisEnterprise" inline). The former dies once issue 1 is fixed; the latter should just be removed.

clusterKubernetesSuffix = "cluster"
databaseKubernetesSuffix = "db"
privateEndpointSuffix = "pe"
privateDNSZoneGroupSuffix = "pdzg"
dnsZoneVNetLinkBaseName = "aks-link"
privateEndpointConnectionID = "redis-enterprise"
)

// PrivateDNSZoneGroupKubernetesName returns the Kubernetes name used for the PrivateDnsZoneGroup CR.
func PrivateDNSZoneGroupKubernetesName(redisName string) string {
return DeterministicKubernetesName(redisName, privateDNSZoneGroupSuffix)
}


Comparison summary: identity resolution, status condition shape, deterministic naming, and the controller skeleton mirror dis-vault-operator cleanly. The PE wiring follows dis-pgsql-operator. The main divergence and risk area is the shared-DNS strategy described in the RFC — without zone-group binding plus a single-source-of-truth K8s namespace, the v1 networking story doesn't reach a usable cluster.

🤖 Generated with Claude Code

arealmaas and others added 2 commits May 20, 2026 15:34
Trivy image scan flagged two HIGH CVEs in the manager binary:
- CVE-2026-29181 in go.opentelemetry.io/otel v1.40.0 (fixed in 1.41.0)
- CVE-2026-39883 in go.opentelemetry.io/otel/sdk v1.40.0 (fixed in 1.43.0)

Align with dis-pgsql-operator and dis-vault-operator, which already
pin the OTel stack at v1.43.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NS pattern

The PE was being created but no PrivateEndpointsPrivateDnsZoneGroup bound
it into the shared privatelink.redis.azure.net zone, so DNS resolution
from AKS to the cluster's private IP would never work. This adds the
missing per-CR DnsZoneGroup, owner-referenced to the Redis CR (cascades
on delete), pointing at the shared zone via ARM ID.

Also aligns the shared-DNS reconciliation with dis-pgsql-operator:
- Use SyncSpecAndLabels for in-place drift detection instead of
  create-only get-or-create.
- Tag shared resources with managed-by=dis-cache-operator.
- Wire the new zone group into getSharedDNSReady so PrivateDNSReady
  only flips true once the full DNS path (zone + link + zone group)
  is ready.
- Replace inline groupID literal with a named constant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/dis-cache-operator/internal/redis/builders.go (1)

278-284: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use a fixed operator namespace for shared DNS ASO objects.

These builders model shared Azure resources, but they place the K8s ASO objects in the caller namespace. Across multiple Redis namespaces, this can create multiple ASO objects targeting the same ARM resource, risking reconciliation contention and ambiguous lifecycle semantics.

Also applies to: 302-310

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/dis-cache-operator/internal/redis/builders.go` around lines 278 -
284, Builders like BuildSharedPrivateDNSZone currently set the Kubernetes object
Namespace to the caller-provided namespace, causing duplicate ASO objects;
change these builders to use a fixed operator namespace from the OperatorConfig
(e.g., cfg.OperatorNamespace) instead of the function parameter namespace so
shared Azure resources are represented by a single ASO object; update
BuildSharedPrivateDNSZone and the other shared-resource builders referenced
around lines 302-310 to set ObjectMeta.Namespace = cfg.OperatorNamespace and
adjust any callers if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/dis-cache-operator/internal/k8s/resource_sync.go`:
- Around line 12-23: SyncSpecAndLabels dereferences existingSpec unconditionally
which can panic if callers pass nil; add a nil guard at the start of
SyncSpecAndLabels: if existingSpec == nil { updated = true /* mark spec as
changed since no existing value */ } else { keep the existing
equality.Semantic.DeepEqual(*existingSpec, desiredSpec) branch and assign
*existingSpec = desiredSpec when different }. Ensure the rest of the function
(labels handling and returned map,bool) still executes so callers get the
expected label merge and update flag even when existingSpec is nil.

---

Outside diff comments:
In `@services/dis-cache-operator/internal/redis/builders.go`:
- Around line 278-284: Builders like BuildSharedPrivateDNSZone currently set the
Kubernetes object Namespace to the caller-provided namespace, causing duplicate
ASO objects; change these builders to use a fixed operator namespace from the
OperatorConfig (e.g., cfg.OperatorNamespace) instead of the function parameter
namespace so shared Azure resources are represented by a single ASO object;
update BuildSharedPrivateDNSZone and the other shared-resource builders
referenced around lines 302-310 to set ObjectMeta.Namespace =
cfg.OperatorNamespace and adjust any callers if necessary.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 28430fea-a542-4fe6-86c2-a9708e9cf99a

📥 Commits

Reviewing files that changed from the base of the PR and between 9f6e289 and bce97a9.

📒 Files selected for processing (7)
  • services/dis-cache-operator/config/rbac/role.yaml
  • services/dis-cache-operator/internal/controller/redis_controller.go
  • services/dis-cache-operator/internal/controller/redis_controller_network.go
  • services/dis-cache-operator/internal/k8s/resource_sync.go
  • services/dis-cache-operator/internal/redis/builders.go
  • services/dis-cache-operator/internal/redis/builders_test.go
  • services/dis-cache-operator/internal/redis/labels.go

Comment on lines +12 to +23
func SyncSpecAndLabels[S any](
existingSpec *S,
desiredSpec S,
existingLabels map[string]string,
desiredLabels map[string]string,
) (map[string]string, bool) {
updated := false

if !equality.Semantic.DeepEqual(*existingSpec, desiredSpec) {
*existingSpec = desiredSpec
updated = true
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a nil guard for existingSpec to avoid panic.

*existingSpec is dereferenced unconditionally at Line 20. Since this is an exported helper, a nil caller input will crash reconciliation paths unexpectedly.

Suggested patch
 func SyncSpecAndLabels[S any](
 	existingSpec *S,
 	desiredSpec S,
 	existingLabels map[string]string,
 	desiredLabels map[string]string,
 ) (map[string]string, bool) {
 	updated := false
+
+	if existingSpec == nil {
+		return existingLabels, false
+	}
 
 	if !equality.Semantic.DeepEqual(*existingSpec, desiredSpec) {
 		*existingSpec = desiredSpec
 		updated = true
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func SyncSpecAndLabels[S any](
existingSpec *S,
desiredSpec S,
existingLabels map[string]string,
desiredLabels map[string]string,
) (map[string]string, bool) {
updated := false
if !equality.Semantic.DeepEqual(*existingSpec, desiredSpec) {
*existingSpec = desiredSpec
updated = true
}
func SyncSpecAndLabels[S any](
existingSpec *S,
desiredSpec S,
existingLabels map[string]string,
desiredLabels map[string]string,
) (map[string]string, bool) {
updated := false
if existingSpec == nil {
return existingLabels, false
}
if !equality.Semantic.DeepEqual(*existingSpec, desiredSpec) {
*existingSpec = desiredSpec
updated = true
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/dis-cache-operator/internal/k8s/resource_sync.go` around lines 12 -
23, SyncSpecAndLabels dereferences existingSpec unconditionally which can panic
if callers pass nil; add a nil guard at the start of SyncSpecAndLabels: if
existingSpec == nil { updated = true /* mark spec as changed since no existing
value */ } else { keep the existing equality.Semantic.DeepEqual(*existingSpec,
desiredSpec) branch and assign *existingSpec = desiredSpec when different }.
Ensure the rest of the function (labels handling and returned map,bool) still
executes so callers get the expected label merge and update flag even when
existingSpec is nil.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants