feat: database secret by Logaka · Pull Request #447 · Netcracker/qubership-dbaas

Logaka · 2026-05-15T12:25:45Z

No description provided.

…abase-secret # Conflicts: # dbaas-operator/config/rbac/role.yaml # dbaas-operator/helm-templates/dbaas-operator/templates/ClusterRole.yaml # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java

kichasov · 2026-05-20T11:08:58Z

Outdated comment on connectionPropertiesToSecretData

databasesecret_controller.go:250:

// ... on the dbaas-aggregator side the type is List<Map<String,Object>>.

The endpoint used by this controller returns single-CP shape (map[string]any), not a list. Please update the comment to match. For example:

// The aggregator returns a single map (DatabaseResponseV3SingleCP.connectionProperties)
// for the requested userRole; the map shape is dynamic and dictated by the adapter.

kichasov · 2026-05-20T11:10:03Z

spec.classifier, spec.type, spec.secretName should be immutable

databasesecret_types.go:24–42 — none of the identity fields carry an XValidation immutability rule. Compare with ExternalDatabase, which marks classifier, type, and dbName with XValidation:rule="self == oldSelf".

Without immutability:

Changing secretName produces a new Secret next to the old one; consumers mounting the old name keep reading stale credentials.
Changing classifier or type makes the next reconcile query a different database in dbaas-aggregator; the same Secret then suddenly points at credentials for a different DB — consumers connect to the wrong system without any indication.

Both are silent-failure scenarios for the consuming microservices.

Suggested fix: add XValidation:rule="self == oldSelf" to all three fields, mirroring the EDB types:

// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.classifier is immutable after creation"
Classifier Classifier `json:"classifier"`

// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.type is immutable after creation"
Type string `json:"type"`

// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.secretName is immutable after creation"
SecretName string `json:"secretName"`

kichasov · 2026-05-20T11:12:24Z

spec.userRole should be immutable too

databasesecret_types.go:34–36:

// userRole is the role/permission level for the generated credentials.
// +optional
UserRole string `json:"userRole,omitempty"`

Currently mutable. Changing it on an existing CR makes the next reconcile request credentials for a different DB user; the same Secret then suddenly carries different-role credentials, and the previous user is not invalidated by the operator. Same silent-failure shape as the other identity fields.

One CR per role is a cleaner contract — if a microservice needs a different role, create a new DatabaseSecret with a new secretName.

Suggested fix: add the immutability rule, consistent with the other spec fields:

// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.userRole is immutable after creation"
// +optional
UserRole string `json:"userRole,omitempty"`

kichasov · 2026-05-20T11:14:26Z

Inaccurate godoc on DatabaseSecret — label enforcement is controller-side, not aggregator-side

databasesecret_types.go:56–63 says:

"CEL validation of metadata.labels at root schema level is not supported by controller-gen; enforcement is done through the aggregator error flow."

But the controller does a real pre-flight check at databasesecret_controller.go:91–96:

if s.Labels["app.kubernetes.io/name"] == "" {
    return invalidSpec(ctx, &s.Status.Phase, &s.Status.Conditions, s.Generation,
        r.Recorder, s,
        "label app.kubernetes.io/name is required — ...")
}

The aggregator is never reached when the label is missing — the controller short-circuits with InvalidSpec/InvalidConfiguration. The aggregator-error path is at most a defence-in-depth fallback.

Suggested fix: rewrite the sentence to reflect the actual enforcement:

// The label `app.kubernetes.io/name` is required — its value is sent as originService
// in the get-by-classifier request to dbaas-aggregator. CRD-level CEL validation of
// metadata.labels is not supported by controller-gen, so enforcement is done at the
// controller level via a pre-flight check that sets InvalidConfiguration/InvalidSpec.

kichasov · 2026-05-20T11:15:58Z

DatabaseNotReady (404) should use a constant requeue interval, not exponential backoff

databasesecret_controller.go:234–239:

case http.StatusNotFound:
    markTransientFailure(...)
    r.Recorder.Eventf(...)
    return ctrl.Result{}, err   // exponential backoff via rate limiter

Returning err engages controller-runtime's exponential rate limiter (5ms → 10ms → 20ms → … → 1000s). For a "wait for upstream provisioning" signal that's the wrong shape:

Early retries hammer the aggregator at sub-second intervals while the DB is still being provisioned.
Later retries get spaced out into minutes, so once the DB is actually ready, the consumer waits much longer than necessary.

The DatabaseDeclaration controller has the same pattern — waiting for an async provisioning to complete — and uses a constant interval pollRequeueAfter (~5s, declared in conditions.go).

Suggested fix: mirror the DD pattern for the 404 branch:

case http.StatusNotFound:
    markTransientFailure(...)
    r.Recorder.Eventf(...)
    return ctrl.Result{RequeueAfter: pollRequeueAfter}, nil

This gives predictable reaction time once the DB is ready and bounds the rate of 404 calls to the aggregator.

kichasov · 2026-05-20T11:19:30Z

Materialized Secret has no managed-by label

databasesecret_controller.go:163–179 — the Secret is created with only Name, Namespace, Data, and an OwnerReference. No labels.

Adding the standard app.kubernetes.io/managed-by: dbaas-operator label (plus optionally app.kubernetes.io/name carried over from the CR's own label, which is already required) would:

Make operator-managed Secrets kubectl get secret -l app.kubernetes.io/managed-by=dbaas-operator-filterable.
Conform to Kubernetes recommended labels convention.
Help platform tools (CD pipelines, GitOps drift detection) treat operator-managed Secrets correctly.

Suggested fix inside the CreateOrUpdate mutator:

_, err = controllerutil.CreateOrUpdate(ctx, r.Client, coreSecret, func() error {
    coreSecret.Data = secretData
    if coreSecret.Labels == nil {
        coreSecret.Labels = map[string]string{}
    }
    coreSecret.Labels["app.kubernetes.io/managed-by"] = "dbaas-operator"
    coreSecret.Labels["app.kubernetes.io/name"] = s.Labels["app.kubernetes.io/name"]
    return ctrl.SetControllerReference(s, coreSecret, r.Scheme)
})

kichasov · 2026-05-20T11:21:00Z

Sibling-name check is O(N) per reconcile — consider a field index

databasesecret_controller.go:115–126 lists every DatabaseSecret in the namespace on each reconcile and linearly scans for a matching spec.secretName:

var siblings dbaasv1.DatabaseSecretList
if err := r.List(ctx, &siblings, client.InNamespace(s.Namespace)); err != nil {
    return ctrl.Result{}, err
}
for i := range siblings.Items {
    if siblings.Items[i].UID != s.UID && siblings.Items[i].Spec.SecretName == s.Spec.SecretName {
        return r.markSecretConflict(...)
    }
}

On namespaces with hundreds of DatabaseSecrets this scales linearly per reconcile. The EDB controller has the same shape of lookup (Secret → EDBs) and uses a field index for O(1) resolution — same pattern would fit here.

Suggested fix: register a field index on spec.secretName at startup:

const secretNameIndex = "spec.secretName"

if err := mgr.GetFieldIndexer().IndexField(ctx, &dbaasv1.DatabaseSecret{}, secretNameIndex,
    func(obj client.Object) []string {
        return []string{obj.(*dbaasv1.DatabaseSecret).Spec.SecretName}
    }); err != nil { return err }

Then in reconcile:

r.List(ctx, &siblings,
    client.InNamespace(s.Namespace),
    client.MatchingFields{secretNameIndex: s.Spec.SecretName})

Cheap optimization; safe to defer until scale becomes a concern.

# Conflicts: # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java

…abase-secret # Conflicts: # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java

kichasov · 2026-05-22T12:55:25Z

Empty connectionProperties should be treated as transient, not permanent

databasesecret_controller.go:153–158:

if len(dbResp.ConnectionProperties) == 0 {
    return invalidSpec(ctx, &s.Status.Phase, &s.Status.Conditions, s.Generation,
        r.Recorder, s,
        fmt.Sprintf("aggregator returned empty connectionProperties for type=%s", s.Spec.Type))
}

invalidSpec → markPermanentFailure → InvalidConfiguration / Stalled=True, so the CR stops retrying until the user touches the spec.

In practice this branch is unreachable for a valid 200 response: the aggregator constructs DatabaseResponseV3SingleCP via ConnectionPropertiesUtils.getConnectionProperties(...) which throws NotExistingConnectionPropertiesException if the requested role has no matching entry — that becomes a non-200 HTTP response, not a 200 with an empty map. Any path that returns 200 here goes through the same constructor.

Defensive check is still worth keeping — if the aggregator ever regresses, we want the controller to notice — but the diagnosis is then "aggregator returned an unexpected shape", not "the user's spec is wrong". The user has no spec change they can make to fix it; only a controller restart or an aggregator fix will help. That's the textbook transient signal.

Suggestion: swap invalidSpec for a transient failure with backoff. Reuse the existing aggregator-error machinery so dashboards/alerts pick it up the same way as any other 5xx-class symptom:

if len(dbResp.ConnectionProperties) == 0 {
    msg := fmt.Sprintf("aggregator returned empty connectionProperties for type=%s — unexpected response shape, retrying", s.Spec.Type)
    markTransientFailure( & s.Status.Phase, & s.Status.Conditions, s.Generation,
        EventReasonAggregatorError, msg)
    r.Recorder.Eventf(s, corev1.EventTypeWarning, EventReasonAggregatorError,
        "%s (requestId=%s)", msg, requestID)
    return ctrl.Result{RequeueAfter: pollRequeueAfter}, nil
}

kichasov · 2026-05-22T12:57:37Z

Symmetric sibling-name conflict deadlocks both DatabaseSecret CRs

When two CRs are created with the same spec.secretName close in time, both see each other in the sibling check at databasesecret_controller.go:134-140 and both call markSecretConflict → both end up in InvalidConfiguration / Stalled=True.

User deletes one of them to resolve the conflict. The other one stays stuck in InvalidConfiguration forever:

Deletion of the sibling doesn't trigger a reconcile on the survivor (controller only watches the CR itself and NamespaceBinding).
The survivor's own spec hasn't changed, so GenerationChangedPredicate won't fire.
User has to manually bump an annotation on the survivor to make the controller notice the conflict is gone.

Suggested fix: watch DatabaseSecret deletes and re-enqueue siblings sharing the same secretName via the existing field index:

Watches(&dbaasv1.DatabaseSecret{},
    handler.EnqueueRequestsFromMapFunc(r.enqueueSiblingsBySecretName),
    builder.WithPredicates(predicate.Funcs{
        CreateFunc:  func(_ event.CreateEvent) bool { return false },
        UpdateFunc:  func(_ event.UpdateEvent) bool { return false },
        DeleteFunc:  func(_ event.DeleteEvent) bool { return true },
        GenericFunc: func(_ event.GenericEvent) bool { return false },
    }))

with the mapping reusing secretNameIndex to find peers in the same namespace.

kichasov · 2026-05-22T12:59:11Z

Step 9.4 IsNotFound success-path skips markSucceeded

databasesecret_controller.go:221–231:

case apierrors.IsNotFound(err):
    ...
    if createErr := r.Create(ctx, recreated); createErr != nil && !apierrors.IsAlreadyExists(createErr) {
        return ctrl.Result{}, createErr
    }
    return ctrl.Result{Requeue: true}, nil

When r.Create actually succeeds (createErr == nil), the code still returns Requeue: true without calling markSucceeded. The CR stays in Processing until the next reconcile. The "let next reconcile confirm" comment only applies to the AlreadyExists race; on plain success the status should be updated now.

Suggested fix: split the two cases:

case apierrors.IsNotFound(err):
    recreated, buildErr := r.buildOwnedSecret(s, secretData)
    if buildErr != nil {
        return ctrl.Result{}, buildErr
    }
    createErr := r.Create(ctx, recreated)
    if createErr == nil {
        markSucceeded(&s.Status.Phase, &s.Status.Conditions, s.Generation, EventReasonSecretCreated)
        r.Recorder.Eventf(s, corev1.EventTypeNormal, EventReasonSecretCreated,
            "Secret %q re-created after race (requestId=%s)", s.Spec.SecretName, requestID)
        return ctrl.Result{}, nil
    }
    if !apierrors.IsAlreadyExists(createErr) {
        return ctrl.Result{}, createErr
    }
    return ctrl.Result{Requeue: true}, nil   // AlreadyExists race — let next reconcile confirm

kichasov · 2026-05-22T13:00:41Z

Step 9.2 race recovery discards already-fetched dbResp

databasesecret_controller.go:186–194:

if apierrors.IsNotFound(err) {
    // race: deleted after AlreadyExists → retry create
    return ctrl.Result{Requeue: true}, nil
}

When the Secret is deleted between our Create→AlreadyExists and the Get, we requeue. But newSecret (built from dbResp) is still valid in scope — we can retry Create immediately on the same data and avoid another GetDatabaseByClassifier round-trip on the next reconcile.

Suggested fix:

if apierrors.IsNotFound(err) {
    // Race: someone deleted the Secret after our AlreadyExists. Retry Create
    // with the data we already have — no need for another aggregator call.
    if createErr := r.Create(ctx, newSecret); createErr == nil {
        markSucceeded(&s.Status.Phase, &s.Status.Conditions, s.Generation, EventReasonSecretCreated)
        r.Recorder.Eventf(s, corev1.EventTypeNormal, EventReasonSecretCreated,
            "Secret %q created after race (requestId=%s)", s.Spec.SecretName, requestID)
        return ctrl.Result{}, nil
    } else if !apierrors.IsAlreadyExists(createErr) {
        return ctrl.Result{}, createErr
    }
    // Still AlreadyExists — let next reconcile sort it out.
    return ctrl.Result{Requeue: true}, nil
}

Cold-path optimization; not a bug.

kichasov · 2026-05-22T13:06:52Z

DatabaseNotFound (404) polls forever with no deadline signal

databasesecret_controller.go:282–288:

if errors.As(err, &aggErr) && aggErr.IsDatabaseNotFound() {
    markTransientFailure(&s.Status.Phase, &s.Status.Conditions, s.Generation,
        EventReasonDatabaseNotFound, aggErr.UserMessage())
    r.Recorder.Eventf(s, corev1.EventTypeWarning, EventReasonDatabaseNotFound,
        "database not found in dbaas-aggregator, waiting for provisioning (requestId=%s)", requestID)
    return ctrl.Result{RequeueAfter: pollRequeueAfter}, nil
}

The wait-for-provisioning intent is correct — a DatabaseSecret may be applied before its DatabaseDeclaration finishes. But there's no upper bound: a CR with a typo in classifier (wrong microserviceName, wrong scope, declaration in another namespace) will requeue every pollRequeueAfter forever, emitting a Warning event each cycle, with no Stalled=True transition and nothing alertable on a dashboard.

Suggested fix (status-only, minimally invasive): record firstNotFoundAt timestamp in Status on the first 404, and after a threshold (e.g. 10 min) flip Stalled=True with reason DatabaseNotFoundTimeout. Keep requeueing (the aggregator still owns the operation), but stop emitting per-cycle Warning events once stalled. Operators get a detectable stuck-state signal without changing the happy-path semantics.

Severity: Low — not a correctness bug, but typos in classifier are operationally invisible today.

…ansient DatabaseSecret previously failed permanently (InvalidConfiguration / InvalidSpec) when dbaas-aggregator returned HTTP 200 with an empty connectionProperties map. This response is not produced by a healthy aggregator+adapter pair — the aggregator throws NotExistingConnectionPropertiesException on missing role rather than returning an empty payload — so when it does occur it indicates a transient inconsistency in the adapter or role registry, not a user spec error. Switch to BackingOff with a new EmptyConnectionProperties event reason and requeue with pollRequeueAfter, mirroring how we already treat 5xx and DatabaseNotFound responses. The user's spec remains valid; the operator retries until the adapter populates the payload. Refs PR #447 review comment F2.

…onflicts When two DatabaseSecret CRs claimed the same spec.secretName before either reconciled, both moved to InvalidConfiguration/SecretConflict symmetrically. Deleting one did not wake the other (its own spec hadn't changed), so the survivor stayed deadlocked. Two changes: 1. Tiebreak by creationTimestamp with UID lexical fallback. Only the younger claimant moves to SecretConflict; the older claimant proceeds normally. The UID fallback keeps the result stable when peers race within the same second of creationTimestamp resolution. 2. Add a Watches on DatabaseSecret in SetupWithManager that re-enqueues every sibling sharing spec.secretName whenever any DatabaseSecret in the namespace is created, deleted, or has a spec change. A CR sitting in SecretConflict now recovers automatically once the older claimant is removed or rebound to a different secret name. GenerationChangedPredicate filters out status-only updates to keep the fan-out quiet. Unit tests cover older-wins, recovery-after-winner-deletion, and the fan-out map func. The existing IT testDatabaseSecretSharedSecretName is updated: the older CR must remain unaffected by the younger sibling. Refs PR #447 review comment F3.

… post-GC Step 9.4 of the DatabaseSecret reconciler handles the race where the target Secret is deleted between the Update's preceding Get and the Update itself (IsNotFound). The recovery path recreates the Secret but always returned Requeue=true without marking the CR succeeded, even when the Create call returned nil — meaning we just wrote the correct content and could confirm immediately. The next reconcile would refetch from the aggregator just to verify a write we already knew was good, flipping the CR's phase through Processing again. Split the post-Create branches: on nil, mark succeeded and emit the SecretCreated event; on AlreadyExists, keep the existing requeue (we don't know whose content won the race); on any other error, bubble up. Refs PR #447 review comment F4.

The DatabaseSecret reconciler's Step 9.2 handles the race where the target Secret disappears between r.Create returning AlreadyExists and the subsequent r.Get to verify ownership. Previously this returned Requeue=true, forcing the next reconcile to repeat the whole pipeline — namespace ownership, sibling check, aggregator round-trip, Secret build — just to end up back at the same Create call. newSecret is still valid in the current scope, so retry Create inline: on nil, mark succeeded; on AlreadyExists (double race), keep the Requeue; on any other error, bubble up. Saves one aggregator call on a rare race recovery path. Refs PR #447 review comment F5.

A DatabaseSecret whose classifier had a typo or referenced a database in another namespace would sit in BackingOff/DatabaseNotFound forever, producing a Warning event on every reconcile cycle and giving SRE no alertable signal that the wait had become hopeless. Add a databaseNotFoundTimeout (10 min) and a Status.firstNotFoundAt timestamp that records the start of the current NotFound streak. On the first reconcile that finds the streak past the threshold: - emit a one-shot DatabaseNotFoundTimeout Warning event, then - switch the Ready condition reason to DatabaseNotFoundTimeout so the condition itself acts as the "already-notified" marker and no further per-cycle Warnings are emitted. Polling continues — Phase stays BackingOff, Stalled stays False — so a late-arriving database still unsticks the CR automatically. The marker is cleared on any successful aggregator response or on a non-NotFound error so a fresh streak starts a fresh timer. Unit tests cover the threshold crossing, suppressed re-emit on subsequent reconciles past the threshold, and clean recovery after the database finally appears. CRD manifest and Helm CRD template are synced with the new status field. Refs PR #447 review comment F6.

Logaka added 2 commits May 15, 2026 16:48

feat: database secret implementation

43b261c

Logaka requested a review from ArkuNC as a code owner May 15, 2026 12:25

Logaka self-assigned this May 15, 2026

Logaka added 3 commits May 21, 2026 17:04

feat: change secret creation behavior

93294a9

Merge branch 'feat/operator-dev' into feat/database-secret

08b19e9

# Conflicts: # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java

feat: add DatabaseSecret to NamespaceBinding watches

6c0c3b4

Logaka temporarily deployed to pages May 21, 2026 18:44 — with GitHub Actions Inactive

Merge remote-tracking branch 'origin/feat/operator-dev' into feat/dat…

3a545c2

…abase-secret # Conflicts: # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java

kichasov deleted the branch feat/operator-dev May 22, 2026 10:56

kichasov closed this May 22, 2026

kichasov reopened this May 22, 2026

kichasov added 5 commits May 22, 2026 16:40

kichasov merged commit 664e674 into feat/operator-dev May 22, 2026
12 checks passed

kichasov deleted the feat/database-secret branch May 22, 2026 14:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: database secret#447

feat: database secret#447
kichasov merged 11 commits into
feat/operator-devfrom
feat/database-secret

Logaka commented May 15, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Logaka commented May 15, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 20, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

kichasov commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants