feat: database secret #447
…abase-secret # Conflicts: # dbaas-operator/config/rbac/role.yaml # dbaas-operator/helm-templates/dbaas-operator/templates/ClusterRole.yaml # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java
Outdated comment on
// ... on the dbaas-aggregator side the type is List<Map<String,Object>>.
The endpoint used by this controller returns the single-CP shape (
// The aggregator returns a single map (DatabaseResponseV3SingleCP.connectionProperties)
// for the requested userRole; the map shape is dynamic and dictated by the adapter.
Without immutability:
Both are silent-failure scenarios for the consuming microservices. Suggested fix: add
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.classifier is immutable after creation"
Classifier Classifier `json:"classifier"`
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.type is immutable after creation"
Type string `json:"type"`
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.secretName is immutable after creation"
SecretName string `json:"secretName"`
// userRole is the role/permission level for the generated credentials.
// +optional
UserRole string `json:"userRole,omitempty"`
Currently mutable. Changing it on an existing CR makes the next reconcile request credentials for a different DB user; the same Secret then suddenly carries different-role credentials, and the operator does not invalidate the previous user. This is the same silent-failure shape as the other identity fields. One CR per role is a cleaner contract: if a microservice needs a different role, create a new
Suggested fix: add the immutability rule, consistent with the other spec fields:
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="spec.userRole is immutable after creation"
// +optional
UserRole string `json:"userRole,omitempty"`
Inaccurate godoc on
But the controller does a real pre-flight check at
if s.Labels["app.kubernetes.io/name"] == "" {
return invalidSpec(ctx, &s.Status.Phase, &s.Status.Conditions, s.Generation,
r.Recorder, s,
"label app.kubernetes.io/name is required — ...")
}
The aggregator is never reached when the label is missing; the controller short-circuits with
Suggested fix: rewrite the sentence to reflect the actual enforcement:
// The label `app.kubernetes.io/name` is required; its value is sent as originService
// in the get-by-classifier request to dbaas-aggregator. CRD-level CEL validation of
// metadata.labels is not supported by controller-gen, so enforcement is done at the
// controller level via a pre-flight check that sets InvalidConfiguration/InvalidSpec.
case http.StatusNotFound:
markTransientFailure(...)
r.Recorder.Eventf(...)
return ctrl.Result{}, err // exponential backoff via rate limiter
Returning
The DatabaseDeclaration controller has the same pattern (waiting for an async provisioning to complete) and uses a constant interval
Suggested fix: mirror the DD pattern for the 404 branch:
case http.StatusNotFound:
markTransientFailure(...)
r.Recorder.Eventf(...)
return ctrl.Result{RequeueAfter: pollRequeueAfter}, nil
This gives a predictable reaction time once the DB is ready and bounds the rate of 404 calls to the aggregator.
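To make the difference between the two requeue strategies concrete, here is a minimal, self-contained sketch that compares the retry delays. The 5 ms base and 1000 s cap are an assumption based on the default per-item exponential limiter in client-go's workqueue, and the 30 s value for `pollRequeueAfter` is purely hypothetical, since the review does not show the real constant.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// Assumed defaults of the per-item exponential failure rate limiter.
	baseDelay = 5 * time.Millisecond
	maxDelay  = 1000 * time.Second
	// Hypothetical value; the real pollRequeueAfter is not shown in the review.
	pollRequeueAfter = 30 * time.Second
)

// exponentialDelay returns the wait applied after `failures` earlier failures
// when the reconciler returns an error and lets the rate limiter decide.
func exponentialDelay(failures int) time.Duration {
	d := baseDelay
	for i := 0; i < failures; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// constantDelay returns the wait applied when the reconciler returns
// ctrl.Result{RequeueAfter: pollRequeueAfter} with a nil error.
func constantDelay(_ int) time.Duration {
	return pollRequeueAfter
}

func main() {
	for _, n := range []int{0, 5, 10, 15, 20} {
		fmt.Printf("after %2d failures: exponential=%v constant=%v\n",
			n, exponentialDelay(n), constantDelay(n))
	}
}
```

Under these assumptions the exponential path starts with a burst of very short retries and later stretches to many minutes, while the constant path polls at one fixed, bounded rate throughout.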
Materialized Secret has no
Adding the standard
Suggested fix inside the
_, err = controllerutil.CreateOrUpdate(ctx, r.Client, coreSecret, func() error {
coreSecret.Data = secretData
if coreSecret.Labels == nil {
coreSecret.Labels = map[string]string{}
}
coreSecret.Labels["app.kubernetes.io/managed-by"] = "dbaas-operator"
coreSecret.Labels["app.kubernetes.io/name"] = s.Labels["app.kubernetes.io/name"]
return ctrl.SetControllerReference(s, coreSecret, r.Scheme)
})
Sibling-name check is O(N) per reconcile — consider a field index
var siblings dbaasv1.DatabaseSecretList
if err := r.List(ctx, &siblings, client.InNamespace(s.Namespace)); err != nil {
return ctrl.Result{}, err
}
for i := range siblings.Items {
if siblings.Items[i].UID != s.UID && siblings.Items[i].Spec.SecretName == s.Spec.SecretName {
return r.markSecretConflict(...)
}
}
On namespaces with hundreds of
Suggested fix: register a field index on
const secretNameIndex = "spec.secretName"
if err := mgr.GetFieldIndexer().IndexField(ctx, &dbaasv1.DatabaseSecret{}, secretNameIndex,
func(obj client.Object) []string {
return []string{obj.(*dbaasv1.DatabaseSecret).Spec.SecretName}
}); err != nil {
return err
}
Then in reconcile:
r.List(ctx, &siblings,
client.InNamespace(s.Namespace),
client.MatchingFields{secretNameIndex: s.Spec.SecretName})
Cheap optimization; safe to defer until scale becomes a concern.
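As a conceptual illustration of why an indexed lookup is cheaper than scanning every CR in the namespace, the sketch below models the index as a plain Go map. It is not the controller-runtime cache; `databaseSecret`, `buildIndex`, and `siblings` are hypothetical stand-ins invented for this example.

```go
package main

import "fmt"

// databaseSecret is a pared-down, hypothetical stand-in for the real CR type.
type databaseSecret struct {
	UID        string
	Namespace  string
	SecretName string
}

// secretNameIndex maps "namespace/secretName" to the CRs that claim that name.
type secretNameIndex map[string][]databaseSecret

func indexKey(namespace, secretName string) string {
	return namespace + "/" + secretName
}

// buildIndex groups the CRs once, so later lookups avoid a full scan.
func buildIndex(items []databaseSecret) secretNameIndex {
	idx := make(secretNameIndex, len(items))
	for _, it := range items {
		k := indexKey(it.Namespace, it.SecretName)
		idx[k] = append(idx[k], it)
	}
	return idx
}

// siblings returns the other CRs in the same namespace with the same secret name.
func (idx secretNameIndex) siblings(s databaseSecret) []databaseSecret {
	var out []databaseSecret
	for _, it := range idx[indexKey(s.Namespace, s.SecretName)] {
		if it.UID != s.UID {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	items := []databaseSecret{
		{UID: "a", Namespace: "ns1", SecretName: "pg-creds"},
		{UID: "b", Namespace: "ns1", SecretName: "pg-creds"},
		{UID: "c", Namespace: "ns1", SecretName: "mongo-creds"},
		{UID: "d", Namespace: "ns2", SecretName: "pg-creds"},
	}
	idx := buildIndex(items)
	fmt.Println(len(idx.siblings(items[0])), len(idx.siblings(items[2])))
}
```

The lookup only touches the CRs that share the key, which is the property the suggested field index provides for the sibling check.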
# Conflicts: # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java
…abase-secret # Conflicts: # dbaas/dbaas-integration-tests/src/test/java/com/netcracker/it/dbaas/test/declarative/OperatorIT.java
Empty
if len(dbResp.ConnectionProperties) == 0 {
return invalidSpec(ctx, &s.Status.Phase, &s.Status.Conditions, s.Generation,
r.Recorder, s,
fmt.Sprintf("aggregator returned empty connectionProperties for type=%s", s.Spec.Type))
}
In practice this branch is unreachable for a valid 200 response: the aggregator constructs
The defensive check is still worth keeping (if the aggregator ever regresses, we want the controller to notice), but the diagnosis is then "aggregator returned an unexpected shape", not "the user's spec is wrong". The user has no spec change they can make to fix it; only a controller restart or an aggregator fix will help. That's the textbook transient signal. Suggestion: swap
if len(dbResp.ConnectionProperties) == 0 {
msg := fmt.Sprintf("aggregator returned empty connectionProperties for type=%s — unexpected response shape, retrying", s.Spec.Type)
markTransientFailure(&s.Status.Phase, &s.Status.Conditions, s.Generation,
EventReasonAggregatorError, msg)
r.Recorder.Eventf(s, corev1.EventTypeWarning, EventReasonAggregatorError,
"%s (requestId=%s)", msg, requestID)
return ctrl.Result{RequeueAfter: pollRequeueAfter}, nil
}
Symmetric sibling-name conflict deadlocks both DatabaseSecret CRs
When two CRs are created with the same
User deletes one of them to resolve the conflict. The other one stays stuck in
Suggested fix: watch
Watches(&dbaasv1.DatabaseSecret{},
handler.EnqueueRequestsFromMapFunc(r.enqueueSiblingsBySecretName),
builder.WithPredicates(predicate.Funcs{
CreateFunc: func(_ event.CreateEvent) bool { return false },
UpdateFunc: func(_ event.UpdateEvent) bool { return false },
DeleteFunc: func(_ event.DeleteEvent) bool { return true },
GenericFunc: func(_ event.GenericEvent) bool { return false },
}))
with the mapping reusing
Step 9.4
case apierrors.IsNotFound(err):
...
if createErr := r.Create(ctx, recreated); createErr != nil && !apierrors.IsAlreadyExists(createErr) {
return ctrl.Result{}, createErr
}
return ctrl.Result{Requeue: true}, nil
When
Suggested fix: split the two cases:
case apierrors.IsNotFound(err):
recreated, buildErr := r.buildOwnedSecret(s, secretData)
if buildErr != nil {
return ctrl.Result{}, buildErr
}
createErr := r.Create(ctx, recreated)
if createErr == nil {
markSucceeded(&s.Status.Phase, &s.Status.Conditions, s.Generation, EventReasonSecretCreated)
r.Recorder.Eventf(s, corev1.EventTypeNormal, EventReasonSecretCreated,
"Secret %q re-created after race (requestId=%s)", s.Spec.SecretName, requestID)
return ctrl.Result{}, nil
}
if !apierrors.IsAlreadyExists(createErr) {
return ctrl.Result{}, createErr
}
return ctrl.Result{Requeue: true}, nil // AlreadyExists race: let the next reconcile confirm
Step 9.2 race recovery discards already-fetched
if apierrors.IsNotFound(err) {
// race: deleted after AlreadyExists → retry create
return ctrl.Result{Requeue: true}, nil
}
When the Secret is deleted between our
Suggested fix:
if apierrors.IsNotFound(err) {
// Race: someone deleted the Secret after our AlreadyExists. Retry Create
// with the data we already have — no need for another aggregator call.
if createErr := r.Create(ctx, newSecret); createErr == nil {
markSucceeded(&s.Status.Phase, &s.Status.Conditions, s.Generation, EventReasonSecretCreated)
r.Recorder.Eventf(s, corev1.EventTypeNormal, EventReasonSecretCreated,
"Secret %q created after race (requestId=%s)", s.Spec.SecretName, requestID)
return ctrl.Result{}, nil
} else if !apierrors.IsAlreadyExists(createErr) {
return ctrl.Result{}, createErr
}
// Still AlreadyExists — let next reconcile sort it out.
return ctrl.Result{Requeue: true}, nil
}
Cold-path optimization; not a bug.
if errors.As(err, &aggErr) && aggErr.IsDatabaseNotFound() {
markTransientFailure(&s.Status.Phase, &s.Status.Conditions, s.Generation,
EventReasonDatabaseNotFound, aggErr.UserMessage())
r.Recorder.Eventf(s, corev1.EventTypeWarning, EventReasonDatabaseNotFound,
"database not found in dbaas-aggregator, waiting for provisioning (requestId=%s)", requestID)
return ctrl.Result{RequeueAfter: pollRequeueAfter}, nil
}
The wait-for-provisioning intent is correct; a
Suggested fix (status-only, minimally invasive): record
Severity: Low. Not a correctness bug, but typos in
…ansient

DatabaseSecret previously failed permanently (InvalidConfiguration / InvalidSpec) when dbaas-aggregator returned HTTP 200 with an empty connectionProperties map. This response is not produced by a healthy aggregator+adapter pair (the aggregator throws NotExistingConnectionPropertiesException on a missing role rather than returning an empty payload), so when it does occur it indicates a transient inconsistency in the adapter or role registry, not a user spec error.

Switch to BackingOff with a new EmptyConnectionProperties event reason and requeue with pollRequeueAfter, mirroring how we already treat 5xx and DatabaseNotFound responses. The user's spec remains valid; the operator retries until the adapter populates the payload.

Refs PR #447 review comment F2.
…onflicts

When two DatabaseSecret CRs claimed the same spec.secretName before either reconciled, both moved to InvalidConfiguration/SecretConflict symmetrically. Deleting one did not wake the other (its own spec hadn't changed), so the survivor stayed deadlocked.

Two changes:

1. Tiebreak by creationTimestamp with UID lexical fallback. Only the younger claimant moves to SecretConflict; the older claimant proceeds normally. The UID fallback keeps the result stable when peers race within the same second of creationTimestamp resolution.
2. Add a Watches on DatabaseSecret in SetupWithManager that re-enqueues every sibling sharing spec.secretName whenever any DatabaseSecret in the namespace is created, deleted, or has a spec change. A CR sitting in SecretConflict now recovers automatically once the older claimant is removed or rebound to a different secret name. GenerationChangedPredicate filters out status-only updates to keep the fan-out quiet.

Unit tests cover older-wins, recovery-after-winner-deletion, and the fan-out map func. The existing IT testDatabaseSecretSharedSecretName is updated: the older CR must remain unaffected by the younger sibling.

Refs PR #447 review comment F3.
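The tiebreak described in this commit message can be sketched as a small comparison function. This is an illustration of the rule as stated (older creationTimestamp wins, lexically smaller UID breaks ties), not the operator's actual code; `claimant`, `newClaimant`, and `winsOver` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// claimant is a hypothetical, pared-down view of a DatabaseSecret CR.
type claimant struct {
	Name    string
	UID     string
	Created time.Time // creationTimestamp, which has one-second resolution
}

// newClaimant builds a claimant from a Unix timestamp in seconds.
func newClaimant(name, uid string, createdUnix int64) claimant {
	return claimant{Name: name, UID: uid, Created: time.Unix(createdUnix, 0)}
}

// winsOver reports whether a keeps the contested secret name against b.
// The older CR wins; if both were created in the same second, the
// lexically smaller UID wins so the outcome is stable on every reconcile.
func winsOver(a, b claimant) bool {
	if !a.Created.Equal(b.Created) {
		return a.Created.Before(b.Created)
	}
	return a.UID < b.UID
}

func main() {
	older := newClaimant("orders-db", "uid-1", 1700000000)
	younger := newClaimant("billing-db", "uid-2", 1700000060)
	fmt.Println(winsOver(older, younger), winsOver(younger, older))
}
```

Because the comparison is antisymmetric for distinct UIDs, exactly one of two competing CRs proceeds, and both reconcilers reach the same verdict independently.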
… post-GC

Step 9.4 of the DatabaseSecret reconciler handles the race where the target Secret is deleted between the Update's preceding Get and the Update itself (IsNotFound). The recovery path recreates the Secret but always returned Requeue=true without marking the CR succeeded, even when the Create call returned nil, meaning we just wrote the correct content and could confirm immediately. The next reconcile would refetch from the aggregator just to verify a write we already knew was good, flipping the CR's phase through Processing again.

Split the post-Create branches: on nil, mark succeeded and emit the SecretCreated event; on AlreadyExists, keep the existing requeue (we don't know whose content won the race); on any other error, bubble up.

Refs PR #447 review comment F4.
The DatabaseSecret reconciler's Step 9.2 handles the race where the target Secret disappears between r.Create returning AlreadyExists and the subsequent r.Get to verify ownership. Previously this returned Requeue=true, forcing the next reconcile to repeat the whole pipeline (namespace ownership, sibling check, aggregator round-trip, Secret build) just to end up back at the same Create call.

newSecret is still valid in the current scope, so retry Create inline: on nil, mark succeeded; on AlreadyExists (double race), keep the Requeue; on any other error, bubble up. Saves one aggregator call on a rare race recovery path.

Refs PR #447 review comment F5.
A DatabaseSecret whose classifier had a typo or referenced a database in another namespace would sit in BackingOff/DatabaseNotFound forever, producing a Warning event on every reconcile cycle and giving SRE no alertable signal that the wait had become hopeless.

Add a databaseNotFoundTimeout (10 min) and a Status.firstNotFoundAt timestamp that records the start of the current NotFound streak. On the first reconcile that finds the streak past the threshold:

- emit a one-shot DatabaseNotFoundTimeout Warning event, then
- switch the Ready condition reason to DatabaseNotFoundTimeout so the condition itself acts as the "already-notified" marker and no further per-cycle Warnings are emitted.

Polling continues (Phase stays BackingOff, Stalled stays False), so a late-arriving database still unsticks the CR automatically. The marker is cleared on any successful aggregator response or on a non-NotFound error so a fresh streak starts a fresh timer.

Unit tests cover the threshold crossing, suppressed re-emit on subsequent reconciles past the threshold, and clean recovery after the database finally appears. CRD manifest and Helm CRD template are synced with the new status field.

Refs PR #447 review comment F6.
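The streak-and-threshold behaviour described above can be modelled in a few lines. This is a minimal sketch under stated assumptions: `notFoundStatus`, `observeNotFound`, `reset`, and `at` are hypothetical names, the `Notified` flag stands in for the Ready condition reason acting as the "already-notified" marker, and treating a streak of exactly 10 minutes as past the threshold is a guess.

```go
package main

import (
	"fmt"
	"time"
)

// Timeout value taken from the commit message.
const databaseNotFoundTimeout = 10 * time.Minute

// notFoundStatus is a hypothetical model of the relevant status fields.
type notFoundStatus struct {
	FirstNotFoundAt *time.Time // start of the current NotFound streak
	Notified        bool       // stand-in for reason == DatabaseNotFoundTimeout
}

// observeNotFound records a NotFound response seen at `now` and reports
// whether the one-shot timeout Warning should be emitted on this reconcile.
func (s *notFoundStatus) observeNotFound(now time.Time) bool {
	if s.FirstNotFoundAt == nil {
		t := now
		s.FirstNotFoundAt = &t // a new streak starts here
		return false
	}
	if s.Notified || now.Sub(*s.FirstNotFoundAt) < databaseNotFoundTimeout {
		return false
	}
	s.Notified = true
	return true
}

// reset clears the marker on success or on a non-NotFound error.
func (s *notFoundStatus) reset() {
	s.FirstNotFoundAt = nil
	s.Notified = false
}

// at returns a fixed reference time plus the given number of minutes.
func at(minutes int) time.Time {
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

func main() {
	s := &notFoundStatus{}
	for _, m := range []int{0, 5, 11, 12} {
		fmt.Printf("minute %2d: emit=%v\n", m, s.observeNotFound(at(m)))
	}
}
```

Polling is unaffected by this logic: the function only decides whether to emit the event, so a reconciler using it would still requeue after every NotFound.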
No description provided.