92 changes: 92 additions & 0 deletions CODEBASE.md
@@ -0,0 +1,92 @@
# wrapper: Codebase Reference

## 1. Purpose

Wrapper is the pack delivery engine for the ONT platform. It manages the lifecycle of pre-compiled OCI artifact deliveries (`InfrastructureClusterPack`) to target clusters: enforcing 6 delivery gates (gates 0-5) before submitting a `pack-deploy` Kueue Job, tracking delivered state via `InfrastructurePackInstance`, and managing drift visibility via `InfrastructurePackReceipt`. Wrapper does NOT compile packs (conductor/compiler), sign packs (conductor agent on management cluster), own RBAC governance (guardian), or manage cluster lifecycle (platform). It does not apply Helm or Kustomize at runtime.

Wrapper has NO own CRD type definitions. `api/v1alpha1/` contains only `.gitkeep`. All types consumed by wrapper (InfrastructureClusterPack, InfrastructurePackExecution, InfrastructurePackInstance, InfrastructurePackReceipt, PackOperationResult, DriftSignal) are defined in seam-core (Decision G).

---

## 2. Key Files and Locations

### Controllers (`internal/controller/`)

#### `packexecution_reconciler.go`

`PackExecutionReconciler` (L74 comment block, `Reconcile()` L121). Manages the 6-gate delivery pipeline.

**Gate check flow** (all gates at L175-417):

| Gate | Line | Condition | Passes when |
|------|------|-----------|-----------|
| 0 | L176 | ConductorReady | `isConductorReadyForCluster()` L799 -- checks RunnerConfig in `ont-system` has `status.capabilities` non-empty |
| 1 | L221 | Signature | `ClusterPack.status.Signed=true` |
| 2 | L289 | Revocation | ClusterPack conditions Revoked != True |
| 3 | L306 | PermissionSnapshot | `isPermissionSnapshotCurrent()` L716 -- reads PermissionSnapshot via unstructured (no cross-operator type import) |
| 4 | L343 | RBACProfile | `isRBACProfileProvisioned()` L755 -- checks `provisioned=true` on the pack's RBACProfile |
| 5 | L378 | WrapperRunnerRBAC | `isWrapperRunnerRBACReady()` L849 -- SubjectAccessReview verifies wrapper-runner SA has required permissions |

`gateRequeueInterval = 30 * time.Second` (L61). Failing a gate sets `ConditionTypePackExecutionPending=True` with `ReasonGatesClearing` and returns `RequeueAfter: gateRequeueInterval`.
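
The gate flow above can be sketched as an ordered list of checks that stops at the first failure and asks for a requeue. This is a minimal illustration, not the reconciler's code: the `gate` and `result` types and the `runGates` helper are hypothetical stand-ins (the real reconciler returns `ctrl.Result` and sets a status condition), and only the 30-second interval is taken from the text.

```go
package main

import (
	"fmt"
	"time"
)

// gateRequeueInterval mirrors the 30-second value quoted in the text.
const gateRequeueInterval = 30 * time.Second

// gate is a hypothetical stand-in for one delivery gate.
type gate struct {
	name  string
	check func() bool
}

// result is a simplified stand-in for controller-runtime's ctrl.Result.
type result struct {
	RequeueAfter time.Duration
}

// runGates evaluates gates in order and stops at the first one that fails.
// It returns the name of the blocking gate ("" when all pass) and the
// requeue instruction for the caller.
func runGates(gates []gate) (string, result) {
	for _, g := range gates {
		if !g.check() {
			return g.name, result{RequeueAfter: gateRequeueInterval}
		}
	}
	return "", result{}
}

func main() {
	gates := []gate{
		{"ConductorReady", func() bool { return true }},
		{"Signature", func() bool { return false }},
		{"Revocation", func() bool { return true }},
	}
	blocked, res := runGates(gates)
	fmt.Println(blocked, res.RequeueAfter) // prints "Signature 30s"
}
```

Later gates are never evaluated once an earlier one blocks, which is why the sketch returns from inside the loop.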

`RBACReadyChecker` type at L101: `func(ctx, *InfrastructurePackExecution) (bool, string, error)`. Production uses `isWrapperRunnerRBACReady`; test stub set via `r.RBACChecker` field (L107).

`findLatestPOR()` at L1162: lists all PackOperationResult CRs in namespace labeled with `packExecutionRef`, returns the one with highest `Spec.Revision`. Called at L466 to check completion status.
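
The selection rule described for `findLatestPOR()` amounts to a single pass that keeps the highest revision seen so far. The sketch below assumes a trimmed, hypothetical `por` struct and an `int64` revision; the real `PackOperationResult` type and its field types live in seam-core and are not shown here.

```go
package main

import "fmt"

// por is a hypothetical, trimmed stand-in for PackOperationResult.
type por struct {
	Name     string
	Revision int64 // assumed integer type; the real field type is not shown in this document
}

// latestPOR returns a pointer to the element with the highest Revision,
// or nil when the list is empty.
func latestPOR(items []por) *por {
	var latest *por
	for i := range items {
		// Index into the slice so the pointer refers to the slice element,
		// not to a loop variable copy.
		if latest == nil || items[i].Revision > latest.Revision {
			latest = &items[i]
		}
	}
	return latest
}

func main() {
	items := []por{{"por-1", 1}, {"por-3", 3}, {"por-2", 2}}
	if p := latestPOR(items); p != nil {
		fmt.Println(p.Name, p.Revision)
	}
}
```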

#### `clusterpack_reconciler.go`

`ClusterPackReconciler.Reconcile()` L67. Called on ClusterPack create/update.

`handleClusterPackDeletion()` L393: four steps, numbered 1, 2, 2.5, and 3 in the code comments:
1. L396: List all PackInstances cluster-wide, delete those where `spec.clusterPackRef == cp.Name`.
2. L415: List all PackExecutions cluster-wide, delete those where `spec.clusterPackRef.name == cp.Name`.
3. L434 (step 2.5 in code): Delete DriftSignal named `drift-{cp.Name}` in `seam-tenant-{clusterName}` for each target cluster.
4. L449 (step 3 in code): Remove finalizer `clusterPackFinalizer` so API server can delete the ClusterPack object.

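`handleRollback()` L306: merge-patches (`client.MergeFrom`) the ClusterPack spec back to the version and digests recorded on the target POR, resets `rollbackToRevision` to 0, and removes the spec-snapshot annotation. Normal reconcile then creates PackExecution for the rolled-back version.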

PackExecution creation (L230): for each cluster in `spec.targetClusters`, creates one PackExecution in `seam-tenant-{cluster}`. Skips if PackInstance with current version already exists (L243). Skips if PackExecution already exists (L258).
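
The two skip rules can be sketched as a planning function over plain maps. Everything here is illustrative: `planExecutions` and its map-based inputs are hypothetical (the real reconciler queries the API server), while the `seam-tenant-{cluster}` namespace and `{packName}-{targetCluster}` name follow the conventions stated in this document.

```go
package main

import "fmt"

// tenantNamespace follows the seam-tenant-{cluster} convention.
func tenantNamespace(cluster string) string { return "seam-tenant-" + cluster }

// packExecutionName follows the {packName}-{targetCluster} convention.
func packExecutionName(pack, cluster string) string { return pack + "-" + cluster }

// planExecutions returns the "namespace/name" keys of the PackExecutions that
// would be created. delivered maps cluster -> version recorded on an existing
// PackInstance; existing holds keys of PackExecutions that are already present.
func planExecutions(pack, version string, targets []string,
	delivered map[string]string, existing map[string]bool) []string {
	var toCreate []string
	for _, cluster := range targets {
		// Skip rule 1: a PackInstance at the current version already exists.
		if v, ok := delivered[cluster]; ok && v == version {
			continue
		}
		// Skip rule 2: a PackExecution already exists for this cluster.
		key := tenantNamespace(cluster) + "/" + packExecutionName(pack, cluster)
		if existing[key] {
			continue
		}
		toCreate = append(toCreate, key)
	}
	return toCreate
}

func main() {
	plan := planExecutions("nginx", "v2", []string{"a", "b", "c"},
		map[string]string{"a": "v2", "b": "v1"},
		map[string]bool{"seam-tenant-c/nginx-c": true})
	fmt.Println(plan)
}
```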

---

## 3. Primary Data Flows

**Pack deploy path**: ClusterPack created --> `ClusterPackReconciler` creates PackExecution in `seam-tenant-{cluster}` --> `PackExecutionReconciler` runs 6-gate check --> all gates pass --> Kueue Job (`pack-deploy`, `conductor-execute:dev` image) submitted --> conductor execute-mode `executeSplitPath()` applies RBAC + cluster-scoped + workload OCI layers --> writes PackOperationResult --> `PackExecutionReconciler` reads POR via `findLatestPOR()` L1162 --> creates PackInstance on management cluster.

**ClusterPack deletion path**: Finalizer prevents deletion --> `handleClusterPackDeletion()` L393 runs 3 steps (PackInstances, PackExecutions, DriftSignals) --> removes finalizer --> API server deletes ClusterPack object. Conductor `teardownOrphanedReceipt()` then cleans up deployed resources on the tenant cluster.

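**Pack rollback**: `spec.rollbackToRevision` set on ClusterPack --> `handleRollback()` L306 patches spec to the target POR's version and digests and resets `rollbackToRevision` to 0 in the same patch --> normal reconcile creates new PackExecution for rolled-back version. `clearRollbackField()` L378 runs only when the rollback cannot proceed (no PORs found, target revision not retained, or target POR has no version recorded).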

**Single-active-revision (POR)**: `conductor/internal/persistence/operationresult_writer.go` writes POR with `Revision` incremented. Predecessor labeled `ontai.dev/superseded=true`, retained max 10. `findLatestPOR()` L1162 selects highest revision.

---

## 4. PackExecution naming and supersession

PackExecution name: `{packName}-{targetCluster}`. PackInstance name: `{basePackName}-{targetCluster}`. Same base name enables supersession: when a newer ClusterPack version arrives, the existing PackInstance is replaced in-place (same name, new content) rather than creating a new object. This is the upgrade path.

---

## 5. Invariants

| ID | Rule | Location |
|----|------|----------|
| CP-INV-010 | Kueue is not used for any operation in platform. Pack-deploy Jobs are the only Kueue Jobs in wrapper. | `packexecution_reconciler.go` |
| Decision G | Wrapper has no own CRD type definitions | `api/v1alpha1/.gitkeep` |

---

## 6. Open Items

**PLATFORM-BL-WRAPPER-RUNNER-RBAC-LIFECYCLE (platform)**: `ensureWrapperRunnerResources()` in `platform/internal/controller/taloscluster_helpers.go` creates wrapper-runner SA/Role/RoleBinding/ClusterRoleBinding at tenant onboarding. `handleTalosClusterDeletion()` does NOT delete `ClusterRoleBinding wrapper-runner-{cluster}`. This is a platform open item, not a wrapper open item.

**CLUSTERPACK-BL-VERSION-CLEANUP (conductor)**: `DeployedResources` field exists in `InfrastructurePackReceiptSpec` at `seam-core/api/v1alpha1/packreceipt_types.go:74`. When PackInstance version N+1 replaces N, resources present in N's PackReceipt but absent from N+1's manifests are NOT cleaned up. Version-upgrade orphan diff is absent from `conductor/internal/agent/packinstance_pull_loop.go`. No schema addition needed; only implementation missing.
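
For illustration only, the missing orphan diff reduces to a set difference between the resources recorded for version N and those in version N+1. The `resourceRef` struct below is a hypothetical identity key; the actual element type of `DeployedResources` is defined in seam-core and is not shown in this document, and nothing here reflects existing conductor code.

```go
package main

import "fmt"

// resourceRef is a hypothetical identity for a deployed resource.
// The real DeployedResources element type may carry different fields.
type resourceRef struct {
	APIVersion string
	Kind       string
	Namespace  string
	Name       string
}

// orphans returns the entries of previous that are absent from next,
// preserving the order in which they appear in previous.
func orphans(previous, next []resourceRef) []resourceRef {
	keep := make(map[resourceRef]struct{}, len(next))
	for _, r := range next {
		keep[r] = struct{}{}
	}
	var out []resourceRef
	for _, r := range previous {
		if _, ok := keep[r]; !ok {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	prev := []resourceRef{
		{"v1", "ConfigMap", "app", "settings"},
		{"v1", "Service", "app", "legacy"},
	}
	next := []resourceRef{
		{"v1", "ConfigMap", "app", "settings"},
	}
	for _, r := range orphans(prev, next) {
		fmt.Printf("%s %s/%s\n", r.Kind, r.Namespace, r.Name)
	}
}
```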

---

## 7. Test Contract

| Package | Coverage |
|---------|----------|
| `test/unit/controller` | PackExecutionReconciler (all 6 gates, POR revision selection), ClusterPackReconciler (deletion, rollback) |
| `test/e2e` | Stub files; all skip when `MGMT_KUBECONFIG` absent; skip reasons reference backlog item IDs |
27 changes: 23 additions & 4 deletions docs/wrapper-schema.md
@@ -310,10 +310,29 @@ Stateful defaults (require explicit human approval to override):

### 6.2 Rollback

PackExecution referencing a previous ClusterPack version. The previous version
must still be Available and not Revoked. Signing was already performed when the
version was first applied. Same diff engine and execution order apply. No special
reverse logic.
Rollback to any retained historical revision is triggered by setting
`spec.rollbackToRevision` on the ClusterPack CR to the target POR revision number.
N-step rollback is supported: any revision retained in the superseded POR history
is reachable in one operation.

**Mechanism:**

The POR writer retains superseded PORs by labeling them `ontai.dev/superseded=true`
instead of deleting them. Each superseded POR carries its original `clusterPackVersion`,
`rbacDigest`, and `workloadDigest` fields unchanged. Up to 10 superseded PORs are
retained per ClusterPack; the oldest (lowest revision) is pruned when the cap is exceeded.
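
The retention rule can be sketched as a sort followed by a cut. This is an assumption-laden illustration: `pruneOldest` is a hypothetical helper operating on bare revision numbers, and it prunes however many entries exceed the cap in one call, whereas the POR writer may prune one at a time.

```go
package main

import (
	"fmt"
	"sort"
)

// maxSuperseded mirrors the cap of 10 retained superseded PORs stated above.
const maxSuperseded = 10

// pruneOldest splits superseded revision numbers into the ones to keep and
// the ones to delete, dropping the lowest revisions first. Both results are
// sorted ascending; the input slice is not modified.
func pruneOldest(revisions []int64, limit int) (keep, prune []int64) {
	if limit < 0 {
		limit = 0
	}
	sorted := append([]int64(nil), revisions...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	if len(sorted) <= limit {
		return sorted, nil
	}
	cut := len(sorted) - limit
	return sorted[cut:], sorted[:cut]
}

func main() {
	keep, prune := pruneOldest([]int64{5, 1, 12, 3, 2, 4, 6, 7, 8, 9, 10, 11}, maxSuperseded)
	fmt.Println("keep:", keep)
	fmt.Println("prune:", prune)
}
```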

When `spec.rollbackToRevision > 0`, the ClusterPackReconciler:
1. Lists ALL PORs labeled `ontai.dev/cluster-pack={cp.Name}` in the ClusterPack namespace (both active and superseded).
2. Finds the POR where `spec.revision == rollbackToRevision`. If not found, clears the field without patching spec.
3. Reads `clusterPackVersion`, `rbacDigest`, `workloadDigest` directly from that POR. If `clusterPackVersion` is empty, clears the field without patching spec.
4. Patches `ClusterPack.spec`: sets `version`, `rbacDigest`, `workloadDigest` to the target values; clears `rollbackToRevision`.
5. Removes the `infrastructure.ontai.dev/spec-checksum-snapshot` annotation so the immutability check re-records the rolled-back state on the next reconcile pass.
6. Returns -- the version change triggers normal PackExecution creation.

The resulting PackExecution runs a normal pack-deploy Job against the target OCI
layer digests. The new POR records `upgradeDirection=Rollback`. The PackReceipt on
the tenant cluster is overwritten with the target version's resource inventory.
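
The rollback decision described in this section can be sketched without any Kubernetes machinery. The `por` and `packSpec` structs and the `applyRollback` helper below are hypothetical, trimmed stand-ins for the seam-core types and the reconciler logic; the annotation removal and the API patch are left out.

```go
package main

import "fmt"

// por is a hypothetical, trimmed stand-in for PackOperationResult.spec.
type por struct {
	Revision           int64
	ClusterPackVersion string
	RBACDigest         string
	WorkloadDigest     string
}

// packSpec is a hypothetical, trimmed stand-in for ClusterPack.spec.
type packSpec struct {
	Version            string
	RBACDigest         string
	WorkloadDigest     string
	RollbackToRevision int64
}

// applyRollback returns the spec after processing a rollback request and
// reports whether version and digests were changed. RollbackToRevision is
// cleared in every case, matching the "clear the field" behavior when the
// target revision is missing or has no version recorded.
func applyRollback(spec packSpec, history []por) (packSpec, bool) {
	target := spec.RollbackToRevision
	spec.RollbackToRevision = 0
	if target <= 0 {
		return spec, false
	}
	for _, p := range history {
		if p.Revision != target {
			continue
		}
		if p.ClusterPackVersion == "" {
			return spec, false
		}
		spec.Version = p.ClusterPackVersion
		spec.RBACDigest = p.RBACDigest
		spec.WorkloadDigest = p.WorkloadDigest
		return spec, true
	}
	return spec, false
}

func main() {
	history := []por{{1, "v1", "r1", "w1"}, {2, "v2", "r2", "w2"}}
	spec := packSpec{Version: "v3", RBACDigest: "r3", WorkloadDigest: "w3", RollbackToRevision: 1}
	got, ok := applyRollback(spec, history)
	fmt.Printf("%+v %v\n", got, ok)
}
```

Because the version changes in the returned spec, the normal reconcile path is what creates the new PackExecution; the sketch deliberately does nothing beyond the spec update.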

---

112 changes: 112 additions & 0 deletions internal/controller/clusterpack_reconciler.go
@@ -95,6 +95,15 @@ func (r *ClusterPackReconciler) Reconcile(ctx context.Context, req ctrl.Request)
		// Falling through ensures status conditions are set in this reconcile pass.
	}

	// Step A2: Rollback check. Evaluated before the spec-snapshot annotation so that
	// a Governor-authorized rollback (spec.rollbackToRevision > 0) can patch the spec
	// and clear the annotation without triggering the immutability gate. On the next
	// reconcile pass the annotation is absent, gets re-recorded from the rolled-back
	// spec, and normal reconciliation proceeds. wrapper-schema.md §6.2.
	if cp.Spec.RollbackToRevision > 0 {
		return r.handleRollback(ctx, cp)
	}

	// Step B: Record spec snapshot annotation on first reconcile, BEFORE setting
	// up the deferred status patch. This must happen first because calling
	// r.Client.Patch() after the status patch setup would overwrite the in-memory
@@ -287,6 +296,94 @@ func (r *ClusterPackReconciler) Reconcile(ctx context.Context, req ctrl.Request)
	return ctrl.Result{}, nil
}

// handleRollback processes a Governor-initiated rollback request (spec.rollbackToRevision > 0).
// It lists ALL PORs for this ClusterPack (both active and superseded), finds the one at
// spec.rollbackToRevision, reads its version/digest fields, patches the ClusterPack spec
// back to that artifact state, clears the spec-snapshot annotation (so the immutability check
// re-records the rolled-back state on the next reconcile), and clears rollbackToRevision.
// N-step rollback: any prior retained revision is reachable in one operation.
// wrapper-schema.md §6.2. seam-core-schema.md §7.8.
func (r *ClusterPackReconciler) handleRollback(ctx context.Context, cp *seamcorev1alpha1.InfrastructureClusterPack) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	targetRevision := cp.Spec.RollbackToRevision

	// List ALL PORs for this ClusterPack: active and superseded.
	porList := &seamcorev1alpha1.PackOperationResultList{}
	if err := r.Client.List(ctx, porList,
		client.InNamespace(cp.Namespace),
		client.MatchingLabels{"ontai.dev/cluster-pack": cp.Name},
	); err != nil {
		return ctrl.Result{}, fmt.Errorf("handleRollback: list PORs for %s: %w", cp.Name, err)
	}

	if len(porList.Items) == 0 {
		logger.Info("handleRollback: no POR found for ClusterPack; cannot roll back, clearing field",
			"clusterPack", cp.Name, "namespace", cp.Namespace)
		return r.clearRollbackField(ctx, cp)
	}

	// Find the POR at exactly the requested revision.
	var targetPOR *seamcorev1alpha1.PackOperationResult
	for i := range porList.Items {
		if porList.Items[i].Spec.Revision == targetRevision {
			targetPOR = &porList.Items[i]
			break
		}
	}

	if targetPOR == nil {
		logger.Info("handleRollback: target revision not found in retained POR history; clearing field",
			"clusterPack", cp.Name, "requestedRevision", targetRevision)
		return r.clearRollbackField(ctx, cp)
	}

	targetVersion := targetPOR.Spec.ClusterPackVersion
	targetRBACDigest := targetPOR.Spec.RBACDigest
	targetWorkloadDigest := targetPOR.Spec.WorkloadDigest

	if targetVersion == "" {
		logger.Info("handleRollback: target POR has no version recorded; clearing field",
			"clusterPack", cp.Name, "targetRevision", targetRevision)
		return r.clearRollbackField(ctx, cp)
	}

	// Capture the current version before the spec is mutated so that the log
	// line and the event below report the real "from" version.
	fromVersion := cp.Spec.Version

	logger.Info("handleRollback: rolling back ClusterPack",
		"clusterPack", cp.Name, "fromVersion", fromVersion,
		"toVersion", targetVersion, "targetRevision", targetRevision)

	// Patch spec back to target version + digests, clear rollbackToRevision,
	// and remove the spec-snapshot annotation so the immutability check re-records
	// the rolled-back state on the next reconcile pass.
	const specSnapshotAnnotation = "infrastructure.ontai.dev/spec-checksum-snapshot"
	patch := client.MergeFrom(cp.DeepCopy())
	cp.Spec.Version = targetVersion
	cp.Spec.RBACDigest = targetRBACDigest
	cp.Spec.WorkloadDigest = targetWorkloadDigest
	cp.Spec.RollbackToRevision = 0
	if cp.Annotations != nil {
		delete(cp.Annotations, specSnapshotAnnotation)
	}
	if err := r.Client.Patch(ctx, cp, patch); err != nil {
		return ctrl.Result{}, fmt.Errorf("handleRollback: patch ClusterPack spec: %w", err)
	}

	// Use fromVersion here: cp.Spec.Version already holds targetVersion at this point.
	r.Recorder.Eventf(cp, nil, corev1.EventTypeNormal, "RollbackApplied", "RollbackApplied",
		"ClusterPack rolled back from %s to %s (POR revision %d).", fromVersion, targetVersion, targetRevision)
	logger.Info("handleRollback: spec patched, normal reconcile will create PackExecution for rolled-back version",
		"clusterPack", cp.Name, "version", targetVersion)
	return ctrl.Result{}, nil
}

// clearRollbackField resets spec.rollbackToRevision to 0 when rollback cannot proceed.
func (r *ClusterPackReconciler) clearRollbackField(ctx context.Context, cp *seamcorev1alpha1.InfrastructureClusterPack) (ctrl.Result, error) {
	patch := client.MergeFrom(cp.DeepCopy())
	cp.Spec.RollbackToRevision = 0
	if err := r.Client.Patch(ctx, cp, patch); err != nil {
		return ctrl.Result{}, fmt.Errorf("clearRollbackField: patch ClusterPack: %w", err)
	}
	return ctrl.Result{}, nil
}

// handleClusterPackDeletion runs the cleanup sequence for a ClusterPack that has
// a non-zero DeletionTimestamp:
// 1. Delete all PackInstances whose spec.clusterPackRef matches cp.Name.
@@ -334,6 +431,21 @@ func (r *ClusterPackReconciler) handleClusterPackDeletion(ctx context.Context, c
			"packExecution", pe.Name, "namespace", pe.Namespace, "clusterPack", cp.Name)
	}

	// 2.5. Delete DriftSignals for each target cluster.
	// Convention: DriftSignal name = "drift-{cp.Name}", namespace = "seam-tenant-{clusterName}".
	for _, clusterName := range cp.Spec.TargetClusters {
		tenantNS := "seam-tenant-" + clusterName
		signalName := "drift-" + cp.Name
		signal := &seamcorev1alpha1.DriftSignal{
			ObjectMeta: metav1.ObjectMeta{Name: signalName, Namespace: tenantNS},
		}
		err := r.Client.Delete(ctx, signal)
		if apierrors.IsNotFound(err) {
			// Already gone: nothing to clean up, and nothing was deleted to log.
			continue
		}
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("delete DriftSignal %s/%s: %w", tenantNS, signalName, err)
		}
		logger.Info("deleted DriftSignal during ClusterPack cleanup",
			"driftSignal", signalName, "namespace", tenantNS, "clusterPack", cp.Name)
	}

	// 3. Remove the finalizer so the API server can delete the ClusterPack object.
	cp.Finalizers = removeString(cp.Finalizers, clusterPackFinalizer)
	if err := r.Client.Update(ctx, cp); err != nil {