feat(guardian): RECON tracks C/F/G/H/I/J -- KUBEBUILDER guard, failurePolicy split, lineage archiver, scoped OperatorContext #22
Merged
… seam-core
Adds replace directive for github.com/ontai-dev/platform. Updates cluster_rbacpolicy_controller to use platformseamv1alpha1.TalosCluster from platform/api/seam/v1alpha1. Registers platform scheme in main.go. All unit tests updated to use the platform types.
Replace seam-core -> seam in go.mod replace/require. Update all Go import paths from github.com/ontai-dev/seam-core/ to github.com/ontai-dev/seam/. Add seam-sdk replace + require.
Replace ../seam-core with ../seam following the seam-core -> seam filesystem rename. Module path github.com/ontai-dev/seam was already updated in Phase 4; this aligns the local path pointer.
… Guardian singleton
- Remove Guardian singleton CRD and types (Guardian, GuardianSpec, GuardianStatus, GuardianList)
- Remove setCNPGCondition and simplify RunWithRetry to 2-arg form (no kube client)
- Remove Scheme/Recorder/OperatorNamespace from BootstrapController; replace Guardian CR condition writes with in-memory WebhookModeGate and NamespaceEnforcementRegistry
- Fix one-way ratchet: return early after Initialising->ObserveOnly transition so Enforcing check only runs in the next reconcile
- Rename 7 CRD YAML files from security.ontai.dev_*.yaml to guardian.ontai.dev_*.yaml
- Update groupversion_info.go: group annotation and GroupVersion.Group to guardian.ontai.dev
- Update all finalizer constants, kubebuilder markers, GVR/GVK Group fields, apiVersion strings in unstructured objects across all guardian packages
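The one-way ratchet fix can be pictured with a small sketch. This is not the BootstrapController code: the `Mode` type, the `gate` struct, and the two boolean inputs are hypothetical stand-ins, and the real transition conditions are not stated in this commit. It only shows the early return that keeps the Enforcing check out of the reconcile that performed Initialising->ObserveOnly.

```go
package main

import "fmt"

// Mode is a hypothetical enum for the webhook mode; names follow the commit text.
type Mode int

const (
	Initialising Mode = iota
	ObserveOnly
	Enforcing
)

// gate is a hypothetical in-memory holder for the current mode.
type gate struct{ mode Mode }

// step performs at most one transition per call and never moves backwards.
// bootstrapDone and enforceReady are assumed inputs, not real Guardian signals.
func (g *gate) step(bootstrapDone, enforceReady bool) {
	switch g.mode {
	case Initialising:
		if bootstrapDone {
			g.mode = ObserveOnly
		}
		// Early return: the Enforcing check is only reached on the next call.
		return
	case ObserveOnly:
		if enforceReady {
			g.mode = Enforcing
		}
	}
}

func main() {
	g := &gate{}
	g.step(true, true)
	fmt.Println("after first reconcile, ObserveOnly:", g.mode == ObserveOnly)
	g.step(true, true)
	fmt.Println("after second reconcile, Enforcing:", g.mode == Enforcing)
}
```

Even when both conditions already hold, the gate spends one full reconcile in ObserveOnly before it can enforce.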
Fresh documentation from current codebase. security.ontai.dev replaced with guardian.ontai.dev throughout. Guardian singleton CR removed -- Deployment readiness is the health signal. security-system namespace removed. seam-core references replaced with seam. wrapper replaced with dispatcher. LineageRecord and SeamMembership removed from guardian CRD table (owned by seam).
…gs and tests; fix runner.ontai.dev -> seam.ontai.dev in epg_controller; add seam-sdk/platform checkouts to CI
…ions; drop unused metav1 import
Extends MigrationRunner with 6 new migrations (006-011) that create the domain memory tables on management CNPG: governance_events, identity_resolution_events, snapshot_distribution_events, receipt_events, lineage_archive, lineage_sdns (with FK to lineage_archive). All migrations are idempotent (CREATE TABLE IF NOT EXISTS). T-WI4-8 complete.
- domain_memory.go: nil-safe write helpers for all four event types
- IdentityBindingReconciler: writes governance_events on every reconcile, identity_resolution_events on trust anchor resolution
- IdentityProviderReconciler: writes governance_events on every reconcile, identity_resolution_events when validation passes
- EPGReconciler: writes snapshot_distribution_events per PermissionSnapshot upsert
- TenantSnapshotRunnable: writes receipt_events on new PermissionSnapshotReceipt creation
- DomainMemoryWriter field is optional (nil-safe); failures are discarded to never block reconciliation
- clusterFromNamespaceDM derives cluster context from seam-tenant-{cluster} namespace prefix
- Unit tests: domain_memory_test.go covers all four write paths + nil-writer no-panic
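A minimal sketch of the two ideas in the list above: a nil-safe write helper that discards failures, and deriving the cluster name from a `seam-tenant-{cluster}` namespace. The `EventWriter` interface, `writeGovernance`, and `clusterFromNamespace` are hypothetical names chosen for illustration; the actual helpers in domain_memory.go and the signature of clusterFromNamespaceDM may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// EventWriter is a hypothetical stand-in for the optional DomainMemoryWriter field.
type EventWriter interface {
	WriteGovernanceEvent(cluster, reason string) error
}

// writeGovernance is nil-safe: with no writer configured it does nothing,
// and a write failure is discarded so reconciliation is never blocked.
func writeGovernance(w EventWriter, cluster, reason string) {
	if w == nil {
		return
	}
	_ = w.WriteGovernanceEvent(cluster, reason)
}

const tenantPrefix = "seam-tenant-"

// clusterFromNamespace returns the {cluster} part of seam-tenant-{cluster}.
// ok is false for namespaces that do not carry the prefix or have an empty suffix.
func clusterFromNamespace(ns string) (cluster string, ok bool) {
	if !strings.HasPrefix(ns, tenantPrefix) {
		return "", false
	}
	cluster = strings.TrimPrefix(ns, tenantPrefix)
	return cluster, cluster != ""
}

// failingWriter always errors, to show that failures are swallowed.
type failingWriter struct{}

func (failingWriter) WriteGovernanceEvent(string, string) error {
	return errors.New("database unavailable")
}

func main() {
	cluster, ok := clusterFromNamespace("seam-tenant-prod-eu")
	fmt.Println(cluster, ok)

	writeGovernance(nil, cluster, "reconciled")             // no writer: no-op
	writeGovernance(failingWriter{}, cluster, "reconciled") // error is discarded
}
```

One caveat of the interface-based nil check: a typed nil pointer stored in the interface is not equal to `nil`, so a pointer-receiver implementation would need its own nil guard.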
…t reader RBAC
Guardian management bootstrap now provisions ClusterRole + ClusterRoleBinding for the seam LineageController SA (system:serviceaccount:seam-system:lineage-controller) to get/list/watch PermissionSnapshots across all namespaces. ensureSeamLineageControllerRBAC uses SSA (ForceOwnership, field-owner=guardian) and is called from BootstrapAnnotationRunnable.Start after createThirdPartyProfiles, gated on management role (ManagementClusterName != ""). INV-004: Guardian provisions; seam does not self-grant. lineage_controller_rbac_test.go: 3 unit tests.
Implements guardian domain memory persistence for LineageRecord CRs:
- LineageArchiveStore interface + InsertLineageArchive() on SQLAuditStore writes to migration #10 lineage_archive table
- LazyLineageArchiveStore mirrors LazyAuditDatabase lazy-init pattern for startup sequencing
- LineageArchiver polls seam.ontai.dev/v1alpha1/lineagerecords cluster-wide every 60s (LINEAGE_ARCHIVE_INTERVAL env), archives new/changed records tracked by resourceVersion
- cnpgStartupRunnable.Start() calls lazyLineage.Set() then starts LineageArchiver goroutine after CNPG connects
- Unit tests: 5 database + 6 archiver tests all green
Unblocks TC-MC-16 (lineage_archive row verification).
…rootRef
LineageRecord spec uses rootBinding.rootKind/rootName/rootNamespace. Prior implementation used a non-existent spec.lineage.rootRef path, leaving root_cr_kind/name/namespace empty in lineage_archive rows. Fixes TC-MC-16 row content.
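To illustrate why the wrong path produced empty columns, here is a sketch that reads nested string fields from an unstructured object represented as `map[string]any`. The `nestedString` helper and the sample values are hypothetical; real code would more likely use the apimachinery unstructured helpers. A lookup on a missing path returns an empty string, which is what ended up in the rows.

```go
package main

import "fmt"

// nestedString walks obj along path and returns the string found there.
// ok is false when any segment is missing or the leaf is not a string.
func nestedString(obj map[string]any, path ...string) (string, bool) {
	var cur any = obj
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		cur, ok = m[key]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

func main() {
	// Sample record; field values are illustrative only.
	record := map[string]any{
		"spec": map[string]any{
			"rootBinding": map[string]any{
				"rootKind":      "TalosCluster",
				"rootName":      "cluster-a",
				"rootNamespace": "seam-tenant-cluster-a",
			},
		},
	}

	kind, ok := nestedString(record, "spec", "rootBinding", "rootKind")
	fmt.Printf("%q %v\n", kind, ok)

	// The old, non-existent path yields an empty value.
	old, ok := nestedString(record, "spec", "lineage", "rootRef")
	fmt.Printf("%q %v\n", old, ok)
}
```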
…ne (feature/post-migration-wis)
…async archive writes
RECON-G1: New OperatorContextConflictHandler (ValidatingAdmissionHandler) rejects OperatorContext CREATE/UPDATE when the incoming CR's scope overlaps at the same specificity level as an existing CR in the same namespace. Specificity: 2=exact clusterRefs, 1=clusterRoles, 0=global. Overlap at equal specificity with shared cluster or both-global triggers denial. Self-update (same name) is excluded. List failure admits rather than blocking all OperatorContext ops during API unavailability. Registered via new RegisterOperatorContextConflict method on AdmissionWebhookServer. New ValidatingWebhookConfiguration (failurePolicy=Fail) at /validate-operator-context. 6 unit tests covering all allow/deny permutations including self-update and different-specificity coexistence.
RECON-F4: Decouple LineageArchiver DB writes from poll loop via buffered event channel (capacity=500) drained by a pool of 3 writer goroutines. Poll goroutine never blocks on Postgres. Channel saturation drops events (logged) rather than stalling the watcher. Absorbs upgrade waves of ~100 clusters with 5 records each without blocking.
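The RECON-G1 overlap rule can be sketched as a pure function. The `Scope` struct and its field names are hypothetical simplifications of the OperatorContext spec, and treating "shared cluster" at specificity 1 as a shared clusterRoles entry is an assumption; the commit text does not spell that case out.

```go
package main

import "fmt"

// Scope is a hypothetical, flattened view of an OperatorContext's scoping fields.
type Scope struct {
	Name         string
	ClusterRefs  []string // exact cluster references (specificity 2)
	ClusterRoles []string // cluster roles (specificity 1)
}

// specificity: 2 = exact clusterRefs, 1 = clusterRoles, 0 = global.
func specificity(s Scope) int {
	switch {
	case len(s.ClusterRefs) > 0:
		return 2
	case len(s.ClusterRoles) > 0:
		return 1
	default:
		return 0
	}
}

// shares reports whether a and b have at least one element in common.
func shares(a, b []string) bool {
	set := make(map[string]struct{}, len(a))
	for _, v := range a {
		set[v] = struct{}{}
	}
	for _, v := range b {
		if _, ok := set[v]; ok {
			return true
		}
	}
	return false
}

// conflicts reports whether incoming must be denied because of existing.
func conflicts(incoming, existing Scope) bool {
	if incoming.Name == existing.Name {
		return false // self-update is excluded
	}
	level := specificity(incoming)
	if level != specificity(existing) {
		return false // different specificity levels may coexist
	}
	switch level {
	case 2:
		return shares(incoming.ClusterRefs, existing.ClusterRefs)
	case 1:
		return shares(incoming.ClusterRoles, existing.ClusterRoles) // assumed rule
	default:
		return true // both global
	}
}

func main() {
	a := Scope{Name: "a", ClusterRefs: []string{"c1", "c2"}}
	b := Scope{Name: "b", ClusterRefs: []string{"c2"}}
	g := Scope{Name: "g"}
	fmt.Println(conflicts(b, a), conflicts(g, a), conflicts(a, a))
}
```

A handler built on this would list existing OperatorContexts in the namespace and deny on the first `conflicts` hit, admitting when the list call itself fails.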
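The RECON-F4 decoupling follows a common Go pattern: a non-blocking send on a buffered channel plus a fixed pool of consumers. The sketch below uses the capacity (500) and pool size (3) from the commit text; `archiveEvent`, `tryEnqueue`, and `startWriters` are hypothetical names, and the write callback stands in for the Postgres insert.

```go
package main

import (
	"fmt"
	"sync"
)

// archiveEvent is a hypothetical payload handed from the poll loop to the writers.
type archiveEvent struct{ Key string }

// tryEnqueue never blocks: it returns false when the buffer is full,
// which the caller is expected to log as a dropped event.
func tryEnqueue(queue chan<- archiveEvent, ev archiveEvent) bool {
	select {
	case queue <- ev:
		return true
	default:
		return false
	}
}

// startWriters launches n goroutines that drain queue until it is closed.
func startWriters(n int, queue <-chan archiveEvent, write func(archiveEvent)) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range queue {
				write(ev)
			}
		}()
	}
	return wg
}

func main() {
	queue := make(chan archiveEvent, 500)

	var mu sync.Mutex
	written := 0
	wg := startWriters(3, queue, func(archiveEvent) {
		mu.Lock()
		written++ // stand-in for the database insert
		mu.Unlock()
	})

	dropped := 0
	for i := 0; i < 1000; i++ {
		if !tryEnqueue(queue, archiveEvent{Key: fmt.Sprintf("rec-%d", i)}) {
			dropped++
		}
	}
	close(queue)
	wg.Wait()
	fmt.Printf("written=%d dropped=%d\n", written, dropped)
}
```

The trade-off is explicit: a dropped event is lost until the next poll notices the record again, in exchange for a poll loop that can never stall behind a slow database.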
…tConflict wiring
RECON-I2: Split webhook failurePolicy strategy:
- guardian-rbac-webhook (validating-webhook-configuration.yaml): failurePolicy=Ignore. RBAC admission checks are liveness-critical; Guardian restart must not block operators from creating RBAC resources. The bootstrap RBAC window (INV-020) closes permanently after first startup; subsequent restart windows are short and acceptable at Ignore.
- guardian-lineage-immutability-webhook: failurePolicy=Fail (unchanged) -- lineage immutability violations corrupt causal history.
- guardian-operator-context-webhook: failurePolicy=Fail (unchanged) -- unchecked OperatorContext conflicts corrupt governance state.
- guardian-operator-cr-webhook: failurePolicy=Fail (unchanged) -- operator CR guard is safety-critical.
Wire RegisterOperatorContextConflict in main.go: creates a dedicated local dynamic client from mgr.GetConfig() and registers the conflict handler at /validate-operator-context. Called after all other webhook registrations.
4 new unit tests:
- TestWebhookPaths_RBACAndOperatorContextAreDistinct: path separation required for independent WebhookConfiguration failurePolicy assignment.
- TestWebhookPaths_AllDistinct: all 7 guardian webhook paths are unique.
- TestRBACWebhookConfig_FailurePolicyIsIgnore: reads YAML and verifies Ignore.
- TestOperatorContextWebhookConfig_FailurePolicyIsFail: reads YAML and verifies Fail.
…inaries
All four envtest suites (controller, webhook, lineage, epg) now check KUBEBUILDER_ASSETS at TestMain startup and call os.Exit(0) when absent, matching the pattern already established in the platform integration suite. BinaryAssetsDirectory is set from the env var so setup-envtest paths are honoured without falling back to the /usr/local/kubebuilder default.
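The guard described above might look roughly like the following. To keep the sketch free of external dependencies it is written as a standalone program rather than a real TestMain: `suiteExitCode` and its injected `getenv` parameter are hypothetical, and the comment marks where an actual suite would set `BinaryAssetsDirectory` on the envtest environment and call `m.Run()`.

```go
package main

import (
	"fmt"
	"os"
)

// assetsDir returns the envtest binary directory from KUBEBUILDER_ASSETS.
// getenv is injected so the lookup can be exercised without touching the process env.
func assetsDir(getenv func(string) string) (string, bool) {
	dir := getenv("KUBEBUILDER_ASSETS")
	return dir, dir != ""
}

// suiteExitCode mirrors the TestMain decision: exit code 0 (skip) when the
// binaries are absent, otherwise whatever the suite run returns.
func suiteExitCode(getenv func(string) string, run func(binDir string) int) int {
	dir, ok := assetsDir(getenv)
	if !ok {
		fmt.Fprintln(os.Stderr, "KUBEBUILDER_ASSETS not set; skipping envtest suite")
		return 0
	}
	return run(dir)
}

func main() {
	code := suiteExitCode(os.Getenv, func(binDir string) int {
		// A real TestMain would build the envtest environment here with
		// BinaryAssetsDirectory set to binDir, start it, and return m.Run().
		fmt.Println("would run suite with binaries from", binDir)
		return 0
	})
	os.Exit(code)
}
```

Exiting with 0 makes the suite a silent skip on machines without the control-plane binaries, so a green run without KUBEBUILDER_ASSETS says nothing about the integration tests themselves.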
Summary
Test plan
go test ./... passes (all unit + integration suites)