42 changes: 42 additions & 0 deletions charts/operator/crds/platform.nanohype.dev_platforms.yaml
@@ -63,6 +63,43 @@ spec:
              hosting one or more AgentFleets, with its own budget, identity, and
              guardrails.
            properties:
              attribution:
                description: |-
                  Attribution opts the Platform into per-session human attribution. When
                  set, the operator provisions a session role — assumable by the tenant
                  IRSA role with the operator carried as STS SourceIdentity, scoped to the
                  tenant baseline (Bedrock invoke) and NOT broad sts:AssumeRole — plus a
                  ClusterRole letting the tenant ServiceAccount impersonate the named
                  operators at the apiserver. fab's role-session entrypoint consumes both,
                  so an agent's AWS + Kubernetes actions attribute to a named human.
                  nil = unattributed (the default).
                properties:
                  operators:
                    description: |-
                      Operators is the set of human identities (e.g. email addresses) a
                      session in this Platform may act as. Each value becomes both an allowed
                      STS SourceIdentity on the session role's trust policy and a resourceNames
                      entry on the impersonate ClusterRole, so the SAME string binds the AWS
                      and Kubernetes audit records. Use a canonical form (a lowercased email);
                      it must byte-match the operator's own RBAC subject name.
                    items:
                      type: string
                    minItems: 1
                    type: array
                  sessionRoleMaxDurationSeconds:
                    default: 3600
                    description: |-
                      SessionRoleMaxDurationSeconds caps the assumed session lifetime. Because
                      the caller is the tenant IRSA role, AWS STS role chaining hard-caps a
                      chained session at 3600s regardless of this value; larger values only
                      matter if the caller ever changes. Defaults to 3600.
                    format: int32
                    maximum: 43200
                    minimum: 900
                    type: integer
                required:
                - operators
                type: object
              budget:
                description: Budget references a BudgetPolicy CR in the same namespace.
                properties:
@@ -230,6 +267,11 @@ spec:
              phase:
                description: 'Phase: Pending, Provisioning, Ready, Suspended, Failed.'
                type: string
              sessionRoleArn:
                description: |-
                  SessionRoleArn is the per-Platform attribution session role, created when
                  spec.attribution is set. Empty when attribution is off.
                type: string
              suspendedAt:
                description: |-
                  SuspendedAt is the timestamp at which the kill-switch fired. When
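
The `operators` description says each value becomes an allowed STS SourceIdentity on the session role's trust policy. A minimal sketch of what such a trust policy document could look like, assuming the operator allows `sts:AssumeRole` plus `sts:SetSourceIdentity` for the tenant IRSA role and pins `sts:SourceIdentity` with a `StringEquals` condition. The helper `sessionTrustPolicy`, the local types, and the ARN are hypothetical and are not taken from this PR; only the IAM action and condition key names are real.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policyDocument mirror the IAM policy JSON grammar. They are
// local illustrative types, not types from the operator or the AWS SDK.
type statement struct {
	Effect    string                         `json:"Effect"`
	Principal map[string]string              `json:"Principal"`
	Action    []string                       `json:"Action"`
	Condition map[string]map[string][]string `json:"Condition,omitempty"`
}

type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// sessionTrustPolicy (hypothetical) builds a trust policy that lets the
// tenant role assume the session role only when the SourceIdentity it sets
// is one of the listed operators.
func sessionTrustPolicy(tenantRoleARN string, operators []string) policyDocument {
	ids := append([]string(nil), operators...) // copy so callers cannot alias the policy
	return policyDocument{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect:    "Allow",
			Principal: map[string]string{"AWS": tenantRoleARN},
			Action:    []string{"sts:AssumeRole", "sts:SetSourceIdentity"},
			Condition: map[string]map[string][]string{
				"StringEquals": {"sts:SourceIdentity": ids},
			},
		}},
	}
}

func main() {
	doc := sessionTrustPolicy(
		"arn:aws:iam::123456789012:role/example-tenant", // placeholder ARN
		[]string{"alice@example.com", "bob@example.com"},
	)
	out, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the condition compares strings exactly, a differently cased or padded identity would be rejected, which is consistent with the description's advice to use one canonical form.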
20 changes: 20 additions & 0 deletions docs/crd-reference/v1alpha1.md
@@ -621,6 +621,24 @@ Package v1alpha1 contains API Schema definitions for the platform v1alpha1 API g



#### AttributionSpec



AttributionSpec configures per-session human attribution for a Platform. See
github.com/nanohype/fab docs/attribution.md for the consumer side.



_Appears in:_
- [PlatformSpec](#platformspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `operators` _string array_ | Operators is the set of human identities (e.g. email addresses) a<br />session in this Platform may act as. Each value becomes both an allowed<br />STS SourceIdentity on the session role's trust policy and a resourceNames<br />entry on the impersonate ClusterRole, so the SAME string binds the AWS<br />and Kubernetes audit records. Use a canonical form (a lowercased email);<br />it must byte-match the operator's own RBAC subject name. | | MinItems: 1 <br /> |
| `sessionRoleMaxDurationSeconds` _integer_ | SessionRoleMaxDurationSeconds caps the assumed session lifetime. Because<br />the caller is the tenant IRSA role, AWS STS role chaining hard-caps a<br />chained session at 3600s regardless of this value; larger values only<br />matter if the caller ever changes. Defaults to 3600. | 3600 | Maximum: 43200 <br />Minimum: 900 <br />Optional: \{\} <br /> |
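
The `operators` field asks for a canonical form (a lowercased email) that byte-matches the RBAC subject name. A minimal sketch of one way to canonicalize a list before writing it into the spec; `canonicalOperators` is a hypothetical helper, not part of this PR, and the same normalization would have to be applied wherever the subject name is produced for the byte-match to hold.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonicalOperators (hypothetical) trims whitespace, lowercases each
// identity, drops empty and duplicate entries, and returns a sorted list.
func canonicalOperators(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := make([]string, 0, len(in))
	for _, raw := range in {
		id := strings.ToLower(strings.TrimSpace(raw))
		if id == "" {
			continue
		}
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	ops := canonicalOperators([]string{"Alice@Example.com", " alice@example.com ", "bob@example.com"})
	fmt.Println(ops) // [alice@example.com bob@example.com]
}
```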


#### BudgetRef


@@ -737,6 +755,7 @@ _Appears in:_
| `identity` _[IdentitySpec](#identityspec)_ | Identity controls how the IRSA role is named + which Bedrock models are<br />reachable. | | |
| `compliance` _[ComplianceSpec](#compliancespec)_ | Compliance flags drive stricter defaults across the Platform. | | Optional: \{\} <br /> |
| `isolation` _string_ | Isolation: namespace (default) or vCluster (hard isolation). | namespace | Enum: [namespace vcluster] <br />Optional: \{\} <br /> |
| `attribution` _[AttributionSpec](#attributionspec)_ | Attribution opts the Platform into per-session human attribution. When<br />set, the operator provisions a session role — assumable by the tenant<br />IRSA role with the operator carried as STS SourceIdentity, scoped to the<br />tenant baseline (Bedrock invoke) and NOT broad sts:AssumeRole — plus a<br />ClusterRole letting the tenant ServiceAccount impersonate the named<br />operators at the apiserver. fab's role-session entrypoint consumes both,<br />so an agent's AWS + Kubernetes actions attribute to a named human.<br />nil = unattributed (the default). | | Optional: \{\} <br /> |


#### PlatformStatus
@@ -754,6 +773,7 @@ _Appears in:_
| --- | --- | --- | --- |
| `phase` _string_ | Phase: Pending, Provisioning, Ready, Suspended, Failed. | | Optional: \{\} <br /> |
| `iamRoleArn` _string_ | IamRoleArn is the per-Platform IRSA role created by the controller. | | Optional: \{\} <br /> |
| `sessionRoleArn` _string_ | SessionRoleArn is the per-Platform attribution session role, created when<br />spec.attribution is set. Empty when attribution is off. | | Optional: \{\} <br /> |
| `namespace` _string_ | Namespace is the tenant namespace the controller provisioned. | | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration is the last spec.generation the controller reconciled. | | Optional: \{\} <br /> |
| `suspendedAt` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#time-v1-meta)_ | SuspendedAt is the timestamp at which the kill-switch fired. When<br />non-nil the operator stops reattaching the baseline IAM policy and<br />the AgentFleetReconciler scales fleets to zero. Resets to nil only<br />when ops clears the iam:TagRole 'platform.nanohype.dev/suspended'<br />marker on the tenant IRSA role. | | Optional: \{\} <br /> |
39 changes: 39 additions & 0 deletions operators/api/platform/v1alpha1/platform_types.go
@@ -44,6 +44,40 @@ type PlatformSpec struct {
	// +kubebuilder:default=namespace
	// +optional
	Isolation string `json:"isolation,omitempty"`

	// Attribution opts the Platform into per-session human attribution. When
	// set, the operator provisions a session role — assumable by the tenant
	// IRSA role with the operator carried as STS SourceIdentity, scoped to the
	// tenant baseline (Bedrock invoke) and NOT broad sts:AssumeRole — plus a
	// ClusterRole letting the tenant ServiceAccount impersonate the named
	// operators at the apiserver. fab's role-session entrypoint consumes both,
	// so an agent's AWS + Kubernetes actions attribute to a named human.
	// nil = unattributed (the default).
	// +optional
	Attribution *AttributionSpec `json:"attribution,omitempty"`
}

// AttributionSpec configures per-session human attribution for a Platform. See
// github.com/nanohype/fab docs/attribution.md for the consumer side.
type AttributionSpec struct {
	// Operators is the set of human identities (e.g. email addresses) a
	// session in this Platform may act as. Each value becomes both an allowed
	// STS SourceIdentity on the session role's trust policy and a resourceNames
	// entry on the impersonate ClusterRole, so the SAME string binds the AWS
	// and Kubernetes audit records. Use a canonical form (a lowercased email);
	// it must byte-match the operator's own RBAC subject name.
	// +kubebuilder:validation:MinItems=1
	Operators []string `json:"operators"`

	// SessionRoleMaxDurationSeconds caps the assumed session lifetime. Because
	// the caller is the tenant IRSA role, AWS STS role chaining hard-caps a
	// chained session at 3600s regardless of this value; larger values only
	// matter if the caller ever changes. Defaults to 3600.
	// +kubebuilder:validation:Minimum=900
	// +kubebuilder:validation:Maximum=43200
	// +kubebuilder:default=3600
	// +optional
	SessionRoleMaxDurationSeconds *int32 `json:"sessionRoleMaxDurationSeconds,omitempty"`
}

// BudgetRef points at a BudgetPolicy by name.
@@ -91,6 +125,11 @@ type PlatformStatus struct {
	// +optional
	IamRoleArn string `json:"iamRoleArn,omitempty"`

	// SessionRoleArn is the per-Platform attribution session role, created when
	// spec.attribution is set. Empty when attribution is off.
	// +optional
	SessionRoleArn string `json:"sessionRoleArn,omitempty"`

	// Namespace is the tenant namespace the controller provisioned.
	// +optional
	Namespace string `json:"namespace,omitempty"`
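
The comment on `SessionRoleMaxDurationSeconds` notes that role chaining caps a chained session at 3600s whatever the field says. STS does not clamp a larger request for a chained call; it rejects it, so a caller has to clamp before calling `AssumeRole`. A minimal sketch of that clamp, assuming a hypothetical helper `effectiveSessionSeconds` that is not part of this PR:

```go
package main

import "fmt"

const (
	defaultSessionSeconds int32 = 3600 // mirrors the +kubebuilder:default marker
	chainedSessionCap     int32 = 3600 // STS limit for role-chained sessions
)

// effectiveSessionSeconds (hypothetical) returns the duration a caller could
// safely request: the configured value, or the default when unset, clamped
// to one hour when the caller is itself an assumed role.
func effectiveSessionSeconds(requested *int32, chained bool) int32 {
	d := defaultSessionSeconds
	if requested != nil {
		d = *requested
	}
	if chained && d > chainedSessionCap {
		d = chainedSessionCap
	}
	return d
}

func main() {
	twoHours := int32(7200)
	fmt.Println(effectiveSessionSeconds(nil, true))        // 3600
	fmt.Println(effectiveSessionSeconds(&twoHours, true))  // 3600
	fmt.Println(effectiveSessionSeconds(&twoHours, false)) // 7200
}
```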
30 changes: 30 additions & 0 deletions operators/api/platform/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

42 changes: 42 additions & 0 deletions operators/config/crd/bases/platform.nanohype.dev_platforms.yaml
@@ -63,6 +63,43 @@ spec:
              hosting one or more AgentFleets, with its own budget, identity, and
              guardrails.
            properties:
              attribution:
                description: |-
                  Attribution opts the Platform into per-session human attribution. When
                  set, the operator provisions a session role — assumable by the tenant
                  IRSA role with the operator carried as STS SourceIdentity, scoped to the
                  tenant baseline (Bedrock invoke) and NOT broad sts:AssumeRole — plus a
                  ClusterRole letting the tenant ServiceAccount impersonate the named
                  operators at the apiserver. fab's role-session entrypoint consumes both,
                  so an agent's AWS + Kubernetes actions attribute to a named human.
                  nil = unattributed (the default).
                properties:
                  operators:
                    description: |-
                      Operators is the set of human identities (e.g. email addresses) a
                      session in this Platform may act as. Each value becomes both an allowed
                      STS SourceIdentity on the session role's trust policy and a resourceNames
                      entry on the impersonate ClusterRole, so the SAME string binds the AWS
                      and Kubernetes audit records. Use a canonical form (a lowercased email);
                      it must byte-match the operator's own RBAC subject name.
                    items:
                      type: string
                    minItems: 1
                    type: array
                  sessionRoleMaxDurationSeconds:
                    default: 3600
                    description: |-
                      SessionRoleMaxDurationSeconds caps the assumed session lifetime. Because
                      the caller is the tenant IRSA role, AWS STS role chaining hard-caps a
                      chained session at 3600s regardless of this value; larger values only
                      matter if the caller ever changes. Defaults to 3600.
                    format: int32
                    maximum: 43200
                    minimum: 900
                    type: integer
                required:
                - operators
                type: object
              budget:
                description: Budget references a BudgetPolicy CR in the same namespace.
                properties:
@@ -230,6 +267,11 @@ spec:
              phase:
                description: 'Phase: Pending, Provisioning, Ready, Suspended, Failed.'
                type: string
              sessionRoleArn:
                description: |-
                  SessionRoleArn is the per-Platform attribution session role, created when
                  spec.attribution is set. Empty when attribution is off.
                type: string
              suspendedAt:
                description: |-
                  SuspendedAt is the timestamp at which the kill-switch fired. When
19 changes: 19 additions & 0 deletions operators/config/rbac/role.yaml
@@ -21,6 +21,12 @@ rules:
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - users
  verbs:
  - impersonate
- apiGroups:
  - agentgateway.dev
  resources:
@@ -220,6 +226,19 @@ rules:
  - get
  - patch
  - update
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - resource.k8s.io
  resources:
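
The two rules added above give the operator `impersonate` on `users` and full management of ClusterRoles and ClusterRoleBindings. The first is presumably there because Kubernetes RBAC escalation prevention only lets a subject create a role containing permissions it already holds (or when it has the `escalate` verb). A minimal sketch of the per-tenant rule the operator might then generate, narrowed with `resourceNames`; `policyRule` is a local stand-in for `rbacv1.PolicyRule` and `impersonateRule` is a hypothetical helper, neither taken from this PR.

```go
package main

import "fmt"

// policyRule is a local stand-in for k8s.io/api/rbac/v1.PolicyRule so the
// sketch stays free of external dependencies.
type policyRule struct {
	APIGroups     []string
	Resources     []string
	ResourceNames []string
	Verbs         []string
}

// impersonateRule (hypothetical) allows impersonating only the named users.
// The core API group is the empty string, matching the role.yaml rule above.
func impersonateRule(operators []string) policyRule {
	names := make([]string, len(operators))
	copy(names, operators)
	return policyRule{
		APIGroups:     []string{""},
		Resources:     []string{"users"},
		ResourceNames: names,
		Verbs:         []string{"impersonate"},
	}
}

func main() {
	fmt.Printf("%+v\n", impersonateRule([]string{"alice@example.com"}))
}
```

Unlike the operator's own rule, which carries no `resourceNames`, the generated rule is restricted to the identities listed in `spec.attribution.operators`.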
45 changes: 45 additions & 0 deletions operators/internal/controller/platform_controller.go
@@ -69,6 +69,8 @@ type PlatformReconciler struct {
// +kubebuilder:rbac:groups="",resources=namespaces;resourcequotas;limitranges,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=networking.k8s.io,resources=networkpolicies,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=argoproj.io,resources=appprojects,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=rbac.authorization.k8s.io,resources=clusterroles;clusterrolebindings,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=users,verbs=impersonate

// Reconcile drives a Platform CR toward its desired state.
func (r *PlatformReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
@@ -105,6 +107,15 @@ func (r *PlatformReconciler) Reconcile(ctx context.Context, req ctrl.Request) (c
				logger.Error(err, "IAM role cleanup failed; will retry")
				return ctrl.Result{}, err
			}
			// Attribution resources (no-ops when attribution was never enabled).
			if err := r.deleteSessionRole(ctx, platform, r.IAMCfg.Environment); err != nil {
				logger.Error(err, "session role cleanup failed; will retry")
				return ctrl.Result{}, err
			}
			if err := r.deleteOperatorImpersonateRBAC(ctx, platform); err != nil {
				logger.Error(err, "impersonate RBAC cleanup failed; will retry")
				return ctrl.Result{}, err
			}
			controllerutil.RemoveFinalizer(platform, finalizerName)
			if err := r.Update(ctx, platform); err != nil {
				return ctrl.Result{}, fmt.Errorf("remove finalizer: %w", err)
@@ -199,6 +210,40 @@ func (r *PlatformReconciler) Reconcile(ctx context.Context, req ctrl.Request) (c
		return ctrl.Result{}, err
	}

	// Per-session human attribution (optional). Provision the session role
	// (assumable by the tenant IRSA role with the operator as STS
	// SourceIdentity) + the apiserver impersonate RBAC. Reconciles in both
	// directions: removing spec.attribution tears the pair back down. The
	// session role honors the kill-switch via the susp.Suspended flag (baseline
	// detached when suspended, like the tenant role).
	if platform.Spec.Attribution != nil {
		if susp.RoleARN != "" {
			sessionARN, err := r.ensureSessionRole(ctx, platform, susp.RoleARN, susp.Suspended, r.IAMCfg)
			if err != nil {
				logger.Error(err, "ensureSessionRole failed")
				return ctrl.Result{}, err
			}
			if sessionARN != "" {
				platform.Status.SessionRoleArn = sessionARN
			}
		}
		if err := r.ensureOperatorImpersonateRBAC(ctx, platform); err != nil {
			logger.Error(err, "ensureOperatorImpersonateRBAC failed")
			return ctrl.Result{}, err
		}
	} else if platform.Status.SessionRoleArn != "" {
		// Attribution was enabled and is now removed — tear the pair down.
		if err := r.deleteSessionRole(ctx, platform, r.IAMCfg.Environment); err != nil {
			logger.Error(err, "deleteSessionRole (attribution removed) failed")
			return ctrl.Result{}, err
		}
		if err := r.deleteOperatorImpersonateRBAC(ctx, platform); err != nil {
			logger.Error(err, "deleteOperatorImpersonateRBAC (attribution removed) failed")
			return ctrl.Result{}, err
		}
		platform.Status.SessionRoleArn = ""
	}

	if susp.Suspended {
		platform.Status.Phase = phaseSuspended
		if platform.Status.SuspendedAt == nil {
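
The attribution block in this hunk branches on two inputs: whether `spec.attribution` is set and whether `status.sessionRoleArn` is non-empty. A minimal sketch that isolates that decision as a pure function so the three outcomes are easy to see; `decideAttribution` and the `attributionAction` type are hypothetical names for illustration, not code from this PR.

```go
package main

import "fmt"

type attributionAction int

const (
	actionNone     attributionAction = iota // nothing to provision or remove
	actionEnsure                            // spec.attribution set: ensure role + RBAC
	actionTeardown                          // spec cleared but status still records a role
)

func (a attributionAction) String() string {
	switch a {
	case actionEnsure:
		return "ensure"
	case actionTeardown:
		return "teardown"
	default:
		return "none"
	}
}

// decideAttribution (hypothetical) mirrors the if / else-if shape of the hunk.
func decideAttribution(specSet bool, statusSessionARN string) attributionAction {
	switch {
	case specSet:
		return actionEnsure
	case statusSessionARN != "":
		return actionTeardown
	default:
		return actionNone
	}
}

func main() {
	fmt.Println(decideAttribution(true, ""))                                   // ensure
	fmt.Println(decideAttribution(false, "arn:aws:iam::123456789012:role/s")) // teardown
	fmt.Println(decideAttribution(false, ""))                                  // none
}
```

One consequence of keying teardown on the status field, as far as this hunk shows: if the impersonate RBAC was created while `susp.RoleARN` was still empty, so no session ARN was ever recorded, removing `spec.attribution` would take the "none" path and leave that RBAC in place until the finalizer runs.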
11 changes: 9 additions & 2 deletions operators/internal/controller/platform_iam.go
@@ -317,12 +317,19 @@ func (r *PlatformReconciler) reconcileManagedPolicies(ctx context.Context, roleN
 }
 
 // deleteIamRole is the finalizer counterpart: detach all policies and
-// delete the role. Tolerates NotFound so re-runs are safe.
+// delete the tenant role. Tolerates NotFound so re-runs are safe.
 func (r *PlatformReconciler) deleteIamRole(ctx context.Context, p *platformv1alpha1.Platform, environment string) error {
+	return r.detachAndDeleteRole(ctx, tenantRoleName(environment, p))
+}
+
+// detachAndDeleteRole detaches every managed policy from a role and deletes
+// it. Shared by the tenant-role and session-role finalizers. Tolerates
+// NotFound at every step so re-runs (and roles that were never created) are
+// safe no-ops.
+func (r *PlatformReconciler) detachAndDeleteRole(ctx context.Context, name string) error {
 	if r.IAM == nil {
 		return nil
 	}
-	name := tenantRoleName(environment, p)
 	var marker *string
 	for {
 		listOut, err := r.IAM.ListAttachedRolePolicies(ctx, &iam.ListAttachedRolePoliciesInput{
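
The hunk is cut off before the body of `detachAndDeleteRole`, so the following is only a sketch of the pattern its comment describes: page through attached policies, detach each, delete the role, and treat "not found" as success at every step. The `roleAPI` interface, `errNotFound`, and `fakeIAM` are hypothetical stand-ins for the IAM client and its NoSuchEntity error; none of them come from this PR or the AWS SDK.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found") // stand-in for IAM's NoSuchEntity

// roleAPI is a narrowed, hypothetical view of the IAM operations needed.
type roleAPI interface {
	ListAttached(role string, marker *string) (arns []string, next *string, err error)
	Detach(role, policyARN string) error
	DeleteRole(role string) error
}

func ignoreNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// detachAndDelete removes every attached policy, then the role itself.
func detachAndDelete(api roleAPI, role string) error {
	var marker *string
	for {
		arns, next, err := api.ListAttached(role, marker)
		if err != nil {
			return ignoreNotFound(err) // role already gone: nothing left to do
		}
		for _, arn := range arns {
			if err := ignoreNotFound(api.Detach(role, arn)); err != nil {
				return fmt.Errorf("detach %s: %w", arn, err)
			}
		}
		if next == nil {
			break
		}
		marker = next
	}
	if err := ignoreNotFound(api.DeleteRole(role)); err != nil {
		return fmt.Errorf("delete role %s: %w", role, err)
	}
	return nil
}

// fakeIAM is an in-memory roleAPI used only to exercise the sketch.
type fakeIAM struct{ attached map[string][]string }

func (f *fakeIAM) ListAttached(role string, _ *string) ([]string, *string, error) {
	p, ok := f.attached[role]
	if !ok {
		return nil, nil, errNotFound
	}
	return append([]string(nil), p...), nil, nil
}

func (f *fakeIAM) Detach(role, arn string) error {
	p, ok := f.attached[role]
	if !ok {
		return errNotFound
	}
	for i, a := range p {
		if a == arn {
			f.attached[role] = append(p[:i], p[i+1:]...)
			return nil
		}
	}
	return errNotFound
}

func (f *fakeIAM) DeleteRole(role string) error {
	if _, ok := f.attached[role]; !ok {
		return errNotFound
	}
	delete(f.attached, role)
	return nil
}

func main() {
	iam := &fakeIAM{attached: map[string][]string{"session": {"arn:aws:iam::aws:policy/Example"}}}
	fmt.Println(detachAndDelete(iam, "session")) // first run removes the role
	fmt.Println(detachAndDelete(iam, "session")) // second run is a no-op
}
```

Taking the role name as a parameter, as the refactor in this hunk does, is what lets the tenant-role and session-role finalizers share one code path.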