
attribution: per-session human identity (STS SourceIdentity + k8s impersonation) #30

Merged: stxkxs merged 3 commits into main from attribution-source-identity on Jun 20, 2026

@stxkxs stxkxs commented Jun 12, 2026

Draft — the per-session human-attribution hook. Closes the "every agent action collapses to one anonymous IRSA role" gap so an evidence engine can bind actions to a named human. Marked draft because it needs companion platform IAM/RBAC (below) stood up before it does anything.

What

FAB_OPERATOR=<human> + FAB_SESSION_ROLE_ARN → the in-pod session, before the agent loop:

  • AWS: assumes the session role with the operator as STS SourceIdentity (via the `aws` CLI already in the image; zero new deps), exports the temp creds, and drops the pod's IRSA web-identity vars. Every AWS call (Bedrock InvokeModel and any `aws` command the Bash tool runs) is recorded in CloudTrail under SourceIdentity=<operator>.
  • Kubernetes: writes a kubeconfig that authenticates with the SA token but impersonates the operator, so the apiserver audit log records impersonatedUser=<operator>.

The operator is validated STS-clean up front so the same string binds both streams; creds + kubeconfig are computed before any env mutation (no half-attributed env); fail-closed if setup fails. Inert for every other runtime and when FAB_OPERATOR is unset.
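
A minimal sketch of the AWS half described above, assuming the operator is checked against the STS SourceIdentity character set (2 to 64 characters from `[\w+=,.@-]`) before the `aws sts assume-role` argv is built. The names `validateOperator` and `assumeRoleArgs` are hypothetical and are not taken from src/attribution.ts. Reusing the operator as the role session name is also an assumption.

```typescript
// STS SourceIdentity: 2-64 characters of letters, digits, and _ + = , . @ -
// A colon is not in the class, so the reserved "aws:" prefix cannot pass.
const SOURCE_IDENTITY_RE = /^[\w+=,.@-]{2,64}$/;

// Hypothetical helper: rejects any operator STS would refuse as SourceIdentity.
function validateOperator(operator: string): string {
  if (!SOURCE_IDENTITY_RE.test(operator)) {
    throw new Error(
      `operator is not a valid STS SourceIdentity: ${JSON.stringify(operator)}`,
    );
  }
  return operator;
}

// Hypothetical helper: builds the argv for `aws sts assume-role`.
// Using the operator as the session name is an assumption of this sketch.
function assumeRoleArgs(
  roleArn: string,
  operator: string,
  durationSeconds: number,
): string[] {
  const identity = validateOperator(operator);
  return [
    "sts", "assume-role",
    "--role-arn", roleArn,
    "--role-session-name", identity,
    "--source-identity", identity,
    "--duration-seconds", String(durationSeconds),
    "--output", "json",
  ];
}
```

The argv would be handed to a child process (for example `execFileSync("aws", args)`), and the `Credentials` object in the JSON output would supply the access key, secret key, and session token to export.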

Crossbearing consumes it

CloudTrail SourceIdentity → AttrSTSSourceIdentity; K8s impersonatedUser → AttrK8sImpersonation. Both are extracted and bound by the engine (verified against its ingesters during review). This is the "after" state of the divergence demo: corroborated agent actions attribute to a named human instead of a faceless role.

Needs (before merge) — platform IAM/RBAC

Documented in docs/attribution.md:

  • A session role whose trust policy lets the tenant IRSA role sts:AssumeRole + sts:SetSourceIdentity, with bedrock:InvokeModel in its permission policy.
  • K8s impersonate RBAC for the session SA, scoped to the operator user(s).

Notes

  • Reviewed adversarially (security/STS mechanics, fab conventions, crossbearing-consumability) — all sound; review fixes folded in (transactional env, IRSA-var cleanup, STS-clean operator validation, role-check-before-STS, doc accuracy on what crossbearing actually joins).
  • Limitations: process-wide operator (a real impl threads the requesting human per workflow onto AgentSandbox.spec), credential-TTL hard cliff (fail-safe), success-only records.
  • 20 tests; full suite 320/320; typecheck + eslint + prettier clean.

🤖 Generated with Claude Code

stxkxs and others added 2 commits June 11, 2026 18:53
…ersonation)

By default a fab session acts as the pod's tenant IRSA role, bound to no
named human — so every Bedrock call and every `aws`/`kubectl` the agent
runs traces to a role, not a person. That is exactly the gap an evidence
engine surfaces. This is the opt-in that closes it.

─── src/attribution.ts ───
Set FAB_OPERATOR (a named human) + FAB_SESSION_ROLE_ARN and the in-pod
session, before the agent loop:
  - assumes the session role carrying the operator as STS SourceIdentity
    (via the `aws` CLI already in the image — no new dependency, same
    child-process posture as the claude-cli runtime), exports the temp
    creds, and drops the pod's IRSA web-identity vars so exactly one
    credential mechanism remains. Every AWS call — the Bedrock InvokeModel
    inference call and any `aws` the Bash tool runs — is then recorded in
    CloudTrail under SourceIdentity=<operator>.
  - writes a kubeconfig that authenticates with the SA token but
    impersonates the operator (KUBECONFIG), so apiserver audit records
    impersonatedUser=<operator>.
The operator is validated STS-clean up front so the SAME string binds both
streams. Credential assumption + kubeconfig are computed before any env
mutation (no half-attributed env). Fail-closed: if attribution was
requested but setup fails, the session aborts rather than run unattributed.
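
The compute-before-mutate ordering and the impersonating kubeconfig can be sketched as follows. This is an illustration under stated assumptions, not the code in src/attribution.ts: `buildKubeconfig` and `applyIdentity` are hypothetical names, the ServiceAccount paths and apiserver URL are the usual in-cluster defaults, and only the two standard IRSA variables are dropped here.

```typescript
type Env = Record<string, string | undefined>;

interface AwsCreds {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
}

// Standard in-pod ServiceAccount mount (assumed, not confirmed by the source).
const SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount";

// Authenticate with the SA token, but act as the operator via the `as` field.
function buildKubeconfig(operator: string): Record<string, unknown> {
  return {
    apiVersion: "v1",
    kind: "Config",
    clusters: [{
      name: "in-cluster",
      cluster: {
        server: "https://kubernetes.default.svc",
        "certificate-authority": `${SA_DIR}/ca.crt`,
      },
    }],
    users: [{ name: "session", user: { tokenFile: `${SA_DIR}/token`, as: operator } }],
    contexts: [{ name: "session", context: { cluster: "in-cluster", user: "session" } }],
    "current-context": "session",
  };
}

function applyIdentity(
  env: Env,
  assume: () => AwsCreds,
  writeKubeconfig: () => string,
): void {
  // Phase 1: compute both bindings. Either call may throw; env is untouched.
  const creds = assume();
  const kubeconfigPath = writeKubeconfig();

  // Phase 2: mutate only after both bindings exist.
  env.AWS_ACCESS_KEY_ID = creds.accessKeyId;
  env.AWS_SECRET_ACCESS_KEY = creds.secretAccessKey;
  env.AWS_SESSION_TOKEN = creds.sessionToken;
  delete env.AWS_ROLE_ARN;
  delete env.AWS_WEB_IDENTITY_TOKEN_FILE;
  env.KUBECONFIG = kubeconfigPath;
}
```

Because nothing is written to `env` until both callbacks return, a caller that catches the error and aborts the session never observes a half-attributed environment.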

─── Wiring ───
role-session.ts applies it once per session pod (after a cheap role check
that avoids a wasted STS call on a typo'd role); sdk-k8s.ts forwards
FAB_OPERATOR / FAB_SESSION_ROLE_ARN / FAB_SESSION_DURATION onto the pod.
Inert for every other runtime and when FAB_OPERATOR is unset.

─── Why SourceIdentity, not Bedrock requestMetadata ───
The Agent SDK doesn't expose the InvokeModel request, so fab can't stamp
requestMetadata from this path. SourceIdentity rides the credentials the
SDK already resolves and is crossbearing's strongest binding — it
attributes the agent's aws/kubectl tool-call records, which crossbearing
corroborates.

docs/attribution.md covers the required platform IAM (session-role trust
policy allowing sts:AssumeRole + sts:SetSourceIdentity; the role needs
bedrock:InvokeModel) and the k8s impersonate RBAC, plus limitations
(process-wide operator, credential-TTL hard cliff, success-only records).
20 tests; full suite green; lint + format clean.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
…at model

A quality-check hardening pass over the per-session attribution feature. No
behavior change to the happy path; every edit closes a verified gap from an
adversarial review. Findings were all non-blocking polish — this folds them in.

─── Input validation (src/attribution.ts) ───────────────────────────────
- FAB_SESSION_ROLE_ARN is now shape-validated up front (ROLE_ARN_RE) so a
  typo fails fast with a clear message instead of an opaque STS error after a
  network round-trip — matching the rigor already applied to the operator.
  Accepts every real partition (aws / aws-cn / aws-us-gov / aws-iso-*) and
  role paths; rejects non-ARNs, wrong service/resource, and bad account ids.
- FAB_SESSION_DURATION parse is gated on a decimal-only regex before Number(),
  so hex/float/garbage ("0x384", "3600.5", "3600abc") reject deterministically
  rather than slipping through Number()'s coercion.
- The post-assume credential cleanup is expanded from 3 to 8 env vars
  (adds AWS_CONTAINER_CREDENTIALS_FULL_URI/RELATIVE_URI, AWS_PROFILE,
  AWS_SHARED_CREDENTIALS_FILE, AWS_CONFIG_FILE) so the assumed SourceIdentity
  creds are the only source the default provider chain can resolve regardless
  of pod env shape. The comment is rescoped to the default chain — it no longer
  overclaims "exactly one mechanism" and notes it cannot stop explicit re-auth
  with the still-mounted web-identity token. AWS_REGION is deliberately kept so
  Bedrock base-URL derivation still works.
- The MAX_DURATION_SECONDS / durationSeconds comments now record that STS role
  chaining caps a chained session at 3600s, so values in (3600, 43200] fail
  closed at assume-role time rather than failing silently.
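
A rough sketch of the two input checks described above, under stated assumptions. The regex below is an illustrative stand-in for `ROLE_ARN_RE`, not the pattern in the repository. `isRoleArn` and `parseDuration` are hypothetical names. The 900 to 43200 range mirrors the STS limits mentioned elsewhere in this PR.

```typescript
// Illustrative role-ARN shape: known partitions, 12-digit account, optional path.
const ROLE_ARN_RE =
  /^arn:aws(?:-cn|-us-gov|-iso(?:-[a-z]+)?)?:iam::\d{12}:role\/[\w+=,.@\/-]+$/;

function isRoleArn(value: string): boolean {
  return ROLE_ARN_RE.test(value);
}

const MIN_DURATION_SECONDS = 900;
const MAX_DURATION_SECONDS = 43200;

// Gate on decimal digits first: Number("0x384") is 900 and Number("3600.5")
// is 3600.5, so Number() alone would let non-decimal input through.
function parseDuration(raw: string): number {
  if (!/^\d+$/.test(raw)) {
    throw new Error(`duration must be a decimal integer, got ${JSON.stringify(raw)}`);
  }
  const seconds = Number(raw);
  if (seconds < MIN_DURATION_SECONDS || seconds > MAX_DURATION_SECONDS) {
    throw new Error(
      `duration must be within ${MIN_DURATION_SECONDS}-${MAX_DURATION_SECONDS}s`,
    );
  }
  // Values above 3600 pass this check but are refused by STS for a chained role.
  return seconds;
}
```

Checking the shape locally means a typo is reported before any network round-trip. The role-chaining cap is deliberately left to STS, as the commit message describes.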

─── Tests (+6 net; suite 320 → 326) ─────────────────────────────────────
Pins the security-load-bearing behavior a green baseline couldn't catch:
- role-session.ts fail-closed: a real spy on SdkRuntime.prototype.runRoleSession
  asserts exit 1 AND that the runtime is never reached when attribution throws
  (deterministic + offline — an invalid operator throws before any aws call).
- applySessionIdentity throw-leaves-env-pristine: a throwing CliRunner proves
  the compute-both-bindings-before-mutate ordering leaves no half-attributed
  state (IRSA vars survive, no creds/kubeconfig set).
- kubeconfig file mode asserted 0600 (statSync), not just contents.
- non-decimal duration rejection, the 8-var cleanup, and GovCloud-partition
  ARN acceptance.

─── Docs ─────────────────────────────────────────────────────────────────
- docs/attribution.md gains a "Threat model" section stating plainly that this
  attributes a cooperating agent and is NOT a containment control: both
  bindings are droppable under bypassPermissions (drop KUBECONFIG; re-auth via
  the mounted web-identity token), AWS resistance relies on STS SourceIdentity
  stickiness, and "strongest binding" is relative to requestMetadata, not
  absolute. It names the platform backstops (session SA holds no direct RBAC;
  session role denies broad sts:AssumeRole). The module header carries the same
  scope note.
- A "Duration and the role-chaining cap" section + a config-table note explain
  the 3600s ceiling. An operator-must-byte-match-RBAC caveat covers the
  silent AWS-attributed/K8s-denied split.
- The feature is no longer orphaned: the three FAB_OPERATOR/FAB_SESSION_* vars
  are now surfaced consistently in docs/transports.md, README.md, CLAUDE.md's
  architecture map, and .env.example, all pointing at docs/attribution.md.

Build / lint / format / test all green. Still draft — the companion session
role IAM and impersonate RBAC live platform-side (eks-agent-platform) and must
exist before FAB_OPERATOR does anything.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
The attribution and sdk-k8s session-role fixtures hard-coded a real AWS
account id (351619759866) inside their role ARNs. fab is a public repo, so
this PR's branch was publishing a real account number in plain text.

Swap it for the RFC-style documentation placeholder 111111111111 across
both files. Each occurrence is a paired input + expected value (the ARN is
fed in and asserted back unchanged), so the substitution is behavior-neutral
— vitest still passes (30 tests). No production code touched; fixtures only.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
stxkxs added a commit to nanohype/eks-agent-platform that referenced this pull request Jun 20, 2026
…mpersonate RBAC

A Platform opts into per-session human attribution with spec.attribution; the
operator then provisions, per tenant, the two resources fab's role-session
entrypoint needs, and tears them down on removal or deletion. The consumer side
is documented in nanohype/fab docs/attribution.md.

─── API (api/platform/v1alpha1) ──────────────────────────────────────────
- PlatformSpec.Attribution (*AttributionSpec, optional): Operators []string
  (required, min 1) and SessionRoleMaxDurationSeconds (*int32, 900–43200,
  default 3600). nil = unattributed, the default.
- PlatformStatus.SessionRoleArn carries the provisioned session role ARN.
- Each operator string is reused verbatim as BOTH an allowed STS SourceIdentity
  and an impersonate resourceName, so the same identity binds the AWS and
  Kubernetes audit records.

─── Session role (platform_session_iam.go) ───────────────────────────────
- ensureSessionRole mints <env>-<platform>-session (same 64-char + FNV-1a hash
  scheme as the tenant role). Trust: only the tenant IRSA role may assume, and
  only while setting one of the Platform's operators as SourceIdentity — Action
  [sts:AssumeRole, sts:SetSourceIdentity] with a StringEquals condition on
  sts:SourceIdentity. Permissions: the tenant baseline policy (Bedrock invoke)
  and the same permissions boundary — never broad sts:AssumeRole.
- Idempotent: GetRole→Create on miss; on hit it refreshes the trust policy
  (the operator list can change) and converges the baseline attachment.
- Kill-switch parity: when the Platform is suspended the session role's baseline
  is detached, so a suspended tenant can't keep invoking Bedrock through it.
- deleteIamRole/deleteSessionRole share a detachAndDeleteRole helper.
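
For illustration, the trust policy described above could look like the document this sketch produces. The controller itself is written in Go; this TypeScript version only shows the policy shape, and `sessionTrustPolicy` is a hypothetical name.

```typescript
interface TrustPolicy {
  Version: string;
  Statement: Array<{
    Effect: "Allow";
    Principal: { AWS: string };
    Action: string[];
    Condition: { StringEquals: Record<string, string[]> };
  }>;
}

// Only the tenant IRSA role may assume the session role, and only while
// setting one of the listed operators as SourceIdentity.
function sessionTrustPolicy(tenantRoleArn: string, operators: string[]): TrustPolicy {
  if (operators.length === 0) {
    throw new Error("at least one operator is required");
  }
  return {
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Principal: { AWS: tenantRoleArn },
      Action: ["sts:AssumeRole", "sts:SetSourceIdentity"],
      Condition: { StringEquals: { "sts:SourceIdentity": [...operators] } },
    }],
  };
}
```

With a `StringEquals` condition on `sts:SourceIdentity`, an assume-role call that omits the source identity or names an unlisted operator does not match the statement and is denied.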

─── Impersonate RBAC (platform_rbac.go) ──────────────────────────────────
- ensureOperatorImpersonateRBAC creates a ClusterRole granting impersonate on
  exactly the named operator users (never impersonate *) bound to the
  tenant-runtime ServiceAccount, named <tenant-ns>-impersonate. fab's session
  kubeconfig authenticates with that SA token while impersonating the operator,
  so apiserver audit records impersonatedUser=<operator>. Cluster-scoped, so
  reaped through the finalizer rather than OwnerReferences.
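
A sketch of the pair of objects this section describes, written as plain manifests in TypeScript for illustration (the controller itself is Go). `impersonateRbac` is a hypothetical name, and the ServiceAccount name is passed in rather than assumed.

```typescript
interface RbacPair {
  clusterRole: Record<string, unknown>;
  clusterRoleBinding: Record<string, unknown>;
}

function impersonateRbac(
  tenantNamespace: string,
  serviceAccount: string,
  operators: string[],
): RbacPair {
  // An empty resourceNames list would match every user, so refuse it outright.
  if (operators.length === 0) {
    throw new Error("refusing to grant impersonate without named operators");
  }
  const name = `${tenantNamespace}-impersonate`;
  return {
    clusterRole: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "ClusterRole",
      metadata: { name },
      rules: [{
        apiGroups: [""],
        resources: ["users"],
        verbs: ["impersonate"],
        resourceNames: [...operators],
      }],
    },
    clusterRoleBinding: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "ClusterRoleBinding",
      metadata: { name },
      roleRef: { apiGroup: "rbac.authorization.k8s.io", kind: "ClusterRole", name },
      subjects: [{ kind: "ServiceAccount", name: serviceAccount, namespace: tenantNamespace }],
    },
  };
}
```

The guard on an empty operator list matters because an RBAC rule with no `resourceNames` applies to all resources of that type, which here would amount to impersonating any user.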

─── Wiring (platform_controller.go) ──────────────────────────────────────
- Reconcile provisions the pair when spec.attribution is set (after the tenant
  IRSA role + SA exist), records status.SessionRoleArn, and tears the pair down
  when attribution is removed. Finalizer cleans up both (no-ops when never
  enabled). RBAC markers added for clusterroles/clusterrolebindings and (for
  escalation prevention) impersonate on users.

─── Tests / codegen ──────────────────────────────────────────────────────
- 10 unit tests (session-role trust/baseline/duration/idempotency/suspend/
  delete via fakeIAM; impersonate RBAC create/update/delete via the
  controller-runtime fake client). Build, vet, golangci-lint, and the
  internal + api unit suites pass locally; CI is green, including the envtest
  conformance suite.
- Deepcopy, the CRD (config + Helm chart), the RBAC role, and the CRD reference
  doc are generated.

Pairs with nanohype/fab#30.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
@stxkxs stxkxs marked this pull request as ready for review June 20, 2026 03:51
@stxkxs stxkxs merged commit 546f701 into main Jun 20, 2026
5 checks passed
@stxkxs stxkxs deleted the attribution-source-identity branch June 20, 2026 03:55