P2P gRPC security
Status: Draft
Author: Will Eaton (weaton@redhat.com)
Prototype branches: weaton/p2p-grpc-auth (server-side authn/authz, draft PR wseaton/modelexpress#1), weaton/tls (optional transport TLS, off main)
Summary
The P2pService gRPC interface currently accepts unauthenticated callers on the cluster network. This RFC proposes the server enforce two independent signals: a bound, audience-scoped projected ServiceAccount token verified via the Kubernetes TokenReview API, and a device-possession check confirming the caller's pod actually requests a fabric device (InfiniBand, RoCE, or AWS EFA).
Motivation
Without server-side enforcement, any workload that can reach the coordinator's gRPC port can publish or fetch tensor metadata and trigger weight transfers. This enables workload reconnaissance and false metadata registrations that could negatively affect other peers.
An operator can lock down the coordinator's gRPC interface with L3/L4 NetworkPolicy or an L7 service mesh such as Istio, but both depend on the operator configuring them correctly. I believe there is a simpler out-of-the-box solution that needs minimal operator configuration and works in any Kubernetes environment, including clusters without a service mesh or a CNI that supports fine-grained NetworkPolicies.
ServiceAccount tokens are an identity primitive that every Kubernetes cluster already provides. Projected tokens are bound to the caller pod (carrying its name and UID as claims the apiserver returns in the TokenReview response), audience-scoped against replay across services, rotated in place by kubelet, invalidated when the bound pod or ServiceAccount is deleted, and short-lived by default. Every pod already runs under a ServiceAccount, so adopting them adds no new operator state.
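For a pod-bound token, the TokenReview response carries these claims in two places. The caller appears as `system:serviceaccount:<namespace>:<name>` in `status.user.username`. The pod binding appears in `status.user.extra` under the keys `authentication.kubernetes.io/pod-name` and `authentication.kubernetes.io/pod-uid`. The following is a minimal sketch of turning those fields into a caller identity. It assumes the review has already returned `status.authenticated: true` for the server's audience. `CallerIdentity` and `identity_from_review` are hypothetical names, not the prototype's API.

```rust
use std::collections::HashMap;

/// Caller identity recovered from a TokenReview (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct CallerIdentity {
    pub namespace: String,
    pub service_account: String,
    pub pod_name: String,
    pub pod_uid: String,
}

// Keys the apiserver sets in `status.user.extra` for pod-bound SA tokens.
const POD_NAME_KEY: &str = "authentication.kubernetes.io/pod-name";
const POD_UID_KEY: &str = "authentication.kubernetes.io/pod-uid";

/// Maps `status.user.username` and `status.user.extra` from an already
/// authenticated, audience-checked TokenReview into a caller identity.
/// Returns None for anything that is not a pod-bound ServiceAccount token.
pub fn identity_from_review(
    username: &str,
    extra: &HashMap<String, Vec<String>>,
) -> Option<CallerIdentity> {
    // ServiceAccount usernames look like system:serviceaccount:<namespace>:<name>.
    let rest = username.strip_prefix("system:serviceaccount:")?;
    let (namespace, service_account) = rest.split_once(':')?;
    if namespace.is_empty() || service_account.is_empty() {
        return None;
    }
    // Each pod-binding key must carry exactly one non-empty value.
    let single = |key: &str| match extra.get(key).map(Vec::as_slice) {
        Some([value]) if !value.is_empty() => Some(value.clone()),
        _ => None,
    };
    Some(CallerIdentity {
        namespace: namespace.to_string(),
        service_account: service_account.to_string(),
        pod_name: single(POD_NAME_KEY)?,
        pod_uid: single(POD_UID_KEY)?,
    })
}
```

Rejecting tokens that lack the pod-binding keys matters for the design. A legacy Secret-based ServiceAccount token has no bound pod, so the device check would have nothing to anchor to.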
A token alone is not enough: with token checks only, a leaked or relayed token would let any workload on the cluster invoke the P2P interface. Pairing token verification with a device-possession check anchors authorization to physical capability. In our proposed scheme, a caller must also hold a device-plugin resource (e.g. rdma/ib, rdma/roce, vpc.amazonaws.com/efa) or a DRA ResourceClaim whose deviceClassName matches the server's allowlist. True NIC-rooted attestation (SPDM on ConnectX/BlueField, Nitro on EFA) would be stronger, but the in-pod tooling is not yet mature, so we rely on the apiserver's allocation record as the possession signal.
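The following is a minimal sketch of the device-possession check. It assumes two inputs have already been extracted from the reflector stores: each container's resource requests and limits, and the `deviceClassName` values of the pod's DRA claims. `ContainerResources`, `DevicePolicy`, and `pod_holds_device` are hypothetical names. The DRA class allowlist contents are deployment-specific.

```rust
use std::collections::{BTreeMap, HashSet};

/// Simplified container resources (hypothetical): names mapped to quantity
/// strings, mirroring `resources.requests` and `resources.limits`.
pub struct ContainerResources {
    pub requests: BTreeMap<String, String>,
    pub limits: BTreeMap<String, String>,
}

/// Server-side allowlists (hypothetical type).
pub struct DevicePolicy {
    /// Device-plugin extended resources, e.g. "rdma/ib", "vpc.amazonaws.com/efa".
    pub device_plugin_resources: HashSet<String>,
    /// DRA deviceClassName values the server accepts.
    pub dra_device_classes: HashSet<String>,
}

/// Extended resources are whole units, so a plain positive integer is
/// expected; anything else counts as "no device" in this sketch.
fn positive_count(quantity: &str) -> bool {
    quantity.parse::<u64>().map_or(false, |n| n > 0)
}

impl DevicePolicy {
    /// True if any container requests or limits an allowlisted device-plugin
    /// resource, or any of the pod's DRA claims uses an allowlisted class.
    pub fn pod_holds_device(
        &self,
        containers: &[ContainerResources],
        claim_device_classes: &[&str],
    ) -> bool {
        let via_device_plugin = containers.iter().any(|c| {
            c.requests
                .iter()
                .chain(c.limits.iter())
                .any(|(name, qty)| {
                    self.device_plugin_resources.contains(name) && positive_count(qty)
                })
        });
        let via_dra = claim_device_classes
            .iter()
            .any(|class| self.dra_device_classes.contains(*class));
        via_device_plugin || via_dra
    }
}
```

The sketch checks both `requests` and `limits`. That covers pods that set only limits, because Kubernetes defaults a missing request to the limit.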
Device authorization is a control on who is allowed to populate the coordinator. Any caller who passes the device check can register tensor metadata pointing at arbitrary bytes, and the receiver will RDMA-pull those bytes into its model. This proposal is not a replacement for a scheme where clients verify the content integrity of weights before pulling them.
Prior art
I could not find a similarly named feature, but this proposal composes three existing patterns: bound ServiceAccount token identity, a SPIFFE-style selector over caller properties, and in-application enforcement (rather than relying on a mesh or admission control). The closest in-tree precedent is the Kubernetes Node Authorizer, which authorizes kubelet requests based on what the scheduler has assigned to that node; this proposal applies the same idea to fabric devices, in a userspace server.
Design
Verification flow
```mermaid
sequenceDiagram
    participant C as Caller (worker pod)
    participant L as Tower Layer (P2pService only)
    participant K as kube-apiserver
    participant R as Reflector store
    C->>L: gRPC + Bearer (projected SA token)
    L->>K: TokenReview (audience-scoped, cached)
    K-->>L: ns, sa, pod_name, pod_uid
    L->>R: get(pod_name, ns)
    R-->>L: Pod + DRA claims
    Note over L: check pod_uid match<br/>and device-plugin or DRA class
    alt allowed
        L->>C: forward to P2pService
    else denied
        L-->>C: PermissionDenied (Enforce) / warn (Permissive)
    end
```
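The decision step at the end of the diagram can be sketched as a pure function over the store contents. The sketch assumes the store has been reduced to a hypothetical `PodRecord` (the pod UID plus the result of the device check), keyed by namespace and pod name. The real layer would read `kube-rs` `Pod` objects instead. `decide` and `Decision` are also hypothetical names.

```rust
use std::collections::HashMap;

/// What the reflector store knows about a pod, pre-digested (hypothetical).
pub struct PodRecord {
    pub uid: String,
    pub holds_fabric_device: bool,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny(&'static str),
}

/// Authorization step after TokenReview, keyed by (namespace, pod name).
pub fn decide(
    pods: &HashMap<(String, String), PodRecord>,
    namespace: &str,
    pod_name: &str,
    token_pod_uid: &str,
) -> Decision {
    match pods.get(&(namespace.to_string(), pod_name.to_string())) {
        None => Decision::Deny("bound pod not found"),
        // A recreated pod keeps its name but gets a new UID, so an identity
        // from an older token (or a stale cached review) does not carry over.
        Some(pod) if pod.uid != token_pod_uid => Decision::Deny("pod UID mismatch"),
        Some(pod) if !pod.holds_fabric_device => Decision::Deny("no fabric device"),
        Some(_) => Decision::Allow,
    }
}
```

Matching on both the name and the UID is what ties the token to a specific pod incarnation. Matching on the name alone would let a replacement pod inherit its predecessor's authorization.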
The middleware lives in modelexpress_server/src/auth/layer.rs as a tower Layer/Service rather than a tonic Interceptor because every step after extraction is async. The layer wraps only the P2pService route on the generated server (not via Server::layer), so routing is the allowlist; health, registry, and model services are never wrapped.
State and caching
The request path does no Kubernetes I/O on a cache hit. A background tokio task drives a cluster-wide kube-rs watch into an in-memory Store<Pod> (and Store<ResourceClaim> when DRA is configured); the auth layer reads from the same store through Arc<AuthState>. The watch is cluster-wide because caller pods may live in any namespace; on large clusters, an optional label selector can narrow the watch to reduce load on the API server.
Each reflector carries an Arc<AtomicBool> healthy flag that the task clears whenever its watch stream ends. The decision function checks the flag before consulting the store and returns Unavailable (denied under Enforce) if it is false. Without this check, a pod deleted while the watch was down would remain in the store and still be "found" by name.
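The health-flag guard can be sketched with standard-library types. Here an `RwLock<HashMap>` stands in for the `kube-rs` reflector store. `AuthState`, `apply`, `lookup`, and `Lookup` are hypothetical names. The point of the sketch is ordering: the flag is read before the store, so a stale entry is never returned while the watch is down.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct PodRecord {
    pub uid: String,
    pub holds_fabric_device: bool,
}

#[derive(Debug, PartialEq)]
pub enum Lookup {
    Found(PodRecord),
    NotFound,
    /// Watch stream is down; the store may be stale. Denied under Enforce.
    Unavailable,
}

/// Stand-in for the reflector store plus its health flag (hypothetical).
pub struct AuthState {
    pods: RwLock<HashMap<(String, String), PodRecord>>,
    /// Shared with the watcher task, which clears it when the stream ends
    /// and (assumed here) sets it again once a fresh list has been applied.
    healthy: Arc<AtomicBool>,
}

impl AuthState {
    pub fn new(healthy: Arc<AtomicBool>) -> Self {
        Self { pods: RwLock::new(HashMap::new()), healthy }
    }

    /// Watcher-side update: Some(record) upserts, None deletes.
    pub fn apply(&self, namespace: &str, name: &str, record: Option<PodRecord>) {
        let key = (namespace.to_string(), name.to_string());
        let mut pods = self.pods.write().expect("pod store lock poisoned");
        match record {
            Some(r) => {
                pods.insert(key, r);
            }
            None => {
                pods.remove(&key);
            }
        }
    }

    /// Request-side read: check health before trusting the store.
    pub fn lookup(&self, namespace: &str, name: &str) -> Lookup {
        if !self.healthy.load(Ordering::Acquire) {
            return Lookup::Unavailable;
        }
        let key = (namespace.to_string(), name.to_string());
        let pods = self.pods.read().expect("pod store lock poisoned");
        match pods.get(&key) {
            Some(record) => Lookup::Found(record.clone()),
            None => Lookup::NotFound,
        }
    }
}
```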
Token verification is wrapped in two caches: a positive cache keyed by SHA-256 of the bearer token (the token itself is never logged or stored) and a negative cache for definitively-bad tokens. The negative cache prevents a flood of distinct invalid tokens from becoming a flood of TokenReview calls; transient apiserver errors are not cached.
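The following sketch shows the two-tier cache. It assumes the caller computes the SHA-256 digest of the bearer token before calling in (for example with the `sha2` crate), so the cache only ever sees digests. `TokenCache` and `ReviewOutcome` are hypothetical names, and the TTLs are parameters rather than values from the prototype. The `review` closure stands in for the TokenReview round trip and runs only on a cache miss.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// SHA-256 of the bearer token. Computing it is left to the caller, so the
/// raw token never enters the cache.
pub type TokenDigest = [u8; 32];

/// TokenReview result as seen by the cache (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum ReviewOutcome<I> {
    /// Authenticated for the server's audience; carries the caller identity.
    Valid(I),
    /// The apiserver definitively rejected the token.
    Invalid,
    /// Apiserver unreachable, timed out, or returned a server error.
    Transient,
}

pub struct TokenCache<I> {
    positive: HashMap<TokenDigest, (I, Instant)>,
    negative: HashMap<TokenDigest, Instant>,
    positive_ttl: Duration,
    negative_ttl: Duration,
}

impl<I: Clone> TokenCache<I> {
    pub fn new(positive_ttl: Duration, negative_ttl: Duration) -> Self {
        Self { positive: HashMap::new(), negative: HashMap::new(), positive_ttl, negative_ttl }
    }

    /// Serves from either cache while the entry is fresh; otherwise calls
    /// `review` and caches only definitive answers.
    pub fn verify(
        &mut self,
        digest: TokenDigest,
        now: Instant,
        review: impl FnOnce() -> ReviewOutcome<I>,
    ) -> ReviewOutcome<I> {
        if let Some((identity, expires)) = self.positive.get(&digest) {
            if now < *expires {
                return ReviewOutcome::Valid(identity.clone());
            }
        }
        if let Some(expires) = self.negative.get(&digest) {
            if now < *expires {
                return ReviewOutcome::Invalid;
            }
        }
        let outcome = review();
        match &outcome {
            ReviewOutcome::Valid(identity) => {
                self.negative.remove(&digest);
                self.positive.insert(digest, (identity.clone(), now + self.positive_ttl));
            }
            ReviewOutcome::Invalid => {
                self.positive.remove(&digest);
                self.negative.insert(digest, now + self.negative_ttl);
            }
            // Never cached, so the next request retries the apiserver.
            ReviewOutcome::Transient => {}
        }
        outcome
    }
}
```

This sketch omits eviction. A production version needs a size bound (for example an LRU with per-entry TTL) so that a flood of distinct invalid tokens cannot grow the negative cache without limit.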
Modes and defaults
The server resolves one of three modes from MODEL_EXPRESS_SECURITY_MODE (or p2pAuth.mode in helm): off skips verification, permissive verifies and logs violations but never blocks, and enforce rejects every failed verification. When the variable is unset, the mode defaults to permissive in-cluster and off otherwise, so upgrades do not silently start rejecting traffic and local development sees no surprise denials.
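The following sketch shows mode resolution and how each mode handles a failed check. Two behaviors are assumptions of the sketch, not documented behavior of the prototype. First, an unrecognized value is a startup error. Second, in-cluster execution is detected through the `KUBERNETES_SERVICE_HOST` variable that Kubernetes injects into pods. `SecurityMode`, `resolve_mode`, `in_cluster`, `Action`, and `on_failure` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecurityMode {
    Off,
    Permissive,
    Enforce,
}

/// Resolves the raw MODEL_EXPRESS_SECURITY_MODE value. Unset or empty falls
/// back to Permissive in-cluster and Off elsewhere; unknown values are an
/// error here rather than a silent fallback (a choice of this sketch).
pub fn resolve_mode(raw: Option<&str>, in_cluster: bool) -> Result<SecurityMode, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") => Ok(if in_cluster { SecurityMode::Permissive } else { SecurityMode::Off }),
        Some("off") => Ok(SecurityMode::Off),
        Some("permissive") => Ok(SecurityMode::Permissive),
        Some("enforce") => Ok(SecurityMode::Enforce),
        Some(other) => Err(format!("unknown MODEL_EXPRESS_SECURITY_MODE: {other:?}")),
    }
}

/// Assumed in-cluster heuristic: Kubernetes sets this variable in every pod.
pub fn in_cluster() -> bool {
    std::env::var_os("KUBERNETES_SERVICE_HOST").is_some()
}

/// What the layer does with a request whose verification failed.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Forward,
    ForwardWithWarning,
    Reject,
}

pub fn on_failure(mode: SecurityMode) -> Action {
    match mode {
        // Under Off verification never runs, so nothing can fail.
        SecurityMode::Off => Action::Forward,
        SecurityMode::Permissive => Action::ForwardWithWarning,
        SecurityMode::Enforce => Action::Reject,
    }
}
```

At startup, the server would call something like `resolve_mode(std::env::var("MODEL_EXPRESS_SECURITY_MODE").ok().as_deref(), in_cluster())`. Failing fast on a typo such as `enforcing` avoids silently falling back to a weaker mode.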
Operator and client experience
A caller pod needs one additional piece of configuration: a projected-token volume whose audience matches the server's. Kubernetes accepts arbitrary projected-token audiences by default, so callers need no RBAC changes. The Python client defaults its token path to /var/run/secrets/tokens/modelexpress, so callers following that convention set no environment variables. The client's interceptor re-reads the token when a TTL expires or the file's mtime changes, and never logs its value.
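The re-read rule is sketched below. The prototype implements it in the Python client's interceptor; this Rust version illustrates the same logic so that all examples use one language. `TokenSource` is a hypothetical name and the TTL is a parameter. `DEFAULT_TOKEN_PATH` is the default path stated above.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::{Duration, Instant, SystemTime};

/// Default projected-token path used by the client.
pub const DEFAULT_TOKEN_PATH: &str = "/var/run/secrets/tokens/modelexpress";

/// Caches the token file contents; the token is never logged.
pub struct TokenSource {
    path: PathBuf,
    ttl: Duration,
    /// (token, file mtime when read, time of read)
    cached: Option<(String, Option<SystemTime>, Instant)>,
}

impl TokenSource {
    pub fn new(path: impl Into<PathBuf>, ttl: Duration) -> Self {
        Self { path: path.into(), ttl, cached: None }
    }

    /// Returns the current token, re-reading the file when the cached copy
    /// is older than `ttl` or the file's mtime has changed.
    pub fn token(&mut self) -> io::Result<String> {
        // fs::metadata follows symlinks, so this is the resolved file's mtime.
        let mtime = fs::metadata(&self.path)?.modified().ok();
        if let Some((token, cached_mtime, read_at)) = &self.cached {
            if read_at.elapsed() < self.ttl && *cached_mtime == mtime {
                return Ok(token.clone());
            }
        }
        let token = fs::read_to_string(&self.path)?.trim().to_string();
        self.cached = Some((token.clone(), mtime, Instant::now()));
        Ok(token)
    }
}
```

Kubelet typically rotates projected tokens by atomically repointing a symlink at a freshly written file. A rotation should therefore appear as an mtime change on the resolved path, and the TTL acts as a backstop.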
Server-side RBAC is bounded: a ClusterRole granting authentication.k8s.io/tokenreviews: create plus cluster-wide pods: list,watch (and resourceclaims: list,watch under DRA). The helm chart creates this when p2pAuth.enabled=true, with an opt-out for operators who manage cluster-scoped RBAC out of band.
Alternatives considered
Service mesh for caller identity - Mesh-mTLS proves cluster-mesh membership, not workload identity. It is also operator-configured policy enforced by a sidecar the application cannot see; a missing or misapplied AuthorizationPolicy fails open silently. A mesh remains useful for transport encryption and L7 telemetry, but it is not a substitute for the server checking identity itself.
Reliance on NetworkPolicy - L3/L4 filtering on pod labels or namespaces. Can restrict who reaches the P2P port but proves nothing about who the caller actually is, and says nothing about whether the caller holds a fabric device. It also depends on the CNI implementing NetworkPolicy correctly and on the policy being present in every namespace where the server runs; missing CNI support or a misapplied selector fails open with no signal. Complementary to in-server auth, not a replacement.
Notes
Server gRPC transport TLS is implemented on a separate, opt-in branch (weaton/tls) so that the auth change does not force a topology decision (mesh-terminated mTLS vs. app-level TLS) on operators.
The Python worker_server (WorkerService), which exchanges tensor descriptors directly between workers, sits outside this control plane and listens in plaintext. Extending these auth protections to that direct P2P transfer path does not seem worthwhile: it requires trust between peer workloads and has no central authority or coordinator to enforce policy. Leaving it explicitly out of scope for device auth seems reasonable.
References
The implementation is on branch weaton/p2p-grpc-auth (draft PR wseaton/modelexpress#1 for inline review); the optional transport TLS contribution is on weaton/tls. Operator documentation is in docs/DEPLOYMENT.md under "P2P Caller Authentication" (auth branch) and "Transport TLS" (TLS branch).