
[RFC] Coordinator gRPC authorization using device assertions #298

@wseaton


P2P gRPC security

Status: Draft
Author: Will Eaton (weaton@redhat.com)
Prototype branches: weaton/p2p-grpc-auth (server-side authn/authz, draft PR wseaton/modelexpress#1), weaton/tls (optional transport TLS, off main)

Summary

The P2pService gRPC interface currently accepts unauthenticated callers on the cluster network. This RFC proposes the server enforce two independent signals: a bound, audience-scoped projected ServiceAccount token verified via the Kubernetes TokenReview API, and a device-possession check confirming the caller's pod actually requests a fabric device (InfiniBand, RoCE, or AWS EFA).

Motivation

Without server-side enforcement, any workload that can reach the coordinator's gRPC port can publish or fetch tensor metadata and trigger weight transfers. This enables workload reconnaissance and the registration of false metadata that could disrupt other peers.

An operator can lock down access to the coordinator's gRPC interface with L3/L4 NetworkPolicy or an L7 service mesh such as Istio, but that only works if the operator configures it correctly. I believe there is a simpler out-of-the-box solution that needs minimal operator configuration and works in any Kubernetes environment, including ones without a service mesh or a CNI that supports fine-grained NetworkPolicies.

ServiceAccount tokens are an identity primitive that Kubernetes already provides out of the box. Projected tokens are bound to the caller pod (the apiserver returns the pod's name and UID as claims in the TokenReview response), audience-scoped to prevent replay against other services, rotated in place by kubelet, invalidated when the bound pod or ServiceAccount is deleted, and short-lived by default. Every Kubernetes workload already has one, so adopting them adds no new operator state.

A token alone is not enough: a leaked or relayed token would otherwise let any workload on the cluster invoke the P2P interface. Pairing token verification with a device-possession check anchors authorization to physical capability. In our proposed scheme, a caller must also hold a device-plugin resource (e.g. rdma/ib, rdma/roce, vpc.amazonaws.com/efa) or a DRA ResourceClaim whose deviceClassName matches the server's allowlist. True NIC-rooted attestation (SPDM on ConnectX/BlueField, Nitro on EFA) would be stronger, but the in-pod tooling is not mature enough today, so we rely on the apiserver's allocation record as the possession signal.
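A minimal sketch of the possession check, written in Python for brevity (the server itself is Rust). It treats Pod and ResourceClaim objects as plain dicts shaped like their Kubernetes JSON. The allowlists and the DRA class name `fabric.example.com` are hypothetical placeholders, and requiring a non-empty `status.allocation` on the claim is our assumption about what "allocation record" means for DRA, not a detail stated above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical allowlists; real values would come from server configuration.
ALLOWED_RESOURCES = {"rdma/ib", "rdma/roce", "vpc.amazonaws.com/efa"}
ALLOWED_DEVICE_CLASSES = {"fabric.example.com"}


def _positive(quantity) -> bool:
    # Extended-resource quantities are whole numbers such as "1".
    try:
        return int(quantity) > 0
    except (TypeError, ValueError):
        return False


def requests_device_plugin(pod: dict, allowed: Set[str]) -> bool:
    """True if any container asks for a non-zero amount of an allowlisted resource."""
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        # Extended resources are set in limits; requests, if present, must equal limits.
        for section in ("limits", "requests"):
            for name, qty in resources.get(section, {}).items():
                if name in allowed and _positive(qty):
                    return True
    return False


def referenced_claims(pod: dict) -> List[str]:
    """ResourceClaim names used by the pod, direct or generated from a template."""
    names = [ref["resourceClaimName"]
             for ref in pod.get("spec", {}).get("resourceClaims", [])
             if ref.get("resourceClaimName")]
    names += [st["resourceClaimName"]
              for st in pod.get("status", {}).get("resourceClaimStatuses", [])
              if st.get("resourceClaimName")]
    return names


def claim_device_classes(claim: dict) -> Set[str]:
    classes = set()
    for req in claim.get("spec", {}).get("devices", {}).get("requests", []):
        # v1beta1 puts deviceClassName on the request; newer versions nest it under "exactly".
        cls = req.get("deviceClassName") or req.get("exactly", {}).get("deviceClassName")
        if cls:
            classes.add(cls)
    return classes


def holds_fabric_device(pod: dict, claims: Dict[Tuple[str, str], dict],
                        allowed_resources: Set[str] = ALLOWED_RESOURCES,
                        allowed_classes: Set[str] = ALLOWED_DEVICE_CLASSES) -> bool:
    if requests_device_plugin(pod, allowed_resources):
        return True
    namespace = pod.get("metadata", {}).get("namespace", "")
    for name in referenced_claims(pod):
        claim = claims.get((namespace, name))
        # Only count claims that have actually been allocated.
        if claim and claim.get("status", {}).get("allocation"):
            if claim_device_classes(claim) & allowed_classes:
                return True
    return False
```

Either signal is sufficient on its own. A pod that only requests GPUs, or whose claim names a class outside the allowlist, fails the check even when its token is valid.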

Device authorization is a control on who is allowed to populate the coordinator. Any caller who passes the device check can register tensor metadata pointing at arbitrary bytes, and the receiver will RDMA-pull those bytes into its model. This proposal is not a replacement for a scheme where clients verify the content integrity of weights before pulling them.

Prior art

I could not find a similarly named feature, but this proposal composes three existing, well-known building blocks: bound ServiceAccount token identity, a SPIFFE-style selector over caller properties, and in-application enforcement (rather than relying on a mesh or admission control). The closest in-tree precedent is the Kubernetes Node Authorizer, which authorizes kubelet requests based on what the scheduler assigned to that node; this proposal applies the same idea to fabric devices, in a userspace server.

Related reading:

Design

Verification flow

```mermaid
sequenceDiagram
  participant C as Caller (worker pod)
  participant L as Tower Layer (P2pService only)
  participant K as kube-apiserver
  participant R as Reflector store

  C->>L: gRPC + Bearer (projected SA token)
  L->>K: TokenReview (audience-scoped, cached)
  K-->>L: ns, sa, pod_name, pod_uid
  L->>R: get(pod_name, ns)
  R-->>L: Pod + DRA claims
  Note over L: check pod_uid match<br/>and device-plugin or DRA class
  alt allowed
    L->>C: forward to P2pService
  else denied
    L-->>C: PermissionDenied (Enforce) / warn (Permissive)
  end
```
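The TokenReview step in the diagram amounts to building a request and extracting four fields from the response. The sketch below is in Python for brevity and omits the HTTP call to the apiserver. The field names are those of the `authentication.k8s.io/v1` TokenReview API: `status.authenticated`, `status.audiences`, `status.user.username` in `system:serviceaccount:<namespace>:<name>` form, and the `authentication.kubernetes.io/pod-name` and `pod-uid` entries in `status.user.extra`. The helper names and the `modelexpress` audience are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SA_PREFIX = "system:serviceaccount:"
POD_NAME_KEY = "authentication.kubernetes.io/pod-name"
POD_UID_KEY = "authentication.kubernetes.io/pod-uid"


@dataclass(frozen=True)
class CallerIdentity:
    namespace: str
    service_account: str
    pod_name: str
    pod_uid: str


def token_review_request(token: str, audience: str) -> dict:
    """Body to POST to /apis/authentication.k8s.io/v1/tokenreviews."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": token, "audiences": [audience]},
    }


def identity_from_review(review: dict, audience: str) -> Optional[CallerIdentity]:
    """Return the caller identity, or None if the token must be rejected."""
    status = review.get("status", {})
    if not status.get("authenticated"):
        return None
    # The apiserver echoes the audiences it validated; require ours.
    if audience not in status.get("audiences", []):
        return None
    user = status.get("user", {})
    username = user.get("username", "")
    if not username.startswith(SA_PREFIX):
        return None  # not a ServiceAccount token
    namespace, sep, sa = username[len(SA_PREFIX):].partition(":")
    if not sep or not namespace or not sa:
        return None
    extra = user.get("extra", {})
    names, uids = extra.get(POD_NAME_KEY, []), extra.get(POD_UID_KEY, [])
    if len(names) != 1 or len(uids) != 1:
        return None  # token is not bound to a pod
    return CallerIdentity(namespace, sa, names[0], uids[0])
```

Rejecting tokens without pod-binding claims matters here: a legacy or non-pod-bound ServiceAccount token would otherwise authenticate but give the store lookup nothing to check.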

The middleware lives in modelexpress_server/src/auth/layer.rs as a tower Layer/Service rather than a tonic Interceptor because every step after extraction is async. The layer wraps only the P2pService route on the generated server (not via Server::layer), so routing is the allowlist; health, registry, and model services are never wrapped.
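To illustrate why wrapping the service object, rather than matching request paths, makes routing the allowlist, here is a small Python analogue of the tower setup. `Router`, `with_auth`, the `example.p2p.P2pService` name, and its method names are hypothetical stand-ins; `grpc.health.v1.Health` is the standard gRPC health service. Because the check is attached to the P2P service as a whole, every method on it (including ones added later) passes through authorization, while unwrapped services never see it.

```python
from typing import Callable, Dict

# (method, metadata) -> gRPC status string; a stand-in for a tower Service.
Handler = Callable[[str, Dict[str, str]], str]


def with_auth(inner: Handler, authorize: Callable[[Dict[str, str]], bool]) -> Handler:
    """Wrap a whole service so every one of its methods is checked."""
    def wrapped(method: str, metadata: Dict[str, str]) -> str:
        if not authorize(metadata):
            return "PERMISSION_DENIED"
        return inner(method, metadata)
    return wrapped


class Router:
    """Dispatch "/package.Service/Method" paths to per-service handlers."""

    def __init__(self) -> None:
        self._services: Dict[str, Handler] = {}

    def add_service(self, name: str, handler: Handler) -> "Router":
        self._services[name] = handler
        return self

    def call(self, path: str, metadata: Dict[str, str]) -> str:
        service, _, method = path.lstrip("/").partition("/")
        handler = self._services.get(service)
        if handler is None:
            return "UNIMPLEMENTED"
        return handler(method, metadata)


# Usage: only the P2P service is wrapped; health checks stay unauthenticated.
def ok(method: str, metadata: Dict[str, str]) -> str:
    return "OK"


def has_bearer(metadata: Dict[str, str]) -> bool:
    return metadata.get("authorization", "").startswith("Bearer ")


router = (Router()
          .add_service("grpc.health.v1.Health", ok)
          .add_service("example.p2p.P2pService", with_auth(ok, has_bearer)))
```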

State and caching

The request path does no Kubernetes I/O on a cache hit. A background tokio task drives a cluster-wide kube-rs watch into an in-memory Store<Pod> (and Store<ResourceClaim> when DRA is configured); the auth layer reads through Arc<AuthState> into the same store. The watch is cluster-wide because caller pods may live in any namespace; an optional label selector can help narrow down watches on large clusters to put less pressure on the API server.

Each reflector carries an Arc<AtomicBool> healthy flag that the task clears whenever its watch stream ends. The decision function checks the flag before consulting the store and returns Unavailable (denied under Enforce) if the flag is false. Without this check, a pod deleted while the watch was down would linger in the stale store and still be "found" by name.
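The decision step, with the health-flag gate and the pod UID comparison from the sequence diagram, might look like the Python sketch below. `PodStore` and `decide` are hypothetical names, and `threading.Event` stands in for the `Arc<AtomicBool>`. The watch task that calls `apply`/`delete` and toggles the flag is omitted; we assume it sets the flag again only after a fresh relist. The device check is passed in as a callable so the sketch stays independent of how possession is evaluated.

```python
import threading
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"                # PermissionDenied under Enforce
    UNAVAILABLE = "unavailable"  # store may be stale; Unavailable under Enforce


class PodStore:
    """(namespace, name) -> Pod map fed by a watch task, plus a health flag."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pods: Dict[Tuple[str, str], dict] = {}
        self.healthy = threading.Event()  # cleared by the task when its stream ends

    @staticmethod
    def _key(pod: dict) -> Tuple[str, str]:
        meta = pod["metadata"]
        return meta["namespace"], meta["name"]

    def apply(self, pod: dict) -> None:
        with self._lock:
            self._pods[self._key(pod)] = pod

    def delete(self, pod: dict) -> None:
        with self._lock:
            self._pods.pop(self._key(pod), None)

    def get(self, namespace: str, name: str) -> Optional[dict]:
        with self._lock:
            return self._pods.get((namespace, name))


def decide(store: PodStore, namespace: str, pod_name: str, pod_uid: str,
           has_device: Callable[[dict], bool]) -> Decision:
    # Refuse to trust a store whose watch stream has ended.
    if not store.healthy.is_set():
        return Decision.UNAVAILABLE
    pod = store.get(namespace, pod_name)
    if pod is None:
        return Decision.DENY
    # A recreated pod can reuse the name; the UID bound into the token must match.
    if pod["metadata"].get("uid") != pod_uid:
        return Decision.DENY
    return Decision.ALLOW if has_device(pod) else Decision.DENY
```

The UID comparison closes the window where a token issued to a deleted pod is presented after a new pod with the same name has been scheduled.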

Token verification is wrapped in two caches: a positive cache keyed by SHA-256 of the bearer token (the token itself is never logged or stored) and a negative cache for definitively-bad tokens. The negative cache prevents a flood of distinct invalid tokens from becoming a flood of TokenReview calls; transient apiserver errors are not cached.
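A sketch of the two-cache wrapper around TokenReview, using only the Python standard library. The TTLs, `max_entries`, the class names, and `TransientError` are assumptions; the RFC does not specify them. Keys are SHA-256 digests of the bearer token, definitively bad tokens go into the negative cache, and transient errors propagate without touching either cache. Both caches are bounded so that a flood of distinct tokens cannot grow memory without limit. Because the pod-store lookup still runs on every request, a positive hit does not let a deleted pod keep access; the positive TTL mainly bounds how long a token keeps passing after it would start failing TokenReview.

```python
import hashlib
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Apiserver unreachable or 5xx: the token's validity is unknown."""


class CachedTokenVerifier(Generic[T]):
    def __init__(self, review: Callable[[str], Optional[T]],
                 positive_ttl: float = 30.0, negative_ttl: float = 10.0,
                 max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # review() returns an identity, None for a definitively bad token,
        # or raises TransientError.
        self._review = review
        self._pos_ttl, self._neg_ttl = positive_ttl, negative_ttl
        self._max = max_entries
        self._clock = clock
        self._positive: Dict[str, Tuple[float, T]] = {}
        self._negative: Dict[str, float] = {}

    @staticmethod
    def _key(token: str) -> str:
        # Only the digest is kept; the raw token is never stored.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def _put(self, table: dict, key: str, value) -> None:
        table.pop(key, None)  # re-insert so refreshed entries move to the end
        if len(table) >= self._max:
            table.pop(next(iter(table)))  # evict the oldest entry
        table[key] = value

    def verify(self, token: str) -> Optional[T]:
        key, now = self._key(token), self._clock()
        hit = self._positive.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        expires = self._negative.get(key)
        if expires is not None and expires > now:
            return None
        identity = self._review(token)  # TransientError propagates, nothing cached
        if identity is None:
            self._put(self._negative, key, now + self._neg_ttl)
        else:
            self._put(self._positive, key, (now + self._pos_ttl, identity))
        return identity
```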

Modes and defaults

The server resolves one of three modes from MODEL_EXPRESS_SECURITY_MODE (or p2pAuth.mode in helm): off skips verification, permissive verifies and logs violations but never blocks, and enforce rejects every failed verification. When the mode is unset, it resolves to permissive in-cluster and off otherwise, so upgrades do not silently start rejecting traffic and local development sees no surprise denials.
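The resolution rule and the per-mode outcome can be sketched as below. Two details are assumptions rather than statements from this RFC: treating a non-empty `KUBERNETES_SERVICE_HOST` (which kubelet normally injects into every container) as the in-cluster signal, and rejecting unrecognized mode values at startup instead of falling back. `resolve_mode` and `admit` are hypothetical names.

```python
import logging
from enum import Enum
from typing import Callable, Mapping

log = logging.getLogger("p2p_auth")


class Mode(Enum):
    OFF = "off"
    PERMISSIVE = "permissive"
    ENFORCE = "enforce"


def resolve_mode(env: Mapping[str, str]) -> Mode:
    raw = env.get("MODEL_EXPRESS_SECURITY_MODE", "").strip().lower()
    if raw:
        return Mode(raw)  # raises ValueError on an unknown value
    in_cluster = bool(env.get("KUBERNETES_SERVICE_HOST"))
    return Mode.PERMISSIVE if in_cluster else Mode.OFF


def admit(mode: Mode, verify: Callable[[], bool]) -> bool:
    """Return True if the request may proceed to P2pService."""
    if mode is Mode.OFF:
        return True  # verification is skipped entirely
    if verify():
        return True
    if mode is Mode.PERMISSIVE:
        log.warning("P2P caller failed verification; allowing (permissive mode)")
        return True
    return False  # enforce: reject
```

At startup a server would call `resolve_mode(os.environ)` once; passing the mapping explicitly keeps the rule easy to test.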

Operator and client experience

A caller pod needs one additional piece of configuration: a projected-token volume whose audience matches the server's. Kubernetes accepts arbitrary projected-token audiences by default, so no caller-side RBAC changes are needed. The Python client defaults its token path to /var/run/secrets/tokens/modelexpress, so callers following that convention need not set any environment variables. The client's interceptor re-reads the token when a TTL expires or the file's mtime changes, and never logs its value.
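The client-side refresh rule might be sketched as follows. `ProjectedTokenSource` and its `ttl` default are hypothetical and are not the Python client's actual API; `metadata()` returns the `(key, value)` tuple shape that grpcio call metadata uses. Checking the mtime on each call picks up kubelet's in-place rotation promptly, and the TTL re-read is a backstop in case the file changes without a visible mtime change.

```python
import os
import time
from typing import Callable, Optional, Tuple

DEFAULT_TOKEN_PATH = "/var/run/secrets/tokens/modelexpress"


class ProjectedTokenSource:
    """Caches a projected SA token, re-reading it on mtime change or TTL expiry."""

    def __init__(self, path: str = DEFAULT_TOKEN_PATH, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._path = path
        self._ttl = ttl
        self._clock = clock
        self._token: Optional[str] = None
        self._mtime_ns: Optional[int] = None
        self._loaded_at = 0.0

    def token(self) -> str:
        # os.stat follows symlinks, so a swapped-in token file is detected.
        mtime_ns = os.stat(self._path).st_mtime_ns
        now = self._clock()
        stale = (self._token is None or mtime_ns != self._mtime_ns
                 or now - self._loaded_at >= self._ttl)
        if stale:
            with open(self._path, encoding="utf-8") as f:
                self._token = f.read().strip()
            self._mtime_ns = mtime_ns
            self._loaded_at = now
        return self._token  # never log this value

    def metadata(self) -> Tuple[Tuple[str, str], ...]:
        """gRPC call metadata carrying the bearer token."""
        return (("authorization", f"Bearer {self.token()}"),)
```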

Server-side RBAC is bounded: a ClusterRole granting authentication.k8s.io/tokenreviews: create plus cluster-wide pods: list,watch (and resourceclaims: list,watch under DRA). The helm chart creates this when p2pAuth.enabled=true, with an opt-out for operators who manage cluster-scoped RBAC out of band.

Alternatives considered

Service mesh for caller identity - Mesh-mTLS proves cluster-mesh membership, not workload identity. It is also operator-configured policy enforced by a sidecar the application cannot see; a missing or misapplied AuthorizationPolicy fails open silently. A mesh remains useful for transport encryption and L7 telemetry, but it is not a substitute for the server checking identity itself.

Reliance on NetworkPolicy - L3/L4 filtering on pod labels or namespaces. It can restrict who reaches the P2P port but proves nothing about who the caller actually is, and says nothing about whether the caller holds a fabric device. It also depends on the CNI implementing NetworkPolicy correctly and on the policy being present in every namespace where the server runs; missing CNI support or a misapplied selector fails open with no signal. It is complementary to in-server auth, not a replacement.

Notes

Server-side gRPC transport TLS is implemented on a separate, opt-in branch (weaton/tls) so that the auth change does not force a topology decision (mesh-terminated mTLS vs app-level TLS) on operators.

The Python worker_server (WorkerService), which exchanges tensor descriptors directly between workers, is outside this control plane and listens in plaintext. Extending these auth protections to direct P2P transfers does not seem worthwhile: that model already requires mutual trust between peers and has no central authority or coordinator to enforce policy. Leaving device auth explicitly unsupported there seems reasonable.

References

The implementation is on branch weaton/p2p-grpc-auth (draft PR wseaton/modelexpress#1 for inline review); the optional transport TLS contribution is on weaton/tls. Operator documentation is in docs/DEPLOYMENT.md under "P2P Caller Authentication" (auth branch) and "Transport TLS" (TLS branch).
