[RFC] Coordinator gRPC authorization using device assertions

# P2P gRPC security

**Status:** Draft
**Author:** Will Eaton (`weaton@redhat.com`)  
**Prototype branches:** [`weaton/p2p-grpc-auth`](https://github.com/wseaton/modelexpress/tree/weaton/p2p-grpc-auth) (server-side authn/authz, draft PR [wseaton/modelexpress#1](https://github.com/wseaton/modelexpress/pull/1)), [`weaton/tls`](https://github.com/wseaton/modelexpress/tree/weaton/tls) (optional transport TLS, off `main`)

## Summary

The `P2pService` gRPC interface currently accepts unauthenticated callers on the cluster network. This RFC proposes the server enforce two independent signals: a bound, audience-scoped projected ServiceAccount token verified via the Kubernetes `TokenReview` API, and a device-possession check confirming the caller's pod actually requests a fabric device (InfiniBand, RoCE, or AWS EFA).

## Motivation

Without server-side enforcement, any workload able to reach the coordinator's gRPC port can publish or fetch tensor metadata and trigger weight transfers. This enables workload reconnaissance and the registration of false metadata, which could negatively affect other peers.

L3/L4 NetworkPolicy or an L7 service mesh like Istio can lock down access to the coordinator's gRPC interface, but only if an operator configures them correctly. I believe there is a simpler out-of-the-box solution that needs minimal operator configuration and works in all Kubernetes environments, including ones without a service mesh or a CNI that supports fine-grained NetworkPolicies.

`ServiceAccount` tokens are a Kubernetes identity primitive that is already available out of the box. Projected tokens are bound to the caller pod (carrying its name and UID as claims the apiserver returns in the TokenReview response), audience-scoped to prevent replay across services, rotated in place by kubelet, invalidated when the bound pod or ServiceAccount is deleted, and short-lived by default. Every pod already runs under a ServiceAccount, so adopting them adds no new operator state.

A token alone is not enough: a leaked or relayed token would otherwise let any workload on the cluster invoke the P2P interface. Pairing token verification with a device-possession check anchors authorization to physical capability. In our proposed scheme, a caller must also hold a device-plugin resource (e.g. `rdma/ib`, `rdma/roce`, `vpc.amazonaws.com/efa`) or a DRA `ResourceClaim` whose `deviceClassName` matches the server's allowlist. True NIC-rooted attestation (SPDM on ConnectX/BlueField, Nitro on EFA) would be stronger, but the in-pod tooling is not mature enough today, so we rely on the apiserver's allocation record as the possession signal.
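The possession rule can be sketched as a pure function over the pod spec. This is a minimal sketch under stated assumptions, not the prototype's code. `ContainerResources`, `PodView` and `DevicePolicy` are hypothetical simplified types, not `k8s-openapi` structs. Quantities are compared as strings rather than parsed, and `claim_device_classes` is assumed to be a flattened list of the `deviceClassName` values from the pod's ResourceClaims.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical, flattened view of one container's `resources` block.
/// Quantities are kept as strings; a real implementation would parse them.
pub struct ContainerResources {
    pub requests: BTreeMap<String, String>,
    pub limits: BTreeMap<String, String>,
}

/// Hypothetical view of the caller's Pod as read from the reflector store.
pub struct PodView {
    pub containers: Vec<ContainerResources>,
    /// `deviceClassName` values from the pod's ResourceClaims (DRA path).
    pub claim_device_classes: Vec<String>,
}

/// Server-side allowlists for both allocation paths.
pub struct DevicePolicy {
    pub resources: HashSet<String>,
    pub device_classes: HashSet<String>,
}

impl DevicePolicy {
    pub fn holds_fabric_device(&self, pod: &PodView) -> bool {
        // Device-plugin path: some container asks for a nonzero amount of an allowlisted resource.
        let via_plugin = pod.containers.iter().any(|c| {
            c.requests
                .iter()
                .chain(c.limits.iter())
                .any(|(name, qty)| self.resources.contains(name) && qty.as_str() != "0")
        });
        // DRA path: some claim uses an allowlisted device class.
        let via_dra = pod
            .claim_device_classes
            .iter()
            .any(|class| self.device_classes.contains(class));
        via_plugin || via_dra
    }
}
```

Both paths are checked because a cluster may allocate NICs through either mechanism. The DRA class name `fabric-nic` in the test is made up.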

Device authorization is a control on who is allowed to populate the coordinator. Any caller who passes the device check can register tensor metadata pointing at arbitrary bytes, and the receiver will RDMA-pull those bytes into its model. This proposal is not a replacement for a scheme where clients verify the content integrity of weights before pulling them.

## Prior art

I could not find an existing feature under a similar name, but this proposal composes three established ideas: bound ServiceAccount token identity, a SPIFFE-style *selector* over caller properties, and in-application enforcement (rather than relying on mesh or admission). The closest in-tree precedent is the [Kubernetes Node Authorizer](https://kubernetes.io/docs/reference/access-authn-authz/node/), which authorizes kubelet requests based on what the scheduler assigned to that node. This proposal applies the same idea to fabric devices, in a userspace server.

Related reading:

- SPIFFE/SPIRE [workload attestor (Kubernetes)](https://github.com/spiffe/spire/blob/main/doc/plugin_agent_workloadattestor_k8s.md) and [selector concepts](https://spiffe.io/docs/latest/spire-about/spire-concepts/)
- Kubernetes [BoundServiceAccountTokenVolume](https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-tokens) and [TokenRequest API](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#service-account-tokens)
- Linkerd, [Using Kubernetes's new bound service account tokens for secure workload identity](https://linkerd.io/2021/12/28/using-kubernetess-new-bound-service-account-tokens-for-secure-workload-identity/) (2021)
- Google, [BeyondProd](https://cloud.google.com/docs/security/beyondprod) (zero-trust workload identity, motivation only)

## Design

### Verification flow

```mermaid
sequenceDiagram
  participant C as Caller (worker pod)
  participant L as Tower Layer (P2pService only)
  participant K as kube-apiserver
  participant R as Reflector store

  C->>L: gRPC + Bearer (projected SA token)
  L->>K: TokenReview (audience-scoped, cached)
  K-->>L: ns, sa, pod_name, pod_uid
  L->>R: get(pod_name, ns)
  R-->>L: Pod + DRA claims
  Note over L: check pod_uid match<br/>and device-plugin or DRA class
  alt allowed
    L->>C: forward to P2pService
  else denied
    L-->>C: PermissionDenied (Enforce) / warn (Permissive)
  end
```

The middleware lives in `modelexpress_server/src/auth/layer.rs` as a tower `Layer`/`Service` rather than a tonic `Interceptor`. Tonic interceptors are synchronous, and every step after token extraction is `async`. The layer wraps only the `P2pService` route on the generated server (not via `Server::layer`), so the routing itself acts as the allowlist: health, registry, and model services are never wrapped.
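The `TokenReview` step in the diagram returns `ns, sa, pod_name, pod_uid`. It can be sketched as a pure function over the review status. This is a minimal sketch, not the prototype's code, and `ReviewStatus` is a hypothetical flattened type. It relies on three Kubernetes conventions:

- ServiceAccount usernames take the form `system:serviceaccount:<namespace>:<name>`.
- Pod-bound tokens surface the pod as `authentication.kubernetes.io/pod-name` and `authentication.kubernetes.io/pod-uid` in `status.user.extra`.
- `status.audiences` lists the audiences the token was accepted for.

The audience string `modelexpress` is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical flattened view of the TokenReview `status` fields this sketch reads.
#[derive(Clone)]
pub struct ReviewStatus {
    pub authenticated: bool,
    pub username: String,
    pub audiences: Vec<String>,
    pub extra: HashMap<String, Vec<String>>,
}

#[derive(Debug, PartialEq)]
pub struct CallerIdentity {
    pub namespace: String,
    pub service_account: String,
    pub pod_name: String,
    pub pod_uid: String,
}

pub const POD_NAME_KEY: &str = "authentication.kubernetes.io/pod-name";
pub const POD_UID_KEY: &str = "authentication.kubernetes.io/pod-uid";

pub fn caller_identity(status: &ReviewStatus, audience: &str) -> Result<CallerIdentity, String> {
    if !status.authenticated {
        return Err("token not authenticated".to_string());
    }
    // Reject tokens that were not accepted for this server's audience.
    if !status.audiences.iter().any(|a| a == audience) {
        return Err(format!("audience {audience} not granted"));
    }
    let rest = status
        .username
        .strip_prefix("system:serviceaccount:")
        .ok_or("not a ServiceAccount token")?;
    let (namespace, service_account) = rest
        .split_once(':')
        .ok_or("malformed ServiceAccount username")?;
    // Pod binding is mandatory: a token without these claims cannot be tied to a Pod.
    let single = |key: &str| -> Result<String, String> {
        match status.extra.get(key).map(Vec::as_slice) {
            Some([value]) => Ok(value.clone()),
            _ => Err(format!("missing {key}: token is not pod-bound")),
        }
    };
    Ok(CallerIdentity {
        namespace: namespace.to_string(),
        service_account: service_account.to_string(),
        pod_name: single(POD_NAME_KEY)?,
        pod_uid: single(POD_UID_KEY)?,
    })
}
```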

### State and caching

The request path does no Kubernetes I/O on a cache hit. A background tokio task drives a cluster-wide `kube-rs` watch into an in-memory `Store<Pod>` (and `Store<ResourceClaim>` when DRA is configured); the auth layer reads through `Arc<AuthState>` into the same store. The watch is cluster-wide because caller pods may live in any namespace; on large clusters, an optional label selector can narrow the watch to reduce load on the API server.

Each reflector carries an `Arc<AtomicBool>` healthy flag that the task clears whenever its watch stream ends. The decision function checks the flag before consulting the store and returns `Unavailable` (denied under Enforce) if it is false. Without this check, a pod deleted after the watch stopped would still be "found" by name in the stale store.
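The ordering of checks in the decision function can be sketched with std types standing in for the `kube-rs` store. This is a minimal sketch, not the prototype's code. `PodStore`, `CachedPod` and `Decision` are hypothetical names, and `has_fabric_device` is a precomputed boolean that stands in for the device-plugin/DRA check. The UID comparison matters because a positively cached token can outlive its pod, and a recreated pod with the same name gets a new UID.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical per-pod entry kept by the watch task.
pub struct CachedPod {
    pub uid: String,
    pub has_fabric_device: bool, // stand-in for the device-plugin / DRA check
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny(&'static str),
    Unavailable, // treated as a denial under Enforce
}

pub struct PodStore {
    /// Cleared by the watch task when its stream ends, set again after a resync.
    pub healthy: Arc<AtomicBool>,
    pods: RwLock<HashMap<(String, String), CachedPod>>, // (namespace, name) -> pod
}

impl PodStore {
    pub fn new() -> Self {
        // Unhealthy until the initial list has been applied.
        Self { healthy: Arc::new(AtomicBool::new(false)), pods: RwLock::new(HashMap::new()) }
    }

    pub fn upsert(&self, namespace: &str, name: &str, pod: CachedPod) {
        self.pods.write().unwrap().insert((namespace.to_string(), name.to_string()), pod);
    }

    pub fn decide(&self, namespace: &str, name: &str, token_pod_uid: &str) -> Decision {
        // A stopped watch means deletions may be missing, so refuse to answer from it.
        if !self.healthy.load(Ordering::Acquire) {
            return Decision::Unavailable;
        }
        let pods = self.pods.read().unwrap();
        match pods.get(&(namespace.to_string(), name.to_string())) {
            None => Decision::Deny("pod not found"),
            Some(p) if p.uid != token_pod_uid => Decision::Deny("pod UID mismatch"),
            Some(p) if !p.has_fabric_device => Decision::Deny("no fabric device"),
            Some(_) => Decision::Allow,
        }
    }
}
```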

Token verification is wrapped in two caches: a positive cache keyed by the SHA-256 of the bearer token (the token itself is never logged or stored) and a negative cache for definitively bad tokens. The negative cache prevents a flood of distinct invalid tokens from becoming a flood of `TokenReview` calls; transient apiserver errors are not cached.
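The two-tier caching rule can be sketched as follows: cache successes, cache definitive rejections, never cache transient failures. This is a minimal, synchronous, std-only sketch and not the prototype's code. The key function is injected as `digest`, so the example does not depend on a hashing crate; a real deployment would plug in SHA-256 (for example from the `sha2` crate). The `review` closure stands in for the actual `TokenReview` call. Eviction and a size cap are omitted. A production negative cache needs a bound, or a flood of distinct tokens simply moves the pressure from the apiserver to memory.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

#[derive(Clone, Debug, PartialEq)]
pub struct CallerIdentity {
    pub namespace: String,
    pub service_account: String,
    pub pod_name: String,
    pub pod_uid: String,
}

#[derive(Debug)]
pub enum ReviewError {
    /// The apiserver answered and rejected the token: safe to cache.
    Invalid,
    /// Timeout, 5xx, etc.: never cached, so the next request retries.
    Transient,
}

pub struct TokenCache<K> {
    digest: fn(&str) -> K, // e.g. SHA-256 of the bearer token; the raw token is never stored
    positive: HashMap<K, (CallerIdentity, Instant)>,
    negative: HashMap<K, Instant>,
    positive_ttl: Duration,
    negative_ttl: Duration,
}

impl<K: Hash + Eq> TokenCache<K> {
    pub fn new(digest: fn(&str) -> K, positive_ttl: Duration, negative_ttl: Duration) -> Self {
        Self { digest, positive: HashMap::new(), negative: HashMap::new(), positive_ttl, negative_ttl }
    }

    /// `review` performs the TokenReview; it only runs on a cache miss.
    pub fn verify<F>(&mut self, token: &str, review: F) -> Result<CallerIdentity, ReviewError>
    where
        F: FnOnce(&str) -> Result<CallerIdentity, ReviewError>,
    {
        let key = (self.digest)(token);
        let now = Instant::now();
        if let Some((identity, at)) = self.positive.get(&key) {
            if now.duration_since(*at) < self.positive_ttl {
                return Ok(identity.clone());
            }
        }
        if let Some(at) = self.negative.get(&key) {
            if now.duration_since(*at) < self.negative_ttl {
                return Err(ReviewError::Invalid);
            }
        }
        match review(token) {
            Ok(identity) => {
                self.positive.insert(key, (identity.clone(), now));
                Ok(identity)
            }
            Err(ReviewError::Invalid) => {
                self.negative.insert(key, now);
                Err(ReviewError::Invalid)
            }
            Err(ReviewError::Transient) => Err(ReviewError::Transient),
        }
    }
}
```

The positive TTL should stay well below the projected token's lifetime, so that a revoked token stops working promptly.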

### Modes and defaults

The server resolves one of three modes from `MODEL_EXPRESS_SECURITY_MODE` (or `p2pAuth.mode` in helm): `off` skips verification, `permissive` verifies and logs violations but never blocks, and `enforce` rejects every failed verification. When the setting is absent, the mode defaults to `permissive` in-cluster and `off` otherwise, so upgrades do not silently start rejecting traffic and local development sees no surprise denials.
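The mode resolution and its effect on a request can be sketched as two pure functions. This is a minimal sketch with several assumptions:

- "In-cluster" is detected through `KUBERNETES_SERVICE_HOST`, which Kubernetes injects into pods.
- An empty value is treated as unset.
- Unknown values are rejected at startup.

The prototype may handle any of these differently. `Outcome` is a hypothetical name.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Off,
    Permissive,
    Enforce,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Forward,
    ForwardWithWarning,
    Reject,
}

/// Pure resolution logic, kept separate from the environment so it is easy to test.
pub fn resolve_mode(raw: Option<&str>, in_cluster: bool) -> Result<Mode, String> {
    match raw.map(str::trim).filter(|s| !s.is_empty()) {
        // Unset: safe default for upgrades in-cluster, no checks for local development.
        None => Ok(if in_cluster { Mode::Permissive } else { Mode::Off }),
        Some("off") => Ok(Mode::Off),
        Some("permissive") => Ok(Mode::Permissive),
        Some("enforce") => Ok(Mode::Enforce),
        Some(other) => Err(format!("unknown MODEL_EXPRESS_SECURITY_MODE {other:?}")),
    }
}

pub fn mode_from_env() -> Result<Mode, String> {
    let raw = std::env::var("MODEL_EXPRESS_SECURITY_MODE").ok();
    // Assumption: presence of this variable marks an in-cluster process.
    let in_cluster = std::env::var_os("KUBERNETES_SERVICE_HOST").is_some();
    resolve_mode(raw.as_deref(), in_cluster)
}

/// What the layer does with a request, given the mode and the verification result.
/// Under `Off`, verification is skipped entirely, so `verified` is ignored.
pub fn outcome(mode: Mode, verified: bool) -> Outcome {
    match (mode, verified) {
        (Mode::Off, _) | (_, true) => Outcome::Forward,
        (Mode::Permissive, false) => Outcome::ForwardWithWarning,
        (Mode::Enforce, false) => Outcome::Reject,
    }
}
```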

## Operator and client experience

A caller pod needs one additional piece of configuration: a projected-token volume whose `audience` matches the server's. Kubernetes accepts arbitrary projected-token audiences by default, so no caller-side RBAC changes are needed. The Python client defaults its token path to `/var/run/secrets/tokens/modelexpress`, so callers following that convention set no environment variables. The interceptor re-reads the token on TTL or `mtime` change and never logs its value.
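The client's re-read policy works as follows: reuse the token until either the TTL elapses or the file's `mtime` changes, then read it again. The actual client is Python, so this Rust sketch only illustrates the policy and is not the client's code. `TokenSource` is a hypothetical name. The `Bearer` header format matches the flow diagram. The rotated file is read through `fs::metadata`, which follows symlinks.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::{Duration, Instant, SystemTime};

pub struct TokenSource {
    path: PathBuf,
    ttl: Duration,
    cached: Option<(String, SystemTime, Instant)>, // (token, file mtime, when read)
}

impl TokenSource {
    pub fn new(path: impl Into<PathBuf>, ttl: Duration) -> Self {
        Self { path: path.into(), ttl, cached: None }
    }

    /// Returns the current token, re-reading the file on TTL expiry or mtime change.
    pub fn token(&mut self) -> io::Result<String> {
        let mtime = fs::metadata(&self.path)?.modified()?;
        if let Some((token, seen_mtime, read_at)) = &self.cached {
            if *seen_mtime == mtime && read_at.elapsed() < self.ttl {
                return Ok(token.clone());
            }
        }
        let token = fs::read_to_string(&self.path)?.trim().to_string();
        self.cached = Some((token.clone(), mtime, Instant::now()));
        Ok(token)
    }

    /// Value for the gRPC `authorization` metadata entry. Never log this.
    pub fn authorization_header(&mut self) -> io::Result<String> {
        Ok(format!("Bearer {}", self.token()?))
    }
}
```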

Server-side RBAC is bounded: a ClusterRole granting `authentication.k8s.io/tokenreviews: create` plus cluster-wide `pods: list,watch` (and `resourceclaims: list,watch` under DRA). The helm chart creates this when `p2pAuth.enabled=true`, with an opt-out for operators who manage cluster-scoped RBAC out of band.

## Alternatives considered

**Service mesh for caller identity** - Mesh-mTLS proves cluster-mesh membership, not workload identity. It is also operator-configured policy enforced by a sidecar the application cannot see; a missing or misapplied `AuthorizationPolicy` fails open silently. A mesh remains useful for transport encryption and L7 telemetry, but it is not a substitute for the server checking identity itself.

**Reliance on `NetworkPolicy`** - L3/L4 filtering on pod labels or namespaces. Can restrict who reaches the P2P port but proves nothing about who the caller actually is, and says nothing about whether the caller holds a fabric device. It also depends on the CNI implementing NetworkPolicy correctly and on the policy being present in every namespace where the server runs; missing CNI support or a misapplied selector fails open with no signal. Complementary to in-server auth, not a replacement.

## Notes

The server-side gRPC transport TLS implementation is split into a separate, opt-in branch ([`weaton/tls`](https://github.com/wseaton/modelexpress/tree/weaton/tls)) so the auth change does not force a topology decision (mesh-terminated mTLS vs app-level TLS) on operators.

The Python `worker_server` (`WorkerService`) that exchanges tensor descriptors between workers is outside this control plane and listens in plaintext. Extending these auth protections to the direct P2P transfer path does not seem worthwhile: that path already requires workload trust between peers and has no central authority or coordinator. I think explicitly leaving device auth unsupported there is reasonable.

## References

The implementation is on branch [`weaton/p2p-grpc-auth`](https://github.com/wseaton/modelexpress/tree/weaton/p2p-grpc-auth) (draft PR [wseaton/modelexpress#1](https://github.com/wseaton/modelexpress/pull/1) for inline review); the optional transport TLS contribution is on [`weaton/tls`](https://github.com/wseaton/modelexpress/tree/weaton/tls). Operator documentation is in [`docs/DEPLOYMENT.md`](../DEPLOYMENT.md) under "P2P Caller Authentication" (auth branch) and "Transport TLS" (TLS branch).

[RFC] Coordinator gRPC authorization using device assertions #298