rfc-0010: gateway interceptors #1927
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
Acknowledging: I do have comments, whenever you're ready for them.
This RFC proposes a first-class extension system that lets external services
observe, modify, validate, reject, or audit gateway operations at well-defined
phases. We call these **Gateway Interceptors**.
Every interceptor service has a timeout and response size limit. Gateway API
interceptor bindings also have a maximum patch count.
The RFC defines endpoint configuration and per-review timeout behavior, but it does not say how review calls are bounded under load.
Since tonic clients fit Tower’s Service model, I’d like the RFC to describe the review path as a tower::Service stack and define the intended use of existing Layers for timeout, concurrency limits, buffering, load shedding, retry policy, and tracing.
That keeps the RFC focused on operational semantics while pointing implementers toward well-known, configurable, tested Tower layers rather than bespoke versions of the same behavior, which would be easy to confuse with a tonic::service::Interceptor.
If we don't want to bind the RFC to Rust or the tower::Service ecosystem (which this project already uses), that's fine too, but we should still define the intended operational semantics and suggest that implementations use the appropriate service-middleware framework for their runtime.
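To make "bounded under load" concrete without committing to a framework, here is a minimal std-only sketch of the semantics I have in mind: a fixed cap on in-flight review calls, with immediate shedding instead of queueing once the cap is reached. `ReviewLimiter`, `Permit`, and `max_in_flight` are hypothetical names, not anything the RFC defines. In a Tower-based gateway the same behavior would presumably come from existing layers such as `ConcurrencyLimitLayer` and `LoadShedLayer` rather than hand-written code like this.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical cap on concurrent review calls to one interceptor service.
#[derive(Clone)]
pub struct ReviewLimiter {
    in_flight: Arc<AtomicUsize>,
    max_in_flight: usize,
}

/// Held for the duration of one review call; frees its slot when dropped.
pub struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl ReviewLimiter {
    pub fn new(max_in_flight: usize) -> Self {
        Self {
            in_flight: Arc::new(AtomicUsize::new(0)),
            max_in_flight,
        }
    }

    /// Returns a permit if a slot is free, or `None` so the caller can shed
    /// the review call right away instead of queueing it.
    pub fn try_acquire(&self) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max_in_flight {
                return None;
            }
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => {
                    return Some(Permit {
                        in_flight: Arc::clone(&self.in_flight),
                    })
                }
                // Another thread changed the counter; retry with the new value.
                Err(observed) => current = observed,
            }
        }
    }

    /// Number of review calls currently holding a permit.
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

What the gateway does with a shed call (treat it like a timeout and apply the binding's failure policy, or surface a distinct error) is exactly the kind of semantics I would like the RFC to state.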
We don't want to prescribe a service stack (Rust + tonic). This RFC is intended to detail the contract and semantics for hooking into gateway calls. It would be perfectly fine for someone to write a gRPC interceptor service on a different stack; it just needs to conform to the proto contract.
As part of the implementation we can give some examples for building an interceptor.
Does that address this comment?
- `grpc://host:port` connects to a plaintext gRPC interceptor service over TCP.
- `grpcs://host:port` connects to a TLS-protected gRPC interceptor service over TCP.
- `unix:///path/to/socket` connects to a gRPC interceptor service over a Unix domain
The RFC still allows interceptor endpoints as unix:///path, but it does not define how the gateway authenticates the service behind that socket.
The interceptor service is external: the gateway only dials a configured path; it does not create, bind, or own the socket. Pathname reachability is not peer identity. A squatted socket path, a writable parent directory, a stale socket, or a typo can make the wrong process the policy authority, and that process can return allowed=true. fail_closed does not help when the RPC succeeds.
If plaintext UDS remains supported, the RFC should specify the required trust checks:
- filesystem sockets only
- no abstract sockets
- an operator-only socket directory that is never mounted into sandboxes (no matter the policy)
- no symlink traversal/path substitution
- owner/mode/type verification before connect
- peer-credential verification where the platform supports it.
It should also call out that UID/GID permissions are not a reliable sandbox boundary for rootful Docker deployments unless user namespace remapping (userns-remap) or an equivalent isolation property is required. The enable_user_namespaces flag in OpenShell is implemented in the k8s driver only.
The simplest approach, though, is to require an authenticated transport for interceptor services, such as TLS/mTLS, so the gateway authenticates the interceptor by cryptographic identity rather than by pathname access.
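If plaintext UDS stays, the owner/mode/type check from the list above could look roughly like the following. This is a minimal sketch, not a complete defense: `verify_socket_path` and `expected_uid` are hypothetical names, the 0o077 mode mask is my assumption about what "operator-only" means, and the check is inherently racy between the stat and the connect. It also covers only the final path component, so parent-directory checks and peer-credential verification would still be needed on top of it.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{FileTypeExt, MetadataExt};
use std::path::Path;

/// Hypothetical pre-connect check for a `unix:///path` interceptor endpoint.
pub fn verify_socket_path(path: &Path, expected_uid: u32) -> io::Result<()> {
    // symlink_metadata does not follow a symlink in the final component,
    // so a symlink at `path` reports as a symlink and fails the type check.
    let meta = fs::symlink_metadata(path)?;

    if !meta.file_type().is_socket() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "endpoint path is not a Unix socket",
        ));
    }

    if meta.uid() != expected_uid {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "socket is not owned by the expected user",
        ));
    }

    // Assumption: any group or other permission bit is treated as too open.
    if meta.mode() & 0o077 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "socket is accessible to group or other",
        ));
    }

    Ok(())
}
```

Even with all of these checks in place, the UID comparison only means something if sandboxed workloads cannot run as that UID on the host, which is why the rootful Docker caveat matters.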
Below on 219 I start to address some of this:
"Remote gRPC interceptors require authentication. The exact approach is out of scope for this RFC, but the implementation should support mTLS and bearer-token authentication."
I don't understand this feedback either:
"It should also call out that UID/GID permissions are not a reliable sandbox boundary for rootful Docker deployments"
This seems unrelated, but perhaps I'm missing a nuance.
Note
The RFC is open for feedback.
Summary
Adds RFC-0010 for Gateway Interceptors, a proposed gateway extension system for deployment-specific business logic around OpenShell gateway API operations.
Operators and external integrators need a flexible way to customize gateway API behavior to fit their own requirements — for example, enforcing tenancy, quotas, naming conventions, or policy authority. Today any such customization has to be hardcoded into gateway handlers or pushed into drivers, which mixes responsibilities and does not scale to deployment-specific requirements.
This RFC proposes a first-class extension system that lets external services observe, modify, validate, reject, or audit gateway operations at well-defined phases. We call these Gateway Interceptors.
Related to #1919