rfc-0010: gateway interceptors #1927

Open
drew wants to merge 7 commits into main from gateway-hooks

@drew drew commented Jun 16, 2026

Collaborator

Note

The RFC is open for feedback.

Summary

Adds RFC-0010 for Gateway Interceptors, a proposed gateway extension system for deployment-specific business logic around OpenShell gateway API operations.

Operators and external integrators need a flexible way to customize gateway API behavior to fit their own requirements — for example, enforcing tenancy, quotas, naming conventions, or policy authority. Today any such customization has to be hardcoded into gateway handlers or pushed into drivers, which mixes responsibilities and does not scale to deployment-specific requirements.

This RFC proposes a first-class extension system that lets external services observe, modify, validate, reject, or audit gateway operations at well-defined phases. We call these Gateway Interceptors.

Related to #1919

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Signed-off-by: Drew Newberry <anewberry@nvidia.com>

copy-pr-bot Bot commented Jun 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

drew added 2 commits June 16, 2026 00:05
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
@drew drew changed the title docs(rfc): add gateway interceptors RFC rfc-0010: gateway interceptors Jun 16, 2026
@drew drew added the rfc label Jun 16, 2026
@drew drew moved this from Todo to In progress in OpenShell Roadmap Jun 16, 2026
@ddurst-nvidia

Contributor

Acknowledging: DRAFT NOT READY FOR COMMENTS

I do have comments, whenever you're ready for them.

drew added 2 commits June 23, 2026 00:06
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
@drew drew marked this pull request as ready for review June 23, 2026 07:36
@drew drew requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners June 23, 2026 07:36
Comment thread rfc/0010-gateway-interceptors/README.md
Signed-off-by: Drew Newberry <anewberry@nvidia.com>

This RFC proposes a first-class extension system that lets external services
observe, modify, validate, reject, or audit gateway operations at well-defined
phases. We call these **Gateway Interceptors**.

Member

The model seems similar to the middleware model that @pimlock is proposing in #1733. Is there a reasonable common name that could be used for both and that signifies intent (with qualifiers for the part of the system each applies to)?

@ddurst-nvidia ddurst-nvidia left a comment

Contributor

Thanks for updating the RFC. Having a "nameable" GatewayInterceptor helps readability and removes a lot of the ambiguity I was seeing with "Interceptor."


Every interceptor service has a timeout and response size limit. Gateway API
interceptor bindings also have a maximum patch count.
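As a rough illustration of how the response size and patch count limits could be checked on the gateway side, here is a minimal Rust sketch. The type, field, and function names (`InterceptorLimits`, `max_response_bytes`, `max_patches`, `check_response`) are hypothetical and do not come from the RFC; the timeout is assumed to be enforced by the caller around the RPC itself.

```rust
use std::time::Duration;

/// Hypothetical per-service limits. Field names are illustrative, not from the RFC.
#[derive(Debug, Clone, Copy)]
pub struct InterceptorLimits {
    /// Enforced by the caller around the RPC, not by `check_response`.
    pub timeout: Duration,
    pub max_response_bytes: usize,
    pub max_patches: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LimitViolation {
    ResponseTooLarge { got: usize, max: usize },
    TooManyPatches { got: usize, max: usize },
}

/// Check a received response against the size and patch-count limits.
pub fn check_response(
    limits: &InterceptorLimits,
    response_bytes: usize,
    patch_count: usize,
) -> Result<(), LimitViolation> {
    if response_bytes > limits.max_response_bytes {
        return Err(LimitViolation::ResponseTooLarge {
            got: response_bytes,
            max: limits.max_response_bytes,
        });
    }
    if patch_count > limits.max_patches {
        return Err(LimitViolation::TooManyPatches {
            got: patch_count,
            max: limits.max_patches,
        });
    }
    Ok(())
}
```

How a violation is handled (reject the operation, or fall back to the binding's failure policy) is left open here.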

Contributor

The RFC defines endpoint configuration and per-review timeout behavior, but it does not say how review calls are bounded under load.

Since tonic clients fit Tower’s Service model, I’d like the RFC to describe the review path as a tower::Service stack and define the intended use of existing Layers for timeout, concurrency limits, buffering, load shedding, retry policy, and tracing.

That keeps the RFC focused on operational semantics while pointing implementers toward well-known, configurable, tested Tower layers rather than bespoke versions of the same behavior, which would be easy to confuse with a tonic::service::Interceptor.

If we don't want to bind the RFC to Rust or the tower::Service ecosystem (which this project already uses), that's fine too, but we should still define the intended operational semantics and suggest that implementations use the appropriate service-middleware framework for their runtime.
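To make "bounded under load" concrete, here is a minimal std-only Rust sketch of one such semantic: a cap on in-flight review calls, with excess calls shed immediately instead of queued. It is meant to show the behavior, not to replace Tower's existing `ConcurrencyLimitLayer` and `LoadShedLayer`. The names `ReviewLimiter` and `Permit` are hypothetical, and what the gateway does with a shed call is an assumption left to the binding's failure policy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Caps the number of in-flight review calls. Hypothetical name, not from the RFC.
#[derive(Clone)]
pub struct ReviewLimiter {
    in_flight: Arc<AtomicUsize>,
    max_in_flight: usize,
}

/// RAII permit: dropping it frees the slot for another review call.
pub struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl ReviewLimiter {
    pub fn new(max_in_flight: usize) -> Self {
        Self {
            in_flight: Arc::new(AtomicUsize::new(0)),
            max_in_flight,
        }
    }

    /// Returns a permit if a slot is free, or `None` so the caller can shed the call.
    pub fn try_acquire(&self) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max_in_flight {
                return None;
            }
            // Claim a slot only if no other thread changed the counter in between.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => {
                    return Some(Permit {
                        in_flight: Arc::clone(&self.in_flight),
                    })
                }
                Err(observed) => current = observed,
            }
        }
    }

    /// Number of review calls currently holding a permit.
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}
```

A caller would hold the `Permit` for the duration of the review RPC. When `try_acquire` returns `None`, the RFC would need to say whether the operation is rejected or allowed.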

Collaborator Author

We don't want to prescribe a service stack (Rust + tonic). This RFC is intended to detail the contract and semantics for hooking into gateway calls. I think it'd be perfectly fine for someone to write a gRPC service using a different stack. It just needs to conform to the proto contract.

As part of the implementation we can give some examples for building an interceptor.

Does that address this comment?


- `grpc://host:port` connects to a plaintext gRPC interceptor service over TCP.
- `grpcs://host:port` connects to a TLS-protected gRPC interceptor service over TCP.
- `unix:///path/to/socket` connects to a gRPC interceptor service over a Unix domain

Contributor

The RFC still allows interceptor endpoints as unix:///path, but it does not define how the gateway authenticates the service behind that socket.

The interceptor service is external: the gateway only dials a configured path; it does not create, bind, or own the socket. Pathname reachability is not peer identity. A squatted socket path, a writable parent directory, a stale socket, or a typo can make the wrong process the policy authority, and that process can return allowed=true. fail_closed does not help when the RPC succeeds.

If plaintext UDS remains supported, the RFC should specify the required trust checks:

  • filesystem sockets only
  • no abstract sockets
  • an operator-only socket directory that is never mounted into sandboxes (no matter the policy)
  • no symlink traversal/path substitution
  • owner/mode/type verification before connect
  • peer-credential verification where the platform supports it.

It should also call out that UID/GID permissions are not a reliable sandbox boundary for rootful Docker deployments unless user namespace remapping (userns-remap) or an equivalent isolation property is required. The enable_user_namespaces flag in OpenShell is implemented in the k8s driver only.

But the simplest approach is: Require an authenticated transport for interceptor services, such as TLS/mTLS, so the gateway authenticates the interceptor by cryptographic identity rather than by pathname access.
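For reference, a few of the pre-connect checks listed above (socket type, no symlink at the socket path, owner, and parent-directory mode) can be sketched with the Rust standard library on Unix. This is a minimal sketch under stated assumptions: `check_socket_path` and `expected_uid` are hypothetical names, only the final path component and its immediate parent are inspected, and there is still a window between the check and the connect. Peer-credential verification is omitted because it needs platform-specific calls.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{FileTypeExt, MetadataExt};
use std::path::Path;

fn deny(msg: &'static str) -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::PermissionDenied, msg))
}

/// Pre-connect checks on a Unix socket path. Hypothetical helper, not from the RFC.
pub fn check_socket_path(path: &Path, expected_uid: u32) -> io::Result<()> {
    // Requiring a filesystem path also rules out abstract sockets.
    if !path.is_absolute() {
        return deny("socket path must be absolute");
    }
    // symlink_metadata does not follow a symlink at the final component,
    // so a symlink here is reported as a symlink and rejected below.
    let meta = fs::symlink_metadata(path)?;
    if !meta.file_type().is_socket() {
        return deny("path is not a socket");
    }
    if meta.uid() != expected_uid {
        return deny("socket is not owned by the expected user");
    }
    let parent = match path.parent() {
        Some(p) => p,
        None => return deny("socket path has no parent directory"),
    };
    let dir = fs::symlink_metadata(parent)?;
    if !dir.file_type().is_dir() {
        return deny("parent is not a directory");
    }
    // Reject directories that are group- or world-writable, or owned by someone else.
    if dir.uid() != expected_uid || (dir.mode() & 0o022) != 0 {
        return deny("socket directory is not operator-only");
    }
    Ok(())
}
```

Even with these checks, file ownership only identifies the peer if UIDs are a meaningful boundary in the deployment, which is the concern with rootful Docker raised above.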

Collaborator Author

Below on 219 I start to address some of this:

> Remote gRPC interceptors require authentication. The exact approach is out of scope for this RFC, but the implementation should support mTLS and bearer-token authentication.

I don't understand this feedback either:

> It should also call out that UID/GID permissions are not a reliable sandbox boundary for rootful Docker deployments

This seems unrelated, but perhaps I'm missing a nuance.

Projects

Status: In progress

3 participants