feat(ext-proc): add standalone full-duplex exchange core #627

Merged: shaneutt merged 4 commits into praxis-proxy:main from nerdalert:brent-ext-proc-full-duplex-core on Jun 22, 2026

@nerdalert (Member)

Summary

This PR adds the standalone ext_proc duplex exchange core needed for the upcoming generic full-duplex request-routing integration.

It introduces ExtProcExchange, a request-scoped transport/state-machine abstraction around one persistent bidirectional gRPC ExternalProcessor.Process stream. The exchange can send processor requests and receive processor responses independently, which is required for processors that defer their routing decision until after body EOS.

This PR does not wire the exchange into ExtProcFilter behavior yet. Full-duplex config acceptance and request-routing integration are deferred to the follow-up PR.

This PR is the internal async state machine for one persistent ext_proc Process stream. It does not expose a new user-facing feature yet; it provides the concurrency, ordering, timeout, and close semantics that the request-routing integration uses in the next PR.

The working llm-d PoC integration demo, PR stack details, code walkthrough, and architecture notes live here:

https://github.com/nerdalert/praxis-research-spikes/tree/main/demo/llm-d-track-b

For a detailed walkthrough of the research spike and the alternatives considered, see the link below. (This should probably be preserved somewhere more durable, such as an upstream discussion or a spike doc.)

https://github.com/nerdalert/praxis-research-spikes/blob/main/demo/llm-d-track-b/code-walkthrough.md#full-duplex-async-performance-model

What Changed

1. Standalone Duplex Exchange Core

Adds filter/ext-proc/src/duplex.rs with a standalone ExtProcExchange.

The exchange owns one bidirectional ExternalProcessor.Process stream and exposes a small internal API:

ExtProcExchange::open(channel, &config)
ExtProcExchange::send(request_variant)
ExtProcExchange::receive()
ExtProcExchange::finish_sending()
ExtProcExchange::drain_trailing()

send() and receive() are intentionally independent. A caller can send headers, body chunks, and body EOS before waiting for the processor response. That avoids the deadlock pattern where a processor waits for the full body before sending a header response.
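A minimal sketch of the ordering this enables, using only the standard library. The Request and Response types, the mock processor thread, and the function names are hypothetical stand-ins and not the ExtProcExchange API; the thread only models the remote processor, since the real exchange is async and spawns nothing.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-ins for ext_proc request messages.
#[derive(Debug, PartialEq)]
enum Request {
    Headers,
    Body { chunk: Vec<u8>, end_of_stream: bool },
}

/// Hypothetical stand-in for a processor response.
#[derive(Debug, PartialEq)]
enum Response {
    RoutingDecision(String),
}

/// Mock deferred processor: it stays silent until it has seen body EOS.
fn spawn_deferred_processor(
    requests: mpsc::Receiver<Request>,
    responses: mpsc::Sender<Response>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut bytes = 0usize;
        for request in requests {
            if let Request::Body { chunk, end_of_stream } = request {
                bytes += chunk.len();
                if end_of_stream {
                    let decision = format!("routed after {} body bytes", bytes);
                    let _ = responses.send(Response::RoutingDecision(decision));
                    return;
                }
            }
        }
    })
}

/// Sends headers, body, and EOS first, and only then waits for a response.
fn send_all_then_receive() -> Response {
    let (request_tx, request_rx) = mpsc::sync_channel(1);
    let (response_tx, response_rx) = mpsc::channel();
    let processor = spawn_deferred_processor(request_rx, response_tx);

    request_tx.send(Request::Headers).unwrap();
    request_tx
        .send(Request::Body { chunk: b"hello".to_vec(), end_of_stream: false })
        .unwrap();
    request_tx
        .send(Request::Body { chunk: b" world".to_vec(), end_of_stream: true })
        .unwrap();

    let response = response_rx.recv().unwrap();
    processor.join().unwrap();
    response
}

fn main() {
    println!("{:?}", send_all_then_receive());
}
```

A serialized send-then-receive caller would block waiting for a header response right after sending Request::Headers, while this mock processor would still be waiting for body EOS.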

PR2 keeps this as a testable transport core only. It does not modify ExtProcFilter to use the exchange in production request routing yet.

2. Single-Owner Process Driver

The exchange constructs the tonic Process future synchronously during open() without polling it immediately.

The response stream is resolved lazily when send() or receive() drives the pending future. This avoids forcing an early server response during stream setup, which would deadlock with processors that wait for request body data before responding.

The design avoids production per-request worker machinery:

  • no tokio::spawn in the exchange core
  • no oneshot response delivery
  • no unbounded queues
  • no Arc<Mutex<_>> to force thread-safety
  • no unsafe impl Sync

The pending tonic future is single-owned and wrapped with SyncWrapper so the type can later live inside request-scoped filter state without adding a lock.
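The "construct now, poll later" idea can be sketched with a plain boxed future and a no-op waker. LazyExchange, drive(), and the string standing in for the response stream are assumptions made for illustration; the actual code holds a tonic Process future and wraps it in SyncWrapper, which is not reproduced here.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// No-op waker so the sketch can poll without an async runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical exchange that single-owns a not-yet-polled "open stream" future.
struct LazyExchange {
    pending: Option<Pin<Box<dyn Future<Output = &'static str> + Send>>>,
    stream: Option<&'static str>,
}

impl LazyExchange {
    /// Synchronous: builds the future but never polls it.
    fn open(polls: Arc<AtomicUsize>) -> Self {
        let future = async move {
            // Runs only once the future is first polled.
            polls.fetch_add(1, Ordering::SeqCst);
            "response-stream"
        };
        Self { pending: Some(Box::pin(future)), stream: None }
    }

    /// Polls the pending future once, if any, and caches the resolved stream.
    /// In the design above this happens inside send() or receive().
    fn drive(&mut self) -> Option<&'static str> {
        let polled = match self.pending.as_mut() {
            Some(future) => {
                let waker = Waker::from(Arc::new(NoopWake));
                let mut cx = Context::from_waker(&waker);
                let result = future.as_mut().poll(&mut cx);
                result
            }
            None => return self.stream,
        };
        if let Poll::Ready(stream) = polled {
            self.stream = Some(stream);
            self.pending = None;
        }
        self.stream
    }
}

fn main() {
    let polls = Arc::new(AtomicUsize::new(0));
    let mut exchange = LazyExchange::open(polls.clone());
    println!("polls after open: {}", polls.load(Ordering::SeqCst));
    println!("stream: {:?}", exchange.drive());
    println!("polls after drive: {}", polls.load(Ordering::SeqCst));
}
```

Because the future is a plain owned field, no lock or spawned task is involved. A boxed `dyn Future + Send` is not `Sync` on its own, which is presumably the gap SyncWrapper fills.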

3. Bounded Backpressure

Outbound processor requests flow through a bounded mpsc channel with capacity 1.

That keeps memory usage bounded and prevents a request from getting arbitrarily far ahead of the gRPC stream. send() reserves channel capacity before committing state, so backpressure is part of the transactional send path.

While the gRPC stream is still bootstrapping, send() uses tokio::select! to make progress on both:

  • channel reservation
  • pending Process future resolution

This lets the exchange avoid startup deadlocks without spawning a separate exchange task.
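To illustrate the capacity-1 bound only, here is a sketch using the standard library's sync_channel as a stand-in for the bounded tokio mpsc channel. The function name is hypothetical, and the tokio::select! bootstrapping path is not modeled.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns (first send accepted, second send rejected as full, send accepted after drain).
fn capacity_one_backpressure() -> (bool, bool, bool) {
    let (tx, rx) = sync_channel::<&'static str>(1);

    // The single slot is free, so the first message is accepted.
    let first_accepted = tx.try_send("headers").is_ok();

    // The slot is occupied until the consumer (the gRPC stream) takes the message.
    let second_full = matches!(tx.try_send("body"), Err(TrySendError::Full(_)));

    // Once the consumer drains the slot, the producer can make progress again.
    let _ = rx.recv();
    let third_accepted = tx.try_send("body").is_ok();

    (first_accepted, second_full, third_accepted)
}

fn main() {
    println!("{:?}", capacity_one_backpressure());
}
```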

4. Transactional Send State

send() follows a strict commit sequence:

  1. compute the send transition without mutation
  2. reserve bounded channel capacity
  3. compute the message deadline, if the message creates an active processing state
  4. send the message into the channel
  5. update exchange state with no await between commit steps

This ensures cancellation before message commit leaves the exchange state unchanged. It also ensures message timeouts start at the send commit boundary, not at the later receive call.

The exchange tracks outbound send state separately for request and response directions, including whether body data was ever committed. That lets response validation distinguish solicited processor output from unsolicited or wrong-phase output.
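The commit sequence in this section can be sketched as follows. All type and function names are hypothetical. One deliberate simplification: std's try_send reserves and sends in a single call, so steps 2 and 4 collapse into one and the deadline is computed just before it, whereas the real code reserves capacity first.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum SendState { Idle, HeadersSent, BodyStreaming, BodyDone }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Message { Headers, BodyChunk { end_of_stream: bool } }

#[derive(Debug, PartialEq)]
enum SendError { InvalidOrder, WouldBlock, Closed, DeadlineOverflow }

struct Exchange {
    state: SendState,
    deadline: Option<Instant>,
    tx: SyncSender<Message>,
    message_timeout: Duration,
}

/// Step 1: a pure transition function; it never touches exchange state.
fn next_state(state: SendState, message: Message) -> Result<SendState, SendError> {
    match (state, message) {
        (SendState::Idle, Message::Headers) => Ok(SendState::HeadersSent),
        (SendState::HeadersSent | SendState::BodyStreaming, Message::BodyChunk { end_of_stream }) => {
            Ok(if end_of_stream { SendState::BodyDone } else { SendState::BodyStreaming })
        }
        _ => Err(SendError::InvalidOrder),
    }
}

impl Exchange {
    fn send(&mut self, message: Message) -> Result<(), SendError> {
        // 1. Compute the transition without mutation.
        let next = next_state(self.state, message)?;
        // 3. Compute the deadline; overflow is an error, not a panic.
        let deadline = Instant::now()
            .checked_add(self.message_timeout)
            .ok_or(SendError::DeadlineOverflow)?;
        // 2 + 4. Reserve capacity and send (one call in this std-only sketch).
        self.tx.try_send(message).map_err(|err| match err {
            TrySendError::Full(_) => SendError::WouldBlock,
            TrySendError::Disconnected(_) => SendError::Closed,
        })?;
        // 5. Commit state only after the message is in the channel.
        self.state = next;
        self.deadline = Some(deadline);
        Ok(())
    }
}

fn main() {
    let (tx, _rx) = sync_channel(1);
    let mut exchange = Exchange {
        state: SendState::Idle,
        deadline: None,
        tx,
        message_timeout: Duration::from_secs(1),
    };
    println!("{:?}", exchange.send(Message::Headers));
    println!("{:?} (state stays {:?})", exchange.send(Message::BodyChunk { end_of_stream: true }), exchange.state);
}
```

Every early return happens before step 5, so a failed or abandoned send leaves both state and deadline untouched.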

5. Active Processing State and Output Validation

For non-full-duplex modes, each sent message creates an ActiveProcessingState with:

  expected response type
  deadline
  override-consumed flag

Only the exact expected response consumes the active state. Wrong-direction or wrong-type responses are rejected instead of being interpreted as valid output.

For full-duplex body modes, the exchange permits independent send/receive progress without per-message active processing deadlines.

Processor output is also validated transactionally. The exchange validates output phase changes on a local copy first and commits the new phase only after all checks pass. A rejected response cannot corrupt output history.
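A rough sketch of "validate on a copy, commit only on success", restricted to the request direction. The phase ordering rule, the error variants, and all names are assumptions for illustration; the deadline and override fields of the active state are omitted.

```rust
/// Hypothetical output phases, ordered by declaration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum OutputPhase { Start, RequestHeaders, RequestBody, RequestTrailers }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ResponseKind { RequestHeaders, RequestBody, RequestTrailers, ResponseHeaders }

/// Only the expected response type is modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ActiveProcessingState { expected: ResponseKind }

#[derive(Debug, PartialEq)]
enum ValidationError { NothingExpected, UnexpectedKind, PhaseRegression }

struct OutputValidator {
    phase: OutputPhase,
    active: Option<ActiveProcessingState>,
}

/// Pure check: computes the next phase from a copy of the current one.
fn advance(current: OutputPhase, kind: ResponseKind) -> Result<OutputPhase, ValidationError> {
    let target = match kind {
        ResponseKind::RequestHeaders => OutputPhase::RequestHeaders,
        ResponseKind::RequestBody => OutputPhase::RequestBody,
        ResponseKind::RequestTrailers => OutputPhase::RequestTrailers,
        // Wrong direction for this request-side sketch.
        ResponseKind::ResponseHeaders => return Err(ValidationError::UnexpectedKind),
    };
    if target <= current {
        return Err(ValidationError::PhaseRegression);
    }
    Ok(target)
}

impl OutputValidator {
    fn accept(&mut self, kind: ResponseKind) -> Result<(), ValidationError> {
        let active = self.active.ok_or(ValidationError::NothingExpected)?;
        if active.expected != kind {
            return Err(ValidationError::UnexpectedKind);
        }
        // All checks run against local values; self is untouched on error.
        let next_phase = advance(self.phase, kind)?;
        // Commit point: the exact expected response consumes the active state.
        self.phase = next_phase;
        self.active = None;
        Ok(())
    }
}

fn main() {
    let mut validator = OutputValidator {
        phase: OutputPhase::Start,
        active: Some(ActiveProcessingState { expected: ResponseKind::RequestHeaders }),
    };
    println!("{:?}", validator.accept(ResponseKind::ResponseHeaders));
    println!("{:?}", validator.accept(ResponseKind::RequestHeaders));
    println!("{:?}", validator.phase);
}
```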

6. Timeout and Override Handling

Message timeout policy is internal to the exchange. Callers do not decide whether a receive is timed or untimed.

For non-full-duplex active processing states, the exchange stores an absolute deadline when the message is committed. receive() uses that stored deadline.

override_message_timeout envelopes are handled before response classification:

  • override envelopes are consumed
  • populated response data inside an override envelope is ignored
  • invalid override durations are consumed and ignored
  • repeated overrides are ignored
  • override durations are clamped to max_message_timeout
  • negative, out-of-range, zero, and sub-millisecond durations are rejected
  • deadline overflow returns an error instead of panicking
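The duration rules above could look roughly like this. The function names are hypothetical, the (seconds, nanos) pair mirrors the protobuf Duration shape, and treating "out-of-range" as the protobuf Duration limits is an assumption; the PR does not spell out the exact bounds.

```rust
use std::time::{Duration, Instant};

/// Upper bound on protobuf Duration seconds (assumed meaning of "out-of-range").
const MAX_PROTO_SECONDS: i64 = 315_576_000_000;
const MAX_PROTO_NANOS: i32 = 999_999_999;

/// Returns the effective override, or None when the override must be rejected.
fn effective_override(seconds: i64, nanos: i32, max_message_timeout: Duration) -> Option<Duration> {
    // Negative components are rejected.
    if seconds < 0 || nanos < 0 {
        return None;
    }
    // Out-of-range components are rejected.
    if seconds > MAX_PROTO_SECONDS || nanos > MAX_PROTO_NANOS {
        return None;
    }
    let requested = Duration::new(seconds as u64, nanos as u32);
    // Zero and sub-millisecond durations are rejected.
    if requested < Duration::from_millis(1) {
        return None;
    }
    // Valid overrides are clamped to the configured maximum.
    Some(requested.min(max_message_timeout))
}

/// Deadline arithmetic that reports overflow instead of panicking.
fn deadline_from(start: Instant, timeout: Duration) -> Option<Instant> {
    start.checked_add(timeout)
}

fn main() {
    let max = Duration::from_secs(5);
    println!("{:?}", effective_override(2, 0, max));
    println!("{:?}", effective_override(60, 0, max));
    println!("{:?}", effective_override(0, 500_000, max));
    println!("{:?}", deadline_from(Instant::now(), Duration::MAX).is_none());
}
```

Returning None here corresponds to the "consumed and ignored" behavior: the envelope is dropped and the stored deadline stays as it was.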

open_timeout is intentionally not part of this PR. open() is synchronous and only constructs the pending Process future.

7. Typed Exchange Events

Processor responses are classified into typed exchange events:

  RequestHeaders
  RequestBody
  RequestTrailers
  ResponseHeaders
  ResponseBody
  ResponseTrailers
  Immediate

Each event preserves the original proto response payload and any dynamic_metadata from the processor envelope.

ImmediateResponse is terminal after a soliciting send. Once terminal, later send/receive calls return Closed.
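As an illustration of classification plus the terminal rule, here is a reduced sketch with three of the seven event kinds. Payload, EventKind, ExchangeEvent, and Classifier are hypothetical shapes, and the string-keyed metadata map stands in for the proto dynamic_metadata struct.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of processor response payloads.
#[derive(Debug, Clone, PartialEq)]
enum Payload {
    RequestHeaders(String),
    RequestBody(Vec<u8>),
    Immediate { status: u16 },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum EventKind { RequestHeaders, RequestBody, Immediate }

/// A typed event keeps the original payload and the envelope metadata.
#[derive(Debug, Clone, PartialEq)]
struct ExchangeEvent {
    kind: EventKind,
    payload: Payload,
    dynamic_metadata: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum ExchangeError { Closed }

struct Classifier { terminal: bool }

impl Classifier {
    fn classify(
        &mut self,
        payload: Payload,
        dynamic_metadata: BTreeMap<String, String>,
    ) -> Result<ExchangeEvent, ExchangeError> {
        // Once an immediate response was delivered, the exchange is closed.
        if self.terminal {
            return Err(ExchangeError::Closed);
        }
        let kind = match &payload {
            Payload::RequestHeaders(_) => EventKind::RequestHeaders,
            Payload::RequestBody(_) => EventKind::RequestBody,
            Payload::Immediate { .. } => EventKind::Immediate,
        };
        if kind == EventKind::Immediate {
            self.terminal = true;
        }
        Ok(ExchangeEvent { kind, payload, dynamic_metadata })
    }
}

fn main() {
    let mut classifier = Classifier { terminal: false };
    let mut metadata = BTreeMap::new();
    metadata.insert("route".to_string(), "pod-a".to_string());
    println!("{:?}", classifier.classify(Payload::RequestHeaders("x-route: pod-a".to_string()), metadata));
    println!("{:?}", classifier.classify(Payload::Immediate { status: 429 }, BTreeMap::new()));
    println!("{:?}", classifier.classify(Payload::RequestBody(vec![1]), BTreeMap::new()));
}
```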

8. Clean Close Support

finish_sending() half-closes the outbound processor request stream by dropping the sender. This does not immediately close the response side; callers may still receive buffered processor output.

drain_trailing() can consume remaining response stream messages after a successful receive path. This gives follow-up integration code a way to cleanly drain the gRPC stream before dropping the exchange.
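The half-close behavior can be modeled with two std channels: dropping the request sender ends the processor's inbound stream while responses keep flowing. The Exchange type below and its mock processor thread are illustrative only and do not reflect the real method signatures.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

struct Exchange {
    /// None once finish_sending() has half-closed the request side.
    request_tx: Option<Sender<&'static str>>,
    response_rx: Receiver<String>,
}

impl Exchange {
    /// Opens the exchange against a mock processor that responds only after
    /// it has observed the end of the request stream.
    fn open() -> (Self, thread::JoinHandle<()>) {
        let (request_tx, request_rx) = channel::<&'static str>();
        let (response_tx, response_rx) = channel::<String>();
        let processor = thread::spawn(move || {
            // iter() ends when every request sender has been dropped.
            let seen: Vec<&'static str> = request_rx.iter().collect();
            for request in seen {
                let _ = response_tx.send(format!("trailing:{}", request));
            }
        });
        (Self { request_tx: Some(request_tx), response_rx }, processor)
    }

    fn send(&mut self, message: &'static str) -> Result<(), &'static str> {
        match &self.request_tx {
            Some(tx) => tx.send(message).map_err(|_| "closed"),
            None => Err("send after finish"),
        }
    }

    /// Half-close: drop the sender, keep the response side open.
    fn finish_sending(&mut self) {
        self.request_tx = None;
    }

    /// Consume whatever the processor still emits until it closes its side.
    fn drain_trailing(&mut self) -> Vec<String> {
        self.response_rx.iter().collect()
    }
}

fn half_close_then_drain() -> (Result<(), &'static str>, Vec<String>) {
    let (mut exchange, processor) = Exchange::open();
    exchange.send("headers").unwrap();
    exchange.send("body").unwrap();
    exchange.finish_sending();
    let late_send = exchange.send("late");
    let trailing = exchange.drain_trailing();
    processor.join().unwrap();
    (late_send, trailing)
}

fn main() {
    println!("{:?}", half_close_then_drain());
}
```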

9. Test Coverage

This PR adds extensive exchange-level coverage with a mock duplex processor.

Coverage includes:

  • first message includes ProtocolConfiguration
  • later messages omit ProtocolConfiguration
  • request headers and body round trips
  • multiple sends before any receive
  • delayed routing without deadlock
  • response headers/body/trailers classification
  • streamed body chunk classification
  • immediate responses
  • empty stream and transport errors
  • send after finish
  • receive after finish
  • cancellation/drop cleanup
  • concurrent exchanges without crosstalk
  • Send + Sync compatibility
  • terminal state after timeout
  • body mode gating
  • exact active-state matching
  • unsolicited/wrong-phase response rejection
  • rejected response does not advance output phase
  • dynamic metadata preservation
  • override timeout acceptance, clamping, rejection, and repeated override behavior
  • deadline overflow handling
  • deadline starts after send commit
  • bounded-channel cancellation behavior
  • pending Process driver behavior on current-thread runtime

PR Composition

This PR has a large test line count because the exchange core handles async stream ordering, cancellation, timeouts, and close behavior, all of which need behavioral coverage.

| Area | Lines | Share |
|---|---|---|
| Production code | 882 | 18.2% |
| Test code | 3,959 | 81.8% |
| Config/build | 7 | <1% |
| Total | 4,848 | 100% |

Most of the diff is mock processor infrastructure and behavioral coverage for duplex ordering, cancellation, timeout, override, metadata, and close semantics. The production change is limited to the standalone exchange core plus minimal module/dependency wiring.

PR Stack Context

| PR | Title | Status | Scope |
|---|---|---|---|
| 1 | Request-Scoped State and Pipeline Pinning | Earlier PR | Typed per-filter state on HttpFilterContext, stable filter identity, pipeline pinning across hot reload |
| 2 | Duplex Exchange Core | This PR | Standalone ExtProcExchange transport core for one persistent bidirectional gRPC ExternalProcessor.Process stream per request |
| 3 | Request-Routing Integration | Follow-up | Wires full-duplex ext_proc into the filter lifecycle for the llm-d Go EPP request-routing path using generic ext_proc + endpoint_selector |

Explicitly Out of Scope

This PR intentionally does not include:

  • endpoint_selector
  • request-routing integration
  • accepting full_duplex_streamed config in ExtProcFilter
  • pre-read mutation provenance
  • structured metadata lifecycle on HttpFilterContext
  • server registration changes
  • e2e smoke scripts
  • benchmark scripts or numbers
  • Go EPP demo updates

Why This Is Needed

The upcoming generic full-duplex ext_proc integration needs one persistent processor stream per HTTP request.

Without this exchange core, the filter would either have to serialize send/receive behavior or hide stream management inside ad hoc integration code. That would make deferred processors, body streaming, timeout handling, cancellation, and response ordering much harder to reason about.

This PR keeps the transport foundation isolated and testable before wiring it into request routing.

Design Notes

The exchange design intentionally uses a state machine plus a single-owner async driver instead of a per-request worker task, a locked shared stream, or a serialized send-then-receive model. That keeps protocol correctness explicit, avoids lock/task overhead on the hot path, preserves bounded backpressure, and prevents deadlocks with processors that wait for body EOS before responding. The research spike has more detail on why the state-machine/async-driver design was chosen over simpler-looking options.

Validation

  • cargo test -p praxis-proxy-ext-proc
  • cargo test -p praxis-proxy-ext-proc -- --test-threads=1
  • cargo test -p praxis-proxy-filter
  • cargo test -p praxis-proxy-protocol
  • cargo clippy -p praxis-proxy-ext-proc --all-targets -- -D warnings
  • cargo clippy -p praxis-proxy-filter --all-targets -- -D warnings
  • cargo clippy -p praxis-proxy-protocol --all-targets -- -D warnings
  • cargo +nightly fmt -p praxis-proxy-ext-proc -p praxis-proxy-filter -p praxis-proxy-protocol -- --check
  • cargo doc -p praxis-proxy-ext-proc --no-deps --document-private-items
  • cargo doc -p praxis-proxy-filter --no-deps --document-private-items
  • cargo doc -p praxis-proxy-protocol --no-deps --document-private-items
  • git diff --check

@nerdalert nerdalert requested a review from a team June 18, 2026 04:06
@nerdalert nerdalert requested review from shaneutt and twghu as code owners June 18, 2026 04:06
@praxis-bot-app

PR too large: 5358 lines added (limit: 750, excludes Cargo files, tests, docs, examples, and benchmarks). Please split into smaller PRs. Add skip/pr-hygiene label to override.

@nerdalert nerdalert force-pushed the brent-ext-proc-full-duplex-core branch 2 times, most recently from 068ad2f to b134e2a June 18, 2026 04:34

@praxis-bot (Collaborator) left a comment

PR Review

Summary: Adds a standalone ExtProcExchange duplex state machine for persistent bidirectional gRPC ext_proc streams, to be wired into the filter lifecycle in a follow-up PR.

Overall: Well-designed state machine with strong test coverage (82% test code). The transactional send/receive model, output validation, and override handling are thorough. A few correctness issues to address.

| Severity | Count |
|---|---|
| Large | 2 |
| Medium | 3 |

Findings without inline placement

None -- all findings placed inline.

Comment thread filter/ext-proc/src/duplex.rs
Comment thread filter/ext-proc/src/duplex.rs Outdated
Comment thread filter/ext-proc/src/duplex.rs
Comment thread filter/ext-proc/src/duplex.rs
Comment thread filter/ext-proc/src/duplex.rs Outdated
@nerdalert nerdalert force-pushed the brent-ext-proc-full-duplex-core branch from b134e2a to d03cdff June 19, 2026 03:32
Adds the standalone duplex exchange core for persistent bidirectional ExternalProcessor.Process streams.

The exchange owns a single Process stream, uses capacity-1 backpressure, validates request/response ordering through explicit state, and keeps send/receive independent so deferred processors can receive body EOS before responding.

This is intentionally not wired into ExtProcFilter request routing yet; that integration remains follow-up scope.

Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
@nerdalert nerdalert force-pushed the brent-ext-proc-full-duplex-core branch from d03cdff to d58a913 June 19, 2026 03:45
@shaneutt shaneutt self-assigned this Jun 19, 2026
@shaneutt shaneutt added the skip/pr-conventions Skip conventions checks for PRs label Jun 19, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in AI Gateway Jun 19, 2026
@shaneutt shaneutt moved this from Backlog to Review in AI Gateway Jun 19, 2026
@shaneutt shaneutt added this to the v0.4.0 milestone Jun 19, 2026
@shaneutt shaneutt merged commit ec30d4b into praxis-proxy:main Jun 22, 2026
16 checks passed
@github-project-automation github-project-automation Bot moved this from Review to Done in AI Gateway Jun 22, 2026
Labels

skip/pr-conventions Skip conventions checks for PRs

Projects

Status: Done

3 participants