feat(ext-proc): integrate full-duplex request routing #707

Draft
nerdalert wants to merge 1 commit into praxis-proxy:main from nerdalert:brent-ext-proc-full-duplex-routing

@nerdalert nerdalert commented Jun 24, 2026

Summary

This PR lets Praxis ask an external processor where a request should go, wait for the answer after the request body is complete, and then safely route the request to that selected backend.

This PR wires the standalone full-duplex ext_proc exchange into the Praxis request lifecycle for generic request-routing use cases.

It enables Praxis to speak Envoy's ExternalProcessor.Process protocol with one full-duplex gRPC stream per HTTP request. Praxis can send request headers and body chunks to an external processor, wait for the processor's endpoint decision at request body EOS, and route through a generic endpoint_selector filter.

This is PR3 in the full-duplex ext_proc stack:

Changes

1. Full-Duplex ext_proc Request Lifecycle

Summary: Praxis now keeps one conversation open with the external processor for the whole request instead of making separate disconnected calls.

This PR enables request_body_mode: full_duplex_streamed in ExtProcFilter.

The request lifecycle now supports:

  • sending RequestHeaders with ProtocolConfiguration
  • streaming request body chunks during StreamBuffer pre-read
  • sending an empty terminal body message at EOS
  • draining processor responses after EOS
  • applying processor-returned request header mutations
  • applying streamed body mutations when present
  • preserving the original request body when the processor body response contains no body mutation

The first ext_proc message is sent through open_with_request_headers(), which atomically queues and commits the initial RequestHeaders envelope. That avoids split state between opening the stream and recording that headers were sent.
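
The ordering described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the `Outbound` enum and `outbound_sequence` function are hypothetical names, and the real envelope types come from the ext_proc protobuf definitions.

```rust
/// Hypothetical outbound message kinds (assumption: simplified stand-ins
/// for the real ext_proc protobuf envelopes).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outbound {
    /// First message on the stream; carries the protocol configuration.
    RequestHeaders { with_protocol_config: bool },
    /// A request body chunk, or the empty terminal message when
    /// `end_of_stream` is true.
    RequestBody { data: Vec<u8>, end_of_stream: bool },
}

/// Build the ordered message sequence for one request: headers first,
/// then each body chunk, then an empty terminal body message at EOS.
pub fn outbound_sequence(chunks: &[&[u8]]) -> Vec<Outbound> {
    let mut out = Vec::with_capacity(chunks.len() + 2);
    out.push(Outbound::RequestHeaders { with_protocol_config: true });
    for chunk in chunks {
        out.push(Outbound::RequestBody {
            data: chunk.to_vec(),
            end_of_stream: false,
        });
    }
    // EOS is signalled by a separate empty message rather than by a flag
    // on the last data-carrying chunk.
    out.push(Outbound::RequestBody {
        data: Vec::new(),
        end_of_stream: true,
    });
    out
}
```

A request with two body chunks therefore yields four messages, and a bodyless request still yields headers plus the empty terminal message.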

2. StreamBuffer Pre-Read Integration

Summary: Praxis reads the request body early enough for the processor to make a routing decision before the request is forwarded upstream.

The protocol layer uses StreamBuffer pre-read to let body filters observe request body chunks before upstream selection happens.

For full-duplex ext_proc, this means:

  • request headers are sent before body chunks
  • body chunks are sent incrementally to the processor
  • body EOS is sent before routing
  • processor responses are drained before the normal request pipeline selects an upstream
  • the original body remains available for forwarding to the selected backend

This is what makes deferred routing work for processors that wait for the full body before choosing a destination.

3. Trusted Mutation Provenance

Summary: Praxis trusts routing headers only when they came from the external processor, never when they came from the client.

The request-routing path needs to distinguish processor-created routing headers from client-supplied spoofed headers.

This PR adds trusted pre-read mutation provenance:

  • processor header mutations are recorded as ordered TrustedHeaderMutation values
  • client-supplied request headers are never used as trusted routing input
  • pending mutations from the normal pipeline still take precedence
  • pre-read processor mutations remain available only for the request-routing decision
  • provenance is cleared after the request pipeline consumes it

This prevents a client from setting x-gateway-destination-endpoint directly and selecting an arbitrary upstream.
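
A rough sketch of the provenance rules listed above, under simplifying assumptions: `RoutingInputs` and its fields are hypothetical, and each trusted source is reduced to a single already-resolved destination string rather than an ordered mutation log.

```rust
use std::collections::HashMap;

/// Hypothetical view of the inputs available when routing is decided.
pub struct RoutingInputs {
    /// Headers as sent by the client. Never consulted for routing.
    pub client_headers: HashMap<String, String>,
    /// Destination produced by the normal filter pipeline, if any.
    pub pending: Option<String>,
    /// Destination recorded from the processor during pre-read, if any.
    pub pre_read: Option<String>,
}

impl RoutingInputs {
    /// Return the trusted destination. Pending pipeline mutations win over
    /// pre-read ones, and the pre-read value is cleared on consumption so
    /// it cannot be reused by a later phase.
    pub fn take_trusted_destination(&mut self) -> Option<String> {
        let pre_read = self.pre_read.take();
        self.pending.clone().or(pre_read)
    }
}
```

Because `client_headers` is never read, a client that sets the destination header itself gets no routing effect in this sketch.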

4. Ordered Header Mutation Semantics

Summary: If the processor changes its mind across multiple responses, Praxis follows the final ordered mutation result instead of mixing old and new values.

The mutation resolver preserves order-sensitive semantics for:

  • Remove
  • Set
  • Add

The resolver applies mutations in order and rejects only final ambiguous states. For example:

  • Add(A), Add(A) is accepted as the same value
  • Add(A), Add(B) is rejected as ambiguous
  • Add(A), Remove, Set(B) resolves to B

This matters because the external processor can return multiple responses during pre-read, and later responses must be able to override earlier ones deterministically.
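
The three examples above can be reproduced with a small resolver. This is a sketch for a single header name; `Mutation`, `Resolved`, and `resolve` are illustrative names and not the PR's types.

```rust
/// One ordered mutation against a single header (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Mutation {
    Remove,
    Set(String),
    Add(String),
}

/// Final state of the header after all mutations are applied.
#[derive(Debug, PartialEq, Eq)]
pub enum Resolved {
    Absent,
    Value(String),
}

/// Apply mutations in order and reject only an ambiguous final state.
pub fn resolve(log: &[Mutation]) -> Result<Resolved, String> {
    let mut values: Vec<String> = Vec::new();
    for m in log {
        match m {
            // Remove discards everything seen so far.
            Mutation::Remove => values.clear(),
            // Set replaces any earlier values.
            Mutation::Set(v) => {
                values.clear();
                values.push(v.clone());
            }
            // Add appends, but a repeated identical value is not a conflict.
            Mutation::Add(v) => {
                if !values.contains(v) {
                    values.push(v.clone());
                }
            }
        }
    }
    match values.len() {
        0 => Ok(Resolved::Absent),
        1 => Ok(Resolved::Value(values.remove(0))),
        n => Err(format!("ambiguous destination: {} distinct values", n)),
    }
}
```

Intermediate ambiguity is tolerated here: `Add(A), Add(B), Set(C)` resolves to `C` because only the final state is checked.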

5. Generic endpoint_selector Filter

Summary: A generic filter takes the trusted destination from the processor, validates it, sets the upstream, and strips the internal header before the backend sees it.

This PR adds a new generic HTTP filter: endpoint_selector.

The filter:

  • reads a trusted destination from pending or pre-read processor mutations
  • never reads original client headers for routing
  • validates host:port syntax
  • rejects empty, ambiguous, malformed, URI-like, path-containing, userinfo-containing, invalid IPv6, invalid DNS label, underscore, and port-zero values
  • sets ctx.upstream
  • strips the internal routing header from forwarded requests
  • supports required: true
  • supports configurable status_on_required_failure

Required-mode failures return FilterAction::Reject, not FilterError, so they are not bypassed by fail-open behavior.
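
For illustration, a standard-library-only sketch of the kind of `host:port` checks listed above. It is not the PR's `validate_host_port`; the exact rules, error messages, and return type are assumptions.

```rust
use std::net::Ipv6Addr;

/// A DNS label: 1 to 63 ASCII alphanumerics or hyphens, with no leading or
/// trailing hyphen. Underscores are rejected.
fn valid_label(label: &str) -> bool {
    !label.is_empty()
        && label.len() <= 63
        && !label.starts_with('-')
        && !label.ends_with('-')
        && label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}

/// Validate a `host:port` or `[ipv6]:port` destination (illustrative only).
pub fn validate_host_port(input: &str) -> Result<(String, u16), String> {
    if input.is_empty() {
        return Err("empty destination".into());
    }
    // Reject schemes, paths, and userinfo before any further parsing.
    if input.contains('/') || input.contains('@') {
        return Err("URI-like, path, or userinfo not allowed".into());
    }
    let (host, port) = if let Some(rest) = input.strip_prefix('[') {
        // Bracketed IPv6 literal: [addr]:port
        let (addr, port) = rest.split_once("]:").ok_or("malformed IPv6 literal")?;
        addr.parse::<Ipv6Addr>().map_err(|_| "invalid IPv6 address")?;
        (format!("[{}]", addr), port)
    } else {
        let (host, port) = input.rsplit_once(':').ok_or("missing port")?;
        if !host.split('.').all(valid_label) {
            return Err("invalid DNS label".into());
        }
        (host.to_string(), port)
    };
    // Digits only, so a leading '+' accepted by `parse` is still rejected.
    if port.is_empty() || !port.bytes().all(|b| b.is_ascii_digit()) {
        return Err("invalid port".into());
    }
    let port: u16 = port.parse().map_err(|_| "port out of range")?;
    if port == 0 {
        return Err("port zero not allowed".into());
    }
    Ok((host, port))
}
```

In required mode, a failure from a check like this would map to a rejection with the configured status rather than to a filter error, per the paragraph above.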

6. Structured Metadata Persistence

Summary: Metadata returned by the processor is carried through the request context in a bounded way.

This PR persists processor dynamic metadata as serde_json::Value.

Metadata handling includes:

  • conversion from protobuf values into JSON values
  • non-finite floats mapped safely instead of panicking
  • namespace/key merge behavior
  • per-namespace key bound to avoid unbounded metadata growth
  • persistence from pre-read filter context back into the Pingora request context

This keeps processor metadata available to later phases without making routing depend on it.
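
A hedged, standard-library-only sketch of two of the behaviors above: mapping non-finite floats without panicking and bounding keys per namespace. The `Json` enum stands in for `serde_json::Value`; mapping non-finite numbers to null and the bound of 4 are assumptions chosen for the example, not values taken from the PR.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON value (assumption: the PR uses
/// `serde_json::Value` instead).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Number(f64),
    Text(String),
}

/// JSON cannot represent NaN or infinity; map them to null instead of
/// panicking (the chosen mapping is an assumption).
pub fn number_to_json(n: f64) -> Json {
    if n.is_finite() {
        Json::Number(n)
    } else {
        Json::Null
    }
}

/// Hypothetical per-namespace key bound.
pub const MAX_KEYS_PER_NAMESPACE: usize = 4;

pub type Metadata = BTreeMap<String, BTreeMap<String, Json>>;

/// Merge one key into a namespace. Existing keys are overwritten; new keys
/// are refused once the namespace is full. Returns whether the value was
/// stored.
pub fn merge(meta: &mut Metadata, namespace: &str, key: &str, value: Json) -> bool {
    let ns = meta.entry(namespace.to_string()).or_default();
    if !ns.contains_key(key) && ns.len() >= MAX_KEYS_PER_NAMESPACE {
        return false;
    }
    ns.insert(key.to_string(), value);
    true
}
```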

7. Server Registration and Example Config

Summary: The generic ext_proc filter can now be used by the normal Praxis server when the feature is enabled.

This PR adds feature-gated ext_proc registration through the server registry.

It also adds a documented example config for the composition:

ext_proc -> endpoint_selector

The example demonstrates the generic request-routing path without a custom llm-d-specific filter.

Concurrency Model

Summary: The request path avoids extra per-request worker tasks and keeps buffering tight so slow processors cannot cause unbounded memory growth.

The request path uses the PR2 exchange state machine:

  • one ExternalProcessor.Process stream per HTTP request
  • single-owner exchange state
  • capacity-1 outbound channel
  • pending tonic Process future resolved lazily
  • no production per-request tokio::spawn
  • no unbounded queue
  • no Arc<Mutex<_>> stream wrapper
  • no unsafe Sync implementation

This matters for processors that wait until request body EOS before returning a routing decision. Praxis must be able to send headers, stream body chunks, and send EOS without first waiting for an early processor response.
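
To illustrate why a capacity-1 outbound channel bounds memory, here is a small sketch using the standard library's `sync_channel` as a stand-in. The PR's exchange presumably uses an async channel; only the backpressure behavior is shown, and `demo_backpressure` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Show that a capacity-1 channel holds at most one queued message.
/// Returns (first send accepted, second send refused as full,
/// send accepted again after the receiver drains one message).
pub fn demo_backpressure() -> (bool, bool, bool) {
    let (tx, rx) = sync_channel::<&'static str>(1);

    // The single slot is free, so the first message is queued.
    let first = tx.try_send("headers").is_ok();

    // The slot is occupied; the sender is told to wait instead of the
    // queue growing.
    let second_full = matches!(tx.try_send("chunk-1"), Err(TrySendError::Full(_)));

    // Once the receiver takes the queued message, the slot frees up.
    let _ = rx.recv();
    let third = tx.try_send("chunk-1").is_ok();

    (first, second_full, third)
}
```

With this shape, a slow processor stalls the sender after one queued message rather than accumulating body chunks in memory.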

Safety Properties

Summary: The routing decision is processor-controlled, validated, fail-closed when required, and hidden from the backend.

This PR preserves these safety properties:

  • client-supplied destination headers are not trusted
  • required endpoint-selection failures return the configured rejection status
  • required-mode rejections are not bypassed by fail-open behavior
  • invalid trusted destinations do not route
  • ambiguous trusted destinations do not route
  • internal routing headers are stripped before backend forwarding
  • repeated requests keep separate Process streams and request state
  • full-duplex request mode skips the legacy response-phase callout so one HTTP request does not create two processor streams
  • Host is treated as a protected request-authority header. Generic ext_proc request mutations cannot append, replace, or remove it, preventing a default append mutation from producing duplicate Host fields on the forwarded request. This is protocol-safety behavior, not llm-d-specific handling.
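
The Host protection in the last item could look roughly like the sketch below. The function names and the `(name, value)` tuple representation are assumptions; the PR's mutation types are not shown here.

```rust
/// Whether a header name is a protected request-authority header.
/// HTTP header names are case-insensitive, so the match is too.
pub fn is_protected(name: &str) -> bool {
    name.eq_ignore_ascii_case("host")
}

/// Drop processor mutations that target a protected header, keeping the
/// rest in their original order.
pub fn drop_protected(mutations: Vec<(String, String)>) -> Vec<(String, String)> {
    mutations
        .into_iter()
        .filter(|(name, _)| !is_protected(name))
        .collect()
}
```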

Scope Boundary

Summary: This PR proves request routing. It does not claim full Envoy ext_proc parity or full llm-d environment validation.

This PR does not implement:

  • full Envoy ext_proc parity
  • response-phase lifecycle processing
  • request trailers through Pingora
  • native in-process llm-d scheduling
  • vLLM behavior
  • Gateway API dynamic pool management
  • real scheduler/cache-aware routing validation

Those are follow-up areas and are not blockers for this request-routing path.

Reviewer Docs

Summary: The supporting docs explain the architecture, code path, PR stack, and validation output.

Reviewer docs are available here:

https://github.com/nerdalert/praxis-research-spikes/tree/main/demo/llm-d-track-b

Relevant pages:

  • Architecture and PR Stack
  • Code Walkthrough
  • Sample Output
  • PR3 Integration Validation Output

The validation artifacts under demo/llm-d-track-b/validation/ are pushed in the research-spikes repo. The separate hermetic integration-test branch is not part of this PR.

Validation

Summary: The branch has been rebased on current main and passed lint, unit, ext_proc serial, docs, and diff checks.

Validated after rebasing on upstream/main:

  • make lint
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo +nightly-2026-03-28 fmt --all -- --check
  • cargo test -p praxis-proxy-ext-proc -- --test-threads=1
  • make test-unit
  • cargo doc --workspace --no-deps --document-private-items
  • git diff --check

Current focused counts include:

  • 191 ext-proc tests
  • 2222 filter tests
  • 367 protocol tests
  • 44 server library tests
  • 27 server CLI tests

@nerdalert nerdalert requested review from a team June 24, 2026 03:44
@praxis-bot-app

PR too large: 3878 lines added (limit: 750, excludes Cargo files, tests, docs, examples, and benchmarks). Please split into smaller PRs. Add skip/pr-conventions label to override.

@nerdalert nerdalert added the skip/pr-conventions Skip conventions checks for PRs label Jun 24, 2026
@nerdalert nerdalert marked this pull request as draft June 24, 2026 04:05
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
@nerdalert nerdalert force-pushed the brent-ext-proc-full-duplex-routing branch from 078e1c1 to 816e0df Compare June 24, 2026 05:52

@praxis-bot praxis-bot left a comment

PR #707 Review: feat(ext-proc): integrate full-duplex request routing

This is a substantial and well-structured PR that wires the standalone full-duplex ext_proc exchange into the Praxis request lifecycle. The architecture is sound: trusted mutation provenance prevents SSRF, the endpoint_selector filter validates addresses thoroughly, and the lifecycle timeout bounds prevent unbounded memory growth from slow processors.

Findings: 11 comments (0 blocking, 5 suggested improvements, 6 nits/docs)

Key observations

Correctness: The mutation provenance model is solid -- pre-read mutations are clearly distinguished from client-supplied headers, and the ordered log correctly handles override sequences (Add -> Remove -> Set resolves deterministically). The validate_host_port function covers a comprehensive set of attack vectors (SSRF via scheme injection, path traversal, userinfo, malformed IPv6).

Architecture concern: The stream_buffer pre-read fallback has an implicit mutual-exclusivity contract between legacy mutation queues and pre_read_mutations that is not documented or enforced. A future filter mixing both mechanisms in the same chain would lose mutations silently.

Validation gap: lifecycle_timeout_ms can be configured below message_timeout_ms, which would make the full-duplex drain path always fail. This should be validated similarly to deferred_close_timeout_ms.

Test coverage: Excellent. The test suite covers the full lifecycle (headers-before-body ordering, incremental chunk delivery, EOS-does-not-resend, deferred routing, immediate rejection, configured timeouts), plus edge cases (Host mutation protection, case-insensitive matching, ambiguous trusted values). The endpoint validation tests cover DNS, IPv4, IPv6, scheme injection, path injection, userinfo, port zero, and malformed labels.

See inline comments for details.

// Bootstrap State
// -----------------------------------------------------------------------------

/// Pinned boxed Process future, `Send + 'static` but not `Sync`.

Duplicate doc comment: This line adds a second /// Pinned boxed Process future ... doc comment above the use statement, while the original one still exists on line 334 above the type alias. One of them should be removed.

reason = "sectioned state-machine implementation keeps domains reviewable"
)]
impl ExtProcExchange {
/// Internal receive with override loop and classification.

Doc comment placed on the wrong method: The detailed doc comment describing the internal receive algorithm (steps 1-3 with override loop, classification, and classify_and_validate link) was moved from receive_inner onto ensure_response_stream. The comment describes receive_inner's behavior (override loop, classification), not ensure_response_stream's (which just resolves the bootstrap future). Meanwhile receive_inner on line 928 lost its detailed docs and only has a short summary. The algorithmic doc should stay on receive_inner.

///
/// Uses [`OnceLock`] so initialization happens exactly once,
/// inside whichever Tokio runtime context the first request
/// runs in. Returns `None` if the initial connection fails.

Misleading doc comment: The doc says "Returns None if the initial connection fails" but the method signature is fn channel(&self) -> Channel -- it never returns None. Since connect_lazy() does not fail (it defers connection), this should say something like "The returned channel connects lazily on first use."

    let ms = cfg.lifecycle_timeout_ms;
    return Err(
        format!("ext_proc: lifecycle_timeout_ms ({ms}) exceeds maximum ({MAX_LIFECYCLE_TIMEOUT_MS})").into(),
    );

Missing validation: lifecycle_timeout_ms can be less than message_timeout_ms: The config validates lifecycle_timeout_ms > 0 and <= 300_000, but does not check whether it is at least as large as message_timeout_ms. A lifecycle timeout shorter than the per-message timeout means the lifecycle will always expire before the first drain receive() can complete, making the full-duplex path always reject with status_on_error. Consider adding a check analogous to the existing deferred_close_timeout_ms >= message_timeout_ms validation above.
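
A sketch of the suggested check, assuming a simplified config struct. `TimeoutConfig` and `validate_timeouts` are hypothetical names; the 300_000 ms maximum is the bound quoted in this comment.

```rust
/// Upper bound on the lifecycle timeout, as quoted in the review comment.
pub const MAX_LIFECYCLE_TIMEOUT_MS: u64 = 300_000;

/// Hypothetical subset of the ext_proc filter config.
pub struct TimeoutConfig {
    pub message_timeout_ms: u64,
    pub lifecycle_timeout_ms: u64,
}

/// Validate the lifecycle timeout, including the proposed lower bound
/// relative to the per-message timeout.
pub fn validate_timeouts(cfg: &TimeoutConfig) -> Result<(), String> {
    let ms = cfg.lifecycle_timeout_ms;
    if ms == 0 {
        return Err("ext_proc: lifecycle_timeout_ms must be greater than 0".to_string());
    }
    if ms > MAX_LIFECYCLE_TIMEOUT_MS {
        return Err(format!(
            "ext_proc: lifecycle_timeout_ms ({}) exceeds maximum ({})",
            ms, MAX_LIFECYCLE_TIMEOUT_MS
        ));
    }
    // Proposed addition: a lifecycle shorter than one message timeout can
    // expire before the first drain receive completes.
    if ms < cfg.message_timeout_ms {
        return Err(format!(
            "ext_proc: lifecycle_timeout_ms ({}) must be >= message_timeout_ms ({})",
            ms, cfg.message_timeout_ms
        ));
    }
    Ok(())
}
```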

};
match eos_result {
    Ok(()) => {},
    Err(ExchangeError::SendFailed | ExchangeError::Closed) => {

SendFailed silently swallowed at debug level: When the EOS body send fails due to SendFailed | Closed, this logs at debug and proceeds to drain. Closed is a clean half-close, but SendFailed could indicate the exchange never bootstrapped (channel creation problem, serialization failure). Consider logging SendFailed at warn to surface connectivity problems, while keeping Closed at debug.

Comment thread filter/src/context.rs
/// Returns an error if a pending `Set` value contains
/// non-text bytes, or if the final state has multiple
/// distinct values.
pub fn pending_header_value(&self, name: &HeaderName) -> Result<PendingHeaderResult, String> {

pending_header_value assumes removes precede sets: The method checks request_headers_to_remove for presence and separately finds the last set value. When both exist, it returns the set value (remove + set = set), which is correct for the ext_proc mutation path where removes are always applied before sets. However, if a filter ever pushes Set(x) then Remove(x) (intending: set first, then retract), this method would still return Value(...). A brief comment documenting the ordering assumption would help future maintainers.

// mutation queues. This preserves their existing remove -> set
// -> add application order.
for name in &filter_ctx.request_headers_to_remove {
    mutation_log.push(TrustedHeaderMutation::Remove(name.clone()));

Mutual exclusivity contract between mutation sources is implicit: When pre_read_mutations is non-empty, the else-branch discards legacy queues (extra_request_headers, request_headers_to_set, request_headers_to_remove). This works because ext_proc populates both in lockstep, but a future filter that only uses legacy queues while another uses pre_read_mutations would silently lose mutations. Consider adding a debug_assert! or a comment documenting that filters in the same pre-read chain must use the same mutation mechanism.

/// let filter = EndpointSelectorFilter::from_config(&yaml).unwrap();
/// assert_eq!(filter.name(), "endpoint_selector");
/// ```
pub struct EndpointSelectorFilter {

EndpointSelectorFilter does not implement Debug: Unlike ExtProcFilter in this same PR (which has a manual Debug impl), this struct has no Debug implementation. Adding at least a minimal Debug impl showing source_header and required would help with diagnostic logging.
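
A minimal manual `Debug` impl along these lines might look like the sketch below. The struct's field names and types are assumptions inferred from the PR description, not copied from the diff.

```rust
use std::fmt;

/// Hypothetical shape of the filter; the real struct may hold other fields.
pub struct EndpointSelectorFilter {
    pub source_header: String,
    pub required: bool,
    pub status_on_required_failure: u16,
}

impl fmt::Debug for EndpointSelectorFilter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Show only the fields useful for diagnostics and mark the rest
        // as elided with `..`.
        f.debug_struct("EndpointSelectorFilter")
            .field("source_header", &self.source_header)
            .field("required", &self.required)
            .finish_non_exhaustive()
    }
}
```

Using `finish_non_exhaustive` keeps the output honest about omitted fields without committing to printing every one.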

@@ -2049,7 +2112,7 @@ fn make_response() -> praxis_filter::Response {
static TEST_ID_GENERATOR: std::sync::LazyLock<praxis_core::id::IdGenerator> =
std::sync::LazyLock::new(|| praxis_core::id::IdGenerator::with_seed(0));

Attribute before doc comment: The #[expect(clippy::too_many_lines, ...)] attribute appears before the /// doc comment. Per Rust convention (and this project's style), doc comments should precede attributes:

// Current:
#[expect(clippy::too_many_lines, reason = "...")]
/// Build a minimal [`HttpFilterContext`] ...
fn make_ctx(...)

// Should be:
/// Build a minimal [`HttpFilterContext`] ...
#[expect(clippy::too_many_lines, reason = "...")]
fn make_ctx(...)

extensions,
filter_metadata,
filter_state,
_pre_read_mutations,

_pre_read_mutations silently discarded at destructuring site: The destructured filter_ctx.pre_read_mutations is bound to _pre_read_mutations and dropped. The comment on line 219 explains why (consumed by endpoint_selector, cleared to prevent stale reuse), but the underscore binding at the destructuring site makes it easy to miss. A brief inline comment at the destructuring would help: // Consumed by request pipeline; cleared below.

Labels

skip/pr-conventions Skip conventions checks for PRs

2 participants