feat(filter): implement token_count and x_token_headers filters (#220…#618

Open
mkoushni wants to merge 3 commits into praxis-proxy:main from mkoushni:feat/220-token-counting-integration-tests
@mkoushni

Contributor

feat(filter): implement token_count and x_token_headers filters (#220)

Summary

Implements the two new AI-inference filters described in proposal #220:

  • token_count — extracts token usage from AI provider responses and writes the counts to FilterContext metadata.
  • x_token_headers — reads token counts from FilterContext metadata and injects X-Token-Input, X-Token-Output, and X-Token-Total as downstream response headers.

Both filters are gated behind the ai-inference feature flag.


What changed

New filters

filter/src/builtins/http/ai/token_count.rs

Extracts token usage from five provider response formats:

| Provider | Source | Strategy |
|---|---|---|
| openai / azure | usage.prompt_tokens / completion_tokens | JSON body (non-streaming) or last SSE data chunk |
| anthropic | Split events: input_tokens in message_start, output_tokens in message_delta | Full SSE stream buffering |
| google | usageMetadata.promptTokenCount / candidatesTokenCount | JSON body or last SSE data chunk |
| bedrock (Converse) | usage.inputTokens / outputTokens | JSON body or last SSE data chunk |
| bedrock_invoke_model | x-amzn-bedrock-input-token-count / x-amzn-bedrock-output-token-count | Upstream response headers (no body parsing) |

Token counts are written to FilterContext metadata under token.input, token.output, and token.total via ctx.set_token_usage().
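For the header-based row of the table (bedrock_invoke_model), extraction reduces to two header lookups. The following is a minimal sketch under stated assumptions, not the code from this PR: `TokenUsage`, `header_u64`, and `usage_from_bedrock_headers` are hypothetical names, headers are modeled as plain string pairs, and the total is assumed to be the sum of input and output.

```rust
/// Token counts for one response (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct TokenUsage {
    pub input: u64,
    pub output: u64,
}

impl TokenUsage {
    /// Assumption: token.total is input + output.
    pub fn total(&self) -> u64 {
        self.input.saturating_add(self.output)
    }
}

const INPUT_HEADER: &str = "x-amzn-bedrock-input-token-count";
const OUTPUT_HEADER: &str = "x-amzn-bedrock-output-token-count";

/// Look up a header by case-insensitive name and parse its value as u64.
fn header_u64(headers: &[(&str, &str)], name: &str) -> Option<u64> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .and_then(|(_, v)| v.trim().parse().ok())
}

/// Returns None unless both Bedrock token headers are present and numeric.
pub fn usage_from_bedrock_headers(headers: &[(&str, &str)]) -> Option<TokenUsage> {
    Some(TokenUsage {
        input: header_u64(headers, INPUT_HEADER)?,
        output: header_u64(headers, OUTPUT_HEADER)?,
    })
}
```

Returning `None` when either header is missing keeps a partial count from being recorded as if it were complete.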

SSE streams are fully buffered (BodyMode::StreamBuffer) and parsed at end-of-stream. Anthropic's split-event model is handled explicitly since input_tokens and output_tokens arrive in different SSE event types.
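The split-event handling can be illustrated with a small std-only sketch. It is not the parser from this PR: `find_u64_field` and `anthropic_usage` are hypothetical helpers, the field lookup is a naive text scan standing in for a real JSON parser, and the sketch assumes each event's JSON fits on a single `data:` line.

```rust
/// Find `"key": <digits>` in a JSON text and parse the digits.
/// Naive scan to keep the sketch dependency-free; real code should parse JSON.
fn find_u64_field(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{key}\"");
    let after_key = &json[json.find(&needle)? + needle.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let digits: String = after_colon
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Scan a fully buffered Anthropic-style SSE stream and return
/// (input_tokens, output_tokens) once both event types have been seen.
pub fn anthropic_usage(stream: &str) -> Option<(u64, u64)> {
    let mut input = None;
    let mut output = None;
    for line in stream.lines() {
        let Some(data) = line.strip_prefix("data:") else { continue };
        let data = data.trim();
        if data.contains("\"message_start\"") {
            // input_tokens arrives once, in the message_start event.
            input = find_u64_field(data, "input_tokens").or(input);
        } else if data.contains("\"message_delta\"") {
            // A later message_delta overwrites an earlier one.
            output = find_u64_field(data, "output_tokens").or(output);
        }
    }
    Some((input?, output?))
}
```

Because the two counts live in different events, neither can be reported until the whole stream has been seen, which is why buffering to end-of-stream is the natural fit here.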

filter/src/builtins/http/ai/x_token_headers.rs

Reads token.input, token.output, and token.total from FilterContext metadata in on_response and injects them as response headers. Only injects when all three values are present.
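The all-or-nothing rule can be sketched as below. This is an illustration only: metadata is modeled as a plain `HashMap<String, String>` rather than the real FilterContext, and `token_headers` is a hypothetical function name.

```rust
use std::collections::HashMap;

/// Build the three X-Token-* headers from metadata.
/// Returns None if any one of the three values is missing, so no
/// partial set of headers is ever produced.
pub fn token_headers(
    metadata: &HashMap<String, String>,
) -> Option<[(&'static str, String); 3]> {
    let input = metadata.get("token.input")?;
    let output = metadata.get("token.output")?;
    let total = metadata.get("token.total")?;
    Some([
        ("X-Token-Input", input.clone()),
        ("X-Token-Output", output.clone()),
        ("X-Token-Total", total.clone()),
    ])
}
```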

Filter ordering note: x_token_headers must be placed before token_count in the filter chain. Since response filters execute in reverse pipeline order, this ensures token_count.on_response runs first (setting metadata for Bedrock InvokeModel from upstream headers), then x_token_headers.on_response reads that metadata and injects the headers.

For body-based providers, the counts are populated in token_count.on_response_body, which runs after the response header phase. x_token_headers can therefore inject headers only when token.input, token.output, and token.total are already set during the header phase, which in the current Pingora/Praxis lifecycle happens only for Bedrock InvokeModel. For all other providers, the counts are still written to FilterContext metadata and remain available to access-log templates and downstream logging hooks.
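The ordering rule lends itself to a simple check. The sketch below assumes only what is stated above, that response hooks fire in reverse chain order; `response_order` and `ordering_is_valid` are hypothetical helpers and not part of Praxis.

```rust
/// Order in which on_response hooks fire: the reverse of the configured chain.
pub fn response_order<'a>(chain: &[&'a str]) -> Vec<&'a str> {
    chain.iter().rev().copied().collect()
}

/// True when token_count's on_response fires before x_token_headers',
/// i.e. when x_token_headers is configured earlier in the chain.
pub fn ordering_is_valid(chain: &[&str]) -> bool {
    let order = response_order(chain);
    let pos = |name: &str| order.iter().position(|f| *f == name);
    match (pos("token_count"), pos("x_token_headers")) {
        (Some(tc), Some(xh)) => tc < xh,
        // If either filter is absent there is nothing to order.
        _ => false,
    }
}
```

A check along these lines could run at config load time so that a swapped order is reported instead of silently producing no headers.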

Wiring

  • filter/src/builtins/http/ai/mod.rs — declares and re-exports both new modules.
  • filter/src/builtins/http/mod.rs — re-exports TokenCountFilter and XTokenHeadersFilter.
  • filter/src/builtins/mod.rs — re-exports both filters from the top-level builtins.
  • filter/src/registry.rs — registers token_count and x_token_headers with the FilterRegistry.

Example configuration

examples/configs/ai/token-counting.yaml — a minimal working config showing the correct filter order:

filter_chains:
  - name: token-counting
    filters:
      - filter: x_token_headers      # index 0 — runs after token_count in reverse response phase
      - filter: token_count
        provider: openai             # openai | anthropic | google | bedrock | bedrock_invoke_model | azure
      - filter: access_log
      - filter: router
        routes:
          - path_prefix: "/"
            cluster: ai-provider
      - filter: load_balancer
        clusters:
          - name: ai-provider
            endpoints:
              - "127.0.0.1:8000"

Integration tests

tests/integration/tests/suite/examples/token_counting.rs — 13 integration tests covering the full matrix from proposal #220:

| Test | What is verified |
|---|---|
| openai_non_streaming_extracts_token_counts | 200 OK + body passthrough |
| anthropic_non_streaming_extracts_token_counts | 200 OK + body passthrough |
| google_non_streaming_extracts_token_counts | 200 OK + body passthrough |
| bedrock_converse_non_streaming_extracts_token_counts | 200 OK + body passthrough |
| bedrock_invoke_model_extracts_token_counts_from_headers | 200 OK + X-Token-* headers present |
| azure_non_streaming_extracts_token_counts | 200 OK + body passthrough |
| openai_streaming_extracts_token_counts | 200 OK + SSE body passthrough |
| anthropic_streaming_split_events_extracts_token_counts | 200 OK + SSE body passthrough |
| google_streaming_no_done_sentinel_extracts_token_counts | 200 OK + SSE body passthrough |
| token_count_response_body_passes_through_unchanged | Body byte-for-byte identical |
| missing_usage_fields_no_token_headers_injected | X-Token-* headers absent when no usage |
| openai_streaming_whitespace_and_comments | Noisy SSE (keep-alive comments, empty lines, pretty JSON) passes through unchanged |
| example_config_token_counting_openai | Example config smoke test: config parses and proxy returns 200 |
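The noisy-stream and no-sentinel cases in the table can be made concrete with a sketch of "last SSE data chunk" selection. It follows the general server-sent events framing (blank line ends an event, lines starting with `:` are comments, multiple `data:` lines join with a newline) and is not the parser from this PR. `last_data_payload` is a hypothetical name, and treating pretty-printed JSON as spanning several `data:` lines is an assumption.

```rust
/// Return the payload of the last SSE event that carries data, ignoring
/// comment lines, blank separators, and a trailing "[DONE]" sentinel.
pub fn last_data_payload(stream: &str) -> Option<String> {
    let mut last = None;
    let mut current: Vec<&str> = Vec::new();
    // An extra empty line is chained on so the final event is flushed even
    // when the stream does not end with a blank line.
    for line in stream.lines().chain(std::iter::once("")) {
        if line.is_empty() {
            // Blank line: the event collected so far is complete.
            if !current.is_empty() {
                let payload = current.join("\n");
                if payload.trim() != "[DONE]" {
                    last = Some(payload);
                }
                current.clear();
            }
        } else if line.starts_with(':') {
            // Comment or keep-alive line: ignored.
        } else if let Some(value) = line.strip_prefix("data:") {
            // One leading space after the colon is not part of the value.
            current.push(value.strip_prefix(' ').unwrap_or(value));
        }
    }
    last
}
```

Skipping the `[DONE]` sentinel instead of requiring it lets the same function cover streams that never send one.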

Why X-Token-* headers are only asserted for Bedrock InvokeModel:
In the Pingora HTTP/1.1 proxy, response headers are committed to the downstream connection before the response body filter phase begins. x_token_headers runs its on_response hook (where headers can be written) in reverse pipeline order, which means it executes after token_count.on_response but before token_count.on_response_body. For Bedrock InvokeModel, token counts arrive in upstream response headers and are extracted in on_response, making them immediately available. For every other provider, counts come from the response body and are set in on_response_body, at which point the response header phase has already completed. Extraction correctness for body-based providers is covered by the existing unit tests in builtins::http::ai::token_usage::tests.
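The phase ordering described above can be modeled in a few lines. This is a toy simulation, not Pingora or Praxis code: `UsageSource`, `Outcome`, `record`, and `simulate` are hypothetical names, and the total is assumed to be input plus output.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum UsageSource {
    UpstreamHeaders, // e.g. Bedrock InvokeModel
    ResponseBody,    // e.g. OpenAI, Anthropic, Google, Bedrock Converse
}

pub struct Outcome {
    /// Headers the client actually receives.
    pub sent_headers: Vec<(&'static str, String)>,
    /// Metadata at the end of the request, as seen by logging hooks.
    pub metadata: HashMap<&'static str, u64>,
}

fn record(metadata: &mut HashMap<&'static str, u64>, input: u64, output: u64) {
    metadata.insert("token.input", input);
    metadata.insert("token.output", output);
    metadata.insert("token.total", input.saturating_add(output));
}

pub fn simulate(source: UsageSource, input: u64, output: u64) -> Outcome {
    let mut metadata: HashMap<&'static str, u64> = HashMap::new();
    let mut headers: Vec<(&'static str, String)> = Vec::new();

    // Header phase, step 1: token_count.on_response.
    if source == UsageSource::UpstreamHeaders {
        record(&mut metadata, input, output);
    }
    // Header phase, step 2: x_token_headers.on_response.
    if let (Some(i), Some(o), Some(t)) = (
        metadata.get("token.input"),
        metadata.get("token.output"),
        metadata.get("token.total"),
    ) {
        headers.push(("X-Token-Input", i.to_string()));
        headers.push(("X-Token-Output", o.to_string()));
        headers.push(("X-Token-Total", t.to_string()));
    }
    // Headers are committed downstream here; nothing later can change them.
    let sent_headers = headers;

    // Body phase: token_count.on_response_body.
    if source == UsageSource::ResponseBody {
        record(&mut metadata, input, output);
    }
    Outcome { sent_headers, metadata }
}
```

In this model a body-based provider ends with populated metadata but no X-Token-* headers, which matches why the integration tests assert the headers only for Bedrock InvokeModel.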


Test plan

  • cargo test --features ai-inference — 3107 tests pass (1 pre-existing skip: h2spec_strict_conformance requires the h2spec binary not present in CI)
  • all_example_configs_parse passes with the new token-counting.yaml
  • All 13 new examples::token_counting::* integration tests pass
  • All existing builtins::http::ai::token_usage::* unit tests continue to pass

Related

…is-proxy#220) Add two new AI-inference filters that extract and surface token usage from provider responses

Signed-off-by: mkoushni <mkoushni@redhat.com>
@mkoushni mkoushni force-pushed the feat/220-token-counting-integration-tests branch from af6b203 to 9da76f8 Compare June 21, 2026 09:07
@mkoushni mkoushni marked this pull request as ready for review June 21, 2026 09:51
@mkoushni mkoushni requested review from a team June 21, 2026 09:51

@praxis-bot praxis-bot left a comment

Collaborator

PR Review

Summary: Adds token_count and x_token_headers filters to extract AI provider token usage from responses and inject X-Token-* headers.

Overall this is well-structured work with thorough integration tests across all provider formats. The SSE parsing, Bedrock header extraction, and body passthrough are all clean. Three issues need attention before merge.

| Severity | Count |
|---|---|
| Critical | 1 |
| Medium | 2 |

Non-inline findings

[Medium] examples/README.md is not updated with the new token-counting.yaml entry. Project conventions (CLAUDE.md > Test Requirements, item 5) require updating examples/README.md to list any new example configs.

[Medium] filter/src/registry.rs -- the builtins_registered test does not assert token_count or x_token_headers. Every other registered filter has a corresponding assertion in that test. Add:

#[cfg(feature = "ai-inference")]
assert!(names.contains(&"token_count"), "token_count should be registered");
#[cfg(feature = "ai-inference")]
assert!(names.contains(&"x_token_headers"), "x_token_headers should be registered");

Comment thread filter/src/builtins/http/ai/x_token_headers.rs
Comment thread filter/src/builtins/http/ai/x_token_headers.rs
mkoushni added 2 commits June 22, 2026 18:24
…README entry and fix(filter): replace unwrap with HeaderValue::from in x_token_headers

Signed-off-by: mkoushni <mkoushni@redhat.com>
@shaneutt shaneutt self-assigned this Jun 22, 2026
@shaneutt shaneutt added this to the v0.4.0 milestone Jun 22, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in AI Gateway Jun 22, 2026
@shaneutt shaneutt moved this from Backlog to Review in AI Gateway Jun 22, 2026