feat(filter): implement token_count and x_token_headers filters (#220… by mkoushni · Pull Request #618 · praxis-proxy/praxis

mkoushni · 2026-06-17T13:19:37Z

feat(filter): implement `token_count` and `x_token_headers` filters (#220)

Summary

Implements the two new AI-inference filters described in proposal #220:

token_count — extracts token usage from AI provider responses and writes the counts to FilterContext metadata.
x_token_headers — reads token counts from FilterContext metadata and injects X-Token-Input, X-Token-Output, and X-Token-Total as downstream response headers.

Both filters are gated behind the ai-inference feature flag.

What changed

New filters

`filter/src/builtins/http/ai/token_count.rs`

Extracts token usage from five provider response formats:

| Provider | Source | Strategy |
| --- | --- | --- |
| `openai` / `azure` | `usage.prompt_tokens` / `completion_tokens` | JSON body (non-streaming) or last SSE data chunk |
| `anthropic` | Split events: `input_tokens` in `message_start`, `output_tokens` in `message_delta` | Full SSE stream buffering |
| `google` | `usageMetadata.promptTokenCount` / `candidatesTokenCount` | JSON body or last SSE data chunk |
| `bedrock` (Converse) | `usage.inputTokens` / `outputTokens` | JSON body or last SSE data chunk |
| `bedrock_invoke_model` | `x-amzn-bedrock-input-token-count` / `x-amzn-bedrock-output-token-count` | Upstream response headers (no body parsing) |
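
The "last SSE data chunk" strategy in the table can be illustrated with a minimal, std-only sketch. The function name `last_sse_data` is hypothetical and not taken from this PR. It assumes each event carries its JSON on a single `data:` line and treats the OpenAI-style `[DONE]` sentinel as a non-data marker.

```rust
/// Hypothetical helper (not from the PR): returns the payload of the last
/// `data:` line in a buffered SSE stream.
///
/// Comment lines (starting with ':') and blank lines never match the
/// `data:` prefix, so they are skipped. The `[DONE]` sentinel is ignored.
fn last_sse_data(stream: &str) -> Option<&str> {
    stream
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        .filter(|payload| !payload.is_empty() && *payload != "[DONE]")
        .last()
}

fn main() {
    let stream = concat!(
        ": keep-alive\n\n",
        "data: {\"choices\":[]}\n\n",
        "data: {\"usage\":{\"prompt_tokens\":12,\"completion_tokens\":30}}\n\n",
        "data: [DONE]\n\n",
    );
    // The usage-bearing chunk is the last real data line before the sentinel.
    println!("{:?}", last_sse_data(stream));
}
```

A stream without a sentinel works the same way, since the function only looks for the final non-empty `data:` payload.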

Token counts are written to FilterContext metadata under token.input, token.output, and token.total via ctx.set_token_usage().

SSE streams are fully buffered (BodyMode::StreamBuffer) and parsed at end-of-stream. Anthropic's split-event model is handled explicitly since input_tokens and output_tokens arrive in different SSE event types.
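
Why the Anthropic stream needs full buffering can be seen in a small sketch of the split-event handling. Everything here is illustrative: `find_u64`, `Usage`, and `anthropic_usage` are hypothetical names, and `find_u64` is a deliberately naive stand-in for a real JSON parser, used only to keep the example dependency-free.

```rust
/// Naive stand-in for a JSON parser (assumption for this sketch only):
/// finds `"key"` and reads the unsigned integer after the following colon.
fn find_u64(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\"", key);
    let start = json.find(&needle)? + needle.len();
    let rest = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    rest[..end].parse().ok()
}

#[derive(Debug, Default, PartialEq)]
struct Usage {
    input: Option<u64>,
    output: Option<u64>,
}

/// Walks a fully buffered SSE stream and combines the two halves of the
/// usage report: `input_tokens` from `message_start` events and
/// `output_tokens` from `message_delta` events.
fn anthropic_usage(stream: &str) -> Usage {
    let mut usage = Usage::default();
    let mut event = "";
    for line in stream.lines() {
        if line.is_empty() {
            // A blank line ends the current event, so forget its type.
            event = "";
        } else if let Some(name) = line.strip_prefix("event:") {
            event = name.trim();
        } else if let Some(data) = line.strip_prefix("data:") {
            match event {
                "message_start" => usage.input = find_u64(data, "input_tokens").or(usage.input),
                "message_delta" => usage.output = find_u64(data, "output_tokens").or(usage.output),
                _ => {}
            }
        }
    }
    usage
}

fn main() {
    let stream = concat!(
        "event: message_start\n",
        "data: {\"message\":{\"usage\":{\"input_tokens\":25}}}\n\n",
        "event: message_delta\n",
        "data: {\"usage\":{\"output_tokens\":15}}\n\n",
    );
    println!("{:?}", anthropic_usage(stream));
}
```

Because the two counts arrive in different events, neither the first nor the last data chunk alone is sufficient, which is consistent with the "Full SSE stream buffering" strategy listed for `anthropic`.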

`filter/src/builtins/http/ai/x_token_headers.rs`

Reads token.input, token.output, and token.total from FilterContext metadata in on_response and injects them as response headers. Only injects when all three values are present.
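
The all-or-nothing injection rule can be sketched as follows. This is a simplified model, not the filter's actual code: metadata is represented as a plain `HashMap` and headers as a `Vec` of pairs, and `inject_token_headers` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Metadata key to response header name, as listed in the PR description.
const MAPPING: [(&str, &str); 3] = [
    ("token.input", "X-Token-Input"),
    ("token.output", "X-Token-Output"),
    ("token.total", "X-Token-Total"),
];

/// Appends the three X-Token-* headers only if every metadata key is set.
/// Returns whether headers were injected; on `false`, `headers` is untouched.
fn inject_token_headers(
    metadata: &HashMap<String, String>,
    headers: &mut Vec<(String, String)>,
) -> bool {
    // Collecting into Option<Vec<_>> yields None as soon as one key is missing.
    let values: Option<Vec<&String>> = MAPPING
        .iter()
        .map(|(key, _)| metadata.get(*key))
        .collect();
    match values {
        Some(values) => {
            for ((_, name), value) in MAPPING.iter().zip(values) {
                headers.push((name.to_string(), value.clone()));
            }
            true
        }
        None => false,
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("token.input".to_string(), "12".to_string());
    metadata.insert("token.output".to_string(), "30".to_string());
    metadata.insert("token.total".to_string(), "42".to_string());

    let mut headers = Vec::new();
    let injected = inject_token_headers(&metadata, &mut headers);
    println!("injected={} headers={:?}", injected, headers);
}
```

Looking up all values before writing any header avoids emitting a partial set, such as `X-Token-Input` without `X-Token-Total`.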

Filter ordering note: x_token_headers must be placed before token_count in the filter chain. Since response filters execute in reverse pipeline order, this ensures token_count.on_response runs first (setting metadata for Bedrock InvokeModel from upstream headers), then x_token_headers.on_response reads that metadata and injects the headers.

For body-based providers, the counts are populated in token_count.on_response_body, which runs after the response header phase. x_token_headers can therefore inject headers only when the counts are already set during on_response, which in the current Pingora/Praxis lifecycle holds only for Bedrock InvokeModel. For all other providers, token counts are still written to FilterContext metadata and are available to access-log templates and downstream logging hooks, but they are not surfaced as X-Token-* headers.
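
The ordering constraint can be made concrete with a toy model of the lifecycle described above. This is a hypothetical simulation, not the Praxis or Pingora API: it only logs hook names in the order the description implies (response hooks in reverse pipeline order, then header commit, then body processing).

```rust
/// Toy model of the response lifecycle (assumption: hook names are logged
/// as strings; no real filter code is involved).
fn lifecycle(chain: &[&str]) -> Vec<String> {
    let mut log = Vec::new();
    // Response-header hooks run in reverse pipeline order.
    for name in chain.iter().rev() {
        log.push(format!("{}.on_response", name));
    }
    // Headers are sent downstream before any body hook runs.
    log.push("headers committed".to_string());
    // Body-based providers only yield counts at this point.
    if chain.contains(&"token_count") {
        log.push("token_count.on_response_body".to_string());
    }
    log
}

fn main() {
    // Same relative order as the example config: x_token_headers first.
    let chain = ["x_token_headers", "token_count", "access_log"];
    for step in lifecycle(&chain) {
        println!("{}", step);
    }
}
```

With `x_token_headers` at index 0, its `on_response` hook is logged after `token_count.on_response` and before the header commit, while `token_count.on_response_body` comes last.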

Wiring

filter/src/builtins/http/ai/mod.rs — declares and re-exports both new modules.
filter/src/builtins/http/mod.rs — re-exports TokenCountFilter and XTokenHeadersFilter.
filter/src/builtins/mod.rs — re-exports both filters from the top-level builtins.
filter/src/registry.rs — registers token_count and x_token_headers with the FilterRegistry.

Example configuration

examples/configs/ai/token-counting.yaml — a minimal working config showing the correct filter order:

filter_chains:
  - name: token-counting
    filters:
      - filter: x_token_headers      # index 0 — runs after token_count in reverse response phase
      - filter: token_count
        provider: openai             # openai | anthropic | google | bedrock | bedrock_invoke_model | azure
      - filter: access_log
      - filter: router
        routes:
          - path_prefix: "/"
            cluster: ai-provider
      - filter: load_balancer
        clusters:
          - name: ai-provider
            endpoints:
              - "127.0.0.1:8000"

Integration tests

tests/integration/tests/suite/examples/token_counting.rs — 13 integration tests covering the full matrix from proposal #220:

| Test | What is verified |
| --- | --- |
| `openai_non_streaming_extracts_token_counts` | 200 OK + body passthrough |
| `anthropic_non_streaming_extracts_token_counts` | 200 OK + body passthrough |
| `google_non_streaming_extracts_token_counts` | 200 OK + body passthrough |
| `bedrock_converse_non_streaming_extracts_token_counts` | 200 OK + body passthrough |
| `bedrock_invoke_model_extracts_token_counts_from_headers` | **200 OK + `X-Token-*` headers present** |
| `azure_non_streaming_extracts_token_counts` | 200 OK + body passthrough |
| `openai_streaming_extracts_token_counts` | 200 OK + SSE body passthrough |
| `anthropic_streaming_split_events_extracts_token_counts` | 200 OK + SSE body passthrough |
| `google_streaming_no_done_sentinel_extracts_token_counts` | 200 OK + SSE body passthrough |
| `token_count_response_body_passes_through_unchanged` | Body byte-for-byte identical |
| `missing_usage_fields_no_token_headers_injected` | `X-Token-*` headers absent when no usage |
| `openai_streaming_whitespace_and_comments` | Noisy SSE (keep-alive comments, empty lines, pretty JSON) passes through unchanged |
| `example_config_token_counting_openai` | Example config smoke test: config parses and proxy returns 200 |

Why X-Token-* headers are only asserted for Bedrock InvokeModel:
In the Pingora HTTP/1.1 proxy, response headers are committed to the downstream connection before the response body filter phase begins. x_token_headers runs its on_response hook (where headers can be written) in reverse pipeline order, which means it executes after token_count.on_response but before token_count.on_response_body. For Bedrock InvokeModel, token counts arrive in upstream response headers and are extracted in on_response, making them immediately available. For every other provider, counts come from the response body and are set in on_response_body, at which point the response header phase has already completed. Extraction correctness for body-based providers is covered by the existing unit tests in builtins::http::ai::token_usage::tests.
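
The header-based path that makes Bedrock InvokeModel the exception can be sketched without any body parsing. The header names come from the provider table in this PR; the function names, the slice-of-pairs header representation, and the assumption that the total is the sum of input and output are illustrative only.

```rust
/// Case-insensitive lookup of a numeric header value.
fn header_u64(headers: &[(&str, &str)], name: &str) -> Option<u64> {
    headers
        .iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(name))
        .and_then(|(_, value)| value.trim().parse().ok())
}

/// Returns (input, output, total) when both Bedrock InvokeModel headers are
/// present and numeric. Assumption: total = input + output.
fn bedrock_invoke_usage(headers: &[(&str, &str)]) -> Option<(u64, u64, u64)> {
    let input = header_u64(headers, "x-amzn-bedrock-input-token-count")?;
    let output = header_u64(headers, "x-amzn-bedrock-output-token-count")?;
    Some((input, output, input + output))
}

fn main() {
    let upstream = [
        ("content-type", "application/json"),
        ("x-amzn-bedrock-input-token-count", "12"),
        ("x-amzn-bedrock-output-token-count", "30"),
    ];
    println!("{:?}", bedrock_invoke_usage(&upstream));
}
```

Since everything needed is available when the upstream headers arrive, the counts can be set during the header phase, before the downstream headers are committed.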

Test plan

cargo test --features ai-inference — 3107 tests pass (1 pre-existing skip: h2spec_strict_conformance requires the h2spec binary not present in CI)
all_example_configs_parse passes with the new token-counting.yaml
All 13 new examples::token_counting::* integration tests pass
All existing builtins::http::ai::token_usage::* unit tests continue to pass

PR Review

Summary: Adds token_count and x_token_headers filters to extract AI provider token usage from responses and inject X-Token-* headers.

Overall this is well-structured work with thorough integration tests across all provider formats. The SSE parsing, Bedrock header extraction, and body passthrough are all clean. Three issues need attention before merge.

| Severity | Count |
| --- | --- |
| Critical | 1 |
| Medium | 2 |

Non-inline findings

[Medium] examples/README.md is not updated with the new token-counting.yaml entry. Project conventions (CLAUDE.md > Test Requirements, item 5) require updating examples/README.md to list any new example configs.

[Medium] filter/src/registry.rs -- the builtins_registered test does not assert token_count or x_token_headers. Every other registered filter has a corresponding assertion in that test. Add:

#[cfg(feature = "ai-inference")]
assert!(names.contains(&"token_count"), "token_count should be registered");
#[cfg(feature = "ai-inference")]
assert!(names.contains(&"x_token_headers"), "x_token_headers should be registered");

feat(filter): implement token_count and x_token_headers filters (prax…

9da76f8

…is-proxy#220) Add two new AI-inference filters that extract and surface token usage from provider responses Signed-off-by: mkoushni <mkoushni@redhat.com>

mkoushni force-pushed the feat/220-token-counting-integration-tests branch from af6b203 to 9da76f8 on June 21, 2026 09:07

mkoushni marked this pull request as ready for review June 21, 2026 09:51

mkoushni requested review from a team June 21, 2026 09:51

mkoushni requested review from franciscojavierarceo, leseb, shaneutt and twghu as code owners June 21, 2026 09:51

praxis-bot reviewed Jun 22, 2026

Comment thread filter/src/builtins/http/ai/x_token_headers.rs

Comment thread filter/src/builtins/http/ai/x_token_headers.rs

mkoushni added 2 commits June 22, 2026 18:24

fix(filter): add token_count/x_token_headers registry assertions and …

1210456

…README entry and fix(filter): replace unwrap with HeaderValue::from in x_token_headers Signed-off-by: mkoushni <mkoushni@redhat.com>

Merge branch 'main' into feat/220-token-counting-integration-tests

fe798ba

shaneutt self-assigned this Jun 22, 2026

shaneutt added this to AI Gateway Jun 22, 2026

shaneutt added this to the v0.4.0 milestone Jun 22, 2026

github-project-automation Bot moved this to Backlog in AI Gateway Jun 22, 2026

shaneutt moved this from Backlog to Review in AI Gateway Jun 22, 2026

mkoushni wants to merge 3 commits into praxis-proxy:main from mkoushni:feat/220-token-counting-integration-tests

3 participants
