feat(filter): implement token_count and x_token_headers filters (#220…#618
mkoushni wants to merge 3 commits
…is-proxy#220) Add two new AI-inference filters that extract and surface token usage from provider responses Signed-off-by: mkoushni <mkoushni@redhat.com>
praxis-bot left a comment
PR Review
Summary: Adds token_count and x_token_headers filters to extract AI provider token usage from responses and inject X-Token-* headers.
Overall this is well-structured work with thorough integration tests across all provider formats. The SSE parsing, Bedrock header extraction, and body passthrough are all clean. Three issues need attention before merge.
| Severity | Count |
|---|---|
| Critical | 1 |
| Medium | 2 |
Non-inline findings
[Medium] examples/README.md is not updated with the new token-counting.yaml entry. Project conventions (CLAUDE.md > Test Requirements, item 5) require updating examples/README.md to list any new example configs.
[Medium] filter/src/registry.rs -- the builtins_registered test does not assert token_count or x_token_headers. Every other registered filter has a corresponding assertion in that test. Add:
#[cfg(feature = "ai-inference")]
assert!(names.contains(&"token_count"), "token_count should be registered");
#[cfg(feature = "ai-inference")]
assert!(names.contains(&"x_token_headers"), "x_token_headers should be registered");…README entry and fix(filter): replace unwrap with HeaderValue::from in x_token_headers Signed-off-by: mkoushni <mkoushni@redhat.com>
# feat(filter): implement `token_count` and `x_token_headers` filters (#220)

## Summary

Implements the two new AI-inference filters described in proposal #220:

- `token_count`: extracts token usage from AI provider responses and writes the counts to `FilterContext` metadata.
- `x_token_headers`: reads token counts from `FilterContext` metadata and injects `X-Token-Input`, `X-Token-Output`, and `X-Token-Total` as downstream response headers.

Both filters are gated behind the `ai-inference` feature flag.

## What changed
### New filters

#### `filter/src/builtins/http/ai/token_count.rs`

Extracts token usage from five provider response formats:

| Provider | Source of token counts |
|---|---|
| `openai` / `azure` | `usage.prompt_tokens` / `completion_tokens` |
| `anthropic` | `input_tokens` in `message_start`, `output_tokens` in `message_delta` |
| `google` | `usageMetadata.promptTokenCount` / `candidatesTokenCount` |
| `bedrock` (Converse) | `usage.inputTokens` / `outputTokens` |
| `bedrock_invoke_model` | `x-amzn-bedrock-input-token-count` / `x-amzn-bedrock-output-token-count` response headers |

Token counts are written to `FilterContext` metadata under `token.input`, `token.output`, and `token.total` via `ctx.set_token_usage()`.

SSE streams are fully buffered (`BodyMode::StreamBuffer`) and parsed at end-of-stream. Anthropic's split-event model is handled explicitly, since `input_tokens` and `output_tokens` arrive in different SSE event types.

#### `filter/src/builtins/http/ai/x_token_headers.rs`

Reads `token.input`, `token.output`, and `token.total` from `FilterContext` metadata in `on_response` and injects them as response headers. Headers are injected only when all three values are present.

**Filter ordering note:** `x_token_headers` must be placed before `token_count` in the filter chain. Since response filters execute in reverse pipeline order, this ensures `token_count.on_response` runs first (setting metadata for Bedrock InvokeModel from upstream headers), then `x_token_headers.on_response` reads that metadata and injects the headers.

For body-based providers, the counts are populated in `token_count.on_response_body`, which runs after the response header phase. `x_token_headers` needs the metadata to be set before its `on_response` hook runs, which in the current Pingora/Praxis lifecycle is only possible for Bedrock InvokeModel. For all other providers, token counts are still written to `FilterContext` metadata, where access-log templates and downstream `logging` hooks can read them.

### Wiring
- `filter/src/builtins/http/ai/mod.rs`: declares and re-exports both new modules.
- `filter/src/builtins/http/mod.rs`: re-exports `TokenCountFilter` and `XTokenHeadersFilter`.
- `filter/src/builtins/mod.rs`: re-exports both filters from the top-level builtins.
- `filter/src/registry.rs`: registers `token_count` and `x_token_headers` with the `FilterRegistry`.

### Example configuration
`examples/configs/ai/token-counting.yaml`: a minimal working config showing the correct filter order.

### Integration tests
`tests/integration/tests/suite/examples/token_counting.rs`: 13 integration tests covering the full matrix from proposal #220:

- `openai_non_streaming_extracts_token_counts`
- `anthropic_non_streaming_extracts_token_counts`
- `google_non_streaming_extracts_token_counts`
- `bedrock_converse_non_streaming_extracts_token_counts`
- `bedrock_invoke_model_extracts_token_counts_from_headers`
- `azure_non_streaming_extracts_token_counts`
- `openai_streaming_extracts_token_counts`
- `anthropic_streaming_split_events_extracts_token_counts`
- `google_streaming_no_done_sentinel_extracts_token_counts`
- `token_count_response_body_passes_through_unchanged`
- `missing_usage_fields_no_token_headers_injected`
- `openai_streaming_whitespace_and_comments`
- `example_config_token_counting_openai`

## Test plan
- `cargo test --features ai-inference`: 3107 tests pass (1 pre-existing skip: `h2spec_strict_conformance` requires the `h2spec` binary, which is not present in CI)
- `all_example_configs_parse` passes with the new `token-counting.yaml`
- `examples::token_counting::*` integration tests pass
- `builtins::http::ai::token_usage::*` unit tests continue to pass

## Related
- `token_usage` extraction library (already merged in Response-based token counting from provider JSON #210)
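
To illustrate the filter ordering note, here is a minimal, std-only sketch; it is not the code in this PR. The `Filter` trait, the `Ctx` struct, and `run_response_phase` are hypothetical stand-ins for the real Praxis types. Only the header names, the metadata keys, the reverse-order rule, and the "all three values present" rule are taken from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-request filter state (not the real `FilterContext`).
#[derive(Default)]
struct Ctx {
    metadata: HashMap<&'static str, u64>,
    response_headers: Vec<(&'static str, String)>,
}

/// Hypothetical, simplified filter trait with only the response-header hook.
trait Filter {
    fn name(&self) -> &'static str;
    fn on_response(&self, ctx: &mut Ctx, upstream: &HashMap<&str, &str>);
}

struct TokenCount;
struct XTokenHeaders;

impl Filter for TokenCount {
    fn name(&self) -> &'static str {
        "token_count"
    }

    // Bedrock InvokeModel case: the counts arrive as upstream response headers.
    fn on_response(&self, ctx: &mut Ctx, upstream: &HashMap<&str, &str>) {
        let input = upstream
            .get("x-amzn-bedrock-input-token-count")
            .and_then(|v| v.trim().parse::<u64>().ok());
        let output = upstream
            .get("x-amzn-bedrock-output-token-count")
            .and_then(|v| v.trim().parse::<u64>().ok());
        if let (Some(i), Some(o)) = (input, output) {
            ctx.metadata.insert("token.input", i);
            ctx.metadata.insert("token.output", o);
            ctx.metadata.insert("token.total", i.saturating_add(o));
        }
    }
}

impl Filter for XTokenHeaders {
    fn name(&self) -> &'static str {
        "x_token_headers"
    }

    // Inject headers only when all three metadata values are present.
    fn on_response(&self, ctx: &mut Ctx, _upstream: &HashMap<&str, &str>) {
        let m = &ctx.metadata;
        let counts = (
            m.get("token.input").copied(),
            m.get("token.output").copied(),
            m.get("token.total").copied(),
        );
        if let (Some(i), Some(o), Some(t)) = counts {
            ctx.response_headers.push(("X-Token-Input", i.to_string()));
            ctx.response_headers.push(("X-Token-Output", o.to_string()));
            ctx.response_headers.push(("X-Token-Total", t.to_string()));
        }
    }
}

/// Runs the response hooks in reverse pipeline order and returns the order used.
fn run_response_phase(
    chain: &[Box<dyn Filter>],
    ctx: &mut Ctx,
    upstream: &HashMap<&str, &str>,
) -> Vec<&'static str> {
    let mut order = Vec::new();
    for filter in chain.iter().rev() {
        filter.on_response(ctx, upstream);
        order.push(filter.name());
    }
    order
}

fn main() {
    let upstream = HashMap::from([
        ("x-amzn-bedrock-input-token-count", "12"),
        ("x-amzn-bedrock-output-token-count", "30"),
    ]);
    // Config order: x_token_headers first, token_count second.
    let chain: Vec<Box<dyn Filter>> = vec![Box::new(XTokenHeaders), Box::new(TokenCount)];
    let mut ctx = Ctx::default();
    let order = run_response_phase(&chain, &mut ctx, &upstream);
    assert_eq!(order, ["token_count", "x_token_headers"]);
    assert_eq!(ctx.response_headers.len(), 3);
    println!("{:?}", ctx.response_headers);
}
```

In this sketch, swapping the two filters in the chain makes `x_token_headers` run before any metadata exists, so no headers are injected.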
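
The Anthropic split-event handling can be sketched the same way. This is an illustration under stated assumptions, not the extraction code from #210 or this PR: `Usage`, `find_u64`, and `anthropic_usage_from_sse` are hypothetical names, and `find_u64` is a deliberately naive substring scan that stands in for a real JSON parser so the example needs no external crates. It works on a fully buffered stream, takes `input_tokens` from the `message_start` event and `output_tokens` from the `message_delta` event, and ignores blank lines and SSE comment lines.

```rust
/// Hypothetical result type; either side may be missing from the stream.
#[derive(Debug, Default, PartialEq)]
struct Usage {
    input: Option<u64>,
    output: Option<u64>,
}

impl Usage {
    /// Total is only defined when both halves were seen.
    fn total(&self) -> Option<u64> {
        Some(self.input?.saturating_add(self.output?))
    }
}

/// Naive helper: finds `"key": <digits>` in a JSON text and parses the digits.
/// A stand-in for a real JSON parser, used only to keep the sketch std-only.
fn find_u64(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\"", key);
    let start = json.find(needle.as_str())? + needle.len();
    let rest = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Scans a fully buffered SSE body once, at end-of-stream.
fn anthropic_usage_from_sse(buffered: &str) -> Usage {
    let mut usage = Usage::default();
    for line in buffered.lines() {
        // Only `data:` lines carry payloads; `event:` lines, blank lines,
        // and comment lines (starting with ':') are skipped.
        let data = match line.trim().strip_prefix("data:") {
            Some(d) => d.trim(),
            None => continue,
        };
        if data.contains("\"message_start\"") {
            if let Some(n) = find_u64(data, "input_tokens") {
                usage.input = Some(n);
            }
        } else if data.contains("\"message_delta\"") {
            if let Some(n) = find_u64(data, "output_tokens") {
                usage.output = Some(n);
            }
        }
    }
    usage
}

fn main() {
    let sse = concat!(
        "event: message_start\n",
        r#"data: {"type":"message_start","message":{"usage":{"input_tokens":25,"output_tokens":1}}}"#,
        "\n\n",
        ": keep-alive comment\n",
        "event: message_delta\n",
        r#"data: {"type":"message_delta","usage":{"output_tokens":15}}"#,
        "\n\n",
    );
    let usage = anthropic_usage_from_sse(sse);
    assert_eq!(usage, Usage { input: Some(25), output: Some(15) });
    assert_eq!(usage.total(), Some(40));
    println!("{:?} total={:?}", usage, usage.total());
}
```

If either event is missing, `total()` returns `None`, which matches the rule that nothing is surfaced unless all three values are available.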