docs(proposals): add How? section to #210 response-based token counting #643

Open
mkoushni wants to merge 2 commits into praxis-proxy:main from mkoushni:feat/210-response-based-token-counting

Conversation

@mkoushni

Contributor

docs(proposals): add How? section to #210 response-based token counting

Summary

Completes the graduation criteria for proposal #210 by adding the How? section — the design and implementation plan for the token_count filter that extracts token usage from AI provider response bodies and headers.


What changed

docs/proposals/00210_response-based-token-counting.md — 193 lines added, status updated from proposed to accepted.

Open questions resolved

All four open questions from the What?/Why? section are answered in the new How? section:

| Question | Decision |
| --- | --- |
| Provider identification | Explicit provider: YAML key — no auto-detection. Azure and OpenAI share the same JSON schema so auto-detection would be ambiguous. |
| Streaming completion signal | BodyMode::StreamBuffer — proxy buffers all response body bytes and delivers them once with end_of_stream: true. Stream close is the authoritative trigger, covering providers that omit [DONE] (Google Gemini). |
| Streaming accumulation per provider | Per-provider strategy: single terminal-chunk scan for OpenAI/Azure/Google/Bedrock Converse; two-event scan (message_start + message_delta) for Anthropic; header-only for Bedrock InvokeModel. |
| Partial usage data | Only the final assembled payload is parsed — no summing of intermediate chunks — to avoid double-counting. |
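
The two body-scanning strategies in the table can be illustrated with a rough, std-only sketch. It is not the proposal's code: `find_u64`, `sse_payloads`, `last_chunk_usage`, and `anthropic_usage` are hypothetical names, and the naive string lookup only stands in for a real JSON parser. It assumes the whole SSE body has already been buffered, as StreamBuffer would deliver it.

```rust
/// Naive lookup of an unsigned integer field such as `"prompt_tokens": 12`.
/// Stand-in for a real JSON parser, used only to keep this sketch std-only.
fn find_u64(payload: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{key}\"");
    let start = payload.find(needle.as_str())? + needle.len();
    let rest = payload[start..].trim_start().strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok() // empty digits (e.g. `null`) yields None
}

/// Yields the payload of every `data:` line, skipping blanks and `[DONE]`.
fn sse_payloads<'a>(body: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    body.lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        .filter(|p| !p.is_empty() && *p != "[DONE]")
}

/// Last-valid-chunk scan: keep only the final payload that carries both
/// counters. Nothing is summed, so intermediate chunks cannot double-count.
fn last_chunk_usage(body: &str, input_key: &str, output_key: &str) -> Option<(u64, u64)> {
    sse_payloads(body)
        .filter_map(|p| Some((find_u64(p, input_key)?, find_u64(p, output_key)?)))
        .last()
}

/// Two-event scan: input tokens come from `message_start`, output tokens
/// from the last `message_delta` that reports them.
fn anthropic_usage(body: &str) -> Option<(u64, u64)> {
    let mut input = None;
    let mut output = None;
    for p in sse_payloads(body) {
        if p.contains("\"message_start\"") {
            input = find_u64(p, "input_tokens");
        } else if p.contains("\"message_delta\"") {
            output = find_u64(p, "output_tokens").or(output);
        }
    }
    Some((input?, output?))
}
```

For an OpenAI-style stream the call would be `last_chunk_usage(body, "prompt_tokens", "completion_tokens")`. Because the scan does not depend on a `[DONE]` sentinel, it behaves the same for providers that simply close the stream.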

Design content added

  • Requirements — 7 concrete implementation requirements.
  • Filter struct and config: TokenCountConfig, ProviderKind enum, TokenCountFilter struct with YAML snippet.
  • HttpFilter hook table — behaviour of each hook (on_request, on_response, response_body_access, response_body_mode, on_response_body).
  • SSE extraction detail — dispatch tree: Anthropic two-event scan vs. last-valid-chunk scan for all other providers.
  • Bedrock InvokeModel path — header-only extraction in on_response; BodyAccess::None prevents unnecessary buffering.
  • FilterContext metadata keys: token.input, token.output, token.total written via ctx.set_token_usage.
  • Module registration — step-by-step wiring for ai/mod.rs, http/mod.rs, builtins/mod.rs, and registry.rs.
  • YAML configuration example — minimal working filter chain showing token_count with provider: openai.
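
A few of the items above fit together roughly as follows. This is a minimal sketch under stated assumptions, not the code in the proposal: the enum is a local stand-in for `ProviderKind` (only the `OpenAi` spelling and the `openai` config value come from this PR; every other variant and string is assumed), `needs_body` is a hypothetical helper for the `BodyAccess::None` decision, and a plain `HashMap` stands in for `FilterContext` metadata.

```rust
use std::collections::HashMap;

/// Local stand-in for the proposal's `ProviderKind`. Only `OpenAi` is
/// spelled as in the PR; the other variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderKind {
    OpenAi,
    Azure,
    Anthropic,
    Google,
    BedrockConverse,
    BedrockInvokeModel,
}

impl ProviderKind {
    /// Maps the explicit `provider:` value to a variant. Unknown values are
    /// rejected rather than guessed, mirroring the no-auto-detection decision.
    fn from_config(value: &str) -> Option<Self> {
        match value {
            "openai" => Some(Self::OpenAi),
            // The strings below are assumed spellings, not taken from the PR.
            "azure" => Some(Self::Azure),
            "anthropic" => Some(Self::Anthropic),
            "google" => Some(Self::Google),
            "bedrock_converse" => Some(Self::BedrockConverse),
            "bedrock_invoke_model" => Some(Self::BedrockInvokeModel),
            _ => None,
        }
    }

    /// Header-only providers never need the response body, which is what
    /// lets the filter skip buffering for Bedrock InvokeModel.
    fn needs_body(self) -> bool {
        !matches!(self, Self::BedrockInvokeModel)
    }
}

/// Writes the three metadata keys named in the PR into a plain map that
/// stands in for the filter context.
fn write_usage(metadata: &mut HashMap<String, String>, input: u64, output: u64) {
    metadata.insert("token.input".to_string(), input.to_string());
    metadata.insert("token.output".to_string(), output.to_string());
    // Saturating add keeps the total well-defined even for absurd inputs.
    metadata.insert("token.total".to_string(), input.saturating_add(output).to_string());
}
```

With this shape, a misspelled `provider:` value would surface as a configuration error at load time instead of silently producing no token counts.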

Related

@mkoushni mkoushni marked this pull request as ready for review June 21, 2026 11:02
@mkoushni mkoushni requested a review from a team June 21, 2026 11:02
@mkoushni mkoushni requested review from shaneutt and twghu as code owners June 21, 2026 11:02

@praxis-bot praxis-bot left a comment

Collaborator

PR Review

Summary: Adds the How? section to the #210 proposal, transitioning status from proposed to accepted. Well-structured design that correctly addresses all four open questions and aligns with the existing token_usage library and FilterContext APIs.

| Severity | Count |
| --- | --- |
| Medium | 3 |

No critical or large issues found. The design decisions are sound -- StreamBuffer for body aggregation, explicit provider: key over auto-detection, and header-only path for Bedrock InvokeModel are all correct choices.

Comment thread docs/proposals/00210_response-based-token-counting.md Outdated
Comment thread docs/proposals/00210_response-based-token-counting.md
Comment thread docs/proposals/00210_response-based-token-counting.md
mkoushni added 2 commits June 22, 2026 18:32
…token counting

Signed-off-by: mkoushni <mkoushni@redhat.com>
- Add listeners block to YAML configuration example
- Align ProviderKind::OpenAi casing with TokenUsageProvider::OpenAi
- Update response_body_mode hook table row to reflect Stream/StreamBuffer split

Signed-off-by: mkoushni <mkoushni@redhat.com>
@mkoushni mkoushni force-pushed the feat/210-response-based-token-counting branch from b7bf13b to 39631ec Compare June 22, 2026 15:41
@shaneutt shaneutt self-assigned this Jun 22, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in AI Gateway Jun 22, 2026
@shaneutt shaneutt moved this from Backlog to Review in AI Gateway Jun 22, 2026
@shaneutt shaneutt added this to the v0.4.0 milestone Jun 22, 2026
Labels

None yet

Projects

Status: Review

3 participants