
Add response token estimation to MCP tool responses #5

@sinh-x


Problem

MCP tool return payloads are not tracked separately in Claude Code's JSONL token data; they are folded into cache read/creation tokens. This makes it impossible to measure per-skill or per-tool cost accurately.

For example, in a session where read_claude_session returned ~10K tokens of JSON, the cost surfaced only as a ~33K cache_creation jump in the corresponding turn, with no way to attribute that growth to the specific MCP call rather than to other context expansion.

Proposed Solution

Each MCP tool response should include an estimated_tokens field reporting the approximate token count of the response payload. This allows the agent to:

  1. Attribute context/cache growth to specific tool calls
  2. Track per-skill token cost across sessions
  3. Build token budgets for session planning
  4. Identify expensive tools that could benefit from response trimming

Example

```json
{
  "result": { ... },
  "estimated_tokens": 1250
}
```
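A minimal sketch of how a server could attach this field when building a response. The helper name `attach_token_estimate` is hypothetical (not part of any existing MCP server), and it uses the rough chars/4 heuristic from the implementation notes:

```python
import json

def attach_token_estimate(result):
    """Wrap a tool result with the proposed estimated_tokens field.

    Hypothetical helper: estimates tokens as len(serialized JSON) / 4,
    a rough heuristic rather than a real tokenizer count.
    """
    payload = json.dumps(result)
    return {"result": result, "estimated_tokens": len(payload) // 4}
```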

Implementation Notes

  • Use a simple heuristic: len(json_string) / 4 as a rough token estimate
  • Or integrate tiktoken for more accurate counts (adds dependency)
  • Include in all tool responses, especially heavy ones like read_claude_session, prepare_session, list_sessions
  • Consider adding a token_budget parameter to tools so callers can request truncated responses
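The last two notes could be combined in one response builder. The sketch below is illustrative only: `respond` and its `token_budget` parameter are hypothetical names, the tiktoken integration is optional (falling back to chars/4 when the package is absent), and the truncation strategy shown is deliberately naive:

```python
import json

def estimate_tokens(json_string):
    """Estimate tokens for a serialized payload.

    Tries tiktoken (optional dependency) for an accurate count;
    falls back to the chars/4 heuristic if it is not installed.
    """
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(json_string))
    except ImportError:
        return len(json_string) // 4

def respond(result, token_budget=None):
    """Build a tool response with estimated_tokens.

    If a token_budget is given and the payload exceeds it, return a
    naively truncated string (cut at ~4 chars per budgeted token)
    and flag it as truncated.
    """
    payload = json.dumps(result)
    if token_budget is not None and estimate_tokens(payload) > token_budget:
        clipped = payload[: token_budget * 4]
        return {
            "result": clipped,
            "truncated": True,
            "estimated_tokens": estimate_tokens(clipped),
        }
    return {"result": result, "estimated_tokens": estimate_tokens(payload)}
```

A real implementation would need smarter truncation (e.g. dropping whole JSON fields) so the clipped payload stays parseable; the string slice here only demonstrates the budget check.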

Context

Discovered during session token analysis: cache tokens dominate Claude Code sessions (~94% of total). Actual input/output is <0.1%. Understanding which MCP calls contribute most to cache growth is key for optimization.

🤖 Generated with Claude Code
