Problem
MCP tool return payloads are not tracked separately in Claude Code's JSONL token data — they get folded into cache read/creation tokens. This makes it impossible to accurately measure per-skill or per-tool cost.
For example, in a session where read_claude_session returned ~10K tokens of JSON, the cost was only visible as a ~33K cache_creation jump in the corresponding turn — with no way to attribute that growth to the specific MCP call vs. other context expansion.
Proposed Solution
Each MCP tool response should include an estimated_tokens field reporting the approximate token count of the response payload. This allows the agent to:
- Attribute context/cache growth to specific tool calls
- Track per-skill token cost across sessions
- Build token budgets for session planning
- Identify expensive tools that could benefit from response trimming
Example
{
"result": { ... },
"estimated_tokens": 1250
}
Implementation Notes
- Use a simple heuristic: len(json_string) / 4 as a rough token estimate
- Or integrate tiktoken for more accurate counts (adds dependency)
- Include in all tool responses, especially heavy ones like read_claude_session, prepare_session, list_sessions
- Consider adding a token_budget parameter to tools so callers can request truncated responses
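A minimal sketch of the heuristic and the token_budget idea, to make the notes above concrete. The helper names (estimate_tokens, wrap_response, truncate_to_budget) are hypothetical and not part of any existing MCP server API. The truncation strategy shown here (dropping trailing list items) is only one possible choice.

```python
import json
from typing import Any


def estimate_tokens(payload: Any) -> int:
    """Rough estimate: length of the serialized JSON divided by 4.

    This is the len(json_string) / 4 heuristic, not a real tokenizer count.
    """
    json_string = json.dumps(payload, ensure_ascii=False)
    return len(json_string) // 4


def wrap_response(result: Any) -> dict:
    """Attach an estimated_tokens field to a tool result (hypothetical helper)."""
    return {"result": result, "estimated_tokens": estimate_tokens(result)}


def truncate_to_budget(items: list, token_budget: int) -> list:
    """Keep leading items until the summed per-item estimate would exceed the budget.

    Assumption: the tool returns a list and dropping trailing items is acceptable.
    """
    kept, used = [], 0
    for item in items:
        cost = estimate_tokens(item)
        if used + cost > token_budget:
            break
        kept.append(item)
        used += cost
    return kept


if __name__ == "__main__":
    sessions = [{"id": i, "summary": "x" * 200} for i in range(50)]
    trimmed = truncate_to_budget(sessions, token_budget=500)
    print(json.dumps(wrap_response(trimmed))[:80])
```

Swapping estimate_tokens for a tokenizer-backed count such as tiktoken would change only that one function. The wrapper and the budget logic would stay the same.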
Context
Discovered during session token analysis: cache tokens dominate Claude Code sessions (~94% of total). Actual input/output is <0.1%. Understanding which MCP calls contribute most to cache growth is key for optimization.
🤖 Generated with Claude Code