Summary
Add a codegraph gain command (or codegraph context --token-stats) that reports estimated token savings from CodeGraph-assisted exploration.
Problem
CodeGraph already advertises large reductions in tool calls and exploration time, but there is no local way for a user to answer:
- How many tokens did this CodeGraph query emit?
- How many tokens did it likely save compared with reading the relevant files directly?
- How much value has CodeGraph provided over recent local sessions?
Tools such as token-optimizing command proxies can report a gain view because they compare raw output with filtered output and persist per-command estimates. CodeGraph could expose a similar analytics view, clearly labeled as an estimate, for graph-assisted code exploration.
Proposed solution
Add a token savings estimator with two parts:
- Per-query stats for codegraph context / MCP context tools
- Aggregated history via a new codegraph gain command
Example CLI output:
$ codegraph context "How does tool execution work?" --token-stats
CodeGraph token stats
Query: How does tool execution work?
Output tokens estimated: 8,420
Related files: 14
Full-file baseline estimated: 61,300
Estimated tokens saved: 52,880
Estimated savings: 86.3%
Method: ceil(chars / 4), baseline = full contents of related files
Example aggregate output:
$ codegraph gain
CodeGraph Gain
Queries tracked: 37
Output tokens estimated: 214,000
Baseline tokens estimated: 1,920,000
Estimated tokens saved: 1,706,000
Estimated savings: 88.9%
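The aggregate view could be backed by a small append-only history file. The following is a minimal sketch, not a proposed implementation: the JSON Lines layout, the function names, and the record fields are all assumptions made for illustration, and the history path is left to the caller.

```python
import json
from pathlib import Path

# Hypothetical record fields; the proposal does not fix a storage schema.
SUM_KEYS = ("output_tokens_est", "baseline_tokens_est", "saved_tokens_est")


def record_query(history_path: Path, output_tokens_est: int, baseline_tokens_est: int) -> None:
    """Append one per-query estimate to a local JSON Lines history file."""
    record = {
        "output_tokens_est": output_tokens_est,
        "baseline_tokens_est": baseline_tokens_est,
        # Clamp at zero so a query larger than its baseline never counts as negative savings.
        "saved_tokens_est": max(0, baseline_tokens_est - output_tokens_est),
    }
    history_path.parent.mkdir(parents=True, exist_ok=True)
    with history_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def aggregate(history_path: Path) -> dict:
    """Sum all recorded estimates and derive an overall savings percentage."""
    totals = {"queries": 0, **{key: 0 for key in SUM_KEYS}}
    if history_path.exists():
        with history_path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines
                rec = json.loads(line)
                totals["queries"] += 1
                for key in SUM_KEYS:
                    totals[key] += rec[key]
    baseline = totals["baseline_tokens_est"]
    totals["savings_pct"] = (
        round(totals["saved_tokens_est"] / baseline * 100, 1) if baseline else 0.0
    )
    return totals
```

Because savings are clamped per query, the aggregate saved total is the sum of per-query savings and can exceed baseline minus output when some queries emitted more than their baseline.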
Suggested calculation model
A simple first version could mirror the pragmatic approach used by CLI output filters:
output_tokens_est = ceil(context_output_chars / 4)
baseline_tokens_est = ceil(sum(chars of relatedFiles full file contents) / 4)
saved_tokens_est = max(0, baseline_tokens_est - output_tokens_est)
savings_pct = saved_tokens_est / baseline_tokens_est * 100 (reported as 0 when baseline_tokens_est is 0)
For MCP calls, the same stats could be included in the response metadata or persisted locally.
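The calculation model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the returned dictionary keys are hypothetical, the zero-baseline guard and the one-decimal rounding are additions, and chars / 4 is a heuristic rather than a real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic divisor from the proposal, not a model tokenizer


def estimate_tokens_from_chars(char_count: int) -> int:
    """ceil(chars / 4), as in the suggested calculation model."""
    return math.ceil(char_count / CHARS_PER_TOKEN)


def token_stats(context_output: str, related_file_contents: list[str]) -> dict:
    """Estimate savings of a context query against the full-related-files baseline."""
    output_tokens_est = estimate_tokens_from_chars(len(context_output))
    # Baseline: total characters across the full contents of all related files.
    baseline_chars = sum(len(content) for content in related_file_contents)
    baseline_tokens_est = estimate_tokens_from_chars(baseline_chars)
    saved_tokens_est = max(0, baseline_tokens_est - output_tokens_est)
    # Guard against an empty baseline (no related files) to avoid dividing by zero.
    savings_pct = (
        round(saved_tokens_est / baseline_tokens_est * 100, 1)
        if baseline_tokens_est > 0
        else 0.0
    )
    return {
        "output_tokens_est": output_tokens_est,
        "related_files": len(related_file_contents),
        "baseline_tokens_est": baseline_tokens_est,
        "saved_tokens_est": saved_tokens_est,
        "savings_pct": savings_pct,
    }
```

Feeding this function inputs sized to match the per-query example (33,680 output characters against 245,200 baseline characters) reproduces the figures shown there: 8,420 output tokens, 61,300 baseline tokens, 52,880 saved, 86.3%.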
Why this baseline is useful
This is not a perfect counterfactual for what an agent would have done without CodeGraph. The actual non-CodeGraph path might involve grep, glob, partial reads, repeated reads, or exploratory mistakes.
However, the full-related-files baseline is still useful because it is:
- deterministic
- local-only
- cheap to compute
- easy to explain
- directionally aligned with CodeGraph's value proposition
- comparable across projects and sessions
The command should label the result as an estimate, not an exact count of billed model tokens.
Optional enhancements
- --format json for dashboards and agent integrations
- --since 7d, --project, --daily, --history for local analytics
- Allow a configurable tokenizer later, while keeping chars / 4 as the default fallback
- Track tool_calls_avoided_est using a simple related-files/read-count model
- Separate stats for CLI and MCP usage
- Include method and baseline fields so downstream tools do not mistake estimates for exact usage
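Two of these enhancements, the configurable tokenizer and the self-describing JSON output, could fit together roughly as follows. This is a hedged sketch: the function names, the payload keys (including "estimate"), and the "custom" method label are invented for illustration and are not part of any existing CodeGraph API.

```python
import json
import math
from typing import Callable, Optional


def chars_div_4(text: str) -> int:
    """Default fallback estimator: ceil(chars / 4)."""
    return math.ceil(len(text) / 4)


def stats_payload(
    output_text: str,
    related_file_texts: list[str],
    count_tokens: Optional[Callable[[str], int]] = None,
    method_name: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable stats record that states how it was computed."""
    if count_tokens is None:
        # No tokenizer configured: fall back to the chars / 4 heuristic.
        count_tokens, method_name = chars_div_4, "ceil(chars / 4)"
    elif method_name is None:
        method_name = "custom"
    output = count_tokens(output_text)
    # Count the concatenated file contents so the fallback equals ceil(sum(chars) / 4).
    baseline = count_tokens("".join(related_file_texts))
    return {
        "estimate": True,  # hypothetical flag: these are not billing tokens
        "method": method_name,
        "baseline": "full contents of related files",
        "output_tokens_est": output,
        "baseline_tokens_est": baseline,
        "saved_tokens_est": max(0, baseline - output),
    }


def to_json(payload: dict) -> str:
    """Serialize a payload as it might be emitted for --format json."""
    return json.dumps(payload, indent=2)
```

Carrying method and baseline in every record means a dashboard can refuse to mix numbers produced by different estimators.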
Acceptance criteria
- codegraph context ... --token-stats can show estimated output tokens, baseline tokens, saved tokens, and savings percent
- codegraph gain can show aggregate local savings history
- JSON output is available for both per-query and aggregate stats
- The docs clearly explain that the numbers are estimates and define the baseline
- No source code leaves the machine; history is stored locally, similar to the existing local .codegraph data model