[Feature Request] Add codegraph gain for estimated token savings analytics

## Summary

Add a `codegraph gain` command (plus a `--token-stats` flag on `codegraph context`) that reports estimated token savings from CodeGraph-assisted exploration.

## Problem

CodeGraph already advertises large reductions in tool calls and exploration time, but there is no local way for a user to answer:

- How many tokens did this CodeGraph query emit?
- How many tokens did it likely save compared with reading the relevant files directly?
- How much value has CodeGraph provided over recent local sessions?

Token-optimizing command proxies can already report a `gain` view: they compare raw output with filtered output and persist per-command estimates. CodeGraph could expose a similar analytics view for graph-assisted code exploration, clearly labeled as an estimate.

## Proposed solution

Add a token savings estimator with two parts:

1. Per-query stats for `codegraph context` / MCP context tools
2. Aggregated history via a new `codegraph gain` command

Example CLI output:

```text
$ codegraph context "How does tool execution work?" --token-stats

CodeGraph token stats
Query: How does tool execution work?
Output tokens estimated: 8,420
Related files: 14
Full-file baseline estimated: 61,300
Estimated tokens saved: 52,880
Estimated savings: 86.3%
Method: ceil(chars / 4), baseline = full contents of related files
```

Example aggregate output:

```text
$ codegraph gain

CodeGraph Gain
Queries tracked: 37
Output tokens estimated: 214,000
Baseline tokens estimated: 1,920,000
Estimated tokens saved: 1,706,000
Estimated savings: 88.9%
```
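The aggregate view could be computed by summing per-query totals and deriving the percentage from those sums, so that large queries weigh more than small ones (averaging per-query percentages would overweight tiny queries). The sketch below is in Python for brevity and assumes a hypothetical `QueryRecord` shape for each tracked query; `summarize` and `render` are illustrative names, not existing CodeGraph APIs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QueryRecord:
    # Hypothetical shape of one tracked query in local history.
    output_tokens_est: int
    baseline_tokens_est: int


@dataclass(frozen=True)
class GainSummary:
    queries_tracked: int
    output_tokens_est: int
    baseline_tokens_est: int
    saved_tokens_est: int
    savings_pct: float


def summarize(records: Iterable[QueryRecord]) -> GainSummary:
    rows = list(records)
    output_total = sum(r.output_tokens_est for r in rows)
    baseline_total = sum(r.baseline_tokens_est for r in rows)
    # Clamp per query, so a query whose output exceeds its baseline
    # counts as zero savings instead of cancelling savings elsewhere.
    saved_total = sum(max(0, r.baseline_tokens_est - r.output_tokens_est) for r in rows)
    # Token-weighted percentage; guard against an empty history.
    pct = saved_total / baseline_total * 100 if baseline_total else 0.0
    return GainSummary(len(rows), output_total, baseline_total, saved_total, pct)


def render(summary: GainSummary) -> str:
    # Mirrors the plain-text layout of the example aggregate output.
    return "\n".join([
        "CodeGraph Gain",
        f"Queries tracked: {summary.queries_tracked}",
        f"Output tokens estimated: {summary.output_tokens_est:,}",
        f"Baseline tokens estimated: {summary.baseline_tokens_est:,}",
        f"Estimated tokens saved: {summary.saved_tokens_est:,}",
        f"Estimated savings: {summary.savings_pct:.1f}%",
    ])
```

Clamping per query means `saved` can exceed `baseline - output` when some queries were larger than their baseline; whether to clamp per query or only on the totals is a design choice the docs should state.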

## Suggested calculation model

A simple first version could mirror the pragmatic approach used by CLI output filters:

```text
output_tokens_est = ceil(context_output_chars / 4)
baseline_tokens_est = ceil(sum(chars of relatedFiles full file contents) / 4)
saved_tokens_est = max(0, baseline_tokens_est - output_tokens_est)
savings_pct = baseline_tokens_est > 0 ? saved_tokens_est / baseline_tokens_est * 100 : 0
```
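The calculation model above translates almost line for line into code. This is a minimal Python sketch, assuming the context output and the full contents of the related files are already available as strings; `estimate_tokens`, `query_token_stats`, and `QueryTokenStats` are hypothetical names. Note that Python's `len` counts Unicode code points, which may differ slightly from "chars" in other runtimes (for example UTF-16 code units in JavaScript).

```python
import math
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # the chars / 4 heuristic; not a real tokenizer


def estimate_tokens(char_count: int) -> int:
    """Rough token estimate: ceil(chars / 4)."""
    return math.ceil(char_count / CHARS_PER_TOKEN)


@dataclass(frozen=True)
class QueryTokenStats:
    output_tokens_est: int
    baseline_tokens_est: int
    saved_tokens_est: int
    savings_pct: float


def query_token_stats(context_output: str, related_file_contents: list[str]) -> QueryTokenStats:
    output_est = estimate_tokens(len(context_output))
    # Sum characters across all related files first, then round up once.
    baseline_est = estimate_tokens(sum(len(c) for c in related_file_contents))
    saved = max(0, baseline_est - output_est)
    # Avoid division by zero when no related files were found.
    pct = saved / baseline_est * 100 if baseline_est else 0.0
    return QueryTokenStats(output_est, baseline_est, saved, pct)
```

For instance, 33,680 characters of context output against 245,200 characters of related files gives 8,420 output tokens, a 61,300-token baseline, 52,880 tokens saved, and 86.3% savings (rounded to one decimal), matching the per-query example above.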

For MCP calls, the same stats could be included in the response metadata or persisted locally.
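One way to persist stats locally is an append-only JSON Lines file inside the project's `.codegraph` directory, with one record per query from either the CLI or an MCP tool. The sketch below assumes a hypothetical file name `token-stats.jsonl` and a hypothetical record layout; the same dict could also be attached to an MCP response as metadata. Appending one line per query keeps writes cheap and avoids rewriting the whole history.

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical file name; the issue only says history stays in local .codegraph data.
HISTORY_FILE = "token-stats.jsonl"


def make_record(source: str, query: str, output_tokens_est: int,
                baseline_tokens_est: int, related_files: int) -> dict:
    return {
        "ts": int(time.time()),
        "source": source,  # "cli" or "mcp"
        "query": query,
        "output_tokens_est": output_tokens_est,
        "baseline_tokens_est": baseline_tokens_est,
        "related_files": related_files,
        # Labels so downstream tools cannot mistake estimates for billed usage.
        "method": "ceil(chars / 4)",
        "baseline": "full contents of related files",
    }


def append_record(project_root: Path, record: dict) -> Path:
    path = project_root / ".codegraph" / HISTORY_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path


def load_records(project_root: Path) -> list[dict]:
    path = project_root / ".codegraph" / HISTORY_FILE
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real project.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        append_record(root, make_record("cli", "How does tool execution work?", 8420, 61300, 14))
        print(load_records(root)[0]["baseline_tokens_est"])
```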

## Why this baseline is useful

This is not a perfect counterfactual for what an agent would have done without CodeGraph. The actual non-CodeGraph path might involve grep, glob, partial reads, repeated reads, or exploratory mistakes.

However, the full-related-files baseline is still useful because it is:

- deterministic
- local-only
- cheap to compute
- easy to explain
- directionally aligned with CodeGraph's value proposition
- comparable across projects and sessions

The command should label the result as an estimate, not as exact billed model tokens.

## Optional enhancements

- `--format json` for dashboards and agent integrations
- `--since 7d`, `--project`, `--daily`, `--history` for local analytics
- Allow a configurable tokenizer later, while keeping `chars / 4` as the default fallback
- Track `tool_calls_avoided_est` using a simple related-files/read-count model
- Separate stats for CLI and MCP usage
- Include `method` and `baseline` fields so downstream tools do not mistake estimates for exact usage
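Several of these enhancements combine naturally: a `--since` window filters history, CLI and MCP records are bucketed separately, and the JSON payload carries `method` and `baseline` labels. The Python sketch below assumes records shaped like `{"ts", "source", "output_tokens_est", "baseline_tokens_est"}` and a hypothetical duration syntax for `--since` (an integer followed by `m`, `h`, `d`, or `w`); none of this is an existing CodeGraph interface.

```python
import json
import re
from datetime import timedelta
from typing import Optional

# Hypothetical --since syntax: integer plus m(inutes), h(ours), d(ays), w(eeks).
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_since(value: str) -> timedelta:
    match = re.fullmatch(r"(\d+)([mhdw])", value.strip())
    if match is None:
        raise ValueError(f"unsupported --since value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def gain_json(records: list[dict], now: float, since: Optional[str] = None) -> str:
    # Records older than the cutoff (Unix seconds) are skipped.
    cutoff = now - parse_since(since).total_seconds() if since else float("-inf")
    by_source: dict[str, dict[str, int]] = {}
    for r in records:
        if r["ts"] < cutoff:
            continue
        bucket = by_source.setdefault(r["source"], {
            "queries": 0, "output_tokens_est": 0,
            "baseline_tokens_est": 0, "saved_tokens_est": 0,
        })
        bucket["queries"] += 1
        bucket["output_tokens_est"] += r["output_tokens_est"]
        bucket["baseline_tokens_est"] += r["baseline_tokens_est"]
        bucket["saved_tokens_est"] += max(0, r["baseline_tokens_est"] - r["output_tokens_est"])
    payload = {
        "estimate": True,
        "method": "ceil(chars / 4)",
        "baseline": "full contents of related files",
        "since": since,
        "by_source": by_source,  # separate CLI and MCP totals
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```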

## Acceptance criteria

- `codegraph context ... --token-stats` can show estimated output tokens, baseline tokens, saved tokens, and savings percent
- `codegraph gain` can show aggregate local savings history
- JSON output is available for both per-query and aggregate stats
- The docs clearly explain that the numbers are estimates and define the baseline
- No source code leaves the machine; history is stored locally, similar to the existing local `.codegraph` data model