## Summary

Add a new driver interface function `agent_tool_counts` that parses
session JSONL and counts research vs mutation tool calls. Append the
counts to the existing stats TSV and display the ratio in both
`costs.sh` and `dashboard.sh`.

## Design

### New driver interface function: `agent_tool_counts`

Each driver parses its own JSONL format and outputs `reads\tedits`
(tab-separated).
| Driver | Reads (research) | Edits (mutation) |
| --- | --- | --- |
| Claude Code | Read, Grep, Glob | Write, Edit, MultiEdit |
| Codex CLI | command_execution, web_search, mcp_tool_call | file_change |
| Gemini CLI | read_file, grep_search, list_directory | write_file, edit_file |
Shell/Bash is excluded from both categories (ambiguous -- could be
research or mutation).
### Extend the stats TSV

Append two columns to the existing 9-column `stats_agent_N.tsv`:

    ... | turns | reads | edits
Backward compatible: awk treats missing columns as 0 for older rows.
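In awk, referencing a field past the end of a record yields the empty string, so the backward-compatible read can be as simple as coercing the two new fields with `+ 0`. A minimal sketch (the helper name and the column positions 10/11 are assumptions based on the 9-column layout above; the real `read_agent_stats` may differ):

```shell
# Hypothetical row reader: older 9-field rows produce "" for $10/$11,
# which `+ 0` coerces to 0, so no migration of old TSV rows is needed.
read_agent_stats_row() {
  awk -F'\t' '{ printf "reads=%d edits=%d\n", $10 + 0, $11 + 0 }'
}
```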
### Display in costs.sh and dashboard.sh

Both scripts already read the TSV via `read_agent_stats`. Add an `R:E`
column showing the ratio, e.g.:

    #  Model                Cost     Input  Output  Cache  Turns  R:E  Time
    1  claude-opus-4-6 (h)  $0.4200  120k   45k     890k   12     6.2  1h 23m
    2  claude-opus-4-6 (h)  $0.3800  105k   38k     820k   10     2.1  1h 05m

Format: ratio as `%.1f` when edits > 0, `-` when both are 0.
## References

## Motivation
A recent report on Claude Code quality regression
(anthropics/claude-code#42796)
and Boris Cherny's analysis
showed that a key leading indicator of degraded code quality is the
read:edit ratio -- the number of research tool calls (Read, Grep,
Glob) divided by mutation tool calls (Write, Edit).
In their measurements, the ratio dropped from 6.6 to 2.0, meaning
the agent shifted from carefully reading code before editing to
"edit-happy" behavior -- mutating code without sufficient context
gathering. This correlated directly with worse output quality.
The swarm already records every tool call in its session JSONL logs
(`agent_logs/agent_<id>_<commit>_<ts>.log`). We should compute and
surface this metric so operators can detect degradation per agent,
per session, and over time.