Track read:edit ratio per agent from session JSONL logs #66

@moodmosaic

Summary

Add a new driver interface function agent_tool_counts that parses
session JSONL and counts research vs mutation tool calls. Append the
counts to the existing stats TSV and display the ratio in both
costs.sh and dashboard.sh.

Design

New driver interface function: agent_tool_counts

Each driver parses its own JSONL format and outputs reads\tedits
(tab-separated).

| Driver | Reads (research) | Edits (mutation) |
| --- | --- | --- |
| Claude Code | Read, Grep, Glob | Write, Edit, MultiEdit |
| Codex CLI | command_execution, web_search, mcp_tool_call | file_change |
| Gemini CLI | read_file, grep_search, list_directory | write_file, edit_file |

Shell/Bash is excluded from both categories (ambiguous -- could be
research or mutation).
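A minimal sketch of what the Claude Code driver's agent_tool_counts could look like, assuming each tool call appears in the JSONL as a `"name":"<Tool>"` field (a hypothetical field name -- adjust to the actual log schema):

```shell
# Counts research vs mutation tool calls in a session JSONL log and
# prints them as "reads<TAB>edits". Assumes tool names appear as
# "name":"Read" etc.; Bash/Shell is deliberately not counted.
agent_tool_counts() {
  awk '
    {
      # Scan each line for every "name":"<Tool>" occurrence.
      while (match($0, /"name":"[A-Za-z]+"/)) {
        tool = substr($0, RSTART + 8, RLENGTH - 9)
        $0 = substr($0, RSTART + RLENGTH)
        if (tool == "Read" || tool == "Grep" || tool == "Glob")
          reads++
        else if (tool == "Write" || tool == "Edit" || tool == "MultiEdit")
          edits++
        # Bash is ignored: ambiguous between research and mutation.
      }
    }
    END { printf "%d\t%d\n", reads, edits }
  ' "$1"
}
```

The other drivers would follow the same shape with their own tool names from the table above, so the interface contract is just "print `reads\tedits` for one log file."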

Extend the stats TSV

Append two columns to the existing 9-column stats_agent_N.tsv:

... | turns | reads | edits

Backward compatible: awk treats missing columns as 0 for older rows.
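The backward compatibility falls out of awk's field handling: referencing a field past NF yields the empty string, which coerces to 0 in numeric context. A quick demonstration with made-up rows (column positions 10 and 11 follow the layout above):

```shell
# One 9-column row from an older run, one 11-column row with the new
# reads/edits columns appended (values are illustrative only).
printf 'a\tb\tc\td\te\tf\tg\th\t12\n'         >  stats_demo.tsv
printf 'a\tb\tc\td\te\tf\tg\th\t10\t62\t10\n' >> stats_demo.tsv

awk -F'\t' '{
  reads = $10 + 0   # empty field coerces to 0 on older 9-column rows
  edits = $11 + 0
  printf "%d\t%d\n", reads, edits
}' stats_demo.tsv
```

This prints `0	0` for the old row and `62	10` for the new one, so no migration of existing TSVs is needed.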

Display in costs.sh and dashboard.sh

Both scripts already read the TSV via read_agent_stats. Add an R:E
column showing the ratio, e.g.:

#    Model                     Cost      Input     Output  Cache    Turns   R:E      Time
1    claude-opus-4-6 (h)    $0.4200     120k       45k    890k       12   6.2      1h 23m
2    claude-opus-4-6 (h)    $0.3800     105k       38k    820k       10   2.1      1h 05m

Format: render the ratio as %.1f when edits > 0; show - when both counts are 0.
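A hedged sketch of the formatting rule as a shell helper (format_ratio is a hypothetical name; the issue only specifies the both-zero case, so edits == 0 with reads > 0 is shown as - here as well):

```shell
# format_ratio READS EDITS -> "%.1f" ratio, or "-" when there is
# nothing meaningful to divide.
format_ratio() {
  if [ "$2" -gt 0 ]; then
    # awk handles the floating-point division and rounding.
    awk -v r="$1" -v e="$2" 'BEGIN { printf "%.1f", r / e }'
  else
    printf '%s' '-'
  fi
}
```

Usage: `format_ratio 62 10` prints `6.2`, matching the sample row above; `format_ratio 0 0` prints `-`.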

References

Motivation

A recent report on Claude Code quality regression
(anthropics/claude-code#42796)
and Boris Cherny's analysis
showed that a key leading indicator of degraded code quality is the
read:edit ratio -- the number of research tool calls (Read, Grep,
Glob) divided by mutation tool calls (Write, Edit).

In their measurements, the ratio dropped from 6.6 to 2.0, meaning
the agent shifted from carefully reading code before editing to
"edit-happy" behavior -- mutating code without sufficient context
gathering. This correlated directly with worse output quality.

The swarm already records every tool call in its session JSONL logs
(agent_logs/agent_<id>_<commit>_<ts>.log). We should compute and
surface this metric so operators can detect degradation per agent,
per session, and over time.
