feat: content-size field + --largest-bodies view (fixes size-aggregate eval regression) #5
Merged
Closes the size-aggregate gap surfaced in the post-PR4 eval rerun, where
hargrep regressed 64% on "which URL has the largest body?" because the
agent chased --size-by-type (MIME-level aggregate) when it needed URL-level
body sizes.
- content-size: new valid --fields value. Emits as contentSize (HAR
camelCase convention). Source is response.content.size (i64; -1 when the
HAR logger didn't know, surfaced raw so callers can filter).
- --largest-bodies[=N]: new aggregate view. Emits [{id, url, mime_type,
content_size}] sorted by content_size desc, limited to top N. Default
N=10. Uses --largest-bodies=N (equals) syntax because plain space
delimiters would be ambiguous with the FILE positional arg — clap's
require_equals keeps the grammar unambiguous.
Respects filters and honors grep-like exit semantics (1 on empty).
Conflicts with every other output/view flag.
Updated --help-llm cheatsheet and README to document both additions.
135 tests pass (55 unit + 80 integration). Clippy + fmt clean.
- clippy 1.95 on CI flagged unnecessary_sort_by in largest_bodies; switched to sort_by_key with Reverse. Local clippy on older Rust didn't trigger it.
- Pin fixture-exact sizes in test_fields_includes_content_size so a sort-key swap would surface immediately (per pr-test-analyzer review).
- Pin the #1 winner (id 3, PNG) in --largest-bodies tests instead of the tautological sorted-desc check. Added limit=1 test.
- Widen test_largest_bodies_conflicts_with_other_views to cover all four view flags (was testing only --overview).
- Add 4 inline unit tests to src/aggregates.rs for largest_bodies covering sort, limit truncation, limit=0, and -1 (unknown) sinking with stable tie-breaking.
- Doc comment on largest_bodies explicitly notes -1 semantics and stable sort.
- Updated aggregate_exit_code docstring to list --largest-bodies.
- README notes the -1-sinks-to-bottom behavior.
140 tests pass (59 unit + 81 integration). Clippy + fmt clean.
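For readers without the diff open, the sort behavior these fixes describe can be sketched roughly as below. This is not the code from src/aggregates.rs: the `BodyRow` struct and the function signature are assumptions made for illustration, and only the ideas (sort_by_key with Reverse, a stable sort, -1 sinking to the bottom, truncation to the limit) come from the notes above.

```rust
use std::cmp::Reverse;

/// Hypothetical row shape, mirroring the `{id, url, mime_type, content_size}` output.
#[derive(Debug, Clone, PartialEq)]
pub struct BodyRow {
    pub id: usize,
    pub url: String,
    pub mime_type: String,
    /// Raw `response.content.size`; -1 means the HAR logger did not record it.
    pub content_size: i64,
}

/// Sort rows by `content_size` descending and keep at most `limit` of them.
/// The signature is an assumption; the real function may take different inputs.
pub fn largest_bodies(mut rows: Vec<BodyRow>, limit: usize) -> Vec<BodyRow> {
    // `sort_by_key` is a stable sort, so rows with equal sizes keep input order.
    // -1 is smaller than any real size, so unknown entries sink to the bottom.
    rows.sort_by_key(|r| Reverse(r.content_size));
    // A limit of 0 yields an empty result.
    rows.truncate(limit);
    rows
}
```

Because unknown sizes are kept as raw -1 values rather than dropped, a caller that wants only measured bodies would filter on `content_size >= 0` before or after calling this.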
Summary
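Closes a regression surfaced by the post-PR4 eval rerun on small.har: hargrep regressed 64% on "which URL has the largest body?" because the agent reached for `--size-by-type` (a MIME-level aggregate) when it needed URL-level body sizes.
This PR adds the missing URL-level primitive and the derived view.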
What changed
`content-size` — new `--fields` value. Emits as `contentSize` (matching HAR camelCase). Source is `response.content.size` (i64; surfaced raw including `-1` when the HAR logger didn't record size, so callers can filter).
`--largest-bodies[=N]` — new aggregate view. Emits `[{id, url, mime_type, content_size}]` sorted by `content_size` desc, limited to top-N. Defaults to N=10. Respects filters, honors grep-like exit semantics (1 when empty).
Syntax: uses `--largest-bodies=N` (equals form) because space-delimited `--largest-bodies N` would be ambiguous with the `FILE` positional arg; clap's `require_equals` keeps the grammar unambiguous.
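To make the grammar point concrete: with an optional value, `--largest-bodies small.har` gives a parser no way to tell whether the next token is N or FILE, while the equals form keeps the value inside the flag's own token. The sketch below hand-rolls that rule with the standard library only. It is an illustration, not the PR's implementation, which relies on clap's `require_equals`; `parse_largest_bodies` and `DEFAULT_LIMIT` are hypothetical names.

```rust
/// Default N when the flag is given without a value (N=10 per the description above).
const DEFAULT_LIMIT: usize = 10;

/// Hypothetical helper: interpret a single argv token as `--largest-bodies[=N]`.
/// Returns `None` when the token is not this flag or N is not a valid count.
pub fn parse_largest_bodies(token: &str) -> Option<usize> {
    // Anything that does not start with the flag name is left for other parsing,
    // for example the FILE positional.
    let rest = token.strip_prefix("--largest-bodies")?;
    if rest.is_empty() {
        // Bare flag: fall back to the default limit.
        return Some(DEFAULT_LIMIT);
    }
    // A value is only accepted when attached with '=', never from the next token.
    rest.strip_prefix('=')?.parse::<usize>().ok()
}
```

Since the value never comes from a following token, whatever appears after the flag can always be treated as `FILE`.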
On the referenced Codex comment (PR #4 r3097423264)
That comment flagged that `--body-regex` wasn't in `--entry`'s conflict set. It was already fixed on PR #4 itself (commit `7b90748` added both `body_grep` and `body_regex` to `--entry`'s `conflicts_with_all` before merge). Verified on current `main`: both flags conflict correctly. No action needed in this PR.
Test plan
Expected eval impact
Next eval rerun should flip size-aggregate from hargrep's worst regression (+64%) to a likely win, since the agent will pick up `--largest-bodies` via the `--help-llm` reference already added in PR #4.
🤖 Generated with Claude Code