Skip to content

feat: content-size field + --largest-bodies view (fixes size-aggregate eval regression)#5

Merged
brunojm merged 2 commits into
mainfrom
feat/llm-improvements-pr5
Apr 17, 2026
Merged

feat: content-size field + --largest-bodies view (fixes size-aggregate eval regression)#5
brunojm merged 2 commits into
mainfrom
feat/llm-improvements-pr5

Conversation

@brunojm

@brunojm brunojm commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Summary

Closes a regression surfaced by the post-PR4 eval rerun on small.har:

size-aggregate: hargrep 9,104 → 14,934 tokens (+64%) — worst regression in the entire matrix. Task asks "which URL has the largest response body?"; agent chased the new --size-by-type flag, which aggregates at MIME level, not URL level, and had to fall back to manual body-size extraction.

This PR adds the missing URL-level primitive and the derived view.

What changed

  • `content-size` — new `--fields` value. Emits as `contentSize` (matching HAR camelCase). Source is `response.content.size` (i64; surfaced raw including `-1` when the HAR logger didn't record size, so callers can filter).

  • `--largest-bodies[=N]` — new aggregate view. Emits `[{id, url, mime_type, content_size}]` sorted by `content_size` desc, limited to top-N. Defaults to N=10. Respects filters, honors grep-like exit semantics (1 when empty).

    Syntax: uses `--largest-bodies=N` (equals form) because space-delimited `--largest-bodies N ` would be ambiguous with the `FILE` positional arg — `require_equals` keeps the grammar unambiguous.

On the referenced Codex comment (PR #4 r3097423264)

That comment flagged that `--body-regex` wasn't in `--entry`'s conflict set. It was already fixed on PR #4 itself (commit `7b90748` added both `body_grep` and `body_regex` to `--entry`'s `conflicts_with_all` before merge). Verified on current `main`: both flags conflict correctly. No action needed in this PR.

Test plan

  • `cargo test` — 135 tests pass (55 unit + 80 integration). New tests:
    • `--fields url,content-size` emits `contentSize` with valid i64 values.
    • `--largest-bodies` returns sorted-desc array, capped at entry count for small HARs.
    • `--largest-bodies=N` honors the override.
    • Respects filters (e.g. `--largest-bodies --method POST`).
    • Exits 1 when filter empties the result set.
    • Conflicts with every other view flag.
  • `cargo clippy --all-targets -- -D warnings` — clean.
  • `cargo fmt --check` — clean.

Expected eval impact

Next eval rerun should flip size-aggregate from hargrep's worst regression (+64%) to a likely win, since the agent will pick up `--largest-bodies` via the `--help-llm` reference already added in PR #4.

🤖 Generated with Claude Code

brunojm added 2 commits April 16, 2026 23:00
…egate eval regression)

Closes the size-aggregate gap surfaced in the post-PR4 eval rerun, where
hargrep regressed 64% on "which URL has the largest body?" because the
agent chased --size-by-type (MIME-level aggregate) when it needed URL-level
body sizes.

- content-size: new valid --fields value. Emits as contentSize (HAR
  camelCase convention). Source is response.content.size (i64; -1 when the
  HAR logger didn't know, surfaced raw so callers can filter).

- --largest-bodies[=N]: new aggregate view. Emits [{id, url, mime_type,
  content_size}] sorted by content_size desc, limited to top N. Default
  N=10. Uses --largest-bodies=N (equals) syntax because plain space
  delimiters would be ambiguous with the FILE positional arg — clap's
  require_equals keeps the grammar unambiguous.

Respects filters and honors grep-like exit semantics (1 on empty).
Conflicts with every other output/view flag.

Updated --help-llm cheatsheet and README to document both additions.

135 tests pass (55 unit + 80 integration). Clippy + fmt clean.
- clippy 1.95 on CI flagged unnecessary_sort_by in largest_bodies; switched
  to sort_by_key with Reverse. Local clippy on older Rust didn't trigger it.
- Pin fixture-exact sizes in test_fields_includes_content_size so a sort-key
  swap would surface immediately (per pr-test-analyzer review).
- Pin the #1 winner (id 3, PNG) in --largest-bodies tests instead of the
  tautological sorted-desc check. Added limit=1 test.
- Widen test_largest_bodies_conflicts_with_other_views to cover all four
  view flags (was testing only --overview).
- Add 4 inline unit tests to src/aggregates.rs for largest_bodies covering
  sort, limit truncation, limit=0, and -1 (unknown) sinking with stable
  tie-breaking.
- Doc comment on largest_bodies explicitly notes -1 semantics and stable
  sort.
- Updated aggregate_exit_code docstring to list --largest-bodies.
- README notes the -1-sinks-to-bottom behavior.

140 tests pass (59 unit + 81 integration). Clippy + fmt clean.
@brunojm brunojm merged commit 3a74201 into main Apr 17, 2026
4 checks passed
@brunojm brunojm deleted the feat/llm-improvements-pr5 branch April 17, 2026 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant