feat: LLM-friendly flags — entry IDs, --overview, asset stripping, compact JSON by brunojm · Pull Request #1 · brunojm/hargrep

brunojm · 2026-04-17T01:18:39Z

Summary

Makes hargrep materially more token-efficient for LLM coding agents without sacrificing human usability. First of a planned series; scope intentionally small.

Based on measured eval data (3 arms × 10 tasks × 3 trials on a 201-entry HAR): hargrep clearly beats naive `Read`+`Grep` (-90% tokens, +20% correctness), but loses to generic `jq`+`bash` on small HARs due to verbose default output and prompt overhead. This PR closes that gap on the default-output and structural axes; aggregate flags come in PR 2.

What changed

Entry IDs. Every output entry includes a stable `id` field (its original 0-indexed position in the HAR). Enables the pointer-then-fetch pattern — an agent lists matches with `--fields id,url,status`, then drills into one with `--entry N` (returns a single JSON object, not an array). `id` is also valid in `--fields`.
`--overview`. One-shot JSON dashboard: entry count, status/method/MIME histograms, top 10 domains, total body size, total time. Respects filters — use it after `--status-range 4xx` for a scoped view. Replaces a cascade of exploratory queries with a single call.
Static-asset body stripping by default. Images, fonts, CSS, JS, WASM, video, and audio response bodies are dropped by default. They dominate real-HAR size but rarely help debug API behaviour. `--include-all-bodies` restores prior behaviour; `--no-body` still strips everything.
TTY-aware compact JSON. `--output json` is pretty when stdout is a terminal, compact when piped. ~30% savings on the typical subprocess case, no change for humans.
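
The pointer-then-fetch pattern described under "Entry IDs" depends on each id being assigned before any filter runs. The following is a minimal sketch of that idea, not hargrep's actual code: `Entry`, `matching_ids`, and `fetch_entry` are hypothetical names, and the entry type is cut down to two fields for illustration.

```rust
/// Hypothetical, simplified stand-in for a HAR entry (assumption: real
/// entries carry many more fields).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    url: String,
    status: u16,
}

/// Pair every entry with its original 0-indexed position *before*
/// filtering, so an id means the same thing whatever filters are applied.
fn matching_ids<'a>(
    entries: &'a [Entry],
    keep: impl Fn(&Entry) -> bool,
) -> Vec<(usize, &'a Entry)> {
    entries
        .iter()
        .enumerate()
        .filter(|(_, e)| keep(*e))
        .collect()
}

/// Direct lookup by id, in the spirit of `--entry N`: one entry or an error.
fn fetch_entry(entries: &[Entry], id: usize) -> Result<&Entry, String> {
    let total = entries.len();
    entries.get(id).ok_or_else(|| {
        format!("entry id {} out of range (HAR has {} entries)", id, total)
    })
}
```

An agent would first call something like `matching_ids(&entries, |e| e.status >= 400)` to get a short list of ids, then `fetch_entry(&entries, id)` for the one it wants to inspect in full.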

Breaking change

Default response bodies for static-asset MIME types are now stripped. Scripts that need the prior behaviour should pass `--include-all-bodies`. JSON/HTML/XML/text bodies are unchanged.
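
To make the three body modes concrete, here is a rough sketch of how the decision could be expressed. The PR text does not list the exact MIME strings hargrep matches, so the set below is an assumption, as are the names `BodyMode`, `is_static_asset`, and `keep_body`.

```rust
/// Hypothetical body modes, loosely mirroring the flags described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BodyMode {
    StripAssets,     // default behaviour after this PR
    IncludeAll,      // --include-all-bodies
    StripEverything, // --no-body
}

/// Assumed classification: images, fonts, video, and audio by prefix,
/// plus a few exact types for CSS, JS, and WASM.
fn is_static_asset(mime: &str) -> bool {
    // Drop parameters such as "; charset=utf-8" and normalise case.
    let essence = mime
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    essence.starts_with("image/")
        || essence.starts_with("font/")
        || essence.starts_with("video/")
        || essence.starts_with("audio/")
        || matches!(
            essence.as_str(),
            "text/css" | "text/javascript" | "application/javascript" | "application/wasm"
        )
}

/// Decide whether a response body survives into the output.
fn keep_body(mode: BodyMode, mime: &str) -> bool {
    match mode {
        BodyMode::IncludeAll => true,
        BodyMode::StripEverything => false,
        BodyMode::StripAssets => !is_static_asset(mime),
    }
}
```

Under these assumptions, `keep_body(BodyMode::StripAssets, "application/json")` keeps the body while `keep_body(BodyMode::StripAssets, "image/png")` drops it, which is the breaking change scripts need to account for.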

Test plan

`cargo test` — 99 tests pass (49 unit + 50 integration). Includes new tests for: entry-id emission across all output formats, `--entry` happy + out-of-range + conflicts, asset-body stripping semantics, `--include-all-bodies` override, `--no-body` still strips everything, `--overview` shape + status/method histograms + filter composition + conflict rules, compact JSON when piped.
`cargo clippy --all-targets -- -D warnings` — clean.
`cargo fmt --check` — clean.
Manual smoke on `samples/igvita.har` for `--overview` and `--entry`.

Follow-up (not in this PR)

PR 2 will add aggregate views that collapse multi-turn queries: `--domains`, `--size-by-type`, `--redirects`, `--body-grep`. PR 3 will trim the LLM-facing prompt material and add a compact `--help-llm` output. The eval harness that produced the baseline numbers lives in a separate branch and is deliberately excluded from this PR.

🤖 Generated with Claude Code

…, compact JSON

PR 1 of a series making hargrep more token-efficient for LLM agents. Based on eval data showing hargrep needlessly verbose vs. naive jq on small HARs while winning clearly on correctness and volume.

Changes:

- Entry IDs. Every output entry now includes an `id` field (original 0-indexed position in the HAR). IDs are stable across filter changes, enabling the pointer-then-fetch pattern: an agent lists matches with `--fields id,url`, then drills into a specific one with `--entry N` (returns a single JSON object instead of an array). Adds a new `id` value for `--fields`.
- `--overview`. One-shot HAR dashboard: entry count, status/method/MIME histograms, top 10 domains by request count, total body size, total time. Respects filters: call it after `--status-range 4xx` to see a filtered picture. Replaces a cascade of exploratory queries with one call.
- Asset-body auto-strip. Static-asset response bodies (images, fonts, CSS, JS, WASM, video, audio) are stripped by default. They dominate real-HAR size but rarely help debug API behaviour. `--include-all-bodies` restores the old behaviour; `--no-body` still strips everything.
- TTY-aware compact JSON. `--output json` is pretty in a TTY, compact when piped. Saves ~30% on the common subprocess case without hurting human readability.

All new flags have conflict rules (e.g. `--overview` and `--count` are mutually exclusive). 99 tests pass (49 unit + 50 integration), clippy clean.

Breaking change note: default response bodies for static assets are now stripped. Pass `--include-all-bodies` to recover the prior behaviour.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87219f81fa


chatgpt-codex-connector · 2026-04-17T01:21:57Z

+    if let Some(id) = cli.entry {
+        let total = har.log.entries.len();
+        let entry = har.log.entries.into_iter().nth(id).ok_or_else(|| {
+            anyhow::anyhow!("entry id {id} out of range (HAR has {total} entries)")
+        })?;
+        let output = output::format_single_entry(id, &entry, body_mode)?;
+        print!("{output}");
+        return Ok(0);


Reject filter flags when using --entry

The --entry path short-circuits before any filter logic runs, so filter flags like --status, --method, --url, etc. are accepted but silently ignored. In practice, hargrep --entry 0 --status 500 ... will still emit entry 0 even if it is not a 500 response, which can mislead automation that expects all supplied predicates to be enforced. Either make --entry conflict with filter flags or apply filtering before resolving the entry ID.
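
The first option in this comment (making `--entry` conflict with filter flags) can be illustrated with a hand-rolled check. This is only a sketch under stated assumptions: the real tool parses arguments with clap, and the `Args` struct and `check_entry_conflicts` function here are hypothetical, covering just three of the filter flags.

```rust
/// Hypothetical, hand-rolled subset of the CLI options (assumption: the
/// real argument struct is generated by clap and has more fields).
#[derive(Debug, Default)]
struct Args {
    entry: Option<usize>,
    status: Option<u16>,
    method: Option<String>,
    url: Option<String>,
}

/// Reject any filter flag combined with --entry instead of silently
/// ignoring it. The error names every clashing flag.
fn check_entry_conflicts(args: &Args) -> Result<(), String> {
    if args.entry.is_none() {
        return Ok(());
    }
    let mut clashing = Vec::new();
    if args.status.is_some() {
        clashing.push("--status");
    }
    if args.method.is_some() {
        clashing.push("--method");
    }
    if args.url.is_some() {
        clashing.push("--url");
    }
    if clashing.is_empty() {
        Ok(())
    } else {
        Err(format!("--entry cannot be used with {}", clashing.join(", ")))
    }
}
```

Running the check before the `--entry` short-circuit means a call such as `--entry 0 --status 500` fails loudly rather than returning an entry that may not satisfy the predicate.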


chatgpt-codex-connector · 2026-04-17T01:21:57Z

+            serde_json::to_string(&doc)?
+        };
+        println!("{serialized}");
+        return Ok(0);


Return grep-style no-match code for --overview

This branch always returns exit code 0, even when filters produce zero matches. That diverges from the tool’s documented/established contract (1 for no matches) used by other output modes, so scripts that rely on exit status to detect empty result sets will incorrectly treat an empty overview as success. The overview path should return the precomputed exit_code instead of hardcoding success.
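
The grep-style contract referred to here can be sketched in a few lines. The function names and the one-field JSON document are illustrative assumptions, not the shape of hargrep's real overview output.

```rust
/// Grep-style exit code: 0 when at least one entry matched, 1 when none did.
fn grep_exit_code(match_count: usize) -> i32 {
    if match_count == 0 {
        1
    } else {
        0
    }
}

/// Hypothetical overview step: always build a well-formed document, and
/// return the exit code alongside it rather than hardcoding success.
fn overview(statuses: &[u16]) -> (String, i32) {
    let doc = format!("{{\"entries\":{}}}", statuses.len());
    (doc, grep_exit_code(statuses.len()))
}
```

Returning the document and the exit code together keeps the two concerns separate: downstream tooling still receives parseable output for an empty result, while shell scripts can branch on the status.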


…s exit code

Two P2 issues flagged by Codex on PR #1:

1. `--entry` silently ignored filter flags (e.g. `--entry 0 --status 500` returned entry 0 regardless of its status). `--entry` is a direct lookup, not a filter operation, so it now declares conflicts_with_all for method, status, status-range, url, url-regex, header, mime, and min-time. The combination errors at parse time with exit code 2 instead of misleading automation.
2. `--overview` always returned exit 0, breaking the grep-like "1 on no matches" contract documented for every other output mode. It now returns the precomputed exit_code so scripts can distinguish empty-result from matched-result runs.

Regression tests added for both (102 tests pass total).

brunojm · 2026-04-17T01:32:50Z

Addressed both Codex review points in d614e16:

--entry now conflicts with all filter flags (--method, --status, --status-range, --url, --url-regex, --header, --mime, --min-time). Clap rejects the combination at parse time with exit code 2, so hargrep --entry 0 --status 500 ... now errors instead of silently ignoring the --status predicate.
--overview respects the grep-like exit contract: returns 1 when the filters match no entries, 0 otherwise. The empty document is still printed so downstream tooling sees well-formed output.

Added regression tests covering both (102 tests total). CI green.


…e eval regression) (#5)

* feat: add content-size field + --largest-bodies view (fixes size-aggregate eval regression)

Closes the size-aggregate gap surfaced in the post-PR4 eval rerun, where hargrep regressed 64% on "which URL has the largest body?" because the agent chased --size-by-type (MIME-level aggregate) when it needed URL-level body sizes.

- content-size: new valid --fields value. Emits as contentSize (HAR camelCase convention). Source is response.content.size (i64; -1 when the HAR logger didn't know, surfaced raw so callers can filter).
- --largest-bodies[=N]: new aggregate view. Emits [{id, url, mime_type, content_size}] sorted by content_size desc, limited to top N. Default N=10. Uses --largest-bodies=N (equals) syntax because plain space delimiters would be ambiguous with the FILE positional arg; clap's require_equals keeps the grammar unambiguous. Respects filters and honors grep-like exit semantics (1 on empty). Conflicts with every other output/view flag.

Updated --help-llm cheatsheet and README to document both additions. 135 tests pass (55 unit + 80 integration). Clippy + fmt clean.

* fix: use sort_by_key; address PR review feedback

- clippy 1.95 on CI flagged unnecessary_sort_by in largest_bodies; switched to sort_by_key with Reverse. Local clippy on older Rust didn't trigger it.
- Pin fixture-exact sizes in test_fields_includes_content_size so a sort-key swap would surface immediately (per pr-test-analyzer review).
- Pin the #1 winner (id 3, PNG) in --largest-bodies tests instead of the tautological sorted-desc check. Added limit=1 test.
- Widen test_largest_bodies_conflicts_with_other_views to cover all four view flags (was testing only --overview).
- Add 4 inline unit tests to src/aggregates.rs for largest_bodies covering sort, limit truncation, limit=0, and -1 (unknown) sinking with stable tie-breaking.
- Doc comment on largest_bodies explicitly notes -1 semantics and stable sort.
- Updated aggregate_exit_code docstring to list --largest-bodies.
- README notes the -1-sinks-to-bottom behavior.

140 tests pass (59 unit + 81 integration). Clippy + fmt clean.
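
The sorting behaviour described in this commit message (descending by size, unknown `-1` sizes sinking to the bottom, ties kept in original order, then truncation to the limit) can be sketched as follows. The `BodySize` row type is a hypothetical reduction of the real output row, and the function body is an illustration rather than the code in src/aggregates.rs.

```rust
use std::cmp::Reverse;

/// Hypothetical, reduced row type (assumption: real rows also carry
/// url and mime_type).
#[derive(Debug, Clone, PartialEq)]
struct BodySize {
    id: usize,
    content_size: i64, // -1 means the HAR logger did not record a size
}

/// Sort descending by content_size and keep the top `limit` rows.
/// `sort_by_key` is a stable sort, so equal sizes keep their original
/// order, and -1 lands last simply because it is the smallest value.
fn largest_bodies(mut rows: Vec<BodySize>, limit: usize) -> Vec<BodySize> {
    rows.sort_by_key(|r| Reverse(r.content_size));
    rows.truncate(limit);
    rows
}
```

No special case is needed for `-1` in this sketch; it only sinks because sizes are otherwise non-negative, which is worth keeping in mind if a different sentinel were ever used.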


brunojm mentioned this pull request Apr 17, 2026

feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter #2

Closed

3 tasks

brunojm merged commit e351c87 into main Apr 17, 2026
4 checks passed

brunojm deleted the feat/llm-improvements-pr1 branch April 17, 2026 01:34

brunojm mentioned this pull request Apr 17, 2026

feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter #3

Merged

3 tasks

brunojm merged 2 commits into main from feat/llm-improvements-pr1
