feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter by brunojm · Pull Request #3 · brunojm/hargrep

brunojm · 2026-04-17T01:35:44Z

Summary

Aggregate views that collapse multi-turn agent interactions into single calls, plus a first-class `--body-grep` filter. Second PR in the LLM-friendly series; follows up on #1 (now merged).

(This is a re-opened version of #2, which was auto-closed when its stacked base branch was deleted during the #1 squash-merge. Same code.)

What changed

`--domains` — `[{domain, count}]` sorted by count desc. Answers "which hosts are in this HAR?" without the agent `awk`-ing on URLs.
`--size-by-type` — `[{mime_type, total_bytes, count}]` sorted by total_bytes desc. One call for "where is the bandwidth going?"
`--redirects` — `[{id, url, status, location}]` for every 3xx entry. Returns raw pairs — chain reconstruction is cheap for the agent and keeps the format simple.
`--body-grep SUBSTRING` — new filter matching against request postData.text or response content.text. Composes with the existing filter pipeline. Replaces agents falling back to `rg`/`grep` on the raw HAR, which is noisy and unaware of JSON escaping.
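As a rough illustration of the `--body-grep` semantics described above, the sketch below matches a case-sensitive substring against either body. The `Entry` struct and both function names are hypothetical, not hargrep's actual types; the only assumption carried over from the PR is that matching happens on the decoded `postData.text` / `content.text` strings, which is why JSON escaping in the raw HAR file does not get in the way.

```rust
/// Hypothetical, trimmed-down view of a HAR entry: only the two body fields
/// that `--body-grep` is described as searching.
#[derive(Debug, Clone, Default)]
pub struct Entry {
    /// request.postData.text, if the request carried a body.
    pub request_body: Option<String>,
    /// response.content.text, if the HAR recorded one.
    pub response_body: Option<String>,
}

/// Case-sensitive substring match against either body.
pub fn body_matches(entry: &Entry, needle: &str) -> bool {
    let hit = |body: &Option<String>| body.as_deref().map_or(false, |text| text.contains(needle));
    hit(&entry.request_body) || hit(&entry.response_body)
}

/// Keep only the entries whose request or response body contains `needle`.
pub fn body_grep<'a>(entries: &'a [Entry], needle: &str) -> Vec<&'a Entry> {
    entries.iter().filter(|e| body_matches(e, needle)).collect()
}
```

Because the result is just a filtered list of entries, it can feed any later stage (another filter or an aggregate view) unchanged.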

All aggregate views respect filters, so `--status-range 4xx --domains` scopes to erroring hosts. All views honor the grep-like exit contract: exit 1 when the emitted document is empty, 0 otherwise — derived from the aggregate rows themselves, not just the pre-aggregate filter set, so `--redirects` on a HAR with no 3xx entries correctly exits 1 even if there are plenty of 2xx matches.
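A minimal sketch of how a view like `--domains` can sit behind the filter pipeline: the aggregate takes whatever entries survive filtering and counts them per host. Everything here is illustrative. `Entry`, `DomainRow`, `host_of`, and `domains` are hypothetical names, the host extraction is deliberately naive (the real tool presumably uses a proper URL parser), and the alphabetical tie-break is an assumption added only to keep the output deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical minimal entry: just the fields this sketch needs.
pub struct Entry {
    pub url: String,
    pub status: u16,
}

/// One row of the `--domains` output as the PR describes it.
#[derive(Debug, PartialEq, Eq)]
pub struct DomainRow {
    pub domain: String,
    pub count: usize,
}

/// Naive host extraction (assumption, not hargrep's parser): strip the scheme,
/// cut at the first '/', '?' or '#', then drop userinfo and port.
/// Bracketed IPv6 hosts are not handled.
pub fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or(rest);
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    host_port.split(':').next().unwrap_or(host_port)
}

/// Count entries per host, sorted by count descending, then by name.
pub fn domains<'a>(entries: impl Iterator<Item = &'a Entry>) -> Vec<DomainRow> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for e in entries {
        *counts.entry(host_of(&e.url)).or_insert(0) += 1;
    }
    let mut rows: Vec<DomainRow> = counts
        .into_iter()
        .map(|(domain, count)| DomainRow { domain: domain.to_string(), count })
        .collect();
    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.domain.cmp(&b.domain)));
    rows
}
```

Under these assumptions, the equivalent of `--status-range 4xx --domains` is `domains(entries.iter().filter(|e| (400..500).contains(&e.status)))`: the filter runs first and the aggregate only sees the surviving entries.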

Codex review on #2 (already addressed)

Aggregate branches return Ok(0) unconditionally; should preserve no-match exit code.

Fixed in the commit. Exit-code logic now derives from `aggregate_exit_code(doc)` which inspects the emitted array's length or the overview's `entries` count.
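For readers who have not seen the diff, the exit-code rule can be sketched roughly as follows. Only the function name `aggregate_exit_code` comes from the PR; the `Doc` enum is a hypothetical stand-in for the emitted JSON document, and the signature is an assumption.

```rust
/// Hypothetical stand-in for the emitted JSON document.
pub enum Doc {
    /// Array-shaped views (`--domains`, `--size-by-type`, `--redirects`):
    /// only the number of emitted rows matters for the exit code.
    Rows(usize),
    /// The overview object, which carries an `entries` count.
    Overview { entries: usize },
}

/// grep-like contract: 0 when something was emitted, 1 when the document is empty.
pub fn aggregate_exit_code(doc: &Doc) -> i32 {
    let emitted = match doc {
        Doc::Rows(len) => *len,
        Doc::Overview { entries } => *entries,
    };
    if emitted == 0 { 1 } else { 0 }
}
```

The point of deriving the code from the emitted document rather than from the pre-aggregate filter set is visible in the `Rows(0)` case: a HAR full of 2xx entries still yields an empty `--redirects` array, and therefore exit 1.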

Test plan

`cargo test` — 123 tests pass (55 unit + 68 integration). New tests cover:
- Domains sort order + filter composition + exit codes.
- Size-by-type sort + unknown/negative MIME handling + exit codes.
- Redirects happy + empty + missing Location header + exit-1 when empty.
- Body-grep across response/request bodies + filter composition + no-match exit code.
- Conflict rules between the four new view flags.
`cargo clippy --all-targets -- -D warnings` — clean.
`cargo fmt --check` — clean.

Follow-up (PR 3)
(PR 3 here means the third PR in the LLM-friendly series, which landed as #4.)

Distill the LLM-facing system prompt / flag reference (the eval showed the agent was paying a ~800-token surcharge per turn).
`hargrep --help-llm` — compact cheatsheet.
Possibly `--body-regex` to complement `--body-grep`.

🤖 Generated with Claude Code

…and --body-grep filter

PR 2 of the LLM-friendly series. Stacks on feat/llm-improvements-pr1.

Each new aggregate view answers a question that previously forced an agent to chain several hargrep calls and post-process the output. --body-grep replaces falling back to rg/grep on the raw HAR (noisy and unaware of JSON escaping).

Flags:
- --domains: [{domain, count}] sorted by count desc. Respects filters, so e.g. --status-range 4xx --domains shows which hosts are erroring.
- --size-by-type: [{mime_type, total_bytes, count}] sorted by total_bytes desc. Makes "where's my bandwidth going?" a one-liner.
- --redirects: [{id, url, status, location}] for every 3xx entry. Raw pairs rather than stitched chains; stitching is one step in the agent and keeps the format simple.
- --body-grep SUBSTRING: new filter that matches against request postData.text or response content.text. Composes with the existing filter pipeline.

All four are mutually exclusive with each other and with --overview, --count, --fields, --entry, --no-body, --include-all-bodies, --output where combining would be nonsensical.

116 tests pass (55 unit + 61 integration). Clippy clean, fmt clean.
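To make the "stitching is one step" claim concrete, here is a hedged sketch of how a consumer could rebuild a redirect chain from the raw `--redirects` rows. The field types, the `RedirectRow` and `stitch_chain` names, and the assumption that `location` holds an absolute URL are all illustrative; relative `Location` values would need to be resolved against `url` first.

```rust
use std::collections::{HashMap, HashSet};

/// One row of the `--redirects` output: {id, url, status, location}.
/// Field types are assumptions; `location` is optional because the header can be missing.
pub struct RedirectRow {
    pub id: usize,
    pub url: String,
    pub status: u16,
    pub location: Option<String>,
}

/// Follow `location` hops starting at `start`, returning every URL visited in order.
/// Stops at a URL with no redirect row, at a row without a Location, or on a loop.
pub fn stitch_chain(rows: &[RedirectRow], start: &str) -> Vec<String> {
    // Map each redirecting URL to its target; rows without a Location are skipped.
    let next: HashMap<&str, &str> = rows
        .iter()
        .filter_map(|r| r.location.as_deref().map(|loc| (r.url.as_str(), loc)))
        .collect();
    let mut chain = vec![start.to_string()];
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut current = start;
    while let Some(&target) = next.get(current) {
        if !seen.insert(target) {
            break; // already visited: redirect loop
        }
        chain.push(target.to_string());
        current = target;
    }
    chain
}
```

Keeping the tool's output as flat pairs means this lookup-and-follow loop is all the post-processing a caller needs.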

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956adc2789


chatgpt-codex-connector · 2026-04-17T01:39:33Z

+    /// Filter by substring match against request or response body text.
+    /// Matches when either contains the pattern. Case-sensitive.
+    #[arg(long)]
+    body_grep: Option<String>,


Disallow combining --entry with --body-grep

The new --body-grep flag is a filter, but it is not included in --entry's conflict set. In run, the --entry branch returns before any filtering is applied, so hargrep --entry N --body-grep ... succeeds and silently ignores --body-grep, which can mislead automation expecting filter semantics. --entry already blocks other filter flags for this exact reason, so body_grep should be treated the same way.
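The behaviour the review asks for can be illustrated with a plain validation function. This is a standard-library sketch of the semantics only: the project uses clap, and the actual fix presumably extends `--entry`'s declared conflict set rather than hand-rolling a check. The `Cli` struct, the function name, and the error text are hypothetical.

```rust
/// Hypothetical subset of the parsed CLI: only the flags involved in this comment.
#[derive(Default)]
pub struct Cli {
    pub entry: Option<usize>,
    pub body_grep: Option<String>,
    pub body_regex: Option<String>,
}

/// Reject `--entry` combined with a body filter instead of silently ignoring the filter.
/// Returns a usage-error message naming the first conflicting flag.
pub fn check_entry_conflicts(cli: &Cli) -> Result<(), String> {
    if cli.entry.is_none() {
        return Ok(());
    }
    let filters = [
        ("--body-grep", cli.body_grep.is_some()),
        ("--body-regex", cli.body_regex.is_some()),
    ];
    match filters.iter().find(|(_, is_set)| *is_set) {
        Some((name, _)) => Err(format!("--entry cannot be used with {}", name)),
        None => Ok(()),
    }
}
```

Failing loudly here matters for automation: a caller that passes both flags learns immediately that the filter was not applied, rather than receiving an unfiltered entry.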


Codex flagged on PR #3 that --body-grep wasn't in --entry's conflict set, so `hargrep --entry N --body-grep foo` silently ignored the filter. Same class of bug as the earlier --entry fix; body-grep was just added later and missed the sweep. Adding body_grep + the newly-introduced body_regex to the list. Extended the existing conflict test to cover both.

* feat: add --body-regex filter and --help-llm compact cheatsheet

  Third PR in the LLM-friendly series. Two small additions that complete the filter/help surface area for agents.

  - --body-regex REGEX: regex variant of --body-grep, mirroring how --url pairs with --url-regex. Compiled at CLI parse time so bad patterns error with exit code 2 before any file is read. Supports (?i) for case-insensitive. Composes with --body-grep and all other filters as AND.
  - --help-llm: prints a compact flag reference and exits. 1566 bytes vs 3511 for clap's default --help (-55%). Tuned for LLM consumption: one line per flag group, no examples, exit codes documented. Lets an agent discover flags on-demand for ~400 tokens instead of carrying a 1k+ token cheatsheet in every system prompt.

  129 tests pass (55 unit + 74 integration). Clippy clean, fmt clean.

* fix: --entry also conflicts with --body-grep and --body-regex

  Codex flagged on PR #3 that --body-grep wasn't in --entry's conflict set, so `hargrep --entry N --body-grep foo` silently ignored the filter. Same class of bug as the earlier --entry fix; body-grep was just added later and missed the sweep. Adding body_grep + the newly-introduced body_regex to the list. Extended the existing conflict test to cover both.

* docs: note --entry conflicts with filters; add --body-regex + --help-llm examples

brunojm merged commit 609a496 into main Apr 17, 2026
4 checks passed

brunojm deleted the feat/pr2-reopen branch April 17, 2026 01:37

chatgpt-codex-connector Bot reviewed Apr 17, 2026


brunojm mentioned this pull request Apr 17, 2026

feat: --body-regex filter + --help-llm compact cheatsheet #4

Merged

3 tasks

brunojm merged 1 commit into main from feat/pr2-reopen
