feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter #3

Merged: brunojm merged 1 commit into main from feat/pr2-reopen on Apr 17, 2026

brunojm (Owner) commented Apr 17, 2026

Summary

Aggregate views that collapse multi-turn agent interactions into single calls, plus a first-class `--body-grep` filter. Second PR in the LLM-friendly series; follows up on #1 (now merged).

(This is a re-opened version of #2, which was auto-closed when its stacked base branch was deleted during the #1 squash-merge. Same code.)

What changed

  • `--domains` — `[{domain, count}]` sorted by count desc. Answers "which hosts are in this HAR?" without the agent `awk`-ing on URLs.
  • `--size-by-type` — `[{mime_type, total_bytes, count}]` sorted by total_bytes desc. One call for "where is the bandwidth going?"
  • `--redirects` — `[{id, url, status, location}]` for every 3xx entry. Returns raw pairs — chain reconstruction is cheap for the agent and keeps the format simple.
  • `--body-grep SUBSTRING` — new filter matching against request postData.text or response content.text. Composes with the existing filter pipeline. Replaces agents falling back to `rg`/`grep` on the raw HAR, which is noisy and unaware of JSON escaping.
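The matching rule for `--body-grep` can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Entry` struct and `body_matches` function are hypothetical names, and the sketch assumes the body text has already been decoded from the HAR's JSON, which is what lets the match ignore JSON escaping.

```rust
/// Hypothetical, trimmed-down view of a HAR entry: only the two body fields
/// that `--body-grep` is described as inspecting.
struct Entry {
    /// request.postData.text, if the request carried a body
    request_body: Option<String>,
    /// response.content.text, if the body was captured
    response_body: Option<String>,
}

/// Returns true when either body contains `needle` as a case-sensitive substring.
fn body_matches(entry: &Entry, needle: &str) -> bool {
    let hit = |body: &Option<String>| {
        body.as_deref().map_or(false, |text| text.contains(needle))
    };
    hit(&entry.request_body) || hit(&entry.response_body)
}
```

An entry with neither body present never matches, so it simply drops out of the filtered set like any other non-matching entry.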

All aggregate views respect filters, so `--status-range 4xx --domains` scopes to erroring hosts. All views honor the grep-like exit contract: exit 1 when the emitted document is empty, 0 otherwise — derived from the aggregate rows themselves, not just the pre-aggregate filter set, so `--redirects` on a HAR with no 3xx entries correctly exits 1 even if there are plenty of 2xx matches.
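To make the "filter first, then aggregate" ordering concrete, here is a small sketch of how `--status-range 4xx --domains` might behave. Everything in it is an assumption for illustration: the `Entry` struct, the naive `host_of` helper (absolute URLs only, no userinfo or IPv6 handling), the `domains` function, and the tie-break on domain name.

```rust
use std::collections::HashMap;

/// Hypothetical minimal entry: just the fields this sketch needs.
struct Entry {
    url: String,
    status: u16,
}

/// Naive host extraction (assumption: absolute URLs, no userinfo, no IPv6).
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, rest)| rest);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or(rest);
    authority.split(':').next().unwrap_or(authority)
}

/// Keeps entries whose status falls in [min_status, max_status], then counts
/// them per host. Rows are sorted by count descending; ties break on the
/// domain name so the output is deterministic (the tie-break is an assumption).
fn domains(entries: &[Entry], min_status: u16, max_status: u16) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for e in entries
        .iter()
        .filter(|e| (min_status..=max_status).contains(&e.status))
    {
        *counts.entry(host_of(&e.url)).or_insert(0) += 1;
    }
    let mut rows: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(domain, count)| (domain.to_string(), count))
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows
}
```

Because the emptiness check runs on the returned rows rather than on the input entries, a call such as `domains(&entries, 300, 399)` on a HAR without 3xx traffic yields an empty vector, which is the case the exit-1 contract is meant to catch.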

Codex review on #2 (already addressed)

> Aggregate branches return `Ok(0)` unconditionally; should preserve no-match exit code.

Fixed in the commit. Exit-code logic now derives from `aggregate_exit_code(doc)` which inspects the emitted array's length or the overview's `entries` count.
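A rough sketch of the idea behind `aggregate_exit_code` is shown below. Only the function name comes from this PR; the `Doc` enum is a hypothetical stand-in (the real function presumably inspects a JSON value), used here to keep the example dependency-free.

```rust
/// Hypothetical stand-in for the emitted document.
enum Doc {
    /// Aggregate views such as --domains emit an array; only its length matters here.
    Rows(usize),
    /// --overview emits an object carrying an `entries` count.
    Overview { entries: usize },
}

/// Grep-like contract: 1 when nothing was emitted, 0 otherwise.
fn aggregate_exit_code(doc: &Doc) -> i32 {
    let emitted = match doc {
        Doc::Rows(len) => *len,
        Doc::Overview { entries } => *entries,
    };
    if emitted == 0 {
        1
    } else {
        0
    }
}
```

Deriving the code from the emitted document, rather than from the filtered entry set, is what makes `--redirects` exit 1 when the filter matched many entries but none of them were 3xx.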

Test plan

  • `cargo test` — 123 tests pass (55 unit + 68 integration). New tests cover:
    • Domains sort order + filter composition + exit codes.
    • Size-by-type sort + handling of unknown MIME types and negative sizes + exit codes.
    • Redirects happy + empty + missing Location header + exit-1 when empty.
    • Body-grep across response/request bodies + filter composition + no-match exit code.
    • Conflict rules between the four new view flags.
  • `cargo clippy --all-targets -- -D warnings` — clean.
  • `cargo fmt --check` — clean.

Follow-up (PR 3)

  • Distill the LLM-facing system prompt / flag reference (the eval showed the agent was paying a ~800-token surcharge per turn).
  • `hargrep --help-llm` — compact cheatsheet.
  • Possibly `--body-regex` to complement `--body-grep`.

🤖 Generated with Claude Code

…and --body-grep filter

PR 2 of the LLM-friendly series. Stacks on feat/llm-improvements-pr1.

Each new aggregate view answers a question that previously forced an agent to
chain several hargrep calls and post-process the output. --body-grep replaces
falling back to rg/grep on the raw HAR (noisy and unaware of JSON escaping).

Flags:

- --domains: [{domain, count}] sorted by count desc. Respects filters, so
  e.g. --status-range 4xx --domains shows which hosts are erroring.
- --size-by-type: [{mime_type, total_bytes, count}] sorted by total_bytes
  desc. Makes "where's my bandwidth going?" a one-liner.
- --redirects: [{id, url, status, location}] for every 3xx entry. Raw pairs
  rather than stitched chains — stitching is one step in the agent and keeps
  the format simple.
- --body-grep SUBSTRING: new filter that matches against request postData.text
  or response content.text. Composes with the existing filter pipeline.
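As an illustration of why raw pairs are enough for `--redirects`, the stitching step on the consumer side might look like the sketch below. The `Redirect` struct mirrors the documented row shape, but its field types, the `stitch_chain` name, and the matching rule (absolute `Location` values compared to the next row's URL by exact string equality) are all assumptions.

```rust
use std::collections::HashMap;

/// One row as described for `--redirects`; field types are assumptions.
#[allow(dead_code)]
struct Redirect {
    id: usize,
    url: String,
    status: u16,
    /// None models a 3xx response with no Location header.
    location: Option<String>,
}

/// Follows `location` links starting at `start` and returns every URL visited.
/// Stops when no row matches the current URL or when a loop is detected.
fn stitch_chain(rows: &[Redirect], start: &str) -> Vec<String> {
    // Map each redirecting URL to its target, skipping rows without a Location.
    let next: HashMap<&str, &str> = rows
        .iter()
        .filter_map(|r| r.location.as_deref().map(|loc| (r.url.as_str(), loc)))
        .collect();

    let mut chain = vec![start.to_string()];
    let mut current = start;
    while let Some(&target) = next.get(current) {
        if chain.iter().any(|seen| seen == target) {
            break; // redirect loop: stop instead of spinning forever
        }
        chain.push(target.to_string());
        current = target;
    }
    chain
}
```

Relative `Location` values would need resolving against the request URL before lookup; that step is left out here.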

All four are mutually exclusive with each other and with --overview,
--count, --fields, --entry, --no-body, --include-all-bodies, --output where
combining would be nonsensical.

116 tests pass (55 unit + 61 integration). Clippy clean, fmt clean.
brunojm merged commit 609a496 into main on Apr 17, 2026
4 checks passed
brunojm deleted the feat/pr2-reopen branch on April 17, 2026 01:37

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956adc2789

Comment thread: src/main.rs

```rust
/// Filter by substring match against request or response body text.
/// Matches when either contains the pattern. Case-sensitive.
#[arg(long)]
body_grep: Option<String>,
```

[P2] Disallow combining `--entry` with `--body-grep`

The new `--body-grep` flag is a filter, but it is not included in `--entry`'s conflict set. In `run`, the `--entry` branch returns before any filtering is applied, so `hargrep --entry N --body-grep ...` succeeds and silently ignores `--body-grep`, which can mislead automation expecting filter semantics. `--entry` already blocks other filter flags for this exact reason, so `body_grep` should be treated the same way.
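The check being asked for can be sketched as an explicit validation step. This is a dependency-free illustration with hypothetical names (`Args`, `check_entry_conflicts`); since the tool parses flags with clap, the actual fix more likely lives in the `--entry` argument's declarative conflict list (clap's `conflicts_with_all` attribute) than in hand-written code like this.

```rust
/// Hypothetical subset of the parsed CLI flags.
struct Args {
    entry: Option<usize>,
    body_grep: Option<String>,
}

/// Rejects `--entry` combined with a filter flag instead of silently ignoring the filter.
fn check_entry_conflicts(args: &Args) -> Result<(), String> {
    if args.entry.is_none() {
        return Ok(());
    }
    // Each filter flag that `--entry` would bypass; extend as filters are added.
    let conflicts = [("--body-grep", args.body_grep.is_some())];
    for (flag, is_set) in conflicts.iter() {
        if *is_set {
            return Err(format!("--entry cannot be used with {}", flag));
        }
    }
    Ok(())
}
```

Failing loudly at parse time keeps the filter semantics honest for automation: a caller either gets filtered output or an error, never an unfiltered entry that looks like a match.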

brunojm added a commit that referenced this pull request Apr 17, 2026
Codex flagged on PR #3 that --body-grep wasn't in --entry's conflict set, so
`hargrep --entry N --body-grep foo` silently ignored the filter. Same class
of bug as the earlier --entry fix; body-grep was just added later and missed
the sweep. Adding body_grep + the newly-introduced body_regex to the list.

Extended the existing conflict test to cover both.
brunojm added a commit that referenced this pull request Apr 17, 2026
* feat: add --body-regex filter and --help-llm compact cheatsheet

Third PR in the LLM-friendly series. Two small additions that complete the
filter/help surface area for agents.

- --body-regex REGEX: regex variant of --body-grep, mirroring how --url pairs
  with --url-regex. Compiled at CLI parse time so bad patterns error with
  exit code 2 before any file is read. Supports (?i) for case-insensitive.
  Composes with --body-grep and all other filters as AND.

- --help-llm: prints a compact flag reference and exits. 1566 bytes vs 3511
  for clap's default --help (-55%). Tuned for LLM consumption: one line per
  flag group, no examples, exit codes documented. Lets an agent discover
  flags on-demand for ~400 tokens instead of carrying a 1k+ token cheatsheet
  in every system prompt.

129 tests pass (55 unit + 74 integration). Clippy clean, fmt clean.

* fix: --entry also conflicts with --body-grep and --body-regex

Codex flagged on PR #3 that --body-grep wasn't in --entry's conflict set, so
`hargrep --entry N --body-grep foo` silently ignored the filter. Same class
of bug as the earlier --entry fix; body-grep was just added later and missed
the sweep. Adding body_grep + the newly-introduced body_regex to the list.

Extended the existing conflict test to cover both.

* docs: note --entry conflicts with filters; add --body-regex + --help-llm examples