feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter by brunojm · Pull Request #2 · brunojm/hargrep

brunojm · 2026-04-17T01:23:42Z

Summary

Second PR in the LLM-friendly series. Stacks on #1 — please review and merge #1 first, then rebase this onto main.

Each new flag collapses a multi-turn agent interaction into a single call.

What changed

`--domains`: `[{domain, count}]` sorted by count desc. Answers "which hosts are in this HAR?" without the agent having to `awk` on URLs.
`--size-by-type`: `[{mime_type, total_bytes, count}]` sorted by total_bytes desc. Answers "where is the bandwidth going?" in one call.
`--redirects`: `[{id, url, status, location}]` for every 3xx entry. Returns raw pairs rather than stitched chains: the agent can reconstruct chains trivially, and the format stays simple.
`--body-grep SUBSTRING`: new filter matching against the request's `postData.text` or the response's `content.text`. Composes with the existing filter pipeline (e.g. `--status-range 5xx --body-grep 'session expired'`). Agents no longer need to fall back to `rg`/`grep` on the raw HAR, which is noisy and unaware of JSON escaping.
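
A minimal sketch of how `--domains` and `--size-by-type` could aggregate entries, assuming a hypothetical, flattened `Entry` type (the PR does not show hargrep's HAR structs or its JSON emission, so every name here is illustrative). Hosts are counted and sorted by count descending; MIME types are summed and sorted by `total_bytes` descending. Putting a missing MIME type in an `"unknown"` bucket and counting negative sizes as zero bytes are assumptions about what the test plan's "unknown/negative" handling covers, not confirmed behavior.

```rust
use std::collections::HashMap;

/// Hypothetical, flattened HAR entry; hargrep's real types will differ.
pub struct Entry {
    pub url: String,
    /// HAR `response.content.mimeType`, if present.
    pub mime_type: Option<String>,
    /// HAR `response.content.size`; HAR uses -1 when the size is unknown.
    pub size: i64,
}

#[derive(Debug, PartialEq)]
pub struct DomainCount {
    pub domain: String,
    pub count: usize,
}

#[derive(Debug, PartialEq)]
pub struct TypeSize {
    pub mime_type: String,
    pub total_bytes: u64,
    pub count: usize,
}

/// Naive host extraction: drops scheme, path, query, fragment and port.
/// Does not handle userinfo or IPv6 literals.
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    authority.split(':').next().unwrap_or("")
}

/// `--domains`: `[{domain, count}]`, count descending.
pub fn domains(entries: &[Entry]) -> Vec<DomainCount> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for e in entries {
        *counts.entry(host_of(&e.url)).or_insert(0) += 1;
    }
    let mut rows: Vec<DomainCount> = counts
        .into_iter()
        .map(|(domain, count)| DomainCount { domain: domain.to_string(), count })
        .collect();
    // Count descending; ties broken by name so output is deterministic.
    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.domain.cmp(&b.domain)));
    rows
}

/// `--size-by-type`: `[{mime_type, total_bytes, count}]`, total_bytes descending.
pub fn size_by_type(entries: &[Entry]) -> Vec<TypeSize> {
    let mut acc: HashMap<String, (u64, usize)> = HashMap::new();
    for e in entries {
        // Assumption: missing or empty MIME types share an "unknown" bucket.
        let mime = match e.mime_type.as_deref() {
            Some(m) if !m.is_empty() => m.to_string(),
            _ => "unknown".to_string(),
        };
        let slot = acc.entry(mime).or_insert((0, 0));
        // Assumption: negative sizes add 0 bytes but still count as an entry.
        slot.0 += e.size.max(0) as u64;
        slot.1 += 1;
    }
    let mut rows: Vec<TypeSize> = acc
        .into_iter()
        .map(|(mime_type, (total_bytes, count))| TypeSize { mime_type, total_bytes, count })
        .collect();
    rows.sort_by(|a, b| {
        b.total_bytes
            .cmp(&a.total_bytes)
            .then_with(|| a.mime_type.cmp(&b.mime_type))
    });
    rows
}
```

Breaking ties alphabetically keeps repeated runs byte-identical, which helps when an agent compares outputs; the PR does not say whether hargrep does the same.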

All aggregate views respect filters, so `--status-range 4xx --domains` scopes to erroring hosts.
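
"Respecting filters" comes down to ordering: every filter runs first, and the aggregate only sees the entries that survived. The sketch below illustrates that with a hypothetical `Entry` type and hypothetical `status_range`, `body_grep`, `filter`, and `domains` helpers. It assumes bodies have already been decoded from the HAR's JSON, which is why a plain substring match here does not trip over JSON escaping the way `rg` on the raw file does; base64-encoded bodies are ignored.

```rust
use std::collections::HashMap;

/// Hypothetical, flattened entry with bodies already decoded from JSON.
pub struct Entry {
    pub url: String,
    pub status: u16,
    /// HAR `request.postData.text`
    pub request_body: Option<String>,
    /// HAR `response.content.text`
    pub response_body: Option<String>,
}

/// `--status-range Nxx`: match on the hundreds digit. Other forms are not handled here.
pub fn status_range(e: &Entry, range: &str) -> bool {
    match range.strip_suffix("xx").and_then(|d| d.parse::<u16>().ok()) {
        Some(class) => e.status / 100 == class,
        None => false,
    }
}

/// `--body-grep`: plain substring match against either body.
pub fn body_grep(e: &Entry, needle: &str) -> bool {
    let hit = |b: &Option<String>| b.as_deref().map_or(false, |t| t.contains(needle));
    hit(&e.request_body) || hit(&e.response_body)
}

/// Filters AND together: an entry survives only if every predicate passes.
pub fn filter<'a>(entries: &'a [Entry], preds: &[&dyn Fn(&Entry) -> bool]) -> Vec<&'a Entry> {
    entries
        .iter()
        .filter(|&e| preds.iter().all(|p| p(e)))
        .collect()
}

/// Host counts over the already-filtered set: count descending, then name ascending.
pub fn domains(entries: &[&Entry]) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for e in entries {
        // Naive host extraction: text between "://" and the next '/', minus any port.
        let rest = e.url.split_once("://").map_or(e.url.as_str(), |(_, r)| r);
        let host = rest.split('/').next().unwrap_or("").split(':').next().unwrap_or("");
        *counts.entry(host.to_string()).or_insert(0) += 1;
    }
    let mut rows: Vec<(String, usize)> = counts.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows
}
```

With this shape, `--status-range 4xx --domains` is `domains(&filter(entries, &[4xx predicate]))`, and adding `--body-grep` simply appends another predicate.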

Why not combine with PR 1?

Keeping PRs small and focused. PR 1 introduces the structural pieces (entry IDs, overview, asset stripping, compact JSON). PR 2 adds targeted aggregate flags that build on them.

Test plan

`cargo test` — 116 tests pass (55 unit + 61 integration). New tests cover: domains sort order + filter composition, size-by-type sort + unknown/negative MIME handling, redirects happy + empty + missing Location header, body-grep across response/request bodies + filter composition + no-match exit code, and conflict rules between the four new view flags.
`cargo clippy --all-targets -- -D warnings` — clean.
`cargo fmt --check` — clean.

Follow-up (PR 3)

Distill the LLM-facing system prompt / flag reference (the eval showed arm3's verbose prompt was ~1.2k tokens — significant per-turn cost).
`hargrep --help-llm` — compact cheatsheet output.
Possibly `--body-regex` to complement `--body-grep` the way `--url-regex` complements `--url`.

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27e8b80a3f


chatgpt-codex-connector · 2026-04-17T01:28:05Z

```diff
+    if cli.domains {
+        emit_json_doc(&aggregates::domains(&filtered))?;
+        return Ok(0);
```


Preserve no-match exit code in aggregate modes

These aggregate branches return Ok(0) unconditionally, which bypasses the documented grep-like contract (1 when nothing matches). In practice, commands like --redirects can emit [] (e.g., a HAR with no 3xx responses) but still exit 0, so automation cannot distinguish “no matches” from “matches found.” Please derive the exit status from the emitted aggregate rows (or at least from exit_code) instead of hardcoding success.


…and --body-grep filter

PR 2 of the LLM-friendly series. Stacks on feat/llm-improvements-pr1.

Each new aggregate view answers a question that previously forced an agent to chain several hargrep calls and post-process the output. --body-grep removes the need to fall back to rg/grep on the raw HAR (noisy and unaware of JSON escaping).

Flags:

- --domains: [{domain, count}] sorted by count desc. Respects filters, so e.g. --status-range 4xx --domains shows which hosts are erroring.
- --size-by-type: [{mime_type, total_bytes, count}] sorted by total_bytes desc. Makes "where's my bandwidth going?" a one-liner.
- --redirects: [{id, url, status, location}] for every 3xx entry. Raw pairs rather than stitched chains: stitching is a single step for the agent, and the format stays simple.
- --body-grep SUBSTRING: new filter that matches against request postData.text or response content.text. Composes with the existing filter pipeline.

All four are mutually exclusive with each other and with --overview, --count, --fields, --entry, --no-body, --include-all-bodies, and --output where combining would be nonsensical.

116 tests pass (55 unit + 61 integration). Clippy clean, fmt clean.
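
Because `--redirects` returns raw `{id, url, status, location}` pairs, stitching them into chains is left to the consumer. One way an agent-side script might do it is sketched below. It assumes absolute `Location` values (relative ones would first need resolving against `url`); `Redirect` and `stitch_chains` are hypothetical names that mirror the emitted fields, not part of hargrep.

```rust
use std::collections::{HashMap, HashSet};

/// One row of `--redirects` output: `{id, url, status, location}`.
#[derive(Debug, Clone)]
pub struct Redirect {
    pub id: usize,
    pub url: String,
    pub status: u16,
    /// `None` when the 3xx response had no Location header.
    pub location: Option<String>,
}

/// Stitch raw url -> location pairs into chains of URLs.
/// Assumes absolute Location values. If one URL appears in several rows, the
/// last row wins for lookups. Pure cycles (no entry point) are skipped.
pub fn stitch_chains(pairs: &[Redirect]) -> Vec<Vec<String>> {
    let by_url: HashMap<&str, &Redirect> =
        pairs.iter().map(|r| (r.url.as_str(), r)).collect();
    let targets: HashSet<&str> = pairs.iter().filter_map(|r| r.location.as_deref()).collect();

    let mut chains = Vec::new();
    for start in pairs {
        // A chain begins at a URL that no other redirect points to.
        if targets.contains(start.url.as_str()) {
            continue;
        }
        let mut chain = vec![start.url.clone()];
        let mut seen: HashSet<&str> = HashSet::new();
        seen.insert(start.url.as_str());
        let mut cur = start;
        while let Some(next) = cur.location.as_deref() {
            chain.push(next.to_string());
            if !seen.insert(next) {
                break; // loop detected: stop after recording the repeated URL
            }
            match by_url.get(next) {
                Some(&r) => cur = r,
                None => break, // `next` is the final destination, not itself a redirect
            }
        }
        chains.push(chain);
    }
    chains
}
```

A 3xx row without a `Location` header becomes a one-element chain, so the missing-header case from the test plan still surfaces instead of disappearing.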

brunojm · 2026-04-17T01:33:12Z

Addressed the Codex review point in cdeaf5b:

All three aggregate views (--domains, --size-by-type, --redirects) now honor the grep-like exit contract. Exit code derives from the emitted aggregate rows, not just the pre-aggregate filter set — so hargrep --redirects file.har exits 1 when the HAR has no 3xx entries even if it has plenty of 2xx matches.

For --overview, which always emits a full object rather than an array, emptiness is checked via entries == 0.
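
A sketch of the exit-code rule described above: 1 when the emitted aggregate is empty, 0 otherwise, with `--overview` judged by its entry count. `Entry`, `RedirectRow`, `redirects`, `Overview`, `exit_code_for_rows`, and `exit_code_for_overview` are hypothetical stand-ins rather than hargrep's actual code, and error exit codes are out of scope.

```rust
/// Hypothetical, flattened entry; `location` stands in for the Location header.
pub struct Entry {
    pub id: usize,
    pub url: String,
    pub status: u16,
    pub location: Option<String>,
}

/// One `--redirects` row: `{id, url, status, location}`.
#[derive(Debug, PartialEq)]
pub struct RedirectRow {
    pub id: usize,
    pub url: String,
    pub status: u16,
    pub location: Option<String>,
}

/// Hypothetical `--overview` shape: always an object, never an array.
pub struct Overview {
    pub entries: usize,
}

/// Every 3xx entry becomes a row; a missing Location header yields `None`.
pub fn redirects(entries: &[Entry]) -> Vec<RedirectRow> {
    entries
        .iter()
        .filter(|e| (300..400).contains(&e.status))
        .map(|e| RedirectRow {
            id: e.id,
            url: e.url.clone(),
            status: e.status,
            location: e.location.clone(),
        })
        .collect()
}

/// grep-like contract for array views: 1 if nothing was emitted, else 0.
pub fn exit_code_for_rows<T>(rows: &[T]) -> i32 {
    if rows.is_empty() { 1 } else { 0 }
}

/// `--overview` always prints an object, so emptiness means zero entries.
pub fn exit_code_for_overview(o: &Overview) -> i32 {
    if o.entries == 0 { 1 } else { 0 }
}
```

Applied to the `--domains` branch quoted in the review, the rows would be computed once, passed to `emit_json_doc`, and the branch would return `Ok(exit_code_for_rows(&rows))` instead of `Ok(0)`. The thread does not show whether cdeaf5b is structured exactly this way.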

Also rebased onto the updated PR #1 (which added its own --overview exit-code fix). 123 tests pass locally.

Note: CI won't run here because the PR base is feat/llm-improvements-pr1, not main. Once PR #1 merges, I'll retarget this PR to main and CI will pick it up.

chatgpt-codex-connector Bot reviewed Apr 17, 2026


brunojm force-pushed the feat/llm-improvements-pr2 branch from 27e8b80 to cdeaf5b April 17, 2026 01:32

brunojm deleted the branch feat/llm-improvements-pr1 April 17, 2026 01:34

brunojm closed this Apr 17, 2026

brunojm mentioned this pull request Apr 17, 2026

feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter #3 (Merged)

brunojm wants to merge 1 commit into feat/llm-improvements-pr1 from feat/llm-improvements-pr2