feat: aggregate views (--domains, --size-by-type, --redirects) + --body-grep filter #2

Closed
brunojm wants to merge 1 commit into feat/llm-improvements-pr1 from feat/llm-improvements-pr2

brunojm commented Apr 17, 2026

Owner

Summary

Second PR in the LLM-friendly series. Stacks on #1 — please review and merge #1 first, then rebase this onto main.

Each new flag collapses a multi-turn agent interaction into a single call.

What changed

  • `--domains` — `[{domain, count}]` sorted by count desc. Answers "which hosts are in this HAR?" without the agent having to `awk` on URLs.
  • `--size-by-type` — `[{mime_type, total_bytes, count}]` sorted by total_bytes desc. Answers "where is the bandwidth going?" in one call.
  • `--redirects` — `[{id, url, status, location}]` for every 3xx entry. Returns raw pairs rather than stitched chains — the agent can reconstruct chains trivially and the format stays simple.
  • `--body-grep SUBSTRING` — new filter matching against request postData.text or response content.text. Composes with the existing filter pipeline (e.g. `--status-range 5xx --body-grep 'session expired'`). Replaces agents falling back to `rg`/`grep` on the raw HAR, which is noisy and unaware of JSON escaping.
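
To illustrate how a body filter like `--body-grep` can compose with a status filter, here is a minimal sketch. It is not hargrep's actual code: the `Entry` struct, `body_matches`, and `filter_entries` are hypothetical names, and the bodies are assumed to be already JSON-decoded, which is why a needle containing plain quotes can match where a grep over the raw HAR would see `\"`.

```rust
/// Hypothetical, trimmed-down HAR entry. Field names are illustrative and are
/// not hargrep's real types.
#[derive(Debug, Clone, Default)]
struct Entry {
    status: u16,
    request_body: Option<String>,  // request.postData.text, JSON-decoded
    response_body: Option<String>, // response.content.text, JSON-decoded
}

/// True when `needle` occurs in the request body or the response body.
fn body_matches(entry: &Entry, needle: &str) -> bool {
    let hit = |body: &Option<String>| {
        body.as_deref().map_or(false, |text| text.contains(needle))
    };
    hit(&entry.request_body) || hit(&entry.response_body)
}

/// Apply an optional status-class filter (5 means 5xx) and an optional body
/// filter. An entry must pass every filter that is set.
fn filter_entries<'a>(
    entries: &'a [Entry],
    status_class: Option<u16>,
    body_grep: Option<&str>,
) -> Vec<&'a Entry> {
    entries
        .iter()
        .filter(|e| status_class.map_or(true, |class| e.status / 100 == class))
        .filter(|e| body_grep.map_or(true, |needle| body_matches(e, needle)))
        .collect()
}

fn main() {
    let entries = vec![
        Entry {
            status: 500,
            request_body: None,
            response_body: Some(r#"{"error":"session expired"}"#.to_string()),
        },
        Entry {
            status: 200,
            request_body: Some("note=session expired".to_string()),
            response_body: None,
        },
        Entry {
            status: 502,
            request_body: None,
            response_body: Some("bad gateway".to_string()),
        },
    ];

    // Rough equivalent of `--status-range 5xx --body-grep '"error":"session expired"'`.
    let hits = filter_entries(&entries, Some(5), Some(r#""error":"session expired""#));
    println!("{} matching entries", hits.len());
}
```

Because each filter is an independent predicate, adding another one is just one more `.filter(...)` step in the chain.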

All aggregate views respect filters, so `--status-range 4xx --domains` scopes to erroring hosts.
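
As a rough sketch of what "aggregate views respect filters" means for `--domains`: the aggregation runs over the already-filtered entries, so narrowing the filter narrows the host list. The types and the `host_of` helper below are hypothetical and deliberately simplistic (no IPv6 literals, no URL crate), and the alphabetical tie-break is an assumption; the description only specifies sorting by count descending.

```rust
use std::collections::HashMap;

/// Hypothetical minimal entry; only the fields this sketch needs.
struct Entry {
    url: String,
    status: u16,
}

#[derive(Debug, PartialEq)]
struct DomainCount {
    domain: String,
    count: usize,
}

/// Naive host extraction: drops scheme, path/query/fragment, userinfo, and port.
/// Does not handle IPv6 literals such as `[::1]`.
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, rest)| rest);
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or(rest);
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, host)| host);
    host_port.split(':').next().unwrap_or(host_port)
}

/// Count entries per host, sorted by count descending.
/// Ties are broken alphabetically (an assumption, to keep output deterministic).
fn domains(filtered: &[&Entry]) -> Vec<DomainCount> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for entry in filtered {
        *counts.entry(host_of(&entry.url)).or_insert(0) += 1;
    }
    let mut rows: Vec<DomainCount> = counts
        .into_iter()
        .map(|(domain, count)| DomainCount { domain: domain.to_string(), count })
        .collect();
    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.domain.cmp(&b.domain)));
    rows
}

fn main() {
    let entry = |url: &str, status: u16| Entry { url: url.to_string(), status };
    let entries = vec![
        entry("https://api.example.com/login", 401),
        entry("https://api.example.com/users", 200),
        entry("https://cdn.example.net/app.js", 200),
        entry("https://api.example.com/orders", 404),
        entry("https://auth.example.org:8443/token", 403),
    ];

    // Rough equivalent of `--status-range 4xx --domains`: filter first, then aggregate.
    let erroring: Vec<&Entry> = entries
        .iter()
        .filter(|e| (400..500).contains(&e.status))
        .collect();
    for row in domains(&erroring) {
        println!("{} {}", row.domain, row.count);
    }
}
```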

Why not combine with PR 1?

Keeping PRs small and focused. PR 1 introduces the structural pieces (entry IDs, overview, asset stripping, compact JSON). PR 2 adds targeted aggregate flags that build on them.

Test plan

  • `cargo test` — 116 tests pass (55 unit + 61 integration). New tests cover: domains sort order + filter composition, size-by-type sort + unknown MIME type and negative size handling, redirects happy + empty + missing Location header, body-grep across response/request bodies + filter composition + no-match exit code, and conflict rules between the four new view flags.
  • `cargo clippy --all-targets -- -D warnings` — clean.
  • `cargo fmt --check` — clean.

Follow-up (PR 3)

  • Distill the LLM-facing system prompt / flag reference (the eval showed arm3's verbose prompt was ~1.2k tokens — significant per-turn cost).
  • `hargrep --help-llm` — compact cheatsheet output.
  • Possibly `--body-regex` to complement `--body-grep` the way `--url-regex` complements `--url`.

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27e8b80a3f


Comment thread src/main.rs Outdated
Comment on lines +193 to +195
    if cli.domains {
        emit_json_doc(&aggregates::domains(&filtered))?;
        return Ok(0);


P2: Preserve no-match exit code in aggregate modes

These aggregate branches return `Ok(0)` unconditionally, which bypasses the documented grep-like contract (`1` when nothing matches). In practice, commands like `--redirects` can emit `[]` (e.g., a HAR with no 3xx responses) but still exit 0, so automation cannot distinguish “no matches” from “matches found.” Please derive the exit status from the emitted aggregate rows (or at least from `exit_code`) instead of hardcoding success.


…and --body-grep filter

PR 2 of the LLM-friendly series. Stacks on feat/llm-improvements-pr1.

Each new aggregate view answers a question that previously forced an agent to
chain several hargrep calls and post-process the output. --body-grep replaces
falling back to rg/grep on the raw HAR (noisy and unaware of JSON escaping).

Flags:

- --domains: [{domain, count}] sorted by count desc. Respects filters, so
  e.g. --status-range 4xx --domains shows which hosts are erroring.
- --size-by-type: [{mime_type, total_bytes, count}] sorted by total_bytes
  desc. Makes "where's my bandwidth going?" a one-liner.
- --redirects: [{id, url, status, location}] for every 3xx entry. Raw pairs
  rather than stitched chains — stitching is one step in the agent and keeps
  the format simple.
- --body-grep SUBSTRING: new filter that matches against request postData.text
  or response content.text. Composes with the existing filter pipeline.

All four are mutually exclusive with each other and with --overview,
--count, --fields, --entry, --no-body, --include-all-bodies, --output where
combining would be nonsensical.

116 tests pass (55 unit + 61 integration). Clippy clean, fmt clean.
@brunojm brunojm force-pushed the feat/llm-improvements-pr2 branch from 27e8b80 to cdeaf5b Compare April 17, 2026 01:32

brunojm commented Apr 17, 2026

Owner Author

Addressed the Codex review point in cdeaf5b:

All three aggregate views (`--domains`, `--size-by-type`, `--redirects`) now honor the grep-like exit contract. The exit code derives from the emitted aggregate rows, not just the pre-aggregate filter set, so `hargrep --redirects file.har` exits 1 when the HAR has no 3xx entries even if it has plenty of 2xx matches.

For `--overview`, which always emits a full object rather than an array, emptiness is checked via `entries == 0`.
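
The contract described above could look roughly like the following sketch. It is an illustration under assumptions rather than the code in cdeaf5b: `Entry`, `RedirectRow`, `redirects`, `exit_code_for_rows`, and `exit_code_for_overview` are hypothetical names, and representing a missing Location header as `None` is an assumption.

```rust
/// Hypothetical minimal entry for this sketch.
#[derive(Debug, Clone)]
struct Entry {
    id: usize,
    url: String,
    status: u16,
    location: Option<String>, // `None` when the Location header is missing (assumed)
}

/// One row of the redirects view: a raw pair, not a stitched chain.
#[derive(Debug, PartialEq)]
struct RedirectRow {
    id: usize,
    url: String,
    status: u16,
    location: Option<String>,
}

/// Keep only 3xx entries from the already-filtered set.
fn redirects(filtered: &[Entry]) -> Vec<RedirectRow> {
    filtered
        .iter()
        .filter(|e| (300..400).contains(&e.status))
        .map(|e| RedirectRow {
            id: e.id,
            url: e.url.clone(),
            status: e.status,
            location: e.location.clone(),
        })
        .collect()
}

/// Grep-like exit code derived from the rows actually emitted:
/// 0 when there is at least one row, 1 when the output is empty.
fn exit_code_for_rows<T>(rows: &[T]) -> i32 {
    if rows.is_empty() { 1 } else { 0 }
}

/// The overview is always a full object, so emptiness is judged by entry count.
fn exit_code_for_overview(entries: usize) -> i32 {
    if entries == 0 { 1 } else { 0 }
}

fn main() {
    // A HAR with plenty of 2xx entries but no redirects.
    let filtered = vec![
        Entry { id: 0, url: "https://example.com/".to_string(), status: 200, location: None },
        Entry { id: 1, url: "https://example.com/app.js".to_string(), status: 200, location: None },
    ];

    let rows = redirects(&filtered);
    for row in &rows {
        println!("{} {} {} {:?}", row.id, row.url, row.status, row.location);
    }
    println!("redirects exit code: {}", exit_code_for_rows(&rows));
    println!("overview exit code: {}", exit_code_for_overview(filtered.len()));
}
```

Deriving the code from `rows` rather than from `filtered` is what makes an empty `[]` distinguishable from a successful match in automation.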

Also rebased onto the updated PR #1 (which added its own --overview exit-code fix). 123 tests pass locally.

Note: CI won't run here because the PR base is feat/llm-improvements-pr1, not main. Once PR #1 merges, I'll retarget this PR to main and CI will pick it up.

@brunojm brunojm deleted the branch feat/llm-improvements-pr1 April 17, 2026 01:34
@brunojm brunojm closed this Apr 17, 2026