feat: --body-regex filter + --help-llm compact cheatsheet#4
Conversation
Third PR in the LLM-friendly series. Two small additions that complete the filter/help surface area for agents. - --body-regex REGEX: regex variant of --body-grep, mirroring how --url pairs with --url-regex. Compiled at CLI parse time so bad patterns error with exit code 2 before any file is read. Supports (?i) for case-insensitive. Composes with --body-grep and all other filters as AND. - --help-llm: prints a compact flag reference and exits. 1566 bytes vs 3511 for clap's default --help (-55%). Tuned for LLM consumption: one line per flag group, no examples, exit codes documented. Lets an agent discover flags on-demand for ~400 tokens instead of carrying a 1k+ token cheatsheet in every system prompt. 129 tests pass (55 unit + 74 integration). Clippy clean, fmt clean.
Codex flagged on PR #3 that --body-grep wasn't in --entry's conflict set, so `hargrep --entry N --body-grep foo` silently ignored the filter. Same class of bug as the earlier --entry fix; body-grep was just added later and missed the sweep. Adding body_grep + the newly-introduced body_regex to the list. Extended the existing conflict test to cover both.
|
Also addressed Codex's feedback on #3 in 7b90748: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: feb600e2ea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| #[arg(long)] | ||
| body_regex: Option<Regex>, |
There was a problem hiding this comment.
Declare --body-regex incompatible with --entry
The new --body-regex option is defined as a filter, but it can still be passed with --entry; in run, the --entry path returns before any filters are applied, so the regex is silently ignored (for example, --entry 0 --body-regex '^nope$' still returns entry 0). This creates surprising behavior for the new flag and should be prevented by adding a conflict with --entry at argument definition time.
Useful? React with 👍 / 👎.
Summary
Third PR in the LLM-friendly series. Closes out the core feature set from the improvement list: filter/help surface area for agents.
What changed
`--body-regex REGEX`
Regex variant of `--body-grep`, mirroring how `--url` pairs with `--url-regex`. Compiled at CLI parse time so bad patterns error with exit code 2 before any file is read. Use `(?i)pattern` for case-insensitive matching. Composes as AND with `--body-grep` and all other filters.
```bash
hargrep --body-regex '(?i)session\sexpired' recording.har
hargrep --body-regex '"error":\s".*timeout"' --status-range 5xx recording.har
```
`--help-llm`
Prints a compact flag reference and exits. 1566 bytes vs 3511 for clap's default `--help` — 55% smaller. Tuned for LLM consumption: one line per flag group, no examples, exit codes documented.
```
$ hargrep --help-llm
hargrep — HAR query CLI. Reads FILE (or stdin).
FILTERS (AND-combined):
--method GET|POST|... --status CODE
--status-range 4xx|200-299 --url SUBSTR
--url-regex REGEX --header 'NAME[:VALUE]'
--mime SUBSTR --min-time MS
--body-grep SUBSTR --body-regex REGEX
OUTPUT (mutually exclusive):
(default) Filtered entries as JSON (pretty in TTY, compact when piped).
--output json|jsonl|summary
--fields F,F,... id,url,method,status,status-text,time,mime-type,started-date-time
--count Matching entry count.
--overview {entries,status,methods,mime_types,top_domains,total_body_size_bytes,total_time_ms}
--domains [{domain,count}] sorted by count desc.
--size-by-type [{mime_type,total_bytes,count}] sorted by total_bytes desc.
--redirects [{id,url,status,location}] for every 3xx.
--entry N One entry by id (original 0-indexed HAR position).
...
```
This lets an agent discover flags on demand for ~400 tokens instead of carrying a 1k+ token cheatsheet in every system prompt.
Test plan
Follow-up (no further PRs planned in this series)
With PR 1, PR 3 (née 2), and this one merged, hargrep has the feature set the eval identified as missing. Natural next step: re-run the step-1 eval against the current `main` to measure whether we've actually closed the gap to `jq` on small HARs and opened a lead on medium/large ones.
🤖 Generated with Claude Code