Pentest Agent Suite for Claude Code

Autonomous bug-bounty framework for Claude Code and 6 other AI coding tools — 50 agents, 26 commands, 19 CLI tools, 11 skills, 2 MCP servers.

~760 files · ~118k lines · 50 agents · 26 commands · 19 CLI tools · 11 skills · 2 MCP servers (16 bug-bounty platforms + BYO writeup search) · 2,500 payload lines

A complete bug bounty framework. Battle-tested hunting methodology with concrete payloads, 7-Question Gate validation, autonomous hunt loops, A→B exploit chain building, persistent brain with endpoint tracking, optional semantic writeup search (bring your own index), automatic cost tracking via CC hooks, live platform integration, and a cross-IDE installer that emits the native format for Claude Code, Codex, Gemini, Cursor, Windsurf, VS Code Copilot, and OpenClaw.

Quick Start

# MCP servers are launched via `uv run --with mcp` — no global pip install required.
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
uv run python3 tools/scaffold.py hackerone tesla
cd ~/bounties/hackerone-tesla && claude
/model opus             # Opus 4.7 [1M] — subagents inherit via model: "inherit"
/sync hackerone tesla
/brain init && /status
/hunt tesla.com

scaffold.py provisions the workspace for every supported project-scoped client, not only Claude Code: CLAUDE.md, AGENTS.md, .codex/, .agents/skills/, .gemini/, .cursor/, .windsurf/, .github/, and .vscode/mcp.json are generated from the copied workspace assets so paths resolve inside the bounty workspace.

Install (Claude Code + 6 other AI coding tools)

The framework ships pre-rendered for every supported tool. There are two ways to use it:

1. Use the bundles directly (no install step)

git clone https://github.com/H-mmer/pentest-agents-suite
cd pentest-agents-suite/providers/codex
codex                       # or: cd ../gemini && gemini, etc.

The providers/<id>/ tree contains a fully-translated, ready-to-use bundle for each non-Claude target. Path references inside use .. to reach the repo's tools/, rules/, and mcp-*-server/ — so the bundle works as long as it stays inside the cloned repo.

2. Run the installer (writes into your own project or ~/.codex/ etc.)

python3 -m tools.installer install --targets all --scope project
python3 -m tools.installer install --targets codex --scope global

Install mode rewrites paths to absolute references back into the cloned pentest-agents repo, so the install works no matter where the user's own project lives.

Target	Agents	Slash commands	Rules	MCP	Scopes
Claude Code	native `.claude/agents/*.md`	`.claude/skills/<name>/SKILL.md`	`CLAUDE.md`	`.mcp.json` / `~/.claude.json`	global + project
OpenAI Codex	native `.codex/agents/*.toml`	`.agents/skills/<name>/SKILL.md`	`AGENTS.md` (≤32 KiB)	`[mcp_servers.*]` in `config.toml`	global + project
Google Gemini	native `.gemini/agents/*.md`	TOML in `.gemini/commands/`	`GEMINI.md`	`mcpServers` in `settings.json`	global + project
Cursor	→ skills `.cursor/skills/agent-*/SKILL.md` (no native subagents)	→ skills `.cursor/skills/cmd-*/SKILL.md`	`.cursor/rules/*.mdc` + `AGENTS.md`	`.cursor/mcp.json`	global + project
Windsurf	→ skills	Workflows	`.windsurf/rules/*.md` (≤12,000 chars / file)	`~/.codeium/windsurf/mcp_config.json`	global + project
VS Code Copilot	`.github/agents/*.agent.md` (≤30,000 chars / agent)	`.github/prompts/*.prompt.md`	`.github/copilot-instructions.md` + `.github/instructions/*`	`.vscode/mcp.json`	project + global-MCP
OpenClaw	→ skills	→ skills	`~/.openclaw/workspace/AGENTS.md` or `<proj>/AGENTS.md`	`mcp.servers` in `~/.openclaw/openclaw.json`	global + project (MCP is user-level)

Cursor, Windsurf, and OpenClaw have no native subagent concept; Claude-format agents render as skills/rules. Codex commands are emitted as AgentSkills under .agents/skills/; the deprecated .codex/prompts/ path is not used.

providers/ directory (in the cloned repo):

providers/
├── codex/    AGENTS.md + .codex/{agents,config.toml} + .agents/skills
├── gemini/   GEMINI.md + .gemini/{agents,commands} + settings.json
├── cursor/   AGENTS.md + .cursor/{rules,skills,mcp.json}
├── windsurf/ AGENTS.md + .windsurf/{rules,workflows,skills} + mcp_config.json
├── copilot/  .github/{copilot-instructions.md,instructions,prompts,agents} + .vscode/mcp.json
└── openclaw/ AGENTS.md + .agents/skills/ + openclaw.json

providers/ is generated, not edited by hand. Re-render after editing sources under .claude/, rules/, or skills/:

python3 -m tools.installer render --targets all
python3 -m tools.installer render --check        # exits 1 if drift

The test_committed_providers_match_render pytest case enforces drift detection locally — there is no GitHub Actions CI by project policy.
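Conceptually, a drift gate like render --check only has to render into a scratch location and compare the result with what is committed. The sketch below assumes a byte-for-byte comparison of two directory trees and the exit-code convention stated above (1 on drift). The names tree_digest, drift, and check are hypothetical and are not the installer's real internals; a pytest case such as test_committed_providers_match_render could assert something equivalent.

```python
import hashlib
import sys
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(committed: Path, rendered: Path) -> list[str]:
    """Return relative paths that were added, removed, or changed."""
    a, b = tree_digest(committed), tree_digest(rendered)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def check(committed: Path, rendered: Path) -> int:
    """0 = clean, 1 = drift (matching `render --check`)."""
    changed = drift(committed, rendered)
    for path in changed:
        print(f"drift: {path}", file=sys.stderr)
    return 1 if changed else 0
```

Hashing rather than diffing keeps the check cheap; a real gate would also print a diff so the fix is obvious.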

What gets translated

When .claude/ content is rendered for non-Claude targets, the translator:

Drops the model: field — each target uses its own default model.
Strips Claude-specific prose — "Claude Code" → "the AI coding tool", "the Agent tool" → "the subagent dispatch tool", model: "inherit" is removed entirely.
Rewrites $CLAUDE_PROJECT_DIR — to .. in providers/ (relative to the cloned repo), or to absolute paths into the cloned source repo when installing into a user's project.
Maps effort: frontmatter to model_reasoning_effort in Codex TOML.
Caps body length — Copilot agents are truncated at 30,000 chars (Copilot's hard limit). Windsurf rules are chunked at 12,000 chars (workspace) / 6,000 chars (global).
Adds Copilot subagent links — orchestrator agents (chain-builder, correlator, recon-ranker) get an agents: list of siblings so Copilot wires the dispatch graph.
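The frontmatter, path, and length rules listed above can be sketched as a few pure functions. This is an illustration under stated assumptions, not the project's translator: the function names are hypothetical, dropping effort for non-Codex targets is an assumption (the README only describes the Codex mapping), and "chars" is taken to mean Python string length.

```python
import re

# Limits quoted in the README; the real translator may count differently.
COPILOT_AGENT_MAX_CHARS = 30_000
WINDSURF_RULE_MAX_CHARS = {"workspace": 12_000, "global": 6_000}

PROJECT_DIR = re.compile(r"\$(?:\{CLAUDE_PROJECT_DIR\}|CLAUDE_PROJECT_DIR\b)")


def translate_frontmatter(meta: dict, target: str) -> dict:
    """Drop `model`; for Codex, carry `effort` over as model_reasoning_effort."""
    out = {k: v for k, v in meta.items() if k not in ("model", "effort")}
    if target == "codex" and "effort" in meta:
        out["model_reasoning_effort"] = meta["effort"]
    return out


def rewrite_project_dir(body: str, replacement: str) -> str:
    """Swap $CLAUDE_PROJECT_DIR / ${CLAUDE_PROJECT_DIR} for a concrete path."""
    # A function replacement keeps backslashes in Windows paths literal.
    return PROJECT_DIR.sub(lambda _m: replacement, body)


def truncate_for_copilot(body: str) -> str:
    return body[:COPILOT_AGENT_MAX_CHARS]


def chunk(text: str, limit: int) -> list[str]:
    """Split text into pieces of at most `limit` chars, preferring line ends."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is hard-split.
        for i in range(0, len(line), limit):
            piece = line[i:i + limit]
            if len(current) + len(piece) > limit:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

For providers/ the replacement would be "..", and for an install it would be the absolute path of the cloned repo.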

Installer management

pentest-agents list                      # detect which targets are installed
pentest-agents install --targets claude_code,codex --scope global
pentest-agents install --dry-run         # preview every file + JSON merge
pentest-agents verify                    # check manifest vs. disk (drift)
pentest-agents uninstall                 # reverse, restore .pa-backup files
pentest-agents render --targets all      # regenerate providers/<id>/
pentest-agents render --check            # drift gate (exit 1 if dirty)

Every install records a manifest (.pentest-agents/manifest.json for project scope, ~/.config/pentest-agents/manifest.json for global). Uninstall only removes files we wrote and surgically strips only the MCP/JSON keys we merged — your other settings are never touched. Conflicting writes back up the original as <path>.pa-backup and are restored on uninstall.
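The manifest idea boils down to recording exactly what was written so the reverse operation can be precise. A minimal sketch, assuming a JSON config with a server map under one key (mcpServers here, as in .mcp.json or .cursor/mcp.json; Copilot's .vscode/mcp.json uses a different key). Skipping, rather than overwriting, a server name the user already has is an assumption, and all function names are hypothetical.

```python
import json
from pathlib import Path


def write_with_backup(path: Path, content: str, manifest: dict) -> None:
    """Keep a differing pre-existing file as <path>.pa-backup, then write ours."""
    if path.exists() and path.read_text() != content:
        backup = path.with_name(path.name + ".pa-backup")
        path.replace(backup)
        manifest.setdefault("backups", []).append(str(backup))
    path.write_text(content)
    manifest.setdefault("files", []).append(str(path))


def merge_mcp(config_path: Path, servers: dict, manifest: dict,
              section_key: str = "mcpServers") -> None:
    """Add our server entries and record exactly which keys were added."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    created = section_key not in config
    section = config.setdefault(section_key, {})
    added = [name for name in servers if name not in section]
    for name in added:  # existing user entries are left untouched
        section[name] = servers[name]
    config_path.write_text(json.dumps(config, indent=2))
    manifest.setdefault("json_keys", {})[str(config_path)] = {
        "section": section_key, "added": added, "created_section": created}


def unmerge_mcp(config_path: Path, manifest: dict) -> None:
    """Remove only the recorded keys; every other setting survives."""
    entry = manifest["json_keys"][str(config_path)]
    config = json.loads(config_path.read_text())
    section = config.get(entry["section"], {})
    for name in entry["added"]:
        section.pop(name, None)
    if entry["created_section"] and not section:
        config.pop(entry["section"], None)
    config_path.write_text(json.dumps(config, indent=2))
```

On uninstall, each path listed under "backups" would be moved back over the file that replaced it.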

Workflow

New program:   /new → /sync → /brain init → /analyze → /surface → /hunt
Returning:     /resume <target> → /hunt or /autopilot
After finding: /validate → /chain → /report → /dupcheck → /submit → /learn
Batch triage:  /triage (7-Question Gate on all findings)

MCP Servers (2)

bounty-platforms (16 platforms)

HackerOne (full API), Bugcrowd, Intigriti, Immunefi (public), YesWeHack + 11 stubs. 7 MCP tools: list_platforms, get_program_scope, get_program_policy, search_hacktivity, sync_program, draft_report, submit_report.

writeup-search (BYO index)

A searchable knowledge base that agents query during hunting and validation. 4 MCP tools:

search_writeups — semantic search (FAISS) or keyword search for prior art
get_writeup — full writeup content by ID
search_techniques — exploitation techniques by vuln class
search_payloads — curated payloads from rules/payloads.md

The writeup index is not bundled. Bulk-redistributing scraped hacktivity violates most platform ToS, so this repo ships the server only. The search_payloads + search_techniques fallback works out of the box; the semantic/keyword layers activate once you point the server at your own index.

Three search modes (auto-detected, graceful fallback):

Mode	Requires	Searches
FAISS (semantic)	`faiss-cpu`, `sentence-transformers`, your `metadata.db` + `index.faiss`	Your writeup corpus via vector embeddings
SQLite (keyword)	Your `metadata.db` only	Your writeup corpus via `LIKE` over the text column
Local (default)	Nothing — zero deps	`rules/payloads.md` + `skills/` shipped in this repo

Point the server at your index by dropping metadata.db (+ optionally index.faiss) into ~/.local/share/pentest-writeups/, or set WRITEUP_DB_DIR=/path/to/dir.
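The auto-detection described above can be pictured as "pick the richest mode whose requirements are all present". A minimal sketch, assuming FAISS mode needs both index files plus importable faiss and sentence_transformers modules, and that WRITEUP_DB_DIR overrides the default directory. The function names are hypothetical and the server's real checks may differ.

```python
import importlib.util
import os
from pathlib import Path

DEFAULT_DIR = Path.home() / ".local" / "share" / "pentest-writeups"


def index_dir() -> Path:
    """WRITEUP_DB_DIR wins; otherwise fall back to the documented default."""
    env = os.environ.get("WRITEUP_DB_DIR")
    return Path(env).expanduser() if env else DEFAULT_DIR


def detect_mode(directory: Path) -> str:
    """Return "faiss", "sqlite", or "local", richest first."""
    has_db = (directory / "metadata.db").is_file()
    has_vectors = (directory / "index.faiss").is_file()
    has_deps = all(
        importlib.util.find_spec(mod) is not None
        for mod in ("faiss", "sentence_transformers")
    )
    if has_db and has_vectors and has_deps:
        return "faiss"
    if has_db:
        return "sqlite"
    return "local"  # rules/payloads.md + skills/, no extra deps
```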

Expected schema (metadata.db): a SQLite file with at least one table containing columns id, title, url, and one text column (content / text / body / writeup). Row order in the table must match vector order in index.faiss when using semantic mode.
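A sketch of how a reader could locate a table matching that schema and run the keyword (LIKE) search, using only the standard sqlite3 module. It also shows why row order matters: a FAISS search returns 0-based vector positions, so mapping them back to writeups assumes the table's rowid order matches the vector order. Function names are hypothetical; this is not the server's code.

```python
import sqlite3

TEXT_COLUMNS = ("content", "text", "body", "writeup")
REQUIRED = {"id", "title", "url"}


def find_table(conn: sqlite3.Connection) -> tuple[str, str]:
    """Return (table, text_column) for the first table matching the schema."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        cols = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
        text_col = next((c for c in TEXT_COLUMNS if c in cols), None)
        if REQUIRED <= cols and text_col:
            return table, text_col
    raise LookupError("no table with id, title, url and a text column")


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """LIKE is case-insensitive for ASCII in SQLite by default."""
    table, text_col = find_table(conn)
    sql = (f'SELECT id, title, url FROM "{table}" '
           f'WHERE "{text_col}" LIKE ? LIMIT ?')
    return conn.execute(sql, (f"%{query}%", limit)).fetchall()


def rows_for_vector_ids(conn: sqlite3.Connection, ids: list[int]) -> list[tuple]:
    """Map FAISS vector positions to rows, assuming rowid order == vector order."""
    table, _ = find_table(conn)
    sql = f'SELECT id, title, url FROM "{table}" ORDER BY rowid LIMIT 1 OFFSET ?'
    return [conn.execute(sql, (i,)).fetchone() for i in ids]
```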

Build your own index — `rag-builder/`

The repo now ships a local RAG/FAISS builder under rag-builder/ that turns a list of GitHub / GitLab repositories into a metadata.db + index.faiss pair the writeup-search MCP server consumes. Destructive operations (clone, embed, write) are always gated behind --execute — running the CLI without it prints the plan and changes nothing, so you can never wipe an existing index by accident.

cd rag-builder

# 1. Inspect the plan — no network, no writes.
python3 build.py status
python3 build.py ingest                    # dry-run (the default)

# 2. Opt-in pre-flight: probe every URL with `git ls-remote` (network).
python3 build.py ingest --check-remotes    # ~5s for 141 repos at 16 workers

# 3. Actually clone + index every repo from repos.yaml into ./data/.
python3 build.py ingest --execute
python3 build.py ingest --execute --check-remotes   # skip unreachable first

# 4. Point the MCP server at the output.
export WRITEUP_DB_DIR="$PWD/data"
python3 ../mcp-writeup-server/server.py --test

rag-builder/repos.yaml ships with a 146-entry seed covering CTF archives, bug-bounty reports, payload collections, and research aggregators — edit freely. repos-skipped.yaml is loaded automatically as an exclusion list (override with --skip-list or --no-skip-list). config.yaml controls the embedding model (all-MiniLM-L6-v2 by default), host allowlist, clone size cap, and file-size ceiling. See rag-builder/README.md for the full reference.
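The "dry run unless --execute" contract can be expressed as a loop that always prints the plan but only calls the destructive step when explicitly asked. A minimal sketch: the flag names mirror the README, but the parser, the ingest function, the clone_and_embed callback, and the example URLs are hypothetical stand-ins for what build.py actually does.

```python
import argparse
from typing import Callable


def parse(argv: list[str]) -> argparse.Namespace:
    """Mirror the README's ingest flags; everything destructive is opt-in."""
    p = argparse.ArgumentParser(prog="build.py ingest")
    p.add_argument("--execute", action="store_true",
                   help="clone, embed and write; without it, only print the plan")
    p.add_argument("--skip-list", default="repos-skipped.yaml")
    p.add_argument("--no-skip-list", action="store_true")
    return p.parse_args(argv)


def ingest(repos: list[str], skipped: set[str], execute: bool,
           clone_and_embed: Callable[[str], None]) -> list[str]:
    """Print the plan for every repo; run the destructive step only if execute."""
    done: list[str] = []
    for url in repos:
        if url in skipped:
            print(f"skip    {url}")
        elif not execute:
            print(f"would   clone+embed {url}")
        else:
            clone_and_embed(url)
            done.append(url)
    return done
```

With --no-skip-list the caller would simply pass an empty skipped set.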

CC Hooks (automatic cost tracking)

Configured in settings.json; these hooks fire automatically:

SubagentStop → cost_hook.py logs agent name + session to cost-tracking.json
Stop → logs session end
SessionStart → welcome message

The statusline shows live cost computed from session token data (e.g. $0.57).
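A sketch of what a cost-logging hook handler could look like. Claude Code passes the hook payload as JSON on stdin, including session_id and hook_event_name; the agent_type key for the subagent's name is an assumption, as is the log format (a JSON list in cost-tracking.json). This is not cost_hook.py itself.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("cost-tracking.json")


def record(event: dict, log_path: Path = LOG) -> dict:
    """Append one entry per hook event to a JSON list on disk."""
    entry = {
        "event": event.get("hook_event_name"),
        "session": event.get("session_id"),
        # Assumption: the subagent's name arrives under "agent_type".
        "agent": event.get("agent_type", "unknown"),
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2))
    return entry


def main() -> None:
    """Hook entry point: the payload is read from stdin."""
    record(json.load(sys.stdin))
```

Registering a command that calls main() under both SubagentStop and Stop would produce one log line per subagent completion plus one per session end.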

Commands (26)

Hunting & Analysis

Command	Description
`/hunt <target> [--vuln-class]`	Active hunting — searches writeup DB for techniques first, then tests with concrete payloads
`/autopilot <target>`	Autonomous loop with --paranoid/--normal/--yolo checkpoints
`/surface <target>`	P1/P2/Kill ranked attack surface
`/chain`	Build A→B→C exploit chains via chain-builder agent (9 capability rows + 4 documented deep chains in `rules/chain-table.md`)
`/analyze <target>`	AI analysis: crown jewels, attack paths, blind spots
`/mindmap <target>`	Attack surface tree with brain status
`/sast <repo>`	Source-code vulnerability hunting (entry → flow → gap → exploit pipeline)

Validation & Reporting

Command	Description
`/validate <finding>`	7-Question Gate → PASS/KILL/DOWNGRADE/CHAIN REQUIRED
`/triage`	Batch-validate ALL findings, kill weak ones
`/quality <draft>`	Score report 1-10 (blocks below 7)
`/report [format]`	Reports (hard gate: requires /validate PASS)
`/dupcheck <desc>`	Search hacktivity + writeup DB for duplicates
`/submit <finding>`	Submit (hard gate: /validate PASS + /quality ≥ 7)
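The hard gates in this table are simple predicates. A minimal sketch, assuming the gate's seven answers arrive as booleans: it models only the "first NO = KILL" rule and the /report and /submit preconditions (PASS, and /quality ≥ 7). How the validator reaches DOWNGRADE or CHAIN REQUIRED is not described here, so the sketch does not produce those verdicts. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Verdict(str, Enum):
    PASS = "PASS"
    KILL = "KILL"
    DOWNGRADE = "DOWNGRADE"
    CHAIN_REQUIRED = "CHAIN REQUIRED"


def gate(answers: list[bool]) -> tuple[Verdict, Optional[int]]:
    """Stop at the first NO; return (verdict, 1-based failing question)."""
    if len(answers) != 7:
        raise ValueError("the gate has exactly seven questions")
    for i, ok in enumerate(answers, start=1):
        if not ok:
            return Verdict.KILL, i
    return Verdict.PASS, None


def can_report(verdict: Verdict) -> bool:
    return verdict is Verdict.PASS


def can_submit(verdict: Verdict, quality_score: int) -> bool:
    """/submit needs a /validate PASS and a /quality score of at least 7."""
    return verdict is Verdict.PASS and quality_score >= 7
```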

Session & Memory

Command	Description
`/resume <target>`	Resume — untested endpoints + suggestions
`/remember`	Log finding/pattern for cross-target learning
`/learn <id> <status>`	Record response — auto-boosts paid techniques
`/brain`	init, brief, status, endpoint, endpoints, record, exhausted

Infrastructure

Command	Description
`/new`, `/sync`, `/status`	Setup + dashboard
`/pipeline`, `/quickscan`, `/fullscan`	Scanning pipelines
`/correlate`	Chain discovery across findings
`/cost`, `/monitor`	Cost tracking, target change detection

Agents (50)

H1 Weakness Specialists (19)

xss-hunter (#60/#61/#62), sqli-hunter (#67), csrf-hunter (#57), ssrf-hunter (#75), ssti-hunter (#74), idor-hunter (#55), auth-tester (#27), info-disclosure (#18), open-redirect (#38), rce-hunter (#70), xxe-hunter (#63), file-upload (#39), cors-hunter (#58), subdomain-takeover (#145), business-logic (#28), race-condition (#29), privilege-escalation (#26), oauth-hunter (#1/#22/#106/#137), llm-ai-hunter (chains under #18/#55/#61/#70/#106)

Hunting & Analysis (3)

validator — 7-Question Gate + never-submit list (PASS/KILL/DOWNGRADE/CHAIN)
chain-builder — A→B chain walk against the capability table, searches writeup DB for proven chains
recon-ranker — P1/P2/Kill surface ranking

Infrastructure / Recon (10)

recon, vuln-scanner, config-auditor, cloud-recon, js-analyzer, waf-profiler, graphql-audit, nuclei-writer, browser-agent (Burp MCP), browser-stealth-agent (Camoufox)

Meta / Validation (9)

brain, correlator, quality-check, monitor, poc-builder, report-writer, scope-check, browser-verifier (client-side PoC proof), dast-devils-advocate (adversarial downgrade)

SAST Pipeline (8)

sast-file-ranker, sast-entry-mapper, sast-danger-mapper, sast-flow-tracer, sast-gap-analyzer, sast-devils-advocate, sast-hunter, sast-exploit-builder

Specialized (1)

web3-auditor — Solidity grep arsenal, Foundry PoC, DeFi patterns

Hunting Skills (5 deep methodology skills + 6 reference skills = 11)

The hunt-* skills are vuln-class-specific methodology files distilled from public bug-bounty reports. Each has a verified 2024-2026 CVE catalog and sub-techniques. Before testing, the matching specialist agent loads its skill with the Read tool from $CLAUDE_PROJECT_DIR/skills/hunt-<class>/SKILL.md.

Skill	Lines	Pairs With	Highlights
`skills/hunt-rce/SKILL.md`	1,135	rce-hunter	1,218-report distillation. RSC CVE-2025-55182, runc Leaky Vessels, BentoML pickle, LangChain REPL, Tekton/OpenProject git arg injection, ingress-nginx, container/runtime, ML serving, agentic LLM tool-use, OSS supply chain
`skills/hunt-idor/SKILL.md`	969	idor-hunter	1,117-report distillation. Sam Curry automotive chain, OneUptime CVE-2026-30956, Zitadel V2Beta/Mgmt API, Inforcer tenant enum, Apache Answer UUIDv1 prediction, Indico BOLA, GraphQL field-level pivots, agentic AI cross-tenant
`skills/hunt-xss/SKILL.md`	968	xss-hunter	DOMPurify mXSS family, Auth0 nextjs-auth0 returnTo, RSC DoS family, markdown-to-jsx, listmonk admin-ATO, Trix rich-text editor (H1 #2819573 / #2521419), Jupyter notebook XSS (GHSA-rch3-82jr-f9w9), n8n MCP OAuth XSS (GHSA-537j-gqpc-p7fq), LinkedIn-class iframe-in-article (H1 #2212950), 10 sub-techniques (A-J), Semgrep / ast-grep / ripgrep / CodeQL patterns
`skills/hunt-oauth/SKILL.md`	770	oauth-hunter	365-report distillation. ruby-saml parser differentials, Authentik regex `redirect_uri`, workers-oauth-provider PKCE downgrade, Entra ID actor token, Hono JWT alg confusion, nOAuth, Tekton token exfil, Argo CD project token, tinyauth
`skills/hunt-llm-ai/SKILL.md`	930	llm-ai-hunter	OWASP LLM Top 10 v2025 + Agentic AI Top 10. Microsoft 365 Copilot ASCII Smuggling, LangChain GmailToolkit indirect injection (CVE-2025-46059), LangChain PythonREPLTool semantic RCE (CVE-2025-68613), BentoML pickle, Ollama RCE family, Open WebUI SSE injection, MLflow path traversal

Reference skills (read by methodology-aware agents): hunting-methodology, recon-methodology, report-writing, sast-methodology, triage-validation, vuln-classes.

CLI Tools (19)

Tool	Purpose
brain.py	Brain with endpoint tracking + circuit breaker
intel_engine.py	Hacktivity patterns + tech→vuln mapping
journal.py	JSONL session journal for /resume
target_selector.py	Program ROI ranking
cost_hook.py	CC hook: auto-logs agent completions via SubagentStop
statusline.py	Dashboard (--compact/--watch/--json)
scope_check.py	Scope validation with --list
scope_hook.py	PreToolUse hook: blocks out-of-scope Bash commands (exact + wildcard)
cvss_version_guard.py	Enforces H1 = CVSS 3.1, other platforms = CVSS 4.0
file_path_guard.py	Blocks hallucinated file paths in reports
file_safety.py	Shared safety checks for agent-written files
dedup_findings.py	Dedup + hacktivity cross-reference
global_brain.py	Cross-engagement knowledge (incremental hash-based sync)
response_tracker.py	Response learning + auto-boost paid techniques
scaffold.py	Workspace scaffolding with update mode
capture.py	Screenshots + video (WSL2)
cost.py	Token cost tracking + ROI
camofox_ctl.sh	Camoufox (stealth Firefox) lifecycle — Cloudflare/Akamai bypass
pentest-statusline.sh	CC statusline: findings, brain, context, cost
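A rough sketch of the exact + wildcard scope check a PreToolUse hook like scope_hook.py performs. Assumptions, all labeled: scope.yaml is reduced to "in" and "out" pattern lists, the payload carries tool_name and tool_input.command (the shape Claude Code uses for Bash), and exit code 2 is the blocking signal. Only scheme:// URLs are required to be in scope; bare dotted tokens are blocked only when explicitly out of scope, so file names like out.txt pass. A real hook needs better host extraction than this.

```python
import shlex
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical scope shape; the real scope.yaml layout may differ.
SCOPE = {"in": ["example.com", "*.example.com"], "out": ["admin.example.com"]}


def matches(host: str, patterns: list[str]) -> bool:
    """Exact match or shell-style wildcard (e.g. *.example.com)."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, p.lower()) for p in patterns)


def decide(payload: dict, scope: dict = SCOPE) -> int:
    """Return 0 to allow the tool call, 2 to block it."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    try:
        tokens = shlex.split(command)
    except ValueError:
        return 2  # unparsable command: fail closed
    for token in tokens:
        if "://" in token:
            host = urlsplit(token).hostname or ""
            if not host or matches(host, scope["out"]) or not matches(host, scope["in"]):
                return 2
        elif "." in token and not token.startswith("-"):
            if matches(token.split("/")[0].split(":")[0], scope["out"]):
                return 2
    return 0
```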

Rules Library (`rules/`)

Single source of truth for every agent — all hunters, validators, and report-writers read the relevant files at session start.

File	Lines	Purpose
`hunting.md`	360	31 hunting rules (Rule 0 harm check, Rule 8 sibling check, Rule 9 A→B signal, Rule 19 never-submit, Rule 24 mutation matrix, Rule 28 detection-token rotation, Rule 30 no cross-region inference, Rule 31 unauth state-change battery)
`payloads.md`	2,605	XSS (incl. Detection Mechanism Rotation Ladder) / SSRF / SQLi / IDOR / OAuth / upload / race / SSTI / deser / JWT / LFI / prototype pollution / NoSQLi / DeFi
`techniques.md`	389	Proven attack techniques extracted from real paid engagements
`waf-bypass-protocol.md`	166	WAF bypass iteration ladder for Akamai/Cloudflare/Imperva
`vendor-status.md`	127	Patched vendor vectors, framework fingerprints, cooldown tables
`chain-table.md`	192	Capability→next-bug chain table for `/chain` (9 capability rows + 4 documented deep chains)
`never-submit.md`	42	Never-submit list + conditionally-valid-with-chain table
`mistakes.md`	665	Top 10 most common mistakes — every agent reads this at session start

Key Features

Writeup search MCP: Agents query prior art during hunting — bring your own FAISS/SQLite writeup index, or fall back to the shipped payload/technique library
CC hooks: SubagentStop/Stop auto-log costs, statusline shows live $X.XX from token data
PreToolUse scope hook: Bash commands are matched (exact + wildcard) against scope.yaml; out-of-scope targets are blocked before the tool call fires
7-Question Gate: Every finding validated — first NO = KILL
Depth Engine: /autopilot enforces an anti-shallow protocol — no claim of "exhausted" until the exhaustion matrix is complete
Stacked-encoding mandate: /hunt and /autopilot require multi-layer encoding in every payload attempt before declaring a surface clean
CVSS policy guard: HackerOne findings use CVSS 3.1; every other platform uses CVSS 4.0 — enforced by cvss_version_guard.py
Circuit breaker: 5× consecutive 403/429 → auto-backoff 60s
Endpoint tracking: Brain records every endpoint tested per target
Hard validation gates: /report and /submit refuse without /validate PASS
Never-submit filter: Pipeline auto-kills informational findings
Incremental sync: Global brain hash-based, skips unchanged files
Feedback loop: /learn auto-boosts paid techniques globally
Session journal: JSONL log for /resume continuity
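The circuit-breaker feature (5 consecutive 403/429 responses, then a 60 s backoff) fits in a small class. A minimal sketch, assuming any other status resets the streak and that one breaker is kept per target; brain.py's actual bookkeeping may differ. The injectable clock is there only so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Trip after N consecutive 403/429 responses, then back off."""

    def __init__(self, threshold: int = 5, backoff_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.backoff_s = backoff_s
        self.clock = clock
        self.streak = 0
        self.open_until = 0.0

    def record(self, status: int) -> None:
        if status in (403, 429):
            self.streak += 1
            if self.streak >= self.threshold:
                self.open_until = self.clock() + self.backoff_s
                self.streak = 0
        else:
            self.streak = 0  # any other response breaks the streak

    def allow(self) -> bool:
        """True once the backoff window (if any) has elapsed."""
        return self.clock() >= self.open_until
```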

Requirements

Python 3.10+, uv (MCP servers launch via uv run --with mcp)
Optional: uv pip install faiss-cpu sentence-transformers (for writeup semantic search)
Security tools: nmap, httpx, subfinder, nuclei, ffuf, katana, sqlmap
GraphQL hunter tools: graphql-path-enum — cargo install --git https://gitlab.com/dee-see/graphql-path-enum (auto-installed by setup-mcp.sh if cargo is present)
Evidence: grim/scrot, wf-recorder/ffmpeg
jq (for statusline)

License

For authorized security testing only. Follow responsible disclosure.
