Merged
23 commits
d0568fb
feat(router): unify chat adapters on LiteLLM (#379)
franconicola May 23, 2026
1b3dedf
refactor(router): extract envelope helpers (#379 Phase A)
franconicola May 23, 2026
67cd38f
refactor(router): drive adapters from ProviderConfig (#379 Phase B)
franconicola May 23, 2026
c14b2e0
refactor(router): hoist call path into AgentRouter (#379 Phase C)
franconicola May 23, 2026
68746c7
feat(router): capture I/O via LiteLLM CustomLogger (#379 Phase D)
franconicola May 23, 2026
e065713
refactor(router): move ADK CustomLLM to providers/ (#379 Phase E)
franconicola May 23, 2026
0b0a2e5
feat(router): surface response_cost + call_id, unify status_code (#37…
franconicola May 23, 2026
c3090e9
refactor(router): refactor ADKAgent off LiteLLMAgent (#379 Phase E.2a)
franconicola May 23, 2026
5b7b9af
refactor(router): chat AgentTypes use _ChatRegistration (#379 Phase E…
franconicola May 23, 2026
e3de58f
refactor(router): delete chat adapter classes (#379 Phase E.2c)
franconicola May 23, 2026
fe5c624
refactor(router): namespace metadata under metadata['hackagent'] (#37…
franconicola May 23, 2026
7e5d9ad
refactor(router): delete adapters/ folder (#379 Phase F.3)
franconicola May 23, 2026
59216e7
docs(examples): multi-provider LiteLLM demo (#379 Phase F.4)
franconicola May 23, 2026
7d94736
chore(router): finalise #379 — drop plan doc, add router integration …
franconicola May 23, 2026
c0ce615
docs(sidebars): refresh router section after #379 (#388 CI fix)
franconicola May 23, 2026
605a8e1
ci: point integration jobs at tests/integration/router/ (#388 CI fix)
franconicola May 23, 2026
16500dd
fix(router): stop leaking backend api_key to Ollama + pull gemma3:4b …
franconicola May 23, 2026
9fcabb1
📝 docs(update): updating docs
franconicola May 23, 2026
25e00dc
ci: fold slow integration tests into PR CI, drop nightly cron
franconicola May 23, 2026
f46a284
test: route classifier to suite's small model + bump heavy-test timeouts
franconicola May 23, 2026
8a4ad1d
ci: shard Ollama integration into fast/slow + bump remaining ADK time…
franconicola May 23, 2026
a3c9b5d
test: delete e2e duplicate, relocate misclassified integration tests
franconicola May 23, 2026
340fba4
ci: drop deleted tests/integration/storage/ from offline job (#388 fix)
franconicola May 23, 2026
43 changes: 34 additions & 9 deletions .github/workflows/ci.yml
@@ -135,8 +135,11 @@ jobs:
fail_ci_if_error: true

# ── Integration (offline) ─────────────────────────────────────────────────
# Covers tests/integration/storage/ (LocalBackend + SQLite) and
# tests/integration/tui/ (mock-based). No external services required.
# Covers tests/integration/tui/ — mock-based TUI integration tests
# that exercise widget composition + lifecycle without external
# services. (The former tests/integration/storage/ moved to
# tests/unit/server/storage/ since it only exercised the in-memory
# local backend.)
integration-offline:
name: Integration Tests (Offline)
runs-on: ubuntu-latest
@@ -161,7 +164,6 @@ jobs:
- name: Run offline integration tests with coverage
run: >
uv run pytest
tests/integration/storage/
tests/integration/tui/
--run-integration
-n auto
@@ -180,17 +182,32 @@ jobs:
retention-days: 1

# ── Integration (Ollama) ──────────────────────────────────────────────────
# Covers tests/integration/adapters/ and tests/integration/attacks/.
# Covers tests/integration/router/ and tests/integration/attacks/.
# Requires a running Ollama instance with tinyllama.
#
# Sharded across two runners (real CPUs, real parallelism):
# - shard=fast → ``-m "not slow"`` (the bulk; ~few minutes)
# - shard=slow → ``-m "slow"`` (advprefix multi-judge; ~14 min on CPU)
# Within each shard pytest-xdist spreads tests across runner cores
# with ``-n auto --dist=loadfile`` and Ollama is allowed to serve
# multiple concurrent requests via ``OLLAMA_NUM_PARALLEL=4``.
integration-ollama:
name: Integration Tests (Ollama)
name: Integration Tests (Ollama, ${{ matrix.shard }})
runs-on: ubuntu-latest
timeout-minutes: 30
if: github.event_name == 'pull_request' && github.base_ref == 'main'
strategy:
fail-fast: false
matrix:
shard:
- fast
- slow
env:
HACKAGENT_API_KEY: ${{ secrets.HACKAGENT_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
OLLAMA_MODEL: tinyllama
# Let Ollama serve concurrent requests from pytest-xdist workers.
OLLAMA_NUM_PARALLEL: "4"
TEST_MAX_TOKENS_FAST: "15"
TEST_MAX_TOKENS_MEDIUM: "25"
TEST_MAX_TOKENS_SLOW: "40"
@@ -222,28 +239,36 @@ jobs:
ollama-models-tinyllama-

- name: Pull Ollama model
# Integration tests reuse tinyllama for the target, attacker,
# judges, and category classifier (via
# ``_fast_classifier_config``). The orchestrator's implicit
# default classifier (``gemma3:4b``) is much slower on CPU
# runners and not pulled here on purpose.
run: ollama pull tinyllama

- name: Install dependencies
run: uv sync --group dev

- name: Run Ollama integration tests with coverage
# Each shard handles one ``-m`` selector so the slow advprefix
# test (~14 min on CPU) runs on its own runner instead of
# bottlenecking the rest of the suite.
run: >
uv run pytest
tests/integration/adapters/
tests/integration/router/
tests/integration/attacks/
--run-integration
-n 2
-n auto
--dist=loadfile
-m "not slow"
-m "${{ matrix.shard == 'slow' && 'slow' || 'not slow' }}"
-v --tb=short
--cov --cov-fail-under=0
--cov-report=xml:reports/coverage.xml

- name: Upload Ollama-integration coverage artifact
uses: actions/upload-artifact@v7
with:
name: coverage-integration-ollama
name: coverage-integration-ollama-${{ matrix.shard }}
path: reports/.coverage
include-hidden-files: true
retention-days: 1
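
Reviewer note: the shard selection above relies on the GitHub Actions idiom `cond && a || b`, which behaves like a ternary here because `'slow'` is a truthy string. A minimal Python sketch of the same mapping follows. The function names `marker_for_shard` and `pytest_args` are hypothetical and not part of this PR; the argument list simply mirrors the flags shown in the workflow step.

```python
from typing import List


def marker_for_shard(shard: str) -> str:
    """Mirror `${{ matrix.shard == 'slow' && 'slow' || 'not slow' }}`.

    Hypothetical helper: only the 'slow' shard selects slow-marked tests;
    every other shard value falls through to 'not slow'.
    """
    return "slow" if shard == "slow" else "not slow"


def pytest_args(shard: str) -> List[str]:
    """Assemble the pytest arguments the workflow step passes for one shard."""
    return [
        "tests/integration/router/",
        "tests/integration/attacks/",
        "--run-integration",
        "-n", "auto",
        "--dist=loadfile",
        "-m", marker_for_shard(shard),
    ]


if __name__ == "__main__":
    for shard in ("fast", "slow"):
        print(shard, "->", " ".join(pytest_args(shard)))
```

Because the two markers are complementary, every collected test should land in exactly one shard, provided the `slow` marker is applied consistently.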
70 changes: 0 additions & 70 deletions .github/workflows/nightly.yml

This file was deleted.

13 changes: 11 additions & 2 deletions .pre-commit-config.yaml
@@ -18,8 +18,17 @@ repos:
- repo: local
hooks:
- id: pytest
name: pytest
entry: uv run pytest --run-integration --ignore=tests/e2e/attacks
name: pytest (unit only)
# Integration + e2e tests run in GitHub Actions (see
# ``.github/workflows/ci.yml``). The local pre-commit only
# runs the unit suite so commits stay snappy. To run the full
# integration suite locally on demand:
# uv run pytest tests/integration/ --run-integration
#
# ``-n 4`` (not ``-n auto``) so the hook works both on 4-vCPU
# CI runners and on shared HPC login nodes that advertise 64+
# logical CPUs but enforce per-user thread limits.
entry: uv run pytest tests/unit/ -n 4
language: system
pass_filenames: false
files: ^(.*\.py|pyproject\.toml|poetry\.lock|.*requirements.*\.txt|.*package\.json|.*package-lock\.json)$
2 changes: 1 addition & 1 deletion docs/docs/api-index.md
@@ -20,4 +20,4 @@ For practical usage examples, see the [Python SDK Quickstart](./sdk/python-quick

---

*Auto-generated from hackagent v0.6.0.*
*Auto-generated from hackagent v0.10.1.*
26 changes: 12 additions & 14 deletions docs/docs/cli/initialization.md
@@ -19,26 +19,24 @@ The initialization wizard will:
1. **Display the HackAgent ASCII logo**
2. **Set verbosity level** — Control logging detail (0=ERROR to 3=DEBUG)
3. **Save configuration** — Stored in `~/.config/hackagent/config.json`
HACKAGENT_BANNER = """

"""
## Example Session

```bash
$ hackagent init

╭────────────────────────────────────────────────────────────────────────╮
│ │
│ │
│ │
│ ███████╗███████╗ ██████╗███████╗██╗ ██╗██╗ ██╗██╗ ██╗ █████╗ │
│ ██╔════╝██╔════╝██╔════╝██╔════╝██║ ██║██║ ██║██║ ██║██╔══██╗ │
│ ███████╗█████╗ ██║ █████╗ ██║ ██║███████║██║ ██║███████║ │
│ ╚════██║██╔══╝ ██║ ██╔══╝ ╚██╗ ██╔╝╚════██║██║ ██║██╔══██║ │
│ ███████║███████╗╚██████╗███████╗ ╚████╔╝ ██║███████╗██║██║ ██║ │
│ ╚══════╝╚══════╝ ╚═════╝╚══════╝ ╚═══╝ ╚═╝╚══════╝╚═╝╚═╝ ╚═╝ │
│ │
│ │
│ │
╰────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────────────╮
│ │
│ ██╗ ██╗ █████╗ ██████╗██╗ ██╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗ │
│ ██║ ██║██╔══██╗██╔════╝██║ ██╔╝██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝ │
│ ███████║███████║██║ █████╔╝ ███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║ │
│ ██╔══██║██╔══██║██║ ██╔═██╗ ██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║ │
│ ██║ ██║██║ ██║╚██████╗██║ ██╗██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║ │
│ ╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝ │
│ │
╰──────────────────────────────────────────────────────────────────────────────────╯

🔧 HackAgent CLI Setup Wizard
Welcome! Let's get you set up for AI agent security testing.
12 changes: 9 additions & 3 deletions docs/docs/hackagent/agent.md
@@ -35,11 +35,14 @@ attack methodologies.
def __init__(endpoint: str,
name: Optional[str] = None,
agent_type: Union[AgentTypeEnum, str] = AgentTypeEnum.UNKNOWN,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
raise_on_unexpected_status: bool = False,
timeout: Optional[float] = None,
metadata: Optional[Dict[str, Any]] = None,
target_config: Optional[Dict[str, Any]] = None,
adapter_operational_config: Optional[Dict[str, Any]] = None)
adapter_operational_config: Optional[Dict[str, Any]] = None,
thinking: Optional[bool] = None)
```

Initializes the HackAgent client and prepares it for interaction.
@@ -75,6 +78,10 @@ attack strategies.
generation defaults such as `name`4, `name`5,
and `name`0.
- `name`7 - Optional configuration for the agent adapter.
- `name`8 - Optional OLLAMA-only control for reasoning traces.
When set to `False`, requests sent through the target OLLAMA adapter
include `agent_type`0 to disable thinking output. Ignored for
non-OLLAMA target agent types.

#### attack\_strategies

@@ -91,8 +98,7 @@ Lazy-loaded attack strategies dictionary.
def hack(attack_config: Dict[str, Any],
run_config_override: Optional[Dict[str, Any]] = None,
fail_on_run_error: bool = True,
_tui_app: Optional[Any] = None,
_tui_log_callback: Optional[Any] = None) -> Any
_tui_event_bus: Optional[Any] = None) -> Any
```

Executes a specified attack strategy against the configured victim agent.
2 changes: 1 addition & 1 deletion docs/docs/hackagent/attacks/evaluator/evaluation_step.md
@@ -96,7 +96,7 @@ Prepare evaluated items for backend sync:
- Add _run_id if missing
- Ensure result_id exists
- Build judge_keys
- Call _sync_to_server
- Call _sync_to_server (only if not already synced by the attack)

#### get\_statistics

22 changes: 12 additions & 10 deletions docs/docs/hackagent/attacks/evaluator/sync.md
@@ -28,11 +28,13 @@ Usage:
#### update\_single\_result

```python
def update_single_result(result_id: str,
success: bool,
evaluation_notes: str,
backend: Any,
logger: Optional[logging.Logger] = None) -> bool
def update_single_result(
result_id: str,
success: bool,
evaluation_notes: str,
backend: Any = None,
logger: Optional[logging.Logger] = None,
metadata_updates: Optional[Dict[str, Any]] = None) -> bool
```

Update a single Result's evaluation status via the storage backend.
@@ -62,19 +64,19 @@ def sync_evaluation_to_server(

Sync evaluation results to the server, aggregating the best per result_id.

Multiple completion rows may share the same `result_id` (one per goal).
Multiple completion rows may share the same ``result_id`` (one per goal).
This function aggregates to find the best (success wins over failure)
evaluation per `result_id`, then PATCHes the server once per goal.
evaluation per ``result_id``, then PATCHes the server once per goal.

**Arguments**:

- `evaluated_data` - List of dicts with evaluation results. Each dict
should contain `result_id` and evaluation score keys.
should contain ``result_id`` and evaluation score keys.
- `client` - Authenticated client for API calls.
- `logger` - Optional logger instance.
- `judge_keys` - Optional list of dicts mapping judge types to their
column names, e.g. ``[\{"key": "eval_jb", "explanation": "explanation_jb",
- `1 - "JailbreakBench"}]`. If None, auto-detects from
column names, e.g. ``[{"key": "eval_jb", "explanation": "explanation_jb",
- ``1 - "JailbreakBench"}]``. If None, auto-detects from
known column patterns.
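
Reviewer note: the "best per `result_id`" aggregation described in this docstring can be sketched as below. This is an illustration under stated assumptions, not the code in `sync.py`: the helper name `aggregate_best_per_result` is hypothetical, and treating any truthy score column as a success is an assumption about how judge columns are interpreted.

```python
from typing import Any, Dict, Iterable, List


def aggregate_best_per_result(
    rows: Iterable[Dict[str, Any]],
    score_keys: List[str],
) -> Dict[str, bool]:
    """Collapse completion rows to one verdict per result_id.

    Success wins over failure: once any row for a result_id is successful,
    the aggregated verdict for that result_id stays True.
    """
    best: Dict[str, bool] = {}
    for row in rows:
        result_id = row.get("result_id")
        if result_id is None:
            continue  # rows without a result_id cannot be synced
        # Assumption: a row counts as a success if any score column is truthy.
        success = any(bool(row.get(key)) for key in score_keys)
        best[result_id] = best.get(result_id, False) or success
    return best


rows = [
    {"result_id": "a", "eval_jb": 0},
    {"result_id": "a", "eval_jb": 1},
    {"result_id": "b", "eval_jb": 0},
    {"eval_jb": 1},  # skipped: no result_id
]
verdicts = aggregate_best_per_result(rows, ["eval_jb"])
```

With one aggregated verdict per `result_id`, the caller can issue a single update per goal rather than one per completion row.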


4 changes: 2 additions & 2 deletions docs/docs/hackagent/attacks/objectives/base.md
@@ -33,11 +33,11 @@ Usage:
)

# Use in attack configuration
attack_config = \{
attack_config = {
"objective": "prompt_injection",
"technique": "advprefix", # or "template"
"goals": [...]
\}
}

#### \_\_init\_\_

8 changes: 4 additions & 4 deletions docs/docs/hackagent/attacks/orchestrator.md
@@ -88,8 +88,7 @@ def execute(attack_config: Dict[str, Any],
fail_on_run_error: bool,
max_wait_time_seconds: Optional[int] = None,
poll_interval_seconds: Optional[int] = None,
_tui_app: Optional[Any] = None,
_tui_log_callback: Optional[Any] = None) -> Any
_tui_event_bus: Optional[Any] = None) -> Any
```

Execute attack with server tracking.
@@ -108,8 +107,9 @@ Standard workflow:
- `fail_on_run_error` - Whether to raise on errors
- `max_wait_time_seconds` - Unused for local execution
- `poll_interval_seconds` - Unused for local execution
- `_tui_app` - Optional TUI app for logging
- `_tui_log_callback` - Optional TUI log callback
- `_tui_event_bus` - Optional :class:`hackagent.cli.tui.events.TUIEventBus`
that receives structured events (step start/end, tool calls,
progress, etc.) during execution.


**Returns**:
8 changes: 4 additions & 4 deletions docs/docs/hackagent/attacks/shared/response_utils.md
@@ -34,9 +34,9 @@ def extract_response_content(
Extract text content from an LLM response in various formats.

Handles the following response formats:
1. **OpenAI-style object** — `response.choices[0].message.content`
2. **Dictionary** — `response["generated_text"]` or
`response["processed_response"]`
1. **OpenAI-style object** — ``response.choices[0].message.content``
2. **Dictionary** — ``response["generated_text"]`` or
``response["processed_response"]``
3. **String** — returned as-is
4. **None / empty** — returns None

@@ -58,7 +58,7 @@ Handles the following response formats:
>>> # OpenAI-style response
>>> content = extract_response_content(openai_response)
>>> # Dict-style response
>>> content = extract_response_content(\{"generated_text": "Hello!"\})
>>> content = extract_response_content({"generated_text": "Hello!"})
>>> # Plain string
>>> content = extract_response_content("Hello!")
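
Reviewer note: the four response formats listed in this docstring could be dispatched roughly as follows. This is a sketch, not the implementation in `response_utils.py`: the name `extract_content_sketch` is hypothetical, the precedence of `generated_text` over `processed_response` is an assumption, and `SimpleNamespace` only stands in for an OpenAI-style response object.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_content_sketch(response: Any) -> Optional[str]:
    """Return text from a string, dict, or OpenAI-style object, else None."""
    if response is None:
        return None
    if isinstance(response, str):
        return response or None  # empty string is treated as "no content"
    if isinstance(response, dict):
        # Assumption: prefer generated_text, then fall back to processed_response.
        text = response.get("generated_text") or response.get("processed_response")
        return text or None
    # OpenAI-style object: response.choices[0].message.content
    choices = getattr(response, "choices", None)
    if choices:
        message = getattr(choices[0], "message", None)
        content = getattr(message, "content", None)
        return content or None
    return None


# Stand-in for an OpenAI-style response object.
openai_like = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)
```

Using `getattr` with defaults keeps the object branch from raising when a provider returns a response that lacks `choices` or `message`.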
