| Layer | Directory | Build Tag | Dependencies | CI |
|---|---|---|---|---|
| Unit | internal/*/ |
none | none | always |
| Integration | tests/integration/ |
none | W&B API key | always |
| Scenario | tests/scenario/ |
scenario |
fake-claude, fake-gh, all 4 tool binaries | CI default (L1+L2) |
| E2E | tests/e2e/ |
e2e |
Docker, real services | manual / nightly |
- Located in
internal/*/alongside production code - No build tags required
- Minimize mock usage; prefer real code
- Run:
go test ./internal/... -count=1
- Located in
tests/integration/ - No build tags required
- Test component interactions with real external services (e.g., W&B API)
- Run:
go test ./tests/integration/... -count=1
- Located in
tests/scenario/ - Build tag:
//go:build scenario - Requires all 4 sibling tool repos at the same parent directory
- TestMain builds all 4 binaries + fake-claude + fake-gh
- Override sibling paths with env vars:
PHONEWAVE_REPO,SIGHTJACK_REPO,PAINTRESS_REPO,AMADEUS_REPO
| Level | Focus | Timeout |
|---|---|---|
| L1 | Single closed loop | 120s |
| L2 | Multi-issue scenarios | 180s |
| L3 | Concurrent operations | 300s |
| L4 | Fault injection, recovery | 600s |
Run: just test-scenario (L1+L2) or just test-scenario-all
The Observer type (tests/scenario/observer_test.go) provides high-level assertion helpers for scenario tests. An Observer wraps a Workspace and testing.T to verify post-expedition state without low-level file inspection.
| Helper | Purpose |
|---|---|
AssertMailboxState |
Verify file counts in mailbox directories |
AssertAllOutboxEmpty |
Verify all tool outboxes are empty |
AssertArchiveContains |
Check archive contains specific D-Mail kinds |
AssertDMailKind |
Verify a D-Mail file has the expected kind |
WaitForClosedLoop |
Block until the expedition loop completes |
AssertExpeditionJournalExists |
Verify journal was written |
AssertJournalWritten |
Verify minimum journal entry count |
AssertGommageEvent |
Verify gommage event with expected failure count |
AssertEventInJSONL |
Verify an event type exists in the JSONL event store |
AssertPromptContainsLumina |
Verify Lumina content appears in the prompt |
AssertPromptNotContainsLumina |
Verify Lumina content is absent from the prompt |
AssertNotifyFailOpen |
Verify notification failure does not block the loop |
AssertWorktreeCount |
Verify worktree count in Swarm Mode |
AssertExpeditionCount |
Verify total expedition count |
AssertEscalationEvent |
Verify escalation event was emitted |
AssertInboxProcessedAll |
Verify all inbox D-Mails were consumed |
AssertPRReviewGateNotCalled |
Verify review gate was skipped |
AssertExpeditionTimedOut |
Verify expedition hit timeout |
AssertPromptContainsField |
Verify a field substring appears in the prompt |
AssertLuminaInsightFile |
Verify Lumina insight file matches a pattern |
AssertInsightsFileExists |
Verify insights file exists on disk |
AssertBugsFoundInJSONL |
Verify bug events in the JSONL store |
AssertNotifyArgvContains |
Verify notification command arguments |
AssertReportDMailFields |
Verify report D-Mail contains required fields |
AssertGommageRecoveryEvent |
Verify gommage.recovery event with class and retry count |
AssertCheckpointEvent |
Verify expedition.checkpoint event with phase and workdir |
The SPRTEvaluator (internal/domain/sprt.go) implements the Sequential Probability Ratio Test for detecting expedition success rate regressions. Based on AgentAssay (arXiv:2603.02601) defaults, it compares observed success rates against null (P0=0.70) and alternative (P1=0.85) hypotheses, yielding PASS, FAIL, or INCONCLUSIVE verdicts. Scenario tests use SPRT to gate multi-expedition runs.
Property-based tests use testing/quick (stdlib) to verify invariants hold for arbitrary operation sequences. Located in internal/harness/policy/:
- GradientGauge bounds: Level never goes below 0 or above max regardless of Charge/Discharge/Decay sequence
- GradientGauge monotonicity: Charge never decreases level; Discharge always resets to 0
- GradientGauge JSON round-trip: Marshal/Unmarshal preserves gauge state
- Located in
tests/e2e/ - Build tag:
//go:build e2e - Docker compose based (
tests/e2e/compose-e2e.yaml) - All dependencies must be real — mocks are strictly prohibited
- Run:
just test-e2e(requires Docker)
Unit tests prefer external test packages (package xxx_test) over white-box packages (package xxx). External tests exercise only the public API surface, which:
- Validates the API contract that external consumers depend on
- Catches accidental API breakage through compilation
- Permits internal refactoring without test changes
- Reduces coupling between tests and implementation details
White-box tests (package xxx) are reserved for cases that require access to unexported symbols (e.g., testing internal state machines, concurrency internals). Bridge constructors in export_test.go files expose specific unexported symbols for external tests when needed.
The package-audit CI job enforces minimum external test ratios:
| Scope | Threshold |
|---|---|
internal/ |
>= 60% |
internal/session/ |
>= 65% |
Run locally: just test-package-audit
Every same-package test file (package xxx, not package xxx_test) must include a // white-box-reason: comment immediately after the package declaration, explaining why public API testing is insufficient.
Format: // white-box-reason: <concise reason referencing unexported symbols>
The package-audit CI job and just test-package-rationale-audit enforce this requirement. New same-package test files without the comment will fail CI.
| Command | Purpose | Dependencies |
|---|---|---|
just lint |
Full lint pass | vet, semgrep, root-guard, nosemgrep-audit, lint-md |
just check |
Pre-commit gate | fmt, vet, semgrep, root-guard, nosemgrep-audit, test, docs-check |
just semgrep |
Semgrep ERROR rules | semgrep |
just nosemgrep-audit |
Validate nosemgrep tags | grep/awk |
just semgrep-test |
Test semgrep rules against fixtures | semgrep |
| Job | Steps |
|---|---|
semgrep |
just semgrep + just nosemgrep-audit + just semgrep-test |
package-audit |
threshold check (inline) + just test-package-rationale-audit |
test |
build + vet + test + race |
docs-check |
docgen + dead links + vocabulary |
just lintfails locally: fix the issue before committing.just nosemgrep-auditfails: add[permanent]or[expires: YYYY-MM-DD]tag to the nosemgrep annotation.just semgrepfails: fix the code or add a tagged nosemgrep annotation if false positive.
# Unit + integration tests (default CI)
just test
# Scenario tests (L1+L2, CI default)
just test-scenario
# E2E (requires Docker)
just test-e2e
# All semgrep rules
just semgrep
just semgrep-test
just semgrep-warnings