This plan enumerates the 28 open issues with code landed on the branch, each with a self-contained test scenario a Claude subagent can execute. The intent is parallel dispatch: 4 worker subagents pick up groups by isolation requirement (pure unit, in-memory store, real HTTP MCP, CI/manifest).
Branch under test: staging/api-hardening
Base for diff: main
Commit range: git log main..staging/api-hardening (152 commits)
| Issue | Title | Location | Group |
| --- | --- | --- | --- |
| 330 | docs: stale tool count + self-host guidance in plugin-install.md | docs/install/plugin-install.md | D |
| 332 | dedup_action="merged" returns independent new entry_id (PR #341 merged) | mcp/tools/crud.py:_handle_store merge branch | B |
| 333 | resolve_review double-approve silently bumps version (PR #339 merged) | mcp/tools/classify.py:_handle_resolve_review | B |
| 334 | watch list liveness fields exposed but never populated (PR #338 merged) | feeds/poller.py, store/duckdb.py liveness writes | B |
| 335 | source=<url> vs feed_url=<url> diverge (PR #340 merged) | mcp/tools/crud.py:_build_filters_from_arguments | B |
Update 2026-04-19: PRs #352, #353, #354, #358, #359, #360, #361 (issues 345–351) are now merged to staging/api-hardening. Group E scenarios run against staging directly — no separate branch checkouts required. PRs #356/#357 against main are superseded; staging itself is landing on main tonight.
Group key
A — pure schema/enum/static check. No runtime needed. Subagent reads source + runs targeted pytest.
B — in-memory async store + handler. Subagent runs pytest -k <pattern> against in-memory DuckDB fixture or invokes handler directly.
C — needs running MCP HTTP server (TLS/CORS/transport probe).
D — manifest/skill text/CI config. Read-only assertion against repo files.
E — issues 345–351 (now on staging). Subagent runs the listed pytest + direct calls against the staging worktree.
F — agent-driven E2E user journeys against a live MCP. Subagent acts as a real client, chaining tool calls across multiple issues per scenario.
Subagent dispatch strategy
Spawn 6 worker subagents in parallel, each owning one group (A, B, C, D, E, F). A 7th orchestrator (this conversation) collects reports and aggregates pass/fail. All groups share the staging/api-hardening worktree; Groups C and F additionally need a running HTTP MCP server.
Per-subagent prompt template:
You are testing the staging/api-hardening branch of distillery2. Checkout staging/api-hardening in a worktree before starting (or operate in the existing worktree if provided). For each scenario in the assigned group, execute the listed steps, capture actual output, and report PASS/FAIL with one-line evidence (test name, error message, or response snippet). Do not modify source. If a test fixture is missing, mark BLOCKED. Final report: markdown table | issue | scenario | result | evidence |.
Group B — In-memory store + handler (1 subagent, runs full pytest by topic)
Use tests/conftest.py fixtures: store, make_entry, deterministic_embedding_provider.
Each scenario: subagent runs the listed pytest pattern AND inlines a direct handler call to verify response shape against the issue's acceptance criteria.
Direct: _handle_store_batch({"entries":[{...},{...},{...}]}) → response has entry_ids (3), count==3, results list of 3 with persisted=True per entry.
_handle_watch({"action":"add","source_type":"github","url":"https://github.com/python/cpython","sync_history":True}) → response includes sync_job with job_id.
Direct: feed a fake GitHub issue payload with user.login=alice through GitHubSyncAdapter → resulting Entry has author=="alice" and metadata["imported_by"]=="gh-sync".
Direct: add a source, poll once via FeedPoller, then _handle_watch({"action":"list"}) → entry includes last_polled_at, last_item_count, last_error (null on success), next_poll_at.
Direct: kick off run_sync_job_async(...) against a stub adapter → SyncJobTracker.get(job_id) transitions PENDING→RUNNING→COMPLETED with pages_processed > 0.
#332 — dedup_action="merged" ghost id (PR #341 merged on staging)
pytest tests/ -k "merge or fold or dedup_action_merged" -v
Direct: configure dedup thresholds so a second store call lands in the merge band (≥0.80, <0.95). Call _handle_store(...) twice → second response: dedup_action == "merged", entry_id == first_entry_id (true fold; no fresh row). Verify with _handle_get(entry_id=first_entry_id) — content/refs folded into existing row, version incremented.
Negative: a "stored" path (similarity < 0.60) must NOT report dedup_action="merged".
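The similarity bands in this scenario can be sketched as a small classifier. This is only an illustration, not the handler's code: the name `dedup_action_for` is hypothetical, the 0.95 skip threshold is inferred from the upper edge of the merge band, and the plan does not say what happens between 0.60 and 0.80, so the sketch returns None there.

```python
from typing import Optional

# Thresholds from the scenario text. SKIP_AT is an assumption inferred
# from the upper edge of the merge band.
STORE_BELOW = 0.60
MERGE_FROM = 0.80
SKIP_AT = 0.95


def dedup_action_for(similarity: float) -> Optional[str]:
    """Map a similarity score to the dedup_action this plan expects."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be within [0, 1]")
    if similarity >= SKIP_AT:
        return "skipped"
    if similarity >= MERGE_FROM:
        return "merged"
    if similarity < STORE_BELOW:
        return "stored"
    return None  # 0.60 <= similarity < 0.80 is not specified by this plan
```

A subagent could use a helper like this to pick seed content whose expected band is known before calling the real handler.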
#333 — resolve_review double-approve idempotency (PR #339 merged on staging)
pytest tests/test_mcp_classify.py -k "double_approve or idempotent or no_op" -v
Direct: seed entry with status="active", capture version=N. Call _handle_resolve_review({"action":"approve","entry_id":id}) → response indicates no-op (e.g. changed: false); entry version still N.
Repeat for archive on already-archived entry: no version bump, no audit-log duplicate.
#334 — watch list liveness actually populated (PR #338 merged on staging)
pytest tests/test_mcp_feeds.py -k "liveness or populate" -v
Direct: add a feed source, run FeedPoller.poll_once() against a stub adapter, then _handle_watch({"action":"list"}) → row has non-null last_polled_at AND non-null last_item_count (not just exposed-but-null). After a forced failure, last_error non-null and ≤200 chars.
Sync path: kick off a sync_history job; while RUNNING and after COMPLETED, list shows liveness fields update from sync writes too.
#335 — source=<url> aliases to feed_url (PR #340 merged on staging)
pytest tests/ -k "source_alias or feed_url_alias" -v
Direct: seed feed source https://x.test/rss with 3 entries. _handle_list({"source": "https://x.test/rss"}) and _handle_list({"feed_url": "https://x.test/rss"}) MUST return identical entry sets. _handle_list({"source": "manual"}) (enum value) still works as a source-type filter — alias only kicks in when the value parses as a URL.
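The aliasing rule above (the alias applies only when the value parses as a URL) might look roughly like the following. This is a sketch under assumptions: `resolve_source_filter` is a hypothetical name, and the real `_build_filters_from_arguments` may decide differently.

```python
from urllib.parse import urlparse


def resolve_source_filter(value: str) -> dict:
    """Return a feed_url filter for http(s) URLs, a source filter otherwise."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return {"feed_url": value}
    return {"source": value}  # enum-style values such as "manual"
```

With this shape, `source="https://x.test/rss"` and `feed_url="https://x.test/rss"` reduce to the same filter, which is what the identical-entry-set assertion relies on.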
Group C — Real HTTP MCP server (1 subagent, more setup)
TLS verify=True audit — grep all httpx.Client( and httpx.AsyncClient( callsites; assert each constructed with verify=True (or default which is True; flag any explicit verify=False).
Expected: empty output OR every match passes default verify (no verify=False anywhere).
Ownership on classify — pytest tests/ -k "ownership and classify" -v. Direct: as user-A, store entry; as user-B, call distillery_classify on it → FORBIDDEN.
CORS — start HTTP server with default config; curl -H 'Origin: https://evil.test' -i http://localhost:8765/mcp → response must NOT echo Access-Control-Allow-Origin: https://evil.test. Then start with cors_allowed_origins=["https://ok.test"] and confirm allowed origin echoes back.
Dep pinning — open pyproject.toml and assert upper bounds present on pyyaml, httpx, fastmcp, defusedxml.
Log retention — invoke /api/maintenance with bearer token; assert response includes search_logs_pruned: <n> field. Verify config defaults search_log_retention_days == 90.
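The TLS audit in the list above can be approximated with a line-based scan. This is a rough sketch, not the plan's actual command: `audit_source` is a hypothetical helper, and because it works line by line it only reports line numbers, without tying a `verify=False` to the call that owns it.

```python
import re

# Matches httpx.Client( and httpx.AsyncClient( callsites.
_HTTPX_CALL = re.compile(r"\bhttpx\.(?:Async)?Client\(")
# Matches an explicit verify=False, tolerating whitespace around "=".
_INSECURE = re.compile(r"\bverify\s*=\s*False\b")


def audit_source(source: str) -> tuple[int, list[int]]:
    """Return (number of httpx client callsites, lines with verify=False)."""
    callsites = 0
    insecure: list[int] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if _HTTPX_CALL.search(line):
            callsites += 1
        if _INSECURE.search(line):
            insecure.append(lineno)
    return callsites, insecure
```

The audit passes for a file when the second element is empty, which corresponds to the "no verify=False anywhere" expectation.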
Read skills/setup/SKILL.md and skills/setup/references/cron-payloads.md. Assert no occurrences of POST /hooks/poll, POST /api/maintenance, or HTTP-only references in cron sections.
Assert presence of distillery_list, distillery_watch, distillery_store tool calls in payload examples.
Read skills/briefing/SKILL.md. Assert it references distillery_list with stale_days parameter (not a missing distillery_stale tool).
Grep skills/ for distillery_stale → no matches outside historical changelogs.
#330 — docs: stale tool count + self-host guidance
Read docs/install/plugin-install.md (or docs/skills/index.md — wherever total tool count is published).
Assert published count matches the actual registered tool count (introspect distillery.mcp.server or count tool decorators in src/distillery/mcp/tools/).
Assert presence of a self-host section covering (a) DISTILLERY_CONFIG env var, (b) HTTP transport with GitHub OAuth, (c) plugin user-scope override.
If the doc still hardcodes the old count (e.g. "12 tools" when current is higher) → FAIL.
Group E — Issues 345–351 (now on staging)
All seven PRs are merged to staging/api-hardening. Tests run against the single staging worktree.
_handle_store({"content":"x","entry_type":"note"}) → INVALID_PARAMS with details.suggestion == "inbox", details.allowed is the 12-element canonical list, details.field == "entry_type", message contains Did you mean 'inbox'?.
Direct: open store, _handle_store(...), then inspect <db>.wal size — should be 0 or near-0 after the write (CHECKPOINT flushed it).
Recovery: simulate replay failure (mock _sync_initialize to raise), then trigger recovery → assert WAL renamed to *.wal.corrupt.<ts>, NOT unlinked. The original WAL bytes remain on disk under the new name.
Failure swallowing: monkeypatch the connection so CHECKPOINT raises → _handle_store still returns persisted=true; warning logged.
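The rename-not-unlink behaviour in the recovery step can be illustrated as follows. This is a minimal sketch: `quarantine_wal` is a hypothetical helper, and using whole Unix seconds for the `<ts>` suffix is an assumption, since the plan does not specify the timestamp format.

```python
import time
from pathlib import Path


def quarantine_wal(wal_path: Path) -> Path:
    """Rename the WAL to <name>.corrupt.<ts> and return the new path.

    The file is renamed, never deleted, so its bytes stay on disk.
    """
    target = wal_path.with_name(f"{wal_path.name}.corrupt.{int(time.time())}")
    wal_path.rename(target)
    return target
```

A test can then assert that the original path is gone, the quarantined file exists, and its contents are byte-for-byte unchanged.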
Stand up a stub HTTP server on :9100 returning 404 on /health and a valid JSON-RPC tools/list response on POST /mcp. Run hook with DISTILLERY_MCP_URL=http://localhost:9100/mcp → exit 0, briefing rendered (no longer no-ops on the 404).
Stub returning 401 on /mcp → hook exits 0 with [Distillery] briefing disabled — auth failed on stderr.
With DISTILLERY_BRIEFING_QUIET=1 set, the diagnostic stderr line MUST be suppressed.
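The stub described above could be built with the standard library alone. This is a sketch under assumptions: the class and function names are hypothetical, the tools list is left empty, and the real hook may expect more of the MCP handshake than a bare tools/list reply.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubMCPHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET, including /health, answers 404.
        self.send_response(404)
        self.end_headers()

    def do_POST(self):
        if self.path != "/mcp":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # Echo the request id back in a JSON-RPC result envelope.
        body = json.dumps(
            {"jsonrpc": "2.0", "id": request.get("id"), "result": {"tools": []}}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def start_stub(port: int = 9100) -> HTTPServer:
    """Start the stub in a daemon thread and return the server object."""
    server = HTTPServer(("127.0.0.1", port), StubMCPHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_stub()` before running the hook and `server.shutdown()` afterwards. A 401 variant only needs `do_POST` to send 401 instead of the JSON body.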
Direct: store._rebuild_fts_index() twice in sequence → no Cannot drop entry "fts_main_entries" error.
Reproduce subprocess SIGKILL test: spawn child that opens store, writes, calls rebuild, then SIGKILLs itself before clean shutdown. Reopen store in parent → no WAL replay error, FTS searchable.
Inspect rebuild path: confirm _rebuild_fts_index calls PRAGMA create_fts_index(..., overwrite=1) (no manual DROP SCHEMA ... CASCADE left in the code). Confirm a CHECKPOINT follows.
With default config, run 600 embed calls in a loop (mock provider returning fast) → no EmbeddingBudgetError. Set embedding_budget_daily=10 and run 11 calls → 11th raises EmbeddingBudgetError.
429 path: monkeypatch Jina/OpenAI client to return HTTP 429 with Retry-After: 12 after retry exhaustion → EmbeddingRateLimitError raised; MCP tool surfaces INVALID_PARAMS with details.provider, details.endpoint, details.http_status==429, details.retry_after==12. WARNING line in logs includes provider name.
429 without Retry-After header → error still raised, retry_after field absent or null.
Follow-on commits (b247f3e, 516b694, b983903): OpenAI.embed() routes through embed_batch() (structured errors), non-finite Retry-After values pinned, provider errors propagate through store dedup precheck. Spot-check: inject Retry-After: inf → error surfaces with retry_after clamped/omitted (no traceback).
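The Retry-After spot-checks above can be pictured with a small parser. This is a sketch, not the provider code: `parse_retry_after` is a hypothetical name, and it chooses to omit (return None for) non-finite or unparsable values, whereas the plan allows either clamping or omitting.

```python
import math
from typing import Optional


def parse_retry_after(raw: Optional[str]) -> Optional[float]:
    """Return the delay in seconds, or None when absent or unusable."""
    if raw is None:
        return None
    try:
        seconds = float(raw.strip())
    except ValueError:
        return None  # e.g. the HTTP-date form, which this sketch ignores
    if not math.isfinite(seconds) or seconds < 0:
        return None  # inf, nan, and negative delays are dropped
    return seconds
```

Under this sketch, a header of `12` yields 12.0, while `inf` and a missing header both yield None, so `details.retry_after` would be null rather than a traceback.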
Group F — Agent-driven E2E user journeys (1 subagent, live MCP)
Each scenario drives a live staging MCP as a real client would: sequential tool calls across multiple issues, verifying behaviour observable from outside the server. No pytest — the subagent speaks MCP JSON-RPC (or uses the distillery CLI / Python client) and inspects responses.
Agent-driver protocol. The subagent MUST call MCP tools via JSON-RPC over HTTP (or the Python FastMCP client), NOT by importing handlers. Every scenario must close with a cleanup that drops or archives every entry/source it created.
Each scenario's pass criterion is a cross-cutting predicate — not just "a single field equals X", but "the chained workflow an agent would run actually works".
distillery_status → returns {status:"ok", tool_count, transport:"http"}. Record tool_count.
distillery_store content="Research note about Claude prompt caching TTL" entry_type="inbox" → response has persisted:true, dedup_action:"stored", entry_id set, conflicts present but each conflict object has NO conflict_prompt key (default off).
distillery_store content="Research note about Claude prompt caching TTL" entry_type="inbox" (same content) → persisted:false, dedup_action:"skipped", entry_id == existing_entry_id (no ghost). Then distillery_get entry_id=<returned> succeeds.
distillery_store content="Research note about Claude prompt caching time-to-live" entry_type="inbox" (near-dup, merge band) → dedup_action:"merged", entry_id == first_entry_id. Version on first entry incremented by 1 (verify with get).
distillery_list (no args) → default output_mode="summary" (response bytes < 1.5KB for 1 entry; no conflicts/versions/metadata on rows). No archived entries included.
distillery_store content="Similar research on cache" entry_type="inbox" include_conflict_prompt=true → each conflict object NOW has conflict_prompt (~1KB string). Response bytes ≥ 2x a default call.
Cleanup: archive created entries.
Pass: every assertion above holds. Fail: any response shape or field missing.
F2 — Entry-type alias suggestion flow (covers #232, #245, #345)
distillery_store content="todo: wire radar digest" entry_type="note" → error code:"INVALID_PARAMS", message contains Did you mean 'inbox'?, details.suggestion == "inbox", details.allowed is a 12-element array including "github", details.provided == "note".
Seed 5 entries via distillery_store_batch with entry_type="inbox", distinct content, all with status="pending_review" forced (or via classification that routes to review).
distillery_classify batch endpoint (via CLI: distillery classify --batch) with no filter → exit non-zero, stderr contains at least one filter.
distillery classify --batch --inbox → exits 0; processes all 5; output reports counts by disposition.
Start staging MCP against a fresh on-disk DB (not in-memory).
Store 10 entries in rapid succession.
Trigger FTS rebuild (either via a distillery_search call that forces rebuild, or direct CLI maintenance: distillery maintenance rebuild-fts). Do it twice in a row — no Cannot drop entry "fts_main_entries" error.
Kill the server with SIGKILL (not SIGTERM). Restart it against the same DB path.
Pass: a hard kill mid-write does not lose committed entries; operators retain the corrupt WAL for forensics.
F6 — Briefing hook dynamic transport (covers #303, #347)
The subagent runs the hook itself, not inside a Claude Code session.
With DISTILLERY_MCP_URL=http://localhost:8765/mcp set, run python scripts/hooks/session_start_briefing.py → exit 0, briefing text on stdout (recent entries, corrections, radar).
Unset env. Create a temp dir with a .mcp.json pointing at the same HTTP URL. Run the hook from that dir → resolves via .mcp.json, exit 0.
Stand up a stub on :9100 that returns 404 on /health and a valid JSON-RPC tools/list on POST /mcp. Set DISTILLERY_MCP_URL=http://localhost:9100/mcp → hook exits 0 with briefing rendered (no silent no-op on /health 404).
Stub returning 401 on /mcp → hook exits 0, stderr has [Distillery] briefing disabled.
curl -i -X POST -H 'Origin: https://evil.test' -H 'content-type: application/json' -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' $MCP → response does NOT echo Access-Control-Allow-Origin: https://evil.test.
Restart server with DISTILLERY_CORS_ORIGINS=https://ok.test. Repeat curl with Origin: https://ok.test → response echoes Access-Control-Allow-Origin: https://ok.test.
POST /api/maintenance with valid bearer → response body includes search_logs_pruned: <n>. Verify default retention: grep search_log_retention_days src/distillery/config.py shows default 90.
distillery_store_batch entries=[{...} x 20] with mixed content (some near-dup, most distinct). Response: entry_ids length 20, count == 20, results[i].persisted varies per item, results[i].dedup_action ∈ {"stored","skipped","merged"}.
Per-item error isolation: inject one invalid entry (missing required metadata for entry_type="github") into the batch → batch returns partial success; other 19 persist; the bad one has results[i].error populated, no exception leaked.
Call distillery_list with default paging → reflects only the unique/non-merged entries (no ghosts).
Call distillery_store_batch with output_mode="summary" against the same content → each item's response object is minimal (no conflicts, no dedup preview). Measured response bytes ≤ 30% of full-mode.
Cleanup.
Pass: bulk path correctly isolates per-item failures and honours the summary contract.
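Per-item error isolation, as this scenario expects it, can be sketched like this. The function and the `store_one` callback are hypothetical, and the response shape for failed items (a null entry_id next to an error string) is an assumption, not the documented contract.

```python
from typing import Any, Callable


def store_batch(
    entries: list[dict], store_one: Callable[[dict], str]
) -> dict[str, Any]:
    """Store each entry independently so one failure cannot abort the batch."""
    results = []
    for entry in entries:
        try:
            entry_id = store_one(entry)
            results.append({"entry_id": entry_id, "persisted": True, "error": None})
        except Exception as exc:  # isolate the failure to this item
            results.append({"entry_id": None, "persisted": False, "error": str(exc)})
    entry_ids = [r["entry_id"] for r in results if r["entry_id"] is not None]
    return {"entry_ids": entry_ids, "count": len(entry_ids), "results": results}
```

The point the scenario checks is that `results` always has one element per input entry, in order, and that no exception escapes the call.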
You are the Group F agent-driver. A staging Distillery MCP is running on $MCP and you are authenticated as loopback. For each F-scenario, execute every step AS IF YOU WERE A REAL MCP CLIENT: call tools via JSON-RPC over HTTP (curl or requests), NEVER by importing Python handlers. Before each scenario, snapshot distillery_list(limit=0) count. After each scenario, run the documented cleanup and confirm the count returns to snapshot ±0. Any unhandled exception, non-2xx response code on a step expected to succeed, or schema mismatch is a FAIL. Report | scenario | issues | result | evidence |, where evidence is the single curl/Python line that failed (or "all steps ok") per scenario.
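For the JSON-RPC calls this prompt requires, a request body can be built as below. The helper name is hypothetical. The `tools/call` method with `name` and `arguments` params follows the MCP convention, but a live server may also require an initialize handshake and session headers that this sketch leaves out.

```python
import itertools
import json

_request_ids = itertools.count(1)  # monotonically increasing JSON-RPC ids


def build_tool_call(name: str, arguments: dict) -> bytes:
    """Serialize a JSON-RPC 2.0 tools/call request for POSTing to $MCP."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(payload).encode("utf-8")
```

The bytes can be sent with curl or any HTTP client using `Content-Type: application/json`. For example, `build_tool_call("distillery_list", {"limit": 0})` produces the snapshot request.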
Critical files reference
| Purpose | Path |
| --- | --- |
| Error codes | src/distillery/mcp/tools/_errors.py |
| Validation helpers | src/distillery/mcp/tools/_common.py |
| Store/list/update handlers | src/distillery/mcp/tools/crud.py |
| Classify/resolve_review handler | src/distillery/mcp/tools/classify.py |
| Watch/gh-sync/store-batch handlers | src/distillery/mcp/tools/feeds.py |
| Status tool | src/distillery/mcp/tools/meta.py |
| Server registration | src/distillery/mcp/server.py |
| Auth | src/distillery/mcp/auth.py |
| Middleware (CORS, rate-limit) | src/distillery/mcp/middleware.py |
| Webhooks (incl. log pruning) | src/distillery/mcp/webhooks.py |
| DuckDB store + migrations | src/distillery/store/duckdb.py |
| GitHub sync adapter | src/distillery/feeds/github_sync.py |
| RSS adapter | src/distillery/feeds/rss.py |
| Poller | src/distillery/feeds/poller.py |
| Background jobs | src/distillery/feeds/sync_jobs.py |
| Truncation | src/distillery/feeds/truncation.py |
| Embedding (Jina, OpenAI) | src/distillery/embedding/{jina,openai}.py |
| SessionStart hook | scripts/hooks/session_start_briefing.py |
| Skill files | skills/{setup,watch,briefing,classify}/SKILL.md |
| CVE suppressions | .grype.yaml |
| Pyproject pins | pyproject.toml |
Verification of the test plan itself
Before dispatching subagents, run a smoke check on the orchestrator:
git worktree add /tmp/distillery-test staging/api-hardening
cd /tmp/distillery-test
pip install -e ".[dev]" --quiet
pytest --collect-only -q | tail -5 # confirm pytest finds suite
ruff check src/ # confirm tree is buildable
If both pass, dispatch the six group subagents in parallel and aggregate their | issue | scenario | result | evidence | tables into a single coverage matrix. Any FAIL or BLOCKED triggers a follow-up task on the originating issue.
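The aggregation step can be sketched as a small parser over the subagents' markdown tables. The function names are hypothetical, and the sketch assumes four-column rows whose cells contain no literal pipe characters.

```python
from collections import Counter

COLUMNS = ("issue", "scenario", "result", "evidence")


def parse_report(markdown: str) -> list[dict]:
    """Extract data rows from one | issue | scenario | result | evidence | table."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue
        if cells[0].lower() == "issue" or set(cells[0]) <= set("-: "):
            continue  # skip the header row and the separator row
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def aggregate(reports: list[str]) -> tuple[list[dict], Counter]:
    """Merge all reports into one row list plus a count per result value."""
    rows = [row for report in reports for row in parse_report(report)]
    return rows, Counter(row["result"].upper() for row in rows)
```

Rows whose result is FAIL or BLOCKED are the ones that would feed the follow-up tasks.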
Test Plan — staging/api-hardening Open-Issue Coverage
Context
staging/api-hardening is 152 commits ahead of main. It bundles two waves of work: output_mode shrinkage, default-status filter, group-by additions, new distillery_status / store_batch tools. Plus security follow-up #112 and CI/CVE #271.
Open issues with landed work
distillery_store enum omits github
distillery_store output_mode="summary"
/gh-sync invalid output_mode="metadata"
group_by='tags' in distillery_list
distillery_tag_tree permission
distillery_stale missing → route to list
distillery_list(source=feed_url) returns 0
output_mode=full floods context
distillery_status MCP tool
resolve_review reviewer ignored
resolve_review reclassify leaves pending_review
dedup_action="merged" returns independent new entry_id (PR #341 merged)
resolve_review double-approve silently bumps version (PR #339 merged)
source=<url> vs feed_url=<url> diverge (PR #340 merged)
Common prerequisite for groups B and C:
For group C, additionally:
distillery-mcp --transport http --port 8765 &
Group A — Schema / enum / static (1 subagent)
#232 — github entry type
src/distillery/models.py → assert EntryType.GITHUB == "github" and TYPE_METADATA_SCHEMAS["github"] exists with required keys {repo, ref_type, ref_number}.
src/distillery/mcp/tools/crud.py → assert _VALID_ENTRY_TYPES contains "github".
pytest tests/ -k "github_entry_type or entry_type_github" -v
#241 — sanitiser
pytest tests/ -k "sanitize_tag or sanitiser or sanitizer" -v
#245 — error code surface
src/distillery/mcp/tools/_errors.py → assert ToolErrorCode has exactly: INVALID_PARAMS, NOT_FOUND, CONFLICT, INTERNAL, FORBIDDEN, BUDGET_EXCEEDED, RATE_LIMITED.
pytest tests/test_mcp_errors.py -v
pytest tests/ -k "validate_required or validate_enum or validate_limit" -v
#313 — distillery_status registered
pytest tests/test_mcp_meta.py -v (or tests/ -k status)
distillery.mcp.server and assert distillery_status is in the registered tool list (introspect FastMCP instance).
{status, version, transport, tool_count, store, embedding_provider}.
Group B — In-memory store + handler (1 subagent, runs full pytest by topic)
Use tests/conftest.py fixtures: store, make_entry, deterministic_embedding_provider.
Each scenario: subagent runs the listed pytest pattern AND inlines a direct handler call to verify response shape against the issue's acceptance criteria.
#238 / #311 / #317 / #309 / #283 — distillery_list extensions
pytest tests/test_mcp_tools/test_list_extensions.py -v
_handle_list({}) → response default output_mode == "summary" (#311); archived entry NOT in result (#317).
_handle_list({"include_archived": True}) → archived appears (#317).
_handle_list({"group_by": "tags"}) → response shape {groups: {...}, total: int} (#283).
_handle_list({"group_by": "invalid"}) → INVALID_PARAMS (#283).
https://example.com/rss, store entries with source=https://example.com/rss, then _handle_list({"source": "https://example.com/rss"}) → returns the seeded entries (#309).
#232 / #238 / #314 — distillery_store
pytest tests/test_store_dedup_response.py -v (#314)
pytest tests/ -k "store_batch or output_mode_summary or entry_type_github" -v
_handle_store({"content":"x","entry_type":"github","source":"github","metadata":{"repo":"o/r","ref_type":"issue","ref_number":1}}) → persisted=True, dedup_action="stored" (#232, #314).
persisted=False, dedup_action="skipped", existing_entry_id matches first id, similarity present, entry_id == existing_entry_id (#314).
_handle_store({"content":"y","entry_type":"reference","output_mode":"summary"}) → success, response omits dedup_check and conflict_check keys (#238).
_handle_store({"output_mode":"bogus", ...}) → INVALID_PARAMS (#245).
#244 — store_batch + watch sync_history
pytest tests/ -k "store_batch or sync_history" -v
Direct: _handle_store_batch({"entries":[{...},{...},{...}]}) → response has entry_ids (3), count==3, results list of 3 with persisted=True per entry.
_handle_watch({"action":"add","source_type":"github","url":"https://github.com/python/cpython","sync_history":True}) → response includes sync_job with job_id.
#266 — FTS CASCADE
pytest tests/ -k "fts_cascade or rebuild_fts" -v
_rebuild_fts_index() twice in sequence → no exception.
#283 — covered above.
#301 — classify --batch
pytest tests/test_mcp_classify.py -k "batch or filter" -v
distillery classify --batch (no filter) → exit non-zero, stderr contains at least one filter.
distillery classify --batch --inbox → exits 0; processes ≤50 entries.
#302 — real author
pytest tests/test_real_author.py -v
Direct: feed a fake GitHub issue payload with user.login=alice through GitHubSyncAdapter → resulting Entry has author=="alice" and metadata["imported_by"]=="gh-sync".
#308 — watch URL validation (handler-level)
pytest tests/test_mcp_watch.py -v
_handle_watch({"action":"add","source_type":"rss","url":"not a url"}) → INVALID_PARAMS (or INVALID_URL); no DB row.
_handle_watch({"action":"add","source_type":"github","url":"owner/repo"}) → accepted (bare slug allowed for github).
_handle_watch({"action":"add","source_type":"rss","url":"owner/repo"}) → rejected.
#310 — watch liveness metadata
pytest tests/test_mcp_feeds.py -k "liveness or last_polled or item_count" -v
pytest tests/test_poller.py -k "record_poll_status" -v
Direct: add a source, poll once via FeedPoller, then _handle_watch({"action":"list"}) → entry includes last_polled_at, last_item_count, last_error (null on success), next_poll_at.
#312 — gh-sync project + tags backfill
pytest tests/test_mcp_feeds.py -k "project or backfill" -v
project=="<repo-name>", tags contains source/github, repo/<name>, ref-type/issue, state/<x>.
distillery maintenance backfill-github-metadata --dry-run → reports counts of entries it WOULD update.
#315 — reviewer parameter
pytest tests/test_mcp_classify.py -k "reviewer or actor or on_behalf_of" -v
_handle_resolve_review({"entry_id": id, "action":"approve","reviewer":"bob"}) from server context with actor="alice" → entry metadata gains reviewed_by="alice", reviewed_on_behalf_of="bob".
(reviewer="alice", actor="alice") → no *_on_behalf_of field.
#316 — reclassify status
pytest tests/test_mcp_classify.py -k "reclassify_status or reclassify_pending" -v
status="pending_review", call _handle_resolve_review({"action":"reclassify",...}) → entry status=="active". Repeat with seed status="archived" → status remains archived.
#274 — Jina truncation
pytest tests/test_truncation.py -v
truncate_content("x" * 60_000) returns ≤ 30_000 chars + [truncated] suffix.
#276 / #278 — async sync jobs
pytest tests/test_async_sync_pipeline.py -v
Direct: kick off run_sync_job_async(...) against a stub adapter → SyncJobTracker.get(job_id) transitions PENDING→RUNNING→COMPLETED with pages_processed > 0.
#332 — dedup_action="merged" ghost id (PR #341 merged on staging)
pytest tests/ -k "merge or fold or dedup_action_merged" -v
Direct: configure dedup thresholds so a second store call lands in the merge band (≥0.80, <0.95). Call _handle_store(...) twice → second response: dedup_action == "merged", entry_id == first_entry_id (true fold; no fresh row). Verify with _handle_get(entry_id=first_entry_id): content/refs folded into existing row, version incremented.
Negative: a "stored" path (similarity < 0.60) must NOT report dedup_action="merged".
#333 — resolve_review double-approve idempotency (PR #339 merged on staging)
pytest tests/test_mcp_classify.py -k "double_approve or idempotent or no_op" -v
Direct: seed entry with status="active", capture version=N. Call _handle_resolve_review({"action":"approve","entry_id":id}) → response indicates no-op (e.g. changed: false); entry version still N.
Repeat for archive on already-archived entry: no version bump, no audit-log duplicate.
#334 — watch list liveness actually populated (PR #338 merged on staging)
pytest tests/test_mcp_feeds.py -k "liveness or populate" -v
Direct: add a feed source, run FeedPoller.poll_once() against a stub adapter, then _handle_watch({"action":"list"}) → row has non-null last_polled_at AND non-null last_item_count (not just exposed-but-null). After a forced failure, last_error non-null and ≤200 chars.
Sync path: kick off a sync_history job; while RUNNING and after COMPLETED, list shows liveness fields update from sync writes too.
#335 — source=<url> aliases to feed_url (PR #340 merged on staging)
pytest tests/ -k "source_alias or feed_url_alias" -v
Direct: seed feed source https://x.test/rss with 3 entries. _handle_list({"source": "https://x.test/rss"}) and _handle_list({"feed_url": "https://x.test/rss"}) MUST return identical entry sets. _handle_list({"source": "manual"}) (enum value) still works as a source-type filter; the alias only kicks in when the value parses as a URL.
Group C — Real HTTP MCP server (1 subagent, more setup)
#112 — security follow-up
Subagent runs:
TLS verify=True audit: grep all httpx.Client( and httpx.AsyncClient( callsites; assert each constructed with verify=True (or default which is True; flag any explicit verify=False).
Expected: empty output OR every match passes default verify (no verify=False anywhere).
Ownership on classify: pytest tests/ -k "ownership and classify" -v. Direct: as user-A, store entry; as user-B, call distillery_classify on it → FORBIDDEN.
CORS: start HTTP server with default config; curl -H 'Origin: https://evil.test' -i http://localhost:8765/mcp → response must NOT echo Access-Control-Allow-Origin: https://evil.test. Then start with cors_allowed_origins=["https://ok.test"] and confirm allowed origin echoes back.
Dep pinning: open pyproject.toml and assert upper bounds present on pyyaml, httpx, fastmcp, defusedxml.
Log retention: invoke /api/maintenance with bearer token; assert response includes search_logs_pruned: <n> field. Verify config defaults search_log_retention_days == 90.
#303 — dynamic transport
pytest tests/test_session_start_briefing.py -v
With DISTILLERY_MCP_URL=http://localhost:8765/mcp set, run python scripts/hooks/session_start_briefing.py → exits 0, prints briefing.
.mcp.json at cwd with stdio entry → re-run, hook resolves stdio.
localhost:8000; hook reports unreachable cleanly.
#308 — watch URL probe (HTTP layer)
_handle_watch add against a known-404 host (e.g. https://nonexistent.invalid/feed.xml) → returns UNREACHABLE_URL unless force=True.
#276 / #278 — async pipeline end-to-end
distillery_watch action=add url=<small repo> sync_history=true via MCP client → response contains job_id.
distillery_sync_status job_id=<id> until status=="completed" or 60s timeout. Assert entries_created > 0 and errors == [].
Group D — Skills, manifests, CI (1 subagent, read-only)
#269 — CronCreate uses MCP tool calls
Read skills/setup/SKILL.md and skills/setup/references/cron-payloads.md. Assert no occurrences of POST /hooks/poll, POST /api/maintenance, or HTTP-only references in cron sections.
Assert presence of distillery_list, distillery_watch, distillery_store tool calls in payload examples.
Read skills/watch/SKILL.md. Same assertions.
#271 — CVE suppression
.grype.yaml. Assert ≥40 entries.
vulnerability: and a justification: (or reason:) field with non-empty content.
#286 — stale permission
.claude/settings.local.json. Assert distillery_tag_tree does NOT appear.
#307 — stale section routing
Read skills/briefing/SKILL.md. Assert it references distillery_list with stale_days parameter (not a missing distillery_stale tool).
Grep skills/ for distillery_stale → no matches outside historical changelogs.
#330 — docs: stale tool count + self-host guidance
Read docs/install/plugin-install.md (or docs/skills/index.md, wherever total tool count is published).
Assert published count matches the actual registered tool count (introspect distillery.mcp.server or count tool decorators in src/distillery/mcp/tools/).
Assert presence of a self-host section covering (a) DISTILLERY_CONFIG env var, (b) HTTP transport with GitHub OAuth, (c) plugin user-scope override.
If the doc still hardcodes the old count (e.g. "12 tools" when current is higher) → FAIL.
Group E — Issues 345–351 (now on staging)
All seven PRs are merged to
staging/api-hardening. Tests run against the single staging worktree.8c5f2be/7a199e800c1698/6d7043669a49e8/c75231e24c6cc8/188591adf25c15/4c96143921aab1/8fe43825e4f924/ab842ec+ CR rounds#345 (PR #353) — entry_type alias suggestions
pytest tests/test_entry_type_suggestions.py tests/test_mcp_classify.py tests/test_corrections.py tests/test_bulk_ingest.py -v_handle_store({"content":"x","entry_type":"note"})→INVALID_PARAMSwithdetails.suggestion == "inbox",details.allowedis the 12-element canonical list,details.field == "entry_type", message containsDid you mean 'inbox'?.task→idea,pr→github,article→bookmark,summary→digest,doc→reference,contact→person,repo→project.entry_type="ZzUnknownZz"→INVALID_PARAMS,detailspresent but nosuggestionkey.entry_type="NOTE"(case) and" note "(whitespace) both → suggestioninbox._handle_resolve_review({"action":"reclassify","new_entry_type":"note",...})→ same suggestion.test_entry_type_suggestions.py).#346 (PR #360) — checkpoint-after-write WAL durability
- `pytest tests/test_store_wal_durability.py tests/test_duckdb_store.py -v` (target ≥118 passing)
- `_handle_store(...)`, then inspect `<db>.wal` size — should be 0 or near-0 after the write (CHECKPOINT flushed it).
- `_sync_initialize` to raise), then trigger recovery → assert WAL renamed to `*.wal.corrupt.<ts>`, NOT unlinked. The original WAL bytes remain on disk under the new name.
- `CHECKPOINT` raises → `_handle_store` still returns `persisted=true`; warning logged.

#347 (PR #359) — briefing hook tools/list probe
- `bash scripts/hooks/test-hooks.sh` → 34/34 pass.
- `404` on `/health` and a valid JSON-RPC `tools/list` response on `POST /mcp`. Run hook with `DISTILLERY_MCP_URL=http://localhost:9100/mcp` → exit 0, briefing rendered (no longer no-ops on the 404).
- `/mcp` → hook exits 0 with `[Distillery] briefing disabled — auth failed` on stderr.
- `DISTILLERY_BRIEFING_QUIET=1` set, the diagnostic stderr line MUST be suppressed.

#348 (PR #354) — `include_conflict_prompt` flag

- `pytest tests/test_conflict.py -v`
- `_handle_store({"content":"new...", ...})` (default) → response `conflicts[*]` carries `entry_id`, `content_preview`, `similarity_score` ONLY. NO `conflict_prompt` key. Total response bytes ≤ ~1KB.
- `_handle_store({"content":"new...","include_conflict_prompt":true, ...})` → each conflict carries `conflict_prompt`. Response size approx 3x larger (~3KB+ per docs).
- `output_mode="summary"` (existing bulk-store fast path) still skips dedup+conflict entirely — unchanged behaviour, no regression.

#349 (PR #358) — FTS WAL replay with overwrite=1
- `pytest tests/test_duckdb_store.py::TestWalFtsReplayHardening -v`
- `store._rebuild_fts_index()` twice in sequence → no `Cannot drop entry "fts_main_entries"` error.
- `_rebuild_fts_index` calls `PRAGMA create_fts_index(..., overwrite=1)` (no manual `DROP SCHEMA ... CASCADE` left in the code). Confirm a `CHECKPOINT` follows.

#350 (PR #352) — staging-deploy comment escaping
- `.github/workflows/staging-deploy.yml`. Assert:
- `gh pr comment --body-file -` with a `<<'EOF'` heredoc (not ``--body "...\`url\`..."``).
- `GET /mcp` returns 405 (not 404), and bare hostname returns 404.
- `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/staging-deploy.yml'))"` exits 0.
- `/deploy-staging` PR comment renders backticked URLs cleanly (no `%5C%60` in Fly access logs).

#351 (PR #361) — embedding budget default unlimited
- `pytest tests/test_budget.py tests/test_embedding.py tests/test_mcp_errors.py tests/test_mcp_coverage_gaps.py -v` (target 261 passing)
- `src/distillery/config.py` → `embedding_budget_daily` default == `0`.
- `EmbeddingBudgetError`. Set `embedding_budget_daily=10` and run 11 calls → 11th raises `EmbeddingBudgetError`.
- `Retry-After: 12` after retry exhaustion → `EmbeddingRateLimitError` raised; MCP tool surfaces `INVALID_PARAMS` with `details.provider`, `details.endpoint`, `details.http_status==429`, `details.retry_after==12`. WARNING line in logs includes provider name.
- `Retry-After` header → error still raised, `retry_after` field absent or null.
- `b247f3e`, `516b694`, `b983903`): OpenAI.embed() routes through embed_batch() (structured errors), non-finite `Retry-After` values pinned, provider errors propagate through store dedup precheck. Spot-check: inject `Retry-After: inf` → error surfaces with `retry_after` clamped/omitted (no traceback).

Group F — Agent-driven E2E user journeys (1 subagent, live MCP)
Each scenario drives a live staging MCP as a real client would: sequential tool calls across multiple issues, verifying behaviour observable from outside the server. No pytest — the subagent speaks MCP JSON-RPC (or uses the `distillery` CLI / Python client) and inspects responses.

Setup (once per run):
Agent-driver protocol. The subagent MUST call MCP tools via JSON-RPC over HTTP (or the Python `FastMCP` client), NOT by importing handlers. Every scenario must close with a cleanup that drops or archives every entry/source it created.

Each scenario's pass criterion is a cross-cutting predicate — not just "a single field equals X", but "the chained workflow an agent would run actually works".
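As a rough illustration of the agent-driver protocol, the sketch below builds the JSON-RPC 2.0 envelope for an MCP `tools/call` request. The helper name `build_tool_call` is hypothetical, and a real HTTP session would normally also need the MCP initialize handshake and transport headers, which are omitted here.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method.

    `build_tool_call` is a hypothetical helper, not part of the repo.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Example payload for the first store call in F1.
request = build_tool_call(
    "distillery_store",
    {"content": "Research note about Claude prompt caching TTL", "entry_type": "inbox"},
)
body = json.dumps(request)  # this string would be POSTed to the /mcp endpoint
```

Keeping the envelope construction in one place makes it easier for every scenario to go through the wire protocol rather than importing handlers.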
F1 — Capture-to-classify round trip (covers #245, #311, #313, #314, #317, #332, #348)
- `distillery_status` → returns `{status:"ok", tool_count, transport:"http"}`. Record `tool_count`.
- `distillery_store content="Research note about Claude prompt caching TTL" entry_type="inbox"` → response has `persisted:true`, `dedup_action:"stored"`, `entry_id` set, `conflicts` present but each conflict object has NO `conflict_prompt` key (default off).
- `distillery_store content="Research note about Claude prompt caching TTL" entry_type="inbox"` (same content) → `persisted:false`, `dedup_action:"skipped"`, `entry_id == existing_entry_id` (no ghost). Then `distillery_get entry_id=<returned>` succeeds.
- `distillery_store content="Research note about Claude prompt caching time-to-live" entry_type="inbox"` (near-dup, merge band) → `dedup_action:"merged"`, `entry_id == first_entry_id`. Version on first entry incremented by 1 (verify with get).
- `distillery_list` (no args) → default `output_mode="summary"` (response bytes < 1.5KB for 1 entry; no `conflicts`/`versions`/`metadata` on rows). No archived entries included.
- `distillery_resolve_review entry_id=<id> action="approve"` → on already-active entry, response is no-op (`changed:false`), version NOT bumped (#333 regression guard).
- `distillery_store content="Similar research on cache" entry_type="inbox" include_conflict_prompt=true` → each conflict object NOW has `conflict_prompt` (~1KB string). Response bytes ≥ 2x a default call.

Pass: every assertion above holds. Fail: any response shape or field missing.
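The dedup assertions in F1 could be collected into a single checker along these lines. This is a minimal sketch: `check_dedup_sequence` is a hypothetical helper, and the response dictionaries are assumed to be the already-decoded tool results with the field names quoted in the steps above.

```python
from typing import Any


def check_dedup_sequence(
    first: dict[str, Any], repeat: dict[str, Any], near_dup: dict[str, Any]
) -> list[str]:
    """Return a list of violated F1 dedup expectations (empty list means pass)."""
    problems: list[str] = []
    if first.get("persisted") is not True or first.get("dedup_action") != "stored":
        problems.append("first store should be persisted with dedup_action='stored'")
    if any("conflict_prompt" in conflict for conflict in first.get("conflicts", [])):
        problems.append("conflict_prompt must be absent unless explicitly requested")
    if repeat.get("persisted") is not False or repeat.get("dedup_action") != "skipped":
        problems.append("identical content should be skipped, not persisted")
    if repeat.get("entry_id") != first.get("entry_id"):
        problems.append("skipped store must return the existing entry_id (no ghost)")
    if near_dup.get("dedup_action") != "merged":
        problems.append("near-duplicate should report dedup_action='merged'")
    if near_dup.get("entry_id") != first.get("entry_id"):
        problems.append("merged store must return the first entry_id")
    return problems


# Illustrative responses; the id value is made up for the example.
ok = check_dedup_sequence(
    {"persisted": True, "dedup_action": "stored", "entry_id": "e1", "conflicts": []},
    {"persisted": False, "dedup_action": "skipped", "entry_id": "e1"},
    {"persisted": True, "dedup_action": "merged", "entry_id": "e1"},
)
```

Returning the full list of problems, rather than stopping at the first one, gives the subagent better evidence for its report.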
F2 — Entry-type alias suggestion flow (covers #232, #245, #345)
- `distillery_store content="todo: wire radar digest" entry_type="note"` → error `code:"INVALID_PARAMS"`, message contains `Did you mean 'inbox'?`, `details.suggestion == "inbox"`, `details.allowed` is a 12-element array including `"github"`, `details.provided == "note"`.
- `entry_type="inbox"` → success.
- `distillery_store content="gh-17" entry_type="pr"` → suggestion `"github"`. Retry with `"github"` + required metadata `{repo, ref_type:"pr", ref_number:17}` → success.
- `distillery_store content="x" entry_type="ZzZz"` → `INVALID_PARAMS`, `details` present but `details.suggestion` key absent.
- `distillery_store content="x" entry_type=" NOTE "` (case + whitespace) → still suggests `inbox`.

Pass: alias map works on both `store` and reclassify paths; unknown types still get structured details.

F3 — Watch: URL validation → liveness → async sync (covers #276, #278, #302, #308, #310, #312, #334)
- `distillery_watch action="add" source_type="rss" url="not a url"` → `INVALID_PARAMS` (or `INVALID_URL`), nothing persisted.
- `distillery_watch action="add" source_type="rss" url="https://nonexistent.invalid.test/feed.xml"` → `UNREACHABLE_URL` (or similar), not persisted. Retry with `force=true` → persists.
- `distillery_watch action="add" source_type="github" url="https://github.com/norrietaylor/distillery" sync_history=true` → response includes `sync_job.job_id`. Remember the id.
- `distillery_sync_status job_id=<id>` every 3s until `status == "completed"` or 90s timeout. Assert `entries_created > 0` and `errors == []`.
- `distillery_watch action="list"` → the GitHub source has non-null `last_polled_at` AND non-null `last_item_count` (#334 — fields actually populated, not just exposed), non-null `next_poll_at`. `last_error` is null.
- `distillery_list source="https://github.com/norrietaylor/distillery"` AND `distillery_list feed_url="https://github.com/norrietaylor/distillery"` → identical result sets (#335 alias).
- `entry.project == "distillery"` (#312), `entry.tags` contains `source/github` and `repo/distillery` and a `ref-type/*`, `entry.author` is a real GitHub login (#302 — not `"gh-sync"` literal).
- `distillery_watch action="remove" url=...`, archive ingested entries.

Pass: the full ambient-intel path works end-to-end; the agent can trust the liveness table and the real-author metadata for downstream skills.
F4 — Classify batch + review queue (covers #301, #315, #316)
- `distillery_store_batch` with `entry_type="inbox"`, distinct content, all with `status="pending_review"` forced (or via classification that routes to review).
- `distillery_classify` batch endpoint (via CLI: `distillery classify --batch`) with no filter → exit non-zero, stderr contains `at least one filter`.
- `distillery classify --batch --inbox` → exits 0; processes all 5; output reports counts by disposition.
- `distillery_resolve_review entry_id=<id-1> action="approve" reviewer="bob"` called as actor `alice` → entry metadata: `reviewed_by:"alice"`, `reviewed_on_behalf_of:"bob"` (#315).
- `reviewer="alice"` (= actor) → no `*_on_behalf_of` field written.
- `distillery_resolve_review entry_id=<id-2> action="reclassify" new_entry_type="reference"` (entry is `pending_review`) → post-state `status == "active"` (#316), not still pending.
- `distillery_list` (default) → the reclassified entry appears (no longer hidden from default view).

Pass: review-queue exits align with reviewer/actor audit expectations; batch CLI composes filters cleanly.
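The reviewer/actor expectation in F4 can be restated as a small function that computes the audit fields the subagent should expect to see. This is only a restatement of the scenario, not the server's code; `expected_review_audit` is a hypothetical name.

```python
from __future__ import annotations


def expected_review_audit(actor: str, reviewer: str | None) -> dict[str, str]:
    """Audit metadata expected after resolve_review, per the F4 steps.

    The actor is always recorded; the on-behalf-of field appears only when
    a different reviewer was named.
    """
    fields = {"reviewed_by": actor}
    if reviewer and reviewer != actor:
        fields["reviewed_on_behalf_of"] = reviewer
    return fields


delegated = expected_review_audit("alice", "bob")
direct = expected_review_audit("alice", "alice")
```

Comparing the entry metadata against this expectation covers both the delegated and the self-review case with one assertion each.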
F5 — WAL durability + FTS replay (covers #266, #346, #349)
- `distillery_search` call that forces rebuild, or direct CLI maintenance: `distillery maintenance rebuild-fts`). Do it twice in a row — no `Cannot drop entry "fts_main_entries"` error.
- `distillery_list` → all 10 entries present (no WAL discarded by recovery path, #346).
- `.wal.corrupt.<ts>` files are preserved (if recovery fired). No silently-unlinked WALs.
- `distillery_search query="..."` → FTS operational, returns expected hits.

Pass: a hard kill mid-write does not lose committed entries; operators retain the corrupt WAL for forensics.
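The "renamed, not unlinked" behaviour that F5 checks for can be illustrated with a short sketch. It is not the store's actual recovery code: `quarantine_wal` is a hypothetical helper, the file name `distillery.duckdb.wal` is made up, and the `<ts>` suffix is assumed here to be integer epoch seconds.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def quarantine_wal(wal_path: Path, now: Optional[float] = None) -> Path:
    """Rename a suspect WAL to `<name>.corrupt.<ts>` so its bytes stay on disk."""
    ts = int(time.time() if now is None else now)
    target = wal_path.with_name(f"{wal_path.name}.corrupt.{ts}")
    wal_path.rename(target)  # rename keeps the original bytes; nothing is deleted
    return target


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    wal = Path(tmp) / "distillery.duckdb.wal"
    wal.write_bytes(b"uncheckpointed pages")
    moved = quarantine_wal(wal, now=1_700_000_000)
    moved_name = moved.name
    preserved = moved.read_bytes()
    original_gone = not wal.exists()
```

For the scenario itself, the subagent only needs the inverse check: after a forced recovery, a file matching `*.wal.corrupt.*` exists next to the database.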
F6 — Briefing hook dynamic transport (covers #303, #347)
The subagent runs the hook itself, not inside a Claude Code session.
- `DISTILLERY_MCP_URL=http://localhost:8765/mcp` set, run `python scripts/hooks/session_start_briefing.py` → exit 0, briefing text on stdout (recent entries, corrections, radar).
- `.mcp.json` pointing at the same HTTP URL. Run the hook from that dir → resolves via `.mcp.json`, exit 0.
- `/health` and a valid JSON-RPC `tools/list` on `POST /mcp`. Set `DISTILLERY_MCP_URL=http://localhost:9100/mcp` → hook exits 0 with briefing rendered (no silent no-op on `/health` 404).
- `/mcp` → hook exits 0, stderr has `[Distillery] briefing disabled`.
- `DISTILLERY_BRIEFING_QUIET=1` → stderr silent.

Pass: hook resolves transport from the full env/manifest chain and no longer requires a `/health` sibling.

F7 — Embedding 429 surfacing (covers #245, #351)
- `DISTILLERY_EMBEDDING_PROVIDER=openai` and a stub OpenAI endpoint configured to always return HTTP 429 with `Retry-After: 12`.
- `distillery_store content="..." entry_type="inbox"` → error, code `INVALID_PARAMS`, `details.provider == "openai"`, `details.endpoint` set, `details.http_status == 429`, `details.retry_after == 12`. No stack trace leaked in message.
- `Retry-After` header → `details.retry_after` absent or null; error still structured.
- `Retry-After: inf` (non-finite) → error surfaces, `retry_after` clamped/omitted, no exception (regression guard for `b247f3e`).
- `embedding_budget_daily == 0` in config — run 500 sequential stores, none hit `EmbeddingBudgetError`.
- `embedding_budget_daily=5`; on the 6th store → `EmbeddingBudgetError` surfaced as a structured MCP error.

Pass: upstream provider throttling is the rate limiter; the local budget is opt-in.
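One way to picture the `Retry-After` handling that F7 probes is a parser that returns a number only for finite, non-negative values. This is a sketch of the expected behaviour, not the provider code: `parse_retry_after` is a hypothetical helper, it chooses "omit" over "clamp" for non-finite input, and it does not handle the HTTP-date form of the header.

```python
from __future__ import annotations

import math


def parse_retry_after(value: str | None) -> float | None:
    """Parse a Retry-After header given in seconds; None means absent or unusable."""
    if value is None:
        return None
    try:
        seconds = float(value.strip())
    except ValueError:
        return None  # includes the HTTP-date form, which this sketch ignores
    if not math.isfinite(seconds) or seconds < 0:
        return None  # "inf", "nan", and negative values are dropped
    return seconds


normal = parse_retry_after("12")
missing = parse_retry_after(None)
non_finite = parse_retry_after("inf")
```

Whatever the real implementation does, the observable contract in the steps above is the same: `details.retry_after` is either a usable number or absent, and the error stays structured.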
F8 — Security perimeter (covers #112)
- `curl -i -X POST -H 'Origin: https://evil.test' -H 'content-type: application/json' -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' $MCP` → response does NOT echo `Access-Control-Allow-Origin: https://evil.test`.
- `DISTILLERY_CORS_ORIGINS=https://ok.test`. Repeat curl with `Origin: https://ok.test` → response echoes `Access-Control-Allow-Origin: https://ok.test`.
- `distillery_classify` against that entry_id → `FORBIDDEN` (#112 P2).
- `/api/maintenance` with valid bearer → response body includes `search_logs_pruned: <n>`. Verify default retention: `grep search_log_retention_days src/distillery/config.py` shows default 90.
- `verify=False` in `src/` → zero hits (TLS pin — #112 P1).
- `pyproject.toml` → `pyyaml`, `httpx`, `fastmcp`, `defusedxml` all have upper bounds (#112 P4).

Pass: the server does not echo unconfigured origins, enforces ownership on classify, prunes search logs, pins transitive deps.
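If the CORS probes are driven from Python rather than `curl`, the echo check reduces to a case-insensitive header lookup. The sketch below assumes the response headers are available as a plain mapping; `origin_is_echoed` is a hypothetical helper, and a wildcard `*` value is deliberately not counted as an echo, so it would need its own assertion if that matters for the review.

```python
from __future__ import annotations

from collections.abc import Mapping


def allow_origin_value(headers: Mapping[str, str]) -> str | None:
    """Return the Access-Control-Allow-Origin value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "access-control-allow-origin":
            return value
    return None


def origin_is_echoed(headers: Mapping[str, str], origin: str) -> bool:
    """True only when the response names exactly the origin that was sent."""
    return allow_origin_value(headers) == origin


# Illustrative header sets for the two probes.
evil_probe = origin_is_echoed({"Content-Type": "application/json"}, "https://evil.test")
ok_probe = origin_is_echoed(
    {"access-control-allow-origin": "https://ok.test"}, "https://ok.test"
)
```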
F9 — Bulk ingest + dedup contract (covers #238, #244, #311, #314, #332, #348)
- `distillery_store_batch entries=[{...} x 20]` with mixed content (some near-dup, most distinct). Response: `entry_ids` length 20, `count == 20`, `results[i].persisted` varies per item, `results[i].dedup_action` ∈ `{"stored","skipped","merged"}`.
- `entry_type="github"`) into the batch → batch returns partial success; other 19 persist; the bad one has `results[i].error` populated, no exception leaked.
- `distillery_list` with default paging → reflects only the unique/non-merged entries (no ghosts).
- `distillery_store_batch` with `output_mode="summary"` against the same content → each item's response object is minimal (no conflicts, no dedup preview). Measured response bytes ≤ 30% of full-mode.

Pass: bulk path correctly isolates per-item failures and honours the summary contract.
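The batch-response checks in F9 could be gathered into two small helpers, sketched below. The names `check_batch_response` and `summary_is_compact` are hypothetical, and the response is assumed to be the decoded tool result with the fields quoted in the steps (`entry_ids`, `count`, `results`).

```python
from typing import Any

ALLOWED_DEDUP_ACTIONS = {"stored", "skipped", "merged"}


def check_batch_response(resp: dict[str, Any], expected_count: int) -> list[str]:
    """Return the violated F9 expectations for a full-mode batch response."""
    problems: list[str] = []
    if resp.get("count") != expected_count:
        problems.append(f"count should be {expected_count}")
    if len(resp.get("entry_ids", [])) != expected_count:
        problems.append(f"entry_ids should have {expected_count} items")
    for index, item in enumerate(resp.get("results", [])):
        if item.get("error"):
            continue  # a per-item failure is allowed; it must not abort the batch
        if item.get("dedup_action") not in ALLOWED_DEDUP_ACTIONS:
            problems.append(f"results[{index}].dedup_action is not a known action")
    return problems


def summary_is_compact(full_bytes: int, summary_bytes: int, ratio: float = 0.30) -> bool:
    """True when the summary-mode response is at most `ratio` of the full-mode size."""
    return summary_bytes <= ratio * full_bytes


sample = {
    "count": 2,
    "entry_ids": ["a", "b"],
    "results": [
        {"persisted": True, "dedup_action": "stored"},
        {"persisted": False, "dedup_action": "skipped"},
    ],
}
sample_problems = check_batch_response(sample, expected_count=2)
```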
F10 — Docs/skills/catalog alignment (covers #232, #245, #269, #286, #307, #330)
- `distillery_status` → records `tool_count`.
- `docs/install/plugin-install.md` (or equivalent published doc) — assert the published count matches (#330).
- `tools/list` via JSON-RPC — assert `distillery_stale` is NOT in the returned tools (#307 — routed to list instead). Assert `distillery_status` IS present (#313).
- `skills/briefing/SKILL.md` — references `distillery_list stale_days=30`, not `distillery_stale`.
- `skills/setup/SKILL.md` and `skills/watch/SKILL.md` — all CronCreate examples use MCP tool calls, not `POST /hooks/*` (#269).
- `.claude/settings.local.json` — no `distillery_tag_tree` permission (#286).
- `tools/list` — every value of `EntryType` appears in the description (#232 regression guard; doc drift = fail).

Pass: surfaces agents rely on (docs, skill prompts, tool catalog, permissions) agree with the runtime.
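The catalog checks in F10 might be expressed as one function over the `tools/list` result. This is a sketch: `check_catalog` is a hypothetical helper, `tools` is assumed to be the list of tool objects (each with a `name`) from the JSON-RPC result, and the published count is assumed to have been read from the doc by hand or by a separate step.

```python
from typing import Any


def check_catalog(
    tools: list[dict[str, Any]], reported_tool_count: int, published_count: int
) -> list[str]:
    """Return mismatches between the live catalog, distillery_status, and the docs."""
    names = {tool["name"] for tool in tools}
    problems: list[str] = []
    if reported_tool_count != published_count:
        problems.append("published tool count differs from distillery_status.tool_count")
    if "distillery_stale" in names:
        problems.append("distillery_stale should not be in the catalog")
    if "distillery_status" not in names:
        problems.append("distillery_status should be in the catalog")
    return problems


# Illustrative two-tool catalog; a real one is larger.
catalog = [{"name": "distillery_status"}, {"name": "distillery_list"}]
catalog_problems = check_catalog(catalog, reported_tool_count=2, published_count=2)
```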
Group F teardown:
Group F subagent prompt template:
Critical files reference
- `src/distillery/mcp/tools/_errors.py`
- `src/distillery/mcp/tools/_common.py`
- `src/distillery/mcp/tools/crud.py`
- `src/distillery/mcp/tools/classify.py`
- `src/distillery/mcp/tools/feeds.py`
- `src/distillery/mcp/tools/meta.py`
- `src/distillery/mcp/server.py`
- `src/distillery/mcp/auth.py`
- `src/distillery/mcp/middleware.py`
- `src/distillery/mcp/webhooks.py`
- `src/distillery/store/duckdb.py`
- `src/distillery/feeds/github_sync.py`
- `src/distillery/feeds/rss.py`
- `src/distillery/feeds/poller.py`
- `src/distillery/feeds/sync_jobs.py`
- `src/distillery/feeds/truncation.py`
- `src/distillery/embedding/{jina,openai}.py`
- `scripts/hooks/session_start_briefing.py`
- `skills/{setup,watch,briefing,classify}/SKILL.md`
- `.grype.yaml`
- `pyproject.toml`

Verification of the test plan itself
Before dispatching subagents, run a smoke check on the orchestrator:
If both pass, dispatch the four group subagents in parallel and aggregate `| issue | scenario | result | evidence |` tables into a single coverage matrix. Any FAIL or BLOCKED triggers a follow-up task on the originating issue.
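Aggregating the per-group result tables could look roughly like the sketch below. The helper names are hypothetical, the parser assumes four pipe-delimited columns in the order shown above, and it does not cope with a literal `|` inside an evidence cell.

```python
COLUMNS = ("issue", "scenario", "result", "evidence")


def parse_result_rows(table: str) -> list[dict[str, str]]:
    """Extract data rows from a Markdown table, skipping header and separator rows."""
    rows: list[dict[str, str]] = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue
        if cells[0].lower() == "issue" or set(cells[0]) <= {"-", ":"}:
            continue  # header row or the |---|---| separator
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def needs_follow_up(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Rows whose result is FAIL or BLOCKED, which trigger a follow-up task."""
    return [row for row in rows if row["result"].upper() in {"FAIL", "BLOCKED"}]


# Illustrative subagent report; the evidence strings are made up.
report = """| issue | scenario | result | evidence |
|---|---|---|---|
| #333 | F1 | PASS | changed:false |
| #346 | F5 | FAIL | entry missing after restart |"""
matrix = parse_result_rows(report)
follow_ups = needs_follow_up(matrix)
```

Concatenating the `matrix` lists from all subagents gives the single coverage matrix, and `follow_ups` is the subset that should open work on the originating issues.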