Production polish: installable, multi-provider, observable #1
…vable
- Extract skill scripts from zip into repo source tree
- Make package installable (drop private, add files field, prepublishOnly)
- Add build:skill to auto-rebuild quorum.skill from source
- Replace gh CLI with direct GitHub REST API (GITHUB_TOKEN), with gh fallback
- Add AnthropicAdapter for multi-provider support (--provider anthropic)
- Add retry with exponential backoff to all adapters
- Add configurable model names via QUORUM_MODEL_HIGH/MED/LOW env vars
- Add for programmatic Phase 1 without AI clustering
- Add for end-to-end synthesis + exploration
- Add for prerequisite validation
- Add for reviewer precision analysis from run logs
- Add structured run logging (run.log.jsonl per run)
- Add .github/workflows/ci.yml for CI on push/PR
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- eval: scan .quorum/runs/ recursively for run.log.jsonl instead of assuming a flat .quorum/log/ directory, so normal exploration run logs are discovered without manual copying
- setup: treat GITHUB_TOKEN as optional (ok: true), since upsert and synthesis gracefully fall back to gh CLI
- models: apply DAG overrides before env vars so QUORUM_MODEL_* env vars always take precedence over saved dag.json models
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
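The precedence described in the models item (defaults, then saved DAG overrides, then env vars) can be sketched as a layered merge. This is a minimal illustration, not the code in the repo: `resolveModels`, the `Env` type, and the placeholder model names are all hypothetical; only the `QUORUM_MODEL_*` variable names come from the commit message.

```typescript
type Complexity = "HIGH" | "MED" | "LOW";
type ModelMap = Record<Complexity, string>;
type Env = Record<string, string | undefined>;

const LEVELS: Complexity[] = ["HIGH", "MED", "LOW"];

// Hypothetical helper: each later layer overwrites the earlier one,
// so an env var set by the user always wins over a saved dag.json model.
function resolveModels(
  defaults: ModelMap,
  dagOverrides: Partial<ModelMap>,
  env: Env,
): ModelMap {
  const resolved: ModelMap = { ...defaults };
  for (const level of LEVELS) {
    const fromDag = dagOverrides[level];
    if (fromDag) resolved[level] = fromDag; // saved dag.json model
    const fromEnv = env[`QUORUM_MODEL_${level}`];
    if (fromEnv) resolved[level] = fromEnv; // env var applied last
  }
  return resolved;
}

// Placeholder model names, chosen only to make the layering visible.
const models = resolveModels(
  { HIGH: "default-high", MED: "default-med", LOW: "default-low" },
  { HIGH: "dag-high", MED: "dag-med" },
  { QUORUM_MODEL_HIGH: "env-high" },
);
console.log(models);
```

Passing the environment in as a parameter, rather than reading `process.env` inside the helper, keeps the precedence rule easy to unit-test.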
- runner: add missing logTaskEvent call for SKIPPED tasks so run.log.jsonl and quorum eval capture skipped tasks
- models: add provider-aware default model maps (ANTHROPIC_DEFAULT_MODELS vs CURSOR_DEFAULT_MODELS) so --provider anthropic uses valid Anthropic model IDs instead of Cursor-specific identifiers
- eval: distinguish clean finishes from degraded ones with parseError; add parseError to log output so eval reports accurate success rates
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
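To make the eval item concrete, here is a rough sketch of how a success rate could separate clean finishes from degraded ones. The event shape, the status names other than SKIPPED, and the choice to exclude skipped tasks from the denominator are assumptions for illustration; the real run.log.jsonl schema and the repo's eval logic may differ.

```typescript
// Assumed event shape; the actual run.log.jsonl fields are not shown in this PR.
interface TaskEvent {
  taskId: string;
  status: "FINISHED" | "ERROR" | "SKIPPED";
  parseError?: string;
}

interface EvalSummary {
  total: number;
  clean: number;
  degraded: number;
  failed: number;
  skipped: number;
  successRate: number;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): TaskEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TaskEvent);
}

function summarize(events: TaskEvent[]): EvalSummary {
  let clean = 0;
  let degraded = 0;
  let failed = 0;
  let skipped = 0;
  for (const event of events) {
    if (event.status === "SKIPPED") skipped++;
    else if (event.status === "ERROR") failed++;
    else if (event.parseError) degraded++; // finished, but output did not parse
    else clean++;
  }
  // Assumption: skipped tasks never ran, so they are left out of the denominator.
  const attempted = clean + degraded + failed;
  const successRate = attempted === 0 ? 0 : clean / attempted;
  return { total: events.length, clean, degraded, failed, skipped, successRate };
}

const sampleLog = [
  '{"taskId":"a","status":"FINISHED"}',
  '{"taskId":"b","status":"FINISHED","parseError":"no JSON block"}',
  '{"taskId":"c","status":"ERROR"}',
  '{"taskId":"d","status":"SKIPPED"}',
].join("\n");

const summary = summarize(parseJsonl(sampleLog));
console.log(summary);
```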
- Extract shared retry infrastructure into src/retry.ts; both adapters now delegate to withRetry() instead of duplicating retry loops
- synthesizeCommand respects --no-post flag; triage-pr --plan-only suppresses synthesis posting to keep plan-only truly read-only
- plan-only runs pass provider to initialRunState() so state.json and canvas artifacts show correct model IDs for --provider anthropic
- loadRunLogs scans only the explicit logDir when provided, falling back to .quorum/runs/ recursively only when no directory is given; fix misleading eval "No run logs found" message
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
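For readers unfamiliar with the pattern, a shared retry helper with exponential backoff might look roughly like the sketch below. The option names, the `backoffDelayMs` helper, and the absence of jitter are assumptions; the actual signature of withRetry() in src/retry.ts is not shown in this PR.

```typescript
// Hypothetical options; the real withRetry() in src/retry.ts may take different arguments.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Delay doubles with each attempt (0-based) and is capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  if (opts.maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isLastAttempt = attempt === opts.maxAttempts - 1;
      // Give up on the final attempt or on errors that are not transient.
      if (isLastAttempt || !opts.isRetryable(error)) throw error;
      await sleep(backoffDelayMs(attempt, opts.baseDelayMs, opts.maxDelayMs));
    }
  }
}

const schedule = [0, 1, 2, 3, 4].map((attempt) => backoffDelayMs(attempt, 100, 1000));
console.log(schedule);
```

Keeping the delay calculation in its own pure function lets both adapters share one loop while the schedule stays testable without timers.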
🤝 Quorum: review synthesis
12 findings from bugbot, devin → 12 distinct issues. Sorted by reviewer quorum, then severity.
◽ 1/2 reviewer
clusters.scored.json (machine-readable, for agents)
{
"generated_at": "2026-06-13T16:53:25+00:00",
"totals": {
"findings": 12,
"clusters": 12,
"reviewer_denominator": 2,
"gate_split": []
},
"clusters": [
{
"cluster_id": "devin-1-solo",
"member_ids": [
"devin-1"
],
"canonical_title": "<!-- devin-review-comment {\"id\": \"BUG_pr-review-job-acbbd5d892a8431593540d535f51",
"canonical_description": "<!-- devin-review-comment {\"id\": \"BUG_pr-review-job-acbbd5d892a8431593540d535f515e97_0001\", \"file_path\": \"src/adapters/anthropic.ts\", \"start_line\": 85, \"end_line\": 85, \"side\": \"RIGHT\"} -->",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/adapters/anthropic.ts",
"start_line": 85,
"end_line": 85
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"devin"
],
"members": [
{
"id": "devin-1",
"reviewer": "devin",
"file": "src/adapters/anthropic.ts",
"lines": [
85,
85
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3408278181",
"comment_id": 3408278181,
"node_id": "PRRC_kwDOS4LqW87LJjKl",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-1-solo",
"member_ids": [
"bugbot-1"
],
"canonical_title": "Plan-only uses wrong models",
"canonical_description": "Plan-only uses wrong models",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 159,
"end_line": 159
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-1",
"reviewer": "bugbot",
"file": "src/cli.ts",
"lines": [
159,
159
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3408252470",
"comment_id": 3408252470,
"node_id": "PRRC_kwDOS4LqW87LJc42",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-2-solo",
"member_ids": [
"bugbot-2"
],
"canonical_title": "Synthesize ignores no-post flag",
"canonical_description": "Synthesize ignores no-post flag",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 410,
"end_line": 410
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-2",
"reviewer": "bugbot",
"file": "src/cli.ts",
"lines": [
410,
410
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3408252466",
"comment_id": 3408252466,
"node_id": "PRRC_kwDOS4LqW87LJc4y",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-3-solo",
"member_ids": [
"bugbot-3"
],
"canonical_title": "Plan-only triage hits GitHub",
"canonical_description": "Plan-only triage hits GitHub",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 410,
"end_line": 410
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-3",
"reviewer": "bugbot",
"file": "src/cli.ts",
"lines": [
410,
410
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3408252472",
"comment_id": 3408252472,
"node_id": "PRRC_kwDOS4LqW87LJc44",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-4-solo",
"member_ids": [
"bugbot-4"
],
"canonical_title": "Setup fails without GitHub token",
"canonical_description": "Setup fails without GitHub token",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 538,
"end_line": 538
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-4",
"reviewer": "bugbot",
"file": "src/cli.ts",
"lines": [
538,
538
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3406732544",
"comment_id": 3406732544,
"node_id": "PRRC_kwDOS4LqW87LDp0A",
"outdated": false
}
]
},
{
"cluster_id": "devin-2-solo",
"member_ids": [
"devin-2"
],
"canonical_title": "<!-- devin-review-comment {\"id\": \"BUG_pr-review-job-acbbd5d892a8431593540d535f51",
"canonical_description": "<!-- devin-review-comment {\"id\": \"BUG_pr-review-job-acbbd5d892a8431593540d535f515e97_0002\", \"file_path\": \"src/cli.ts\", \"start_line\": 583, \"end_line\": 584, \"side\": \"RIGHT\"} -->",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 583,
"end_line": 584
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"devin"
],
"members": [
{
"id": "devin-2",
"reviewer": "devin",
"file": "src/cli.ts",
"lines": [
583,
584
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3408278205",
"comment_id": 3408278205,
"node_id": "PRRC_kwDOS4LqW87LJjK9",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-5-solo",
"member_ids": [
"bugbot-5"
],
"canonical_title": "Eval scans wrong log directory",
"canonical_description": "Eval scans wrong log directory",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 584,
"end_line": 584
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-5",
"reviewer": "bugbot",
"file": "src/cli.ts",
"lines": [
584,
584
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3408273252",
"comment_id": 3408273252,
"node_id": "PRRC_kwDOS4LqW87LJh9k",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-6-solo",
"member_ids": [
"bugbot-6"
],
"canonical_title": "Eval counts parse failures success",
"canonical_description": "Eval counts parse failures success",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/cli.ts",
"start_line": 591,
"end_line": 591
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-6",
"reviewer": "bugbot",
"file": "src/cli.ts",
"lines": [
591,
591
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3407408623",
"comment_id": 3407408623,
"node_id": "PRRC_kwDOS4LqW87LGO3v",
"outdated": true
}
]
},
{
"cluster_id": "bugbot-7-solo",
"member_ids": [
"bugbot-7"
],
"canonical_title": "Anthropic gets Cursor model IDs",
"canonical_description": "Anthropic gets Cursor model IDs",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/dag.ts",
"start_line": 21,
"end_line": 21
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-7",
"reviewer": "bugbot",
"file": "src/dag.ts",
"lines": [
21,
21
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3407408621",
"comment_id": 3407408621,
"node_id": "PRRC_kwDOS4LqW87LGO3t",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-8-solo",
"member_ids": [
"bugbot-8"
],
"canonical_title": "Model env vars overridden by DAG",
"canonical_description": "Model env vars overridden by DAG",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/dag.ts",
"start_line": 54,
"end_line": 54
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-8",
"reviewer": "bugbot",
"file": "src/dag.ts",
"lines": [
54,
54
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3406732549",
"comment_id": 3406732549,
"node_id": "PRRC_kwDOS4LqW87LDp0F",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-9-solo",
"member_ids": [
"bugbot-9"
],
"canonical_title": "Eval reads wrong log directory",
"canonical_description": "Eval reads wrong log directory",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/logging.ts",
"start_line": 145,
"end_line": 145
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-9",
"reviewer": "bugbot",
"file": "src/logging.ts",
"lines": [
145,
145
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3406732539",
"comment_id": 3406732539,
"node_id": "PRRC_kwDOS4LqW87LDpz7",
"outdated": false
}
]
},
{
"cluster_id": "bugbot-10-solo",
"member_ids": [
"bugbot-10"
],
"canonical_title": "Skipped tasks omit run logs",
"canonical_description": "Skipped tasks omit run logs",
"category": "other",
"severity": "minor",
"primary_location": {
"file": "src/runner.ts",
"start_line": 42,
"end_line": 42
},
"match_type": "singleton",
"match_confidence": 1,
"cross_file": false,
"quorum": 1,
"reviewers": [
"bugbot"
],
"members": [
{
"id": "bugbot-10",
"reviewer": "bugbot",
"file": "src/runner.ts",
"lines": [
42,
42
],
"url": "https://github.com/Wenjix/quorum/pull/1#discussion_r3407408620",
"comment_id": 3407408620,
"node_id": "PRRC_kwDOS4LqW87LGO3s",
"outdated": false
}
]
}
]
}
Quorum · PR #1 · generated 2026-06-13T16:53:25+00:00
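An agent consuming clusters.scored.json could reproduce the stated ordering ("sorted by reviewer quorum, then severity") along these lines. This is a sketch under assumptions: the interface keeps only three of the fields shown in the JSON, and the severity ranking is hypothetical because only "minor" appears in this output.

```typescript
// Subset of the cluster fields shown in clusters.scored.json.
interface ScoredCluster {
  cluster_id: string;
  quorum: number;
  severity: string;
}

// Assumed ranking; only "minor" appears in this PR's output.
const SEVERITY_RANK: Record<string, number> = { critical: 3, major: 2, minor: 1 };

// Highest quorum first; ties broken by highest severity.
function sortClusters(clusters: ScoredCluster[]): ScoredCluster[] {
  return [...clusters].sort((a, b) => {
    if (b.quorum !== a.quorum) return b.quorum - a.quorum;
    return (SEVERITY_RANK[b.severity] ?? 0) - (SEVERITY_RANK[a.severity] ?? 0);
  });
}

// Illustrative data, not taken from the PR.
const sorted = sortClusters([
  { cluster_id: "solo-minor", quorum: 1, severity: "minor" },
  { cluster_id: "agreed-minor", quorum: 2, severity: "minor" },
  { cluster_id: "solo-major", quorum: 1, severity: "major" },
]);
console.log(sorted.map((cluster) => cluster.cluster_id));
```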
Quorum Exploration
Explored 6 cluster(s) from #1.

devin-1-solo: <!-- devin-review-comment {"id": "BUG_pr-review-job-acbbd5d892a8431593540d535f51
Quorum 1/2 | severity minor | category other | src/adapters/anthropic.ts:L85
Warnings:
Root Cause
Summary: A refactor left AnthropicAdapter with dead parse-based status logic, so parse outcome cannot influence TaskExecutionResult.status.
Pattern Sweep
No structured pattern-sweep result.

bugbot-1-solo: Plan-only uses wrong models
Quorum 1/2 | severity minor | category other | src/cli.ts:L159
Plan-only uses wrong models
Warnings:
Root Cause
No structured root-cause result.
Pattern Sweep
No structured pattern-sweep result.

bugbot-2-solo: Synthesize ignores no-post flag
Quorum 1/2 | severity minor | category other | src/cli.ts:L410
Synthesize ignores no-post flag
Warnings:
Root Cause
No structured root-cause result.
Pattern Sweep
No structured pattern-sweep result.

bugbot-3-solo: Plan-only triage hits GitHub
Quorum 1/2 | severity minor | category other | src/cli.ts:L410
Plan-only triage hits GitHub
Warnings:
Root Cause
No structured root-cause result.
Pattern Sweep
No structured pattern-sweep result.

bugbot-4-solo: Setup fails without GitHub token
Quorum 1/2 | severity minor | category other | src/cli.ts:L538
Setup fails without GitHub token
Warnings:
Root Cause
No structured root-cause result.
Pattern Sweep
No structured pattern-sweep result.

devin-2-solo: <!-- devin-review-comment {"id": "BUG_pr-review-job-acbbd5d892a8431593540d535f51
Quorum 1/2 | severity minor | category other | src/cli.ts:L583-L584
Warnings:
Root Cause
No structured root-cause result.
Pattern Sweep
No structured pattern-sweep result.

Task Runs
Quorum exploration generated 2026-06-13T17:51:35.920Z
Quorum exploration comment for #1
- dag.ts: replace invalid/deprecated Anthropic default models (nonexistent claude-haiku-4-20250514 and soon-retired claude-sonnet-4-20250514) with claude-opus-4-8 / claude-sonnet-4-6 / claude-haiku-4-5; switch Cursor MED/LOW defaults to composer-2.5
- cli.ts: default --concurrency to 1 for the cursor provider so runs don't trip Cursor's simultaneous Cloud Agent plan limit (4 for anthropic); an explicit --concurrency still overrides
- cursor-cloud-adapter.ts: translate the opaque "Upgrade to Ultra" plan-limit error into an actionable message pointing at --concurrency
- cli.ts: eval scans .quorum/runs/ recursively when --log-dir is omitted (the prior default defeated the recursive scan) and reports the correct directory when no logs are found
- anthropic.ts: drop dead parse-status ternary and now-unused import; parse validation is owned by the runner (applyResult)
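The error translation in cursor-cloud-adapter.ts could be approximated as below. This is a guess at the shape rather than the repo's implementation: the match on "Upgrade to Ultra" relies only on the phrase quoted in the commit message, and the return type and wording of the rewritten message are assumptions.

```typescript
// Assumed marker: the commit message quotes "Upgrade to Ultra" as the opaque plan-limit text.
const PLAN_LIMIT_PATTERN = /upgrade to ultra/i;

// Sketch only; the exported describeCursorError in the repo may differ in signature and wording.
function describeCursorError(error: unknown): Error {
  // Wrap non-Error values so callers always receive an Error.
  const original = error instanceof Error ? error : new Error(String(error));
  // Unrelated errors pass through unchanged.
  if (!PLAN_LIMIT_PATTERN.test(original.message)) return original;
  return new Error(
    "Cursor rejected the launch because the plan's simultaneous Cloud Agent limit was reached. " +
      "Retry with a lower --concurrency value. " +
      `Original error: ${original.message}`,
  );
}

const unrelated = new Error("network timeout");
const rewritten = describeCursorError(new Error("Upgrade to Ultra to run more agents"));
const passedThrough = describeCursorError(unrelated);
const wrapped = describeCursorError("boom");
console.log(rewritten.message);
```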
- logging.test.ts: loadRunLogs() recurses into .quorum/runs/ when no logDir is given (the path the eval default-arg bug broke), returns empty when the dir is absent, and scans only the flat directory when one is passed explicitly
- cursor-cloud-adapter.test.ts: describeCursorError rewrites the simultaneous-agent plan-limit error into actionable guidance, passes unrelated errors through unchanged, and wraps non-Error values
- export describeCursorError so it can be unit-tested
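The three loadRunLogs() behaviours these tests pin down (recursive default, empty result for a missing directory, flat scan for an explicit directory) can be illustrated with a small file finder. `findRunLogs` and its `runsRoot` parameter are hypothetical stand-ins; the real function presumably also parses the logs, which is omitted here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const LOG_NAME = "run.log.jsonl";

// Hypothetical stand-in for loadRunLogs(): returns log file paths instead of parsed logs.
function findRunLogs(logDir?: string, runsRoot: string = path.join(".quorum", "runs")): string[] {
  const recursive = logDir === undefined; // recurse only when no directory was given
  const root = logDir ?? runsRoot;
  if (!fs.existsSync(root)) return []; // absent directory yields no logs
  const found: string[] = [];
  const visit = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (recursive) visit(full);
      } else if (entry.name === LOG_NAME) {
        found.push(full);
      }
    }
  };
  visit(root);
  return found.sort();
}

// Demo in a throwaway directory: one log nested one level below the root.
const tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), "quorum-logs-"));
const runDir = path.join(tmpRoot, "run-1");
fs.mkdirSync(runDir);
fs.writeFileSync(path.join(runDir, LOG_NAME), "{}\n");

const nestedScan = findRunLogs(undefined, tmpRoot); // default path: recursive
const flatScan = findRunLogs(tmpRoot); // explicit directory: flat only
const missingScan = findRunLogs(path.join(tmpRoot, "absent"));
console.log(nestedScan.length, flatScan.length, missingScan.length);
```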
  scoredDoc,
  parsed,
  planOnly: hasFlag(parsed, "plan-only"),
  post: !hasFlag(parsed, "no-post") && !hasFlag(parsed, "dry-run"),
Plan-only triage still posts
Medium Severity
quorum triage-pr --plan-only skips synthesis posting via --no-post, but phase 2 still sets post true unless --no-post or --dry-run is passed. executeExplore then upserts the exploration PR comment even though no cloud agents ran, unlike quorum explore --plan-only.
Reviewed by Cursor Bugbot for commit 04f3ac3.
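One possible way to close the gap Bugbot describes is to treat plan-only as a third read-only flag when computing whether to post. The sketch below is an assumption about a fix, not what the PR did: `shouldPost` is a hypothetical helper, and a plain Set of flag names stands in for the repo's hasFlag(parsed, ...) calls.

```typescript
// A Set of flag names stands in for the repo's parsed-args object (assumption).
type Flags = ReadonlySet<string>;

// Hypothetical helper: any of the three flags makes the run read-only.
function shouldPost(flags: Flags): boolean {
  const readOnly = flags.has("plan-only") || flags.has("no-post") || flags.has("dry-run");
  return !readOnly;
}

console.log(shouldPost(new Set<string>())); // normal run
console.log(shouldPost(new Set(["plan-only"]))); // plan-only run
```

Centralising the rule in one helper would let both phases of triage-pr share it, so synthesis and exploration cannot drift apart.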
Drop the cursor-specific serial default; both providers now default to 4 simultaneous tasks. If a Cursor plan caps concurrent agents, the launch still fails with the actionable describeCursorError message, and --concurrency 1 remains available as an override.
Cursor Bugbot has reviewed your changes using default effort and found 5 potential issues.
There are 6 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit f3dc0ad.
    max_tokens: 4096,
    system: systemPrompt,
    messages: [{ role: "user", content: input.prompt }],
  }),
Anthropic path lacks repository access
High Severity
With --provider anthropic, exploration tasks still use prompts that ask for evidence in the repository, but AnthropicAdapter only sends the stitched text prompt to the Messages API and never uses repoUrl or prUrl. Unlike Cursor Cloud, there is no repo checkout, so investigations cannot reliably inspect the PR code.
Reviewed by Cursor Bugbot for commit f3dc0ad.
…ror text
Addresses Cursor Bugbot + Devin findings on f3dc0ad:
- paths.ts: resolve scripts/ via fileURLToPath (extracted from cli.ts) so Windows drive letters and percent-encoded paths (e.g. spaces) resolve
- retry.ts: word-boundary 5xx match so "5000" no longer triggers retries; backoff sleep is abort-aware and withRetry re-checks the signal after waking
- adapters/anthropic.ts: guard body.content with Array.isArray before iterating
- cursor-cloud-adapter.ts: drop stale "default is 1" from the plan-limit hint
- validate_partition.py: split-out singletons inherit the parent cluster's severity instead of hardcoding "minor"
- tests: paths + retry regression suites; assert no hardcoded default in the Cursor adapter error message
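The paths.ts item is easier to see with a concrete case. The snippet below builds a file URL for a path containing a space and compares the raw URL pathname with the result of Node's fileURLToPath. The directory layout (a dist/cli.js with a sibling scripts/ directory) is an assumption made for the demo, not a statement about the repo.

```typescript
import * as os from "node:os";
import * as path from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// Assumed layout for illustration: <root>/dist/cli.js with a sibling <root>/scripts/.
const onDisk = path.join(os.tmpdir(), "my repo", "dist", "cli.js");
const moduleUrl = pathToFileURL(onDisk); // stands in for import.meta.url

// The raw pathname keeps percent-encoding, so the space shows up as %20.
const viaPathname = moduleUrl.pathname;

// fileURLToPath decodes the URL back into a native filesystem path.
const viaHelper = fileURLToPath(moduleUrl);

const scriptsDir = path.resolve(path.dirname(viaHelper), "..", "scripts");
console.log({ viaPathname, viaHelper, scriptsDir });
```

Using the decoded path as the base for path.resolve is what keeps directories with spaces, and drive-letter paths on Windows, resolvable.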
Bugbot couldn't run - usage limit reached
Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit. A user or team admin can review and increase usage limits in the Cursor dashboard. (requestId: serverGenReqId_8c839896-9c51-4808-82a4-bcf62b431140)
- Fold 429 into the word-boundary status-code regex so number tokens like "4290" aren't read as a 429 (same class as the earlier 5xx/"5000" fix).
- Match "rate" only within "rate limit" (rate limit / rate_limit / ratelimit) so it no longer fires on words like "generate"/"moderate". A bare \brate\b would regress rate_limit_error (underscore is a word char), so match the phrase.
- Add a regression test for rate-limit phrasings vs lookalikes.
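The matching rules in this commit can be sketched with two regular expressions. The exact patterns in retry.ts are not shown in the PR, so the ones below are assumptions that reproduce the described behaviour; `isTransientError` is a hypothetical name. The leading \b on the rate-limit pattern is an extra guard so that a phrase like "generate limit" does not match.

```typescript
// Assumed patterns; retry.ts may use different expressions with the same intent.
// Status codes must be whole number tokens, so "5000" and "4290" do not match.
const STATUS_PATTERN = /\b(?:429|5\d{2})\b/;

// "rate" counts only as part of the phrase: "rate limit", "rate_limit", or "ratelimit".
const RATE_LIMIT_PATTERN = /\brate[ _]?limit/i;

function isTransientError(message: string): boolean {
  return STATUS_PATTERN.test(message) || RATE_LIMIT_PATTERN.test(message);
}

const samples = [
  "HTTP 503 Service Unavailable",
  "429 Too Many Requests",
  "rate_limit_error",
  "processed 5000 rows",
  "failed to generate output",
];
for (const sample of samples) {
  console.log(sample, "->", isTransientError(sample));
}
```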
tsc does not prune outputs for sources removed by a branch switch, so a stale dist/test/*.js can run against the wrong sources. A prebuild step that clears dist/ keeps npm test honest locally and in CI.
  signal: input.signal,
  body: JSON.stringify({
    model: input.model,
    max_tokens: 4096,
🚩 Hardcoded max_tokens: 4096 in Anthropic adapter
The Anthropic adapter hardcodes max_tokens: 4096 at src/adapters/anthropic.ts:49. While the Anthropic API requires this parameter, 4096 tokens may be insufficient for complex exploration tasks that need a detailed human-readable explanation plus a fenced JSON block. Truncated responses are handled gracefully (the runner records a parseError via extractMarkedJson), and the adapter logs a warning at line 76-78. However, unlike the model ID (configurable via env vars), max_tokens has no override mechanism — users hitting truncation limits would need to modify the source.
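If an override were wanted, it could follow the same env-var pattern as the model IDs. The sketch below is purely illustrative: `QUORUM_MAX_TOKENS` and `resolveMaxTokens` are hypothetical names that do not exist in this PR, and falling back to 4096 on invalid input is an assumed policy.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_MAX_TOKENS = 4096; // the value currently hardcoded in the adapter

// Hypothetical: QUORUM_MAX_TOKENS is not an existing Quorum setting.
function resolveMaxTokens(env: Env): number {
  const raw = env["QUORUM_MAX_TOKENS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_TOKENS;
  const parsed = Number(raw);
  // Ignore anything that is not a positive integer rather than sending it to the API.
  if (!Number.isInteger(parsed) || parsed <= 0) return DEFAULT_MAX_TOKENS;
  return parsed;
}

console.log(resolveMaxTokens({}));
console.log(resolveMaxTokens({ QUORUM_MAX_TOKENS: "8192" }));
```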
  export const CURSOR_DEFAULT_MODELS: Record<Complexity, string> = {
    HIGH: "gpt-5.3-codex",
-   MED: "composer-2",
-   LOW: "auto-low",
+   MED: "composer-2.5",
+   LOW: "composer-2.5",
  };
🚩 Default model changes: LOW complexity moved from auto-low to composer-2.5
The Cursor default models changed: MED from composer-2 to composer-2.5, and LOW from auto-low to composer-2.5 (src/dag.ts:5-8). This means LOW and MED now use the same model. Users relying on the old auto-low behavior for cost optimization on simple tasks will see different model selection unless they set QUORUM_MODEL_LOW env var. This is a behavioral change that affects existing users who don't override models.
Bugbot couldn't run - usage limit reached
Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit. A user or team admin can review and increase usage limits in the Cursor dashboard. (requestId: serverGenReqId_0162c318-992b-44db-a944-fe6febdfc2c2)


Summary
Production-ready polish across 12 workstreams to make Quorum usable by any Claude + Cursor user.

Distribution & Installation
- Extracted skill scripts (fetch_findings.sh, validate_partition.py, post_synthesis.py) and rubric from the .skill zip into the repo source tree
- Made the package installable: dropped "private": true, added "files" field, prepublishOnly script, build:skill to auto-rebuild the zip from source
- .github/workflows/ci.yml: typecheck, test, and build:skill on push/PR
- quorum setup command: validates Node 22+, gh CLI, jq, python3, API keys, and skill installation
- quorum triage-pr command: runs synthesis + exploration end-to-end

Robustness & Flexibility
- Direct GitHub REST API via GITHUB_TOKEN, with gh CLI fallback
- --provider anthropic flag and AnthropicAdapter for direct Anthropic Messages API usage
- Configurable model names via QUORUM_MODEL_HIGH/MED/LOW env vars

Observability
- Structured run logging: run.log.jsonl with timestamped dag/task lifecycle events
- quorum eval command: computes task success rate, cluster counts, avg duration from accumulated logs

Standalone Clustering
- quorum synthesize command: runs the full Phase 1 pipeline (fetch -> all-singletons clustering -> validate/score -> post) without needing an AI agent session

Note
Medium Risk
Touches PR comment posting, external API keys (Cursor/Anthropic/GitHub), and concurrent cloud agent execution; changes are mostly additive with fallbacks, but misconfiguration could spam or fail synthesis/exploration on real PRs.
Overview
Production-ready distribution and automation for Quorum: the package is npm-publishable (files, prepublishOnly, build:skill), skill assets and scripts live in-repo, and CI runs typecheck, tests, and skill rebuild on main PRs.

New CLI workflows: quorum synthesize runs Phase 1 (fetch bot findings → validate/score → optional PR synthesis comment) without AI clustering; quorum triage-pr chains synthesis + exploration; quorum setup checks toolchain and keys; quorum eval aggregates stats from run.log.jsonl.

Exploration backends: --provider anthropic / AnthropicAdapter (Messages API) alongside Cursor Cloud, with per-provider default models and QUORUM_MODEL_* overrides. Both adapters use retry with backoff on transient errors; Cursor plan-limit errors get clearer messaging.

GitHub: exploration comment upsert and synthesis recovery prefer GITHUB_TOKEN REST with gh fallback.

Observability: DAG runs append structured JSONL lifecycle logs; runner wires logging at task/dag boundaries.

Reviewed by Cursor Bugbot for commit e83fc6d. Bugbot is set up for automated code reviews on this repo.