feat: add explore, refine, critique refinement pipeline agents#2439
feat: add explore, refine, critique refinement pipeline agents#2439ascerra wants to merge 8 commits into
Conversation
E2E tests did not runE2E tests run automatically for org/repo members and collaborators on pull requests. For other contributors, a maintainer must add the See E2E testing guide for details. |
Site previewPreview: https://97fc4ff5-site.fullsend-ai.workers.dev Commit: |
|
🤖 Review · |
Three new first-class agents forming a chained refinement pipeline: explore → refine → critique, with artifact-based data passing and revision loops (max 3 rounds). Includes: Go role registration, GitHub App configs, scaffold files (agents, harness, policies, schemas, scripts, skills, env, workflows), reusable workflows, Jira integration with credential isolation, JIRA_PROJECT_VISIBILITY safety check, shared pipeline-helpers.sh, and two ADRs (0049 pipeline architecture, 0050 Jira integration). Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Adam Scerra <ascerra@redhat.com>
- Merge TestHarnessForgeRunnerEnvMergeRefinementPipeline into existing TestHarnessForgeRunnerEnvMerge table to avoid duplicate scaffold extraction - Add Jira secret threading assertions (JIRA_HOST, JIRA_EMAIL, JIRA_API_TOKEN, jira_project_visibility) to refine/critique content tests - Fix post-explore-test.sh no-op assertion (replace && true with real check) - Add /tmp/workspace cleanup to post-critique-test.sh trap to prevent CI state leakage - Add assert_gh_not_called helpers to explore/refine test scripts - Add invalid JSON test case to post-refine-test.sh Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Adam Scerra <ascerra@redhat.com>
- create-children: actions:write → actions:read (terminal step, no dispatch) - Renumber ADRs: 0049→0051, 0050→0052 (avoid conflicts with other branches) - Add required YAML frontmatter to both ADRs - Add missing agent doc sections (Control labels, Configuration and extension) - Add OPTIONAL_SECTIONS to hack/lint-agent-docs for pipeline-specific headings - Add placeholder agent icons (explore, refine, critique) - Fix ruff lint/format violations in markdown-to-adf.py - Fix shellcheck SC2034 (HAS_PE) and SC2129 (redirect grouping) - Fix trailing newlines in policy YAML files - Add header comments to reusable workflows matching existing convention Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Adam Scerra <ascerra@redhat.com>
0e6497a to
3ff5056
Compare
|
🤖 Review · |
The *.env gitignore rule prevented these from being committed. Force-add to match existing agent env files (triage.env, review.env, etc.). Signed-off-by: Adam Scerra <ascerra@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
|
🤖 Review · |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Signed-off-by: Adam Scerra <ascerra@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
|
🤖 Finished Review · ✅ Success · Started 5:44 PM UTC · Completed 6:21 PM UTC |
ReviewFindingsHigh
Medium
Low
Info
Previous runReviewFindingsHigh
Medium
Low
Info
Labels: PR adds three new pipeline agents with dispatch workflows, harness configs, and supporting documentation. Previous runReviewFindingsHigh
Medium
Low
Info
Previous run (2)ReviewFindingsHigh
Medium
Low
Info
Labels: PR adds three new pipeline agents with dispatch workflows, harness configs, and supporting documentation. |
- pre-explore.sh: Jira token preflight check — catches 401/403 early with clear error message instead of opaque pipeline crash - create-children.sh: JQ dry-run validation — verifies children array structure (title, type, description) before touching any external API - post-critique.sh: synthesized review history table on max-rounds escalation so humans see what each round debated, plus refine-stalled label - sanitize-artifacts.sh: PII redaction metrics — counts and reports what was scrubbed per category, writes redaction-summary.json Signed-off-by: Adam Scerra <ascerra@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
|
🤖 Review · |
- Remove sanitize-artifacts.sh + test (dead: JIRA_PROJECT_VISIBILITY safety check blocks the scenario it was designed for) - Remove scaffold.go, scaffold_test.go, Makefile references - Remove unwired confidence_threshold from post-explore.sh dispatch - Fix agent docs: control labels now match implementation, auto_create default corrected to false Signed-off-by: Adam Scerra <ascerra@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
|
🤖 Finished Review · ✅ Success · Started 6:59 PM UTC · Completed 7:16 PM UTC |
| # Validate parent_title references resolve within the tree | ||
| ORPHAN_REFS=$(jq -r ' | ||
| [.children[].title] as $titles | | ||
| [.children[] | select(.parent_title != null and .parent_title != "") | |
There was a problem hiding this comment.
[medium] logic-error
The orphan parent_title validation uses jq inside() which performs substring matching. [.parent_title] | inside($titles) returns true if parent_title is a substring of any title, not an exact match. A parent_title of 'API' would falsely match 'Add API Gateway', suppressing the orphan warning.
Suggested fix: Replace select([.parent_title] | inside($titles) | not) with select(.parent_title as $pt | $titles | any(. == $pt) | not).
| echo "::error::Use a private config repo, or set JIRA_PROJECT_VISIBILITY=public for public Jira projects." | ||
| exit 1 | ||
| fi | ||
| fi |
There was a problem hiding this comment.
[medium] GHA-workflow-command-injection
RAW_COMMAND from inputs.command or github.event.client_payload.command is echoed unsanitized. A repository_dispatch sender can embed :: workflow commands. TARGET_REPO parsed from RAW_COMMAND is also echoed inside ::notice:: without sanitization.
Suggested fix: Sanitize RAW_COMMAND and TARGET_REPO by stripping :: sequences, or use stop-commands guards.
| if [[ "$REPO_VISIBILITY" == "public" ]]; then | ||
| echo "::error::SECURITY: Private Jira source blocked — this repo (${CONFIG_REPO}) is public." | ||
| echo "::error::Private Jira data would leak into public workflow artifacts and logs." | ||
| echo "::error::Options:" |
There was a problem hiding this comment.
[medium] fail-open
The private-Jira-on-public-repo safety check defaults REPO_VISIBILITY to 'unknown' when gh api fails. Since the check only blocks on 'public', API failure causes the check to pass (fail-open), potentially allowing private Jira data to leak into public repo artifacts. Same pattern in jira-dispatch.yml and jira-comment-poller.yml.
Suggested fix: Treat unknown visibility as unsafe: block on anything other than 'private' or 'internal', or fail the check when the API call fails.
|
|
||
| This pipeline extends fullsend's existing patterns: | ||
|
|
||
| 1. **Chained agents with artifact passing** (new): Existing agents are standalone — triage triggers code via a `ready-to-code` label, which fires a webhook, and the code agent reads the issue fresh from GitHub. No data passes between them. The refinement pipeline is different because each stage produces structured context that the next stage needs: explore produces `exploration_context.json` (~50-100KB of analyzed research), refine produces `refine-result.json` (the full decomposition plan with children, acceptance criteria, etc.), and critique needs both to evaluate the plan. Labels can signal "go/no-go" but can't carry a run ID for artifact correlation, so we use explicit `workflow_dispatch` chaining instead. For example, post-explore.sh chains to refine like this: |
There was a problem hiding this comment.
Hm, I think this is probably the decision. Adding new agents is non controversial, but having some agents depend on the execution details of others is a new idea. I suggest refactoring this so that the decision is adding this new pattern and the context sets up why that's necessary.
There was a problem hiding this comment.
Good call — refactored. The ADR now leads with agent chaining as the decision (one agent's post-script triggers the next via workflow_dispatch, passing a run ID for artifact correlation). The three refinement agents are the first use of this pattern, not the decision itself.
Also added an Alternatives Considered section covering:
- Single-workflow pipeline (all 3 stages as jobs in one run — would reduce apps from 3→1, eliminates data leakage risk, but requires harness changes tracked in Design validation loop and code→review→code orchestration patterns for harness definitions #234 and Investigate deterministic orchestration for intra-stage multi-agent pipelines #1817)
- Issue comments as data store (rejected — 65K char limit vs ~100KB exploration context)
- Repository dispatch with client_payload (rejected — size/property constraints without clear benefit)
The single-workflow alternative is cross-referenced to the Deferred table as the strongest Phase 2 candidate.
There was a problem hiding this comment.
I see issue comments were considered and rejected for the 65K limit, which makes sense. Were issue attachments considered though? JIRA has a proper attachment API (POST /rest/api/2/issue/{key}/attachments) and the size limits are way more generous than 65K.
GitHub issues are trickier — the UI supports file attachments up to 25MB but there's no public API for uploading them. So this might only work cleanly on the JIRA side.
If we could attach the exploration JSON to the JIRA ticket, then /fs-refine pulls it from there instead of snaking back to the workflow run. That'd also make /fs-explore useful as a standalone thing — you explore, the artifact lives on the ticket, and refinement can happen later (or not at all).
There was a problem hiding this comment.
For GitHub, maybe we post a comment on the issue with a link back to the workflow artifact. The downstream agent navigates to the issue first, finds the link, follows it. Extra hop, but it decouples the agents from each other's execution details.
Right now the refine agent has to know that the explore agent runs as a GitHub Actions workflow and how to find its run. That coupling is the thing that feels off — other agents aren't aware of each other's runtime like that (except retro, which is arguably special). If the issue is always the rendezvous point, agents just need to know how to read from the issue tracker, which they already do.
…rame - Change REPO_VISIBILITY fallback from "unknown" to "public" in pre-explore.sh, jira-dispatch.yml, jira-comment-poller.yml so gh api failures trigger the safety block (fail-closed) - Add ::stop-commands:: guards around user-controlled echo blocks in refine-dispatch.yml and jira-dispatch.yml to prevent GHA command injection - Convert jq|while pipes to process substitution in jira-comment-poller.yml so DISPATCHED counter propagates to parent shell - Standardize ERROR: prefixes to ::error:: annotations in pre-explore.sh - Reframe ADR 0051 per Ralph's feedback: decision is agent chaining as a new orchestration pattern, not adding 3 agents. Add Alternatives Considered section with single-workflow pipeline (#234, #1817), issue-comment data store, and repository_dispatch options Signed-off-by: Adam Scerra <ascerra@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
|
🤖 Finished Review · ✅ Success · Started 8:01 PM UTC · Completed 8:41 PM UTC |
| | `/fs-refine` | Issue comment | Triggers the refinement pipeline starting with explore | | ||
|
|
||
| The `/fs-refine` command kicks off the full pipeline. Explore runs first, then automatically chains to [refine](refine.md). There is no separate command to run explore in isolation. |
There was a problem hiding this comment.
This feels weird, that the agent's command is /fs-refine when the agent is the "explore" agent. Should it be /fs-explore?
I think what this really reveals to me is that explore agent is not useful on its own - as this is currently structured. Its only purpose is to be a prelude to the refine agent.
Perhaps we should restructure that and treat it as a standalone thing. i.e., what if you could just /fs-explore a feature and get an exploration json attached to the JIRA (or to the gh issue, if that's possible to upload attachments). Then /fs-refine would launch the refinement agent that takes in that attachment from the jira or from the github issue, rather than snaking back to the gh workflow run.
There was a problem hiding this comment.
OTOH, if we want explore->refine->critique->(loop with refine until 👍) to be an uninterrupted thing, then perhaps it should just be one agent inside one giant claude code session - one harness file.
|
|
||
| ## Decision | ||
|
|
||
| We will introduce **agent chaining** as a new orchestration pattern in fullsend: the ability for one agent's post-script to trigger a subsequent agent via `workflow_dispatch`, passing a run ID that the next agent uses to download the previous stage's artifacts. |
There was a problem hiding this comment.
Does this mean that the explore agent always triggers the refine agent and the refine agent always triggers the critique agent?
Is there anyway to "start" the workflow at the refinement agent, if, say, humans have feedback and want to tweak something without having to re-do all of the exploration?
| --jq '[.[] | select(.status == "completed" and .conclusion == "success")] | .[0].databaseId // empty' \ | ||
| 2>/dev/null || true) | ||
|
|
||
| FOLLOWUP_ARGS=( |
There was a problem hiding this comment.
[medium] logic-error
Follow-up path (IS_FOLLOWUP=true) searches for the most recent successful explore run without filtering by issue number. If multiple issues have had explore runs, the wrong exploration context will be attached to the refine stage.
Suggested fix: Add a displayTitle filter matching the issue number to the gh run list jq query, consistent with how /fs-create searches for refine runs.
| # Validate parent_title references resolve within the tree | ||
| ORPHAN_REFS=$(jq -r ' | ||
| [.children[].title] as $titles | | ||
| [.children[] | select(.parent_title != null and .parent_title != "") | |
There was a problem hiding this comment.
[medium] logic-error
Orphan parent_title validation uses jq inside() for substring matching instead of exact match. Runtime behavior is correct (creation loop uses exact match), but validation warning is misleading.
Suggested fix: Replace inside() with exact matching.
Summary
Adds three new first-class agents forming a chained refinement pipeline: explore → refine → critique, triggered by
/fs-refineon an issue. Includes Jira integration with credential isolation.Review guide
~70% of this diff is docs, declarative config, and tests. Focus review time here:
scripts/post-critique.shscripts/create-children.shscripts/pre-explore.shscripts/sanitize-artifacts.shscripts/pipeline-helpers.shrefine-dispatch.yml,jira-dispatch.yml,jira-comment-poller.yml)What's NOT included (Phase 2+)
fullsend jira enrollCLI commandTest plan
make go-test— all packages pass (scaffold, harness, config, forge/github)make go-vet— cleanmake script-test— 5 new test scripts pass (post-explore, post-refine, post-critique, create-children, sanitize-artifacts)/fs-refine(planned)Made with Cursor