perf(#2354): bound enrollment wait with timeout and backoff#65
perf(#2354): bound enrollment wait with timeout and backoff#65guyoron1 wants to merge 41 commits into
Conversation
Design for a new `prerequisites` triage action that replaces `blocked`. The agent can now express both existing blockers and new issues that need to be created upstream before progress can happen. Includes allowlist configuration for cross-repo issue creation and a degraded path when targets are not authorized. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…nd-ai#401) Seven-task plan covering config structs, JSON schema, agent prompt, post-script, user docs, and caller updates. TDD approach with exact file paths and code blocks. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Add CreateIssuesConfig and AllowTargets types to both OrgConfig and PerRepoConfig. NewOrgConfig populates defaults with the org and fullsend-ai/fullsend. NewPerRepoConfig populates with the target repo and fullsend-ai/fullsend. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…ues (fullsend-ai#401) Pass org name and target repo to config constructors so create_issues defaults are populated at install time. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…pt (fullsend-ai#401) The triage agent can now recommend creating upstream issues via the prerequisites action's create array, in addition to referencing existing blockers. Adds hard constraint against emitting sufficient when prerequisites exist. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…d-ai#401) Update triage agent docs to explain the new prerequisites action and the create_issues.allow_targets configuration surface. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…#401) Replace the blocked handler with prerequisites. The post-script reads the create_issues allowlist from config.yaml, creates permitted upstream issues via gh, and includes collapsed draft bodies for disallowed or failed creates so humans can file them manually. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…ullsend-ai#401) The agent prompt referenced a nonexistent `prerequisites` label when checking for prior blockers — the post-script actually applies the `blocked` label. Also removed unused SOURCE_ORG variable from post-triage.sh. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…end-ai#401) Replace the four blocked-action test cases with five prerequisites-action test cases that exercise the new schema (existing[], create[], allowlist validation). Set up GITHUB_WORKSPACE with a config.yaml fixture and add a mock gh issue-create handler that returns a fake URL. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…ullsend-ai#401) Replace blocked-action test cases with prerequisites-action equivalents and update the expected property list (blocked_by → prerequisites). Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…d-ai#401) - Replace stale blocked-* schema validation tests with prerequisites equivalents (missing field, both arrays empty, malformed URL) - Fix validateCreateIssues to reject malformed repo formats like "/", "/repo", "owner/" - Align triage.md section 2c terminology from "blocker" to "prerequisite" consistently - Update bugfix-workflow.md and architecture.md to document upstream issue creation capability - Emit ::warning:: when yq is unavailable so silent degradation of cross-repo issue creation is diagnosable Signed-off-by: Ralph Bean <rbean@redhat.com> Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Adds a skill that summarizes recent E2E Tests workflow runs on main, presents them in a table with clickable links, and diagnoses failures by grepping failed step logs for signal lines. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
The markdown link linter was parsing `[run-id](url)` as a real file reference. Wrapping it in backticks marks it as a code example. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
- Move list-runs.sh to scripts/ subdirectory to match convention - Add bash command prefix to allowed-tools declaration - Clarify status vs conclusion field handling for in-progress runs - Use case-insensitive grep to catch Timeout/timeout variants - Tighten frontmatter description Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
When multiple runners exhaust the GraphQL rate limit simultaneously, they all sleep until the same reset timestamp and wake up together. The existing slot jitter (250-750ms) is too narrow to desynchronize them, causing collisions that surface as "unknown owner type" errors from gh project view. Add a post-reset spread of up to 60s (configurable via GITHUB_CSMA_SPREAD_MAX_SEC) so runners fan out over a wide window after waking from a rate-limit sleep. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…ss schema Add end-to-end integration tests covering the full Phase 2 pipeline (PR 6 of 6 in the ADR-0045 forge-portable harness schema adoption): - LoadWithBase wrapper→scaffold merge with field inheritance and override - All scaffold templates forge resolution (pre/post scripts, runner_env) - Backward compatibility via Load() (no forge platform) - DiscoverAgents scaffold directory scanning with correct role/slug pairs - HarnessContentHash integrity verification against embedded content - LoadRaw generated wrapper format validation - ResolveForge scaffold runner_env merge with per-template key assertions Resolves fullsend-ai#2328 Signed-off-by: Greg Allen <greg@fullsend.ai> Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Greg Allen <gallen@redhat.com>
Status comments on PRs/issues get stuck in "Started" when the pre-minted agent token expires before PostCompletion runs. Instead of relying on a static token, have the fullsend binary mint its own fresh short-lived token via mintclient.MintToken() before each status comment API call. Key changes: - Add ClientFactory pattern to statuscomment.Notifier so each API operation gets a freshly minted forge.Client - Add --mint-url flag to fullsend run and reconcile-status commands - Add mint-url input to action.yml and all reusable workflows - Deprecate --status-token (run) and --token (reconcile-status) with runtime warnings; hidden from help output - Deprecate status-token input in action.yml; mask unconditionally - Validate token format before ::add-mask:: to prevent workflow command injection - Move refreshClient below commentEnabled guard in PostCompletion - Make refreshClient failure in cleanup path fail-open (warning) - Add "code" -> "coder" role alias for agent name resolution Closes fullsend-ai#2130 Signed-off-by: Greg Allen <gallen@redhat.com> Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Greg Allen <gallen@redhat.com>
…window fix: widen CSMA post-reset jitter to prevent thundering herd
The previous backtick-escaping attempt (7c40a70) did not prevent lychee from resolving `url` as a relative file path. Remove the markdown link syntax entirely so the link checker has nothing to chase. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…kill feat(skills): add e2e-health skill
…ter_rate_limit PR fullsend-ai#2304 added post-reset spread to github_csma_sense to prevent thundering herd when runners wake after a rate-limit reset. The structurally parallel _github_csma_sleep_after_rate_limit function was missing the same treatment — multiple runners hitting a 429 would all wake at the same reset timestamp and fire simultaneously. Extract the spread logic into a shared _github_csma_post_reset_spread helper and call it from both github_csma_sense (replacing the inline code) and _github_csma_sleep_after_rate_limit (added after the backoff sleep). Both paths now use GITHUB_CSMA_SPREAD_MAX_SEC to stagger runner wake times. Note: pre-commit and make lint could not run due to shellcheck-py network restriction in sandbox. Scaffold Go tests pass. Closes fullsend-ai#2343
…spread-rate-limit fix(fullsend-ai#2343): add post-reset spread to _github_csma_sleep_after_rate_limit
The NewOrgConfig call gained a 6th parameter (org string) on this branch. Main didn't have it yet, causing a conflict. Keep the 6-parameter version to match the current function signature. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
…atus-token fix(fullsend-ai#2130): mint fresh tokens for status comments
test: Phase 2 integration tests for ADR-0045 forge-portable harness
…-decompose-issues feat(triage): add prerequisites action for upstream issue creation (fullsend-ai#401)
Replace the hardcoded 36-iteration fixed-interval polling loop in
awaitWorkflowRun with a time-bounded loop using exponential
backoff. The total wait is capped at 3 minutes (matching the
previous maximum), but polling starts at 2s intervals and doubles
up to 15s, reducing API calls and giving faster feedback when
the workflow completes quickly.
Changes:
- Add enrollmentWaitTimeout, enrollmentPollInitial, and
enrollmentPollMax constants to control polling behavior
- Replace iteration-count loop with deadline-based loop
- Use exponential backoff (2s → 4s → 8s → 15s cap) via
nextInterval helper
- Improve progress messages to show elapsed time instead of
attempt numbers
- Include actionable guidance in timeout error message
("check the workflow in .fullsend and re-run install")
- Add progress indicator before starting the wait
Closes fullsend-ai#2354
|
/fs-qf |
|
🤖 Finished Review · ✅ Success · Started 12:18 PM UTC · Completed 12:33 PM UTC |
ReviewReason: stale-head The review agent reviewed commit Previous runReviewReason: stale-head The review agent reviewed commit Previous run (2)ReviewReason: stale-head The review agent reviewed commit Previous run (3)ReviewReason: stale-head The review agent reviewed commit |
|
/fs-review |
|
🤖 Finished Review · ✅ Success · Started 12:36 PM UTC · Completed 12:52 PM UTC |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…[skip ci] Resolved 2 major and 5 minor findings across 2 iterations: - Removed internal Go type references from formal STP sections - Clarified STP purpose as regression coverage for PR fullsend-ai#1954 - Rewrote implementation-level scenarios to user-observable outcomes - Simplified Testing Tools, removed N/A risk boilerplate - Added admin CLI dispatch to Out of Scope Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/fs-review |
|
🤖 Finished Review · ✅ Success · Started 12:55 PM UTC · Completed 1:14 PM UTC |
|
/fs-review |
|
🤖 Finished Review · ✅ Success · Started 1:17 PM UTC · Completed 1:32 PM UTC |
STD refinement iteration 1 — upgraded verdict from APPROVED_WITH_FINDINGS (82/100) to APPROVED (94/100). Resolved 7 MAJOR and 5 MINOR findings: - Standardize tier values from "Functional" to "Tier 1" across all 21 scenarios - Add patterns.primary_pattern to all 21 scenarios with semantic assignments - Remove related_prs from document_metadata (belongs in STP only) - Replace literal Go code in test_data with declarative descriptions - Add missing imports (bytes, errors, runtime, regexp) - Align code_structure with actual stub t.Run grouping - Add buffer-inspection steps and specific verification patterns to PSE blocks - Add test_data sections to all scenarios - Fix vague/informal assertions (scenarios 018, 021) - Deduplicate parent-level Preconditions in all 8 Go stub files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/fs-review |
Replaces intermediate pipeline artifacts with organized test files. Total: 8 test files → qf-tests/fullsend-aiGH-2354/ Jira: fullsend-aiGH-2354 [skip ci]
QualityFlow Pipeline Summary
Test Output
Issue: GH-2354 Generated by QualityFlow |
Mirror of upstream fullsend-ai#2359
Adds a timeout and exponential backoff to the enrollment wait loop, preventing unbounded waits when the enrollment PR is stuck or slow to merge.