perf(#2354): bound enrollment wait with timeout and backoff #65

Closed
guyoron1 wants to merge 41 commits into main from mirror/2359-2354-enrollment-wait-timeout

Conversation

@guyoron1

Owner

Mirror of upstream fullsend-ai#2359

Adds a timeout and exponential backoff to the enrollment wait loop, preventing unbounded waits when the enrollment PR is stuck or slow to merge.

ralphbean and others added 29 commits June 11, 2026 15:45
Design for a new `prerequisites` triage action that replaces `blocked`.
The agent can now express both existing blockers and new issues that need
to be created upstream before progress can happen. Includes allowlist
configuration for cross-repo issue creation and a degraded path when
targets are not authorized.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…nd-ai#401)

Seven-task plan covering config structs, JSON schema, agent prompt,
post-script, user docs, and caller updates. TDD approach with exact
file paths and code blocks.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Add CreateIssuesConfig and AllowTargets types to both OrgConfig and
PerRepoConfig. NewOrgConfig populates defaults with the org and
fullsend-ai/fullsend. NewPerRepoConfig populates with the target repo
and fullsend-ai/fullsend.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
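
The allowlist described in the commit above can be pictured roughly as follows. This is a minimal sketch, not the project's code: the type names `CreateIssuesConfig` and `AllowTargets` come from the commit message, while the `Orgs` and `Repos` fields, the `defaultOrgCreateIssues` helper, and the `Allows` method are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// AllowTargets and CreateIssuesConfig reuse the type names from the commit
// message; the fields below are assumptions made for illustration.
type AllowTargets struct {
	Orgs  []string // any repo under these owners is allowed (assumed field)
	Repos []string // exact owner/repo entries (assumed field)
}

type CreateIssuesConfig struct {
	AllowTargets AllowTargets
}

// defaultOrgCreateIssues is a hypothetical helper showing the defaults the
// commit describes for an org config: the org itself plus fullsend-ai/fullsend.
func defaultOrgCreateIssues(org string) CreateIssuesConfig {
	return CreateIssuesConfig{AllowTargets: AllowTargets{
		Orgs:  []string{org},
		Repos: []string{"fullsend-ai/fullsend"},
	}}
}

// Allows reports whether an issue may be created in target ("owner/repo").
func (c CreateIssuesConfig) Allows(target string) bool {
	owner, repo, ok := strings.Cut(target, "/")
	if !ok || owner == "" || repo == "" {
		return false
	}
	for _, o := range c.AllowTargets.Orgs {
		if o == owner {
			return true
		}
	}
	for _, r := range c.AllowTargets.Repos {
		if r == target {
			return true
		}
	}
	return false
}

func main() {
	cfg := defaultOrgCreateIssues("acme")
	for _, t := range []string{"acme/widgets", "fullsend-ai/fullsend", "other/repo"} {
		fmt.Printf("%s allowed: %v\n", t, cfg.Allows(t))
	}
}
```

Under these assumptions, a target outside the installing org and outside the explicit repo list is rejected, which corresponds to the degraded path mentioned in the design commit.
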
…ues (fullsend-ai#401)

Pass org name and target repo to config constructors so create_issues
defaults are populated at install time.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
)

Replace the blocked action and blocked_by field with a prerequisites
action containing existing[] and create[] arrays. At least one array
must be non-empty.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
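
The "at least one array must be non-empty" rule from the commit above could be checked along these lines. This is a sketch under assumptions: the real constraint lives in a JSON schema, and the Go types, the `IssueDraft` fields, and the `validatePrerequisites` function are hypothetical names used only to illustrate the rule.

```go
package main

import (
	"errors"
	"fmt"
)

// IssueDraft is a hypothetical shape for one entry of create[].
type IssueDraft struct {
	Repo  string `json:"repo"`
	Title string `json:"title"`
	Body  string `json:"body"`
}

// Prerequisites mirrors the existing[] and create[] arrays named in the
// commit message; the element types are assumptions.
type Prerequisites struct {
	Existing []string     `json:"existing"`
	Create   []IssueDraft `json:"create"`
}

// validatePrerequisites enforces that at least one array is non-empty.
func validatePrerequisites(p Prerequisites) error {
	if len(p.Existing) == 0 && len(p.Create) == 0 {
		return errors.New("prerequisites: at least one of existing or create must be non-empty")
	}
	return nil
}

func main() {
	empty := Prerequisites{}
	fmt.Println("empty:", validatePrerequisites(empty))

	withExisting := Prerequisites{Existing: []string{"https://github.com/fullsend-ai/fullsend/issues/401"}}
	fmt.Println("with existing:", validatePrerequisites(withExisting))
}
```
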
…pt (fullsend-ai#401)

The triage agent can now recommend creating upstream issues via the
prerequisites action's create array, in addition to referencing existing
blockers. Adds hard constraint against emitting sufficient when
prerequisites exist.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…d-ai#401)

Update triage agent docs to explain the new prerequisites action and the
create_issues.allow_targets configuration surface.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…#401)

Replace the blocked handler with prerequisites. The post-script reads
the create_issues allowlist from config.yaml, creates permitted upstream
issues via gh, and includes collapsed draft bodies for disallowed or
failed creates so humans can file them manually.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ullsend-ai#401)

The agent prompt referenced a nonexistent `prerequisites` label when
checking for prior blockers — the post-script actually applies the
`blocked` label. Also removed unused SOURCE_ORG variable from
post-triage.sh.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…end-ai#401)

Replace the four blocked-action test cases with five prerequisites-action
test cases that exercise the new schema (existing[], create[], allowlist
validation). Set up GITHUB_WORKSPACE with a config.yaml fixture and add
a mock gh issue-create handler that returns a fake URL.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ullsend-ai#401)

Replace blocked-action test cases with prerequisites-action equivalents
and update the expected property list (blocked_by → prerequisites).

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…d-ai#401)

- Replace stale blocked-* schema validation tests with prerequisites
  equivalents (missing field, both arrays empty, malformed URL)
- Fix validateCreateIssues to reject malformed repo formats like "/",
  "/repo", "owner/"
- Align triage.md section 2c terminology from "blocker" to
  "prerequisite" consistently
- Update bugfix-workflow.md and architecture.md to document upstream
  issue creation capability
- Emit ::warning:: when yq is unavailable so silent degradation of
  cross-repo issue creation is diagnosable

Signed-off-by: Ralph Bean <rbean@redhat.com>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
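
The malformed repo formats listed in the commit above ("/", "/repo", "owner/") all fail a simple two-part check. The following is a minimal sketch of such a check; `validRepoFormat` is a hypothetical helper, and how the real `validateCreateIssues` performs its validation is not shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// validRepoFormat reports whether s has the form "owner/repo" with both
// parts non-empty and no further slashes. Hypothetical helper.
func validRepoFormat(s string) bool {
	owner, repo, ok := strings.Cut(s, "/")
	return ok && owner != "" && repo != "" && !strings.Contains(repo, "/")
}

func main() {
	for _, s := range []string{"/", "/repo", "owner/", "fullsend-ai/fullsend"} {
		fmt.Printf("%q valid: %v\n", s, validRepoFormat(s))
	}
}
```
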
Adds a skill that summarizes recent E2E Tests workflow runs on main,
presents them in a table with clickable links, and diagnoses failures
by grepping failed step logs for signal lines.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
The markdown link linter was parsing `[run-id](url)` as a real file
reference. Wrapping it in backticks marks it as a code example.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
- Move list-runs.sh to scripts/ subdirectory to match convention
- Add bash command prefix to allowed-tools declaration
- Clarify status vs conclusion field handling for in-progress runs
- Use case-insensitive grep to catch Timeout/timeout variants
- Tighten frontmatter description

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
When multiple runners exhaust the GraphQL rate limit simultaneously,
they all sleep until the same reset timestamp and wake up together.
The existing slot jitter (250-750ms) is too narrow to desynchronize
them, causing collisions that surface as "unknown owner type" errors
from gh project view.

Add a post-reset spread of up to 60s (configurable via
GITHUB_CSMA_SPREAD_MAX_SEC) so runners fan out over a wide window
after waking from a rate-limit sleep.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
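
The post-reset spread from the commit above amounts to adding a random delay after the rate-limit sleep so that runners do not all wake at the same instant. The actual change is in shell; the sketch below restates the idea in Go. The environment variable name and the 60s default come from the commit message, while `spreadMaxSec`, `postResetSpread`, and the whole-second granularity are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"strconv"
	"time"
)

// defaultSpreadMaxSec matches the "up to 60s" default in the commit message.
const defaultSpreadMaxSec = 60

// spreadMaxSec parses the raw environment value and falls back to the
// default when it is unset, negative, or not an integer. Hypothetical helper.
func spreadMaxSec(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return defaultSpreadMaxSec
	}
	return n
}

// postResetSpread returns a random delay in [0, maxSec] seconds.
// Whole-second granularity is an assumption.
func postResetSpread(maxSec int) time.Duration {
	if maxSec <= 0 {
		return 0
	}
	return time.Duration(rand.Intn(maxSec+1)) * time.Second
}

func main() {
	maxSec := spreadMaxSec(os.Getenv("GITHUB_CSMA_SPREAD_MAX_SEC"))
	extra := postResetSpread(maxSec)
	fmt.Printf("after the rate-limit reset, wait an extra %s before retrying\n", extra)
}
```

With a 60s window, runners that wake at the same reset timestamp are spread far more widely than the 250-750ms slot jitter allows.
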
…ss schema

Add end-to-end integration tests covering the full Phase 2 pipeline
(PR 6 of 6 in the ADR-0045 forge-portable harness schema adoption):

- LoadWithBase wrapper→scaffold merge with field inheritance and override
- All scaffold templates forge resolution (pre/post scripts, runner_env)
- Backward compatibility via Load() (no forge platform)
- DiscoverAgents scaffold directory scanning with correct role/slug pairs
- HarnessContentHash integrity verification against embedded content
- LoadRaw generated wrapper format validation
- ResolveForge scaffold runner_env merge with per-template key assertions

Resolves fullsend-ai#2328

Signed-off-by: Greg Allen <greg@fullsend.ai>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Greg Allen <gallen@redhat.com>
Status comments on PRs/issues get stuck in "Started" when the
pre-minted agent token expires before PostCompletion runs. Instead of
relying on a static token, have the fullsend binary mint its own fresh
short-lived token via mintclient.MintToken() before each status
comment API call.

Key changes:
- Add ClientFactory pattern to statuscomment.Notifier so each API
  operation gets a freshly minted forge.Client
- Add --mint-url flag to fullsend run and reconcile-status commands
- Add mint-url input to action.yml and all reusable workflows
- Deprecate --status-token (run) and --token (reconcile-status) with
  runtime warnings; hidden from help output
- Deprecate status-token input in action.yml; mask unconditionally
- Validate token format before ::add-mask:: to prevent workflow
  command injection
- Move refreshClient below commentEnabled guard in PostCompletion
- Make refreshClient failure in cleanup path fail-open (warning)
- Add "code" -> "coder" role alias for agent name resolution

Closes fullsend-ai#2130

Signed-off-by: Greg Allen <gallen@redhat.com>
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Greg Allen <gallen@redhat.com>
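
The ClientFactory pattern described in the commit above can be sketched as a notifier that asks a factory for a new client on every operation instead of holding one long-lived client. Everything below is illustrative: the `Client` interface, the `Notifier` fields, and the `Post` method are hypothetical stand-ins, not the actual `statuscomment.Notifier` or `forge.Client` APIs.

```go
package main

import "fmt"

// Client is a hypothetical stand-in for the forge client used to post comments.
type Client interface {
	PostComment(body string) error
}

// ClientFactory returns a client backed by a freshly minted token.
type ClientFactory func() (Client, error)

// Notifier holds a factory rather than a client, so an expired token
// from an earlier call cannot block a later one.
type Notifier struct {
	newClient ClientFactory
}

// Post mints a fresh client and uses it for exactly one API operation.
func (n *Notifier) Post(body string) error {
	c, err := n.newClient()
	if err != nil {
		return fmt.Errorf("mint client: %w", err)
	}
	return c.PostComment(body)
}

// fakeClient is a test double that accepts every comment.
type fakeClient struct{}

func (fakeClient) PostComment(string) error { return nil }

func main() {
	mints := 0
	n := &Notifier{newClient: func() (Client, error) {
		mints++ // in real code this is where a short-lived token would be minted
		return fakeClient{}, nil
	}}
	_ = n.Post("Started")
	_ = n.Post("Finished")
	fmt.Println("clients minted:", mints)
}
```

The design choice is that token lifetime no longer has to cover the whole agent run; each status update only needs a token that is valid for one call.
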
…window

fix: widen CSMA post-reset jitter to prevent thundering herd
The previous backtick-escaping attempt (7c40a70) did not prevent
lychee from resolving `url` as a relative file path. Remove the
markdown link syntax entirely so the link checker has nothing to chase.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ter_rate_limit

PR fullsend-ai#2304 added post-reset spread to github_csma_sense to prevent
thundering herd when runners wake after a rate-limit reset. The
structurally parallel _github_csma_sleep_after_rate_limit function
was missing the same treatment — multiple runners hitting a 429
would all wake at the same reset timestamp and fire simultaneously.

Extract the spread logic into a shared _github_csma_post_reset_spread
helper and call it from both github_csma_sense (replacing the inline
code) and _github_csma_sleep_after_rate_limit (added after the
backoff sleep). Both paths now use GITHUB_CSMA_SPREAD_MAX_SEC to
stagger runner wake times.

Note: pre-commit and make lint could not run due to shellcheck-py
network restriction in sandbox. Scaffold Go tests pass.

Closes fullsend-ai#2343
…spread-rate-limit

fix(fullsend-ai#2343): add post-reset spread to _github_csma_sleep_after_rate_limit
The NewOrgConfig call gained a 6th parameter (org string) on this
branch. Main didn't have it yet, causing a conflict. Keep the
6-parameter version to match the current function signature.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…atus-token

fix(fullsend-ai#2130): mint fresh tokens for status comments
test: Phase 2 integration tests for ADR-0045 forge-portable harness
…-decompose-issues

feat(triage): add prerequisites action for upstream issue creation (fullsend-ai#401)
Replace the hardcoded 36-iteration fixed-interval polling loop in
awaitWorkflowRun with a time-bounded loop using exponential
backoff. The total wait is capped at 3 minutes (matching the
previous maximum), but polling starts at 2s intervals and doubles
up to 15s, reducing API calls and giving faster feedback when
the workflow completes quickly.

Changes:
- Add enrollmentWaitTimeout, enrollmentPollInitial, and
  enrollmentPollMax constants to control polling behavior
- Replace iteration-count loop with deadline-based loop
- Use exponential backoff (2s → 4s → 8s → 15s cap) via
  nextInterval helper
- Improve progress messages to show elapsed time instead of
  attempt numbers
- Include actionable guidance in timeout error message
  ("check the workflow in .fullsend and re-run install")
- Add progress indicator before starting the wait

Closes fullsend-ai#2354
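
The deadline-based loop with exponential backoff described above might look roughly like this. It is a sketch, not the code from the PR: the constant names, their values, and `nextInterval` are taken from the commit message, but the signature of `nextInterval`, the `waitUntilDone` function, and its `check` callback are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Names and values follow the commit message; the declarations are assumed.
const (
	enrollmentWaitTimeout = 3 * time.Minute
	enrollmentPollInitial = 2 * time.Second
	enrollmentPollMax     = 15 * time.Second
)

// nextInterval doubles the current interval and caps it at max.
// The signature is an assumption.
func nextInterval(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// waitUntilDone polls check until it reports done, fails, the context is
// cancelled, or the timeout elapses. Hypothetical stand-in for the loop in
// awaitWorkflowRun.
func waitUntilDone(ctx context.Context, timeout time.Duration, check func() (bool, error)) error {
	start := time.Now()
	deadline := start.Add(timeout)
	interval := enrollmentPollInitial
	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		remaining := time.Until(deadline)
		if remaining <= 0 {
			return fmt.Errorf("timed out after %s; check the workflow in .fullsend and re-run install", timeout)
		}
		wait := interval
		if wait > remaining {
			wait = remaining // never sleep past the deadline
		}
		fmt.Printf("still waiting (%s elapsed)\n", time.Since(start).Round(time.Second))
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(wait):
		}
		interval = nextInterval(interval, enrollmentPollMax)
	}
}

func main() {
	// Show the backoff schedule: 2s, 4s, 8s, 15s, 15s.
	d := enrollmentPollInitial
	for i := 0; i < 5; i++ {
		fmt.Println(d)
		d = nextInterval(d, enrollmentPollMax)
	}
	// A check that succeeds immediately returns without sleeping.
	err := waitUntilDone(context.Background(), enrollmentWaitTimeout, func() (bool, error) {
		return true, nil
	})
	fmt.Println("result:", err)
}
```

Because the loop is bounded by a deadline rather than an iteration count, changing the poll intervals does not change the maximum total wait.
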
@guyoron1

Owner Author

/fs-qf

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

🤖 Finished Review · ✅ Success · Started 12:18 PM UTC · Completed 12:33 PM UTC
Commit: 1cf60b6 · View workflow run →

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

Review

Reason: stale-head

The review agent reviewed commit 324343713b6f26ac2b2a516c62af64453703586b but the PR HEAD is now d3bfda48961d4b018539aeb00c61a61c35f6559b. This review was discarded to avoid approving unreviewed code.

Previous run

Review

Reason: stale-head

The review agent reviewed commit 148a3ac5f4a49bf11de4dbd87043bd923f065fae but the PR HEAD is now 324343713b6f26ac2b2a516c62af64453703586b. This review was discarded to avoid approving unreviewed code.

Previous run (2)

Review

Reason: stale-head

The review agent reviewed commit 7bd446b88f53604ec724299a4699426e30901c01 but the PR HEAD is now cea5514e793752a8dfcaf49098de56fb6a7e5e7c. This review was discarded to avoid approving unreviewed code.

Previous run (3)

Review

Reason: stale-head

The review agent reviewed commit 8526637473d417c6915aa1f3fe01c075b64b59d5 but the PR HEAD is now 200530128beb67e4e2e101786617b3b896cbf708. This review was discarded to avoid approving unreviewed code.

@fullsend-ai-review

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

🤖 Finished Review · ✅ Success · Started 12:36 PM UTC · Completed 12:52 PM UTC
Commit: 1cf60b6 · View workflow run →

QualityFlow and others added 2 commits June 21, 2026 12:36
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…[skip ci]

Resolved 2 major and 5 minor findings across 2 iterations:
- Removed internal Go type references from formal STP sections
- Clarified STP purpose as regression coverage for PR fullsend-ai#1954
- Rewrote implementation-level scenarios to user-observable outcomes
- Simplified Testing Tools, removed N/A risk boilerplate
- Added admin CLI dispatch to Out of Scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fullsend-ai-review

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

🤖 Finished Review · ✅ Success · Started 12:55 PM UTC · Completed 1:14 PM UTC
Commit: 1cf60b6 · View workflow run →

@fullsend-ai-review

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

🤖 Finished Review · ✅ Success · Started 1:17 PM UTC · Completed 1:32 PM UTC
Commit: 1cf60b6 · View workflow run →

STD refinement iteration 1 — upgraded verdict from APPROVED_WITH_FINDINGS (82/100)
to APPROVED (94/100). Resolved 7 MAJOR and 5 MINOR findings:

- Standardize tier values from "Functional" to "Tier 1" across all 21 scenarios
- Add patterns.primary_pattern to all 21 scenarios with semantic assignments
- Remove related_prs from document_metadata (belongs in STP only)
- Replace literal Go code in test_data with declarative descriptions
- Add missing imports (bytes, errors, runtime, regexp)
- Align code_structure with actual stub t.Run grouping
- Add buffer-inspection steps and specific verification patterns to PSE blocks
- Add test_data sections to all scenarios
- Fix vague/informal assertions (scenarios 018, 021)
- Deduplicate parent-level Preconditions in all 8 Go stub files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fullsend-ai-review

/fs-review

QualityFlow and others added 2 commits June 21, 2026 13:39
Replaces intermediate pipeline artifacts with organized test files.

Total: 8 test files → qf-tests/fullsend-aiGH-2354/
Jira: fullsend-aiGH-2354
[skip ci]
@github-actions

QualityFlow Pipeline Summary

| Stage | Agent | Status |
|-------|-------|--------|
| 1 | STP Builder | |
| 2 | STP Reviewer | |
| 3 | STP Refiner | |
| 4 | STD Builder | |
| 5 | STD Reviewer | |
| 6 | STD Refiner | |
| 7 | Test Generator | |

Test Output

| Language | Count | Location |
|----------|-------|----------|
| Go | 8 files | qf-tests/GH-2354/go/ |

Issue: GH-2354


Generated by QualityFlow

@guyoron1 guyoron1 closed this Jun 21, 2026
@guyoron1 guyoron1 deleted the mirror/2359-2354-enrollment-wait-timeout branch June 21, 2026 14:26

3 participants