
perf(#2354): bound enrollment wait with timeout and backoff #76

Open
guyoron1 wants to merge 53 commits into main from mirror/2359-2354-enrollment-wait-timeout

Conversation

@guyoron1

Owner

Mirror of upstream fullsend-ai#2359

Adds a timeout and backoff to the enrollment wait to prevent unbounded blocking.

ralphbean and others added 29 commits June 11, 2026 15:45
Design for a new `prerequisites` triage action that replaces `blocked`.
The agent can now express both existing blockers and new issues that need
to be created upstream before progress can happen. Includes allowlist
configuration for cross-repo issue creation and a degraded path when
targets are not authorized.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…nd-ai#401)

Seven-task plan covering config structs, JSON schema, agent prompt,
post-script, user docs, and caller updates. TDD approach with exact
file paths and code blocks.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Add CreateIssuesConfig and AllowTargets types to both OrgConfig and
PerRepoConfig. NewOrgConfig populates defaults with the org and
fullsend-ai/fullsend. NewPerRepoConfig populates with the target repo
and fullsend-ai/fullsend.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
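A minimal sketch of the defaults described in this commit, assuming a YAML-backed config. Only `CreateIssuesConfig`, `AllowTargets`, and the `create_issues.allow_targets` key come from this PR; the helper names (`withUpstream`, `newOrgCreateIssues`, `newPerRepoCreateIssues`) are hypothetical and are not fullsend's actual constructors.

```go
package main

// CreateIssuesConfig is a hypothetical shape for the create_issues config
// block; only the type and field names come from the commit message.
type CreateIssuesConfig struct {
	// AllowTargets lists where the triage post-script may open issues.
	AllowTargets []string `yaml:"allow_targets"`
}

// upstreamRepo is included in every default allowlist.
const upstreamRepo = "fullsend-ai/fullsend"

// withUpstream builds a default allowlist of first plus upstreamRepo,
// skipping the duplicate when first already is upstreamRepo.
func withUpstream(first string) CreateIssuesConfig {
	if first == upstreamRepo {
		return CreateIssuesConfig{AllowTargets: []string{upstreamRepo}}
	}
	return CreateIssuesConfig{AllowTargets: []string{first, upstreamRepo}}
}

// newOrgCreateIssues mirrors the NewOrgConfig default: the org plus upstream.
func newOrgCreateIssues(org string) CreateIssuesConfig { return withUpstream(org) }

// newPerRepoCreateIssues mirrors the NewPerRepoConfig default: the target repo plus upstream.
func newPerRepoCreateIssues(targetRepo string) CreateIssuesConfig { return withUpstream(targetRepo) }
```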
…ues (fullsend-ai#401)

Pass org name and target repo to config constructors so create_issues
defaults are populated at install time.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
)

Replace the blocked action and blocked_by field with a prerequisites
action containing existing[] and create[] arrays. At least one array
must be non-empty.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
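The "at least one array must be non-empty" rule can be sketched as a Go mirror of the schema. The real constraint lives in a JSON schema; the `Prerequisites` and `IssueDraft` types and their JSON field names below are assumptions for illustration.

```go
package main

import "errors"

// Prerequisites is a hypothetical Go mirror of the prerequisites action.
type Prerequisites struct {
	Existing []string     `json:"existing"` // URLs of issues that already block progress
	Create   []IssueDraft `json:"create"`   // upstream issues the agent wants filed
}

// IssueDraft is a hypothetical shape for an issue the agent proposes to create.
type IssueDraft struct {
	Repo  string `json:"repo"` // owner/repo
	Title string `json:"title"`
	Body  string `json:"body"`
}

var errEmptyPrerequisites = errors.New("prerequisites: at least one of existing[] or create[] must be non-empty")

// Validate rejects an action that names no prerequisites at all.
func (p Prerequisites) Validate() error {
	if len(p.Existing) == 0 && len(p.Create) == 0 {
		return errEmptyPrerequisites
	}
	return nil
}
```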
…pt (fullsend-ai#401)

The triage agent can now recommend creating upstream issues via the
prerequisites action's create array, in addition to referencing existing
blockers. Adds hard constraint against emitting sufficient when
prerequisites exist.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…d-ai#401)

Update triage agent docs to explain the new prerequisites action and the
create_issues.allow_targets configuration surface.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…#401)

Replace the blocked handler with prerequisites. The post-script reads
the create_issues allowlist from config.yaml, creates permitted upstream
issues via gh, and includes collapsed draft bodies for disallowed or
failed creates so humans can file them manually.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ullsend-ai#401)

The agent prompt referenced a nonexistent `prerequisites` label when
checking for prior blockers — the post-script actually applies the
`blocked` label. Also removed unused SOURCE_ORG variable from
post-triage.sh.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…end-ai#401)

Replace the four blocked-action test cases with five prerequisites-action
test cases that exercise the new schema (existing[], create[], allowlist
validation). Set up GITHUB_WORKSPACE with a config.yaml fixture and add
a mock gh issue-create handler that returns a fake URL.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ullsend-ai#401)

Replace blocked-action test cases with prerequisites-action equivalents
and update the expected property list (blocked_by → prerequisites).

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…d-ai#401)

- Replace stale blocked-* schema validation tests with prerequisites
  equivalents (missing field, both arrays empty, malformed URL)
- Fix validateCreateIssues to reject malformed repo formats like "/",
  "/repo", "owner/"
- Align triage.md section 2c terminology from "blocker" to
  "prerequisite" consistently
- Update bugfix-workflow.md and architecture.md to document upstream
  issue creation capability
- Emit ::warning:: when yq is unavailable so silent degradation of
  cross-repo issue creation is diagnosable

Signed-off-by: Ralph Bean <rbean@redhat.com>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
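A sketch of the repo-format check described in the second bullet, rejecting "/", "/repo", and "owner/". `validateRepoTarget` is a hypothetical name, and accepting a bare owner (no slash) is an assumption made so the org-level default entry still validates.

```go
package main

import (
	"fmt"
	"strings"
)

// validateRepoTarget rejects malformed allow targets. A bare owner is
// accepted (assumed to mean "any repo in this org"); anything with a slash
// must be exactly owner/repo with both halves non-empty.
func validateRepoTarget(s string) error {
	owner, repo, hasSlash := strings.Cut(s, "/")
	switch {
	case owner == "":
		return fmt.Errorf("invalid target %q: empty owner", s)
	case !hasSlash:
		return nil
	case repo == "" || strings.Contains(repo, "/"):
		return fmt.Errorf("invalid target %q: want owner/repo", s)
	}
	return nil
}
```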
Adds a skill that summarizes recent E2E Tests workflow runs on main,
presents them in a table with clickable links, and diagnoses failures
by grepping failed step logs for signal lines.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
The markdown link linter was parsing `[run-id](url)` as a real file
reference. Wrapping it in backticks marks it as a code example.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
- Move list-runs.sh to scripts/ subdirectory to match convention
- Add bash command prefix to allowed-tools declaration
- Clarify status vs conclusion field handling for in-progress runs
- Use case-insensitive grep to catch Timeout/timeout variants
- Tighten frontmatter description

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
When multiple runners exhaust the GraphQL rate limit simultaneously,
they all sleep until the same reset timestamp and wake up together.
The existing slot jitter (250-750ms) is too narrow to desynchronize
them, causing collisions that surface as "unknown owner type" errors
from gh project view.

Add a post-reset spread of up to 60s (configurable via
GITHUB_CSMA_SPREAD_MAX_SEC) so runners fan out over a wide window
after waking from a rate-limit sleep.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ss schema

Add end-to-end integration tests covering the full Phase 2 pipeline
(PR 6 of 6 in the ADR-0045 forge-portable harness schema adoption):

- LoadWithBase wrapper→scaffold merge with field inheritance and override
- All scaffold templates forge resolution (pre/post scripts, runner_env)
- Backward compatibility via Load() (no forge platform)
- DiscoverAgents scaffold directory scanning with correct role/slug pairs
- HarnessContentHash integrity verification against embedded content
- LoadRaw generated wrapper format validation
- ResolveForge scaffold runner_env merge with per-template key assertions

Resolves fullsend-ai#2328

Signed-off-by: Greg Allen <greg@fullsend.ai>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Greg Allen <gallen@redhat.com>
Status comments on PRs/issues get stuck in "Started" when the
pre-minted agent token expires before PostCompletion runs. Instead of
relying on a static token, have the fullsend binary mint its own fresh
short-lived token via mintclient.MintToken() before each status
comment API call.

Key changes:
- Add ClientFactory pattern to statuscomment.Notifier so each API
  operation gets a freshly minted forge.Client
- Add --mint-url flag to fullsend run and reconcile-status commands
- Add mint-url input to action.yml and all reusable workflows
- Deprecate --status-token (run) and --token (reconcile-status) with
  runtime warnings; hidden from help output
- Deprecate status-token input in action.yml; mask unconditionally
- Validate token format before ::add-mask:: to prevent workflow
  command injection
- Move refreshClient below commentEnabled guard in PostCompletion
- Make refreshClient failure in cleanup path fail-open (warning)
- Add "code" -> "coder" role alias for agent name resolution
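The "validate token format before ::add-mask::" item guards against a value containing a newline that smuggles a second workflow command into the log. The actual check is in action.yml; this Go sketch assumes a conservative charset of letters, digits, `_`, and `-`, and `maskCommand` is a hypothetical name.

```go
package main

import (
	"errors"
	"regexp"
)

// tokenPattern is an assumed conservative charset. In Go's RE2 syntax, $
// matches only at end of text, so a trailing newline is rejected too.
var tokenPattern = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

var errBadToken = errors.New("refusing to mask token: unexpected characters")

// maskCommand returns the workflow command that masks token in logs, or an
// error if the value could inject a newline or another "::" command.
func maskCommand(token string) (string, error) {
	if !tokenPattern.MatchString(token) {
		return "", errBadToken
	}
	return "::add-mask::" + token, nil
}
```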

Closes fullsend-ai#2130

Signed-off-by: Greg Allen <gallen@redhat.com>
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Greg Allen <gallen@redhat.com>
…window

fix: widen CSMA post-reset jitter to prevent thundering herd
The previous backtick-escaping attempt (7c40a70) did not prevent
lychee from resolving `url` as a relative file path. Remove the
markdown link syntax entirely so the link checker has nothing to chase.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…ter_rate_limit

PR fullsend-ai#2304 added post-reset spread to github_csma_sense to prevent
thundering herd when runners wake after a rate-limit reset. The
structurally parallel _github_csma_sleep_after_rate_limit function
was missing the same treatment — multiple runners hitting a 429
would all wake at the same reset timestamp and fire simultaneously.

Extract the spread logic into a shared _github_csma_post_reset_spread
helper and call it from both github_csma_sense (replacing the inline
code) and _github_csma_sleep_after_rate_limit (added after the
backoff sleep). Both paths now use GITHUB_CSMA_SPREAD_MAX_SEC to
stagger runner wake times.

Note: pre-commit and make lint could not run due to a shellcheck-py
network restriction in the sandbox. Scaffold Go tests pass.

Closes fullsend-ai#2343
…spread-rate-limit

fix(fullsend-ai#2343): add post-reset spread to _github_csma_sleep_after_rate_limit
The NewOrgConfig call gained a 6th parameter (org string) on this
branch. Main didn't have it yet, causing a conflict. Keep the
6-parameter version to match the current function signature.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
…atus-token

fix(fullsend-ai#2130): mint fresh tokens for status comments
test: Phase 2 integration tests for ADR-0045 forge-portable harness
…-decompose-issues

feat(triage): add prerequisites action for upstream issue creation (fullsend-ai#401)
Replace the hardcoded 36-iteration fixed-interval polling loop in
awaitWorkflowRun with a time-bounded loop using exponential
backoff. The total wait is capped at 3 minutes (matching the
previous maximum), but polling starts at 2s intervals and doubles
up to 15s, reducing API calls and giving faster feedback when
the workflow completes quickly.

Changes:
- Add enrollmentWaitTimeout, enrollmentPollInitial, and
  enrollmentPollMax constants to control polling behavior
- Replace iteration-count loop with deadline-based loop
- Use exponential backoff (2s → 4s → 8s → 15s cap) via
  nextInterval helper
- Improve progress messages to show elapsed time instead of
  attempt numbers
- Include actionable guidance in timeout error message
  ("check the workflow in .fullsend and re-run install")
- Add progress indicator before starting the wait

Closes fullsend-ai#2354
@guyoron1

Owner Author

/fs-qf

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

🤖 Finished Review · ✅ Success · Started 4:08 AM UTC · Completed 4:26 AM UTC
Commit: 73a8f7b · View workflow run →

QualityFlow added 2 commits June 22, 2026 04:12
Resolved all 4 major and 7 minor findings from initial review:
- Rewrote scenarios to use user-observable behavior (D1-A-001)
- Fixed personal fork URLs to upstream (D1-N-001)
- Added triage prerequisites to Out of Scope (D2-COV-001)
- Replaced vague qualifiers with specific outcomes (D1-A2-001)
- Reclassified Dependencies, added QE kickoff timing
- Added P2 priority tier, removed empty risk entries
- Simplified Feature Overview, acknowledged security dimension

Weighted score: 81.6 → 96.4
@fullsend-ai-review

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

🤖 Finished Review · ✅ Success · Started 4:30 AM UTC · Completed 4:46 AM UTC
Commit: 73a8f7b · View workflow run →

QualityFlow and others added 2 commits June 22, 2026 04:36
STD refiner could not proceed: STD YAML was never generated.
Verdict remains BLOCKED. Run std-builder first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fullsend-ai-review

/fs-review

QualityFlow and others added 2 commits June 22, 2026 04:50
Generated 58 Go tests across 3 packages covering:
- Enrollment wait timeout and exponential backoff (25 tests)
- Status comment lifecycle and orphan reconciliation (29 tests)
- reconcile-status CLI command validation (14 tests)

All tests compile and pass. Co-located with source packages per
QualityFlow convention (qf_ prefix).
Removes intermediate pipeline artifacts (STP, STD, reviews).
Test files (3) are co-located in source tree with qf_ prefix.
Jira: GH-76
[skip ci]
@guyoron1

Owner Author

/fs-qf

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

🤖 Finished Review · ✅ Success · Started 5:49 AM UTC · Completed 6:05 AM UTC
Commit: 6ead6d0 · View workflow run →

@fullsend-ai-review

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

🤖 Finished Review · ✅ Success · Started 6:08 AM UTC · Completed 6:43 AM UTC
Commit: 6ead6d0 · View workflow run →

QualityFlow and others added 4 commits June 22, 2026 06:15
Removes intermediate pipeline artifacts (STP, STD, reviews).
Test files (3) are co-located in source tree with qf_ prefix.
Jira: GH-76
[skip ci]
@fullsend-ai-review

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

🤖 Finished Review · ✅ Success · Started 6:46 AM UTC · Completed 7:03 AM UTC
Commit: 6ead6d0 · View workflow run →

@fullsend-ai-review fullsend-ai-review Bot left a comment

See the review comment for full details.

Comment thread internal/cli/run.go
n := statuscomment.New(client, notifyCfg, owner, repo, sOpts.statusNum, sOpts.runURL, sha, runID)
var initialClient forge.Client
if staticToken != "" {
initialClient = gh.New(staticToken)

[medium] nil-deref

initialClient is nil when mintURL is empty (the else branch at ~line 1880 does not set it). This nil client is passed to statuscomment.New(...). If ClientFactory is also nil or fails on first refresh, methods on the nil initialClient will panic.

Suggested fix: Either initialize initialClient to a no-op client, or guard against nil in statuscomment.New.

if mintURL != "" {
if role == "" {
return fmt.Errorf("--role is required when using --mint-url")
}

[medium] api-contract

MintRequest.Repos is set to []string{statusRepo} where statusRepo is owner/repo format. Verify that the mint endpoint expects full owner/repo strings in the Repos field rather than bare repo names.

Suggested fix: Confirm the mint API contract for the Repos field format and add a comment documenting the expected format.

Comment thread action.yml
@@ -405,17 +420,19 @@ runs:
JOB_STATUS: ${{ job.status }}

[medium] fail-open

The reconciliation step is gated on inputs.mint-url being non-empty. When mint-url is not provided, the entire status reconciliation is silently skipped with no warning.

Suggested fix: Add a warning annotation when reconciliation is skipped due to missing mint-url.
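The suggested fix lives in action.yml, but the behavior is easy to state in Go: when mint-url is empty, print a `::warning::` workflow command (the same line a shell step would `echo`) instead of skipping silently. `reconcileOrWarn` and the warning text are hypothetical.

```go
package main

import "fmt"

// skipWarning is a hypothetical annotation text for the skipped step.
const skipWarning = "::warning::Skipping status reconciliation: mint-url is not set; status comments may stay in \"Started\""

// reconcileOrWarn runs reconcile when a mint URL is available, and otherwise
// prints a workflow warning annotation to stdout rather than skipping silently.
func reconcileOrWarn(mintURL string, reconcile func(string) error) error {
	if mintURL == "" {
		fmt.Println(skipWarning)
		return nil
	}
	return reconcile(mintURL)
}
```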

return gh.New(token)
}

func newReconcileStatusCmd() *cobra.Command {

[low] pattern-inconsistency

The deprecated --token flag is mapped to --status-token but the deprecation message references --mint-url as the replacement. Verify the migration path is documented.

Co-located tests (qf_* prefix) are now in source package directories.
The qf-tests/ directory contained non-compiling tests from the old pipeline.
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

🤖 Finished Review · ✅ Success · Started 8:33 AM UTC · Completed 8:53 AM UTC
Commit: 6ead6d0 · View workflow run →

@fullsend-ai-review fullsend-ai-review Bot left a comment

See the review comment for full details.

fmt.Fprintf(os.Stderr, "WARNING: --token is deprecated; use --mint-url instead\n")
client = newForgeClient(token)
} else {
return fmt.Errorf("--mint-url or FULLSEND_MINT_URL required (--token is deprecated)")

[high] cli_behavior_breaking_change

The GITHUB_TOKEN environment variable fallback for status reconciliation was silently removed. Existing CI configurations that rely on GITHUB_TOKEN for status updates will fail at runtime with no clear error message.

Suggested fix: Add a clear error message when neither mint-url nor a working token source is available, pointing to migration path. Consider a deprecation period.

Comment thread internal/cli/run.go
@@ -1882,10 +1888,34 @@ func setupStatusNotifier(fullsendDir string, sOpts statusOpts, printer *ui.Print
runID = fmt.Sprintf("%d", time.Now().UnixNano())

[low] nil-deref

The ClientFactory is set on the statuscomment package, but an empty mintURL with no fallback produces unclear error behavior instead of an explicit early failure.

Suggested fix: Add explicit check for empty mintURL before setting client factory.

@@ -206,23 +228,26 @@ run_test "duplicate-self-reference-fails" \
"" \

[info] test-inadequacy

The prerequisites test cases cover the happy path but not error conditions (failed issue creation, invalid allowlist, network failures).

Suggested fix: Add test cases for error paths in prerequisites handling.

Labels

None yet

Projects

None yet

3 participants