Skip to content

feat(ext-api): phase stop endpoint (#1290)#1300

Merged
zbnerd merged 21 commits into
developfrom
feature/issue-1290-stop-endpoint
Jun 18, 2026
Merged

feat(ext-api): phase stop endpoint (#1290)#1300
zbnerd merged 21 commits into
developfrom
feature/issue-1290-stop-endpoint

Conversation

@zbnerd

@zbnerd zbnerd commented Jun 18, 2026

Copy link
Copy Markdown
Owner

Summary

  • Adds POST /api/internal/stop/phase/{phaseName} to gracefully halt an in-flight ext-api phase at its chunk/page boundary
  • Phase transitions to STOPPED (new terminal value) on stop; slot record persists, next acquire overwrites
  • Cross-phase safe: stop on phase A does not affect phase B
  • Per-phase AtomicBoolean map (PhaseStopSignal) checked at every batch/page boundary
  • PhaseStoppedException thrown by phase beans, caught specifically in scheduler whenComplete handlers

Spec

  • docs/superpowers/specs/2026-06-18-issue-1290-phase-stop-endpoint-design.md
  • docs/superpowers/plans/2026-06-18-issue-1290-phase-stop-endpoint.md

Components

  • PhaseStopSignal — ConcurrentHashMap<PipelinePhase, AtomicBoolean> with CAS requestStop
  • PhaseStoppedException — control-flow signal (not BaseException; not an error)
  • PipelinePhase.STOPPED — new terminal enum value
  • RunStatus.isTerminal — extended to include STOPPED
  • RunStatusTracker.stopRun — terminal-but-retainable (slot persists, next acquire overwrites)
  • Each of 4 phase beans — injects signal, checks at chunk/page boundary, throws PhaseStoppedException
  • ExternalApiScheduler.requestPhaseStop — sets the flag if slot has non-terminal run
  • ExternalApiScheduler.runXxxPhase whenComplete — 3 branches: STOPPED → stopRun + clear, exception → failRun + release + clear, success → completeRun + clear
  • InternalApiController.stopPhase — POST endpoint (202 STOP_REQUESTED, 200 NOT_RUNNING, 400 INVALID_PHASE)

Acceptance criteria

  • POST /api/internal/stop/phase/ITEM_EQUIPMENT halts within one chunk boundary (≤30s) — BatchFetchSupport loop-top check
  • Same endpoint works for each phase
  • Stopped phase shows phase=STOPPED in /run-status, terminal
  • Not-running returns 200 NOT_RUNNING (not 404/409)
  • No half-written chunks in MinIO (ChunkedSnapshotSink atomic temp-file move)
  • A new phase can be triggered immediately after a stop (terminal-overwrite CAS)
  • Existing daily unaffected by stop on different phase (per-phase flag)

Test plan

🤖 Generated with Claude Code

zbnerd and others added 18 commits June 18, 2026 07:12
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ponse

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default value '= PipelinePhase.IDLE' added in 2221e23 defeated the
spec's intended compile-failure handshake between T5 and T8/T9. Removing
it so T8/T9 must add 'phase = PipelinePhase.XXX' to existing call sites
before the build is restored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ase ctor

Follows the same fix applied in T6 (RankingFetchPhase) — T7 missed
updating this test file when adding the stopSignal ctor param.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The check was in processPages, but execute() reads chunkingProperties
before invoking processPages — NPE on mocked configFor fired before
the stop check could trip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dling

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thread.sleep(5) is a no-op with Clock.fixed — Instant.now(clock) returns
the same value every call. The two ordering-dependent tests relied on
distinct startedAt timestamps but with fixed clock the values were
identical, exposing hash-iteration-order dependence. Use a per-test
advancing Clock for these two tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4cdb650579

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +56 to +58
if (stopSignal.isStopRequested(PipelinePhase.RANKING_FETCH)) {
throw PhaseStoppedException(PipelinePhase.RANKING_FETCH)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Poll the stop flag inside ranking page processing

When /api/internal/stop/phase/RANKING_FETCH is called after execute has already started, requestPhaseStop sets the flag but this phase only checks it once before any page work begins. processPages then recursively fetches all remaining pages without consulting stopSignal, so the stop endpoint can return 202 while the current ranking run continues to completion and is marked COMPLETED; polling at each page boundary, like the other phases do at batch boundaries, is needed.

Useful? React with 👍 / 👎.

Comment on lines +365 to +367
val hadNonTerminal = runStatusTracker.hasNonTerminalRun(phase) != null
if (hadNonTerminal) {
stopSignal.requestStop(phase)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid leaving stale stop flags after completion races

If a stop request races with normal phase completion, hasNonTerminalRun can observe the run before the phase's whenComplete clears the signal, then requestStop sets the flag after that clear has already happened. That stale flag makes later stop calls report STOP_REQUESTED even with no running slot, and the next run of the same phase can stop immediately before doing work; recheck the slot after setting the flag or bind the signal to the observed runId.

Useful? React with 👍 / 👎.

zbnerd and others added 3 commits June 18, 2026 09:01
Consolidate 4 near-identical whenComplete blocks (runRankingPhase,
runOcidPhase, runCharBasicPhase, runItemEquipmentPhase) into a single
private helper. Behavior unchanged: stop/fail/success branches with
per-phase metrics drain and signal clear in all branches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…abasic

ExternalApiSchedulerStopTest previously exercised only runItemEquipmentPhase's
stop/fail/success branches. After extracting handlePhaseTerminal helper, all
4 phases share the same logic, but per-phase dispatch (enum + label) is still
per-method. Add thin regression tests for the other 3 phases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The isStopRequested fallback was defensive but unreachable — flag is
only set inside the hadNonTerminal branch and cleared in all 3
whenComplete branches, so hadNonTerminal=false implies isStopRequested=false.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@zbnerd zbnerd merged commit a8653e0 into develop Jun 18, 2026
@zbnerd zbnerd deleted the feature/issue-1290-stop-endpoint branch June 18, 2026 07:21
@zbnerd

zbnerd commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant