
ML Orchestration: live stage progress, graceful degradation, and timeout fix #321

Closed

Copilot wants to merge 4 commits into develop from copilot/ml-orchestration-resource-management


Copilot AI (Contributor) commented Jun 16, 2026

The analysis pipeline used a mock-style 30-second process timeout and caught only FileNotFoundError and ValueError, so real ML inference jobs were killed mid-run and unexpected runtime errors went uncaught. The workspace UI also showed no stage-level progress during analysis.

Changes

Rust — timeout (src-tauri/src/main.rs)

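  • ANALYSIS_PROCESS_TIMEOUT raised from 30 s to 600 s (10 min) so that real ML inference jobs are no longer killed mid-run. This is still below the engine's maximum audio duration (15 min / 900 s), so the longest inputs must be processed faster than real time to finish within the budget.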
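The same idea can be sketched in Python: a child process gets a fixed wall-clock budget and, if it overruns, it is killed and the failure is reported as a typed result rather than an exception. This is an analogue for illustration, not the Tauri code in src-tauri/src/main.rs; run_with_timeout, ANALYSIS_PROCESS_TIMEOUT_S, and the envelope fields (ok, reason, timeout_s) are hypothetical.

```python
import subprocess
import sys

# Hypothetical mirror of the Rust constant ANALYSIS_PROCESS_TIMEOUT (600 s).
ANALYSIS_PROCESS_TIMEOUT_S = 600.0


def run_with_timeout(cmd, timeout_s=ANALYSIS_PROCESS_TIMEOUT_S):
    """Run an analysis subprocess and map a timeout to a typed envelope.

    Hypothetical helper; the envelope shape is an assumption modeled on the
    engine_unavailable code described in this PR.
    """
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return {
            "ok": False,
            "error": {
                "code": "engine_unavailable",
                "reason": "timeout",
                "timeout_s": timeout_s,
            },
        }
    return {"ok": completed.returncode == 0, "stdout": completed.stdout}


if __name__ == "__main__":
    # A child that sleeps longer than its budget is reported, not raised.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

For scale: finishing a 900 s input inside a 600 s budget requires inference to run at least 900 / 600 = 1.5 times faster than real time.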

Python — graceful degradation (api.py)

  • Broadened except (FileNotFoundError, ValueError) → except Exception in run_analysis_job_updates, so MemoryError and RuntimeError (GPU OOM, etc.) also return a typed engine_unavailable envelope instead of crashing the orchestrator.
  • Added logger.exception(...) before envelope conversion to preserve the full traceback for debugging.
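```python
try:
    audio_features = _build_local_audio_features(request)
except Exception as error:
    # logger.exception logs at ERROR level and attaches the active traceback.
    logger.exception("Stem separation failed for job %s: %s", job_id, error)
    # "..." marks arguments and fields elided in this description.
    updates.append(_build_job_status(..., error={"code": "engine_unavailable", ...}))
    return updates
```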
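A self-contained version of that handler, runnable outside the engine, makes the control flow easy to test: the stem separator is injected as a callable so a GPU-style RuntimeError can be simulated. build_job_status, the state names, and every envelope field other than code are hypothetical stand-ins for _build_job_status and its real arguments, which are elided above.

```python
import logging

logger = logging.getLogger("analysis")


def build_job_status(job_id, state, error=None, result=None):
    """Hypothetical stand-in for _build_job_status; field names are assumptions."""
    status = {"jobId": job_id, "state": state}
    if error is not None:
        status["error"] = error
    if result is not None:
        status["result"] = result
    return status


def run_analysis_job_updates(job_id, separate_stems):
    """Sketch of the broadened handler: any Exception becomes a typed envelope."""
    updates = [build_job_status(job_id, "queued")]
    try:
        audio_features = separate_stems()
    except Exception as error:  # MemoryError and RuntimeError subclass Exception
        # Keep the full traceback in the log before converting to an envelope.
        logger.exception("Stem separation failed for job %s: %s", job_id, error)
        updates.append(
            build_job_status(
                job_id,
                "failed",
                error={"code": "engine_unavailable", "message": str(error)},
            )
        )
        return updates
    updates.append(build_job_status(job_id, "completed", result=audio_features))
    return updates
```

Catching Exception rather than BaseException still lets KeyboardInterrupt and SystemExit propagate, so cancelling the process is not swallowed by the handler.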

React UI — live stage progress (WorkspaceStates.tsx, App.tsx)

  • LoadingState gains optional progressLabel and progressPercent props. When the engine emits a stage update (e.g. "Separating stems… (45%)"), the workspace card renders the label and an accessible <Progress> bar; falls back to the static pulse text when absent.
  • renderWorkspaceState() threads jobStatus?.progressLabel / jobStatus?.progressPercent through to LoadingState.
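The label selection described above can be sketched in Python (the TypeScript component is not shown in this PR): prefer the engine's stage label, append a clamped percentage when one is present, and otherwise fall back to a state-derived static label. Only "Queued for analysis" and the "Separating stems… (45%)" format come from this PR; progress_message, STATIC_LABELS, and the default text are hypothetical.

```python
from typing import Optional

# Hypothetical state -> static label map; only the "queued" text is from this PR.
STATIC_LABELS = {"queued": "Queued for analysis"}
DEFAULT_LABEL = "Analyzing…"  # hypothetical placeholder text


def progress_message(
    state: str,
    progress_label: Optional[str] = None,
    progress_percent: Optional[float] = None,
) -> str:
    """Pick the text a loading card would show for a job status."""
    if not progress_label:
        # Engine sent no stage label: fall back to a state-derived label.
        return STATIC_LABELS.get(state, DEFAULT_LABEL)
    if progress_percent is None:
        return progress_label
    # Clamp to 0..100 so a misbehaving engine cannot render "140%".
    pct = max(0, min(100, round(progress_percent)))
    return f"{progress_label} ({pct}%)"
```

Keeping the fallback in one pure function means the label and the progress bar can be tested without rendering the component.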

Tests

  • Fixed a broken assertion (getByText → getAllByText) caused by "Queued for analysis" now appearing in both the status badge and the workspace card simultaneously.
  • Added a frontend test covering the progressMessage fallback (the state-derived label used when the engine omits progressLabel).
  • Added a Python test asserting that a RuntimeError from the separator returns a typed failure envelope rather than an uncaught exception.

Copilot AI changed the title from "[WIP] Transition analysis API to asynchronous inference orchestrator" to "ML Orchestration: live stage progress, graceful degradation, and timeout fix" Jun 16, 2026
Copilot AI requested a review from seonghobae June 16, 2026 14:27
@seonghobae (Collaborator) commented:

Closing as superseded by #323, which merged the selected async analysis orchestration path with push progress, app-owned cache/temp handling, process-enforced stem timeout fallback, and Security Notes for the IPC/subprocess/cache boundary. Keeping #109 tied to #323.

@seonghobae seonghobae closed this Jun 18, 2026
2 participants