@roomote roomote bot commented Jan 25, 2026

Fixes the issue where Orchestrator mode tasks would stop prematurely and disappear from the task list after creating multiple subtasks in rapid succession.

Root Cause

The cause is lock contention during rapid file writes when Orchestrator creates multiple subtasks. Each delegation involves concurrent writes to:

  • Parent task API conversation history
  • Child task API conversation history
  • Task history metadata

When lock acquisition failed or metadata persistence failed silently, parent tasks would appear to "disappear" from the UI.

Changes

  1. Increased lock resilience in safeWriteJson:

    • Lock staleness timeout: 31s → 60s (handles complex orchestrator scenarios)
    • Lock retries: 5 → 10 attempts
    • Backoff timeout range: 100-1000ms → 200-2000ms
    • Enhanced error messages with contention context
  2. Added retry logic for critical delegation metadata:

    • 3 retry attempts with exponential backoff for parent task status persistence
    • User-visible warnings when all retries fail (prevents silent data loss)
    • Detailed logging for debugging lock contention issues
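
The retry behavior described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in ClineProvider.ts: the helper name `persistWithRetry`, its option names, and the 100ms base delay are all assumptions made for the example. Only the three attempts, the exponential backoff, and the warning on final failure come from the description above.

```typescript
// Hypothetical helper; names and the base delay are illustrative assumptions.
type RetryOptions = {
  attempts: number; // total tries, e.g. 3
  baseDelayMs: number; // wait before the second try; doubles after each failure
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  onGiveUp?: (error: unknown) => void; // hook for a user-visible warning
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function persistWithRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T | undefined> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;

  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Log every failure so lock contention can be diagnosed later.
      console.warn(`persist attempt ${attempt}/${options.attempts} failed`, error);
      if (attempt < options.attempts) {
        // Exponential backoff: base, 2x base, 4x base, ...
        await sleep(options.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }

  // All attempts failed: report it instead of failing silently.
  options.onGiveUp?.(lastError);
  return undefined;
}
```

A caller would wrap the parent task status write in `operation` and show a warning to the user from `onGiveUp`, so a failed write is reported rather than leaving the task missing from the list without explanation.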

Testing

  • All existing safeWriteJson tests pass (16/16)
  • All delegation tests pass (13/13)
  • No lint errors or type check failures



Important

Enhances task persistence and lock handling in Orchestrator mode by adding retry logic and increasing lock resilience in ClineProvider.ts and safeWriteJson.ts.

  • Behavior:
    • Fixes premature stopping and disappearance of Orchestrator tasks after creating multiple subtasks.
    • Adds retry logic for parent task metadata persistence in ClineProvider.ts.
    • Increases lock resilience in safeWriteJson.ts with longer timeouts and more retries.
  • Lock Handling:
    • safeWriteJson.ts: Lock staleness timeout increased from 31s to 60s, retries from 5 to 10, and backoff timeout range from 100-1000ms to 200-2000ms.
    • Enhanced error messages for lock acquisition failures.
  • Retry Logic:
    • ClineProvider.ts: 3 retry attempts with exponential backoff for parent task status persistence.
    • Logs user-visible warnings and detailed debugging information when retries fail.
  • Testing:
    • All existing safeWriteJson tests pass (16/16).
    • All delegation tests pass (13/13).

This description was created by Ellipsis for 8a5eccb.

- Increase lock staleness timeout from 31s to 60s for complex orchestrator scenarios
- Increase lock retries from 5 to 10 with higher backoff timeouts (200ms-2s)
- Add retry logic (3 attempts) for critical parent task metadata persistence
- Add user-visible warnings when delegation metadata persistence fails
- Improve error messages with context about lock contention causes

This addresses issues where Orchestrator mode tasks would disappear from the task list after creating multiple subtasks in rapid succession, caused by lock contention during concurrent file writes.
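
To give a sense of what the new lock retry settings mean in practice, the worst-case time spent waiting between lock attempts can be estimated. The sketch below assumes a plain doubling backoff (factor 2, no jitter) clamped to the configured range. The factor and the absence of jitter are assumptions, as is the `backoffSchedule` helper itself; only the retry counts and the timeout ranges come from this PR.

```typescript
// Illustrative arithmetic only; the real lock library may add jitter or use another factor.
type BackoffRange = {
  retries: number;
  minTimeoutMs: number;
  maxTimeoutMs: number;
  factor: number; // assumed growth factor between retries
};

// Delay before each retry: min * factor^attempt, capped at max.
function backoffSchedule(range: BackoffRange): number[] {
  return Array.from({ length: range.retries }, (_, attempt) =>
    Math.min(range.minTimeoutMs * Math.pow(range.factor, attempt), range.maxTimeoutMs),
  );
}

// Sum of all retry delays, i.e. the longest a caller waits before giving up.
function totalWaitMs(range: BackoffRange): number {
  return backoffSchedule(range).reduce((sum, delay) => sum + delay, 0);
}

const before: BackoffRange = { retries: 5, minTimeoutMs: 100, maxTimeoutMs: 1000, factor: 2 };
const after: BackoffRange = { retries: 10, minTimeoutMs: 200, maxTimeoutMs: 2000, factor: 2 };

console.log(backoffSchedule(before)); // [100, 200, 400, 800, 1000]
console.log(totalWaitMs(before)); // 2500
console.log(totalWaitMs(after)); // 15000
```

Under these assumptions a writer that previously gave up after about 2.5 seconds of contention now keeps trying for about 15 seconds, which stays well inside the 60s staleness timeout.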
roomote bot commented Jan 25, 2026

Review complete. No issues found.

The changes appropriately address lock contention during rapid orchestrator delegation by:

  • Increasing lock resilience parameters in safeWriteJson (staleness timeout, retries, backoff ranges)
  • Adding retry logic with exponential backoff for critical parent task metadata persistence
  • Providing user-visible warnings when all retries fail instead of silent failures
