[P1][Reliability] Persist task state for crash recovery #15

@aWN4Y25pa2EK

Description

Summary

TaskManager keeps task state in memory only. If the process crashes mid-execution, that state is lost and in-flight tasks are left in limbo: never marked DONE or FAILED.

Source

Adversarial Security Assessment

Location

  • impl/mvp/src/tasks/manager.ts:78-130
  • impl/mvp/src/holons/root.ts:461-492

Problems

  • Process crash → task orphaned (not DONE, not FAILED)
  • No recovery mechanism on restart
  • Manual cleanup required
  • Resource leaks (Linear sub-issues created but task "lost")

Impact

  • Tasks stuck indefinitely after crash
  • Inconsistent state between Linear and HCA
  • Manual intervention required
  • Resource leaks

Severity

High | Likelihood: High (processes crash: OOM, deployments, exceptions)

Recommended Fix

  1. Persist task state to Redis/database
  2. Implement recovery on startup
  3. Add TTL to orphaned tasks
  4. Log state transitions for debugging
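
// Sketch only. Assumes the ioredis client, and that TaskManager's
// createTask/failTask can be awaited; adjust to the real signatures.
import Redis from 'ioredis';

class PersistentTaskManager extends TaskManager {
  constructor(private redis: Redis) { super(); }

  async createTask(opts: CreateTaskOptions): Promise<Task> {
    const task = await super.createTask(opts);
    // Store as JSON: a Redis hash is flat, so nested fields such as
    // task.state would not survive an hset/hgetall round trip.
    await this.redis.set(`task:${task.id}`, JSON.stringify(task));
    return task;
  }

  async recoverOrphanedTasks(): Promise<void> {
    // SCAN instead of KEYS, which blocks Redis on large keyspaces.
    const stream = this.redis.scanStream({ match: 'task:*', count: 100 });
    for await (const keys of stream) {
      for (const key of keys as string[]) {
        const raw = await this.redis.get(key);
        if (!raw) continue;
        const task = JSON.parse(raw) as Task;
        if (task.state.status === 'PLANNING' || task.state.status === 'WORKING') {
          // Recover or fail orphaned tasks. After a restart the in-memory
          // map is empty, so failTask must accept a task loaded from Redis.
          await this.failTask(task.id, 'Recovered after crash');
        }
      }
    }
  }
}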
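
Items 3 and 4 of the recommended fix (TTL on orphaned tasks, logging state transitions) are not covered by the sketch above. The following is a minimal, self-contained illustration of how the two could fit together. It uses an in-memory map as a stand-in for the persistent store, and every name in it (`TaskStateLog`, `TaskRecord`, `Transition`, `expireOrphans`) is hypothetical rather than taken from the HCA codebase. The status values mirror the ones mentioned in this issue.

```typescript
// Hypothetical types for illustration only; the real Task shape is defined
// in the HCA codebase and may differ.
type Status = 'PLANNING' | 'WORKING' | 'DONE' | 'FAILED';

interface TaskRecord {
  id: string;
  status: Status;
  updatedAt: number; // epoch milliseconds of the last transition
}

interface Transition {
  taskId: string;
  from: Status | null; // null for the first recorded state
  to: Status;
  at: number;
  reason?: string;
}

class TaskStateLog {
  private tasks = new Map<string, TaskRecord>();
  readonly transitions: Transition[] = [];

  // Record every state change so the history can be inspected after a crash.
  transition(taskId: string, to: Status, now: number, reason?: string): void {
    const prev = this.tasks.get(taskId);
    this.transitions.push({ taskId, from: prev ? prev.status : null, to, at: now, reason });
    this.tasks.set(taskId, { id: taskId, status: to, updatedAt: now });
  }

  // Fail any non-terminal task whose last transition is older than ttlMs.
  expireOrphans(now: number, ttlMs: number): string[] {
    const expired: string[] = [];
    // Snapshot the values because transition() mutates the map.
    for (const task of Array.from(this.tasks.values())) {
      const active = task.status === 'PLANNING' || task.status === 'WORKING';
      if (active && now - task.updatedAt > ttlMs) {
        this.transition(task.id, 'FAILED', now, 'TTL expired');
        expired.push(task.id);
      }
    }
    return expired;
  }

  status(taskId: string): Status | undefined {
    const task = this.tasks.get(taskId);
    return task ? task.status : undefined;
  }
}

// Usage: t1 stalls in WORKING, t2 finishes normally.
const log = new TaskStateLog();
log.transition('t1', 'PLANNING', 0);
log.transition('t1', 'WORKING', 1000);
log.transition('t2', 'DONE', 1000);
const expired = log.expireOrphans(120000, 60000);
console.log(expired, log.status('t1'));
```

Timestamps are passed in explicitly so the expiry logic can be tested without waiting on a real clock. In a real implementation the records and the transition log would live in the same persistent store as the task state, and the TTL value would need to exceed the longest legitimate PLANNING or WORKING phase.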

Generated by HCA Architecture Assessment
