bug: agent pool slots remain occupied after sessions terminate #121

@furquanuddin94

Description

Agent pools can remain at capacity after the worker or steward session occupying the slot has already terminated. The agent is idle and has no active session, but the pool still reports the agent in activeAgentIds, which blocks new work until pool status is manually refreshed.

Steps to reproduce

  1. Create an agent pool with maxSize: 1.
  2. Let the dispatch daemon start a worker or steward session that occupies the pool slot.
  3. Let that session terminate or be cleaned up as dead.
  4. Check the pool status and agent/session status.

Expected behavior

When the worker or steward session terminates, the pool slot should be released automatically. The pool should report activeCount: 0, availableSlots: 1, and no active agent IDs when no governed worker/steward sessions are running.
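
To make the expected numbers concrete, here is a minimal sketch of deriving pool status purely from the governed sessions that are still running. The `RunningSession` shape, the `typeKey` field, and `computePoolStatus` are illustrative assumptions, not the project's actual API. Only the output field names are taken from the pool status JSON in this report.

```typescript
// Hypothetical input shape: one entry per running governed (worker/steward) session.
interface RunningSession {
  agentId: string;
  typeKey: string; // assumed to look like "worker:ephemeral"
}

// Field names mirror the "status" object shown in this report.
interface PoolStatus {
  activeCount: number;
  availableSlots: number;
  activeByType: Record<string, number>;
  activeAgentIds: string[];
}

// Derive the status from running sessions only, so a terminated
// session can never keep a slot occupied.
function computePoolStatus(maxSize: number, running: RunningSession[]): PoolStatus {
  const activeByType: Record<string, number> = {};
  const activeAgentIds: string[] = [];
  for (const session of running) {
    if (activeAgentIds.includes(session.agentId)) continue; // count each agent once
    activeAgentIds.push(session.agentId);
    activeByType[session.typeKey] = (activeByType[session.typeKey] ?? 0) + 1;
  }
  const activeCount = activeAgentIds.length;
  return {
    activeCount,
    availableSlots: Math.max(0, maxSize - activeCount),
    activeByType,
    activeAgentIds,
  };
}

// With maxSize: 1 and no running governed sessions, the slot is free.
const expectedStatus = computePoolStatus(1, []);
console.log(expectedStatus.activeCount, expectedStatus.availableSlots);
```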

Actual behavior

The pool can remain full even though the occupying agent has no active session.

Observed API state:

{
  "pool": {
    "id": "el-rmw5",
    "config": { "name": "test", "maxSize": 1, "agentTypes": [], "enabled": true },
    "status": {
      "activeCount": 1,
      "availableSlots": 0,
      "activeByType": { "worker:ephemeral": 1 },
      "activeAgentIds": ["el-5d6n"]
    }
  }
}

At the same time:

  • GET /api/agents/el-5d6n/status returned hasActiveSession: false and activeSession: null.
  • GET /api/sessions?status=running showed no running worker or steward sessions, only the director session.
  • The latest el-5d6n session was terminated with terminationReason: "Process no longer alive (PID check)".

Calling POST /api/pools/refresh while no governed sessions were running recomputed the status correctly. This suggests the persisted session state is correct and only the pool's runtime status/cache is stale.
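
The inconsistency can be detected mechanically by comparing the pool's activeAgentIds against each agent's hasActiveSession flag. The sketch below is a pure function over data already fetched from the endpoints above. The `AgentStatus` shape and `findStaleAgentIds` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape: the agentId plus the hasActiveSession flag
// returned by GET /api/agents/<id>/status.
interface AgentStatus {
  agentId: string;
  hasActiveSession: boolean;
}

// Return the agent IDs that the pool still counts as active even though
// the agent has no active session. Agents with no status entry at all
// are also treated as stale.
function findStaleAgentIds(activeAgentIds: string[], statuses: AgentStatus[]): string[] {
  const live = new Set(
    statuses.filter((s) => s.hasActiveSession).map((s) => s.agentId),
  );
  return activeAgentIds.filter((id) => !live.has(id));
}

// Values taken from the observed state in this report.
const staleIds = findStaleAgentIds(
  ["el-5d6n"],
  [{ agentId: "el-5d6n", hasActiveSession: false }],
);
console.log(staleIds);
```

A non-empty result means the pool is holding at least one slot for an agent that is no longer running anything.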

Operating system

macOS

Node version

20

Package version

1.25.0

Additional context

Code inspection found that AgentPoolService.onAgentSpawned() is called from dispatch paths, while AgentPoolService.onAgentSessionEnded() exists but does not appear to have call sites. Session cleanup paths such as dead-process cleanup mark agents idle and persist terminated session history, but the pool slot is not released automatically.
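
A possible shape for the fix is to have every session-end path notify the pool. The sketch below is a simplified stand-in, not the real AgentPoolService: `PoolSlotTracker` and `cleanUpDeadSession` are hypothetical, and only the method names onAgentSpawned and onAgentSessionEnded come from the code inspection above. Their real signatures may differ.

```typescript
// Simplified stand-in for the pool's slot bookkeeping (hypothetical class).
class PoolSlotTracker {
  private readonly maxSize: number;
  private readonly active = new Set<string>();

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Claim a slot; returns false when the pool is already at capacity.
  onAgentSpawned(agentId: string): boolean {
    if (this.active.has(agentId)) return true; // already holds a slot
    if (this.active.size >= this.maxSize) return false;
    this.active.add(agentId);
    return true;
  }

  // Release the slot; safe to call more than once for the same agent.
  onAgentSessionEnded(agentId: string): void {
    this.active.delete(agentId);
  }

  get availableSlots(): number {
    return this.maxSize - this.active.size;
  }
}

// Hypothetical cleanup path: after marking the agent idle and persisting
// the terminated session (elided here), it also releases the pool slot.
function cleanUpDeadSession(pool: PoolSlotTracker, agentId: string): void {
  pool.onAgentSessionEnded(agentId);
}

const tracker = new PoolSlotTracker(1);
tracker.onAgentSpawned("el-5d6n");
cleanUpDeadSession(tracker, "el-5d6n");
console.log(tracker.availableSlots);
```

Making the release idempotent means normal termination and dead-process cleanup can both call it without coordinating with each other.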
