Skip to content

Replace wall-clock limit with activity-based idle timeout#207

Merged
dcellison merged 4 commits intomainfrom
fix/activity-idle-timeout
Mar 28, 2026
Merged

Replace wall-clock limit with activity-based idle timeout#207
dcellison merged 4 commits intomainfrom
fix/activity-idle-timeout

Conversation

@dcellison
Copy link
Copy Markdown
Owner

Summary

The wall-clock timer in claude.py killed interactions after timeout_seconds * 5 (10 min at defaults) of total elapsed time, regardless of whether Claude was actively working. Two failures on Mar 27 - both healthy interactions doing multi-step research with continuous tool-use output:

ERROR Interaction exceeded wall-clock limit (616s > 600s)
ERROR Interaction exceeded wall-clock limit (765s > 600s)

This replaces the fixed wall-clock with an activity-based idle timeout that resets every time the process emits a line of output. Same duration, different measurement: "time since last output" instead of "time since start."

What changed

  • claude.py: interaction_start becomes last_activity, reset on every non-empty readline. The idle check fires only after prolonged silence, not after prolonged work.
  • bot.py: _LOCK_ACQUIRE_TIMEOUT increased from 660s to 3600s. With no wall-clock cap, the lock timeout needs to accommodate legitimately long interactions. The idle timer in claude.py is the real safety net.
  • Error message: "timed out (too long)" becomes "timed out (no output)" to reflect the new semantics.

Timeout behavior after this change

Scenario Readline timeout (6 min) Idle timeout (10 min since last output)
Dead process (no output) Fires at 6 min Fires at 10 min (redundant backup)
Stuck tool loop (events flowing) Never fires Never fires (events reset it)
Legitimate long task (events flowing) Never fires Never fires (events reset it)

The "stuck tool loop" row changes from "caught" to "not caught." This is an acceptable tradeoff: true infinite tool loops are extremely rare, the user can always /stop, and killing legitimate long work is a worse failure mode.

Implementation note

With the current multipliers (readline=3x, idle=5x), the idle timeout can never fire independently in practice - the readline timeout always fires first for a truly silent process (3x < 5x). The idle timer is a "belt and suspenders" backup that would only matter if the multipliers were changed or if there were an edge case where readline keeps barely succeeding. The test_idle_timeout test uses a mocked time.monotonic in the kai.claude namespace to force the idle check to fire, which is the correct approach for verifying an error path that is dominated by another timeout in real execution.

Test plan

  • test_idle_timeout - mocked time forces idle check to fire, verifies error path and "no output" message
  • test_idle_timeout_normal_completion_unaffected - fast interaction completes without hitting idle timer
  • test_active_process_survives_past_old_wall_clock - core behavioral test: active process emits output for ~8s (past old 5s limit), completes successfully
  • test_readline_timeout (existing, unchanged) - fully dead process caught by per-readline timeout
  • test_timeout_kills_claude (bot.py, comment-only update) - lock timeout still works
  • Full suite: 1213 tests passing

Fixes #206

The wall-clock timer killed interactions after timeout_seconds * 5 of
total elapsed time, regardless of whether Claude was actively working.
This caused healthy long-running tasks (multi-step research, detailed
planning) to be killed mid-work.

Replace with an idle timeout that resets every time the process emits
a line of output. Same duration (10 min at defaults), different
measurement: time since last output instead of time since start.

Increase bot lock acquire timeout from 660s to 3600s since interactions
can now legitimately run much longer than 10 minutes.

Fixes #206
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

The core logic is sound and the motivation is well-documented. A few findings:


Warning: Infinite tool loops now run uncapped for up to 1 hour

bot.py, _LOCK_ACQUIRE_TIMEOUT change

The PR explicitly acknowledges this tradeoff, but it's worth surfacing in review: a process stuck in a tool loop emitting continuous output was previously killed after 10 minutes. It now runs until the 1-hour lock timeout fires. That's a 6× regression in worst-case blast radius for that failure mode. The PR's reasoning ("legitimate long tasks are more common") is defensible, but there's no intermediate backstop between "idle 10 min" and "lock 60 min" for the active-but-stuck case.


Warning: fake_monotonic is fragile — tied to exact call count

tests/test_claude.py, test_idle_timeout, lines ~880–895

def fake_monotonic():
    call_count[0] += 1
    if call_count[0] <= 5:
        return call_count[0] * 0.1
    return 100.0

The threshold of <= 5 is load-bearing but implicit. It depends on exactly how many times time.monotonic() is called in kai.claude before the idle check fires on iteration 3: 1 (init) + 2 (check + reset, iter 1) + 2 (check + reset, iter 2) = 5. Any refactor that adds or removes a time.monotonic() call in the hot path silently breaks this test — it would either never fire (if count threshold is too low) or fire too early. A more robust approach would key the jump off readline_count rather than an opaque call count.


Suggestion: test_active_process_survives_past_old_wall_clock is slow by design (~8s)

tests/test_claude.py, ~line 965

20 iterations × 0.4s = 8 real seconds. This is inherent to testing real-time behavior and there's no better way to do it without mocking, but it's worth flagging for CI budget awareness. The comment says "total time ~8s, well past the 5s limit" — accurate, but the test itself doesn't assert on timing, so it would still pass if someone changed asyncio.sleep(0.4) to 0.04.


Clean

  • The last_activity reset placement (after readline, only on if line) is correct.
  • interaction_start being renamed to last_activity throughout is consistent.
  • Error message update ("too long" → "no output") accurately reflects semantics.
  • The test_idle_timeout_normal_completion_unaffected and test_readline_timeout tests are unaffected, providing good regression coverage.
  • No security issues.

Use mocked time.monotonic for both test_idle_timeout and
test_active_process_survives_past_old_wall_clock. No test should
ever wait for a real timeout to fire.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

The prior review's findings are still open (fragile mono_call threshold, infinite tool loop blast radius), but neither has materially changed. One new finding:


Suggestion: readline_with_output silently absorbs "unreachable" path

tests/test_claude.py, test_idle_timeout, ~line 908

    # Should not reach here - idle timeout fires first
    return _assistant_event("unreachable")

If the call count threshold in fake_monotonic drifts (the already-flagged fragility) and the idle timer stops firing before readline 3, this branch executes and the test continues — returning a valid event instead of failing. The comment signals intent but provides no enforcement. A defensive assertion would make the failure visible:

    pytest.fail("readline reached call 3 — idle timeout did not fire as expected")

Clean

  • last_activity reset placement in claude.py:618 is correct: fires on if line:, before the if not line: EOF branch, and the two are mutually exclusive — no logic error.
  • advancing_monotonic in test_active_process_survives_past_old_wall_clock correctly shares sim_time with active_readline. Time advances only during readline calls, so consecutive time.monotonic() calls within the same iteration return the same value — this is correct and the idle timer never fires spuriously.
  • test_active_process_survives_past_old_wall_clock now uses mocked time rather than asyncio.sleep(0.4) — the prior review's CI budget concern is resolved.
  • Error message change ("too long""no output") and bot.py comment update are accurate.
  • No security issues.

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Clean

The pytest.fail fix requested in the previous review is properly implemented in readline_with_output (tests/test_claude.py, readline 3 branch). If fake_monotonic drifts, the test now fails loudly instead of silently passing with a valid event.

The advancing_monotonic approach in test_active_process_survives_past_old_wall_clock is sound. sim_time[0] advances only inside active_readline, so at each loop iteration the idle check sees sim_time[0] - last_activity == 0.0 — the timer never fires spuriously. The side_effect=advancing_monotonic wiring is correct (time.monotonic takes no args, neither does the mock function).

last_activity reset placement in claude.py:618 is correct: fires on if line:, mutually exclusive with the if not line: EOF branch, not reachable from the asyncio.TimeoutError path.


Remaining open items (from prior reviews, no material change)

  • mono_call[0] <= 5 threshold — still tied to exact call count in kai.claude. The added comment documenting the call sequence mitigates the maintenance risk but doesn't eliminate it.
  • Active-but-stuck blast radius — still 6× larger (10 min → 60 min). The acknowledged tradeoff.

Both were raised previously and the author has made their decisions. No new issues found.

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Clean. No new issues found.


Verification of open items from prior reviews

mono_call[0] <= 5 fragility — still present, now has a doc comment explaining the exact call sequence. No material change; author's accepted tradeoff.

Active-but-stuck blast radius — still 6× larger (10 min → 60 min). No material change; author's accepted tradeoff.


New code reviewed

claude.py:618last_activity reset placement

if line: / if not line: are mutually exclusive. Reset fires only on non-empty output, not on EOF, not on asyncio.TimeoutError. Correct.

test_idle_timeoutfake_monotonic + readline_with_output

Call sequence matches the documented pattern (1 init + 2 check/reset per active iteration = 5 before the jump). pytest.fail on readline 3 is in place. side_effect=fake_monotonic wiring is correct — mock passes no args to the callable, matching def fake_monotonic():.

test_active_process_survives_past_old_wall_clock

advancing_monotonic reads sim_time[0] which only advances inside active_readline, so the idle check always sees sim_time[0] - last_activity == 0.0 at each iteration. The timer never fires spuriously. After call 21 returns _result_event, sim_time[0] stays at 10.0 (no += 0.5 for the terminal event), last_activity is reset to 10.0, and the loop exits normally. Correct.

bot.py comment — accurately describes the new semantics. No logic change beyond the constant.

@dcellison dcellison merged commit 86cc574 into main Mar 28, 2026
1 check passed
@dcellison dcellison deleted the fix/activity-idle-timeout branch March 28, 2026 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace wall-clock limit with activity-based idle timeout

2 participants