Replace wall-clock limit with activity-based idle timeout by dcellison · Pull Request #207 · dcellison/kai

dcellison · 2026-03-28T11:41:41Z

Summary

The wall-clock timer in claude.py killed interactions after timeout_seconds * 5 (10 min at defaults) of total elapsed time, regardless of whether Claude was actively working. Two failures on Mar 27 - both healthy interactions doing multi-step research with continuous tool-use output:

ERROR Interaction exceeded wall-clock limit (616s > 600s)
ERROR Interaction exceeded wall-clock limit (765s > 600s)

This replaces the fixed wall-clock with an activity-based idle timeout that resets every time the process emits a line of output. Same duration, different measurement: "time since last output" instead of "time since start."

What changed

claude.py: interaction_start becomes last_activity, reset on every non-empty readline. The idle check fires only after prolonged silence, not after prolonged work.
bot.py: _LOCK_ACQUIRE_TIMEOUT increased from 660s to 3600s. With no wall-clock cap, the lock timeout needs to accommodate legitimately long interactions. The idle timer in claude.py is the real safety net.
Error message: "timed out (too long)" becomes "timed out (no output)" to reflect the new semantics.

Timeout behavior after this change

Scenario	Readline timeout (6 min)	Idle timeout (10 min since last output)
Dead process (no output)	Fires at 6 min	Fires at 10 min (redundant backup)
Stuck tool loop (events flowing)	Never fires	Never fires (events reset it)
Legitimate long task (events flowing)	Never fires	Never fires (events reset it)

The "stuck tool loop" row changes from "caught" to "not caught." This is an acceptable tradeoff: true infinite tool loops are extremely rare, the user can always /stop, and killing legitimate long work is a worse failure mode.

Implementation note

With the current multipliers (readline=3x, idle=5x), the idle timeout can never fire independently in practice - the readline timeout always fires first for a truly silent process (3x < 5x). The idle timer is a "belt and suspenders" backup that would only matter if the multipliers were changed or if there were an edge case where readline keeps barely succeeding. The test_idle_timeout test uses a mocked time.monotonic in the kai.claude namespace to force the idle check to fire, which is the correct approach for verifying an error path that is dominated by another timeout in real execution.

Test plan

test_idle_timeout - mocked time forces idle check to fire, verifies error path and "no output" message
test_idle_timeout_normal_completion_unaffected - fast interaction completes without hitting idle timer
test_active_process_survives_past_old_wall_clock - core behavioral test: active process emits output for ~8s (past old 5s limit), completes successfully
test_readline_timeout (existing, unchanged) - fully dead process caught by per-readline timeout
test_timeout_kills_claude (bot.py, comment-only update) - lock timeout still works
Full suite: 1213 tests passing

Fixes #206

The wall-clock timer killed interactions after timeout_seconds * 5 of total elapsed time, regardless of whether Claude was actively working. This caused healthy long-running tasks (multi-step research, detailed planning) to be killed mid-work. Replace with an idle timeout that resets every time the process emits a line of output. Same duration (10 min at defaults), different measurement: time since last output instead of time since start. Increase bot lock acquire timeout from 660s to 3600s since interactions can now legitimately run much longer than 10 minutes. Fixes #206

dcellison · 2026-03-28T11:42:57Z

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

The core logic is sound and the motivation is well-documented. A few findings:

Warning: Infinite tool loops now run uncapped for up to 1 hour

bot.py, _LOCK_ACQUIRE_TIMEOUT change

The PR explicitly acknowledges this tradeoff, but it's worth surfacing in review: a process stuck in a tool loop emitting continuous output was previously killed after 10 minutes. It now runs until the 1-hour lock timeout fires. That's a 6× regression in worst-case blast radius for that failure mode. The PR's reasoning ("legitimate long tasks are more common") is defensible, but there's no intermediate backstop between "idle 10 min" and "lock 60 min" for the active-but-stuck case.

Warning: `fake_monotonic` is fragile — tied to exact call count

tests/test_claude.py, test_idle_timeout, lines ~880–895

def fake_monotonic():
    call_count[0] += 1
    if call_count[0] <= 5:
        return call_count[0] * 0.1
    return 100.0

The threshold of <= 5 is load-bearing but implicit. It depends on exactly how many times time.monotonic() is called in kai.claude before the idle check fires on iteration 3: 1 (init) + 2 (check + reset, iter 1) + 2 (check + reset, iter 2) = 5. Any refactor that adds or removes a time.monotonic() call in the hot path silently breaks this test — it would either never fire (if count threshold is too low) or fire too early. A more robust approach would key the jump off readline_count rather than an opaque call count.

Suggestion: `test_active_process_survives_past_old_wall_clock` is slow by design (~8s)

tests/test_claude.py, ~line 965

20 iterations × 0.4s = 8 real seconds. This is inherent to testing real-time behavior and there's no better way to do it without mocking, but it's worth flagging for CI budget awareness. The comment says "total time ~8s, well past the 5s limit" — accurate, but the test itself doesn't assert on timing, so it would still pass if someone changed asyncio.sleep(0.4) to 0.04.

Clean

The last_activity reset placement (after readline, only on if line) is correct.
interaction_start being renamed to last_activity throughout is consistent.
Error message update ("too long" → "no output") accurately reflects semantics.
The test_idle_timeout_normal_completion_unaffected and test_readline_timeout tests are unaffected, providing good regression coverage.
No security issues.

Use mocked time.monotonic for both test_idle_timeout and test_active_process_survives_past_old_wall_clock. No test should ever wait for a real timeout to fire.

dcellison · 2026-03-28T12:04:17Z

Review by Kai

The prior review's findings are still open (fragile mono_call threshold, infinite tool loop blast radius), but neither has materially changed. One new finding:

Suggestion: `readline_with_output` silently absorbs "unreachable" path

tests/test_claude.py, test_idle_timeout, ~line 908

    # Should not reach here - idle timeout fires first
    return _assistant_event("unreachable")

If the call count threshold in fake_monotonic drifts (the already-flagged fragility) and the idle timer stops firing before readline 3, this branch executes and the test continues — returning a valid event instead of failing. The comment signals intent but provides no enforcement. A defensive assertion would make the failure visible:

    pytest.fail("readline reached call 3 — idle timeout did not fire as expected")

Clean

last_activity reset placement in claude.py:618 is correct: fires on if line:, before the if not line: EOF branch, and the two are mutually exclusive — no logic error.
advancing_monotonic in test_active_process_survives_past_old_wall_clock correctly shares sim_time with active_readline. Time advances only during readline calls, so consecutive time.monotonic() calls within the same iteration return the same value — this is correct and the idle timer never fires spuriously.
test_active_process_survives_past_old_wall_clock now uses mocked time rather than asyncio.sleep(0.4) — the prior review's CI budget concern is resolved.
Error message change ("too long" → "no output") and bot.py comment update are accurate.
No security issues.

dcellison · 2026-03-28T12:08:48Z

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Clean

The pytest.fail fix requested in the previous review is properly implemented in readline_with_output (tests/test_claude.py, readline 3 branch). If fake_monotonic drifts, the test now fails loudly instead of silently passing with a valid event.

The advancing_monotonic approach in test_active_process_survives_past_old_wall_clock is sound. sim_time[0] advances only inside active_readline, so at each loop iteration the idle check sees sim_time[0] - last_activity == 0.0 — the timer never fires spuriously. The side_effect=advancing_monotonic wiring is correct (time.monotonic takes no args, neither does the mock function).

last_activity reset placement in claude.py:618 is correct: fires on if line:, mutually exclusive with the if not line: EOF branch, not reachable from the asyncio.TimeoutError path.

Remaining open items (from prior reviews, no material change)

mono_call[0] <= 5 threshold — still tied to exact call count in kai.claude. The added comment documenting the call sequence mitigates the maintenance risk but doesn't eliminate it.
Active-but-stuck blast radius — still 6× larger (10 min → 60 min). The acknowledged tradeoff.

Both were raised previously and the author has made their decisions. No new issues found.

…naffected

dcellison · 2026-03-28T12:12:58Z

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Clean. No new issues found.

Verification of open items from prior reviews

mono_call[0] <= 5 fragility — still present, now has a doc comment explaining the exact call sequence. No material change; author's accepted tradeoff.

Active-but-stuck blast radius — still 6× larger (10 min → 60 min). No material change; author's accepted tradeoff.

New code reviewed

claude.py:618 — last_activity reset placement

if line: / if not line: are mutually exclusive. Reset fires only on non-empty output, not on EOF, not on asyncio.TimeoutError. Correct.

test_idle_timeout — fake_monotonic + readline_with_output

Call sequence matches the documented pattern (1 init + 2 check/reset per active iteration = 5 before the jump). pytest.fail on readline 3 is in place. side_effect=fake_monotonic wiring is correct — mock passes no args to the callable, matching def fake_monotonic():.

test_active_process_survives_past_old_wall_clock

advancing_monotonic reads sim_time[0] which only advances inside active_readline, so the idle check always sees sim_time[0] - last_activity == 0.0 at each iteration. The timer never fires spuriously. After call 21 returns _result_event, sim_time[0] stays at 10.0 (no += 0.5 for the terminal event), last_activity is reset to 10.0, and the loop exits normally. Correct.

bot.py comment — accurately describes the new semantics. No logic change beyond the constant.

Fix tests: mock time instead of using real sleeps

110848c

Use mocked time.monotonic for both test_idle_timeout and test_active_process_survives_past_old_wall_clock. No test should ever wait for a real timeout to fire.

Replace unreachable return with pytest.fail for drift detection

8bb640e

Fix stale wall-clock comment in test_idle_timeout_normal_completion_u…

1b02ccc

…naffected

dcellison merged commit 86cc574 into main Mar 28, 2026
1 check passed

dcellison deleted the fix/activity-idle-timeout branch March 28, 2026 12:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace wall-clock limit with activity-based idle timeout#207

Replace wall-clock limit with activity-based idle timeout#207
dcellison merged 4 commits intomainfrom
fix/activity-idle-timeout

dcellison commented Mar 28, 2026

Uh oh!

dcellison commented Mar 28, 2026

Uh oh!

dcellison commented Mar 28, 2026

Uh oh!

dcellison commented Mar 28, 2026

Uh oh!

dcellison commented Mar 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dcellison commented Mar 28, 2026

Summary

What changed

Timeout behavior after this change

Implementation note

Test plan

Uh oh!

dcellison commented Mar 28, 2026

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Warning: Infinite tool loops now run uncapped for up to 1 hour

Warning: fake_monotonic is fragile — tied to exact call count

Suggestion: test_active_process_survives_past_old_wall_clock is slow by design (~8s)

Clean

Uh oh!

dcellison commented Mar 28, 2026

Review by Kai

Suggestion: readline_with_output silently absorbs "unreachable" path

Clean

Uh oh!

dcellison commented Mar 28, 2026

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Clean

Remaining open items (from prior reviews, no material change)

Uh oh!

dcellison commented Mar 28, 2026

Review by Kai

PR Review: Replace wall-clock limit with activity-based idle timeout

Verification of open items from prior reviews

New code reviewed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Warning: `fake_monotonic` is fragile — tied to exact call count

Suggestion: `test_active_process_survives_past_old_wall_clock` is slow by design (~8s)

Suggestion: `readline_with_output` silently absorbs "unreachable" path