fix: Graceful 429 rate-limit retry with countdown #8

Open
vrinek wants to merge 3 commits into ShinMegamiBoson:main from vrinek:fix/429-rate-limit-retry

vrinek commented Feb 20, 2026

Summary

  • When the API returns HTTP 429, the app now retries up to 5 times, honoring the Retry-After header, instead of crashing
  • A visible per-second countdown ([d0/s6] Rate limited (attempt 1/5). Retrying in 20s...) appears in the TUI trace so the user knows what's happening
  • Both _http_stream_sse (streaming LLM calls) and _http_json (model listing) are covered
  • Non-429 HTTP errors still raise immediately; connection-timeout retries remain independent
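The retry behavior summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/model.py: the `RateLimited` exception, the `DEFAULT_WAIT_S` fallback, the injected `sleep` parameter, and the function names without leading underscores are all assumptions made for the example. It also only handles the delta-seconds form of Retry-After; an HTTP-date value falls back to the default wait.

```python
import math
import time
from typing import Callable, Optional

MAX_RETRIES = 5        # retry count stated in the summary
RETRY_CAP_S = 120.0    # per-retry cap mentioned under known limitations
DEFAULT_WAIT_S = 5.0   # assumption: fallback when the header is missing or unparsable


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

    def __init__(self, retry_after: Optional[str]):
        super().__init__("HTTP 429")
        self.retry_after = retry_after


def parse_retry_after(value: Optional[str]) -> float:
    """Parse a delta-seconds Retry-After value, clamped to [0, RETRY_CAP_S]."""
    try:
        seconds = float(value) if value is not None else DEFAULT_WAIT_S
    except ValueError:
        seconds = DEFAULT_WAIT_S
    return max(0.0, min(seconds, RETRY_CAP_S))


def sleep_with_countdown(
    seconds: float,
    attempt: int,
    on_retry: Optional[Callable[[str], None]],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Sleep in one-second ticks, reporting the remaining time before each tick."""
    for remaining in range(math.ceil(seconds), 0, -1):
        if on_retry is not None:
            on_retry(
                f"Rate limited (attempt {attempt}/{MAX_RETRIES}). "
                f"Retrying in {remaining}s..."
            )
        sleep(1.0)


def call_with_429_retry(
    request: Callable[[], object],
    on_retry: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Run request(); on a 429, wait and retry up to MAX_RETRIES times."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return request()
        except RateLimited as exc:
            # Only 429s are caught here; any other error raises immediately.
            wait = parse_retry_after(exc.retry_after)
            sleep_with_countdown(wait, attempt, on_retry, sleep)
    # Final attempt after the last wait; a 429 here propagates to the caller.
    return request()
```

Passing `sleep` as a parameter is a choice made for this sketch so the loop can be exercised in tests without real delays.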

Changes

| File | What changed |
| --- | --- |
| agent/model.py | Added _parse_retry_after(), _notify_retry(), _sleep_with_countdown() helpers. Added outer 429-retry loop to _http_stream_sse() and _http_json(). Added on_retry field to both model classes. Error bodies truncated to 8KB. |
| agent/engine.py | Wire model.on_retry at all depths (not depth-gated like on_content_delta). Clear in finally block. Messages include [d/s] prefix. |
| tests/test_rate_limit.py | 22 new tests: helpers, transport 429 logic, engine integration. |
| tests/conftest.py | Updated mock helper signatures for new params. |
| tests/test_model_complex.py | Updated fake_stream_sse signatures for new params. |
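The engine-side wiring described for agent/engine.py might look something like the sketch below. Only the `on_retry` field, the finally-block cleanup, and the `[d/s]` prefix come from this PR; `FakeModel`, `run_step`, and the parameter names are hypothetical stand-ins used to keep the example self-contained.

```python
from typing import Callable, Optional


class FakeModel:
    """Hypothetical stand-in for the model classes; only on_retry is from the PR."""

    def __init__(self) -> None:
        self.on_retry: Optional[Callable[[str], None]] = None

    def complete(self, prompt: str) -> str:
        # Simulate the transport layer reporting one rate-limit wait.
        if self.on_retry is not None:
            self.on_retry("Rate limited (attempt 1/5). Retrying in 20s...")
        return f"response to {prompt}"


def run_step(
    model: FakeModel,
    prompt: str,
    depth: int,
    step: int,
    on_event: Callable[[str], None],
) -> str:
    """Wire on_retry for one call, prefix messages, and always clear the hook."""
    prefix = f"[d{depth}/s{step}]"
    # Wired at every depth, unlike a depth-gated callback.
    model.on_retry = lambda msg: on_event(f"{prefix} {msg}")
    try:
        return model.complete(prompt)
    finally:
        # Cleared even if complete() raises, so no stale callback lingers.
        model.on_retry = None
```

Clearing the hook in `finally` means a later call made from a different depth or step cannot emit messages carrying an outdated prefix.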

Test plan

  • 22 new tests covering all acceptance criteria
  • Full suite: 436 passed, 22 skipped, 0 failed
  • Manual verification: confirmed countdown appears in TUI during real 429 from Anthropic API

Known limitations (v1)

  • Parallel workers retry independently (no jitter or coordination)
  • _ThinkingDisplay spinner continues alongside countdown messages
  • HTTP 529 (overloaded_error) not handled (architecture supports trivial addition)
  • Worst-case hang: 5 retries × 120s cap ≈ 10 min per call (no total wall-clock timeout)
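The worst-case figure above is plain arithmetic, and the first and last limitations could plausibly be addressed together in a follow-up. The sketch below is not part of this PR: `jittered_wait`, `jitter_fraction`, and the wall-clock budget parameter are hypothetical names showing one possible approach, in which each wait is stretched by a random fraction and retries stop once the wait would exceed a remaining time budget.

```python
import random
from typing import Callable, Optional

MAX_RETRIES = 5      # from the PR summary
RETRY_CAP_S = 120.0  # per-retry cap from the limitation above


def worst_case_wait_s(retries: int = MAX_RETRIES, cap: float = RETRY_CAP_S) -> float:
    """Total sleep time if every retry waits for the full cap."""
    return retries * cap


def jittered_wait(
    retry_after_s: float,
    budget_left_s: float,
    rng: Callable[[], float] = random.random,
    jitter_fraction: float = 0.25,
) -> Optional[float]:
    """Hypothetical: stretch the capped wait by up to jitter_fraction.

    Returns None when the wait would exceed the remaining wall-clock budget,
    signalling that the caller should give up instead of sleeping.
    """
    wait = min(retry_after_s, RETRY_CAP_S) * (1.0 + jitter_fraction * rng())
    return wait if wait <= budget_left_s else None


print(worst_case_wait_s() / 60)  # 10.0 minutes of sleep, excluding request time
```

Jitter that only lengthens the wait never retries earlier than the server asked, which keeps the Retry-After contract intact while spreading parallel workers apart.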

When the Anthropic API returns HTTP 429, the app now retries up to 5
times honoring the Retry-After header, with a visible per-second
countdown via on_retry callback. Both _http_stream_sse (streaming) and
_http_json (model listing) are covered. Non-429 errors still raise
immediately. Connection-timeout retries remain independent.

Verify that 429 retry messages reach the engine's on_event callback
end-to-end, and that model.on_retry is cleared after complete() returns.

Retry countdown messages now include the [d0/s6] prefix matching
all other engine trace lines, for consistent TUI output.