
Cascading Fallback

Problem

Depending on a single LLM provider means a single point of failure. API outages, rate limits, billing issues, or model deprecations take your entire agent system offline.

Even without failures, using the most expensive model for every task wastes money. A simple "restart this service" doesn't need the same model as "architect a new feature."

Solution

Configure a chain of models from premium to economical to local. Route tasks to the cheapest model that can handle them, and fall back down the chain on failures.

Premium API (Claude Opus, GPT-4)
  ↓ (on failure or for simpler tasks)
Mid-tier API (Claude Sonnet, GPT-4o-mini)
  ↓ (on failure)
Local Model (Qwen 8B, Llama 8B)

Implementation

1. Define the Chain

fallback_chain:
  - provider: anthropic
    model: claude-opus-4
    use_for: complex reasoning, architecture, code review
  - provider: anthropic
    model: claude-sonnet-4
    use_for: routine tasks, simple fixes
  - provider: local
    model: qwen3-8b
    use_for: last resort, offline operation
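The chain above can be mirrored in code as an ordered list of tiers. A minimal sketch, assuming a Python host; the `ModelTier` type and constant names are illustrative, not part of any provider SDK:

```python
from dataclasses import dataclass

@dataclass
class ModelTier:
    provider: str
    model: str
    use_for: str

# Ordered from premium to economical, mirroring the YAML chain above.
FALLBACK_CHAIN = [
    ModelTier("anthropic", "claude-opus-4", "complex reasoning, architecture, code review"),
    ModelTier("anthropic", "claude-sonnet-4", "routine tasks, simple fixes"),
    ModelTier("local", "qwen3-8b", "last resort, offline operation"),
]
```

Keeping the chain as plain data makes it easy to load from the YAML config and to iterate over during failover.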

2. Automatic Failover

When the primary model returns an error (429, 500, timeout), automatically retry with the next model in the chain:

Request → Model A → 429 Rate Limited
                  → Model B → Success ✓
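The failover loop itself is small. A sketch, assuming providers are passed in as callables and signal failures with a hypothetical `ProviderError` carrying an HTTP status; only retryable errors fall through to the next model:

```python
RETRYABLE = {429, 500, 503}  # rate limits and server errors; timeouts map here too

class ProviderError(Exception):
    """Illustrative error type; real SDKs raise their own exception classes."""
    def __init__(self, status: int):
        super().__init__(f"provider error {status}")
        self.status = status

def complete_with_fallback(prompt: str, providers: list) -> str:
    """Try each provider in chain order; fall through on retryable errors."""
    last_err = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderError as e:
            if e.status not in RETRYABLE:
                raise  # non-retryable (e.g. 401 auth failure) should surface immediately
            last_err = e
    raise RuntimeError("all providers in chain failed") from last_err
```

Note that non-retryable errors (bad API key, malformed request) are re-raised rather than masked by a fallback, since retrying them on a cheaper model would only hide a configuration bug.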

3. Task-Based Routing

Not every task needs the top model. Route by complexity:

Task Type                    Suggested Tier
Architecture decisions       Premium
Code review                  Premium
Debugging from stack trace   Premium
Simple service restart       Mid-tier
Log grep and summary         Mid-tier
Status checks                Mid-tier or Local
Heartbeat polls              Mid-tier or Local
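The routing table can be expressed as a simple lookup with a safe default. A sketch; the task-type keys and tier names are hypothetical labels, and tasks the table marks "Mid-tier or Local" are routed to local here as the cheapest adequate option:

```python
# Cheapest adequate tier per task type; keys are illustrative.
TIER_BY_TASK = {
    "architecture": "premium",
    "code_review": "premium",
    "debug_stack_trace": "premium",
    "service_restart": "mid",
    "log_summary": "mid",
    "status_check": "local",
    "heartbeat": "local",
}

def route(task_type: str) -> str:
    # Unknown task types default to premium rather than risk a weak model.
    return TIER_BY_TASK.get(task_type, "premium")
```

Defaulting unknown tasks upward trades cost for safety; the opposite default would quietly send novel, possibly complex tasks to the weakest model.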

4. Monitor Fallback Frequency

Track how often each fallback level is triggered. High fallback rates indicate:

  • Primary provider reliability issues
  • Budget constraints hitting rate limits
  • Tasks being over-routed to the premium tier
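A per-edge counter is enough to surface these signals. A minimal sketch, assuming you instrument the failover loop to call `record_fallback` whenever a request falls through to the next model:

```python
from collections import Counter

fallback_counts: Counter = Counter()

def record_fallback(from_model: str, to_model: str) -> None:
    """Count each fallback edge in the chain (e.g. opus -> sonnet)."""
    fallback_counts[(from_model, to_model)] += 1

def fallback_rates(total_requests: int) -> dict:
    """Share of all requests that traversed each fallback edge."""
    return {edge: n / total_requests for edge, n in fallback_counts.items()}
```

A sustained rise in one edge's rate points at the failure modes listed above: the upstream provider is flaky, you are hitting rate limits, or too many tasks are being routed to the premium tier in the first place.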

Trade-offs

  • Quality variance: Lower-tier models produce lower-quality output. Tasks routed to fallback models may need human review.
  • Complexity: Managing multiple providers means multiple API keys, billing, and configuration.
  • Latency: Failover adds latency (failed request timeout + retry).

When to Skip

  • You have a single provider with excellent uptime and budget isn't a concern.
  • All tasks genuinely require the premium model's capabilities.
  • You're in a regulated environment where model consistency matters more than availability.

Related Patterns