[[fallback_providers]] in config.toml not used as runtime fallback when primary provider goes offline #1003

@michael1coding

Description

When a local provider (e.g. LM Studio) is configured as [default_model] and one or more
[[fallback_providers]] (e.g. Gemini, OpenRouter) are defined in config.toml, agents do not fail over to the
cloud fallbacks when the local provider goes offline. Instead, they keep retrying the offline endpoint until
the request fails.

Expected Behavior

The agent fails over to Gemini and responds successfully.

Actual Behavior

The agent keeps retrying LM Studio, then returns an error. Gemini is never called.

Steps to Reproduce

  1. Configure config.toml with a local LM Studio instance as [default_model] and Gemini as
     [[fallback_providers]]:

         [default_model]
         provider = "lmstudio"
         model = "google/gemma-4-26b-a4b"
         base_url = "http://192.168.1.1:1234/v1"

         [[fallback_providers]]
         provider = "gemini"
         model = "gemini-2.5-pro"
         api_key_env = "GEMINI_API_KEY"

  2. Start the daemon and verify that agents respond normally via LM Studio.
  3. Shut down LM Studio.
  4. Send a message to any agent.

OpenFang Version

0.5.5

Operating System

Linux (x86_64)

Logs / Screenshots

Root Cause

[[fallback_providers]] are added to self.default_driver at boot time, which is a FallbackDriver chain.
However, resolve_driver() in kernel.rs creates a fresh primary driver for each agent call (an HTTP client
that always succeeds at creation time), and only returns default_driver if this fresh creation fails — which
never happens for HTTP-based providers like LM Studio.

At runtime, when LM Studio is unreachable, the error is classified as Timeout / is_retryable = true, causing
retries of the same provider. The fallback chain in default_driver is never reached.

Specifically, in kernel.rs resolve_driver():

// This branch is never taken for HTTP providers: create_driver() always succeeds
Err(e) => {
    if agent_provider == default_provider ... {
        Arc::clone(&self.default_driver) // ← FallbackDriver with Gemini
    }
}
// ...
Ok(primary) // ← always returns bare LM Studio driver, no fallbacks wired in
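
To make the failure mode concrete, here is a minimal, self-contained model of the pre-fix logic. It is only a sketch: the Driver trait, HttpDriver, and the simplified create_driver()/resolve_driver() signatures are hypothetical stand-ins and not OpenFang's real types. The point it illustrates is that building an HTTP-style driver does no network I/O, so the Err arm that would hand back default_driver is never reached, and the failure only appears later, at call time.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a provider driver; not OpenFang's real trait.
trait Driver {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Models an HTTP-backed driver: constructing it performs no network I/O.
struct HttpDriver {
    /// Stands in for "is the endpoint up right now?" (assumption for the demo).
    reachable: bool,
}

impl Driver for HttpDriver {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.reachable {
            Ok(format!("local: {prompt}"))
        } else {
            // The outage is only visible here, when a request is actually sent.
            Err("timeout".to_string())
        }
    }
}

/// Construction only stores settings, so it returns Ok even if the server is down.
fn create_driver(reachable: bool) -> Result<Arc<dyn Driver>, String> {
    Ok(Arc::new(HttpDriver { reachable }))
}

/// Same shape as the pre-fix logic: fall back only when creation fails.
fn resolve_driver(reachable: bool, default_driver: Arc<dyn Driver>) -> Arc<dyn Driver> {
    match create_driver(reachable) {
        Ok(primary) => primary,       // always taken in this model
        Err(_) => default_driver,     // never taken for HTTP-style drivers
    }
}
```

In this model, resolving a driver for an unreachable endpoint still returns the bare primary, and its complete() call is what fails, which matches the behavior described above.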

Fix

When an agent uses the default provider with no custom overrides and no per-agent [[fallback_models]], wire
the global [[fallback_providers]] into the agent's driver chain at resolution time, so any runtime failure
(not just init failure) triggers the fallback.

Added after the existing fallback_models block in resolve_driver():

let uses_global_defaults = agent_provider == default_provider
    && !has_custom_key
    && !has_custom_url
    && !self.config.fallback_providers.is_empty()
    && !Arc::ptr_eq(&primary, &self.default_driver);
if uses_global_defaults {
    let mut chain = vec![(primary, String::new())];
    for fb in &self.config.fallback_providers {
        if fb.provider == *agent_provider { continue; } // skip duplicate primary
        // ... create driver, push to chain
    }
    if chain.len() > 1 {
        return Ok(Arc::new(FallbackDriver::with_models(chain)));
    }
}
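
For readers unfamiliar with the pattern, the following sketch shows why wrapping the primary in a chain turns a runtime failure into a failover. It assumes the chain simply tries each driver in order and returns the first success; FallbackChain, Fixed, and the Driver trait here are hypothetical illustrations, and the real FallbackDriver may differ (for example in how it handles retries or error classification).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a provider driver; not OpenFang's real trait.
trait Driver {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Test double that always returns the same result.
struct Fixed(Result<String, String>);

impl Driver for Fixed {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        self.0.clone()
    }
}

/// Illustrative chain: tries each driver in order, returns the first success.
struct FallbackChain {
    drivers: Vec<Arc<dyn Driver>>,
}

impl Driver for FallbackChain {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        let mut last_err = "empty chain".to_string();
        for driver in &self.drivers {
            match driver.complete(prompt) {
                Ok(reply) => return Ok(reply),
                // Remember the error and move on to the next driver.
                Err(e) => last_err = e,
            }
        }
        // Every driver failed: surface the last error seen.
        Err(last_err)
    }
}
```

With an offline primary followed by a working fallback in drivers, complete() returns the fallback's reply instead of the primary's timeout.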

Notes

  • Agents with explicit [[fallback_models]] in their manifest are unaffected — those already create a
    FallbackDriver correctly.
  • The duplicate-provider filter (fb.provider == *agent_provider) avoids adding a second unreachable LM Studio
    entry to the chain, which would otherwise happen when [[fallback_providers]] includes the same provider as
    [default_model] (a common configuration pattern).
  • The Arc::ptr_eq guard prevents double-wrapping in the rare case where fresh driver creation failed and
    primary is already default_driver.
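
The duplicate-provider filter and the chain.len() > 1 check can be exercised in isolation. The sketch below is an assumption-laden simplification: FallbackProvider and build_chain() are hypothetical names, and the chain holds provider names rather than real drivers, so it only demonstrates the ordering and filtering behavior described in the notes above.

```rust
/// Hypothetical mirror of one [[fallback_providers]] entry (fields trimmed).
struct FallbackProvider {
    provider: String,
}

/// Builds the provider order for a chain, or None when no distinct
/// fallback remains and wrapping the primary would be pointless.
fn build_chain(primary: &str, fallbacks: &[FallbackProvider]) -> Option<Vec<String>> {
    let mut chain = vec![primary.to_string()];
    for fb in fallbacks {
        // Skip entries that repeat the primary provider.
        if fb.provider == primary {
            continue;
        }
        chain.push(fb.provider.clone());
    }
    // Mirrors the `chain.len() > 1` guard: only wrap when a fallback exists.
    if chain.len() > 1 {
        Some(chain)
    } else {
        None
    }
}
```

For a primary of "lmstudio" and fallbacks listing "lmstudio" then "gemini", this yields the order lmstudio, gemini; if the only fallback is "lmstudio" itself, it yields None and the bare primary would be used unchanged.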

Labels

bug (Something isn't working)
