
Conversation

@kiranandcode (Contributor) commented on Jan 21, 2026

This PR simplifies the implementation of RetryLLMHandler by using LiteLLM's built-in retry mechanism instead.

class RetryLLMHandler(ObjectInterpretation):
    """Retries LLM requests if they fail.

    Args:
        num_retries: The maximum number of retries.
    """

    def __init__(
        self,
        num_retries: int = 3,
    ):
        self.num_retries = num_retries

    @implements(completion)
    def _completion(self, *args, **kwargs):
        return fwd(*args, **({"num_retries": self.num_retries} | kwargs))

We do not need specific handling for tool calls, as this is already handled by `call_tool_with_json_args`.
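For reference, a minimal usage sketch (the `handler` context manager is the one used in the test later in this thread; LiteLLMProvider's arguments and the model name here are placeholder assumptions):

# Sketch only: install RetryLLMHandler alongside the provider that implements completion;
# num_retries is forwarded to litellm.completion on every call.
with handler(LiteLLMProvider(model_name="gpt-4o")), handler(RetryLLMHandler(num_retries=3)):
    ...  # template / completion calls made here are retried on failure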

Closes #494

@eb8680 (Contributor) commented on Jan 22, 2026

I'm inclined to say we shouldn't have standalone handlers in the library that just pass a particular kwarg to litellm.completion. We could cover any such case with a single generic handler (currently LiteLLMProvider) that forwards arbitrary kwargs to completion, and call it with the appropriate arguments in user code. That way we could address #494, #493, #492, and #496 simply by deleting the relevant library code.
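Concretely, the retry case would then live entirely in user code, roughly like this (a sketch; passing num_retries to the provider mirrors the ReplayLiteLLMProvider call later in this thread and is an assumption about LiteLLMProvider's interface):

# Hypothetical user code: the generic provider forwards arbitrary kwargs,
# including num_retries, straight through to litellm.completion.
with handler(LiteLLMProvider(model_name="gpt-4o", num_retries=3)):
    ...  # call templates / completion as usual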

@kiranandcode (Contributor, Author) commented

@eb8680 that makes sense

@kiranandcode (Contributor, Author) commented on Jan 26, 2026

@eb8680, @datvo06 LiteLLM's retry seems to be intended only for network errors that arise during LLM calls; validation errors appear to be out of scope. The following test fails.

import enum
from typing import Self

import pydantic
from pydantic import BaseModel

# Template, NotHandled, handler, ReplayLiteLLMProvider, and requires_openai are
# project-specific imports and test fixtures, omitted here.

class EngineState(enum.Enum):
    OFF = "off"
    WARMING_UP = "warming_up"
    READY = "ready"
    SHUTTING_DOWN = "shutting_down"

class EngineConfig(BaseModel):
    description: str
    state: EngineState

    @pydantic.model_validator(mode='after')
    def verify_self(self) -> Self:
        if self.state != EngineState.WARMING_UP:
            raise ValueError("The infinity engine is never ready, and always in a warming up state.")
        return self

@Template.define
def predict_engine_config(description: str) -> EngineConfig:
    """Given the description \"{description}\" of things I did,
    predict the configuration of the engine after I perform those
    tasks."""
    raise NotHandled

@requires_openai
def test_num_retries_allowed_for_provider(request):
    """Test that LiteLLMProvider works with `num_retries`."""
    description = "I insert my keys into the car, turn it. The car revs. I drive off into the distance."

    with handler(ReplayLiteLLMProvider(request, model_name="gpt-5-nano", num_retries=3)):
        config = predict_engine_config(description)
        print(config)
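The num_retries kwarg is applied inside litellm.completion, so it only re-issues the underlying request; the ValidationError raised while constructing EngineConfig happens after a successful response and is never retried. A user-level workaround would have to wrap the whole template call, roughly like this (a sketch, assuming the pydantic ValidationError propagates out of the template call):

# Hypothetical retry loop around the template call itself, in user code.
for attempt in range(3):
    try:
        config = predict_engine_config(description)
        break
    except pydantic.ValidationError:
        if attempt == 2:
            raise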

Given this, might it be worth keeping RetryLLMHandler?

@eb8680 (Contributor) commented on Jan 26, 2026

No, I don't think RetryLLMHandler in its current form makes much sense, because Template calls are the wrong unit of failure; see e.g. #495.

I'm also not sure how we would implement a sensible version without an internal API like #484 in place first.

@kiranandcode (Contributor, Author) commented

@eb8680 yep, both points make sense. Moving this PR and the issue to blocked.
