Skip to content

fix: add configurable HTTP timeout for LLM API calls#238

Open
zy84338719 wants to merge 2 commits into
alibaba:mainfrom
zy84338719:fix/llm-http-timeout
Open

fix: add configurable HTTP timeout for LLM API calls#238
zy84338719 wants to merge 2 commits into
alibaba:mainfrom
zy84338719:fix/llm-http-timeout

Conversation

@zy84338719

Copy link
Copy Markdown

Problem

The LLM HTTP client has no user-configurable request timeout. When the LLM API becomes unresponsive (rate limiting, network issues, server overload), the ocr scan process hangs indefinitely with 0% CPU, holding an ESTABLISHED TCP connection that never completes.

Reproduction: Run ocr scan against an LLM API that occasionally hangs. The process blocks forever until manually killed.

Root cause in internal/llm/client.go:195: NewLLMClient creates ClientConfig without setting Timeout (defaults to 0). While NewOpenAIClient/NewAnthropicClient have a 5-minute fallback, there is no way for users to configure this.

Fixes #237

Changes

  • Add Timeout field to ResolvedEndpoint struct
  • Add OCR_LLM_TIMEOUT environment variable (value in seconds)
  • Add timeout_sec field to config.json llm and provider sections
  • Pass timeout from ResolvedEndpoint to ClientConfig in NewLLMClient

Usage

# Environment variable
export OCR_LLM_TIMEOUT=120  # 2 minutes per LLM call

# Or in ~/.opencodereview/config.json
{
  "llm": {
    "url": "https://api.example.com/v1",
    "timeout_sec": 120
  }
}

The default remains 5 minutes when not explicitly configured. All existing behavior is preserved.

The LLM HTTP client had no user-configurable request timeout. When the
LLM API becomes unresponsive (e.g., rate limiting, network issues), the
ocr process hangs indefinitely with 0% CPU, holding an ESTABLISHED TCP
connection that never completes.

Changes:
- Add Timeout field to ResolvedEndpoint struct
- Add OCR_LLM_TIMEOUT environment variable (value in seconds)
- Add timeout_sec field to config.json llm and provider sections
- Pass timeout from ResolvedEndpoint to ClientConfig in NewLLMClient

The default timeout remains 5 minutes when not explicitly configured.
Users can now set a shorter timeout via:
  - Environment: OCR_LLM_TIMEOUT=120
  - Config file: {"llm": {"timeout_sec": 120}}

Fixes alibaba#237
Copilot AI review requested due to automatic review settings June 28, 2026 08:49
@github-actions

Copy link
Copy Markdown
Contributor

OpenCodeReview: No comments generated. Looks good to me.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds user-configurable HTTP request timeouts for LLM API calls by extending endpoint resolution/config parsing and wiring the resolved timeout into the LLM client factory, aiming to prevent ocr scan from hanging indefinitely on stalled LLM requests.

Changes:

  • Introduce ResolvedEndpoint.Timeout (time.Duration) and parse timeout from env/config (OCR_LLM_TIMEOUT, timeout_sec).
  • Support timeout_sec in both legacy llm config and provider entries.
  • Propagate the resolved timeout into ClientConfig via NewLLMClient.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
internal/llm/resolver.go Adds timeout fields and parsing from environment/config into ResolvedEndpoint.
internal/llm/client.go Wires ResolvedEndpoint.Timeout into ClientConfig.Timeout so downstream clients can apply it.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/llm/resolver.go Outdated
Comment on lines +129 to +134
var timeout time.Duration
if timeoutStr := os.Getenv(envOCRLLMTimeout); timeoutStr != "" {
timeoutSec, err := strconv.Atoi(strings.TrimSpace(timeoutStr))
if err != nil {
return ResolvedEndpoint{}, false, fmt.Errorf("OCR_LLM_TIMEOUT must be an integer (seconds): %w", err)
}
Comment thread internal/llm/resolver.go Outdated
Comment on lines +129 to +138
var timeout time.Duration
if timeoutStr := os.Getenv(envOCRLLMTimeout); timeoutStr != "" {
timeoutSec, err := strconv.Atoi(strings.TrimSpace(timeoutStr))
if err != nil {
return ResolvedEndpoint{}, false, fmt.Errorf("OCR_LLM_TIMEOUT must be an integer (seconds): %w", err)
}
if timeoutSec > 0 {
timeout = time.Duration(timeoutSec) * time.Second
}
}
Comment thread internal/llm/resolver.go Outdated
Comment on lines +308 to +312
var timeout time.Duration
if entry.TimeoutSec > 0 {
timeout = time.Duration(entry.TimeoutSec) * time.Second
}

Comment thread internal/llm/resolver.go Outdated
Comment on lines +362 to +365
var timeout time.Duration
if cfg.Llm.TimeoutSec > 0 {
timeout = time.Duration(cfg.Llm.TimeoutSec) * time.Second
}
Comment thread internal/llm/resolver.go
Comment on lines +150 to 153
TimeoutSec int `json:"timeout_sec,omitempty"` // per-request HTTP timeout in seconds
ExtraBody map[string]any `json:"extra_body,omitempty"`
ExtraHeaders map[string]string `json:"extra_headers,omitempty"`
}
Comment thread internal/llm/client.go
Comment on lines 196 to 204
cfg := ClientConfig{
URL: ep.URL,
APIKey: ep.Token,
Model: ep.Model,
AuthHeader: ep.AuthHeader,
Timeout: ep.Timeout,
ExtraBody: ep.ExtraBody,
ExtraHeaders: ep.ExtraHeaders,
}
- Make OCR_LLM_TIMEOUT a global override (works with any endpoint strategy)
- Add proper overflow detection for timeout values (check before multiply)
- Add validateTimeoutSec helper with negative/overflow guards
- Add comprehensive tests for timeout parsing, validation, and forwarding
- Reject negative timeout_sec in config files with explicit error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: LLM HTTP client has no configurable request timeout, causing scan to hang indefinitely

2 participants