feat(http): opt-in HTTP/2 for outbound LLM requests #30369
Add an opt-in `litellm.enable_http2` flag (also `LITELLM_ENABLE_HTTP2` env / config.yaml) that switches the outbound transport to httpx with HTTP/2 enabled. Default is OFF, so existing behavior is unchanged.

Why: aiohttp (litellm's default async transport) cannot speak HTTP/2, and the httpx clients were never created with http2=True, so all outbound provider calls used HTTP/1.1. Enabling HTTP/2 forces the httpx transport and passes http2=True to the sync/async clients.

- enable_http2 + http2_max_connections / http2_max_keepalive_connections globals (+ LITELLM_HTTP2_* env vars), with fail-fast validation
- central AsyncHTTPHandler / HTTPHandler + get_async/sync httpx client cache (h2 vs h1 isolated by cache-key suffix that also encodes limits)
- OpenAI/Azure provider client builders honor the flag
- force_ipv4 path applies http2/limits to the explicit transport (httpx ignores client http2=/limits= once a transport is passed)
- a shared aiohttp session takes priority over http2 (warns, stays h1)
- /health/readiness reports enable_http2
- optional `litellm[http2]` extra (h2); clear ImportError if missing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Superseded by #30370, which targets |
Greptile Summary
Adds opt-in HTTP/2 support for outbound LLM requests, default OFF. Enabled via

Confidence Score: 4/5
Safe to merge with the sync-transport ordering fix addressed; the default-off path leaves existing behavior completely unchanged.

The implementation is well-structured and the default-off guarantee is solid. One real defect exists in HTTPHandler.__init__: _verify_http2_available() fires before user_transport_wins is computed, meaning a user with a custom sync_transport and HTTP/2 enabled in their environment gets a spurious ImportError about h2 even though h2 is never used for that client. The async path handles the analogous shared_session case correctly, so the fix is clear. The remaining findings are minor inconsistencies that don't affect the common deployment paths.

Files needing attention: litellm/llms/custom_httpx/http_handler.py (HTTPHandler.__init__ ordering of _verify_http2_available vs user_transport_wins) and litellm/llms/openai/common_utils.py (_get_sync_http_client HTTP/2 path).

| Filename | Overview |
|---|---|
| litellm/llms/custom_httpx/http_handler.py | Core HTTP/2 implementation: adds flag resolution, h2 availability check, pool-limit helpers, cache-key suffix, and wires http2/limits through AsyncHTTPHandler and HTTPHandler. Logic is sound with one ordering inconsistency around _verify_http2_available vs user_transport_wins. |
| litellm/llms/openai/common_utils.py | Adds HTTP/2 wiring for OpenAI/Azure async and sync client builders. Async path builds the client inline (consistent with pre-existing pattern). Sync path delegates to HTTPHandler(ssl_verify=ssl_config).client — functional but extracts only .client from a throwaway HTTPHandler. |
| litellm/__init__.py | Adds three new module-level globals (enable_http2, http2_max_connections, http2_max_keepalive_connections) defaulting to False/None. Minimal and correct. |
| litellm/proxy/health_endpoints/_health_endpoints.py | Adds enable_http2 to /health/readiness response in both code paths. Correct and unambiguous. |
| tests/test_litellm/llms/custom_httpx/test_http2_support.py | 490-line mock-only test suite covering flag resolution, transport selection, client construction, force_ipv4 interaction, shared_session priority, pool limits, cache isolation, and missing-h2 error. No real network calls. One test name is slightly misleading. |
| pyproject.toml | Adds http2 optional extra pinning h2>=4.0.0,<5.0. Clean and appropriate. |
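
The cache isolation described for http_handler.py (HTTP/2 and HTTP/1.1 clients kept apart by a key suffix that also encodes the pool limits) could be sketched roughly as follows. The function name and the suffix format are hypothetical and are not taken from the PR; only the idea of folding the flag and the limits into the cache key comes from the summary above.

```python
from typing import Optional


def http2_cache_suffix(
    enabled: bool,
    max_connections: Optional[int],
    max_keepalive: Optional[int],
) -> str:
    """Hypothetical suffix appended to a client cache key.

    HTTP/1.1 clients get no suffix, so their keys stay as they were.
    HTTP/2 clients get a suffix that changes whenever a pool limit changes,
    so a client built with old limits is never reused.
    """
    if not enabled:
        return ""
    return f"_h2_mc{max_connections}_mk{max_keepalive}"


base_key = "async_httpx_client_openai"  # illustrative key, not litellm's real one
print(base_key + http2_cache_suffix(False, None, None))
print(base_key + http2_cache_suffix(True, 10, 5))
print(base_key + http2_cache_suffix(True, 20, 5))
```

Because the limits are part of the key, changing `http2_max_connections` at runtime yields a new cache entry instead of silently returning a client configured with the previous limits.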
Comments Outside Diff (1)
- tests/test_litellm/llms/custom_httpx/test_http2_support.py, line 661-664
Test name overclaims "global takes priority over env"
The test verifies `global=True` overrides `env=False`, which is correct. But the inverse, `global=False` overriding `env=True`, is not true: `_should_enable_http2()` uses OR logic, so `LITELLM_ENABLE_HTTP2=True` enables HTTP/2 even when `litellm.enable_http2 = False`. `test_env_var` already confirms this behavior. Naming the test "global takes priority" implies a symmetry that doesn't exist and could mislead future readers into assuming programmatic `litellm.enable_http2 = False` is a reliable disable switch. Consider renaming to something like `test_global_true_overrides_env_false` and adding a complementary test that documents the env-wins-over-global-false case explicitly.
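
The asymmetry described in this comment can be illustrated with a small stand-in. This is a sketch only: `should_enable_http2` is a hypothetical function, and the way it parses the environment value (case-insensitive `"true"`) is an assumption, not the PR's actual `_should_enable_http2()` implementation. Only the OR combination is taken from the review.

```python
from typing import Mapping


def should_enable_http2(global_flag: bool, env: Mapping[str, str]) -> bool:
    """Hypothetical stand-in: either source can turn HTTP/2 on (OR logic)."""
    # Assumed parsing rule: only the string "true" (any case) counts as on.
    env_flag = env.get("LITELLM_ENABLE_HTTP2", "").strip().lower() == "true"
    return bool(global_flag) or env_flag


# Truth table over the two sources.
for global_flag in (False, True):
    for env_value in ("False", "True"):
        result = should_enable_http2(global_flag, {"LITELLM_ENABLE_HTTP2": env_value})
        print(f"global={global_flag!s:5} env={env_value:5} -> {result}")
```

Under OR logic the only way to get HTTP/1.1 is for both sources to be off, which is why a test named "global takes priority" describes just one half of the table.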
             default_headers = get_default_headers() if not disable_default_headers else None

             if client is None:
    -            transport = self._create_sync_transport()
    +            http2_enabled = _should_enable_http2()
    +            _http2_limits = _get_http2_limits() if http2_enabled else None
    +            if http2_enabled:
    +                _verify_http2_available()
    +
    +            transport = self._create_sync_transport(
    +                http2=http2_enabled, limits=_http2_limits
    +            )
    +
    +            # A user-supplied litellm.sync_transport (returned by
    +            # _create_sync_transport when force_ipv4 is False) takes priority and
    +            # is used as-is. httpx ignores Client(http2=...) once an explicit
    +            # transport is passed, so HTTP/2 cannot be applied to it — warn rather
    +            # than silently downgrade.
    +            user_transport_wins = (
    +                http2_enabled and not litellm.force_ipv4 and transport is not None
    +            )
    +            if user_transport_wins:
    +                verbose_logger.warning(
    +                    "litellm: HTTP/2 is enabled but a custom litellm.sync_transport "
    +                    "was provided. httpx cannot apply HTTP/2 to an explicit transport "
    +                    "— this client will use the provided transport as-is. Set "
    +                    "http2=True on your transport to use HTTP/2."
    +                )
    +
                 # Create a client with a connection pool
    -            self.client = httpx.Client(
    +            client_kwargs: Dict[str, Any] = dict(
                     transport=transport,
                     timeout=timeout,
                     verify=ssl_config,
                     cert=cert,
                     headers=default_headers,
_verify_http2_available() called before user_transport_wins is known
_verify_http2_available() (which raises ImportError if h2 is absent) fires at line 1345 — before _create_sync_transport() is called and before user_transport_wins is computed. If the user has a custom litellm.sync_transport set, HTTP/2 will not be applied anyway (user_transport_wins=True at line 1357), but the h2 package is still required just to reach the warning. A user with a custom sync transport and LITELLM_ENABLE_HTTP2=True in their environment gets an ImportError("install h2") even though h2 is never used for that client.
The async path handles the analogous shared_session case before calling _verify_http2_available() (lines 684–694): it sets http2_enabled=False first, so the h2 check is skipped entirely. The sync path should apply the same pattern — compute transport first, derive user_transport_wins, and only then check h2 availability if HTTP/2 will actually be used.
    +    if _should_enable_http2():
    +        return HTTPHandler(ssl_verify=ssl_config).client
Throwaway HTTPHandler created to extract .client
HTTPHandler(ssl_verify=ssl_config).client instantiates a full HTTPHandler object (including a concurrent_requests semaphore, default-header computation, and the _verify_http2_available() side-effect on _HTTP2_AVAILABLE) just to return .client. The HTTPHandler wrapper is then discarded. By contrast, the async path in this same file builds the httpx.AsyncClient inline. For consistency and to avoid the wasted allocation, the sync path could also build the httpx.Client inline (mirroring the async path), or delegate to _get_httpx_client() and return .client from the cached instance.
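
The "delegate to a cached instance" option might look roughly like this. It is a sketch under stated assumptions: `FakeHandler` stands in for `HTTPHandler`, `get_cached_handler` stands in for `_get_httpx_client()`, and `functools.lru_cache` is used as the cache purely for illustration (it requires a hashable `ssl_verify`, which the real code may not guarantee).

```python
from functools import lru_cache


class FakeHandler:
    """Hypothetical stand-in for HTTPHandler that counts constructions."""

    instances = 0

    def __init__(self, ssl_verify: bool) -> None:
        FakeHandler.instances += 1
        self.ssl_verify = ssl_verify
        self.client = object()  # placeholder for the underlying httpx.Client


@lru_cache(maxsize=None)
def get_cached_handler(ssl_verify: bool) -> FakeHandler:
    # One handler per distinct ssl_verify value; later calls reuse it.
    return FakeHandler(ssl_verify=ssl_verify)


def get_sync_http_client(ssl_verify: bool) -> object:
    # Return .client from the cached handler instead of a throwaway one.
    return get_cached_handler(ssl_verify).client


first = get_sync_http_client(True)
second = get_sync_http_client(True)
print(first is second, FakeHandler.instances)
```

The handler is built once per configuration and kept alive by the cache, so the semaphore and header computation mentioned in the comment are not repeated on every call.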
Relevant issues
Fixes #30362 (and the earlier stale-closed #3533) — outbound requests to upstream LLM providers always used HTTP/1.1.
Type
🆕 New Feature
Changes
Adds opt-in HTTP/2 for outbound LLM requests. Default is OFF, so existing behavior is byte-for-byte unchanged.
Why
litellm's default async transport is aiohttp, which cannot speak HTTP/2, and the httpx clients were never created with `http2=True`. As a result every outbound provider call used HTTP/1.1, even though providers like Bedrock, Vertex and Anthropic front their APIs with HTTP/2-capable endpoints. Enabling HTTP/2 forces the httpx transport and passes `http2=True` to the sync/async clients.
How to enable
Set the flag in config.yaml, or `LITELLM_ENABLE_HTTP2=True`, or `litellm.enable_http2 = True` (SDK).
Optional pool tuning: `http2_max_connections` / `http2_max_keepalive_connections` (or `LITELLM_HTTP2_MAX_CONNECTIONS` / `LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS`).
Implementation notes
- New globals `enable_http2`, `http2_max_connections`, `http2_max_keepalive_connections` (with fail-fast validation of pool limits).
- Central `AsyncHTTPHandler` / `HTTPHandler` + `get_async_httpx_client` / `_get_httpx_client` cache: HTTP/2 clients are isolated from HTTP/1.1 in the cache via a key suffix that also encodes the pool limits (no stale-client reuse when the config changes).
- OpenAI/Azure provider client builders (`openai/common_utils.py`) honor the flag, so OpenAI/Azure outbound calls also negotiate HTTP/2.
- The `force_ipv4` path applies `http2`/`limits` to the explicit transport (httpx ignores `Client(http2=..., limits=...)` once a transport is passed).
- A shared aiohttp `shared_session` takes priority over HTTP/2 (it cannot speak HTTP/2); we warn and keep HTTP/1.1 rather than silently dropping the session.
- `/health/readiness` now reports `enable_http2`.
- Optional extra `litellm[http2]` (pulls in `h2`); a clear `ImportError` is raised if HTTP/2 is requested without `h2` installed.
Pre-Submission checklist
- Tests added (`tests/test_litellm/llms/custom_httpx/test_http2_support.py`)
Screenshots / Proof of Fix
Functional verification — outbound HTTP/2 negotiated end-to-end
Verified on a real EKS cluster (ap-northeast-1) using the modified handler against real upstreams. All checks passed (`response.http_version == "HTTP/2"` on Bedrock and Anthropic, sync + async), and the default-off path still selects the aiohttp transport.
Performance: the large-enterprise / connection-constrained case
This is where HTTP/2 matters most. Large deployments routing high request volume to a single provider host commonly run behind egress controls that cap the number of outbound TCP connections — corporate proxies, NAT Gateways, load balancers, or simply a deliberately small connection pool to conserve resources. Under HTTP/1.1 each connection carries one in-flight request, so concurrency is hard-capped at the pool size and excess requests queue. HTTP/2 multiplexes many streams over the same few connections, eliminating that queueing.
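
The queueing effect can be seen in a toy model. This is an illustrative sketch, not a reproduction of the benchmark: it assumes all requests arrive at once, every request takes the same fixed time, and each HTTP/2 connection can carry 100 concurrent streams. All of those numbers are assumptions chosen for the example.

```python
import statistics
from typing import List


def completion_times(
    n_requests: int, service_time: float, max_in_flight: int
) -> List[float]:
    """Finish time of each request when all arrive at t=0.

    At most max_in_flight requests run at once; the rest wait for a free slot,
    so requests complete in "waves" of size max_in_flight.
    """
    return [((i // max_in_flight) + 1) * service_time for i in range(n_requests)]


# Assumed workload: 60 simultaneous requests, 1.0 s of upstream time each.
# HTTP/1.1: 6 connections, one in-flight request per connection.
h1 = completion_times(60, 1.0, max_in_flight=6)
# HTTP/2: the same 6 connections, 100 streams per connection (assumed).
h2 = completion_times(60, 1.0, max_in_flight=6 * 100)

print("HTTP/1.1 median latency:", statistics.median(h1))
print("HTTP/2   median latency:", statistics.median(h2))
```

In this model the HTTP/1.1 median is dominated by time spent waiting for a free connection, while under HTTP/2 every request starts immediately because the stream capacity exceeds the number of concurrent requests.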
Benchmark: Claude Haiku 4.5 on Bedrock (ap-northeast-1), 60 concurrent small requests, 150 requests total, connection pool capped at 6 for both modes:
With the same 6 connections, HTTP/2 delivered ~4.9× the throughput (+388%) and cut p50 latency by ~85% (5.93s → 0.91s). The control row shows why: HTTP/1.1 with an unconstrained pool reaches the same ~0.9s p50 — so HTTP/2's win is specifically that it achieves unconstrained-level performance using a fraction of the connections. For enterprises whose outbound connection count is constrained by network policy, this is a direct, large latency/throughput improvement.
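
As a quick arithmetic check on the figures quoted above, the snippet below derives the percentage reduction and the throughput ratio from the stated p50 values and the stated +388% gain. It uses only numbers that appear in the text; the function names are illustrative.

```python
def latency_reduction(before_s: float, after_s: float) -> float:
    """Fractional reduction in latency going from before_s to after_s."""
    return (before_s - after_s) / before_s


def ratio_from_gain(gain_percent: float) -> float:
    """Convert a "+X%" improvement into an "N times" multiplier."""
    return 1 + gain_percent / 100


p50_cut = latency_reduction(5.93, 0.91)   # p50 values quoted in the text
throughput_ratio = ratio_from_gain(388)   # "+388%" quoted in the text

print(f"p50 reduction: {p50_cut:.1%}")
print(f"throughput ratio: {throughput_ratio:.2f}x")
```

Both rounded values agree with the "~85%" and "~4.9×" stated in the paragraph.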
For the non-constrained, large-payload case (few big requests over many connections) HTTP/2 and HTTP/1.1 perform equivalently — HTTP/2 is not a universal speed-up, which is exactly why this ships as an opt-in flag rather than a default change.