feat(http): opt-in HTTP/2 for outbound LLM requests #30370
Add an opt-in `litellm.enable_http2` flag (also `LITELLM_ENABLE_HTTP2` env / config.yaml) that switches the outbound transport to httpx with HTTP/2 enabled. Default is OFF, so existing behavior is unchanged.

Why: aiohttp (litellm's default async transport) cannot speak HTTP/2, and the httpx clients were never created with http2=True, so all outbound provider calls used HTTP/1.1. Enabling HTTP/2 forces the httpx transport and passes http2=True to the sync/async clients.

- enable_http2 + http2_max_connections / http2_max_keepalive_connections globals (+ LITELLM_HTTP2_* env vars), with fail-fast validation
- central AsyncHTTPHandler / HTTPHandler + get_async/sync httpx client cache (h2 vs h1 isolated by cache-key suffix that also encodes limits)
- OpenAI/Azure provider client builders honor the flag
- force_ipv4 path applies http2/limits to the explicit transport (httpx ignores client http2=/limits= once a transport is passed)
- a shared aiohttp session takes priority over http2 (warns, stays h1)
- /health/readiness reports enable_http2
- requires the `h2` package (pip install h2); clear ImportError if missing
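For readers who want a concrete picture of the flag resolution and fail-fast validation described above, here is a minimal standard-library sketch. The helper names `resolve_enable_http2` and `parse_pool_limit`, the set of accepted truthy strings, and the precedence between the SDK flag and the env var are all assumptions made for illustration; they are not taken from the diff.

```python
from typing import Mapping, Optional

# Assumption: which strings count as "on" is illustrative only.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_enable_http2(flag: Optional[bool], env: Mapping[str, str]) -> bool:
    """Hypothetical helper: an explicit True on the SDK flag wins,
    otherwise fall back to the LITELLM_ENABLE_HTTP2 env var."""
    if flag:
        return True
    return env.get("LITELLM_ENABLE_HTTP2", "").strip().lower() in _TRUTHY


def parse_pool_limit(name: str, raw: Optional[str]) -> Optional[int]:
    """Hypothetical helper: return a positive int, None when unset,
    or raise ValueError immediately (fail fast) on a bad value."""
    if raw is None or raw.strip() == "":
        return None
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value
```

Validating at configuration time means a typo such as `LITELLM_HTTP2_MAX_CONNECTIONS=six` surfaces at startup rather than on the first outbound request.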
HTTPHandler(ssl_verify=...) only accepts bool|str|None, but get_ssl_configuration() can return an ssl.SSLContext. HTTPHandler resolves SSL config internally, so pass no ssl_verify — fixes a mypy arg-type error.
Status / CI notesThe two remaining red checks —
All code-owned issues are resolved (the earlier mypy This PR deliberately does not modify |
Greptile SummaryThis PR adds opt-in HTTP/2 support for outbound LLM requests (
Confidence Score: 5/5

Safe to merge. The feature is default-off and all new code paths are gated behind the opt-in flag, leaving existing HTTP/1.1 behavior byte-for-byte unchanged.

The implementation is well-scoped: HTTP/2 is strictly opt-in, the aiohttp default path is untouched, and every non-trivial edge case (force_ipv4, shared_session, user sync_transport, missing h2 package, runtime config changes) is explicitly handled and tested. The previously identified issues in the OpenAI/Azure client builder are correctly resolved in this diff.

tests/test_litellm/llms/custom_httpx/test_http2_support.py: several test classes construct live HTTP/2 clients and will raise a bare ImportError rather than skip cleanly if the h2 package is absent from the test environment.
|
| Filename | Overview |
|---|---|
| litellm/__init__.py | Adds three new module-level globals: enable_http2 (bool, default False), http2_max_connections (Optional[int]), and http2_max_keepalive_connections (Optional[int]). Clean default-off addition that doesn't touch existing settings. |
| litellm/llms/custom_httpx/http_handler.py | Core implementation: adds helpers to resolve the HTTP/2 flag, validate h2, compute pool limits, and generate cache-key suffixes; threads http2/limits through AsyncHTTPHandler and HTTPHandler construction; handles force_ipv4, shared_session, and user sync_transport edge cases correctly. |
| litellm/llms/openai/common_utils.py | Updates _get_async_http_client to pass http2/limits through _create_async_transport; emits shared-session downgrade warning inline; updates _get_sync_http_client to return cached _get_httpx_client().client when HTTP/2 is enabled. |
| litellm/proxy/health_endpoints/_health_endpoints.py | Minimal change: imports _should_enable_http2 and adds enable_http2 to both readiness-response branches. |
| tests/test_litellm/llms/custom_httpx/test_http2_support.py | 504 lines covering flag logic, transport selection, client construction, force_ipv4, shared-session priority, pool limits, cache isolation, and missing-h2 error path — all without real network calls. Tests constructing HTTP/2 clients will raise ImportError instead of skipping cleanly if h2 is absent. |
| tests/test_litellm/llms/openai/test_openai_common_utils.py | Adds TestOpenAIClientHttp2 covering async/sync client construction, shared-session warning, and sync-client caching under HTTP/2. |
Reviews (4): Last reviewed commit: "fix(http): restore 'Using AiohttpTranspo..."
Greptile SummaryThis PR adds opt-in HTTP/2 support for outbound LLM requests behind a
Confidence Score: 4/5

The default-off path is unchanged and the new code is well-isolated behind the opt-in flag, but the sync OpenAI HTTP/2 path creates a fresh connection pool per call rather than reusing the cached pool.

The implementation is thorough and the default (HTTP/1.1) behavior is byte-for-byte preserved. The one functional concern is that _get_sync_http_client() with HTTP/2 enabled instantiates a new HTTPHandler() per call instead of going through _get_httpx_client(), meaning sync OpenAI calls do not share a connection pool across requests if the upstream OpenAI client object is not long-lived. This could silently undercut the connection-reuse benefit that motivates the feature.

litellm/llms/openai/common_utils.py: the sync HTTP/2 branch in _get_sync_http_client() and the dead ssl_config assignment above it.
|
| Filename | Overview |
|---|---|
| litellm/__init__.py | Adds three new module-level globals (enable_http2, http2_max_connections, http2_max_keepalive_connections), all defaulting to off/None. Clean, backward-compatible addition. |
| litellm/llms/custom_httpx/http_handler.py | Core HTTP/2 logic: adds helper functions, extends transport creation for async and sync handlers, adds cache-key suffix for HTTP/2 isolation. Logic is sound; the uncached-pool concern is in the common_utils.py caller. |
| litellm/llms/openai/common_utils.py | _get_async_http_client() correctly threads HTTP/2 settings through transport creation; _get_sync_http_client() with HTTP/2 delegates to HTTPHandler().client, bypassing the _get_httpx_client() cache, and leaves ssl_config as dead code in the HTTP/2 branch. |
| litellm/proxy/health_endpoints/_health_endpoints.py | Minimal, correct addition of enable_http2 field to both readiness detail response paths. |
| tests/test_litellm/llms/custom_httpx/test_http2_support.py | Comprehensive unit test coverage with proper fixture-based global/env restoration; tests that construct real HTTP/2 clients need a pytest.importorskip("h2") guard to avoid opaque ImportError when h2 is absent. |
Comments Outside Diff (1)
- tests/test_litellm/llms/custom_httpx/test_http2_support.py, line 685-706: Tests that construct HTTP/2 clients will raise `ImportError` instead of skipping when `h2` is absent

  `TestClientConstruction`, `TestForceIpv4WithHttp2`, `TestHttp2Limits`, and `TestCacheIsolation` all call `AsyncHTTPHandler()` or `HTTPHandler()` with `enable_http2=True`, which triggers `_verify_http2_available()` → `import h2`. If `h2` is not installed the tests crash with a bare `ImportError` rather than being marked as skipped. Adding a module-level guard would produce a clean skip: `h2 = pytest.importorskip("h2", reason="h2 package required for HTTP/2 tests")`

  This is especially important because the PR explicitly states that `uv.lock` was not modified and `h2` must be installed separately (`pip install h2`).
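To illustrate the skip-instead-of-crash behavior the comment asks for, here is a small sketch of the same guard written with only the standard library's `unittest`, swapped in for `pytest.importorskip` so that the snippet runs without extra packages. The class and test names are hypothetical, and the real suite may be organized differently.

```python
import importlib.util
import unittest

# True only when the optional h2 package can be imported in this environment.
H2_AVAILABLE = importlib.util.find_spec("h2") is not None


@unittest.skipUnless(H2_AVAILABLE, "h2 package required for HTTP/2 tests")
class TestHttp2ClientConstruction(unittest.TestCase):
    """Hypothetical test class: reported as skipped, not errored, without h2."""

    def test_h2_importable(self):
        import h2  # noqa: F401  (only reached when the guard passes)

        self.assertTrue(H2_AVAILABLE)
```

With the guard in place, a missing `h2` shows up as a skipped test with an explicit reason instead of an `ImportError` traceback.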
Reviews (2): Last reviewed commit: "fix(http): drop SSLContext arg to HTTPHa..."
…ders

- async: emit a warning when a shared aiohttp session is combined with enable_http2 (this builder calls the static _create_async_transport directly, so it must warn itself rather than relying on AsyncHTTPHandler.create_client). Previously it silently downgraded to HTTP/1.1.
- sync: route the HTTP/2 path through the cached _get_httpx_client() instead of instantiating a fresh HTTPHandler() per call, so sync OpenAI requests share a single connection pool (the connection-reuse benefit HTTP/2 is for). Also drop the dead ssl_config assignment on the HTTP/2 branch.
- tests: cover both builders' default-off, http2-on, shared-session-warning, and sync cache-reuse behavior.
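The pool-reuse point can be sketched as a cache whose key carries an HTTP/2 suffix encoding the pool limits. Everything named here (`http2_cache_suffix`, `get_cached_client`, the suffix format, the `factory` callable) is a hypothetical stand-in chosen for illustration, not the code in `http_handler.py`.

```python
from typing import Callable, Dict, Optional

# Module-level cache: one client (and thus one connection pool) per key.
_client_cache: Dict[str, object] = {}


def http2_cache_suffix(
    enable_http2: bool,
    max_connections: Optional[int],
    max_keepalive: Optional[int],
) -> str:
    """HTTP/1.1 clients keep the plain key; HTTP/2 clients get a suffix
    that also encodes the limits, so a config change yields a new key."""
    if not enable_http2:
        return ""
    return f"_h2_{max_connections}_{max_keepalive}"


def get_cached_client(
    base_key: str,
    factory: Callable[[], object],
    *,
    enable_http2: bool = False,
    max_connections: Optional[int] = None,
    max_keepalive: Optional[int] = None,
) -> object:
    """Build the client once per key and reuse it on later calls."""
    key = base_key + http2_cache_suffix(enable_http2, max_connections, max_keepalive)
    if key not in _client_cache:
        _client_cache[key] = factory()
    return _client_cache[key]
```

Calling `get_cached_client("openai", make_client, enable_http2=True, max_connections=6)` twice returns the same object, whereas constructing a fresh handler on every call would give each request its own empty pool.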
|
Addressed both Greptile findings in
Added tests in |
|
Both fixes look correct and complete. Async shared-session warning ( Sync pool reuse ( The test coverage in
No further concerns on these two items.
5aeb7d0 to 35d35e5
|
recheck |
|
The companion docs PR is ready: BerriAI/litellm-docs#346 It adds the three new env vars ( The two remaining red checks here ( CLA is signed ✅ ("All committers have signed the CLA"). All other CI is green. |
|
@KevinZhao can you get the greptile score to 5/5? Thanks |
|
Thanks for the review! Status update: greptile feedback — all addressed in
CI: the failing |
…ort path

The HTTP/2 refactor moved transport selection out of _should_use_aiohttp_transport() (now global-config only) into _create_async_transport(), which dropped the 'Using AiohttpTransport...' verbose debug log on the path that actually builds the aiohttp transport. Restore it there and add a regression test.
|
@greptileai review |
|
Thanks for the contribution! We're triggering a Greptile code review on this PR — we'll take a closer look once the results are in! |
Register LITELLM_ENABLE_HTTP2, LITELLM_HTTP2_MAX_CONNECTIONS and LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS in the environment variables reference table so the documentation_tests/test_env_keys.py doc-lint (run in BerriAI/litellm CI) passes for the outbound HTTP/2 feature (BerriAI/litellm#30370).
Heads-up on the two failing checks (
|
Relevant issues
Fixes #30362 (and the earlier stale-closed #3533) — outbound requests to upstream LLM providers always used HTTP/1.1.
Type
🆕 New Feature
Changes
Adds opt-in HTTP/2 for outbound LLM requests. Default is OFF, so existing behavior is byte-for-byte unchanged.
Why
litellm's default async transport is aiohttp, which cannot speak HTTP/2, and the httpx clients were never created with `http2=True`. As a result every outbound provider call used HTTP/1.1, even though providers like Bedrock, Vertex and Anthropic front their APIs with HTTP/2-capable endpoints. Enabling HTTP/2 forces the httpx transport and passes `http2=True` to the sync/async clients.
How to enable
Set the flag in config.yaml, or `LITELLM_ENABLE_HTTP2=True`, or `litellm.enable_http2 = True` (SDK).
Optional pool tuning: `http2_max_connections` / `http2_max_keepalive_connections` (or `LITELLM_HTTP2_MAX_CONNECTIONS` / `LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS`).
Implementation notes
- Globals `enable_http2`, `http2_max_connections`, `http2_max_keepalive_connections` (with fail-fast validation of pool limits).
- Central `AsyncHTTPHandler` / `HTTPHandler` + `get_async_httpx_client` / `_get_httpx_client` cache: HTTP/2 clients are isolated from HTTP/1.1 in the cache via a key suffix that also encodes the pool limits (no stale-client reuse when the config changes).
- OpenAI/Azure provider client builders (`openai/common_utils.py`) honor the flag, so OpenAI/Azure outbound calls also negotiate HTTP/2.
- The `force_ipv4` path applies `http2` / `limits` to the explicit transport (httpx ignores `Client(http2=..., limits=...)` once a transport is passed).
- A shared aiohttp `shared_session` takes priority over HTTP/2 (it cannot speak HTTP/2): we warn and keep HTTP/1.1 rather than silently dropping the session.
- `/health/readiness` now reports `enable_http2`.
- Requires `litellm[http2]` (pulls in `h2`); a clear `ImportError` is raised if HTTP/2 is requested without `h2` installed.
Pre-Submission checklist
- Added tests (`tests/test_litellm/llms/custom_httpx/test_http2_support.py`)
Screenshots / Proof of Fix
Functional verification — outbound HTTP/2 negotiated end-to-end
Verified on a real EKS cluster (ap-northeast-1) using the modified handler against real upstreams. All checks passed (`response.http_version == "HTTP/2"` on Bedrock and Anthropic, sync + async), and the default-off path still selects the aiohttp transport.
Performance: the large-enterprise / connection-constrained case
This is where HTTP/2 matters most. Large deployments routing high request volume to a single provider host commonly run behind egress controls that cap the number of outbound TCP connections — corporate proxies, NAT Gateways, load balancers, or simply a deliberately small connection pool to conserve resources. Under HTTP/1.1 each connection carries one in-flight request, so concurrency is hard-capped at the pool size and excess requests queue. HTTP/2 multiplexes many streams over the same few connections, eliminating that queueing.
Benchmark: Claude Haiku 4.5 on Bedrock (ap-northeast-1), 60 concurrent small requests, 150 requests total, connection pool capped at 6 for both modes:
With the same 6 connections, HTTP/2 delivered ~4.9× the throughput (+388%) and cut p50 latency by ~85% (5.93s → 0.91s). The control row shows why: HTTP/1.1 with an unconstrained pool reaches the same ~0.9s p50 — so HTTP/2's win is specifically that it achieves unconstrained-level performance using a fraction of the connections. For enterprises whose outbound connection count is constrained by network policy, this is a direct, large latency/throughput improvement.
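The queueing effect behind these numbers can be illustrated with a deliberately crude model. It is not a reproduction of the benchmark: it assumes all requests arrive at once, every request takes the same fixed service time, and an HTTP/2 connection can carry 100 concurrent streams (an assumed figure; real servers advertise their own stream limit). The function name and parameters are hypothetical.

```python
import statistics
from typing import List


def completion_times(
    num_requests: int,
    pool_size: int,
    service_time: float,
    streams_per_connection: int,
) -> List[float]:
    """Toy model: requests are served in waves of `pool_size *
    streams_per_connection`, and each wave takes `service_time` seconds."""
    slots = pool_size * streams_per_connection
    return [((i // slots) + 1) * service_time for i in range(num_requests)]


# HTTP/1.1: one in-flight request per connection, so 60 requests queue in waves of 6.
h1_times = completion_times(60, pool_size=6, service_time=0.9, streams_per_connection=1)
# HTTP/2: many streams share the same 6 connections, so nothing queues in this model.
h2_times = completion_times(60, pool_size=6, service_time=0.9, streams_per_connection=100)

print("HTTP/1.1 median:", statistics.median(h1_times))
print("HTTP/2 median:  ", statistics.median(h2_times))
```

Even this simplified model shows the shape of the result: with the pool capped, HTTP/1.1 latency is dominated by waiting for a free connection, while HTTP/2 stays close to the per-request service time.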
For the non-constrained, large-payload case (few big requests over many connections) HTTP/2 and HTTP/1.1 perform equivalently — HTTP/2 is not a universal speed-up, which is exactly why this ships as an opt-in flag rather than a default change.