feat(http): opt-in HTTP/2 for outbound LLM requests by KevinZhao · Pull Request #30369 · BerriAI/litellm

KevinZhao · 2026-06-13T12:44:39Z

Relevant issues

Fixes #30362 (and the earlier stale-closed #3533) — outbound requests to upstream LLM providers always used HTTP/1.1.

Type

🆕 New Feature

Changes

Adds opt-in HTTP/2 for outbound LLM requests. Default is OFF, so existing behavior is byte-for-byte unchanged.

Why

litellm's default async transport is aiohttp, which cannot speak HTTP/2, and the httpx clients were never created with http2=True. As a result every outbound provider call used HTTP/1.1, even though providers like Bedrock, Vertex and Anthropic front their APIs with HTTP/2-capable endpoints. Enabling HTTP/2 forces the httpx transport and passes http2=True to the sync/async clients.

How to enable

# config.yaml
litellm_settings:
  enable_http2: true

or LITELLM_ENABLE_HTTP2=True, or litellm.enable_http2 = True (SDK).

Optional pool tuning: http2_max_connections / http2_max_keepalive_connections (or LITELLM_HTTP2_MAX_CONNECTIONS / LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS).
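The pool limits are described as validated fail-fast. A minimal sketch of what such validation could look like, assuming the two environment variables carry plain integers; `parse_pool_limit` and `read_http2_limits` are hypothetical names for illustration, not the PR's actual helpers:

```python
import os
from typing import Dict, Mapping, Optional


def parse_pool_limit(name: str, raw: Optional[str]) -> Optional[int]:
    """Return a positive int, or None when the setting is unset or empty."""
    if raw is None or raw.strip() == "":
        return None
    try:
        value = int(raw)
    except ValueError:
        # Fail fast with the variable name so a typo is easy to locate.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def read_http2_limits(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[int]]:
    """Read both pool limits from the environment (hypothetical helper)."""
    return {
        "max_connections": parse_pool_limit(
            "LITELLM_HTTP2_MAX_CONNECTIONS",
            env.get("LITELLM_HTTP2_MAX_CONNECTIONS"),
        ),
        "max_keepalive_connections": parse_pool_limit(
            "LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS",
            env.get("LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS"),
        ),
    }
```

Raising at configuration time, rather than when the first request is sent, keeps a bad limit from surfacing as an obscure connection error later.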

Implementation notes

New globals: enable_http2, http2_max_connections, http2_max_keepalive_connections (with fail-fast validation of pool limits).
Central AsyncHTTPHandler / HTTPHandler + get_async_httpx_client / _get_httpx_client cache — HTTP/2 clients are isolated from HTTP/1.1 in the cache via a key suffix that also encodes the pool limits (no stale-client reuse when the config changes).
OpenAI / Azure provider client builders (openai/common_utils.py) honor the flag, so OpenAI/Azure outbound calls also negotiate HTTP/2.
force_ipv4 path applies http2/limits to the explicit transport (httpx ignores Client(http2=..., limits=...) once a transport is passed).
A caller-supplied aiohttp shared_session takes priority over HTTP/2 (it cannot speak HTTP/2) — we warn and keep HTTP/1.1 rather than silently dropping the session.
/health/readiness now reports enable_http2.
New optional extra litellm[http2] (pulls in h2); a clear ImportError is raised if HTTP/2 is requested without h2 installed.
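The cache-isolation note above can be illustrated with a small sketch. The suffix format, the `get_client` function, and the dict-based cache are assumptions made for illustration; the PR's own `_http2_cache_key_suffix()` may encode the values differently:

```python
from typing import Dict, Optional

_client_cache: Dict[str, object] = {}


def http2_cache_key_suffix(
    http2: bool,
    max_connections: Optional[int] = None,
    max_keepalive: Optional[int] = None,
) -> str:
    """Build a suffix that separates HTTP/2 clients from HTTP/1.1 ones."""
    if not http2:
        # HTTP/1.1 keys are left untouched, so the default-off path is unchanged.
        return ""
    # Encoding the limits means a config change yields a new key, not a stale client.
    return f"_http2_mc{max_connections}_mk{max_keepalive}"


def get_client(
    base_key: str,
    http2: bool,
    max_connections: Optional[int] = None,
    max_keepalive: Optional[int] = None,
) -> object:
    """Return a cached client stand-in for the given configuration."""
    key = base_key + http2_cache_key_suffix(http2, max_connections, max_keepalive)
    if key not in _client_cache:
        _client_cache[key] = object()  # stand-in for a real httpx client
    return _client_cache[key]
```

With this shape, two calls with the same flag and limits share one client, while flipping the flag or changing a limit produces a separate cache entry.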

Pre-Submission checklist

I have added meaningful tests (tests/test_litellm/llms/custom_httpx/test_http2_support.py)
My PR's scope is as isolated as possible; it only solves 1 specific problem
Default-off path verified to be unchanged (existing custom_httpx + openai suites pass)

Screenshots / Proof of Fix

Functional verification — outbound HTTP/2 negotiated end-to-end

Verified on a real EKS cluster (ap-northeast-1) using the modified handler against real upstreams. All checks passed (response.http_version == "HTTP/2" on Bedrock and Anthropic, sync + async), and the default-off path still selects the aiohttp transport:

[PASS] default-off:_should_enable_http2: HTTP/2 correctly OFF by default
[PASS] default-off:transport: default transport=LiteLLMAiohttpTransport
[PASS] on:async-pool-http2-flag: async client pool _http2=True
[PASS] async-wire:bedrock-tokyo: http_version=HTTP/2 status=404
[PASS] async-wire:anthropic:     http_version=HTTP/2 status=404
[PASS] on:sync-pool-http2-flag: sync client pool _http2=True
[PASS] sync-wire:bedrock-tokyo:  http_version=HTTP/2 status=404
=== RESULT: 9/9 checks passed ===

Performance — the large-enterprise / connection-constrained case

This is where HTTP/2 matters most. Large deployments routing high request volume to a single provider host commonly run behind egress controls that cap the number of outbound TCP connections — corporate proxies, NAT Gateways, load balancers, or simply a deliberately small connection pool to conserve resources. Under HTTP/1.1 each connection carries one in-flight request, so concurrency is hard-capped at the pool size and excess requests queue. HTTP/2 multiplexes many streams over the same few connections, eliminating that queueing.

Benchmark: Claude Haiku 4.5 on Bedrock (ap-northeast-1), 60 concurrent small requests, 150 requests total, connection pool capped at 6 for both modes:

Mode	Pool	Throughput	p50	p95	p99
HTTP/1.1	6 (constrained)	9.5 req/s	5.93s	6.41s	7.07s
HTTP/2	6 (constrained)	46.4 req/s	0.91s	1.43s	1.63s
HTTP/1.1	unconstrained (control)	50.9 req/s	0.91s	1.31s	1.71s

With the same 6 connections, HTTP/2 delivered ~4.9× the throughput (+388%) and cut p50 latency by ~85% (5.93s → 0.91s). The control row shows why: HTTP/1.1 with an unconstrained pool reaches the same ~0.9s p50 — so HTTP/2's win is specifically that it achieves unconstrained-level performance using a fraction of the connections. For enterprises whose outbound connection count is constrained by network policy, this is a direct, large latency/throughput improvement.
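The headline figures can be recomputed directly from the table. The sketch below also adds a rough cross-check using Little's law (mean latency ≈ requests in flight / throughput), under the assumption that all 60 requests stay in flight for the whole run; that assumption is not stated in the benchmark, so the implied latencies are only indicative:

```python
# Throughput (req/s) and p50 latency (s), copied from the benchmark table.
benchmark = {
    "http1_pool6": {"throughput": 9.5, "p50": 5.93},
    "http2_pool6": {"throughput": 46.4, "p50": 0.91},
    "http1_unconstrained": {"throughput": 50.9, "p50": 0.91},
}

h1 = benchmark["http1_pool6"]
h2 = benchmark["http2_pool6"]

speedup = h2["throughput"] / h1["throughput"]
throughput_gain_pct = (speedup - 1) * 100
p50_cut_pct = (1 - h2["p50"] / h1["p50"]) * 100

# Little's law cross-check (assumption: 60 requests in flight throughout).
CONCURRENCY = 60
estimated_latency = {
    name: CONCURRENCY / row["throughput"] for name, row in benchmark.items()
}

print(f"speedup: {speedup:.1f}x (+{throughput_gain_pct:.0f}%)")
print(f"p50 reduction: {p50_cut_pct:.0f}%")
for name, seconds in estimated_latency.items():
    print(f"{name}: ~{seconds:.2f}s mean latency implied by Little's law")
```

The implied mean latencies (about 6.3s for constrained HTTP/1.1 versus about 1.3s for HTTP/2) are of the same order as the measured p50 values, which is consistent with the queueing explanation given above.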

For the non-constrained, large-payload case (few big requests over many connections) HTTP/2 and HTTP/1.1 perform equivalently — HTTP/2 is not a universal speed-up, which is exactly why this ships as an opt-in flag rather than a default change.

Add an opt-in `litellm.enable_http2` flag (also `LITELLM_ENABLE_HTTP2` env / config.yaml) that switches the outbound transport to httpx with HTTP/2 enabled. Default is OFF, so existing behavior is unchanged.

Why: aiohttp (litellm's default async transport) cannot speak HTTP/2, and the httpx clients were never created with http2=True, so all outbound provider calls used HTTP/1.1. Enabling HTTP/2 forces the httpx transport and passes http2=True to the sync/async clients.

- enable_http2 + http2_max_connections / http2_max_keepalive_connections globals (+ LITELLM_HTTP2_* env vars), with fail-fast validation
- central AsyncHTTPHandler / HTTPHandler + get_async/sync httpx client cache (h2 vs h1 isolated by cache-key suffix that also encodes limits)
- OpenAI/Azure provider client builders honor the flag
- force_ipv4 path applies http2/limits to the explicit transport (httpx ignores client http2=/limits= once a transport is passed)
- a shared aiohttp session takes priority over http2 (warns, stays h1)
- /health/readiness reports enable_http2
- optional `litellm[http2]` extra (h2); clear ImportError if missing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAassistant · 2026-06-13T12:44:47Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

cron-bot does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

codspeed-hq · 2026-06-13T12:47:00Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing KevinZhao:feat/outbound-http2-support (d056059) with main (343e453)

codecov · 2026-06-13T12:47:46Z

Codecov Report

❌ Patch coverage is 95.61404% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/llms/openai/common_utils.py	58.33%	5 Missing ⚠️


KevinZhao · 2026-06-13T12:48:28Z

Superseded by #30370, which targets litellm_oss_branch (the required base for fork contributions) and drops the uv.lock change (fork PRs cannot modify the lockfile).

greptile-apps · 2026-06-13T12:54:08Z

Greptile Summary

Adds opt-in HTTP/2 support for outbound LLM requests, default OFF. Enabled via litellm.enable_http2 = True, LITELLM_ENABLE_HTTP2 env var, or litellm_settings.enable_http2 in config.yaml. Requires the new litellm[http2] extra (h2 package).

New helpers _should_enable_http2(), _verify_http2_available(), _get_http2_limits(), and _http2_cache_key_suffix() centralize flag resolution and cache-key isolation so HTTP/2 and HTTP/1.1 client pools never share a cache entry.
AsyncHTTPHandler, HTTPHandler, and the OpenAI/Azure client builders in common_utils.py are wired to pass http2=True and optional pool limits to httpx; the force_ipv4 transport path applies them to the explicit transport object (since httpx ignores Client(http2=...) once a transport is passed).
Grace paths: a caller-supplied aiohttp shared_session falls back to HTTP/1.1 with a warning; a user-supplied litellm.sync_transport is returned as-is with a warning that HTTP/2 was not applied.

Confidence Score: 4/5

Safe to merge once the sync-transport ordering issue is addressed; the default-off path leaves existing behavior completely unchanged.

The implementation is well-structured and the default-off guarantee is solid. One real defect exists in HTTPHandler.__init__: _verify_http2_available() fires before user_transport_wins is computed, meaning a user with a custom sync_transport and HTTP/2 enabled in their environment gets a spurious ImportError about h2 even though h2 is never used for that client. The async path handles the analogous shared_session case correctly, so the fix is clear. The remaining findings are minor inconsistencies that don't affect the common deployment paths.

litellm/llms/custom_httpx/http_handler.py (HTTPHandler.__init__ ordering of _verify_http2_available vs user_transport_wins) and litellm/llms/openai/common_utils.py (_get_sync_http_client HTTP/2 path).

Important Files Changed

Filename	Overview
litellm/llms/custom_httpx/http_handler.py	Core HTTP/2 implementation: adds flag resolution, h2 availability check, pool-limit helpers, cache-key suffix, and wires http2/limits through AsyncHTTPHandler and HTTPHandler. Logic is sound with one ordering inconsistency around _verify_http2_available vs user_transport_wins.
litellm/llms/openai/common_utils.py	Adds HTTP/2 wiring for OpenAI/Azure async and sync client builders. Async path builds the client inline (consistent with pre-existing pattern). Sync path delegates to HTTPHandler(ssl_verify=ssl_config).client — functional but extracts only .client from a throwaway HTTPHandler.
litellm/__init__.py	Adds three new module-level globals (enable_http2, http2_max_connections, http2_max_keepalive_connections) defaulting to False/None. Minimal and correct.
litellm/proxy/health_endpoints/_health_endpoints.py	Adds enable_http2 to /health/readiness response in both code paths. Correct and unambiguous.
tests/test_litellm/llms/custom_httpx/test_http2_support.py	490-line mock-only test suite covering flag resolution, transport selection, client construction, force_ipv4 interaction, shared_session priority, pool limits, cache isolation, and missing-h2 error. No real network calls. One test name is slightly misleading.
pyproject.toml	Adds http2 optional extra pinning h2>=4.0.0,<5.0. Clean and appropriate.

Comments Outside Diff (1)

tests/test_litellm/llms/custom_httpx/test_http2_support.py, line 661-664 (link)

Test name overclaims "global takes priority over env"

The test verifies global=True overrides env=False, which is correct. But the inverse — global=False overriding env=True — is not true: _should_enable_http2() uses OR logic, so LITELLM_ENABLE_HTTP2=True enables HTTP/2 even when litellm.enable_http2 = False. test_env_var already confirms this behavior. Naming the test "global takes priority" implies a symmetry that doesn't exist and could mislead future readers into assuming programmatic litellm.enable_http2 = False is a reliable disable switch. Consider renaming to something like test_global_true_overrides_env_false and adding a complementary test that documents the env-wins-over-global-false case explicitly.
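The asymmetry described in this comment can be made explicit with a pair of tests. The sketch below assumes the flag resolver ORs the programmatic global with the environment variable and that the variable is truthy when it equals "true" case-insensitively; `should_enable_http2` is a hypothetical stand-in that takes its inputs as arguments, not the PR's `_should_enable_http2()`:

```python
from typing import Mapping


def should_enable_http2(global_flag: bool, env: Mapping[str, str]) -> bool:
    """OR-style resolution: either source can switch HTTP/2 on."""
    # Assumption: only the string "true" (any case) counts as enabled.
    env_flag = env.get("LITELLM_ENABLE_HTTP2", "").strip().lower() == "true"
    return bool(global_flag) or env_flag


def test_global_true_overrides_env_false() -> None:
    assert should_enable_http2(True, {"LITELLM_ENABLE_HTTP2": "False"}) is True


def test_env_true_wins_over_global_false() -> None:
    # Documents that setting the global to False is not a disable switch.
    assert should_enable_http2(False, {"LITELLM_ENABLE_HTTP2": "True"}) is True


def test_both_off_stays_off() -> None:
    assert should_enable_http2(False, {}) is False
```

Naming each test after the exact case it covers avoids implying that the programmatic flag always wins.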

Reviews (1): Last reviewed commit: "feat(http): opt-in HTTP/2 for outbound L..."

greptile-apps · 2026-06-13T12:54:11Z

        default_headers = get_default_headers() if not disable_default_headers else None

        if client is None:
-            transport = self._create_sync_transport()
+            http2_enabled = _should_enable_http2()
+            _http2_limits = _get_http2_limits() if http2_enabled else None
+            if http2_enabled:
+                _verify_http2_available()
+
+            transport = self._create_sync_transport(
+                http2=http2_enabled, limits=_http2_limits
+            )
+
+            # A user-supplied litellm.sync_transport (returned by
+            # _create_sync_transport when force_ipv4 is False) takes priority and
+            # is used as-is. httpx ignores Client(http2=...) once an explicit
+            # transport is passed, so HTTP/2 cannot be applied to it — warn rather
+            # than silently downgrade.
+            user_transport_wins = (
+                http2_enabled and not litellm.force_ipv4 and transport is not None
+            )
+            if user_transport_wins:
+                verbose_logger.warning(
+                    "litellm: HTTP/2 is enabled but a custom litellm.sync_transport "
+                    "was provided. httpx cannot apply HTTP/2 to an explicit transport "
+                    "— this client will use the provided transport as-is. Set "
+                    "http2=True on your transport to use HTTP/2."
+                )

            # Create a client with a connection pool
-            self.client = httpx.Client(
+            client_kwargs: Dict[str, Any] = dict(
                transport=transport,
                timeout=timeout,
                verify=ssl_config,
                cert=cert,
                headers=default_headers,


_verify_http2_available() called before user_transport_wins is known

_verify_http2_available() (which raises ImportError if h2 is absent) fires at line 1345 — before _create_sync_transport() is called and before user_transport_wins is computed. If the user has a custom litellm.sync_transport set, HTTP/2 will not be applied anyway (user_transport_wins=True at line 1357), but the h2 package is still required just to reach the warning. A user with a custom sync transport and LITELLM_ENABLE_HTTP2=True in their environment gets an ImportError("install h2") even though h2 is never used for that client.

The async path handles the analogous shared_session case before calling _verify_http2_available() (lines 684–694): it sets http2_enabled=False first, so the h2 check is skipped entirely. The sync path should apply the same pattern — compute transport first, derive user_transport_wins, and only then check h2 availability if HTTP/2 will actually be used.
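The reordering suggested here can be sketched as a standalone decision function: first determine whether a user-supplied transport wins, and require h2 only if HTTP/2 will really be applied. `resolve_sync_http2` and its parameters are hypothetical and simplify the handler's logic; the `h2_available` argument exists only so the behavior can be exercised without the package installed:

```python
import importlib.util
import logging
from typing import Any, Optional

logger = logging.getLogger("http2_sketch")


def h2_installed() -> bool:
    """Check for the h2 package without importing it."""
    return importlib.util.find_spec("h2") is not None


def resolve_sync_http2(
    http2_requested: bool,
    force_ipv4: bool,
    user_transport: Optional[Any],
    h2_available: Optional[bool] = None,
) -> bool:
    """Return True only when HTTP/2 will actually be applied to the client."""
    if not http2_requested:
        return False

    # Step 1: a user-supplied transport is used as-is unless force_ipv4
    # makes the handler build its own transport.
    user_transport_wins = not force_ipv4 and user_transport is not None
    if user_transport_wins:
        logger.warning(
            "HTTP/2 is enabled but a custom sync transport was provided; "
            "the transport is used as-is."
        )
        return False

    # Step 2: only now is h2 required, because HTTP/2 will be used.
    if h2_available is None:
        h2_available = h2_installed()
    if not h2_available:
        raise ImportError("HTTP/2 was requested but the 'h2' package is not installed")
    return True
```

With this ordering, a custom transport combined with the flag and no h2 installed produces a warning and an HTTP/1.1 client instead of an ImportError.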

greptile-apps · 2026-06-13T12:54:12Z

+        if _should_enable_http2():
+            return HTTPHandler(ssl_verify=ssl_config).client


Throwaway HTTPHandler created to extract .client

HTTPHandler(ssl_verify=ssl_config).client instantiates a full HTTPHandler object (including a concurrent_requests semaphore, default-header computation, and the _verify_http2_available() side-effect on _HTTP2_AVAILABLE) just to return .client. The HTTPHandler wrapper is then discarded. By contrast, the async path in this same file builds the httpx.AsyncClient inline. For consistency and to avoid the wasted allocation, the sync path could also build the httpx.Client inline (mirroring the async path), or delegate to _get_httpx_client() and return .client from the cached instance.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

KevinZhao closed this Jun 13, 2026

greptile-apps Bot reviewed Jun 13, 2026

KevinZhao wants to merge 1 commit into BerriAI:main from KevinZhao:feat/outbound-http2-support
