feat(http): opt-in HTTP/2 for outbound LLM requests #30369

Closed
KevinZhao wants to merge 1 commit into BerriAI:main from KevinZhao:feat/outbound-http2-support

Conversation

@KevinZhao

Relevant issues

Fixes #30362 (and the earlier stale-closed #3533) — outbound requests to upstream LLM providers always used HTTP/1.1.

Type

🆕 New Feature

Changes

Adds opt-in HTTP/2 for outbound LLM requests. Default is OFF, so existing behavior is byte-for-byte unchanged.

Why

litellm's default async transport is aiohttp, which cannot speak HTTP/2, and the httpx clients were never created with http2=True. As a result every outbound provider call used HTTP/1.1, even though providers like Bedrock, Vertex and Anthropic front their APIs with HTTP/2-capable endpoints. Enabling HTTP/2 forces the httpx transport and passes http2=True to the sync/async clients.

How to enable

# config.yaml
litellm_settings:
  enable_http2: true

or LITELLM_ENABLE_HTTP2=True, or litellm.enable_http2 = True (SDK).

Optional pool tuning: http2_max_connections / http2_max_keepalive_connections (or LITELLM_HTTP2_MAX_CONNECTIONS / LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS).
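
The fail-fast validation of these pool limits might look roughly like the sketch below. The environment variable names come from this PR; the helper name `read_pool_limit`, the acceptance rule (positive integers only), and the error wording are assumptions for illustration, not the PR's actual code.

```python
import os
from typing import Mapping, Optional


def read_pool_limit(name: str, env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return a positive int read from env var `name`, or None if unset.

    Hypothetical helper: it raises ValueError immediately on a bad value
    so that a misconfigured pool does not surface later at request time.
    """
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return None  # unset: let the HTTP client use its own defaults
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


# Example with an explicit mapping instead of the real environment
limit = read_pool_limit(
    "LITELLM_HTTP2_MAX_CONNECTIONS", {"LITELLM_HTTP2_MAX_CONNECTIONS": "6"}
)
```

Passing the mapping explicitly keeps the helper easy to test without mutating `os.environ`.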

Implementation notes

  • New globals: enable_http2, http2_max_connections, http2_max_keepalive_connections (with fail-fast validation of pool limits).
  • Central AsyncHTTPHandler / HTTPHandler + get_async_httpx_client / _get_httpx_client cache — HTTP/2 clients are isolated from HTTP/1.1 in the cache via a key suffix that also encodes the pool limits (no stale-client reuse when the config changes).
  • OpenAI / Azure provider client builders (openai/common_utils.py) honor the flag, so OpenAI/Azure outbound calls also negotiate HTTP/2.
  • force_ipv4 path applies http2/limits to the explicit transport (httpx ignores Client(http2=..., limits=...) once a transport is passed).
  • A caller-supplied aiohttp shared_session takes priority over HTTP/2 (it cannot speak HTTP/2) — we warn and keep HTTP/1.1 rather than silently dropping the session.
  • /health/readiness now reports enable_http2.
  • New optional extra litellm[http2] (pulls in h2); a clear ImportError is raised if HTTP/2 is requested without h2 installed.
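
The cache isolation described in the notes above can be illustrated with a small sketch. The function name, the suffix format, and the example base key are hypothetical; the PR only states that the suffix separates HTTP/2 from HTTP/1.1 clients and encodes the pool limits.

```python
from typing import Optional


def http2_cache_key_suffix(
    enabled: bool,
    max_connections: Optional[int] = None,
    max_keepalive_connections: Optional[int] = None,
) -> str:
    """Build a suffix for a client cache key (illustrative format only)."""
    if not enabled:
        # HTTP/1.1 keys stay exactly as they were before the feature existed.
        return ""
    # Encoding the limits means a config change yields a new key,
    # so a client built with the old limits is never reused.
    return f"_http2_mc={max_connections}_mk={max_keepalive_connections}"


base_key = "async_httpx_client_bedrock"  # hypothetical base cache key
h1_key = base_key + http2_cache_key_suffix(False)
h2_key = base_key + http2_cache_key_suffix(True, 6, 6)
h2_key_retuned = base_key + http2_cache_key_suffix(True, 20, 6)
```

All three keys differ whenever the protocol or the limits differ, which is the property that prevents stale-client reuse.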

Pre-Submission checklist

  • I have added meaningful tests (tests/test_litellm/llms/custom_httpx/test_http2_support.py)
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • Default-off path verified to be unchanged (existing custom_httpx + openai suites pass)

Screenshots / Proof of Fix

Functional verification — outbound HTTP/2 negotiated end-to-end

Verified on a real EKS cluster (ap-northeast-1) using the modified handler against real upstreams. All checks passed (response.http_version == "HTTP/2" on Bedrock and Anthropic, sync + async), and the default-off path still selects the aiohttp transport:

[PASS] default-off:_should_enable_http2: HTTP/2 correctly OFF by default
[PASS] default-off:transport: default transport=LiteLLMAiohttpTransport
[PASS] on:async-pool-http2-flag: async client pool _http2=True
[PASS] async-wire:bedrock-tokyo: http_version=HTTP/2 status=404
[PASS] async-wire:anthropic:     http_version=HTTP/2 status=404
[PASS] on:sync-pool-http2-flag: sync client pool _http2=True
[PASS] sync-wire:bedrock-tokyo:  http_version=HTTP/2 status=404
=== RESULT: 9/9 checks passed ===

Performance — the large-enterprise / connection-constrained case

This is where HTTP/2 matters most. Large deployments routing high request volume to a single provider host commonly run behind egress controls that cap the number of outbound TCP connections — corporate proxies, NAT Gateways, load balancers, or simply a deliberately small connection pool to conserve resources. Under HTTP/1.1 each connection carries one in-flight request, so concurrency is hard-capped at the pool size and excess requests queue. HTTP/2 multiplexes many streams over the same few connections, eliminating that queueing.
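
A toy model makes the queueing effect concrete. It assumes all requests arrive at once and take the same time, and that under HTTP/2 the server's per-connection stream limit is not the bottleneck; none of this is taken from the PR's benchmark code.

```python
from typing import List


def completion_times(n_requests: int, capacity: int, service_time: float) -> List[float]:
    """Finish time of each request when at most `capacity` run concurrently.

    Toy model: every request arrives at t=0 and takes `service_time` seconds.
    Requests are served in waves of size `capacity`.
    """
    return [(i // capacity + 1) * service_time for i in range(n_requests)]


# HTTP/1.1: one in-flight request per connection, so capacity == pool size.
h1_times = completion_times(n_requests=60, capacity=6, service_time=1.0)

# HTTP/2: streams are multiplexed, so the same 6 connections can carry all
# 60 requests at once (assuming the stream limit allows it).
h2_times = completion_times(n_requests=60, capacity=60, service_time=1.0)

print(max(h1_times), max(h2_times))
```

In this model the last HTTP/1.1 request waits for ten waves, while every multiplexed request finishes after a single service time.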

Benchmark: Claude Haiku 4.5 on Bedrock (ap-northeast-1), 60 concurrent small requests, 150 requests total, connection pool capped at 6 for both modes:

| Mode | Pool | Throughput | p50 | p95 | p99 | Errors |
|---|---|---|---|---|---|---|
| HTTP/1.1 | 6 (constrained) | 9.5 req/s | 5.93s | 6.41s | 7.07s | 0 |
| HTTP/2 | 6 (constrained) | 46.4 req/s | 0.91s | 1.43s | 1.63s | 0 |
| HTTP/1.1 | unconstrained (control) | 50.9 req/s | 0.91s | 1.31s | 1.71s | 0 |

With the same 6 connections, HTTP/2 delivered ~4.9× the throughput (+388%) and cut p50 latency by ~85% (5.93s → 0.91s). The control row shows why: HTTP/1.1 with an unconstrained pool reaches the same ~0.9s p50 — so HTTP/2's win is specifically that it achieves unconstrained-level performance using a fraction of the connections. For enterprises whose outbound connection count is constrained by network policy, this is a direct, large latency/throughput improvement.
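
The headline figures follow directly from the benchmark numbers reported above; a short check of the arithmetic (variable names are illustrative):

```python
h1 = {"throughput": 9.5, "p50": 5.93}   # HTTP/1.1, pool capped at 6
h2 = {"throughput": 46.4, "p50": 0.91}  # HTTP/2, pool capped at 6

speedup = h2["throughput"] / h1["throughput"]
throughput_gain_pct = (speedup - 1) * 100
p50_reduction_pct = (1 - h2["p50"] / h1["p50"]) * 100

print(f"{speedup:.1f}x, +{throughput_gain_pct:.0f}%, p50 -{p50_reduction_pct:.0f}%")
# prints: 4.9x, +388%, p50 -85%
```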

For the non-constrained, large-payload case (few big requests over many connections) HTTP/2 and HTTP/1.1 perform equivalently — HTTP/2 is not a universal speed-up, which is exactly why this ships as an opt-in flag rather than a default change.

Add an opt-in `litellm.enable_http2` flag (also `LITELLM_ENABLE_HTTP2`
env / config.yaml) that switches the outbound transport to httpx with
HTTP/2 enabled. Default is OFF, so existing behavior is unchanged.

Why: aiohttp (litellm's default async transport) cannot speak HTTP/2,
and the httpx clients were never created with http2=True, so all
outbound provider calls used HTTP/1.1. Enabling HTTP/2 forces the httpx
transport and passes http2=True to the sync/async clients.

- enable_http2 + http2_max_connections / http2_max_keepalive_connections
  globals (+ LITELLM_HTTP2_* env vars), with fail-fast validation
- central AsyncHTTPHandler / HTTPHandler + get_async/sync httpx client
  cache (h2 vs h1 isolated by cache-key suffix that also encodes limits)
- OpenAI/Azure provider client builders honor the flag
- force_ipv4 path applies http2/limits to the explicit transport
  (httpx ignores client http2=/limits= once a transport is passed)
- a shared aiohttp session takes priority over http2 (warns, stays h1)
- /health/readiness reports enable_http2
- optional `litellm[http2]` extra (h2); clear ImportError if missing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


cron-bot seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

@codspeed-hq

codspeed-hq Bot commented Jun 13, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing KevinZhao:feat/outbound-http2-support (d056059) with main (343e453)

@codecov

codecov Bot commented Jun 13, 2026

Codecov Report

❌ Patch coverage is 95.61404% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| litellm/llms/openai/common_utils.py | 58.33% | 5 Missing ⚠️ |

@KevinZhao

Superseded by #30370, which targets litellm_oss_branch (the required base for fork contributions) and drops the uv.lock change (fork PRs cannot modify the lockfile).

@KevinZhao KevinZhao closed this Jun 13, 2026
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Greptile Summary

Adds opt-in HTTP/2 support for outbound LLM requests, default OFF. Enabled via litellm.enable_http2 = True, LITELLM_ENABLE_HTTP2 env var, or litellm_settings.enable_http2 in config.yaml. Requires the new litellm[http2] extra (h2 package).

  • New helpers _should_enable_http2(), _verify_http2_available(), _get_http2_limits(), and _http2_cache_key_suffix() centralize flag resolution and cache-key isolation so HTTP/2 and HTTP/1.1 client pools never share a cache entry.
  • AsyncHTTPHandler, HTTPHandler, and the OpenAI/Azure client builders in common_utils.py are wired to pass http2=True and optional pool limits to httpx; the force_ipv4 transport path applies them to the explicit transport object (since httpx ignores Client(http2=...) once a transport is passed).
  • Grace paths: a caller-supplied aiohttp shared_session falls back to HTTP/1.1 with a warning; a user-supplied litellm.sync_transport is returned as-is with a warning that HTTP/2 was not applied.

Confidence Score: 4/5

Safe to merge with the sync-transport ordering fix addressed; the default-off path leaves existing behavior completely unchanged.

The implementation is well-structured and the default-off guarantee is solid. One real defect exists in HTTPHandler.__init__: _verify_http2_available() fires before user_transport_wins is computed, meaning a user with a custom sync_transport and HTTP/2 enabled in their environment gets a spurious ImportError about h2 even though h2 is never used for that client. The async path handles the analogous shared_session case correctly, so the fix is clear. The remaining findings are minor inconsistencies that don't affect the common deployment paths.

Files needing attention: litellm/llms/custom_httpx/http_handler.py (HTTPHandler.__init__ ordering of _verify_http2_available vs user_transport_wins) and litellm/llms/openai/common_utils.py (_get_sync_http_client HTTP/2 path).

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/llms/custom_httpx/http_handler.py | Core HTTP/2 implementation: adds flag resolution, h2 availability check, pool-limit helpers, cache-key suffix, and wires http2/limits through AsyncHTTPHandler and HTTPHandler. Logic is sound with one ordering inconsistency around _verify_http2_available vs user_transport_wins. |
| litellm/llms/openai/common_utils.py | Adds HTTP/2 wiring for OpenAI/Azure async and sync client builders. Async path builds the client inline (consistent with pre-existing pattern). Sync path delegates to HTTPHandler(ssl_verify=ssl_config).client, which is functional but extracts only .client from a throwaway HTTPHandler. |
| litellm/__init__.py | Adds three new module-level globals (enable_http2, http2_max_connections, http2_max_keepalive_connections) defaulting to False/None. Minimal and correct. |
| litellm/proxy/health_endpoints/_health_endpoints.py | Adds enable_http2 to /health/readiness response in both code paths. Correct and unambiguous. |
| tests/test_litellm/llms/custom_httpx/test_http2_support.py | 490-line mock-only test suite covering flag resolution, transport selection, client construction, force_ipv4 interaction, shared_session priority, pool limits, cache isolation, and missing-h2 error. No real network calls. One test name is slightly misleading. |
| pyproject.toml | Adds http2 optional extra pinning h2>=4.0.0,<5.0. Clean and appropriate. |

Comments Outside Diff (1)

  1. tests/test_litellm/llms/custom_httpx/test_http2_support.py, line 661-664 (link)

    P2 Test name overclaims "global takes priority over env"

    The test verifies global=True overrides env=False, which is correct. But the inverse — global=False overriding env=True — is not true: _should_enable_http2() uses OR logic, so LITELLM_ENABLE_HTTP2=True enables HTTP/2 even when litellm.enable_http2 = False. test_env_var already confirms this behavior. Naming the test "global takes priority" implies a symmetry that doesn't exist and could mislead future readers into assuming programmatic litellm.enable_http2 = False is a reliable disable switch. Consider renaming to something like test_global_true_overrides_env_false and adding a complementary test that documents the env-wins-over-global-false case explicitly.
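
The asymmetry is easy to see in a minimal sketch of OR-based flag resolution. The function below is a stand-in, not the PR's `_should_enable_http2()`; in particular, how the PR parses the environment variable is an assumption here (case-insensitive `"true"`).

```python
import os
from typing import Mapping


def should_enable_http2(global_flag: bool, env: Mapping[str, str] = os.environ) -> bool:
    """OR of the programmatic flag and the LITELLM_ENABLE_HTTP2 env var."""
    # Assumed parsing rule: only the string "true" (any case) enables the flag.
    env_flag = env.get("LITELLM_ENABLE_HTTP2", "").strip().lower() == "true"
    return global_flag or env_flag


# Truth table over the four combinations of (global, env)
results = {
    (g, e): should_enable_http2(g, {"LITELLM_ENABLE_HTTP2": str(e)})
    for g in (False, True)
    for e in (False, True)
}
```

Under OR logic, `global=False` with `env=True` still enables HTTP/2, so neither source can act as an off switch for the other.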

Reviews (1): Last reviewed commit: "feat(http): opt-in HTTP/2 for outbound L..." | Re-trigger Greptile

Comment on lines 1284 to 1318
  default_headers = get_default_headers() if not disable_default_headers else None

  if client is None:
-     transport = self._create_sync_transport()
+     http2_enabled = _should_enable_http2()
+     _http2_limits = _get_http2_limits() if http2_enabled else None
+     if http2_enabled:
+         _verify_http2_available()

+     transport = self._create_sync_transport(
+         http2=http2_enabled, limits=_http2_limits
+     )

+     # A user-supplied litellm.sync_transport (returned by
+     # _create_sync_transport when force_ipv4 is False) takes priority and
+     # is used as-is. httpx ignores Client(http2=...) once an explicit
+     # transport is passed, so HTTP/2 cannot be applied to it — warn rather
+     # than silently downgrade.
+     user_transport_wins = (
+         http2_enabled and not litellm.force_ipv4 and transport is not None
+     )
+     if user_transport_wins:
+         verbose_logger.warning(
+             "litellm: HTTP/2 is enabled but a custom litellm.sync_transport "
+             "was provided. httpx cannot apply HTTP/2 to an explicit transport "
+             "— this client will use the provided transport as-is. Set "
+             "http2=True on your transport to use HTTP/2."
+         )

      # Create a client with a connection pool
-     self.client = httpx.Client(
+     client_kwargs: Dict[str, Any] = dict(
          transport=transport,
          timeout=timeout,
          verify=ssl_config,
          cert=cert,
          headers=default_headers,

P1 _verify_http2_available() called before user_transport_wins is known

_verify_http2_available() (which raises ImportError if h2 is absent) fires at line 1345 — before _create_sync_transport() is called and before user_transport_wins is computed. If the user has a custom litellm.sync_transport set, HTTP/2 will not be applied anyway (user_transport_wins=True at line 1357), but the h2 package is still required just to reach the warning. A user with a custom sync transport and LITELLM_ENABLE_HTTP2=True in their environment gets an ImportError("install h2") even though h2 is never used for that client.

The async path handles the analogous shared_session case before calling _verify_http2_available() (lines 684–694): it sets http2_enabled=False first, so the h2 check is skipped entirely. The sync path should apply the same pattern — compute transport first, derive user_transport_wins, and only then check h2 availability if HTTP/2 will actually be used.
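
A minimal sketch of the suggested ordering follows. It is a simplified stand-in for the sync handler's decision logic, not the PR's code: the function names, the `h2_available` test hook, and the use of `logging` are assumptions, and `user_transport` stands for a configured litellm.sync_transport.

```python
import importlib.util
import logging
from typing import Any, Optional

logger = logging.getLogger("http2_sketch")


def h2_installed() -> bool:
    """True if the optional `h2` package can be imported."""
    return importlib.util.find_spec("h2") is not None


def resolve_sync_http2(
    http2_requested: bool,
    force_ipv4: bool,
    user_transport: Optional[Any],
    h2_available: Optional[bool] = None,  # injectable for tests
) -> bool:
    """Return True only if HTTP/2 will actually be used for this client."""
    # 1. Decide first whether a user-supplied transport takes priority.
    user_transport_wins = (
        http2_requested and not force_ipv4 and user_transport is not None
    )
    if user_transport_wins:
        logger.warning("custom sync transport provided; HTTP/2 not applied")
        return False  # h2 is never needed on this path, so do not check it

    # 2. Only now require h2, because HTTP/2 will really be used.
    if http2_requested:
        if h2_available is None:
            h2_available = h2_installed()
        if not h2_available:
            raise ImportError("HTTP/2 requested but the 'h2' package is missing")
    return http2_requested
```

With this ordering, a custom transport plus `LITELLM_ENABLE_HTTP2=True` produces only the warning, while the `ImportError` is reserved for clients that would actually negotiate HTTP/2.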

Comment on lines +277 to +278
+ if _should_enable_http2():
+     return HTTPHandler(ssl_verify=ssl_config).client

P2 Throwaway HTTPHandler created to extract .client

HTTPHandler(ssl_verify=ssl_config).client instantiates a full HTTPHandler object (including a concurrent_requests semaphore, default-header computation, and the _verify_http2_available() side-effect on _HTTP2_AVAILABLE) just to return .client. The HTTPHandler wrapper is then discarded. By contrast, the async path in this same file builds the httpx.AsyncClient inline. For consistency and to avoid the wasted allocation, the sync path could also build the httpx.Client inline (mirroring the async path), or delegate to _get_httpx_client() and return .client from the cached instance.
