
feat(http): opt-in HTTP/2 for outbound LLM requests#30370

Open
KevinZhao wants to merge 4 commits into
BerriAI:litellm_oss_branchfrom
KevinZhao:feat/http2-oss

Conversation

@KevinZhao

Contributor

Relevant issues

Fixes #30362 (and the earlier stale-closed #3533) — outbound requests to upstream LLM providers always used HTTP/1.1.

Type

🆕 New Feature

Changes

Adds opt-in HTTP/2 for outbound LLM requests. Default is OFF, so existing behavior is byte-for-byte unchanged.

Why

litellm's default async transport is aiohttp, which cannot speak HTTP/2, and the httpx clients were never created with http2=True. As a result every outbound provider call used HTTP/1.1, even though providers like Bedrock, Vertex and Anthropic front their APIs with HTTP/2-capable endpoints. Enabling HTTP/2 forces the httpx transport and passes http2=True to the sync/async clients.

How to enable

# config.yaml
litellm_settings:
  enable_http2: true

or LITELLM_ENABLE_HTTP2=True, or litellm.enable_http2 = True (SDK).
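The three switches above presumably collapse into one boolean at client-construction time. A minimal sketch of that resolution follows, assuming that either the SDK/config flag or a truthy environment value turns the feature on, and that the environment value is parsed case-insensitively. `resolve_enable_http2` and the set of accepted truthy strings are hypothetical and are not taken from the PR.

```python
import os
from typing import Mapping, Optional

# Assumed set of strings treated as "on"; the PR's actual parsing may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_enable_http2(
    sdk_flag: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only when HTTP/2 was explicitly requested.

    sdk_flag stands in for litellm.enable_http2 (or the config.yaml value);
    env defaults to the process environment.
    """
    if sdk_flag:
        return True
    env = os.environ if env is None else env
    raw = env.get("LITELLM_ENABLE_HTTP2", "")
    return raw.strip().lower() in _TRUTHY


# Default-off: nothing set means HTTP/1.1 stays in place.
print(resolve_enable_http2(False, {}))
print(resolve_enable_http2(False, {"LITELLM_ENABLE_HTTP2": "True"}))
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.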

Optional pool tuning: http2_max_connections / http2_max_keepalive_connections (or LITELLM_HTTP2_MAX_CONNECTIONS / LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS).
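The pool limits are described as validated fail-fast. One way such validation could look is sketched below, assuming that an unset limit means "use the library default" and that only positive integers (or their string form from an environment variable) are accepted. `validate_pool_limit` is a hypothetical helper, not the PR's function.

```python
from typing import Optional, Union


def validate_pool_limit(name: str, value: Union[int, str, None]) -> Optional[int]:
    """Return a positive int, or None when the limit is unset.

    Raises ValueError immediately on anything else, so a bad setting fails
    at startup rather than on the first outbound request.
    """
    if value is None or value == "":
        return None
    # bool is a subclass of int; reject it explicitly, along with floats etc.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    try:
        parsed = int(value)
    except ValueError:
        raise ValueError(f"{name} must be a positive integer, got {value!r}") from None
    if parsed <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return parsed


print(validate_pool_limit("http2_max_connections", "6"))
```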

Implementation notes

  • New globals: enable_http2, http2_max_connections, http2_max_keepalive_connections (with fail-fast validation of pool limits).
  • Central AsyncHTTPHandler / HTTPHandler + get_async_httpx_client / _get_httpx_client cache — HTTP/2 clients are isolated from HTTP/1.1 in the cache via a key suffix that also encodes the pool limits (no stale-client reuse when the config changes).
  • OpenAI / Azure provider client builders (openai/common_utils.py) honor the flag, so OpenAI/Azure outbound calls also negotiate HTTP/2.
  • force_ipv4 path applies http2/limits to the explicit transport (httpx ignores Client(http2=..., limits=...) once a transport is passed).
  • A caller-supplied aiohttp shared_session takes priority over HTTP/2 (it cannot speak HTTP/2) — we warn and keep HTTP/1.1 rather than silently dropping the session.
  • /health/readiness now reports enable_http2.
  • New optional extra litellm[http2] (pulls in h2); a clear ImportError is raised if HTTP/2 is requested without h2 installed.
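The cache-isolation note can be illustrated with a small sketch. It assumes the suffix is empty when HTTP/2 is off (so existing HTTP/1.1 keys are unchanged) and otherwise encodes both pool limits. The suffix format, `http2_cache_key_suffix` as written here, and `ClientCache` are illustrative stand-ins rather than the PR's implementation.

```python
from typing import Callable, Dict, Optional


def http2_cache_key_suffix(
    enabled: bool,
    max_connections: Optional[int] = None,
    max_keepalive: Optional[int] = None,
) -> str:
    """Build a key suffix that separates HTTP/2 clients and their pool limits."""
    if not enabled:
        return ""  # HTTP/1.1 keys stay exactly as they were
    return f"_http2_mc{max_connections}_mk{max_keepalive}"


class ClientCache:
    """Toy stand-in for a shared client cache keyed by string."""

    def __init__(self) -> None:
        self._clients: Dict[str, object] = {}

    def get_or_create(self, base_key: str, suffix: str, factory: Callable[[], object]) -> object:
        key = base_key + suffix
        if key not in self._clients:
            self._clients[key] = factory()
        return self._clients[key]


cache = ClientCache()
h1 = cache.get_or_create("httpx_client", http2_cache_key_suffix(False), object)
h2_a = cache.get_or_create("httpx_client", http2_cache_key_suffix(True, 6, 6), object)
h2_b = cache.get_or_create("httpx_client", http2_cache_key_suffix(True, 6, 6), object)
h2_c = cache.get_or_create("httpx_client", http2_cache_key_suffix(True, 12, 6), object)
print(h2_a is h2_b, h2_a is h2_c, h1 is h2_a)
```

Because the limits are part of the key, changing the configuration yields a new client instead of reusing one built with the old pool size.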

Pre-Submission checklist

  • I have added meaningful tests (tests/test_litellm/llms/custom_httpx/test_http2_support.py)
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • Default-off path verified to be unchanged (existing custom_httpx + openai suites pass)

Screenshots / Proof of Fix

Functional verification — outbound HTTP/2 negotiated end-to-end

Verified on a real EKS cluster (ap-northeast-1) using the modified handler against real upstreams. All checks passed (response.http_version == "HTTP/2" on Bedrock and Anthropic, sync + async), and the default-off path still selects the aiohttp transport:

[PASS] default-off:_should_enable_http2: HTTP/2 correctly OFF by default
[PASS] default-off:transport: default transport=LiteLLMAiohttpTransport
[PASS] on:async-pool-http2-flag: async client pool _http2=True
[PASS] async-wire:bedrock-tokyo: http_version=HTTP/2 status=404
[PASS] async-wire:anthropic:     http_version=HTTP/2 status=404
[PASS] on:sync-pool-http2-flag: sync client pool _http2=True
[PASS] sync-wire:bedrock-tokyo:  http_version=HTTP/2 status=404
=== RESULT: 9/9 checks passed ===

Performance — the large-enterprise / connection-constrained case

This is where HTTP/2 matters most. Large deployments routing high request volume to a single provider host commonly run behind egress controls that cap the number of outbound TCP connections — corporate proxies, NAT Gateways, load balancers, or simply a deliberately small connection pool to conserve resources. Under HTTP/1.1 each connection carries one in-flight request, so concurrency is hard-capped at the pool size and excess requests queue. HTTP/2 multiplexes many streams over the same few connections, eliminating that queueing.
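The queueing argument can be made concrete with a deliberately simple model: every request takes the same service time, and requests are served in waves limited by how many can be in flight at once. This is only a sketch. It ignores bandwidth, server-side throttling and TCP-level effects, and the per-connection stream limit of 100 is an assumed value (the real limit is advertised by each server).

```python
import math


def makespan_seconds(
    requests: int,
    connections: int,
    streams_per_connection: int,
    service_time_s: float,
) -> float:
    """Time to finish a burst when at most connections * streams run at once."""
    in_flight = connections * streams_per_connection
    waves = math.ceil(requests / in_flight)
    return waves * service_time_s


# 60 concurrent requests over 6 connections, 1.0s per request (assumed).
print(makespan_seconds(60, 6, 1, 1.0))    # HTTP/1.1: one request per connection -> 10.0
print(makespan_seconds(60, 6, 100, 1.0))  # HTTP/2: assumed 100 streams per connection -> 1.0
```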

Benchmark: Claude Haiku 4.5 on Bedrock (ap-northeast-1), 60 concurrent small requests, 150 requests total, connection pool capped at 6 for both modes:

| Mode | Pool | Throughput | p50 | p95 | p99 | Errors |
|---|---|---|---|---|---|---|
| HTTP/1.1 | 6 (constrained) | 9.5 req/s | 5.93s | 6.41s | 7.07s | 0 |
| HTTP/2 | 6 (constrained) | 46.4 req/s | 0.91s | 1.43s | 1.63s | 0 |
| HTTP/1.1 | unconstrained (control) | 50.9 req/s | 0.91s | 1.31s | 1.71s | 0 |

With the same 6 connections, HTTP/2 delivered ~4.9× the throughput (+388%) and cut p50 latency by ~85% (5.93s → 0.91s). The control row shows why: HTTP/1.1 with an unconstrained pool reaches the same ~0.9s p50 — so HTTP/2's win is specifically that it achieves unconstrained-level performance using a fraction of the connections. For enterprises whose outbound connection count is constrained by network policy, this is a direct, large latency/throughput improvement.
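For readers who want to re-derive the headline figures, the arithmetic from the benchmark rows is:

```python
# Figures copied from the constrained-pool benchmark rows.
h1_rps, h2_rps = 9.5, 46.4
h1_p50_s, h2_p50_s = 5.93, 0.91

speedup = h2_rps / h1_rps                      # throughput ratio
gain_pct = (speedup - 1) * 100                 # relative throughput gain
p50_cut_pct = (1 - h2_p50_s / h1_p50_s) * 100  # p50 latency reduction

print(f"{speedup:.1f}x, +{gain_pct:.0f}%, p50 -{p50_cut_pct:.0f}%")
# prints: 4.9x, +388%, p50 -85%
```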

For the non-constrained, large-payload case (few big requests over many connections) HTTP/2 and HTTP/1.1 perform equivalently — HTTP/2 is not a universal speed-up, which is exactly why this ships as an opt-in flag rather than a default change.

Add an opt-in `litellm.enable_http2` flag (also `LITELLM_ENABLE_HTTP2`
env / config.yaml) that switches the outbound transport to httpx with
HTTP/2 enabled. Default is OFF, so existing behavior is unchanged.

Why: aiohttp (litellm's default async transport) cannot speak HTTP/2,
and the httpx clients were never created with http2=True, so all
outbound provider calls used HTTP/1.1. Enabling HTTP/2 forces the httpx
transport and passes http2=True to the sync/async clients.

- enable_http2 + http2_max_connections / http2_max_keepalive_connections
  globals (+ LITELLM_HTTP2_* env vars), with fail-fast validation
- central AsyncHTTPHandler / HTTPHandler + get_async/sync httpx client
  cache (h2 vs h1 isolated by cache-key suffix that also encodes limits)
- OpenAI/Azure provider client builders honor the flag
- force_ipv4 path applies http2/limits to the explicit transport
  (httpx ignores client http2=/limits= once a transport is passed)
- a shared aiohttp session takes priority over http2 (warns, stays h1)
- /health/readiness reports enable_http2
- requires the `h2` package (pip install h2); clear ImportError if missing
@KevinZhao KevinZhao requested a review from a team June 13, 2026 12:48
@CLAassistant

CLAassistant commented Jun 13, 2026


CLA assistant check
All committers have signed the CLA.

@codecov

codecov Bot commented Jun 13, 2026


Codecov Report

❌ Patch coverage is 99.15966% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| litellm/llms/openai/common_utils.py | 93.75% | 1 Missing ⚠️ |


HTTPHandler(ssl_verify=...) only accepts bool|str|None, but get_ssl_configuration()
can return an ssl.SSLContext. HTTPHandler resolves SSL config internally, so pass no
ssl_verify — fixes a mypy arg-type error.
@KevinZhao

Contributor Author

Status / CI notes

The two remaining red checks — documentation and code-quality — both fail on the same item:

Keys not documented in 'environment settings - Reference': {'LITELLM_ENABLE_HTTP2'}

documentation_test_env_keys validates env vars against config_settings.md, which now lives in the separate BerriAI/litellm-docs repo (CI checks it out into docs/my-website). Documenting the three new env vars (LITELLM_ENABLE_HTTP2, LITELLM_HTTP2_MAX_CONNECTIONS, LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS) therefore requires a companion PR in litellm-docs — happy to open that, just let me know the preferred section.

All code-owned issues are resolved (the earlier mypy ssl_verify arg-type error is fixed). The core-utils / auth-and-jwt failures earlier were pre-existing tokenizer tests hitting HuggingFace 429 Too Many Requests, unrelated to this change.

This PR deliberately does not modify uv.lock (fork PRs can't), so HTTP/2 requires pip install h2; a clear ImportError is raised otherwise.
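A fail-fast availability check of the kind described could be sketched with the standard library alone. `verify_module_available` and its message wording are hypothetical; the PR's own helper is referred to elsewhere in this thread as _verify_http2_available, and its body is not shown here.

```python
import importlib.util


def verify_module_available(module_name: str = "h2") -> None:
    """Raise a descriptive ImportError if an optional dependency is missing.

    find_spec() looks the module up without importing it, so the check is cheap.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"HTTP/2 was requested but the '{module_name}' package is not installed. "
            f"Install it with: pip install {module_name}"
        )


# A standard-library module is always present, so this call returns silently.
verify_module_available("json")
```

Running the check when the client is built, rather than on the first request, surfaces a missing dependency at startup.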

@greptileai

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR adds opt-in HTTP/2 support for outbound LLM requests (litellm.enable_http2, LITELLM_ENABLE_HTTP2, or litellm_settings.enable_http2: true in config). The default remains HTTP/1.1 via aiohttp, so existing behavior is unchanged.

  • Core transport wiring (http_handler.py): Five new helpers resolve the HTTP/2 flag, validate the h2 package, compute pool limits, and generate cache-key suffixes so HTTP/2 and HTTP/1.1 clients are isolated in the shared LRU cache. AsyncHTTPHandler.create_client and HTTPHandler.__init__ thread http2/limits through transport construction, with explicit handling of force_ipv4, caller-supplied shared_session (warns + falls back to HTTP/1.1), and user-supplied sync_transport.
  • OpenAI/Azure path (common_utils.py): _get_async_http_client passes http2/limits through _create_async_transport and sets matching kwargs on the resulting AsyncClient; _get_sync_http_client now returns the cached _get_httpx_client().client when HTTP/2 is enabled, sharing the pool rather than creating a fresh HTTPHandler per call; the shared_session downgrade warning is emitted directly from this builder since it bypasses AsyncHTTPHandler.create_client.
  • Tests (test_http2_support.py, test_openai_common_utils.py): 504 lines of unit tests cover flag resolution, transport selection, client construction, force_ipv4 + HTTP/2 limits, shared-session priority, cache isolation, and the missing-h2 error path — all without real network calls.

Confidence Score: 5/5

Safe to merge. The feature is default-off and all new code paths are gated behind the opt-in flag, leaving existing HTTP/1.1 behavior byte-for-byte unchanged.

The implementation is well-scoped: HTTP/2 is strictly opt-in, the aiohttp default path is untouched, and every non-trivial edge case (force_ipv4, shared_session, user sync_transport, missing h2 package, runtime config changes) is explicitly handled and tested. The previously identified issues in the OpenAI/Azure client builder are correctly resolved in this diff.

tests/test_litellm/llms/custom_httpx/test_http2_support.py — several test classes construct live HTTP/2 clients and will raise a bare ImportError rather than skip cleanly if the h2 package is absent from the test environment.

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/__init__.py | Adds three new module-level globals: enable_http2 (bool, default False), http2_max_connections (Optional[int]), and http2_max_keepalive_connections (Optional[int]). Clean default-off addition that doesn't touch existing settings. |
| litellm/llms/custom_httpx/http_handler.py | Core implementation: adds helpers to resolve the HTTP/2 flag, validate h2, compute pool limits, and generate cache-key suffixes; threads http2/limits through AsyncHTTPHandler and HTTPHandler construction; handles force_ipv4, shared_session, and user sync_transport edge cases correctly. |
| litellm/llms/openai/common_utils.py | Updates _get_async_http_client to pass http2/limits through _create_async_transport; emits shared-session downgrade warning inline; updates _get_sync_http_client to return cached _get_httpx_client().client when HTTP/2 is enabled. |
| litellm/proxy/health_endpoints/_health_endpoints.py | Minimal change: imports _should_enable_http2 and adds enable_http2 to both readiness-response branches. |
| tests/test_litellm/llms/custom_httpx/test_http2_support.py | 504 lines covering flag logic, transport selection, client construction, force_ipv4, shared-session priority, pool limits, cache isolation, and missing-h2 error path, all without real network calls. Tests constructing HTTP/2 clients will raise ImportError instead of skipping cleanly if h2 is absent. |
| tests/test_litellm/llms/openai/test_openai_common_utils.py | Adds TestOpenAIClientHttp2 covering async/sync client construction, shared-session warning, and sync-client caching under HTTP/2. |

Reviews (4): Last reviewed commit: "fix(http): restore 'Using AiohttpTranspo..."

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR adds opt-in HTTP/2 support for outbound LLM requests behind a litellm.enable_http2 flag (default False). When enabled it switches the transport from aiohttp to httpx with http2=True, applies configurable connection-pool limits, isolates HTTP/2 clients in the existing LRU cache via a key suffix, and honors the flag across AsyncHTTPHandler, HTTPHandler, and the OpenAI/Azure provider client builders.

  • http_handler.py gains _should_enable_http2, _verify_http2_available, _get_http2_limits, _http2_cache_key_suffix, and extends _create_async_transport / _create_sync_transport to thread http2 and limits through the force_ipv4 path correctly.
  • common_utils.py updates both async and sync OpenAI client builders; the sync path delegates to HTTPHandler() when HTTP/2 is on, but creates an uncached instance per call (see inline comment).
  • A new 490-line test file covers flag resolution, transport selection, pool-limit validation, cache isolation, force_ipv4 interaction, shared_session priority, and the missing-h2 error path.

Confidence Score: 4/5

The default-off path is unchanged and the new code is well-isolated behind the opt-in flag, but the sync OpenAI HTTP/2 path creates a fresh connection pool per call rather than reusing the cached pool.

The implementation is thorough and the default (HTTP/1.1) behavior is byte-for-byte preserved. The one functional concern is that _get_sync_http_client() with HTTP/2 enabled instantiates a new HTTPHandler() per call instead of going through _get_httpx_client(), meaning sync OpenAI calls do not share a connection pool across requests if the upstream OpenAI client object is not long-lived. This could silently undercut the connection-reuse benefit that motivates the feature.

litellm/llms/openai/common_utils.py — the sync HTTP/2 branch in _get_sync_http_client() and the dead ssl_config assignment above it.

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/__init__.py | Adds three new module-level globals (enable_http2, http2_max_connections, http2_max_keepalive_connections), all defaulting to off/None; a clean, backward-compatible addition. |
| litellm/llms/custom_httpx/http_handler.py | Core HTTP/2 logic: adds helper functions, extends transport creation for async and sync handlers, adds cache-key suffix for HTTP/2 isolation. Logic is sound; the uncached-pool concern is in the common_utils.py caller. |
| litellm/llms/openai/common_utils.py | _get_async_http_client() correctly threads HTTP/2 settings through transport creation; _get_sync_http_client() with HTTP/2 delegates to HTTPHandler().client, bypassing the _get_httpx_client() cache, and leaves ssl_config as dead code in the HTTP/2 branch. |
| litellm/proxy/health_endpoints/_health_endpoints.py | Minimal, correct addition of enable_http2 field to both readiness detail response paths. |
| tests/test_litellm/llms/custom_httpx/test_http2_support.py | Comprehensive unit test coverage with proper fixture-based global/env restoration; tests that construct real HTTP/2 clients need a pytest.importorskip("h2") guard to avoid opaque ImportError when h2 is absent. |

Comments Outside Diff (1)

  1. tests/test_litellm/llms/custom_httpx/test_http2_support.py, line 685-706

    P2 Tests that construct HTTP/2 clients will raise ImportError instead of skipping when h2 is absent

    TestClientConstruction, TestForceIpv4WithHttp2, TestHttp2Limits, and TestCacheIsolation all call AsyncHTTPHandler() or HTTPHandler() with enable_http2=True, which triggers _verify_http2_available(), which in turn imports h2. If h2 is not installed, the tests crash with a bare ImportError rather than being marked as skipped. Adding a module-level guard would produce a clean skip:

    h2 = pytest.importorskip("h2", reason="h2 package required for HTTP/2 tests")

    This is especially important because the PR explicitly states that uv.lock was not modified and h2 must be installed separately (pip install h2).

Reviews (2): Last reviewed commit: "fix(http): drop SSLContext arg to HTTPHa..."

…ders

- async: emit a warning when a shared aiohttp session is combined with
  enable_http2 (this builder calls the static _create_async_transport
  directly, so it must warn itself rather than relying on
  AsyncHTTPHandler.create_client). Previously it silently downgraded to HTTP/1.1.
- sync: route the HTTP/2 path through the cached _get_httpx_client() instead
  of instantiating a fresh HTTPHandler() per call, so sync OpenAI requests
  share a single connection pool (the connection-reuse benefit HTTP/2 is for).
  Also drop the dead ssl_config assignment on the HTTP/2 branch.
- tests: cover both builders' default-off, http2-on, shared-session-warning,
  and sync cache-reuse behavior.
@KevinZhao

Contributor Author

Addressed both Greptile findings in common_utils.py (commit pushed):

  1. async shared_session warning gap: _get_async_http_client now emits an explicit verbose_logger.warning when enable_http2 is combined with a shared_session, instead of silently downgrading to HTTP/1.1. Fixed the misleading comment too (this builder calls the static _create_async_transport directly, so it must warn itself).

  2. sync untracked connection pool: _get_sync_http_client now routes the HTTP/2 path through the cached _get_httpx_client() instead of creating an HTTPHandler() per call, so sync OpenAI requests share one connection pool (the reuse benefit HTTP/2 is meant to provide). Also removed the dead ssl_config assignment on that branch.

Added tests in test_openai_common_utils.py covering both builders: default-off (aiohttp / HTTP/1.1), http2-on, the shared-session warning + HTTP/1.1 fallback, and sync cache reuse.
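The precedence described in these fixes (a shared aiohttp session wins over HTTP/2, with a warning) can be sketched as a small decision function. `select_transport`, its return shape, and the logger name are illustrative assumptions; the PR's builders construct real transports rather than returning labels.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger("http2_sketch")  # hypothetical logger name


def select_transport(enable_http2: bool, shared_session: Optional[object]) -> Tuple[str, bool]:
    """Return (transport_label, http2_enabled) for one client build."""
    if shared_session is not None:
        if enable_http2:
            # Warn rather than silently dropping either the session or the flag.
            logger.warning(
                "enable_http2 is set but a shared aiohttp session was supplied; "
                "aiohttp cannot speak HTTP/2, so this client stays on HTTP/1.1."
            )
        return "aiohttp", False
    if enable_http2:
        return "httpx", True
    return "aiohttp", False  # default-off path is unchanged


print(select_transport(False, None))
print(select_transport(True, None))
print(select_transport(True, object()))
```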

@greptileai

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Contributor

Both fixes look correct and complete.

Async shared-session warning (_get_async_http_client): The explicit verbose_logger.warning + http2_enabled = False path is correct. The comment explaining why this builder must warn itself (it calls _create_async_transport directly, bypassing create_client's guard) makes the intent clear for future readers.

Sync pool reuse (_get_sync_http_client): Routing the HTTP/2 path through _get_httpx_client().client is the right call — it reuses the same cached HTTPHandler (and thus the same connection pool) across all sync OpenAI requests, which is exactly the multiplexing benefit HTTP/2 is meant to provide. The early return before the default ssl_config / httpx.Client path is clean.

The test coverage in test_openai_common_utils.py is solid:

  • test_async_shared_session_overrides_http2_and_warns validates both the transport fallback ("Aiohttp" in tname) and that the warning was actually emitted.
  • test_sync_client_http2_enabled_and_cached correctly asserts c1 is c2 — since _get_sync_http_client returns _get_httpx_client().client, the identity check confirms the same underlying HTTPHandler (and its pool) is being reused across calls.

No further concerns on these two items.


@KevinZhao

Contributor Author

recheck

@KevinZhao

Contributor Author

The companion docs PR is ready: BerriAI/litellm-docs#346

It adds the three new env vars (LITELLM_ENABLE_HTTP2, LITELLM_HTTP2_MAX_CONNECTIONS, LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS) to the environment variables - Reference table in config_settings.md.

The two remaining red checks here (documentation + code-quality) are the same documentation_test_env_keys check, which reads config_settings.md from the litellm-docs repo. Once litellm-docs#346 is merged, those checks will go green on a re-run.

CLA is signed ✅ ("All committers have signed the CLA"). All other CI is green.

@Sameerlite

Collaborator

@KevinZhao can you get the greptile score to 5/5? Thanks

@KevinZhao

Contributor Author

Thanks for the review! Status update:

greptile feedback — all addressed in 35d35e5505:

  • P1 silent HTTP/1.1 downgrade (shared_session + HTTP/2): _get_async_http_client now explicitly emits verbose_logger.warning and drops to HTTP/1.1 when a shared aiohttp session is supplied (it calls the static _create_async_transport directly, so it warns itself rather than relying on create_client).
  • P1 uncached sync HTTP/2 pool: _get_sync_http_client now returns _get_httpx_client().client, routing through the in_memory_llm_clients_cache LRU. The cache key carries an HTTP/2 suffix (_http2_cache_key_suffix) so HTTP/2 clients and distinct pool limits stay isolated from HTTP/1.1 ones — multiplexing / connection reuse now works.
  • P2 untracked HTTPHandler per init: resolved by the same change — the wrapper and its pool are now cached/shared instead of discarded per call.
  • P2 ssl_config dead code: the sync HTTP/2 branch now early-returns the cached client (SSL resolved internally in HTTPHandler.__init__), and ssl_config is only computed/used in the HTTP/1.1 else branch.

CI: the failing documentation / code-quality checks are due to test_env_keys.py requiring the new LITELLM_ENABLE_HTTP2 env var to be documented. Docs now live in the litellm-docs repo, so I opened BerriAI/litellm-docs#355 to add it. Once that merges, re-running these two checks (which check out litellm-docs@main) should turn them green.

…ort path

The HTTP/2 refactor moved transport selection out of
_should_use_aiohttp_transport() (now global-config only) into
_create_async_transport(), which dropped the 'Using AiohttpTransport...'
verbose debug log on the path that actually builds the aiohttp transport.
Restore it there and add a regression test.
@KevinZhao

Contributor Author

@greptileai review

@Sameerlite

Collaborator

Thanks for the contribution!

We're triggering a Greptile code review on this PR — we'll take a closer look once the results are in!

@greptileai

KevinZhao added a commit to KevinZhao/litellm-docs that referenced this pull request Jun 18, 2026
Register LITELLM_ENABLE_HTTP2, LITELLM_HTTP2_MAX_CONNECTIONS and
LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS in the environment variables
reference table so the documentation_tests/test_env_keys.py doc-lint
(run in BerriAI/litellm CI) passes for the outbound HTTP/2 feature
(BerriAI/litellm#30370).
@KevinZhao

Contributor Author

Heads-up on the two failing checks (documentation / code-quality)

Both jobs fail on the same doc-lint (tests/documentation_tests/test_env_keys.py):

Keys not documented in 'environment settings - Reference': {'LITELLM_ENABLE_HTTP2'}

The check enforces that every os.getenv(...) key is documented in config_settings.md. That file now lives in the separate BerriAI/litellm-docs repo — both workflows check it out into docs/my-website from litellm-docs@main before running the test, so the fix can't go in this PR.
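To show what such a doc-lint does in principle, here is a rough sketch that scans source text for os.getenv("...") keys and reports any that are missing from a documented set. It is an assumption-laden illustration: the real test_env_keys.py is not shown in this thread and may parse files differently, and `undocumented_env_keys` is a hypothetical name.

```python
import re
from typing import Iterable, Set

# Matches os.getenv("KEY" or os.getenv('KEY' and captures KEY.
_GETENV = re.compile(r"""os\.getenv\(\s*["']([A-Z0-9_]+)["']""")


def undocumented_env_keys(source_texts: Iterable[str], documented: Set[str]) -> Set[str]:
    """Return env keys read in the sources but absent from the documented set."""
    used: Set[str] = set()
    for text in source_texts:
        used.update(_GETENV.findall(text))
    return used - documented


sources = [
    'flag = os.getenv("LITELLM_ENABLE_HTTP2", "False")',
    "limit = os.getenv('LITELLM_HTTP2_MAX_CONNECTIONS')",
]
print(undocumented_env_keys(sources, {"LITELLM_HTTP2_MAX_CONNECTIONS"}))
```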

I've opened the docs-side change here: BerriAI/litellm-docs#346 — it registers LITELLM_ENABLE_HTTP2, LITELLM_HTTP2_MAX_CONNECTIONS, and LITELLM_HTTP2_MAX_KEEPALIVE_CONNECTIONS in the environment-variables reference table (append-only, no existing rows touched). Verified locally against test_env_keys.py → exit 0, All keys are documented.

Once litellm-docs#346 is merged into litellm-docs@main, a re-run of documentation + code-quality here will go green. 🙏
