feat: add H3 transport infrastructure, sandbox URL resolution, and header refresh #112
devin-ai-integration[bot] wants to merge 18 commits into
- Add `h3warm.py` utility with aioquic-based QUIC connection warming
- Warm H3 connections to regional edge domains in `SandboxInstance.create()` and `SyncSandboxInstance.create()` in parallel with the sandbox creation API call
- Warm H3 connection to api.blaxel.ai during SDK autoload
- Add proper H3 session cleanup in `delete()` for both async and sync sandboxes
- Export `close_api_h3_session()` for manual API session cleanup
- Add `aioquic>=1.2.0` as a core dependency

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
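The idea of warming edge connections while the create call is in flight can be sketched as follows. This is a minimal illustration, not the SDK's code: `warm_edge` and `create_sandbox` are hypothetical stand-ins that only sleep (or raise), and the host names are placeholders. The property that matters is that warming is best-effort, so a failed warm-up never fails sandbox creation.

```python
import asyncio


async def warm_edge(host: str) -> None:
    """Hypothetical warm-up; in the PR this would open a QUIC/H3 connection."""
    await asyncio.sleep(0.01)
    if host.startswith("bad"):
        raise ConnectionError(f"QUIC handshake to {host} failed")


async def create_sandbox(name: str) -> dict:
    """Hypothetical stand-in for the sandbox creation API call."""
    await asyncio.sleep(0.02)
    return {"name": name, "status": "DEPLOYING"}


async def create_with_warmup(name: str, edge_hosts: list) -> dict:
    # Start the warm-ups first so they overlap with the create call.
    warmups = [asyncio.create_task(warm_edge(h)) for h in edge_hosts]
    sandbox = await create_sandbox(name)
    # Best-effort: collect warm-up exceptions instead of raising them.
    await asyncio.gather(*warmups, return_exceptions=True)
    return sandbox
```

As later commits in this PR explain, running such background tasks on the caller's event loop interfered with the MCP client's anyio task groups, which is why warming was eventually removed from `create()`.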
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
DevinAI, fix the code to pass the test cases, please.
- Introduced `AsyncH3FallbackTransport` and `SyncH3FallbackTransport` classes to handle automatic downgrading from H3 to HTTP/2 on connection failures.
- Updated `H3Pool` to remember failed hosts for a specified TTL to optimize connection attempts.
- Enhanced sandbox actions to use HTTP/2 when H3 transport is unavailable.

This change improves resilience and performance in scenarios where H3 connections may fail.
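A rough sketch of the negative cache plus downgrade logic, assuming a 300 s TTL (the value given in the PR description) and assuming that `OSError` or timeout errors are what signal a failed QUIC attempt. `FailedHostCache` and `send_with_fallback` are hypothetical names; the real `H3Pool` and fallback transports wrap httpx transports rather than bare coroutines.

```python
import asyncio
import threading
import time
from typing import Awaitable, Callable, Dict, Tuple, TypeVar

T = TypeVar("T")
HostKey = Tuple[str, int]


class FailedHostCache:
    """Negative cache: hosts whose H3 attempt failed are skipped for `ttl` seconds."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._failed_until: Dict[HostKey, float] = {}

    def mark_failed(self, key: HostKey) -> None:
        with self._lock:
            self._failed_until[key] = self._clock() + self._ttl

    def is_failed(self, key: HostKey) -> bool:
        with self._lock:
            deadline = self._failed_until.get(key)
            if deadline is None:
                return False
            if self._clock() >= deadline:
                # Expired: forget the failure so H3 gets retried.
                del self._failed_until[key]
                return False
            return True


async def send_with_fallback(
    key: HostKey,
    via_h3: Callable[[], Awaitable[T]],
    via_h2: Callable[[], Awaitable[T]],
    cache: FailedHostCache,
) -> T:
    """Try H3 unless the host is negatively cached; downgrade to H2 on failure."""
    if not cache.is_failed(key):
        try:
            return await via_h3()
        except (OSError, asyncio.TimeoutError):
            cache.mark_failed(key)
    return await via_h2()
```

The injectable `clock` keeps the TTL behavior testable without sleeping for five minutes.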
- Make all h3transport/aioquic imports lazy (inside functions) so that aioquic is never loaded unless sandbox operations actually need it. This prevents the `H3Pool` background event loop from interfering with pytest-asyncio during integration tests.
- Use httpx event hooks to inject fresh `settings.headers` on every request instead of baking them in at client creation time. This ensures token refreshes are picked up automatically without recreating the client.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The _warm_api_h3() background thread creates a persistent event loop via H3Pool._ensure_bg_loop() that interferes with pytest-asyncio in integration tests. The API endpoint (api.blaxel.ai) is never accessed through h3transport anyway — only sandbox data-plane endpoints use H3. Sandbox edge-domain warming already happens in SandboxInstance.create(). Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…on tests

The H3 warming task (`asyncio.create_task` for the QUIC connection) was interfering with the MCP client's `streamablehttp_client` during integration tests, causing 'Session terminated' errors. The warming imported aioquic and created QUIC protocol handlers on the event loop that disrupted anyio task groups used by the MCP library. The h3transport module and fallback transports remain available for future use. Sandbox data-plane calls still benefit from HTTP/2 via the event-hook headers pattern.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ding http-stream

Sandboxes now resolve their direct data-plane URL from metadata, then probe that URL for transport type (websocket vs http-stream) dynamically. This preserves backward compatibility with sandboxes that may still use the gateway URL or WebSocket transport.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
DevinAI, please address the Mendral comments.
1. `tools/__init__.py`: `probe_url` now respects `use_fallback_url`, so the fallback probes the gateway URL instead of the unreachable direct sandbox URL.
2. `h3transport.py`: move `_async_transports.pop()` inside the lock in `_mark_failed()` to fix a data race with `_get_or_connect()`.
3. `pyproject.toml`: move aioquic to the optional `[h3]` extras group instead of the core dependencies, so users who don't need H3 transport avoid the heavy transitive deps (pyopenssl, pylsqpack, service-identity).

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
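Item 2 is easier to see in isolation. This sketch keeps the two dictionaries named in the review (`_connect_contexts`, `_async_transports`) behind one `threading.Lock`. `PoolState`, `store`, `get_transport`, and `_failed` are hypothetical, and closing the removed context outside the lock is an assumption about how cleanup would be done.

```python
import threading
from typing import Dict, Optional, Set, Tuple

HostKey = Tuple[str, int]


class PoolState:
    """Hypothetical slice of H3Pool bookkeeping, guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._connect_contexts: Dict[HostKey, object] = {}
        self._async_transports: Dict[HostKey, object] = {}
        self._failed: Set[HostKey] = set()

    def store(self, key: HostKey, ctx: object, transport: object) -> None:
        with self._lock:
            self._connect_contexts[key] = ctx
            self._async_transports[key] = transport

    def get_transport(self, key: HostKey) -> Optional[object]:
        with self._lock:
            return self._async_transports.get(key)

    def mark_failed(self, key: HostKey) -> Optional[object]:
        # Both pops happen under the same lock that get_transport() takes, so
        # no reader can see a transport whose connect context is already gone.
        with self._lock:
            self._failed.add(key)
            ctx = self._connect_contexts.pop(key, None)
            self._async_transports.pop(key, None)
        return ctx  # the caller closes it outside the lock
```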
Addressed all three Mendral comments in
Wrap aioquic imports in try/except so h3transport module can be imported even when aioquic is not installed. The pool singleton and helper functions gracefully degrade to HTTP/2 or HTTP/1.1 when aioquic is unavailable. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
🤖 Code Review
Assessment ✅
All three previously flagged issues are now resolved in commits caf8e5a and 8f55cd7:
- `aioquic` optional dependency ✅: moved from core `dependencies` to `[project.optional-dependencies]` under `h3 = ["aioquic>=1.2.0"]`. Imports are wrapped in `try/except ImportError` with an `AIOQUIC_AVAILABLE` guard, and the pool singleton is `None` when aioquic is absent.
- `_mark_failed` data race ✅: `_async_transports.pop()` is now inside the `with self._lock:` block alongside `_connect_contexts.pop()`, making the cleanup atomic.
- `probe_url` fallback correctness ✅: `_get_transport_type()` now uses `(self._resolved_url if self._resolved_url and not self.use_fallback_url else None) or self._url`, so when `initialize()` sets `use_fallback_url = True` after a failed connection, the probe correctly targets the gateway URL rather than the unreachable direct sandbox URL. `_get_transport()` uses the same guard, keeping URL selection consistent.
No new issues introduced by the two new commits. The PR is ready to merge.
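The URL-selection expression quoted in the review reduces to a small pure function, which makes its cases easy to check. `select_probe_url` and the example URLs are hypothetical; only the expression itself comes from the review.

```python
from typing import Optional


def select_probe_url(
    resolved_url: Optional[str], use_fallback_url: bool, gateway_url: str
) -> str:
    # Same shape as the reviewed code:
    # (resolved if resolved and not use_fallback else None) or gateway
    return (resolved_url if resolved_url and not use_fallback_url else None) or gateway_url
```

An empty or missing resolved URL and an explicit fallback both land on the gateway URL; only a resolved URL with fallback disabled probes the direct sandbox URL.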
Note
Tag @mendral-app with feedback or questions.
When aioquic is not installed (optional dependency), class definitions that inherit from QuicConnectionProtocol or reference aioquic types caused NameError at import time. Now all H3 transport classes, fallback transports, connection pool, and the pool singleton are inside the AIOQUIC_AVAILABLE guard. Helper functions gracefully return None. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
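A minimal sketch of the guard layout this commit describes, assuming `QuicConnectionProtocol` is imported from `aioquic.asyncio.protocol`. `_H3ClientProtocol`, `_Pool`, and `get_pool` are placeholders, not the module's real classes; the point is that anything subclassing or referencing an aioquic type lives inside `if AIOQUIC_AVAILABLE:`, and helpers return `None` otherwise.

```python
from typing import Optional

try:
    # Present only when the optional extra is installed (pip install blaxel[h3]).
    from aioquic.asyncio.protocol import QuicConnectionProtocol

    AIOQUIC_AVAILABLE = True
except ImportError:
    AIOQUIC_AVAILABLE = False

if AIOQUIC_AVAILABLE:
    # Subclassing an aioquic type must stay inside the guard: at module level
    # this class statement would raise NameError when aioquic is missing.
    class _H3ClientProtocol(QuicConnectionProtocol):
        pass

    class _Pool:
        """Placeholder for an H3 connection pool singleton."""

    _pool: Optional[object] = _Pool()
else:
    _pool = None


def get_pool() -> Optional[object]:
    """Return the pool, or None so callers fall back to HTTP/2 or HTTP/1.1."""
    return _pool
```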
🤖 Code Review
Assessment ✅
All three previously flagged high-severity issues are now resolved:
- `_mark_failed` data race: ✅ Fixed in caf8e5a. `_async_transports.pop()` is now inside the `with self._lock:` block alongside `_connect_contexts.pop()`.
- `aioquic` as a mandatory core dependency: ✅ Fixed in caf8e5a + 8f55cd7. Moved to `[project.optional-dependencies]` under the `h3` extras group; all aioquic imports are now conditional via `try/except`, and all aioquic-dependent classes are guarded inside `if AIOQUIC_AVAILABLE:`.
- Probe URL using the wrong URL on fallback: ✅ Fixed in cbfb12f. `probe_url` now correctly evaluates `(self._resolved_url if self._resolved_url and not self.use_fallback_url else None) or self._url`, so when `use_fallback_url=True`, the probe falls through to `self._url`, which returns the gateway fallback URL.
The code is in good shape. No new issues found in the latest commits.
cploujoux left a comment:
Please review the AI comments before assigning me the issue.
DevinAI, can you address the review comments?
…n in _get_or_connect

1. `default/sandbox.py` & `sync/sandbox.py`: apply the extracted region back to `sandbox.spec.region` in the `else` branch (it was dead code before). Also replaced `getattr()` with direct attribute access.
2. `h3transport.py`: hold the async lock across the entire check+connect+store sequence in `_get_or_connect()` to prevent duplicate QUIC connections for the same `(host, port)`, which would leak connection contexts.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
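Item 2 can be illustrated with a small cache that holds one `asyncio.Lock` across the lookup, the awaited connect, and the store. `ConnectionCache` and the `connect` callable are hypothetical. As the PR's own review checklist notes, a single lock like this serializes connection attempts to all hosts, not just the same `(host, port)`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

HostKey = Tuple[str, int]


class ConnectionCache:
    """Hypothetical cache: check + connect + store under one asyncio.Lock."""

    def __init__(self, connect: Callable[[HostKey], Awaitable[object]]) -> None:
        self._connect = connect
        self._lock = asyncio.Lock()
        self._conns: Dict[HostKey, object] = {}

    async def get_or_connect(self, key: HostKey) -> object:
        async with self._lock:
            conn = self._conns.get(key)
            if conn is None:
                # The await happens while the lock is held, so a concurrent
                # caller for the same key waits and then reuses this result
                # instead of opening (and leaking) a second connection.
                conn = await self._connect(key)
                self._conns[key] = conn
            return conn
```

Create the cache from inside a running event loop, since the lock is meant to be used by a single loop.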
sandbox.spec.runtime.memory = sandbox.spec.runtime.memory or default_memory

# Extract region from existing Sandbox spec and apply it
region = sandbox.spec.region or settings.region
Fixed in a10bb3f — region is now applied back to sandbox.spec.region in the else branch, matching the behavior of the if branch (line 224). Also replaced getattr(sandbox.spec, "region", None) with direct attribute access sandbox.spec.region.
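The apply-back fix boils down to writing the computed fallback into the spec. A sketch with a hypothetical `Spec` dataclass and placeholder region names (`region-a`, `region-b`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spec:
    """Hypothetical stand-in for sandbox.spec (only the field that matters here)."""

    region: Optional[str] = None


def apply_region(spec: Spec, default_region: Optional[str]) -> Optional[str]:
    # Compute the effective region and write it back to the spec; before the
    # fix, the fallback was computed but never assigned.
    region = spec.region or default_region
    spec.region = region
    return region
```

An explicit region on the spec always wins; the settings default only fills the gap.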
sandbox.spec.runtime.memory = sandbox.spec.runtime.memory or default_memory

# Extract region from existing Sandbox spec and apply it
region = sandbox.spec.region or settings.region
Fixed in a10bb3f — same fix as async variant. Region is now applied back to sandbox.spec.region in the sync else branch.
url = f"{self.url}/watch/filesystem/{path}"
headers = {**settings.headers, **self.sandbox_config.headers}
async with httpx.AsyncClient() as client_instance:
    from ...common.h3transport import get_async_transport_for_url

try:
    async with httpx.AsyncClient() as client_instance:
        from ...common.h3transport import get_async_transport_for_url
python -m tests.benchmarks.bench_cold_call --iterations 5 --warmup 0
"""

import argparse
Make sure this benchmark is not run as part of the test suite.
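One way to check this concern, as a sketch: by default pytest only collects files matching `test_*.py` or `*_test.py` (its `python_files` setting), so `bench_cold_call.py` should not be picked up unless the project overrides that setting. The helper below approximates pytest's basename matching with `fnmatch`; `would_be_collected` is a hypothetical name. `collect_ignore_glob` is the `conftest.py` variable pytest reads to skip paths; placing it in `tests/conftest.py` is an assumed extra safeguard.

```python
import fnmatch
from typing import Sequence

# pytest's default `python_files` patterns (a project can override them).
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")

# Belongs in tests/conftest.py: pytest skips matching paths during collection.
# Patterns are resolved relative to the directory of that conftest.py.
collect_ignore_glob = ["benchmarks/*"]


def would_be_collected(filename: str, patterns: Sequence[str] = DEFAULT_PYTHON_FILES) -> bool:
    """Approximate pytest's basename check for test modules."""
    return any(fnmatch.fnmatch(filename, pattern) for pattern in patterns)
```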
core = []

# HTTP/3 (QUIC) transport support
h3 = ["aioquic>=1.2.0"]
So we have an optimisation, but it is going to be optional for now?
…efactor) Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
def _get_async_lock(self) -> asyncio.Lock:
    if self._async_lock is None:
        self._async_lock = asyncio.Lock()
    return self._async_lock
🔴 asyncio.Lock shared across different event loops provides no mutual exclusion
The H3Pool uses a single asyncio.Lock (via _get_async_lock) that is shared between coroutines running on different event loops. get_async_transport calls _get_or_connect on the caller's application event loop, while get_sync_transport schedules _get_or_connect on a separate background thread's event loop (self._bg_loop). An asyncio.Lock is designed for use within a single event loop — when a coroutine on loop A holds the lock and a coroutine on loop B tries to acquire it, the waiter's Future belongs to loop B but release() on loop A would call set_result() on that foreign-loop Future, which is not thread-safe. This defeats the stated purpose of preventing duplicate QUIC connections for the same (host, port), and can lead to leaked connections or crashes when sync and async transports are used concurrently.
Prompt for agents
In src/blaxel/core/common/h3transport.py, the H3Pool._get_async_lock() method (lines 283-286) returns a single asyncio.Lock that is used from two different event loops: the application loop (via get_async_transport) and the background thread loop (via get_sync_transport). asyncio.Lock does not work correctly across event loops.
To fix this, either:
1. Use separate asyncio.Lock instances per event loop (e.g., a dict keyed by loop id), or
2. Use a threading.Lock instead of asyncio.Lock for cross-loop synchronization (but be careful not to block the event loop while holding it — you may need to use asyncio.to_thread or run_in_executor), or
3. Ensure _get_or_connect is always called on the same event loop (e.g., always on the background loop) so the asyncio.Lock is only ever used from one loop.
Additionally, the _get_async_lock method has a race condition: the check-then-assign (if self._async_lock is None: self._async_lock = asyncio.Lock()) is not protected by any lock, so two threads could both create different Lock instances, with one overwriting the other. This should be protected by self._lock (the threading.Lock).
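Option 1, combined with the fix for the unprotected lazy initialization, might look like the following. `LoopLocalLocks` is a hypothetical helper. It assumes the event loops in use are weak-referenceable (true for the standard library loops) so that a lock is dropped together with its loop.

```python
import asyncio
import threading
import weakref


class LoopLocalLocks:
    """One asyncio.Lock per event loop, created lazily under a threading.Lock."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        # Weak keys: the entry disappears when its loop is garbage-collected.
        self._locks = weakref.WeakKeyDictionary()

    def get(self) -> asyncio.Lock:
        loop = asyncio.get_running_loop()
        # The guard makes check-then-assign atomic across threads.
        with self._guard:
            lock = self._locks.get(loop)
            if lock is None:
                lock = asyncio.Lock()
                self._locks[loop] = lock
            return lock
```

Per-loop locks only deduplicate connections within one loop: the application loop and the background loop could still each open a connection to the same `(host, port)`. If that must never happen, option 3 (running `_get_or_connect` only on the background loop) is the stronger fix.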
Needs attention — 1 issue in 1 file
All three previously flagged issues are resolved: _mark_failed data race fixed, aioquic moved to optional extras, and the probe-URL/fallback issue is moot after the tools/__init__.py refactor from main. One new memory leak in _H3Transport needs fixing before H3 is wired into the hot path.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<assessment>
All three previously flagged issues are resolved: `_mark_failed` data race fixed, `aioquic` moved to optional extras, and the probe-URL/fallback issue is moot after the `tools/__init__.py` refactor from main. One new memory leak in `_H3Transport` needs fixing before H3 is wired into the hot path.
</assessment>
<file name="src/blaxel/core/common/h3transport.py">
<issue location="src/blaxel/core/common/h3transport.py:143">
`_read_queue` and `_read_ready` entries are never deleted after a stream completes, leaking memory on long-lived QUIC connections. Each request adds two dict entries that grow unboundedly.
</issue>
</file>
async def _receive_response_data(
    self, stream_id: int, stream_ended: bool
) -> AsyncIterator[bytes]:
    while not stream_ended:
        event = await self._wait_for_http_event(stream_id)
        if isinstance(event, DataReceived):
            stream_ended = event.stream_ended
            yield event.data
        elif isinstance(event, HeadersReceived):
            stream_ended = event.stream_ended

async def _wait_for_http_event(self, stream_id: int) -> H3Event:
    if not self._read_queue[stream_id]:
        await self._read_ready[stream_id].wait()
    event = self._read_queue[stream_id].popleft()
    if not self._read_queue[stream_id]:
        self._read_ready[stream_id].clear()
    return event
bug (P2): _read_queue and _read_ready entries are never deleted after a stream completes, leaking memory on long-lived QUIC connections. Each request adds two dict entries that grow unboundedly.
Suggested change:
async def _receive_response_data(
    self, stream_id: int, stream_ended: bool
) -> AsyncIterator[bytes]:
    try:
        while not stream_ended:
            event = await self._wait_for_http_event(stream_id)
            if isinstance(event, DataReceived):
                stream_ended = event.stream_ended
                yield event.data
            elif isinstance(event, HeadersReceived):
                stream_ended = event.stream_ended
    finally:
        self._read_queue.pop(stream_id, None)
        self._read_ready.pop(stream_id, None)

async def _wait_for_http_event(self, stream_id: int) -> H3Event:
    if not self._read_queue[stream_id]:
        await self._read_ready[stream_id].wait()
    event = self._read_queue[stream_id].popleft()
    if not self._read_queue[stream_id]:
        self._read_ready[stream_id].clear()
    return event
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/blaxel/core/common/h3transport.py, line 143:
<issue>
`_read_queue` and `_read_ready` entries are never deleted after a stream completes, leaking memory on long-lived QUIC connections. Each request adds two dict entries that grow unboundedly.
</issue>
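The suggested `try`/`finally` cleanup can be exercised outside aioquic with a self-contained model of the two per-stream maps. `StreamBuffers`, `feed`, and the `(data, ended)` tuples are hypothetical simplifications of the H3 events; the cleanup in `receive` mirrors the suggested change.

```python
import asyncio
from collections import defaultdict, deque
from typing import AsyncIterator, Deque, Dict, Tuple


class StreamBuffers:
    """Hypothetical per-stream buffers mirroring _read_queue / _read_ready."""

    def __init__(self) -> None:
        self._read_queue: Dict[int, Deque[Tuple[bytes, bool]]] = defaultdict(deque)
        self._read_ready: Dict[int, asyncio.Event] = defaultdict(asyncio.Event)

    def feed(self, stream_id: int, data: bytes, ended: bool) -> None:
        # Called by the protocol layer when a data event arrives.
        self._read_queue[stream_id].append((data, ended))
        self._read_ready[stream_id].set()

    async def _wait(self, stream_id: int) -> Tuple[bytes, bool]:
        if not self._read_queue[stream_id]:
            await self._read_ready[stream_id].wait()
        item = self._read_queue[stream_id].popleft()
        if not self._read_queue[stream_id]:
            self._read_ready[stream_id].clear()
        return item

    async def receive(self, stream_id: int) -> AsyncIterator[bytes]:
        ended = False
        try:
            while not ended:
                data, ended = await self._wait(stream_id)
                yield data
        finally:
            # Runs when the stream ends, when an exception propagates, or when
            # the generator is closed via aclose(), so both entries are dropped.
            self._read_queue.pop(stream_id, None)
            self._read_ready.pop(stream_id, None)
```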
cploujoux left a comment:
Need to fix the conflict and test that everything works; also test with the OpenAI Agents SDK.
feat: add H3 transport infrastructure, sandbox URL resolution, and header refresh
Summary
Adds HTTP/3 (QUIC) transport infrastructure to the Python SDK and fixes several issues with sandbox creation and header caching.
What's included
- H3 transport module (`h3transport.py`, `h3warm.py`): Full QUIC/H3 transport layer using `aioquic`. It includes `H3Pool` with connection pooling, `AsyncH3FallbackTransport`/`SyncH3FallbackTransport` that auto-downgrade to HTTP/2 on QUIC failure, and a negative cache with a 300 s TTL. This infrastructure is available for future use but not yet wired into the hot path (see note below).
- Fresh headers via event hooks (`default/action.py`, `sync/action.py`): `SandboxAction.get_client()` now uses httpx `event_hooks` to inject `settings.headers` on every request, fixing a bug where the H2 client cached headers at creation time and missed token refreshes.
- Region apply-back fix (`default/sandbox.py`, `sync/sandbox.py`): The `else` branch in `create()` (for raw `Sandbox` objects) now applies the extracted region back to `sandbox.spec.region`. Previously, the fallback to `settings.region` was computed but never written back.
- Race condition fix (`h3transport.py`): `H3Pool._get_or_connect()` now holds the async lock across the entire check+connect+store sequence to prevent duplicate QUIC connections for the same `(host, port)`.
- Cold-call benchmark (`tests/benchmarks/bench_cold_call.py`): Measures end-to-end latency of sandbox lifecycle operations (create → call → delete) with a per-phase breakdown.

Important context
- H3 connection warming was originally added to `SandboxInstance.create()` but was removed because the background QUIC connections interfered with `anyio` task groups used by the MCP library. The H3 transport modules remain available for future integration.
- `aioquic` is an optional dependency under the `[h3]` extras (`pip install blaxel[h3]`). All H3 classes are guarded with `if AIOQUIC_AVAILABLE:` so the module imports cleanly without aioquic.
- The `tools/__init__.py` sandbox URL resolution changes from earlier revisions have been superseded by main's refactor (merged in commit 4c32ad1). Main introduced `_fetch_metadata_url` / `_resolve_url` / retry-based `initialize`, which replaces our earlier `_resolved_url` / `_get_transport_type` / fallback approach. This PR now carries no diff in `tools/__init__.py`.

Updates since last revision
- `tools/__init__.py`: accepted main's refactored URL resolution (`_metadata_url`, `_fetch_metadata_url`, `_resolve_url`, retry-based `initialize`), which supersedes the sandbox URL resolution code previously in this PR.

Changed files:
- `h3transport.py` (new): Full async/sync H3 transport with connection pooling, HTTP/2 fallback, and `AIOQUIC_AVAILABLE` guards
- `h3warm.py` (new): QUIC connection warming utility
- `default/action.py`, `sync/action.py`: httpx event hook for dynamic header injection
- `default/sandbox.py`, `sync/sandbox.py`: Region extraction and apply-back
- `sandbox/types.py`: `h3_transport` field added to `SandboxConfiguration`
- `autoload.py`: Minor comment cleanup
- `pyproject.toml`: Adds `[h3]` optional extras with `aioquic>=1.2.0`
- `default/filesystem.py`, `sync/filesystem.py`, `default/process.py`, `sync/process.py`: H3 transport scaffolding (currently unused)
- `tests/benchmarks/bench_cold_call.py` (new): Cold-call latency benchmark
- `tests/integration/openai/test_tools.py`: Adds `wait_for_sandbox_deployed` guard

Review & Testing Checklist for Human
- `Sandbox` objects without `spec.region` will now inherit `settings.region`. Verify this doesn't break code that expects sandboxes without regions.
- `_get_or_connect()` serializes all connection attempts (not just same-host ones), which may impact performance if H3 warming is re-enabled for multiple hosts concurrently.
- `import blaxel.core` works when `aioquic` is not installed, and H3 gracefully falls back to HTTP/2.

Test Plan
- `import blaxel.core` works when `aioquic` is not installed.
- `Sandbox` object without `spec.region`: verify it inherits `settings.region`.

Notes
Requested by @Joffref
Devin Session: https://app.devin.ai/sessions/f158717eb819492e8e4307d1ca9a658f
Originally intended to mirror the TypeScript SDK's H2 connection warming (sdk-typescript#260), but background QUIC tasks interfered with MCP library internals. The H3 transport infrastructure remains for future integration. Since aioquic is now an optional dependency, users can experiment with H3 via `pip install blaxel[h3]`.

Note
This PR adds HTTP/3 (QUIC) transport infrastructure as an optional dependency (`pip install blaxel[h3]`), fixes sandbox MCP URL resolution by fetching direct data-plane URLs from the management API, and replaces static header baking with httpx event hooks for automatic token refresh. The H3 transport is available for future use but not yet wired into the hot path. The last commit merges main's `tools/__init__.py` refactor (PR #119), which replaced the probe-based transport detection with a simpler retry loop.

Written by Mendral for commit 4c32ad1.