Conversation
- Add BasilicaSandboxManager as thin wrapper around basilica-sdk-python - Add comprehensive test script with concurrent evaluation support - Remove basilica-sdk from optional dependencies (use local uv install) - Supports both warm pool (fast) and cold start sandbox creation Setup: cd ridges && uv pip install -e ../basilica/crates/basilica-sdk-python export BASILICA_API_URL=http://localhost:9080 export BASILICA_API_TOKEN=dev-token Usage: python test_basilica_sandbox.py --concurrent 50 --eval
- Reduce from 587 to 207 lines (~65% smaller) - Move imports to top level - Consolidate duplicate worker functions into single worker() - Add header() helper for consistent formatting - Simplify result tracking and error handling
- Add --dx flag for DX-focused showcase test - Use context managers for automatic sandbox cleanup - Use namespaced API (sandbox.files, sandbox.process) - Use python_sandbox() factory function - Demonstrate global configuration with basilica.configure() - Use improved concurrent test with context managers
- Reorganize tests around SDK capabilities, not test numbers - Showcase modern API patterns throughout (context managers, namespaced API) - Simplify CLI: --full for all tests, --scale N for stress test - Remove redundant DX showcase section (entire file is now the showcase) - Cleaner output with section headers and consistent formatting - Global basilica.configure() at top for cleaner test code
- Focus on testing ridges + Basilica integration, not SDK demos - Clean structure: SDK, SandboxManager, Polyglot, Concurrent - Simple CLI: --full for Polyglot, --scale N for stress test - Uses new SDK conventions (context managers, namespaced API) throughout
- Rename test_concurrent to test_concurrent_evals - Run real Polyglot evaluations instead of simple computations - Show test pass/fail breakdown for each evaluation - Include Polyglot test as standard (not just with --full) - --full now includes concurrent evals, --scale N for custom count
- Use basilica.configure() for global SDK configuration - Use python_sandbox() factory function - Use namespaced API (sandbox.files, sandbox.process) - Cleaner code with better comments
- basilica_sandbox_manager.py: 156 → 76 lines (-51%) - test_basilica_sandbox.py: 318 → 130 lines (-59%) - Same functionality, less boilerplate
WalkthroughIntroduces a Basilica Sandbox adapter ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Manager as BasilicaSandboxManager
participant Basilica as basilica.Sandbox
participant Container as Container FS
Client->>Manager: initialize_sandbox(script_path, input_data, on_mount)
Manager->>Basilica: create sandbox instance
Basilica->>Container: allocate container
opt on_mount callback
Manager->>Manager: call on_mount(target_path)
Manager->>Container: mount files via callback
end
Manager->>Container: write script to /sandbox
Manager->>Container: write input.json to /sandbox
Manager->>Client: return SandboxHandle
Client->>Manager: run_sandbox(handle)
Manager->>Basilica: execute script in sandbox
Basilica->>Container: run Python script
Container-->>Basilica: stdout/stderr/exit code
Manager->>Container: read output.json from /sandbox
Manager->>Basilica: cleanup sandbox
Manager->>Client: return SandboxResultWithLogs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @evaluator/sandbox/basilica_sandbox_manager.py:
- Around line 49-52: The current try/except around sandbox.files.write (using
open(local).read() and os.path.relpath) silently drops files on
UnicodeDecodeError/IOError; update the logic in basilica_sandbox_manager.py to
detect and handle binary/unreadable files instead of passing: attempt a text
read first but on UnicodeDecodeError fall back to binary read and call
sandbox.files.write with bytes if the API supports it, and always log a warning
(including the file path from os.path.relpath(local, tmp) and the exception)
when a file is skipped or an IO error occurs so users can see which files were
not mounted.
In @test_basilica_sandbox.py:
- Around line 29-32: Module-level basilica.configure is called with a possibly
empty BASILICA_API_TOKEN at import time, causing silent invalid configuration
for tests; instead, create a helper like _configure_basilica() that reads
BASILICA_API_TOKEN and BASILICA_API_URL and only calls basilica.configure when
the token is present (or raise/exit if required), then remove the top-level
basilica.configure call and invoke _configure_basilica() at the start of main()
and at the start of each test that needs Basilica so imports no longer configure
with an empty token.
🧹 Nitpick comments (4)
evaluator/sandbox/basilica_sandbox_manager.py (4)
35-35: Avoid mutable default argument.Using
{}as a default argument is a well-known Python pitfall—the same dict instance is shared across all calls.♻️ Proposed fix
def initialize_sandbox( self, *, name: str, script_path: str, input_data: Any = None, - env_vars: Dict[str, str] = {}, on_mount: Callable[[str], None] = None, - timeout_seconds: int = None + env_vars: Dict[str, str] | None = None, on_mount: Callable[[str], None] | None = None, + timeout_seconds: int | None = None ) -> SandboxHandle: script_name = os.path.basename(script_path) - sandbox = python_sandbox(runtime="container", env={**env_vars, "PYTHONUNBUFFERED": "1"}, + sandbox = python_sandbox(runtime="container", env={**(env_vars or {}), "PYTHONUNBUFFERED": "1"}, timeout_seconds=timeout_seconds or 3600)
49-55: Use context managers for file I/O to avoid resource leaks.
open(local).read()andopen(script_path).read()leave file handles unclosed until garbage collection. In concurrent scenarios, this could exhaust file descriptors.♻️ Proposed fix
local = os.path.join(root, f) try: - sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", open(local).read()) + with open(local) as fh: + sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", fh.read()) except (UnicodeDecodeError, IOError): pass shutil.rmtree(tmp, ignore_errors=True) - sandbox.files.write(f"/sandbox/{script_name}", open(script_path).read()) + with open(script_path) as fh: + sandbox.files.write(f"/sandbox/{script_name}", fh.read())
81-87: Consider explicitNonetype hints for PEP 484 compliance.The function works correctly, but explicit type hints improve clarity and static analysis.
♻️ Proposed fix
-def get_sandbox_manager(inference_gateway_url: str = None, backend: str = None): +def get_sandbox_manager(inference_gateway_url: str | None = None, backend: str | None = None): """Factory: returns BasilicaSandboxManager or SandboxManager."""
26-31: Document or utilize the unusedinference_gateway_urlparameter.The parameter exists for interface parity with
SandboxManagerbut is silently ignored. Consider adding a docstring note or using it as a fallback forBASILICA_API_URL.📝 Option: Add documentation
def __init__(self, inference_gateway_url: str = None): + """Initialize BasilicaSandboxManager. + + Args: + inference_gateway_url: Unused; kept for interface parity with SandboxManager. + Basilica uses BASILICA_API_URL env var instead. + """ api_url = os.environ.get("BASILICA_API_URL", "http://localhost:9080")
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
evaluator/sandbox/basilica_sandbox_manager.pypyproject.tomltest_basilica_sandbox.py
🧰 Additional context used
🧬 Code graph analysis (2)
evaluator/sandbox/basilica_sandbox_manager.py (2)
evaluator/models.py (2)
Sandbox(9-15)SandboxResultWithLogs(27-28)evaluator/sandbox/sandbox_manager.py (1)
SandboxManager(21-188)
test_basilica_sandbox.py (3)
evaluator/sandbox/basilica_sandbox_manager.py (3)
BasilicaSandboxManager(23-78)initialize_sandbox(33-59)run_sandbox(61-78)models/problem.py (1)
ProblemTestResultStatus(18-21)evaluator/problem_suites/problem_suite.py (2)
has_problem_name(36-37)get_problem(39-40)
🪛 Ruff (0.14.10)
evaluator/sandbox/basilica_sandbox_manager.py
26-26: Unused method argument: inference_gateway_url
(ARG002)
26-26: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
30-30: Avoid specifying long messages outside the exception class
(TRY003)
35-35: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
35-35: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
36-36: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
69-69: Do not use bare except
(E722)
77-77: Multiple statements on one line (colon)
(E701)
78-78: Do not use bare except
(E722)
78-78: try-except-pass detected, consider logging the exception
(S110)
78-78: Multiple statements on one line (colon)
(E701)
81-81: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
81-81: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
test_basilica_sandbox.py
34-34: Do not assign a lambda expression, use a def
Rewrite ok as a def
(E731)
35-35: Do not assign a lambda expression, use a def
Rewrite fail as a def
(E731)
36-36: Do not assign a lambda expression, use a def
Rewrite section as a def
(E731)
51-51: Consider moving this statement to an else block
(TRY300)
52-52: Do not catch blind exception: Exception
(BLE001)
53-53: Multiple statements on one line (semicolon)
(E702)
53-53: Multiple statements on one line (semicolon)
(E702)
62-62: Multiple statements on one line (semicolon)
(E702)
70-70: Consider moving this statement to an else block
(TRY300)
71-71: Do not catch blind exception: Exception
(BLE001)
72-72: Multiple statements on one line (semicolon)
(E702)
72-72: Multiple statements on one line (semicolon)
(E702)
81-81: Multiple statements on one line (semicolon)
(E702)
93-93: Consider moving this statement to an else block
(TRY300)
94-94: Do not catch blind exception: Exception
(BLE001)
95-95: Multiple statements on one line (semicolon)
(E702)
95-95: Multiple statements on one line (semicolon)
(E702)
110-110: Multiple statements on one line (colon)
(E701)
113-113: Multiple statements on one line (colon)
(E701)
122-122: Multiple statements on one line (colon)
(E701)
123-123: Do not catch blind exception: Exception
(BLE001)
125-125: Multiple statements on one line (semicolon)
(E702)
126-126: Multiple statements on one line (colon)
(E701)
149-149: Multiple statements on one line (semicolon)
(E702)
157-157: Multiple statements on one line (colon)
(E701)
🔇 Additional comments (7)
pyproject.toml (1)
28-30: LGTM - Clear setup instructions.The comments provide helpful guidance for developers who need to set up the Basilica SDK locally. Consider adding these instructions to a README or CONTRIBUTING.md for better discoverability.
evaluator/sandbox/basilica_sandbox_manager.py (1)
15-20: LGTM - Clean dataclass design.The
SandboxHandleprovides a clear abstraction for holding sandbox state between initialization and execution.test_basilica_sandbox.py (5)
39-53: LGTM - Good SDK integration test.The test covers connection, code execution, and file I/O with proper cleanup via context manager.
56-74: LGTM - Exercises manager end-to-end.The test validates initialization, script execution, and result extraction. The
finallyblock ensures cleanup of the temporary file.
77-95: LGTM - Good integration with problem suite.The test validates the full evaluation pipeline using the existing
POLYGLOT_PY_SUITEinfrastructure.
98-138: LGTM - Solid concurrent test with proper synchronization.The thread-safe stats collection and bounded thread pool are well-implemented. The 80% success threshold is a reasonable heuristic for integration tests.
One minor readability note: the list comprehension on line 101 is quite dense; consider extracting to a separate line for clarity.
141-165: LGTM - Clean CLI implementation.The Click-based CLI provides good options for full test runs, scaling, and verbosity control. The final summary and exit code handling are appropriate.
| try: | ||
| sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", open(local).read()) | ||
| except (UnicodeDecodeError, IOError): | ||
| pass |
There was a problem hiding this comment.
Silent failure on binary/unreadable files may hide issues.
Silently passing on UnicodeDecodeError/IOError means binary files or files with encoding issues won't be mounted, and users won't know. Consider logging a warning or supporting binary file uploads.
🔧 Option: Log skipped files
try:
- sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", open(local).read())
+ with open(local) as fh:
+ sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", fh.read())
except (UnicodeDecodeError, IOError):
- pass
+ # Consider logging: logger.warning(f"Skipped non-text file: {local}")
+ pass📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", open(local).read()) | |
| except (UnicodeDecodeError, IOError): | |
| pass | |
| try: | |
| with open(local) as fh: | |
| sandbox.files.write(f"/sandbox/{os.path.relpath(local, tmp)}", fh.read()) | |
| except (UnicodeDecodeError, IOError): | |
| # Consider logging: logger.warning(f"Skipped non-text file: {local}") | |
| pass |
🤖 Prompt for AI Agents
In @evaluator/sandbox/basilica_sandbox_manager.py around lines 49 - 52, The
current try/except around sandbox.files.write (using open(local).read() and
os.path.relpath) silently drops files on UnicodeDecodeError/IOError; update the
logic in basilica_sandbox_manager.py to detect and handle binary/unreadable
files instead of passing: attempt a text read first but on UnicodeDecodeError
fall back to binary read and call sandbox.files.write with bytes if the API
supports it, and always log a warning (including the file path from
os.path.relpath(local, tmp) and the exception) when a file is skipped or an IO
error occurs so users can see which files were not mounted.
| try: | ||
| output = json.loads(handle.sandbox.files.read("/sandbox/output.json")) | ||
| except: | ||
| output = {"success": False, "error": "Failed to read output.json"} |
There was a problem hiding this comment.
Avoid bare except: — catch specific exceptions.
Bare except: catches SystemExit, KeyboardInterrupt, and other exceptions that typically shouldn't be swallowed. This also masks the actual failure reason when output.json is missing or malformed.
🐛 Proposed fix
try:
output = json.loads(handle.sandbox.files.read("/sandbox/output.json"))
- except:
+ except (json.JSONDecodeError, FileNotFoundError, Exception) as e:
output = {"success": False, "error": "Failed to read output.json"}
+ # Consider logging e for debugging🧰 Tools
🪛 Ruff (0.14.10)
69-69: Do not use bare except
(E722)
| basilica.configure( | ||
| api_url=os.environ.get("BASILICA_API_URL", "http://localhost:9080"), | ||
| api_key=os.environ.get("BASILICA_API_TOKEN", ""), | ||
| ) |
There was a problem hiding this comment.
Module-level basilica.configure with empty token may cause confusing failures.
If BASILICA_API_TOKEN is unset, this configures basilica with an empty API key at import time. The main() function checks for the token later (line 148), but tests imported without running main() would silently use an invalid configuration.
🐛 Option: Move configuration inside test functions or guard it
-basilica.configure(
- api_url=os.environ.get("BASILICA_API_URL", "http://localhost:9080"),
- api_key=os.environ.get("BASILICA_API_TOKEN", ""),
-)
+def _configure_basilica():
+ api_key = os.environ.get("BASILICA_API_TOKEN")
+ if not api_key:
+ raise ValueError("BASILICA_API_TOKEN required")
+ basilica.configure(
+ api_url=os.environ.get("BASILICA_API_URL", "http://localhost:9080"),
+ api_key=api_key,
+ )Then call _configure_basilica() at the start of each test or in main().
🤖 Prompt for AI Agents
In @test_basilica_sandbox.py around lines 29 - 32, Module-level
basilica.configure is called with a possibly empty BASILICA_API_TOKEN at import
time, causing silent invalid configuration for tests; instead, create a helper
like _configure_basilica() that reads BASILICA_API_TOKEN and BASILICA_API_URL
and only calls basilica.configure when the token is present (or raise/exit if
required), then remove the top-level basilica.configure call and invoke
_configure_basilica() at the start of main() and at the start of each test that
needs Basilica so imports no longer configure with an empty token.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.