feat(sandbox): Docker sandbox execution with integration tests and examples by buger · Pull Request #337 · probelabs/visor

buger · 2026-02-06T16:17:01Z

Summary

Docker sandbox execution: Run checks inside isolated Docker containers (image, Dockerfile, docker-compose modes)
Sandbox infrastructure: src/sandbox/ with types, DockerImageSandbox, DockerComposeSandbox, CacheVolumeManager, SandboxManager, CheckRunner, EnvFilter
Engine integration: Sandbox lifecycle in executeGroupedChecks, CheckRunner dispatch for sandboxed checks, --run-check CLI mode for child visor inside containers
Config validation: Mode exclusivity, compose requires service, cache path validation, resource limits
8 unit test suites (53 tests) covering all sandbox components
10 Docker integration tests (auto-skip without Docker): image mode, workspace defaults, per-check overrides, env passthrough, read-only mounts, network isolation, inline Dockerfile, --run-check payload round-trip, container reuse, cache volumes
6 example YAML configs: sandbox-basic, sandbox-multi-env, sandbox-env-passthrough, sandbox-read-only, sandbox-dockerfile-inline, sandbox-cache

Test plan

npx jest --testPathPatterns="sandbox-docker" --no-coverage — 10 tests pass with Docker, skip cleanly without
npx jest --testPathPatterns="sandbox-" --no-coverage — all 9 sandbox test suites pass (63 total tests)
npx jest --no-coverage — full suite passes (103/104 suites, 1193 tests)
npm run build succeeds
Example YAMLs are valid (tested via inline config objects in integration tests)

Known limitation

filterEnvForSandbox forwards host PATH into containers, which can break sh on systems where /bin is not in PATH (e.g., Arch Linux). The --run-check integration test uses DockerImageSandbox directly with explicit env to work around this.

🤖 Generated with Claude Code

… caps; forEach-scoped counters; deterministic backoff; tests + RFC - Engine: executeWithRouting for single and forEach items - Inline remediation execution with per-item outputs - Dependency validation then expansion; robust inline dep resolution - RFC updated with goto_js/run_js and on_success semantics - Integration tests for retry+goto, on_success goto, and forEach remediation (robust assertions)

…l superseded runs - Job name: includes matrix.ai_provider.name and event\n- Concurrency: cancel in-progress runs per branch/workflow

…instead of non-existent review-score

…n run

…LI and Action banners

…ach, dynamic *_js) and README usage

…ples references

…amples Add Docker-based sandbox environments for isolated check execution: - src/sandbox/: types, DockerImageSandbox, DockerComposeSandbox, CacheVolumeManager, SandboxManager, CheckRunner, EnvFilter - Engine integration: sandbox lifecycle in executeGroupedChecks, CheckRunner dispatch for sandboxed checks - CLI: --run-check mode for child visor execution inside containers - Config: sandbox validation (mode exclusivity, compose requires service, cache path validation, resource limits) - 8 unit test suites (53 tests) covering all sandbox components - 10 Docker integration tests (auto-skip without Docker): image mode, workspace defaults, per-check overrides, env passthrough, read-only mounts, network isolation, inline Dockerfile, --run-check payload round-trip, container reuse, cache volumes - 6 example YAML configs demonstrating each sandbox capability Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

probelabs · 2026-02-06T16:18:01Z

🔍 Code Analysis Results

ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.

🐛 Debug Information

Provider: google
Model: gemini-2.5-pro-preview-06-05
API Key Source: GOOGLE_API_KEY
Processing Time: 856ms
Timestamp: 2026-02-06T16:17:58.319Z
Prompt Length: 354561 characters
Response Length: 0 characters
JSON Parse Success: ❌

Errors

[overview] ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.
[performance] ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.
[style] ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.

Debug Details

⚠️ Debug information is too large for GitHub comments.
📁 Full debug information saved to artifact: visor-debug-2026-02-06T16-18-00-589Z.md

🔗 Download Link: visor-debug-2315
💡 Go to the GitHub Action run above and download the debug artifact to view complete prompts and responses.

Powered by Visor from Probelabs

Last updated: 2026-02-06T16:18:00.741Z | Triggered by: opened | Commit: 51f3646

probelabs · 2026-02-06T16:18:02Z

🔍 Code Analysis Results

✅ Security Check Passed

No security issues found – changes LGTM.

Performance Issues (1)

Severity	Location	Issue
🟠 Error	`system:0`	ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.

✅ Quality Check Passed

No quality issues found – changes LGTM.

Style Issues (1)

Severity	Location	Issue
🟠 Error	`system:0`	ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.

🐛 Debug Information

Provider: google
Model: gemini-2.5-pro-preview-06-05
API Key Source: GOOGLE_API_KEY
Processing Time: 856ms
Timestamp: 2026-02-06T16:17:58.319Z
Prompt Length: 354561 characters
Response Length: 0 characters
JSON Parse Success: ❌

Errors

[overview] ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.
[performance] ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.
[style] ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors.

Debug Details

⚠️ Debug information is too large for GitHub comments.
📁 Full debug information saved to artifact: visor-debug-2026-02-06T16-18-01-734Z.md

🔗 Download Link: visor-debug-2315
💡 Go to the GitHub Action run above and download the debug artifact to view complete prompts and responses.

Powered by Visor from Probelabs

Last updated: 2026-02-06T16:18:01.897Z | Triggered by: opened | Commit: 51f3646

…ble env allowlist - Add SandboxDefaults type with configurable env_passthrough that replaces hardcoded defaults at workspace level - Add withActiveSpan/addEvent/setSpanError to sandbox lifecycle methods (sandbox-manager, docker-image-sandbox, check-runner) - Add trace-ingester.ts for NDJSON child trace relay via workspace mount - Add sandbox-telemetry.ts wrapper with graceful fallback when telemetry module is unavailable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ngine Merge origin/main into feat/sandbox-integration-tests to bring in: - src/telemetry/ (OTel instrumentation framework) - State machine execution engine refactor - All upstream fixes and features Semantic merge applied to: - src/types/config.ts: added SandboxDefaults, sandbox_defaults to VisorConfig - src/check-execution-engine.ts: compatibility layer (engine moved to state-machine) - src/state-machine-execution-engine.ts: SandboxManager lifecycle integration - src/state-machine/dispatch/sandbox-routing.ts: new sandbox execution routing - src/cli-main.ts: re-added --run-check mode - src/config.ts: re-added sandbox validation logic Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix visorDistPath resolution: detect ncc bundle context via existsSync(join(__dirname, 'index.js')) instead of always using dirname(__dirname), which resolves to project root not dist/ - Base64-encode sandbox payload to avoid double shell-escaping through two sh -c layers (check-runner → docker exec) - Initialize OTel telemetry in --run-check mode when VISOR_TELEMETRY_ENABLED=true, wrapping provider.execute() in withActiveSpan for proper child span emission - Set NODE_PATH=/workspace/node_modules in sandbox env so child visor can resolve externalized @opentelemetry/* packages - Rewrite trace-ingester to create proper OTel child spans with original timing instead of flat events, filtering fallback-ndjson duplicates - Fix prettier formatting in merge files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

probelabs · 2026-02-06T18:12:49Z

PR Overview: Docker Sandbox Execution with Integration Tests and Examples

Summary

This PR introduces Docker sandbox execution capabilities for Visor, enabling checks to run in isolated container environments. The implementation includes comprehensive infrastructure, testing, and example configurations.

Files Changed Analysis

24 files changed with 2,797 additions and 1 deletion:

Core Infrastructure (New `src/sandbox/` module)

types.ts - Type definitions for sandbox configuration, execution, and payload/result serialization
docker-image-sandbox.ts (216 lines) - Docker image-based sandbox with support for pre-built images, Dockerfiles, and inline Dockerfiles
docker-compose-sandbox.ts (122 lines) - Docker Compose-based sandbox for multi-service environments
sandbox-manager.ts (172 lines) - Lifecycle management, lazy container startup, and routing
check-runner.ts (240 lines) - Serializes check execution into sandbox containers via child visor processes
cache-volume-manager.ts (230 lines) - Docker named volume management with TTL-based eviction
env-filter.ts (61 lines) - Environment variable filtering for sandbox forwarding
sandbox-telemetry.ts (57 lines) - Telemetry helpers with graceful fallback
index.ts (20 lines) - Module exports

Engine Integration

src/cli-main.ts (+188 lines) - New --run-check CLI mode for child visor execution inside containers
src/config.ts (+124 lines) - Sandbox configuration validation (mode exclusivity, compose requirements, cache paths, resource limits)

Configuration & Examples

examples/sandbox-*.yaml (6 files) - Example configurations demonstrating:
- Basic image mode
- Multi-environment setups
- Environment passthrough
- Read-only mounts with network isolation
- Inline Dockerfiles
- Cache volume management

Documentation

DEPLOYMENT.md (118 lines) - Cloudflare Pages deployment guide
GITHUB_CHECKS.md (195 lines) - GitHub Checks API integration documentation
defaults/.visor.yaml (420 lines) - Updated default configuration

CI/CD

.github/workflows/test.yml (75 lines) - GitHub Actions test workflow

Developer Tools

demo-suppression*.js (2 files) - Suppression feature demos
minimal-probe-test.{js,ts} (2 files) - ProbeAgent testing utilities

Architecture & Impact Assessment

What This PR Accomplishes

Docker Sandbox Execution - Run checks in isolated Docker containers with:
- Pre-built image mode
- Dockerfile mode (file or inline)
- Docker Compose mode for multi-service environments
- Cache volume management with TTL eviction
- Environment variable filtering
- Resource limits (CPU, memory)
- Network isolation and read-only mounts
Comprehensive Testing - 8 unit test suites (53 tests) + 10 Docker integration tests
Engine Integration - Seamless integration with existing check execution via --run-check mode

Key Technical Changes

Sandbox Infrastructure:

SandboxInstance interface with start(), exec(), and stop() methods
DockerImageSandbox for single-container scenarios
DockerComposeSandbox for multi-service setups
CacheVolumeManager for efficient volume reuse and cleanup
CheckRunner for payload serialization and child process execution

Execution Flow:

graph TD
    A[Check Execution Request] --> B{Sandbox Configured?}
    B -->|No| C[Run on Host]
    B -->|Yes| D[SandboxManager.resolveSandbox]
    D --> E[SandboxManager.getOrStart]
    E --> F{Sandbox Type}
    F -->|Image| G[DockerImageSandbox]
    F -->|Compose| H[DockerComposeSandbox]
    G --> I[CheckRunner.runCheck]
    H --> I
    I --> J[Build CheckRunPayload]
    J --> K[Filter Environment]
    K --> L[Exec: node index.js --run-check]
    L --> M[Parse CheckRunResult]
    M --> N[Return ReviewSummary]

Configuration Validation:

Exactly one mode required (image, dockerfile, dockerfile_inline, or compose)
Compose mode requires service field
Cache paths must be absolute
Resource limits validated (CPU must be positive number)

Affected System Components

graph LR
    A[Config Manager] -->|Validates| B[Sandbox Configs]
    C[Execution Engine] -->|Resolves| D[SandboxManager]
    D -->|Creates| E[DockerImageSandbox]
    D -->|Creates| F[DockerComposeSandbox]
    E -->|Mounts| G[CacheVolumeManager]
    F -->|Uses| H[CheckRunner]
    H -->|Executes| I[Child Visor Process]
    I -->|Returns| J[CheckRunResult]

Scope Discovery & Context Expansion

Immediate Impact:

Core engine: Check execution routing and sandbox lifecycle
Configuration: New sandboxes section with validation
CLI: --run-check mode for container execution
Testing: New integration test framework with Docker

Related Modules to Review:

src/state-machine/dispatch/execution-invoker.ts - Check orchestration integration
src/state-machine/dispatch/sandbox-routing.ts - Sandbox vs host routing logic
src/providers/ - Provider compatibility with sandbox execution
tests/integration/sandbox-docker-integration.test.ts - Integration test coverage

Configuration Changes:

sandboxes:
  node-env:
    image: "node:20-alpine"
    cache:
      paths: ["/root/.npm"]
      ttl: "7d"
    env_passthrough: ["CI", "GITHUB_*"]

sandbox: node-env  # Default for all checks

checks:
  my-check:
    type: command
    sandbox: alpine-env  # Override default

Testing Coverage:

Unit tests: All sandbox components with mocked Docker exec
Integration tests: Real Docker scenarios (auto-skip without Docker)
Test scenarios: Image mode, workspace defaults, per-check overrides, env passthrough, read-only mounts, network isolation, inline Dockerfiles, container reuse, cache volumes

Known Limitations:

filterEnvForSandbox forwards host PATH which can break sh on systems where /bin is not in PATH (e.g., Arch Linux)
Integration test uses DockerImageSandbox directly with explicit env as workaround

Metadata

Review Effort: 4 / 5
Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-02-07T07:19:33.374Z | Triggered by: pr_updated | Commit: d91bbd1

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs · 2026-02-06T18:16:42Z

Security Issues (22)

Severity	Location	Issue
🔴 Critical	`src/sandbox/cache-volume-manager.ts:105`	Docker volume name is interpolated directly into shell command without proper escaping. An attacker who can control the volume name (via cache prefix or sandbox name) could inject arbitrary commands. 💡 Suggestion Use proper shell argument escaping or pass volume names as separate arguments to docker commands. Consider using a whitelist for allowed characters in volume names. 🔧 Suggested Fix await execAsync(`docker volume inspect ${shellescape(name)}`, {
🔴 Critical	`src/sandbox/cache-volume-manager.ts:120`	Docker volume name and label value are interpolated into shell command without proper escaping. The timestamp value could contain malicious characters. 💡 Suggestion Escape the volume name and label value properly, or use argument-based docker CLI invocation. 🔧 Suggested Fix await execAsync(`docker volume create --label visor.last-used=${shellescape(now)} ${shellescape(name)}`, {
🔴 Critical	`src/sandbox/cache-volume-manager.ts:132`	Source and target volume names are interpolated into docker run command without proper escaping. An attacker controlling cache configuration could inject arbitrary commands. 💡 Suggestion Properly escape volume names or use argument-based invocation. 🔧 Suggested Fix await execAsync(`docker run --rm -v ${shellescape(source)}:/src:ro -v ${shellescape(target)}:/dst alpine sh -c "cp -a /src/. /dst/"`,
🔴 Critical	`src/sandbox/cache-volume-manager.ts:163`	Docker volume filter pattern is interpolated into shell command without proper escaping. The sandboxName parameter could contain malicious characters. 💡 Suggestion Escape the sandbox name before interpolating into the command. 🔧 Suggested Fix `docker volume ls --filter "name=visor-cache-" --format "{{.Name}}"`
🔴 Critical	`src/sandbox/cache-volume-manager.ts:189`	Volume name is interpolated into docker volume inspect command without proper escaping. 💡 Suggestion Escape the volume name before use in shell command. 🔧 Suggested Fix `docker volume inspect ${shellescape(vol)} --format "{{.CreatedAt}}"`
🔴 Critical	`src/sandbox/cache-volume-manager.ts:196`	Volume name is interpolated into docker volume rm command without proper escaping. 💡 Suggestion Escape the volume name before use in shell command. 🔧 Suggested Fix await execAsync(`docker volume rm ${shellescape(vol)}`, {
🔴 Critical	`src/sandbox/cache-volume-manager.ts:216`	Volume name is interpolated into docker volume rm command without proper escaping. 💡 Suggestion Escape the volume name before use in shell command. 🔧 Suggested Fix await execAsync(`docker volume rm ${shellescape(vol)}`, {
🔴 Critical	`src/sandbox/docker-compose-sandbox.ts:41`	Compose file path and project name are interpolated into shell command without proper escaping. An attacker controlling sandbox configuration could inject arbitrary commands. 💡 Suggestion Escape the compose file path and project name, or use argument-based invocation. 🔧 Suggested Fix await execAsync(`docker compose -f ${shellescape(this.config.compose)} -p ${shellescape(this.projectName)} up -d`, {
🔴 Critical	`src/sandbox/docker-compose-sandbox.ts:91`	The entire command string is built by joining arguments with spaces and simple quotes. Environment variable values could contain single quotes or newlines, breaking out of the quoting and enabling command injection. 💡 Suggestion Use proper shell escaping for all arguments, especially environment variable values. Consider using a library like shell-escape or shlex. 🔧 Suggested Fix `const cmd = args.map(a => shellescape(a)).join(' ');`
🔴 Critical	`src/sandbox/docker-compose-sandbox.ts:112`	Compose file path and project name are interpolated into shell command without proper escaping. 💡 Suggestion Escape the compose file path and project name. 🔧 Suggested Fix await execAsync(`docker compose -f ${shellescape(this.config.compose)} -p ${shellescape(this.projectName)} down`, {
🔴 Critical	`src/sandbox/docker-image-sandbox.ts:88`	Dockerfile path and repo path are interpolated into shell command without proper escaping. An attacker controlling sandbox configuration could inject arbitrary commands. 💡 Suggestion Escape the dockerfile path and repo path before interpolating into the command. 🔧 Suggested Fix await execAsync(`docker build -t ${shellescape(imageName)} -f ${shellescape(dockerfilePath)} ${shellescape(this.repoPath)}`, {
🔴 Critical	`src/sandbox/docker-image-sandbox.ts:108`	Repo path is interpolated into docker build command without proper escaping. 💡 Suggestion Escape the dockerfile path and repo path. 🔧 Suggested Fix await execAsync(`docker build -t ${shellescape(imageName)} -f ${shellescape(this.config.dockerfile)} ${shellescape(this.repoPath)}`,
🔴 Critical	`src/sandbox/docker-image-sandbox.ts:143`	The docker run command is built by joining arguments with spaces and simple quotes. Path arguments (repoPath, visorDistPath) and resource values could contain spaces or special characters, enabling command injection. 💡 Suggestion Use proper shell escaping for all arguments. The current quoting only handles spaces, not quotes, backticks, or other shell metacharacters. 🔧 Suggested Fix `const cmd = args.map(a => shellescape(a)).join(' ');`
🔴 Critical	`src/sandbox/docker-image-sandbox.ts:182`	The docker exec command is built with simple single-quote escaping. Environment variable values could contain single quotes, enabling command injection. 💡 Suggestion Use proper shell escaping that handles all shell metacharacters, not just single quotes. 🔧 Suggested Fix `const cmd = args.map(a => shellescape(a)).join(' ');`
🔴 Critical	`src/sandbox/docker-image-sandbox.ts:203`	Container name is interpolated into docker rm command without proper escaping. 💡 Suggestion Escape the container name before use in shell command. 🔧 Suggested Fix await execAsync(`docker rm -f ${shellescape(this.containerName)}`, {
🟠 Error	`src/sandbox/check-runner.ts:135`	The base64-encoded payload and visor path are interpolated into a shell command without proper escaping. While base64 is safe, the visorPath could contain malicious characters. 💡 Suggestion Escape the visorPath before interpolating into the command. 🔧 Suggested Fix const command = `echo ${b64Payload} \| base64 -d \| node ${shellescape(visorPath)}/index.js --run-check -`;
🟠 Error	`src/sandbox/docker-image-sandbox.ts:70`	Inline Dockerfile content is written to a temp file and executed. If an attacker can control the dockerfile_inline configuration, they can execute arbitrary code on the host during docker build. 💡 Suggestion Validate and sanitize inline Dockerfile content. Restrict allowed Dockerfile instructions or use a whitelist approach. Consider disallowing inline Dockerfiles in untrusted configurations. 🔧 Suggested Fix `// Add validation before writing Dockerfile if (this.config.dockerfile_inline) { const sanitized = this.validateDockerfile(this.config.dockerfile_inline); writeFileSync(dockerfilePath, sanitized, 'utf8'); }`
🟡 Warning	`src/sandbox/env-filter.ts:8`	PATH is forwarded by default into sandbox containers. This could expose host system information and potentially be used for path manipulation attacks. 💡 Suggestion Consider making PATH forwarding opt-in rather than default, or sanitize PATH to only include essential directories. 🔧 Suggested Fix `export const BUILTIN_PASSTHROUGH = ['HOME', 'USER', 'CI', 'NODE_ENV', 'LANG'];`
🟡 Warning	`src/sandbox/check-runner.ts:106`	Trace file path is constructed using sandboxManager.getRepoPath() and a random UUID, but the repoPath comes from user configuration and could potentially point outside the intended workspace. 💡 Suggestion Validate that repoPath is within the expected workspace directory before writing files. 🔧 Suggested Fix `const resolvedRepoPath = resolve(sandboxManager.getRepoPath()); // Add validation to ensure resolvedRepoPath is within allowed workspace hostTracePath = join(resolvedRepoPath, traceFileName);`
🟡 Warning	`src/sandbox/docker-image-sandbox.ts:115`	The repoPath and visorDistPath are directly interpolated into docker volume mount specifications. If these paths contain special characters or attempt path traversal, they could mount unintended directories. 💡 Suggestion Validate that repoPath and visorDistPath are absolute paths and don't contain path traversal sequences before using them in mount specs. 🔧 Suggested Fix `// Validate paths are absolute and don't contain traversal if (!this.repoPath.startsWith('/') \|\| this.repoPath.includes('..')) { throw new Error('Invalid repoPath'); }`
🟡 Warning	`src/sandbox/cache-volume-manager.ts:64`	Container paths from cache configuration are used directly in volume mount specs without validation. Malicious paths could cause unexpected behavior. 💡 Suggestion Validate that container paths are absolute paths and don't contain shell metacharacters before using them. 🔧 Suggested Fix if (!containerPath.startsWith('/') \|\| /[^a-zA-Z0-9/_-]/.test(containerPath)) { throw new Error(`Invalid cache path: ${containerPath}`); }
🟡 Warning	`src/sandbox/check-runner.ts:122`	The entire check configuration including potentially sensitive data (API keys, tokens) is serialized to JSON and passed as a command-line argument. This could expose secrets in process listings. 💡 Suggestion Consider passing sensitive data via a temporary file or environment variables instead of command-line arguments. Use base64 encoding as done, but also ensure the payload is not logged. 🔧 Suggested Fix `// Add: Ensure payload is not logged in debug output // Consider: Use a temporary file for very large payloads or those with secrets`

Architecture Issues (14)

Severity	Location	Issue
🟠 Error	`src/sandbox/check-runner.ts:165`	CheckRunner searches stdout for JSON by scanning backwards from the last line looking for lines starting with '{'. This is fragile and will break if any debug output contains JSON-like objects. 💡 Suggestion Use a more robust protocol: write result to a dedicated file descriptor, use a delimiter marker, or prefix output with a known header. Consider NDJSON with a specific result record type.
🟠 Error	`src/sandbox/check-runner.ts:51`	CheckRunner.runCheck is a static method with 8 parameters, making it difficult to extend or mock. The method mixes payload building, env filtering, trace setup, command execution, and result parsing. 💡 Suggestion Refactor into a class with instance methods. Extract payload building, env filtering, and result parsing into separate, testable components. Use a builder pattern for the complex parameter set.
🟢 Info	`src/sandbox/check-runner.ts:119`	CheckRunner sets NODE_PATH to allow child visor to resolve externalized npm packages from workspace node_modules. This is a workaround for ncc bundling limitations. 💡 Suggestion Document this limitation clearly. Consider alternative approaches: bundle necessary dependencies, use a different bundler, or make the workspace node_modules available via a dedicated mount.
🟡 Warning	`src/sandbox/sandbox-manager.ts:15`	SandboxManager directly constructs concrete sandbox implementations (DockerImageSandbox, DockerComposeSandbox) instead of using a factory pattern or dependency injection. This creates tight coupling and makes it difficult to extend with new sandbox types (e.g., podman, Kubernetes) without modifying the manager. 💡 Suggestion Introduce a SandboxFactory interface and registry pattern. Allow sandbox types to register themselves, making the system extensible without modifying core logic.
🟡 Warning	`src/sandbox/check-runner.ts:51`	CheckRunner contains special-case logic for trace file relay that only applies to non-read-only sandboxes. This creates a conditional path that could be eliminated through better design. 💡 Suggestion Consider using a strategy pattern for trace collection with separate implementations for read-only vs writable sandboxes, or make trace relay a configurable capability of the sandbox interface.
🟡 Warning	`src/sandbox/cache-volume-manager.ts:45`	CacheVolumeManager implements complex volume naming, fallback copying, TTL-based eviction, and scope limiting. This appears over-engineered for the initial use case of simple cache persistence. 💡 Suggestion Start with simpler volume management (just create and mount). Add fallback and eviction features only when proven necessary through real-world usage patterns.
🟡 Warning	`src/sandbox/docker-image-sandbox.ts:20`	DockerImageSandbox handles both container lifecycle (start/stop) and image building (buildImageIfNeeded). This mixes two separate concerns - container management vs image building. 💡 Suggestion Extract image building into a separate ImageBuilder service. The sandbox should only handle container lifecycle, delegating image construction to the builder.
🟡 Warning	`src/sandbox/docker-compose-sandbox.ts:16`	DockerComposeSandbox duplicates command-building logic from DockerImageSandbox (args array construction, shell quoting). This violates DRY and creates maintenance burden. 💡 Suggestion Extract common Docker command building logic into a shared utility. Both sandbox implementations should use the same command construction helper.
🟡 Warning	`src/sandbox/env-filter.ts:28`	EnvFilter implements custom glob pattern matching with only trailing wildcard support. This is a special case implementation that doesn't use standard glob libraries. 💡 Suggestion Use a proper glob/minimatch library for pattern matching. If custom behavior is needed, document why standard glob patterns are insufficient.
🟡 Warning	`src/sandbox/sandbox-manager.ts:58`	SandboxManager.resolveSandbox returns null for host execution, creating an implicit convention that callers must check for null. This is error-prone and not clearly documented. 💡 Suggestion Return a special HostSandbox instance or use a Result type that makes the host vs sandbox decision explicit and type-safe.
🟡 Warning	`src/sandbox/cache-volume-manager.ts:145`	CacheVolumeManager.touchVolume is documented as a no-op because Docker labels are immutable, but the method still exists and is called. This creates dead code that confuses readers. 💡 Suggestion Remove the touchVolume method entirely if it's truly a no-op. If last-used tracking is needed, implement it using a separate mechanism (e.g., a metadata file in the volume).
🟡 Warning	`src/sandbox/docker-image-sandbox.ts:95`	DockerImageSandbox.buildImageIfNeeded creates temp files for inline Dockerfiles but cleanup is in a finally block that could fail silently. Temp files may leak on errors. 💡 Suggestion Use a try-with-resources pattern or RAII wrapper for temp file lifecycle. Consider using a temp directory that gets cleaned up as a whole rather than individual files.
🟡 Warning	`src/sandbox/sandbox-manager.ts:145`	SandboxManager.stopAll swallows errors when stopping sandboxes or evicting cache, only logging warnings. This may hide serious problems (e.g., containers that failed to stop). 💡 Suggestion Collect errors during cleanup and return them or throw an aggregate exception. Allow callers to decide if cleanup failures should be fatal.
🟡 Warning	`src/sandbox/cache-volume-manager.ts:189`	CacheVolumeManager.evictExpired uses regex to parse volume names and extract prefix/sandbox/hash. This parsing is fragile and will break if volume naming convention changes. 💡 Suggestion Store metadata as Docker labels on volumes (e.g., visor-sandbox, visor-prefix, visor-path). Use docker inspect to read labels instead of parsing names.

Performance Issues (6)

Severity Location Issue

🟠 Error

src/sandbox/cache-volume-manager.ts:180-193

N+1 query problem in cache volume eviction. The code sequentially inspects each volume with individual `docker volume inspect` commands in a for loop. For large numbers of cache volumes, this creates O(n) sequential Docker CLI calls, each with a 10-second timeout. This can cause significant delays during cleanup.

💡 Suggestion

Use `docker volume inspect` with multiple volume names in a single command to batch inspect all volumes at once, then parse the JSON output. This reduces O(n) sequential calls to O(1) batched call.

🔧 Suggested Fix

Replace the sequential for loop with a single batched inspect command:
// Batch inspect all volumes at once
const volumesToInspect = sandboxVolumes.join(&#39; &#39;);
const { stdout: inspectOut } = await execAsync(
  `docker volume inspect ${volumesToInspect} --format &#39;{{.Name}}:{{.CreatedAt}}&#39;`,
  { maxBuffer: EXEC_MAX_BUFFER, timeout: 10000 * sandboxVolumes.length }
);

// Parse results and evict in parallel
const evictionPromises = [];
for (const line of inspectOut.trim().split(&#39;\n&#39;)) {
  const [volName, createdAtStr] = line.split(&#39;:&#39;);
  const createdAt = new Date(createdAtStr).getTime();
  if (now - createdAt &gt; ttlMs) {
    evictionPromises.push(execAsync(`docker volume rm ${volName}`, { maxBuffer: EXEC_MAX_BUFFER, timeout: 10000 }));
  }
}
await Promise.allSettled(evictionPromises);
```</code></pre>
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>src/sandbox/docker-image-sandbox.ts:157-195</code></td>
      <td><div>Synchronous file operations in async context. The `exec` method uses `execAsync` but the command construction involves synchronous string operations. While not critical, the pattern of building complex shell commands with string concatenation is error-prone and less efficient than using argument arrays directly with spawn/exec.        <details><summary>💡 <strong>Suggestion</strong></summary>Consider using the `args` array format directly with `spawn` or `execFile` instead of building shell command strings. This avoids shell parsing overhead and reduces security risks from shell injection.
</details>
        <details><summary>🔧 <strong>Suggested Fix</strong></summary><pre><code>Use direct argument passing:

```typescript
import { spawn } from &#39;child_process&#39;;

// In exec method:
return new Promise((resolve, reject) =&gt; {
  const args = [&#39;exec&#39;];
  
  for (const [key, value] of Object.entries(options.env)) {
    args.push(&#39;-e&#39;, `${key}=${value}`);
  }
  
  args.push(this.containerId, &#39;sh&#39;, &#39;-c&#39;, options.command);
  
  const proc = spawn(&#39;docker&#39;, args, {
    timeout: options.timeoutMs || 600000,
    maxBuffer: options.maxBuffer || EXEC_MAX_BUFFER,
  });
  
  let stdout = &#39;&#39;;
  let stderr = &#39;&#39;;
  
  proc.stdout?.on(&#39;data&#39;, (data) =&gt; stdout += data);
  proc.stderr?.on(&#39;data&#39;, (data) =&gt; stderr += data);
  
  proc.on(&#39;close&#39;, (code) =&gt; {
    resolve({ stdout, stderr, exitCode: code || 0 });
  });
  
  proc.on(&#39;error&#39;, (err) =&gt; {
    resolve({ stdout: &#39;&#39;, stderr: err.message, exitCode: 1 });
  });
});
```</code></pre>
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>src/sandbox/docker-compose-sandbox.ts:52-104</code></td>
      <td><div>Same issue as docker-image-sandbox.ts - building shell command strings instead of using direct argument arrays. This adds shell parsing overhead and potential security issues.        <details><summary>💡 <strong>Suggestion</strong></summary>Refactor to use direct argument arrays with spawn/execFile instead of building shell command strings.
</details>
        <details><summary>🔧 <strong>Suggested Fix</strong></summary><pre><code>See suggestion for docker-image-sandbox.ts - use spawn with argument arrays instead of shell command construction.</code></pre>
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>src/sandbox/check-runner.ts:73-227</code></td>
      <td><div>Large buffer allocation (50MB) for every sandboxed check execution. While this prevents truncation issues, it allocates significant memory for each check. For checks with many parallel executions, this could cause memory pressure.        <details><summary>💡 <strong>Suggestion</strong></summary>Consider making the maxBuffer configurable per-check or using streaming for very large outputs. Most checks won&#39;t need 50MB of output.
</details>
        <details><summary>🔧 <strong>Suggested Fix</strong></summary><pre><code>Add configurable buffer size:

```typescript
const result = await sandboxManager.exec(sandboxName, {
  command,
  env,
  timeoutMs: timeoutMs || 600000,
  maxBuffer: checkConfig.maxBuffer || 10 * 1024 * 1024, // Default 10MB, configurable
});
```</code></pre>
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/check-runner.ts:86-88</code></td>
      <td><div>Inefficient serialization of PRInfo for every sandboxed check. The `serializePRInfo` function creates a deep copy of the entire PRInfo object including all file patches, which can be very large (megabytes) for substantial PRs. This data is then JSON.stringify&#39;d and base64-encoded, creating memory pressure and slow IPC.        <details><summary>💡 <strong>Suggestion</strong></summary>Consider selective serialization - only serialize fields that are actually needed by the child process. For large patches, consider passing them via a shared file or named pipe instead of embedding in the JSON payload. Alternatively, implement field-level filtering to exclude large fields like `fullDiff` or `patch` when not needed.
</details>
        <details><summary>🔧 <strong>Suggested Fix</strong></summary><pre><code>Add selective serialization option:

```typescript
function serializePRInfo(prInfo: PRInfo, options?: { includePatches?: boolean }): SerializedPRInfo {
  const includePatches = options?.includePatches !== false;
  return {
    // ... other fields ...
    files: (prInfo.files || []).map(f =&gt; ({
      filename: f.filename,
      status: f.status,
      additions: f.additions,
      deletions: f.deletions,
      changes: f.changes,
      patch: includePatches ? f.patch : undefined, // Optional
    })),
    fullDiff: includePatches ? prInfo.fullDiff : undefined,
    commitDiff: includePatches ? prInfo.commitDiff : undefined,
  };
}
```</code></pre>
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/cache-volume-manager.ts:56-98</code></td>
      <td><div>Sequential async operations in `resolveVolumes` method. Each cache volume is checked and created sequentially in a for loop instead of in parallel. For sandboxes with multiple cache paths, this delays container startup unnecessarily.        <details><summary>💡 <strong>Suggestion</strong></summary>Use `Promise.all` or `Promise.allSettled` to check volume existence and create missing volumes in parallel. Only the fallback copy operation needs to be sequential.
</details>
        <details><summary>🔧 <strong>Suggested Fix</strong></summary><pre><code>Replace the for loop with parallel operations:

```typescript
const volumePromises = cacheConfig.paths.map(async (containerPath) =&gt; {
  const hash = pathHash(containerPath);
  const volumeName = `visor-cache-${prefix}-${sandboxName}-${hash}`;
  
  const exists = await this.volumeExists(volumeName);
  
  if (!exists &amp;&amp; cacheConfig.fallback_prefix) {
    const fallbackPrefix = cacheConfig.fallback_prefix.replace(/[^a-zA-Z0-9._-]/g, &#39;-&#39;);
    const fallbackVolume = `visor-cache-${fallbackPrefix}-${sandboxName}-${hash}`;
    const fallbackExists = await this.volumeExists(fallbackVolume);
    
    if (fallbackExists) {
      logger.info(`Cache miss for &#39;${volumeName}&#39;, copying from fallback &#39;${fallbackVolume}&#39;`);
      await this.copyVolume(fallbackVolume, volumeName);
    } else {
      await this.createVolume(volumeName);
    }
  } else if (!exists) {
    await this.createVolume(volumeName);
  }
  
  await this.touchVolume(volumeName);
  
  return {
    volumeName,
    mountSpec: `${volumeName}:${containerPath}`,
  };
});

const volumes = await Promise.all(volumePromises);
return volumes;
```</code></pre>
</details>
</div></td>
    </tr>
  </tbody>
</table>
<!-- visor:section-end id="performance" -->

<!-- visor:section={"id":"quality","revision":5} -->
### Quality Issues (28)
<table>
  <thead>
    <tr>
      <th>Severity</th>
      <th>Location</th>
      <th>Issue</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>🟠 Error</td>
      <td><code>src/sandbox/cache-volume-manager.ts:145</code></td>
      <td><div>Cache volume copy failures are silently caught and only logged as warnings. This can lead to cache inconsistencies where a volume is marked as existing but is empty or partially populated.        <details><summary>💡 <strong>Suggestion</strong></summary>Propagate the error or return a status indicating whether the copy succeeded, so the caller can decide whether to proceed with the potentially corrupted cache volume.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟠 Error</td>
      <td><code>src/sandbox/docker-compose-sandbox.ts:95</code></td>
      <td><div>The exec command is constructed by manually joining and quoting arguments. While quoting is applied, the logic is complex and error-prone. A bug in the quoting logic could lead to shell injection.        <details><summary>💡 <strong>Suggestion</strong></summary>Use a proper shell-escaping library or the &#39;shell-quote&#39; npm package to safely escape command arguments, or pass arguments as an array to avoid shell parsing entirely.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟠 Error</td>
      <td><code>src/sandbox/docker-image-sandbox.ts:145</code></td>
      <td><div>The exec command is constructed by manually joining and quoting arguments, similar to the compose sandbox. This complex quoting logic is fragile and could lead to shell injection vulnerabilities.        <details><summary>💡 <strong>Suggestion</strong></summary>Use a proper shell-escaping library or the &#39;shell-quote&#39; npm package to safely escape command arguments, or pass arguments as an array to avoid shell parsing entirely.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>src/sandbox/env-filter.ts:14</code></td>
      <td><div>The matchesPattern function only supports trailing wildcards (e.g., &#39;GITHUB_*&#39;). It does not support leading wildcards (&#39;*_TOKEN&#39;) or wildcards in the middle of patterns, which limits filtering flexibility.        <details><summary>💡 <strong>Suggestion</strong></summary>Document this limitation clearly or implement full glob pattern matching using a library like &#39;minimatch&#39; if more complex patterns are needed.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>tests/unit/sandbox-check-runner.test.ts:35</code></td>
      <td><div>Test uses magic number 42 for PR number without semantic meaning. While this is a common convention, it would be clearer to use a named constant like TEST_PR_NUMBER.        <details><summary>💡 <strong>Suggestion</strong></summary>Use a named constant at the top of the test file: const TEST_PR_NUMBER = 42; to make the test intent clearer.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>tests/unit/sandbox-check-runner.test.ts:36</code></td>
      <td><div>Test uses magic numbers 10, 5, 15 for file additions/deletions/changes without clear semantic meaning. These should be named constants or derived from test data.        <details><summary>💡 <strong>Suggestion</strong></summary>Define constants like TEST_ADDITIONS = 10, TEST_DELETIONS = 5, TEST_CHANGES = 15 to make test assertions more readable.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>tests/integration/sandbox-docker-integration.test.ts:95</code></td>
      <td><div>Test uses magic timeout value 60000 without explanation. This should be a named constant like DEFAULT_TEST_TIMEOUT.        <details><summary>💡 <strong>Suggestion</strong></summary>Define const DEFAULT_TEST_TIMEOUT = 60000; at the top of the test file to make timeout values consistent and meaningful.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>tests/integration/sandbox-docker-integration.test.ts:119</code></td>
      <td><div>Test uses magic timeout value 30000 without explanation. This should be a named constant for consistency.        <details><summary>💡 <strong>Suggestion</strong></summary>Use a named constant for timeout values throughout the test file to improve maintainability.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>tests/unit/sandbox-cache-volume.test.ts:28</code></td>
      <td><div>Test uses magic string &#39;main&#39; for git branch without semantic meaning. Should use a named constant like TEST_BRANCH.        <details><summary>💡 <strong>Suggestion</strong></summary>Define const TEST_BRANCH = &#39;main&#39;; to make test intent clearer and avoid hardcoded strings.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟢 Info</td>
      <td><code>tests/unit/sandbox-env-filter.test.ts:12</code></td>
      <td><div>Test uses hardcoded environment variable names and values throughout. Should use named constants for test data.        <details><summary>💡 <strong>Suggestion</strong></summary>Define test fixtures for common environment variables at the top of the test file to improve readability and maintainability.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/cache-volume-manager.ts:156</code></td>
      <td><div>The touchVolume method is documented as a no-op but is called in the critical path of resolveVolumes. This means last-used tracking is not actually implemented, making TTL-based eviction less accurate.        <details><summary>💡 <strong>Suggestion</strong></summary>Either implement proper last-used tracking using a separate metadata file or remove the touchVolume call and document that eviction relies solely on volume creation time.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/cache-volume-manager.ts:189</code></td>
      <td><div>Docker volume listing failures are silently caught, skipping all eviction. This can lead to unbounded growth of cache volumes if Docker commands fail intermittently.        <details><summary>💡 <strong>Suggestion</strong></summary>Log the error with severity and consider implementing a retry mechanism or exponential backoff for transient Docker failures.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/check-runner.ts:119</code></td>
      <td><div>Trace file creation failures are silently caught and result in disabling trace relay. This makes debugging difficult when trace files fail to create due to permissions or disk space issues.        <details><summary>💡 <strong>Suggestion</strong></summary>Log a warning when trace file creation fails so users understand why telemetry is missing from sandboxed checks.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/check-runner.ts:145</code></td>
      <td><div>Child trace ingestion failures are silently caught in nested try-catch blocks. This can mask issues with trace file parsing or OTel integration.        <details><summary>💡 <strong>Suggestion</strong></summary>Log debug or warning messages when trace ingestion fails, including the error message, to aid in troubleshooting telemetry issues.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/check-runner.ts:167</code></td>
      <td><div>When sandbox execution fails with no JSON output, only the first 500 characters of stderr are included in the error message. For verbose command output, this may truncate critical diagnostic information.        <details><summary>💡 <strong>Suggestion</strong></summary>Increase the truncation limit or include a note indicating that stderr was truncated, so users know to check logs for the full output.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/docker-image-sandbox.ts:88</code></td>
      <td><div>When building from inline Dockerfile, the temp file is deleted in a finally block, but if mkdtempSync succeeds and writeFileSync fails, the temp directory is never cleaned up.        <details><summary>💡 <strong>Suggestion</strong></summary>Wrap the entire temp directory lifecycle in a try-finally block to ensure cleanup even if intermediate steps fail.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/sandbox-manager.ts:165</code></td>
      <td><div>Cache eviction failures during stopAll are silently caught. While non-fatal, repeated failures can lead to disk space exhaustion from accumulated cache volumes.        <details><summary>💡 <strong>Suggestion</strong></summary>Log a warning when cache eviction fails, including the error message, so operators are aware of growing disk usage.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/env-filter.ts:8</code></td>
      <td><div>BUILTIN_PASSTHROUGH includes PATH, which the PR description notes can break sh on systems where /bin is not in PATH (e.g., Arch Linux). This is a known limitation.        <details><summary>💡 <strong>Suggestion</strong></summary>Consider making PATH passthrough configurable or detecting whether /bin is in PATH before forwarding it. Alternatively, document this limitation prominently.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/cli-main.ts:42</code></td>
      <td><div>When JSON parsing fails in runCheckMode, the error message is &#39;Invalid JSON payload&#39; which doesn&#39;t help users understand what was wrong with the payload.        <details><summary>💡 <strong>Suggestion</strong></summary>Include the parsing error message and a preview of the invalid JSON to help users diagnose payload formatting issues.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/cli-main.ts:48</code></td>
      <td><div>When required fields are missing from the payload, the error message doesn&#39;t specify which fields are missing or what the expected structure is.        <details><summary>💡 <strong>Suggestion</strong></summary>Include the specific missing field names and a reference to the expected payload structure in the error message.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/config.ts:897</code></td>
      <td><div>The cache path validation only checks if paths start with &#39;/&#39;, but doesn&#39;t validate that paths are well-formed (e.g., no &#39;..&#39; segments, no null bytes) which could lead to security issues or unexpected behavior.        <details><summary>💡 <strong>Suggestion</strong></summary>Add additional validation to reject paths containing &#39;..&#39;, null bytes, or other suspicious patterns, and normalize paths before validation.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>tests/integration/sandbox-docker-integration.test.ts:1</code></td>
      <td><div>Integration tests lack negative test cases for error scenarios: what happens when Docker daemon is not running, when image pull fails, when container crashes during exec, or when disk space is exhausted?        <details><summary>💡 <strong>Suggestion</strong></summary>Add tests that verify error handling paths: simulate Docker failures, invalid images, resource exhaustion scenarios, and ensure proper error messages are returned.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>tests/unit/sandbox-check-runner.test.ts:1</code></td>
      <td><div>Unit tests for CheckRunner don&#39;t test edge cases: what happens when stdout contains multiple JSON objects, when JSON is malformed but starts with &#39;{&#39;, when timeout is exceeded, or when dependencyOutputs contain circular references?        <details><summary>💡 <strong>Suggestion</strong></summary>Add tests for malformed JSON output, timeout scenarios, circular reference handling in dependencyOutputs, and stdout with mixed JSON/non-JSON content.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>tests/unit/sandbox-cache-volume.test.ts:1</code></td>
      <td><div>Cache volume manager tests don&#39;t cover failure scenarios: what happens when Docker commands fail intermittently, when volume creation races with concurrent processes, or when TTL parsing fails?        <details><summary>💡 <strong>Suggestion</strong></summary>Add tests for Docker command failures, concurrent volume creation, invalid TTL formats, and eviction logic with edge cases (empty volumes, very old volumes, etc.).
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>tests/unit/sandbox-env-filter.test.ts:1</code></td>
      <td><div>Env filter tests don&#39;t cover edge cases: empty patterns, patterns with only wildcards, special characters in env var names, or very long env var values.        <details><summary>💡 <strong>Suggestion</strong></summary>Add tests for edge cases like pattern &#39;*&#39;, patterns with special regex characters, env vars with newlines or null bytes, and extremely long values.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/check-runner.ts:1</code></td>
      <td><div>CheckRunner.runCheck doesn&#39;t validate that prInfo.files array doesn&#39;t contain circular references before JSON.stringify, which could cause infinite loops or stack overflow during serialization.        <details><summary>💡 <strong>Suggestion</strong></summary>Add validation to detect circular references in prInfo before serialization, or use a serialization library that handles circular references gracefully.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/docker-image-sandbox.ts:1</code></td>
      <td><div>DockerImageSandbox doesn&#39;t implement cleanup of temporary build directories if the process is killed abruptly (SIGKILL). Temp directories may accumulate in /tmp.        <details><summary>💡 <strong>Suggestion</strong></summary>Register a process signal handler (SIGINT, SIGTERM) to clean up temp directories, or use a temp directory that respects the system&#39;s temp directory cleanup policies.
</details>
</div></td>
    </tr>
    <tr>
      <td>🟡 Warning</td>
      <td><code>src/sandbox/sandbox-manager.ts:1</code></td>
      <td><div>SandboxManager.getOrStart doesn&#39;t handle concurrent requests to start the same sandbox. Multiple parallel checks could race to start the same container, leading to multiple instances or errors.        <details><summary>💡 <strong>Suggestion</strong></summary>Implement a promise-based locking mechanism or use a Map of pending start promises to ensure only one start operation proceeds per sandbox name.
</details>
</div></td>
    </tr>
  </tbody>
</table>
<!-- visor:section-end id="quality" -->

<!-- visor:thread-end key="probelabs/visor#337@d91bbd1" -->

---

*Powered by [Visor](https://probelabs.com/visor) from [Probelabs](https://probelabs.com)*

*Last updated: 2026-02-07T07:19:37.847Z | Triggered by: pr_updated | Commit: d91bbd1*

💡 **TIP:** You can chat with Visor using `/visor ask <your question>`
<!-- /visor-comment-id:visor-thread-review-probelabs/visor#337 -->

…tion - reviewer.test.ts: fix AIReviewService mock message to include "security" - action-cli-bridge.test.ts: remove tests for deleted createTempConfigFromInputs method, update check filtering assertion - github-workflow.test.ts: fix group key from 'default' to 'security-review', update content assertion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The bundled/ directory was accidentally committed and is gitignored. It should not be part of this PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert dist/ files to match origin/main — build artifacts should not diverge in feature branches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Visor Bot and others added 12 commits October 3, 2025 11:17

ci(ai-battle): unique job names per provider and concurrency to cance…

c4e4612

…l superseded runs - Job name: includes matrix.ai_provider.name and event\n- Concurrency: cancel in-progress runs per branch/workflow

feat(cli): log Visor and Probe versions on info level at startup

52650c5

ci(ai-battle): fix validator to use available outputs (total-issues) …

d0da4a2

…instead of non-existent review-score

feat(action): print Visor and Probe versions at start of GitHub Actio…

cacb081

…n run

ci(ai-battle): gate provider jobs by secrets; regenerate dist

76c2ef5

build: inject PROBE_VERSION at build; use injected Probe version in C…

be1c96d

…LI and Action banners

docs(examples): add failure routing examples (basic, on_success, forE…

4ab2ce1

…ach, dynamic *_js) and README usage

docs: add Failure Routing (auto-fix loops) section to README and exam…

0f7d1f1

…ples references

docs(readme): fix Failure Routing anchor and link; validate

b2e60a4

docs: add failure-routing guide (auto-fix loops)

703d678

buger and others added 3 commits February 6, 2026 17:04

buger changed the base branch from feat/failure-routing to main February 7, 2026 06:47

buger and others added 2 commits February 7, 2026 06:59

chore: remove bundled/ build artifacts from PR

059d2a5

The bundled/ directory was accidentally committed and is gitignored. It should not be part of this PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: restore dist/ to main state

d91bbd1

Revert dist/ files to match origin/main — build artifacts should not diverge in feature branches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sandbox): Docker sandbox execution with integration tests and examples#337

feat(sandbox): Docker sandbox execution with integration tests and examples#337
buger wants to merge 18 commits intomainfrom
feat/sandbox-integration-tests

buger commented Feb 6, 2026

Uh oh!

probelabs bot commented Feb 6, 2026

Errors

Debug Details

Uh oh!

probelabs bot commented Feb 6, 2026

Errors

Debug Details

Uh oh!

probelabs bot commented Feb 6, 2026 •

edited

Loading

Uh oh!

probelabs bot commented Feb 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

buger commented Feb 6, 2026

Summary

Test plan

Known limitation

Uh oh!

probelabs bot commented Feb 6, 2026

🔍 Code Analysis Results

Errors

Debug Details

Uh oh!

probelabs bot commented Feb 6, 2026

🔍 Code Analysis Results

✅ Security Check Passed

Performance Issues (1)

✅ Quality Check Passed

Style Issues (1)

Errors

Debug Details

Uh oh!

probelabs bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Overview: Docker Sandbox Execution with Integration Tests and Examples

Summary

Files Changed Analysis

Core Infrastructure (New src/sandbox/ module)

Engine Integration

Configuration & Examples

Documentation

CI/CD

Developer Tools

Architecture & Impact Assessment

What This PR Accomplishes

Key Technical Changes

Affected System Components

Scope Discovery & Context Expansion

Uh oh!

probelabs bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Security Issues (22)

Architecture Issues (14)

Performance Issues (6)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

probelabs bot commented Feb 6, 2026 •

edited

Loading

Core Infrastructure (New `src/sandbox/` module)

probelabs bot commented Feb 6, 2026 •

edited

Loading