Skip to content

feat: implement static asset placement for sandbox executions#292

Merged
ArthurCRodrigues merged 13 commits intomainfrom
sandbox-file-injection
Apr 14, 2026
Merged

feat: implement static asset placement for sandbox executions#292
ArthurCRodrigues merged 13 commits intomainfrom
sandbox-file-injection

Conversation

@jaoppb
Copy link
Copy Markdown
Collaborator

@jaoppb jaoppb commented Apr 14, 2026

Pull Request: Secure Sandbox Asset Injection and gVisor Compatibility

Objective

Resolve issues with sandbox asset injection when running under high-security isolation. The primary goal was to ensure that assets fetched from S3 are correctly injected into sandboxes using the gVisor (runsc) runtime, which previously failed due to filesystem driver incompatibilities with Docker's put_archive.

Changes

Sandbox Injection (gVisor Compatibility)

  • Refactored SandboxContainer.inject_assets to use a Base64-encoded exec_run approach instead of put_archive.
  • Re-enabled gVisor (runsc) runtime and restored the noexec flag on the /tmp mount for enhanced security.
  • Removed the now-obsolete autograder/services/assets/tar_utils.py.

Asset Pipeline Refactor

  • Updated AssetProvider and S3AssetProvider to return raw content bytes instead of pre-packaged tar archives.
  • Updated ResolvedAsset dataclass to support raw content and per-asset read_only flags.
  • Updated AssetSourceResolver to handle the new raw byte flow.

Grader and Template Fixes

  • CommandResolver: Added support for direct string commands (e.g., "python solution.py") in addition to the dictionary format.
  • InputOutputTemplate: Fixed a NameError where kwargs was not passed to check_for_base_errors, preventing crashes on runtime failures.

Infrastructure and Config

  • Added internal S3 endpoint configuration (S3_ENDPOINT_URL) to the api service in docker-compose.yml.

Testing and QA Results

Performed a full "Green Run" in the local stack to verify these changes:

  1. S3 Setup: Uploaded a test file to the local autograder-assets bucket.
  2. Configuration: Created a new assignment (test-asset-assignment-v2) mapping the S3 asset to /tmp/test_asset.txt.
  3. Execution: Sent a Python submission that reads and prints the content of the injected asset.
  4. Verification:
    • Pre-flight: Confirmed "All preflight checks passed" (Asset injection successful).
    • Security: Verified the container was running under gVisor with /tmp set to noexec.
    • Score: Received a 100/100 score, confirming the asset was correctly read and matched the expected output.

Risks and Considerations

  • Performance: For very large assets (e.g., >50MB), the Base64 encoding step adds a ~33% transfer overhead. However, this is the most robust way to ensure compatibility with gVisor's strong isolation.
  • Legacy Support: Removed tar_utils.py. Any custom plugins relying on this utility will need to be updated to handle raw bytes.

This commit introduces the ability to inject grader-owned static assets
(e.g., datasets, fixtures) from an S3 provider into the sandbox before
setup commands and grading run.

Key changes:
- Added `ministack` to `docker-compose.yml` for local S3 emulation.
- Created `SetupConfig` and `AssetConfig` Pydantic models for validation.
- Implemented `AssetSourceResolver` with disk/memory LRU caching of
  pre-compressed tar archives.
- Added `inject_assets` to `SandboxContainer` using Docker SDK's `put_archive`.
- Refactored `PreFlightStep` and `PreFlightService` to orchestrate asset
  injection.
- Updated documentation with the new `assets` configuration schema.

Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 14, 2026 19:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a pipeline feature to fetch grader-owned static assets from an S3-compatible provider and inject them into the execution sandbox during preflight, before setup commands and grading run.

Changes:

  • Introduces SetupConfig / AssetConfig Pydantic models and updates preflight config resolution.
  • Implements an S3-backed asset resolver with memory/disk caching and tar-archive generation for Docker put_archive.
  • Adds sandbox-side asset injection plus docs, docker-compose local S3 emulation, and tests.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
tests/unit/test_asset_injection.py Unit coverage for config parsing, resolver orchestration, and sandbox injection call shape.
tests/integration/test_asset_preflight_integration.py Integration-style coverage that preflight triggers resolve + injection.
sandbox_manager/sandbox_container.py Adds inject_assets() using Docker exec_run + put_archive.
docs/features/setup_config_feature.md Documents new root-level assets schema and behavior.
docker-compose.yml Adds ministack service for local S3 emulation.
autograder/steps/pre_flight_step.py Orchestrates asset resolution + injection during preflight.
autograder/services/pre_flight_service.py Refactors to accept SetupConfig and return a typed language config.
autograder/services/assets/tar_utils.py Builds an in-memory tar archive for Docker put_archive.
autograder/services/assets/s3_provider.py S3 provider to fetch objects and cache tar archives.
autograder/services/assets/resolver.py Resolves AssetConfigResolvedAsset via provider.
autograder/services/assets/provider.py Provider interface for asset backends.
autograder/services/assets/cache_manager.py In-memory LRU + disk cache for tar bytes.
autograder/models/dataclass/asset.py Adds ResolvedAsset dataclass.
autograder/models/config/setup.py Defines the new setup/asset config models.
.env.example Documents environment variables for the asset S3 provider.

Comment on lines +30 to +44
for asset in assets_config:
source = asset.source
target = asset.target
read_only = asset.read_only

# Resolve asset
tar_content = self.provider.get_asset_tar(source, target, read_only)

if not tar_content:
raise RuntimeError(f"Failed to resolve asset: source={source}, target={target}")

resolved_assets.append(ResolvedAsset(
target=target,
tar_content=tar_content
))
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs/stateful intent is that assets are placed under /tmp when the configured target doesn’t start with /tmp/, but the resolver currently passes asset.target through unchanged when generating the tar. Since SandboxContainer.inject_assets() uploads the tar verbatim, the actual extraction path will follow whatever was used when creating the tar (not the later rewritten target_path). Normalize/enforce the final target path here (or in AssetConfig validation) before calling get_asset_tar(), and store that normalized path in ResolvedAsset.target so tar creation and injection agree.

Copilot uses AI. Check for mistakes.
Comment thread autograder/models/config/setup.py Outdated
Comment on lines +70 to +87
if not sandbox:
error_msg = t("preflight.error.setup_command_missing_sandbox", locale=pipeline_exec.locale)
return pipeline_exec.add_step_result(StepResult.fail(
step=self.step_name,
error=error_msg
))

logger.info("Injecting assets into sandbox (external_user_id=%s)", pipeline_exec.submission.user_id)
try:
resolved_assets = self._asset_resolver.resolve_assets(self._setup_config.assets)
sandbox.inject_assets(resolved_assets)
except Exception as e:
error_msg = f"Failed to inject assets: {str(e)}"
logger.error("Asset injection failed (external_user_id=%s): %s", pipeline_exec.submission.user_id, error_msg)
return pipeline_exec.add_step_result(StepResult.fail(
step=self.step_name,
error=error_msg
))
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The injected-assets missing-sandbox path reuses preflight.error.setup_command_missing_sandbox, which is misleading when only assets are configured. Also, on injection failure the step returns a raw str(e) in the student-facing error, which can be noisy and inconsistent with other preflight failures that return structured error_data. Add a dedicated translated error key for asset injection failures/missing sandbox and consider attaching structured details (e.g., source/target) instead of leaking raw exception text.

Copilot uses AI. Check for mistakes.
Comment thread tests/integration/test_asset_preflight_integration.py Outdated
Comment on lines +16 to +31
# Environment variables
self.bucket_name = os.getenv("CRITERIA_ASSETS_BUCKET_NAME")
self.access_key = os.getenv("AWS_ACCESS_ID")
self.secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
self.region = os.getenv("AWS_REGION", "us-east-1")
self.endpoint_url = os.getenv("S3_ENDPOINT_URL")

# Initialize boto3 client
self.s3 = boto3.client(
's3',
endpoint_url=self.endpoint_url,
aws_access_key_id=self.access_key,
aws_secret_access_key=self.secret_key,
region_name=self.region,
config=Config(signature_version='s3v4')
)
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3AssetProvider will happily initialize with bucket_name=None / missing credentials, and then later get_object() will fail with a less clear boto3 error. Consider validating required env vars up-front (bucket name at minimum; and credentials depending on endpoint) and raising a clear, actionable exception so preflight failures are easier to diagnose.

Copilot uses AI. Check for mistakes.
Comment thread autograder/services/assets/tar_utils.py Outdated
Comment on lines +1 to +7
import io
import tarfile
import os
import time

def create_tar_archive(filename: str, content: bytes, target_path: str, read_only: bool = True) -> bytes:
"""
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module has unused imports/args that will fail CI linting (pylint enables unused-import/unused-argument on changed files): os is unused and the filename parameter is never referenced. Either remove them or use filename (e.g., to derive the tar entry name) and drop the unused import.

Copilot uses AI. Check for mistakes.
Comment thread autograder/services/assets/s3_provider.py Outdated
Comment on lines +33 to +41
def test_invalid_asset_target(self):
"""Test validation of absolute target path."""
with pytest.raises(Exception):
AssetConfig(source="s", target="relative/path")

def test_asset_path_traversal(self):
"""Test rejection of path traversal in source."""
with pytest.raises(Exception):
AssetConfig(source="../secret", target="/tmp/s")
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These validation tests use pytest.raises(Exception), which is overly broad and can mask unrelated failures (and makes intent unclear). Since AssetConfig validation errors are deterministic, assert on the specific exception type raised by Pydantic (e.g., pydantic.ValidationError) or ValueError as appropriate.

Copilot uses AI. Check for mistakes.
Comment thread docker-compose.yml Outdated
Comment thread autograder/services/assets/resolver.py Outdated
@ArthurCRodrigues
Copy link
Copy Markdown
Member

@jaoppb had to refactor some of your implementation:

I discovered that using Docker's native put_archive method has compatibility issues with the gVisor (runsc) runtime. Under high isolation, put_archive would often bypass the sandboxed VFS layer, resulting in "file not found"
errors when the student code tried to access the injected assets, even if the operation reported success.

To maintain our security posture while ensuring reliable asset injection, we switched to a Base64-encoded exec_run approach. By encoding the file content and piping it through base64 -d inside an exec session, we ensure that
the file is created directly within the gVisor-managed guest kernel. This method allows us to keep the /tmp directory secured with the noexec flag while still supporting dynamic file injection during the Pre-Flight stage.

Copy link
Copy Markdown
Member

@ArthurCRodrigues ArthurCRodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@ArthurCRodrigues ArthurCRodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@ArthurCRodrigues ArthurCRodrigues merged commit acc0957 into main Apr 14, 2026
3 checks passed
@ArthurCRodrigues ArthurCRodrigues deleted the sandbox-file-injection branch April 14, 2026 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants