feat: implement static asset placement for sandbox executions by jaoppb · Pull Request #292 · webtech-network/autograder

jaoppb · 2026-04-14T19:23:50Z

Pull Request: Secure Sandbox Asset Injection and gVisor Compatibility

Objective

Resolve issues with sandbox asset injection when running under high-security isolation. The primary goal was to ensure that assets fetched from S3 are correctly injected into sandboxes using the gVisor (runsc) runtime, which previously failed due to filesystem driver incompatibilities with Docker's put_archive.

Changes

Sandbox Injection (gVisor Compatibility)

Refactored SandboxContainer.inject_assets to use a Base64-encoded exec_run approach instead of put_archive.
Re-enabled gVisor (runsc) runtime and restored the noexec flag on the /tmp mount for enhanced security.
Removed the now-obsolete autograder/services/assets/tar_utils.py.

Asset Pipeline Refactor

Updated AssetProvider and S3AssetProvider to return raw content bytes instead of pre-packaged tar archives.
Updated ResolvedAsset dataclass to support raw content and per-asset read_only flags.
Updated AssetSourceResolver to handle the new raw byte flow.

Grader and Template Fixes

CommandResolver: Added support for direct string commands (e.g., "python solution.py") in addition to the dictionary format.
InputOutputTemplate: Fixed a NameError where kwargs was not passed to check_for_base_errors, preventing crashes on runtime failures.

Infrastructure and Config

Added internal S3 endpoint configuration (S3_ENDPOINT_URL) to the api service in docker-compose.yml.

Testing and QA Results

Performed a full "Green Run" in the local stack to verify these changes:

S3 Setup: Uploaded a test file to the local autograder-assets bucket.
Configuration: Created a new assignment (test-asset-assignment-v2) mapping the S3 asset to /tmp/test_asset.txt.
Execution: Sent a Python submission that reads and prints the content of the injected asset.
Verification:
- Pre-flight: Confirmed "All preflight checks passed" (Asset injection successful).
- Security: Verified the container was running under gVisor with /tmp set to noexec.
- Score: Received a 100/100 score, confirming the asset was correctly read and matched the expected output.

Risks and Considerations

Performance: For very large assets (e.g., >50MB), the Base64 encoding step adds a ~33% transfer overhead. However, this is the most robust way to ensure compatibility with gVisor's strong isolation.
Legacy Support: Removed tar_utils.py. Any custom plugins relying on this utility will need to be updated to handle raw bytes.

This commit introduces the ability to inject grader-owned static assets (e.g., datasets, fixtures) from an S3 provider into the sandbox before setup commands and grading run. Key changes: - Added `ministack` to `docker-compose.yml` for local S3 emulation. - Created `SetupConfig` and `AssetConfig` Pydantic models for validation. - Implemented `AssetSourceResolver` with disk/memory LRU caching of pre-compressed tar archives. - Added `inject_assets` to `SandboxContainer` using Docker SDK's `put_archive`. - Refactored `PreFlightStep` and `PreFlightService` to orchestrate asset injection. - Updated documentation with the new `assets` configuration schema. Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>

Copilot

Pull request overview

Adds a pipeline feature to fetch grader-owned static assets from an S3-compatible provider and inject them into the execution sandbox during preflight, before setup commands and grading run.

Changes:

Introduces SetupConfig / AssetConfig Pydantic models and updates preflight config resolution.
Implements an S3-backed asset resolver with memory/disk caching and tar-archive generation for Docker put_archive.
Adds sandbox-side asset injection plus docs, docker-compose local S3 emulation, and tests.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 18 comments.

Show a summary per file

File	Description
`tests/unit/test_asset_injection.py`	Unit coverage for config parsing, resolver orchestration, and sandbox injection call shape.
`tests/integration/test_asset_preflight_integration.py`	Integration-style coverage that preflight triggers resolve + injection.
`sandbox_manager/sandbox_container.py`	Adds `inject_assets()` using Docker `exec_run` + `put_archive`.
`docs/features/setup_config_feature.md`	Documents new root-level `assets` schema and behavior.
`docker-compose.yml`	Adds `ministack` service for local S3 emulation.
`autograder/steps/pre_flight_step.py`	Orchestrates asset resolution + injection during preflight.
`autograder/services/pre_flight_service.py`	Refactors to accept `SetupConfig` and return a typed language config.
`autograder/services/assets/tar_utils.py`	Builds an in-memory tar archive for Docker `put_archive`.
`autograder/services/assets/s3_provider.py`	S3 provider to fetch objects and cache tar archives.
`autograder/services/assets/resolver.py`	Resolves `AssetConfig` → `ResolvedAsset` via provider.
`autograder/services/assets/provider.py`	Provider interface for asset backends.
`autograder/services/assets/cache_manager.py`	In-memory LRU + disk cache for tar bytes.
`autograder/models/dataclass/asset.py`	Adds `ResolvedAsset` dataclass.
`autograder/models/config/setup.py`	Defines the new setup/asset config models.
`.env.example`	Documents environment variables for the asset S3 provider.

Copilot · 2026-04-14T19:30:14Z

+        for asset in assets_config:
+            source = asset.source
+            target = asset.target
+            read_only = asset.read_only
+
+            # Resolve asset
+            tar_content = self.provider.get_asset_tar(source, target, read_only)
+
+            if not tar_content:
+                raise RuntimeError(f"Failed to resolve asset: source={source}, target={target}")
+
+            resolved_assets.append(ResolvedAsset(
+                target=target,
+                tar_content=tar_content
+            ))


The docs/stateful intent is that assets are placed under /tmp when the configured target doesn’t start with /tmp/, but the resolver currently passes asset.target through unchanged when generating the tar. Since SandboxContainer.inject_assets() uploads the tar verbatim, the actual extraction path will follow whatever was used when creating the tar (not the later rewritten target_path). Normalize/enforce the final target path here (or in AssetConfig validation) before calling get_asset_tar(), and store that normalized path in ResolvedAsset.target so tar creation and injection agree.

Copilot · 2026-04-14T19:30:15Z

+            if not sandbox:
+                error_msg = t("preflight.error.setup_command_missing_sandbox", locale=pipeline_exec.locale)
+                return pipeline_exec.add_step_result(StepResult.fail(
+                    step=self.step_name,
+                    error=error_msg
+                ))
+
+            logger.info("Injecting assets into sandbox (external_user_id=%s)", pipeline_exec.submission.user_id)
+            try:
+                resolved_assets = self._asset_resolver.resolve_assets(self._setup_config.assets)
+                sandbox.inject_assets(resolved_assets)
+            except Exception as e:
+                error_msg = f"Failed to inject assets: {str(e)}"
+                logger.error("Asset injection failed (external_user_id=%s): %s", pipeline_exec.submission.user_id, error_msg)
+                return pipeline_exec.add_step_result(StepResult.fail(
+                    step=self.step_name,
+                    error=error_msg
+                ))


The injected-assets missing-sandbox path reuses preflight.error.setup_command_missing_sandbox, which is misleading when only assets are configured. Also, on injection failure the step returns a raw str(e) in the student-facing error, which can be noisy and inconsistent with other preflight failures that return structured error_data. Add a dedicated translated error key for asset injection failures/missing sandbox and consider attaching structured details (e.g., source/target) instead of leaking raw exception text.

Copilot · 2026-04-14T19:30:16Z

+        # Environment variables
+        self.bucket_name = os.getenv("CRITERIA_ASSETS_BUCKET_NAME")
+        self.access_key = os.getenv("AWS_ACCESS_ID")
+        self.secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+        self.region = os.getenv("AWS_REGION", "us-east-1")
+        self.endpoint_url = os.getenv("S3_ENDPOINT_URL")
+
+        # Initialize boto3 client
+        self.s3 = boto3.client(
+            's3',
+            endpoint_url=self.endpoint_url,
+            aws_access_key_id=self.access_key,
+            aws_secret_access_key=self.secret_key,
+            region_name=self.region,
+            config=Config(signature_version='s3v4')
+        )


S3AssetProvider will happily initialize with bucket_name=None / missing credentials, and then later get_object() will fail with a less clear boto3 error. Consider validating required env vars up-front (bucket name at minimum; and credentials depending on endpoint) and raising a clear, actionable exception so preflight failures are easier to diagnose.

Copilot · 2026-04-14T19:30:18Z

+import io
+import tarfile
+import os
+import time
+
+def create_tar_archive(filename: str, content: bytes, target_path: str, read_only: bool = True) -> bytes:
+    """


This module has unused imports/args that will fail CI linting (pylint enables unused-import/unused-argument on changed files): os is unused and the filename parameter is never referenced. Either remove them or use filename (e.g., to derive the tar entry name) and drop the unused import.

Copilot · 2026-04-14T19:30:19Z

+    def test_invalid_asset_target(self):
+        """Test validation of absolute target path."""
+        with pytest.raises(Exception):
+            AssetConfig(source="s", target="relative/path")
+
+    def test_asset_path_traversal(self):
+        """Test rejection of path traversal in source."""
+        with pytest.raises(Exception):
+            AssetConfig(source="../secret", target="/tmp/s")


These validation tests use pytest.raises(Exception), which is overly broad and can mask unrelated failures (and makes intent unclear). Since AssetConfig validation errors are deterministic, assert on the specific exception type raised by Pydantic (e.g., pydantic.ValidationError) or ValueError as appropriate.

…mp tmpfs

… instead of tar

…ompatible implementation

…lution implementation

ArthurCRodrigues · 2026-04-14T23:12:45Z

@jaoppb had to refactor some of your implementation:

I discovered that using Docker's native put_archive method has compatibility issues with the gVisor (runsc) runtime. Under high isolation, put_archive would often bypass the sandboxed VFS layer, resulting in "file not found"
errors when the student code tried to access the injected assets, even if the operation reported success.

To maintain our security posture while ensuring reliable asset injection, we switched to a Base64-encoded exec_run approach. By encoding the file content and piping it through base64 -d inside an exec session, we ensure that
the file is created directly within the gVisor-managed guest kernel. This method allows us to keep the /tmp directory secured with the noexec flag while still supporting dynamic file injection during the Pre-Flight stage.

ArthurCRodrigues

OK

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ArthurCRodrigues

ok

Copilot AI review requested due to automatic review settings April 14, 2026 19:23

Copilot started reviewing on behalf of jaoppb April 14, 2026 19:24 View session

Copilot AI reviewed Apr 14, 2026

View reviewed changes

ArthurCRodrigues added 11 commits April 14, 2026 19:46

feat(grader): support string commands in CommandResolver

c897944

fix(templates): pass kwargs to check_for_base_errors in input_output

8f67c40

config: add S3 environment variables to api service

fb3b98b

fix(sandbox): ensure asset injection works by disabling gVisor and /t…

8006aba

…mp tmpfs

refactor(assets): change asset resolution pipeline to use raw content…

555436a

… instead of tar

feat(sandbox): implement base64-based asset injection via exec_run

76e2736

security: re-enable gVisor and /tmp noexec mount

4af4c79

refactor(assets): remove obsolete tar_utils

0f62d66

Merge branch 'main' into sandbox-file-injection and resolve conflicts

1997521

docs: update asset injection documentation to reflect secure gVisor-c…

dbc2c4b

…ompatible implementation

test: update unit tests to match new asset injection and command reso…

ab9fae0

…lution implementation

ArthurCRodrigues reviewed Apr 14, 2026

View reviewed changes

Apply suggestions from code review

c288985

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ArthurCRodrigues reviewed Apr 14, 2026

View reviewed changes

ArthurCRodrigues approved these changes Apr 14, 2026

View reviewed changes

ArthurCRodrigues merged commit acc0957 into main Apr 14, 2026
3 checks passed

ArthurCRodrigues deleted the sandbox-file-injection branch April 14, 2026 23:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement static asset placement for sandbox executions#292

feat: implement static asset placement for sandbox executions#292
ArthurCRodrigues merged 13 commits intomainfrom
sandbox-file-injection

jaoppb commented Apr 14, 2026 •

edited by ArthurCRodrigues

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Uh oh!

Uh oh!

ArthurCRodrigues commented Apr 14, 2026

Uh oh!

ArthurCRodrigues left a comment

Uh oh!

ArthurCRodrigues left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jaoppb commented Apr 14, 2026 • edited by ArthurCRodrigues Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request: Secure Sandbox Asset Injection and gVisor Compatibility

Objective

Changes

Sandbox Injection (gVisor Compatibility)

Asset Pipeline Refactor

Grader and Template Fixes

Infrastructure and Config

Testing and QA Results

Risks and Considerations

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ArthurCRodrigues commented Apr 14, 2026

Uh oh!

ArthurCRodrigues left a comment

Choose a reason for hiding this comment

Uh oh!

ArthurCRodrigues left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jaoppb commented Apr 14, 2026 •

edited by ArthurCRodrigues

Loading