Status: Active
Date: June 2026
This policy applies to Eval Containers — the eval-containers CLI, the Docker
images and Compose/Helm artifacts it produces, and this repository's build and
test tooling. It defines how to report a vulnerability and the supply-chain
standards the project holds itself to. The standards below are enforced in the
tree; see .agents/ for the governing rules.
Do not open a public issue for a security vulnerability.
Use GitHub's private vulnerability reporting: the "Report a vulnerability" button under the Security tab of the repository. This opens a private advisory channel between you and the maintainers.
If you cannot use GitHub, use the private contact listed for this repository on its issue tracker instead. In either case, never disclose details in a public issue, PR, or discussion until a fix ships.
Include the following in your report:
- A clear description of the issue and the impact you believe it has.
- Steps to reproduce (a minimal Compose file, image tag, or CLI invocation).
- The affected artifact: CLI version (eval-containers --version), image tag (EVAL_*_TAG), or commit.
- Any suggested remediation.
What you can expect from us:
- Acknowledgement within 5 business days.
- An assessment and, if accepted, an estimated remediation timeline.
- Credit in the advisory and release notes, unless you ask to remain anonymous.
We ask that you:
- Give us a reasonable window to ship a fix before any public disclosure.
- Do not access, modify, or exfiltrate data that is not yours; do not run denial-of-service or spam against shared infrastructure.
- Bear in mind that an evaluation container runs untrusted agent code by design; the scope notes below define what is and is not a vulnerability in that model.
Eval Containers versions the whole fleet with a single SemVer version (see
.agents/RULES.md principle 9). Security fixes land on the most recent release
and ship as a patch bump; principle 9 names "base-image/CVE updates" as
patch-worthy. Older release lines are not maintained: if you run a pinned tag,
upgrade to the latest patch release to pick up a fix.
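To make the upgrade rule concrete, here is a minimal Rust sketch of the single-release-line support model. It assumes plain MAJOR.MINOR.PATCH tags (an optional leading "v" is tolerated) with no pre-release or build suffixes; Version, security_fix_release, and needs_upgrade are hypothetical names for illustration, not part of the eval-containers CLI.

```rust
// Hypothetical model of the support rule: only the most recent release gets
// fixes, and a fix ships as a patch bump of that release.
// Assumes plain MAJOR.MINOR.PATCH strings with no pre-release/build metadata.

// Field order matters: the derived Ord compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    /// Parses "1.4.2" or "v1.4.2"; returns None for anything else.
    fn parse(s: &str) -> Option<Version> {
        let mut parts = s.trim_start_matches('v').split('.');
        let major = parts.next()?.parse().ok()?;
        let minor = parts.next()?.parse().ok()?;
        let patch = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None; // more than three components
        }
        Some(Version { major, minor, patch })
    }
}

/// A security fix lands on the latest release as its next patch version.
fn security_fix_release(latest: Version) -> Version {
    Version { patch: latest.patch + 1, ..latest }
}

/// Older lines are unmaintained, so anything behind `latest` must upgrade
/// to receive a fix.
fn needs_upgrade(pinned: Version, latest: Version) -> bool {
    pinned < latest
}
```

For example, with 1.4.2 as the latest release, a fix would ship as 1.4.3, and a deployment pinned to 1.3.9 only receives it by upgrading.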
The threat the controls below address: a malicious or vulnerable dependency entering the CLI's Rust tree, an image's base/packages, or the build itself.
- Lockfile is committed and authoritative. Cargo.lock is in the tree; the published crate is built with cargo publish --locked so releases are reproducible.
- CVE scanning with cargo audit. Every PR that touches Cargo.toml or Cargo.lock, and a weekly schedule, run cargo audit against the RustSec advisory database in .github/workflows/audit.yml. A known advisory fails the check. The weekly run catches advisories published against already-pinned versions. The same scan is step 22a of release verification (VERIFY.md, Phase 5).
- Automated, age-gated updates. .github/dependabot.yml proposes cargo updates with a 14-day cooldown (and github-actions updates with a 7-day cooldown): a version is not proposed until it has been public that long, which blunts day-zero malicious-publish attacks while still keeping dependencies current. Review dependency-update PRs for the changelog and any advisory before merge.
- Secrets never enter the tree, images, or history. Three independent layers scan every PR, each with a different strategy so a miss in one is caught by another: gitleaks (rule-based, config .github/.gitleaks.toml); detect-secrets (entropy plus per-vendor keyword plugins, baseline .github/.secrets.baseline); and trivy config (tests/static/security/trivy.sh) for build-arg secrets and IaC misconfiguration. Build-time credentials use --mount=type=secret, never COPY (.agents/RULES.md principle 10c).
- Dockerfile policy. hadolint (generic hygiene) plus a conftest/OPA policy (tests/static/policy/) enforce the eval LABEL contract, the upstream-pin allowlist, and image hygiene on every PR.
- All scans run in CI on every PR via .github/workflows/pre-commit.yml, scoped to the changed files, and again over the whole tree on main.
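The cargo audit gate boils down to matching each pinned package against a list of known advisories. The Rust sketch below models that under loudly simplified assumptions: LockedPackage, Advisory, audit, and the single patched_from bound are hypothetical, real RustSec advisories describe affected and unaffected versions in far more detail, and this is not how cargo audit is implemented.

```rust
// Simplified model of an advisory scan: compare every package pinned in the
// lockfile against every advisory. Hypothetical types; real advisories carry
// richer version ranges than a single "patched from" bound.

#[derive(Debug, Clone, PartialEq, Eq)]
struct LockedPackage {
    name: String,
    version: (u64, u64, u64), // (major, minor, patch)
}

#[derive(Debug, Clone)]
struct Advisory {
    id: String,
    package: String,
    /// Versions strictly below this bound are treated as affected.
    patched_from: (u64, u64, u64),
}

/// Returns the IDs of advisories that hit a pinned package.
/// An empty result passes the check; any hit fails it.
fn audit(lockfile: &[LockedPackage], advisories: &[Advisory]) -> Vec<String> {
    let mut hits = Vec::new();
    for adv in advisories {
        for pkg in lockfile {
            // Tuples compare lexicographically, matching SemVer precedence
            // for plain MAJOR.MINOR.PATCH versions.
            if pkg.name == adv.package && pkg.version < adv.patched_from {
                hits.push(adv.id.clone());
            }
        }
    }
    hits
}
```

Because the lockfile does not change between scheduled runs, the only way a weekly run can fail where the last PR check passed is for the advisory list to grow. That is exactly the case the scheduled scan exists to catch.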
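The dependency cooldown is an age check applied before a new version becomes eligible for an update proposal. A minimal Rust sketch, assuming release dates are tracked as whole days since a common epoch and that a version becomes eligible on the day its age reaches the cooldown; Ecosystem, cooldown_days, and eligible_for_proposal are hypothetical names. It illustrates the rule only; Dependabot's own behaviour is set in .github/dependabot.yml, not written this way.

```rust
// Hypothetical model of the age gate. The 14- and 7-day values mirror the
// policy text; everything else is illustrative.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ecosystem {
    Cargo,
    GithubActions,
}

/// Cooldown per ecosystem, in days.
fn cooldown_days(eco: Ecosystem) -> u64 {
    match eco {
        Ecosystem::Cargo => 14,
        Ecosystem::GithubActions => 7,
    }
}

/// `published_day` and `today` are day counts since a shared epoch.
/// Assumption: a version is eligible once its age is at least the cooldown.
fn eligible_for_proposal(eco: Ecosystem, published_day: u64, today: u64) -> bool {
    // saturating_sub treats a publish date in the future as age 0.
    today.saturating_sub(published_day) >= cooldown_days(eco)
}
```

A crate version published on day 100 is therefore held back until day 114, which gives the ecosystem two weeks to spot and yank a malicious release before it is ever proposed here.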
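The entropy layer rests on a simple observation: credentials tend to look random. The Rust sketch below computes Shannon entropy in bits per character and flags long, high-entropy tokens. The 20-character minimum and 4.5 bits/char threshold are illustrative assumptions, not the values in .github/.secrets.baseline or detect-secrets' configuration, and shannon_entropy and flag_candidates are hypothetical names.

```rust
use std::collections::HashMap;

// Minimal entropy-based secret heuristic. Thresholds are supplied by the
// caller; the values used in examples are illustrative only.

/// Shannon entropy of `s` in bits per character (0.0 for an empty string).
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2() // each symbol contributes -p * log2(p)
        })
        .sum()
}

/// Flags whitespace-separated tokens that are at least `min_len` characters
/// long and whose entropy exceeds `threshold` bits per character.
fn flag_candidates(line: &str, min_len: usize, threshold: f64) -> Vec<&str> {
    line.split_whitespace()
        .filter(|t| t.chars().count() >= min_len && shannon_entropy(t) > threshold)
        .collect()
}
```

Entropy alone misses low-entropy secrets and flags random-looking non-secrets such as hashes, which is why the policy pairs it with gitleaks' rules and per-vendor keyword plugins rather than relying on any single layer.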
Eval Containers exists to run untrusted agent code inside a container against a benchmark. The isolation boundary is the container and the network policy around it.
In scope (please report):
- Anything that lets agent code escape its container, reach the host, or read another run's data.
- A path for the agent to observe or tamper with the model proxy or trajectory (.agents/RULES.md principle 5).
- Secret exposure in a published image, layer, or log.
- A known CVE in a shipped dependency that cargo audit or image scanning does not catch.
Out of scope (by design, not a vulnerability):
- The agent executing arbitrary code inside its own container — that is the product.
- Resource use by a running evaluation within its configured limits.
- Findings that require a compromised host or registry to begin with.