TraceCore’s source of truth is the specification under /spec/. The Python code in this repository implements that specification, but any language or runtime can replicate the same behavior by honoring the same contracts.
- Inputs frozen by the spec: `agent_ref`, `task_ref`, `seed`, `budgets`, and `spec_version` are resolved before execution. The runtime loads these values, records them verbatim, and refuses to mutate them mid-episode.
- GuardedEnv + task harness: Tasks declare filesystem/network policies and validators inside `task.toml`. The runtime simply enforces these declarations; new implementations must respect the same manifest contract.
- Validator + taxonomy: Validators emit structured payloads that the runtime maps into the canonical termination taxonomy (`success`, `budget_exhausted`, `logic_failure`, etc.) defined by the spec.
- Artifact serializer: After termination, the runtime serializes the run using `/spec/artifact-schema-v0.1.json`. The schema, not Python internals, is the normative definition of the output format.
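
The "frozen inputs" contract above can be illustrated with a small sketch. This is not the reference runtime's code: the `EpisodeInputs` class, the `freeze_inputs` helper, and the `steps` budget key are hypothetical names chosen for illustration, and only the five field names come from the text above.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class EpisodeInputs:
    """Hypothetical container for the inputs the spec freezes before execution."""
    agent_ref: str
    task_ref: str
    seed: int
    budgets: Mapping[str, int]
    spec_version: str


def freeze_inputs(agent_ref: str, task_ref: str, seed: int,
                  budgets: Mapping[str, int], spec_version: str) -> EpisodeInputs:
    # Copy the budgets and expose them through a read-only view, so neither the
    # caller's dict nor the runtime can change them once the episode starts.
    read_only_budgets = MappingProxyType(dict(budgets))
    return EpisodeInputs(agent_ref, task_ref, seed, read_only_budgets, spec_version)
```

With this shape, assigning to a field raises `FrozenInstanceError` and writing to `budgets` raises `TypeError`, which is one way a runtime could refuse mid-episode mutation.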
Agent ──▶ Runner ──▶ Artifact (.json)
                      ├─▶ Baseline bundle (manifest/tool_calls/validator)
                      └─▶ FastAPI Dashboard & APIs
- Artifacts are immutable and portable. CI, dashboards, and external services all read the same JSON.
- Bundles add integrity hashes so third parties can verify provenance without running the CLI.
- Because the schema is language-neutral, another team could build a Rust or TypeScript runner that emits the same format and remain TraceCore-compliant.
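
As a rough sketch of how an integrity hash lets a third party verify an artifact without the CLI, the snippet below hashes a canonical JSON encoding. The choice of SHA-256, the canonicalization rules (sorted keys, compact separators), and the function names are assumptions made for illustration; the actual bundle format is whatever the spec defines.

```python
import hashlib
import hmac
import json


def canonical_bytes(artifact: dict) -> bytes:
    # Assumed canonical form: sorted keys and compact separators, so the same
    # logical artifact always serializes to the same bytes.
    text = json.dumps(artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def artifact_digest(artifact: dict) -> str:
    # SHA-256 is an assumption here, not something the source specifies.
    return hashlib.sha256(canonical_bytes(artifact)).hexdigest()


def verify_artifact(artifact: dict, expected_digest: str) -> bool:
    # Constant-time comparison of the recomputed digest with the published one.
    return hmac.compare_digest(artifact_digest(artifact), expected_digest)
```

Any runner, in any language, that reproduces the same canonical encoding would arrive at the same digest, which is what makes the check portable.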
| Layer | Reference behavior | Spec requirement |
|---|---|---|
| CLI (`tracecore`, `agent-bench`) | Provides user-friendly commands, convenience flags, dashboard launchers. | Optional. Other runtimes can expose different CLIs if artifacts/spec are satisfied. |
| Runner | Enforces budgets, sandbox rules, validator lifecycle, and emits termination taxonomy. | Required. Any compliant runtime must implement the same lifecycle. |
| Artifact serializer | Uses Pydantic/typing helpers to guarantee schema compliance. | Required output format, implementation details optional. |
| Dashboard / APIs | FastAPI UI driven entirely by stored artifacts. | Optional; showcases how artifacts unlock replay + diffs. |
- `tracecore run --timeout` and `tracecore run batch --timeout` both enforce wall-clock limits via `agent_bench.runner.isolation.run_isolated()` rather than in-process thread joins or platform-specific signals.
- The timeout manager runs the episode in a spawned child process, waits on a result queue, and kills the child if the wall-clock budget is exceeded.
- This keeps timeout behavior cross-platform and prevents hung agent/task code from leaving the parent CLI or batch worker in a partially blocked state.
- Timeout exits remain structured: single-run CLI paths raise a non-zero exit with the existing timeout message, while batch jobs return `_ok=False` payloads that preserve timeout failures in summaries.
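
The general idea of enforcing a wall-clock budget from a parent process can be sketched as follows. This is a simplification, not how `run_isolated()` is implemented: it uses `subprocess` with JSON on stdout in place of a spawned `multiprocessing` child and a result queue. The function name and every payload key other than `_ok` are hypothetical.

```python
import json
import subprocess
import sys


def run_with_wall_clock_limit(episode_source: str, timeout_s: float) -> dict:
    """Run Python source in a fresh interpreter under a wall-clock budget."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", episode_source],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        # The parent stays responsive and reports a structured failure.
        return {"_ok": False, "error": "timeout", "detail": f"exceeded {timeout_s}s"}
    if proc.returncode != 0:
        return {"_ok": False, "error": "crash", "detail": proc.stderr.strip()}
    # Assumption: the child prints exactly one JSON document as its result.
    return {"_ok": True, "result": json.loads(proc.stdout)}
```

Because the episode lives in its own process, a hung agent can be killed outright instead of being abandoned in a thread the parent cannot stop.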
To create a new implementation (e.g., Rust service, JS agent host):
- Consume `/spec/tracecore-spec-v0.1.md` for lifecycle and compliance rules.
- Validate artifacts against `/spec/artifact-schema-v0.1.json`.
- Follow `/spec/determinism.md` for seeded runs, mocks, and model pinning.
- Use `/spec/compliance-checklist-v0.1.md` to gate releases or CI runs.
- Publish the runtime identity (name + version) and declared spec version inside every artifact.
If all artifacts validate and the lifecycle is honored, the implementation is considered spec-compliant even if it shares zero code with this repository.
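
For the last step in the list above, a third-party runtime might stamp its identity onto each artifact before writing it out. The sketch below is illustrative only: `stamp_artifact` is a hypothetical helper, and the nested `name`/`version` layout of `runtime_identity` is an assumption; the authoritative field shapes are defined by `/spec/artifact-schema-v0.1.json`.

```python
import copy


def stamp_artifact(artifact: dict, runtime_name: str,
                   runtime_version: str, spec_version: str) -> dict:
    # Work on a copy so the caller's in-memory run record is left untouched.
    stamped = copy.deepcopy(artifact)
    # Assumed layout: the schema, not this sketch, defines the real structure.
    stamped["runtime_identity"] = {"name": runtime_name, "version": runtime_version}
    stamped["spec_version"] = spec_version
    return stamped
```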
- `tracecore run --strict-spec` validates artifacts post-run against `/spec/artifact-schema-v0.1.json`, checks required metadata (`spec_version`, `runtime_identity`, `task_hash`, `artifact_hash`), and verifies `failure_type` is within the canonical taxonomy. Exits non-zero on any violation.
- Every run artifact now embeds `spec_version`, `runtime_identity`, `task_hash`, `agent_ref`, `budgets`, and `artifact_hash` automatically; compliance checks are additive post-run assertions, not separate execution modes.
- CI workflows can gate merges with `agent-bench run --agent ... --task ... --strict-spec`.
- Future deliverables may include hosted validators that accept artifacts over HTTPS and respond with compliance reports, leveraging `/spec/artifact-schema-v0.1.json` as the portable contract.
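
The post-run assertions described above could look roughly like the following. This is a minimal sketch, not the `--strict-spec` implementation: it omits full JSON Schema validation, the taxonomy set contains only the three values named earlier in this document (the spec defines the complete list), and treating a missing `failure_type` as acceptable is an assumption.

```python
REQUIRED_METADATA = ("spec_version", "runtime_identity", "task_hash", "artifact_hash")

# Partial set: only the values this document names. The spec holds the full list.
KNOWN_TAXONOMY = frozenset({"success", "budget_exhausted", "logic_failure"})


def strict_spec_violations(artifact: dict) -> list[str]:
    """Return human-readable violations; an empty list means the checks passed."""
    violations = [
        f"missing required metadata: {key}"
        for key in REQUIRED_METADATA
        if not artifact.get(key)  # absent or empty values both count as missing
    ]
    failure_type = artifact.get("failure_type")
    # Assumption: an absent failure_type is allowed; a present one must be known.
    if failure_type is not None and failure_type not in KNOWN_TAXONOMY:
        violations.append(f"failure_type outside taxonomy: {failure_type!r}")
    return violations
```

A CI gate could then exit non-zero whenever the returned list is non-empty, mirroring the behavior described for `--strict-spec`.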