Use this checklist before tagging or announcing a public BioProtocolBench snapshot.
- Keep the public benchmark name as BioProtocolBench and the v0.1.x installable distribution name as `labcraft`.
- Treat direct `src.*` imports as internal compatibility paths for v0.1.x. Avoid introducing a second public import namespace in a patch release.
- Keep multi-task execution on `scripts/run_portfolio_eval.sh` presets: `snapshot`, `current`, `discovery`, `safety_case`, and `all`.
- Keep `labcraft_suite()` as a single-task smoke alias unless a future breaking release introduces a real cross-task Inspect orchestration layer.
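- Run the full test suite with `uv run pytest`.
- Run the citation, scope-compliance, and Inspect task tests on their own with `uv run pytest tests/test_citations.py tests/test_scope_compliance.py tests/test_inspect_task.py`.
- Confirm CITATION.cff has the intended version and release date.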
- Confirm README.md, NOTICE, LICENSE, and LICENSE-DATA describe the same licensing split.
- Confirm pyproject.toml metadata points to the current repository and issue tracker.
- Include the commit SHA and log/result directory when reporting benchmark numbers.
- Frozen snapshot results should stay tied to `results/logs`, `results/results.md`, and the top-level scorecard plots.
- Newer wet-lab task bundles should remain in their `results/current_*` directories unless intentionally promoted.
- Discovery Decision Track bundles should remain in `results/discovery_*`.
- Do not overwrite existing `.eval` logs when extending a seed range; use `SEED_START` and a separate `LOG_DIR` when needed.