This document outlines the versioning strategy and release process for τ²-bench.
We follow Semantic Versioning (SemVer) with the format MAJOR.MINOR.PATCH:
- MAJOR: Incompatible API changes, breaking changes to domain policies, or fundamental architecture changes
- MINOR: New features, new domains, backwards-compatible functionality additions
- PATCH: Bug fixes, documentation updates, performance improvements
Examples:
- 1.0.0 → 1.0.1: Bug fix in evaluation metrics
- 1.0.0 → 1.1.0: New domain added (e.g., healthcare)
- 1.0.0 → 2.0.0: Breaking API change in agent interface
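The bump rules above can be sketched in a few lines of Python. The `bump` helper is hypothetical (it is not part of the tau2 codebase) and accepts only plain `MAJOR.MINOR.PATCH` strings without pre-release suffixes.

```python
def bump(version: str, part: str) -> str:
    """Return the version that follows `version` for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: MINOR and PATCH reset to 0
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # backwards-compatible feature: PATCH resets to 0
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown part: {part!r}")
```

For instance, `bump("1.0.0", "minor")` returns `"1.1.0"`, matching the "new domain added" example.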

Major releases (x.0.0):
- Significant architectural changes
- Breaking changes to APIs
- Major new capabilities
- Released when breaking changes accumulate (typically annually)

Minor releases (x.y.0):
- New domains or evaluation scenarios
- New CLI commands or major features
- Backwards-compatible API extensions
- Released as needed (every few months based on feature readiness)

Patch releases (x.y.z):
- Bug fixes and security patches
- Documentation improvements
- Performance optimizations
- Released as needed (typically within days of fixes)
We use automated releases via Release Please for consistency and efficiency, with manual releases as a fallback option.
Our GitHub Actions workflow automatically handles releases when conventional commits are pushed to main:
1. Development with Conventional Commits
   ```shell
   git commit -m "feat(domains): add healthcare domain"
   git commit -m "fix(cli): resolve Unicode handling"
   git commit -m "feat!: redesign agent API (BREAKING CHANGE)"
   ```
2. Automatic Release PR Creation
   - Release Please analyzes commits since the last release
   - Creates/updates a release PR with:
     - Updated `CHANGELOG.md`
     - Version bump in `pyproject.toml`
     - Generated release notes
3. Review and Merge
   - Review the automated release PR
   - Merge when ready to release
   - The GitHub release is created automatically
4. Optional: Manual Release Notes Enhancement
   - Update `RELEASE_NOTES.md` with user-friendly content
   - Add migration guides for breaking changes
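To make the mapping from commit messages to version bumps concrete, here is a simplified sketch of the classification Release Please performs. It follows the Conventional Commits rules (`fix` → PATCH, `feat` → MINOR, `!` or a `BREAKING CHANGE:` footer → MAJOR). Treating `perf` and `docs` as PATCH and ignoring all other types is an assumption drawn from the SemVer section of this document, not Release Please's exact logic; the function names are hypothetical.

```python
import re
from typing import Optional

# Conventional Commits header: type(scope)!: description
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]*)\))?(?P<bang>!)?: (?P<desc>.+)$")
# Footer that marks a breaking change anywhere in the commit body
BREAKING_FOOTER = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)

# Assumption: perf and docs trigger a PATCH release, mirroring this document's SemVer rules
PATCH_TYPES = {"fix", "perf", "docs"}
RANK = {"patch": 1, "minor": 2, "major": 3}


def bump_for_commit(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None (no release) for one commit message."""
    header, _, body = message.partition("\n")
    match = HEADER.match(header)
    if match is None:
        return None  # not a conventional commit, so it does not affect the version
    if match.group("bang") or BREAKING_FOOTER.search(body):
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") in PATCH_TYPES:
        return "patch"
    return None  # chore, ci, test, ...: no release on their own


def bump_for_release(messages: list[str]) -> Optional[str]:
    """A release takes the largest bump required by any commit since the last release."""
    bumps = [b for b in map(bump_for_commit, messages) if b is not None]
    return max(bumps, key=RANK.__getitem__, default=None)
```

Applied to the three example commits from step 1, `bump_for_release` yields `"major"` because of the `feat!` commit.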
For urgent releases or when automation isn't available:
1. Pre-Release Testing
   ```shell
   make test                              # Run full test suite
   tau2 run --domain mock --num-tasks 1   # Quick integration test
   ```
2. Update Files
   - Update `version` in `pyproject.toml`
   - Add an entry to `CHANGELOG.md`
   - Update `RELEASE_NOTES.md`
3. Create Release
   ```shell
   git add .
   git commit -m "chore: prepare release v1.1.0"
   git tag -a v1.1.0 -m "Release version 1.1.0"
   git push origin main
   git push origin v1.1.0
   ```
4. Create GitHub Release
   - Use content from `RELEASE_NOTES.md`
   - Attach any relevant artifacts
5. Automated Actions (handled by workflow)
   - GitHub release creation
   - Package building (ready for PyPI)
   - Tag creation
6. Manual Follow-up
   - Update `RELEASE_NOTES.md` with user-friendly content
   - Update the leaderboard at tau-bench.com if needed
   - Social media announcements for major releases
   - Blog posts for significant features
7. Optional: PyPI Publishing
   ```shell
   # Uncomment the PyPI section in .github/workflows/release.yml
   # Add PYPI_API_TOKEN to GitHub secrets
   ```
Our Release Please workflow automatically generates CHANGELOG.md entries from conventional commits:
| Commit Type | Changelog Section | Example |
|---|---|---|
| `feat:` | Added | `feat(domains): add healthcare domain` |
| `fix:` | Fixed | `fix(cli): resolve Unicode handling` |
| `perf:` | Performance | `perf: optimize concurrent execution` |
| `docs:` | Documentation | `docs: update installation guide` |
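The table amounts to a lookup from commit type to changelog section. A minimal sketch of that grouping step follows; `render_changelog` is hypothetical, and the real Release Please output is formatted differently.

```python
import re

# Commit type -> CHANGELOG.md section, as in the table above
SECTIONS = {"feat": "Added", "fix": "Fixed", "perf": "Performance", "docs": "Documentation"}
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?: (?P<desc>.+)$")


def render_changelog(version: str, date: str, messages: list[str]) -> str:
    """Group conventional commit headers into changelog sections for one release."""
    grouped: dict[str, list[str]] = {section: [] for section in SECTIONS.values()}
    for message in messages:
        match = HEADER.match(message.split("\n", 1)[0])
        if match and match.group("type") in SECTIONS:
            grouped[SECTIONS[match.group("type")]].append(match.group("desc"))
        # other types (chore, ci, ...) are hidden from the changelog
    lines = [f"## [{version}] - {date}"]
    for section, entries in grouped.items():  # table order; empty sections skipped
        if entries:
            lines.append(f"### {section}")
            lines.extend(f"- {desc}" for desc in entries)
    return "\n".join(lines)
```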
When manually updating CHANGELOG.md, use these standardized categories:
- Added: New features, domains, or capabilities
- Changed: Changes in existing functionality
- Deprecated: Soon-to-be removed features
- Removed: Features removed in this version
- Fixed: Bug fixes
- Security: Vulnerability fixes
- Performance: Performance improvements
Example entry:

```markdown
## [1.1.0] - 2025-02-15
### Added
- New healthcare domain with 50 evaluation tasks
- Support for streaming responses in CLI
- Agent performance visualization dashboard
### Changed
- Improved error messages in submission validation
- Updated default LLM timeout from 30s to 60s
### Fixed
- Fixed memory leak in concurrent evaluations
- Resolved issue with Unicode characters in task descriptions
```

For ongoing development, we use a `-dev` suffix in `pyproject.toml`:
- Development (`0.2.1-dev`): active development after the v0.2.0 release
- Pre-release (`0.3.0-alpha.1`): early testing (manual release)
- Release Candidate (`0.3.0-rc.1`): final testing before release

Guidelines:
- Update to `x.y.(z+1)-dev` immediately after releasing `x.y.z` (e.g., `0.2.1-dev` after v0.2.0)
- Use conventional commits during development
- Let Release Please handle version bumping for releases
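The suffix scheme can be checked mechanically. The sketch below (hypothetical helpers) classifies the suffixes listed above and computes the next development version after a release. Note that packaging tools following PEP 440 normalize these strings, e.g. `0.2.1-dev` to `0.2.1.dev0`, `0.3.0-alpha.1` to `0.3.0a1`, and `0.3.0-rc.1` to `0.3.0rc1`.

```python
import re

# MAJOR.MINOR.PATCH with an optional -dev, -alpha.N or -rc.N suffix
VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(?P<kind>dev|alpha|rc)(?:\.\d+)?)?$")
STAGES = {None: "release", "dev": "development", "alpha": "pre-release", "rc": "release candidate"}


def stage(version: str) -> str:
    """Classify a version string by its suffix."""
    match = VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return STAGES[match.group("kind")]


def next_dev_version(released: str) -> str:
    """After releasing x.y.z, development continues on x.y.(z+1)-dev."""
    match = VERSION.match(released)
    if match is None or match.group("kind") is not None:
        raise ValueError(f"expected a final release such as 0.2.0, got {released!r}")
    major, minor, patch = (int(g) for g in match.groups()[:3])
    return f"{major}.{minor}.{patch + 1}-dev"
```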

Supported versions:
- Major versions (x.0.0) receive security updates for 1 year
- Critical bug fixes backported to last 2 minor versions
- Security patches backported to all supported versions

Backport criteria:
- Security vulnerabilities: Always backported
- Critical bugs: Backported to supported versions
- Data corruption issues: Immediate backport
- Performance regressions: Case-by-case basis
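Which release lines a backport lands on can be computed from the list of published versions. This sketch assumes "last 2 minor versions" means the two newest `MAJOR.MINOR` lines and returns the latest patch release in each; the backported fix would then ship as the next patch on top of those. `backport_targets` is hypothetical.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def backport_targets(released: list[str], minor_lines: int = 2) -> list[str]:
    """Latest patch release of each of the newest `minor_lines` MAJOR.MINOR lines."""
    latest: dict[tuple[int, int], tuple[int, int, int]] = {}
    for version in map(parse, released):
        line = version[:2]  # (major, minor)
        if line not in latest or version > latest[line]:
            latest[line] = version
    newest = sorted(latest, reverse=True)[:minor_lines]
    return [".".join(map(str, latest[line])) for line in newest]
```

With releases `1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1`, a critical fix would target `1.2.1` and `1.1.0`, i.e. ship as `1.2.2` and `1.1.1`.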

Deprecation process:
- Announcement: Feature marked as deprecated
- Warning Period: 2 minor versions with warnings
- Removal: Remove in next major version

When deprecating a feature:
- Add deprecation warnings to code
- Document in CHANGELOG.md
- Include migration guide in RELEASE_NOTES.md
- Announce in GitHub discussions
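For the "add deprecation warnings to code" step, a common Python pattern is a decorator that emits `DeprecationWarning` through the standard `warnings` module. The `deprecated` decorator, `legacy_run_tasks`, and `run_tasks` below are hypothetical illustrations; the removal version follows the rule that deprecated features are removed in the next major version.

```python
import functools
import warnings
from typing import Callable, Optional


def deprecated(since: str, alternative: Optional[str] = None) -> Callable:
    """Mark a function as deprecated in `since`; it is removed in the next major version."""
    removal = f"{int(since.split('.')[0]) + 1}.0.0"

    def decorator(func: Callable) -> Callable:
        message = f"{func.__name__} is deprecated since {since} and will be removed in {removal}."
        if alternative:
            message += f" Use {alternative} instead."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to this wrapper
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical example: an old entry point kept alive during the warning period
@deprecated(since="1.1.0", alternative="run_tasks")
def legacy_run_tasks(num_tasks: int) -> int:
    return num_tasks
```

Python ignores `DeprecationWarning` by default unless it is triggered from `__main__`, so users typically see these warnings in test runs (pytest reports them) or with `python -W default`.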

Tooling:
- Conventional Commits: Standardize commit messages
- Release Please: Automate changelog generation
- Semantic Release: Automatic version bumping
- GitHub Actions: Automate testing and releases
Our actual workflow (`.github/workflows/release.yml`):

```yaml
name: Release

on:
  push:
    branches: [main]

permissions:
  contents: write
  pull-requests: write

jobs:
  release-please:
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/release-please-action@v4
        with:
          release-type: python
          package-name: tau2
          version-file: pyproject.toml
          include-v-in-tag: true
          changelog-types: |
            [
              {"type":"feat","section":"Added","hidden":false},
              {"type":"fix","section":"Fixed","hidden":false},
              {"type":"perf","section":"Performance","hidden":false},
              {"type":"docs","section":"Documentation","hidden":false}
            ]
```

This workflow:
- ✅ Automatically creates release PRs
- ✅ Generates the changelog from conventional commits
- ✅ Handles version bumping in `pyproject.toml`
- ✅ Creates GitHub releases
- 🔲 PyPI publishing (ready but commented out)
For critical security or data corruption issues:
- Immediate Response: Fix on main branch
- Fast Track: Skip normal review process
- Hotfix Release: Increment patch version
- Communication: Immediate notification to users
- Post-Mortem: Document incident and prevention
This versioning strategy ensures predictable, reliable releases while maintaining backwards compatibility and clear communication with users.