We welcome contributions from the community. TwinBench is meant to be approachable to runtime builders, researchers, and advanced indie teams, not just internal operators.
The most impactful contribution is running TwinBench against your runtime and submitting verified results.
- Run the harness:
python3.10 -m harness.runner --url YOUR_URL --token YOUR_TOKEN --user-id 1 --name "Your Runtime" --output results/your-runtime.json --markdown results/your-runtime.md --html results/your-runtime.html - Open an issue using the Submit Results template
- Attach your
results/your-runtime.json
We publish all verified results in the leaderboard.
Required v0.2 fields for leaderboard eligibility:
verified_composite_scoreprojected_composite_scoremeasured_coveragecoverage_adjusted_verified_score
Recommended submission attachments:
results/your-runtime.mdorresults/your-runtime.html- Runtime version and commit SHA
- Harness commit SHA
/internal/diagnosticssnapshot/metricssnapshot when available- brief incident note if the runtime, network, or model provider degraded during the run
- Optional run manifest based on
docs/run-manifest-v0.2.example.json
Please also read docs/TRUST_MODEL.md before submitting. The goal is not just a high score; it is a result other builders will trust.
If you are new to the repo, start with:
docs/GETTING_STARTED.mddocs/PREFLIGHT.mddocs/HOW_TO_SUBMIT.md
If you believe an important personal AI assistant runtime capability is not covered by the current 10 dimensions, open an issue with:
- What the dimension measures
- Why it matters for persistent personal AI assistant runtimes
- A proposed test protocol
- How it would be scored
Bug fixes, better test coverage, and improved scoring logic are welcome.
- Fork the repo
- Create a feature branch
- Submit a PR with a clear description
- Include test output showing the change works
Weight discussions are welcome. Open an issue with:
- Which weights you propose changing
- Your rationale
- How it affects the leaderboard
We treat weight calibration as a community decision, not an editorial one.
- Python 3.10+
- Type hints where practical
- No external dependencies beyond
requirements.txt - Keep scripts self-contained (one file per dimension)
By contributing, you agree that your contributions will be licensed under Apache-2.0.