A curated list of awesome projects in the Harbor ecosystem.
- terminal-bench-2 - Measures agent ability to complete tasks in a terminal
- terminal-bench-pro - Extension of terminal-bench by Alibaba
- skillsbench - Measures agent ability to use skills
- otel-bench - Measures agent ability to instrument code with OpenTelemetry across multiple languages
- CompileBench - Measures agent ability to build a working binary from source
- harbor-datasets - Popular benchmarks (e.g. SWE-bench Verified) ported to run in Harbor
- SWE-gen-Java - 1000 JVM tasks generated from 16 open-source GitHub repos using SWE-gen
- SWE-gen-JS - 1000 JS/TS tasks generated from 30 open-source GitHub repos using SWE-gen
- SWE-gen-Rust - 1000 Rust SWE tasks generated using SWE-gen
- SWE-gen-Go - 1000 Go SWE tasks generated using SWE-gen
- SWE-gen-Cpp - 1000 C++ SWE tasks generated using SWE-gen
- Nemotron-Terminal-Synthetic-Tasks - Synthetic terminal tasks by NVIDIA
- seta-env - Scaling Environments for Terminal Agents: fully automated Harbor task synthesis and verification
- OpenThoughts-Agent - Generates Harbor tasks, distills trajectories with SFT, and trains with SkyRL
- endless-terminals - Procedurally generates terminal-use tasks and trains terminal agents with SkyRL
- Ares - Framework for online RL training of LLM agents, built on Harbor and SkyRL
- SkyRL Harbor Integration - Guide for RL training of agents with SkyRL and Harbor
- harbor-bot - GitHub bot automating QA on Harbor tasks
- Benchmark Template - Template for building benchmarks on Harbor with automated QA in CI
- SWE-gen - Converts GitHub PRs into Harbor tasks
- Oddish - Eval scheduler for running Harbor tasks with provider-aware queuing and automatic retries
- TerminalBenchTaskGenerator - Desktop app for chat-driven authoring of Harbor benchmark tasks
Contributions welcome! Open a PR to add a project you have created or love using.