Open issues in aallan/vera-bench

#80  Track Harness/Codecov acquisition impact on vera-bench CI (labels: infrastructure, quality)
#72  [title missing]
#61  Prompt caching for other providers (OpenAI instrumentation, Moonshot Context Caching) (labels: enhancement)
#49  Add MoonBit as a comparison language (labels: languages)
#45  Refactor models.py to a provider registry (labels: enhancement)
#31  Automated scheduled benchmark runs with structured storage (labels: infrastructure)
#30  Results dashboard (GitHub Pages or veralang.dev) (labels: publishing)
#29  Multi-turn and agentic evaluation modes (labels: evaluation)
#27  Hugging Face dataset export (labels: publishing)
#26  Generate paper-quality figures (matplotlib/seaborn) (labels: analysis)
#25  Expand to 75+ problems (15 per tier) (labels: problems)
#24  Run benchmark against multiple models (Opus, GPT-4o, DeepSeek, Gemini) (labels: evaluation)

Labels
infrastructure: Build, CI, tooling, storage
quality: Code quality, testing, coverage
enhancement: New feature or request
languages: Language support (Python, TypeScript, Go, etc.)
publishing: Paper, dataset export, citation, dashboard
evaluation: Benchmark evaluation modes and model runs
analysis: Figures, reports, data analysis
problems: Problem definitions and canonical solutions