feat: add Every Eval Ever format to batch publish #39

Open
elronbandel wants to merge 1 commit into main from batch-publish-eee

@elronbandel

Contributor

Summary

  • Extends exgentic batch publish with --format eee to generate results in the Every Eval Ever schema format (v0.2.2)
  • Supports --output-dir for saving EEE JSON files locally (for submitting PRs to the EEE datastore)
  • Supports --repo for pushing EEE data to HuggingFace
  • Includes agent metadata (name, framework), cost details, and uncertainty from session counts
  • Validated against the official EEE pydantic schema
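
The "uncertainty from session counts" item could be sketched roughly as below. This is an illustration only, not the code in this PR: it assumes the reported score is a success rate over independent sessions and uses the binomial standard error. The function name and the returned keys (`score`, `standard_error`, `num_samples`) are hypothetical and are not taken from the EEE schema.

```python
import math


def success_rate_uncertainty(successes: int, sessions: int) -> dict:
    """Estimate a success rate and its standard error from session counts.

    Hypothetical helper: the key names below are illustrative, not EEE fields.
    """
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    if not 0 <= successes <= sessions:
        raise ValueError("successes must be between 0 and sessions")

    rate = successes / sessions
    # Binomial standard error of a proportion: sqrt(p * (1 - p) / n).
    standard_error = math.sqrt(rate * (1.0 - rate) / sessions)
    return {
        "score": rate,
        "standard_error": standard_error,
        "num_samples": sessions,
    }
```

With few sessions the standard error is large, so it is worth storing the session count next to the score, as this sketch does.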

Usage

# Save EEE JSON files locally
exgentic batch publish \
  --config '/path/to/*/config.json' \
  --format eee \
  --output-dir data/exgentic-results

# Push to HF in EEE format
exgentic batch publish \
  --config '/path/to/*/config.json' \
  --format eee \
  --repo Exgentic/eee-results

# Original exgentic format (unchanged)
exgentic batch publish \
  --config '/path/to/*/config.json' \
  --repo Exgentic/open-agent-leaderboard-results

Test plan

  • Generated EEE JSON files pass official EvaluationLog.model_validate()
  • Tested with appworld and swebench configs; developer and model are extracted correctly
  • Pre-commit checks pass
  • Test full 77-config publish in EEE format
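
One possible way to run the schema check from the test plan over a whole `--output-dir` is sketched below. It assumes the directory contains only `.json` result files. The helper name `validate_json_dir` is hypothetical, and the validator is passed in as a callable because the import path of `EvaluationLog` is not shown in this PR.

```python
import json
from pathlib import Path
from typing import Callable


def validate_json_dir(
    output_dir: str, validate: Callable[[dict], object]
) -> dict[str, str]:
    """Run `validate` on every JSON file under `output_dir`.

    Returns a mapping from file path to error message for each file that
    failed to parse or validate. An empty mapping means every file passed.
    """
    failures: dict[str, str] = {}
    for path in sorted(Path(output_dir).rglob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
            validate(payload)
        except Exception as exc:  # collect all failures instead of stopping
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures
```

Passing `EvaluationLog.model_validate` as `validate` would apply the pydantic schema to each file, for example `validate_json_dir("data/exgentic-results", EvaluationLog.model_validate)`.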

Extends `exgentic batch publish` with `--format eee` option to generate
evaluation results in the Every Eval Ever schema format (v0.2.2).

- Converts exgentic results to EEE EvaluationLog JSON files
- Supports `--output-dir` for saving files locally (for PR submissions)
- Supports `--repo` for pushing EEE data to HuggingFace
- Validated against the official EEE pydantic schema
- Includes agent metadata, cost details, and uncertainty from session counts