Add experimental live endpoint collection helper #137
📝 Walkthrough
This pull request adds experimental live collection functionality to the adapter SDK. It introduces configuration models, input loading from CSV/JSONL, orchestrated response collection from OpenAI-compatible endpoints with retry logic, and atomic JSONL output with manifest metadata. Six symbols are exported from the adapter module.
Changes: Live endpoint collection feature
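As a rough illustration of the CSV/JSONL input-loading step mentioned in the walkthrough, the sketch below picks a parser from the file suffix and yields one dict per row. The function name `iter_input_rows` and the suffix-based dispatch are assumptions made for this example; the PR's own `load_input_rows` may behave differently.

```python
import csv
import json
from pathlib import Path
from typing import Any, Iterator


def iter_input_rows(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one dict per input row (hypothetical helper, not the PR's code)."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the header row as the keys of each dict.
            for row in csv.DictReader(handle):
                yield dict(row)
    elif suffix == ".jsonl":
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                if not isinstance(record, dict):
                    raise ValueError(f"{path}:{line_number}: expected a JSON object")
                yield record
    else:
        raise ValueError(f"Unsupported input format: {suffix!r}")
```

Note that CSV values always arrive as strings, while JSONL values keep their JSON types, so any downstream validation of the question field has to handle both.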
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/evalhub/adapter/live_collection.py`:
- Around line 198-205: The current code in live_collection.py stringifies
non-string values with question = str(row.get(config.question_field,
"")).strip(), which turns None and numbers into valid prompts; change validation
so only actual strings are accepted: read raw_value =
row.get(config.question_field), check isinstance(raw_value, str) and then
perform .strip() to produce question, otherwise set output["error"] =
{"type":"missing_question", "message": f"Missing or empty question field:
{config.question_field}"} and yield output then continue (preserve existing
output variable and flow).
- Around line 93-123: Add a model-level validation on LiveCollectionConfig to
reject identical input_path and output_path: implement a
`@model_validator(mode="after")` method named, e.g., validate_paths that compares
self.input_path.resolve() and self.output_path.resolve() (and raises
ValueError("input_path and output_path must be different") if they are equal) so
the config cannot be created when input and output point to the same file;
reference LiveCollectionConfig, input_path, output_path and manifest_path to
locate where to add the validator.
- Around line 86-90: The api_key method currently returns None when api_key_env
is set but the environment variable is missing/empty; change api_key (and use
the api_key_env attribute) to fail fast by checking os.getenv(self.api_key_env),
trimming whitespace, and raising a clear configuration error (e.g., ValueError
or RuntimeError) if the value is None or blank so callers never send
unauthenticated requests silently; otherwise return the non-empty string.
In `@tests/unit/test_live_collection.py`:
- Around line 9-17: Add a module-level pytest marker to this test file by
defining pytestmark = pytest.mark.unit at the top of
tests/unit/test_live_collection.py (the file already imports pytest), so the
module is consistently selectable in CI; place the assignment near the existing
imports (which include LiveCollectionConfig, LiveEndpointConfig,
load_input_rows, load_live_collection_config, run_live_collection) so no
per-test decorators are required.
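
The three `live_collection.py` findings above can be sketched in isolation as follows. This is a dependency-free illustration under stated assumptions, not the code in the PR: `extract_question`, `missing_question_error`, `CollectionPaths`, and `require_api_key` are hypothetical names, and a plain dataclass with `__post_init__` stands in for the pydantic `@model_validator(mode="after")` the review suggests.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


def extract_question(row: dict[str, Any], question_field: str) -> Optional[str]:
    """Return the stripped question, or None unless the raw value is a non-empty string."""
    raw_value = row.get(question_field)
    if not isinstance(raw_value, str):
        return None  # None, numbers, lists, etc. are not stringified into prompts
    return raw_value.strip() or None


def missing_question_error(question_field: str) -> dict[str, str]:
    """Error payload in the shape quoted in the review comment."""
    return {
        "type": "missing_question",
        "message": f"Missing or empty question field: {question_field}",
    }


@dataclass(frozen=True)
class CollectionPaths:
    """Hypothetical stand-in for the path fields on LiveCollectionConfig."""

    input_path: Path
    output_path: Path

    def __post_init__(self) -> None:
        # resolve() normalizes relative segments so "a.jsonl" and "./a.jsonl" compare equal.
        if self.input_path.resolve() == self.output_path.resolve():
            raise ValueError("input_path and output_path must be different")


def require_api_key(api_key_env: Optional[str]) -> Optional[str]:
    """Fail fast when an env var name is configured but its value is missing or blank."""
    if api_key_env is None:
        return None  # no key configured: the caller chose unauthenticated requests
    value = (os.getenv(api_key_env) or "").strip()
    if not value:
        raise ValueError(f"Environment variable {api_key_env} is not set or is empty")
    return value
```

In a collection loop, a `None` result from `extract_question` would lead to setting `output["error"] = missing_question_error(config.question_field)`, yielding the row, and continuing, as the first finding describes.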
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f52b2cd3-53e2-4ef6-b051-6e1254a263f0
📒 Files selected for processing (3)
- src/evalhub/adapter/__init__.py
- src/evalhub/adapter/live_collection.py
- tests/unit/test_live_collection.py
Addressed the CodeRabbit review in aa6d48e:
Validated locally with ruff format/check, pytest for tests/unit/test_live_collection.py (13 passed), mypy on the changed files, git diff --check, and a Claude Code blocker-only review of the follow-up diff.
What and why
Refs #135.
This adds an experimental adapter-side live endpoint collection helper for the SDK-helper-first path discussed in the issue. It lets adapters collect model responses during loading by reading CSV/JSONL question rows, calling an OpenAI-compatible `/chat/completions` endpoint, and writing normalized JSONL rows plus a small manifest.
The helper preserves source fields and adds `response`, `response_metadata`, and `error` fields. Per-row endpoint failures are captured in the output, while configuration hazards fail early.
Security/trust-boundary choices:
- `api_key_env` requires an `https://` base URL
Type
feat
Testing
- `uv run ruff format src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py`
- `uv run ruff check src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py`
- `uv run pytest tests/unit/test_live_collection.py -q` (13 passed)
- `uv run mypy --config-file=pyproject.toml src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py`
- `git diff --check`
Breaking changes
None. The helper is new experimental adapter SDK surface.
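
To make the output shape from "What and why" concrete, here is a minimal sketch of a normalized row and an atomic JSONL write. Both function names are hypothetical, and the temp-file-plus-`os.replace` approach is one common way to get atomic output, assumed here rather than taken from the PR.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Iterable, Optional


def build_output_row(
    source: dict[str, Any],
    response: Optional[str],
    response_metadata: Optional[dict[str, Any]],
    error: Optional[dict[str, str]],
) -> dict[str, Any]:
    """Copy the source fields, then add the three collection fields."""
    row = dict(source)  # preserve source fields; same-named keys are overwritten below
    row["response"] = response
    row["response_metadata"] = response_metadata
    row["error"] = error  # per-row failures are recorded here instead of raised
    return row


def write_jsonl_atomically(rows: Iterable[dict[str, Any]], output_path: Path) -> int:
    """Write rows to a temp file in the target directory, then rename it into place."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(output_path.parent), prefix=output_path.name + ".", suffix=".tmp"
    )
    count = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row, ensure_ascii=False) + "\n")
                count += 1
        # os.replace is atomic when source and destination share a filesystem.
        os.replace(tmp_name, output_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a partial temp file behind
        raise
    return count
```

Creating the temp file in the same directory as the target keeps the final rename on one filesystem, so readers see either the previous file or the complete new one.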
Summary by CodeRabbit