Add experimental live endpoint collection helper by ai-hustle-bro · Pull Request #137 · eval-hub/eval-hub-sdk

ai-hustle-bro · 2026-06-04T03:43:30Z

What and why

Refs #135.

This adds an experimental adapter-side live endpoint collection helper for the SDK-helper-first path discussed in the issue. It lets adapters collect model responses during loading by reading CSV/JSONL question rows, calling an OpenAI-compatible /chat/completions endpoint, and writing normalized JSONL rows plus a small manifest.

The helper preserves source fields and adds response, response_metadata, and error fields. Per-row endpoint failures are captured in the output, while configuration hazards fail early.

Security/trust-boundary choices:

api_key_env requires an https:// base URL
bearer tokens are read from environment variables, not job JSON
missing or blank API key environment variables fail before any request is sent
redirects are not followed, so auth headers are not forwarded to redirect targets
input and output paths cannot point to the same file
this is intentionally adapter/operator-controlled config; hosted runtimes that accept this config from untrusted users should add their own egress policy or endpoint allowlist before enabling it

Type

feat

Testing

uv run ruff format src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py
uv run ruff check src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py
uv run pytest tests/unit/test_live_collection.py -q (13 passed)
uv run mypy --config-file=pyproject.toml src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py
git diff --check
Claude Code CLI blocker-only review of the follow-up diff: no blockers

Breaking changes

None. The helper is new experimental adapter SDK surface.

Summary by CodeRabbit

New Features
- Added experimental live endpoint collection functionality to gather responses from OpenAI-compatible chat completions endpoints
- Supports configuration loading and batch processing of input data (CSV/JSONL formats)
- Automatically collects responses, metadata, and error details with retry support and manifest generation

coderabbitai · 2026-06-04T03:43:50Z

📝 Walkthrough

Walkthrough

This pull request adds experimental live collection functionality to the adapter SDK. It introduces configuration models, input loading from CSV/JSONL, orchestrated response collection from OpenAI-compatible endpoints with retry logic, and atomic JSONL output with manifest metadata. Six symbols are exported from the adapter module.

Changes

Live endpoint collection feature

Layer / File(s)	Summary
Configuration models and validation `src/evalhub/adapter/live_collection.py`, `tests/unit/test_live_collection.py`	Defines `LiveEndpointConfig` and `LiveCollectionConfig` with URL scheme/host validation, non-empty field checking, API key environment variable handling, and HTTPS enforcement when an API key is configured. Includes computed properties for chat-completions URL and manifest path. Summary model captures success/failure counts and output paths. Test support helper and validation tests cover required fields, invalid URLs, and HTTPS/API key enforcement.
Input loading from CSV and JSONL `src/evalhub/adapter/live_collection.py`, `tests/unit/test_live_collection.py`	`load_input_rows` reads `.csv` via `csv.DictReader` and `.jsonl` via line-by-line JSON parsing with empty-line skipping. Validates that JSONL rows are objects and rejects unsupported file extensions. Tests confirm both formats load correctly and non-object JSONL rows are rejected.
Response collection, endpoint integration, and result persistence `src/evalhub/adapter/live_collection.py`, `tests/unit/test_live_collection.py`	`load_live_collection_config` parses `parameters["live_collection"]` into a validated config. `run_live_collection` orchestrates loading input, collecting responses, writing JSONL output atomically, and writing a JSON manifest with metadata. `collect_live_responses` iterates rows, attaches response/error placeholders, validates question field presence, and calls the endpoint. `call_openai_chat_completion` uses optional httpx, constructs OpenAI-compatible chat completions requests, implements retry logic with exponential backoff for transient HTTP errors, validates response shape, and returns message content with metadata. `write_jsonl` writes rows using a named temporary file in the output directory and atomic replace. `write_manifest` writes a JSON sidecar with dataset paths and collection metadata. Integration tests validate the full flow, per-row success/error handling, authorization headers, retry behavior on HTTP 500, and HTTP 307 redirect errors when redirect following is disabled.
Public adapter SDK exports `src/evalhub/adapter/__init__.py`	Adds imports from `live_collection` module and extends `__all__` to export `LiveEndpointConfig`, `LiveCollectionConfig`, `LiveCollectionSummary`, `load_input_rows`, `load_live_collection_config`, and `run_live_collection` as public API.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through CSV fields so neat,
Collects responses from endpoints it will meet,
With retries and magic, the data flows free,
From OpenAI to JSON, as clean as can be! 🐰✨
Live collection now blooms in the SDK's great tree!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title concisely summarizes the main change—adding an experimental live endpoint collection helper.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description comprehensively covers all required template sections with clear, detailed information about the feature, testing performed, and breaking changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/evalhub/adapter/live_collection.py`:
- Around line 198-205: The current code in live_collection.py stringifies
non-string values with question = str(row.get(config.question_field,
"")).strip(), which turns None and numbers into valid prompts; change validation
so only actual strings are accepted: read raw_value =
row.get(config.question_field), check isinstance(raw_value, str) and then
perform .strip() to produce question, otherwise set output["error"] =
{"type":"missing_question", "message": f"Missing or empty question field:
{config.question_field}"} and yield output then continue (preserve existing
output variable and flow).
- Around line 93-123: Add a model-level validation on LiveCollectionConfig to
reject identical input_path and output_path: implement a
`@model_validator`(mode="after") method named e.g. validate_paths that compares
self.input_path.resolve() and self.output_path.resolve() (and raises
ValueError("input_path and output_path must be different") if they are equal) so
the config cannot be created when input and output point to the same file;
reference LiveCollectionConfig, input_path, output_path and manifest_path to
locate where to add the validator.
- Around line 86-90: The api_key method currently returns None when api_key_env
is set but the environment variable is missing/empty; change api_key (and use
the api_key_env attribute) to fail fast by checking os.getenv(self.api_key_env),
trimming whitespace, and raising a clear configuration error (e.g., ValueError
or RuntimeError) if the value is None or blank so callers never send
unauthenticated requests silently; otherwise return the non-empty string.

In `@tests/unit/test_live_collection.py`:
- Around line 9-17: Add a module-level pytest marker to this test file by
defining pytestmark = pytest.mark.unit at the top of
tests/unit/test_live_collection.py (the file already imports pytest), so the
module is consistently selectable in CI; place the assignment near the existing
imports (which include LiveCollectionConfig, LiveEndpointConfig,
load_input_rows, load_live_collection_config, run_live_collection) so no
per-test decorators are required.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f52b2cd3-53e2-4ef6-b051-6e1254a263f0

📥 Commits

Reviewing files that changed from the base of the PR and between 3ca8447 and 64c8838.

📒 Files selected for processing (3)

src/evalhub/adapter/__init__.py
src/evalhub/adapter/live_collection.py
tests/unit/test_live_collection.py

ai-hustle-bro · 2026-06-04T04:11:50Z

Addressed the CodeRabbit review in aa6d48e:

fail fast when api_key_env is configured but missing or blank
reject identical input_path/output_path in LiveCollectionConfig
treat non-string question values as missing_question without calling the endpoint
add module-level pytestmark = pytest.mark.unit
update the PR body to match the project template

Validated locally with ruff format/check, pytest for tests/unit/test_live_collection.py (13 passed), mypy on the changed files, git diff --check, and a Claude Code blocker-only review of the follow-up diff.

ai-hustle-bro · 2026-06-07T19:15:51Z

Closing this older implementation in favor of #139, which carries the current live collection helper work, follow-up fixes, passing checks, and the cleaner API shape for issue #135.

feat(adapter): add experimental live endpoint collection

64c8838

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread src/evalhub/adapter/live_collection.py Outdated

Comment thread src/evalhub/adapter/live_collection.py

Comment thread src/evalhub/adapter/live_collection.py Outdated

Comment thread tests/unit/test_live_collection.py

fix(adapter): address live collection review comments

aa6d48e

ruivieira self-assigned this Jun 4, 2026

ruivieira added the kind/feat Categorizes issue as a feature request label Jun 4, 2026

ruivieira added this to EvalHub Planning Jun 4, 2026

github-project-automation Bot moved this to Todo in EvalHub Planning Jun 4, 2026

ruivieira added the ready-for-review Coderabbit comments have been addressed and the PR is ready to be reviewed label Jun 4, 2026

ai-hustle-bro closed this Jun 7, 2026

github-project-automation Bot moved this from Todo to Done in EvalHub Planning Jun 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add experimental live endpoint collection helper#137

Add experimental live endpoint collection helper#137
ai-hustle-bro wants to merge 2 commits into
eval-hub:mainfrom
ai-hustle-bro:feat/live-endpoint-collection-helper

ai-hustle-bro commented Jun 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ai-hustle-bro commented Jun 4, 2026

Uh oh!

ai-hustle-bro commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ai-hustle-bro commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What and why

Type

Testing

Breaking changes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ai-hustle-bro commented Jun 4, 2026

Uh oh!

ai-hustle-bro commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ai-hustle-bro commented Jun 4, 2026 •

edited

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading