Add experimental live endpoint collection helper #137

Closed
ai-hustle-bro wants to merge 2 commits into eval-hub:main from ai-hustle-bro:feat/live-endpoint-collection-helper

ai-hustle-bro commented Jun 4, 2026

What and why

Refs #135.

This adds an experimental adapter-side live endpoint collection helper for the SDK-helper-first path discussed in the issue. It lets adapters collect model responses during loading by reading CSV/JSONL question rows, calling an OpenAI-compatible /chat/completions endpoint, and writing normalized JSONL rows plus a small manifest.

The helper preserves source fields and adds response, response_metadata, and error fields. Per-row endpoint failures are captured in the output, while configuration hazards fail early.
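
As a rough illustration of the row shape described above, the sketch below keeps every source field and adds `response`, `response_metadata`, and `error`, recording a per-row failure instead of raising. The names `collect_rows` and `ask`, and the exact contents of the metadata and error objects, are assumptions made for this example and are not taken from the PR's code.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def collect_rows(
    rows: Iterable[Dict[str, Any]],
    ask: Callable[[str], Tuple[str, Dict[str, Any]]],
    question_field: str = "question",
) -> Iterator[Dict[str, Any]]:
    """Yield each source row with response, response_metadata, and error added."""
    for row in rows:
        # Copy the source fields and start with empty placeholders.
        output = {**row, "response": None, "response_metadata": None, "error": None}
        try:
            # `ask` is a hypothetical stand-in for the endpoint call.
            content, metadata = ask(row[question_field])
            output["response"] = content
            output["response_metadata"] = metadata
        except Exception as exc:
            # A failing row is recorded in the output rather than aborting the run.
            output["error"] = {"type": type(exc).__name__, "message": str(exc)}
        yield output
```

Passing the endpoint call in as a callable keeps the sketch testable without a network; the real helper may be structured differently.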

Security/trust-boundary choices:

  • api_key_env requires an https:// base URL
  • bearer tokens are read from environment variables, not job JSON
  • missing or blank API key environment variables fail before any request is sent
  • redirects are not followed, so auth headers are not forwarded to redirect targets
  • input and output paths cannot point to the same file
  • this is intentionally adapter/operator-controlled config; hosted runtimes that accept this config from untrusted users should add their own egress policy or endpoint allowlist before enabling it
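
The fail-early checks in this list could look roughly like the following. This is a minimal stdlib sketch, not the PR's implementation: `EndpointSketch` and `check_paths` are hypothetical names, and a plain dataclass is used here in place of whatever model classes the helper actually defines.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class EndpointSketch:
    """Hypothetical stand-in for the endpoint config; not the PR's model."""

    base_url: str
    api_key_env: Optional[str] = None

    def __post_init__(self) -> None:
        parsed = urlparse(self.base_url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError("base_url must be an http(s) URL with a host")
        # A bearer token should never travel over plain HTTP.
        if self.api_key_env and parsed.scheme != "https":
            raise ValueError("api_key_env requires an https:// base URL")

    def api_key(self) -> Optional[str]:
        """Read the token from the environment, failing fast if it is absent."""
        if self.api_key_env is None:
            return None
        value = (os.getenv(self.api_key_env) or "").strip()
        if not value:
            raise ValueError(
                f"environment variable {self.api_key_env} is missing or blank"
            )
        return value


def check_paths(input_path: Path, output_path: Path) -> None:
    """Reject configs whose input and output resolve to the same file."""
    if input_path.resolve() == output_path.resolve():
        raise ValueError("input_path and output_path must be different")
```

Resolving both paths before comparing them means `data.jsonl` and `./data.jsonl` are treated as the same file.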

Type

feat

Testing

  • uv run ruff format src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py
  • uv run ruff check src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py
  • uv run pytest tests/unit/test_live_collection.py -q (13 passed)
  • uv run mypy --config-file=pyproject.toml src/evalhub/adapter/live_collection.py tests/unit/test_live_collection.py
  • git diff --check
  • Claude Code CLI blocker-only review of the follow-up diff: no blockers

Breaking changes

None. The helper is new experimental adapter SDK surface.

Summary by CodeRabbit

  • New Features
    • Added experimental live endpoint collection functionality to gather responses from OpenAI-compatible chat completions endpoints
    • Supports configuration loading and batch processing of input data (CSV/JSONL formats)
    • Automatically collects responses, metadata, and error details with retry support and manifest generation
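
The retry support mentioned in this summary is described later in the review as exponential backoff on transient HTTP errors. A minimal sketch of that pattern follows; `call_with_retries`, the `send` callable, the default attempt count and delay, and the set of status codes treated as transient are all assumptions, since the PR text only confirms that HTTP 500 is retried.

```python
import time
from typing import Callable, Tuple

# Assumption: the PR does not list which status codes count as transient.
TRANSIENT_STATUSES = frozenset({429, 500, 502, 503, 504})


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-transient status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in TRANSIENT_STATUSES or is_last_attempt:
            return status, body
        # The delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay.
        sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Injecting `sleep` lets a test record the delays instead of waiting for them.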

coderabbitai Bot (Contributor) commented Jun 4, 2026

Walkthrough

This pull request adds experimental live collection functionality to the adapter SDK. It introduces configuration models, input loading from CSV/JSONL, orchestrated response collection from OpenAI-compatible endpoints with retry logic, and atomic JSONL output with manifest metadata. Six symbols are exported from the adapter module.
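
The atomic JSONL output mentioned here is described in the change table as a named temporary file in the output directory followed by an atomic replace. A minimal sketch of that technique, with `write_jsonl_atomic` as a hypothetical name and the cleanup behavior as an assumption, might look like this:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


def write_jsonl_atomic(path: Path, rows: Iterable[Dict[str, Any]]) -> int:
    """Write rows as JSONL to a temp file beside path, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    tmp = tempfile.NamedTemporaryFile(
        mode="w", encoding="utf-8", dir=path.parent, suffix=".tmp", delete=False
    )
    tmp_path = Path(tmp.name)
    count = 0
    try:
        with tmp:
            for row in rows:
                tmp.write(json.dumps(row, ensure_ascii=False) + "\n")
                count += 1
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Assumption: a failed write should not leave a stray temp file behind.
        tmp_path.unlink(missing_ok=True)
        raise
    return count
```

If serialization fails partway through, the existing output file is left untouched because the replace never runs.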

Changes

Live endpoint collection feature

| Layer | File(s) | Summary |
|---|---|---|
| Configuration models and validation | src/evalhub/adapter/live_collection.py, tests/unit/test_live_collection.py | Defines LiveEndpointConfig and LiveCollectionConfig with URL scheme/host validation, non-empty field checking, API key environment variable handling, and HTTPS enforcement when an API key is configured. Includes computed properties for chat-completions URL and manifest path. Summary model captures success/failure counts and output paths. Test support helper and validation tests cover required fields, invalid URLs, and HTTPS/API key enforcement. |
| Input loading from CSV and JSONL | src/evalhub/adapter/live_collection.py, tests/unit/test_live_collection.py | load_input_rows reads .csv via csv.DictReader and .jsonl via line-by-line JSON parsing with empty-line skipping. Validates that JSONL rows are objects and rejects unsupported file extensions. Tests confirm both formats load correctly and non-object JSONL rows are rejected. |
| Response collection, endpoint integration, and result persistence | src/evalhub/adapter/live_collection.py, tests/unit/test_live_collection.py | load_live_collection_config parses parameters["live_collection"] into a validated config. run_live_collection orchestrates loading input, collecting responses, writing JSONL output atomically, and writing a JSON manifest with metadata. collect_live_responses iterates rows, attaches response/error placeholders, validates question field presence, and calls the endpoint. call_openai_chat_completion uses optional httpx, constructs OpenAI-compatible chat completions requests, implements retry logic with exponential backoff for transient HTTP errors, validates response shape, and returns message content with metadata. write_jsonl writes rows using a named temporary file in the output directory and atomic replace. write_manifest writes a JSON sidecar with dataset paths and collection metadata. Integration tests validate the full flow, per-row success/error handling, authorization headers, retry behavior on HTTP 500, and HTTP 307 redirect errors when redirect following is disabled. |
| Public adapter SDK exports | src/evalhub/adapter/__init__.py | Adds imports from live_collection module and extends __all__ to export LiveEndpointConfig, LiveCollectionConfig, LiveCollectionSummary, load_input_rows, load_live_collection_config, and run_live_collection as public API. |
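
The input-loading behavior summarized above (csv.DictReader for .csv, line-by-line parsing for .jsonl, empty lines skipped, non-object rows and unknown extensions rejected) can be sketched as follows. The function name `load_rows`, the case-insensitive extension match, the UTF-8 encoding, and the error messages are assumptions for illustration; the PR's `load_input_rows` may differ in all of these.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def load_rows(path: Path) -> List[Dict[str, Any]]:
    """Load question rows from a .csv or .jsonl file."""
    suffix = path.suffix.lower()  # assumption: extensions match case-insensitively
    if suffix == ".csv":
        # newline="" lets the csv module handle quoted line breaks itself.
        with path.open(newline="", encoding="utf-8") as handle:
            return [dict(row) for row in csv.DictReader(handle)]
    if suffix == ".jsonl":
        rows: List[Dict[str, Any]] = []
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # skip empty lines
                value = json.loads(line)
                if not isinstance(value, dict):
                    raise ValueError(
                        f"{path}:{line_number}: JSONL row must be an object"
                    )
                rows.append(value)
        return rows
    raise ValueError(f"unsupported input extension: {path.suffix!r}")
```

Note that CSV values always load as strings, while JSONL values keep their JSON types, which matters for the question-field validation raised in the review.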

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through CSV fields so neat,
Collects responses from endpoints it will meet,
With retries and magic, the data flows free,
From OpenAI to JSON, as clean as can be! 🐰✨
Live collection now blooms in the SDK's great tree!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title concisely summarizes the main change: adding an experimental live endpoint collection helper. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description comprehensively covers all required template sections with clear, detailed information about the feature, testing performed, and breaking changes. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/evalhub/adapter/live_collection.py`:
- Around line 198-205: The current code in live_collection.py stringifies
non-string values with question = str(row.get(config.question_field,
"")).strip(), which turns None and numbers into valid prompts; change validation
so only actual strings are accepted: read raw_value =
row.get(config.question_field), check isinstance(raw_value, str) and then
perform .strip() to produce question, otherwise set output["error"] =
{"type":"missing_question", "message": f"Missing or empty question field:
{config.question_field}"} and yield output then continue (preserve existing
output variable and flow).
- Around line 93-123: Add a model-level validation on LiveCollectionConfig to
reject identical input_path and output_path: implement a
`@model_validator`(mode="after") method named e.g. validate_paths that compares
self.input_path.resolve() and self.output_path.resolve() (and raises
ValueError("input_path and output_path must be different") if they are equal) so
the config cannot be created when input and output point to the same file;
reference LiveCollectionConfig, input_path, output_path and manifest_path to
locate where to add the validator.
- Around line 86-90: The api_key method currently returns None when api_key_env
is set but the environment variable is missing/empty; change api_key (and use
the api_key_env attribute) to fail fast by checking os.getenv(self.api_key_env),
trimming whitespace, and raising a clear configuration error (e.g., ValueError
or RuntimeError) if the value is None or blank so callers never send
unauthenticated requests silently; otherwise return the non-empty string.

In `@tests/unit/test_live_collection.py`:
- Around line 9-17: Add a module-level pytest marker to this test file by
defining pytestmark = pytest.mark.unit at the top of
tests/unit/test_live_collection.py (the file already imports pytest), so the
module is consistently selectable in CI; place the assignment near the existing
imports (which include LiveCollectionConfig, LiveEndpointConfig,
load_input_rows, load_live_collection_config, run_live_collection) so no
per-test decorators are required.
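
The first inline comment above asks that only real strings be accepted as questions. A minimal sketch of that check is shown below; the helper name `extract_question` and its tuple return shape are assumptions for illustration, while the `missing_question` error type and message follow the wording in the comment.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def extract_question(
    row: Mapping[str, Any], question_field: str
) -> Tuple[Optional[str], Optional[Dict[str, str]]]:
    """Return (question, None) for a usable prompt, else (None, error)."""
    raw_value = row.get(question_field)
    # Only real strings qualify; str(None) or str(42) would otherwise be
    # sent to the endpoint as the prompts "None" and "42".
    if isinstance(raw_value, str) and raw_value.strip():
        return raw_value.strip(), None
    return None, {
        "type": "missing_question",
        "message": f"Missing or empty question field: {question_field}",
    }
```

A caller would attach the returned error to the output row and skip the endpoint call for that row.
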
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f52b2cd3-53e2-4ef6-b051-6e1254a263f0

📥 Commits

Reviewing files that changed from the base of the PR and between 3ca8447 and 64c8838.

📒 Files selected for processing (3)
  • src/evalhub/adapter/__init__.py
  • src/evalhub/adapter/live_collection.py
  • tests/unit/test_live_collection.py

Comment thread src/evalhub/adapter/live_collection.py Outdated
Comment thread src/evalhub/adapter/live_collection.py
Comment thread src/evalhub/adapter/live_collection.py Outdated
Comment thread tests/unit/test_live_collection.py
@ai-hustle-bro

Author

Addressed the CodeRabbit review in aa6d48e:

  • fail fast when api_key_env is configured but missing or blank
  • reject identical input_path/output_path in LiveCollectionConfig
  • treat non-string question values as missing_question without calling the endpoint
  • add module-level pytestmark = pytest.mark.unit
  • update the PR body to match the project template

Validated locally with ruff format/check, pytest for tests/unit/test_live_collection.py (13 passed), mypy on the changed files, git diff --check, and a Claude Code blocker-only review of the follow-up diff.

@ruivieira ruivieira self-assigned this Jun 4, 2026
@ruivieira ruivieira added the kind/feat Categorizes issue as a feature request label Jun 4, 2026
@ruivieira ruivieira added the ready-for-review Coderabbit comments have been addressed and the PR is ready to be reviewed label Jun 4, 2026
@ai-hustle-bro

Author

Closing this older implementation in favor of #139, which carries the current live collection helper work, follow-up fixes, passing checks, and the cleaner API shape for issue #135.

@github-project-automation github-project-automation Bot moved this from Todo to Done in EvalHub Planning Jun 7, 2026