Agent factory by afarntrog · Pull Request #10 · afarntrog/evals

afarntrog · 2026-04-13T21:24:04Z

Description

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Introduce an adapter that wraps a no-argument agent factory into a task callable, automatically handling telemetry setup, span collection, and session mapping. Add an agent-factory option as an alternative to passing a task directly, with mutual-exclusivity validation, so users no longer need to manually wire up in-memory exporters and mappers.
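The wrapping and validation described in this commit could look roughly like the sketch below. It is an illustration only, not the PR's code: `StubAgent`, `make_agent_task`, and this `run_evaluations` function are hypothetical stand-ins, the `task` and `agent_factory` parameter names are borrowed from the later commit message, and the telemetry setup is left out.

```python
from typing import Any, Callable, Iterable, List, Optional


class StubAgent:
    """Hypothetical stand-in for a real agent; it only echoes its prompt."""

    def __call__(self, prompt: str) -> str:
        return f"echo: {prompt}"


def make_agent_task(agent_factory: Callable[[], Any]) -> Callable[[str], str]:
    """Wrap a no-argument agent factory into a task callable."""

    def task(case_input: str) -> str:
        agent = agent_factory()  # build a fresh agent for every case
        return str(agent(case_input))

    return task


def run_evaluations(
    inputs: Iterable[str],
    task: Optional[Callable[[str], str]] = None,
    agent_factory: Optional[Callable[[], Any]] = None,
) -> List[str]:
    """Accept exactly one of `task` or `agent_factory`."""
    if (task is None) == (agent_factory is None):
        raise ValueError("Provide exactly one of `task` or `agent_factory`.")
    if task is None:
        task = make_agent_task(agent_factory)
    return [task(item) for item in inputs]
```

Passing both arguments, or neither, raises `ValueError`, which is one simple way to enforce mutual exclusivity.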

…k classes

Replace the `create_agent_task` factory function with two class-based task adapters: `AgentTask` for simple agent invocations and `TracedAgentTask` for invocations that also collect OpenTelemetry spans for trajectory evaluation. Additionally, simplify `Experiment.run_evaluations` by removing the `agent_factory` parameter and making `task` required, eliminating ambiguous dual-path configuration. Users now pass an `AgentTask` or `TracedAgentTask` instance directly as the task argument.

…AgentTask

Simplify the `AgentTask` and `TracedAgentTask` API by accepting `**agent_kwargs` instead of requiring a lambda/callable factory. The classes now construct `strands.Agent` instances internally via a `_create_agent` method, reducing boilerplate for users (e.g., `AgentTask(model="...")` instead of `AgentTask(lambda: Agent(model="..."))`). Updated tests to verify kwargs forwarding and fresh agent creation per call.
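A minimal sketch of the kwargs-forwarding pattern, under stated assumptions: `StubAgent` is a hypothetical stand-in for `strands.Agent` so the example runs offline, and the call signature (a plain string input) is a simplification rather than the PR's actual interface.

```python
from typing import Any


class StubAgent:
    """Hypothetical stand-in for strands.Agent; records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def __call__(self, prompt: str) -> str:
        return f"[{self.kwargs.get('model')}] {prompt}"


class AgentTask:
    """Store agent keyword arguments and build a new agent on every call."""

    agent_cls = StubAgent  # assumption: the real class constructs strands.Agent

    def __init__(self, **agent_kwargs: Any) -> None:
        self._agent_kwargs = agent_kwargs

    def _create_agent(self) -> Any:
        # A fresh instance per call keeps state from leaking between cases.
        return self.agent_cls(**self._agent_kwargs)

    def __call__(self, case_input: str) -> str:
        return str(self._create_agent()(case_input))
```

With this shape, `AgentTask(model="demo-model")` replaces the lambda-based form, and each invocation gets its own agent instance.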

Introduce `EvalTaskHandler` and `TracedHandler` classes along with an `@eval_task` decorator to standardize how task functions are wrapped with evaluation behavior. `EvalTaskHandler` normalizes return values to dicts, while `TracedHandler` adds OpenTelemetry span collection and session mapping for trajectory-based evaluators. Includes public exports and comprehensive tests.
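The return-value normalization might be sketched as follows. The class and decorator names come from the commit message, but everything else here is an assumption, in particular the `"output"` key used for non-dict results and the way the decorator instantiates the handler.

```python
import functools
from typing import Any, Callable, Dict


class EvalTaskHandler:
    """Wrap a task function and normalize whatever it returns to a dict."""

    def __call__(self, fn: Callable[[Any], Any]) -> Callable[[Any], Dict[str, Any]]:
        @functools.wraps(fn)
        def wrapper(case: Any) -> Dict[str, Any]:
            return self.normalize(fn(case))

        return wrapper

    @staticmethod
    def normalize(result: Any) -> Dict[str, Any]:
        # Assumption: dicts pass through unchanged and any other value
        # is stored under an "output" key.
        if isinstance(result, dict):
            return result
        return {"output": result}


def eval_task(fn: Callable[[Any], Any]) -> Callable[[Any], Dict[str, Any]]:
    """Decorator form: wrap `fn` with the default handler."""
    return EvalTaskHandler()(fn)


@eval_task
def shout(case: str) -> str:
    return case.upper()


@eval_task
def detailed(case: str) -> Dict[str, Any]:
    return {"output": case, "trajectory": []}
```

A tracing variant such as `TracedHandler` could subclass the handler and add span collection around the call, though that part is not shown here.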

Allow `eval_task`-decorated functions to either accept a `Case` argument or take no arguments. When the function returns an `Agent` instance, the decorator auto-invokes it with `case.input` and stringifies the result. This simplifies the common pattern where tasks just need to construct and run an agent.
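One way to support both call shapes is to inspect the function's signature at decoration time. The sketch below assumes hypothetical `Case` and `StubAgent` stand-ins and is not the PR's implementation.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Case:
    """Hypothetical stand-in for the evaluation case type."""

    input: str


class StubAgent:
    """Hypothetical stand-in for an agent; replies with a fixed pattern."""

    def __call__(self, prompt: str) -> str:
        return f"reply to {prompt}"


def eval_task(fn: Callable[..., Any]) -> Callable[[Case], Any]:
    # Decide once whether the task function wants the case passed in.
    takes_case = len(inspect.signature(fn).parameters) > 0

    @functools.wraps(fn)
    def wrapper(case: Case) -> Any:
        result = fn(case) if takes_case else fn()
        if isinstance(result, StubAgent):
            # The function returned an agent: run it on the case input
            # and stringify the result.
            result = str(result(case.input))
        return result

    return wrapper


@eval_task
def build_agent() -> StubAgent:
    return StubAgent()


@eval_task
def reverse_input(case: Case) -> str:
    return case.input[::-1]
```

Here `build_agent` only constructs the agent and leaves invocation to the decorator, while `reverse_input` handles the case itself.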

- Add `eval_task` to `__init__.py` imports and `__all__` for direct access
- Document TracedHandler concurrency limitations (sequential only or max_workers=1 due to shared span exporter)
- Minor test formatting cleanup
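The concurrency limitation can be illustrated with a toy model. The sketch below is an assumption-laden simulation, not the real exporter: `SharedExporter` is a hypothetical in-memory collector, and generators are used so that the interleaving of two tasks can be stepped by hand and stays deterministic.

```python
from typing import Dict, Iterator, List, Optional


class SharedExporter:
    """Hypothetical stand-in for a single shared in-memory span exporter."""

    def __init__(self) -> None:
        self.spans: List[str] = []

    def export(self, span: str) -> None:
        self.spans.append(span)

    def drain(self) -> List[str]:
        spans, self.spans = self.spans, []
        return spans


exporter = SharedExporter()


def traced_task(name: str) -> Iterator[Optional[List[str]]]:
    """Emit a start span, pause, emit an end span, then collect spans."""
    exporter.export(f"{name}:start")
    yield None
    exporter.export(f"{name}:end")
    yield exporter.drain()  # the task takes whatever the exporter holds


def run_sequential() -> Dict[str, List[str]]:
    results = {}
    for name in ("a", "b"):
        steps = traced_task(name)
        next(steps)
        results[name] = next(steps)
    return results


def run_interleaved() -> Dict[str, List[str]]:
    a, b = traced_task("a"), traced_task("b")
    next(a)
    next(b)  # task b starts before task a has collected its spans
    return {"a": next(a), "b": next(b)}
```

Run sequentially, each task collects only its own spans. Interleaved, task `a` also picks up the start span of task `b`, which is the kind of cross-contamination a shared exporter invites when tasks overlap.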


Rename `eval_task.py` to `eval_task_handler.py` so the module name no longer collides with the `eval_task` function re-exported in `__init__.py`. This was causing `@patch` targets in `TestTracedHandler` to resolve to the function instead of the module, breaking all 5 `TracedHandler` tests with `AttributeError`.
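The name collision can be reproduced in isolation with an in-memory package. This is a sketch under assumptions: `demo_pkg` is a hypothetical package name, and the two attribute assignments imitate what the import system and a `from .eval_task import eval_task` line in `__init__.py` would do.

```python
import sys
import types

# Build a throwaway package in memory; "demo_pkg" is a hypothetical name.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # marks the module object as a package
submodule = types.ModuleType("demo_pkg.eval_task")


def eval_task(fn):
    return fn


class TracedHandler:
    pass


submodule.eval_task = eval_task
submodule.TracedHandler = TracedHandler
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.eval_task"] = submodule

# Importing a submodule binds it as an attribute on the parent package.
pkg.eval_task = submodule
attribute_before = pkg.eval_task

# Re-exporting the function under the same name rebinds that attribute,
# so attribute lookup on the package now finds the function instead.
pkg.eval_task = submodule.eval_task
attribute_after = pkg.eval_task
```

After the rebinding, anything that reaches `demo_pkg.eval_task` through attribute lookup gets the function, which has no `TracedHandler` attribute. Giving the module a different name than the function removes the ambiguity.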

afarntrog added 3 commits April 13, 2026 15:25

afarntrog had a problem deploying to auto-approve April 13, 2026 21:24 — with GitHub Actions Failure

afarntrog had a problem deploying to auto-approve April 14, 2026 19:38 — with GitHub Actions Failure

afarntrog had a problem deploying to auto-approve April 14, 2026 20:38 — with GitHub Actions Failure

feat: export eval_task decorator from package public API

384f830


afarntrog had a problem deploying to auto-approve April 15, 2026 15:20 — with GitHub Actions Failure

afarntrog had a problem deploying to auto-approve April 21, 2026 17:19 — with GitHub Actions Failure

afarntrog wants to merge 7 commits into main from agent_factory
