feat: add EvaluationPlugin for agent invocation evaluation and retry #5

Open
afarntrog wants to merge 4 commits into main from wip/evaluation-plugin

@afarntrog
Owner

Introduce an EvaluationPlugin that hooks into agent invocations to evaluate outputs against expected results and automatically retries with improved system prompts on failure.

  • Add EvaluationPlugin class that wraps agent __call__ to intercept invocations, run evaluators, and retry with LLM-suggested prompt improvements when evaluations fail
  • Add improvement suggestion prompt template for generating better system prompts based on evaluation feedback
  • Add comprehensive test suite covering plugin initialization, wrapping, evaluation execution, retry logic, and edge cases
  • Update ruff config to ignore line-length in plugin prompt templates
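
The evaluate-and-retry flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: ToyAgent, wrap_with_evaluation, evaluator, suggest_prompt, and max_retries are all hypothetical names, the evaluator is a plain callable rather than a real evaluator class, and the sketch returns a wrapper function instead of patching __call__ on the agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a system prompt plus __call__."""

    system_prompt: str

    def __call__(self, prompt: str) -> str:
        # Toy behaviour: the agent only upper-cases its answer when the
        # system prompt tells it to.
        return prompt.upper() if "uppercase" in self.system_prompt else prompt


def wrap_with_evaluation(
    agent: ToyAgent,
    evaluator: Callable[[str, str], bool],
    suggest_prompt: Callable[[str, str, str], str],
    max_retries: int = 2,
) -> Callable[[str, str], str]:
    """Return a callable that evaluates each result and retries on failure."""

    def wrapped(prompt: str, expected: str) -> str:
        result = agent(prompt)
        for _ in range(max_retries):
            if evaluator(result, expected):
                break
            # Evaluation failed: ask for a better system prompt, then retry.
            agent.system_prompt = suggest_prompt(agent.system_prompt, result, expected)
            result = agent(prompt)
        return result

    return wrapped


agent = ToyAgent("You are a helpful assistant.")
run = wrap_with_evaluation(
    agent,
    evaluator=lambda output, expected: output == expected,
    # Assumption: in the PR an LLM produces the suggestion; here it is hard-coded.
    suggest_prompt=lambda old, output, expected: old + " Always answer in uppercase.",
)
print(run("hello", expected="HELLO"))  # prints HELLO
```

The first attempt fails the evaluator, the system prompt is amended once, and the second attempt passes, so the loop exits before the retry budget is used up.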

Description

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Introduce an EvaluationPlugin that hooks into agent invocations to
evaluate outputs against expected results and automatically retries
with improved system prompts on failure.

- Add EvaluationPlugin class that wraps agent __call__ to intercept
  invocations, run evaluators, and retry with LLM-suggested prompt
  improvements when evaluations fail
- Add improvement suggestion prompt template for generating better
  system prompts based on evaluation feedback
- Add comprehensive test suite covering plugin initialization, wrapping,
  evaluation execution, retry logic, and edge cases
- Update ruff config to ignore line-length in plugin prompt templates
- Add comprehensive docstrings to all methods in EvaluationPlugin
- Change _suggest_improvements to return ImprovementSuggestion instead of str
- Add debug logging with reasoning when applying improved system prompt
- Replace Union[X, Y] with modern X | Y syntax
- Use dict default in kwargs.get() instead of
- Export new  module from the package's public API
- Bump minimum strands-agents dependency from 1.0.0 to 1.28.0
  to support functionality required by the plugins module
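
As a rough illustration of two of the changes listed above (returning a structured object instead of str, and the X | Y union syntax), here is a minimal sketch. The field names reasoning and system_prompt, the function signature, and the logger name are assumptions made for the example; the PR text does not show the actual definition of ImprovementSuggestion.

```python
from __future__ import annotations  # lets X | Y annotations parse on older Pythons

import logging
from dataclasses import dataclass

logger = logging.getLogger("evaluation_plugin_sketch")  # hypothetical logger name


@dataclass
class ImprovementSuggestion:
    """Hypothetical shape: the new prompt plus the reasoning behind it."""

    reasoning: str
    system_prompt: str


def suggest_improvements(current_prompt: str, feedback: str | None) -> ImprovementSuggestion:
    """Return a structured suggestion rather than a bare string."""
    if feedback is None:
        return ImprovementSuggestion("no feedback available", current_prompt)
    return ImprovementSuggestion(
        reasoning=f"evaluator said: {feedback}",
        system_prompt=f"{current_prompt} {feedback}",
    )


suggestion = suggest_improvements("Be brief.", "Cite sources.")
# A structured result lets the caller log the reasoning separately from the prompt.
logger.debug("Applying improved system prompt: %s", suggestion.reasoning)
```

Returning an object rather than a string keeps the explanation available for debug logging without having to parse it back out of the prompt text.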