
Add per-role prompt framing optimization tasks for CLI benchmarks #34

@never2average

Description


Goal

Add benchmark tasks that help evaluate and improve prompt framing for each supported agent role in CLI-only benchmark runs.

This issue is about making role prompts easier to compare and tune. It should not add TUI behavior.

Scope

  • Identify the agent roles that should participate in CLI benchmark runs.
  • Define a small set of framing checks per role, such as instruction clarity, task boundary handling, tool-use restraint, output format consistency, and refusal or approval behavior where relevant.
  • Add benchmark task metadata that records which role prompt behavior is being exercised.
  • Produce result fields that make per-role prompt comparisons easier to inspect after a CLI benchmark run.
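As a starting point for the metadata items above, here is a minimal sketch of per-task framing metadata. It assumes hypothetical names throughout: `AgentRole` (with placeholder roles `Planner`, `Coder`, `Reviewer`, which should be replaced by whatever the workspace's existing role definitions use), `FramingCheck`, `FramingTaskMeta`, `validate`, and `tasks_for_role` are illustrations, not existing ThinWedge types. Each task records its role, the framing check it exercises, and a plain-language statement of the intended behavior.

```rust
/// Placeholder agent roles (hypothetical); the real set should come from the
/// role definitions already in the Rust workspace.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum AgentRole {
    Planner,
    Coder,
    Reviewer,
}

/// Framing checks taken from the scope list in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FramingCheck {
    InstructionClarity,
    TaskBoundary,
    ToolUseRestraint,
    OutputFormat,
    RefusalOrApproval,
}

/// Hypothetical metadata attached to one CLI benchmark task.
#[derive(Debug, Clone)]
pub struct FramingTaskMeta {
    pub task_id: String,
    pub role: AgentRole,
    pub check: FramingCheck,
    /// Plain-language statement of the role behavior under test.
    pub intended_behavior: String,
}

#[derive(Debug, PartialEq)]
pub enum MetaError {
    /// Carries the id of the task that did not state its intended behavior.
    MissingBehavior(String),
}

/// Rejects seed tasks that do not say what role behavior they test.
pub fn validate(tasks: &[FramingTaskMeta]) -> Result<(), MetaError> {
    for t in tasks {
        if t.intended_behavior.trim().is_empty() {
            return Err(MetaError::MissingBehavior(t.task_id.clone()));
        }
    }
    Ok(())
}

/// Keeps only the tasks for one role, e.g. to back a hypothetical role filter.
pub fn tasks_for_role(tasks: &[FramingTaskMeta], role: AgentRole) -> Vec<&FramingTaskMeta> {
    tasks.iter().filter(|t| t.role == role).collect()
}
```

Keeping `intended_behavior` as free text (rather than another enum) lets each seed task describe its expectation precisely, while `role` and `check` stay structured so results can be grouped and filtered.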

Suggested first pass

  1. Search for the existing role definitions or role-selection logic in the Rust workspace.
  2. Pick 2-3 roles for the initial benchmark framing pass.
  3. Add task metadata that states the intended role behavior for each task.
  4. Document how maintainers should use benchmark results to tune role prompts.

Acceptance criteria

  • CLI benchmark tasks can be grouped or filtered by agent role.
  • Each seed framing task states what role behavior it is testing.
  • Benchmark output includes enough metadata to compare role prompt behavior across runs.
  • Documentation explains that this is a prompt-optimization aid, not a production prompt auto-tuner.
  • No TUI changes are required.
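For the grouping and cross-run comparison criteria, one possible shape is sketched below. It assumes a hypothetical `TaskResult` record (role label, prompt revision, pass/fail) emitted per task by a CLI benchmark run; `pass_rate_by_role` and `compare_runs` are illustrative helpers, not existing functions. The sketch computes each role's pass rate for a run and reports the per-role change between a baseline run and a candidate run that uses a revised role prompt.

```rust
use std::collections::BTreeMap;

/// One scored task outcome from a CLI benchmark run (hypothetical shape).
#[derive(Debug, Clone)]
pub struct TaskResult {
    /// Role label, e.g. "planner" (placeholder name).
    pub role: String,
    /// Identifies the role prompt version that produced this result.
    pub prompt_revision: String,
    pub passed: bool,
}

/// Fraction of passing tasks per role within a single run.
pub fn pass_rate_by_role(results: &[TaskResult]) -> BTreeMap<String, f64> {
    // (passed, total) per role; BTreeMap keeps output order stable across runs.
    let mut counts: BTreeMap<String, (u32, u32)> = BTreeMap::new();
    for r in results {
        let entry = counts.entry(r.role.clone()).or_insert((0, 0));
        entry.1 += 1;
        if r.passed {
            entry.0 += 1;
        }
    }
    // total is always >= 1 because an entry only exists once a result was counted.
    counts
        .into_iter()
        .map(|(role, (pass, total))| (role, pass as f64 / total as f64))
        .collect()
}

/// Per-role change in pass rate from `baseline` to `candidate`.
/// Roles missing from either run are skipped rather than guessed.
pub fn compare_runs(baseline: &[TaskResult], candidate: &[TaskResult]) -> BTreeMap<String, f64> {
    let base = pass_rate_by_role(baseline);
    let cand = pass_rate_by_role(candidate);
    base.iter()
        .filter_map(|(role, b)| cand.get(role).map(|c| (role.clone(), c - b)))
        .collect()
}
```

A positive delta for a role suggests the candidate prompt revision handled that role's framing tasks better on this task set; a maintainer would still read the individual failures before adopting a change, since this is a comparison aid rather than an auto-tuner.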

Non-goals

  • No automatic prompt rewriting.
  • No hidden eval service.
  • No changes that weaken sandboxing or approval behavior.

Metadata

Assignees

No one assigned

Labels

  • benchmark: Benchmark harnesses, datasets, and evaluation tasks
  • cli: ThinWedge CLI issue
  • enhancement: New feature or request
  • good first issue: Good for newcomers

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
