Add per-role prompt framing optimization tasks for CLI benchmarks #34

## Goal

Add benchmark tasks that help evaluate and improve prompt framing for each supported agent role in CLI-only benchmark runs.

This issue is about making role prompts easier to compare and tune. It should not add TUI behavior.

## Scope

- Identify the agent roles that should participate in CLI benchmark runs.
- Define a small set of framing checks per role, such as instruction clarity, task boundary handling, tool-use restraint, output format consistency, and refusal or approval behavior where relevant.
- Add benchmark task metadata that records which role prompt behavior is being exercised.
- Produce result fields that make per-role prompt comparisons easier to inspect after a CLI benchmark run.
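As a starting point for discussion, the metadata and result fields above could be sketched as plain Rust types. Everything here is an assumption for illustration: the names `FramingCheck`, `FramingTaskMeta`, `FramingResult`, `prompt_version`, and the tab-separated row format are hypothetical and do not reflect existing types in the workspace. The enum variants simply mirror the example checks listed in the scope.

```rust
/// Hypothetical framing checks, mirroring the examples named in the scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FramingCheck {
    InstructionClarity,
    TaskBoundaryHandling,
    ToolUseRestraint,
    OutputFormatConsistency,
    RefusalOrApproval,
}

impl FramingCheck {
    /// Stable snake_case label for use in benchmark output.
    pub fn as_str(self) -> &'static str {
        match self {
            FramingCheck::InstructionClarity => "instruction_clarity",
            FramingCheck::TaskBoundaryHandling => "task_boundary_handling",
            FramingCheck::ToolUseRestraint => "tool_use_restraint",
            FramingCheck::OutputFormatConsistency => "output_format_consistency",
            FramingCheck::RefusalOrApproval => "refusal_or_approval",
        }
    }
}

/// Hypothetical per-task metadata: which role prompt behavior a task exercises.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FramingTaskMeta {
    pub task_id: String,
    /// Role name; the real values depend on the workspace's role definitions.
    pub role: String,
    pub check: FramingCheck,
    /// Human-readable statement of the behavior under test.
    pub intended_behavior: String,
}

/// Hypothetical result record emitted after a CLI benchmark run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FramingResult {
    pub meta: FramingTaskMeta,
    /// Identifier of the prompt variant used, so runs can be compared.
    pub prompt_version: String,
    pub passed: bool,
}

impl FramingResult {
    /// Flat tab-separated row: role, check, task id, prompt version, outcome.
    pub fn to_row(&self) -> String {
        format!(
            "{}\t{}\t{}\t{}\t{}",
            self.meta.role,
            self.meta.check.as_str(),
            self.meta.task_id,
            self.prompt_version,
            if self.passed { "pass" } else { "fail" }
        )
    }
}
```

Putting the role first in each row keeps per-role comparisons easy to scan or sort with ordinary CLI tools. The actual output format should follow whatever the benchmark runner already emits.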

## Suggested first pass

1. Search for the existing role definitions or role-selection logic in the Rust workspace.
2. Pick 2-3 roles for the initial benchmark framing pass.
3. Add task metadata that states the intended role behavior for each task.
4. Document how maintainers should use benchmark results to tune role prompts.

## Acceptance criteria

- CLI benchmark tasks can be grouped or filtered by agent role.
- Each seed framing task states what role behavior it is testing.
- Benchmark output includes enough metadata to compare role prompt behavior across runs.
- Documentation explains that this is a prompt-optimization aid, not a production prompt auto-tuner.
- No TUI changes are required.
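The grouping, filtering, and cross-run comparison criteria could be met with something as small as the sketch below. It is only an illustration under stated assumptions: `RoleResult`, `filter_by_role`, `pass_counts_by_role`, and `changed_roles` are hypothetical names, and the role strings used with them are placeholders rather than roles confirmed to exist in the workspace.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal result record for one framing task in one run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RoleResult {
    pub role: String,
    pub task_id: String,
    pub passed: bool,
}

/// Return only the results that belong to the given role.
pub fn filter_by_role<'a>(results: &'a [RoleResult], role: &str) -> Vec<&'a RoleResult> {
    results.iter().filter(|r| r.role == role).collect()
}

/// Group results by role and count (passed, total) for each role.
pub fn pass_counts_by_role(results: &[RoleResult]) -> BTreeMap<String, (usize, usize)> {
    let mut counts: BTreeMap<String, (usize, usize)> = BTreeMap::new();
    for r in results {
        let entry = counts.entry(r.role.clone()).or_insert((0, 0));
        entry.1 += 1;
        if r.passed {
            entry.0 += 1;
        }
    }
    counts
}

/// List roles whose (passed, total) counts differ between two runs.
/// Roles present in only one run are reported as changed.
pub fn changed_roles(baseline: &[RoleResult], candidate: &[RoleResult]) -> Vec<String> {
    let before = pass_counts_by_role(baseline);
    let after = pass_counts_by_role(candidate);
    let mut changed: Vec<String> = Vec::new();
    for (role, counts) in &after {
        if before.get(role) != Some(counts) {
            changed.push(role.clone());
        }
    }
    for role in before.keys() {
        if !after.contains_key(role) {
            changed.push(role.clone());
        }
    }
    changed.sort();
    changed
}
```

A maintainer would read the output of `changed_roles` as a pointer to which role prompts to inspect by hand. Nothing here rewrites a prompt, which keeps the sketch within the non-goals.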

## Non-goals

- No automatic prompt rewriting.
- No hidden eval service.
- No changes that weaken sandboxing or approval behavior.

