Add `model-router` skill for cost-aware LLM selection by juanmichelini · Pull Request #268 · OpenHands/extensions

juanmichelini · 2026-05-26T17:34:18Z

Summary

Adds a small new skill, model-router, that recommends the most cost-efficient LLM for a given task category, using benchmark data from the public OpenHands Index.

Inspired by the user observation that "Gemini is better for research cost-wise, Opus 4.7 for planning/analysis, DeepSeek Reasoner for heavy repetitive lifting, etc." This skill encodes those tradeoffs as a reusable lookup table the agent can consult when picking a model, configuring a sub-agent, or delegating a cloud conversation.

What the skill provides

For each OpenHands Index category, it surfaces a cost pick (best score-per-dollar on the Pareto frontier), a balanced pick, and a premium pick:

| Task type | Cost pick | Balanced pick | Premium pick |
|---|---|---|---|
| Research / Information gathering | Gemini-3.1-Pro ($0.12, 76.4) | claude-opus-4-6 ($0.44, 80.0) | GPT-5.5 ($0.74, 86.1) |
| Bug fixing / Issue resolution | Minimax-2.7 ($0.18, 75.6) | claude-opus-4-6 ($0.77, 76.8) | GPT-5.5 ($1.52, 78.2) |
| Planning / Greenfield | GPT-5.4 ($4.04, 56.2) | claude-opus-4-7 ($5.69, 56.2) | claude-opus-4-7 ($5.69, 56.2) |
| Frontend / UI | Gemini-3.1-Pro ($1.24, 44.1) | claude-opus-4-6 ($2.37, 41.8) | claude-opus-4-7 ($2.83, 48.5) |
| Testing | Minimax-2.7 ($0.13, 69.1) | claude-opus-4-6 ($0.43, 78.8) | GPT-5.5 ($0.92, 83.4) |
| Bulk repetitive work | DeepSeek-V3.2-Reasoner ($0.57) | Minimax-2.7 ($0.13-0.18) | n/a (escalate) |

Each entry shows the average cost per problem (USD) followed by the aggregate score, both taken from index.openhands.dev as of the May 2026 snapshot. The Bulk repetitive work row lists cost only.
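The cost-pick rule (highest score per dollar among entries on the Pareto frontier) can be sketched as below. This is a minimal illustration, not code from the skill: `Entry`, `dominates`, `pareto_frontier`, and `cost_pick` are hypothetical names, and only the three Frontend / UI entries from the table are used, whereas the real frontier on index.openhands.dev covers every model the index evaluates. The sketch derives only the cost pick; it does not attempt to reproduce the balanced or premium columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    model: str
    cost_usd: float  # average cost per problem (USD), must be > 0
    score: float     # aggregate score


def dominates(a: Entry, b: Entry) -> bool:
    """True if `a` is no more expensive and no worse than `b`, and strictly better on one axis."""
    no_worse = a.cost_usd <= b.cost_usd and a.score >= b.score
    strictly_better = a.cost_usd < b.cost_usd or a.score > b.score
    return no_worse and strictly_better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Keep only entries that no other entry dominates, sorted by cost."""
    frontier = [e for e in entries if not any(dominates(o, e) for o in entries)]
    return sorted(frontier, key=lambda e: e.cost_usd)


def cost_pick(entries: List[Entry]) -> Entry:
    """Pick the frontier entry with the highest score per dollar."""
    return max(pareto_frontier(entries), key=lambda e: e.score / e.cost_usd)


# Frontend / UI row from the table (May 2026 snapshot).
frontend = [
    Entry("Gemini-3.1-Pro", 1.24, 44.1),
    Entry("claude-opus-4-6", 2.37, 41.8),
    Entry("claude-opus-4-7", 2.83, 48.5),
]

print([e.model for e in pareto_frontier(frontend)])  # ['Gemini-3.1-Pro', 'claude-opus-4-7']
print(cost_pick(frontend).model)  # Gemini-3.1-Pro
```

Within this three-entry subset, claude-opus-4-6 is dominated (Gemini-3.1-Pro is both cheaper and higher scoring), so it drops off the frontier, and Gemini-3.1-Pro wins on score per dollar (about 35.6 points per dollar versus about 17.1).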

The skill also includes:

- Heuristics for when to escalate from the cost pick to the balanced or premium pick.
- An example mixed pipeline (research with Gemini, plan with Opus, implement with Minimax/Opus, test with Opus, repetitive cleanup with DeepSeek).
- Caveats about Pareto frontier vs. headline score, benchmark age, one-shot vs. multi-turn, open vs. closed weights, and greenfield being expensive everywhere.
- Direct links back to each category page on https://index.openhands.dev so users can verify or re-derive the picks.
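The decision table and the example mixed pipeline could be encoded roughly as follows. This is a hedged sketch under stated assumptions: `ROUTER`, `PIPELINE`, `pick`, `escalate`, and the category keys are hypothetical names; escalation here simply moves up one tier; and what should trigger an escalation (the skill's own heuristics in SKILL.md, which are not reproduced here) is left to the caller.

```python
from typing import Optional, Tuple

TIERS = ("cost", "balanced", "premium")

# Hypothetical encoding of the decision table.
# None stands for the table's "n/a (escalate)" entry.
ROUTER = {
    "research": {"cost": "Gemini-3.1-Pro", "balanced": "claude-opus-4-6", "premium": "GPT-5.5"},
    "bug_fixing": {"cost": "Minimax-2.7", "balanced": "claude-opus-4-6", "premium": "GPT-5.5"},
    "planning": {"cost": "GPT-5.4", "balanced": "claude-opus-4-7", "premium": "claude-opus-4-7"},
    "frontend": {"cost": "Gemini-3.1-Pro", "balanced": "claude-opus-4-6", "premium": "claude-opus-4-7"},
    "testing": {"cost": "Minimax-2.7", "balanced": "claude-opus-4-6", "premium": "GPT-5.5"},
    "bulk": {"cost": "DeepSeek-V3.2-Reasoner", "balanced": "Minimax-2.7", "premium": None},
}

# The example mixed pipeline as (stage, category, starting tier).
# "Implement with Minimax/Opus" is modeled as the bug-fixing cost pick,
# with Opus reached by escalating one tier.
PIPELINE = [
    ("research", "research", "cost"),
    ("plan", "planning", "balanced"),
    ("implement", "bug_fixing", "cost"),
    ("test", "testing", "balanced"),
    ("cleanup", "bulk", "cost"),
]


def pick(category: str, tier: str = "cost") -> Optional[str]:
    """Look up the model recommended for a category at a given tier."""
    return ROUTER[category][tier]


def escalate(category: str, tier: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (next tier, its model), or None when already at premium.

    The returned model is None where the table has no pick for that tier.
    """
    i = TIERS.index(tier)
    if i + 1 == len(TIERS):
        return None
    next_tier = TIERS[i + 1]
    return next_tier, ROUTER[category][next_tier]


for stage, category, tier in PIPELINE:
    print(f"{stage:<10} {pick(category, tier)}")

print(escalate("bug_fixing", "cost"))  # ('balanced', 'claude-opus-4-6')
```

Keeping the table as plain data makes it easy to re-derive from a newer index snapshot without touching the lookup logic.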

Files

- `skills/model-router/SKILL.md`: skill body with a progressive-disclosure description, the decision table, and heuristics.
- `skills/model-router/README.md`: human-facing notes.
- `marketplaces/openhands-extensions.json`: new catalog entry under productivity.
- `README.md`: auto-regenerated catalog section (via `scripts/sync_extensions.py`).

Validation

- `python scripts/sync_extensions.py --check` -> All extensions in sync. ✓
- `pytest tests/test_catalogs.py tests/test_skills_have_readme.py tests/test_sync_extensions.py` -> 38 passed.

Triggers

Keyword triggers only (no slash command, since this is reference content rather than a workflow):

`which model`, `model selection`, `pick a model`, `model router`, `cost efficient model`, `cheapest model`, `best model for`.
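Keyword triggering can be illustrated with a simple matcher. This sketch assumes case-insensitive substring matching against the user's message, which is a simplification: the actual matching rules belong to the OpenHands skill loader, not to this code, and `matching_trigger` is a hypothetical name.

```python
from typing import Optional, Sequence

# Trigger keywords listed in this PR.
TRIGGERS = (
    "which model",
    "model selection",
    "pick a model",
    "model router",
    "cost efficient model",
    "cheapest model",
    "best model for",
)


def matching_trigger(message: str, triggers: Sequence[str] = TRIGGERS) -> Optional[str]:
    """Return the first trigger found in the message (case-insensitive substring), else None."""
    text = message.lower()
    for trigger in triggers:
        if trigger in text:
            return trigger
    return None


print(matching_trigger("Which model should I use for research?"))  # which model
print(matching_trigger("I need a cost-efficient model"))  # None
```

Under plain substring matching, the hyphenated spelling "cost-efficient model" does not match the trigger `cost efficient model`, so spelling variants would need to be listed separately if the loader behaves this way.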

This pull request was created by an AI agent (OpenHands) on behalf of the user.


Adds a small skill that maps task categories (research, bug fixing, planning, frontend, testing, bulk repetitive work) to the most cost-efficient LLM according to the public OpenHands Index benchmark (https://index.openhands.dev). For each category the skill recommends a cost pick (best score-per-dollar on the Pareto frontier), a balanced pick, and a premium pick, along with usage heuristics and links to the per-category leaderboard.

Co-authored-by: openhands <openhands@all-hands.dev>

juanmichelini wants to merge 1 commit into main from add-model-router-skill

