
Add prompt-learning skill (RES-397)#2

Merged
currentlycodinng merged 6 commits into main from amina/res-397-implement-prompt-learning-in-orq-skills on Mar 12, 2026

Conversation

@currentlycodinng
Collaborator

Summary

  • New prompt-learning skill: automated feedback-driven prompt improvement via "If [TRIGGER] then [ACTION]" rules
  • Full 6-phase pipeline: Collect → Normalize → Meta-Prompt → Aggregate → Apply → Validate
  • Incorporates all RES-205 research findings: domain gating, multi-judge validation, model-specific configs, split model strategy
  • Bidirectional companion skill references with feedback-loop, optimize-prompt, run-experiment, build-evaluator, trace-analysis
  • Validation script (scripts/validate_prompt_learning.py) with 5 structural + research alignment checks

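The rule format and the first four pipeline phases can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation; the function and parameter names (`learn_rules`, `normalize`, `meta_prompt`) are hypothetical, with only the "If [TRIGGER] then [ACTION]" shape and the rule cap of 10 taken from the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A learned prompt-improvement rule: if TRIGGER then ACTION."""
    trigger: str
    action: str


def learn_rules(feedback, normalize, meta_prompt, max_rules=10):
    """Sketch of Collect -> Normalize -> Meta-Prompt -> Aggregate.

    `feedback` is the collected raw feedback (Phase 1), `normalize` and
    `meta_prompt` stand in for Phases 2 and 3 (the latter would call an
    LLM with the meta-prompt template to propose candidate rules).
    """
    normalized = [normalize(item) for item in feedback]      # Phase 2
    candidates = meta_prompt(normalized)                     # Phase 3
    # Phase 4: aggregate — drop duplicate rules, enforce the rule cap
    seen, rules = set(), []
    for rule in candidates:
        key = (rule.trigger, rule.action)
        if key not in seen:
            seen.add(key)
            rules.append(rule)
    return rules[:max_rules]
```

Phases 5 and 6 (Apply, Validate) would then rewrite the prompt using the surviving rules and A/B-test the result.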
Key RES-205 findings implemented

  • Domain gating: only works on focused domains (0/70 significant on broad tasks)
  • Multi-judge validation: single-judge overestimates by 40-60%, require 3+ diverse judges
  • Defaults: F=10, P=0, iterations=1, rule cap 10
  • Model tier: small models learn best (+40%), top-tier at ceiling (-3% to -10%)
  • Split model strategy: cheap learner + powerful generator (+20% win rate)
  • Freetext feedback: +26.7% vs +6.7% categorical
  • Model-specific configs: Claude/Gemini/GPT each have different optimal F/P/iterations

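The multi-judge requirement above amounts to a majority vote over at least three diverse judges before a revised prompt is declared a winner. A minimal sketch of that check (the function name and vote encoding are illustrative, not from the repo):

```python
def multi_judge_winner(judge_votes, min_judges=3):
    """Accept a revised prompt only if a strict majority of at least
    `min_judges` diverse judges prefer it over the baseline.

    `judge_votes` is a list of "revised" / "baseline" verdicts, one per
    judge. Requiring 3+ judges guards against the single-judge
    overestimation (40-60%) noted in RES-205.
    """
    if len(judge_votes) < min_judges:
        raise ValueError(f"need at least {min_judges} diverse judges")
    wins = sum(1 for vote in judge_votes if vote == "revised")
    return wins > len(judge_votes) / 2
```

With this gate, a 2-1 split in favor of the revision passes, a 1-2 split fails, and a single enthusiastic judge is never enough on its own.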
Still pending

  • Dataset scaling guidance (pending testing)
  • Experiment 6 (Multi-Generator) results from RES-205

Test plan

  • scripts/validate_prompt_learning.py — all 5 checks pass
  • ./scripts/publish.sh --check — generated artifacts up to date
  • All companion skills link back to prompt-learning
  • Manual testing of skill execution with real feedback data

🤖 Generated with Claude Code

currentlycodinng and others added 6 commits March 10, 2026 13:55
Implements the prompt-learning skill (RES-397) which automatically improves
prompts by collecting feedback, normalizing it, generating "If [TRIGGER] then
[ACTION]" rules via a meta-prompt, and validating with A/B experiments.

- Create SKILL.md with 6-phase pipeline (Collect → Normalize → Meta-Prompt →
  Aggregate → Apply → Validate) and research-validated defaults (f=10, p=3)
- Create resources/meta-prompt.md (v2 batch-mode template with issue taxonomy,
  worked example, and structured output format)
- Add bidirectional companion skill references (feedback-loop, optimize-prompt,
  run-experiment, build-evaluator, trace-analysis)
- Add validation script for structural/reference integrity checks
- Regenerate artifacts via publish.sh (AGENTS.md, README.md, plugins)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dation

- Remove duplicated Issue Taxonomy table from SKILL.md, reference
  meta-prompt.md as single source of truth instead
- Add missing "Prompt versioning" documentation link
- Mark defaults table as preliminary pending RES-205 results
- Move Template Usage Notes out of meta-prompt.md (not seen by LLM)
- Add cross-file consistency check to validation script (taxonomy tags,
  output sections, LEARNED_RULES references)
- Fix false positive in taxonomy tag regex matching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Per Anthropic docs, resource files are supplemental content that Claude
may skip loading. The meta-prompt is the core mechanism of this skill —
it must always be available. Inlined it directly into Phase 3 Step 7.

- Move meta-prompt template from resources/meta-prompt.md into SKILL.md
- Delete resources/ directory (no longer needed)
- Restore Issue Taxonomy table inline (single source of truth)
- Rewrite validation script for inline meta-prompt checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The meta-prompt is instructions for Claude to follow as part of the
skill, not a prompt to send to an external LLM. Updated wording to
make this clear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major changes based on completed research experiments:

- Domain gating: focused domains only (0% success on broad tasks)
- P=0 default: positives not required, P=0 outperforms P=3-5
- Iterations: 1 optimal (2 max), more causes prompt bloat
- Multi-judge validation mandatory: single-judge overestimates by 40-60%
- Model ceiling warning: top-tier models (>4.5/5) show no improvement
- Reference comparison anti-pattern: CriSPO-style refs make results worse
- Simplified meta-prompt: removed anchor check step (P=0), added
  expected_behavior to normalized representation, added severity mapping
- Added "When NOT to use" section
- Updated defaults table, anti-patterns, and validation script
- Removed "preliminary/pending" language — results are final

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ck guidance

RES-205 findings: model-family-specific defaults (Claude/Gemini/GPT/Other),
split model strategy (+20% vs +13%), freetext > categorical (+26.7% vs +6.7%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@linear

linear Bot commented Mar 11, 2026

Collaborator

@thedevtoni left a comment


LGTM

@currentlycodinng currentlycodinng merged commit 4f6b063 into main Mar 12, 2026
1 check passed
arianpasquali added a commit that referenced this pull request Apr 2, 2026
- Remove invalid `metadata` field from memory creation examples (issue #1)
- Fix memory document field name from `content` to `text` (issue #2)
- Add required `path` field to tool creation example (issue #3)
- Fix KB search param from `limit` to `top_k` (issue #4)
- Correct MCP URL to `https://my.orq.ai/v2/mcp` in run-experiment and
  generate-synthetic-dataset (issue #5)
- Replace non-existent `setup-observability` with `analyze-trace-failures`
  in compare-agents companion skills (issue #6)
- Update stale TOC entry in knowledge-base-management (issue #7)
- Add missing `path` and `type` to KB creation in run-experiment (issue #8)
- Add explicit `-X POST` to KB search curl commands (issue #9)
- Fix "two things" wording when listing three items (issue #10)
- Standardize agent model format to object style (issue #11)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
