
Add prompt-learning skill (RES-397)#2

Merged
currentlycodinng merged 6 commits into main from amina/res-397-implement-prompt-learning-in-orq-skills on Mar 12, 2026

Conversation

@currentlycodinng
Collaborator

Summary

  • New prompt-learning skill: automated feedback-driven prompt improvement via "If [TRIGGER] then [ACTION]" rules
  • Full 6-phase pipeline: Collect → Normalize → Meta-Prompt → Aggregate → Apply → Validate
  • Incorporates all RES-205 research findings: domain gating, multi-judge validation, model-specific configs, split model strategy
  • Bidirectional companion skill references with feedback-loop, optimize-prompt, run-experiment, build-evaluator, trace-analysis
  • Validation script (scripts/validate_prompt_learning.py) with 5 structural + research alignment checks

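The rule format and the first four pipeline phases can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation; the function and parameter names (`learn_rules`, `normalize`, `meta_prompt`) are hypothetical, with only the "If [TRIGGER] then [ACTION]" shape and the rule cap of 10 taken from the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A learned prompt-improvement rule: if TRIGGER then ACTION."""
    trigger: str
    action: str


def learn_rules(feedback, normalize, meta_prompt, max_rules=10):
    """Sketch of Collect -> Normalize -> Meta-Prompt -> Aggregate.

    `feedback` is the collected raw feedback (Phase 1), `normalize` and
    `meta_prompt` stand in for Phases 2 and 3 (the latter would call an
    LLM with the meta-prompt template to propose candidate rules).
    """
    normalized = [normalize(item) for item in feedback]      # Phase 2
    candidates = meta_prompt(normalized)                     # Phase 3
    # Phase 4: aggregate — drop duplicate rules, enforce the rule cap
    seen, rules = set(), []
    for rule in candidates:
        key = (rule.trigger, rule.action)
        if key not in seen:
            seen.add(key)
            rules.append(rule)
    return rules[:max_rules]
```

Phases 5 and 6 (Apply, Validate) would then rewrite the prompt using the surviving rules and A/B-test the result.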
Key RES-205 findings implemented

  • Domain gating: only works on focused domains (0/70 significant on broad tasks)
  • Multi-judge validation: single-judge overestimates by 40-60%, require 3+ diverse judges
  • Defaults: F=10, P=0, iterations=1, rule cap 10
  • Model tier: small models learn best (+40%), top-tier at ceiling (-3% to -10%)
  • Split model strategy: cheap learner + powerful generator (+20% win rate)
  • Freetext feedback: +26.7% vs +6.7% categorical
  • Model-specific configs: Claude/Gemini/GPT each have different optimal F/P/iterations

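The multi-judge requirement above amounts to a majority vote over at least three diverse judges before a revised prompt is declared a winner. A minimal sketch of that check (the function name and vote encoding are illustrative, not from the repo):

```python
def multi_judge_winner(judge_votes, min_judges=3):
    """Accept a revised prompt only if a strict majority of at least
    `min_judges` diverse judges prefer it over the baseline.

    `judge_votes` is a list of "revised" / "baseline" verdicts, one per
    judge. Requiring 3+ judges guards against the single-judge
    overestimation (40-60%) noted in RES-205.
    """
    if len(judge_votes) < min_judges:
        raise ValueError(f"need at least {min_judges} diverse judges")
    wins = sum(1 for vote in judge_votes if vote == "revised")
    return wins > len(judge_votes) / 2
```

With this gate, a 2-1 split in favor of the revision passes, a 1-2 split fails, and a single enthusiastic judge is never enough on its own.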
Still pending

  • Dataset scaling guidance (pending testing)
  • Experiment 6 (Multi-Generator) results from RES-205

Test plan

  • scripts/validate_prompt_learning.py — all 5 checks pass
  • ./scripts/publish.sh --check — generated artifacts up to date
  • All companion skills link back to prompt-learning
  • Manual testing of skill execution with real feedback data

🤖 Generated with Claude Code

currentlycodinng and others added 6 commits March 10, 2026 13:55
Implements the prompt-learning skill (RES-397) which automatically improves
prompts by collecting feedback, normalizing it, generating "If [TRIGGER] then
[ACTION]" rules via a meta-prompt, and validating with A/B experiments.

- Create SKILL.md with 6-phase pipeline (Collect → Normalize → Meta-Prompt →
  Aggregate → Apply → Validate) and research-validated defaults (f=10, p=3)
- Create resources/meta-prompt.md (v2 batch-mode template with issue taxonomy,
  worked example, and structured output format)
- Add bidirectional companion skill references (feedback-loop, optimize-prompt,
  run-experiment, build-evaluator, trace-analysis)
- Add validation script for structural/reference integrity checks
- Regenerate artifacts via publish.sh (AGENTS.md, README.md, plugins)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dation

- Remove duplicated Issue Taxonomy table from SKILL.md, reference
  meta-prompt.md as single source of truth instead
- Add missing "Prompt versioning" documentation link
- Mark defaults table as preliminary pending RES-205 results
- Move Template Usage Notes out of meta-prompt.md (not seen by LLM)
- Add cross-file consistency check to validation script (taxonomy tags,
  output sections, LEARNED_RULES references)
- Fix false positive in taxonomy tag regex matching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Per Anthropic docs, resource files are supplemental content that Claude
may skip loading. The meta-prompt is the core mechanism of this skill —
it must always be available. Inlined it directly into Phase 3 Step 7.

- Move meta-prompt template from resources/meta-prompt.md into SKILL.md
- Delete resources/ directory (no longer needed)
- Restore Issue Taxonomy table inline (single source of truth)
- Rewrite validation script for inline meta-prompt checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The meta-prompt is instructions for Claude to follow as part of the
skill, not a prompt to send to an external LLM. Updated wording to
make this clear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major changes based on completed research experiments:

- Domain gating: focused domains only (0% success on broad tasks)
- P=0 default: positives not required, P=0 outperforms P=3-5
- Iterations: 1 optimal (2 max), more causes prompt bloat
- Multi-judge validation mandatory: single-judge overestimates by 40-60%
- Model ceiling warning: top-tier models (>4.5/5) show no improvement
- Reference comparison anti-pattern: CriSPO-style refs make results worse
- Simplified meta-prompt: removed anchor check step (P=0), added
  expected_behavior to normalized representation, added severity mapping
- Added "When NOT to use" section
- Updated defaults table, anti-patterns, and validation script
- Removed "preliminary/pending" language — results are final

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ck guidance

RES-205 findings: model-family-specific defaults (Claude/Gemini/GPT/Other),
split model strategy (+20% vs +13%), freetext > categorical (+26.7% vs +6.7%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@linear

linear Bot commented Mar 11, 2026

Collaborator

@thedevtoni left a comment


LGTM

@currentlycodinng currentlycodinng merged commit 4f6b063 into main Mar 12, 2026
1 check passed
arianpasquali added a commit that referenced this pull request Apr 2, 2026
- Remove invalid `metadata` field from memory creation examples (issue #1)
- Fix memory document field name from `content` to `text` (issue #2)
- Add required `path` field to tool creation example (issue #3)
- Fix KB search param from `limit` to `top_k` (issue #4)
- Correct MCP URL to `https://my.orq.ai/v2/mcp` in run-experiment and
  generate-synthetic-dataset (issue #5)
- Replace non-existent `setup-observability` with `analyze-trace-failures`
  in compare-agents companion skills (issue #6)
- Update stale TOC entry in knowledge-base-management (issue #7)
- Add missing `path` and `type` to KB creation in run-experiment (issue #8)
- Add explicit `-X POST` to KB search curl commands (issue #9)
- Fix "two things" wording when listing three items (issue #10)
- Standardize agent model format to object style (issue #11)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
