quantsquirrel/claude-forge-smith

English | 한국어

Forge

⚔️ Forge your skills into legendary weapons

TDD-powered automatic skill evolution for Claude Code

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔥 The Forging Process

Every legendary weapon starts as raw material. Through heat, strikes, and tempering, ordinary metal becomes extraordinary.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
graph LR
    A["⚙️ RAW<br/>SKILL"] -->|"🔥 HEAT"| B["🔍 ANALYZE<br/>Structure"]
    B -->|"🔨 STRIKE"| C["⚡ EVOLVE<br/>Refine"]
    C -->|"💧 TEMPER"| D["✅ VERIFY<br/>Tests"]
    D -->|"⚔️"| E["✨ LEGENDARY"]

    style A fill:#2D1810,stroke:#A0A0A0,stroke-width:2px,color:#A0A0A0
    style B fill:#1A0A00,stroke:#FF6B00,stroke-width:3px,color:#FFB800
    style C fill:#1A0A00,stroke:#FFB800,stroke-width:3px,color:#FFD700
    style D fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style E fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:4px

The Forge never rests: each skill is heated in analysis, struck with improvements, tempered by tests, and emerges stronger.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Prerequisites

Before firing up the forge, ensure you have the required tools:

| Requirement | Version | Check |
| --- | --- | --- |
| Bash | 4.0+ | `bash --version` |
| Git | 2.0+ | `git --version` |
| Python 3 | 3.6+ | `python3 --version` |
| bc | any | `which bc` |
| jq | 1.6+ | `jq --version` |
| Claude Code CLI | latest | `claude --version` |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `CLAUDE_PLUGIN_ROOT` | (your plugin install directory) | Plugin installation path |
| `FORGE_EVALUATOR_CMD` | (built-in) | Custom evaluator script path |

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚑ Quick Start

# Install the forge
git clone https://github.com/quantsquirrel/claude-forge-smith.git \
  "$CLAUDE_PLUGIN_ROOT"

# Ignite the flames (slash command, run inside Claude Code)
/forge:forge --scan

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💎 Features

| 🔨 Forged in Fire | ⚡ Auto Evolution | 🛡️ Safe Trials | 📊 Triple Strike |
| --- | --- | --- | --- |
| Every change tested | 3× evaluation consensus | Original preserved | 95% CI validation |

🔀 Dual Forging Paths (v1.0)

Skills can be forged through two methods depending on material quality:

| Path | Condition | Technique |
| --- | --- | --- |
| ⚔️ TDD Forge | Test files exist | Statistical validation (95% CI) |
| 🔥 Pattern Forge | No tests | Usage patterns + heuristic analysis |

# Check forging method
source hooks/lib/storage-local.sh
get_upgrade_mode "my-skill"  # Returns: TDD_FIT or HEURISTIC
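
The choice between the two paths can be sketched in Python as a simple check for test files. This is only an illustration, not the logic inside `hooks/lib/storage-local.sh`: the function name `choose_upgrade_mode` and the rule for what counts as a test file (anything named `test_*` or stored under a `tests/` directory) are assumptions made for the example.

```python
from pathlib import Path


def choose_upgrade_mode(skill_dir):
    """Return "TDD_FIT" when the skill ships test files, else "HEURISTIC".

    Assumption (not taken from the Forge source): a test file is any file
    named test_* or any file located under a tests/ directory.
    """
    root = Path(skill_dir)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Directory components between the skill root and the file itself.
        parent_dirs = path.relative_to(root).parts[:-1]
        if "tests" in parent_dirs or path.name.startswith("test_"):
            return "TDD_FIT"
    return "HEURISTIC"
```

A skill directory that contains only its description file would take the Pattern Forge path under this rule; adding a single file under `tests/` would switch it to the TDD Forge path.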

📊 Forge Monitor (v1.0)

Track your weapons and see which need reforging:

/monitor [--priority=HIGH|MED|LOW] [--type=explicit|silent|all]

Output:

╔══════════════════════════════════════════════════════════════════════╗
║                           🔥 Forge Monitor                           ║
╠══════════════════════════════════════════════════════════════════════╣
║ Quality Analysis (quality-based, independent of usage)               ║
╠════════════════════════╤══════════╤═══════╤══════════╤═══════════════╣
║ Skill                  │ Type     │ Score │ Grade    │ Priority      ║
╠════════════════════════╪══════════╪═══════╪══════════╪═══════════════╣
║ omc:git-master         │ silent   │   45  │ C        │ [HIGH] ⚡     ║
║ forge:forge            │ explicit │   90  │ A        │ [READY] ✓     ║
╚════════════════════════╧══════════╧═══════╧══════════╧═══════════════╝

⚔️ Skill Type Detection (v1.0)

Skills are classified by how they're invoked:

| Type | Description | Quality Criteria |
| --- | --- | --- |
| explicit | User invokes with `/command` | argument-hint, mode docs, examples |
| silent | Auto-triggered by context | trigger keywords, when-to-use, red-flags |

# Check skill type
source hooks/lib/storage-local.sh
get_skill_type "my-skill"  # Returns: explicit | silent

📈 Quality-Based Recommendations (v1.0)

Core Principle: Usage ≠ Quality

The forge evaluates skills by structure, not popularity:

| Priority | Score | Action |
| --- | --- | --- |
| HIGH | < 40 | Immediate reforging needed |
| MED | 40-59 | Improvement recommended |
| LOW | 60-79 | Optional enhancement |
| READY | ≥ 80 | Quality assured |

# Get quality score
get_skill_quality_score "my-skill"
# Returns: JSON with score, breakdown, grade (A/B/C/D)
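
The score bands in the priority table translate directly into a small lookup. The Python sketch below shows one way to consume the JSON returned by `get_skill_quality_score`; the function name `priority_for_score` is hypothetical, and the exact JSON key names (`score`, `grade`, `breakdown`) are assumed from the comment above rather than confirmed against the script.

```python
import json


def priority_for_score(score):
    """Map a 0-100 quality score to the priority bands in the table."""
    if score < 40:
        return "HIGH"
    if score < 60:
        return "MED"
    if score < 80:
        return "LOW"
    return "READY"


# Hypothetical payload: the README only says the JSON carries a score,
# a breakdown, and a grade, so the key names used here are assumptions.
raw = '{"score": 90, "grade": "A", "breakdown": {}}'
result = json.loads(raw)
print(result["grade"], priority_for_score(result["score"]))  # prints: A READY
```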

🎖️ Legendary Grades (v1.0)

Exceptional weapons earn special marks:

| Enhancement | Bonus | Forged When |
| --- | --- | --- |
| Reforged | +1 | `upgraded: true` |
| Efficient | +0.5 | tokens/usage < 1500 |
| Hot Streak | +0.5 | positive trend |
| Tested | +0.5 | has test files |

S + Reforged + Efficient = ★★★ SSS LEGENDARY

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🛡️ Trial Branch: The Safe Anvil

Master smiths never work directly on the masterpiece. They test on trial pieces first.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
flowchart TB
    subgraph MAIN["⚔️ main (Master Weapon)"]
        direction LR
        C1["v0.6<br/>71pts"]
        C2["v0.7<br/>90pts"]
        C1 -.-> C2
    end

    subgraph TRIAL["🔥 trial/skill-name (Testing Anvil)"]
        direction LR
        T1["🔨 Strike"]
        T2["🔨 Strike"]
        T3["🔨 Strike"]
        T4{"Worthy?"}
        T1 --> T2 --> T3 --> T4
    end

    C1 -->|"fork"| T1
    T4 -->|"✅ Stronger"| C2
    T4 -->|"❌ Brittle"| D["🗑️ Discard"]

    style C1 fill:#2D1810,stroke:#FFD700,stroke-width:2px,color:#FFD700
    style C2 fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:3px
    style T1 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T2 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T3 fill:#1A0A00,stroke:#FF6B00,stroke-width:2px,color:#FFB800
    style T4 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style D fill:#1A0A00,stroke:#A0A0A0,stroke-width:1px,color:#A0A0A0

Safety First: the master weapon (`main`) is never touched until the trial proves worthy. Failed experiments are discarded, not merged.
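
The fork, trial, and merge-or-discard flow in the diagram maps onto a handful of ordinary git commands. The Python sketch below only builds the command list and runs nothing. The `trial/<skill>` branch name comes from the diagram; the function name `trial_branch_plan` and the specific merge and delete flags are assumptions, not a description of what Forge actually executes.

```python
def trial_branch_plan(skill, accepted, base="main"):
    """Return the git commands for one trial, as argument lists.

    Nothing is executed here; the caller decides how to run them.
    """
    trial = f"trial/{skill}"
    plan = [
        ["git", "checkout", "-b", trial, base],  # fork the trial from main
        # ... candidate edits and evaluations happen on the trial branch ...
        ["git", "checkout", base],               # return to main untouched
    ]
    if accepted:
        plan.append(["git", "merge", "--no-ff", trial])  # keep the result
        plan.append(["git", "branch", "-d", trial])
    else:
        plan.append(["git", "branch", "-D", trial])  # discard, never merged
    return plan


for cmd in trial_branch_plan("my-skill", accepted=False):
    print(" ".join(cmd))
```

Returning a plan instead of shelling out keeps the sketch easy to inspect: in the rejected case the list contains no `merge` command at all, which is the property the safety claim depends on.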

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔨 Triple Strike: The Smith's Consensus

A single hammer blow can deceive. Three strikes reveal the truth.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#2D1810',
  'primaryTextColor': '#FFD700',
  'primaryBorderColor': '#FF6B00',
  'lineColor': '#FFB800',
  'secondaryColor': '#1A0A00',
  'tertiaryColor': '#1A0A00'
}}}%%
flowchart LR
    subgraph STRIKE["🔨 Triple Strike Evaluation"]
        direction TB
        S1["🔨 Smith 1<br/>Score: 78"]
        S2["🔨 Smith 2<br/>Score: 81"]
        S3["🔨 Smith 3<br/>Score: 79"]
    end

    subgraph MEASURE["⚖️ Measure Quality"]
        direction TB
        M1["Mean: 79.3"]
        M2["95% Confidence"]
    end

    subgraph VERDICT["⚔️ Final Verdict"]
        V1{"Stronger than<br/>before?"}
        V1 -->|"YES"| ACCEPT["✅ REFORGE"]
        V1 -->|"NO"| REJECT["❌ DISCARD"]
    end
    end

    STRIKE --> MEASURE --> VERDICT

    style S1 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style S2 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style S3 fill:#1A0A00,stroke:#FFB800,stroke-width:2px,color:#FFD700
    style M1 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style M2 fill:#2D1810,stroke:#FF6B00,stroke-width:2px,color:#FFD700
    style ACCEPT fill:#FFD700,stroke:#FFD700,color:#1A0A00,stroke-width:3px
    style REJECT fill:#1A0A00,stroke:#A0A0A0,stroke-width:1px,color:#A0A0A0

Statistical Consensus: three independent evaluations produce a mean score and a 95% confidence interval. The new version is merged only if it is measurably stronger than the original.
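
To make the numbers in the diagram concrete, here is a minimal Python sketch of a mean and 95% confidence interval over three scores. The README does not say how Forge computes its interval, so the use of Student's t distribution and the acceptance rule (the whole interval must sit above the old score) are assumptions for illustration; the names `confidence_interval_95` and `is_stronger` are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRITICAL_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def confidence_interval_95(scores):
    """Return (mean, lower, upper) for a small sample of evaluator scores."""
    n = len(scores)
    if n - 1 not in T_CRITICAL_95:
        raise ValueError("this sketch supports 2 to 6 scores")
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(n)
    margin = T_CRITICAL_95[n - 1] * std_err
    return mean, mean - margin, mean + margin


def is_stronger(scores, baseline):
    """Assumed rule: accept only if the whole interval beats the baseline."""
    _, lower, _ = confidence_interval_95(scores)
    return lower > baseline


mean, lower, upper = confidence_interval_95([78, 81, 79])
print(f"mean={mean:.1f} ci=({lower:.1f}, {upper:.1f})")  # mean=79.3 ci=(75.5, 83.1)
print(is_stronger([78, 81, 79], baseline=71))            # True
```

With only three samples the interval is wide (roughly ±3.8 points here), which is why a single high score is not enough to justify a merge under this rule.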

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Forging Results

Before: 71 points (raw, unrefined)

After: 90.33 points (tempered, legendary)

+27% improvement: Forge reforged itself
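
The improvement figure is the relative gain over the starting score, which a short Python check reproduces (the helper name `improvement_percent` is just for this example):

```python
def improvement_percent(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100


print(f"{improvement_percent(71, 90.33):.1f}%")  # prints 27.2%
```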

The ultimate test: A tool that improves itself through its own process.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔒 Safety Mechanisms

Master smiths build in multiple safeguards:

| Safeguard | Protection |
| --- | --- |
| 🔄 Rollback Ready | Original always preserved |
| 🔒 Isolated Trials | Test in separate branch |
| 📝 Full Logs | Every strike recorded |
| ⏱️ Iteration Limit | Maximum 6 attempts |
| ✅ Test Verification | All tests must pass |

No weapon leaves the forge untested. No master version is ever corrupted.
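
Taken together, the safeguards describe a bounded retry loop. The Python sketch below shows how they might combine. It is not Forge's implementation: `forge`, `propose`, `tests_pass`, and `score` are hypothetical names, and the three callables stand in for whatever generates, tests, and evaluates a candidate.

```python
MAX_ATTEMPTS = 6  # iteration limit from the safeguards table


def forge(original, propose, tests_pass, score):
    """Try up to MAX_ATTEMPTS candidates; return the original if none wins.

    propose, tests_pass and score are caller-supplied callables
    (hypothetical hooks for this sketch).
    """
    baseline = score(original)
    log = []  # every strike is recorded
    for attempt in range(1, MAX_ATTEMPTS + 1):
        candidate = propose(original, attempt)
        ok = tests_pass(candidate)
        points = score(candidate) if ok else None
        log.append((attempt, ok, points))
        if ok and points > baseline:
            return candidate, log
    return original, log  # rollback: the original is always preserved
```

A candidate that fails its tests is never scored, and running out of attempts hands back the untouched original along with the full log.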

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🚀 Commands

| Command | Action |
| --- | --- |
| `/forge:forge --scan` | 🔍 Scout for skills ready to reforge |
| `/forge:forge <skill>` | ⚡ Reforge a specific skill |
| `/forge:forge --history` | 📜 View forging chronicles |
| `/forge:forge --watch` | 👁️ Monitor the forge |
| `/forge:monitor` | 📊 Quality dashboard |
| `/forge:smelt` | 🔥 Skill creation with TDD methodology |

💡 Argument Hints (v1.0)

When typing a slash command, you'll see available modes:

/forge <skill-name> [--precision=high|-n5] - modes: TDD_FIT|HEURISTIC
/monitor [--priority=HIGH|MED|LOW] [--type=explicit|silent|all]

Add argument-hint to your skill's frontmatter to enable this feature.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚙️ Configuration

Forge behavior can be customized via config/settings.env:

| Setting | Default | Description |
| --- | --- | --- |
| `STORAGE_MODE` | `local` | Storage backend (currently only `local` is supported) |
| `LOCAL_STORAGE_DIR` | `~/.claude/.skill-evaluator` | Local storage directory for skill data |
| `SKILL_EVAL_DEBUG` | `false` | Enable debug logging to stderr |

Example:

# Enable debug mode
export SKILL_EVAL_DEBUG=true

# Use custom storage location
export LOCAL_STORAGE_DIR="$HOME/.my-forge-data"
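
If you script around Forge from Python, the same settings can be read from the environment with the documented defaults as fallbacks. This is a sketch under that assumption only: `load_settings` is a hypothetical helper, and Forge's own shell hooks may resolve `config/settings.env` differently.

```python
import os

# Defaults copied from the settings table above.
DEFAULTS = {
    "STORAGE_MODE": "local",
    "LOCAL_STORAGE_DIR": "~/.claude/.skill-evaluator",
    "SKILL_EVAL_DEBUG": "false",
}


def load_settings(env=None):
    """Merge environment overrides onto the documented defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Expand a leading ~ so the path can be used directly.
    settings["LOCAL_STORAGE_DIR"] = os.path.expanduser(settings["LOCAL_STORAGE_DIR"])
    # Treat only the literal string "true" (any case) as enabled.
    settings["SKILL_EVAL_DEBUG"] = settings["SKILL_EVAL_DEBUG"].lower() == "true"
    return settings


print(load_settings({"SKILL_EVAL_DEBUG": "true"}))
```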

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔧 Troubleshooting

Common Issues

bc: command not found

# macOS
brew install bc

# Ubuntu/Debian
sudo apt-get install bc

# Fedora/RHEL
sudo dnf install bc

jq: command not found

# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Fedora/RHEL
sudo dnf install jq

Permission denied when running commands

# Make scripts executable
cd "$CLAUDE_PLUGIN_ROOT"
chmod +x hooks/*.sh
chmod +x bin/*

Plugin not detected by Claude Code

  1. Check installation path matches CLAUDE_PLUGIN_ROOT
  2. Verify plugin.json exists in the plugin root
  3. Restart Claude Code CLI
  4. Run /help to see if Forge commands appear

Forge evaluations fail silently

# Enable debug logging
export SKILL_EVAL_DEBUG=true

# Check storage directory exists
ls -la ~/.claude/.skill-evaluator

# Verify evaluator script is executable
ls -la "$CLAUDE_PLUGIN_ROOT/bin/skill-evaluator.py"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📚 The Theory Behind the Forge

- Gödel Machines (Schmidhuber 2007): self-referential systems that can improve their own code
- Dynamic Adaptation: incremental evolution with statistical validation
- TDD Safety Boundaries: tests prevent catastrophic self-modification
- Multi-Evaluator Consensus: multiple independent judges reduce bias

Read the full theory →

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Inspired by skill-up

⚒️ Forged with Claude Code · 🔥 MIT License · ⚔️ v1.0

This project is not affiliated with or endorsed by Anthropic. Claude and Claude Code are trademarks of Anthropic PBC.