This is a technical specification for AI coding tools (Claude Code, ChatGPT Codex, Cursor, etc.) to follow when developing APPs – the pipeline-driven evolution of LLM Skills.
Design Principles
APP is the evolution of a Skill. A Skill is fully driven by an LLM dynamically; an APP codifies most known execution paths into Pipelines (deterministic code), falling back to Skill mode (LLM dynamic orchestration) only for unknown paths.
An APP consists of three parts:
- SKILL.md – Brain mode. Used only when no Pipeline matches; SKILL.md does not serve as the Runner's dispatch logic
- Pipeline – Muscle-memory mode. Deterministic code controls the flow; the LLM is invoked only during llm-type steps
- Pipeline Runner – A standalone program responsible for routing, scheduling, and data passing. The Runner is not an LLM, nor text instructions inside SKILL.md
The routing layer is built automatically by the Pipeline Runner: on startup it scans all pipeline.yaml files, collects description + triggers, and builds a routing table for intent classification.
Inspired by C++ constructors/destructors, the Runner enforces execution of special Pipelines before and after every business Pipeline:
- Constructor Pipeline – Runs before the first step of the business Pipeline. Used for config checks, environment initialization, git pull, etc. If the constructor fails, the business Pipeline does not execute; the Runner returns an error immediately
- Destructor Pipeline – Runs after the last step of the business Pipeline (regardless of success or failure, like `finally`). Used for state cleanup, logging, and temp file removal
Constructor and destructor Pipelines are a framework-level guarantee: they do not rely on LLM judgment or on business steps remembering to call them.
```
my-app/
├── SKILL.md                      # Brain-mode instructions (pure fallback, no Runner logic)
├── pipelines/
│   ├── _constructor/             # Constructor Pipeline (optional, auto-detected by Runner)
│   │   ├── pipeline.yaml
│   │   └── steps/
│   │       └── check_config.py
│   ├── _destructor/              # Destructor Pipeline (optional, auto-detected by Runner)
│   │   ├── pipeline.yaml
│   │   └── steps/
│   │       └── cleanup.py
│   ├── <pipeline-name>/          # Business Pipeline
│   │   ├── pipeline.yaml         # Pipeline definition: step declarations + routing info
│   │   ├── schemas/              # JSON Schemas for LLM step outputs
│   │   │   └── <step-name>.json
│   │   └── steps/                # Step scripts (any language)
│   │       ├── <step-name>.py
│   │       ├── <step-name>.sh
│   │       └── validate_<step>.py  # Content validation script (optional)
│   └── <pipeline-name>/
│       └── ...
└── tools/                        # Shared utilities (used by both Pipelines and Skill mode)
```
Conventions:
- One directory per Pipeline; directory name = Pipeline name
- `_constructor/` and `_destructor/` are reserved directory names (prefixed with `_`); the Runner auto-detects them and excludes them from the routing table
- `pipeline.yaml` is the single source of truth for each Pipeline, containing both step definitions and routing info (description + triggers)
- `schemas/` holds JSON Schemas for LLM steps; filenames correspond to step names
- `steps/` holds step scripts in any language
- `tools/` is a shared utility directory, available to both Pipeline and Skill mode
- No centralized index file needed: the Runner auto-scans `pipelines/*/pipeline.yaml` to build the routing table (skipping `_`-prefixed directories)
pipeline.yaml is the single source of truth for each Pipeline, defining both execution steps and routing information. The Runner auto-scans all pipelines/*/pipeline.yaml on startup, collecting description + triggers to build the routing table; no manual index file is required.
```yaml
name: code-review
description: "Standard code review: fetch diff, analyze per-file, generate report"
triggers:
  - "review this PR"
  - "code review"
  - "check this diff"
input:
  repo: string
  pr_number: integer
steps:
  - name: fetch_diff
    type: code
    command: "bash steps/fetch_diff.sh"
  - name: analyze_files
    type: llm
    model: standard
    prompt: |
      Analyze the following code changes, identify potential issues per file:
      {{fetch_diff.output}}
    schema: schemas/analyze_files.json
    validate: steps/validate_analysis.py
    retry: 3
  - name: calc_stats
    type: code
    command: "python3 steps/calc_stats.py"
  - name: gen_report
    type: code
    command: "python3 steps/gen_report.py"
output: gen_report
```

Field Reference:
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Pipeline name, must match directory name |
| `description` | Yes | Pipeline description, also used by the routing model for intent matching |
| `triggers` | No | Example trigger phrases to help the routing model match (reference examples, not exact keywords) |
| `input` | No | Input parameter definitions (`name: type`) |
| `steps` | Yes | Ordered list of steps to execute |
| `output` | No | Which step's output serves as the Pipeline's final output; defaults to the last step |
Routing mechanism: The Pipeline Runner sends the user's request along with all Pipelines' description + triggers to the routing model (default: lite tier) for intent classification. Match → execute the Pipeline; no match → fall back to Skill mode (SKILL.md dynamic orchestration).
A code step is deterministic code, executed by the Pipeline Runner as a subprocess.
```yaml
- name: fetch_diff
  type: code
  command: "bash steps/fetch_diff.sh"
```

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Step name, unique within the Pipeline |
| `type` | Yes | Must be `code` |
| `command` | Yes | Command to execute; can be a script in any language |
Data Passing Convention:
The Runner passes JSON to the script via stdin:
```json
{
  "input": {
    "repo": "owner/repo",
    "pr_number": 42
  },
  "steps": {
    "fetch_diff": {
      "output": { "files": ["..."], "diff": "..." }
    }
  }
}
```

- `input` – The Pipeline's original input parameters
- `steps` – Outputs from all completed preceding steps
The script returns JSON via stdout:
```json
{
  "output": { "..." : "..." }
}
```

The only contract: stdin JSON in, stdout JSON out. The implementation language doesn't matter – Python, Shell, Node.js, Java (java -jar), a compiled Go binary – as long as this contract is respected.
Error handling: A non-zero exit code signals failure; the Runner terminates the Pipeline and returns the error.
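For illustration, a minimal `calc_stats` step in Python that honors the contract. The `issues`/`severity` shape of the upstream analysis output is hypothetical, not mandated by the spec, and the usual `if __name__ == "__main__": sys.exit(main())` entry point is elided for brevity.

```python
#!/usr/bin/env python3
# steps/calc_stats.py - counts issues found by the preceding analyze_files step.
# Contract: stdin JSON in, stdout JSON out; non-zero exit aborts the Pipeline.
import json
import sys

def main(stdin=sys.stdin, stdout=sys.stdout) -> int:
    ctx = json.load(stdin)                        # Runner-provided context
    analysis = ctx["steps"]["analyze_files"]["output"]
    issues = analysis.get("issues", [])           # hypothetical output shape
    stats = {"total_issues": len(issues), "by_severity": {}}
    for issue in issues:
        sev = issue.get("severity", "unknown")
        stats["by_severity"][sev] = stats["by_severity"].get(sev, 0) + 1
    json.dump({"output": stats}, stdout)          # contract: stdout JSON
    return 0                                      # 0 = success
```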
An llm step is an LLM invocation, handled directly by the Pipeline Runner (no subprocess).
```yaml
- name: analyze_files
  type: llm
  model: standard
  prompt: |
    Analyze the following code changes, identify potential issues per file:
    {{fetch_diff.output}}
  schema: schemas/analyze_files.json
  validate: steps/validate_analysis.py
  retry: 3
```

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Step name |
| `type` | Yes | Must be `llm` |
| `model` | No | Model tier (see Section 5); defaults to `standard` |
| `prompt` | Yes | Prompt template; supports `{{step_name.output}}` to reference preceding step outputs |
| `schema` | No | JSON Schema file path for structural validation |
| `validate` | No | Content validation script path |
| `retry` | No | Max retries on validation failure; defaults to 2 |
Execution flow:
1. Render the prompt template (replace `{{}}` variables with preceding step outputs)
2. Call the LLM API (with the schema for structured output)
3. Structural validation (JSON Schema)
   → Fail → retry with error message
4. Content validation (run the validation script, if configured)
   → Fail → retry with error message
5. All pass → output the result, proceed to the next step
6. Retries exhausted → error, terminate the Pipeline
Prompt Template Syntax:
- `{{step_name.output}}` – Full output of the specified step (JSON-serialized to string)
- `{{step_name.output.field}}` – A specific field from the specified step's output
- `{{input.param}}` – A Pipeline input parameter
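The syntax is small enough that a stdlib-only renderer covers it. A minimal sketch, not a normative implementation:

```python
import json
import re

def render_template(template: str, ctx: dict) -> str:
    """ctx = {"input": {...}, "steps": {"<name>": {"output": ...}}}."""
    def resolve(match: re.Match) -> str:
        path = match.group(1).strip().split(".")
        # {{input.param}} reads from the Pipeline input; anything else is a step name
        value = ctx["input"] if path[0] == "input" else ctx["steps"][path[0]]
        for key in path[1:]:
            value = value[key]
        # full objects are JSON-serialized to string; string scalars pass through
        return value if isinstance(value, str) else json.dumps(value)
    return re.sub(r"\{\{(.*?)\}\}", resolve, template)
```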
Content validation scripts follow the same stdin/stdout convention as code steps.
The Runner passes via stdin:
```json
{
  "output": { "..." : "..." },
  "input": { "..." : "..." },
  "steps": { "..." : "..." }
}
```

- `output` – The current LLM step's output (to be validated)
- `input` – The Pipeline's original input
- `steps` – Preceding step outputs (for cross-validation)
Pass: exit code 0, stdout {"valid": true}
Fail: non-zero exit code, stdout with error details (injected into the retry prompt):
```json
{
  "valid": false,
  "errors": [
    "Analyzed 12 files but the diff only has 8",
    "Critical issue #3 lacks a specific description"
  ]
}
```

Validation scripts are optional. Without a `validate` field, only JSON Schema structural validation is performed.
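A hypothetical `validate_analysis.py` for the code-review Pipeline might cross-check the analyzed files against the fetched diff. The `files`/`path` field names are illustrative assumptions, and the `if __name__ == "__main__"` entry point is elided:

```python
#!/usr/bin/env python3
# steps/validate_analysis.py - content validator following the same
# stdin/stdout convention as code steps. Non-zero exit = validation failed.
import json
import sys

def main(stdin=sys.stdin, stdout=sys.stdout) -> int:
    ctx = json.load(stdin)
    analyzed = {f["path"] for f in ctx["output"].get("files", [])}
    diff_files = set(ctx["steps"]["fetch_diff"]["output"].get("files", []))
    errors = []
    missing = diff_files - analyzed
    extra = analyzed - diff_files
    if missing:
        errors.append(f"Files in diff but not analyzed: {sorted(missing)}")
    if extra:
        errors.append(f"Files analyzed but not in diff: {sorted(extra)}")
    if errors:
        json.dump({"valid": False, "errors": errors}, stdout)  # fed to retry prompt
        return 1
    json.dump({"valid": True}, stdout)
    return 0
```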
APPs do not bind to specific model names. Pipelines declare the required capability level via tiers; the runtime maps tiers to actual models.
| Tier | Purpose | Typical Use Cases |
|---|---|---|
| `lite` | Lightweight and fast; good at classification and simple extraction | Routing, parameter extraction, format conversion |
| `standard` | Balanced capability and cost; general purpose | Semantic analysis, content generation, code understanding |
| `reasoning` | Deep reasoning with extended thinking | Complex judgment, compliance review, security analysis |
Default: LLM steps without a model field default to standard.
Runtime mapping example (not part of the spec; configured by the runtime):
```yaml
# Runtime config example (not part of the APP spec)
model_mapping:
  lite: claude-haiku-4-5
  standard: claude-sonnet-4-6
  reasoning: claude-opus-4-6
```

The Pipeline Runner is the APP's runtime. It must be a standalone program (not text instructions in SKILL.md for the LLM to follow step-by-step). The LLM is only involved when an llm-type step is reached; the Runner operates autonomously the rest of the time.
- Discovery – Scan `pipelines/*/pipeline.yaml` (skipping `_`-prefixed directories), collect all business Pipeline definitions and routing info (description + triggers), and build the routing table
- Routing – Send the user's request along with all Pipelines' description + triggers to the routing model (default: `lite` tier) for intent classification. Match → execute the Pipeline; no match → fall back to Skill mode (SKILL.md)
- Execution – Execute steps sequentially per `pipeline.yaml`: a subprocess for code steps, an LLM API call for llm steps
- Data Passing – Maintain inter-step data context; each step's output is stored and available to subsequent steps
- Validation & Retry – For llm steps: structural validation (JSON Schema) + content validation (script), retrying with error messages on failure
The Runner follows this fixed flow for every business Pipeline execution:
1. Execute the Constructor Pipeline (`_constructor/`)
   → Failure → abort, do not execute the business Pipeline, return the constructor error
   → Success → continue
2. Execute the Business Pipeline
   → Success or failure, always proceed to the next phase
3. Execute the Destructor Pipeline (`_destructor/`)
   → Always executes regardless of the business Pipeline's outcome (like `finally`)
   → Destructor failure does not mask business Pipeline errors; both are returned
Typical constructor uses:
- Check config file completeness
- Initialize environment variables
- Run `git pull` to sync the latest data
- Verify required tools are available
Typical destructor uses:
- Clean up temporary files
- Record execution logs
- Release locks
If the APP has no _constructor/ or _destructor/ directory, the Runner skips the corresponding phase and executes the business Pipeline directly. Both are optional.
If the LLM acts as the Runner (reading execution instructions from SKILL.md and dispatching step-by-step), the following problems arise:
- Each code step adds an extra LLM tool call round-trip (5-10 second latency)
- Data passing depends on the LLM correctly assembling JSON (error-prone)
- Constructor/destructor execution cannot be guaranteed (LLM may skip them)
- Validation/retry mechanisms cannot be reliably implemented
With the Runner as a standalone program, all deterministic logic (routing, step scheduling, data passing, constructor/destructor) is guaranteed by code. The LLM is only invoked when semantic understanding is needed in an llm step.
The Runner's implementation language is not specified, but Python is recommended (mature ecosystem, convenient for LLM API calls and JSON Schema validation).
Given an existing Skill (SKILL.md + tools/), AI coding tools follow this process to convert it into an APP:
- Read SKILL.md to understand the Skill's intent and capabilities
- Read the tools/ directory to understand available tools
- Identify main paths: which execution flows are fixed and repeatedly invoked
- Identify flexible parts: which situations require LLM judgment
For each identified main path:
- Split into steps: which parts are deterministic work (code steps), which need semantic understanding (llm steps)
- Code steps: generate executable scripts following the stdin JSON in / stdout JSON out convention
- LLM steps:
- Write prompt templates
- Define JSON Schemas (structural validation)
- Generate content validation scripts if needed
- Choose appropriate model tier (lite / standard / reasoning)
- Assemble pipeline.yaml, containing step definitions, description, and triggers (routing info and step definitions in one file; no extra index needed)
- The original SKILL.md remains unchanged as the fallback for unmatched requests
- Ensure tools/ directory utilities still work in Skill mode
```
Input:            Output:
SKILL.md    →     SKILL.md (unchanged)
tools/      →     tools/ (unchanged)
                  pipelines/
                  ├── _constructor/    (if cross-cutting concerns exist)
                  ├── _destructor/     (if cleanup is needed)
                  ├── <pipeline-1>/
                  │   ├── pipeline.yaml
                  │   ├── schemas/
                  │   └── steps/
                  └── <pipeline-2>/
                      └── ...
```
- pipeline.yaml is the single source of truth – Step definitions and routing info (description + triggers) live in one file per Pipeline; no centralized index, eliminating duplication and consistency risks
- Pipelines are declarations, not implementations – YAML defines the flow; scripts implement the steps; language-agnostic
- stdin JSON in, stdout JSON out – All scripts (code steps and validation scripts) follow a unified data-passing convention
- Structural + content validation – JSON Schema for format checks; validation scripts for business-logic checks; two lines of defense
- Model tiers, not model names – APPs declare capability levels (lite / standard / reasoning); the runtime maps them to actual models
- SKILL.md is pure fallback – Used only when no Pipeline matches; does not serve as the Runner's dispatch logic
- The Runner is a standalone program – All deterministic logic (routing, scheduling, data passing, constructor/destructor) is guaranteed by code; the LLM is only invoked during llm steps
- Constructor/destructor is a framework-level guarantee – Cross-cutting concerns (config checks, environment init, cleanup) do not rely on LLM judgment or on business steps calling them; enforced by the Runner
- The marginal cost of Pipelines is near-zero – Generated by LLMs; all known paths can be codified on Day 0
Version: v0.4 Date: 2026-04-16 License: MIT