A curated library of high-fidelity prompts, cognitive architectures, and agent protocols. Every module is typed, versioned, tested, composable, and falsifiable.
One file. One job. One test. One refusal path. No essays. No emoji. No runtime.
This repository is structured around Elon Musk's five-step engineering algorithm. Every release can be placed in exactly one of the five steps:
1. Make the requirements less dumb → we rewrote "ship as many prompts as possible" to "every prompt ships with a frontmatter schema, a test spec, a refusal condition, and a prior-art citation."
2. Delete the part or process → we removed three aspirational eval specs (foundation primitives are meta-templates), the `tested` badge that was not true, and the ~4,000-line upstream Kriterion reference runner (we kept the 180-line kernel).
3. Simplify or optimise → unified `pxl` CLI replaced six legacy entry points; `canonical.py` replaced 4,000 LOC with 180 LOC; `ExecutionChain` is a 25-line dataclass.
4. Accelerate cycle time → full local quality gate runs in ~15 s; full CI runs in ~1 m 6 s across 142 tests and 3-layer audit.
5. Automate → seven GitHub Actions jobs, a tag-triggered release workflow, and `pxl dashboard` regenerates every number from real artifacts on every call. Automation came last, not first.
See docs/first-principles.md for the full retrospective audit with deletions traced to commits, honest exceptions, and the discipline check ("you should have to add back at least 10% of what you delete").
On a single commodity core, the canonical primitive produces 5,290 full tamper-evident seven-phase execution chains per second. Linear scaling holds because each chain is an independent, pure-function unit.
| Cores | Hardware class | Full audit chains/sec |
|---|---|---|
| 1 | laptop | 5.29 K |
| 8 | workstation | 42 K |
| 96 | dual-socket server | 508 K |
| 10 K | small Kubernetes cluster | 52.9 M |
| 100 K | frontier HPC | 529 M |
| 1 M | hypothetical hyperscale | 5.29 B |
Key observation: the world currently produces ~5 × 10⁵ LLM responses per second across all vendors combined. A single rackmount server running pxl.scale.batch_execution_chain at 96 cores produces 508 K chains/sec — enough to audit every LLM response on Earth, in real time, with compute cost < 1% of the inference cost that produced them.
The canonical primitive is never the bottleneck. The LLM is. This ratio holds to 10⁸ responses/sec and beyond, which is where credible 2030 projections put the industry.
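The table rows follow from multiplying the single-core rate by the core count, under the linear-scaling assumption stated above. A minimal sketch of that arithmetic; the function name `chains_per_second` is illustrative and not part of `pxl`:

```python
# Single-core rate quoted above: 5,290 full seven-phase chains per second.
CHAINS_PER_SEC_PER_CORE = 5_290


def chains_per_second(cores: int) -> int:
    """Projected throughput, assuming perfectly linear scaling across cores."""
    return cores * CHAINS_PER_SEC_PER_CORE


for cores in (1, 8, 96, 10_000, 100_000, 1_000_000):
    print(f"{cores:>9,} cores -> {chains_per_second(cores):>13,} chains/sec")
```

At 96 cores this gives 507,840 chains/sec, which the table rounds to 508 K. The projection is only as good as the linearity assumption; scheduling and memory overhead on real clusters would need to be measured.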
src/pxl/scale.py exposes the parallel primitive as a public API:
```python
from pxl.scale import batch_execution_chain, parallel_audit_all, batch_canonical_hash

# Audit every integrated layer in parallel (3-layer, <1ms)
results = parallel_audit_all()

# Compute full 7-phase chains for a million bundles (embarrassingly parallel)
# `bundles` is the caller-supplied collection of input bundles
terminal_hashes = batch_execution_chain(
    bundles,
    contract_version="2026.04",
    max_workers=96,
)
```

Parallel versions are tested byte-for-byte against their serial counterparts. If the primitive diverges between cores, CI fails. Reproducibility is not negotiated.
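A byte-for-byte equivalence check of this kind can be sketched as follows. This is not the repository's test: `canonical_hash` is a hypothetical stand-in for the real primitive (SHA-256 over sorted-key JSON), and a thread pool is used only to keep the example self-contained.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def canonical_hash(bundle: dict) -> str:
    """Hypothetical stand-in for the real primitive: SHA-256 over canonical JSON."""
    payload = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def serial_hashes(bundles: list[dict]) -> list[str]:
    return [canonical_hash(b) for b in bundles]


def parallel_hashes(bundles: list[dict], max_workers: int = 4) -> list[str]:
    # Executor.map preserves input order, so results line up with the serial run.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(canonical_hash, bundles))


bundles = [{"id": i, "payload": f"case-{i}"} for i in range(1_000)]
assert parallel_hashes(bundles) == serial_hashes(bundles)
```

The comparison works because the hash is a pure function of its input: no shared state, no dependence on worker identity or scheduling order.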
See docs/scaling.md for the full scaling math, compute-cost analysis, and the argument for universal deployment.
$ pip install https://github.com/neuron7xLab/prompt-x-lab/releases/download/v0.7.0/prompt_x_lab-0.7.0-py3-none-any.whl
$ pxl dashboard
╭──────────────────────────────────────╮
│ p r o m p t x l a b │
│ version 0.7.0 · production-stable │
╰──────────────────────────────────────╯
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳─────────┳───────┓
┃ Layer ┃ Kind ┃ Modules ┃ Audit ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ 00 FOUNDATION │ primitives │ 3 │ — │
│ 01 COGNITION │ scaffolds │ 3 │ — │
│ 02 ENGINEERING │ seed │ 3 │ — │
│ 03 PERSONAS │ seed │ 2 │ — │
│ 04 VALIDATION │ gates │ 2 │ — │
│ 05 ORCHESTRATION │ verbatim │ 26 │ ✓ │
│ 06 ECA ENGINE │ typed port │ 34 │ ✓ │
│ 07 KRITERION │ kernel │ 18 │ ✓ │
├──────────────────┼────────────┼─────────┼───────┤
│ TOTAL │ │ 91 │ │
└──────────────────┴────────────┴─────────┴───────┘
Subsystem health · ECA router 99.44% · scorer 90.62% · FP=0
· Kriterion 10/10 reproduction · canonical kernel 180 LOC
✓ validate 91 modules ✓ audit 3 layers · commit 7e50a5f
| What you get | Number | How |
|---|---|---|
| Layers in the stack | 8 | foundation → cognition → engineering → personas → validation → orchestration → ECA → Kriterion |
| Total modules | 91 | hand-written seed (13) + verbatim (26) + typed ports (52) |
| pytest tests | 129 | unit · integration · 12 hypothesis property-based · 10 benchmarks |
| mypy --strict files | 31 | one-to-one with runtime imports |
| SHA-256 audit layers | 3 | 78 hashed bodies, CI-verified on every push |
| step_hash latency | 4.0 μs | 250K ops/sec on commodity hardware, never the bottleneck |
| Full 7-phase chain | 189 μs | 5.3K audited evaluations/sec/core |
| ECA routing | 18 μs | 56K requests/sec/core |
| Kriterion reproduction | 10/10 | byte-for-byte match against upstream dataset_manifest.json |
| CLI entry points | 7 | pxl (unified) + 6 legacy aliases · python -m pxl works |
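The per-core throughput figures in the table are the reciprocals of the quoted latencies. A small sketch of the conversion, assuming one core handles requests back to back with no idle time (the helper name `ops_per_second` is illustrative):

```python
def ops_per_second(latency_us: float) -> float:
    """Throughput of one core that processes requests back to back."""
    return 1_000_000 / latency_us


latencies_us = [
    ("step_hash", 4.0),
    ("full 7-phase chain", 189.0),
    ("ECA routing", 18.0),
]
for name, latency_us in latencies_us:
    print(f"{name}: {ops_per_second(latency_us):,.0f} ops/sec/core")
```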
Prompt engineering is in the phase that software engineering was in before version control. Most prompts live in Notion pages, scratch files, and Slack messages. When they work, nobody knows why. When they break, nobody knows what changed.
Prompt X Lab is the opposite of that.
Lower layers compose into higher layers. Every module imports only from layers below it.

Every module here is typed, versioned, tested, composable, and falsifiable.

This is not a prompt zoo. It is an engineering library.
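The rule that a module imports only from layers below it can be checked mechanically. A minimal sketch, assuming dependencies are given as repository paths whose first component starts with the two-digit layer number; the `depends_on` mapping is hypothetical and not the repository's actual frontmatter schema:

```python
def layer_of(path: str) -> int:
    """Extract the layer number from a path such as '02_engineering/test-generator.md'."""
    return int(path.split("/", 1)[0].split("_", 1)[0])


def upward_imports(depends_on: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (module, dependency) pairs where the dependency is not in a lower layer."""
    return [
        (module, dep)
        for module, deps in depends_on.items()
        for dep in deps
        if layer_of(dep) >= layer_of(module)
    ]


depends_on = {
    "02_engineering/senior-code-reviewer.md": [
        "00_foundation/identity-primitive.md",
        "01_cognition/executive-engine.md",
    ],
    # Deliberate violation: a foundation primitive reaching up into layer 04.
    "00_foundation/output-primitive.md": ["04_validation/hallucination-gate.md"],
}
print(upward_imports(depends_on))
```

An empty result means the dependency graph respects the layering; any returned pair names the offending module and the dependency it should not reach.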
┌─────────────────────────────────────────────────────┐
│ P R O M P T X L A B │
│ 6-layer · 13 seed + 26 orchestration · text-only │
└─────────────────────────┬───────────────────────────┘
│
┌─────────────────────────────────────┼─────────────────────────────────────┐
│ │ │
┌───────▼────────┐ ┌───────────────┐ ┌──────▼───────┐ ┌───────────────┐ ┌───────▼────────┐
│ 00 FOUNDATION │ │ 01 COGNITION │ │ 02 ENGINEER │ │ 03 PERSONAS │ │ 04 VALIDATION │
│ identity │ │ executive-eng │ │ senior-rev │ │ socratic tutor│ │ hallucination │
│ constraint │ │ creator-crit │ │ legacy-rfctr │ │ strat advisor │ │ fallacy check │
│ output │ │ CoT scaffold │ │ test-gen │ │ │ │ │
└────────┬───────┘ └───────┬───────┘ └──────┬───────┘ └───────┬───────┘ └────────┬───────┘
│ │ │ │ │
└──────────────────┴─────────────────┼──────────────────┴───────────────────┘
│
┌───────────────────▼─────────────────────┐
│ 05 O R C H E S T R A T I O N │
│ protocols · agents · frameworks │
│ · crypto · research │
│ (26 production-grade long systems) │
└───────────────────┬─────────────────────┘
│
┌───────────────────▼─────────────────────┐
│ COMPOSITION PATTERN │
│ identity + constraint + scaffold │
│ + domain + output + gate │
└─────────────────────────────────────────┘
| Layer | Path | Count | Purpose |
|---|---|---|---|
| FOUNDATION | 00_foundation/ | 3 | Primitives every module inherits: identity, constraint, output shape. |
| COGNITION | 01_cognition/ | 3 | Thinking scaffolds: executive-engine, creator-critic-verifier, CoT scaffold. |
| ENGINEERING | 02_engineering/ | 3 | Senior code reviewer, legacy refactor surgeon, property-test generator. |
| PERSONAS | 03_personas/ | 2 | Stateful interactive agents: Socratic tutor, strategic advisor. |
| VALIDATION | 04_validation/ | 2 | Adversarial output gates: hallucination gate, fallacy checker. |
| ORCHESTRATION | 05_orchestration/ | 26 | Production long-form systems: execution protocols, PR agents, flagship frameworks, crypto/trading, research methodology. |
| ECA ENGINE | 06_eca_engine/ | 34 | ECA v1.1 cognitive engine: content, typed Python port, reproduction tests. 77-iter calibrated. |
| KRITERION | 07_kriterion/ | 18 | Kriterion v2026.4.5: fail-closed evaluation primitive, 6 protocols, 9 schemas, canonical hashing kernel, 10-case reproduction. |
If a prompt does two things, split it. The test: can you describe what it does in one sentence without using "and"?

"Be thoughtful" is banned. "Cite the exact line number" is required. Every module states its role, its forbidden modes, and its output shape.

Every module has a literal refusal string. Graceful degradation into plausible-sounding hallucination is the default failure mode of prompt engineering; this repo refuses to ship it.

Modules reference the paper, heuristic, or incident they come from. Feathers for refactoring. Peirce for inference types. Halmos for Socratic teaching. No mystery meat.

Breaking changes bump the filename version.

Four questions. If any answer is no, the module is not ready:
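The literal refusal string described above is what makes a refusal machine-detectable in calling code. A minimal sketch, assuming an exact-match convention; the `REFUSAL` value is hypothetical, since each real module defines its own literal:

```python
from dataclasses import dataclass

# Hypothetical refusal string; each real module defines its own literal.
REFUSAL = "REFUSED: insufficient evidence"


@dataclass(frozen=True)
class Verdict:
    refused: bool
    text: str


def classify(output: str, refusal: str = REFUSAL) -> Verdict:
    """Treat an output as a refusal only on an exact match of the literal string."""
    stripped = output.strip()
    return Verdict(refused=(stripped == refusal), text=stripped)


print(classify("REFUSED: insufficient evidence"))
print(classify("The cache is unbounded; see line 42."))
```

Exact matching is the point of the design: a paraphrased or partial refusal is not treated as a refusal, so the caller never has to guess.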
Identity · Constraint · Output · Executive · C-C-V Triad · CoT Scaffold · Senior Reviewer · Refactor Surgeon · Test Generator · Socratic Tutor · Strategic Advisor · Hallucination Gate · Fallacy Checker · Your Module · Taxonomy
Modules are designed to stack. A typical production prompt is assembled top-down from lower to higher layers:
┌─────────────────────────────────────────────────────────────────────┐
│ IDENTITY ← 00_foundation/identity-primitive.md │
│ Who the model is. What it refuses to be. │
├─────────────────────────────────────────────────────────────────────┤
│ CONSTRAINTS ← 00_foundation/constraint-primitive.md │
│ The 3 failure modes it must never produce. │
├─────────────────────────────────────────────────────────────────────┤
│ SCAFFOLD ← 01_cognition/executive-engine.md │
│ How it thinks: planner → executor → critic. │
├─────────────────────────────────────────────────────────────────────┤
│ DOMAIN ← 02_engineering/senior-code-reviewer.md │
│ What it does today: review this PR. │
├─────────────────────────────────────────────────────────────────────┤
│ OUTPUT ← 00_foundation/output-primitive.md │
│ Exact shape: sections, lengths, refusal string. │
├─────────────────────────────────────────────────────────────────────┤
│ GATE ← 04_validation/hallucination-gate.md │
│ Adversarial filter. PASS or REFUSED. │
└─────────────────────────────────────────────────────────────────────┘
│
▼
trusted output
The stack reads top-down: who you are → what you must not do → how you think → what you're doing today → how you answer → who double-checks the answer.
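The stack above can be assembled with a few lines of code. This is a minimal sketch, not the repository's `assembly.py`: it concatenates whole files in stack order, whereas in practice you may copy only selected sections and run the gate as a separate pass over the output. The `assemble` function and the `## LABEL` section headers are assumptions made for illustration.

```python
from pathlib import Path

# Section order from the stack above; paths are relative to the repository root.
STACK = [
    ("IDENTITY", "00_foundation/identity-primitive.md"),
    ("CONSTRAINTS", "00_foundation/constraint-primitive.md"),
    ("SCAFFOLD", "01_cognition/executive-engine.md"),
    ("DOMAIN", "02_engineering/senior-code-reviewer.md"),
    ("OUTPUT", "00_foundation/output-primitive.md"),
    ("GATE", "04_validation/hallucination-gate.md"),
]


def assemble(root: Path, stack: list[tuple[str, str]] = STACK) -> str:
    """Concatenate module bodies in stack order, one labelled section per module."""
    sections = []
    for label, rel_path in stack:
        body = (root / rel_path).read_text(encoding="utf-8").strip()
        sections.append(f"## {label}\n\n{body}")
    return "\n\n".join(sections)


# Usage, from a local clone:
# system_prompt = assemble(Path("prompt-x-lab"))
```

Swapping the DOMAIN entry for another module changes the task while the identity, constraints, output shape, and gate stay fixed.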
| Situation | Use this | Why |
|---|---|---|
| Task is well-defined; first answer is usually right | No scaffold. Direct prompt. | Scaffolds cost tokens. Don't pay for what you don't need. |
| Non-trivial task where the model could miss a failure mode | executive-engine | Planner + Executor + Critic catches its own mistakes. |
| High-stakes decision where a polished wrong answer is expensive | creator-critic-verifier | Adversarial triad; the Critic is structurally unable to rubber-stamp. |
| Reasoning-heavy task where every step must be auditable | chain-of-thought-scaffold | Every inference is tagged with its type (deductive / abductive / …). |
Rule of thumb: the right scaffold is the smallest one that works. Don't use a triad for a task a direct prompt would solve.
```bash
# clone
git clone https://github.com/neuron7xLab/prompt-x-lab.git
cd prompt-x-lab

# pick a module
cat 02_engineering/senior-code-reviewer.md

# copy the Identity, Core logic, and Constraints sections into your system prompt
# paste the diff you want reviewed as the user message
# read the output; if it's bad, file an issue with the exact input that broke it
```

No installation. No dependencies. No runtime. Just text.
╳ prompt essays long prose walls — models don't read them, they pattern-match on them
╳ vibey role-play "you are a world-class expert" — this is a vibe, not a constraint
╳ emoji-heavy UX cute, widely copied, and almost always padding
╳ "let's think step by step" vague, mostly superseded — use a typed CoT scaffold instead
╳ self-referential templates if your prompt talks about prompts, it's methodology not a module
╳ silent hallucination refuse loudly; never degrade into plausible-sounding guesses
prompt-x-lab/
├── README.md ← you are here
├── LICENSE ← MIT
├── CHANGELOG.md ← Keep a Changelog + SemVer
├── .gitignore
│
├── .github/
│ └── assets/
│ ├── banner-dark.svg ← minimalist RGB-on-black banner
│ ├── banner-light.svg ← same composition, paired
│ ├── divider.svg ← RGB gradient divider
│ ├── eca-cognitive-engine.svg ← neuro-fractal visual study
│ ├── crest-{360,720,1080}.webp ← Advanced Orchestration crest
│ └── crest.manifest.json ← crest variant metadata
│
├── .metadata/
│ ├── taxonomy.json ← machine-readable layer graph
│ └── manifest.yaml ← module inventory with status + vectors
│
├── 00_foundation/ ← primitives every module inherits
│ ├── README.md
│ ├── identity-primitive.md
│ ├── constraint-primitive.md
│ └── output-primitive.md
│
├── 01_cognition/ ← thinking architectures
│ ├── README.md
│ ├── executive-engine.md
│ ├── creator-critic-verifier.md
│ └── chain-of-thought-scaffold.md
│
├── 02_engineering/ ← code synthesis, review, refactor, tests
│ ├── README.md
│ ├── senior-code-reviewer.md
│ ├── legacy-refactor-expert.md
│ └── test-generator.md
│
├── 03_personas/ ← stateful interactive agents
│ ├── README.md
│ ├── socratic-tutor.md
│ └── strategic-advisor.md
│
├── 04_validation/ ← adversarial output gates
│ ├── README.md
│ ├── hallucination-gate.md
│ └── logical-fallacy-checker.md
│
├── 05_orchestration/ ← Advanced Orchestration v1 (26 modules)
│ ├── README.md
│ ├── protocols/ (6) ← SPST · DSIO · IOA · LRE · PGE · SMLRS
│ ├── agents/ (9) ← PR automation agents
│ ├── frameworks/ (5) ← flagship long-form frameworks
│ ├── crypto/ (3) ← crypto & trading systems
│ └── research/ (3) ← methodology & research protocols
│
├── 06_eca_engine/ ← ECA Cognitive Engine v1.1 (34 files)
│ ├── README.md
│ ├── core/ (6) ← prompt, proof tiers, config, templates
│ ├── runtime/ (4) ← policy, fallback, router spec, budget
│ ├── benchmarks/ (3) ← metrics, rubric, live protocol
│ ├── security/ (3) ← model, guardrails, provenance
│ ├── schemas/ (2) ← request + response envelopes
│ ├── legal/ (1) ← EULA template
│ ├── docs/ (15) ← architecture · calibration · ops
│ └── AUDIT.sha256 ← body audit (34 entries)
│
├── 07_kriterion/ ← Kriterion v2026.4.5 (18 files)
│ ├── README.md
│ ├── protocols/ (6) ← 6 security-role protocols
│ ├── schemas/ (9) ← 9 canonical evaluation schemas
│ ├── methodology/ (3) ← methodology · threat model · reasoning
│ └── AUDIT.sha256 ← body audit (18 entries)
│
├── templates/
│ └── base-module.md ← start every new module here
│
├── src/pxl/ ← Python package (mypy --strict, 24 files)
│ ├── models.py, assembly.py, providers.py, judge.py
│ ├── runner.py, validator.py, audit.py, badges.py, cli.py
│ ├── eca/ ← ECA v1.1 typed subsystem
│ │ ├── schemas.py, config.py, router.py, scorer.py
│ │ ├── signer.py, validate.py, cli.py
│ │ ├── assets/ ← bundled YAML / JSON / TXT
│ │ └── datasets/ ← 180 req + 192 resp + holdouts
│ ├── kriterion/ ← Kriterion minimalist kernel
│ │ ├── canonical.py ← 180-line fail-closed primitive
│ │ ├── schemas.py, protocols.py
│ │ ├── benchmark.py, cli.py
│ │ ├── assets/ ← 9 schemas + 6 protocols
│ │ └── datasets/ ← 10 synthetic cases + manifest
│ └── py.typed
│
├── schemas/ ← JSON Schemas — single source of truth
│ ├── module.schema.json
│ ├── eval-spec.schema.json
│ └── eval-result.schema.json
│
├── evals/ ← evaluation harness
│ ├── specs/ (10 YAML, 20 cases)
│ └── results/ badges.json · run JSONs
│
├── tests/ ← pytest suite (22 tests · 6 files)
│
├── docs/
│ ├── methodology.md ← why this library exists
│ ├── naming-convention.md ← SemVer + filename rules
│ ├── usage-guide.md ← composition, scaffold selection
│ ├── composition-algebra.md ← EBNF grammar + type rules
│ ├── evaluation-protocol.md ← pass/fail epistemology
│ ├── references.bib ← bibliography (BibTeX)
│ └── case-studies/ ← 3 concrete runs with full rubric traces
│
├── Makefile ← make validate · test · lint · typecheck · eval · audit
├── pyproject.toml ← project metadata, deps, tool config
├── CLAUDE.md ← development rules (Claude Code contract)
├── .pre-commit-config.yaml ← hooks
└── .github/workflows/ci.yml ← validate · test · lint · mypy · audit · eval-mock
Start from templates/base-module.md. Every new module must ship with:
Required (no exceptions)

Modules without tests are rejected. No exceptions.

Version bump rules

Deprecation: set
| Project | What it is |
|---|---|
| neuron7xLab/GeoSync | Geometric market intelligence: Kuramoto · Ricci · thermodynamics · 57 invariants |
| neuron7xLab/neurophase | Phase synchronization as execution gate: brain × market oscillators |
| neuron7xLab/neosynaptex | γ-scaling diagnostics across biological, physical, cognitive substrates |
| neuron7xLab/mycelium-fractal-net | Morphogenetic field engine: reaction-diffusion + TDA + causal rules |
prompt-x-lab v0.3.0 ships with a real engineering harness — not a demo, not a marketing claim. Every discipline the library preaches in its seed modules is enforced mechanically against its own content:
| Gate | Command | What it proves |
|---|---|---|
| frontmatter | pxl-validate | All 39 modules conform to the Pydantic schema. |
| pytest | pytest -q | 22 unit tests covering validator, assembly, audit, judge, runner, models. |
| ruff | ruff check src scripts evals tests | Style and lint rules (E, F, I, B, UP, N, SIM, RUF, ANN). |
| mypy --strict | mypy src | Full type check across the 10 `pxl` source files. |
| audit | python -m pxl.audit verify | SHA256 body integrity of all 26 orchestration modules. |
| eval · mock | pxl-eval --provider mock | End-to-end harness plumbing (no API key required). |
| eval · real | pxl-eval | Live rubric evaluation against Claude Opus 4.6. |
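The audit gate can be illustrated with a short integrity check. This is a sketch and not the code behind `python -m pxl.audit verify`: it assumes a manifest in `sha256sum` format (hex digest, whitespace, relative path) and hashes whole files, whereas the repository's audit may hash module bodies differently.

```python
import hashlib
from pathlib import Path


def verify_manifest(manifest: Path) -> list[str]:
    """Return the relative paths whose current SHA-256 differs from the manifest."""
    root = manifest.parent
    mismatches = []
    for raw in manifest.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha256sum marks binary mode with '*'
        actual = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches


# Usage, from a local clone (path is illustrative):
# assert verify_manifest(Path("06_eca_engine/AUDIT.sha256")) == []
```

An empty list means every listed file still matches its recorded digest; a CI job would fail on any non-empty result.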
A composition is typed
Every seed module now has a Prior art section naming its intellectual ancestors. Citations are collected in docs/references.bib — Peirce for inference types, Feathers for refactoring, Halmos for Socratic teaching, Kahneman for the Executive Engine, QuickCheck for property tests, Popper for falsifiability, Horowitz for advising.
A claim without a prior-art anchor is not allowed to ship. The rule is mechanical: if the module references a technique that is not in the bibliography, the module is rejected.
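A mechanical check of this kind could look like the sketch below. It is not the repository's validator: the `[@key]` citation syntax and the citation keys are hypothetical, chosen only to show how cited keys can be compared against BibTeX entry keys.

```python
import re

# Matches the key of a BibTeX entry such as "@book{feathers2004, ...".
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# Hypothetical in-module citation syntax: [@key]
CITATION = re.compile(r"\[@([^\]\s]+)\]")


def missing_citations(module_text: str, bib_text: str) -> set[str]:
    """Citation keys used by a module that have no entry in the bibliography."""
    known = set(BIB_ENTRY.findall(bib_text))
    cited = set(CITATION.findall(module_text))
    return cited - known


bib = """
@book{feathers2004,
  title  = {Working Effectively with Legacy Code},
  author = {Feathers, Michael},
  year   = {2004}
}
"""
module = "Prior art: characterization tests [@feathers2004]; inference typing [@peirce1878]."
print(missing_citations(module, bib))
```

A module passes only when the returned set is empty; here the second key has no bibliography entry, so the module would be rejected.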
docs/case-studies/ contains three concrete runs — the unbounded-cache PR review, the Fibonacci(n ≤ 10¹⁸) trap for Executive Engine, and the Apollo 11 hallucination gate. Each study shows the exact input, exact output, rubric trace, verdict, and the adversarial variant that would have broken it. Case studies are the second pillar of falsifiability in the repo (the first is the eval harness).
Provenance: Advanced Orchestration v1 · Status: integrated verbatim · License: single-owner proprietary · Packaging: prompt-x-lab native
The first four layers are primitives: short, hand-written, fit-on-one-screen. Layer 05 is the opposite: production-sized systems that would drown a foundation layer, adapted here without a single byte of content change.

Every module in

Composition rule preserved: an orchestration module is still wrapped top-down by
| Category | Path | Modules | Contents |
|---|---|---|---|
| PROTOCOLS | 05_orchestration/protocols/ | 6 | SPST · DSIO · IOA · LRE · PGE · SMLRS: execution protocols for Codex/Principal-Eng level repo transformations. |
| AGENTS | 05_orchestration/agents/ | 9 | Pull-request automation agents: transform, audit, stabilise, and ship repository-scale changes deterministically. |
| FRAMEWORKS | 05_orchestration/frameworks/ | 5 | Flagship long-form frameworks: multi-phase, multi-contract, multi-artifact operators. |
| CRYPTO | 05_orchestration/crypto/ | 3 | Crypto & trading systems: order-flow, regime detection, quant pipeline integration. |
| RESEARCH | 05_orchestration/research/ | 3 | Methodology & research protocols: reproducibility, evidence-bound inference, falsification ladders. |
ECA v1.1.0 · selected_iteration 27 · 77-iter calibrated · 22 pytest tests

ECA is a production-candidate cognitive operating layer: one synchronised reasoning system routed across six modes.

Unlike layer 05, which is a verbatim text copy, layer 06 is a full native integration:
Reproduced calibration numbers (full-corpus replay):
The zero-FP invariant is load-bearing: across the entire synthetic response corpus, the scorer has never green-lit a response it should have blocked. This is the strongest single property of the ECA quality gate.

Kriterion v2026.4.5 · 10/10 reproduction · 40 tests

Kriterion's contribution is a single load-bearing idea: if every evaluation phase hashes its canonical input and links to the previous phase, the whole pipeline is a tamper-evident chain. Any modification to any phase invalidates every subsequent hash. Fail-closed by construction.

Layer 07 integrates only the reusable kernel:
The kernel is reusable for any audit pipeline, not just Kriterion's. If you want fail-closed evaluation:

```python
from pxl.kriterion import ExecutionChain, Phase

# my_bundle and phase_state are supplied by the caller
chain = ExecutionChain.start(my_bundle, contract_version="1.0.0")
for phase in Phase:
    chain.advance(phase, phase_input=phase_state[phase])
# chain.terminal_hash now proves the entire execution
```

Seven properties guarantee fail-closedness: canonical-form uniqueness, domain separation of genesis and step hashes, linear chain structure, format-version baking, contract-version baking, bundle-dependent genesis, and re-derivable terminal hash. Every property is enforced by a dedicated test in

License: content under AGCL-1.0 (community, non-commercial);
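To make the chaining idea concrete, here is a standalone sketch of a tamper-evident hash chain. It is not the `pxl.kriterion` kernel: the phase labels, the `FORMAT_VERSION` value, the domain prefixes, and all function names are assumptions chosen for illustration. It shows several of the listed properties: canonical form, domain-separated genesis and step hashes, version baking, and a re-derivable terminal hash.

```python
import hashlib
import json

FORMAT_VERSION = "sketch-1"  # illustrative value, not the kernel's format version


def canonical(obj) -> bytes:
    """One byte representation per value: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True).encode("ascii")


def genesis_hash(bundle: dict, contract_version: str) -> str:
    # The "genesis:" prefix separates this domain from step hashes.
    body = canonical({"format": FORMAT_VERSION, "contract": contract_version, "bundle": bundle})
    return hashlib.sha256(b"genesis:" + body).hexdigest()


def step_hash(prev: str, phase: str, phase_input: dict) -> str:
    # Each step commits to the previous hash, so the chain is linear.
    body = canonical({"prev": prev, "phase": phase, "input": phase_input})
    return hashlib.sha256(b"step:" + body).hexdigest()


def terminal_hash(bundle: dict, contract_version: str, phases: list[tuple[str, dict]]) -> str:
    current = genesis_hash(bundle, contract_version)
    for phase, phase_input in phases:
        current = step_hash(current, phase, phase_input)
    return current


phases = [(f"phase-{i}", {"score": i}) for i in range(1, 8)]
honest = terminal_hash({"case": "demo"}, "1.0.0", phases)

tampered = list(phases)
tampered[2] = ("phase-3", {"score": 99})  # change one intermediate input
assert terminal_hash({"case": "demo"}, "1.0.0", tampered) != honest
assert terminal_hash({"case": "demo"}, "1.0.0", phases) == honest  # re-derivable
```

Because every step hash covers the previous one, changing any phase input, the bundle, or the contract version changes the terminal hash, and anyone holding the inputs can recompute it independently.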
Radial recursive expansion · Metatron's Cube core · seven fractal rings · golden-ratio Trinity arcs · bilateral symmetry — organic left, crystalline right · void-shift palette.
No flowchart. No boxes. No rectangles. Every element obeys a mathematical rule:

Palette:
MIT · Solo · Ukraine 🇺🇦 · 2026
"Don't trust anyone. Don't even trust yourself." — Elon Musk, Lex Fridman Podcast #400
This repo is a discipline, not a catalog. Every module earns its place.