`p r o m p t x l a b`

A curated library of high-fidelity prompts, cognitive architectures, and agent protocols. Every module is typed, versioned, tested, composable, and falsifiable.

One file. One job. One test. One refusal path. No essays. No emoji. No runtime.

First principles · applied

This repository is structured around Elon Musk's five-step engineering algorithm. Every release can be placed in exactly one of the five steps:

1. Make the requirements less dumb → we rewrote "ship as many prompts as possible" to "every prompt ships with a frontmatter schema, a test spec, a refusal condition, and a prior-art citation."
2. Delete the part or process → we removed three aspirational eval specs (foundation primitives are meta-templates), the tested badge that was not true, and the ~4,000-line upstream Kriterion reference runner (we kept the 180-line kernel).
3. Simplify or optimise → a unified pxl CLI replaced six legacy entry points; canonical.py replaced 4,000 LOC with 180 LOC; ExecutionChain is a 25-line dataclass.
4. Accelerate cycle time → the full local quality gate runs in ~15 s; full CI runs in ~1 m 6 s across 142 tests and a 3-layer audit.
5. Automate → seven GitHub Actions jobs, a tag-triggered release workflow, and a pxl dashboard that regenerates every number from real artifacts on every call. Automation came last, not first.

See docs/first-principles.md for the full retrospective audit with deletions traced to commits, honest exceptions, and the discipline check ("you should have to add back at least 10% of what you delete").

At scale · the arithmetic of cognitive infrastructure

On a single commodity core, the canonical primitive produces 5,290 full tamper-evident seven-phase execution chains per second. Linear scaling holds because each chain is an independent, pure-function unit.

Cores	Hardware class	Full audit chains/sec
1	laptop	5.29 K
8	workstation	42 K
96	dual-socket server	508 K
10 K	small Kubernetes cluster	52.9 M
100 K	frontier HPC	529 M
1 M	hypothetical hyperscale	5.29 B

Key observation: the world currently produces ~5 × 10⁵ LLM responses per second across all vendors combined. A single rackmount server running pxl.scale.batch_execution_chain at 96 cores produces 508 K chains/sec — enough to audit every LLM response on Earth, in real time, with compute cost < 1% of the inference cost that produced them.

The canonical primitive is never the bottleneck. The LLM is. This ratio holds to 10⁸ responses/sec and beyond, which is where credible 2030 projections put the industry.
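The arithmetic behind the table can be reproduced in a few lines. This is a back-of-envelope sketch, not part of pxl: it takes the 189 μs chain latency and the 5,290 chains/sec/core figure quoted in this README and assumes perfectly linear scaling, as the text claims. `chains_per_second` is a hypothetical helper name.

```python
# Back-of-envelope check of the scaling table (illustrative; not part of pxl).
CHAIN_LATENCY_S = 189e-6   # full 7-phase chain latency quoted in this README
PER_CORE_RATE = 5_290      # chains/sec on one core, as quoted above


def chains_per_second(cores: int, per_core: int = PER_CORE_RATE) -> int:
    """Throughput under the assumption of perfectly linear scaling."""
    return cores * per_core


# The reciprocal of the quoted latency should land close to the per-core rate.
implied_rate = round(1 / CHAIN_LATENCY_S)

for cores in (1, 8, 96, 10_000, 100_000, 1_000_000):
    print(f"{cores:>9,} cores -> {chains_per_second(cores):>13,} chains/sec")
```

The 96-core row comes out at 507,840 chains/sec, which the table rounds to 508 K.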

src/pxl/scale.py exposes the parallel primitive as a public API:

from pxl.scale import batch_execution_chain, parallel_audit_all, batch_canonical_hash

# Audit every integrated layer in parallel (3-layer, <1ms)
results = parallel_audit_all()

# Compute full 7-phase chains for a million bundles (embarrassingly parallel);
# `bundles` is supplied by the caller
terminal_hashes = batch_execution_chain(
    bundles,
    contract_version="2026.04",
    max_workers=96,
)

Parallel versions are tested byte-for-byte against their serial counterparts. If the primitive diverges between cores, CI fails. Reproducibility is not negotiated.
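The serial-versus-parallel equivalence check can be sketched with the standard library alone. The snippet is a stand-in under stated assumptions, not the pxl.scale implementation: `toy_chain_hash` is a hypothetical pure function that hashes a canonical JSON encoding, and a thread pool replaces whatever worker model the real primitive uses.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def toy_chain_hash(bundle: dict) -> str:
    """Hypothetical stand-in for a pure, deterministic chain computation."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def serial_hashes(bundles: list) -> list:
    return [toy_chain_hash(b) for b in bundles]


def parallel_hashes(bundles: list, max_workers: int = 4) -> list:
    # Executor.map preserves input order, so results line up with the serial run.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(toy_chain_hash, bundles))


bundles = [{"id": i, "payload": f"case-{i}"} for i in range(1_000)]

# The CI-style gate: any divergence between the two paths is a failure.
assert parallel_hashes(bundles) == serial_hashes(bundles)
```

Because the function is pure and the results are collected in input order, the comparison is a plain list equality.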

See docs/scaling.md for the full scaling math, compute-cost analysis, and the argument for universal deployment.

At a glance

$ pip install https://github.com/neuron7xLab/prompt-x-lab/releases/download/v0.7.0/prompt_x_lab-0.7.0-py3-none-any.whl

$ pxl dashboard
  ╭──────────────────────────────────────╮
  │  p r o m p t   x   l a b             │
  │  version 0.7.0  ·  production-stable │
  ╰──────────────────────────────────────╯
  ┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
  ┃ Layer            ┃ Kind       ┃ Modules ┃ Audit ┃
  ┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
  │ 00 FOUNDATION    │ primitives │    3    │   —   │
  │ 01 COGNITION     │ scaffolds  │    3    │   —   │
  │ 02 ENGINEERING   │ seed       │    3    │   —   │
  │ 03 PERSONAS      │ seed       │    2    │   —   │
  │ 04 VALIDATION    │ gates      │    2    │   —   │
  │ 05 ORCHESTRATION │ verbatim   │   26    │   ✓   │
  │ 06 ECA ENGINE    │ typed port │   34    │   ✓   │
  │ 07 KRITERION     │ kernel     │   18    │   ✓   │
  ├──────────────────┼────────────┼─────────┼───────┤
  │ TOTAL            │            │   91    │       │
  └──────────────────┴────────────┴─────────┴───────┘
  Subsystem health · ECA router 99.44% · scorer 90.62% · FP=0
                   · Kriterion 10/10 reproduction · canonical kernel 180 LOC
  ✓ validate 91 modules  ✓ audit 3 layers  · commit 7e50a5f

What you get	Number	How
Layers in the stack	8	foundation → cognition → engineering → personas → validation → orchestration → ECA → Kriterion
Total modules	91	hand-written seed (13) + verbatim (26) + typed ports (52)
pytest tests	129	unit · integration · 12 hypothesis property-based · 10 benchmarks
mypy --strict files	31	one-to-one with runtime imports
SHA-256 audit layers	3	78 hashed bodies, CI-verified on every push
`step_hash` latency	4.0 μs	250K ops/sec on commodity hardware — never the bottleneck
Full 7-phase chain	189 μs	5.3K audited evaluations/sec/core
ECA routing	18 μs	56K requests/sec/core
Kriterion reproduction	10/10	byte-for-byte match against upstream `dataset_manifest.json`
CLI entry points	7	`pxl` (unified) + 6 legacy aliases · `python -m pxl` works

The Signal

Prompt engineering is in the phase that software engineering was in before version control. Most prompts live in Notion pages, scratch files, and Slack messages. When they work, nobody knows why. When they break, nobody knows what changed.

Prompt X Lab is the opposite of that.

   layer 07 ── Kriterion (18 content · canonical primitive · benchmark)
      ▲
   layer 06 ── ECA cognitive engine (34 content · Python port · reproduction)
      ▲
   layer 05 ── orchestration (26 long-form systems)
      ▲
   layer 04 ── validation gates
      ▲
   layer 03 ── interactive personas
      ▲
   layer 02 ── engineering modules
      ▲
   layer 01 ── cognitive scaffolds
      ▲
   layer 00 ── foundation primitives
      ▲
   ────────────────────────────────
   the model

Lower layers compose into higher layers. Every module imports only from layers below it.
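The layering rule is simple enough to check mechanically. A minimal sketch, assuming the two-digit directory prefix (`00_` to `07_`) is the layer number; `layer_of` and `violates_layering` are hypothetical names, not part of the pxl package.

```python
def layer_of(path: str) -> int:
    """Read the two-digit layer prefix from a path such as '02_engineering/x.md'."""
    top_level_dir = path.split("/", 1)[0]
    return int(top_level_dir.split("_", 1)[0])


def violates_layering(module_path: str, dependency_path: str) -> bool:
    """True when a module depends on its own layer or on a higher one."""
    return layer_of(dependency_path) >= layer_of(module_path)


# A layer-02 module may depend on a layer-00 primitive...
print(violates_layering("02_engineering/senior-code-reviewer.md",
                        "00_foundation/identity-primitive.md"))
# ...but a layer-01 scaffold may not depend on a layer-04 gate.
print(violates_layering("01_cognition/executive-engine.md",
                        "04_validation/hallucination-gate.md"))
```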

Every module here is:

Typed — declares category, vector, target models.
Versioned — filename embeds SemVer. No silent mutations.
Tested — ships with a test prompt + expected behavior. Modules without tests are rejected.
Composable — depends on lower-layer modules by reference.
Falsifiable — every claim the module makes is something a reviewer can check in five minutes.
Model-agnostic — tuned on Claude 4.6, GPT-5.4, Llama-4; tested across all three.

This is not a prompt zoo. It is an engineering library.
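The "Typed" property above can be checked mechanically. The sketch below uses only the standard library and is not the project's validator; the key names `category`, `vector`, and `target_models` are assumptions derived from the wording of the list, and `frontmatter_problems` is a hypothetical helper.

```python
REQUIRED_KEYS = ("category", "vector", "target_models")  # assumed key names


def frontmatter_problems(frontmatter: dict) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    problems = [f"missing: {key}" for key in REQUIRED_KEYS if key not in frontmatter]
    models = frontmatter.get("target_models")
    if models is not None and (not isinstance(models, list) or not models):
        problems.append("target_models must be a non-empty list")
    return problems


# Placeholder values; real modules declare their own.
print(frontmatter_problems({"category": "engineering"}))
```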

Architecture

                      ┌─────────────────────────────────────────────────────┐
                      │              P R O M P T   X   L A B                │
                      │   6-layer · 13 seed + 26 orchestration · text-only  │
                      └─────────────────────────┬───────────────────────────┘
                                                │
          ┌─────────────────────────────────────┼─────────────────────────────────────┐
          │                                     │                                     │
  ┌───────▼────────┐  ┌───────────────┐  ┌──────▼───────┐  ┌───────────────┐  ┌───────▼────────┐
  │ 00 FOUNDATION  │  │ 01 COGNITION  │  │ 02 ENGINEER  │  │ 03 PERSONAS   │  │ 04 VALIDATION  │
  │ identity       │  │ executive-eng │  │ senior-rev   │  │ socratic tutor│  │ hallucination  │
  │ constraint     │  │ creator-crit  │  │ legacy-rfctr │  │ strat advisor │  │ fallacy check  │
  │ output         │  │ CoT scaffold  │  │ test-gen     │  │               │  │                │
  └────────┬───────┘  └───────┬───────┘  └──────┬───────┘  └───────┬───────┘  └────────┬───────┘
           │                  │                 │                  │                   │
           └──────────────────┴─────────────────┼──────────────────┴───────────────────┘
                                                │
                            ┌───────────────────▼─────────────────────┐
                            │         05  O R C H E S T R A T I O N   │
                            │   protocols · agents · frameworks       │
                            │       · crypto · research               │
                            │   (26 production-grade long systems)    │
                            └───────────────────┬─────────────────────┘
                                                │
                            ┌───────────────────▼─────────────────────┐
                            │          COMPOSITION  PATTERN           │
                            │   identity + constraint + scaffold      │
                            │      + domain + output + gate           │
                            └─────────────────────────────────────────┘

Layer	Path	Count	Purpose
`FOUNDATION`	`00_foundation/`	3	Primitives every module inherits — identity, constraint, output shape.
`COGNITION`	`01_cognition/`	3	Thinking scaffolds — executive-engine, creator-critic-verifier, CoT scaffold.
`ENGINEERING`	`02_engineering/`	3	Senior code reviewer, legacy refactor surgeon, property-test generator.
`PERSONAS`	`03_personas/`	2	Stateful interactive agents — Socratic tutor, strategic advisor.
`VALIDATION`	`04_validation/`	2	Adversarial output gates — hallucination gate, fallacy checker.
`ORCHESTRATION`	`05_orchestration/`	26	Production long-form systems — execution protocols, PR agents, flagship frameworks, crypto/trading, research methodology.
`ECA ENGINE`	`06_eca_engine/`	34	ECA v1.1 cognitive engine — content, typed Python port, reproduction tests. 77-iter calibrated.
`KRITERION`	`07_kriterion/`	18	Kriterion v2026.4.5 — fail-closed evaluation primitive, 6 protocols, 9 schemas, canonical hashing kernel, 10-case reproduction.

The Five Design Commitments

1. One module, one job

If a prompt does two things, split it. The test: can you describe what it does in one sentence without using "and"?

2. Explicit over implicit

"Be thoughtful" is banned. "Cite the exact line number" is required. Every module states its role, its forbidden modes, and its output shape.

3. Fail loudly

Every module has a literal refusal string. Graceful degradation into plausible-sounding hallucination is the default failure mode of prompt engineering — this repo refuses to ship it.

4. Cite the source

Modules reference the paper, heuristic, or incident they come from. Feathers for refactoring. Peirce for inference types. Halmos for Socratic teaching. No mystery meat.

5. Version everything

Breaking changes bump the filename version. executive-engine.md → executive-engine-v2.md. The old file gets status: deprecated but is never deleted — dependents still work.

The test for every module

Four questions — if any answer is no, the module is not ready:

One-sentence description without "and"?
A specific naive input handled correctly?
At least one explicit refusal?
No simpler version that still works?

Seed Modules

    ╭──────╮
   │  WHO   │
    ╰──┬───╯
       │
     identity

Identity
_{identity-primitive}

    ╭──────╮
   │ FORBID │
    ╰──┬───╯
       │
    constraints

Constraint
_{constraint-primitive}

    ╭──────╮
   │ SHAPE  │
    ╰──┬───╯
       │
     output

Output
_{output-primitive}

   plan
    │
    ▼
   exec
    │
    ▼
  critic

Executive
_{executive-engine}

  create
   × ↓
  critic
   × ↓
  verify

C-C-V Triad
_{creator-critic-verifier}

   obs
    │
   infer
    │
   open?
    │
   decide

CoT Scaffold
_{chain-of-thought}

   ╭───╮
   │ PR │
   ╰─┬─╯
     ▼
   review

Senior Reviewer
_{senior-code-reviewer}

   legacy
     │
  char-tests
     │
   refactor

Refactor Surgeon
_{legacy-refactor-expert}

  sig + doc
     │
  invariants
     │
   properties

Test Generator
_{test-generator}

   ?
    │
    ?
    │
    ?
   insight

Socratic Tutor
_{socratic-tutor}

  restate
    │
  unsaid
    │
  decide
    │
  24h act

Strategic Advisor
_{strategic-advisor}

 ctx + draft
     │
  audit
     │
  PASS/FAIL

Hallucination Gate
_{hallucination-gate}

  argument
     │
  taxonomy
     │
  steelman

Fallacy Checker
_{logical-fallacy-checker}

   ┌───┐
   │ + │
   │ → │
   └───┘

Your Module
_{start from template}

   ┌───┐
   │ ✕ │
   │ ✕ │
   └───┘

Taxonomy
_{taxonomy.json}

Composition Pattern

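Modules are designed to stack. A typical production prompt is assembled top-down, mostly from lower to higher layers; the output primitive is the exception, coming from 00_foundation/ but sitting after the domain module: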

┌─────────────────────────────────────────────────────────────────────┐
│  IDENTITY       ← 00_foundation/identity-primitive.md               │
│    Who the model is. What it refuses to be.                          │
├─────────────────────────────────────────────────────────────────────┤
│  CONSTRAINTS    ← 00_foundation/constraint-primitive.md              │
│    The 3 failure modes it must never produce.                        │
├─────────────────────────────────────────────────────────────────────┤
│  SCAFFOLD       ← 01_cognition/executive-engine.md                   │
│    How it thinks: planner → executor → critic.                       │
├─────────────────────────────────────────────────────────────────────┤
│  DOMAIN         ← 02_engineering/senior-code-reviewer.md             │
│    What it does today: review this PR.                               │
├─────────────────────────────────────────────────────────────────────┤
│  OUTPUT         ← 00_foundation/output-primitive.md                  │
│    Exact shape: sections, lengths, refusal string.                   │
├─────────────────────────────────────────────────────────────────────┤
│  GATE           ← 04_validation/hallucination-gate.md                │
│    Adversarial filter. PASS or REFUSED.                              │
└─────────────────────────────────────────────────────────────────────┘
                               │
                               ▼
                        trusted output

The stack reads top-down: who you are → what you must not do → how you think → what you're doing today → how you answer → who double-checks the answer.
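The ordering above can be expressed as a small assembly function. This is a sketch under assumptions, not the pxl assembly module: the slot names and `assemble` are hypothetical, sections are passed in as plain strings, and the scaffold and gate are treated as optional.

```python
STACK_ORDER = ("identity", "constraints", "scaffold", "domain", "output", "gate")
REQUIRED = ("identity", "constraints", "domain", "output")


def assemble(sections: dict) -> str:
    """Join the supplied sections in stack order, whatever order they arrive in."""
    missing = [slot for slot in REQUIRED if slot not in sections]
    if missing:
        raise ValueError(f"missing required slots: {missing}")
    unknown = sorted(set(sections) - set(STACK_ORDER))
    if unknown:
        raise ValueError(f"unknown slots: {unknown}")
    return "\n\n".join(sections[slot].strip() for slot in STACK_ORDER if slot in sections)


prompt = assemble({
    "output": "Answer in three sections.",
    "identity": "You are a code reviewer.",
    "domain": "Review this PR.",
    "constraints": "Never approve without citing a line.",
})
print(prompt)
```

Rejecting a stack with a missing required slot, instead of silently emitting a shorter prompt, follows the "fail loudly" commitment.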

Choosing a Scaffold

Situation	Use this	Why
Task is well-defined; first answer is usually right	No scaffold. Direct prompt.	Scaffolds cost tokens. Don't pay for what you don't need.
Non-trivial task where the model could miss a failure mode	`executive-engine`	Planner + Executor + Critic catches its own mistakes.
High-stakes decision where a polished wrong answer is expensive	`creator-critic-verifier`	Adversarial triad; the Critic is structurally unable to rubber-stamp.
Reasoning-heavy task where every step must be auditable	`chain-of-thought-scaffold`	Every inference is tagged with its type (deductive / abductive / …).

Rule of thumb: the right scaffold is the smallest one that works. Don't use a triad for a task a direct prompt would solve.

Quickstart

# clone
git clone https://github.com/neuron7xLab/prompt-x-lab.git
cd prompt-x-lab

# pick a module
cat 02_engineering/senior-code-reviewer.md

# copy the Identity, Core logic, and Constraints sections into your system prompt
# paste the diff you want reviewed as the user message
# read the output — if it's bad, file an issue with the exact input that broke it

No installation. No dependencies. No runtime. Just text. (The pxl tooling shown under At a glance is optional and installed separately.)
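The copy step can also be scripted. The sketch below assumes each module marks its sections with markdown headings named exactly "Identity", "Core logic", and "Constraints"; that layout is an assumption, and `extract_sections` is a hypothetical helper, not part of the repository.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")


def extract_sections(markdown: str, wanted: tuple) -> dict:
    """Collect the body text under each wanted heading."""
    collected = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            title = match.group(1)
            current = title if title in wanted else None
            if current is not None:
                collected[current] = []
            continue
        if current is not None:
            collected[current].append(line)
    return {title: "\n".join(lines).strip() for title, lines in collected.items()}


# Invented sample text standing in for a real module file.
SAMPLE = """# Example module

## Identity
You review diffs.

## Core logic
1. Read the diff.

## Constraints
Never approve without citing a line.

## Test prompt
(not copied into the system prompt)
"""

parts = extract_sections(SAMPLE, ("Identity", "Core logic", "Constraints"))
system_prompt = "\n\n".join(parts[name] for name in ("Identity", "Core logic", "Constraints"))
print(system_prompt)
```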

Anti-Patterns This Repository Refuses to Ship

╳  prompt essays              long prose walls — models don't read them, they pattern-match on them
╳  vibey role-play             "you are a world-class expert" — this is a vibe, not a constraint
╳  emoji-heavy UX              cute, widely copied, and almost always padding
╳  "let's think step by step"  vague, mostly superseded — use a typed CoT scaffold instead
╳  self-referential templates  if your prompt talks about prompts, it's methodology not a module
╳  silent hallucination        refuse loudly; never degrade into plausible-sounding guesses

Repository Layout

prompt-x-lab/
├── README.md                      ← you are here
├── LICENSE                        ← MIT
├── CHANGELOG.md                   ← Keep a Changelog + SemVer
├── .gitignore
│
├── .github/
│   └── assets/
│       ├── banner-dark.svg        ← minimalist RGB-on-black banner
│       ├── banner-light.svg       ← same composition, paired
│       ├── divider.svg            ← RGB gradient divider
│       ├── eca-cognitive-engine.svg  ← neuro-fractal visual study
│       ├── crest-{360,720,1080}.webp ← Advanced Orchestration crest
│       └── crest.manifest.json    ← crest variant metadata
│
├── .metadata/
│   ├── taxonomy.json              ← machine-readable layer graph
│   └── manifest.yaml              ← module inventory with status + vectors
│
├── 00_foundation/                 ← primitives every module inherits
│   ├── README.md
│   ├── identity-primitive.md
│   ├── constraint-primitive.md
│   └── output-primitive.md
│
├── 01_cognition/                  ← thinking architectures
│   ├── README.md
│   ├── executive-engine.md
│   ├── creator-critic-verifier.md
│   └── chain-of-thought-scaffold.md
│
├── 02_engineering/                ← code synthesis, review, refactor, tests
│   ├── README.md
│   ├── senior-code-reviewer.md
│   ├── legacy-refactor-expert.md
│   └── test-generator.md
│
├── 03_personas/                   ← stateful interactive agents
│   ├── README.md
│   ├── socratic-tutor.md
│   └── strategic-advisor.md
│
├── 04_validation/                 ← adversarial output gates
│   ├── README.md
│   ├── hallucination-gate.md
│   └── logical-fallacy-checker.md
│
├── 05_orchestration/              ← Advanced Orchestration v1 (26 modules)
│   ├── README.md
│   ├── protocols/    (6)          ← SPST · DSIO · IOA · LRE · PGE · SMLRS
│   ├── agents/       (9)          ← PR automation agents
│   ├── frameworks/   (5)          ← flagship long-form frameworks
│   ├── crypto/       (3)          ← crypto & trading systems
│   └── research/     (3)          ← methodology & research protocols
│
├── 06_eca_engine/                 ← ECA Cognitive Engine v1.1 (34 files)
│   ├── README.md
│   ├── core/         (6)          ← prompt, proof tiers, config, templates
│   ├── runtime/      (4)          ← policy, fallback, router spec, budget
│   ├── benchmarks/   (3)          ← metrics, rubric, live protocol
│   ├── security/     (3)          ← model, guardrails, provenance
│   ├── schemas/      (2)          ← request + response envelopes
│   ├── legal/        (1)          ← EULA template
│   ├── docs/        (15)          ← architecture · calibration · ops
│   └── AUDIT.sha256               ← body audit (34 entries)
│
├── 07_kriterion/                  ← Kriterion v2026.4.5 (18 files)
│   ├── README.md
│   ├── protocols/    (6)          ← 6 security-role protocols
│   ├── schemas/      (9)          ← 9 canonical evaluation schemas
│   ├── methodology/  (3)          ← methodology · threat model · reasoning
│   └── AUDIT.sha256               ← body audit (18 entries)
│
├── templates/
│   └── base-module.md             ← start every new module here
│
├── src/pxl/                       ← Python package (mypy --strict, 24 files)
│   ├── models.py, assembly.py, providers.py, judge.py
│   ├── runner.py, validator.py, audit.py, badges.py, cli.py
│   ├── eca/                       ← ECA v1.1 typed subsystem
│   │   ├── schemas.py, config.py, router.py, scorer.py
│   │   ├── signer.py, validate.py, cli.py
│   │   ├── assets/                ← bundled YAML / JSON / TXT
│   │   └── datasets/              ← 180 req + 192 resp + holdouts
│   ├── kriterion/                 ← Kriterion minimalist kernel
│   │   ├── canonical.py           ← 180-line fail-closed primitive
│   │   ├── schemas.py, protocols.py
│   │   ├── benchmark.py, cli.py
│   │   ├── assets/                ← 9 schemas + 6 protocols
│   │   └── datasets/              ← 10 synthetic cases + manifest
│   └── py.typed
│
├── schemas/                       ← JSON Schemas — single source of truth
│   ├── module.schema.json
│   ├── eval-spec.schema.json
│   └── eval-result.schema.json
│
├── evals/                         ← evaluation harness
│   ├── specs/ (10 YAML, 20 cases)
│   └── results/ badges.json · run JSONs
│
├── tests/                         ← pytest suite (22 tests · 6 files)
│
├── docs/
│   ├── methodology.md             ← why this library exists
│   ├── naming-convention.md       ← SemVer + filename rules
│   ├── usage-guide.md             ← composition, scaffold selection
│   ├── composition-algebra.md     ← EBNF grammar + type rules
│   ├── evaluation-protocol.md     ← pass/fail epistemology
│   ├── references.bib             ← bibliography (BibTeX)
│   └── case-studies/              ← 3 concrete runs with full rubric traces
│
├── Makefile                       ← make validate · test · lint · typecheck · eval · audit
├── pyproject.toml                 ← project metadata, deps, tool config
├── CLAUDE.md                      ← development rules (Claude Code contract)
├── .pre-commit-config.yaml        ← hooks
└── .github/workflows/ci.yml       ← validate · test · lint · mypy · audit · eval-mock

Contributing

Start from templates/base-module.md. Every new module must ship with:

Required — no exceptions

A one-sentence Purpose (no "and").
An explicit Identity block (role + what it refuses to be).
A Core logic block (numbered, not prose).
A Constraints block with at least 3 forbidden modes.
An Output format with a literal refusal string.
A Test prompt — a concrete input.
An Expected behavior clause — reviewers check this, not vibes.

Modules without tests are rejected. No exceptions.

Version bump rules

Change	Bump
Output shape / refusal condition / identity changes	major
New test prompt, new edge case, new constraint	minor
Wording clarification, typo, example fix	patch

Deprecation: set status: deprecated, add a Deprecated: pointer to the replacement, do not delete. Old dependents still work.
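The bump table translates directly into code. A minimal sketch: the change-type labels and the `bump` helper are hypothetical names chosen for illustration; only the mapping from change kind to major, minor, or patch comes from the table.

```python
BUMP_FOR_CHANGE = {
    # major: the module's contract changes
    "output_shape": "major", "refusal_condition": "major", "identity": "major",
    # minor: coverage grows, contract unchanged
    "test_prompt": "minor", "edge_case": "minor", "constraint": "minor",
    # patch: wording only
    "wording": "patch", "typo": "patch", "example_fix": "patch",
}


def bump(version: str, change: str) -> str:
    """Apply the SemVer bump that the table assigns to a change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    kind = BUMP_FOR_CHANGE[change]
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(bump("1.2.3", "refusal_condition"))
```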

Related Work

Project	What it is
`neuron7xLab/GeoSync`	Geometric market intelligence — Kuramoto · Ricci · thermodynamics · 57 invariants
`neuron7xLab/neurophase`	Phase synchronization as execution gate — brain × market oscillators
`neuron7xLab/neosynaptex`	γ-scaling diagnostics across biological, physical, cognitive substrates
`neuron7xLab/mycelium-fractal-net`	Morphogenetic field engine — reaction-diffusion + TDA + causal rules

Engineering Discipline

  pxl/                              ← the Python package
  ├── models.py       Pydantic v2 mirrors
  ├── assembly.py     section parser
  ├── providers.py    Anthropic · OpenAI · Mock
  ├── judge.py        LLM-as-judge rubric
  ├── runner.py       end-to-end harness
  ├── validator.py    frontmatter validator
  ├── audit.py        SHA256 layer-05 audit
  ├── badges.py       real badge generator
  └── cli.py          entry points

  schemas/
  ├── module.schema.json
  ├── eval-spec.schema.json
  └── eval-result.schema.json

  evals/
  ├── specs/          10 YAML specs (20 cases)
  └── results/        badges.json + JSON runs

  tests/              22 pytest tests, 6 files
  docs/               methodology · algebra · references.bib

prompt-x-lab v0.3.0 ships with a real engineering harness — not a demo, not a marketing claim. Every discipline the library preaches in its seed modules is enforced mechanically against its own content:

Frontmatter is a Pydantic model (pxl.models.ModuleFrontmatter) mirrored by a JSON Schema (schemas/module.schema.json). pxl-validate walks every .md file in layers 00–05 and fails loudly on the first schema violation.
Evaluation specs are Pydantic models + JSON Schema. Every seed module in layers 01–04 has exactly one spec under evals/specs/, each with a positive and an adversarial case.
The eval runner (pxl-eval) assembles the module's Identity + Core logic + Constraints + Output sections into a system prompt, calls Claude Opus (or GPT-4o, or a Mock provider), and scores the output with an LLM-as-judge rubric — also tested, also under pxl invariants.
Layer 05 is integrity-audited via SHA256 hash of every module's body (pxl-audit verify). A drift in any of the 26 orchestration modules fails CI.
Badges are computed from real JSON results, not hand-written. When there are no results, the badge says no-runs-yet. This is honest by construction.
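The "honest by construction" rule for badges can be sketched as follows. Only the `no-runs-yet` label comes from this README; the `passed` key in each result record and the `N/M passing` label format are assumptions, and `badge_label` is a hypothetical helper, not the pxl badge generator.

```python
def badge_label(results: list) -> str:
    """Derive a badge label purely from recorded run results."""
    if not results:
        # No evidence means no claim.
        return "no-runs-yet"
    passed = sum(1 for result in results if result.get("passed") is True)
    return f"{passed}/{len(results)} passing"


print(badge_label([]))
print(badge_label([{"passed": True}, {"passed": False}]))
```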

Gate	Command	What it proves
`frontmatter`	`pxl-validate`	All 39 modules conform to the Pydantic schema.
`pytest`	`pytest -q`	22 unit tests covering validator, assembly, audit, judge, runner, models.
`ruff`	`ruff check src scripts evals tests`	Style and lint rules (E, F, I, B, UP, N, SIM, RUF, ANN).
`mypy --strict`	`mypy src`	Full type check across the 10 `pxl` source files.
`audit`	`python -m pxl.audit verify`	SHA256 body integrity of all 26 orchestration modules.
`eval · mock`	`pxl-eval --provider mock`	End-to-end harness plumbing (no API key required).
`eval · real`	`pxl-eval`	Live rubric evaluation against Claude Opus 4.6.

Composition algebra

docs/composition-algebra.md specifies, in EBNF and in type rules, how modules compose into a well-formed system prompt. Key invariants:

Grammar — identity, constraint, scaffold?, domain, output, gate*
Layer ordering — monotonic (with documented exceptions for gates)
Vector compatibility — no strategic + creative collisions
Refusal-path preservation — every stack keeps at least one literal REFUSED: path reachable from the Constraint block through the Output block.

A composition is typed (P, R) — positive invariants + refusal conditions. A reviewer reads off (P, R) in under a minute or the composition is not well-formed.
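The grammar invariant above (identity, constraint, scaffold?, domain, output, gate*) is regular, so it can be checked with a regular expression. This sketch covers only the grammar, not vector compatibility or refusal-path preservation; `is_well_formed` and the one-letter encoding are hypothetical.

```python
import re

_SYMBOL = {
    "identity": "I", "constraint": "C", "scaffold": "S",
    "domain": "D", "output": "O", "gate": "G",
}
# identity, constraint, optional scaffold, domain, output, zero or more gates
_GRAMMAR = re.compile(r"ICS?DOG*")


def is_well_formed(stack: list) -> bool:
    """Check a sequence of module kinds against the composition grammar."""
    try:
        encoded = "".join(_SYMBOL[kind] for kind in stack)
    except KeyError:
        return False  # unknown module kind
    return _GRAMMAR.fullmatch(encoded) is not None


print(is_well_formed(["identity", "constraint", "domain", "output"]))
print(is_well_formed(["constraint", "identity", "domain", "output"]))
```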

Evaluation protocol

docs/evaluation-protocol.md specifies the epistemology of pass/fail:

Strict threshold — a case passes iff every rubric item is satisfied (≥ 0.999). No partial credit.
Adversarial cases carry equal weight — every spec has at least one.
Judge under test — tests/test_judge.py feeds the judge known-good and known-bad outputs and asserts correct scoring.
Provider-agnosticism — one provider ≠ validated; cross-model validated means two providers from two vendors.
Out of scope — foundation primitives (meta-templates, §8.1) and layer 05 (multi-page runtime-bound, §8.2).
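The strict threshold and the cross-model rule above can be written down directly. A sketch under assumptions: rubric scores are floats keyed by rubric item, an empty rubric counts as a failure (an assumption, not stated in the protocol), and the function names are hypothetical.

```python
PASS_THRESHOLD = 0.999


def case_passes(rubric_scores: dict) -> bool:
    """A case passes only if every rubric item meets the threshold; no partial credit."""
    return bool(rubric_scores) and all(
        score >= PASS_THRESHOLD for score in rubric_scores.values()
    )


def cross_model_validated(passing_runs: list) -> bool:
    """passing_runs holds (vendor, provider) pairs for runs in which every case passed."""
    vendors = {vendor for vendor, _ in passing_runs}
    providers = {provider for _, provider in passing_runs}
    return len(providers) >= 2 and len(vendors) >= 2


print(case_passes({"cites_line": 1.0, "refuses_when_unsure": 0.99}))
print(cross_model_validated([("vendor-a", "model-x"), ("vendor-b", "model-y")]))
```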

Bibliography & prior art

Every seed module now has a Prior art section naming its intellectual ancestors. Citations are collected in docs/references.bib — Peirce for inference types, Feathers for refactoring, Halmos for Socratic teaching, Kahneman for the Executive Engine, QuickCheck for property tests, Popper for falsifiability, Horowitz for advising.

A claim without a prior-art anchor is not allowed to ship. The rule is mechanical: if the module references a technique that is not in the bibliography, the module is rejected.
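A mechanical version of this rule might look like the sketch below. It assumes the citation keys a module uses have already been collected into a set; how modules mark citations is not specified here. The sample BibTeX entries are invented placeholders, not the contents of docs/references.bib.

```python
import re

# Matches the key in entries such as "@book{feathers,".
_ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bibtex: str) -> set:
    """Collect the citation keys defined in a BibTeX file."""
    return set(_ENTRY_KEY.findall(bibtex))


def missing_from_bibliography(cited: set, bibtex: str) -> set:
    """Keys a module cites that the bibliography does not define."""
    return cited - bib_keys(bibtex)


# Placeholder bibliography for illustration only.
SAMPLE_BIB = """
@book{feathers,
  title = {placeholder}
}

@article{peirce,
  title = {placeholder}
}
"""

print(missing_from_bibliography({"feathers", "unknown-technique"}, SAMPLE_BIB))
```

A non-empty result would be grounds for rejecting the module.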

Case studies

docs/case-studies/ contains three concrete runs — the unbounded-cache PR review, the Fibonacci(n ≤ 10¹⁸) trap for Executive Engine, and the Apollo 11 hallucination gate. Each study shows the exact input, exact output, rubric trace, verdict, and the adversarial variant that would have broken it. Case studies are the second pillar of falsifiability in the repo (the first is the eval harness).

Layer 05 · Orchestration

  05_orchestration/
  ├── protocols/     6
  ├── agents/        9
  ├── frameworks/    5
  ├── crypto/        3
  └── research/      3
                    ──
                    26

Provenance: Advanced Orchestration v1 · Status: integrated verbatim · License: single-owner proprietary · Packaging: prompt-x-lab native

Layers 00–04 are primitives: short, hand-written, fit-on-one-screen. Layer 05 is the opposite: production-sized systems that would drown a foundation layer, adapted here without a single byte of content change.

Every module in 05_orchestration/ is a whole system on its own: a Codex PR agent, a scientific-simulator transformation protocol, a crypto order-flow framework, a research-methodology contract. Each carries its origin in the frontmatter (origin: Advanced Orchestration v1 bundle) and its original file name (source_file:).

Composition rule preserved: an orchestration module is still wrapped top-down by 00_foundation/ (identity + constraint + output). Layer 05 does not break the stack — it is the domain tier, sitting above scaffolds and below validation gates.

Category	Path	Modules	Contents
`PROTOCOLS`	`05_orchestration/protocols/`	6	SPST · DSIO · IOA · LRE · PGE · SMLRS — execution protocols for Codex/Principal-Eng level repo transformations.
`AGENTS`	`05_orchestration/agents/`	9	Pull-request automation agents: transform, audit, stabilise, and ship repository-scale changes deterministically.
`FRAMEWORKS`	`05_orchestration/frameworks/`	5	Flagship long-form frameworks — multi-phase, multi-contract, multi-artifact operators.
`CRYPTO`	`05_orchestration/crypto/`	3	Crypto & trading systems — order-flow, regime detection, quant pipeline integration.
`RESEARCH`	`05_orchestration/research/`	3	Methodology & research protocols — reproducibility, evidence-bound inference, falsification ladders.

Layer 06 · ECA Cognitive Engine

  06_eca_engine/                    34 files
  ├── core/           6  prompt, tiers, config, templates
  ├── runtime/        4  policy, fallback, router, budget
  ├── benchmarks/     3  metrics, rubric, protocol
  ├── security/       3  model, guardrails, provenance
  ├── schemas/        2  request, response envelopes
  ├── legal/          1  EULA template
  └── docs/          15  architecture · calibration ·
                         operations · release notes

  src/pxl/eca/                      typed subsystem
  ├── schemas.py      Pydantic envelopes (strict)
  ├── config.py       12 Pydantic config models
  ├── router.py       port of route_request
  ├── scorer.py       7-dim quality scorecard
  ├── signer.py       HMAC-SHA256 provenance
  ├── validate.py     full-stack replay
  ├── cli.py          pxl-eca entry point
  ├── assets/         13 bundled YAML/JSON/TXT
  └── datasets/       10 calibration datasets

ECA v1.1.0 · selected_iteration 27 · 77-iter calibrated · 22 pytest tests · mypy --strict clean · zero FP

ECA is a production-candidate cognitive operating layer: one synchronised reasoning system routed across six modes (deep_analysis, executive_decision_brief, system_architecture_blueprint, human_performance_protocol, cognitive_error_audit, implementation_roadmap), with a seven-dimensional quality scorecard and calibrated shipping thresholds.

Unlike layer 05, which is a verbatim text copy, layer 06 is a full native integration:

34 content files under 06_eca_engine/ with prompt-x-lab frontmatter and source_sha256 provenance.
Typed Python port under src/pxl/eca/ — Pydantic v2 models for every config and envelope, pure functions for router/scorer/signer, mypy --strict clean across 18 source files.
CLI: pxl-eca info / validate / route / score / sign.
Reproduction tests (22 pytest cases) that replay the full calibration corpus and enforce the numerical contract as CI gates. The calibration chain is protected by code, not by documentation.
SHA256 audit of every body in the layer — python -m pxl.audit verify fails CI on drift.
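The body audit in the last item can be illustrated with hashlib. This is a sketch, not pxl.audit: it assumes the manifest uses the common `sha256sum` layout (`<hex digest>  <path>` per line), which may differ from the real AUDIT.sha256 format, and it works on in-memory bytes instead of reading files. All function names are hypothetical.

```python
import hashlib


def digest_of(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def parse_manifest(text: str) -> dict:
    """Parse '<hex digest>  <path>' lines into a path -> digest mapping."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            digest, path = line.split(maxsplit=1)
            entries[path.strip()] = digest
    return entries


def find_drift(manifest: dict, bodies: dict) -> list:
    """Paths whose body is missing or no longer matches the recorded digest."""
    drifted = []
    for path, expected in manifest.items():
        body = bodies.get(path)
        if body is None or digest_of(body) != expected:
            drifted.append(path)
    return sorted(drifted)


bodies = {"a.md": b"alpha", "b.md": b"beta"}
manifest = parse_manifest(
    "\n".join(f"{digest_of(body)}  {path}" for path, body in bodies.items())
)
print(find_drift(manifest, {**bodies, "b.md": b"beta, edited"}))
```

A CI gate would fail whenever `find_drift` returns a non-empty list.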

Reproduced calibration numbers (full-corpus replay):

Metric	Holdout (orig)	Full-corpus (bundled)
Router synthetic accuracy	100%	99.44% (179/180)
Router adversarial accuracy	100%	100%
Scorer balanced accuracy	91.67%	90.62% (174/192)
Scorer F1	90.91%	89.66%
Scorer false positives	0	0

The zero-FP invariant is load-bearing: across the entire synthetic response corpus, the scorer has never green-lit a response it should have blocked. This is the strongest single property of the ECA quality gate.
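The scorer figures in the table are mutually consistent, which a short calculation shows. The confusion matrix below is inferred from the reported numbers (174 of 192 correct, zero false positives, F1 of 89.66%), not taken from the calibration data: under those constraints all 18 errors are false negatives, which implies 78 true positives and 96 true negatives. `scorer_metrics` is a hypothetical helper.

```python
def scorer_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "balanced_accuracy": (true_positive_rate + true_negative_rate) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Inferred counts: "positive" means the scorer green-lit the response.
metrics = scorer_metrics(tp=78, fn=18, fp=0, tn=96)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With an even 96/96 split, plain accuracy and balanced accuracy coincide at 0.90625, and F1 is 156/174, or about 0.8966.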

Layer 07 · Kriterion

  07_kriterion/               18 files
  ├── protocols/    6  security-role protocols
  │                    SE-OPS · SSE · ESA
  │                    PSE · DSE · GPT-5.4 audit
  ├── schemas/      9  canonical evaluation
  │                    CanonicalArtifact
  │                    EvaluationResult
  │                    TaskScore · DomainScore
  │                    GateResult · Handoff
  │                    ReferenceInputBundle
  │                    ArtifactValidationResult
  │                    GovernanceInvariantRegistry
  └── methodology/  3  methodology · threat model
                       · anti-fragile reasoning

  src/pxl/kriterion/          typed kernel
  ├── canonical.py    the 180-line fail-closed
  │                   mathematical core: canonical
  │                   bytes, genesis/step hashes,
  │                   ExecutionChain builder
  ├── schemas.py      jsonschema + referencing
  ├── protocols.py    protocol loaders
  ├── benchmark.py    10-case reproduction
  └── cli.py          pxl-kriterion entry point

Kriterion v2026.4.5 · 10/10 reproduction · 40 tests

Kriterion's contribution is a single load-bearing idea: if every evaluation phase hashes its canonical input and links to the previous phase, the whole pipeline is a tamper-evident chain. Any modification to any phase invalidates every subsequent hash. Fail-closed by construction.
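The chaining idea can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not the code in canonical.py: the tag bytes, the payload layout, the function names, and the three phase names are all hypothetical.

```python
import hashlib
import json

GENESIS_TAG = b"sketch/genesis\x00"  # hypothetical domain-separation tags
STEP_TAG = b"sketch/step\x00"


def canonical(value) -> bytes:
    """Deterministic JSON bytes so equal inputs always hash identically."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def genesis_hash(bundle, contract_version: str) -> str:
    payload = canonical({"bundle": bundle, "contract_version": contract_version})
    return hashlib.sha256(GENESIS_TAG + payload).hexdigest()


def step_hash(previous: str, phase: str, phase_input) -> str:
    # Each step commits to the previous hash, which is what links the chain.
    payload = canonical({"prev": previous, "phase": phase, "input": phase_input})
    return hashlib.sha256(STEP_TAG + payload).hexdigest()


def run_chain(bundle, contract_version: str, phases) -> list[str]:
    hashes = [genesis_hash(bundle, contract_version)]
    for name, phase_input in phases:
        hashes.append(step_hash(hashes[-1], name, phase_input))
    return hashes


phases = [("intake", {"n": 1}), ("score", {"n": 2}), ("gate", {"n": 3})]
honest = run_chain({"task": "demo"}, "1.0.0", phases)

# Tamper with the second phase only.
tampered_phases = [("intake", {"n": 1}), ("score", {"n": 99}), ("gate", {"n": 3})]
tampered = run_chain({"task": "demo"}, "1.0.0", tampered_phases)

assert honest[:2] == tampered[:2]   # genesis and first step are untouched
assert honest[2] != tampered[2]     # the modified phase changes its own hash
assert honest[3] != tampered[3]     # and every hash after it
```

A verifier that holds only the final hash can therefore detect a change to any earlier phase by re-deriving the chain from the recorded inputs.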

Layer 07 integrates only the reusable kernel:

Six protocols, reproduced verbatim with source_sha256 provenance.
Nine JSON schemas — the canonical evaluation envelope hierarchy.
Three methodology documents — the signal, not the noise. No business copy, no HTML dashboard, no governance plumbing from the upstream bundle.
Typed Python kernel in src/pxl/kriterion/canonical.py — canonical_bytes, sha256_hex, build_genesis_hash, build_step_hash, and an ExecutionChain builder. Zero dependencies beyond the standard library. MIT-licensed independently of the content.
Ten-case reproduction — pxl-kriterion benchmark replays the upstream dataset_manifest.json hashes through the typed kernel and must match all ten byte-for-byte. Enforced by CI.
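To make the canonical-bytes idea concrete, here is a minimal sketch of one way to obtain a unique byte form for JSON-like data. The name canonical_bytes_sketch is hypothetical and the rules shown (sorted keys, no whitespace, UTF-8, reject NaN) are assumptions. The repository's canonical_bytes may differ, and this sketch does not normalise number formatting the way a full canonicalisation scheme would.

```python
import json
import math


def canonical_bytes_sketch(value) -> bytes:
    """Deterministic JSON bytes: sorted keys, no whitespace, UTF-8, no NaN."""
    return json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # raise ValueError instead of emitting non-standard JSON
    ).encode("utf-8")


a = {"task": "demo", "score": 1}
b = {"score": 1, "task": "demo"}  # same mapping, different insertion order
assert canonical_bytes_sketch(a) == canonical_bytes_sketch(b)
assert canonical_bytes_sketch(a) == b'{"score":1,"task":"demo"}'

# Fail closed: an unrepresentable value is an error, never a silent fallback.
try:
    canonical_bytes_sketch({"score": math.nan})
except ValueError:
    rejected = True
else:
    rejected = False
assert rejected
```

Refusing to serialise ambiguous input is the design choice that matters here: a hash over bytes is only meaningful if each logical value has exactly one byte form.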

The kernel is reusable for any audit pipeline, not just Kriterion's. If you want fail-closed evaluation:

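from pxl.kriterion import ExecutionChain, Phase

# my_bundle and phase_state are placeholders supplied by the caller:
# the input bundle for the run, and a mapping from each Phase to its input.
chain = ExecutionChain.start(my_bundle, contract_version="1.0.0")
for phase in Phase:
    # Each advance links this phase's input to the previous hash.
    chain.advance(phase, phase_input=phase_state[phase])
# chain.terminal_hash now proves the entire execution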

Seven properties guarantee fail-closedness: canonical-form uniqueness, domain separation of genesis and step hashes, linear chain structure, format-version baking, contract-version baking, bundle-dependent genesis, and re-derivable terminal hash. Every property is enforced by a dedicated test in tests/test_kriterion_canonical.py.

License: content under AGCL-1.0 (community, non-commercial); canonical.py is MIT-licensed independently because the mathematical ideas (canonical JSON, domain separation, chain linking) are not copyrightable as such.

Visual Study — ECA Cognitive Engine v1.1

ECA Cognitive Engine v1.1 — Neuro-Fractal Recursive Architecture

Radial recursive expansion · Metatron's Cube core · seven fractal rings · golden-ratio Trinity arcs · bilateral symmetry: organic left, crystalline right · void-shift palette.

     singularity      →   the input void
     flower of life   →   sacred substrate
     metatron spokes  →   13-point topology
     seven rings      →   stages I → VII
     trinity arcs     →   C · N · S @ φ
     L-system left    →   organic growth
     crystalline R    →   architectural lattice
     white criticals  →   9 emphasis nodes

No flowchart. No boxes. No rectangles. Every element obeys a mathematical rule:

Singularity — the central point from which structure unfolds.
Flower of Life + Metatron — the 19-circle sacred substrate, over-laid with hexagram and six-point spokes.
Seven concentric rings — each a fractal chain of interlocking module dots; ring radii follow a quasi-golden progression (200 → 830).
Trinity arcs — Cognitive · Neuro · System — three 80° arcs at r=900, separated by 40° golden gaps.
L-system branches — left side is bezier/organic, right side is polyline/crystalline. Bilateral symmetry intentional and broken.
Critical data points — nine true-white nodes across rings 3 · 5 · 7, forming two rotated triangles.
Mathematical etching — γ ≈ 1.0, φ = 1.618…, MWC, Σ, ∇, ∂/∂t, ℝⁿ — set in serif italic, opacity 0.45, like notation cut into glass.
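The Trinity arc layout is simple enough to verify by arithmetic: three 80° arcs plus three 40° gaps close the circle exactly. The sketch below computes the arc spans and endpoints. The starting angle of 0° and the helper names are assumptions, since the description does not say where the first arc begins.

```python
import math

ARC_SPAN_DEG = 80
GAP_DEG = 40
RADIUS = 900
START_DEG = 0  # assumption: the description does not state a starting angle


def trinity_arcs() -> list[tuple[str, int, int]]:
    """Return (label, start angle, end angle) for each arc, in degrees."""
    arcs = []
    for i, label in enumerate(("Cognitive", "Neuro", "System")):
        start = START_DEG + i * (ARC_SPAN_DEG + GAP_DEG)
        arcs.append((label, start, start + ARC_SPAN_DEG))
    return arcs


def point_on_circle(radius: float, angle_deg: float) -> tuple[float, float]:
    """Cartesian coordinates of a point at the given angle on the circle."""
    theta = math.radians(angle_deg)
    return (radius * math.cos(theta), radius * math.sin(theta))


arcs = trinity_arcs()
assert 3 * (ARC_SPAN_DEG + GAP_DEG) == 360  # arcs and gaps tile the full circle
for label, start, end in arcs:
    print(label, start, end, point_on_circle(RADIUS, start))
```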

Palette: #050505 void · #FF00FF → #8B0000 glow · #FFFFFF criticals. Stroke: 0.4 – 2.2pt. Every line load-bearing.

MIT · Solo · Ukraine 🇺🇦 · 2026

"Don't trust anyone. Don't even trust yourself." (Elon Musk, Lex Fridman Podcast #400)

This repo is a discipline, not a catalog. Every module earns its place.
