
prompt x lab — cognitive library




p r o m p t x l a b

A curated library of high-fidelity prompts, cognitive architectures, and agent protocols. Every module is typed, versioned, tested, composable, and falsifiable.


version 0.8.0 · 8 layers · pytest 142 · mypy strict · chains 5.3K · Musk 5-step · parallel primitive · license MIT

Claude 4.6 · Claude Thinking · GPT-5.4 / o1 · Llama 4 · Ollama edge · Python 3.12 · Pydantic v2 · Ruff · pytest + hypothesis · mkdocs-material · Ukraine

One file. One job. One test. One refusal path. No essays. No emoji. No runtime.

First principles · applied

This repository is structured around Elon Musk's five-step engineering algorithm. Every release can be placed in exactly one of the five steps:

  1. Make the requirements less dumb → we rewrote "ship as many prompts as possible" to "every prompt ships with a frontmatter schema, a test spec, a refusal condition, and a prior-art citation."
  2. Delete the part or process → we removed three aspirational eval specs (foundation primitives are meta-templates), the "tested" badge that was not true, and the ~4,000-line upstream Kriterion reference runner (we kept the 180-line kernel).
  3. Simplify or optimise → the unified pxl CLI replaced six legacy entry points; canonical.py replaced 4,000 LOC with 180 LOC; ExecutionChain is a 25-line dataclass.
  4. Accelerate cycle time → the full local quality gate runs in ~15 s; full CI runs in ~1 m 6 s across 142 tests and a 3-layer audit.
  5. Automate → seven GitHub Actions jobs, a tag-triggered release workflow, and pxl dashboard, which regenerates every number from real artifacts on every call. Automation came last, not first.

See docs/first-principles.md for the full retrospective audit with deletions traced to commits, honest exceptions, and the discipline check ("you should have to add back at least 10% of what you delete").

At scale · the arithmetic of cognitive infrastructure

On a single commodity core, the canonical primitive produces 5,290 full tamper-evident seven-phase execution chains per second. Linear scaling holds because each chain is an independent, pure-function unit.

| Cores | Hardware class | Full audit chains/sec |
|---|---|---|
| 1 | laptop | 5.29 K |
| 8 | workstation | 42 K |
| 96 | dual-socket server | 508 K |
| 10 K | small Kubernetes cluster | 52.9 M |
| 100 K | frontier HPC | 529 M |
| 1 M | hypothetical hyperscale | 5.29 B |

Key observation: the world currently produces ~5 × 10⁵ LLM responses per second across all vendors combined. A single rackmount server running pxl.scale.batch_execution_chain at 96 cores produces 508 K chains/sec — enough to audit every LLM response on Earth, in real time, with compute cost < 1% of the inference cost that produced them.

The canonical primitive is never the bottleneck; the LLM is. This ratio holds at 10⁸ responses/sec and beyond, which is where credible 2030 projections put the industry.
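The scaling table is plain multiplication: per-core throughput times core count, assuming perfectly linear scaling. A minimal sketch of that arithmetic, using the 5,290 chains/sec and 189 μs figures quoted in this README; the function names are hypothetical and not part of pxl.

```python
# Back-of-envelope model behind the scaling table.
# Assumptions: a fixed per-core rate and perfectly linear scaling
# (no memory, scheduler, or I/O contention). Names are illustrative only.
PER_CORE_CHAINS_PER_SEC = 5_290  # measured single-core figure from this README
CHAIN_LATENCY_US = 189           # full 7-phase chain latency from the metrics table


def chains_per_sec(cores: int) -> int:
    """Projected full-audit chains per second for a given core count."""
    return cores * PER_CORE_CHAINS_PER_SEC


def audit_headroom(cores: int, llm_responses_per_sec: float) -> float:
    """Audit capacity divided by LLM output rate; >= 1.0 means audit keeps up."""
    return chains_per_sec(cores) / llm_responses_per_sec


# 1 s / 189 µs is about 5,291 chains/sec, consistent with the 5.29 K per-core figure.
print(round(1_000_000 / CHAIN_LATENCY_US))
for cores in (1, 8, 96, 10_000):
    print(f"{cores:>6} cores -> {chains_per_sec(cores):,} chains/sec")
# One 96-core server against an estimated 5 x 10^5 LLM responses/sec:
print(f"headroom: {audit_headroom(96, 5e5):.2f}x")
```

Replacing either input (a measured per-core rate, or a different estimate of global LLM output) updates every row at once.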

src/pxl/scale.py exposes the parallel primitive as a public API:

```python
from pxl.scale import batch_execution_chain, parallel_audit_all, batch_canonical_hash

# Audit every integrated layer in parallel (3-layer, <1ms)
results = parallel_audit_all()

# Compute full 7-phase chains for a million bundles (embarrassingly parallel).
# `bundles` must be defined by the caller; its expected type is not shown here.
terminal_hashes = batch_execution_chain(
    bundles,
    contract_version="2026.04",
    max_workers=96,
)
```

Parallel versions are tested byte-for-byte against their serial counterparts. If the primitive diverges between cores, CI fails. Reproducibility is not negotiated.
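The byte-for-byte check can be sketched with the standard library alone. This is not the pxl.scale implementation: canonical_hash, serial_hashes and parallel_hashes are hypothetical names, and canonical JSON (sorted keys, compact separators) is an assumed canonicalisation. ThreadPoolExecutor keeps the sketch portable; for real multi-core throughput you would swap in ProcessPoolExecutor with a module-level worker function.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any


def canonical_hash(bundle: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON: sorted keys, compact separators, UTF-8."""
    data = json.dumps(bundle, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def serial_hashes(bundles: list[dict[str, Any]]) -> list[str]:
    return [canonical_hash(b) for b in bundles]


def parallel_hashes(bundles: list[dict[str, Any]], max_workers: int = 8) -> list[str]:
    # Executor.map yields results in input order, so the two lists are comparable.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(canonical_hash, bundles))


bundles = [{"case": i, "payload": {"b": i * 2, "a": str(i)}} for i in range(1_000)]
# The CI-style gate: any divergence between serial and parallel output is a failure.
assert parallel_hashes(bundles) == serial_hashes(bundles), "parallel output diverged"
```

Because map preserves order, a plain list comparison is enough; no sorting or matching step can hide a divergent hash.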

See docs/scaling.md for the full scaling math, compute-cost analysis, and the argument for universal deployment.

At a glance



$ pip install https://github.com/neuron7xLab/prompt-x-lab/releases/download/v0.7.0/prompt_x_lab-0.7.0-py3-none-any.whl

$ pxl dashboard
  ╭──────────────────────────────────────╮
  │  p r o m p t   x   l a b             │
  │  version 0.7.0  ·  production-stable │
  ╰──────────────────────────────────────╯
  ┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳─────────┳───────┓
  ┃ Layer            ┃ Kind       ┃ Modules ┃ Audit ┃
  ┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
  │ 00 FOUNDATION    │ primitives │    3    │   —   │
  │ 01 COGNITION     │ scaffolds  │    3    │   —   │
  │ 02 ENGINEERING   │ seed       │    3    │   —   │
  │ 03 PERSONAS      │ seed       │    2    │   —   │
  │ 04 VALIDATION    │ gates      │    2    │   —   │
  │ 05 ORCHESTRATION │ verbatim   │   26    │   ✓   │
  │ 06 ECA ENGINE    │ typed port │   34    │   ✓   │
  │ 07 KRITERION     │ kernel     │   18    │   ✓   │
  ├──────────────────┼────────────┼─────────┼───────┤
  │ TOTAL            │            │   91    │       │
  └──────────────────┴────────────┴─────────┴───────┘
  Subsystem health · ECA router 99.44% · scorer 90.62% · FP=0
                   · Kriterion 10/10 reproduction · canonical kernel 180 LOC
  ✓ validate 91 modules  ✓ audit 3 layers  · commit 7e50a5f

| What you get | Number | How |
|---|---|---|
| Layers in the stack | 8 | foundation → cognition → engineering → personas → validation → orchestration → ECA → Kriterion |
| Total modules | 91 | hand-written seed (13) + verbatim (26) + typed ports (52) |
| pytest tests | 129 | unit · integration · 12 hypothesis property-based · 10 benchmarks |
| mypy --strict files | 31 | one-to-one with runtime imports |
| SHA-256 audit layers | 3 | 78 hashed bodies, CI-verified on every push |
| step_hash latency | 4.0 μs | 250K ops/sec on commodity hardware; never the bottleneck |
| Full 7-phase chain | 189 μs | 5.3K audited evaluations/sec/core |
| ECA routing | 18 μs | 56K requests/sec/core |
| Kriterion reproduction | 10/10 | byte-for-byte match against upstream dataset_manifest.json |
| CLI entry points | 7 | pxl (unified) + 6 legacy aliases · python -m pxl works |

The Signal

Prompt engineering is in the phase that software engineering was in before version control. Most prompts live in Notion pages, scratch files, and Slack messages. When they work, nobody knows why. When they break, nobody knows what changed.

Prompt X Lab is the opposite of that.

   layer 07 ── Kriterion (18 content · canonical primitive · benchmark)
      ▲
   layer 06 ── ECA cognitive engine (34 content · Python port · reproduction)
      ▲
   layer 05 ── orchestration (26 long-form systems)
      ▲
   layer 04 ── validation gates
      ▲
   layer 03 ── interactive personas
      ▲
   layer 02 ── engineering modules
      ▲
   layer 01 ── cognitive scaffolds
      ▲
   layer 00 ── foundation primitives
      ▲
   ────────────────────────────────
   the model

Lower layers compose into higher layers. Every module imports only from layers below it.
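The layering rule is mechanically checkable once each module declares its layer and its dependencies. A minimal sketch, assuming a hypothetical depends_on list per module; the real frontmatter field names may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    layer: int                        # 0 for 00_foundation ... 7 for 07_kriterion
    depends_on: tuple[str, ...] = ()  # hypothetical field: names of referenced modules


def layering_violations(modules: list[Module]) -> list[str]:
    """One message per dependency that is unknown or does not point to a lower layer."""
    by_name = {m.name: m for m in modules}
    problems: list[str] = []
    for m in modules:
        for dep in m.depends_on:
            target = by_name.get(dep)
            if target is None:
                problems.append(f"{m.name}: unknown dependency {dep!r}")
            elif target.layer >= m.layer:
                problems.append(
                    f"{m.name} (layer {m.layer:02d}) depends on {dep} (layer {target.layer:02d})"
                )
    return problems


modules = [
    Module("identity-primitive", 0),
    Module("executive-engine", 1, ("identity-primitive",)),
    Module("senior-code-reviewer", 2, ("identity-primitive", "executive-engine")),
]
print(layering_violations(modules))  # an empty list: every edge points downward
```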

Every module here is:

  • Typed — declares category, vector, target models.
  • Versioned — filename embeds SemVer. No silent mutations.
  • Tested — ships with a test prompt + expected behavior. Modules without tests are rejected.
  • Composable — depends on lower-layer modules by reference.
  • Falsifiable — every claim the module makes is something a reviewer can check in five minutes.
  • Model-agnostic — tuned on Claude 4.6, GPT-5.4, Llama-4; tested across all three.

This is not a prompt zoo. It is an engineering library.

Architecture

                      ┌─────────────────────────────────────────────────────┐
                      │              P R O M P T   X   L A B                │
                      │   6-layer · 13 seed + 26 orchestration · text-only  │
                      └─────────────────────────┬───────────────────────────┘
                                                │
          ┌─────────────────────────────────────┼─────────────────────────────────────┐
          │                                     │                                     │
  ┌───────▼────────┐  ┌───────────────┐  ┌──────▼───────┐  ┌───────────────┐  ┌───────▼────────┐
  │ 00 FOUNDATION  │  │ 01 COGNITION  │  │ 02 ENGINEER  │  │ 03 PERSONAS   │  │ 04 VALIDATION  │
  │ identity       │  │ executive-eng │  │ senior-rev   │  │ socratic tutor│  │ hallucination  │
  │ constraint     │  │ creator-crit  │  │ legacy-rfctr │  │ strat advisor │  │ fallacy check  │
  │ output         │  │ CoT scaffold  │  │ test-gen     │  │               │  │                │
  └────────┬───────┘  └───────┬───────┘  └──────┬───────┘  └───────┬───────┘  └────────┬───────┘
           │                  │                 │                  │                   │
           └──────────────────┴─────────────────┼──────────────────┴───────────────────┘
                                                │
                            ┌───────────────────▼─────────────────────┐
                            │         05  O R C H E S T R A T I O N  │
                            │   protocols · agents · frameworks       │
                            │       · crypto · research               │
                            │   (26 production-grade long systems)    │
                            └───────────────────┬─────────────────────┘
                                                │
                            ┌───────────────────▼─────────────────────┐
                            │          COMPOSITION  PATTERN           │
                            │   identity + constraint + scaffold      │
                            │      + domain + output + gate           │
                            └─────────────────────────────────────────┘

| Layer | Path | Count | Purpose |
|---|---|---|---|
| FOUNDATION | 00_foundation/ | 3 | Primitives every module inherits: identity, constraint, output shape. |
| COGNITION | 01_cognition/ | 3 | Thinking scaffolds: executive-engine, creator-critic-verifier, CoT scaffold. |
| ENGINEERING | 02_engineering/ | 3 | Senior code reviewer, legacy refactor surgeon, property-test generator. |
| PERSONAS | 03_personas/ | 2 | Stateful interactive agents: Socratic tutor, strategic advisor. |
| VALIDATION | 04_validation/ | 2 | Adversarial output gates: hallucination gate, fallacy checker. |
| ORCHESTRATION | 05_orchestration/ | 26 | Production long-form systems: execution protocols, PR agents, flagship frameworks, crypto/trading, research methodology. |
| ECA ENGINE | 06_eca_engine/ | 34 | ECA v1.1 cognitive engine: content, typed Python port, reproduction tests. 77-iter calibrated. |
| KRITERION | 07_kriterion/ | 18 | Kriterion v2026.4.5: fail-closed evaluation primitive, 6 protocols, 9 schemas, canonical hashing kernel, 10-case reproduction. |

The Five Design Commitments

1. One module, one job

If a prompt does two things, split it. The test: can you describe what it does in one sentence without using "and"?

2. Explicit over implicit

"Be thoughtful" is banned. "Cite the exact line number" is required. Every module states its role, its forbidden modes, and its output shape.

3. Fail loudly

Every module has a literal refusal string. Graceful degradation into plausible-sounding hallucination is the default failure mode of prompt engineering — this repo refuses to ship it.

4. Cite the source

Modules reference the paper, heuristic, or incident they come from. Feathers for refactoring. Peirce for inference types. Halmos for Socratic teaching. No mystery meat.

5. Version everything

Breaking changes bump the filename version: executive-engine.md → executive-engine-v2.md. The old file gets status: deprecated but is never deleted, so dependents still work.

The test for every module

Four questions — if any answer is no, the module is not ready:

  1. One-sentence description without "and"?
  2. A specific naive input handled correctly?
  3. At least one explicit refusal?
  4. A simpler version that still works?
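Questions 1 and 3 can be approximated mechanically; questions 2 and 4 still need a human or an eval run. A rough sketch with hypothetical helper names, assuming the refusal marker is the literal REFUSED: string used elsewhere in this README. The sentence split is a crude heuristic, so abbreviations such as "e.g." will fool it.

```python
import re

REFUSAL_MARKER = "REFUSED:"  # assumed literal; match your module's actual refusal string


def one_sentence_without_and(purpose: str) -> bool:
    """Question 1: exactly one sentence, and the word 'and' never appears."""
    sentences = [s for s in re.split(r"[.!?]+", purpose) if s.strip()]
    return len(sentences) == 1 and re.search(r"\band\b", purpose, re.IGNORECASE) is None


def has_literal_refusal(module_text: str) -> bool:
    """Question 3: the module contains at least one literal refusal string."""
    return REFUSAL_MARKER in module_text


print(one_sentence_without_and("Reviews a pull request for correctness bugs."))
print(one_sentence_without_and("Reviews code and writes tests."))
```

The word-boundary match means names like "android" do not trip the "and" rule.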

Seed Modules

    ╭──────╮
   │  WHO   │
    ╰──┬───╯
       │
     identity

Identity
identity-primitive

    ╭──────╮
   │ FORBID │
    ╰──┬───╯
       │
    constraints

Constraint
constraint-primitive

    ╭──────╮
   │ SHAPE  │
    ╰──┬───╯
       │
     output

Output
output-primitive

   plan
    │
    ▼
   exec
    │
    ▼
  critic

Executive
executive-engine

  create
   × ↓
  critic
   × ↓
  verify

C-C-V Triad
creator-critic-verifier

   obs
    │
   infer
    │
   open?
    │
   decide

CoT Scaffold
chain-of-thought

   ╭───╮
   │ PR │
   ╰─┬─╯
     ▼
   review

Senior Reviewer
senior-code-reviewer

   legacy
     │
  char-tests
     │
   refactor

Refactor Surgeon
legacy-refactor-expert

  sig + doc
     │
  invariants
     │
   properties

Test Generator
test-generator

   ?
    │
    ?
    │
    ?
   insight

Socratic Tutor
socratic-tutor

  restate
    │
  unsaid
    │
  decide
    │
  24h act

Strategic Advisor
strategic-advisor

 ctx + draft
     │
  audit
     │
  PASS/FAIL

Hallucination Gate
hallucination-gate

  argument
     │
  taxonomy
     │
  steelman

Fallacy Checker
logical-fallacy-checker

   ┌───┐
   │ + │
   │ → │
   └───┘

Your Module
start from template

   ┌───┐
   │ ✕ │
   │ ✕ │
   └───┘

Taxonomy
taxonomy.json

Composition Pattern

Modules are designed to stack. A typical production prompt is assembled top-down from lower to higher layers:

┌─────────────────────────────────────────────────────────────────────┐
│  IDENTITY       ← 00_foundation/identity-primitive.md               │
│    Who the model is. What it refuses to be.                          │
├─────────────────────────────────────────────────────────────────────┤
│  CONSTRAINTS    ← 00_foundation/constraint-primitive.md              │
│    The 3 failure modes it must never produce.                        │
├─────────────────────────────────────────────────────────────────────┤
│  SCAFFOLD       ← 01_cognition/executive-engine.md                   │
│    How it thinks: planner → executor → critic.                       │
├─────────────────────────────────────────────────────────────────────┤
│  DOMAIN         ← 02_engineering/senior-code-reviewer.md             │
│    What it does today: review this PR.                               │
├─────────────────────────────────────────────────────────────────────┤
│  OUTPUT         ← 00_foundation/output-primitive.md                  │
│    Exact shape: sections, lengths, refusal string.                   │
├─────────────────────────────────────────────────────────────────────┤
│  GATE           ← 04_validation/hallucination-gate.md                │
│    Adversarial filter. PASS or REFUSED.                              │
└─────────────────────────────────────────────────────────────────────┘
                               │
                               ▼
                        trusted output

The stack reads top-down: who you are → what you must not do → how you think → what you're doing today → how you answer → who double-checks the answer.
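The stack can be assembled by concatenating each slot's module in order. A minimal sketch: the slot labels and file paths come from the composition diagram, while the "### SLOT" delimiter and the injectable read function are assumptions for illustration. In practice you would paste only a module's Identity, Core logic and Constraints sections, as the Quickstart describes, rather than whole files.

```python
from collections.abc import Callable
from pathlib import Path

# Slot order and example paths from the composition diagram.
STACK: list[tuple[str, str]] = [
    ("IDENTITY", "00_foundation/identity-primitive.md"),
    ("CONSTRAINTS", "00_foundation/constraint-primitive.md"),
    ("SCAFFOLD", "01_cognition/executive-engine.md"),
    ("DOMAIN", "02_engineering/senior-code-reviewer.md"),
    ("OUTPUT", "00_foundation/output-primitive.md"),
    ("GATE", "04_validation/hallucination-gate.md"),
]


def assemble(stack: list[tuple[str, str]], read: Callable[[str], str]) -> str:
    """Join module texts top-down, each under an assumed '### SLOT' delimiter."""
    return "\n\n".join(f"### {slot}\n{read(path).strip()}" for slot, path in stack)


def repo_reader(root: Path) -> Callable[[str], str]:
    """Read module files relative to a local clone of the repository."""
    return lambda rel: (root / rel).read_text(encoding="utf-8")


# With a local clone:  system_prompt = assemble(STACK, repo_reader(Path("prompt-x-lab")))
```

Passing the reader in keeps assemble pure, so it can be tested with in-memory strings instead of files.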

Choosing a Scaffold

| Situation | Use this | Why |
|---|---|---|
| Task is well-defined; first answer is usually right | No scaffold. Direct prompt. | Scaffolds cost tokens. Don't pay for what you don't need. |
| Non-trivial task where the model could miss a failure mode | executive-engine | Planner + Executor + Critic catches its own mistakes. |
| High-stakes decision where a polished wrong answer is expensive | creator-critic-verifier | Adversarial triad; the Critic is structurally unable to rubber-stamp. |
| Reasoning-heavy task where every step must be auditable | chain-of-thought-scaffold | Every inference is tagged with its type (deductive / abductive / …). |

Rule of thumb: the right scaffold is the smallest one that works. Don't use a triad for a task a direct prompt would solve.

Quickstart

```bash
# clone
git clone https://github.com/neuron7xLab/prompt-x-lab.git
cd prompt-x-lab

# pick a module
cat 02_engineering/senior-code-reviewer.md

# copy the Identity, Core logic, and Constraints sections into your system prompt
# paste the diff you want reviewed as the user message
# read the output; if it's bad, file an issue with the exact input that broke it
```

No installation. No dependencies. No runtime. Just text.

Anti-Patterns This Repository Refuses to Ship

╳  prompt essays              long prose walls — models don't read them, they pattern-match on them
╳  vibey role-play             "you are a world-class expert" — this is a vibe, not a constraint
╳  emoji-heavy UX              cute, widely copied, and almost always padding
╳  "let's think step by step"  vague, mostly superseded — use a typed CoT scaffold instead
╳  self-referential templates  if your prompt talks about prompts, it's methodology not a module
╳  silent hallucination        refuse loudly; never degrade into plausible-sounding guesses

Repository Layout

prompt-x-lab/
├── README.md                      ← you are here
├── LICENSE                        ← MIT
├── CHANGELOG.md                   ← Keep a Changelog + SemVer
├── .gitignore
│
├── .github/
│   └── assets/
│       ├── banner-dark.svg        ← minimalist RGB-on-black banner
│       ├── banner-light.svg       ← same composition, paired
│       ├── divider.svg            ← RGB gradient divider
│       ├── eca-cognitive-engine.svg  ← neuro-fractal visual study
│       ├── crest-{360,720,1080}.webp ← Advanced Orchestration crest
│       └── crest.manifest.json    ← crest variant metadata
│
├── .metadata/
│   ├── taxonomy.json              ← machine-readable layer graph
│   └── manifest.yaml              ← module inventory with status + vectors
│
├── 00_foundation/                 ← primitives every module inherits
│   ├── README.md
│   ├── identity-primitive.md
│   ├── constraint-primitive.md
│   └── output-primitive.md
│
├── 01_cognition/                  ← thinking architectures
│   ├── README.md
│   ├── executive-engine.md
│   ├── creator-critic-verifier.md
│   └── chain-of-thought-scaffold.md
│
├── 02_engineering/                ← code synthesis, review, refactor, tests
│   ├── README.md
│   ├── senior-code-reviewer.md
│   ├── legacy-refactor-expert.md
│   └── test-generator.md
│
├── 03_personas/                   ← stateful interactive agents
│   ├── README.md
│   ├── socratic-tutor.md
│   └── strategic-advisor.md
│
├── 04_validation/                 ← adversarial output gates
│   ├── README.md
│   ├── hallucination-gate.md
│   └── logical-fallacy-checker.md
│
├── 05_orchestration/              ← Advanced Orchestration v1 (26 modules)
│   ├── README.md
│   ├── protocols/    (6)          ← SPST · DSIO · IOA · LRE · PGE · SMLRS
│   ├── agents/       (9)          ← PR automation agents
│   ├── frameworks/   (5)          ← flagship long-form frameworks
│   ├── crypto/       (3)          ← crypto & trading systems
│   └── research/     (3)          ← methodology & research protocols
│
├── 06_eca_engine/                 ← ECA Cognitive Engine v1.1 (34 files)
│   ├── README.md
│   ├── core/         (6)          ← prompt, proof tiers, config, templates
│   ├── runtime/      (4)          ← policy, fallback, router spec, budget
│   ├── benchmarks/   (3)          ← metrics, rubric, live protocol
│   ├── security/     (3)          ← model, guardrails, provenance
│   ├── schemas/      (2)          ← request + response envelopes
│   ├── legal/        (1)          ← EULA template
│   ├── docs/        (15)          ← architecture · calibration · ops
│   └── AUDIT.sha256               ← body audit (34 entries)
│
├── 07_kriterion/                  ← Kriterion v2026.4.5 (18 files)
│   ├── README.md
│   ├── protocols/    (6)          ← 6 security-role protocols
│   ├── schemas/      (9)          ← 9 canonical evaluation schemas
│   ├── methodology/  (3)          ← methodology · threat model · reasoning
│   └── AUDIT.sha256               ← body audit (18 entries)
│
├── templates/
│   └── base-module.md             ← start every new module here
│
├── src/pxl/                       ← Python package (mypy --strict, 24 files)
│   ├── models.py, assembly.py, providers.py, judge.py
│   ├── runner.py, validator.py, audit.py, badges.py, cli.py
│   ├── eca/                       ← ECA v1.1 typed subsystem
│   │   ├── schemas.py, config.py, router.py, scorer.py
│   │   ├── signer.py, validate.py, cli.py
│   │   ├── assets/                ← bundled YAML / JSON / TXT
│   │   └── datasets/              ← 180 req + 192 resp + holdouts
│   ├── kriterion/                 ← Kriterion minimalist kernel
│   │   ├── canonical.py           ← 180-line fail-closed primitive
│   │   ├── schemas.py, protocols.py
│   │   ├── benchmark.py, cli.py
│   │   ├── assets/                ← 9 schemas + 6 protocols
│   │   └── datasets/              ← 10 synthetic cases + manifest
│   └── py.typed
│
├── schemas/                       ← JSON Schemas — single source of truth
│   ├── module.schema.json
│   ├── eval-spec.schema.json
│   └── eval-result.schema.json
│
├── evals/                         ← evaluation harness
│   ├── specs/ (10 YAML, 20 cases)
│   └── results/ badges.json · run JSONs
│
├── tests/                         ← pytest suite (22 tests · 6 files)
│
├── docs/
│   ├── methodology.md             ← why this library exists
│   ├── naming-convention.md       ← SemVer + filename rules
│   ├── usage-guide.md             ← composition, scaffold selection
│   ├── composition-algebra.md     ← EBNF grammar + type rules
│   ├── evaluation-protocol.md     ← pass/fail epistemology
│   ├── references.bib             ← bibliography (BibTeX)
│   └── case-studies/              ← 3 concrete runs with full rubric traces
│
├── Makefile                       ← make validate · test · lint · typecheck · eval · audit
├── pyproject.toml                 ← project metadata, deps, tool config
├── CLAUDE.md                      ← development rules (Claude Code contract)
├── .pre-commit-config.yaml        ← hooks
└── .github/workflows/ci.yml       ← validate · test · lint · mypy · audit · eval-mock

Contributing

Start from templates/base-module.md. Every new module must ship with:

Required — no exceptions

  1. A one-sentence Purpose (no "and").
  2. An explicit Identity block (role + what it refuses to be).
  3. A Core logic block (numbered, not prose).
  4. A Constraints block with at least 3 forbidden modes.
  5. An Output format with a literal refusal string.
  6. A Test prompt — a concrete input.
  7. An Expected behavior clause — reviewers check this, not vibes.

Modules without tests are rejected. No exceptions.
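A contributor can check for the seven required blocks before opening a PR. A sketch assuming each block is a Markdown heading with exactly these titles; the real template in templates/base-module.md may name or nest them differently, and the helper name is hypothetical.

```python
import re

REQUIRED_SECTIONS = (
    "Purpose",
    "Identity",
    "Core logic",
    "Constraints",
    "Output format",
    "Test prompt",
    "Expected behavior",
)
# ATX headings of any level, e.g. "## Core logic"; trailing '#' characters are ignored.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def missing_sections(markdown: str) -> list[str]:
    """Required section titles with no matching Markdown heading (case-insensitive)."""
    present = {m.group(1).strip().lower() for m in HEADING.finditer(markdown)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in present]


print(missing_sections("## Purpose\nx\n## Identity\ny\n"))
```

A non-empty result is grounds for rejection under the rule above.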

Version bump rules

| Change | Bump |
|---|---|
| Output shape / refusal condition / identity changes | major |
| New test prompt, new edge case, new constraint | minor |
| Wording clarification, typo, example fix | patch |

Deprecation: set status: deprecated, add a Deprecated: pointer to the replacement, do not delete. Old dependents still work.
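The bump table and the filename rule can be encoded directly. A sketch with hypothetical names: required_bump picks the largest bump implied by a set of changes, bump applies it to a MAJOR.MINOR.PATCH string, and next_filename produces the new -vN filename for a breaking change, following the executive-engine.md → executive-engine-v2.md example.

```python
import re

# Rows of the version bump table; the change labels are illustrative.
BUMP_FOR_CHANGE = {
    "output shape": "major", "refusal condition": "major", "identity": "major",
    "test prompt": "minor", "edge case": "minor", "constraint": "minor",
    "wording": "patch", "typo": "patch", "example fix": "patch",
}
ORDER = ("patch", "minor", "major")


def required_bump(changes: set[str]) -> str:
    if not changes:
        raise ValueError("no changes, no release")
    # The largest bump in the set wins; an unknown change kind raises KeyError.
    return max((BUMP_FOR_CHANGE[c] for c in changes), key=ORDER.index)


def bump(version: str, level: str) -> str:
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


FILENAME = re.compile(r"(?P<stem>.+?)(?:-v(?P<n>\d+))?\.md")


def next_filename(filename: str) -> str:
    """Filename for the next breaking version: foo.md -> foo-v2.md -> foo-v3.md."""
    m = FILENAME.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a module filename: {filename!r}")
    n = int(m["n"]) if m["n"] else 1
    return f"{m['stem']}-v{n + 1}.md"


print(required_bump({"typo", "identity"}), next_filename("executive-engine.md"))
```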

Related Work

| Project | What it is |
|---|---|
| neuron7xLab/GeoSync | Geometric market intelligence: Kuramoto · Ricci · thermodynamics · 57 invariants |
| neuron7xLab/neurophase | Phase synchronization as execution gate: brain × market oscillators |
| neuron7xLab/neosynaptex | γ-scaling diagnostics across biological, physical, cognitive substrates |
| neuron7xLab/mycelium-fractal-net | Morphogenetic field engine: reaction-diffusion + TDA + causal rules |

Engineering Discipline

  pxl/                              ← the Python package
  ├── models.py       Pydantic v2 mirrors
  ├── assembly.py     section parser
  ├── providers.py    Anthropic · OpenAI · Mock
  ├── judge.py        LLM-as-judge rubric
  ├── runner.py       end-to-end harness
  ├── validator.py    frontmatter validator
  ├── audit.py        SHA256 layer-05 audit
  ├── badges.py       real badge generator
  └── cli.py          entry points

  schemas/
  ├── module.schema.json
  ├── eval-spec.schema.json
  └── eval-result.schema.json

  evals/
  ├── specs/          10 YAML specs (20 cases)
  └── results/        badges.json + JSON runs

  tests/              22 pytest tests, 10 files
  docs/               methodology · algebra · references.bib

prompt-x-lab v0.3.0 ships with a real engineering harness — not a demo, not a marketing claim. Every discipline the library preaches in its seed modules is enforced mechanically against its own content:

  • Frontmatter is a Pydantic model (pxl.models.ModuleFrontmatter) mirrored by a JSON Schema (schemas/module.schema.json). pxl-validate walks every .md file in layers 00–05 and fails loudly on the first schema violation.
  • Evaluation specs are Pydantic models + JSON Schema. Every seed module in layers 01–04 has exactly one spec under evals/specs/, each with a positive and an adversarial case.
  • The eval runner (pxl-eval) assembles the module's Identity + Core logic + Constraints + Output sections into a system prompt, calls Claude Opus (or GPT-4o, or a Mock provider), and scores the output with an LLM-as-judge rubric — also tested, also under pxl invariants.
  • Layer 05 is integrity-audited via SHA256 hash of every module's body (pxl-audit verify). A drift in any of the 26 orchestration modules fails CI.
  • Badges are computed from real JSON results, not hand-written. When there are no results, the badge says no-runs-yet. This is honest by construction.
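The assembly step the eval runner performs (Identity + Core logic + Constraints + Output into one system prompt) can be sketched as a small heading parser. This is not pxl.assembly: it assumes level-2 Markdown headings titled exactly as below (with the output block titled "Output format"), skips any frontmatter before the first heading, and raises on a missing section in keeping with the fail-loudly rule.

```python
import re

EVAL_SECTIONS = ("Identity", "Core logic", "Constraints", "Output format")


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text under it, up to the next '## ' heading."""
    sections: dict[str, str] = {}
    current: str | None = None
    buf: list[str] = []
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(.+?)\s*$", line)
        if m:
            if current is not None:
                sections[current] = "\n".join(buf).strip()
            current, buf = m.group(1), []
        elif current is not None:  # lines before the first heading (frontmatter) are skipped
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip()
    return sections


def build_system_prompt(markdown: str) -> str:
    sections = split_sections(markdown)
    missing = [s for s in EVAL_SECTIONS if s not in sections]
    if missing:
        raise ValueError(f"module is missing sections: {missing}")  # fail loudly
    return "\n\n".join(f"## {s}\n{sections[s]}" for s in EVAL_SECTIONS)
```

Sections such as Purpose and Test prompt are parsed but deliberately left out of the system prompt; the test prompt becomes the user message instead.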

| Gate | Command | What it proves |
|---|---|---|
| frontmatter | pxl-validate | All 39 modules conform to the Pydantic schema. |
| pytest | pytest -q | 22 unit tests covering validator, assembly, audit, judge, runner, models. |
| ruff | ruff check src scripts evals tests | Style and lint rules (E, F, I, B, UP, N, SIM, RUF, ANN). |
| mypy --strict | mypy src | Full type check across the 10 `pxl` source files. |
| audit | python -m pxl.audit verify | SHA256 body integrity of all 26 orchestration modules. |
| eval · mock | pxl-eval --provider mock | End-to-end harness plumbing (no API key required). |
| eval · real | pxl-eval | Live rubric evaluation against Claude Opus 4.6. |
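The body audit can be sketched with hashlib. Assumptions, not confirmed by this README: the "body" is everything after a leading --- frontmatter block, and AUDIT.sha256 uses sha256sum-style "<digest>  <path>" lines. Hashing only the body lets frontmatter such as status change without tripping the audit, while any byte change in the verbatim content does. The function names are hypothetical.

```python
import hashlib
import re

FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def body_sha256(markdown: str) -> str:
    """SHA-256 of the module body, with any leading YAML frontmatter removed."""
    body = FRONTMATTER.sub("", markdown, count=1)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def parse_audit(manifest: str) -> dict[str, str]:
    """Parse sha256sum-style lines ('<hex digest>  <path>') into {path: digest}."""
    entries: dict[str, str] = {}
    for line in manifest.splitlines():
        if line.strip():
            digest, path = line.split(maxsplit=1)
            entries[path.strip()] = digest
    return entries


def drifted(files: dict[str, str], manifest: str) -> list[str]:
    """Paths whose body hash is missing from, or differs from, the manifest."""
    expected = parse_audit(manifest)
    return sorted(path for path, text in files.items() if expected.get(path) != body_sha256(text))
```

A non-empty drifted() result is the condition that would fail CI.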

Composition algebra

docs/composition-algebra.md specifies, in EBNF and in type rules, how modules compose into a well-formed system prompt. Key invariants:

  1. Grammar: identity, constraint, scaffold?, domain, output, gate*
  2. Layer ordering — monotonic (with documented exceptions for gates)
  3. Vector compatibility — no strategic + creative collisions
  4. Refusal-path preservation — every stack keeps at least one literal REFUSED: path reachable from the Constraint block through the Output block.

A composition is typed (P, R) — positive invariants + refusal conditions. A reviewer reads off (P, R) in under a minute or the composition is not well-formed.
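Because the grammar has no nesting, it describes a regular language, so one way to check it is to encode each slot as a letter and match a regular expression. A sketch with hypothetical names, covering invariant 1 only:

```python
import re

# One letter per slot lets the EBNF  identity, constraint, scaffold?, domain, output, gate*
# be checked as the regular expression below.
SLOT_CODES = {
    "identity": "I", "constraint": "C", "scaffold": "S",
    "domain": "D", "output": "O", "gate": "G",
}
GRAMMAR = re.compile(r"ICS?DOG*")


def is_well_formed(slots: list[str]) -> bool:
    """True if the ordered slot list matches invariant 1 (the grammar)."""
    if any(s not in SLOT_CODES for s in slots):
        return False
    return GRAMMAR.fullmatch("".join(SLOT_CODES[s] for s in slots)) is not None


print(is_well_formed(["identity", "constraint", "scaffold", "domain", "output", "gate"]))
print(is_well_formed(["constraint", "identity", "domain", "output"]))
```

Invariants 2 to 4 need module metadata (layer, vector, refusal strings) and are not covered by this sketch.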

Evaluation protocol

docs/evaluation-protocol.md specifies the epistemology of pass/fail:

  • Strict threshold — a case passes iff every rubric item is satisfied (≥ 0.999). No partial credit.
  • Adversarial cases carry equal weight — every spec has at least one.
  • Judge under test: tests/test_judge.py feeds the judge known-good and known-bad outputs and asserts correct scoring.
  • Provider-agnosticism — one provider ≠ validated; cross-model validated means two providers from two vendors.
  • Out of scope — foundation primitives (meta-templates, §8.1) and layer 05 (multi-page runtime-bound, §8.2).
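One reading of the strict threshold, sketched with hypothetical types: each rubric item gets a judge score in [0, 1], a case passes only if every item reaches 0.999, and a spec with no adversarial case is rejected outright rather than scored.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 0.999


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    adversarial: bool
    rubric_scores: tuple[float, ...]  # one judge score in [0, 1] per rubric item

    @property
    def passed(self) -> bool:
        # No partial credit: an empty rubric or any item below threshold fails the case.
        return bool(self.rubric_scores) and all(s >= PASS_THRESHOLD for s in self.rubric_scores)


def spec_passes(results: list[CaseResult]) -> bool:
    if not any(r.adversarial for r in results):
        raise ValueError("every spec needs at least one adversarial case")
    # Adversarial and positive cases carry equal weight: all of them must pass.
    return all(r.passed for r in results)
```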

Bibliography & prior art

Every seed module now has a Prior art section naming its intellectual ancestors. Citations are collected in docs/references.bib — Peirce for inference types, Feathers for refactoring, Halmos for Socratic teaching, Kahneman for the Executive Engine, QuickCheck for property tests, Popper for falsifiability, Horowitz for advising.

A claim without a prior-art anchor is not allowed to ship. The rule is mechanical: if the module references a technique that is not in the bibliography, the module is rejected.
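The bibliography rule can be approximated by comparing citation keys in a module against the entry keys in docs/references.bib. A sketch assuming Pandoc-style [@key] citations in module text; the actual citation syntax the modules use is not specified here, and the sample entry and keys are illustrative.

```python
import re


def bib_keys(bibtex: str) -> set[str]:
    """Entry keys from '@type{key,' headers in a BibTeX file."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bibtex))


def unknown_citations(module_text: str, bibtex: str) -> list[str]:
    """Cited keys with no bibliography entry; a non-empty result means reject."""
    cited = set(re.findall(r"\[@([^\]\s]+)\]", module_text))  # assumed [@key] syntax
    return sorted(cited - bib_keys(bibtex))


bib = """
@book{feathers2004,
  title = {Working Effectively with Legacy Code},
}
"""
print(unknown_citations("Prior art: [@feathers2004], [@halmos1975]", bib))
```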

Case studies

docs/case-studies/ contains three concrete runs — the unbounded-cache PR review, the Fibonacci(n ≤ 10¹⁸) trap for Executive Engine, and the Apollo 11 hallucination gate. Each study shows the exact input, exact output, rubric trace, verdict, and the adversarial variant that would have broken it. Case studies are the second pillar of falsifiability in the repo (the first is the eval harness).

Layer 05 · Orchestration

  05_orchestration/
  ├── protocols/     6
  ├── agents/        9
  ├── frameworks/    5
  ├── crypto/        3
  └── research/      3
                    ──
                    26

Provenance: Advanced Orchestration v1 · Status: integrated verbatim · License: single-owner proprietary · Packaging: prompt-x-lab native

The first four layers are primitives: short, hand-written, fit-on-one-screen. Layer 05 is the opposite — production-sized systems that would drown a foundation layer, adapted here without a single byte of content change.

Every module in 05_orchestration/ is a whole system on its own: a Codex PR agent, a scientific-simulator transformation protocol, a crypto order-flow framework, a research-methodology contract. Each carries its origin in the frontmatter (origin: Advanced Orchestration v1 bundle) and its original file name (source_file:).

Composition rule preserved: an orchestration module is still wrapped top-down by 00_foundation/ (identity + constraint + output). Layer 05 does not break the stack — it is the domain tier, sitting above scaffolds and below validation gates.

| Category | Path | Modules | Contents |
|---|---|---|---|
| PROTOCOLS | 05_orchestration/protocols/ | 6 | SPST · DSIO · IOA · LRE · PGE · SMLRS: execution protocols for Codex/Principal-Eng level repo transformations. |
| AGENTS | 05_orchestration/agents/ | 9 | Pull-request automation agents: transform, audit, stabilise, and ship repository-scale changes deterministically. |
| FRAMEWORKS | 05_orchestration/frameworks/ | 5 | Flagship long-form frameworks: multi-phase, multi-contract, multi-artifact operators. |
| CRYPTO | 05_orchestration/crypto/ | 3 | Crypto & trading systems: order-flow, regime detection, quant pipeline integration. |
| RESEARCH | 05_orchestration/research/ | 3 | Methodology & research protocols: reproducibility, evidence-bound inference, falsification ladders. |

Layer 06 · ECA Cognitive Engine

  06_eca_engine/                    34 files
  ├── core/           6  prompt, tiers, config, templates
  ├── runtime/        4  policy, fallback, router, budget
  ├── benchmarks/     3  metrics, rubric, protocol
  ├── security/       3  model, guardrails, provenance
  ├── schemas/        2  request, response envelopes
  ├── legal/          1  EULA template
  └── docs/          15  architecture · calibration ·
                         operations · release notes

  src/pxl/eca/                      typed subsystem
  ├── schemas.py      Pydantic envelopes (strict)
  ├── config.py       12 Pydantic config models
  ├── router.py       port of route_request
  ├── scorer.py       7-dim quality scorecard
  ├── signer.py       HMAC-SHA256 provenance
  ├── validate.py     full-stack replay
  ├── cli.py          pxl-eca entry point
  ├── assets/         13 bundled YAML/JSON/TXT
  └── datasets/       10 calibration datasets

ECA v1.1.0 · selected_iteration 27 · 77-iter calibrated · 22 pytest tests · mypy --strict clean · zero FP

ECA is a production-candidate cognitive operating layer: one synchronised reasoning system routed across six modes (deep_analysis, executive_decision_brief, system_architecture_blueprint, human_performance_protocol, cognitive_error_audit, implementation_roadmap), with a seven-dimensional quality scorecard and calibrated shipping thresholds.

Unlike layer 05, which is a verbatim text copy, layer 06 is a full native integration:

  • 34 content files under 06_eca_engine/ with prompt-x-lab frontmatter and source_sha256 provenance.
  • Typed Python port under src/pxl/eca/ — Pydantic v2 models for every config and envelope, pure functions for router/scorer/signer, mypy --strict clean across 18 source files.
  • CLI: pxl-eca info / validate / route / score / sign.
  • Reproduction tests (22 pytest cases) that replay the full calibration corpus and enforce the numerical contract as CI gates. The calibration chain is protected by code, not by documentation.
  • SHA256 audit of every body in the layer — python -m pxl.audit verify fails CI on drift.
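The signer.py entry in the layer 06 tree lists HMAC-SHA256 provenance. A minimal standard-library sketch of that technique, not the pxl.eca.signer API: the payload is canonicalised as sorted, compact JSON before signing, and the field names and key below are placeholders.

```python
import hashlib
import hmac
import json
from typing import Any


def canonical(payload: dict[str, Any]) -> bytes:
    # Sorted keys and compact separators: semantically equal payloads sign identically.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict[str, Any], key: bytes) -> str:
    return hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()


def verify(payload: dict[str, Any], key: bytes, signature: str) -> bool:
    # compare_digest avoids content-based short-circuiting, which resists timing attacks.
    return hmac.compare_digest(sign(payload, key), signature)


key = b"placeholder-key"  # load a real secret from your environment instead
response = {"mode": "deep_analysis", "score": 0.93}
tag = sign(response, key)
print(verify({"score": 0.93, "mode": "deep_analysis"}, key, tag))  # key order does not matter
```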

Reproduced calibration numbers (full-corpus replay):

| Metric | Holdout (orig) | Full-corpus (bundled) |
|---|---|---|
| Router synthetic accuracy | 100% | 99.44% (178/180) |
| Router adversarial accuracy | 100% | 100% |
| Scorer balanced accuracy | 91.67% | 90.62% (174/192) |
| Scorer F1 | 90.91% | 89.66% |
| Scorer false positives | 0 | 0 |

The zero-FP invariant is load-bearing: across the entire synthetic response corpus, the scorer has never green-lit a response it should have blocked. This is the strongest single property of the ECA quality gate.

Layer 07 · Kriterion

  07_kriterion/               18 files
  ├── protocols/    6  security-role protocols
  │                    SE-OPS · SSE · ESA
  │                    PSE · DSE · GPT-5.4 audit
  ├── schemas/      9  canonical evaluation
  │                    CanonicalArtifact
  │                    EvaluationResult
  │                    TaskScore · DomainScore
  │                    GateResult · Handoff
  │                    ReferenceInputBundle
  │                    ArtifactValidationResult
  │                    GovernanceInvariantRegistry
  └── methodology/  3  methodology · threat model
                       · anti-fragile reasoning

  src/pxl/kriterion/          typed kernel
  ├── canonical.py    the 180-line fail-closed
  │                   mathematical core: canonical
  │                   bytes, genesis/step hashes,
  │                   ExecutionChain builder
  ├── schemas.py      jsonschema + referencing
  ├── protocols.py    protocol loaders
  ├── benchmark.py    10-case reproduction
  └── cli.py          pxl-kriterion entry point

Kriterion v2026.4.5 · 10/10 reproduction · 40 tests

Kriterion's contribution is a single load-bearing idea: if every evaluation phase hashes its canonical input and links to the previous phase, the whole pipeline is a tamper-evident chain. Any modification to any phase invalidates every subsequent hash. Fail-closed by construction.
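The chaining idea needs nothing beyond the standard library. The sketch below is an illustration under stated assumptions, not canonical.py. The canonical form (sorted keys, compact separators, UTF-8), the `genesis`/`step` domain tags, the length-prefixing, and every function name are hypothetical stand-ins for the kernel's own choices.

```python
import hashlib
import json
from typing import Any

FORMAT_VERSION = "demo-1"  # hypothetical; baked into every hash


def canonical(obj: Any) -> bytes:
    # One byte string per logical value: sorted keys, no whitespace, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def _h(tag: str, *parts: bytes) -> str:
    # Domain separation: the tag keeps a genesis hash from colliding with a step hash.
    digest = hashlib.sha256()
    digest.update(f"{tag}:{FORMAT_VERSION}\x00".encode("utf-8"))
    for part in parts:
        digest.update(len(part).to_bytes(8, "big"))  # length-prefix each part
        digest.update(part)
    return digest.hexdigest()


def genesis_hash(bundle: Any, contract_version: str) -> str:
    return _h("genesis", canonical(bundle), contract_version.encode("utf-8"))


def step_hash(prev: str, phase: str, phase_input: Any) -> str:
    # Each step commits to the previous hash, so the chain is strictly linear.
    return _h("step", prev.encode("ascii"), phase.encode("utf-8"), canonical(phase_input))


def run_chain(
    bundle: Any, phases: list[tuple[str, Any]], contract_version: str = "1.0.0"
) -> list[str]:
    hashes = [genesis_hash(bundle, contract_version)]
    for phase, phase_input in phases:
        hashes.append(step_hash(hashes[-1], phase, phase_input))
    return hashes  # hashes[-1] is the terminal hash


if __name__ == "__main__":
    bundle = {"task": "demo"}
    phases = [("ingest", {"n": 1}), ("score", {"s": 0.9}), ("report", {"ok": True})]
    honest = run_chain(bundle, phases)
    tampered = run_chain(bundle, [phases[0], ("score", {"s": 0.95}), phases[2]])
    print([a == b for a, b in zip(honest, tampered)])  # [True, True, False, False]
```

The printed list shows the fail-closed property directly. Editing the second phase leaves the genesis and first-step hashes intact and changes every hash after it, including the terminal one.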

Layer 07 integrates only the reusable kernel:

  • Six protocols, adapted verbatim with source_sha256 provenance.
  • Nine JSON schemas — the canonical evaluation envelope hierarchy.
  • Three methodology documents — the signal, not the noise. No business copy, no HTML dashboard, no governance plumbing from the upstream bundle.
  • Typed Python kernel in src/pxl/kriterion/canonical.py: canonical_bytes, sha256_hex, build_genesis_hash, build_step_hash, and an ExecutionChain builder. Zero dependencies beyond the standard library. MIT-licensed independently of the content.
  • Ten-case reproduction: pxl-kriterion benchmark replays the upstream dataset_manifest.json hashes through the typed kernel and must match all ten byte-for-byte. Enforced by CI.

The kernel is reusable for any audit pipeline, not just Kriterion's. If you want fail-closed evaluation:

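from pxl.kriterion import ExecutionChain, Phase

# my_bundle and phase_state are supplied by you: the input bundle for this run,
# and a mapping from each Phase to that phase's input.
chain = ExecutionChain.start(my_bundle, contract_version="1.0.0")
for phase in Phase:
    chain.advance(phase, phase_input=phase_state[phase])
# chain.terminal_hash now proves the entire execution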

Seven properties guarantee fail-closedness: canonical-form uniqueness, domain separation of genesis and step hashes, linear chain structure, format-version baking, contract-version baking, bundle-dependent genesis, and re-derivable terminal hash. Every property is enforced by a dedicated test in tests/test_kriterion_canonical.py.

License: content under AGCL-1.0 (community, non-commercial); canonical.py is MIT-licensed independently because the mathematical ideas (canonical JSON, domain separation, chain linking) are not copyrightable as such.

Visual Study — ECA Cognitive Engine v1.1

ECA Cognitive Engine v1.1 — Neuro-Fractal Recursive Architecture



Radial recursive expansion · Metatron's Cube core · seven fractal rings · golden-ratio Trinity arcs · bilateral symmetry — organic left, crystalline right · void-shift palette.


     singularity      →   the input void
     flower of life   →   sacred substrate
     metatron spokes  →   13-point topology
     seven rings      →   stages I → VII
     trinity arcs     →   C · N · S @ φ
     L-system left    →   organic growth
     crystalline R    →   architectural lattice
     white criticals  →   9 emphasis nodes

No flowchart. No boxes. No rectangles. Every element obeys a mathematical rule:

  • Singularity — the central point from which structure unfolds.
  • Flower of Life + Metatron — the 19-circle sacred substrate, overlaid with a hexagram and six-point spokes.
  • Seven concentric rings — each a fractal chain of interlocking module dots; ring radii follow a quasi-golden progression (200 → 830).
  • Trinity arcs — Cognitive · Neuro · System — three 80° arcs at r=900, separated by 40° golden gaps.
  • L-system branches — left side is Bézier/organic, right side is polyline/crystalline. The bilateral symmetry is intentional, and so is the break in it.
  • Critical data points — nine true-white nodes across rings 3 · 5 · 7, forming two rotated triangles.
  • Mathematical etching: γ ≈ 1.0, φ = 1.618…, MWC, Σ, , ∂/∂t, ℝⁿ, set in serif italic at opacity 0.45, like notation cut into glass.

Palette: #050505 void · #FF00FF → #8B0000 glow · #FFFFFF criticals. Stroke: 0.4 – 2.2pt. Every line load-bearing.
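The trinity-arc numbers in the list are self-consistent: three 80° arcs plus three 40° gaps cover exactly 360°. The short sketch below derives the arc endpoints at r = 900 and the seven ring radii under two assumptions. The "quasi-golden" progression is taken as plain geometric interpolation from 200 to 830, since the exact rule is not given here. The first arc is taken to start at 12 o'clock, which is -90° in SVG-style coordinates.

```python
import math

ARC_SPAN, GAP, R_ARC = 80.0, 40.0, 900.0
LABELS = ("Cognitive", "Neuro", "System")


def trinity_arcs(start_deg: float = -90.0) -> list[tuple[str, float, float]]:
    # Each arc occupies ARC_SPAN degrees, followed by a GAP before the next one.
    arcs: list[tuple[str, float, float]] = []
    angle = start_deg
    for label in LABELS:
        arcs.append((label, angle, angle + ARC_SPAN))
        angle += ARC_SPAN + GAP
    return arcs


def point(deg: float, r: float = R_ARC) -> tuple[float, float]:
    # Polar to Cartesian, centred on the singularity at (0, 0).
    rad = math.radians(deg)
    return (r * math.cos(rad), r * math.sin(rad))


def ring_radii(r0: float = 200.0, r_last: float = 830.0, n: int = 7) -> list[float]:
    # Assumption: geometric spacing between the innermost and outermost ring.
    ratio = (r_last / r0) ** (1 / (n - 1))
    return [r0 * ratio**i for i in range(n)]


if __name__ == "__main__":
    assert 3 * (ARC_SPAN + GAP) == 360.0
    for label, start, end in trinity_arcs():
        print(label, start, end, point(start), point(end))
    print([round(r) for r in ring_radii()])
```

Under that assumption the ring ratio is (830/200)^(1/6) ≈ 1.27, well below φ ≈ 1.618. The spacing is therefore golden only loosely.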

MIT · Solo · Ukraine 🇺🇦 · 2026

"Don't trust anyone. Don't even trust yourself." — Elon Musk, Lex Fridman Podcast #400



This repo is a discipline, not a catalog. Every module earns its place.

About

A systematic library of high-fidelity prompts, cognitive architectures, and agent protocols for frontier language models.
