HQIV · disregardfiat · Jun 2, 2026 · Jun 2, 2026 · Jun 2, 2026 · Jun 2, 2026
diff --git a/.github/workflows/hqiv-arena.yml b/.github/workflows/hqiv-arena.yml
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,137 @@
+# Contributing to pyhqiv + HQIV Arena
+
+Thank you for helping make HQIV a living, crowdsourced physics improvement engine.
+
+## Core Principles
+
+- **main is sacred and always certified.** Every commit on `main` must pass the full Lean lake build (including SO(8) closure, Spin(8) triality, GR+SM unification, horizon curvature, lapse relations, and all other proved theorems). These are certificates. No exceptions.
+- **Lean ↔ Python bidirectional alignment is a hard gate.** Numerical results in pyhqiv must be consistent with formally derived values from hqiv-lean (within the tolerances defined in the alignment script). Functional helpers only — no hard-coded constants in scoring/alignment logic.
+- **New feature → new test.** Adding capability (phase diagram generator, new fluid observable, improved lattice combinatorics, new resonance channel, …) requires corresponding new tests. Those tests become part of the permanent suite upon merge.
+- **"Sigma everywhere".** We reward measurable, broad reductions in error/variance (σ) across multiple physical observables. Large regressions on any protected core metric (Ω_k horizons, lapse, mode counts, proton anchor, so(8) dimension, derived masses, …) are heavily penalized or cause the submission to be ineligible.
+
+## Branch & PR Workflow (HQIV Arena)
+
+All development happens on branches:
+
+- `feat/`, `arena/`, `benchmark/`, or your-name/...
+- CI (`hqiv-arena.yml`) runs on every push and every PR to these branches (and main).
+- Leaderboard + scoring work on branches (you see your provisional rank and score in the PR).
+- Only high-quality, aligned, improving changes that keep main green are merged.
+
+### The Five-Stage Pipeline (all must pass for a score)
+
+1. **Lean Certificate Gate** (hard)
+   - Full `lake build HQIVSO8Closure` (or equivalent heavy targets covering all major theorems).
+   - Zero `sorry`s. Proof-checking time is recorded.
+   - See the workflow in hqiv-lean and the lean job in pyhqiv's arena workflow.
+
+2. **Lean ↔ Python Alignment Gate** (hard)
+   - `python scripts/validate_hqiv_alignment.py`
+   - Must pass 100%. Uses Lean-exported `witnesses.json` + functional mirror checks in pyhqiv (lightcone, metric, so8 generators, etc.).
+   - If you changed a Lean definition that affects a numerical value, update the export and the py mirror (or the alignment will fail).
+
+3. **Python Test Gate** (hard)
+   - Full `pytest` (including paper validation, reproducibility checks, and your new tests).
+
+4. **Scoring & Evaluation**
+   - Only reached if 1-3 are green.
+   - Computes a vector of metrics + deltas vs the current baseline on main.
+   - "Sigma everywhere" logic: broad improvement rewarded; protected regressions penalized.
+   - Produces `arena_results.json` (artifact + comment on PR).
+
+5. **Leaderboard & Badges**
+   - On merge to main the results are committed to `arena/leaderboard.json` (and `arena/baseline.json` for future deltas).
+   - Badges are awarded automatically (see below).
+
+## Adding a New Benchmark / Observable (the standard workflow)
+
+1. Implement the physics (new module or extension).
+2. Add tests that validate physical consistency (e.g. for a phase diagram: phase boundaries, thermodynamic stability, thermodynamic limit checks, consistency with Lean-derived shell quantities).
+3. (Recommended) Register a new Arena metric so the improvement is scored:
+
+   ```python
+   # in your new test file or a small arena registration module
+   from pyhqiv.arena.metrics import register_metric, Metric
+
+   def my_phase_quality() -> float:
+       # deterministic compute of a quality / error metric
+       ...
+
+   register_metric(Metric(
+       name="phase_diagram_consistency_v1",
+       compute=my_phase_quality,
+       reference=lambda: 0.0,   # or load from witness / golden
+       protected=False,
+       weight=1.0,
+       unit="",
+       desc="Phase boundary + thermodynamic stability quality (new in this PR)",
+   ))
+   ```
+
+4. The new metric will automatically participate in scoring on the next run.
+5. Open PR. CI will show the impact on overall score and whether it improves σ.
+
+Templates live in `arena/templates/`.
+
+## Badges (awarded automatically on merge)
+
+- **Merged Features** — impactful capability landed on main.
+- **Sigma Improver** — consistent broad error reduction with no protected regressions.
+- **Highest Position Held** — record holder or longest cumulative time at #1.
+- **Best New Test Suite** — high-quality new tests that increase physical coverage.
+- **Efficiency Leader** — excellent score per compute / proof time.
+- **Lean Certificate Champion** — multiple full certified builds on work that merged.
+- **Alignment Guardian** — extended or hardened the Lean ↔ py validation.
+
+## Local Development & Testing the Arena
+
+```bash
+# alignment gate (uses committed + overlay witnesses)
+python3 scripts/validate_hqiv_alignment.py --verbose
+
+# scoring (compares to arena/baseline.json if present)
+python3 scripts/arena/compute_score.py --out /tmp/arena.json --print-badges
+
+# update leaderboard locally (dry)
+python3 scripts/arena/update_leaderboard.py --results /tmp/arena.json --dry-run
+```
+
+To simulate a fresh Lean export:
+
+```bash
+cd /path/to/hqiv-lean
+lake env lean -q --run scripts/export_witnesses.lean
+# then run alignment from pyhqiv checkout (the overlay loader will pick it up)
+```
+
+## Determinism & Reproducibility
+
+- All Arena benchmarks must be deterministic or use fixed seeds + report statistics.
+- Golden / reference outputs for new benchmarks should be committed (under `tests/data/` or `arena/goldens/`).
+- Never rely on wall time or non-reproducible network/FS state in scoring.
+
+## Documentation & Alectryon
+
+- The Lean side maintains Alectryon / docgen in its own CI (see hqiv-lean workflows).
+- When you add new Lean modules that are cited by papers or the story, make sure the appropriate `paper_*` lake targets and doc facets still build.
+
+## Questions / Coordination
+
+For large cross-repo changes (Lean + py), open matching branches (`feat/xyz` in both) and reference each other. The Arena CI tries to detect same-named branches for alignment.
+
+Welcome to the Arena. Let's reduce σ together.
+
+## Using the hqiv-arena CLI (recommended for agents and rapid iteration)
+
+```bash
+pip install -e ".[dev]"
+hqiv-arena install-skill   # for your coding agent
+hqiv-arena clone my-hqiv-work
+cd my-hqiv-work/pyhqiv
+hqiv-arena setup
+hqiv-arena run
+# ... edit code + add tests ...
+hqiv-arena submit --note-file progress.md --model "Claude 4 Opus"
+```
+
+Full reference lives in `arena/SKILL.md` (and gets installed to `~/.agents/skills/hqiv-arena/SKILL.md` etc.).
diff --git a/README.md b/README.md
@@ -352,6 +352,20 @@ Runs ruff (lint + format), mypy, and generic hooks on commit. Config: `.pre-comm
 
 For releases, the CI build runs `scripts/update_citation_cff.py` to set `CITATION.cff` version and `date-released` from the current tag/date. You can run it manually with `--version X.Y.Z --date YYYY-MM-DD` to sync before a release.
 
+## HQIV Arena
+
+Branch-based, CI-scored physics improvement platform. **Live leaderboard:** [disregardfiat.tech/#arena](https://disregardfiat.tech/#arena).
+
+```bash
+pip install -e ".[dev]"
+hqiv-arena install-skill   # agent usage guide
+hqiv-arena clone ./hqiv-workspace && cd hqiv-workspace/pyhqiv
+hqiv-arena setup && hqiv-arena run
+hqiv-arena submit --note-file progress.md --model "Your agent" --claimed-score 1000
+```
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) (Arena section) and `.github/workflows/hqiv-arena.yml` for the five-stage pipeline (Lean cert → alignment → tests → scoring → leaderboard).
+
 ## Contributing
 
 Public contribution and feedback are greatly appreciated. Please open issues or pull requests on [GitHub](https://github.com/disregardfiat/pyhqiv). All features are experimental; we welcome bug reports, documentation improvements, and suggestions.

diff --git a/arena/README.md b/arena/README.md
@@ -0,0 +1,27 @@
+# HQIV Arena (local data + docs)
+
+This directory contains:
+
+- `leaderboard.json` — committed public leaderboard (updated by CI on merges to main)
+- `baseline.json` — last green main results, used by branches for delta scoring
+- `templates/` — contributor templates for new benchmarks and Lean extensions
+- `goldens/` — optional committed reference outputs for new deterministic benchmarks
+
+The actual implementation lives in:
+
+- `src/pyhqiv/arena/` (metrics, scoring, badges — importable)
+- `scripts/validate_hqiv_alignment.py` (the hard Lean ↔ Python gate)
+- `scripts/arena/compute_score.py` and `update_leaderboard.py`
+- `.github/workflows/hqiv-arena.yml`
+
+See the top-level `CONTRIBUTING.md` for the full workflow, gates, and "sigma everywhere" philosophy.
+
+The live leaderboard is also rendered at https://disregardfiat.tech/#arena (pulls the JSON from this repo's main branch).
+
+## CLI
+
+After `pip install -e .[dev]` (or from the source tree with the wrapper), the command `hqiv-arena` is available.
+
+See the `SKILL.md` (in this dir or installed to your agent's skills folder) for full usage, especially `clone`, `run`, `submit`, `sync`, and `install-skill`.
+
+The CLI is the primary way solvers (humans or AI agents) participate in the Arena.
diff --git a/arena/baseline.json b/arena/baseline.json
@@ -0,0 +1,6 @@
+{
+  "overall_score": 1000.0,
+  "sigma_weighted": 0.0,
+  "metrics": [],
+  "note": "Baseline from last green main. Branches compare deltas against this."
+}
diff --git a/arena/leaderboard.json b/arena/leaderboard.json
@@ -0,0 +1,8 @@
+{
+  "entries": [],
+  "current_best": null,
+  "history": [],
+  "badges": {},
+  "schema_version": 1,
+  "note": "Updated automatically by HQIV Arena CI on merges to main. See CONTRIBUTING.md."
+}
diff --git a/arena/templates/new_benchmark_test.py.template b/arena/templates/new_benchmark_test.py.template
@@ -0,0 +1,77 @@
+"""
+Template for a new HQIV Arena benchmark + test.
+
+Copy to tests/test_my_new_thing.py (or appropriate existing test file),
+implement the physics, add the tests, and (recommended) register an Arena metric
+so improvements are scored on the public leaderboard.
+
+Rules:
+- Must be deterministic (fixed seeds, no wall-time, no external downloads in the hot path).
+- New capability must be accompanied by physical-consistency tests (not just "runs").
+- If it touches Lean-derived quantities, the alignment gate must still pass (or you extend the witnesses + mirrors).
+"""
+
+from __future__ import annotations
+
+import math
+
+import pytest
+
+# Example: suppose we added a phase diagram generator in pyhqiv.phase
+# from pyhqiv.phase import compute_phase_diagram, phase_boundary_stability
+
+
+def test_phase_diagram_runs_and_is_deterministic():
+    # Replace with real call. Use fixed parameters.
+    # diag = compute_phase_diagram(m_range=(0, 20), T0=1.8, seed=42)
+    # assert diag is not None
+    # assert len(diag.boundaries) > 0
+    assert True  # placeholder
+
+
+def test_phase_boundaries_thermodynamically_consistent():
+    # Real checks would include:
+    # - boundaries are monotonic or satisfy Gibbs phase rule in the regime
+    # - stability (second derivative signs, or positive definite Hessian proxy)
+    # - consistency with Lean-derived shell quantities (reference_m, omega_k, curvature)
+    # - no NaN / inf on the grid
+    assert True  # placeholder
+
+
+def test_phase_quality_is_registered_in_arena():
+    """
+    Optional but recommended for scoring impact.
+
+    After implementing a real quality metric, register it so "new feature → new score".
+    The registration can live here or in a small dedicated module imported by CI.
+    """
+    try:
+        from pyhqiv.arena.metrics import register_metric, Metric
+
+        def _phase_quality() -> float:
+            # Compute a single scalar quality / error that lower-is-better or higher-is-better
+            # (the scoring engine treats it as "value vs reference"; you choose the sign via reference).
+            # Must be deterministic.
+            # Example: return 1.0 - boundary_error_fraction(...)
+            return 0.96
+
+        register_metric(
+            Metric(
+                name="phase_diagram_consistency_v1",
+                compute=_phase_quality,
+                reference=lambda: 0.95,  # or load from a golden / witness
+                protected=False,
+                weight=1.0,
+                unit="",
+                tolerance=0.005,
+                desc="Phase diagram physical consistency (boundaries + thermodynamic stability)",
+            )
+        )
+        # If we got here, registration succeeded for this process
+        assert True
+    except Exception as e:
+        pytest.skip(f"arena metrics not available in this env: {e}")
+
+
+# If your benchmark produces golden/reference outputs, commit them under
+# arena/goldens/ or tests/data/ and load them here for reproducibility checks.
diff --git a/pyproject.toml b/pyproject.toml
@@ -50,13 +50,19 @@ Documentation = "https://github.com/disregardfiat/pyhqiv"
 Repository = "https://github.com/disregardfiat/pyhqiv"
 "Bug Tracker" = "https://github.com/disregardfiat/pyhqiv/issues"
 
+[project.scripts]
+hqiv-arena = "pyhqiv.arena_cli:main"
+
 [tool.setuptools_scm]
 write_to = "src/pyhqiv/_version.py"
 fallback_version = "0.4.0+unknown"
 
 [tool.setuptools.packages.find]
 where = ["src"]
 
+[tool.setuptools.package-data]
+pyhqiv = ["witnesses.json", "so8_generators.json", "arena/SKILL.md"]
+
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 pythonpath = ["src"]