Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
432 changes: 432 additions & 0 deletions .github/workflows/hqiv-arena.yml

Large diffs are not rendered by default.

137 changes: 137 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
# Contributing to pyhqiv + HQIV Arena

Thank you for helping make HQIV a living, crowdsourced physics improvement engine.

## Core Principles

- **main is sacred and always certified.** Every commit on `main` must pass the full Lean lake build (including SO(8) closure, Spin(8) triality, GR+SM unification, horizon curvature, lapse relations, and all other proved theorems). These are certificates. No exceptions.
- **Lean ↔ Python bidirectional alignment is a hard gate.** Numerical results in pyhqiv must be consistent with formally derived values from hqiv-lean (within the tolerances defined in the alignment script). Functional helpers only — no hard-coded constants in scoring/alignment logic.
- **New feature → new test.** Adding capability (phase diagram generator, new fluid observable, improved lattice combinatorics, new resonance channel, …) requires corresponding new tests. Those tests become part of the permanent suite upon merge.
- **"Sigma everywhere".** We reward measurable, broad reductions in error/variance (σ) across multiple physical observables. Large regressions on any protected core metric (Ω_k horizons, lapse, mode counts, proton anchor, so(8) dimension, derived masses, …) are heavily penalized or cause the submission to be ineligible.

## Branch & PR Workflow (HQIV Arena)

All development happens on branches:

- `feat/`, `arena/`, `benchmark/`, or your-name/...
- CI (`hqiv-arena.yml`) runs on every push and every PR to these branches (and main).
- Leaderboard + scoring work on branches (you see your provisional rank and score in the PR).
- Only high-quality, aligned, improving changes that keep main green are merged.

### The Five-Stage Pipeline (all must pass for a score)

1. **Lean Certificate Gate** (hard)
- Full `lake build HQIVSO8Closure` (or equivalent heavy targets covering all major theorems).
- Zero `sorry`s. Proof-checking time is recorded.
- See the workflow in hqiv-lean and the lean job in pyhqiv's arena workflow.

2. **Lean ↔ Python Alignment Gate** (hard)
- `python scripts/validate_hqiv_alignment.py`
- Must pass 100%. Uses Lean-exported `witnesses.json` + functional mirror checks in pyhqiv (lightcone, metric, so8 generators, etc.).
- If you changed a Lean definition that affects a numerical value, update the export and the py mirror (or the alignment will fail).

3. **Python Test Gate** (hard)
- Full `pytest` (including paper validation, reproducibility checks, and your new tests).

4. **Scoring & Evaluation**
- Only reached if 1-3 are green.
- Computes a vector of metrics + deltas vs the current baseline on main.
- "Sigma everywhere" logic: broad improvement rewarded; protected regressions penalized.
- Produces `arena_results.json` (artifact + comment on PR).

5. **Leaderboard & Badges**
- On merge to main the results are committed to `arena/leaderboard.json` (and `arena/baseline.json` for future deltas).
- Badges are awarded automatically (see below).

## Adding a New Benchmark / Observable (the standard workflow)

1. Implement the physics (new module or extension).
2. Add tests that validate physical consistency (e.g. for a phase diagram: phase boundaries, thermodynamic stability, thermodynamic limit checks, consistency with Lean-derived shell quantities).
3. (Recommended) Register a new Arena metric so the improvement is scored:

```python
# in your new test file or a small arena registration module
from pyhqiv.arena.metrics import register_metric, Metric

def my_phase_quality() -> float:
# deterministic compute of a quality / error metric
...

register_metric(Metric(
name="phase_diagram_consistency_v1",
compute=my_phase_quality,
reference=lambda: 0.0, # or load from witness / golden
protected=False,
weight=1.0,
unit="",
desc="Phase boundary + thermodynamic stability quality (new in this PR)",
))
```

4. The new metric will automatically participate in scoring on the next run.
5. Open PR. CI will show the impact on overall score and whether it improves σ.

Templates live in `arena/templates/`.

## Badges (awarded automatically on merge)

- **Merged Features** — impactful capability landed on main.
- **Sigma Improver** — consistent broad error reduction with no protected regressions.
- **Highest Position Held** — record holder or longest cumulative time at #1.
- **Best New Test Suite** — high-quality new tests that increase physical coverage.
- **Efficiency Leader** — excellent score per compute / proof time.
- **Lean Certificate Champion** — multiple full certified builds on work that merged.
- **Alignment Guardian** — extended or hardened the Lean ↔ py validation.

## Local Development & Testing the Arena

```bash
# alignment gate (uses committed + overlay witnesses)
python3 scripts/validate_hqiv_alignment.py --verbose

# scoring (compares to arena/baseline.json if present)
python3 scripts/arena/compute_score.py --out /tmp/arena.json --print-badges

# update leaderboard locally (dry)
python3 scripts/arena/update_leaderboard.py --results /tmp/arena.json --dry-run
```

To simulate a fresh Lean export:

```bash
cd /path/to/hqiv-lean
lake env lean -q --run scripts/export_witnesses.lean
# then run alignment from pyhqiv checkout (the overlay loader will pick it up)
```

## Determinism & Reproducibility

- All Arena benchmarks must be deterministic or use fixed seeds + report statistics.
- Golden / reference outputs for new benchmarks should be committed (under `tests/data/` or `arena/goldens/`).
- Never rely on wall time or non-reproducible network/FS state in scoring.

## Documentation & Alectryon

- The Lean side maintains Alectryon / docgen in its own CI (see hqiv-lean workflows).
- When you add new Lean modules that are cited by papers or the story, make sure the appropriate `paper_*` lake targets and doc facets still build.

## Questions / Coordination

For large cross-repo changes (Lean + py), open matching branches (`feat/xyz` in both) and reference each other. The Arena CI tries to detect same-named branches for alignment.

Welcome to the Arena. Let's reduce σ together.

## Using the hqiv-arena CLI (recommended for agents and rapid iteration)

```bash
pip install -e ".[dev]"
hqiv-arena install-skill # for your coding agent
hqiv-arena clone my-hqiv-work
cd my-hqiv-work/pyhqiv
hqiv-arena setup
hqiv-arena run
# ... edit code + add tests ...
hqiv-arena submit --note-file progress.md --model "Claude 4 Opus"
```

Full reference lives in `arena/SKILL.md` (and gets installed to `~/.agents/skills/hqiv-arena/SKILL.md` etc.).
14 changes: 14 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,20 @@ Runs ruff (lint + format), mypy, and generic hooks on commit. Config: `.pre-comm

For releases, the CI build runs `scripts/update_citation_cff.py` to set `CITATION.cff` version and `date-released` from the current tag/date. You can run it manually with `--version X.Y.Z --date YYYY-MM-DD` to sync before a release.

## HQIV Arena

Branch-based, CI-scored physics improvement platform. **Live leaderboard:** [disregardfiat.tech/#arena](https://disregardfiat.tech/#arena).

```bash
pip install -e ".[dev]"
hqiv-arena install-skill # agent usage guide
hqiv-arena clone ./hqiv-workspace && cd hqiv-workspace/pyhqiv
hqiv-arena setup && hqiv-arena run
hqiv-arena submit --note-file progress.md --model "Your agent" --claimed-score 1000
```

See [CONTRIBUTING.md](CONTRIBUTING.md) (Arena section) and `.github/workflows/hqiv-arena.yml` for the five-stage pipeline (Lean cert → alignment → tests → scoring → leaderboard).

## Contributing

Public contribution and feedback are greatly appreciated. Please open issues or pull requests on [GitHub](https://github.com/disregardfiat/pyhqiv). All features are experimental; we welcome bug reports, documentation improvements, and suggestions.
Expand Down
27 changes: 27 additions & 0 deletions arena/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# HQIV Arena (local data + docs)

This directory contains:

- `leaderboard.json` — committed public leaderboard (updated by CI on merges to main)
- `baseline.json` — last green main results, used by branches for delta scoring
- `templates/` — contributor templates for new benchmarks and Lean extensions
- `goldens/` — optional committed reference outputs for new deterministic benchmarks

The actual implementation lives in:

- `src/pyhqiv/arena/` (metrics, scoring, badges — importable)
- `scripts/validate_hqiv_alignment.py` (the hard Lean ↔ Python gate)
- `scripts/arena/compute_score.py` and `update_leaderboard.py`
- `.github/workflows/hqiv-arena.yml`

See the top-level `CONTRIBUTING.md` for the full workflow, gates, and "sigma everywhere" philosophy.

The live leaderboard is also rendered at https://disregardfiat.tech/#arena (pulls the JSON from this repo's main branch).

## CLI

After `pip install -e .[dev]` (or from the source tree with the wrapper), the command `hqiv-arena` is available.

See the `SKILL.md` (in this dir or installed to your agent's skills folder) for full usage, especially `clone`, `run`, `submit`, `sync`, and `install-skill`.

The CLI is the primary way solvers (humans or AI agents) participate in the Arena.
6 changes: 6 additions & 0 deletions arena/baseline.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
"overall_score": 1000.0,
"sigma_weighted": 0.0,
"metrics": [],
"note": "Baseline from last green main. Branches compare deltas against this."
}
8 changes: 8 additions & 0 deletions arena/leaderboard.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"entries": [],
"current_best": null,
"history": [],
"badges": {},
"schema_version": 1,
"note": "Updated automatically by HQIV Arena CI on merges to main. See CONTRIBUTING.md."
}
77 changes: 77 additions & 0 deletions arena/templates/new_benchmark_test.py.template
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
"""
Template for a new HQIV Arena benchmark + test.

Copy to tests/test_my_new_thing.py (or appropriate existing test file),
implement the physics, add the tests, and (recommended) register an Arena metric
so improvements are scored on the public leaderboard.

Rules:
- Must be deterministic (fixed seeds, no wall-time, no external downloads in the hot path).
- New capability must be accompanied by physical-consistency tests (not just "runs").
- If it touches Lean-derived quantities, the alignment gate must still pass (or you extend the witnesses + mirrors).
"""

from __future__ import annotations

import math

import pytest

# Example: suppose we added a phase diagram generator in pyhqiv.phase
# from pyhqiv.phase import compute_phase_diagram, phase_boundary_stability


def test_phase_diagram_runs_and_is_deterministic():
# Replace with real call. Use fixed parameters.
# diag = compute_phase_diagram(m_range=(0, 20), T0=1.8, seed=42)
# assert diag is not None
# assert len(diag.boundaries) > 0
assert True # placeholder


def test_phase_boundaries_thermodynamically_consistent():
# Real checks would include:
# - boundaries are monotonic or satisfy Gibbs phase rule in the regime
# - stability (second derivative signs, or positive definite Hessian proxy)
# - consistency with Lean-derived shell quantities (reference_m, omega_k, curvature)
# - no NaN / inf on the grid
assert True # placeholder


def test_phase_quality_is_registered_in_arena():
"""
Optional but recommended for scoring impact.

After implementing a real quality metric, register it so "new feature → new score".
The registration can live here or in a small dedicated module imported by CI.
"""
try:
from pyhqiv.arena.metrics import register_metric, Metric

def _phase_quality() -> float:
# Compute a single scalar quality / error that lower-is-better or higher-is-better
# (the scoring engine treats it as "value vs reference"; you choose the sign via reference).
# Must be deterministic.
# Example: return 1.0 - boundary_error_fraction(...)
return 0.96

register_metric(
Metric(
name="phase_diagram_consistency_v1",
compute=_phase_quality,
reference=lambda: 0.95, # or load from a golden / witness
protected=False,
weight=1.0,
unit="",
tolerance=0.005,
desc="Phase diagram physical consistency (boundaries + thermodynamic stability)",
)
)
# If we got here, registration succeeded for this process
assert True
except Exception as e:
pytest.skip(f"arena metrics not available in this env: {e}")


# If your benchmark produces golden/reference outputs, commit them under
# arena/goldens/ or tests/data/ and load them here for reproducibility checks.
6 changes: 6 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -50,13 +50,19 @@ Documentation = "https://github.com/disregardfiat/pyhqiv"
Repository = "https://github.com/disregardfiat/pyhqiv"
"Bug Tracker" = "https://github.com/disregardfiat/pyhqiv/issues"

[project.scripts]
hqiv-arena = "pyhqiv.arena_cli:main"

[tool.setuptools_scm]
write_to = "src/pyhqiv/_version.py"
fallback_version = "0.4.0+unknown"

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
pyhqiv = ["witnesses.json", "so8_generators.json", "arena/SKILL.md"]

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]
Expand Down
Loading
Loading