ailiance · electron-rare · May 19, 2026
diff --git a/mascarade-eval/docs/grist-native-views-recipe.md b/mascarade-eval/docs/grist-native-views-recipe.md
@@ -0,0 +1,71 @@
+# Grist native review views — operator recipe
+
+Manual Grist UI steps for the parts of the human-review layer that are
+not API-scriptable. Run the schema migration first
+(`python -m mascarade_eval.grist.cli schema`) so the review columns
+exist.
+
+## 1. review_status choice colors
+
+For each table carrying `review_status` (`Heldout_Items`, `Datasets`
+in doc *ailiance-llm-workflow*; `Mascarade_Eval_Items`,
+`Bench_31_domains` in doc *mascarade-data*; `Mascarade_Training` in the
+training doc *Mascarade Training*):
+
+1. Open the table, click the `review_status` column header → **Column
+   options**.
+2. Under **CHOICES**, confirm the four values are present: `pending`,
+   `validated`, `rejected`, `needs_fix`.
+3. Set the chip color of each: pending = grey `#E8E8E8`,
+   validated = green `#C6E5B3`, rejected = red `#F2B5B5`,
+   needs_fix = amber `#F5D9A6`.
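+
+The colors set in step 3 can be spot-checked afterwards without
+touching the column. This is a minimal sketch. It assumes Grist's
+`GET /api/docs/{docId}/tables/{tableId}/columns` response lists each
+column with `fields.widgetOptions` as a JSON string, and that the
+Choice settings sit under `choices` and `choiceOptions.<value>.fillColor`.
+`check_review_status_colors` is a hypothetical helper that only reads
+that parsed response.
+
+```python
+import json
+
+# Chip colors from step 3, keyed by review_status value.
+EXPECTED_COLORS = {
+    "pending": "#E8E8E8",
+    "validated": "#C6E5B3",
+    "rejected": "#F2B5B5",
+    "needs_fix": "#F5D9A6",
+}
+
+
+def check_review_status_colors(columns_payload: dict) -> list[str]:
+    """Return the problems found on the review_status column ([] if none).
+
+    Assumes `columns_payload` is the parsed body of
+    GET /api/docs/{docId}/tables/{tableId}/columns.
+    """
+    for col in columns_payload.get("columns", []):
+        if col.get("id") != "review_status":
+            continue
+        # widgetOptions is assumed to be a JSON-encoded string (may be empty).
+        opts = json.loads(col.get("fields", {}).get("widgetOptions") or "{}")
+        choices = opts.get("choices", [])
+        choice_opts = opts.get("choiceOptions", {})
+        problems = []
+        for status, color in EXPECTED_COLORS.items():
+            if status not in choices:
+                problems.append(f"missing choice {status!r}")
+                continue
+            fill = choice_opts.get(status, {}).get("fillColor", "")
+            if fill.upper() != color:  # hex digits compared case-insensitively
+                problems.append(f"{status!r} is {fill or 'uncolored'}, expected {color}")
+        return problems
+    return ["no review_status column"]
+```
+
+An empty list means the four chips match the recipe.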
+
+## 2. Bench_31_domains review page (doc mascarade-data)
+
+1. **Add Page** → name it `Bench review`.
+2. Add a **Table** widget bound to `Bench_31_domains`.
+3. Add a filter on `review_status` and a second on `domain`; save the
+   view so the filters persist.
+4. Conditional formatting (column header → **Column options** →
+   **Add conditional style**):
+   - `judge_score`: red when `$judge_score < 50`, amber when
+     `50 <= $judge_score < 70`, green when `$judge_score >= 70`.
+   - `validator_score`: red when `$validator_score < 50`, green when
+     `$validator_score >= 70`.
+   - `ppl`: red when `$ppl > 20`, amber when `10 < $ppl <= 20`.
+5. Add a **Card List** widget on the same page bound to
+   `Bench_31_domains`, linked to the table widget, showing `model`,
+   `domain`, `judge_score`, `judge_rationale`, `validator_score`,
+   `review_status`, `reviewer`, `review_note` — this is the per-row
+   review surface.
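+
+Written out as plain Python, the step-4 thresholds show which color a
+given value receives before you type the rules into Grist. This is a
+sketch only, and `RULES` and `cell_color` are hypothetical. Rules are
+tried in order, and the first match wins. Values that no rule covers
+stay unstyled: a `validator_score` from 50 up to 70, or a `ppl` of
+10 or less.
+
+```python
+from typing import Callable, Optional
+
+Rule = tuple[Callable[[float], bool], str]
+
+# Thresholds transcribed from step 4; within a column, the first match wins.
+RULES: dict[str, list[Rule]] = {
+    "judge_score": [
+        (lambda v: v < 50, "red"),
+        (lambda v: v < 70, "amber"),
+        (lambda v: v >= 70, "green"),
+    ],
+    "validator_score": [
+        (lambda v: v < 50, "red"),
+        (lambda v: v >= 70, "green"),
+    ],
+    "ppl": [
+        (lambda v: v > 20, "red"),
+        (lambda v: v > 10, "amber"),
+    ],
+}
+
+
+def cell_color(column: str, value: Optional[float]) -> Optional[str]:
+    """Return the chip color for a cell, or None when no rule applies."""
+    if value is None:  # empty cell: leave unstyled
+        return None
+    for predicate, color in RULES.get(column, []):
+        if predicate(value):
+            return color
+    return None
+```
+
+Boundary values are where these rules are easiest to get wrong. In
+this sketch, a `judge_score` of exactly 70 is green and a `ppl` of
+exactly 20 is amber.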
+
+## 3. Datasets review view (doc ailiance-llm-workflow)
+
+1. **Add Page** → `Datasets review`.
+2. Add a **Table** widget bound to `Datasets`, filtered on
+   `review_status`.
+3. Show `domain`, `name`, `n_rows`, `license`, `hf_dataset_id`,
+   `review_status`, `reviewer`, `review_note`.
+
+## 4. Read-only scoreboards
+
+For `Bench_public`, `Bench_niches_ppl`, `Bench_gateway`,
+`Bench_lift_v1`, `Bench_lift_v2`: add one page `Scoreboards` with a
+Table widget per table. Apply conditional formatting on the score
+columns as in section 2 (green for good results, red for poor ones;
+for perplexity columns, lower is better). No review columns —
+these tables are reference only.
+
+## 5. Bench entry form (doc mascarade-data)
+
+1. **Add Page** → `Bench entry`.
+2. Add a **Form** widget bound to `Bench_31_domains`.
+3. Keep only these fields on the form: `model`, `domain`, `ppl`,
+   `task_score`, `task_metric`, `judge_score`, `source`, `date`.
+   Remove pipeline-only fields (`validator_image_digest`, `run_id`,
+   `host`, `runtime_s`, `tokens_per_s`, …).
+4. Click **Publish** and copy the share URL — this is the manual
+   bench-result entry form. Automated runs keep writing via the API.
+
+## 6. Clean-up
+
+Delete the empty default `Table1` (columns A/B/C) in each of the three
+documents.
diff --git a/mascarade-eval/docs/grist-widget-setup.md b/mascarade-eval/docs/grist-widget-setup.md
@@ -0,0 +1,81 @@
+# Review Console widget — hosting, wiring, smoke test
+
+The widget at `widgets/review-console/index.html` is a static file. It
+must be served over HTTPS and registered in Grist as a Custom URL
+widget.
+
+## 1. Hosting
+
+The widget is served by a dedicated `review-widget` nginx container in
+`/home/electron/saillant-sites/` on electron-server, exposed through
+traefik on the existing `admin.ailiance.fr` hostname under `/review`
+(a `Host && PathPrefix` router — no new cloudflared hostname needed).
+
+Compose service (`saillant-sites/docker-compose.yml`):
+
+```yaml
+  review-widget:
+    image: nginx:alpine
+    container_name: review-widget
+    restart: unless-stopped
+    networks: [traefik]
+    labels:
+      - traefik.enable=true
+      - traefik.docker.network=traefik
+      - traefik.http.routers.review-admin.rule=Host(`admin.ailiance.fr`) && PathPrefix(`/review`)
+      - traefik.http.routers.review-admin.entrypoints=websecure
+      - traefik.http.routers.review-admin.tls.certresolver=letsencrypt
+      - traefik.http.routers.review-admin.service=review-widget
+      - traefik.http.services.review-widget.loadbalancer.server.port=80
+    volumes:
+      - ./train-static:/usr/share/nginx/html:ro
+```
+
+The widget file lives at `saillant-sites/train-static/review/index.html`.
+Redeploy after editing the widget (nginx serves the mount live, no
+restart):
+
+```bash
+scp widgets/review-console/index.html \
+    electron-server:/home/electron/saillant-sites/train-static/review/index.html
+```
+
+Live URL (verified `HTTP 200`): `https://admin.ailiance.fr/review/`
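+
+The traefik router matches the `/review` path prefix without a
+StripPrefix middleware. As a result, nginx receives the full `/review/`
+path, which is why the file sits in a `review/` subdirectory of the
+mount. This minimal sketch of the resolution assumes the stock
+`nginx:alpine` config (root `/usr/share/nginx/html`, index
+`index.html`). `served_file` is a hypothetical helper, not part of the
+repo.
+
+```python
+from pathlib import PurePosixPath
+
+NGINX_ROOT = PurePosixPath("/usr/share/nginx/html")  # container side of the mount
+HOST_ROOT = PurePosixPath("/home/electron/saillant-sites/train-static")  # host side
+
+
+def served_file(request_path: str) -> tuple[PurePosixPath, PurePosixPath]:
+    """Map a request path, as forwarded by traefik, to container and host files."""
+    rel = request_path.lstrip("/")
+    if rel == "" or rel.endswith("/"):
+        rel += "index.html"  # directory request: nginx serves its index file
+    return NGINX_ROOT / rel, HOST_ROOT / rel
+```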
+
+## 2. Add a review page in Grist
+
+In doc *ailiance-llm-workflow*:
+
+1. **Add Page** → `Heldout review`.
+2. Add a **Custom** widget. Select **Custom URL** and paste
+   `https://admin.ailiance.fr/review/`.
+3. Bind the widget to the `Heldout_Items` table.
+4. When prompted, grant the widget **Full document access** (it must
+   write the review columns).
+5. Open the widget's **Column mapping**:
+   - `primary` → `prompt`
+   - `secondary` → `reference`
+   - `context` → `domain`, `source`
+
+Repeat for the future `Mascarade_Training` table in the training doc
+(map `primary` → `user_msg`, `secondary` → `assistant_msg`) and for
+`Mascarade_Eval_Items` in doc *mascarade-data* (map `primary` →
+`question`, `secondary` → `reference`).
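+
+The column mapping amounts to renaming table columns into the widget's
+`primary`, `secondary` and `context` slots. The Python below only
+illustrates that renaming. The widget itself runs in the browser on
+Grist's JavaScript plugin API. `MAPPINGS` and `map_record` are
+hypothetical. `context` is omitted for the two tables above because
+their context columns are not specified.
+
+```python
+from typing import Any, Union
+
+ColumnRef = Union[str, list[str]]
+
+# Widget slot -> table column(s), per table, as listed in step 5 and above.
+MAPPINGS: dict[str, dict[str, ColumnRef]] = {
+    "Heldout_Items": {"primary": "prompt", "secondary": "reference",
+                      "context": ["domain", "source"]},
+    "Mascarade_Training": {"primary": "user_msg", "secondary": "assistant_msg"},
+    "Mascarade_Eval_Items": {"primary": "question", "secondary": "reference"},
+}
+
+
+def map_record(record: dict[str, Any], mapping: dict[str, ColumnRef]) -> dict[str, Any]:
+    """Return the record keyed by widget slot; multi-column slots become lists."""
+    out: dict[str, Any] = {}
+    for slot, columns in mapping.items():
+        if isinstance(columns, list):
+            out[slot] = [record.get(c) for c in columns]
+        else:
+            out[slot] = record.get(columns)
+    return out
+```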
+
+## 3. Smoke-test checklist
+
+On the `Heldout review` page:
+
+- [ ] The progress line shows `revus 0 / 400 — en attente 400`
+      (reviewed 0 / 400, pending 400).
+- [ ] The first pending item's prompt and reference render in full.
+- [ ] Pressing `V` writes `review_status = validated`, `reviewer`,
+      `reviewed_at` (ISO-8601) and advances to the next item; the
+      progress counter increments.
+- [ ] Pressing `R` or `F` writes `rejected` or `needs_fix`
+      respectively.
+- [ ] A value typed in the note field lands in `review_note` and the
+      field clears after the decision.
+- [ ] `S` / `→` skips without writing.
+- [ ] After every pending row is decided, the widget shows
+      "Tous les items en attente sont revus ✓" (all pending items
+      reviewed).
+- [ ] Re-running `python -m mascarade_eval.grist.cli export --domain
+      <d>` ships only the rows marked `validated`.
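+
+Each decision in the checklist comes down to one record update on the
+bound table. This minimal sketch shows the fields written, shaped for
+illustration as a body for Grist's REST
+`PATCH /api/docs/{docId}/tables/{tableId}/records` endpoint. The widget
+itself writes from the browser. `decision_payload` is hypothetical, as
+is its choice to leave `review_note` untouched when the note is empty.
+
+```python
+from datetime import datetime, timezone
+from typing import Optional
+
+# Key bindings from the checklist; S and the right arrow skip without writing.
+KEY_TO_STATUS = {"V": "validated", "R": "rejected", "F": "needs_fix"}
+
+
+def decision_payload(row_id: int, key: str, reviewer: str, note: str = "",
+                     now: Optional[datetime] = None) -> Optional[dict]:
+    """Build the records update for one key press, or None for a skip."""
+    status = KEY_TO_STATUS.get(key.upper())
+    if status is None:
+        return None  # S, the right arrow, or any unbound key: write nothing
+    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
+    fields = {"review_status": status, "reviewer": reviewer, "reviewed_at": stamp}
+    if note.strip():
+        fields["review_note"] = note.strip()  # hypothetical: empty notes are not written
+    return {"records": [{"id": row_id, "fields": fields}]}
+```
+
+Passing `now` explicitly keeps the timestamp reproducible in tests.
+`reviewer` would be one of the configured reviewer choices (currently
+`clems`).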
diff --git a/mascarade-eval/mascarade_eval/grist/README.md b/mascarade-eval/mascarade_eval/grist/README.md
@@ -0,0 +1,36 @@
+# mascarade_eval.grist — Grist-backed dataset management
+
+Grist is the canonical source of truth for the mascarade training corpus.
+Mining ingests in insert-only mode (edits made in Grist are never
+overwritten); training and HF publication consume a deterministic export.
+
+## One-time setup
+
+1. Create an empty Grist doc "Mascarade Training" at grist.saillant.cc.
+2. Add `GRIST_DOC_TRAINING=<doc-id>` to `~/.config/electron-rare/grist.env`
+   (the file already holds `GRIST_API_KEY`).
+
+## Commands
+
+Run with `uv run python -m mascarade_eval.grist.cli <subcommand>`.
+
+- `migrate --domain kicad` — backfill a domain's HF training data into
+  Grist (insert-only). Run once per domain to seed the doc.
+- `ingest --domain kicad --jsonl mine.jsonl` — insert-only ingest of a
+  new mining/curation file. Existing rows are never touched.
+- `export --domain kicad` — write a hashed `.jsonl` snapshot to
+  `exports/` and log a row in the `Exports` table.
+- `publish --snapshot exports/kicad.<ts>.jsonl --hf-dataset
+  Ailiance-fr/mascarade-kicad-dataset --filename kicad_chat.jsonl` —
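+  upload a snapshot to its HF dataset repo.
+- `schema`: add the review columns to the existing tables in docs
+  *ailiance-llm-workflow* and *mascarade-data*. Run it before setting
+  up the review views.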
+
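+Add `--dry-run` to `ingest`, `export`, or `migrate` to preview without
+writing to Grist or disk. These three subcommands also accept
+`--doc <doc-id>` to target a doc other than `GRIST_DOC_TRAINING`.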
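+
+Insert-only means that an incoming row whose `item_key` is already
+present is skipped rather than updated. This is a minimal sketch of
+that rule. `insert_only` is hypothetical. Treating a key repeated
+within one file as a skip is an assumption about what the real
+`ingest_rows` does.
+
+```python
+def insert_only(existing_keys: set[str], rows: list[dict]) -> tuple[list[dict], int]:
+    """Split incoming rows into (rows to insert, number skipped).
+
+    A row is skipped when its item_key is already in Grist or appeared
+    earlier in the same batch; existing rows are never modified.
+    """
+    seen = set(existing_keys)
+    to_insert: list[dict] = []
+    skipped = 0
+    for row in rows:
+        key = row["item_key"]
+        if key in seen:
+            skipped += 1
+            continue
+        seen.add(key)
+        to_insert.append(row)
+    return to_insert, skipped
+```
+
+Skipped rows are never written. As a result, edits made in Grist after
+a first ingest survive re-running the same file.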
+
+## Human review
+
+Edit rows directly in the Grist UI. Each row carries a `review_status`
+(`pending` / `validated` / `rejected` / `needs_fix`); `export` ships only
+`validated` rows. Pass `--include-pending` to `export` to also include
+rows still awaiting review. See `docs/grist-native-views-recipe.md` and
+`docs/grist-widget-setup.md` for the review surfaces.
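+
+The selection rule for `export` is small enough to state as code. This
+minimal sketch makes three assumptions about the snapshot: rows are
+ordered by `item_key`, each is written as one JSON object per line with
+sorted keys, and the result is hashed with SHA-256. `select_for_export`
+and `snapshot` are hypothetical. The real `export_domain` may order,
+serialize or hash differently.
+
+```python
+import hashlib
+import json
+
+
+def select_for_export(rows: list[dict], include_pending: bool = False) -> list[dict]:
+    """Keep validated rows (plus pending ones if asked), ordered by item_key."""
+    allowed = {"validated", "pending"} if include_pending else {"validated"}
+    kept = [r for r in rows if r.get("review_status") in allowed]
+    return sorted(kept, key=lambda r: r["item_key"])
+
+
+def snapshot(rows: list[dict]) -> tuple[bytes, str]:
+    """Serialize rows as JSONL and return (bytes, sha256 hex digest)."""
+    data = "".join(
+        json.dumps(r, ensure_ascii=False, sort_keys=True) + "\n" for r in rows
+    ).encode("utf-8")
+    return data, hashlib.sha256(data).hexdigest()
+```
+
+In this sketch, sorting before hashing makes the digest independent of
+the order in which Grist returns records. Two exports of an unchanged
+domain therefore get the same hash.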
diff --git a/mascarade-eval/mascarade_eval/grist/__init__.py b/mascarade-eval/mascarade_eval/grist/__init__.py
@@ -0,0 +1,48 @@
+# mascarade_eval/grist/__init__.py
+"""Grist-backed dataset management for the mascarade training corpus.
+
+Grist is the canonical source of truth. Mining ingests in insert-only
+mode (human edits in Grist are never overwritten); training and HF
+publication consume a deterministic export of human-validated rows.
+"""
+from pathlib import Path
+
+GRIST_BASE = "https://grist.saillant.cc/api"
+
+# Known existing docs. The training doc ID is provided at runtime via
+# --doc or the GRIST_DOC_TRAINING env/file value.
+DOC_HELDOUT = "eGbbrpzN3TeLq3sUd2YFA2"      # ailiance-llm-workflow
+DOC_MASCARADE = "dhyrySCayizD1PNqCNhCPN"    # mascarade-data
+
+KEY_FILE = Path.home() / ".config" / "electron-rare" / "grist.env"
+
+TRAINING_TABLE = "Mascarade_Training"
+REGISTRY_TABLE = "Datasets_Registry"
+EXPORTS_TABLE = "Exports"
+
+# Human-review columns appended to every validation-target table.
+REVIEW_COLUMNS = ("review_status", "reviewer", "reviewed_at", "review_note")
+REVIEW_STATUSES = ("pending", "validated", "rejected", "needs_fix")
+REVIEWER_CHOICES = ("clems",)
+
+# Existing tables that receive the review columns, keyed by doc ID.
+REVIEW_TARGETS = {
+    DOC_HELDOUT: ("Heldout_Items", "Datasets"),
+    DOC_MASCARADE: ("Mascarade_Eval_Items", "Bench_31_domains"),
+}
+
+TRAINING_COLUMNS = (
+    "item_key", "domain", "system", "user_msg", "assistant_msg",
+    "extra_turns", "source", "notes",
+) + REVIEW_COLUMNS
+REGISTRY_COLUMNS = (
+    "name", "family", "domain", "hf_dataset_id", "license",
+    "n_items", "notes",
+)
+EXPORTS_COLUMNS = (
+    "export_id", "domain", "created_at", "n_items", "content_hash",
+    "output_file", "hf_dataset_id",
+)
+
+_ROOT = Path(__file__).resolve().parent.parent.parent  # .../mascarade-eval
+EXPORTS_DIR = _ROOT / "exports"
diff --git a/mascarade-eval/mascarade_eval/grist/cli.py b/mascarade-eval/mascarade_eval/grist/cli.py
@@ -0,0 +1,134 @@
+# mascarade_eval/grist/cli.py
+"""CLI for Grist-backed dataset management: ingest / export / migrate / publish / schema.
+
+Run: python -m mascarade_eval.grist.cli <subcommand> [options]
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+from . import EXPORTS_DIR, TRAINING_COLUMNS, TRAINING_TABLE
+from .client import GristClient, load_doc_id
+from .export import export_domain
+from .ingest import item_key, ingest_rows
+from .migrate import flatten_messages, migrate_domain
+from .publish import publish_snapshot
+
+
+def build_parser() -> argparse.ArgumentParser:
+    ap = argparse.ArgumentParser(prog="grist-dataset", description=__doc__)
+    sub = ap.add_subparsers(dest="command", required=True)
+
+    p_ing = sub.add_parser("ingest", help="insert-only ingest a .jsonl")
+    p_ing.add_argument("--doc")
+    p_ing.add_argument("--jsonl", required=True)
+    p_ing.add_argument("--domain", required=True)
+    p_ing.add_argument("--dry-run", action="store_true")
+
+    p_exp = sub.add_parser("export", help="export a domain to a snapshot")
+    p_exp.add_argument("--doc")
+    p_exp.add_argument("--domain", required=True)
+    p_exp.add_argument("--dry-run", action="store_true")
+    p_exp.add_argument("--include-pending", action="store_true",
+                       help="also export rows still pending review")
+
+    p_mig = sub.add_parser("migrate", help="backfill a domain from HF")
+    p_mig.add_argument("--doc")
+    p_mig.add_argument("--domain", required=True)
+    p_mig.add_argument("--dry-run", action="store_true")
+
+    p_pub = sub.add_parser("publish", help="upload a snapshot to HF")
+    p_pub.add_argument("--snapshot", required=True)
+    p_pub.add_argument("--hf-dataset", required=True)
+    p_pub.add_argument("--filename", required=True)
+
+    sub.add_parser("schema", help="add review columns to existing tables")
+
+    return ap
+
+
+def resolve_doc(doc_arg: str | None) -> str:
+    """Return the doc ID from --doc or the GRIST_DOC_TRAINING env/file value.
+
+    Exits the program (sys.exit) if neither source provides a doc ID.
+    """
+    if doc_arg:
+        return doc_arg
+    doc = load_doc_id("GRIST_DOC_TRAINING")
+    if not doc:
+        sys.exit("no doc ID: pass --doc or set GRIST_DOC_TRAINING")
+    return doc
+
+
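+def _ingest_jsonl_rows(domain: str, jsonl_path: str) -> list[dict]:
+    """Read a .jsonl file and map each record to a Mascarade_Training row.
+
+    Blank lines are ignored, malformed JSON lines are skipped with a
+    warning naming the line, and every row starts as review_status
+    "pending".
+    """
+    try:
+        text = Path(jsonl_path).read_text(encoding="utf-8")
+    except FileNotFoundError:
+        sys.exit(f"file not found: {jsonl_path}")
+    except UnicodeDecodeError as exc:
+        sys.exit(f"cannot decode {jsonl_path}: {exc}")
+    rows: list[dict] = []
+    for lineno, line in enumerate(text.splitlines(), start=1):
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            record = json.loads(line)
+        except json.JSONDecodeError as exc:
+            print(f"[warn] {jsonl_path}:{lineno}: skipped malformed line: {exc}",
+                  file=sys.stderr)
+            continue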
+        flat = flatten_messages(record)
+        rows.append({
+            "item_key": item_key(domain, flat["user_msg"]),
+            "domain": domain,
+            "system": flat["system"],
+            "user_msg": flat["user_msg"],
+            "assistant_msg": flat["assistant_msg"],
+            "extra_turns": flat["extra_turns"],
+            "source": record.get("source", ""),
+            "notes": "",
+            "review_status": "pending",
+        })
+    return rows
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = build_parser().parse_args(argv)
+
+    if args.command == "publish":
+        publish_snapshot(args.snapshot, args.hf_dataset, args.filename)
+        print(f"published {args.snapshot} -> {args.hf_dataset}")
+        return 0
+
+    if args.command == "schema":
+        from . import REVIEW_TARGETS
+        from .schema import migrate_doc
+        for doc_id, tables in REVIEW_TARGETS.items():
+            doc_client = GristClient.from_env(doc_id)
+            report = migrate_doc(doc_client, tables)
+            print(f"schema {doc_id}: {report}")
+        return 0
+
+    client = GristClient.from_env(resolve_doc(args.doc))
+
+    if args.command == "ingest":
+        rows = _ingest_jsonl_rows(args.domain, args.jsonl)
+        report = ingest_rows(client, TRAINING_TABLE, TRAINING_COLUMNS, rows,
+                             dry_run=args.dry_run)
+        print(f"ingest {args.domain}: {report}")
+    elif args.command == "export":
+        report = export_domain(client, args.domain, EXPORTS_DIR,
+                               dry_run=args.dry_run,
+                               include_pending=args.include_pending)
+        print(f"export {args.domain}: {report}")
+    elif args.command == "migrate":
+        report = migrate_domain(client, args.domain, dry_run=args.dry_run)
+        print(f"migrate {args.domain}: {report}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())