24 commits
8554b0b
feat(grist): add constants and init scaffold
electron-rare May 19, 2026
42e7af7
feat(grist): add Grist REST client
electron-rare May 19, 2026
14dae57
feat(grist): message flatten/rebuild transforms
electron-rare May 19, 2026
ae73ef7
feat(grist): add insert-only ingestion core
electron-rare May 19, 2026
a168b17
feat(grist): add deterministic export
electron-rare May 19, 2026
a34041e
fix(grist): drop orphan snapshot on export failure
electron-rare May 19, 2026
2d572ef
feat(grist): wire HF backfill into ingestion
electron-rare May 19, 2026
0d86e67
feat(grist): add HuggingFace snapshot publisher
electron-rare May 19, 2026
2ab86ba
feat(grist): add dataset management CLI
electron-rare May 19, 2026
50c3c0e
fix(grist): clean exit on missing ingest file
electron-rare May 19, 2026
3423aaa
test(grist): add round-trip check and docs
electron-rare May 19, 2026
af8a305
feat(grist): add review-status constants
electron-rare May 19, 2026
40bd1b2
refactor(grist): producers write review_status
electron-rare May 19, 2026
f4b6158
feat(grist): add column DDL to client
electron-rare May 19, 2026
4a3a071
feat(grist): add review-column schema migration
electron-rare May 19, 2026
303bc13
feat(grist): add schema CLI subcommand
electron-rare May 19, 2026
f942b61
feat(grist): gate export on review_status
electron-rare May 19, 2026
7546cee
docs(grist): add native views and form recipe
electron-rare May 19, 2026
5f1ba2a
feat(grist): add review console widget
electron-rare May 19, 2026
038ba88
docs(grist): add widget setup recipe
electron-rare May 19, 2026
e680d67
docs(grist): point widget recipe at live URL
electron-rare May 19, 2026
6e45058
docs(grist): host widget on admin.ailiance.fr
electron-rare May 19, 2026
794e849
fix(grist): widget reads all columns for queue
electron-rare May 19, 2026
8a0a8cb
fix(grist): widget reads full table via docApi
electron-rare May 19, 2026
71 changes: 71 additions & 0 deletions mascarade-eval/docs/grist-native-views-recipe.md
@@ -0,0 +1,71 @@
# Grist native review views — operator recipe

Manual Grist UI steps for the parts of the human-review layer that are
not API-scriptable. Run the schema migration first
(`python -m mascarade_eval.grist.cli schema`) so the review columns
exist.

## 1. review_status choice colors

For each table carrying `review_status` (`Heldout_Items`, `Datasets`
in doc *ailiance-llm-workflow*; `Mascarade_Eval_Items`,
`Bench_31_domains` in doc *mascarade-data*, plus `Mascarade_Training`):

1. Open the table, click the `review_status` column header → **Column
options**.
2. Under **CHOICES**, confirm the four values are present: `pending`,
`validated`, `rejected`, `needs_fix`.
3. Set the chip color of each: pending = grey `#E8E8E8`,
validated = green `#C6E5B3`, rejected = red `#F2B5B5`,
needs_fix = amber `#F5D9A6`.

## 2. Bench_31_domains review page (doc mascarade-data)

1. **Add Page** → name it `Bench review`.
2. Add a **Table** widget bound to `Bench_31_domains`.
3. Add a filter on `review_status` and a second on `domain`; save the
view so the filters persist.
4. Conditional formatting (column header → **Column options** →
   **Add conditional style**):
   - `judge_score`: red when `$judge_score < 50`, amber when
     `50 <= $judge_score < 70`, green when `$judge_score >= 70`.
   - `validator_score`: red when `$validator_score < 50`, green when
     `$validator_score >= 70`.
   - `ppl`: red when `$ppl > 20`, amber when `10 < $ppl <= 20`.
5. Add a **Card List** widget on the same page bound to
`Bench_31_domains`, linked to the table widget, showing `model`,
`domain`, `judge_score`, `judge_rationale`, `validator_score`,
`review_status`, `reviewer`, `review_note` — this is the per-row
review surface.
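
The score bands from step 4 can be checked outside Grist before entering the rules. The sketch below is illustrative only: `judge_score_color` and `ppl_color` are hypothetical helpers, not part of `mascarade_eval`, and they assume the rules are evaluated top to bottom with the first match winning.

```python
from typing import Optional


def judge_score_color(score: float) -> str:
    """Color band for judge_score: red below 50, amber below 70, else green."""
    if score < 50:
        return "red"
    if score < 70:
        return "amber"
    return "green"


def ppl_color(ppl: float) -> Optional[str]:
    """Color band for ppl: red above 20, amber above 10, no style otherwise."""
    if ppl > 20:
        return "red"
    if ppl > 10:
        return "amber"
    return None
```

A boundary value such as `judge_score = 50` falls in the amber band under this reading.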

## 3. Datasets review view (doc ailiance-llm-workflow)

1. **Add Page** → `Datasets review`.
2. Add a **Table** widget bound to `Datasets`, filtered on
`review_status`.
3. Show `domain`, `name`, `n_rows`, `license`, `hf_dataset_id`,
`review_status`, `reviewer`, `review_note`.

## 4. Read-only scoreboards

For `Bench_public`, `Bench_niches_ppl`, `Bench_gateway`,
`Bench_lift_v1`, `Bench_lift_v2`: add one page `Scoreboards` with a
Table widget per table. Apply conditional formatting on the score
columns (green high / red low) as in section 2. No review columns —
these tables are reference only.

## 5. Bench entry form (doc mascarade-data)

1. **Add Page** → `Bench entry`.
2. Add a **Form** widget bound to `Bench_31_domains`.
3. Keep only these fields on the form: `model`, `domain`, `ppl`,
`task_score`, `task_metric`, `judge_score`, `source`, `date`.
Remove pipeline-only fields (`validator_image_digest`, `run_id`,
`host`, `runtime_s`, `tokens_per_s`, …).
4. Click **Publish** and copy the share URL — this is the manual
bench-result entry form. Automated runs keep writing via the API.

## 6. Clean-up

Delete the empty default `Table1` (columns A/B/C) in each of the three
documents.
81 changes: 81 additions & 0 deletions mascarade-eval/docs/grist-widget-setup.md
@@ -0,0 +1,81 @@
# Review Console widget — hosting, wiring, smoke test

The widget at `widgets/review-console/index.html` is a static file. It
must be served over HTTPS and registered in Grist as a Custom URL
widget.

## 1. Hosting

The widget is served by a dedicated `review-widget` nginx container in
`/home/electron/saillant-sites/` on electron-server, exposed through
traefik on the existing `admin.ailiance.fr` hostname under `/review`
(a `Host && PathPrefix` router — no new cloudflared hostname needed).

Compose service (`saillant-sites/docker-compose.yml`):

```yaml
review-widget:
  image: nginx:alpine
  container_name: review-widget
  restart: unless-stopped
  networks: [traefik]
  labels:
    - traefik.enable=true
    - traefik.docker.network=traefik
    - traefik.http.routers.review-admin.rule=Host(`admin.ailiance.fr`) && PathPrefix(`/review`)
    - traefik.http.routers.review-admin.entrypoints=websecure
    - traefik.http.routers.review-admin.tls.certresolver=letsencrypt
    - traefik.http.routers.review-admin.service=review-widget
    - traefik.http.services.review-widget.loadbalancer.server.port=80
  volumes:
    - ./train-static:/usr/share/nginx/html:ro
```

The widget file lives at `saillant-sites/train-static/review/index.html`.
Redeploy after editing the widget (nginx serves the mount live, no
restart):

```bash
scp widgets/review-console/index.html \
  electron-server:/home/electron/saillant-sites/train-static/review/index.html
```

Live URL (verified `HTTP 200`): `https://admin.ailiance.fr/review/`

## 2. Add a review page in Grist

In doc *ailiance-llm-workflow*:

1. **Add Page** → `Heldout review`.
2. Add a **Custom** widget. Select **Custom URL** and paste
`https://admin.ailiance.fr/review/`.
3. Bind the widget to the `Heldout_Items` table.
4. When prompted, grant the widget **Full document access** (it must
write the review columns).
5. Open the widget's **Column mapping**:
   - `primary` → `prompt`
   - `secondary` → `reference`
   - `context` → `domain`, `source`

Repeat for the future `Mascarade_Training` table (map `primary` →
`user_msg`, `secondary` → `assistant_msg`) and for
`Mascarade_Eval_Items` in doc *mascarade-data* (map `primary` →
`question`, `secondary` → `reference`).

## 3. Smoke-test checklist

On the `Heldout review` page:

- [ ] The progress line shows `revus 0 / 400 — en attente 400`.
- [ ] The first pending item's prompt and reference render in full.
- [ ] Pressing `V` writes `review_status = validated`, `reviewer`,
`reviewed_at` (ISO-8601) and advances to the next item; the
progress counter increments.
- [ ] Pressing `R` and `F` write `rejected` / `needs_fix`.
- [ ] A value typed in the note field lands in `review_note` and the
field clears after the decision.
- [ ] `S` / `→` skips without writing.
- [ ] After every pending row is decided, the widget shows
"Tous les items en attente sont revus ✓".
- [ ] Re-running `python -m mascarade_eval.grist.cli export --domain
<d>` ships only the rows marked `validated`.
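
The fields a decision is expected to write can be sketched in Python for reference when checking rows after the smoke test. The widget itself is a static HTML/JavaScript file, so this is not its code: `review_update` and `KEY_TO_STATUS` are hypothetical names, and the UTC timestamp with second precision is an assumption about the ISO-8601 form.

```python
from datetime import datetime, timezone
from typing import Optional

# Keyboard shortcut -> review_status value, as listed in the checklist.
KEY_TO_STATUS = {"V": "validated", "R": "rejected", "F": "needs_fix"}


def review_update(key: str, reviewer: str, note: str = "",
                  now: Optional[datetime] = None) -> dict:
    """Build the column values written for one review decision."""
    status = KEY_TO_STATUS[key.upper()]
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return {
        "review_status": status,
        "reviewer": reviewer,
        "reviewed_at": stamp,  # ISO-8601, assumed to be UTC
        "review_note": note,
    }
```

Skipping (`S` / `→`) has no entry in the mapping because it writes nothing.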
36 changes: 36 additions & 0 deletions mascarade-eval/mascarade_eval/grist/README.md
@@ -0,0 +1,36 @@
# mascarade_eval.grist — Grist-backed dataset management

Grist is the canonical source of truth for the mascarade training corpus.
Mining ingests in insert-only mode (edits made in Grist are never
overwritten); training and HF publication consume a deterministic export.
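
Insert-only ingestion amounts to keying every row and skipping any key already present in Grist. The following is a minimal sketch under stated assumptions: `sketch_item_key` and `select_new_rows` are hypothetical names, and hashing the domain plus user message is only one plausible way to derive a stable key, not necessarily what `ingest.item_key` does.

```python
import hashlib


def sketch_item_key(domain: str, user_msg: str) -> str:
    """Hypothetical stable key: truncated SHA-256 of domain and user message."""
    digest = hashlib.sha256(f"{domain}\n{user_msg}".encode("utf-8")).hexdigest()
    return digest[:16]


def select_new_rows(existing_keys: set[str], candidates: list[dict]) -> list[dict]:
    """Keep only candidates whose item_key is not yet known.

    Rows already in Grist are never returned, so human edits stay untouched.
    Duplicates inside the candidate batch are dropped as well.
    """
    seen = set(existing_keys)
    fresh = []
    for row in candidates:
        key = row["item_key"]
        if key in seen:
            continue
        seen.add(key)
        fresh.append(row)
    return fresh
```

Because existing keys are only read and never updated, re-running the same ingest file is a no-op.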

## One-time setup

1. Create an empty Grist doc "Mascarade Training" at grist.saillant.cc.
2. Add `GRIST_DOC_TRAINING=<doc-id>` to `~/.config/electron-rare/grist.env`
(the file already holds `GRIST_API_KEY`).
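
For orientation, reading a value such as `GRIST_DOC_TRAINING` from a `KEY=value` file might look like the sketch below. `read_env_value` is a hypothetical helper written for illustration; the actual lookup lives in `client.load_doc_id`, whose implementation is not shown here and may differ.

```python
from pathlib import Path
from typing import Optional


def read_env_value(path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a KEY=value file, or None if absent."""
    if not path.exists():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("'\"")
    return None
```

Usage, assuming the path from step 2: `read_env_value(Path.home() / ".config" / "electron-rare" / "grist.env", "GRIST_DOC_TRAINING")`.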

## Commands

Run with `uv run python -m mascarade_eval.grist.cli <subcommand>`.

- `migrate --domain kicad` — backfill a domain's HF training data into
Grist (insert-only). Run once per domain to seed the doc.
- `ingest --domain kicad --jsonl mine.jsonl` — insert-only ingest of a
new mining/curation file. Existing rows are never touched.
- `export --domain kicad` — write a hashed `.jsonl` snapshot to
`exports/` and log a row in the `Exports` table.
- `publish --snapshot exports/kicad.<ts>.jsonl --hf-dataset
Ailiance-fr/mascarade-kicad-dataset --filename kicad_chat.jsonl` —
upload a snapshot to its HF dataset repo.

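Add `--dry-run` to `ingest`, `export`, or `migrate` to preview without
writing to Grist or disk. The `schema` subcommand takes no options; it
adds the review columns to the existing tables listed in
`REVIEW_TARGETS`.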
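
A hashed snapshot is only reproducible if the serialization is deterministic. The sketch below shows one way to get there, assuming rows are sorted by `item_key` and serialized with sorted JSON keys; `snapshot_bytes` and `content_hash` are hypothetical names, and the real `export_domain` may order or hash rows differently.

```python
import hashlib
import json


def snapshot_bytes(rows: list[dict]) -> bytes:
    """Serialize rows to JSONL with a stable row order and key order."""
    ordered = sorted(rows, key=lambda r: r["item_key"])
    lines = [json.dumps(r, sort_keys=True, ensure_ascii=False) for r in ordered]
    return ("\n".join(lines) + "\n").encode("utf-8")


def content_hash(rows: list[dict]) -> str:
    """SHA-256 hex digest of the deterministic JSONL payload."""
    return hashlib.sha256(snapshot_bytes(rows)).hexdigest()
```

With this scheme the same set of rows yields the same hash regardless of the order in which Grist returns them.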

## Human review

Edit rows directly in the Grist UI. Each row carries a `review_status`
(`pending` / `validated` / `rejected` / `needs_fix`); `export` ships only
`validated` rows. Pass `--include-pending` to `export` to also include
rows still awaiting review. See `docs/grist-native-views-recipe.md` and
`docs/grist-widget-setup.md` for the review surfaces.
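
The export gate described above reduces to a filter on `review_status`. This is a sketch, not the `export_domain` implementation: `rows_to_export` is a hypothetical name, and treating a missing or empty status as not exportable is an assumption.

```python
def rows_to_export(rows: list[dict], include_pending: bool = False) -> list[dict]:
    """Keep validated rows, plus pending ones when include_pending is set."""
    allowed = {"validated"}
    if include_pending:
        allowed.add("pending")
    # Rows without a review_status are excluded (assumption).
    return [r for r in rows if r.get("review_status") in allowed]
```

Rows marked `rejected` or `needs_fix` are excluded in both modes.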
48 changes: 48 additions & 0 deletions mascarade-eval/mascarade_eval/grist/__init__.py
@@ -0,0 +1,48 @@
# mascarade_eval/grist/__init__.py
"""Grist-backed dataset management for the mascarade training corpus.

Grist is the canonical source of truth. Mining ingests in insert-only
mode (human edits in Grist are never overwritten); training and HF
publication consume a deterministic export of human-validated rows.
"""
from pathlib import Path

GRIST_BASE = "https://grist.saillant.cc/api"

# Known existing docs. The training doc ID is provided at runtime via
# --doc or the GRIST_DOC_TRAINING env/file value.
DOC_HELDOUT = "eGbbrpzN3TeLq3sUd2YFA2" # ailiance-llm-workflow
DOC_MASCARADE = "dhyrySCayizD1PNqCNhCPN" # mascarade-data

KEY_FILE = Path.home() / ".config" / "electron-rare" / "grist.env"

TRAINING_TABLE = "Mascarade_Training"
REGISTRY_TABLE = "Datasets_Registry"
EXPORTS_TABLE = "Exports"

# Human-review columns appended to every validation-target table.
REVIEW_COLUMNS = ("review_status", "reviewer", "reviewed_at", "review_note")
REVIEW_STATUSES = ("pending", "validated", "rejected", "needs_fix")
REVIEWER_CHOICES = ("clems",)

# Existing tables that receive the review columns, keyed by doc ID.
REVIEW_TARGETS = {
    DOC_HELDOUT: ("Heldout_Items", "Datasets"),
    DOC_MASCARADE: ("Mascarade_Eval_Items", "Bench_31_domains"),
}

TRAINING_COLUMNS = (
    "item_key", "domain", "system", "user_msg", "assistant_msg",
    "extra_turns", "source", "notes",
) + REVIEW_COLUMNS
REGISTRY_COLUMNS = (
    "name", "family", "domain", "hf_dataset_id", "license",
    "n_items", "notes",
)
EXPORTS_COLUMNS = (
    "export_id", "domain", "created_at", "n_items", "content_hash",
    "output_file", "hf_dataset_id",
)

_ROOT = Path(__file__).resolve().parent.parent.parent # .../mascarade-eval
EXPORTS_DIR = _ROOT / "exports"
134 changes: 134 additions & 0 deletions mascarade-eval/mascarade_eval/grist/cli.py
@@ -0,0 +1,134 @@
# mascarade_eval/grist/cli.py
"""CLI for Grist-backed dataset management: ingest / export / migrate / publish / schema.

Run: python -m mascarade_eval.grist.cli <subcommand> [options]
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

from . import EXPORTS_DIR, TRAINING_COLUMNS, TRAINING_TABLE
from .client import GristClient, load_doc_id
from .export import export_domain
from .ingest import item_key, ingest_rows
from .migrate import flatten_messages, migrate_domain
from .publish import publish_snapshot


def build_parser() -> argparse.ArgumentParser:
    ap = argparse.ArgumentParser(prog="grist-dataset", description=__doc__)
    sub = ap.add_subparsers(dest="command", required=True)

    p_ing = sub.add_parser("ingest", help="insert-only ingest a .jsonl")
    p_ing.add_argument("--doc")
    p_ing.add_argument("--jsonl", required=True)
    p_ing.add_argument("--domain", required=True)
    p_ing.add_argument("--dry-run", action="store_true")

    p_exp = sub.add_parser("export", help="export a domain to a snapshot")
    p_exp.add_argument("--doc")
    p_exp.add_argument("--domain", required=True)
    p_exp.add_argument("--dry-run", action="store_true")
    p_exp.add_argument("--include-pending", action="store_true",
                       help="also export rows still pending review")

    p_mig = sub.add_parser("migrate", help="backfill a domain from HF")
    p_mig.add_argument("--doc")
    p_mig.add_argument("--domain", required=True)
    p_mig.add_argument("--dry-run", action="store_true")

    p_pub = sub.add_parser("publish", help="upload a snapshot to HF")
    p_pub.add_argument("--snapshot", required=True)
    p_pub.add_argument("--hf-dataset", required=True)
    p_pub.add_argument("--filename", required=True)

    sub.add_parser("schema", help="add review columns to existing tables")

    return ap


def resolve_doc(doc_arg: str | None) -> str:
    """Return the doc ID from --doc or the GRIST_DOC_TRAINING env/file value.

    Exits the program (sys.exit) if neither source provides a doc ID.
    """
    if doc_arg:
        return doc_arg
    doc = load_doc_id("GRIST_DOC_TRAINING")
    if not doc:
        sys.exit("no doc ID: pass --doc or set GRIST_DOC_TRAINING")
    return doc


def _ingest_jsonl_rows(domain: str, jsonl_path: str) -> list[dict]:
    try:
        text = Path(jsonl_path).read_text(encoding="utf-8")
    except FileNotFoundError:
        sys.exit(f"file not found: {jsonl_path}")
    except UnicodeDecodeError as exc:
        sys.exit(f"cannot decode {jsonl_path}: {exc}")
    rows: list[dict] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"[warn] skipped malformed line: {exc}", file=sys.stderr)
            continue
        flat = flatten_messages(record)
        rows.append({
            "item_key": item_key(domain, flat["user_msg"]),
            "domain": domain,
            "system": flat["system"],
            "user_msg": flat["user_msg"],
            "assistant_msg": flat["assistant_msg"],
            "extra_turns": flat["extra_turns"],
            "source": record.get("source", ""),
            "notes": "",
            "review_status": "pending",
        })
    return rows


def main(argv: list[str] | None = None) -> int:
    args = build_parser().parse_args(argv)

    if args.command == "publish":
        publish_snapshot(args.snapshot, args.hf_dataset, args.filename)
        print(f"published {args.snapshot} -> {args.hf_dataset}")
        return 0

    if args.command == "schema":
        from . import REVIEW_TARGETS
        from .schema import migrate_doc
        for doc_id, tables in REVIEW_TARGETS.items():
            doc_client = GristClient.from_env(doc_id)
            report = migrate_doc(doc_client, tables)
            print(f"schema {doc_id}: {report}")
        return 0

    client = GristClient.from_env(resolve_doc(args.doc))

    if args.command == "ingest":
        rows = _ingest_jsonl_rows(args.domain, args.jsonl)
        report = ingest_rows(client, TRAINING_TABLE, TRAINING_COLUMNS, rows,
                             dry_run=args.dry_run)
        print(f"ingest {args.domain}: {report}")
    elif args.command == "export":
        report = export_domain(client, args.domain, EXPORTS_DIR,
                               dry_run=args.dry_run,
                               include_pending=args.include_pending)
        print(f"export {args.domain}: {report}")
    elif args.command == "migrate":
        report = migrate_domain(client, args.domain, dry_run=args.dry_run)
        print(f"migrate {args.domain}: {report}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
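
The ingest path above depends on `flatten_messages`, which is imported from `.migrate` and not shown in this diff. As a rough illustration of the shape it returns, the sketch below flattens a chat record into the four columns used by `_ingest_jsonl_rows`. It is an assumption throughout: `flatten_messages_sketch` is a hypothetical name, the `{"messages": [{"role": ..., "content": ...}]}` input layout is assumed, and storing surplus turns as a JSON string is a guess at how `extra_turns` is encoded.

```python
import json


def flatten_messages_sketch(record: dict) -> dict:
    """Split a chat record into system / first user / first assistant / rest."""
    system = user_msg = assistant_msg = ""
    extra = []
    for msg in record.get("messages", []):
        role, content = msg.get("role"), msg.get("content", "")
        if role == "system" and not system and not user_msg:
            system = content
        elif role == "user" and not user_msg:
            user_msg = content
        elif role == "assistant" and user_msg and not assistant_msg:
            assistant_msg = content
        else:
            # Anything after the first exchange is kept verbatim.
            extra.append(msg)
    return {
        "system": system,
        "user_msg": user_msg,
        "assistant_msg": assistant_msg,
        "extra_turns": json.dumps(extra, ensure_ascii=False) if extra else "",
    }
```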