Grist dataset management: pipeline + human-review layer#14
Open
electron-rare wants to merge 24 commits into
Open
Grist dataset management: pipeline + human-review layer#14electron-rare wants to merge 24 commits into
electron-rare wants to merge 24 commits into
Conversation
Implement item_key (domain-prefixed SHA1), compute_delta (skips existing keys + dedupes within batch), and ingest_rows (ensure-table, fetch existing keys, insert delta only). dry_run=True computes without writing. Add FakeClient fixture in conftest.py for reuse in tasks 5/6/9.
canonical_jsonl sorts by item_key and uses sort_keys=True so the same Grist state always produces the same SHA256 digest. export_domain filters exclure rows, writes a hashed .jsonl snapshot, and journals one row to the Exports table. dry_run=True computes the report without any I/O.
export_domain now ships only rows with review_status=validated; --include-pending re-includes pending rows. Completes the exclure -> review_status amendment across the round-trip test and the package README, which the plan's affected-files list had missed.
There was a problem hiding this comment.
Pull request overview
Introduit un pipeline de gestion de dataset basé sur Grist comme source de vérité pour le corpus d’entraînement mascarade, avec une couche de revue humaine (colonnes de review + widget de validation) et un export déterministe journalisé/publishable vers HuggingFace.
Changes:
- Ajout d’un client Grist REST + CLI (
ingest/migrateHF→Grist /exportdéterministe /publishvers HF /schemamigration). - Mise en place d’un modèle de revue (
review_status,reviewer,reviewed_at,review_note) et d’un widget Grist “Review Console”. - Ajout d’une suite de tests unitaires couvrant client/migration/ingest/export/publish et transforms.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| mascarade-eval/widgets/review-console/index.html | Widget Grist custom pour valider/rejeter/needs_fix/skip en “one-at-a-time”. |
| mascarade-eval/tests/conftest.py | FakeClient in-memory pour tester le pipeline Grist sans réseau. |
| mascarade-eval/tests/test_grist_schema.py | Tests de l’ajout idempotent des colonnes de review. |
| mascarade-eval/tests/test_grist_roundtrip.py | Tests round-trip migrate→export + idempotence d’ingest. |
| mascarade-eval/tests/test_grist_publish.py | Tests de publish_snapshot (uploader injectable). |
| mascarade-eval/tests/test_grist_migrate_transforms.py | Tests des transforms flatten/rebuild messages. |
| mascarade-eval/tests/test_grist_migrate_domain.py | Tests de migrate_domain (ingest + registry). |
| mascarade-eval/tests/test_grist_ingest.py | Tests item_key/compute_delta/ingest_rows (insert-only). |
| mascarade-eval/tests/test_grist_export.py | Tests export déterministe + gating sur review_status + rollback on failure. |
| mascarade-eval/tests/test_grist_constants.py | Tests des constantes (tables, colonnes, review). |
| mascarade-eval/tests/test_grist_client.py | Tests du client REST (transport injectable, payloads). |
| mascarade-eval/tests/test_grist_cli.py | Tests parsing CLI + commande schema/include-pending. |
| mascarade-eval/mascarade_eval/grist/schema.py | Migration idempotente des colonnes de review sur tables existantes. |
| mascarade-eval/mascarade_eval/grist/README.md | Documentation d’usage de la CLI et du modèle review/export. |
| mascarade-eval/mascarade_eval/grist/publish.py | Upload d’un snapshot exporté vers un repo HF dataset. |
| mascarade-eval/mascarade_eval/grist/migrate.py | Backfill HF→Grist + transforms messages→colonnes éditables. |
| mascarade-eval/mascarade_eval/grist/ingest.py | Ingestion insert-only + delta via item_key. |
| mascarade-eval/mascarade_eval/grist/export.py | Export JSONL canonique, hashé, journalisé dans table Exports. |
| mascarade-eval/mascarade_eval/grist/client.py | Client REST Grist (tables/colonnes/records), types Choice/Int/Text. |
| mascarade-eval/mascarade_eval/grist/cli.py | Entrypoint CLI (ingest/export/migrate/publish/schema). |
| mascarade-eval/mascarade_eval/grist/init.py | Constantes (docs, tables, colonnes, review targets, exports dir). |
| mascarade-eval/docs/grist-widget-setup.md | Procédure d’hébergement/wiring/smoke test du widget. |
| mascarade-eval/docs/grist-native-views-recipe.md | Recette opérateur pour vues/formatting/form dans l’UI Grist. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| <div id="empty" hidden>Aucune ligne dans cette table.</div> | ||
| <script> | ||
| "use strict"; | ||
| const REVIEWER = "clems"; // adjust to the reviewer's Grist choice value |
| { name: "secondary", title: "Référence (reference / assistant_msg)", | ||
| optional: true }, | ||
| { name: "context", title: "Contexte (domain, source)", | ||
| optional: true, allowMultiple: true }, |
Comment on lines
+23
to
+27
| # Human-review columns appended to every validation-target table. | ||
| REVIEW_COLUMNS = ("review_status", "reviewer", "reviewed_at", "review_note") | ||
| REVIEW_STATUSES = ("pending", "validated", "rejected", "needs_fix") | ||
| REVIEWER_CHOICES = ("clems",) | ||
|
|
| """Inverse of flatten_messages: return {"messages": [...]}.""" | ||
| extra = row.get("extra_turns") or "" | ||
| if extra: | ||
| return {"messages": json.loads(extra)} |
Comment on lines
+36
to
+39
| client.ensure_table(table, columns) | ||
| existing = {r[key_field] for r in client.fetch_records(table) | ||
| if key_field in r} | ||
| delta = compute_delta(existing, rows, key_field) |
Comment on lines
+53
to
+55
| rows = [r for r in client.fetch_records(TRAINING_TABLE) | ||
| if r.get("domain") == domain | ||
| and _is_exportable(r, include_pending)] |
Comment on lines
+67
to
+94
| try: | ||
| text = Path(jsonl_path).read_text(encoding="utf-8") | ||
| except FileNotFoundError: | ||
| sys.exit(f"file not found: {jsonl_path}") | ||
| except UnicodeDecodeError as exc: | ||
| sys.exit(f"cannot decode {jsonl_path}: {exc}") | ||
| rows: list[dict] = [] | ||
| for line in text.splitlines(): | ||
| line = line.strip() | ||
| if not line: | ||
| continue | ||
| try: | ||
| record = json.loads(line) | ||
| except json.JSONDecodeError as exc: | ||
| print(f"[warn] skipped malformed line: {exc}", file=sys.stderr) | ||
| continue | ||
| flat = flatten_messages(record) | ||
| rows.append({ | ||
| "item_key": item_key(domain, flat["user_msg"]), | ||
| "domain": domain, | ||
| "system": flat["system"], | ||
| "user_msg": flat["user_msg"], | ||
| "assistant_msg": flat["assistant_msg"], | ||
| "extra_turns": flat["extra_turns"], | ||
| "source": record.get("source", ""), | ||
| "notes": "", | ||
| "review_status": "pending", | ||
| }) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brings the mascarade/ailiance training corpus under Grist as the
canonical source of truth, with a human-verification layer.
Phase 1 — pipeline (
mascarade_eval/grist/): a thin Grist RESTclient, insert-only ingestion (human edits are never overwritten),
HuggingFace backfill, deterministic hashed export journaled in an
Exportstable, snapshot publishing, and agrist-datasetCLI.Review layer — review columns (
review_status/reviewer/reviewed_at/review_note) on every validation-target table; anidempotent
schemamigration with column DDL on the client; atable-agnostic Grist custom widget (
widgets/review-console/) forone-at-a-time validate/reject; native bench views and a bench form
(operator recipes under
docs/).exportis now gated onreview_status— onlyvalidatedrows ship to HF,--include-pendingrelaxes it.
The boolean
exclurecolumn is replaced everywhere by thereview_statusenum.Test Plan
uv run python -m pytest)python -m mascarade_eval.grist.cli schemaagainst the live docsdocs/grist-native-views-recipe.mdin the Grist UIdocs/grist-widget-setup.mdSpecs:
docs/superpowers/specs/2026-05-19-grist-dataset-management-design.md,docs/superpowers/specs/2026-05-19-grist-review-layer-design.md.