Skip to content

Grist dataset management: pipeline + human-review layer#14

Open
electron-rare wants to merge 24 commits into
mainfrom
worktree-grist-dataset-mgmt
Open

Grist dataset management: pipeline + human-review layer#14
electron-rare wants to merge 24 commits into
mainfrom
worktree-grist-dataset-mgmt

Conversation

@electron-rare
Copy link
Copy Markdown
Collaborator

Summary

Brings the mascarade/ailiance training corpus under Grist as the
canonical source of truth, with a human-verification layer.

Phase 1 — pipeline (mascarade_eval/grist/): a thin Grist REST
client, insert-only ingestion (human edits are never overwritten),
HuggingFace backfill, deterministic hashed export journaled in an
Exports table, snapshot publishing, and a grist-dataset CLI.

Review layer — review columns (review_status / reviewer /
reviewed_at / review_note) on every validation-target table; an
idempotent schema migration with column DDL on the client; a
table-agnostic Grist custom widget (widgets/review-console/) for
one-at-a-time validate/reject; native bench views and a bench form
(operator recipes under docs/). export is now gated on
review_status — only validated rows ship to HF, --include-pending
relaxes it.

The boolean exclure column is replaced everywhere by the
review_status enum.

Test Plan

  • 102 tests pass (uv run python -m pytest)
  • Run python -m mascarade_eval.grist.cli schema against the live docs
  • Apply docs/grist-native-views-recipe.md in the Grist UI
  • Host the widget and run the smoke test in docs/grist-widget-setup.md

Specs: docs/superpowers/specs/2026-05-19-grist-dataset-management-design.md,
docs/superpowers/specs/2026-05-19-grist-review-layer-design.md.

Implement item_key (domain-prefixed SHA1), compute_delta
(skips existing keys + dedupes within batch), and ingest_rows
(ensure-table, fetch existing keys, insert delta only).
dry_run=True computes without writing.
Add FakeClient fixture in conftest.py for reuse in tasks 5/6/9.
canonical_jsonl sorts by item_key and uses sort_keys=True so
the same Grist state always produces the same SHA256 digest.
export_domain filters exclure rows, writes a hashed .jsonl
snapshot, and journals one row to the Exports table.
dry_run=True computes the report without any I/O.
export_domain now ships only rows with review_status=validated;
--include-pending re-includes pending rows. Completes the exclure ->
review_status amendment across the round-trip test and the package
README, which the plan's affected-files list had missed.
Copilot AI review requested due to automatic review settings May 19, 2026 11:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduit un pipeline de gestion de dataset basé sur Grist comme source de vérité pour le corpus d’entraînement mascarade, avec une couche de revue humaine (colonnes de review + widget de validation) et un export déterministe journalisé/publishable vers HuggingFace.

Changes:

  • Ajout d’un client Grist REST + CLI (ingest / migrate HF→Grist / export déterministe / publish vers HF / schema migration).
  • Mise en place d’un modèle de revue (review_status, reviewer, reviewed_at, review_note) et d’un widget Grist “Review Console”.
  • Ajout d’une suite de tests unitaires couvrant client/migration/ingest/export/publish et transforms.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
mascarade-eval/widgets/review-console/index.html Widget Grist custom pour valider/rejeter/needs_fix/skip en “one-at-a-time”.
mascarade-eval/tests/conftest.py FakeClient in-memory pour tester le pipeline Grist sans réseau.
mascarade-eval/tests/test_grist_schema.py Tests de l’ajout idempotent des colonnes de review.
mascarade-eval/tests/test_grist_roundtrip.py Tests round-trip migrate→export + idempotence d’ingest.
mascarade-eval/tests/test_grist_publish.py Tests de publish_snapshot (uploader injectable).
mascarade-eval/tests/test_grist_migrate_transforms.py Tests des transforms flatten/rebuild messages.
mascarade-eval/tests/test_grist_migrate_domain.py Tests de migrate_domain (ingest + registry).
mascarade-eval/tests/test_grist_ingest.py Tests item_key/compute_delta/ingest_rows (insert-only).
mascarade-eval/tests/test_grist_export.py Tests export déterministe + gating sur review_status + rollback on failure.
mascarade-eval/tests/test_grist_constants.py Tests des constantes (tables, colonnes, review).
mascarade-eval/tests/test_grist_client.py Tests du client REST (transport injectable, payloads).
mascarade-eval/tests/test_grist_cli.py Tests parsing CLI + commande schema/include-pending.
mascarade-eval/mascarade_eval/grist/schema.py Migration idempotente des colonnes de review sur tables existantes.
mascarade-eval/mascarade_eval/grist/README.md Documentation d’usage de la CLI et du modèle review/export.
mascarade-eval/mascarade_eval/grist/publish.py Upload d’un snapshot exporté vers un repo HF dataset.
mascarade-eval/mascarade_eval/grist/migrate.py Backfill HF→Grist + transforms messages→colonnes éditables.
mascarade-eval/mascarade_eval/grist/ingest.py Ingestion insert-only + delta via item_key.
mascarade-eval/mascarade_eval/grist/export.py Export JSONL canonique, hashé, journalisé dans table Exports.
mascarade-eval/mascarade_eval/grist/client.py Client REST Grist (tables/colonnes/records), types Choice/Int/Text.
mascarade-eval/mascarade_eval/grist/cli.py Entrypoint CLI (ingest/export/migrate/publish/schema).
mascarade-eval/mascarade_eval/grist/init.py Constantes (docs, tables, colonnes, review targets, exports dir).
mascarade-eval/docs/grist-widget-setup.md Procédure d’hébergement/wiring/smoke test du widget.
mascarade-eval/docs/grist-native-views-recipe.md Recette opérateur pour vues/formatting/form dans l’UI Grist.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

<div id="empty" hidden>Aucune ligne dans cette table.</div>
<script>
"use strict";
const REVIEWER = "clems"; // adjust to the reviewer's Grist choice value
{ name: "secondary", title: "Référence (reference / assistant_msg)",
optional: true },
{ name: "context", title: "Contexte (domain, source)",
optional: true, allowMultiple: true },
Comment on lines +23 to +27
# Human-review columns appended to every validation-target table.
REVIEW_COLUMNS = ("review_status", "reviewer", "reviewed_at", "review_note")
REVIEW_STATUSES = ("pending", "validated", "rejected", "needs_fix")
REVIEWER_CHOICES = ("clems",)

"""Inverse of flatten_messages: return {"messages": [...]}."""
extra = row.get("extra_turns") or ""
if extra:
return {"messages": json.loads(extra)}
Comment on lines +36 to +39
client.ensure_table(table, columns)
existing = {r[key_field] for r in client.fetch_records(table)
if key_field in r}
delta = compute_delta(existing, rows, key_field)
Comment on lines +53 to +55
rows = [r for r in client.fetch_records(TRAINING_TABLE)
if r.get("domain") == domain
and _is_exportable(r, include_pending)]
Comment on lines +67 to +94
try:
text = Path(jsonl_path).read_text(encoding="utf-8")
except FileNotFoundError:
sys.exit(f"file not found: {jsonl_path}")
except UnicodeDecodeError as exc:
sys.exit(f"cannot decode {jsonl_path}: {exc}")
rows: list[dict] = []
for line in text.splitlines():
line = line.strip()
if not line:
continue
try:
record = json.loads(line)
except json.JSONDecodeError as exc:
print(f"[warn] skipped malformed line: {exc}", file=sys.stderr)
continue
flat = flatten_messages(record)
rows.append({
"item_key": item_key(domain, flat["user_msg"]),
"domain": domain,
"system": flat["system"],
"user_msg": flat["user_msg"],
"assistant_msg": flat["assistant_msg"],
"extra_turns": flat["extra_turns"],
"source": record.get("source", ""),
"notes": "",
"review_status": "pending",
})
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants