diff --git a/packages/vector-caliper/baselines/DOGFOOD-NOTES-2026-06-09.md b/packages/vector-caliper/baselines/DOGFOOD-NOTES-2026-06-09.md new file mode 100644 index 0000000..4bc08e9 --- /dev/null +++ b/packages/vector-caliper/baselines/DOGFOOD-NOTES-2026-06-09.md @@ -0,0 +1,48 @@ +# Dogfood notes — first real-data run (qwen-lora-tallow-fen-v1, 2026-06-09) + +Friction and findings from feeding VectorCaliper its first production training run. +Input for the next working session; nothing here has been patched upstream yet. + +## 1. Determinism guarantee is not actually byte-deterministic +`src/projection/engine.ts` — the PCA power-iteration seeds eigenvector init with raw +`Math.random()` while the seeded Mulberry32 (`createSeededRandom`) sits unused in the +same file. PCA usually converges to the same components up to sign, but the README's +determinism promise ("deterministic, reproducible rendering") is not guaranteed at the +byte level. Fix: thread the seeded RNG into `pca()`. + +## 2. Raw Node ESM cannot consume the package +`tsc` emits the source's extensionless/directory relative imports verbatim +(`from './schema'`, `from './types/state'`); Node ESM rejects both +(`ERR_MODULE_NOT_FOUND` for extensionless files, `ERR_UNSUPPORTED_DIR_IMPORT` for directories). Rendering required patching 32 dist files to append +`.js` / `/index.js`. Fix: `moduleResolution: "NodeNext"` + explicit `.js` extensions +in source imports. Related: no `dist/` ships and `files` only includes `dist/` — a +consumer must build from source with devDeps. + +## 3. Naming/metadata drift +- README installs `@mcp-tool-shop/vector-caliper`; package.json says + `@mcptoolshop/vector-caliper` (and `"private": true` — not published at all). +- `repository.url` points at `mcp-tool-shop-org/VectorCaliper.git`; the source lives + in `mcp-tool-shop-org/prototypes`. + +## 4. API fit for diffusion/LoRA training runs +The schema REQUIRES `uncertainty.{entropy, margin, calibration}` — natural for +classifiers, nonexistent for diffusion LoRA runs.
This baseline used documented +proxies (entropy of the normalized centroid-similarity distribution; style-vs-photo +text-anchor contrast gap as margin; similarity std as calibration). Options: +make the uncertainty group optional like `dynamics`, or ship a domain preset +("diffusion-style-lora") that defines blessed proxies so cross-run baselines stay +comparable. + +## 5. The demo bypasses the product +`demo/canonical-demo.ts` hand-rolls its SVG and uses a flat ad-hoc JSON, bypassing +ProjectionEngine/SemanticMapper/SceneBuilder/SVGRenderer entirely — so the checked-in +canonical output exercises none of the public pipeline. This baseline's SVG is, as far +as the dogfood could tell, the first artifact rendered through the real pipeline. + +## 6. What worked +Zero-dep pure-TS core imported cleanly once dist was patched; all 8 states passed +`createModelState` validation on the first attempt (the capture script pre-clamped +its [0,1] proxies specifically because the factories fail closed — the contract +shaped the producer, which is the point of a strict schema); budget classes were a +non-issue at n=8; the semantic encoding (hue←effdim, radius←spread) makes the +step-2000 cloud collapse visible in the SVG without reading any numbers. diff --git a/packages/vector-caliper/baselines/README.md b/packages/vector-caliper/baselines/README.md new file mode 100644 index 0000000..d888894 --- /dev/null +++ b/packages/vector-caliper/baselines/README.md @@ -0,0 +1,28 @@ +# Baselines + +Real measured trajectories from production training runs, shaped to the +`createModelState()` contract. These are VectorCaliper's ground truth for the +"establish baselines → predict/hypothesize" roadmap: once several runs are in, +early-trajectory geometry (e.g. spread-collapse rate by step 500) can be tested +as a predictor of where the binding peak lands. 
+ +## qwen-lora-tallow-fen-v1 (2026-06-09) — first real-data baseline + +A Qwen-Image rank-16 style LoRA (`tallow_fen_style_v1`, RTX 5090, 2000 steps, +8 checkpoints). Per checkpoint: a fixed 12-prompt eval grid was generated and the +CLIP ViT-B/32 embedding cloud measured. Field mapping and uncertainty PROXIES are +documented in the capture script docstring (a diffusion-LoRA run has no native +classifier entropy/margin/ECE — see dogfood notes #4). + +**What this baseline demonstrates** (the headline for the tool's thesis): +between steps 1750→2000, `performance.accuracy` (CLIP-sim to the style centroid) +ROSE 0.7796→0.7937 while `geometry.anisotropy` spiked 8.2→12.5 and +`geometry.effectiveDimension` collapsed 7.01→6.76. The similarity gain came from a +collapsing, less diverse embedding cloud — overfit masquerading as improvement. +Performance-only checkpoint selection picks step 2000; geometry+performance picks +step 1250 (also the CMMD minimum, 0.1351, and the checkpoint picked by human visual inspection, which +saw the same overfit as monochrome drift on neutral subjects).
**The combined view +caught what the single metric missed.** + +- `qwen-lora-tallow-fen-v1.json` — 8 states (capture: `E:/AI/training/_caliper_capture.py` on the rig) +- `qwen-lora-tallow-fen-v1.svg` — rendered through the real pipeline (ProjectionEngine → SceneBuilder → SVGRenderer) diff --git a/packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.json b/packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.json new file mode 100644 index 0000000..a277fd8 --- /dev/null +++ b/packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.json @@ -0,0 +1,235 @@ +{ + "run": "tallow_fen_style_v1", + "captured": "2026-06-09", + "reference": { + "dir": "E:\\AI\\training\\dataset_tallow_fen", + "n": 44, + "sigma": 0.47900169753130234 + }, + "states": [ + { + "id": "tallow-fen-v1-step-250", + "time": 250, + "geometry": { + "effectiveDimension": 7.757477402078029, + "anisotropy": 6.621865643989259, + "spread": 0.8430308699607849, + "density": 7.102524389014206 + }, + "uncertainty": { + "entropy": 3.576477527618408, + "margin": 0.01452073804102838, + "calibration": 0.08244698494672775 + }, + "performance": { + "accuracy": 0.7647979855537415, + "loss": 0.16690856218338013 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": "tallow-fen-v1-step-500", + "time": 500, + "geometry": { + "effectiveDimension": 7.180756051976849, + "anisotropy": 7.760998726659996, + "spread": 0.8325328826904297, + "density": 7.181163759334916 + }, + "uncertainty": { + "entropy": 3.575670003890991, + "margin": 0.025575989857316017, + "calibration": 0.08706717193126678 + }, + "performance": { + "accuracy": 0.7735676169395447, + "loss": 0.15337598323822021 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": 
"tallow-fen-v1-step-750", + "time": 750, + "geometry": { + "effectiveDimension": 7.409353421390496, + "anisotropy": 6.979115957964436, + "spread": 0.8266791701316833, + "density": 7.201104059214372 + }, + "uncertainty": { + "entropy": 3.5770695209503174, + "margin": 0.03614329965785146, + "calibration": 0.08109613507986069 + }, + "performance": { + "accuracy": 0.7801888585090637, + "loss": 0.14684104919433594 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": "tallow-fen-v1-step-1000", + "time": 1000, + "geometry": { + "effectiveDimension": 7.3610166758567885, + "anisotropy": 7.320895825606809, + "spread": 0.8148157596588135, + "density": 7.246805784660056 + }, + "uncertainty": { + "entropy": 3.577988862991333, + "margin": 0.041199419647455215, + "calibration": 0.07699479907751083 + }, + "performance": { + "accuracy": 0.788444459438324, + "loss": 0.13987720012664795 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": "tallow-fen-v1-step-1250", + "time": 1250, + "geometry": { + "effectiveDimension": 7.075301788220325, + "anisotropy": 8.279761391066161, + "spread": 0.8146253824234009, + "density": 7.2864497225930664 + }, + "uncertainty": { + "entropy": 3.5780458450317383, + "margin": 0.045075961388647556, + "calibration": 0.07691206783056259 + }, + "performance": { + "accuracy": 0.7905473709106445, + "loss": 0.13513541221618652 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": "tallow-fen-v1-step-1500", + "time": 1500, + "geometry": { + "effectiveDimension": 7.096816902083409, + "anisotropy": 8.518901948452722, + "spread": 0.8217833638191223, + 
"density": 7.262251305894052 + }, + "uncertainty": { + "entropy": 3.5785281658172607, + "margin": 0.04449977073818445, + "calibration": 0.07387224584817886 + }, + "performance": { + "accuracy": 0.7842926979064941, + "loss": 0.1435713768005371 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": "tallow-fen-v1-step-1750", + "time": 1750, + "geometry": { + "effectiveDimension": 7.008293973017701, + "anisotropy": 8.21347886570257, + "spread": 0.8244690895080566, + "density": 7.25062711021022 + }, + "uncertainty": { + "entropy": 3.5780160427093506, + "margin": 0.04056647885590792, + "calibration": 0.07635637372732162 + }, + "performance": { + "accuracy": 0.7795696258544922, + "loss": 0.15124398469924927 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + }, + { + "id": "tallow-fen-v1-step-2000", + "time": 2000, + "geometry": { + "effectiveDimension": 6.761863905951503, + "anisotropy": 12.496390278327958, + "spread": 0.7998718023300171, + "density": 7.395438826698832 + }, + "uncertainty": { + "entropy": 3.5797951221466064, + "margin": 0.051343273371458054, + "calibration": 0.0672067403793335 + }, + "performance": { + "accuracy": 0.7937332987785339, + "loss": 0.14313781261444092 + }, + "metadata": { + "source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)", + "version": "1.0.0", + "tags": [ + "n=12", + "clip-vit-b32", + "proxy-uncertainty" + ] + } + } + ] +} \ No newline at end of file diff --git a/packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.svg b/packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.svg new file mode 100644 index 0000000..305e436 --- /dev/null +++ b/packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.svg @@ -0,0 +1,23 @@ + + + + + + + + + + + + + + + + + + + + + + 
+ \ No newline at end of file
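
Reviewer sketch (not part of the patch): the baselines README says performance-only selection picks step 2000 while geometry+performance picks step 1250, but it does not state the selection rule. The snippet below is one possible rule, offered only as an illustration. The `Checkpoint` shape, the helper names, and the 25% anisotropy-jump threshold are all assumptions, not VectorCaliper API. The numbers are rounded copies of `performance.accuracy` and `geometry.anisotropy` from `qwen-lora-tallow-fen-v1.json`.

```typescript
// Hypothetical shape for this sketch only; not the createModelState() contract.
interface Checkpoint {
  step: number;
  accuracy: number;   // performance.accuracy (CLIP-sim to the style centroid)
  anisotropy: number; // geometry.anisotropy
}

// Rounded values from qwen-lora-tallow-fen-v1.json.
const checkpoints: Checkpoint[] = [
  { step: 250, accuracy: 0.7648, anisotropy: 6.622 },
  { step: 500, accuracy: 0.7736, anisotropy: 7.761 },
  { step: 750, accuracy: 0.7802, anisotropy: 6.979 },
  { step: 1000, accuracy: 0.7884, anisotropy: 7.321 },
  { step: 1250, accuracy: 0.7905, anisotropy: 8.28 },
  { step: 1500, accuracy: 0.7843, anisotropy: 8.519 },
  { step: 1750, accuracy: 0.7796, anisotropy: 8.213 },
  { step: 2000, accuracy: 0.7937, anisotropy: 12.496 },
];

// Performance-only selection: highest accuracy wins.
function bestByAccuracy(cps: Checkpoint[]): Checkpoint {
  return cps.reduce((best, cp) => (cp.accuracy > best.accuracy ? cp : best));
}

// Assumed geometry gate: drop any checkpoint whose anisotropy rose by more
// than maxRelativeJump relative to the previous checkpoint in the run.
function dropAnisotropySpikes(
  cps: Checkpoint[],
  maxRelativeJump: number,
): Checkpoint[] {
  return cps.filter((cp, i) => {
    if (i === 0) return true; // no predecessor to compare against
    const prev = cps[i - 1].anisotropy;
    return (cp.anisotropy - prev) / prev <= maxRelativeJump;
  });
}

const performanceOnly = bestByAccuracy(checkpoints);
// 0.25 is an arbitrary illustrative threshold, not a tuned value.
const geometryAware = bestByAccuracy(dropAnisotropySpikes(checkpoints, 0.25));

console.log(`performance-only: step ${performanceOnly.step}`);
console.log(`geometry-gated:   step ${geometryAware.step}`);
```

With this data the gate removes only step 2000 (anisotropy up roughly 52% over step 1750), so the two rules disagree in the way the README describes. A real rule would likely need thresholds calibrated across several baselines rather than one run.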