Closed
48 changes: 48 additions & 0 deletions packages/vector-caliper/baselines/DOGFOOD-NOTES-2026-06-09.md
@@ -0,0 +1,48 @@
# Dogfood notes — first real-data run (qwen-lora-tallow-fen-v1, 2026-06-09)

Friction and findings from feeding VectorCaliper its first production training run.
Input for the next working session; nothing here has been patched upstream yet.

## 1. Determinism guarantee is not actually byte-deterministic
`src/projection/engine.ts`: the PCA power iteration initializes its eigenvector guesses with raw
`Math.random()`, while the seeded Mulberry32 generator (`createSeededRandom`) sits unused in the
same file. Power iteration usually converges to the same components, but only up to sign, and
the sign depends on where the random start vector landed. The README's determinism promise
("deterministic, reproducible rendering") therefore does not hold at the byte level.
Fix: thread the seeded RNG into `pca()`.
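A minimal sketch of the fix, assuming nothing about the real `createSeededRandom` signature: it inlines the standard Mulberry32 generator and a hypothetical `topEigenvector` helper (not the engine's code) that draws its start vector from the seeded stream and runs a fixed number of iterations. The final sign convention goes one step beyond threading the RNG; it makes the component independent of which side the start vector happened to fall on.

```typescript
// Standard Mulberry32 PRNG: a 32-bit seed in, floats in [0, 1) out.
export function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return (): number => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper (not engine.ts): dominant eigenvector of a symmetric
// covariance matrix by power iteration. The start vector comes from the seeded
// `rand`, and a fixed iteration count keeps the float operations identical run to run.
export function topEigenvector(cov: number[][], rand: () => number, iterations = 200): number[] {
  const n = cov.length;
  let v = Array.from({ length: n }, () => rand() - 0.5);
  for (let it = 0; it < iterations; it++) {
    const current = v;
    const w = cov.map((row) => row.reduce((sum, c, j) => sum + c * current[j], 0));
    const norm = Math.sqrt(w.reduce((sum, x) => sum + x * x, 0));
    if (norm === 0) break; // degenerate matrix or start vector
    v = w.map((x) => x / norm);
  }
  // Optional sign convention: flip so the largest-magnitude entry is positive.
  let k = 0;
  for (let i = 1; i < n; i++) if (Math.abs(v[i]) > Math.abs(v[k])) k = i;
  return v[k] < 0 ? v.map((x) => -x) : v;
}
```

Deflation for the second and later components is left out; the same seeded `rand` would be passed through each call so the whole projection is reproducible from one seed.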

## 2. Raw Node ESM cannot consume the package
`tsc` emits the source's extensionless and directory relative imports verbatim
(`from './schema'`, `from './types/state'`), and Node ESM rejects both forms
(`ERR_MODULE_NOT_FOUND` for an extensionless file, `ERR_UNSUPPORTED_DIR_IMPORT` for a
directory). Rendering required patching 32 dist files to append `.js` or `/index.js`.
Fix: `"module": "NodeNext"` plus `"moduleResolution": "NodeNext"`, with explicit `.js`
extensions in source imports. Related: no `dist/` ships, yet `files` includes only `dist/`,
so a consumer must build from source with devDependencies installed.
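Until the `NodeNext` change lands, the 32-file dist patch could be scripted instead of applied by hand. A minimal sketch: `fixSpecifier` and `rewriteImports` are hypothetical names, only static `from './x'` clauses are handled (dynamic `import()` and bare side-effect imports are left alone), and `probe` is injected so the decision logic stays testable. A real run would back `probe` with `node:fs` `statSync`, resolving each candidate relative to the dist file being patched.

```typescript
type ProbeResult = "file" | "dir" | "none";

// Static `from '<relative>'` clauses as tsc emits them (import and export-from).
const RELATIVE_FROM = /(from\s+["'])(\.{1,2}\/[^"']*)(["'])/g;

// Decide how one relative specifier should be rewritten for Node ESM.
export function fixSpecifier(spec: string, probe: (candidate: string) => ProbeResult): string {
  if (/\.(?:m?js|cjs|json)$/.test(spec)) return spec; // already explicit
  if (probe(`${spec}.js`) === "file") return `${spec}.js`; // './schema' -> './schema.js'
  if (probe(spec) === "dir" && probe(`${spec}/index.js`) === "file") {
    return `${spec}/index.js`; // './types' -> './types/index.js'
  }
  return spec; // unknown target: leave untouched rather than guess
}

// Rewrite every matching specifier in one emitted .js file's source text.
export function rewriteImports(source: string, probe: (candidate: string) => ProbeResult): string {
  return source.replace(
    RELATIVE_FROM,
    (_match: string, pre: string, spec: string, post: string) => pre + fixSpecifier(spec, probe) + post,
  );
}
```

Leaving unresolvable specifiers untouched means a bad patch fails loudly in Node instead of silently pointing at the wrong module.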

## 3. Naming/metadata drift
- The README installs `@mcp-tool-shop/vector-caliper`, but package.json names the package
  `@mcptoolshop/vector-caliper` and sets `"private": true`, so it is not published under either name.
- `repository.url` points at `mcp-tool-shop-org/VectorCaliper.git`; the source lives
in `mcp-tool-shop-org/prototypes`.
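A drift check along these lines could run in CI to catch all three mismatches. It is a sketch under assumptions: the README is taken to show an `npm install <name>` command, `metadataDrift` is a hypothetical helper, and the expected repository slug is passed in by hand.

```typescript
// Subset of package.json fields the check reads.
interface PackageMeta {
  name: string;
  private?: boolean;
  repository?: string | { url?: string };
}

// Hypothetical CI helper: list README / package.json / repository mismatches.
export function metadataDrift(readme: string, pkg: PackageMeta, expectedRepoSlug?: string): string[] {
  const problems: string[] = [];
  // Captures the package in `npm install [flags] <name>` or `npm i <name>`.
  const installRe = /npm\s+(?:install|i)\s+(?:-\S+\s+)*((?:@[\w.-]+\/)?\w[\w.-]*)/g;
  let m: RegExpExecArray | null;
  while ((m = installRe.exec(readme)) !== null) {
    if (m[1] !== pkg.name) problems.push(`README installs ${m[1]}, but package.json name is ${pkg.name}`);
  }
  if (pkg.private === true) {
    problems.push(`package.json sets "private": true, so ${pkg.name} cannot be published`);
  }
  const url = typeof pkg.repository === "string" ? pkg.repository : pkg.repository?.url;
  if (expectedRepoSlug && url && !url.includes(expectedRepoSlug)) {
    problems.push(`repository.url ${url} does not point at ${expectedRepoSlug}`);
  }
  return problems;
}
```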

## 4. API fit for diffusion/LoRA training runs
The schema REQUIRES `uncertainty.{entropy, margin, calibration}`, which is natural for
classifiers and nonexistent for diffusion LoRA runs. This baseline used documented
proxies: the entropy of the normalized centroid-similarity distribution as entropy, the
style-vs-photo text-anchor contrast gap as margin, and the standard deviation of the
similarities as calibration. Options: make the uncertainty group optional, as `dynamics`
already is, or ship a domain preset ("diffusion-style-lora") that defines blessed proxies
so cross-run baselines stay comparable.
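A sketch of what a preset's blessed proxies might pin down. The exact definitions live in the capture script, so everything here is an assumption: `sims` are per-image cosine similarities to the style centroid, "normalized" means divided by their sum, entropy is in bits, margin is mean style-anchor similarity minus mean photo-anchor similarity, and calibration is the population standard deviation. The recorded entropies (3.5757 to 3.5798) sit just under log2(12) ≈ 3.585, which is consistent with, but does not prove, a base-2 entropy over the 12 eval images. All function names are hypothetical.

```typescript
export interface UncertaintyProxy {
  entropy: number;
  margin: number;
  calibration: number;
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Shannon entropy (bits) of the similarities after normalizing them to sum to 1.
export function entropyBits(sims: number[]): number {
  if (sims.some((s) => s < 0)) throw new Error("proxy assumes non-negative similarities");
  const total = sims.reduce((a, b) => a + b, 0);
  if (!(total > 0)) throw new Error("similarities must sum to a positive value");
  let h = 0;
  for (const s of sims) {
    const p = s / total;
    if (p > 0) h -= p * Math.log2(p);
  }
  return h;
}

// Population standard deviation (divides by n, not n - 1).
export function populationStd(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) * (x - m))));
}

// Assumed proxy definitions; a preset would fix these so runs stay comparable.
export function uncertaintyProxies(
  centroidSims: number[],
  styleAnchorSims: number[],
  photoAnchorSims: number[],
): UncertaintyProxy {
  return {
    entropy: entropyBits(centroidSims),
    margin: mean(styleAnchorSims) - mean(photoAnchorSims),
    calibration: populationStd(centroidSims),
  };
}
```

Whichever definitions win, the value of a preset is that the log base, normalization, and std denominator stop being per-producer choices.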

## 5. The demo bypasses the product
`demo/canonical-demo.ts` hand-rolls its SVG and uses a flat, ad-hoc JSON format, bypassing
ProjectionEngine/SemanticMapper/SceneBuilder/SVGRenderer entirely, so the checked-in
canonical output exercises none of the public pipeline. As far as this dogfood pass could
tell, this baseline's SVG is the first artifact rendered through the real pipeline.

## 6. What worked
- The zero-dependency, pure-TS core imported cleanly once `dist` was patched.
- All 8 states passed `createModelState` validation on the first attempt. The capture
  script pre-clamped its [0,1] proxies specifically because the factories fail closed: the
  contract shaped the producer, which is the point of a strict schema.
- Budget classes were a non-issue at n=8.
- The semantic encoding (hue←effdim, radius←spread) makes the step-2000 cloud collapse
  visible in the SVG without reading any numbers.
28 changes: 28 additions & 0 deletions packages/vector-caliper/baselines/README.md
@@ -0,0 +1,28 @@
# Baselines

Real measured trajectories from production training runs, shaped to the
`createModelState()` contract. These are VectorCaliper's ground truth for the
"establish baselines → predict/hypothesize" roadmap: once several runs are in,
early-trajectory geometry (e.g. spread-collapse rate by step 500) can be tested
as a predictor of where the binding peak lands.
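As one illustration of such an early-trajectory feature, the spread-collapse rate could be taken as the least-squares slope of `geometry.spread` against `time` over the checkpoints up to step 500. That definition and the `spreadCollapseRate` name are assumptions for illustration, not part of VectorCaliper. A negative value means the cloud is shrinking.

```typescript
// Minimal slice of a baseline state; only the fields this sketch reads.
interface SpreadPoint {
  time: number;
  geometry: { spread: number };
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Least-squares slope of geometry.spread vs. time for checkpoints with time <= untilStep.
export function spreadCollapseRate(states: SpreadPoint[], untilStep = 500): number {
  const pts = states.filter((s) => s.time <= untilStep);
  if (pts.length < 2) throw new Error("need at least two checkpoints in the window");
  const mt = mean(pts.map((s) => s.time));
  const ms = mean(pts.map((s) => s.geometry.spread));
  let num = 0;
  let den = 0;
  for (const s of pts) {
    num += (s.time - mt) * (s.geometry.spread - ms);
    den += (s.time - mt) * (s.time - mt);
  }
  if (den === 0) throw new Error("all checkpoints share one time value");
  return num / den;
}
```

Fed the `states` array of `qwen-lora-tallow-fen-v1.json`, only steps 250 and 500 fall in the window, so the slope reduces to (0.83253 - 0.84303) / 250, about -4.2e-5 spread units per step. With several runs in, this one number per run is what would be regressed against the peak location.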

## qwen-lora-tallow-fen-v1 (2026-06-09) — first real-data baseline

A Qwen-Image rank-16 style LoRA (`tallow_fen_style_v1`, RTX 5090, 2000 steps,
8 checkpoints). Per checkpoint: a fixed 12-prompt eval grid was generated and the
CLIP ViT-B/32 embedding cloud measured. Field mapping and uncertainty PROXIES are
documented in the capture script docstring (a diffusion-LoRA run has no native
classifier entropy/margin/ECE — see dogfood notes #4).

**What this baseline demonstrates** (the headline for the tool's thesis):
between steps 1750→2000, `performance.accuracy` (CLIP similarity to the style centroid)
rose 0.7796→0.7937 while `geometry.anisotropy` spiked 8.2→12.5 and
`geometry.effectiveDimension` collapsed 7.0→6.76. The similarity gain came from a
collapsing, less diverse embedding cloud: overfit masquerading as improvement.
Performance-only checkpoint selection picks step 2000; geometry+performance picks
step 1250, which is also the CMMD minimum (0.1351) and the checkpoint a human reviewer
chose by eye. That reviewer saw the same overfit as monochrome drift on neutral subjects.
**The combined view caught what the single metric missed.**
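The two single-metric picks can be checked directly against the JSON. A sketch with a hypothetical `pickBest` helper; it reads `performance.loss` as the CMMD score, which is an inference (the step-1250 loss, 0.13514, matches the 0.1351 CMMD minimum quoted above), and it does not attempt the combined geometry+performance rule, which this README does not spell out.

```typescript
// Minimal slice of a baseline state; only the fields this check reads.
interface Checkpoint {
  time: number;
  performance: { accuracy: number; loss: number };
}

// Return the checkpoint with the highest score (the earlier one wins on ties).
export function pickBest(states: Checkpoint[], score: (s: Checkpoint) => number): Checkpoint {
  if (states.length === 0) throw new Error("no checkpoints");
  return states.reduce((best, s) => (score(s) > score(best) ? s : best));
}

// (time, accuracy, loss) copied from qwen-lora-tallow-fen-v1.json.
const rows: [number, number, number][] = [
  [250, 0.7647979855537415, 0.16690856218338013],
  [500, 0.7735676169395447, 0.15337598323822021],
  [750, 0.7801888585090637, 0.14684104919433594],
  [1000, 0.788444459438324, 0.13987720012664795],
  [1250, 0.7905473709106445, 0.13513541221618652],
  [1500, 0.7842926979064941, 0.1435713768005371],
  [1750, 0.7795696258544922, 0.15124398469924927],
  [2000, 0.7937332987785339, 0.14313781261444092],
];
export const states: Checkpoint[] = rows.map(([time, accuracy, loss]) => ({
  time,
  performance: { accuracy, loss },
}));

export const byAccuracy = pickBest(states, (s) => s.performance.accuracy); // time 2000
export const byLoss = pickBest(states, (s) => -s.performance.loss); // time 1250 (loss assumed to be CMMD)
```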

- `qwen-lora-tallow-fen-v1.json` — 8 states (capture: `E:/AI/training/_caliper_capture.py` on the rig)
- `qwen-lora-tallow-fen-v1.svg` — rendered through the real pipeline (ProjectionEngine → SceneBuilder → SVGRenderer)
235 changes: 235 additions & 0 deletions packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.json
@@ -0,0 +1,235 @@
{
"run": "tallow_fen_style_v1",
"captured": "2026-06-09",
"reference": {
"dir": "E:\\AI\\training\\dataset_tallow_fen",
"n": 44,
"sigma": 0.47900169753130234
},
"states": [
{
"id": "tallow-fen-v1-step-250",
"time": 250,
"geometry": {
"effectiveDimension": 7.757477402078029,
"anisotropy": 6.621865643989259,
"spread": 0.8430308699607849,
"density": 7.102524389014206
},
"uncertainty": {
"entropy": 3.576477527618408,
"margin": 0.01452073804102838,
"calibration": 0.08244698494672775
},
"performance": {
"accuracy": 0.7647979855537415,
"loss": 0.16690856218338013
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-500",
"time": 500,
"geometry": {
"effectiveDimension": 7.180756051976849,
"anisotropy": 7.760998726659996,
"spread": 0.8325328826904297,
"density": 7.181163759334916
},
"uncertainty": {
"entropy": 3.575670003890991,
"margin": 0.025575989857316017,
"calibration": 0.08706717193126678
},
"performance": {
"accuracy": 0.7735676169395447,
"loss": 0.15337598323822021
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-750",
"time": 750,
"geometry": {
"effectiveDimension": 7.409353421390496,
"anisotropy": 6.979115957964436,
"spread": 0.8266791701316833,
"density": 7.201104059214372
},
"uncertainty": {
"entropy": 3.5770695209503174,
"margin": 0.03614329965785146,
"calibration": 0.08109613507986069
},
"performance": {
"accuracy": 0.7801888585090637,
"loss": 0.14684104919433594
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-1000",
"time": 1000,
"geometry": {
"effectiveDimension": 7.3610166758567885,
"anisotropy": 7.320895825606809,
"spread": 0.8148157596588135,
"density": 7.246805784660056
},
"uncertainty": {
"entropy": 3.577988862991333,
"margin": 0.041199419647455215,
"calibration": 0.07699479907751083
},
"performance": {
"accuracy": 0.788444459438324,
"loss": 0.13987720012664795
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-1250",
"time": 1250,
"geometry": {
"effectiveDimension": 7.075301788220325,
"anisotropy": 8.279761391066161,
"spread": 0.8146253824234009,
"density": 7.2864497225930664
},
"uncertainty": {
"entropy": 3.5780458450317383,
"margin": 0.045075961388647556,
"calibration": 0.07691206783056259
},
"performance": {
"accuracy": 0.7905473709106445,
"loss": 0.13513541221618652
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-1500",
"time": 1500,
"geometry": {
"effectiveDimension": 7.096816902083409,
"anisotropy": 8.518901948452722,
"spread": 0.8217833638191223,
"density": 7.262251305894052
},
"uncertainty": {
"entropy": 3.5785281658172607,
"margin": 0.04449977073818445,
"calibration": 0.07387224584817886
},
"performance": {
"accuracy": 0.7842926979064941,
"loss": 0.1435713768005371
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-1750",
"time": 1750,
"geometry": {
"effectiveDimension": 7.008293973017701,
"anisotropy": 8.21347886570257,
"spread": 0.8244690895080566,
"density": 7.25062711021022
},
"uncertainty": {
"entropy": 3.5780160427093506,
"margin": 0.04056647885590792,
"calibration": 0.07635637372732162
},
"performance": {
"accuracy": 0.7795696258544922,
"loss": 0.15124398469924927
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
},
{
"id": "tallow-fen-v1-step-2000",
"time": 2000,
"geometry": {
"effectiveDimension": 6.761863905951503,
"anisotropy": 12.496390278327958,
"spread": 0.7998718023300171,
"density": 7.395438826698832
},
"uncertainty": {
"entropy": 3.5797951221466064,
"margin": 0.051343273371458054,
"calibration": 0.0672067403793335
},
"performance": {
"accuracy": 0.7937332987785339,
"loss": 0.14313781261444092
},
"metadata": {
"source": "tallow_fen_style_v1 (Qwen-Image LoRA, RTX 5090)",
"version": "1.0.0",
"tags": [
"n=12",
"clip-vit-b32",
"proxy-uncertainty"
]
}
}
]
}
23 changes: 23 additions & 0 deletions packages/vector-caliper/baselines/qwen-lora-tallow-fen-v1.svg