vision-pipeline

CLI-first industrial visual anomaly / defect detection. Fit on the good parts → score the defects → eval-against-baseline → export to ONNX with verified parity → benchmark CPU latency, all driven by one machine-readable binary and tracked in MLflow. Built on the ml-pipeline-template house style.

Most ML demos are a notebook that works once on the author's laptop. This is the opposite: a real anomaly-detection pipeline that runs the same way in CI, in an agent loop, and on a fresh checkout — with the production bits (ONNX export, latency benchmark) actually wired and verified, not promised.

Two paths, honestly labelled

Path	What	Status
Verified CPU	reconstruction-error anomaly detector (PCA / IsolationForest) on built-in `load_digits`, ONNX export + parity, CPU latency benchmark	✅ runs in CI, on a laptop, in seconds
Scaffolded GPU	deep MVTec-AD model (PaDiM/PatchCore-style or autoencoder) on rented GPU via Modal	🟡 wired (`configs/mvtec.yaml`, `scripts/modal_train.py`), not CI-verified

The CPU path is genuinely a defensible CV anomaly pipeline. The deep path shows the same operational shell scaling to a real benchmark dataset on rented hardware, without pretending CI trained a deep model it didn't.

The model (verified path)

Industrial defect detection has many good parts and few/no labelled defects, so you fit only on "normal" data and score deviation from it. Here one digit class is the normal part; every other digit is a defect.

PCA — fit a subspace on normal images; anomaly score is the squared reconstruction error. Defects don't lie on the normal-part manifold, so they reconstruct poorly. (The linear stand-in for a deep autoencoder.)
IsolationForest — isolation-depth anomaly score, no reconstruction.
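Written out, the PCA score is the squared distance between an image and its projection onto the fitted subspace. The notation below is not from the source: x is a flattened image, μ is the mean of the normal training images, and W is the matrix whose k orthonormal columns are the leading principal components (k playing the role of n_components).

```latex
\hat{x} = \mu + W W^{\top} (x - \mu), \qquad
s(x) = \lVert x - \hat{x} \rVert_2^2
```

The implementation may normalize differently (for example, a mean rather than a sum over pixels); that rescales the score but leaves the ranking, and therefore ROC-AUC, unchanged.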

Evaluation is ROC-AUC + average precision vs a random-score baseline — because a metric without a baseline is marketing, not evaluation.
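To make "vs a random-score baseline" concrete, here is a minimal stdlib-only sketch of the idea. It is not the repository's evaluation code: the function names `roc_auc` and `lift_over_random` and the toy data are hypothetical, and the real pipeline presumably relies on a library metric.

```python
import random

def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random anomaly outscores a random normal.

    labels: 1 for anomaly, 0 for normal. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def lift_over_random(labels, scores, seed=0):
    """Score the model and a seeded random scorer on the same labels."""
    rng = random.Random(seed)
    baseline_scores = [rng.random() for _ in labels]
    model_auc = roc_auc(labels, scores)
    baseline_auc = roc_auc(labels, baseline_scores)
    return model_auc, baseline_auc, model_auc - baseline_auc

# Toy data (hypothetical): three normals, five anomalies, perfectly separated.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.15, 0.9, 0.8, 0.7, 0.95, 0.6]
model_auc, baseline_auc, lift = lift_over_random(labels, scores)
```

A random scorer lands near 0.5 ROC-AUC on a large eval set, while its average precision sits near the anomaly prevalence. That is why the average-precision baseline in the train output is so high: 1619 of the 1691 eval images are anomalies (about 0.957).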

Quickstart

uv sync --extra dev                       # install (CPU-only deps)
uv run vp doctor                          # environment readiness (--json for CI)
uv run vp train configs/digits.yaml       # fit + eval vs baseline + log to MLflow
uv run vp eval digits-pca                 # recompute metrics on the holdout
uv run vp export digits-pca               # -> ONNX, assert parity with sklearn
uv run vp bench digits-pca --n 200        # onnxruntime CPU latency (p50/p95)
uv run vp infer digits-pca --index 1      # score one image

The block above carries a test marker: CI runs these exact commands on every push, so this quickstart cannot silently drift from the code.

make demo runs the full verified loop. Output of train (human mode):

trained digits-pca (pca, normal class 0)
  ROC-AUC 1.0  (baseline 0.4895, lift +0.5105)
  avg precision 1.0  (baseline 0.9602)
  fit on 106 normal · eval 1691 (1619 anomalies)
  model -> artifacts/digits-pca/model.joblib

Digit 0 vs the rest is an easy anomaly task — PCA reconstruction separates it perfectly. Harder normal classes are still strong but not perfect (e.g. normal_class: 8 gives ROC-AUC ~0.97 with PCA, ~0.93 with IsolationForest). The verified contract is "model ≫ random baseline, parity holds", not a magic number. Switch normal_class/model in the config to see the range.

export (parity is asserted — a failed check is a non-zero exit):

exported digits-pca (pca) -> artifacts/digits-pca/model.onnx (2559 bytes)
  parity ok: max abs diff 0.000e+00 <= tol 1e-03 over 64 samples

(For model: isoforest the ONNX graph computes the full tree-ensemble score, so parity is a more interesting ~6e-08 rather than exact.)
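The parity gate reduces to a max-absolute-difference comparison whose result becomes the exit code. A minimal sketch, assuming both runtimes have already produced one score per sample; `check_parity` and `main` are hypothetical names, not the repository's functions, and the score lists here are made up.

```python
def check_parity(reference, candidate, tol=1e-3):
    """Return (ok, max_abs_diff) for two equally long lists of scores."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("need two non-empty score lists of equal length")
    diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return diff <= tol, diff

def main(reference, candidate, tol=1e-3):
    """Print a one-line report and return a process exit code."""
    ok, diff = check_parity(reference, candidate, tol)
    relation = "<=" if ok else ">"
    label = "ok" if ok else "FAILED"
    print(
        f"parity {label}: max abs diff {diff:.3e} {relation} "
        f"tol {tol:.0e} over {len(reference)} samples"
    )
    return 0 if ok else 1

# A real entry point would pass this value to sys.exit().
exit_code = main([0.25, 1.5, 3.0], [0.25, 1.5, 3.0])
```

Returning the code instead of raising keeps the check usable both from tests and from a CLI wrapper that exits with it.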

bench (onnxruntime, batch 1):

benchmarked digits-pca (pca) onnxruntime CPU, batch 1, n=200
  latency p50 0.0028 ms · p95 0.0031 ms · mean 0.0028 ms
  throughput ~351571 img/s

Exact numbers vary by machine; the shape (model ≫ baseline, parity within tolerance, sub-millisecond CPU latency) is what's verified in CI.
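For readers who want to see how p50/p95 and throughput relate, here is a stdlib-only sketch of a batch-1 latency benchmark. It times an arbitrary callable rather than an onnxruntime session; the `benchmark` helper, the warmup count, and the percentile method are assumptions, not the repository's implementation.

```python
import statistics
import time

def benchmark(fn, n=200, warmup=10):
    """Time n single calls of fn and summarize latency in milliseconds."""
    for _ in range(warmup):  # discard first calls (caches, lazy init)
        fn()
    samples_ms = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(samples_ms)
    return {
        "n": n,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "mean_ms": mean_ms,
        # Batch 1, so throughput is the inverse of the mean latency.
        "throughput_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }

# Stand-in workload; a real run would call the inference session here.
result = benchmark(lambda: sum(range(100)))
```

The throughput line in the output above is consistent with this relation: a mean of roughly 0.0028 ms per image corresponds to about 350,000 images per second.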

CLI surface

vp doctor [--json]                                  # is this environment ready?
vp train  <config> [--out] [--json]                 # fit, eval vs baseline, log to MLflow
vp eval   <name> [--config] [--out] [--json]        # recompute metrics on the holdout
vp export <name> [--out] [--json]                   # -> ONNX, verify sklearn parity
vp bench  <name> [--n] [--out] [--json]             # onnxruntime CPU latency (p50/p95)
vp infer  <name> [--index] [--out] [--json]         # score one dataset image
vp gpu-train <config> [--launch] [--json]           # scaffolded deep MVTec path (Modal)
vp version [--json]

Switch models in configs/digits.yaml (model: pca or model: isoforest, plus normal_class, n_components, contamination, seed).

The scaffolded GPU path

uv sync --extra gpu                                 # heavy deps: torch/timm/modal
modal token new                                     # one-time auth
modal run scripts/modal_train.py --config configs/mvtec.yaml

vp gpu-train configs/mvtec.yaml validates the config and either prints the launch command or fails cleanly when the gpu extra is absent. The Modal script is real, coherent code (frozen timm backbone → patch features → per-patch Gaussian → Mahalanobis scoring) with NotImplementedError where the licensed MVTec-AD download is required — it is not run in CI.
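The per-patch Gaussian and Mahalanobis steps follow the PaDiM-style recipe. As a sketch in notation that is not taken from the source: for patch position (i, j), μ_ij and Σ_ij are the mean and covariance of the backbone features of normal training images at that position, and x_ij is the test image's feature vector there.

```latex
M(x_{ij}) = \sqrt{(x_{ij} - \mu_{ij})^{\top} \, \Sigma_{ij}^{-1} \, (x_{ij} - \mu_{ij})}
```

An image-level score is commonly taken as the maximum of M over all patch positions; whether scripts/modal_train.py aggregates this way is an assumption here.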

Tracking UI (optional)

make up                                             # MLflow on localhost:5050
export MLFLOW_TRACKING_URI=http://localhost:5050

The CLI works without it (falls back to local sqlite:///mlflow.db).

Notebooks (marimo)

uv run marimo edit notebooks/01_pr_curve.py         # PR/ROC curves vs baseline
uv run marimo edit notebooks/02_samples.py          # normal vs anomaly images

What's verified

Check	Status
`vp train` / `eval` beats random baseline	✅ verified (asserted in tests + CI)
`vp export` ONNX parity with sklearn	✅ verified (PCA + IsolationForest)
`vp bench` onnxruntime CPU latency	✅ verified
Full train→eval→export→bench→infer loop	✅ verified (pytest + CI)
`pytest` smoke suite + ruff in CI	✅ verified
MLflow local sqlite store	✅ verified
MLflow server via docker-compose	🟡 compose provided, runs locally
Deep MVTec-AD model via Modal (`gpu-train`)	🟡 scaffolded, not CI-verified

Agent-friendly by design

Every command is non-interactive, takes --json, and uses load-bearing exit codes — so Codex, Claude Code, Cursor, Copilot, and friends can drive the whole train→eval→export→bench→infer loop with no TTY, no UI, no running service: parse stdout, branch on the exit code.

uv run vp train configs/digits.yaml --json
# -> {"ok": true, "name": "digits-pca", "metrics": {"roc_auc": 1.0, "lift_roc_auc": 0.5105, ...}, ...}
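The "parse stdout, branch on the exit code" loop can be sketched in a few lines of Python. This is an illustration, not code from AGENTS.md: `interpret` and `run_step` are hypothetical helpers, and only the JSON keys visible in the sample output above (`ok`, `name`, `metrics.lift_roc_auc`) are assumed to exist.

```python
import json
import subprocess

def interpret(returncode, stdout):
    """Turn an exit code plus JSON stdout into a decision for the caller."""
    if returncode != 0:
        return {"proceed": False, "reason": f"exit code {returncode}"}
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return {"proceed": False, "reason": f"unparseable stdout: {exc}"}
    if not isinstance(payload, dict) or not payload.get("ok", False):
        return {"proceed": False, "reason": "payload did not report ok"}
    lift = payload.get("metrics", {}).get("lift_roc_auc")
    return {"proceed": True, "name": payload.get("name"), "lift_roc_auc": lift}

def run_step(argv):
    """Run one CLI step without a TTY and interpret its result."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return interpret(proc.returncode, proc.stdout)

# Offline demonstration using the fields shown in the sample output.
sample = '{"ok": true, "name": "digits-pca", "metrics": {"roc_auc": 1.0, "lift_roc_auc": 0.5105}}'
decision = interpret(0, sample)
```

In an agent loop the call would look like `run_step(["uv", "run", "vp", "train", "configs/digits.yaml", "--json"])`, continuing to the export step only when `decision["proceed"]` is true.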

Agent instructions live in AGENTS.md — the cross-tool standard; CLAUDE.md is a symlink to it.

CI does more than lint

Most repos' CI checks that the code parses. This one checks that the pipeline works — three things beyond lint + tests, all stdlib, no extra deps:

It runs the pipeline and publishes the numbers. Every push fits the detector and posts a live metrics table (ROC-AUC, average precision, baseline, lift) to the GitHub Actions run summary (scripts/ci_report.py). The numbers in CI are produced on that commit, not pasted by hand.
It keeps the docs honest. The Quickstart block carries a test marker, and scripts/test_readme.py runs those exact commands in CI. Docs that drift from the code fail the build.
It proves determinism. scripts/check_repro.py trains twice and asserts identical metrics — a seed is a promise, and CI verifies the promise holds.
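The determinism check is simple enough to sketch. The version below is not scripts/check_repro.py: `train_and_eval` is a hypothetical stand-in that derives made-up "metrics" from a seeded random generator, where the real script would fit the detector twice.

```python
import random

def train_and_eval(seed):
    """Hypothetical stand-in for one seeded train + eval run."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(100)]
    return {
        "mean_score": round(sum(scores) / len(scores), 6),
        "max_score": round(max(scores), 6),
    }

def check_repro(seed=0):
    """Run twice with the same seed and fail loudly on any metric drift."""
    first = train_and_eval(seed)
    second = train_and_eval(seed)
    if first != second:
        diffs = {k: (first[k], second[k]) for k in first if first[k] != second[k]}
        raise AssertionError(f"metrics differ across identical runs: {diffs}")
    return first

metrics = check_repro(seed=0)
```

Comparing for exact equality rather than within a tolerance is deliberate in this sketch: with a fixed seed and the same environment, any difference at all points to an unseeded source of randomness.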

Run them locally too: make summary, make readme, make repro.

License

Apache-2.0.
