Releases: totalslacker/switchcraft

v0.1.0 — first public release

04 May 15:51

First public release of Switchcraft — a Swift port of Witchcraft (Apache 2.0) that runs the XTR-Warp algorithm natively on Apple platforms.

The Phase 1 MVP is feature-complete; see CHANGELOG.md for the full list of components (tokenizer, embedder, indexer, hybrid search engine, storage, public API).

What's in this release

Pre-built model assets converted from the google/xtr-base-en checkpoint (Apache 2.0). Two embedder backends are available — pick one based on your size/quality trade-off:

CoreML embedder assets (T5CoreMLEmbedder, SwitchcraftCoreML)

  • xtr-base-en.mlpackage.zip (~196 MB) — FP32 CoreML encoder + 768→128 projection. Unzip into the directory you'll pass to T5CoreMLEmbedder. Set SWITCHCRAFT_XTR_MLPACKAGE to the unzipped directory.
  • xtr-base-en.tokenizer.json (~2.4 MB) — HuggingFace tokenizer.json consumed by the Swift tokenizer port. Used by both embedder backends.

The CoreML backend is the production default per ADR 010. It gives maximum precision (mean cosine ≥ 0.999 vs the PyTorch FP32 reference), and its NDCG@10 on NFCorpus falls in the [0.31, 0.33] Witchcraft band.

Metal embedder asset (T5MetalEmbedder, SwitchcraftMetal) — Phase 2

  • xtr-base-en.q4_k.gguf (~62 MB) — Q4_K-quantised GGUF of the same encoder + 768→128 projection, consumed by T5MetalEmbedder. Set SWITCHCRAFT_XTR_GGUF to this file.

This is the Phase 2 alternative (umbrella issue #57): ~7× smaller on disk, with faster inference via custom Metal kernels (Q4KMatMul, RMSNorm, Softmax, FP32MatMul, GatedGELU, L2Norm, ResidualAdd). It is cross-stack-validated against the Witchcraft Q4K reference: maxAbs = 0.000216, minCosine = 0.9999996. NDCG@10 = 0.336 on NFCorpus, which is inside the Metal-specific [0.31, 0.34] band per ADR 014; Metal runs FP32 throughout, so it lands slightly above ggml's mixed-precision 0.33 ceiling.
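For readers who want to reproduce figures like maxAbs and minCosine on their own embeddings, here is a minimal Python sketch. It assumes maxAbs means the largest absolute element-wise difference and minCosine means the smallest per-vector cosine similarity between two equally shaped sets of embeddings; the function names are illustrative and this is not the project's actual test harness.

```python
import math

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two sets of vectors.

    Assumes a and b have the same shape (same number of vectors, same width).
    """
    return max(abs(x - y) for va, vb in zip(a, b) for x, y in zip(va, vb))

def min_cosine(a, b):
    """Smallest cosine similarity over corresponding vector pairs.

    Assumes no vector has zero norm; a zero vector would divide by zero.
    """
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm
    return min(cosine(u, v) for u, v in zip(a, b))
```

Taking the minimum rather than the mean cosine makes the check sensitive to a single badly drifting token vector, which an average over many vectors could hide.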

Reader accepts GGUF v2 and v3 (per ADR 016). The asset bundled here is v3.
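To confirm which GGUF version a downloaded file declares before handing it to the reader, a short Python sketch follows. It relies on the GGUF header layout (the 4-byte magic `GGUF` followed by a 32-bit version field) and assumes a little-endian file; the helper names are hypothetical and unrelated to Switchcraft's Swift reader.

```python
import struct

GGUF_MAGIC = b"GGUF"
SUPPORTED_VERSIONS = (2, 3)  # versions this release's reader is stated to accept

def read_gguf_version(path):
    """Return the version field from a GGUF file header.

    Assumes little-endian encoding for the version field.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version

def is_supported(version):
    """True if the version is one the reader is documented to accept."""
    return version in SUPPORTED_VERSIONS
```

Calling `read_gguf_version` on the path held in SWITCHCRAFT_XTR_GGUF should report 3 for the asset bundled here.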

Mixing rule

The two embedders use different modelIdentifier values (google/xtr-base-en@v1 for CoreML, google/xtr-base-en@v1+gguf for Metal). A single store is locked to whichever variant indexed it — switching backends requires re-embedding the corpus. See ADR 010(f).
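The locking rule can be pictured with a small Python sketch. This is only an illustration of the idea under stated assumptions: `check_store_compatibility` and `ModelMismatchError` are hypothetical names, and Switchcraft's real store API is not shown in these notes. Only the two identifier strings come from the release.

```python
COREML_ID = "google/xtr-base-en@v1"
METAL_ID = "google/xtr-base-en@v1+gguf"

class ModelMismatchError(Exception):
    """Raised when an embedder does not match the variant that indexed a store."""

def check_store_compatibility(store_model_id, embedder_model_id):
    """Return the identifier the store is (or becomes) locked to.

    A store with no recorded identifier (None) is treated as empty and is
    locked to the first embedder that indexes it. Otherwise the identifiers
    must match exactly, or the corpus has to be re-embedded.
    """
    if store_model_id is None:
        return embedder_model_id
    if store_model_id != embedder_model_id:
        raise ModelMismatchError(
            f"store was indexed with {store_model_id!r}, "
            f"but the embedder reports {embedder_model_id!r}; re-embed the corpus"
        )
    return store_model_id
```

Comparing identifiers as exact strings keeps the check strict: the `+gguf` suffix alone is enough to separate the two variants.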

Provenance

Both model assets are derived from google/xtr-base-en at HuggingFace revision f40cd399e67dfc8ec974e922ad828610e3c83a36. The CoreML asset's parity-validated PyTorch reference fixtures live at Tests/Fixtures/xtr-base-en.embeddings.{bin,json}. The Metal asset's cross-stack reference (Witchcraft Q4K) lives at Tests/Fixtures/reference_embeddings.{bin,json}. See ADR 010, ADR 014, ADR 016, ADR 017.

If you want to rebuild the assets yourself:

  • CoreML: scripts/convert-xtr-to-coreml.py + scripts/requirements-coreml.txt (Python 3.11 — torch==2.2.2 has no wheels for 3.13+).
  • Metal: see docs/porting/ggml-t5.md §"GGUF acquisition pipeline" for the Witchcraft quantize-tool recipe (note the linear.weight FP32 carve-out per ADR 017).

Asset checksums

0c88511ddf48f207196b66867f059b83302b082c967c6aa58168779fec1095df  xtr-base-en.mlpackage.zip
cc4a3ccc9ec82cc881b110116cc63a459b99f153f5f36955ff93c0138851fbc1  xtr-base-en.q4_k.gguf
8bc42ea1a5408c9dc21791c63501d35c5a40b2d12d186424b76755c9b71480cc  xtr-base-en.tokenizer.json
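A downloaded asset can be checked against the list above with a few lines of Python. The sketch assumes the digests are SHA-256 (they are 64 hex characters, in the layout `sha256sum` prints) and that each file sits in the current directory under its release name; the helper names are illustrative.

```python
import hashlib

# Digests copied from the release notes above (assumed to be SHA-256).
EXPECTED_SHA256 = {
    "xtr-base-en.mlpackage.zip": "0c88511ddf48f207196b66867f059b83302b082c967c6aa58168779fec1095df",
    "xtr-base-en.q4_k.gguf": "cc4a3ccc9ec82cc881b110116cc63a459b99f153f5f36955ff93c0138851fbc1",
    "xtr-base-en.tokenizer.json": "8bc42ea1a5408c9dc21791c63501d35c5a40b2d12d186424b76755c9b71480cc",
}

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large assets are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """True if the file's SHA-256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("xtr-base-en.q4_k.gguf", EXPECTED_SHA256["xtr-base-en.q4_k.gguf"])` returns True only for an intact download.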

Attribution

Switchcraft uses the model architecture and weights from google/xtr-base-en (Apache 2.0). The algorithm is a port of Witchcraft (Apache 2.0, Dropbox). See NOTICE for full third-party attribution.