Releases: totalslacker/switchcraft
v0.1.0 — first public release
First public release of Switchcraft — a Swift port of Witchcraft (Apache 2.0) that runs the XTR-Warp algorithm natively on Apple platforms.
Phase 1 MVP feature-complete; see CHANGELOG.md for the full list of components (tokenizer, embedder, indexer, hybrid search engine, storage, public API).
What's in this release
Pre-built model assets converted from the google/xtr-base-en checkpoint (Apache 2.0). Two embedder backends are available — pick one based on your size/quality trade-off:
CoreML embedder assets (T5CoreMLEmbedder, SwitchcraftCoreML)
- xtr-base-en.mlpackage.zip (~196 MB): FP32 CoreML encoder + 768→128 projection. Unzip into the directory you'll pass to T5CoreMLEmbedder. Set SWITCHCRAFT_XTR_MLPACKAGE to the unzipped directory.
- xtr-base-en.tokenizer.json (~2.4 MB): HuggingFace tokenizer.json consumed by the Swift tokenizer port. Used by both embedder backends.
Production-default per ADR 010. Maximum precision (mean cosine ≥ 0.999 vs PyTorch FP32 reference). NDCG@10 in the [0.31, 0.33] Witchcraft band on NFCorpus.
Metal embedder asset (T5MetalEmbedder, SwitchcraftMetal) — Phase 2
- xtr-base-en.q4_k.gguf (~62 MB): Q4_K-quantised GGUF of the same encoder + 768→128 projection, consumed by T5MetalEmbedder. Set SWITCHCRAFT_XTR_GGUF to this file.
Phase 2 alternative (umbrella issue #57). ~7× smaller on disk, faster inference via custom Metal kernels (Q4KMatMul, RMSNorm, Softmax, FP32MatMul, GatedGELU, L2Norm, ResidualAdd). Cross-stack-validated against the Witchcraft Q4K reference: maxAbs = 0.000216, minCosine = 0.9999996. NDCG@10 = 0.336 on NFCorpus (in the Metal-specific [0.31, 0.34] band per ADR 014; Metal runs FP32 throughout so it lands slightly above ggml's mixed-precision 0.33 ceiling).
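The maxAbs and minCosine figures above summarize how closely one set of embeddings tracks a reference set. The sketch below shows one plausible way to compute them, assuming maxAbs is the largest element-wise absolute difference and minCosine is the lowest per-vector cosine similarity. The function names are hypothetical and this is not the project's actual validation harness.

```python
import math

def max_abs_diff(candidate, reference):
    """Largest element-wise absolute difference between two embedding sets."""
    return max(
        abs(x - y)
        for cand_vec, ref_vec in zip(candidate, reference)
        for x, y in zip(cand_vec, ref_vec)
    )

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def min_cosine(candidate, reference):
    """Worst-case (lowest) cosine similarity over paired vectors."""
    return min(cosine(u, v) for u, v in zip(candidate, reference))
```

Reporting the minimum cosine rather than the mean makes the check sensitive to a single badly drifting vector, which an average could hide.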
Reader accepts GGUF v2 and v3 (per ADR 016). The asset bundled here is v3.
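To check which GGUF version a file declares before handing it to a reader, the header can be inspected directly. A GGUF file starts with the 4-byte magic `GGUF` followed by a 32-bit version number. The minimal sketch below assumes a little-endian file; the helper names are hypothetical and are not part of Switchcraft.

```python
import struct

GGUF_MAGIC = b"GGUF"
SUPPORTED_VERSIONS = (2, 3)  # the versions these release notes say the reader accepts

def read_gguf_version(path):
    """Return the version declared in a little-endian GGUF header."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version

def is_supported_gguf(path):
    """True if the file declares a version in SUPPORTED_VERSIONS."""
    return read_gguf_version(path) in SUPPORTED_VERSIONS
```

For example, `read_gguf_version("xtr-base-en.q4_k.gguf")` should report version 3 for the asset bundled here, given the statement above.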
Mixing rule
The two embedders use different modelIdentifier values (google/xtr-base-en@v1 for CoreML, google/xtr-base-en@v1+gguf for Metal). A single store is locked to whichever variant indexed it — switching backends requires re-embedding the corpus. See ADR 010(f).
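The locking rule can be illustrated with a small guard. This is a language-neutral sketch written in Python, not Switchcraft's Swift API: `ensure_compatible` and `ModelMismatchError` are hypothetical names, and only the two identifier strings come from the release notes.

```python
COREML_MODEL_ID = "google/xtr-base-en@v1"
METAL_MODEL_ID = "google/xtr-base-en@v1+gguf"

class ModelMismatchError(Exception):
    """Raised when an embedder does not match the variant that indexed a store."""

def ensure_compatible(store_model_id, embedder_model_id):
    """Return the identifier the store is locked to, or raise on a mismatch.

    A store with no recorded identifier (None) is treated as fresh and is
    locked to the embedder that first indexes it.
    """
    if store_model_id is None:
        return embedder_model_id
    if store_model_id != embedder_model_id:
        raise ModelMismatchError(
            f"store was indexed with {store_model_id!r}; "
            f"re-embed the corpus to use {embedder_model_id!r}"
        )
    return store_model_id
```

Comparing the full identifier string, including the `+gguf` suffix, is what keeps vectors from the two backends from being mixed in one index.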
Provenance
Both assets are derived from google/xtr-base-en at HuggingFace revision f40cd399e67dfc8ec974e922ad828610e3c83a36. The CoreML asset's parity-validated PyTorch reference fixtures live at Tests/Fixtures/xtr-base-en.embeddings.{bin,json}. The Metal asset's cross-stack reference (Witchcraft Q4K) lives at Tests/Fixtures/reference_embeddings.{bin,json}. See ADR 010, ADR 014, ADR 016, ADR 017.
If you want to rebuild the assets yourself:
- CoreML: scripts/convert-xtr-to-coreml.py + scripts/requirements-coreml.txt (Python 3.11; torch==2.2.2 has no wheels for 3.13+).
- Metal: see docs/porting/ggml-t5.md §"GGUF acquisition pipeline" for the Witchcraft quantize-tool recipe (note the linear.weight FP32 carve-out per ADR 017).
Asset checksums
0c88511ddf48f207196b66867f059b83302b082c967c6aa58168779fec1095df xtr-base-en.mlpackage.zip
cc4a3ccc9ec82cc881b110116cc63a459b99f153f5f36955ff93c0138851fbc1 xtr-base-en.q4_k.gguf
8bc42ea1a5408c9dc21791c63501d35c5a40b2d12d186424b76755c9b71480cc xtr-base-en.tokenizer.json
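Downloaded assets can be checked against the digests above before use. The sketch below assumes the digests are SHA-256 (they are 64 hex characters long) and that all three files sit in one directory; `verify_assets` is a hypothetical helper, not something shipped with Switchcraft.

```python
import hashlib
from pathlib import Path

# Digests copied from the list above; SHA-256 is an assumption based on their length.
EXPECTED = {
    "xtr-base-en.mlpackage.zip": "0c88511ddf48f207196b66867f059b83302b082c967c6aa58168779fec1095df",
    "xtr-base-en.q4_k.gguf": "cc4a3ccc9ec82cc881b110116cc63a459b99f153f5f36955ff93c0138851fbc1",
    "xtr-base-en.tokenizer.json": "8bc42ea1a5408c9dc21791c63501d35c5a40b2d12d186424b76755c9b71480cc",
}

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large assets are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_assets(directory):
    """Map each asset name to True if it exists in `directory` and its digest matches."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == expected
    return results
```

Calling `verify_assets(".")` in the download directory returns a dictionary of per-file results; a missing or corrupted file shows up as `False`.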
Attribution
Switchcraft uses model architecture and weights from google/xtr-base-en (Apache 2.0). Algorithm is a port of Witchcraft (Apache 2.0, Dropbox). See NOTICE for full third-party attribution.