stanford-futuredata/ColBERT should remain a stable, canonical reference implementation of late interaction, especially for newcomers and the GPU-poor*. Folks looking for the bleeding edge of late interaction should check out the fantastic lightonai/PyLate library.
*stanford-futuredata/ColBERT's coupled model-index design enables batched encoding with immediate compression, keeping memory usage under 5 GB even for multi-million-document collections.
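The memory behavior in the footnote can be illustrated with a minimal sketch (hypothetical names, not ColBERT's actual code): documents are encoded in batches and each batch is compressed immediately, so the full float32 embedding matrix never has to exist in memory at once.

```python
import numpy as np

def encode_batch(docs, dim=128):
    """Stand-in for the model's encoder: one vector per document.
    (Real ColBERT emits many token vectors per document; one vector
    per document keeps the sketch small.)"""
    rng = np.random.default_rng(len(docs))
    return rng.standard_normal((len(docs), dim)).astype(np.float32)

def compress(batch):
    """Toy compression: scale each vector into [0, 255] and store as
    uint8, a 4x reduction versus float32. (ColBERT's residual
    compression is more sophisticated; this only shows the pattern.)"""
    lo = batch.min(axis=1, keepdims=True)
    hi = batch.max(axis=1, keepdims=True)
    scaled = (batch - lo) / np.maximum(hi - lo, 1e-9)
    return (scaled * 255).round().astype(np.uint8)

def index_collection(docs, batch_size=1000):
    """Encode in batches and compress each batch immediately, so only
    one float32 batch is resident at any time."""
    compressed = []
    for start in range(0, len(docs), batch_size):
        batch = encode_batch(docs[start:start + batch_size])
        compressed.append(compress(batch))  # float batch freed after this
    return np.concatenate(compressed)

codes = index_collection([f"doc {i}" for i in range(5000)])
print(codes.shape, codes.dtype)  # (5000, 128) uint8
```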
- Upgrade PyTorch to 2.x
- Upgrade transformers (remove deprecated AdamW)
- Replace faiss with fastkmeans
- Test Python 3.9-3.12 compatibility
- Resolve crypt.h/ninja errors
- Merge distributed training fix in #258 (potentially related: #132 and #233)
- Replace git-python with GitPython in PyPI (already changed in repo in commit 736f88b)
- Update documentation
- Investigate issues:
  - Resuming training from a checkpoint (#307)
  - Allowing string pids (#326)
  - Batch-size handling options to resolve OOM during search
- etc.
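On the transformers upgrade above: `transformers.AdamW` has been deprecated for removal, and `torch.optim.AdamW` is the usual drop-in replacement. A minimal sketch of the swap (the tiny model and hyperparameters are illustrative, not ColBERT's training setup):

```python
import torch
# Old (deprecated in transformers):
#   from transformers import AdamW
# New: PyTorch's built-in implementation.
from torch.optim import AdamW

model = torch.nn.Linear(8, 2)  # stand-in for the ColBERT model
optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

# One training step to show the optimizer works as before.
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```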
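For the OOM-during-search item: one option worth exploring is retrying a failed batch with a halved batch size rather than crashing the whole search. A framework-free sketch of that logic (the simulated capacity and all names here are illustrative, not ColBERT's API):

```python
def run_batch(queries, simulated_capacity=64):
    """Stand-in for a GPU search call: 'runs out of memory' above a
    simulated capacity so the retry logic can be demonstrated."""
    if len(queries) > simulated_capacity:
        raise RuntimeError("CUDA out of memory (simulated)")
    return [f"results for {q}" for q in queries]

def search_all(queries, batch_size=256):
    """Process queries in batches, halving the batch size on OOM
    instead of aborting the whole search."""
    results, start = [], 0
    while start < len(queries):
        try:
            results.extend(run_batch(queries[start:start + batch_size]))
            start += batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e) or batch_size == 1:
                raise
            batch_size //= 2  # retry the same span with a smaller batch
    return results

out = search_all([f"q{i}" for i in range(300)])
print(len(out))  # 300
```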