stanford-futuredata/ColBERT Roadmap

Guiding Philosophy

stanford-futuredata/ColBERT should remain a stable, canonical reference implementation of late interaction, especially for newcomers to late interaction and the GPU-poor*. Folks looking for the bleeding edge of late interaction should check out the fantastic lightonai/PyLate library.

*stanford-futuredata/ColBERT's coupled model-index design enables batched encoding with immediate compression, which keeps memory usage under 5 GB even for multi-million-document collections.
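The idea behind "batched encoding with immediate compression" can be illustrated with a toy sketch. This is not ColBERT's code. The encoder (`fake_encode`), the toy dimensions, and the crude 1-bit sign quantization of the residual are hypothetical stand-ins. ColBERTv2-style compression stores each token vector as a nearest-centroid id plus a few bits of quantized residual, and the centroids would normally come from k-means over a sample of embeddings. The sketch shows only that full-precision vectors exist for one batch at a time:

```python
import random

DIM = 8            # toy embedding dimension (hypothetical; real models are larger)
NUM_CENTROIDS = 4  # toy codebook size (hypothetical)


def fake_encode(doc: str) -> list[list[float]]:
    """Hypothetical stand-in for the encoder: one float vector per whitespace token."""
    rng = random.Random(doc)  # deterministic per document
    return [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in doc.split()]


def nearest_centroid(vec: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
    )


def compress(vec: list[float], centroids: list[list[float]]) -> tuple[int, tuple[int, ...]]:
    """Store a centroid id plus a simplified 1-bit-per-dimension residual sign."""
    cid = nearest_centroid(vec, centroids)
    bits = tuple(1 if v >= c else 0 for v, c in zip(vec, centroids[cid]))
    return cid, bits


def index_collection(docs: list[str], centroids: list[list[float]], batch_size: int = 2):
    """Encode documents batch by batch and compress each batch immediately."""
    compressed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        for doc in batch:
            # Float vectors for this document are compressed right away and then discarded.
            compressed.append([compress(v, centroids) for v in fake_encode(doc)])
        # Only compressed codes accumulate across batches, so peak float memory
        # is bounded by one batch rather than by the whole collection.
    return compressed


rng = random.Random(0)
centroids = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_CENTROIDS)]
docs = [
    "late interaction retrieval",
    "colbert encodes every token",
    "compression keeps memory small",
]
compressed_index = index_collection(docs, centroids)
print(len(compressed_index), "documents,", sum(len(d) for d in compressed_index), "compressed token vectors")
```

Running this prints `3 documents, 11 compressed token vectors`. In a real index the saving comes from replacing each `DIM`-float vector with one small integer and a handful of bits.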

Immediate Goal: Dependency Stabilization (~3 months)

  • Upgrade PyTorch to 2.x
  • Upgrade transformers (remove deprecated AdamW)
  • Replace faiss with fastkmeans
  • Test Python 3.9-3.12 compatibility
  • Resolve crypt.h/ninja errors
  • Merge the distributed training fix in #258 (potentially related: #132 and #233)
  • Replace the git-python dependency with GitPython in the PyPI release (already changed in the repo in commit 736f88b)

Medium-Term Goals: Documentation and Bug Fixes (~6 months)

  • Update documentation
    • Address documentation updates in issues/PRs (#316, #153, #167, etc.).
    • Create an llms.txt and llms_ctx.txt for the repo.
  • Investigate issues:

Long-Term Goals: Feature Requests (~3 months)