stanford-futuredata/ColBERT should remain a stable, canonical reference implementation of late interaction, especially for newcomers and the GPU-poor*. Folks looking for the bleeding edge of late interaction should check out the fantastic lightonai/PyLate library.
*stanford-futuredata/ColBERT's coupled model-index design enables batched encoding with immediate compression, keeping memory usage under 5 GB even for multi-million-document collections.
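The memory behavior in the footnote can be illustrated with a minimal sketch (hypothetical names, not ColBERT's actual code): documents are encoded in batches and each batch is compressed immediately, so the full float32 embedding matrix never has to exist in memory at once.

```python
import numpy as np

def encode_batch(docs, dim=128):
    """Stand-in for the model's encoder: one vector per document.
    (Real ColBERT emits many token vectors per document; one vector
    per document keeps the sketch small.)"""
    rng = np.random.default_rng(len(docs))
    return rng.standard_normal((len(docs), dim)).astype(np.float32)

def compress(batch):
    """Toy compression: scale each vector into [0, 255] and store as
    uint8, a 4x reduction versus float32. (ColBERT's residual
    compression is more sophisticated; this only shows the pattern.)"""
    lo = batch.min(axis=1, keepdims=True)
    hi = batch.max(axis=1, keepdims=True)
    scaled = (batch - lo) / np.maximum(hi - lo, 1e-9)
    return (scaled * 255).round().astype(np.uint8)

def index_collection(docs, batch_size=1000):
    """Encode in batches and compress each batch immediately, so only
    one float32 batch is resident at any time."""
    compressed = []
    for start in range(0, len(docs), batch_size):
        batch = encode_batch(docs[start:start + batch_size])
        compressed.append(compress(batch))  # float batch freed after this
    return np.concatenate(compressed)

codes = index_collection([f"doc {i}" for i in range(5000)])
print(codes.shape, codes.dtype)  # (5000, 128) uint8
```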
- Upgrade PyTorch to 2.x
- Upgrade transformers (remove deprecated AdamW)
- Replace faiss with fastkmeans
- Test Python 3.9-3.12 compatibility
- Resolve crypt.h/ninja errors
- Merge distributed training fix in #258 (potentially related: #132 and #233)
- Replace git-python with GitPython in PyPI (already changed in repo in commit 736f88b)
- Update documentation
- Investigate issues:
  - Resuming training from a checkpoint (#307)
  - Allowing string pids (#326)
  - Batch-size handling options to resolve OOM during search
- etc.
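On the transformers upgrade above: `transformers.AdamW` has been deprecated for removal, and `torch.optim.AdamW` is the usual drop-in replacement. A minimal sketch of the swap (the tiny model and hyperparameters are illustrative, not ColBERT's training setup):

```python
import torch
# Old (deprecated in transformers):
#   from transformers import AdamW
# New: PyTorch's built-in implementation.
from torch.optim import AdamW

model = torch.nn.Linear(8, 2)  # stand-in for the ColBERT model
optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

# One training step to show the optimizer works as before.
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```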
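For the OOM-during-search item: one option worth exploring is retrying a failed batch with a halved batch size rather than crashing the whole search. A framework-free sketch of that logic (the simulated capacity and all names here are illustrative, not ColBERT's API):

```python
def run_batch(queries, simulated_capacity=64):
    """Stand-in for a GPU search call: 'runs out of memory' above a
    simulated capacity so the retry logic can be demonstrated."""
    if len(queries) > simulated_capacity:
        raise RuntimeError("CUDA out of memory (simulated)")
    return [f"results for {q}" for q in queries]

def search_all(queries, batch_size=256):
    """Process queries in batches, halving the batch size on OOM
    instead of aborting the whole search."""
    results, start = [], 0
    while start < len(queries):
        try:
            results.extend(run_batch(queries[start:start + batch_size]))
            start += batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e) or batch_size == 1:
                raise
            batch_size //= 2  # retry the same span with a smaller batch
    return results

out = search_all([f"q{i}" for i in range(300)])
print(len(out))  # 300
```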