Qdrant backend #283
milvus-lite ships x86_64-only wheels, so the default mmore index/retrieval path cannot run on ARM64 hosts (NVIDIA GH200, Apple Silicon, etc.). This change adds an opt-in Qdrant backend selected via a new YAML field db.backend: qdrant. The default path is unchanged: existing configs continue to instantiate MilvusClient exactly as before, so this is a strict superset of upstream behaviour.

Implementation
--------------
* QdrantMilvusClient is a drop-in MilvusClient-shaped adapter backed by qdrant-client local mode. Only the methods mmore actually uses are implemented: has_collection, list_collections, get_collection_stats, prepare_index_params, create_collection, describe_index, insert, delete, flush, query, hybrid_search, close.
* Indexer.from_config and Retriever.from_config gain a one-line branch: if db.backend == "qdrant" they construct the adapter, otherwise they build a MilvusClient verbatim. No other call sites change.
* New extra: pip install mmore[qdrant] (sibling of mmore[index]).
* milvus-lite is gated to platform_machine == 'x86_64' so ARM64 users can still install mmore[index] for the base pymilvus client.

Caveats
-------
* Hybrid fusion uses Qdrant's RRF; the WeightedRanker weights passed by mmore are accepted but ignored (RRF is weight-agnostic). Top-k overlap with Milvus is high in practice.
* String chunk IDs are mapped to UUIDs deterministically via uuid5, and the original ID is stored in the payload.
* Logical partitions (partition_name=) are emulated via a payload field; Qdrant has no native partition concept.

A standalone smoke test (test_qdrant_pipeline.py) indexes 5 toy documents through both backends with no model downloads and prints a top-1 comparison so reviewers can see the parity for themselves.
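The deterministic ID mapping mentioned in the caveats can be sketched with the standard library alone. This is an illustration, not the adapter's code: the namespace constant `CHUNK_NAMESPACE` and the helpers `to_point_id` / `to_payload` are hypothetical names, and the choice of `uuid.NAMESPACE_URL` is an assumption. The payload key `_str_id` is the one named in this PR's README.

```python
import uuid
from typing import Any, Dict

# Assumption: any fixed namespace gives a stable mapping; the adapter's
# actual namespace is not shown in this PR.
CHUNK_NAMESPACE = uuid.NAMESPACE_URL


def to_point_id(chunk_id: str) -> str:
    """Map an arbitrary string ID to a UUID string, deterministically."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, chunk_id))


def to_payload(chunk_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the original string ID next to the other payload fields."""
    return {**fields, "_str_id": chunk_id}


point_id = to_point_id("doc-42#chunk-3")
payload = to_payload("doc-42#chunk-3", {"text": "hello"})
```

Because uuid5 is a hash of the namespace and the name, re-indexing the same chunk ID yields the same point ID, so an upsert overwrites the earlier point instead of duplicating it.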
Retriever inherits from langchain's BaseRetriever, which is a pydantic
BaseModel. The previous `client: MilvusClient` annotation made pydantic
reject the QdrantMilvusClient adapter at instantiation:
pydantic_core._pydantic_core.ValidationError: 1 validation error for
Retriever
client
Input should be an instance of MilvusClient
[type=is_instance_of, input_value=<...QdrantMilvusClient...>]
Relaxed the annotation to `client: Any` and added a comment pointing at
qdrant_client.py for the shared surface. The Indexer class is not
pydantic-validated and needs no change.
Move the post-`sys.path.insert` imports together at the top of the file and silence E402 with `# noqa`, since the path tweak must run before mmore is importable.
* scripts/build_qdrant_alps.sh: reproducible Qdrant compile for aarch64 64K-page systems (Alps GH200). Sets JEMALLOC_SYS_WITH_LG_PAGE=16, pins protoc 34.1, defaults to v1.17.1.
* docs/QDRANT_BUILD.md: why the prebuilt aarch64 binary crashes on Alps (jemalloc page-size mismatch) and what the script does about it.
* docs/qdrantcolpali_design.md: design + validation of QdrantColpaliManager (native multi-vector / MAX_SIM, deterministic IDs, gRPC timeout tuning, synthetic correctness test, real-PDF integration test).
* tests/test_qdrant_server.py: PR EPFLiGHT#283's QdrantMilvusClient against a running server (smoke).
* tests/test_qdrant_colpali.py: synthetic 5-page MAX_SIM correctness test.
* tests/test_colpali_real.py: real PDF retrieval (COVID/LLaVA/calendar).
News?
This PR implements Qdrant-lite. It works as is, though it needs more documentation. The issue is that the benchmarks were run on the qdrant-server version, which was a much heavier rework than this addition and may conflict heavily with the current master.
Could you please provide a short step by step tutorial on how you use it with CSCS ? Ideally a markdown file next to |
…ite pin Master added milvus-lite==2.5.1 as a separate dep; PR makes it ARM64-conditional via 'platform_machine == x86_64'. Resolution keeps both: master's 2.5.1 pin AND the ARM64 guard. ARM64 users install mmore[qdrant] and switch to db.backend: qdrant.
The documentation has been added.
JCHAVEROT
left a comment
Hi @jeremydoumeng,
Good job with Qdrant. I tested it locally on my ARM64 computer and could successfully perform RAG over the DB created by the Qdrant backend.
I made a few comments about the doc. They should be quick to handle; in any case, let me know.
Once you have detailed the CSCS part a bit more, I'll test on it.
```{important}
The prebuilt Qdrant **server** binary for aarch64 ships a jemalloc compiled
for 4 KB pages and crashes on GH200 with
`<jemalloc>: Unsupported system page size`. Embedded mode bypasses this
because it never loads the Rust binary. For server-mode workloads on Alps
you need a custom Qdrant build (see
[qdrant-alps](https://github.com/jeremydoumeng/qdrant-alps)).
```
It's not clear what needs to be done to solve the problem: clicking on the link only leads to a new fork repo, with no indication of what to do there.
Ideally, list all the necessary commands in this file in a user-friendly way, so that we don't even have to leave the documentation.
…cscs link
- Use `mmore index` / `mmore rag --config-file` instead of calling run_index / run_rag directly, for consistency with the other docs.
- Fix the index config: wrap settings in the `indexer:` section with correct nesting (it was flat and would not load).
- Replace the qdrant-alps reference with qdrant-cscs and inline the server build/launch commands so the server-mode path is self-contained.
JCHAVEROT
left a comment
Please remove all the `# noqa` comments you introduced; they are not used anywhere else in the codebase. This is a very bad practice that can end up being dangerous, because you are hiding the problems instead of solving them.
For the server-mode path on Alps, use
[qdrant-cscs](https://github.com/jeremydoumeng/qdrant-cscs): it ships a build
script that compiles a Qdrant binary patched for GH200's 64 KB pages, and a
Slurm wrapper that starts the server. In short:

```bash
git clone https://github.com/jeremydoumeng/qdrant-cscs.git && cd qdrant-cscs
./scripts/build_qdrant_alps.sh              # one-time build (~5 min)
sbatch scripts/start_qdrant_server.sbatch   # serves on 127.0.0.1:6333
```

Then point this guide's `db.uri` at the server URL
(`http://127.0.0.1:6333`) instead of a directory path; everything else in the
index/RAG configs stays the same. See that repo's README for the full recipe.
These instructions should be given earlier as we cannot run the aforementioned commands without
def _milvus_filter_to_qdrant(
    expr: Optional[str],
    partition_names: Optional[List[str]] = None,
):
    """Convert a Milvus filter expression to a ``qdrant_client.models.Filter``.

    Supported patterns (the only ones mmore uses):

    * ``field in ["a", "b", ...]``
    * ``field == 'value'``
    * ``field != 'value'`` (including ``field != ""``)

    Anything else raises ``ValueError`` so unsupported patterns surface
    loudly instead of silently returning the wrong rows.
    """
This function is awfully ugly.
It uses regexes to parse Milvus filter strings back into Qdrant filter objects. We need it because QdrantMilvusClient is a replacement for MilvusClient (which takes filter strings), but that is an anti-pattern.
As Milvus remains our primary vector DB we can keep this, but later we can create a shared filter object that both clients use directly, so that there are no strings to convert from.
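To make the discussion concrete, here is a minimal sketch of what such a parser and a backend-neutral filter object could look like for the three patterns listed in the docstring. It is not the PR's implementation: `FieldFilter` and `parse_milvus_filter` are hypothetical names, partition handling is left out, and returning `None` for an empty expression is an assumption.

```python
import ast
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldFilter:
    """Backend-neutral filter: one field, one operator, one or more values."""
    field: str
    op: str  # "in", "==" or "!="
    values: Tuple[str, ...]


_IN = re.compile(r"^\s*(\w+)\s+in\s+(\[.*\])\s*$")
_CMP = re.compile(r"^\s*(\w+)\s*(==|!=)\s*(\"[^\"]*\"|'[^']*')\s*$")


def parse_milvus_filter(expr: Optional[str]) -> Optional[FieldFilter]:
    """Parse the supported patterns; raise ValueError on anything else."""
    if expr is None or not expr.strip():
        return None  # assumption: no expression means no filter
    match = _IN.match(expr)
    if match:
        try:
            values = ast.literal_eval(match.group(2))
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"Unsupported filter expression: {expr!r}") from exc
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"Unsupported filter expression: {expr!r}")
        return FieldFilter(match.group(1), "in", tuple(values))
    match = _CMP.match(expr)
    if match:
        # Strip the surrounding quotes from the literal.
        return FieldFilter(match.group(1), match.group(2), (match.group(3)[1:-1],))
    raise ValueError(f"Unsupported filter expression: {expr!r}")
```

With a shared object like `FieldFilter`, each client could translate it to its own native form, and the string parsing would only be needed at the boundary where legacy filter strings still arrive.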
Co-authored-by: Jérémy Chaverot <chaverotjrmy7@gmail.com>
…ild recipe Address review: full indexer: config section with correct indentation, python3 -m mmore commands matching other docs, env setup before any commands, and the aarch64 server build steps listed inline instead of linking out to a fork.
Drop the sys.path hack so imports sit at the top (no E402) and use importlib.util.find_spec for backend availability checks (no F401). The script now requires mmore to be installed, matching the docs.
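A backend availability check with `importlib.util.find_spec` could look like the sketch below. The helper name `backend_available` is hypothetical, and the top-level module names `qdrant_client` and `milvus_lite` are assumed to match the installed packages.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed top-level module names for the two backends.
HAS_QDRANT = backend_available("qdrant_client")
HAS_MILVUS_LITE = backend_available("milvus_lite")
```

Since nothing is imported just to test for its presence, there is no unused import for the linter to flag (F401), and all real imports can stay at the top of the file.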
# Conflicts: # pyproject.toml
Hi @JCHAVEROT, all points addressed: milvus-lite pin reverted to unconditional, as suggested.
I updated the pyright.yml workflow to also include your "qdrant" extra. It seems that three of the 4 type checks come from your additions (the remaining one, concerning the processors, cannot be solved currently); could you take a look, please?
Other than that, the doc cscs.md looks much cleaner and I could run everything successfully on the CSCS cluster.
One last thing: could you please add one sentence to the indexing.md documentation mentioning that Qdrant also exists as an alternative to Milvus, with a hyperlink to your doc?
class StubSparseEmbedding(BaseSparseEmbedding):
    """Returns a deterministic sparse vector keyed by word hash."""

    def embed_query(self, query: str) -> Dict[int, float]:
        return {hash(w) % 512: 1.0 for w in query.split()}

    def embed_documents(self, texts: List[str]) -> List[Dict[int, float]]:
        return [self.embed_query(t) for t in texts]


_orig_sparse_from_config = _sparse_base.SparseModel.from_config


@classmethod  # type: ignore[misc]
def _stub_sparse_from_config(cls, config):
    return StubSparseEmbedding()


_sparse_base.SparseModel.from_config = _stub_sparse_from_config


# ── Toy corpus ────────────────────────────────────────────────────────────────
DOCS = [
    MultimodalSample(
        text="Barack Obama was born on August 4, 1961, in Honolulu, Hawaii.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="Google was founded by Larry Page and Sergey Brin in September 1998.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="The Eiffel Tower is located on the Champ de Mars in Paris, France.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="The Python programming language was created by Guido van Rossum.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="Mount Everest is the world's highest mountain above sea level.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
]
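One detail worth noting about the stub above: Python's built-in `hash()` for strings is salted per interpreter process unless `PYTHONHASHSEED` is fixed, so the vectors are only deterministic within a single run. If cross-run stability matters, a checksum such as CRC32 could be used instead. The sketch below is an illustration only; `stable_sparse_vector` is a hypothetical helper, not part of the PR.

```python
import zlib
from typing import Dict


def stable_sparse_vector(text: str, dim: int = 512) -> Dict[int, float]:
    """Sparse vector keyed by a CRC32 word hash, identical across processes."""
    return {zlib.crc32(word.encode("utf-8")) % dim: 1.0 for word in text.split()}


vec = stable_sparse_vector("the eiffel tower")
```

Within one process, as in the smoke test that indexes and queries in the same run, either variant gives consistent keys.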
We already have a class creating fake embeddings and a collection of MultimodalSamples in conftest.py. Please reuse them, and update them if necessary (just be careful not to break other tests relying on them).
Your test file is not in the tests/ folder, so it cannot run in the CI alongside the other tests (you may want to change the tests.yml workflow to also install the "qdrant" extra).
Please take a look at the variables typed as Any in this file, and try to replace them with the correct types where possible.
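One possible way to narrow an `Any`-typed client, sketched here as an option rather than what the PR does, is a structural `typing.Protocol` covering the shared surface. The names `VectorClient` and `InMemoryClient` are hypothetical, only three of the shared methods are shown, and their signatures are assumptions.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class VectorClient(Protocol):
    """Subset of the MilvusClient-shaped surface (signatures are assumed)."""

    def has_collection(self, collection_name: str) -> bool: ...

    def insert(self, collection_name: str, data: List[Dict[str, Any]]) -> Any: ...

    def close(self) -> None: ...


class InMemoryClient:
    """Toy client that satisfies the protocol without inheriting from it."""

    def __init__(self) -> None:
        self._collections: Dict[str, List[Dict[str, Any]]] = {}

    def has_collection(self, collection_name: str) -> bool:
        return collection_name in self._collections

    def insert(self, collection_name: str, data: List[Dict[str, Any]]) -> int:
        self._collections.setdefault(collection_name, []).extend(data)
        return len(data)

    def close(self) -> None:
        self._collections.clear()


client: VectorClient = InMemoryClient()
client.insert("docs", [{"id": "a", "text": "hello"}])
```

A static checker such as pyright can then verify that both clients expose the methods the callers rely on. Note that `isinstance` against a runtime-checkable protocol only checks that the methods exist, not their signatures.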
Qdrant vector backend
An ARM64-safe alternative to milvus-lite for mmore's index and retrieval
paths. Opt-in via a single YAML field; the default Milvus path is unchanged.
Why
milvus-lite ships x86_64-only wheels, so the embedded mode that mmore uses by default cannot run on ARM64 hosts (NVIDIA GH200, Apple Silicon,
some cloud ARM instances). Qdrant local mode works on any architecture and
provides the same on-disk, no-server-required experience.
Install
    pip install mmore[qdrant]
This installs qdrant-client alongside the existing mmore[index] deps. On x86_64 you can mix and match: both backends can be installed at the same time.
Usage
Set db.backend: qdrant in your IndexerConfig / RetrieverConfig YAML. The uri becomes a directory path (Qdrant local mode), not a .db file.
That's it. Indexer.from_config(...) and Retriever.from_config(...) pick up the field automatically; no other code changes are needed.
You can also pass an HTTP(S) URL to point at a remote Qdrant server:
What changes
* src/mmore/index/qdrant_client.py: a MilvusClient-shaped adapter backed by qdrant-client local mode. Implements only the methods mmore actually calls: has_collection, list_collections, get_collection_stats, prepare_index_params, create_collection, describe_index, insert, delete, flush, query, hybrid_search, close.
* Indexer.from_config / Retriever.from_config: one-line branch on db.backend to construct either MilvusClient (default, unchanged) or the Qdrant adapter.
* pyproject.toml: adds the [qdrant] extra and gates milvus-lite to x86_64.
No other call sites change. No public signatures change. No existing tests modified.
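The branch on db.backend can be pictured roughly as follows. This is a self-contained sketch with stand-in classes, not the code in Indexer.from_config: `DBConfig`, `make_client`, `FakeMilvusClient` and `FakeQdrantAdapter` are hypothetical, and the default value `"milvus"` is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DBConfig:
    uri: str
    backend: str = "milvus"  # assumed default; only "qdrant" switches paths


class FakeMilvusClient:
    """Stand-in for pymilvus.MilvusClient (the default path)."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


class FakeQdrantAdapter:
    """Stand-in for the QdrantMilvusClient adapter."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


def make_client(db: DBConfig) -> Union[FakeMilvusClient, FakeQdrantAdapter]:
    """Pick the client from the config; anything but 'qdrant' keeps Milvus."""
    if db.backend == "qdrant":
        return FakeQdrantAdapter(db.uri)
    return FakeMilvusClient(db.uri)


default_client = make_client(DBConfig(uri="./demo.db"))
qdrant_client = make_client(DBConfig(uri="./qdrant_data", backend="qdrant"))
```

Keeping the decision in one place means existing configs, which carry no backend field, continue down the Milvus path untouched.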
Caveats
* Hybrid fusion uses Qdrant's RRF; the WeightedRanker(w_dense, w_sparse) weights passed by mmore are accepted but ignored (RRF is weight-agnostic). Top-k overlap with Milvus is high in practice but rankings will differ slightly.
* mmore's string chunk IDs are mapped to UUIDs deterministically via uuid5; the original is preserved in the payload under _str_id and surfaced as the id field in all results.
* partition_name=... is stored in the payload under _partition; partition_names=[...] on retrieval translates to a payload-field filter.
* … exception; there is no automatic switch to Milvus.
* … data directory. Close the adapter (indexer.client.close()) before opening another client on the same path.
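Why RRF ignores the ranker weights is easiest to see from the formula itself: each document scores the sum of 1 / (k + rank) over the result lists it appears in, so only positions matter. The sketch below is a plain-Python illustration, not Qdrant's implementation; `rrf_fuse` is a hypothetical helper and k = 60 is the commonly used constant, assumed here rather than taken from Qdrant.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]
fused = rrf_fuse([dense_hits, sparse_hits])
# fused == ["a", "c", "b", "d"]
```

Document "a" wins because it ranks high in both lists. A weighted ranker would instead combine the raw dense and sparse scores, which is why the orderings of the two backends can differ slightly.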
Smoke test
A standalone script at the repo root:
Indexes 5 toy documents into each available backend, runs 3 retrieval
queries, and prints a side-by-side top-1 comparison if both backends are
installed. No model weights are downloaded: the script uses FakeEmbeddings for the dense model and a stub sparse model, so it runs offline in a few seconds.
Expected output on ARM64 (Qdrant only):
Migrating an existing collection
Collections created by MilvusClient are not directly readable by QdrantMilvusClient: they live in different on-disk formats. To switch an existing index over to Qdrant:
… db.backend to qdrant and point db.uri at a fresh directory.
There is no automatic conversion path; the adapter exists to give ARM64
users a working backend, not to migrate Milvus data.