Qdrant backend #283

Open
jeremydoumeng wants to merge 13 commits into EPFLiGHT:master from jeremydoumeng:qdrant-backend

@jeremydoumeng


Qdrant vector backend

An ARM64-safe alternative to milvus-lite for mmore's index and retrieval
paths. Opt-in via a single YAML field; the default Milvus path is unchanged.

Why

milvus-lite ships x86_64-only wheels, so the embedded mode that mmore
uses by default cannot run on ARM64 hosts (NVIDIA GH200, Apple Silicon,
some cloud ARM instances). Qdrant local mode works on any architecture and
provides the same on-disk, no-server-required experience.

Install

pip install mmore[qdrant]

This installs qdrant-client alongside the existing mmore[index] deps.
On x86_64 you can mix-and-match — both backends can be installed at the
same time.

Note: milvus-lite is gated to platform_machine == 'x86_64'. ARM64
users get the base pymilvus client (so the package still imports) but
cannot use db.backend: milvus.

Usage

Set db.backend: qdrant in your IndexerConfig / RetrieverConfig YAML. The
uri becomes a directory path (Qdrant local mode), not a .db file.

db:
  backend: qdrant          # default is "milvus"
  uri: ./my_qdrant_dir     # directory; created on first use
  name: my_db              # accepted for parity, has no Qdrant equivalent

That's it. Indexer.from_config(...) and Retriever.from_config(...) pick
up the field automatically — no other code changes are needed.

You can also pass an HTTP(S) URL to point at a remote Qdrant server:

db:
  backend: qdrant
  uri: http://qdrant.internal:6333
  name: my_db
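
A minimal sketch of how the two `uri` forms above could be told apart, assuming the adapter chooses between local mode and server mode from the URI scheme. The helper name `qdrant_client_kwargs` is hypothetical and not part of this PR; `path` and `url` are the keyword names of the `QdrantClient` constructor.

```python
from urllib.parse import urlparse


def qdrant_client_kwargs(uri: str) -> dict:
    """Map a db.uri value to keyword arguments for qdrant_client.QdrantClient.

    Hypothetical helper for illustration; the adapter's real logic may differ.
    """
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("http", "https"):
        # Remote server, e.g. http://qdrant.internal:6333
        return {"url": uri}
    # Anything else is treated as a local directory (embedded mode).
    return {"path": uri}


print(qdrant_client_kwargs("./my_qdrant_dir"))
print(qdrant_client_kwargs("http://qdrant.internal:6333"))
```

The returned dict would then be splatted into the client constructor, so the YAML stays the single place where the deployment mode is chosen.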

What changes

  • src/mmore/index/qdrant_client.py — a MilvusClient-shaped adapter
    backed by qdrant-client local mode. Implements only the methods mmore
    actually calls: has_collection, list_collections,
    get_collection_stats, prepare_index_params, create_collection,
    describe_index, insert, delete, flush, query, hybrid_search,
    close.
  • Indexer.from_config / Retriever.from_config — one-line branch on
    db.backend to construct either MilvusClient (default, unchanged) or
    the Qdrant adapter.
  • pyproject.toml — adds the [qdrant] extra and gates milvus-lite to
    x86_64.

No other call sites change. No public signatures change. No existing tests
are modified.

Caveats

  • Hybrid fusion uses RRF. Qdrant's local mode supports Reciprocal Rank
    Fusion; the WeightedRanker(w_dense, w_sparse) weights passed by mmore
    are accepted but ignored (RRF is weight-agnostic). Top-k overlap with
    Milvus is high in practice but rankings will differ slightly.
  • String IDs → UUIDs. Qdrant requires unsigned-int or UUID point IDs.
    mmore's string chunk IDs are mapped via uuid5 deterministically; the
    original is preserved in payload under _str_id and surfaced as the id
    field in all results.
  • Partitions are emulated. Qdrant has no native partition concept.
    partition_name=... is stored in the payload under _partition;
    partition_names=[...] on retrieval translates to a payload-field
    filter.
  • No fallback. If the Qdrant adapter raises, the user gets a normal
    exception — there is no automatic switch to Milvus.
  • Local-mode file lock. Qdrant local mode holds a file lock on the
    data directory. Close the adapter (indexer.client.close()) before
    opening another client on the same path.
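
The ID and partition caveats above can be sketched as follows. This is an illustration only: the helper names `to_point` and `to_result` are hypothetical, and the UUID namespace is an assumption, since the description does not say which namespace the adapter uses.

```python
import uuid

# Assumption: any fixed namespace works as long as it never changes between
# indexing and retrieval. The namespace the adapter actually uses is not shown.
NAMESPACE = uuid.NAMESPACE_URL


def to_point(chunk_id: str, vector: list, partition_name: str = "") -> dict:
    """Build a Qdrant-style point dict from an mmore string chunk ID."""
    return {
        "id": str(uuid.uuid5(NAMESPACE, chunk_id)),  # deterministic UUID
        "vector": vector,
        # Original ID and emulated partition both live in the payload.
        "payload": {"_str_id": chunk_id, "_partition": partition_name},
    }


def to_result(point: dict) -> dict:
    """Surface the original string ID as `id`, as the description states."""
    return {"id": point["payload"]["_str_id"]}


point = to_point("doc-1+0", [0.1, 0.2], partition_name="wiki")
print(point["id"], to_result(point))
```

Because `uuid5` is a hash of the namespace and the name, re-indexing the same chunk ID always yields the same point ID, so an insert with an existing ID overwrites rather than duplicates.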

Smoke test

A standalone script at the repo root:

python test_qdrant_pipeline.py

Indexes 5 toy documents into each available backend, runs 3 retrieval
queries, and prints a side-by-side top-1 comparison if both backends are
installed. No model weights are downloaded — the script uses
FakeEmbeddings for the dense model and a stub sparse model so it runs
offline in a few seconds.

Expected output on ARM64 (Qdrant only):

[1/3] Indexing 5 documents...
      Inserted: 5 chunks
[2/3] Running retrieval for 3 queries...
  Q: When was Barack Obama born?
  A: Barack Obama was born on August 4, 1961, in Honolulu, Hawaii.…
  Q: Who founded Google?
  A: Google was founded by Larry Page and Sergey Brin in September 1998.…
  Q: Where is the Eiffel Tower located?
  A: The Eiffel Tower is located on the Champ de Mars in Paris, France.…
[3/3] Checking model metadata round-trip...
      dense  model: debug  ✓
      sparse model: naver/splade-cocondenser-selfdistil  ✓
  Backend QDRANT — ALL CHECKS PASSED ✓

Migrating an existing collection

Collections created by MilvusClient are not directly readable by
QdrantMilvusClient — they live in different on-disk formats. To switch
an existing index over to Qdrant:

  1. Locate the indexer config and source documents that produced your Milvus collection.
  2. Switch db.backend to qdrant and point db.uri at a fresh
    directory.
  3. Re-index from the same source documents.

There is no automatic conversion path; the adapter exists to give ARM64
users a working backend, not to migrate Milvus data.

milvus-lite ships x86_64-only wheels, so the default mmore index/retrieval
path cannot run on ARM64 hosts (NVIDIA GH200, Apple Silicon, etc.).

This change adds an opt-in Qdrant backend selected via a new YAML field
db.backend: qdrant. The default path is unchanged: existing configs
continue to instantiate MilvusClient exactly as before, so this is a
strict superset of upstream behaviour.

Implementation
--------------
* QdrantMilvusClient is a drop-in MilvusClient-shaped adapter backed by
  qdrant-client local mode. Only the methods mmore actually uses are
  implemented: has_collection, list_collections, get_collection_stats,
  prepare_index_params, create_collection, describe_index, insert,
  delete, flush, query, hybrid_search, close.
* Indexer.from_config and Retriever.from_config gain a one-line branch:
  if db.backend == "qdrant" they construct the adapter, otherwise they
  build a MilvusClient verbatim. No other call sites change.
* New extra: pip install mmore[qdrant] (sibling of mmore[index]).
* milvus-lite is gated to platform_machine == 'x86_64' so ARM64 users
  can still install mmore[index] for the base pymilvus client.

Caveats
-------
* Hybrid fusion uses Qdrant's RRF; the WeightedRanker weights passed by
  mmore are accepted but ignored (RRF is weight-agnostic). Top-k overlap
  with Milvus is high in practice.
* String chunk IDs are mapped to UUIDs via uuid5 deterministically and
  the original is stored in the payload.
* Logical partitions (partition_name=) are emulated via a payload field;
  Qdrant has no native partition concept.
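
Why the weights drop out can be seen from a minimal RRF sketch. This is plain Reciprocal Rank Fusion written from its textbook definition, not Qdrant's implementation; the function name `rrf` and the constant `k = 60` are assumptions for illustration.

```python
from collections import defaultdict


def rrf(rankings: list, k: int = 60) -> list:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
    k = 60 is a conventional constant, not necessarily the value Qdrant uses.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the position matters; raw similarity scores are never read.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


dense = ["a", "b", "c"]   # dense search results, best first
sparse = ["b", "d", "a"]  # sparse search results, best first
print(rrf([dense, sparse]))
```

Since the fused score depends only on ranks, there is no term for a per-list weight to scale, which is why a weighted ranker and RRF can agree on the top-k set yet order it differently.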

A standalone smoke-test (test_qdrant_pipeline.py) indexes 5 toy
documents through both backends with no model downloads and prints a
top-1 comparison so reviewers can see the parity for themselves.

Retriever inherits from langchain's BaseRetriever, which is a pydantic
BaseModel. The previous `client: MilvusClient` annotation made pydantic
reject the QdrantMilvusClient adapter at instantiation:

    pydantic_core._pydantic_core.ValidationError: 1 validation error for
    Retriever
    client
      Input should be an instance of MilvusClient
      [type=is_instance_of, input_value=<...QdrantMilvusClient...>]

Relaxed the annotation to `client: Any` and added a comment pointing at
qdrant_client.py for the shared surface. The Indexer class is not
pydantic-validated and needs no change.

Move the post-`sys.path.insert` imports together at the top of the file
and silence E402 with `# noqa`, since the path tweak must run before
mmore is importable.

jeremydoumeng pushed a commit to jeremydoumeng/mmore-qdrant that referenced this pull request May 28, 2026
scripts/build_qdrant_alps.sh — reproducible Qdrant compile for aarch64 64K-page
  systems (Alps GH200). Sets JEMALLOC_SYS_WITH_LG_PAGE=16, pins protoc 34.1,
  defaults to v1.17.1.

docs/QDRANT_BUILD.md — why the prebuilt aarch64 binary crashes on Alps
  (jemalloc page-size mismatch) and what the script does about it.

docs/qdrantcolpali_design.md — design + validation of QdrantColpaliManager
  (native multi-vector / MAX_SIM, deterministic IDs, gRPC timeout tuning,
  synthetic correctness test, real-PDF integration test).

tests/test_qdrant_server.py — PR EPFLiGHT#283's QdrantMilvusClient against a
  running server (smoke).
tests/test_qdrant_colpali.py — synthetic 5-page MAX_SIM correctness test.
tests/test_colpali_real.py — real PDF retrieval (COVID/LLaVA/calendar).
@fabnemEPFL

fabnemEPFL commented May 29, 2026

Collaborator

News?

@jeremydoumeng

Author

> News?

This PR implements Qdrant-lite. It works as is, though it needs more documentation. The issue is that the benchmarks were run on the qdrant-server version, which was a much heavier rework than this addition and may interfere a lot with the current master.

@JCHAVEROT

JCHAVEROT commented May 29, 2026

Collaborator

Could you please provide a short step-by-step tutorial on how you use it with CSCS?

Ideally as a markdown file next to rcp_and_production.md in the folder docs/source/advanced_usage/

JOMENGO added 3 commits May 31, 2026 19:40
…ite pin

Master added milvus-lite==2.5.1 as a separate dep; PR makes it ARM64-conditional
via 'platform_machine == x86_64'. Resolution keeps both: master's 2.5.1 pin AND
the ARM64 guard. ARM64 users install mmore[qdrant] and switch to db.backend: qdrant.
@JCHAVEROT added the documentation and enhancement labels Jun 1, 2026
@jeremydoumeng

Author

The documentation has been added

@JCHAVEROT JCHAVEROT left a comment

Collaborator

Hi @jeremydoumeng,

Good job with Qdrant. I tested it locally on my ARM64 computer and could successfully perform RAG over the DB created by the Qdrant backend.

I made a few comments about the doc; they should be quick to handle. In any case, let me know.

Once you have detailed the CSCS part a bit more, I'll test on it.

Comment thread docs/source/advanced_usage/cscs.md Outdated
Comment thread docs/source/advanced_usage/cscs.md
Comment thread docs/source/advanced_usage/cscs.md Outdated
Comment on lines +9 to +16
```{important}
The prebuilt Qdrant **server** binary for aarch64 ships a jemalloc compiled
for 4 KB pages and crashes on GH200 with
`<jemalloc>: Unsupported system page size`. Embedded mode bypasses this
because it never loads the Rust binary. For server-mode workloads on Alps
you need a custom Qdrant build (see
[qdrant-alps](https://github.com/jeremydoumeng/qdrant-alps)).
```

@JCHAVEROT JCHAVEROT Jun 1, 2026

Collaborator

It's not clear what needs to be done to solve the problem: clicking on the link, we just find a new fork repo but don't know what to do with it.

Ideally, list all the necessary commands in this file in a user-friendly way so that we don't even have to leave the documentation.

…cscs link

- Use `mmore index` / `mmore rag --config-file` instead of calling
  run_index / run_rag directly, for consistency with the other docs.
- Fix the index config: wrap settings in the `indexer:` section with correct
  nesting (it was flat and would not load).
- Replace the qdrant-alps reference with qdrant-cscs and inline the server
  build/launch commands so the server-mode path is self-contained.

@JCHAVEROT JCHAVEROT left a comment

Collaborator

Please remove all the # noqa comments you introduced; they are not used anywhere else in the codebase. This is very bad practice and can end up being dangerous, as you're not solving the problems, just hiding them.

Comment thread pyproject.toml Outdated
Comment thread docs/source/advanced_usage/cscs.md Outdated
Comment on lines +93 to +106
For the server-mode path on Alps, use
[qdrant-cscs](https://github.com/jeremydoumeng/qdrant-cscs): it ships a build
script that compiles a Qdrant binary patched for GH200's 64 KB pages, and a
Slurm wrapper that starts the server. In short:

```bash
git clone https://github.com/jeremydoumeng/qdrant-cscs.git && cd qdrant-cscs
./scripts/build_qdrant_alps.sh # one-time build (~5 min)
sbatch scripts/start_qdrant_server.sbatch # serves on 127.0.0.1:6333
```

Then point this guide's `db.uri` at the server URL
(`http://127.0.0.1:6333`) instead of a directory path; everything else in the
index/RAG configs stays the same. See that repo's README for the full recipe.

Collaborator

These instructions should be given earlier, as we cannot run the aforementioned commands without them.

Comment on lines +119 to +133
def _milvus_filter_to_qdrant(
    expr: Optional[str],
    partition_names: Optional[List[str]] = None,
):
    """Convert a Milvus filter expression to a ``qdrant_client.models.Filter``.

    Supported patterns (the only ones mmore uses):

    * ``field in ["a", "b", ...]``
    * ``field == 'value'``
    * ``field != 'value'`` (including ``field != ""``)

    Anything else raises ``ValueError`` so unsupported patterns surface
    loudly instead of silently returning the wrong rows.
    """

Collaborator

This function is awfully ugly.

It uses regexes to parse Milvus filter strings back into Qdrant filter objects. We need it because QdrantMilvusClient is a replacement for MilvusClient (which takes filter strings), but that is an anti-pattern.

As Milvus remains our primary vector DB we can keep this, but later we can create a shared filter object that both clients use directly, so that there are no strings to convert.
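
One possible shape for such a shared filter object, as a rough sketch only. The class name `FieldFilter` and both method names are hypothetical; the Qdrant side is rendered as plain dicts in the shape of Qdrant's JSON filter clauses rather than `qdrant_client.models` objects, so the example runs without either client installed. Values are not escaped, which a real implementation would have to handle.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class FieldFilter:
    """Backend-neutral condition covering the three patterns mmore uses."""

    field: str
    op: str  # one of "in", "==", "!="
    value: Union[str, List[str]]

    def to_milvus_expr(self) -> str:
        """Render a Milvus filter string, e.g. source == 'wikipedia'."""
        if self.op == "in":
            items = ", ".join(f'"{v}"' for v in self.value)
            return f"{self.field} in [{items}]"
        if self.op in ("==", "!="):
            return f"{self.field} {self.op} '{self.value}'"
        raise ValueError(f"unsupported operator: {self.op}")

    def to_qdrant_filter(self) -> dict:
        """Render a plain-dict stand-in for a Qdrant must / must_not clause."""
        if self.op == "in":
            match = {"any": list(self.value)}
            return {"must": [{"key": self.field, "match": match}]}
        if self.op == "==":
            return {"must": [{"key": self.field, "match": {"value": self.value}}]}
        if self.op == "!=":
            return {"must_not": [{"key": self.field, "match": {"value": self.value}}]}
        raise ValueError(f"unsupported operator: {self.op}")


f = FieldFilter("source", "==", "wikipedia")
print(f.to_milvus_expr())
print(f.to_qdrant_filter())
```

With this shape, callers build one object and each client renders it natively, so no regex has to recover structure from a string.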

jeremydoumeng and others added 4 commits June 10, 2026 15:26
Co-authored-by: Jérémy Chaverot <chaverotjrmy7@gmail.com>
…ild recipe

Address review: full indexer: config section with correct indentation,
python3 -m mmore commands matching other docs, env setup before any
commands, and the aarch64 server build steps listed inline instead of
linking out to a fork.
Drop the sys.path hack so imports sit at the top (no E402) and use
importlib.util.find_spec for backend availability checks (no F401).
The script now requires mmore to be installed, matching the docs.
@jeremydoumeng

Author

Hi @JCHAVEROT, all points addressed:

- milvus-lite pin reverted to unconditional, as suggested
- all noqa removed, imports are now legitimately at the top and the availability checks use importlib.util.find_spec, so nothing is suppressed
- docs reworked: python3 -m mmore CLI throughout, valid indexer: config example, env setup before any commands, and the full aarch64 Qdrant build recipe inlined (no external links)
- merged latest master (resolved the colvision extra conflict in pyproject.toml) and updated uv.lock for the qdrant extra

Smoke test passes end-to-end on a GH200 node.
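
For readers unfamiliar with the `find_spec` approach mentioned above, a minimal sketch of an availability check that imports nothing and therefore needs no lint suppression. The helper name `backend_available` and the `AVAILABLE` dict are illustrative, not taken from the script; `pymilvus` and `qdrant_client` are the import names of the two client packages.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if a top-level module is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Probe both backends; a missing package simply yields False.
AVAILABLE = {
    "milvus": backend_available("pymilvus"),
    "qdrant": backend_available("qdrant_client"),
}
print(AVAILABLE)
```

Compared with a `try: import ... except ImportError` probe, this leaves no unused import behind, which is what removes the need for an F401 suppression.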

@JCHAVEROT JCHAVEROT left a comment

Collaborator

I updated the pyright.yml workflow to also include your "qdrant" extra. It seems that, out of the 4 type-check failures, three come from your additions (the remaining one, concerning the processors, cannot be solved currently). Could you take a look, please?

Other than that, the cscs.md doc looks much cleaner and I could run everything successfully on the CSCS cluster.

One last thing: could you please add a sentence to indexing.md noting that Qdrant exists as an alternative to Milvus, with a hyperlink to your doc?

Comment thread test_qdrant_pipeline.py
Comment on lines +37 to +85
class StubSparseEmbedding(BaseSparseEmbedding):
    """Returns a deterministic sparse vector keyed by word hash."""

    def embed_query(self, query: str) -> Dict[int, float]:
        return {hash(w) % 512: 1.0 for w in query.split()}

    def embed_documents(self, texts: List[str]) -> List[Dict[int, float]]:
        return [self.embed_query(t) for t in texts]


_orig_sparse_from_config = _sparse_base.SparseModel.from_config


@classmethod  # type: ignore[misc]
def _stub_sparse_from_config(cls, config):
    return StubSparseEmbedding()


_sparse_base.SparseModel.from_config = _stub_sparse_from_config


# ── Toy corpus ────────────────────────────────────────────────────────────────
DOCS = [
    MultimodalSample(
        text="Barack Obama was born on August 4, 1961, in Honolulu, Hawaii.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="Google was founded by Larry Page and Sergey Brin in September 1998.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="The Eiffel Tower is located on the Champ de Mars in Paris, France.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="The Python programming language was created by Guido van Rossum.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
    MultimodalSample(
        text="Mount Everest is the world's highest mountain above sea level.",
        modalities=[],
        metadata={"source": "wikipedia"},
    ),
]

Collaborator

We already have a class creating fake embeddings and a collection of MultimodalSamples in conftest.py. Please reuse them, updating them if necessary (just be careful not to break other tests relying on them).

Comment thread test_qdrant_pipeline.py

Collaborator

Your test file is not in the tests/ folder, hence it cannot run in the CI along with the other tests (you may also want to change the tests.yml workflow to install the "qdrant" extra).

Collaborator

Please take a look at the variables with an Any type in this file, and try to replace them with the correct types where possible.
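
Where an `Any` stands for the vector client itself, one option is a structural type. The sketch below is an assumption-laden illustration: the name `VectorClient` is hypothetical, only the method names come from the PR description, and the parameter lists are guesses. Whether pydantic would accept such a type on `Retriever.client` depends on the model configuration and is not verified here.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class VectorClient(Protocol):
    """Subset of the client surface shared by both backends.

    Method names come from the PR description; signatures are assumptions.
    """

    def has_collection(self, collection_name: str) -> bool: ...

    def list_collections(self) -> List[str]: ...

    def insert(self, collection_name: str, data: List[Dict[str, Any]]) -> Any: ...

    def close(self) -> None: ...


class DummyClient:
    """Stand-in used only to demonstrate the structural check."""

    def has_collection(self, collection_name: str) -> bool:
        return False

    def list_collections(self) -> List[str]:
        return []

    def insert(self, collection_name: str, data: List[Dict[str, Any]]) -> Any:
        return None

    def close(self) -> None:
        return None


# isinstance only checks that the methods exist, not their signatures.
print(isinstance(DummyClient(), VectorClient))
print(isinstance(object(), VectorClient))
```

A protocol keeps both clients free of a common base class while still letting a type checker flag calls to methods outside the shared surface.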
