2 ChapsVision, Paris, France
3 IRIT, Université de Toulouse, UMR5505 CNRS, F-31400 Toulouse, France
Cross-encoders deliver state-of-the-art ranking effectiveness in information retrieval, but their inference cost is high. This prevents their use as first-stage rankers and also makes re-ranking documents expensive. Prior work has addressed this bottleneck from two largely separate directions: accelerating cross-encoder inference by sparsifying the attention process, or improving first-stage retrieval effectiveness with more complex models such as late-interaction ones. In this work, we propose to bridge these two approaches, building on an in-depth understanding of the internal mechanisms of cross-encoders. Starting from cross-encoders, we show that a new late-interaction-like architecture can be derived by carefully removing detrimental or unnecessary interactions. We name this architecture MICE (Minimal Interaction Cross-Encoders). We extensively evaluate MICE on both in-domain (ID) and out-of-domain (OOD) datasets. MICE reduces inference latency fourfold compared to standard cross-encoders, matching that of late-interaction models like ColBERT, while retaining most of the cross-encoder ID effectiveness and demonstrating superior generalization abilities on OOD datasets.
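For readers unfamiliar with late interaction, the sketch below shows ColBERT-style MaxSim scoring in plain Python. It is a generic illustration of the late-interaction family mentioned above, not the MICE architecture itself; the function names (`dot`, `maxsim_score`) and the toy vectors are assumptions made for this example.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """ColBERT-style late interaction (illustrative, not MICE).

    Query and document tokens are encoded independently; the only
    query-document interaction is this final step: each query token keeps
    its best-matching document token, and the maxima are summed.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


if __name__ == "__main__":
    # Toy 2-dimensional token embeddings (hypothetical values).
    query = [[1.0, 0.0], [0.0, 1.0]]
    document = [[1.0, 0.0], [0.5, 0.5]]
    print(maxsim_score(query, document))
```

Because document token vectors do not depend on the query, they can be precomputed offline, which is where late-interaction models gain their latency advantage over a cross-encoder that processes each query-document pair jointly.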
To install this repository, first ensure you have git and uv installed.
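A quick way to confirm that both prerequisites are on your PATH is sketched below. The helper name `missing_tools` is hypothetical and not part of this repository; it relies only on `shutil.which` from the Python standard library.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # git and uv are the two prerequisites named in this README.
    missing = missing_tools(["git", "uv"])
    if missing:
        print("Please install: " + ", ".join(missing))
    else:
        print("git and uv are both available.")
```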
- Clone the repository and its submodules:

      git clone --recurse-submodules git@github.com:xpmir/mice.git
      cd mice

  If you have already cloned the repository without --recurse-submodules, you can initialize and update them with:

      git submodule update --init --recursive

- Synchronize the Python dependencies using uv:

      uv sync

Experimaestro is used to launch and monitor experiments. You can run an experiment that trains a MICE model based on the MiniLM-L12-v2 backbone with the following command:

    uv run experimaestro run-experiment src/midFusion_training/midfusion_minilm_l4.yaml

We depend on several key packages:

- experimaestro-python for experiment management.
- ir-datasets to access IR collections.

The paper is currently under review; the citation will be updated once notifications are sent.