Problem
database.py hardcodes VECTOR_DIMS = 384, which matches the default BAAI/bge-small-en-v1.5 model. Changing TROVE_EMBEDDING_MODEL to any model with different output dimensions (e.g. intfloat/multilingual-e5-large at 1024, or sentence-transformers/paraphrase-multilingual-mpnet-base-v2 at 768) causes:
sqlite3.OperationalError: Dimension mismatch for inserted vector for the "embedding" column. Expected 384 dimensions but received 1024.
This also blocks multilingual use, since all fastembed-supported multilingual models output either 384, 768, or 1024 dims, and only one of those sizes matches the hardcoded value.
Suggested fix
Derive VECTOR_DIMS from the actual embedding model at init time:
from fastembed import TextEmbedding

def get_vector_dims(model_name: str) -> int:
    # Embed one short probe string and measure the returned vector's length.
    model = TextEmbedding(model_name=model_name)
    return len(list(model.embed(["dimension probe"]))[0])
Then use the result when creating the chunks_vec virtual table:
CREATE VIRTUAL TABLE IF NOT EXISTS chunks_vec USING vec0(
    embedding float[{vector_dims}]
);
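As a rough sketch of how the templated statement could be built safely (the helper name build_chunks_vec_ddl is hypothetical and not part of the current codebase), the dimension can be validated as a positive integer before it is interpolated into the SQL. Note that IF NOT EXISTS leaves an existing table untouched, so a database already created with 384 dims would presumably still need to be rebuilt after switching models.

```python
def build_chunks_vec_ddl(vector_dims: int) -> str:
    """Return the CREATE statement for chunks_vec with the given dimension.

    Hypothetical helper: the table and column names come from the issue text,
    everything else is an assumption for illustration.
    """
    # Reject anything that is not a positive integer, since the value is
    # interpolated directly into the SQL text.
    if not isinstance(vector_dims, int) or isinstance(vector_dims, bool):
        raise ValueError(f"vector_dims must be an int, got {vector_dims!r}")
    if vector_dims <= 0:
        raise ValueError(f"vector_dims must be positive, got {vector_dims}")
    return (
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks_vec USING vec0(\n"
        f"    embedding float[{vector_dims}]\n"
        ");"
    )


if __name__ == "__main__":
    # In practice the value would come from get_vector_dims(model_name).
    print(build_chunks_vec_ddl(1024))
```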
Secondary issue: TROVE_PATHS splits on :, breaks Windows drive letters
C:\Users\foo\Documents gets split into C and \Users\foo\Documents. Fix: use os.pathsep (; on Windows, : on Linux) instead of hardcoded ":".
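A minimal sketch of the os.pathsep approach, assuming TROVE_PATHS is read as a single delimited string (the function name split_trove_paths and the sep parameter are hypothetical, added here only so both platforms' behavior can be shown):

```python
import os


def split_trove_paths(raw: str, sep: str = os.pathsep) -> list[str]:
    """Split a TROVE_PATHS-style value on the platform's path-list separator.

    Hypothetical helper: sep defaults to os.pathsep (";" on Windows,
    ":" on POSIX) and is a parameter only to make the example testable.
    """
    # Drop empty entries so a trailing separator does not yield "".
    return [part.strip() for part in raw.split(sep) if part.strip()]


if __name__ == "__main__":
    # With ";" as the separator, the drive-letter colon is left alone.
    print(split_trove_paths(r"C:\Users\foo\Documents;D:\notes", sep=";"))
```

With this change, Windows users would separate multiple paths with ; while existing : separated values on Linux keep working.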
Environment
Windows 11
mcp-trove-crunchtools 0.3.0 via uvx
fastembed with intfloat/multilingual-e5-large