Bug: VECTOR_DIMS hardcoded to 384, ignores TROVE_EMBEDDING_MODEL #7

@Ged-fi

Problem
database.py hardcodes VECTOR_DIMS = 384, which matches the default BAAI/bge-small-en-v1.5 model. Changing TROVE_EMBEDDING_MODEL to any model with different output dimensions (e.g. intfloat/multilingual-e5-large at 1024, or sentence-transformers/paraphrase-multilingual-mpnet-base-v2 at 768) causes:

sqlite3.OperationalError: Dimension mismatch for inserted vector for the "embedding" column. Expected 384 dimensions but received 1024.

This also limits multilingual use: the fastembed-supported multilingual models output 384, 768, or 1024 dims, and only the 384-dim case matches the hardcoded value.

Suggested fix
Derive VECTOR_DIMS from the actual embedding model at init time:

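from fastembed import TextEmbedding

def get_vector_dims(model_name: str) -> int:
    # Load the configured model and embed one throwaway string;
    # the length of the resulting vector is the model's output dimension.
    model = TextEmbedding(model_name=model_name)
    return len(next(iter(model.embed(["dimension probe"]))))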

Then use the result when creating the chunks_vec virtual table:

CREATE VIRTUAL TABLE IF NOT EXISTS chunks_vec USING vec0(
    embedding float[{vector_dims}]
);
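
To make the templating step concrete, here is a minimal sketch of how the DDL could be built from the probed value. The helper name chunks_vec_ddl is hypothetical and not taken from database.py; it only assumes the table definition quoted above. Since the dimension is interpolated into SQL rather than bound as a parameter, the sketch checks that it is a positive integer first.

```python
def chunks_vec_ddl(vector_dims: int) -> str:
    """Build the CREATE statement for chunks_vec (hypothetical helper).

    The dimension is validated because it is formatted into the SQL text
    instead of being passed as a bound parameter.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(vector_dims, bool) or not isinstance(vector_dims, int):
        raise ValueError(f"vector_dims must be an integer, got {vector_dims!r}")
    if vector_dims <= 0:
        raise ValueError(f"vector_dims must be positive, got {vector_dims}")
    return (
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks_vec USING vec0(\n"
        f"    embedding float[{vector_dims}]\n"
        ");"
    )


if __name__ == "__main__":
    # 1024 is the dimension reported for intfloat/multilingual-e5-large above.
    print(chunks_vec_ddl(1024))
```

Note that CREATE VIRTUAL TABLE IF NOT EXISTS will not resize a chunks_vec table that already exists with 384 dims, so an existing database would presumably still need a rebuild after switching models.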

Secondary issue: TROVE_PATHS splits on :, breaks Windows drive letters
A path such as C:\Users\foo\Documents gets split into C and \Users\foo\Documents. Fix: split on os.pathsep (; on Windows, : on Linux and macOS) instead of the hardcoded ":".
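
A rough sketch of that fix, assuming TROVE_PATHS is read as a single string from the environment. The function name split_trove_paths and its sep parameter are hypothetical; the parameter exists only so both platforms' behavior can be exercised on one machine.

```python
import os


def split_trove_paths(raw: str, sep: str = os.pathsep) -> list[str]:
    """Split a TROVE_PATHS-style value into individual paths.

    Uses the platform's path-list separator by default, so Windows drive
    letters such as "C:" are not treated as separators.
    """
    # Drop empty entries produced by leading, trailing, or doubled separators.
    return [part.strip() for part in raw.split(sep) if part.strip()]


if __name__ == "__main__":
    # Hypothetical usage: read the variable from the environment.
    print(split_trove_paths(os.environ.get("TROVE_PATHS", "")))
```

Existing Linux and macOS configurations that separate entries with ":" should keep working unchanged, since os.pathsep is ":" there.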

Environment
Windows 11

mcp-trove-crunchtools 0.3.0 via uvx

fastembed with intfloat/multilingual-e5-large
