feat: support Gemini Embedding 2 as native multimodal embedding provider #1216

@lost9999

Problem

GBrain v0.37 has a useful multimodal embedding path, but the native Google recipe only exposes gemini-embedding-001 as a text embedding model. embedMultimodal() currently rejects the Google recipe because supports_multimodal is not set, and the error message points operators to Voyage.

Google has shipped Gemini Embedding 2 / gemini-embedding-2-preview, a native multimodal embedding model that maps text, images, video, audio, and documents into a single embedding space.

Evidence

Google docs: https://ai.google.dev/gemini-api/docs/embeddings

  • gemini-embedding-2 / gemini-embedding-2-preview supports multimodal input: text, images, video, audio, PDF.
  • Output dimensions are configurable from 128 to 3072; 768, 1536, and 3072 are the recommended sizes.
  • Current GBrain Google recipe only lists gemini-embedding-001 under embedding.
  • Current embedMultimodal() error says: Today: voyage:voyage-multimodal-3.

Local validation

A LiteLLM proxy in front of Gemini can expose gemini-embedding-2-preview, and text embeddings work:

POST /v1/embeddings model=gemini-embedding-2-preview input="hello world"
→ OK, 3072 dims

But the OpenAI-compatible multimodal content-array shape that GBrain sends through embedMultimodalOpenAICompat() fails via LiteLLM:

POST /v1/embeddings
input=[{type:"image_url", image_url:{url:"data:image/png;base64,..."}}]
→ 400 INVALID_ARGUMENT
Invalid value at 'requests[0].content.parts[0]' (text), Starting an object on a scalar field

So the LiteLLM path is not currently sufficient as a drop-in bridge for Gemini multimodal embeddings.
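
The 400 above suggests the OpenAI-style content item ends up in a field that expects plain text. Below is a minimal sketch of the mapping a native path would need. It assumes the Gemini REST part shape with inlineData { mimeType, data }. The type and function names are hypothetical and not taken from GBrain's source:

```typescript
// Hypothetical types for illustration; neither is copied from GBrain.
type OpenAICompatItem =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Assumed Gemini REST part shape: plain text or inline base64 data.
type GeminiPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Convert one OpenAI-compatible content item into a Gemini-style part.
function toGeminiPart(item: OpenAICompatItem): GeminiPart {
  if (item.type === "text") {
    return { text: item.text };
  }
  // Only base64 data: URLs are handled; remote URLs would need a fetch step.
  const match = /^data:([^;,]+);base64,(.+)$/.exec(item.image_url.url);
  if (!match) {
    throw new Error("expected a base64 data: URL for inline image input");
  }
  return { inlineData: { mimeType: match[1], data: match[2] } };
}
```

Splitting the data URL into MIME type and payload is the step the OpenAI-compatible shape does not do, which is one reason a dedicated Google serializer may be simpler than fixing the proxy path.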

Desired behavior

Allow configs like:

gbrain config set embedding_multimodal true
gbrain config set embedding_multimodal_model google:gemini-embedding-2-preview
# or google:gemini-embedding-2 when stable

Image ingestion, gbrain reindex --multimodal, and cross-modal query paths should then use Google native multimodal embeddings instead of requiring Voyage.
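
For illustration, here is a sketch of how a provider:model value like the one above could be split before recipe lookup. It assumes the part before the first colon is the provider. parseModelRef and ModelRef are hypothetical names, and GBrain's actual parser may differ:

```typescript
// Hypothetical shape for a parsed "provider:model" config value.
interface ModelRef {
  provider: string;
  model: string;
}

// Split on the first colon only, so model ids that contain colons survive.
function parseModelRef(ref: string): ModelRef {
  const idx = ref.indexOf(":");
  if (idx <= 0 || idx === ref.length - 1) {
    throw new Error(`expected "provider:model", got "${ref}"`);
  }
  return { provider: ref.slice(0, idx), model: ref.slice(idx + 1) };
}

const ref = parseModelRef("google:gemini-embedding-2-preview");
console.log(ref.provider, ref.model);
```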

Why this matters

  • Single-vendor deployment for users already on Gemini.
  • Native all-modal support: text, images, video, audio, documents.
  • Avoids Voyage free-tier 429s during vault image ingestion.
  • Aligns with GBrain's provider-agnostic embedding architecture.

Implementation sketch

  • Extend src/core/ai/recipes/google.ts with Gemini Embedding 2 model(s).
  • Add Google native embedMultimodal request serialization for models/*:embedContent with parts containing text and inline image data.
  • Handle output dimensions consistently with GBrain schema / embedding_image and embedding_multimodal columns.
  • Add tests for text-only, image-only, and text+image request payloads, and for model validation.
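
A rough sketch of the request serialization described above, to make the payload shape concrete. It assumes the public Gemini REST endpoint under generativelanguage.googleapis.com/v1beta, the content.parts body layout, and the outputDimensionality field. buildEmbedContentRequest is a hypothetical helper, and authentication and the HTTP call itself are left out:

```typescript
// Assumed Gemini REST part shape: plain text or inline base64 data.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface EmbedContentRequest {
  url: string;
  body: {
    model: string;
    content: { parts: Part[] };
    outputDimensionality?: number;
  };
}

// Assumed base URL for the Gemini API; adjust if GBrain configures it elsewhere.
const BASE_URL = "https://generativelanguage.googleapis.com/v1beta";

// Build (but do not send) an embedContent request for text and/or image parts.
function buildEmbedContentRequest(
  model: string,
  parts: Part[],
  outputDimensionality?: number,
): EmbedContentRequest {
  if (parts.length === 0) {
    throw new Error("at least one part is required");
  }
  // Range taken from the issue's Evidence section (128-3072).
  if (
    outputDimensionality !== undefined &&
    (!Number.isInteger(outputDimensionality) ||
      outputDimensionality < 128 ||
      outputDimensionality > 3072)
  ) {
    throw new Error("outputDimensionality must be an integer in 128..3072");
  }
  return {
    url: `${BASE_URL}/models/${model}:embedContent`,
    body: {
      model: `models/${model}`,
      content: { parts },
      ...(outputDimensionality !== undefined ? { outputDimensionality } : {}),
    },
  };
}
```

Keeping the builder pure (no network call) would let the text-only, image-only, and text+image tests assert on the payload directly.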
