Problem
GBrain v0.37 has a useful multimodal embedding path, but the native Google recipe only exposes gemini-embedding-001 as a text embedding model. embedMultimodal() currently rejects the Google recipe because supports_multimodal is not set, and the error message points operators to Voyage.
Google has shipped Gemini Embedding 2 / gemini-embedding-2-preview, a native multimodal embedding model that maps text, images, video, audio, and documents into a single embedding space.
Evidence
Google docs: https://ai.google.dev/gemini-api/docs/embeddings
- gemini-embedding-2 / gemini-embedding-2-preview supports multimodal input: text, images, video, audio, PDF.
- Output dimensions support 128-3072, recommended 768/1536/3072.
- Current GBrain Google recipe only lists gemini-embedding-001 under embedding.
- Current embedMultimodal() error says: Today: voyage:voyage-multimodal-3.
Local validation
A LiteLLM proxy in front of Gemini can expose gemini-embedding-2-preview, and text embeddings work:
POST /v1/embeddings model=gemini-embedding-2-preview input="hello world"
→ OK, 3072 dims
But the OpenAI-compatible multimodal content-array shape that GBrain sends through embedMultimodalOpenAICompat() fails via LiteLLM:
POST /v1/embeddings
input=[{type:"image_url", image_url:{url:"data:image/png;base64,..."}}]
→ 400 INVALID_ARGUMENT
Invalid value at 'requests[0].content.parts[0]' (text), Starting an object on a scalar field
So the LiteLLM path is not currently sufficient as a drop-in bridge for Gemini multimodal embeddings.
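The error text suggests the OpenAI-style content object ends up in a slot where Gemini expects a plain text string. A native path would instead need to translate each content item into a Gemini part. The sketch below is one way to do that, assuming the `inline_data` part shape (`mime_type`, `data`) from the public Gemini REST API. The type names and `toGeminiPart` are hypothetical and not taken from the GBrain source.

```typescript
// Hypothetical types for illustration; not GBrain code.
type OpenAIContentItem =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Assumed Gemini part shape (snake_case REST form).
type GeminiPart =
  | { text: string }
  | { inline_data: { mime_type: string; data: string } };

// Convert one OpenAI-style content item into a Gemini-style part.
// Only base64 data URLs are handled here; remote URLs are rejected.
function toGeminiPart(item: OpenAIContentItem): GeminiPart {
  if (item.type === "text") {
    return { text: item.text };
  }
  // Split "data:<mime>;base64,<payload>" into MIME type and payload.
  const match = /^data:([^;,]+);base64,(.+)$/.exec(item.image_url.url);
  if (!match) {
    throw new Error("expected a base64 data URL for image input");
  }
  return { inline_data: { mime_type: match[1], data: match[2] } };
}
```

Rejecting non-data URLs keeps the sketch small. Whether GBrain should fetch remote images before embedding is a separate design question.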
Desired behavior
Allow configs like:
gbrain config set embedding_multimodal true
gbrain config set embedding_multimodal_model google:gemini-embedding-2-preview
# or google:gemini-embedding-2 when stable
Then image ingestion / gbrain reindex --multimodal / cross-modal query paths should use Google native multimodal embeddings instead of requiring Voyage.
Why this matters
- Single-vendor deployment for users already on Gemini.
- Native all-modal support: text, images, video, audio, documents.
- Avoids Voyage free-tier 429s during vault image ingestion.
- Aligns with GBrain's provider-agnostic embedding architecture.
Implementation sketch
- Extend src/core/ai/recipes/google.ts with Gemini Embedding 2 model(s).
- Add Google native embedMultimodal request serialization for models/*:embedContent with parts containing text and inline image data.
- Handle output dimensions consistently with GBrain schema / embedding_image and embedding_multimodal columns.
- Add tests for text-only, image-only, text+image request payloads and model validation.
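The serialization step in the list above could look roughly like the following. It is a minimal sketch and not the intended implementation. The body field names (`model`, `content.parts`, `inline_data`, `outputDimensionality`) are assumptions based on the public Gemini REST API. `EmbedInput`, `EmbedContentBody`, and `buildEmbedContentBody` are hypothetical names. Whether mixed text and image parts yield a single combined vector should be confirmed against the Google docs.

```typescript
// Assumed Gemini part shape; hypothetical type names throughout.
type EmbedPart =
  | { text: string }
  | { inline_data: { mime_type: string; data: string } };

interface EmbedInput {
  text?: string;
  image?: { mimeType: string; base64: string };
}

interface EmbedContentBody {
  model: string;
  content: { parts: EmbedPart[] };
  outputDimensionality?: number;
}

// Dimension range quoted in the Evidence section (128-3072).
const MIN_DIMS = 128;
const MAX_DIMS = 3072;

// Build a request body for models/*:embedContent from text and/or one image.
function buildEmbedContentBody(
  model: string,
  input: EmbedInput,
  dims?: number,
): EmbedContentBody {
  const parts: EmbedPart[] = [];
  if (input.text !== undefined && input.text.length > 0) {
    parts.push({ text: input.text });
  }
  if (input.image) {
    parts.push({
      inline_data: { mime_type: input.image.mimeType, data: input.image.base64 },
    });
  }
  if (parts.length === 0) {
    throw new Error("at least one of text or image is required");
  }
  const body: EmbedContentBody = { model: `models/${model}`, content: { parts } };
  if (dims !== undefined) {
    // Reject dimensions outside the documented range before calling the API.
    if (!Number.isInteger(dims) || dims < MIN_DIMS || dims > MAX_DIMS) {
      throw new RangeError(`dims must be an integer in ${MIN_DIMS}-${MAX_DIMS}`);
    }
    body.outputDimensionality = dims;
  }
  return body;
}
```

The three test cases named above (text-only, image-only, text+image) map directly onto calls with only `text`, only `image`, or both set.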