feat: add native Google/Gemini Embedding 2 support with Parts API #718
ZaynJarvis wants to merge 16 commits into volcengine:main
Conversation
- Replace OpenAI-compatible implementation with native Gemini API
- Support task-specific embeddings (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Add Matryoshka dimension reduction support
- Include chunking for oversized texts
- Add configuration examples and documentation
- Support both simple and key=value parameter formats
- Use Parts API for future multimodal capability

- Remove stray server.pid file
- Fix base URL to https://generativelanguage.googleapis.com/v1beta
- Use x-goog-api-key header instead of URL parameter
- Remove model field from request body (already in URL)
- Follow official Google API format exactly

- Remove support for text-embedding-004 and text-embedding-005
- Focus implementation on gemini-embedding-2-preview only
- Add model validation to ensure only supported model is used
- Update documentation to reflect single model support
- Clarify that this is specifically for Gemini Embedding 2

- Covers basic functionality, advanced features, error handling
- Includes 11 test scenarios with expected outcomes
- Provides configuration examples and debug commands
- Ready for real-world testing with provided API key
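The Parts request format mentioned in the commits above can be sketched as follows. This is a minimal illustration based on the field names described in this PR (`content.parts`, `taskType`, `outputDimensionality`); the helper name is hypothetical and the real embedder builds the body inside the class:

```python
import json
from typing import Any, Dict, Optional


def build_embed_request(text: str,
                        task_type: Optional[str] = None,
                        output_dimensionality: Optional[int] = None) -> Dict[str, Any]:
    """Build an embedContent request body using the Parts format (illustrative)."""
    # Parts format keeps the door open for future multimodal inputs
    body: Dict[str, Any] = {"content": {"parts": [{"text": text}]}}
    if task_type is not None:
        body["taskType"] = task_type  # e.g. RETRIEVAL_QUERY
    if output_dimensionality is not None:
        body["outputDimensionality"] = output_dimensionality  # Matryoshka truncation
    return body


req = build_embed_request("hello world", "RETRIEVAL_DOCUMENT", 1024)
print(json.dumps(req, indent=2))
```

The optional fields are only added when set, so a plain call produces the minimal `{"content": {"parts": [...]}}` body.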
@qin-ctx /review
qin-ctx left a comment
Review Summary
Found 4 blocking bugs that prevent embed() from functioning at runtime, plus 4 non-blocking suggestions.
Blocking Issues
1. `_estimate_tokens()` method is not defined anywhere in the class hierarchy — every `embed()` call will crash with `AttributeError`
2. `_chunk_text()` method is not defined in the embedder class hierarchy — chunking logic will crash
3. `self.max_tokens` vs `self._max_tokens` — attribute name mismatch causes `AttributeError`
4. `cfg.max_tokens` — field does not exist on `EmbeddingModelConfig`, factory lambda will crash
Non-blocking
- Inconsistent camelCase/snake_case in API request body (`taskType` vs `output_dimensionality`)
- No retry mechanism for HTTP requests
- No automated unit tests
- Gemini model row added to Volcengine model table in docs
```python
def _chunk_and_embed(self, text: str, is_query: bool = False) -> EmbedResult:
    """Chunk oversized text and average the embeddings.

    Args:
```
[Bug] (blocking) _chunk_text() is not defined in the embedder class hierarchy.
_chunk_and_embed() calls self._chunk_text(text, self.max_tokens), but this method does not exist in GoogleDenseEmbedder or any of its parent classes. The only _chunk_text in the codebase is a @staticmethod on SessionCompressor (in openviking/session/compressor.py), which is unrelated to embedders.
Also, self.max_tokens should be self._max_tokens (same attribute name mismatch as above).
Checked: _chunk_text() is defined in EmbedderBase (base.py:121) and inherited. Real bug found and fixed: the call was passing two args (self._chunk_text(text, self.max_tokens)) to a method that only accepts one. Removed the extra argument.
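The chunk-and-average pattern the fix restores can be sketched as below. The helpers are toy stand-ins (the real `_chunk_text` and embed call live on the embedder classes); the point is that the chunker takes only the text, matching the base-class signature:

```python
from typing import Callable, List


def chunk_and_embed(text: str,
                    chunk_text: Callable[[str], List[str]],
                    embed_one: Callable[[str], List[float]]) -> List[float]:
    """Split oversized text, embed each chunk, and average the vectors."""
    chunks = chunk_text(text)  # single argument, per the base-class signature
    vectors = [embed_one(c) for c in chunks]
    dim = len(vectors[0])
    # element-wise mean across chunk embeddings
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# toy stand-ins: fixed-width chunker and a fake 2-dim embedder
toy_chunker = lambda t: [t[i:i + 4] for i in range(0, len(t), 4)]
toy_embed = lambda c: [float(len(c)), 1.0]
print(chunk_and_embed("abcdefgh", toy_chunker, toy_embed))  # -> [4.0, 1.0]
```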
| "api_key": cfg.api_key, | ||
| "api_base": cfg.api_base, | ||
| "dimension": cfg.dimension, | ||
| **({"query_param": cfg.query_param} if cfg.query_param else {}), |
[Bug] (blocking) cfg.max_tokens does not exist on EmbeddingModelConfig.
EmbeddingModelConfig (pydantic model with extra="forbid") has no max_tokens field. Accessing cfg.max_tokens will raise AttributeError. The max_tokens field exists on VLMConfig but not on EmbeddingModelConfig.
Suggested fix: Either add a max_tokens field to EmbeddingModelConfig, or remove this line and handle the default inside GoogleDenseEmbedder.__init__ (which already defaults to 8192).
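The anticipated failure mode can be illustrated with a slotted stand-in class (the class name here is hypothetical; strict pydantic models with `extra="forbid"` behave similarly in that only declared fields exist as attributes, and as the follow-up reply notes, the real `EmbeddingModelConfig` does declare the field):

```python
class EmbeddingModelConfigSketch:
    """Stand-in for a strict config: only declared fields exist as attributes."""
    __slots__ = ("api_key", "api_base", "dimension")

    def __init__(self, api_key: str, api_base: str, dimension: int) -> None:
        self.api_key = api_key
        self.api_base = api_base
        self.dimension = dimension


cfg = EmbeddingModelConfigSketch("key", "https://example.invalid", 3072)
try:
    _ = cfg.max_tokens  # undeclared field -> AttributeError at access time
except AttributeError:
    print("AttributeError")
```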
Checked: max_tokens field IS defined on EmbeddingModelConfig (embedding_config.py:54-57). No fix needed.
```python
# Build request body using Parts API
request_body = {"content": {"parts": [{"text": text}]}}

# Add task-specific parameters
```
[Design] (non-blocking) No retry mechanism for API requests.
This uses raw requests.post() without any retry logic. Other providers (Jina, Voyage, OpenAI) benefit from the OpenAI client's built-in retry mechanism. The base module already provides exponential_backoff_retry in openviking/models/embedder/base.py which could be used here to handle transient network failures.
Fixed: wrapped requests.post with exponential_backoff_retry from base module, retrying on ConnectionError and Timeout.
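A generic exponential-backoff wrapper along these lines is sketched below. The real `exponential_backoff_retry` in `openviking/models/embedder/base.py` may have a different signature; the names and defaults here are illustrative, and the flaky callable stands in for the `requests.post` call:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def exponential_backoff_retry(fn: Callable[[], T],
                              retry_on: Tuple[Type[BaseException], ...],
                              max_attempts: int = 3,
                              base_delay: float = 0.5) -> T:
    """Call fn, retrying on the given exceptions with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


# usage sketch: a transiently failing call that succeeds on the third try
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


print(exponential_backoff_retry(flaky, (ConnectionError,), base_delay=0.01))  # -> ok
```

Only the listed exception types are retried, so non-transient errors (bad request, auth failure) still fail fast.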
```python
}


class GoogleDenseEmbedder(DenseEmbedderBase):
```
[Suggestion] (non-blocking) No automated unit tests for GoogleDenseEmbedder.
Consider adding unit tests covering at least: constructor validation (missing API key, unsupported model, dimension exceeding max), parameter parsing (_parse_param_string, _build_request_params), and mocked API response handling.
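A minimal shape for the constructor-validation tests might look like the following. The embedder class here is a toy stand-in (the real `GoogleDenseEmbedder` takes a config object and performs more checks); it only mirrors the three validations the review asks to cover:

```python
import unittest


class GoogleDenseEmbedderSketch:
    """Toy stand-in mirroring the constructor checks under discussion."""
    SUPPORTED = {"gemini-embedding-2-preview"}
    MAX_DIM = 3072

    def __init__(self, model: str, api_key: str, dimension: int) -> None:
        if not api_key:
            raise ValueError("api_key is required")
        if model not in self.SUPPORTED:
            raise ValueError(f"unsupported model: {model}")
        if dimension > self.MAX_DIM:
            raise ValueError("dimension exceeds maximum")
        self.model, self.api_key, self.dimension = model, api_key, dimension


class TestConstructorValidation(unittest.TestCase):
    def test_missing_api_key(self):
        with self.assertRaises(ValueError):
            GoogleDenseEmbedderSketch("gemini-embedding-2-preview", "", 1024)

    def test_unsupported_model(self):
        with self.assertRaises(ValueError):
            GoogleDenseEmbedderSketch("text-embedding-004", "k", 1024)

    def test_dimension_too_large(self):
        with self.assertRaises(ValueError):
            GoogleDenseEmbedderSketch("gemini-embedding-2-preview", "k", 4096)


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestConstructorValidation))
print(result.wasSuccessful())
```

Mocked API-response tests would follow the same pattern, patching the HTTP call with `unittest.mock` so no network is needed.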
docs/en/guides/01-configuration.md (outdated)

```
@@ -128,6 +128,7 @@ Embedding model configuration for vector search, supporting dense, sparse, and h
|-------|-----------|------------|-------|
| `doubao-embedding-vision-250615` | 1024 | multimodal | Recommended |
| `doubao-embedding-250615` | 1024 | text | Text only |
```
[Suggestion] (non-blocking) gemini-embedding-2-preview is added to the "Available Models" table which currently only contains Volcengine doubao-* models. This could be confusing since they are from different providers. Consider either adding a "Provider" column to the table, or listing Google models in a separate table under the Google provider section below.
Fixed: added a Provider column to the table so it is clear which provider each model belongs to.
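With a Provider column, the resulting table might look like the sketch below. The exact header names are assumed from the diff context; the model rows are taken from this PR:

```markdown
| Model | Provider | Dimensions | Type | Notes |
|-------|----------|------------|------|-------|
| `doubao-embedding-vision-250615` | Volcengine | 1024 | multimodal | Recommended |
| `doubao-embedding-250615` | Volcengine | 1024 | text | Text only |
| `gemini-embedding-2-preview` | Google | 3072 | text | MRL support |
```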
- Fix _chunk_text called with extra arg (real bug: base method only accepts text)
- Fix inconsistent API key naming: output_dimensionality -> outputDimensionality
- Add exponential_backoff_retry for transient network failures
- Add Provider column to docs model table for clarity
The four bugs are all possibly caused by #741, will fix now.
- Add max_tokens property, _estimate_tokens, _chunk_text (+ helpers) to GoogleDenseEmbedder — these were removed from base class in main
- Restore max_tokens field on EmbeddingModelConfig for google factory
- Both snake_case (task_type) and camelCase (taskType) are accepted by the API
- All task type values produce identical embeddings in this model version
- Parameter is forwarded for forward compatibility with future model versions
gemini-embedding-2-preview silently ignores taskType — verified 2026-03-19 at full 3072 dims, all task types return bit-for-bit identical vectors. Remove query_param, document_param, _parse_param_string, _build_request_params. Add note in docstring. Update factory and tests accordingly.
Overview
This PR adds native Google Gemini Embedding 2 support using the official Google API instead of OpenAI-compatible format.
Status: 🔍 Code reviewed. Pending real-world testing.
Key Changes
- Uses the native Google API endpoint (`/v1beta/models/gemini-embedding-2-preview:embedContent`) with Parts format
- Supports `gemini-embedding-2-preview` (3072 dimensions with MRL support)
- Supports the `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, and `CLUSTERING` task types
- Accepts both the simple format (e.g., `'RETRIEVAL_QUERY'`) and key=value format (e.g., `'task_type=RETRIEVAL_QUERY,output_dimensionality=1024'`)
- Matryoshka dimension reduction via the `output_dimensionality` parameter

References
- `is_query` parameter pattern from PR feat(embedding): combine document embedder and query embedder to avoi… #702 for a consistent embedder interface

Testing Needed