feat: add native Google/Gemini Embedding 2 support with Parts API #718

Open
ZaynJarvis wants to merge 16 commits into volcengine:main from ZaynJarvis:feat/google-embedding-native-api

Conversation


@ZaynJarvis (Collaborator) commented Mar 17, 2026

Overview

This PR adds native Google Gemini Embedding 2 support using the official Google API instead of OpenAI-compatible format.

Status: 🔍 Code reviewed; pending real-world testing.

Key Changes

  • Native API Integration: Uses Google's native embedding API endpoint (/v1beta/models/gemini-embedding-2-preview:embedContent) with Parts format
  • Gemini Embedding 2 Only: Focused implementation supporting only gemini-embedding-2-preview (3072 dimensions with MRL support)
  • Task-Specific Embeddings: Supports RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, and CLUSTERING task types
  • Flexible Parameter Format: Supports both simple format (e.g., 'RETRIEVAL_QUERY') and key=value format (e.g., 'task_type=RETRIEVAL_QUERY,output_dimensionality=1024')
  • Matryoshka Reduction: Built-in support for dimension reduction using output_dimensionality parameter
  • Future Multimodal Ready: Uses Parts API structure that can be extended for multimodal content
  • Chunking Support: Automatic text chunking and averaging for oversized inputs
  • Updated Documentation: Added configuration examples and provider documentation

Testing Needed

  • Real-world testing with a Google API key
  • Verify task-specific embeddings work correctly
  • Test Matryoshka dimension reduction
  • Validate chunking for oversized inputs

- Replace OpenAI-compatible implementation with native Gemini API
- Support task-specific embeddings (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Add Matryoshka dimension reduction support
- Include chunking for oversized texts
- Add configuration examples and documentation
- Support both simple and key=value parameter formats
- Use Parts API for future multimodal capability
- Remove stray server.pid file
- Fix base URL to https://generativelanguage.googleapis.com/v1beta
- Use x-goog-api-key header instead of URL parameter
- Remove model field from request body (already in URL)
- Follow official Google API format exactly
- Remove support for text-embedding-004 and text-embedding-005
- Focus implementation on gemini-embedding-2-preview only
- Add model validation to ensure only supported model is used
- Update documentation to reflect single model support
- Clarify that this is specifically for Gemini Embedding 2
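
The "simple and key=value parameter formats" mentioned in the commits above could be parsed along these lines. This is a hypothetical sketch; `parse_embed_params` is not the PR's actual helper, and the key names are assumptions based on the examples in the description.

```python
# Hypothetical parser for the two parameter formats described above:
#   "RETRIEVAL_QUERY"  or  "task_type=RETRIEVAL_QUERY,output_dimensionality=1024"
def parse_embed_params(spec: str) -> dict:
    """Parse a simple task-type string or a comma-separated key=value string."""
    if "=" not in spec:
        # Simple format: the whole string is the task type.
        return {"task_type": spec}
    out = {}
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        out[key.strip()] = value.strip()
    if "output_dimensionality" in out:
        out["output_dimensionality"] = int(out["output_dimensionality"])
    return out
```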
ZaynJarvis and others added 2 commits March 18, 2026 09:32
- Covers basic functionality, advanced features, error handling
- Includes 11 test scenarios with expected outcomes
- Provides configuration examples and debug commands
- Ready for real-world testing with provided API key
@ZaynJarvis ZaynJarvis marked this pull request as ready for review March 18, 2026 03:46
@ZaynJarvis (Collaborator Author)

@qin-ctx /review

@qin-ctx (Collaborator) left a comment

Review Summary

Found 4 blocking bugs that prevent embed() from functioning at runtime, plus 4 non-blocking suggestions.

Blocking Issues

  1. _estimate_tokens() method is not defined anywhere in the class hierarchy — every embed() call will crash with AttributeError
  2. _chunk_text() method is not defined in the embedder class hierarchy — chunking logic will crash
  3. self.max_tokens vs self._max_tokens — attribute name mismatch causes AttributeError
  4. cfg.max_tokens — field does not exist on EmbeddingModelConfig, factory lambda will crash

Non-blocking

  1. Inconsistent camelCase/snake_case in API request body (taskType vs output_dimensionality)
  2. No retry mechanism for HTTP requests
  3. No automated unit tests
  4. Gemini model row added to Volcengine model table in docs

def _chunk_and_embed(self, text: str, is_query: bool = False) -> EmbedResult:
"""Chunk oversized text and average the embeddings.

Args:
@qin-ctx (Collaborator):

[Bug] (blocking) _chunk_text() is not defined in the embedder class hierarchy.

_chunk_and_embed() calls self._chunk_text(text, self.max_tokens), but this method does not exist in GoogleDenseEmbedder or any of its parent classes. The only _chunk_text in the codebase is a @staticmethod on SessionCompressor (in openviking/session/compressor.py), which is unrelated to embedders.

Also, self.max_tokens should be self._max_tokens (same attribute name mismatch as above).

@ZaynJarvis (Collaborator Author):

Checked: _chunk_text() is defined in EmbedderBase (base.py:121) and inherited. Real bug found and fixed: the call was passing two args (self._chunk_text(text, self.max_tokens)) to a method that only accepts one. Removed the extra argument.
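
The chunk-and-average strategy under discussion can be sketched in isolation as follows. The real `_chunk_text` lives in `EmbedderBase` (base.py:121) and takes only the text, so the standalone helpers below, their names, and the rough 4-characters-per-token estimate are assumptions for illustration only.

```python
# Illustrative sketch of chunk-and-average embedding for oversized inputs.
# Names and the chars-per-token heuristic are assumptions, not PR code.
def chunk_text(text: str, max_tokens: int = 8192, chars_per_token: int = 4):
    """Naive character-based chunking using a rough token estimate."""
    size = max_tokens * chars_per_token
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

def chunk_and_embed(text, embed_fn, max_tokens=8192):
    """Embed each chunk and return the element-wise mean vector."""
    chunks = chunk_text(text, max_tokens)
    vectors = [embed_fn(c) for c in chunks]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Averaging loses positional nuance but keeps the result in the same vector space, which is the usual trade-off for oversized inputs.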

"api_key": cfg.api_key,
"api_base": cfg.api_base,
"dimension": cfg.dimension,
**({"query_param": cfg.query_param} if cfg.query_param else {}),
@qin-ctx (Collaborator):

[Bug] (blocking) cfg.max_tokens does not exist on EmbeddingModelConfig.

EmbeddingModelConfig (pydantic model with extra="forbid") has no max_tokens field. Accessing cfg.max_tokens will raise AttributeError. The max_tokens field exists on VLMConfig but not on EmbeddingModelConfig.

Suggested fix: Either add a max_tokens field to EmbeddingModelConfig, or remove this line and handle the default inside GoogleDenseEmbedder.__init__ (which already defaults to 8192).

@ZaynJarvis (Collaborator Author):

Checked: max_tokens field IS defined on EmbeddingModelConfig (embedding_config.py:54-57). No fix needed.

# Build request body using Parts API
request_body = {"content": {"parts": [{"text": text}]}}

# Add task-specific parameters
@qin-ctx (Collaborator):

[Design] (non-blocking) No retry mechanism for API requests.

This uses raw requests.post() without any retry logic. Other providers (Jina, Voyage, OpenAI) benefit from the OpenAI client's built-in retry mechanism. The base module already provides exponential_backoff_retry in openviking/models/embedder/base.py which could be used here to handle transient network failures.

@ZaynJarvis (Collaborator Author):

Fixed: wrapped requests.post with exponential_backoff_retry from base module, retrying on ConnectionError and Timeout.
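
The backoff pattern behind that fix looks roughly like this. The project's actual helper is `exponential_backoff_retry` in `openviking/models/embedder/base.py`, whose exact signature is not shown in this thread, so the version below is a generic sketch.

```python
import random
import time

# Generic exponential-backoff sketch; the real helper in openviking/models/
# embedder/base.py may differ in signature and defaults.
def retry_with_backoff(fn, retryable=(ConnectionError, TimeoutError),
                       max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential delay + jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # exhausted: surface the last error
            # Delay grows as base * 2^attempt, with jitter to avoid thundering herd.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

A typical use would be wrapping the `requests.post` call in a zero-argument closure and passing it as `fn`.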

@ZaynJarvis (Collaborator Author):

will fix.

}


class GoogleDenseEmbedder(DenseEmbedderBase):
@qin-ctx (Collaborator):

[Suggestion] (non-blocking) No automated unit tests for GoogleDenseEmbedder.

Consider adding unit tests covering at least: constructor validation (missing API key, unsupported model, dimension exceeding max), parameter parsing (_parse_param_string, _build_request_params), and mocked API response handling.
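
One such test could look like the following. The response shape `{"embedding": {"values": [...]}}` and the `parse_embed_response` helper are assumptions about the native embedContent API, not code from this PR.

```python
# Sketch of mocked response-handling tests in the spirit of the suggestion
# above. The helper and response shape are assumptions, not PR code.
def parse_embed_response(payload: dict) -> list:
    """Extract the embedding vector from an assumed embedContent response."""
    try:
        return payload["embedding"]["values"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected embedContent response: {payload!r}") from exc

def test_parse_embed_response_ok():
    assert parse_embed_response({"embedding": {"values": [0.1, 0.2]}}) == [0.1, 0.2]

def test_parse_embed_response_malformed():
    try:
        parse_embed_response({"error": {"code": 400}})
    except ValueError:
        pass  # expected: malformed payloads raise rather than returning garbage
    else:
        raise AssertionError("expected ValueError")
```

Constructor validation and parameter parsing could be covered the same way, with the HTTP layer mocked out entirely.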

@@ -128,6 +128,7 @@ Embedding model configuration for vector search, supporting dense, sparse, and h
|-------|-----------|------------|-------|
| `doubao-embedding-vision-250615` | 1024 | multimodal | Recommended |
| `doubao-embedding-250615` | 1024 | text | Text only |
@qin-ctx (Collaborator):

[Suggestion] (non-blocking) gemini-embedding-2-preview is added to the "Available Models" table which currently only contains Volcengine doubao-* models. This could be confusing since they are from different providers. Consider either adding a "Provider" column to the table, or listing Google models in a separate table under the Google provider section below.

@ZaynJarvis (Collaborator Author):

Fixed: added a Provider column to the table so it is clear which provider each model belongs to.

- Fix _chunk_text called with extra arg (real bug: base method only accepts text)
- Fix inconsistent API key naming: output_dimensionality -> outputDimensionality
- Add exponential_backoff_retry for transient network failures
- Add Provider column to docs model table for clarity
@ZaynJarvis (Collaborator Author):

All four bugs were possibly caused by #741; will fix now.

- Add max_tokens property, _estimate_tokens, _chunk_text (+ helpers) to
  GoogleDenseEmbedder — these were removed from base class in main
- Restore max_tokens field on EmbeddingModelConfig for google factory
- Both snake_case (task_type) and camelCase (taskType) are accepted by the API
- All task type values produce identical embeddings in this model version
- Parameter is forwarded for forward compatibility with future model versions
gemini-embedding-2-preview silently ignores taskType — verified 2026-03-19
at full 3072 dims, all task types return bit-for-bit identical vectors.
Remove query_param, document_param, _parse_param_string, _build_request_params.
Add note in docstring. Update factory and tests accordingly.
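
The bit-for-bit check behind that verification note can be expressed as a small helper; here `embed_fn` stands in for a real API call and is an assumption for illustration.

```python
# Sketch of the equality check behind the "identical vectors" verification.
# embed_fn(text, task_type) is a stand-in for a real embedding API call.
def vectors_identical(embed_fn, text, task_types):
    """Return True if every task type yields the exact same vector for text."""
    baseline = embed_fn(text, task_types[0])
    return all(embed_fn(text, t) == baseline for t in task_types[1:])
```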