
[SC-16470] Support keyless Gemini fallback for judge configuration #521

Merged
juanmleng merged 3 commits into main from juan/sc-16470/support-keyless-gemini-fallback-for-judge-configuration on May 28, 2026

Conversation

@juanmleng

Contributor

Pull Request Description

What and why?

Supports keyless Gemini setups for ValidMind judge flows by defaulting to Gemini when OpenAI and Azure are not explicitly configured, instead of requiring Gemini API keys up front. This also extends the DeepEval scorer path to work without Gemini keys and updates tests and notebook guidance to reflect the new keyed and keyless Gemini behavior.

This is needed so enterprise Gemini users can run evaluations in environments where Gemini access is available without API keys.
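The fallback rule described above can be sketched roughly as follows. This is an illustration only, not the code in validmind/ai/utils.py: the function name `resolve_judge_provider` and the environment variable names are assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; the real checks in validmind/ai/utils.py may differ.
OPENAI_VARS = ("OPENAI_API_KEY",)
AZURE_VARS = ("AZURE_OPENAI_KEY", "AZURE_OPENAI_ENDPOINT")


def resolve_judge_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the judge provider, falling back to Gemini when nothing else is set."""
    env = os.environ if env is None else env
    if any(env.get(name) for name in OPENAI_VARS):
        return "openai"
    if all(env.get(name) for name in AZURE_VARS):
        return "azure"
    # Keyless fallback: no Gemini API key is required to select Gemini.
    return "gemini"
```

Passing the environment as a mapping keeps the sketch easy to exercise in tests without mutating `os.environ`.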

How to test

Run uv run pytest tests/unit_tests/test_ai_utils.py

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Added support for keyless Gemini evaluation in ValidMind. Gemini now works as the default fallback judge provider when OpenAI and Azure are not explicitly configured, including DeepEval scorer flows, and the related notebook guidance was updated to document both keyed and keyless Gemini setups.

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

Co-authored-by: Cursor <cursoragent@cursor.com>
@juanmleng juanmleng self-assigned this May 27, 2026
@juanmleng juanmleng added the enhancement New feature or request label May 27, 2026
@juanmleng juanmleng changed the title ai: support keyless Gemini fallback [SC-16470] Support keyless Gemini fallback for judge configuration May 27, 2026

@johnwalz97 johnwalz97 left a comment

Contributor


had one note but lgtm!

Comment thread validmind/ai/utils.py
return "gemini"

return None
return "gemini"

Contributor


Just a thought: what if, instead of trying to auto-configure creds and config for the LLM, we just accept a LangChain client object? That way the user has full flexibility to use whatever provider and credentials they want.

Contributor


@juanmleng this is similar to what I was referring to today re: "bring your own client/judge" :)

I'd say this change is good for now (if a new version is needed soon) but we should figure out a more flexible interface so we don't have to change internal implementation of the code whenever the underlying LLM/client interface changes.

Contributor Author


Great point @johnwalz97, totally agree. Worth noting that DeepEval scorers can’t use a raw LangChain client directly, so we would need an adapter around it. So perhaps we can leave this as is for now, and in the next iteration give it a bit of thought on how to expose a cleaner client-based API as @cachafla suggested?
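For illustration, the adapter idea raised in this thread might look something like the sketch below. It is not part of this PR. `LangChainJudgeAdapter` and `EchoClient` are hypothetical names; in real use the adapter would presumably subclass `deepeval.models.DeepEvalBaseLLM`, which is left out here so the example runs without extra dependencies. The client is assumed to expose LangChain-style `invoke` and `ainvoke` methods.

```python
import asyncio
from typing import Any


class LangChainJudgeAdapter:
    """Hypothetical adapter exposing a DeepEval-style interface over a chat client.

    In real use this would subclass deepeval.models.DeepEvalBaseLLM; it is kept
    dependency-free here so the sketch runs on its own.
    """

    def __init__(self, client: Any, model_name: str = "custom-judge"):
        # `client` is assumed to offer LangChain-style invoke/ainvoke methods.
        self._client = client
        self._model_name = model_name

    def load_model(self) -> Any:
        return self._client

    def generate(self, prompt: str) -> str:
        return self._to_text(self._client.invoke(prompt))

    async def a_generate(self, prompt: str) -> str:
        return self._to_text(await self._client.ainvoke(prompt))

    def get_model_name(self) -> str:
        return self._model_name

    @staticmethod
    def _to_text(response: Any) -> str:
        # LangChain chat models return a message object with a `.content` field;
        # plain strings are passed through unchanged.
        return str(getattr(response, "content", response))


class EchoClient:
    """Stand-in for a LangChain chat model, used only to exercise the adapter."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"

    async def ainvoke(self, prompt: str) -> str:
        return self.invoke(prompt)


adapter = LangChainJudgeAdapter(EchoClient())
print(adapter.generate("ping"))
print(asyncio.run(adapter.a_generate("ping")))
```

With an interface like this, the provider and credentials live entirely in the client the user passes in, so the library would not need to change when a provider's configuration changes.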

Co-authored-by: Cursor <cursoragent@cursor.com>
@juanmleng

Contributor Author

There was a separate issue I ran into while testing this, and it also reproduces on main: DeepEval is now trying to log scorer results to the Confident AI platform, which causes the scorer flow to fail locally with a missing/invalid DeepEval/Confident API key error.

The last commit (avoid logging deepeval scorers to Confident AI) addresses that by keeping the ValidMind DeepEval scorer path local instead of letting it attempt the Confident upload flow.

When you get a chance, could you please do a quick second sanity-check pass on the latest commit before we merge?

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

Contributor

PR Summary

This PR introduces significant updates to the LLM and DeepEval integration within the ValidMind project. The key changes include:

  1. Updates to the configuration in notebooks, where the API host has been updated from a local endpoint to the production URL and placeholder keys have been updated accordingly.

  2. Version updates across various project files (pyproject.toml, DESCRIPTION, uv.lock, and version.py) to reflect the new version, ensuring consistency.

  3. Enhancements in the ai/utils.py module:

    • The provider resolution function now directly returns "gemini" rather than conditionally checking for a Google API key.
    • The configuration for Gemini judge models now leverages keyword arguments more effectively, and the corresponding embedding model reference has been updated.
    • Introduction of new helper functions to support Gemini DeepEval functionality, such as _import_deepeval_base_llm, _unwrap_deepeval_response, and _build_gemini_deepeval_model. These functions encapsulate the creation of a DeepEval model that wraps around the Gemini judge LLM.
    • A new run_deepeval_evaluation function is added to standardize evaluation runs. It temporarily disables the Confident AI check (by overriding is_confident) while an evaluation runs, so scorer results stay local instead of being uploaded to Confident AI.

  4. Updates to several DeepEval scorer modules that now use the new run_deepeval_evaluation function instead of directly calling evaluate from deepeval. This change is reflected across multiple scoring implementations (e.g., AnswerRelevancy, ArgumentCorrectness, Bias, ContextualPrecision, etc.), ensuring a consistent evaluation interface.

  5. Additional test cases have been added to the unit test suite (in tests/unit_tests/test_ai_utils.py) to cover scenarios where provider environment variables are not set, ensuring that the default (Gemini) is used. Tests for both synchronous and asynchronous outputs of the DeepEval model have been introduced, along with tests verifying that DeepEval evaluation disables Confident AI requests.

Overall, these changes improve the integration with the Gemini LLM and DeepEval components, provide better default behavior when environment variables are missing, and add thorough test coverage for the new and modified functionality.
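The temporary override of is_confident mentioned in the summary can be pictured with a small context manager. This is a sketch under assumptions, not the implementation of run_deepeval_evaluation: `override_attr`, `run_locally`, and `fake_api` are hypothetical names, and `fake_api` stands in for whichever module owns is_confident.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Callable, Iterator


@contextmanager
def override_attr(target: Any, name: str, value: Any) -> Iterator[None]:
    """Temporarily replace target.<name>, restoring the original on exit."""
    original = getattr(target, name)
    setattr(target, name, value)
    try:
        yield
    finally:
        # Runs even if the evaluation raises, so the override never leaks.
        setattr(target, name, original)


def run_locally(evaluate: Callable[[], Any], api_module: Any) -> Any:
    """Run `evaluate` while `api_module.is_confident()` reports False."""
    with override_attr(api_module, "is_confident", lambda: False):
        return evaluate()


# Stand-in for the module that owns is_confident; not the real DeepEval module.
fake_api = SimpleNamespace(is_confident=lambda: True)
result = run_locally(lambda: fake_api.is_confident(), fake_api)
```

One caveat with this pattern: it only takes effect where the function is looked up through the patched object at call time. Code that did `from module import is_confident` holds its own reference and would need to be patched at that location instead.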

Test Suggestions

  • Verify that the API host and key/secret values are correctly configured for production in the notebook.
  • Run unit tests to ensure get_client_and_model returns expected defaults when no provider-specific environment variables are set.
  • Test the new DeepEval model functionality including synchronous (generate) and asynchronous (a_generate) methods, both with and without structured output schemas.
  • Validate that run_deepeval_evaluation properly overrides and restores the is_confident flag during evaluations.
  • Simulate ImportError scenarios (e.g., missing deepeval modules) to ensure appropriate error messages are raised.

@juanmleng juanmleng merged commit 71c4a7f into main May 28, 2026
21 checks passed
@juanmleng juanmleng deleted the juan/sc-16470/support-keyless-gemini-fallback-for-judge-configuration branch May 28, 2026 20:55

Labels

enhancement New feature or request

4 participants