[SC-16470] Support keyless Gemini fallback for judge configuration by juanmleng · Pull Request #521 · validmind/validmind-library

juanmleng · 2026-05-27T20:16:40Z

Pull Request Description

What and why?

Supports keyless Gemini setups for ValidMind judge flows by defaulting to Gemini when OpenAI and Azure are not explicitly configured, instead of requiring Gemini API keys up front. This also extends the DeepEval scorer path to work without Gemini keys and updates tests and notebook guidance to reflect the new keyed and keyless Gemini behavior.

This is needed so enterprise Gemini users can run evaluations in environments where Gemini access is available without API keys.
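The fallback order described above can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_provider` is a hypothetical helper, and the environment variable names `OPENAI_API_KEY` and `AZURE_OPENAI_KEY` are assumptions, since the PR text does not list them.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names; the PR text does not list them.
OPENAI_KEY_VAR = "OPENAI_API_KEY"
AZURE_KEY_VAR = "AZURE_OPENAI_KEY"


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the judge provider, falling back to Gemini (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get(OPENAI_KEY_VAR):
        return "openai"
    if env.get(AZURE_KEY_VAR):
        return "azure"
    # No Gemini key check here: keyless setups rely on access that the
    # environment already provides.
    return "gemini"
```

Passing the environment as a mapping keeps the function easy to test without touching `os.environ`.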

How to test

Run uv run pytest tests/unit_tests/test_ai_utils.py

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Added support for keyless Gemini evaluation in ValidMind. Gemini now works as the default fallback judge provider when OpenAI and Azure are not explicitly configured, including DeepEval scorer flows, and the related notebook guidance was updated to document both keyed and keyless Gemini setups.

Checklist

Co-authored-by: Cursor <cursoragent@cursor.com>

johnwalz97

had one note but lgtm!

johnwalz97 · 2026-05-27T23:08:39Z

-        return "gemini"
-
-    return None
+    return "gemini"


Just a thought: what if, instead of trying to auto-configure credentials and config for the LLM, we just accepted a LangChain client object? That way the user has full flexibility to use whatever provider and credentials they want.

cachafla · 2026-05-27

@juanmleng this is similar to what I was referring to today re: "bring your own client/judge" :)

I'd say this change is good for now (if a new version is needed soon), but we should figure out a more flexible interface so we don't have to change the internal implementation whenever the underlying LLM/client interface changes.

juanmleng · 2026-05-28

Great point @johnwalz97, totally agree. Worth noting that DeepEval scorers can’t use a raw LangChain client directly, so we would need an adapter around it. Perhaps we can leave this as is for now and, in the next iteration, think about how to expose a cleaner client-based API, as @cachafla suggested?
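For discussion, a "bring your own client" adapter could look roughly like the sketch below. Everything here is hypothetical: `ChatClient`, `ClientJudgeAdapter`, and `EchoClient` are illustrative names, and the class does not subclass any DeepEval type so that the example stays self-contained. The method names (`load_model`, `generate`, `a_generate`, `get_model_name`) mirror the ones DeepEval documents for custom models.

```python
import asyncio
from typing import Any, Protocol


class ChatClient(Protocol):
    """Minimal shape assumed for a user-supplied client (LangChain-style)."""

    def invoke(self, prompt: str) -> Any: ...


class ClientJudgeAdapter:
    """Hypothetical adapter exposing generate/a_generate around any client."""

    def __init__(self, client: ChatClient, name: str = "custom-judge") -> None:
        self._client = client
        self._name = name

    def load_model(self) -> ChatClient:
        return self._client

    def get_model_name(self) -> str:
        return self._name

    def generate(self, prompt: str) -> str:
        response = self._client.invoke(prompt)
        # Chat-style clients return a message object with `.content`;
        # plain strings pass through unchanged.
        return str(getattr(response, "content", response))

    async def a_generate(self, prompt: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.generate, prompt)


class EchoClient:
    """Stand-in client used only to exercise the adapter."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

With this shape, credentials and provider choice live entirely in the client the user constructs, which is the flexibility the thread is asking for.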

Co-authored-by: Cursor <cursoragent@cursor.com>

juanmleng · 2026-05-28T08:39:28Z

There was a separate issue I ran into while testing this, and it also reproduces on main: DeepEval is now trying to log scorer results to the Confident AI platform, which causes the scorer flow to fail locally with a missing/invalid DeepEval/Confident API key error.

The last commit (avoid logging deepeval scorers to Confident AI) addresses that by keeping the ValidMind DeepEval scorer path local instead of letting it attempt the Confident upload flow.
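The general technique of switching off the upload path for the duration of one call can be sketched with a context manager. This is an illustration under assumptions, not the code from the commit: `fake_deepeval` is a stand-in namespace (the real location of `is_confident` is not shown in this PR), and `confident_disabled` and `run_local` are hypothetical names.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Callable, Iterator

# Stand-in for the module that exposes `is_confident`; the real import
# path is not shown in this PR.
fake_deepeval = SimpleNamespace(is_confident=lambda: True)


@contextmanager
def confident_disabled(module: SimpleNamespace) -> Iterator[None]:
    """Temporarily make `module.is_confident()` return False."""
    original = module.is_confident
    module.is_confident = lambda: False
    try:
        yield
    finally:
        # Restore the original even if the evaluation raises.
        module.is_confident = original


def run_local(evaluate: Callable[[], Any]) -> Any:
    """Run `evaluate` with the upload check disabled (hypothetical wrapper)."""
    with confident_disabled(fake_deepeval):
        return evaluate()
```

The `try`/`finally` is the important part: a failed evaluation must not leave the override in place for later callers.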

When you get a chance, could you please do a quick second sanity-check pass on the latest commit before we merge?

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-28T20:29:31Z

PR Summary

This PR introduces significant updates to the LLM and DeepEval integration within the ValidMind project. The key changes include:

- Updates to the configuration in notebooks: the API host now points to the production URL instead of a local endpoint, and the placeholder keys have been updated accordingly.
- Version updates across various project files (pyproject.toml, DESCRIPTION, uv.lock, and version.py) so that they consistently reflect the new version.
- Enhancements in the ai/utils.py module:
  - The provider resolution function now returns "gemini" as its unconditional fallback rather than first checking for a Google API key.
  - The configuration for Gemini judge models now makes fuller use of keyword arguments, and the corresponding embedding model reference has been updated.
  - New helper functions support Gemini DeepEval functionality: _import_deepeval_base_llm, _unwrap_deepeval_response, and _build_gemini_deepeval_model. These functions encapsulate the creation of a DeepEval model that wraps the Gemini judge LLM.
  - A new run_deepeval_evaluation function standardizes evaluation runs. It temporarily overrides is_confident while an evaluation runs, so that DeepEval does not attempt the Confident AI upload flow.
- Updates to several DeepEval scorer modules, which now call run_deepeval_evaluation instead of calling evaluate from deepeval directly. The change spans multiple scoring implementations (e.g., AnswerRelevancy, ArgumentCorrectness, Bias, ContextualPrecision), giving them a consistent evaluation interface.
- Additional test cases in the unit test suite (tests/unit_tests/test_ai_utils.py) covering scenarios where provider environment variables are not set, to confirm that the Gemini defaults are used. New tests also cover the synchronous and asynchronous outputs of the DeepEval model and verify that run_deepeval_evaluation disables Confident AI requests.

Overall, these changes improve the integration with the Gemini LLM and DeepEval components, provide better default behavior when environment variables are missing, and add thorough test coverage for the new and modified functionality.

Test Suggestions

- Verify that the API host and key/secret values are correctly configured for production in the notebook.
- Run unit tests to ensure get_client_and_model returns expected defaults when no provider-specific environment variables are set.
- Test the new DeepEval model functionality including synchronous (generate) and asynchronous (a_generate) methods, both with and without structured output schemas.
- Validate that run_deepeval_evaluation properly overrides and restores the is_confident flag during evaluations.
- Simulate ImportError scenarios (e.g., missing deepeval modules) to ensure appropriate error messages are raised.
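The ImportError suggestion can be exercised against a guarded-import helper. The sketch below is a generic pattern rather than the PR's _import_deepeval_base_llm: `import_optional` and its `hint` parameter are hypothetical names introduced for illustration.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, hint: str) -> ModuleType:
    """Import a module, re-raising ImportError with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{module_name} is required for this feature. {hint}"
        ) from exc
```

A test can then request a module name that does not exist and assert on the message text, which avoids uninstalling real packages in the test environment.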

ai: support keyless Gemini fallback

6db7d8e

Co-authored-by: Cursor <cursoragent@cursor.com>

juanmleng self-assigned this May 27, 2026

juanmleng added the enhancement New feature or request label May 27, 2026

juanmleng changed the title from "ai: support keyless Gemini fallback" to "[SC-16470] Support keyless Gemini fallback for judge configuration" May 27, 2026

juanmleng requested review from AnilSorathiya, cachafla and johnwalz97 May 27, 2026 20:18

johnwalz97 approved these changes May 27, 2026


avoid logging deepeval scorers to Confident AI

0de95a6

Co-authored-by: Cursor <cursoragent@cursor.com>

AnilSorathiya approved these changes May 28, 2026


cachafla approved these changes May 28, 2026


2.13.5

0c9182a

Co-authored-by: Cursor <cursoragent@cursor.com>

juanmleng merged commit 71c4a7f into main May 28, 2026
21 checks passed

juanmleng deleted the juan/sc-16470/support-keyless-gemini-fallback-for-judge-configuration branch May 28, 2026 20:55

juanmleng merged 3 commits into main from juan/sc-16470/support-keyless-gemini-fallback-for-judge-configuration

juanmleng commented May 27, 2026
johnwalz97 left a comment
johnwalz97 May 27, 2026
cachafla May 27, 2026
juanmleng May 28, 2026
juanmleng commented May 28, 2026
github-actions Bot commented May 28, 2026


4 participants
