docs(vertexai): replace incorrect model identifiers with auto-discovery note #5406

Open
major wants to merge 2 commits into llamastack:main from major:fix/5403


@major major commented Apr 1, 2026

What does this PR do?

The Vertex AI provider description listed model identifiers with a vertex_ai/ prefix (e.g. vertex_ai/gemini-2.5-flash). These never matched the resource names returned by the Vertex AI API (e.g. publishers/google/models/gemini-2.5-flash), so the server failed at startup when users followed the documentation.

Since the provider rewrite in #4951, models are dynamically discovered from the Vertex AI project at startup via list_models(). The hardcoded model list was stale and misleading. This replaces it with a note that models are auto-discovered.

Closes #5403

How model discovery works after #4951

  • Auto-discovery (default): list_models() queries the Vertex AI API at startup and registers all available models automatically. No user configuration needed. The routing table prefixes each model with the provider ID, so publishers/google/models/gemini-2.5-flash becomes vertexai/publishers/google/models/gemini-2.5-flash.
  • Short-form model IDs work too: API requests can use vertexai/gemini-2.5-flash as a shorthand. The google-genai SDK normalizes the name internally.
  • Restricting models via allowed_models: Users who want a subset can set allowed_models in the provider config using the full API resource names (e.g. publishers/google/models/gemini-2.5-flash).
  • The old vertex_ai/gemini-2.5-flash format was never valid as a provider_resource_id and failed check_model_availability() against the API-populated cache.
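
The naming rules above can be sketched in a few lines of Python. This is an illustration only, not the provider's code: `routed_ids` and `PROVIDER_ID` are hypothetical names, and the filtering is an assumption about how allowed_models behaves based on the description in this PR.

```python
# Hypothetical sketch: names here are illustrative and do not come from Llama Stack.
PROVIDER_ID = "vertexai"


def routed_ids(discovered, allowed_models=None):
    """Map discovered Vertex AI resource names to routing-table IDs.

    If allowed_models is given, keep only the resource names it lists
    (assumed to be full API resource names, as described above).
    """
    if allowed_models is not None:
        allowed = set(allowed_models)
        discovered = [name for name in discovered if name in allowed]
    # The routing table prefixes each model with the provider ID.
    return [f"{PROVIDER_ID}/{name}" for name in discovered]


discovered = [
    "publishers/google/models/gemini-2.5-flash",
    "publishers/google/models/gemini-2.5-pro",
]

print(routed_ids(discovered))
print(routed_ids(discovered, allowed_models=["publishers/google/models/gemini-2.5-flash"]))

# The old documented form is not a resource name the API returns,
# so a lookup against the discovered names fails.
print("vertex_ai/gemini-2.5-flash" in discovered)
```

The sketch leaves out short-form IDs such as vertexai/gemini-2.5-flash, which the description says are normalized by the google-genai SDK rather than by the routing table.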

Verified with a live Vertex AI project

Ran llama stack run with remote::vertexai against a real GCP project. Server auto-discovered 13 models at startup:

$ curl -s http://localhost:8321/v1/models | python3 -c "import sys,json; [print(m['id']) for m in json.load(sys.stdin)['data']]"
vertexai/publishers/google/models/gemini-1.5-pro-002
vertexai/publishers/google/models/gemini-2.0-flash-001
vertexai/publishers/google/models/gemini-2.0-flash
vertexai/publishers/google/models/gemini-2.0-flash-lite-001
vertexai/publishers/google/models/gemini-2.5-flash-preview-04-17
vertexai/publishers/google/models/gemini-2.5-pro-exp-03-25
vertexai/publishers/google/models/gemini-2.5-pro
vertexai/publishers/google/models/gemini-2.5-flash
vertexai/publishers/google/models/gemini-2.5-flash-lite
vertexai/publishers/google/models/gemini-2.5-pro-tts
vertexai/publishers/google/models/gemini-2.5-flash-tts
vertexai/publishers/google/models/gemini-live-2.5-flash-native-audio
vertexai/publishers/google/models/gemini-3.1-flash-image-preview
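
For scripting the same check, a standalone Python version of the one-liner might look like the sketch below. It assumes only what the output above shows: the /v1/models response is a JSON object with a `data` list whose entries carry an `id`. The function names are hypothetical, and the sample payload is a trimmed stand-in for a real response.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Return the 'id' of every entry in the response's 'data' list."""
    return [model["id"] for model in payload["data"]]


def fetch_model_ids(base_url="http://localhost:8321"):
    """Query a running server; requires the stack to be up at base_url."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return extract_model_ids(json.load(resp))


# Offline demonstration with a trimmed sample payload (no server needed).
sample = {
    "data": [
        {"id": "vertexai/publishers/google/models/gemini-2.5-pro"},
        {"id": "vertexai/publishers/google/models/gemini-2.5-flash"},
    ]
}
for model_id in extract_model_ids(sample):
    print(model_id)
```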

Chat completion with short-form ID works:

$ curl -s http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vertexai/gemini-2.5-flash", "messages": [{"role": "user", "content": "Say hello in exactly 5 words."}]}' \
  | python3 -m json.tool
{
    "id": "chatcmpl-...",
    "choices": [{"message": {"role": "assistant", "content": "Hello there, my dear friend."}, "finish_reason": "stop", ...}],
    "model": "vertexai/gemini-2.5-flash",
    ...
}
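
The same request can be built from Python with the standard library. This is a minimal sketch that mirrors the curl call above; `build_chat_request` is a hypothetical helper, and the request is only constructed here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(model, prompt, base_url="http://localhost:8321"):
    """Build a POST request equivalent to the curl call above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("vertexai/gemini-2.5-flash", "Say hello in exactly 5 words.")
print(req.get_method(), req.full_url)

# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```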

Test Plan

Documentation-only change. Verified provider_codegen.py regenerates remote_vertexai.mdx cleanly and all pre-commit hooks pass.

$ uv run ./scripts/provider_codegen.py
Processing provider registry
  Processing provider registry...
$ SKIP=actionlint git commit ...
Provider Codegen.............................................................Passed
ruff (legacy alias)..........................................................Passed
ruff format..................................................................Passed
mypy.........................................................................Passed
markdownlint.............................................(no files to check)Skipped

@meta-cla meta-cla bot added the CLA Signed label Apr 1, 2026
@major major marked this pull request as draft April 1, 2026 16:22
@major major force-pushed the fix/5403 branch 2 times, most recently from b4150ac to 2dd9824 April 1, 2026 16:24
@major major marked this pull request as ready for review April 1, 2026 17:40
…ry note

The provider description listed model identifiers with a vertex_ai/
prefix (e.g. vertex_ai/gemini-2.5-flash) that never matched the
resource names returned by the Vertex AI API, causing registration
failures at startup. Replace the stale hardcoded list with a note
that models are automatically discovered from the project.

Signed-off-by: Major Hayden <major@mhtx.net>
@major major closed this Apr 1, 2026
@major major reopened this Apr 1, 2026

Development

Successfully merging this pull request may close these issues.

[Incorrect Documentation ] Vertex AI model identifiers in documentation are incompatible with Llama Stack 0.6.0
