llm-vertex


Access Google's Gemini models via Vertex AI for enterprise use

Installation

Install this plugin in the same environment as LLM.

Note: This package is not published on PyPI (the llm-vertex name on PyPI is an unrelated project). Install directly from GitHub:

llm install "llm-vertex @ git+https://github.com/c0ffee0wl/llm-vertex.git"

Authentication Setup

This plugin uses Google Cloud Vertex AI, which supports three authentication methods:

Option 1: API Key (Recommended for Testing Only)

Fastest setup, but recommended for testing only:

# Set via llm keys command
llm keys set vertex
# Or via environment variable
export GOOGLE_CLOUD_API_KEY="YOUR_API_KEY"

Get your API key from the Google Cloud Console.

Note: Vertex AI API keys are different from Google AI Studio keys, so make sure to create a Vertex AI-compatible API key in your GCP project. API keys are convenient for development and testing, but for production use Application Default Credentials (Option 2).

Option 2: Application Default Credentials (Recommended for Production)

If you're already using Google Cloud, authenticate with:

gcloud auth application-default login

This sets up Application Default Credentials (ADC) that the plugin will automatically use.

Option 3: Service Account JSON File

  1. Create a service account in your GCP project with Vertex AI User permissions
  2. Download the JSON key file
  3. Set the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

Or configure it via the plugin:

llm vertex set-credentials /path/to/service-account.json

Configuration

Set Your GCP Project ID

# Via environment variable
export GOOGLE_CLOUD_PROJECT="your-project-id"

# Or via plugin config
llm vertex set-project your-project-id

Set Your Region (Optional)

The plugin defaults to the global endpoint. However, the global endpoint has important limitations:

  • ⚠️ Does not support tuning, batch prediction, or RAG corpus creation
  • ⚠️ Does not guarantee region-specific ML processing
  • ⚠️ Does not provide data residency compliance

For production use or if you need specific features, use a regional endpoint:

# Via environment variable
export GOOGLE_CLOUD_REGION="us-central1"

# Or via plugin config
llm vertex set-region us-central1

To see all available regions:

llm vertex list-regions

Available regions include:

  • United States: us-central1, us-east1, us-east4, us-east5, us-south1, us-west1, us-west4
  • Canada: northamerica-northeast1
  • South America: southamerica-east1
  • Europe: europe-west1, europe-west2, europe-west3, europe-west4, europe-west6, europe-west8, europe-west9, europe-north1, europe-southwest1, europe-central2
  • Asia Pacific: asia-east1, asia-northeast1, asia-northeast3, asia-southeast1, asia-south1, australia-southeast1, australia-southeast2
  • Middle East: me-central1, me-central2, me-west1

For the latest region availability and model-specific regional support, see the official Vertex AI locations documentation.

View Current Configuration

llm vertex config

Usage

Now run the model using -m vertex/gemini-2.5-flash, for example:

llm -m vertex/gemini-2.5-flash "A short joke about a pelican and a walrus"

You can set the default model to avoid the extra -m option:

llm models default vertex/gemini-2.5-flash
llm "A joke about a pelican and a walrus"

Available models

All Gemini models are available through Vertex AI:

Gemini 3 (Preview)

  • vertex/gemini-3-pro-preview: Gemini 3 Pro preview (global region only)
  • vertex/gemini-3-pro-preview-11-2025: Gemini 3 Pro November 2025 version (global region only)
  • vertex/gemini-3-pro-preview-11-2025-thinking: Gemini 3 Pro with thinking mode (global region only)
  • vertex/gemini-3-flash-preview: Gemini 3 Flash preview (global region only)

Note: Gemini 3 models automatically use the global endpoint regardless of your configured region.

Thinking Levels: Gemini 3 models support configurable thinking levels:

  • Flash (gemini-3-flash-preview): minimal, low, medium, high
  • Pro (gemini-3-pro-preview): low, high

For example:

llm -m vertex/gemini-3-flash-preview -o thinking_level high 'complex reasoning task'

Gemini 3.1 (Latest - Preview)

  • vertex/gemini-3.1-pro-preview: Gemini 3.1 Pro preview (global region only)
  • vertex/gemini-3.1-pro-preview-customtools: Gemini 3.1 Pro with custom tools support (global region only)
  • vertex/gemini-3.1-flash-lite-preview: Gemini 3.1 Flash Lite preview (global region only)

Thinking Levels: Gemini 3.1 models support configurable thinking levels:

  • Pro (gemini-3.1-pro-preview): low, medium, high
  • Flash Lite (gemini-3.1-flash-lite-preview): minimal, low, medium, high

For example:

llm -m vertex/gemini-3.1-pro-preview -o thinking_level high 'complex reasoning task'

Note: Gemini 3.1 models automatically use the global endpoint regardless of your configured region.

Gemma 3 (Open Models)

  • vertex/gemma-3-1b-it: Gemma 3 1B (text only, no vision)
  • vertex/gemma-3-4b-it: Gemma 3 4B
  • vertex/gemma-3-12b-it: Gemma 3 12B
  • vertex/gemma-3-27b-it: Gemma 3 27B
  • vertex/gemma-3n-e4b-it: Gemma 3n E4B (text only, no vision)

Note: Gemma models do not support structured output schemas or media resolution settings. gemma-3-1b-it and gemma-3n-e4b-it do not support vision/image inputs.

Gemini 2.5 and earlier

  • vertex/gemini-2.5-flash-lite-preview-09-2025
  • vertex/gemini-2.5-flash-preview-09-2025
  • vertex/gemini-flash-lite-latest: Latest Gemini Flash Lite
  • vertex/gemini-flash-latest: Latest Gemini Flash
  • vertex/gemini-2.5-flash-lite: Gemini 2.5 Flash Lite
  • vertex/gemini-2.5-pro: Gemini 2.5 Pro
  • vertex/gemini-2.5-flash: Gemini 2.5 Flash
  • vertex/gemini-2.0-flash: Gemini 2.0 Flash
  • vertex/gemini-2.0-flash-thinking-exp-01-21: Experimental "thinking" model
  • vertex/gemini-1.5-flash-8b-latest: The least expensive model
  • vertex/gemini-1.5-pro-latest
  • vertex/gemini-1.5-flash-latest

And many more. Use the vertex/ prefix to reference models:

llm -m vertex/gemini-1.5-flash-8b-latest --schema 'name,age int,bio' 'invent a dog'
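
The concise --schema string is shorthand that llm expands into a JSON schema before sending it to the model. As a rough, simplified illustration of the idea (this sketch ignores field descriptions and other features of the syntax; see the llm schemas documentation for the authoritative rules):

```python
def expand_concise_schema(spec):
    """Simplified sketch of llm's concise schema expansion.
    Each comma-separated field is a name with an optional type;
    untyped fields default to string."""
    type_map = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}
    properties = {}
    for field in spec.split(","):
        parts = field.strip().split()
        name = parts[0]
        ftype = type_map.get(parts[1], "string") if len(parts) > 1 else "string"
        properties[name] = {"type": ftype}
    return {"type": "object", "properties": properties, "required": list(properties)}

print(expand_concise_schema("name,age int,bio"))
# {'type': 'object', 'properties': {'name': {'type': 'string'},
#  'age': {'type': 'integer'}, 'bio': {'type': 'string'}},
#  'required': ['name', 'age', 'bio']}
```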

Note: This plugin provides Gemini models via Vertex AI (enterprise API). For the public Google AI Studio API, use the separate llm-gemini plugin.

Model Regional Availability

Different Gemini models are available in different regions. Here's the detailed availability for the main models:

Gemini 2.5 Flash

The gemini-2.5-flash (GA) model is available in the following regions:

  • Global: Global endpoint (limited features: no tuning, batch prediction, or RAG)
  • United States: us-central1 (Iowa), us-east1 (South Carolina), us-east4 (Northern Virginia), us-east5 (Columbus, Ohio), us-south1 (Dallas, Texas), us-west1 (Oregon), us-west4 (Las Vegas, Nevada)
  • Europe: europe-central2 (Warsaw, Poland), europe-north1 (Hamina, Finland), europe-southwest1 (Madrid, Spain), europe-west1 (St. Ghislain, Belgium), europe-west4 (Eemshaven, Netherlands), europe-west8 (Milan, Italy)
  • Canada: northamerica-northeast1 (Montréal, Canada)
  • Asia Pacific*: asia-northeast1 (Tokyo, Japan), asia-northeast3 (Seoul, South Korea), asia-south1 (Mumbai, India), asia-southeast1 (Jurong West, Singapore), australia-southeast1 (Sydney, Australia)

*Asia Pacific regions are limited to a 128K context window and do not support supervised fine-tuning.

The preview model (gemini-2.5-flash-preview-09-2025) is available only via the Global endpoint.

Gemini 2.5 Pro

The gemini-2.5-pro model is available in the following regions:

  • Global: Global endpoint (limited features: no tuning, batch prediction, or RAG)
  • United States: us-central1 (Iowa), us-east1 (South Carolina), us-east4 (Northern Virginia), us-east5 (Columbus, Ohio), us-south1 (Dallas, Texas), us-west1 (Oregon), us-west4 (Las Vegas, Nevada)
  • Europe: europe-central2 (Warsaw, Poland), europe-north1 (Hamina, Finland), europe-southwest1 (Madrid, Spain), europe-west1 (St. Ghislain, Belgium), europe-west4 (Eemshaven, Netherlands), europe-west8 (Milan, Italy), europe-west9 (Paris, France)
  • Asia Pacific: asia-northeast1 (Tokyo, Japan; 128K context window only, supervised fine-tuning not supported)


Gemini 3 (Preview)

The Gemini 3 models (Pro launched November 18, 2025; Flash launched December 17, 2025) are currently only available via the Global endpoint.

All of these models are restricted to the global endpoint and automatically override your configured region:

  • gemini-3-pro-preview: thinking levels low, high
  • gemini-3-pro-preview-11-2025: thinking levels low, high
  • gemini-3-pro-preview-11-2025-thinking: thinking levels low, high
  • gemini-3-flash-preview: thinking levels minimal, low, medium, high
  • gemini-3.1-pro-preview: thinking levels low, medium, high
  • gemini-3.1-pro-preview-customtools: thinking levels low, medium, high
  • gemini-3.1-flash-lite-preview: thinking levels minimal, low, medium, high

Key Features:

  • 1 million token context window
  • 64K token output limit
  • Multimodal support (text, images, audio, video)
  • Google Search grounding
  • Configurable thinking levels (the supported levels vary by model; see above)
  • Knowledge cutoff: January 2025

Important: These models automatically use the global endpoint regardless of your configured region setting. You don't need to change your GOOGLE_CLOUD_REGION configuration; the plugin handles this automatically.

For more information, see the official Vertex AI documentation for the Gemini models.

Images, audio and video

Gemini models are multi-modal. You can provide images, audio or video files as input like this:

llm -m vertex/gemini-2.5-flash 'extract text' -a image.jpg

Or with a URL:

llm -m vertex/gemini-2.5-flash-lite 'describe image' \
  -a https://static.simonwillison.net/static/2024/pelicans.jpg

Audio works too:

llm -m vertex/gemini-2.5-flash 'transcribe audio' -a audio.mp3

And video:

llm -m vertex/gemini-2.5-flash 'describe what happens' -a video.mp4

JSON output

Use -o json_object 1 to force the output to be JSON:

llm -m vertex/gemini-2.5-flash -o json_object 1 \
  '3 largest cities in California, list of {"name": "..."}'

Outputs:

{"cities": [{"name": "Los Angeles"}, {"name": "San Diego"}, {"name": "San Jose"}]}
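
Because -o json_object 1 guarantees parseable JSON, the result can be consumed directly by a script. A minimal sketch using the sample output above (note that the exact shape and top-level keys depend on the prompt, not the option):

```python
import json

# Sample output captured from the command above; the "cities" key
# was chosen by the model based on the prompt, not guaranteed by the API.
raw = '{"cities": [{"name": "Los Angeles"}, {"name": "San Diego"}, {"name": "San Jose"}]}'

data = json.loads(raw)
names = [city["name"] for city in data["cities"]]
print(names)  # ['Los Angeles', 'San Diego', 'San Jose']
```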

Code execution

Gemini models can write and execute code: the model can decide to write Python code, run it in a secure sandbox, and use the result as part of its response.

To enable this feature, use -o code_execution 1:

llm -m vertex/gemini-2.5-flash -o code_execution 1 \
'use python to calculate (factorial of 13) * 3'

Google search

Some Gemini models support Grounding with Google Search, where the model can run a Google search and use the results as part of answering a prompt.

To run a prompt with Google search enabled, use -o google_search 1:

llm -m vertex/gemini-2.5-flash -o google_search 1 \
  'What happened in Ireland today?'

URL context

Gemini models support a URL context tool which, when enabled, allows the models to fetch additional content from URLs as part of their execution.

You can enable that with the -o url_context 1 option:

llm -m vertex/gemini-2.5-flash -o url_context 1 'Latest headline on simonwillison.net'

Chat

To chat interactively with the model, run llm chat:

llm chat -m vertex/gemini-2.5-flash

Timeouts

By default there is no timeout against the Vertex AI API. You can use the timeout option to protect against API requests that hang indefinitely:

llm -m vertex/gemini-2.5-flash 'epic saga about mice' -o timeout 1.5

Embeddings

This plugin provides access to Vertex AI embedding models. All embedding model IDs use the vertex/ prefix to distinguish them from other plugins.

Available Embedding Models

  • vertex/gemini-embedding-001: latest state-of-the-art model, 3072 dimensions (default)
  • vertex/gemini-embedding-001-768: truncated to 768 dimensions
  • vertex/gemini-embedding-001-1536: truncated to 1536 dimensions
  • vertex/text-embedding-005: text embedding model, 768 dimensions
  • vertex/text-embedding-004: legacy text embedding model, 768 dimensions
  • vertex/text-multilingual-embedding-002: multilingual embedding model, 768 dimensions

Deprecated models (October 2025):

  • vertex/gemini-embedding-exp-03-07 (and truncated variants -128, -256, -512, -1024, -2048)

Usage

# Using the recommended model
llm embed -m vertex/gemini-embedding-001 -c 'hello world'

# Using a smaller dimension variant for efficiency
llm embed -m vertex/gemini-embedding-001-768 -c 'hello world'

# Using the multilingual model
llm embed -m vertex/text-multilingual-embedding-002 -c 'bonjour le monde'

See the LLM embeddings documentation for further details.
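
The lower-dimension variants are possible because gemini-embedding-001 produces Matryoshka-style embeddings: a prefix of the full vector remains a useful embedding once re-normalized to unit length. A conceptual sketch of that truncation (truncate_embedding is a hypothetical helper for illustration, not part of the plugin, which may instead request the smaller dimensionality from the API):

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` components and re-normalize to unit length
    (Matryoshka-style truncation)."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Stand-in for a full 3072-dimension embedding; real vectors come from the API.
full = [0.6, 0.8, 0.0, 0.0]
small = truncate_embedding(full, 2)
print(small)  # [0.6, 0.8] -- already unit length in this toy case
```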

Prerequisites

GCP Project Setup

  1. Create a GCP project at https://console.cloud.google.com
  2. Enable the Vertex AI API for your project
  3. Set up billing for your project

Service Account Setup (if not using ADC)

  1. Go to IAM & Admin > Service Accounts in GCP Console
  2. Create a new service account
  3. Grant it the "Vertex AI User" role
  4. Create and download a JSON key
  5. Configure the plugin with the path to this key (see Configuration above)

Costs

Vertex AI charges for model usage. See Vertex AI pricing for details.

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-vertex
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

llm install -e '.[test]'

To run the tests:

pytest

Troubleshooting

"No GCP project ID found" error

Make sure you've set the GOOGLE_CLOUD_PROJECT environment variable or run:

llm vertex set-project your-project-id

Authentication errors

Verify your authentication setup:

llm vertex config

For API key:

llm keys set vertex

For ADC:

gcloud auth application-default login

For service account, ensure GOOGLE_APPLICATION_CREDENTIALS points to a valid JSON file.

"API not enabled" errors

Enable the Vertex AI API:

gcloud services enable aiplatform.googleapis.com --project=your-project-id
