chore(pricing): Update vertex-ai pricing #550

Open
siddharthsambharia-portkey wants to merge 44 commits into main from pricing-update/vertex-ai

siddharthsambharia-portkey commented Mar 17, 2026

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
| --- | --- |
| ➕ Models added | 3 |
| 🔄 Models updated (merged) | 14 |

➕ New Models

  • gemini-2.5-pro-tts
  • gemini-2.5-flash-tts
  • veo-3.1-lite-generate-001

🔄 Updated Models

  • gemini-3.1-pro-preview
  • gemini-3-pro-image-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3-flash-preview
  • gemini-2.5-pro
  • veo-3.1-generate-001
  • veo-3.0-generate-001
  • veo-3.0-fast-generate-001
  • text-embedding-005
  • text-embedding-large-exp-03-07
  • text-multilingual-embedding-002
  • textembedding-gecko
  • multimodalembedding

Model-to-Pricing-Page Mapping

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-3.1-pro-preview | Google – Gemini 3.1 | API | Standard (≤200K) rates; Gemini 3 search $14/1000 |
| gemini-3-pro-preview | Google – Gemini 3 | API | Same pricing as Gemini 3.1 Pro ($2/$12/M) |
| gemini-3-pro-image-preview | Google – Gemini 3 | API | Image output token $120/M; batch included |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 | API | Image output token $60/M; batch included |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 | API | Input $0.25/M, output $1.50/M |
| gemini-3-flash-preview | Google – Gemini 3 | API | Input $0.50/M, output $3/M |
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard (≤200K) rates; long-context tier noted |
| gemini-2.5-flash | Google – Gemini 2.5 | API | Standard rates |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 | API | Matches gemini-2.5-flash pricing |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | Input $0.10/M, output $0.40/M |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 | API | Matches gemini-2.5-flash-lite pricing |
| gemini-2.5-flash-image | Google – Gemini 2.5 | API | Image output token $30/M; batch included |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 | API | Matched Computer Use row; same pricing as 2.5 Pro |
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | Canonical ID with -001 suffix per ID rules |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | Canonical ID with -001 suffix per ID rules |
| gemini-2.5-pro-tts | Google – Gemini 2.5 | API – price not found | TTS model; no pricing row found; added with price 0 |
| gemini-2.5-flash-tts | Google – Gemini 2.5 | API – price not found | TTS model; no pricing row found; added with price 0 |
| imagen-4.0-ultra-generate-001 | Google – Imagen | API | $0.06/image |
| imagen-4.0-generate-001 | Google – Imagen | API | $0.04/image |
| imagen-4.0-fast-generate-001 | Google – Imagen | API | $0.02/image |
| imagen-3.0-generate-002 | Google – Imagen | API | $0.04/image |
| imagen-3.0-capability-001 | Google – Imagen | API | Capability model mapped to imagen-3.0-generate pricing ($0.04/image) |
| imagen-3.0-capability-002 | Google – Imagen | API | Capability model mapped to imagen-3.0-generate pricing ($0.04/image) |
| veo-3.1-generate-001 | Google – Veo | API | $0.40/sec (Video+Audio 720p/1080p); duration=8s, count=1 |
| veo-3.1-fast-generate-001 | Google – Veo | API | $0.15/sec (Video+Audio 720p/1080p) |
| veo-3.1-lite-generate-001 | Google – Veo | API | $0.05/sec (Video+Audio 720p) |
| veo-3.0-generate-001 | Google – Veo | API | $0.40/sec (Video+Audio 720p/1080p) |
| veo-3.0-fast-generate-001 | Google – Veo | API | $0.15/sec (Video+Audio) |
| veo-2.0-generate-001 | Google – Veo | API | $0.50/sec (Video 720p) |
| gemini-embedding-001 | Google – Embedding | API | $0.00015/1K tokens (online) |
| text-embedding-005 | Google – Embedding | API | $0.000025/1K chars → converted to $0.0001/1K tokens |
| text-embedding-large-exp-03-07 | Google – Embedding | API | Same pricing as text-embedding-005 (no dedicated row) |
| text-multilingual-embedding-002 | Google – Embedding | API | $0.000025/1K chars → converted to $0.0001/1K tokens |
| textembedding-gecko | Google – Embedding | API | Legacy; uses standard text embedding pricing |
| multimodalembedding | Google – Embedding | API | Text component: $0.0002/1K chars → $0.0001/1K tokens |
| gemini-embedding-2-preview | Google – Embedding | API – price not found | Gemini Embedding 2 Unified Multimodal (Preview); added with price 0 pending confirmed pricing |
| claude-opus-4-6 | Anthropic – Claude | API | @default stripped; Global pricing $5/$25/M; cache included |
| claude-sonnet-4-6 | Anthropic – Claude | API | @default stripped; Global pricing $3/$15/M; cache included |
| claude-opus-4-5@20251101 | Anthropic – Claude | API | Pinned version; Global pricing $5/$25/M; cache included |
| claude-sonnet-4-5@20250929 | Anthropic – Claude | API | Pinned version; Global pricing $3/$15/M; cache included |
| claude-haiku-4-5@20251001 | Anthropic – Claude | API | Pinned version; Global pricing $1/$5/M; cache included |
| claude-opus-4-1@20250805 | Anthropic – Claude | API | Pinned version; Uniform pricing $15/$75/M; cache included |
| claude-opus-4@20250514 | Anthropic – Claude | API | Pinned version; Uniform pricing $15/$75/M; cache included |
| claude-sonnet-4@20250514 | Anthropic – Claude | API | Pinned version; Uniform pricing $3/$15/M; cache included |
| gpt-oss-120b-maas | OpenAI | API | MaaS model; $0.09/$0.36/M; batch included |
| llama-3.3-70b-instruct-maas | Meta – Llama | API | Llama 3.3 70B row; $0.72/$0.72/M; batch included |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama | API | Llama 4 Maverick row; $0.35/$1.15/M; batch included |
| mistral-small-2503 | Mistral | API | Mistral Small 3.1 row; $0.10/$0.30/M |
| mistral-medium-3 | Mistral | API | $0.40/$2.00/M |
| codestral-2 | Mistral | API | Codestral 2 row; $0.30/$0.90/M |
| deepseek-r1-0528-maas | DeepSeek | API | DeepSeek-R1 0528 row; $1.35/$5.40/M; batch included |
| deepseek-v3.1-maas | DeepSeek | API | DeepSeek-V3.1 row; $0.60/$1.70/M; batch included |
| deepseek-v3.2-maas | DeepSeek | API | DeepSeek-V3.2 row; $0.56/$1.68/M; batch included |
| qwen3-235b-a22b-instruct-2507-maas | Qwen | API | Qwen3-235B row; $0.22/$0.88/M; batch included |
| qwen3-coder-480b-a35b-instruct-maas | Qwen | API | Qwen3-Coder-480B row; $0.22/$1.80/M; batch included |
| qwen3-next-80b-a3b-instruct-maas | Qwen | API | Qwen3-Next-80B-Instruct row; $0.15/$1.20/M |
| qwen3-next-80b-a3b-thinking-maas | Qwen | API | Qwen3-Next-80B-Thinking row; $0.15/$1.20/M |
| kimi-k2-thinking-maas | Kimi / Moonshot | API | Kimi-K2-Thinking row; $0.60/$2.50/M |
| minimax-m2-maas | MiniMax | API | MiniMax-M2 row; $0.30/$1.20/M |
| glm-4.7-maas | ZAI.org – GLM | API | GLM-4.7 row; $0.60/$2.20/M |
| glm-5-maas | ZAI.org – GLM | API | GLM-5 row; $1.00/$3.20/M |
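
The Notes column mixes several billing units: per 1K characters, per 1K tokens, per million tokens, and per second of video. The arithmetic can be sketched as follows, assuming roughly 4 characters per token, which is the ratio implied by the text-embedding-005 row ($0.000025/1K chars → $0.0001/1K tokens). The function names are hypothetical and are not taken from the pricing agent.

```python
from decimal import Decimal

# Assumption: ~4 characters per token, the ratio implied by the
# text-embedding-005 row ($0.000025/1K chars -> $0.0001/1K tokens).
CHARS_PER_TOKEN = Decimal(4)


def per_1k_chars_to_per_1k_tokens(price_per_1k_chars: Decimal) -> Decimal:
    """Convert a per-1K-character price to an approximate per-1K-token price."""
    return price_per_1k_chars * CHARS_PER_TOKEN


def token_cost(input_tokens: int, output_tokens: int,
               input_per_m: Decimal, output_per_m: Decimal) -> Decimal:
    """Cost of one request given per-million-token input and output prices."""
    million = Decimal(1_000_000)
    return (input_tokens * input_per_m + output_tokens * output_per_m) / million


def video_cost(price_per_sec: Decimal, duration_s: int, count: int = 1) -> Decimal:
    """Cost of `count` generated clips, each `duration_s` seconds long."""
    return price_per_sec * duration_s * count


# text-embedding-005: $0.000025 per 1K chars
embedding_price = per_1k_chars_to_per_1k_tokens(Decimal("0.000025"))
# gemini-3.1-flash-lite-preview: $0.25/M input, $1.50/M output
flash_lite_cost = token_cost(10_000, 2_000, Decimal("0.25"), Decimal("1.50"))
# veo-3.1-generate-001: $0.40/sec, duration=8s, count=1
veo_clip_cost = video_cost(Decimal("0.40"), duration_s=8, count=1)
print(embedding_price, flash_lite_cost, veo_clip_cost)
```

`Decimal` is used instead of `float` so that small per-unit prices multiply without binary rounding noise.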

Excluded Models (not in output)

| Model | Publisher | Reason |
| --- | --- | --- |
| gemini-live-2.5-flash-native-audio | Google | Global exclude: `*-live-*` streaming |
| jamba-large-1.6 | AI21 | has_deploy: true without -maas (self-deploy) |
| clip-vit-base-patch32, openclip | OpenAI | Non-generative (CLIP vision) |
| whisper-large | OpenAI | Audio transcription (global exclude) |
| gpt-oss | OpenAI | has_deploy: true without -maas (self-deploy) |
| sam3, faster-r-cnn, mask-r-cnn, retinanet | Meta | Non-generative CV/segmentation |
| xlm-roberta-large, roberta-large | Meta | Non-generative NLP |
| llama-guard, prompt-guard | Meta | Guard models (global exclude) |
| llama2, llama3, llama3_1, llama3-2, llama3-3, llama4, codellama-7b-hf, llama-2-quantized | Meta | has_deploy: true without -maas (self-deploy) |
| nllb, imagebind | Meta | Non-generative (translation / multimodal understanding) |
| codestral-2501-self-deploy | Mistral | has_deploy: true + "self-deploy" in name |
| mistral-ocr-2505 | Mistral | OCR model (global exclude) |
| ministral-3, mistral-large-3 | Mistral | has_deploy: true without -maas (self-deploy) |
| mistral (mistral-ai), mixtral | Mistral | has_deploy: true without -maas (self-deploy) |
| deepseek-r1, deepseek-v3, deepseek-v3-1, deepseek-v3-2 | DeepSeek | has_deploy: true without -maas (self-deploy) |
| deepseek-ocr, deepseek-ocr-2, deepseek-ocr-maas | DeepSeek | OCR model (global exclude) |
| qwq, qwen3, qwen3-5, qwen3-vl, qwen2, qwen3-embedding, qwen3-coder, qwen3-next, qwen3-coder-next | Qwen | has_deploy: true without -maas (self-deploy) |
| qwen-image | Qwen | Explicit policy exclusion |
| kimi-k2, kimi-k2-5 | Kimi | has_deploy: true without -maas (self-deploy) |
| minimax-m2 | MiniMax | has_deploy: true without -maas (self-deploy) |
| glm-4.7, glm-5, glm-4.5 | ZAI.org | has_deploy: true without -maas (self-deploy) |
| glm-image | ZAI.org | Explicit policy exclusion |
| glm-ocr | ZAI.org | OCR model (global exclude) |
| Various non-generative Google models | Google | Non-generative ML/CV/NLP (imagegeneration, virtual-try-on, bert, gemma, PaLM, etc.) |
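
The reasons in this table read like an ordered rule set: global name-based excludes first, then the has_deploy / -maas check. A minimal sketch of that ordering, assuming each catalog entry exposes a model ID and a has_deploy flag. `CatalogEntry`, `exclusion_reason`, and the glob patterns are hypothetical illustrations inferred from the reasons listed, not the agent's actual rules; the non-generative and explicit-policy exclusions would need their own lists.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical shape of one model-catalog entry."""
    model_id: str
    has_deploy: bool = False


# Hypothetical glob patterns standing in for the "global exclude" rows.
GLOBAL_EXCLUDES = {
    "*-live-*": "live streaming",
    "*-ocr*": "OCR model",
    "whisper-*": "audio transcription",
    "*-guard": "guard model",
}


def exclusion_reason(entry: CatalogEntry) -> Optional[str]:
    """Return why an entry is excluded, or None if it should be kept."""
    # Global excludes win even when the ID carries a -maas suffix.
    for pattern, reason in GLOBAL_EXCLUDES.items():
        if fnmatch(entry.model_id, pattern):
            return f"global exclude: {reason}"
    # Deployable models count as self-deploy unless they are -maas variants,
    # or when the name itself says "self-deploy".
    if entry.has_deploy and ("self-deploy" in entry.model_id
                             or not entry.model_id.endswith("-maas")):
        return "self-deploy"
    return None


print(exclusion_reason(CatalogEntry("deepseek-ocr-maas", has_deploy=True)))
print(exclusion_reason(CatalogEntry("jamba-large-1.6", has_deploy=True)))
```

Checking the global patterns before the -maas test matters for IDs such as deepseek-ocr-maas, which the table excludes despite the suffix.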

Pricing-Page-Only Models (not added — not in API)

| Model | Publisher | Notes |
| --- | --- | --- |
| Gemini 1.5 Flash / 1.5 Pro / 1.0 Pro | Google | On pricing page but NOT returned by get_vertex_models |
| Llama 3.1 405B / Llama 4 Scout | Meta | On pricing page but NOT returned by get_vertex_models |
| gpt-oss-20b | OpenAI | On pricing page but NOT returned by get_vertex_models |
| Lyria 3 Pro / Lyria 3 / Lyria 2 | Google | On pricing page; excluded per global `lyria-*` rule (music generation) |
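
The split between priced models, models added with price 0, and pricing-page-only models amounts to a set comparison between the IDs the API returns and the IDs found on the pricing page. A rough sketch, assuming both sides have already been normalized to model IDs. `reconcile` is a hypothetical helper, and the return shape of get_vertex_models is not known from this PR, so plain sets are used here.

```python
def reconcile(api_ids: set[str], page_ids: set[str]) -> dict[str, list[str]]:
    """Bucket model IDs by where they were found."""
    return {
        # In the API and on the pricing page: priced normally.
        "priced": sorted(api_ids & page_ids),
        # In the API but no pricing row: added with price 0.
        "price_zero": sorted(api_ids - page_ids),
        # On the pricing page only: not added.
        "page_only": sorted(page_ids - api_ids),
    }


# Illustrative subset of IDs taken from the tables in this PR.
buckets = reconcile(
    api_ids={"gpt-oss-120b-maas", "gemini-2.5-pro-tts"},
    page_ids={"gpt-oss-120b-maas", "gpt-oss-20b"},
)
print(buckets)
```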

Generated by Pricing Agent on 2026-04-04
