diff --git a/terraform/litellm/README.md b/terraform/litellm/README.md index 5ca704b96dd..8f09cb53407 100644 --- a/terraform/litellm/README.md +++ b/terraform/litellm/README.md @@ -1,18 +1,34 @@ # LiteLLM Terraform stacks -Two self-contained Terraform root modules that deploy the **componentized** -LiteLLM proxy — the gateway, backend, and UI as three independent containers -(see `helm/litellm/` for the canonical chart with the same split). +Two self-contained, reusable Terraform **modules** that deploy the +**componentized** LiteLLM proxy — the gateway, backend, and UI as three +independent containers (see `helm/litellm/` for the canonical chart with the +same split). + +Each module declares **no `provider` block of its own**, so it can be called +with `count` / `for_each` / `depends_on` and the caller controls region, +assume-role / impersonation, aliases, and `default_tags`. A ready-to-run root +that wires the provider lives at `/examples/default/` — that's the +one-command deploy path. To embed a stack in your own config, call the module +by source: + +```hcl +module "litellm" { + source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=" + # ... inputs ... +} +``` | Stack | Compute | Database (writer + reader) | Cache | Object store | Public entrypoint | | ------ | ----------- | ---------------------------------- | ----------- | ------------ | ------------------ | | `aws/` | ECS Fargate | Aurora Postgres (IAM auth) | ElastiCache | S3 | Application LB | | `gcp/` | Cloud Run | Cloud SQL Postgres (password auth) | Memorystore | GCS | External HTTPS LB | -Each stack creates its own VPC and managed data stores — drop in a tfvars -file and run `terraform apply`. Both stacks support a typed `proxy_config` -input (mirrors `helm/litellm`'s `gateway.config.proxy_config`) and per-component -extra env vars / secret-manager refs. +Each stack creates its own VPC and managed data stores — from +`/examples/default/`, drop in a tfvars file and run `terraform apply`. 
+Both stacks support a typed `proxy_config` input (mirrors `helm/litellm`'s +`gateway.config.proxy_config`) and per-component extra env vars / +secret-manager refs. ## Components @@ -147,6 +163,39 @@ against the backend image: Run the migration job once after the first `terraform apply` and before the gateway/backend services start serving traffic. +## Feature parity between stacks + +The two modules expose the same conceptual surface; concrete inputs differ +only where the underlying cloud forces it. + +| Capability | AWS input(s) | GCP input(s) | +| -------------------------------- | ------------------------------------------------------- | --------------------------------------------------------- | +| Tenant + env naming | `tenant`, `env` | `tenant`, `env` | +| Pre-shared master key / license | `litellm_master_key`, `litellm_license` | `litellm_master_key`, `litellm_license` | +| UI admin password | `ui_password` | `ui_password` | +| Per-deployment tags / labels | `tags` (`map(string)`) | `labels` (`map(string)`) | +| TLS posture | `acm_certificate_arn`, `allow_plaintext_alb` | `lb_domains`, `allow_plaintext_lb` | +| Force destroy of object store | `s3_force_destroy` | `gcs_force_destroy` | +| Database deletion protection | `skip_final_snapshot` | `cloudsql_deletion_protection` | +| `proxy_config` (typed YAML map) | `proxy_config` | `proxy_config` | +| Extra plain env per component | `gateway_extra_env`, `backend_extra_env` | `gateway_extra_env`, `backend_extra_env` | +| Extra secret-backed env | `gateway_extra_secrets`, `backend_extra_secrets` (ARNs) | `gateway_extra_secrets`, `backend_extra_secrets` (resource IDs) | +| Uvicorn `--workers` on gateway | `gateway_num_workers` | `gateway_num_workers` | +| OpenTelemetry v2 (opt-in) | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret_arn` | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret` | + 
+Each module stamps its own stack-identity tag (`litellm:stack` on AWS, +`litellm-stack` on GCP — GCP label keys forbid colons) plus +`managed-by = "terraform"` onto every taggable / labelable resource and +merges `var.tags` / `var.labels` on top. Provider `default_tags` on AWS +merge on top of all of these. + +OTel is opt-in on both clouds: leave `otel_endpoint` empty and nothing +OTel-related is added to the container env; set it and both gateway and +backend get `LITELLM_OTEL_V2=true` plus the full `OTEL_*` block, with +`OTEL_SERVICE_NAME` stamped per component +(`-litellm--gateway` and `-backend`). Any `OTEL_*` key set +in `gateway_extra_env` / `backend_extra_env` wins for that service. + ## What's not included - TLS certificates / custom domains. Both stacks expose plain-HTTP load @@ -156,4 +205,46 @@ gateway/backend services start serving traffic. backend block to `versions.tf` when graduating to a team environment. - Observability beyond the cloud provider's defaults (CloudWatch logs on AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse - via the `*_extra_env` variables. + via the `*_extra_env` variables, or turn on OTel v2 (see the parity + table above). + +## HCP Terraform no-code (1-click) deploy + +Both stacks are publishable as no-code modules in HCP Terraform's private +registry. The end-user flow is: open the no-code launch URL, fill in a +few inputs, hit *Create workspace*, and HCP runs plan/apply against your +cloud account using a variable-set of credentials (static keys or +dynamic-credentials OIDC). + +Required overrides the launcher must supply per stack: + +- **AWS** (`terraform/litellm/aws`): `region`, `azs`, `tenant`, `env`. + The image vars (`gateway_image`, `backend_image`, `ui_image`, + `migrations_image`) can be left at their defaults — the GHCR images + are anonymous-readable and ECS Fargate pulls them without extra + credentials. 
+ +- **GCP** (`terraform/litellm/gcp`): `project`, `tenant`, `env`, **and + one of**: + - `image_registry` pointed at an Artifact Registry **remote** repository + backed by `https://ghcr.io` (e.g. + `us-central1-docker.pkg.dev//litellm/berriai`), so Cloud Run + pulls the four upstream `litellm-*` images through it; or + - all four per-component `*_image` URIs pointing at images mirrored + into a regular Artifact Registry repo. + + The defaults (`ghcr.io/berriai`) cause Cloud Run admission to reject + the service spec — Cloud Run only authenticates against Artifact + Registry, `[region.]gcr.io`, or `docker.io`. See + `terraform/litellm/gcp/README.md#image-pulls` for the + `gcloud artifacts repositories create … --mode=remote-repository` + command that sets up the passthrough repo (one-time, per project). + +What still requires a manual step regardless of HCP no-code: + +- The one-off migration task. The stacks auto-run it via `local-exec` + during `terraform apply`, but that requires the `aws` / `gcloud` CLI + on the runner. HCP-hosted runners don't have them; use an HCP agent + pool with a custom image that includes the relevant CLI, or run the + command printed in the `migration_run_command` output by hand after + the first apply. diff --git a/terraform/litellm/aws/README.md b/terraform/litellm/aws/README.md index 8638ea800ec..7d4ef0a14fb 100644 --- a/terraform/litellm/aws/README.md +++ b/terraform/litellm/aws/README.md @@ -44,9 +44,12 @@ needs the `aws` CLI installed and authenticated. ### `proxy_config` (preferred) Mirrors the helm chart's `gateway.config.proxy_config`. The map is YAML-encoded -and base64-passed to gateway, backend, and the migration task; each container -decodes it to `/tmp/litellm-config.yaml` at startup and sets `CONFIG_FILE_PATH` -to match. 
+and uploaded to S3 (`config/litellm-config.yaml` in the stack's bucket); the +gateway and backend container entrypoints download it to +`/tmp/litellm-config.yaml` at task start via boto3 and set `CONFIG_FILE_PATH` +to match. The S3 object's etag is wired into the task definition, so editing +`proxy_config` produces a new task-def revision and a rolling redeploy of both +services. ```hcl proxy_config = { @@ -119,6 +122,42 @@ aws secretsmanager create-secret \ --secret-string "sk-proj-..." ``` +### Observability (OpenTelemetry v2) + +OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is +opt-in and gated entirely on `otel_endpoint`. Empty (default) and nothing +OTel-related is added to the container env. Set it and both gateway and +backend gain `LITELLM_OTEL_V2=true` plus the `OTEL_*` block, with +`OTEL_SERVICE_NAME` stamped per component (`${tenant}-litellm-${env}-gateway` +and `-backend`) so spans land tagged with the right hop. Any `OTEL_*` key +set in `gateway_extra_env` / `backend_extra_env` overrides the default for +that service. + +```hcl +otel_endpoint = "http://otel-collector.internal:4318" +otel_exporter = "otlp_http" # otlp_grpc, console +otel_environment_name = "prod" # defaults to var.env +``` + +For collectors that require an auth header, store the comma-separated +`key=value` string in Secrets Manager and reference it via +`otel_headers_secret_arn`. The execution role auto-gains +`secretsmanager:GetSecretValue` on that ARN. + +```hcl +otel_headers_secret_arn = "arn:aws:secretsmanager:us-west-2:111122223333:secret:honeycomb-otel-headers-AbCdEf" +``` + +`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` defaults to +`no_content`; flip `otel_capture_message_content = "prompt_and_completion"` +only after auditing what lands in the backend, since prompts and +completions are typically sensitive. 
+ +Vendor presets (Arize, Phoenix, Langfuse OTel, Weave, Langtrace, Levo, +AgentOps) live under `proxy_config.litellm_settings.callbacks` and are +orthogonal to the OTLP variables above; their credentials still go in +`*_extra_secrets`. + ## Tenant deployment Every resource the stack creates is named `${tenant}-litellm-${env}` (or @@ -132,10 +171,11 @@ pair differs: | `acme` | `prod` | `acme-litellm-prod-master-key` | | `globex` | `dev` | `globex-litellm-dev-license` | -For a per-tenant instance, the only inputs that change are the tenant -slug, env, and the two pre-issued secrets: +For a per-tenant instance via the example root, the only inputs that +change are the tenant slug, env, and the two pre-issued secrets: ```bash +cd terraform/litellm/aws/examples/default export TF_VAR_litellm_master_key="sk-..." # the tenant's master key export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE @@ -146,6 +186,22 @@ terraform apply \ -var "env=stage" ``` +To run *many* tenants from a single config, call the module with +`for_each` instead of one root per tenant (see "Using as a module"): + +```hcl +module "litellm" { + for_each = toset(["acme", "globex"]) + source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=" + tenant = each.key + env = "prod" + region = "us-west-2" + azs = ["us-west-2a", "us-west-2b"] +} +``` +(This `for_each` form is only possible because the module declares no +provider block — the original root-with-provider layout forbade it.) + Both `litellm_master_key` and `litellm_license` are optional: - Omit `litellm_master_key` → the stack auto-generates a random `sk-…` value (trial/dev path). @@ -159,14 +215,21 @@ example files. ## Quick start ```bash -cd terraform/litellm/aws +cd terraform/litellm/aws/examples/default cp terraform.tfvars.example terraform.tfvars -# Edit: region, tenant, env, azs, *_image, proxy_config, gateway_extra_secrets. +# Edit: region, tenant, env, azs, proxy_config, gateway_extra_secrets. 
terraform init terraform apply ``` +`examples/default/` is a thin root that configures the `aws` provider and +calls the module (`../../`). It exposes a curated variable surface; for +advanced knobs (per-component CPU/memory/workers, autoscaling, RDS/Redis +sizing, per-component image pins) set them on the `module "litellm"` block +in `examples/default/main.tf`, or call the module from your own config — +see "Using as a module" below. + That single apply provisions everything, runs the DB user bootstrap, runs the schema migration, and only then starts the gateway/backend services. When it returns, the stack is serving traffic. @@ -179,6 +242,34 @@ aws secretsmanager get-secret-value \ --query SecretString --output text ``` +## Using as a module + +The directory itself is a module with **no `provider` block** — the caller +owns provider config. That means you can call it directly with `for_each` +(many tenants from one config), `count` (conditional stacks), `depends_on`, +an assume-role / aliased provider, etc.: + +```hcl +provider "aws" { + region = "us-west-2" + assume_role { role_arn = "arn:aws:iam::111122223333:role/deployer" } +} + +module "litellm" { + source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=" + + region = "us-west-2" + tenant = "acme" + env = "prod" + azs = ["us-west-2a", "us-west-2b"] + # ...any of the inputs in variables.tf... +} +``` + +Tags: the module threads its own `litellm:stack` / `managed-by` / `var.tags` +onto every taggable resource. Any `default_tags` on your provider merge on +top — set org-wide tags there, per-deployment tags via the `tags` input. + ## Image pulls The defaults pull from `ghcr.io/berriai/litellm-:v1.86.0-dev`, @@ -238,8 +329,8 @@ losing the contents. 
| File | What's in it | | ----------------- | --------------------------------------------------------------------- | -| `versions.tf` | Terraform + provider version constraints | -| `providers.tf` | AWS provider (region + default tags) | +| `versions.tf` | Terraform + `required_providers` constraints (module declares no provider config) | +| `examples/default/` | Thin root: `aws` provider (with an optional `default_tags` slot for org-wide tags) + a call to the module. The one-command deploy path. | | `variables.tf` | All input variables | | `locals.tf` | Path-prefix lists for ALB routing (mirror of `helm/.../ingress.yaml`) | | `network.tf` | VPC, subnets, IGW, NAT, route tables, security groups | diff --git a/terraform/litellm/aws/alb.tf b/terraform/litellm/aws/alb.tf index de0d9c2310f..786b9d9a5b9 100644 --- a/terraform/litellm/aws/alb.tf +++ b/terraform/litellm/aws/alb.tf @@ -6,6 +6,8 @@ resource "aws_lb" "this" { subnets = aws_subnet.public[*].id idle_timeout = 120 + + tags = local.tags } locals { @@ -35,6 +37,8 @@ resource "aws_lb_target_group" "gateway" { } deregistration_delay = 30 + + tags = local.tags } resource "aws_lb_target_group" "backend" { @@ -54,6 +58,8 @@ resource "aws_lb_target_group" "backend" { } deregistration_delay = 30 + + tags = local.tags } resource "aws_lb_target_group" "ui" { @@ -73,6 +79,8 @@ resource "aws_lb_target_group" "ui" { } deregistration_delay = 30 + + tags = local.tags } # HTTP listener. When TLS is enabled this only serves a permanent @@ -106,6 +114,8 @@ resource "aws_lb_listener" "http" { error_message = "ALB has no HTTPS listener. Either set `acm_certificate_arn` to enable TLS, or set `allow_plaintext_alb = true` to opt into HTTP-only (trial / dev only)." } } + + tags = local.tags } # HTTPS listener. 
Only created when an ACM cert ARN is supplied — terminates @@ -122,6 +132,8 @@ resource "aws_lb_listener" "https" { type = "forward" target_group_arn = aws_lb_target_group.backend.arn } + + tags = local.tags } # UI exact paths (/, /favicon.ico, /ui) — priority 10. @@ -139,6 +151,8 @@ resource "aws_lb_listener_rule" "ui_exact" { values = local.ui_exact_paths } } + + tags = local.tags } # UI prefix paths (/_next/*, /litellm-asset-prefix/*, /assets/*, /ui/*) — priority 20. @@ -156,6 +170,8 @@ resource "aws_lb_listener_rule" "ui_prefix" { values = local.ui_path_prefixes } } + + tags = local.tags } # Gateway prefix rules — one per chunk-of-5 because ALB caps a path-pattern @@ -176,4 +192,6 @@ resource "aws_lb_listener_rule" "gateway" { values = each.value } } + + tags = local.tags } diff --git a/terraform/litellm/aws/bootstrap.tf b/terraform/litellm/aws/bootstrap.tf index e9a56dedbb5..b0bc38d44fb 100644 --- a/terraform/litellm/aws/bootstrap.tf +++ b/terraform/litellm/aws/bootstrap.tf @@ -32,6 +32,8 @@ resource "aws_iam_policy" "bootstrap_secrets" { Resource = [aws_secretsmanager_secret.db_master_password.arn] }] }) + + tags = local.tags } resource "aws_iam_role_policy_attachment" "task_execution_bootstrap_secrets" { @@ -43,6 +45,8 @@ resource "aws_iam_role_policy_attachment" "task_execution_bootstrap_secrets" { resource "aws_cloudwatch_log_group" "bootstrap_db" { name = "/ecs/${local.name}/bootstrap-db" retention_in_days = var.log_retention_days + + tags = local.tags } locals { @@ -101,6 +105,8 @@ resource "aws_ecs_task_definition" "bootstrap_db" { } } }]) + + tags = local.tags } # ---------- Bootstrap trigger ---------- diff --git a/terraform/litellm/aws/ecs.tf b/terraform/litellm/aws/ecs.tf index a6d2350c681..54ab80de9f4 100644 --- a/terraform/litellm/aws/ecs.tf +++ b/terraform/litellm/aws/ecs.tf @@ -5,26 +5,36 @@ resource "aws_ecs_cluster" "this" { name = "containerInsights" value = "enabled" } + + tags = local.tags } resource "aws_cloudwatch_log_group" "gateway" { 
name = "/ecs/${local.name}/gateway" retention_in_days = var.log_retention_days + + tags = local.tags } resource "aws_cloudwatch_log_group" "backend" { name = "/ecs/${local.name}/backend" retention_in_days = var.log_retention_days + + tags = local.tags } resource "aws_cloudwatch_log_group" "ui" { name = "/ecs/${local.name}/ui" retention_in_days = var.log_retention_days + + tags = local.tags } resource "aws_cloudwatch_log_group" "migrations" { name = "/ecs/${local.name}/migrations" retention_in_days = var.log_retention_days + + tags = local.tags } # Shared env block fed to gateway, backend, and the migration task. Mirrors @@ -34,6 +44,38 @@ resource "aws_cloudwatch_log_group" "migrations" { # HOST/PORT/USER/NAME plus an IAM-signed token, so no DB password is needed # in the task definition. locals { + # OTel v2 is opt-in and gated on otel_endpoint, matching the GCP stack. + # When set, LITELLM_OTEL_V2 flips on alongside the OTEL_* block, with + # OTEL_SERVICE_NAME stamped per component so spans land tagged with the + # right hop. Any OTEL_* key set in *_extra_env wins over the default for + # that service (ECS allows duplicates but last-wins is undefined, so we + # filter here for the same predictable behavior GCP gets from Cloud Run's + # hard duplicate-rejection). + otel_enabled = var.otel_endpoint != "" + otel_environment_name = var.otel_environment_name != "" ? var.otel_environment_name : var.env + otel_shared_env = local.otel_enabled ? [ + { name = "LITELLM_OTEL_V2", value = "true" }, + { name = "OTEL_EXPORTER", value = var.otel_exporter }, + { name = "OTEL_ENDPOINT", value = var.otel_endpoint }, + { name = "OTEL_ENVIRONMENT_NAME", value = local.otel_environment_name }, + { name = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", value = var.otel_capture_message_content }, + ] : [] + gateway_otel_env_raw = concat(local.otel_shared_env, local.otel_enabled ? 
[ + { name = "OTEL_SERVICE_NAME", value = "${local.name}-gateway" }, + ] : []) + backend_otel_env_raw = concat(local.otel_shared_env, local.otel_enabled ? [ + { name = "OTEL_SERVICE_NAME", value = "${local.name}-backend" }, + ] : []) + gateway_otel_env = [ + for e in local.gateway_otel_env_raw : e if !contains(keys(var.gateway_extra_env), e.name) + ] + backend_otel_env = [ + for e in local.backend_otel_env_raw : e if !contains(keys(var.backend_extra_env), e.name) + ] + otel_secrets = local.otel_enabled && var.otel_headers_secret_arn != "" ? [ + { name = "OTEL_HEADERS", valueFrom = var.otel_headers_secret_arn }, + ] : [] + shared_env = [ { name = "IAM_TOKEN_DB_AUTH", value = "true" }, { name = "DATABASE_HOST", value = aws_rds_cluster.this.endpoint }, @@ -65,6 +107,7 @@ locals { var.litellm_license == "" ? [] : [ { name = "LITELLM_LICENSE", valueFrom = aws_secretsmanager_secret.license[0].arn }, ], + local.otel_secrets, ) # Backend-only managed secrets. UI_PASSWORD is consumed by the management @@ -91,20 +134,26 @@ locals { ] # Mirrors the helm chart's gateway.config.create / configmap pattern. - # ECS Fargate has no ConfigMap analogue, so we pass the YAML as a - # base64-encoded env var and decode it at container start via a tiny - # python shim that prepends the image's normal uvicorn entrypoint. + # ECS Fargate has no ConfigMap analogue, so the YAML is uploaded to S3 + # (see aws_s3_object.proxy_config in s3.tf) and the container entrypoint + # downloads it to /tmp/litellm-config.yaml via boto3 before exec'ing + # uvicorn. The S3 object's etag is embedded in the task definition so a + # config edit forces a new task-def revision and a rolling redeploy. proxy_config_enabled = length(keys(var.proxy_config)) > 0 - proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : "" + proxy_config_path = "/tmp/litellm-config.yaml" proxy_config_env = local.proxy_config_enabled ? 
[ - { name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 }, - { name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" }, + { name = "CONFIG_FILE_PATH", value = local.proxy_config_path }, + { name = "LITELLM_PROXY_CONFIG_S3_BUCKET", value = aws_s3_bucket.this.bucket }, + { name = "LITELLM_PROXY_CONFIG_S3_KEY", value = aws_s3_object.proxy_config[0].key }, + { name = "LITELLM_PROXY_CONFIG_S3_ETAG", value = aws_s3_object.proxy_config[0].etag }, ] : [] + proxy_config_fetch_cmd = "python -c \"import os, boto3; boto3.client('s3', region_name=os.environ['AWS_REGION']).download_file(os.environ['LITELLM_PROXY_CONFIG_S3_BUCKET'], os.environ['LITELLM_PROXY_CONFIG_S3_KEY'], os.environ['CONFIG_FILE_PATH'])\"" + # Gateway always needs --workers wired in (no NUM_WORKERS env var support # in the image entrypoint). When proxy_config is enabled we also have to - # decode the base64 config first, so the command goes through `sh -c`; + # pull the config from S3 first, so the command goes through `sh -c`; # otherwise we keep the image's ENTRYPOINT and only override `command`. gateway_uvicorn_args = "--host 0.0.0.0 --port 4000 --workers ${var.gateway_num_workers}" backend_uvicorn_args = "--host 0.0.0.0 --port 4001" @@ -112,7 +161,7 @@ locals { gateway_proxy_overrides = local.proxy_config_enabled ? { entryPoint = ["sh", "-c"] command = [ - "python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn gateway.main:app ${local.gateway_uvicorn_args}" + "${local.proxy_config_fetch_cmd} && exec uvicorn gateway.main:app ${local.gateway_uvicorn_args}" ] } : { # Mirror the image's ENTRYPOINT so we can append --workers via command. @@ -123,7 +172,7 @@ locals { backend_proxy_overrides = local.proxy_config_enabled ? 
{ entryPoint = ["sh", "-c"] command = [ - "python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn backend.main:app ${local.backend_uvicorn_args}" + "${local.proxy_config_fetch_cmd} && exec uvicorn backend.main:app ${local.backend_uvicorn_args}" ] } : {} } @@ -148,6 +197,7 @@ resource "aws_ecs_task_definition" "gateway" { portMappings = [{ containerPort = 4000, protocol = "tcp" }] environment = concat( local.shared_env, + local.gateway_otel_env, local.gateway_extra_env_list, local.proxy_config_env, ) @@ -169,6 +219,8 @@ resource "aws_ecs_task_definition" "gateway" { local.gateway_proxy_overrides, ) ]) + + tags = local.tags } resource "aws_ecs_service" "gateway" { @@ -206,6 +258,8 @@ resource "aws_ecs_service" "gateway" { aws_lb_listener.https, terraform_data.migration, ] + + tags = local.tags } # ---------- Backend ---------- @@ -229,6 +283,7 @@ resource "aws_ecs_task_definition" "backend" { environment = concat( local.shared_env, local.backend_default_env, + local.backend_otel_env, local.backend_extra_env_list, local.proxy_config_env, ) @@ -246,6 +301,8 @@ resource "aws_ecs_task_definition" "backend" { local.backend_proxy_overrides, ) ]) + + tags = local.tags } resource "aws_ecs_service" "backend" { @@ -279,6 +336,8 @@ resource "aws_ecs_service" "backend" { aws_lb_listener.https, terraform_data.migration, ] + + tags = local.tags } # ---------- UI ---------- @@ -312,6 +371,8 @@ resource "aws_ecs_task_definition" "ui" { } } ]) + + tags = local.tags } resource "aws_ecs_service" "ui" { @@ -344,4 +405,6 @@ resource "aws_ecs_service" "ui" { aws_lb_listener.http, aws_lb_listener.https, ] + + tags = local.tags } diff --git a/terraform/litellm/aws/examples/default/.terraform.lock.hcl b/terraform/litellm/aws/examples/default/.terraform.lock.hcl new file mode 100644 index 00000000000..4a059b2b268 --- /dev/null +++ 
b/terraform/litellm/aws/examples/default/.terraform.lock.hcl @@ -0,0 +1,46 @@ +# This file is maintained automatically by "terraform init". +# Manual edits may be lost in future updates. + +provider "registry.terraform.io/hashicorp/aws" { + version = "5.100.0" + constraints = "~> 5.60" + hashes = [ + "h1:Ijt7pOlB7Tr7maGQIqtsLFbl7pSMIj06TVdkoSBcYOw=", + "zh:054b8dd49f0549c9a7cc27d159e45327b7b65cf404da5e5a20da154b90b8a644", + "zh:0b97bf8d5e03d15d83cc40b0530a1f84b459354939ba6f135a0086c20ebbe6b2", + "zh:1589a2266af699cbd5d80737a0fe02e54ec9cf2ca54e7e00ac51c7359056f274", + "zh:6330766f1d85f01ae6ea90d1b214b8b74cc8c1badc4696b165b36ddd4cc15f7b", + "zh:7c8c2e30d8e55291b86fcb64bdf6c25489d538688545eb48fd74ad622e5d3862", + "zh:99b1003bd9bd32ee323544da897148f46a527f622dc3971af63ea3e251596342", + "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425", + "zh:9f8b909d3ec50ade83c8062290378b1ec553edef6a447c56dadc01a99f4eaa93", + "zh:aaef921ff9aabaf8b1869a86d692ebd24fbd4e12c21205034bb679b9caf883a2", + "zh:ac882313207aba00dd5a76dbd572a0ddc818bb9cbf5c9d61b28fe30efaec951e", + "zh:bb64e8aff37becab373a1a0cc1080990785304141af42ed6aa3dd4913b000421", + "zh:dfe495f6621df5540d9c92ad40b8067376350b005c637ea6efac5dc15028add4", + "zh:f0ddf0eaf052766cfe09dea8200a946519f653c384ab4336e2a4a64fdd6310e9", + "zh:f1b7e684f4c7ae1eed272b6de7d2049bb87a0275cb04dbb7cda6636f600699c9", + "zh:ff461571e3f233699bf690db319dfe46aec75e58726636a0d97dd9ac6e32fb70", + ] +} + +provider "registry.terraform.io/hashicorp/random" { + version = "3.9.0" + constraints = "~> 3.6" + hashes = [ + "h1:OO+IuvQJSPmWdN8AyyIEvPJbLvDQpgX/zbktoa9KsJE=", + "zh:161ad0bd9a75768c82f53fb6e7172a9d8be2d4889b012645a34795031aaf1bf1", + "zh:19dc9a5b17729725ccfc4f45b0500af0ee5bc6b6b160c7adb8f2bf617d2c80ea", + "zh:269eda8fe42daa7974d5a34d166c3ba9defe80cde86c01e4dadcfdf2e1f05e5f", + "zh:373f7c65566f8f2cc7f45d698654feb9d988996957e1266a69ca00c52d6d16d0", + "zh:5599d16804c41c83009ec621b6d6b6f74e102f5827678a4750f8809055546b61", + 
"zh:583be0440469a22bff70dcfa56593b01566860b29607437264adb51060cf46fc", + "zh:5f211d8ec3f2e1f414870d9584bfe26e6995560ef81c748f8447a48164767398", + "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3", + "zh:7b547fd16216761ef86efc3ed516ac5ac0c5c42b7c7eb24a08cef2d93f69ed5e", + "zh:7e7c0679daf2a382151d05068c8c3f0dae6b7b7dccf818827b73dd08638df2ef", + "zh:8089dec888a8038b9b4fb23b3df7e1057293dbc5b60b42cc47ff690d69d4b61b", + "zh:c51f15a031edfd6f23ce8ced3446ca7f8d8d647e2499890d7d5d10d5016d7257", + "zh:c94784f005708890dc6895afd53636ec00ec1e430b15d41e5aebfb1d4b39bd04", + ] +} diff --git a/terraform/litellm/aws/examples/default/main.tf b/terraform/litellm/aws/examples/default/main.tf new file mode 100644 index 00000000000..3d421099aed --- /dev/null +++ b/terraform/litellm/aws/examples/default/main.tf @@ -0,0 +1,41 @@ +# One-command deploy of the LiteLLM AWS stack. +# +# cd terraform/litellm/aws/examples/default +# cp terraform.tfvars.example terraform.tfvars # edit it +# terraform init +# terraform apply +# +# This root just wires the provider (see providers.tf) to the module. The +# module itself (../../) declares no provider, so it can also be consumed +# from your own config with count/for_each/aliased or assume-role providers: +# +# module "litellm" { +# source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=" +# ... +# } +# +# Knobs not surfaced as variables here (per-component sizing, autoscaling, +# RDS/Redis tuning) can be set directly on this block — see ../../variables.tf. 
+module "litellm" { + source = "../../" + + region = var.region + tenant = var.tenant + env = var.env + azs = var.azs + + litellm_master_key = var.litellm_master_key + litellm_license = var.litellm_license + ui_password = var.ui_password + + acm_certificate_arn = var.acm_certificate_arn + allow_plaintext_alb = var.allow_plaintext_alb + s3_force_destroy = var.s3_force_destroy + skip_final_snapshot = var.skip_final_snapshot + + proxy_config = var.proxy_config + gateway_extra_env = var.gateway_extra_env + backend_extra_env = var.backend_extra_env + gateway_extra_secrets = var.gateway_extra_secrets + backend_extra_secrets = var.backend_extra_secrets +} diff --git a/terraform/litellm/aws/examples/default/outputs.tf b/terraform/litellm/aws/examples/default/outputs.tf new file mode 100644 index 00000000000..235c069933c --- /dev/null +++ b/terraform/litellm/aws/examples/default/outputs.tf @@ -0,0 +1,54 @@ +output "alb_dns_name" { + description = "Public DNS name of the LiteLLM ALB." + value = module.litellm.alb_dns_name +} + +output "alb_url" { + description = "Proxy URL. Dashboard at /, API at /v1/*." + value = module.litellm.alb_url +} + +output "ecs_cluster" { + description = "ECS cluster name." + value = module.litellm.ecs_cluster +} + +output "aurora_writer_endpoint" { + description = "Aurora writer endpoint." + value = module.litellm.aurora_writer_endpoint +} + +output "aurora_reader_endpoint" { + description = "Aurora reader endpoint." + value = module.litellm.aurora_reader_endpoint +} + +output "redis_endpoint" { + description = "ElastiCache Redis primary endpoint (TLS)." + value = module.litellm.redis_endpoint +} + +output "s3_bucket" { + description = "S3 bucket name." + value = module.litellm.s3_bucket +} + +output "master_key_secret_arn" { + description = "Secrets Manager ARN holding LITELLM_MASTER_KEY." 
+ value = module.litellm.master_key_secret_arn +} + +output "db_master_password_secret_arn" { + description = "Secrets Manager ARN holding the Aurora master credentials (bootstrap-only)." + value = module.litellm.db_master_password_secret_arn +} + +output "db_bootstrap_sql" { + description = "Run once as the master DB user to create the IAM-authed app user." + value = module.litellm.db_bootstrap_sql +} + +output "migration_run_command" { + description = "Break-glass command to re-run the one-off prisma migration task." + value = module.litellm.migration_run_command +} diff --git a/terraform/litellm/aws/examples/default/providers.tf b/terraform/litellm/aws/examples/default/providers.tf new file mode 100644 index 00000000000..92723a92769 --- /dev/null +++ b/terraform/litellm/aws/examples/default/providers.tf @@ -0,0 +1,24 @@ +# The provider is configured HERE, in the root, not in the module. That is +# the whole point of the split: a module that declares its own configured +# `provider` block can't be called with count/for_each/depends_on and gives +# the caller no way to set assume-role, custom endpoints, or aliases. +# +# `default_tags` set here still flow into every resource the module creates +# (provider default_tags propagate through module calls) and merge with the +# module's own `litellm:stack` / `managed-by` / var.tags. Use this block for +# org-wide tags; use the module's `tags` input for per-deployment tags. +provider "aws" { + region = var.region + + # Reserve `default_tags` for pure org-wide tags the module shouldn't know + # about (cost center, team, compliance scope, …). They propagate through the + # module call and merge with the module's own `litellm:stack` / `managed-by` + # / var.tags. The module already stamps `managed-by = "terraform"`, so don't + # duplicate it here — set per-deployment tags via the module's `tags` input. 
+ # + # default_tags { + # tags = { + # "cost-center" = "platform" + # } + # } +} diff --git a/terraform/litellm/aws/terraform.tfvars.example b/terraform/litellm/aws/examples/default/terraform.tfvars.example similarity index 70% rename from terraform/litellm/aws/terraform.tfvars.example rename to terraform/litellm/aws/examples/default/terraform.tfvars.example index 2be573949ef..4fdfb47e678 100644 --- a/terraform/litellm/aws/terraform.tfvars.example +++ b/terraform/litellm/aws/examples/default/terraform.tfvars.example @@ -23,22 +23,17 @@ env = "stage" # allow_plaintext_alb = true # Storage retention: false (default) makes `terraform destroy` refuse on a -# non-empty bucket. Flip to true only for ephemeral / CI stacks. -# s3_force_destroy = false +# non-empty bucket / take an Aurora final snapshot. Flip to true only for +# ephemeral / CI stacks where you accept losing the data. +# s3_force_destroy = false +# skip_final_snapshot = false -# Component images. Defaults pin all four to the same GHCR release tag — -# bump them together when bumping LiteLLM. Override here to pull from a -# private registry or to mix-and-match versions. -# gateway_image = "ghcr.io/berriai/litellm-gateway:1.86.0-dev" -# backend_image = "ghcr.io/berriai/litellm-backend:1.86.0-dev" -# ui_image = "ghcr.io/berriai/litellm-ui:1.86.0-dev" -# migrations_image = "ghcr.io/berriai/litellm-migrations:1.86.0-dev" - -# Per-task sizing for the gateway. Defaults are 1 vCPU / 4 GiB / 1 worker. -# uvicorn rule of thumb for CPU-bound work is (2 * vCPU) + 1 workers. -# gateway_cpu = 1024 # 1024 = 1 vCPU -# gateway_memory = 4096 # MiB -# gateway_num_workers = 1 +# Component images and per-task sizing/autoscaling are NOT exposed as +# variables in this example (it keeps the curated surface small). They +# default to working public GHCR images. 
To pin images or tune +# CPU/memory/workers/autoscaling, set those inputs directly on the +# `module "litellm"` block in main.tf — the full list is in +# ../../variables.tf — or call the module from your own root config. # ---------- proxy_config (mirrors helm gateway.config.proxy_config) ---------- # proxy_config = { @@ -86,3 +81,13 @@ env = "stage" # OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf" # ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl" # } + +# ---------- OpenTelemetry v2 ---------- +# OTel is gated on otel_endpoint: empty (default) and nothing is added to +# the container env; set it and both gateway and backend gain +# LITELLM_OTEL_V2=true plus the OTEL_* block (with OTEL_SERVICE_NAME +# stamped per component). The knobs aren't surfaced as wrapper vars in +# this example; set them directly on the `module "litellm"` block in +# main.tf (otel_endpoint, otel_exporter, otel_environment_name, +# otel_capture_message_content, otel_headers_secret_arn). Full docs in +# ../../variables.tf. diff --git a/terraform/litellm/aws/examples/default/variables.tf b/terraform/litellm/aws/examples/default/variables.tf new file mode 100644 index 00000000000..74522118a93 --- /dev/null +++ b/terraform/litellm/aws/examples/default/variables.tf @@ -0,0 +1,104 @@ +# Curated surface for the one-command deploy path. The module (../../) +# exposes far more knobs (per-component CPU/memory, autoscaling, RDS/Redis +# sizing, …). To tune those, set them directly on the `module "litellm"` +# block in main.tf, or call the module from your own root config. Full +# per-variable docs live in ../../variables.tf — the module is the source +# of truth; descriptions here are intentionally terse. + +variable "region" { + description = "AWS region to deploy into." + type = string +} + +variable "tenant" { + description = "Tenant slug — prefix for every resource (-litellm-)." 
+ type = string +} + +variable "env" { + description = "Environment suffix (stage, prod, dev)." + type = string +} + +variable "azs" { + description = "Availability zones for subnets. At least 2 (RDS + ALB)." + type = list(string) +} + +# Sensitive — prefer TF_VAR_litellm_master_key / TF_VAR_litellm_license / +# TF_VAR_ui_password so values stay out of any committed tfvars file. +variable "litellm_master_key" { + description = "Pre-existing LITELLM_MASTER_KEY (sk-…). Empty → auto-generated." + type = string + default = "" + sensitive = true +} + +variable "litellm_license" { + description = "LiteLLM enterprise license. Empty → OSS-only." + type = string + default = "" + sensitive = true +} + +variable "ui_password" { + description = "UI admin password. Empty → falls back to LITELLM_MASTER_KEY." + type = string + default = "" + sensitive = true +} + +# TLS — provide an ACM cert for production, or opt into HTTP-only for dev. +variable "acm_certificate_arn" { + description = "ACM cert ARN for the ALB HTTPS listener. Empty → no TLS." + type = string + default = "" +} + +variable "allow_plaintext_alb" { + description = "Opt into HTTP-only ALB (trial/dev only)." + type = bool + default = false +} + +variable "s3_force_destroy" { + description = "Allow destroy of a non-empty S3 bucket (ephemeral/CI only)." + type = bool + default = false +} + +variable "skip_final_snapshot" { + description = "Skip the Aurora final snapshot on destroy (ephemeral/CI only)." + type = bool + default = false +} + +variable "proxy_config" { + description = "LiteLLM proxy config (contents of config.yaml). Empty → defaults." + type = any + default = {} +} + +variable "gateway_extra_env" { + description = "Plain-text env vars layered onto the gateway." + type = map(string) + default = {} +} + +variable "backend_extra_env" { + description = "Plain-text env vars layered onto the backend." 
+ type = map(string) + default = {} +} + +variable "gateway_extra_secrets" { + description = "Gateway env vars sourced from Secrets Manager (name → ARN)." + type = map(string) + default = {} +} + +variable "backend_extra_secrets" { + description = "Backend env vars sourced from Secrets Manager (name → ARN)." + type = map(string) + default = {} +} diff --git a/terraform/litellm/aws/examples/default/versions.tf b/terraform/litellm/aws/examples/default/versions.tf new file mode 100644 index 00000000000..73b88e91dce --- /dev/null +++ b/terraform/litellm/aws/examples/default/versions.tf @@ -0,0 +1,14 @@ +terraform { + required_version = ">= 1.6.0" + + required_providers { + aws = { + source = "hashicorp/aws" + version = "~> 5.60" + } + random = { + source = "hashicorp/random" + version = "~> 3.6" + } + } +} diff --git a/terraform/litellm/aws/iam.tf b/terraform/litellm/aws/iam.tf index 504e0fe1d63..64e1b1ad5f9 100644 --- a/terraform/litellm/aws/iam.tf +++ b/terraform/litellm/aws/iam.tf @@ -13,6 +13,8 @@ data "aws_iam_policy_document" "task_assume" { resource "aws_iam_role" "task_execution" { name = "${local.name}-task-execution" assume_role_policy = data.aws_iam_policy_document.task_assume.json + + tags = local.tags } resource "aws_iam_role_policy_attachment" "task_execution" { @@ -52,6 +54,7 @@ data "aws_iam_policy_document" "secrets_access" { aws_secretsmanager_secret.license[*].arn, aws_secretsmanager_secret.ui_password[*].arn, local.extra_secret_arns, + var.otel_headers_secret_arn == "" ? 
[] : [var.otel_headers_secret_arn], ) } } @@ -59,6 +62,8 @@ data "aws_iam_policy_document" "secrets_access" { resource "aws_iam_policy" "secrets_access" { name = "${local.name}-secrets-access" policy = data.aws_iam_policy_document.secrets_access.json + + tags = local.tags } resource "aws_iam_role_policy_attachment" "task_execution_secrets" { @@ -75,6 +80,8 @@ resource "aws_iam_role_policy_attachment" "task_execution_secrets" { resource "aws_iam_role" "task" { name = "${local.name}-task" assume_role_policy = data.aws_iam_policy_document.task_assume.json + + tags = local.tags } data "aws_caller_identity" "current" {} @@ -91,6 +98,8 @@ data "aws_iam_policy_document" "rds_iam_connect" { resource "aws_iam_policy" "rds_iam_connect" { name = "${local.name}-rds-iam-connect" policy = data.aws_iam_policy_document.rds_iam_connect.json + + tags = local.tags } resource "aws_iam_role_policy_attachment" "task_rds_iam_connect" { @@ -111,4 +120,6 @@ resource "aws_iam_role_policy_attachment" "task_rds_iam_connect" { resource "aws_iam_role" "ui_task" { name = "${local.name}-ui-task" assume_role_policy = data.aws_iam_policy_document.task_assume.json + + tags = local.tags } diff --git a/terraform/litellm/aws/locals.tf b/terraform/litellm/aws/locals.tf index 85c3b6eaaad..b5e28272d04 100644 --- a/terraform/litellm/aws/locals.tf +++ b/terraform/litellm/aws/locals.tf @@ -11,6 +11,20 @@ locals { # the stack can reference local.name. name = "${var.tenant}-litellm-${var.env}" + # This is a reusable module — it declares no `provider` block, so the AWS + # provider's `default_tags` is the caller's concern, not ours. To keep the + # same per-resource tagging the stack had when it owned the provider, the + # module threads `local.tags` onto every taggable resource itself. Callers + # may layer org-wide tags on top via their own provider `default_tags` + # (those merge with these). `var.tags` is the per-deployment override. 
+ tags = merge( + { + "litellm:stack" = local.name + "managed-by" = "terraform" + }, + var.tags, + ) + gateway_path_prefixes = [ "/v1/chat/*", "/chat/*", "/v1/completions*", "/completions*", diff --git a/terraform/litellm/aws/migrations.tf b/terraform/litellm/aws/migrations.tf index fc4e2ce0cab..62880ebf165 100644 --- a/terraform/litellm/aws/migrations.tf +++ b/terraform/litellm/aws/migrations.tf @@ -42,4 +42,6 @@ resource "aws_ecs_task_definition" "migrations" { } } }]) + + tags = local.tags } diff --git a/terraform/litellm/aws/network.tf b/terraform/litellm/aws/network.tf index d5ed49c1b8a..2f104da6a6b 100644 --- a/terraform/litellm/aws/network.tf +++ b/terraform/litellm/aws/network.tf @@ -7,12 +7,12 @@ resource "aws_vpc" "this" { enable_dns_hostnames = true enable_dns_support = true - tags = { Name = local.name } + tags = merge(local.tags, { Name = local.name }) } resource "aws_internet_gateway" "this" { vpc_id = aws_vpc.this.id - tags = { Name = local.name } + tags = merge(local.tags, { Name = local.name }) } # Public subnets (ALB + NAT). One per AZ. @@ -23,7 +23,7 @@ resource "aws_subnet" "public" { availability_zone = var.azs[count.index] map_public_ip_on_launch = true - tags = { Name = "${local.name}-public-${var.azs[count.index]}" } + tags = merge(local.tags, { Name = "${local.name}-public-${var.azs[count.index]}" }) } # Private subnets (ECS tasks, RDS, ElastiCache). 
One per AZ, separate from @@ -34,12 +34,12 @@ resource "aws_subnet" "private" { cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10) availability_zone = var.azs[count.index] - tags = { Name = "${local.name}-private-${var.azs[count.index]}" } + tags = merge(local.tags, { Name = "${local.name}-private-${var.azs[count.index]}" }) } resource "aws_eip" "nat" { domain = "vpc" - tags = { Name = "${local.name}-nat" } + tags = merge(local.tags, { Name = "${local.name}-nat" }) depends_on = [aws_internet_gateway.this] } @@ -50,7 +50,7 @@ resource "aws_nat_gateway" "this" { allocation_id = aws_eip.nat.id subnet_id = aws_subnet.public[0].id - tags = { Name = local.name } + tags = merge(local.tags, { Name = local.name }) depends_on = [aws_internet_gateway.this] } @@ -63,7 +63,7 @@ resource "aws_route_table" "public" { gateway_id = aws_internet_gateway.this.id } - tags = { Name = "${local.name}-public" } + tags = merge(local.tags, { Name = "${local.name}-public" }) } resource "aws_route_table_association" "public" { @@ -80,7 +80,7 @@ resource "aws_route_table" "private" { nat_gateway_id = aws_nat_gateway.this.id } - tags = { Name = "${local.name}-private" } + tags = merge(local.tags, { Name = "${local.name}-private" }) } resource "aws_route_table_association" "private" { @@ -119,6 +119,8 @@ resource "aws_security_group" "alb" { protocol = "-1" cidr_blocks = ["0.0.0.0/0"] } + + tags = local.tags } resource "aws_security_group" "tasks" { @@ -141,6 +143,8 @@ resource "aws_security_group" "tasks" { protocol = "-1" cidr_blocks = ["0.0.0.0/0"] } + + tags = local.tags } resource "aws_security_group" "rds" { @@ -155,6 +159,8 @@ resource "aws_security_group" "rds" { protocol = "tcp" security_groups = [aws_security_group.tasks.id] } + + tags = local.tags } resource "aws_security_group" "redis" { @@ -169,4 +175,6 @@ resource "aws_security_group" "redis" { protocol = "tcp" security_groups = [aws_security_group.tasks.id] } + + tags = local.tags } diff --git 
a/terraform/litellm/aws/providers.tf b/terraform/litellm/aws/providers.tf deleted file mode 100644 index 5e7d506c23f..00000000000 --- a/terraform/litellm/aws/providers.tf +++ /dev/null @@ -1,13 +0,0 @@ -provider "aws" { - region = var.region - - default_tags { - tags = merge( - { - "litellm:stack" = local.name - "managed-by" = "terraform" - }, - var.tags, - ) - } -} diff --git a/terraform/litellm/aws/rds.tf b/terraform/litellm/aws/rds.tf index 8e3b70a8d62..d9b7351a805 100644 --- a/terraform/litellm/aws/rds.tf +++ b/terraform/litellm/aws/rds.tf @@ -19,12 +19,16 @@ resource "aws_db_subnet_group" "this" { name = "${local.name}-db" subnet_ids = aws_subnet.private[*].id + + tags = local.tags } resource "aws_rds_cluster_parameter_group" "this" { name = "${local.name}-cluster-pg" family = "aurora-postgresql${split(".", var.db_engine_version)[0]}" description = "LiteLLM Aurora Postgres cluster parameters." + + tags = local.tags } resource "aws_rds_cluster" "this" { @@ -52,6 +56,8 @@ resource "aws_rds_cluster" "this" { backup_retention_period = 7 preferred_backup_window = "07:00-09:00" + + tags = local.tags } resource "aws_rds_cluster_instance" "writer" { @@ -67,6 +73,8 @@ resource "aws_rds_cluster_instance" "writer" { # Promotion tier 0 — first in line during failover, so this instance stays # the writer unless it goes unhealthy. promotion_tier = 0 + + tags = local.tags } resource "aws_rds_cluster_instance" "reader" { @@ -82,4 +90,6 @@ resource "aws_rds_cluster_instance" "reader" { # Higher promotion tier — won't be picked as writer during a failover # unless the writer instance itself is gone. 
promotion_tier = 15 + + tags = local.tags } diff --git a/terraform/litellm/aws/redis.tf b/terraform/litellm/aws/redis.tf index 2a6fab2d89f..071cbc6d46f 100644 --- a/terraform/litellm/aws/redis.tf +++ b/terraform/litellm/aws/redis.tf @@ -1,6 +1,8 @@ resource "aws_elasticache_subnet_group" "this" { name = "${local.name}-redis" subnet_ids = aws_subnet.private[*].id + + tags = local.tags } # Replication group (not aws_elasticache_cluster, which is the @@ -30,4 +32,6 @@ resource "aws_elasticache_replication_group" "this" { transit_encryption_enabled = true apply_immediately = true + + tags = local.tags } diff --git a/terraform/litellm/aws/s3.tf b/terraform/litellm/aws/s3.tf index 375bc73bb71..a666a790c0c 100644 --- a/terraform/litellm/aws/s3.tf +++ b/terraform/litellm/aws/s3.tf @@ -18,6 +18,8 @@ resource "aws_s3_bucket" "this" { # cached responses, archived request logs, and /v1/files storage stay put. # Flip to true only for ephemeral / CI stacks (`var.s3_force_destroy`). force_destroy = var.s3_force_destroy + + tags = local.tags } resource "aws_s3_bucket_versioning" "this" { @@ -72,9 +74,30 @@ data "aws_iam_policy_document" "s3_access" { resource "aws_iam_policy" "s3_access" { name = "${local.name}-s3-access" policy = data.aws_iam_policy_document.s3_access.json + + tags = local.tags } resource "aws_iam_role_policy_attachment" "task_s3_access" { role = aws_iam_role.task.name policy_arn = aws_iam_policy.s3_access.arn } + +# proxy_config is uploaded as an S3 object so the gateway and backend +# containers can fetch it at startup instead of carrying the YAML inline +# as a base64 env var. ECS Fargate has no native S3 volume type, so +# "mount" here is: container entrypoint runs a boto3 download_file into +# /tmp/litellm-config.yaml before exec'ing uvicorn. The task role already +# has s3:GetObject on this bucket via aws_iam_policy.s3_access. 
+# +# etag flows into the task definition (see locals.proxy_config_env in +# ecs.tf) so a config edit produces a new task-def revision and ECS rolls +# both services automatically. +resource "aws_s3_object" "proxy_config" { + count = length(keys(var.proxy_config)) > 0 ? 1 : 0 + + bucket = aws_s3_bucket.this.id + key = "config/litellm-config.yaml" + content = yamlencode(var.proxy_config) + content_type = "application/yaml" +} diff --git a/terraform/litellm/aws/secrets.tf b/terraform/litellm/aws/secrets.tf index dd13fdc1239..300d38e4053 100644 --- a/terraform/litellm/aws/secrets.tf +++ b/terraform/litellm/aws/secrets.tf @@ -22,6 +22,8 @@ resource "aws_secretsmanager_secret" "master_key" { name = "${local.name}-master-key" description = "LITELLM_MASTER_KEY for gateway + backend." recovery_window_in_days = 0 + + tags = local.tags } resource "aws_secretsmanager_secret_version" "master_key" { @@ -40,6 +42,8 @@ resource "aws_secretsmanager_secret" "license" { name = "${local.name}-license" description = "LITELLM_LICENSE for gateway + backend." recovery_window_in_days = 0 + + tags = local.tags } resource "aws_secretsmanager_secret_version" "license" { @@ -59,6 +63,8 @@ resource "aws_secretsmanager_secret" "ui_password" { name = "${local.name}-ui-password" description = "UI_PASSWORD for the backend (UI admin login)." recovery_window_in_days = 0 + + tags = local.tags } resource "aws_secretsmanager_secret_version" "ui_password" { @@ -72,6 +78,8 @@ resource "aws_secretsmanager_secret" "db_master_password" { name = "${local.name}-db-master-password" description = "Aurora master-user password - bootstrap only. Runtime auth is IAM-token." 
recovery_window_in_days = 0 + + tags = local.tags } resource "aws_secretsmanager_secret_version" "db_master_password" { diff --git a/terraform/litellm/aws/variables.tf b/terraform/litellm/aws/variables.tf index 946cd7ebbf3..8db4935664b 100644 --- a/terraform/litellm/aws/variables.tf +++ b/terraform/litellm/aws/variables.tf @@ -24,7 +24,7 @@ variable "env" { } variable "tags" { - description = "Additional tags merged into the provider default_tags." + description = "Per-deployment tags applied to every taggable resource the module creates, on top of the module's own `litellm:stack` / `managed-by` tags. Caller-level provider `default_tags` (if any) merge with these." type = map(string) default = {} } @@ -420,10 +420,12 @@ variable "backend_extra_secrets" { variable "proxy_config" { description = <<-EOT LiteLLM proxy config (the contents of config.yaml). Mirrors the helm - chart's `gateway.config.proxy_config` value. Passed to gateway, backend, - and the migration task as a base64-encoded env var and decoded to - /tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set - automatically. + chart's `gateway.config.proxy_config` value. Uploaded to S3 under + `config/litellm-config.yaml` in the stack's bucket; gateway and backend + container entrypoints download it to /tmp/litellm-config.yaml at task + start (CONFIG_FILE_PATH is set automatically). The S3 object's etag is + wired into the task definition, so editing this value produces a new + task-def revision and a rolling redeploy. Example: proxy_config = { @@ -456,3 +458,78 @@ variable "log_retention_days" { type = number default = 30 } + +# ---------- OpenTelemetry v2 ---------- +# +# https://docs.litellm.ai/docs/observability/opentelemetry_v2 +# +# OTel v2 is opt-in and gated entirely on otel_endpoint, matching the GCP +# stack. Leave otel_endpoint = "" and nothing OTel-related lands in the +# container env. 
Set it and the gateway and backend gain LITELLM_OTEL_V2=true +# plus the OTEL_* block (per-component OTEL_SERVICE_NAME, exporter, endpoint, +# environment name, capture-content), with OTEL_HEADERS sourced from +# otel_headers_secret_arn when provided. + +variable "otel_endpoint" { + description = <<-EOT + OTLP collector endpoint (sets OTEL_ENDPOINT). Empty disables OTel + entirely (no LITELLM_OTEL_V2, no OTEL_* env). Point at any + OTLP-compatible backend (self-hosted collector, Grafana Tempo, + Honeycomb, Datadog). Example: "http://otel-collector.internal:4318" + for OTLP/HTTP. + EOT + type = string + default = "" +} + +variable "otel_exporter" { + description = <<-EOT + OTLP exporter protocol. One of "otlp_http", "otlp_grpc", or "console" + (stdout, useful for verifying instrumentation against CloudWatch logs). + Ignored when otel_endpoint is empty. + EOT + type = string + default = "otlp_http" + + validation { + condition = contains(["otlp_http", "otlp_grpc", "console"], var.otel_exporter) + error_message = "otel_exporter must be one of: otlp_http, otlp_grpc, console." + } +} + +variable "otel_environment_name" { + description = <<-EOT + Value for OTEL_ENVIRONMENT_NAME (becomes `deployment.environment` on + every span). Defaults to var.env when empty so spans land tagged with + the deployment env without extra wiring. + EOT + type = string + default = "" +} + +variable "otel_capture_message_content" { + description = <<-EOT + Value for OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT. Default + `no_content` matches the litellm default; flip to `prompt_and_completion` + only when you've audited what's about to land in your observability + backend, because raw prompts/completions are typically sensitive. + EOT + type = string + default = "no_content" + + validation { + condition = contains(["no_content", "prompt_and_completion"], var.otel_capture_message_content) + error_message = "otel_capture_message_content must be one of: no_content, prompt_and_completion." 
+ } +} + +variable "otel_headers_secret_arn" { + description = <<-EOT + Secrets Manager ARN whose plaintext value becomes OTEL_HEADERS + (comma-separated `key=value` pairs, typically used to pass an API key + header to a managed collector). The execution role auto-gains + secretsmanager:GetSecretValue on this ARN. Empty omits OTEL_HEADERS. + EOT + type = string + default = "" +} diff --git a/terraform/litellm/gcp/README.md b/terraform/litellm/gcp/README.md index 504cfa066e4..1e0bf4319df 100644 --- a/terraform/litellm/gcp/README.md +++ b/terraform/litellm/gcp/README.md @@ -1,5 +1,9 @@ # LiteLLM on GCP (Cloud Run) +[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://ssh.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2FBerriAI%2Flitellm&cloudshell_workspace=terraform%2Flitellm%2Fgcp%2Fexamples%2Fdefault&cloudshell_tutorial=TUTORIAL.md&cloudshell_image=gcr.io/ds-artifacts-cloudshell/deploystack_custom_image&shellonly=true) + +The button above opens the [DeployStack](https://github.com/GoogleCloudPlatform/deploystack) installer in Cloud Shell, walks you through `TUTORIAL.md`, and runs `terraform apply` once you've answered the prompts. The rest of this README is the manual / advanced path. + Deploys the componentized LiteLLM proxy on GCP: - **VPC** + Private Services Access range + a Serverless VPC Access connector @@ -25,10 +29,14 @@ and `litellm-migrations` (slim image used only by the one-off Cloud Run Job — runs `prisma migrate deploy` against the writer DB and exits). Bump them together when bumping LiteLLM. -Cloud Run only accepts images from Artifact Registry, `[region.]gcr.io`, -or `docker.io` — `ghcr.io` URIs are rejected at apply time. The four -images are published to GHCR upstream, so any real deploy needs an -Artifact Registry remote repository pointed at GHCR. 
+**Required override.** The `image_registry` default (`ghcr.io/berriai`) +does **not** work as-is — Cloud Run only accepts images from Artifact +Registry, `[region.]gcr.io`, or `docker.io`, and rejects `ghcr.io` URIs +at apply time. Every deploy (including HCP Terraform 1-click) must +supply either `image_registry` pointed at an Artifact Registry remote +repo backed by GHCR, or full per-component `*_image` URIs against +images you've already mirrored. The default is present only so +`terraform plan` succeeds during local iteration. **One-time setup (per project):** create a remote repo and let Cloud Run pull through it. @@ -102,9 +110,13 @@ Unix socket. ### `proxy_config` Mirrors the helm chart's `gateway.config.proxy_config`. The map is -YAML-encoded and base64-passed to gateway, backend, and the migration job; -each container decodes it to `/tmp/litellm-config.yaml` at startup and sets -`CONFIG_FILE_PATH`. +YAML-encoded and uploaded to a dedicated GCS bucket as `config.yaml`, then +mounted read-only into the gateway and backend at `/etc/litellm` via Cloud +Run v2's gcsfuse volume. `CONFIG_FILE_PATH` points at the mount path. A +hash of the YAML rides along as an env var so an edit to `proxy_config` +forces a new Cloud Run revision; without it the new file would sit in the +bucket unread until the next unrelated revision rollover. The migrations +job doesn't get the config (it only runs `prisma migrate deploy`). ```hcl proxy_config = { @@ -160,6 +172,38 @@ reject the version suffix; version is always resolved as `latest`. If you need a pinned version, edit `local.gateway_extra_secret_kv` in `cloudrun.tf` directly to set `version = "3"` for the entry in question. +### OpenTelemetry v2 + +OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is +opt-in and gated entirely on `otel_endpoint`. Empty (default) and nothing +OTel-related lands in the container env. 
Set it and both gateway and
+backend gain `LITELLM_OTEL_V2=true` plus the `OTEL_*` block, with
+`OTEL_SERVICE_NAME` stamped per component (`${tenant}-litellm-${env}-gateway`
+and `-backend`) so spans land tagged with the right hop. Any `OTEL_*` key
+set in `gateway_extra_env` / `backend_extra_env` overrides the default for
+that service: the module drops its own default for that key, because Cloud
+Run rejects duplicate env names.
+
+```hcl
+otel_endpoint = "https://otel.example.com:4318"
+otel_exporter = "otlp_http" # or otlp_grpc
+otel_environment_name = "prod" # default: var.env
+otel_headers_secret = "projects/my-gcp-project/secrets/otel-headers"
+```
+
+`OTEL_HEADERS` is wired as a Secret Manager `secret_key_ref` since it
+typically carries the collector's auth token; create the secret with the
+literal header string, e.g. `Authorization=Bearer `.
+
+`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` defaults to
+`no_content`; flip `otel_capture_message_content = "prompt_and_completion"`
+only after auditing what lands in the observability backend, since prompts
+and completions are typically sensitive.
+
+Behavior matches the AWS stack 1:1; the only naming difference is
+`otel_headers_secret` (a Secret Manager resource ID) vs AWS's
+`otel_headers_secret_arn` (a Secrets Manager ARN).
+
 ## Tenant deployment
 
 Every resource the stack creates is named `${tenant}-litellm-${env}` (or
@@ -173,20 +217,25 @@ pair differs:
 | `acme` | `prod` | `acme-litellm-prod-master-key` |
 | `globex` | `dev` | `globex-litellm-dev-license` |
 
-For a per-tenant instance, the only inputs that change are the tenant
-slug, env, and the two pre-issued secrets:
+For a per-tenant instance via the example root, the only inputs that
+change are the tenant slug, env, and the two pre-issued secrets:
 
 ```bash
+cd terraform/litellm/gcp/examples/default
 export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
 export TF_VAR_litellm_license="lic-..."
# their LITELLM_LICENSE
 
 terraform apply \
-  -var "project=my-gcp-project" \
+  -var "project_id=my-gcp-project" \
   -var "region=us-central1" \
   -var "tenant=acme" \
   -var "env=stage"
 ```
 
+To run *many* tenants from a single config, call the module with
+`for_each` instead of one root per tenant. This is only possible because
+the module declares no provider block (see "Using as a module").
+
 Both `litellm_master_key` and `litellm_license` are optional:
 - Omit `litellm_master_key` → the stack auto-generates a random `sk-…` value
   (trial/dev path).
@@ -200,14 +249,22 @@ example files.
 ## Quick start
 
 ```bash
-cd terraform/litellm/gcp
+cd terraform/litellm/gcp/examples/default
 cp terraform.tfvars.example terraform.tfvars
-# Edit: project, region, tenant, env, *_image, proxy_config, gateway_extra_secrets.
+# Edit: project_id, region, tenant, env, image_registry, proxy_config, gateway_extra_secrets.
 
 terraform init
 terraform apply
 ```
 
+`examples/default/` is a thin root that configures the `google` /
+`google-beta` providers and calls the module (`../../`). It exposes a
+curated variable surface; for advanced knobs (per-component
+CPU/memory/instances, Cloud SQL tier/edition, Memorystore tier,
+per-component image pins) set them on the `module "litellm"` block in
+`examples/default/main.tf`, or call the module from your own config; see
+"Using as a module" below.
+
 That single apply provisions everything, runs the prisma schema migration
 via the Cloud Run job (auto-triggered by `bootstrap.tf`), and only then starts
 the gateway/backend services. When it returns, the stack is serving traffic.
@@ -251,6 +308,56 @@ Set `allow_plaintext_lb = true` and leave `lb_domains = []`. Without the
 flag, plan fails with a clear error pointing at the precondition. Intended for
 short-lived trial / dev stacks only.
 
+## Using as a module
+
+The directory itself is a module with **no `provider` block**: the caller
+owns provider config.
You can call it directly with `for_each` (many
+tenants from one config), `count`, `depends_on`, or providers configured
+to impersonate a service account / target a different project:
+
+```hcl
+provider "google" {
+  project = "my-gcp-project"
+  region = "us-central1"
+}
+provider "google-beta" {
+  project = "my-gcp-project"
+  region = "us-central1"
+}
+
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/gcp?ref="
+
+  project_id = "my-gcp-project"
+  region = "us-central1"
+  tenant = "acme"
+  env = "prod"
+  # ...any of the inputs in variables.tf...
+}
+```
+
+Both the default `google` and `google-beta` configs are inherited by the
+module automatically through the call; declare both in the caller.
+
+Labels: the module stamps its own `litellm-stack` and `managed-by` labels
+onto every label-supporting resource (Cloud Run services and the
+migrations job, Cloud SQL writer and reader, Memorystore, Secret Manager
+entries, GCS buckets, the LB global address and forwarding rules) and
+merges `var.labels` on top. Use the `labels` input for per-deployment
+labels; it mirrors the AWS stack's `tags` input.
+
+**`for_each` shares one provider config.** The module's `versions.tf` declares
+`google` / `google-beta` *without* `configuration_aliases`, so it only ever
+receives the caller's single default (unaliased) `google` / `google-beta`
+providers. That's deliberate (it keeps the one-command path simple), but it
+means a `for_each` over the module runs every instance against the **same
+project, region, and credentials**. Use `for_each` for many tenants in one
+project (distinct `tenant`/`env`); it cannot fan out across projects or regions
+on its own. To deploy into separate projects/regions, give each its own root
+with its own provider config (one `examples/default`-style root per project),
+or fork the module to add `configuration_aliases` and pass per-instance
+`providers = { ... }`.
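As a rough sketch of the single-project `for_each` case described above (this is an illustration, not part of the change itself): the `tenants` map, its keys, and the relative `source` path are assumptions, and `project_id` is assumed to be the module's project input, as the example root's `-var "project_id=..."` suggests. Required inputs such as `image_registry` are left out for brevity.

```hcl
# Hypothetical tenant map; names and envs are placeholders.
locals {
  tenants = {
    acme   = { env = "prod" }
    globex = { env = "dev" }
  }
}

# One module instance per tenant. Every instance receives the caller's single
# default google / google-beta provider config, so all of them land in the
# same project and region.
module "litellm" {
  source   = "../../" # assumed relative path to terraform/litellm/gcp
  for_each = local.tenants

  project_id = "my-gcp-project"
  region     = "us-central1"
  tenant     = each.key
  env        = each.value.env
  # ...image_registry and any other inputs from variables.tf...
}
```

Each instance is then addressed by its key, for example `module.litellm["acme"]`, which keeps per-tenant plans and outputs distinct while sharing one provider configuration.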
+ ## Storage and database retention Two opt-in tripwires guard against accidental data loss on @@ -281,8 +388,8 @@ or point them at your own CA. | File | What's in it | | ----------------- | -------------------------------------------------------------------- | -| `versions.tf` | Terraform + provider version constraints | -| `providers.tf` | Google + Google-Beta providers | +| `versions.tf` | Terraform + `required_providers` constraints (module declares no provider config) | +| `examples/default/` | Thin root: `google` / `google-beta` providers + a call to the module. The one-command deploy path. | | `variables.tf` | All input variables | | `locals.tf` | Path-prefix lists (mirror of `helm/.../ingress.yaml`) + proxy_config helpers | | `network.tf` | VPC, subnet, PSA range, Serverless VPC connector | diff --git a/terraform/litellm/gcp/bootstrap.tf b/terraform/litellm/gcp/bootstrap.tf index 47ad885ff12..b929c4d76f3 100644 --- a/terraform/litellm/gcp/bootstrap.tf +++ b/terraform/litellm/gcp/bootstrap.tf @@ -25,7 +25,7 @@ resource "terraform_data" "migration" { environment = { JOB = google_cloud_run_v2_job.migrations.name REGION = var.region - PROJECT = var.project + PROJECT = var.project_id } command = <<-EOT set -euo pipefail diff --git a/terraform/litellm/gcp/cloudrun.tf b/terraform/litellm/gcp/cloudrun.tf index 28e1145b081..3e3f5f6924b 100644 --- a/terraform/litellm/gcp/cloudrun.tf +++ b/terraform/litellm/gcp/cloudrun.tf @@ -26,6 +26,39 @@ locals { { name = "GCS_BUCKET_NAME", value = google_storage_bucket.this.name }, ] + # OTel v2 is opt-in and gated on otel_endpoint, matching the AWS stack — + # nothing OTel-related is added to the container env until an endpoint is + # set. LITELLM_OTEL_V2 flips on alongside the OTEL_* block so the proxy + # never boots the instrumentation with no exporter wired in. + otel_enabled = var.otel_endpoint != "" + otel_environment_name = var.otel_environment_name != "" ? 
var.otel_environment_name : var.env + otel_shared_endpoint_kv = local.otel_enabled ? [ + { name = "LITELLM_OTEL_V2", value = "true" }, + { name = "OTEL_EXPORTER", value = var.otel_exporter }, + { name = "OTEL_ENDPOINT", value = var.otel_endpoint }, + { name = "OTEL_ENVIRONMENT_NAME", value = local.otel_environment_name }, + { name = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", value = var.otel_capture_message_content }, + ] : [] + # OTel defaults are filtered out when the same key appears in + # *_extra_env, so a caller-supplied OTEL_SERVICE_NAME (or any other + # OTEL_*) takes precedence without colliding at Cloud Run apply time + # (Cloud Run rejects duplicate env var names). + gateway_otel_env_kv_raw = concat(local.otel_shared_endpoint_kv, local.otel_enabled ? [ + { name = "OTEL_SERVICE_NAME", value = "${local.name}-gateway" }, + ] : []) + backend_otel_env_kv_raw = concat(local.otel_shared_endpoint_kv, local.otel_enabled ? [ + { name = "OTEL_SERVICE_NAME", value = "${local.name}-backend" }, + ] : []) + gateway_otel_env_kv = [ + for e in local.gateway_otel_env_kv_raw : e if !contains(keys(var.gateway_extra_env), e.name) + ] + backend_otel_env_kv = [ + for e in local.backend_otel_env_kv_raw : e if !contains(keys(var.backend_extra_env), e.name) + ] + otel_env_secrets = local.otel_enabled && var.otel_headers_secret != "" ? [ + { name = "OTEL_HEADERS", secret = var.otel_headers_secret, version = "latest" }, + ] : [] + # Cloud Run v2 secret env vars use value_source.secret_key_ref pointing at a # secret resource ID. Shared between gateway and backend (the migrations # job has its own narrower env list — see migrations_env_secrets below). @@ -63,13 +96,6 @@ locals { for k, v in var.backend_extra_secrets : { name = k, secret = v, version = "latest" } ] - # Shell fragments composed with && so any failure short-circuits the - # whole startup instead of falling through to `exec uvicorn`. The - # python step is only included when the caller provided a proxy_config. 
- proxy_config_fragment = local.proxy_config_enabled ? [ - "python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\"" - ] : [] - # Decode the Memorystore CA cert (passed as REDIS_CA_PEM_B64) to the # path REDIS_SSL_CA_CERTS points at, so the redis-py client can validate # the rediss:// handshake. @@ -83,14 +109,12 @@ locals { ] gateway_args = join(" && ", concat( - local.proxy_config_fragment, local.redis_ca_fragment, local.database_url_fragment, - ["exec uvicorn gateway.main:app --host 0.0.0.0 --port 4000"], + ["exec uvicorn gateway.main:app --host 0.0.0.0 --port 4000 --workers ${var.gateway_num_workers}"], )) backend_args = join(" && ", concat( - local.proxy_config_fragment, local.redis_ca_fragment, local.database_url_fragment, ["exec uvicorn backend.main:app --host 0.0.0.0 --port 4001"], @@ -117,6 +141,7 @@ resource "google_cloud_run_v2_service" "gateway" { name = "${local.name}-gateway" location = var.region ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER" + labels = local.labels template { service_account = google_service_account.runtime.email @@ -149,7 +174,7 @@ resource "google_cloud_run_v2_service" "gateway" { } dynamic "env" { - for_each = concat(local.shared_env_kv, local.gateway_extra_env_kv, local.proxy_config_env) + for_each = concat(local.shared_env_kv, local.gateway_otel_env_kv, local.gateway_extra_env_kv, local.proxy_config_env) content { name = env.value.name value = env.value.value @@ -157,7 +182,7 @@ resource "google_cloud_run_v2_service" "gateway" { } dynamic "env" { - for_each = concat(local.shared_env_secrets, local.gateway_extra_secret_kv) + for_each = concat(local.shared_env_secrets, local.otel_env_secrets, local.gateway_extra_secret_kv) content { name = env.value.name value_source { @@ -169,6 +194,14 @@ resource "google_cloud_run_v2_service" "gateway" { } } + dynamic "volume_mounts" { + for_each = local.proxy_config_enabled ? 
[1] : [] + content { + name = local.proxy_config_volume + mount_path = local.proxy_config_mount_path + } + } + startup_probe { http_get { path = "/health/readiness" @@ -189,6 +222,17 @@ resource "google_cloud_run_v2_service" "gateway" { timeout_seconds = 5 } } + + dynamic "volumes" { + for_each = local.proxy_config_enabled ? [1] : [] + content { + name = local.proxy_config_volume + gcs { + bucket = google_storage_bucket.proxy_config[0].name + read_only = true + } + } + } } depends_on = [ @@ -196,6 +240,8 @@ resource "google_cloud_run_v2_service" "gateway" { google_secret_manager_secret_iam_member.db_password, google_secret_manager_secret_iam_member.license, google_secret_manager_secret_iam_member.extras, + google_secret_manager_secret_iam_member.otel_headers, + google_storage_bucket_iam_member.proxy_config_runtime, google_sql_user.app, # Don't go live until the schema is migrated; otherwise the proxy boots, # fails on missing tables, and Cloud Run keeps cold-restarting. @@ -208,6 +254,7 @@ resource "google_cloud_run_v2_service" "backend" { name = "${local.name}-backend" location = var.region ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER" + labels = local.labels template { service_account = google_service_account.runtime.email @@ -240,7 +287,7 @@ resource "google_cloud_run_v2_service" "backend" { } dynamic "env" { - for_each = concat(local.shared_env_kv, local.backend_default_env_kv, local.backend_extra_env_kv, local.proxy_config_env) + for_each = concat(local.shared_env_kv, local.backend_default_env_kv, local.backend_otel_env_kv, local.backend_extra_env_kv, local.proxy_config_env) content { name = env.value.name value = env.value.value @@ -248,7 +295,7 @@ resource "google_cloud_run_v2_service" "backend" { } dynamic "env" { - for_each = concat(local.shared_env_secrets, local.backend_managed_env_secrets, local.backend_extra_secret_kv) + for_each = concat(local.shared_env_secrets, local.backend_managed_env_secrets, local.otel_env_secrets, 
local.backend_extra_secret_kv) content { name = env.value.name value_source { @@ -260,6 +307,14 @@ resource "google_cloud_run_v2_service" "backend" { } } + dynamic "volume_mounts" { + for_each = local.proxy_config_enabled ? [1] : [] + content { + name = local.proxy_config_volume + mount_path = local.proxy_config_mount_path + } + } + startup_probe { http_get { path = "/health/readiness" @@ -280,6 +335,17 @@ resource "google_cloud_run_v2_service" "backend" { timeout_seconds = 5 } } + + dynamic "volumes" { + for_each = local.proxy_config_enabled ? [1] : [] + content { + name = local.proxy_config_volume + gcs { + bucket = google_storage_bucket.proxy_config[0].name + read_only = true + } + } + } } depends_on = [ @@ -288,6 +354,8 @@ resource "google_cloud_run_v2_service" "backend" { google_secret_manager_secret_iam_member.license, google_secret_manager_secret_iam_member.ui_password, google_secret_manager_secret_iam_member.extras, + google_secret_manager_secret_iam_member.otel_headers, + google_storage_bucket_iam_member.proxy_config_runtime, google_sql_user.app, terraform_data.migration, ] @@ -301,6 +369,7 @@ resource "google_cloud_run_v2_service" "ui" { name = "${local.name}-ui" location = var.region ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER" + labels = local.labels template { service_account = google_service_account.ui_runtime.email @@ -344,7 +413,7 @@ resource "google_cloud_run_v2_service" "ui" { # (LITELLM_MASTER_KEY); these IAM bindings just open up Cloud Run's invoker # gate so the LB request makes it to the container. 
resource "google_cloud_run_v2_service_iam_member" "gateway_allusers" { - project = var.project + project = var.project_id location = google_cloud_run_v2_service.gateway.location name = google_cloud_run_v2_service.gateway.name role = "roles/run.invoker" @@ -352,7 +421,7 @@ resource "google_cloud_run_v2_service_iam_member" "gateway_allusers" { } resource "google_cloud_run_v2_service_iam_member" "backend_allusers" { - project = var.project + project = var.project_id location = google_cloud_run_v2_service.backend.location name = google_cloud_run_v2_service.backend.name role = "roles/run.invoker" @@ -360,7 +429,7 @@ resource "google_cloud_run_v2_service_iam_member" "backend_allusers" { } resource "google_cloud_run_v2_service_iam_member" "ui_allusers" { - project = var.project + project = var.project_id location = google_cloud_run_v2_service.ui.location name = google_cloud_run_v2_service.ui.name role = "roles/run.invoker" @@ -374,6 +443,7 @@ resource "google_cloud_run_v2_service_iam_member" "ui_allusers" { resource "google_cloud_run_v2_job" "migrations" { name = "${local.name}-migrations" location = var.region + labels = local.labels template { template { diff --git a/terraform/litellm/gcp/cloudsql.tf b/terraform/litellm/gcp/cloudsql.tf index 70939c049c3..0af15eec22f 100644 --- a/terraform/litellm/gcp/cloudsql.tf +++ b/terraform/litellm/gcp/cloudsql.tf @@ -26,6 +26,8 @@ resource "google_sql_database_instance" "writer" { disk_size = 20 disk_autoresize = true + user_labels = local.labels + backup_configuration { enabled = true point_in_time_recovery_enabled = true @@ -45,6 +47,15 @@ resource "google_sql_database_instance" "writer" { } deletion_protection = var.cloudsql_deletion_protection + + lifecycle { + # disk_autoresize grows storage but never shrinks it. Without this, + # the first plan after any auto-grow reads disk_size as a shrink, which + # is an immutable change and forces a destroy/recreate of the instance + # (full data loss). 
Set the initial size only; let Cloud SQL own it + # thereafter. + ignore_changes = [settings[0].disk_size] + } } resource "google_sql_database_instance" "reader" { @@ -61,6 +72,8 @@ resource "google_sql_database_instance" "reader" { availability_type = "ZONAL" disk_autoresize = true + user_labels = local.labels + ip_configuration { ipv4_enabled = false private_network = google_compute_network.this.id @@ -68,6 +81,12 @@ resource "google_sql_database_instance" "reader" { } deletion_protection = var.cloudsql_deletion_protection + + lifecycle { + # Same autoresize footgun as the writer — the replica grows its disk + # independently. Never let a perceived shrink replace the instance. + ignore_changes = [settings[0].disk_size] + } } resource "google_sql_database" "this" { @@ -91,6 +110,7 @@ resource "google_sql_user" "app" { resource "google_secret_manager_secret" "db_password" { secret_id = "${local.name}-db-password" + labels = local.labels replication { auto {} } diff --git a/terraform/litellm/gcp/examples/default/.terraform.lock.hcl b/terraform/litellm/gcp/examples/default/.terraform.lock.hcl new file mode 100644 index 00000000000..e6285567315 --- /dev/null +++ b/terraform/litellm/gcp/examples/default/.terraform.lock.hcl @@ -0,0 +1,63 @@ +# This file is maintained automatically by "terraform init". +# Manual edits may be lost in future updates. 
+ +provider "registry.terraform.io/hashicorp/google" { + version = "6.50.0" + constraints = "~> 6.10" + hashes = [ + "h1:79CwMTsp3Ud1nOl5hFS5mxQHyT0fGVye7pqpU0PPlHI=", + "zh:1f3513fcfcbf7ca53d667a168c5067a4dd91a4d4cccd19743e248ff31065503c", + "zh:3da7db8fc2c51a77dd958ea8baaa05c29cd7f829bd8941c26e2ea9cb3aadc1e5", + "zh:3e09ac3f6ca8111cbb659d38c251771829f4347ab159a12db195e211c76068bb", + "zh:7bb9e41c568df15ccf1a8946037355eefb4dfb4e35e3b190808bb7c4abae547d", + "zh:81e5d78bdec7778e6d67b5c3544777505db40a826b6eb5abe9b86d4ba396866b", + "zh:8d309d020fb321525883f5c4ea864df3d5942b6087f6656d6d8b3a1377f340fc", + "zh:93e112559655ab95a523193158f4a4ac0f2bfed7eeaa712010b85ebb551d5071", + "zh:d3efe589ffd625b300cef5917c4629513f77e3a7b111c9df65075f76a46a63c7", + "zh:d4a4d672bbef756a870d8f32b35925f8ce2ef4f6bbd5b71a3cb764f1b6c85421", + "zh:e13a86bca299ba8a118e80d5f84fbdd708fe600ecdceea1a13d4919c068379fe", + "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c", + "zh:fec30c095647b583a246c39d557704947195a1b7d41f81e369ba377d997faef6", + ] +} + +provider "registry.terraform.io/hashicorp/google-beta" { + version = "6.50.0" + constraints = "~> 6.10" + hashes = [ + "h1:P2GiUJM1frlPtBViwKn1A9V2dVBdGuWcX80w9TdH8ZE=", + "zh:18b442bd0a05321d39dda1e9e3f1bdede4e61bc2ac62cc7a67037a3864f75101", + "zh:2e387c51455862828bec923a3ec81abf63a4d998da470cf00e09003bda53d668", + "zh:3942e708fa84ebe54996086f4b1398cb747fe19cbcd0be07ace528291fb35dee", + "zh:496287dd48b34ae6197cb1f887abeafd07c33f389dbe431bb01e24846754cfdd", + "zh:6eca885419969ce5c2a706f34dce1f10bde9774757675f2d8a92d12e5a1be390", + "zh:710dbef826c3fe7f76f844dae47937e8e4c1279dd9205ec4610be04cf3327244", + "zh:777ebf44b24bfc7bdbf770dc089f1a72f143b4718fdedb8c6bd75983115a1ec2", + "zh:9c8703bba37b8c7ad857efc3513392c5a096c519397c1cb822d7612f38e4262f", + "zh:c4f1d3a73de2702277c99d5348ad6d374705bcfdd367ad964ff4cfd2cf06c281", + "zh:eca8df11af3f5a948492d5b8b5d01b4ec705aad10bc30ec1524205508ae28393", + 
"zh:f41e7fd5f2628e8fd6b8ea136366923858f54428d1729898925469b862c275c2", + "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c", + ] +} + +provider "registry.terraform.io/hashicorp/random" { + version = "3.9.0" + constraints = "~> 3.6" + hashes = [ + "h1:OO+IuvQJSPmWdN8AyyIEvPJbLvDQpgX/zbktoa9KsJE=", + "zh:161ad0bd9a75768c82f53fb6e7172a9d8be2d4889b012645a34795031aaf1bf1", + "zh:19dc9a5b17729725ccfc4f45b0500af0ee5bc6b6b160c7adb8f2bf617d2c80ea", + "zh:269eda8fe42daa7974d5a34d166c3ba9defe80cde86c01e4dadcfdf2e1f05e5f", + "zh:373f7c65566f8f2cc7f45d698654feb9d988996957e1266a69ca00c52d6d16d0", + "zh:5599d16804c41c83009ec621b6d6b6f74e102f5827678a4750f8809055546b61", + "zh:583be0440469a22bff70dcfa56593b01566860b29607437264adb51060cf46fc", + "zh:5f211d8ec3f2e1f414870d9584bfe26e6995560ef81c748f8447a48164767398", + "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3", + "zh:7b547fd16216761ef86efc3ed516ac5ac0c5c42b7c7eb24a08cef2d93f69ed5e", + "zh:7e7c0679daf2a382151d05068c8c3f0dae6b7b7dccf818827b73dd08638df2ef", + "zh:8089dec888a8038b9b4fb23b3df7e1057293dbc5b60b42cc47ff690d69d4b61b", + "zh:c51f15a031edfd6f23ce8ced3446ca7f8d8d647e2499890d7d5d10d5016d7257", + "zh:c94784f005708890dc6895afd53636ec00ec1e430b15d41e5aebfb1d4b39bd04", + ] +} diff --git a/terraform/litellm/gcp/examples/default/TUTORIAL.md b/terraform/litellm/gcp/examples/default/TUTORIAL.md new file mode 100644 index 00000000000..9026207563d --- /dev/null +++ b/terraform/litellm/gcp/examples/default/TUTORIAL.md @@ -0,0 +1,134 @@ +# Deploy LiteLLM on GCP + + + +This walkthrough provisions the full LiteLLM stack on GCP via Cloud Run, Cloud SQL, Memorystore Redis, and an external HTTPS load balancer. You'll answer a few prompts; DeployStack writes a `terraform.tfvars` and runs `terraform apply` against the project you select. + +## Prerequisites + + + +Pick the GCP project you want to deploy into, then make sure billing is enabled on it. 
The stack provisions paid resources (Cloud SQL, Memorystore, an LB anycast IP). + +## Enable required APIs + +The stack needs these APIs enabled in the target project. Click to enable, or run the gcloud command below. + + + +```bash +gcloud services enable \ + run.googleapis.com \ + sqladmin.googleapis.com \ + redis.googleapis.com \ + secretmanager.googleapis.com \ + vpcaccess.googleapis.com \ + compute.googleapis.com \ + servicenetworking.googleapis.com \ + storage.googleapis.com \ + artifactregistry.googleapis.com +``` + +## Create the Artifact Registry passthrough to GHCR + +Cloud Run only pulls from Artifact Registry, `gcr.io`, or `docker.io`; it rejects `ghcr.io` URIs at apply time. The four LiteLLM images live on GHCR, so the stack needs a remote Artifact Registry repo pointed at GHCR. This is a one-time setup per project. + +```bash +gcloud artifacts repositories create litellm \ + --repository-format=docker \ + --location= \ + --mode=remote-repository \ + --remote-repo-config-desc="GitHub Container Registry passthrough" \ + --remote-docker-repo=https://ghcr.io +``` + +If the repo already exists, this command exits with a clear error and you can move on. Then set `image_registry` in `terraform.tfvars` to `-docker.pkg.dev//litellm/berriai` before applying. + +## (Optional) Set tenant secrets + +The stack auto-generates a `LITELLM_MASTER_KEY` if you don't supply one. If you have an enterprise license or want a pre-chosen master key, export them as `TF_VAR_*` env vars before running the installer so they end up in Secret Manager but not in `terraform.tfvars`. + +```bash +export TF_VAR_litellm_master_key="sk-..." # optional; auto-generated if omitted +export TF_VAR_litellm_license="lic-..." # optional; OSS-only without it +export TF_VAR_ui_password="..." # optional; falls back to master_key for UI login +``` + +Skip this step entirely for a trial deploy. 
+ +## Run the installer + +DeployStack will prompt for project, region, tenant, env, image tag, and TLS posture, then run `terraform apply`. Open `deploystack.json` if you want to see the prompt definitions first. + +```bash +deploystack install +``` + +The first apply takes 20-25 minutes; most of that is Cloud SQL provisioning. The migration Cloud Run Job runs automatically once the database is ready, and only then do gateway, backend, and UI start. + +## Grab the LB URL + +```bash +terraform output lb_url +``` + +For trial deploys (`allow_plaintext_lb=true`), this is `http://`. The UI lives at `/ui`; sign in with username `admin` and the master key: + +```bash +gcloud secrets versions access latest \ + --secret="$(terraform output -raw master_key_secret_id)" +``` + +## Going to TLS + +If you picked `allow_plaintext_lb=true` to bootstrap but want HTTPS for real, point a DNS A record at the LB IP, then re-run terraform with `lb_domains` set and `allow_plaintext_lb` removed: + +```bash +terraform apply \ + -var 'lb_domains=["proxy.example.com"]' +``` + +Google-managed certs sit in `PROVISIONING` for 15-60 minutes after DNS propagates. You can watch the state with `gcloud compute ssl-certificates describe -litellm--cert`. + +## Adding provider API keys + +Provider keys (OpenAI, Anthropic, etc.) belong in Secret Manager, not in `terraform.tfvars`. Create the secret first, then reference its resource ID from `gateway_extra_secrets` and re-apply: + +```bash +echo -n "sk-proj-..." | gcloud secrets create openai-api-key --data-file=- +``` + +Edit `terraform.tfvars`: + +```hcl +gateway_extra_secrets = { + OPENAI_API_KEY = "projects//secrets/openai-api-key" +} +proxy_config = { + model_list = [ + { + model_name = "gpt-4o" + litellm_params = { + model = "openai/gpt-4o" + api_key = "os.environ/OPENAI_API_KEY" + } + }, + ] +} +``` + +Then `terraform apply`. 
+ +## Tearing it all down + +```bash +deploystack uninstall +``` + +`cloudsql_deletion_protection` is `true` by default; flip it to `false` in `terraform.tfvars` and apply before uninstalling if you actually want the DB gone. Same goes for `gcs_force_destroy` on the bucket. + +## You're done + + + +Full configuration reference is in `README.md`, and every input variable on the underlying module lives in `variables.tf`. diff --git a/terraform/litellm/gcp/examples/default/deploystack.json b/terraform/litellm/gcp/examples/default/deploystack.json new file mode 100644 index 00000000000..6e5339272e7 --- /dev/null +++ b/terraform/litellm/gcp/examples/default/deploystack.json @@ -0,0 +1,37 @@ +{ + "title": "LiteLLM on GCP (Cloud Run)", + "name": "litellm-gcp", + "description": "Deploys the LiteLLM proxy on GCP: Cloud Run gateway/backend/UI, Cloud SQL with a read replica, Memorystore Redis, a GCS bucket, Secret Manager entries, and an external HTTPS load balancer. Takes ~20-25 minutes on the first apply.", + "duration": 25, + "documentation_link": "https://github.com/BerriAI/litellm/blob/main/terraform/litellm/gcp/README.md", + "collect_project": true, + "collect_region": true, + "region_type": "run", + "region_default": "us-central1", + "collect_zone": false, + "custom_settings": [ + { + "name": "tenant", + "description": "Tenant slug used as the prefix for every GCP resource the stack creates (e.g. 'acme' produces 'acme-litellm--gateway'). 1-21 lowercase chars starting with a letter", + "default": "acme", + "validation": "^[a-z][a-z0-9-]{0,20}$" + }, + { + "name": "env", + "description": "Environment suffix appended to every resource name (e.g. 'stage', 'prod', 'dev'). 1-9 lowercase chars starting with a letter", + "default": "stage", + "validation": "^[a-z][a-z0-9-]{0,8}$" + }, + { + "name": "image_tag", + "description": "Tag for the four litellm-* images (gateway, backend, ui, migrations). 
Bump together when bumping LiteLLM", + "default": "v1.86.0-dev" + }, + { + "name": "allow_plaintext_lb", + "description": "Skip TLS on the load balancer (HTTP-only). Set true for trial/dev. For production, leave false and add lb_domains to terraform.tfvars after the first apply", + "default": "true", + "options": ["true", "false"] + } + ] +} diff --git a/terraform/litellm/gcp/examples/default/main.tf b/terraform/litellm/gcp/examples/default/main.tf new file mode 100644 index 00000000000..8760d445f0c --- /dev/null +++ b/terraform/litellm/gcp/examples/default/main.tf @@ -0,0 +1,51 @@ +# One-command deploy of the LiteLLM GCP stack. +# +# cd terraform/litellm/gcp/examples/default +# cp terraform.tfvars.example terraform.tfvars # edit it +# terraform init +# terraform apply +# +# This root just wires the providers (see providers.tf) to the module. The +# module itself (../../) declares no provider, so it can also be consumed +# from your own config with count/for_each or impersonated-SA providers: +# +# module "litellm" { +# source = "github.com/BerriAI/litellm//terraform/litellm/gcp?ref=" +# ... +# } +# +# Note: the module declares no `configuration_aliases`, so it receives only the +# caller's single default google/google-beta providers — a `for_each` over it +# runs every instance against the same project/region/credentials. To fan out +# across projects or regions, use one root per project. See the GCP README's +# "Using as a module" section. +# +# Knobs not surfaced as variables here (per-component sizing/instances, +# Cloud SQL tier/edition, Memorystore tier, per-component image overrides) +# can be set directly on this block — see ../../variables.tf. 
+module "litellm" { + source = "../../" + + project_id = var.project_id + region = var.region + tenant = var.tenant + env = var.env + + litellm_master_key = var.litellm_master_key + litellm_license = var.litellm_license + ui_password = var.ui_password + + image_registry = var.image_registry + image_tag = var.image_tag + + lb_domains = var.lb_domains + allow_plaintext_lb = var.allow_plaintext_lb + cloudsql_deletion_protection = var.cloudsql_deletion_protection + gcs_force_destroy = var.gcs_force_destroy + + proxy_config = var.proxy_config + gateway_extra_env = var.gateway_extra_env + backend_extra_env = var.backend_extra_env + gateway_extra_secrets = var.gateway_extra_secrets + backend_extra_secrets = var.backend_extra_secrets +} diff --git a/terraform/litellm/gcp/examples/default/outputs.tf b/terraform/litellm/gcp/examples/default/outputs.tf new file mode 100644 index 00000000000..3a9343c4850 --- /dev/null +++ b/terraform/litellm/gcp/examples/default/outputs.tf @@ -0,0 +1,59 @@ +output "lb_ip" { + description = "Global anycast IP of the external load balancer." + value = module.litellm.lb_ip +} + +output "lb_url" { + description = "Proxy URL. Dashboard at /, API at /v1/*." + value = module.litellm.lb_url +} + +output "gateway_service_url" { + description = "Default Cloud Run URL for the gateway (bypasses the LB)." + value = module.litellm.gateway_service_url +} + +output "backend_service_url" { + description = "Default Cloud Run URL for the backend (bypasses the LB)." + value = module.litellm.backend_service_url +} + +output "ui_service_url" { + description = "Default Cloud Run URL for the UI (bypasses the LB)." + value = module.litellm.ui_service_url +} + +output "cloudsql_writer_ip" { + description = "Private IP of the Cloud SQL writer." + value = module.litellm.cloudsql_writer_ip +} + +output "cloudsql_reader_ip" { + description = "Private IP of the Cloud SQL read replica." 
+ value = module.litellm.cloudsql_reader_ip +} + +output "redis_endpoint" { + description = "Memorystore Redis endpoint." + value = module.litellm.redis_endpoint +} + +output "gcs_bucket" { + description = "GCS bucket name." + value = module.litellm.gcs_bucket +} + +output "master_key_secret_id" { + description = "Secret Manager resource ID holding LITELLM_MASTER_KEY." + value = module.litellm.master_key_secret_id +} + +output "db_password_secret_id" { + description = "Secret Manager resource ID holding the Cloud SQL app-user password." + value = module.litellm.db_password_secret_id +} + +output "migration_run_command" { + description = "Break-glass command to re-run the one-off migration job." + value = module.litellm.migration_run_command +} diff --git a/terraform/litellm/gcp/examples/default/providers.tf b/terraform/litellm/gcp/examples/default/providers.tf new file mode 100644 index 00000000000..d4a9836e887 --- /dev/null +++ b/terraform/litellm/gcp/examples/default/providers.tf @@ -0,0 +1,17 @@ +# Providers are configured HERE, in the root, not in the module. A module +# that declares its own configured `provider` block can't be called with +# count/for_each/depends_on and gives the caller no way to set an +# impersonated service account, a different project, or aliases. +# +# The module's resources inherit these default (unaliased) `google` / +# `google-beta` configs automatically through the module call, so project +# and region set here flow into every resource that doesn't pass its own. 
+provider "google" { + project = var.project_id + region = var.region +} + +provider "google-beta" { + project = var.project_id + region = var.region +} diff --git a/terraform/litellm/gcp/terraform.tfvars.example b/terraform/litellm/gcp/examples/default/terraform.tfvars.example similarity index 70% rename from terraform/litellm/gcp/terraform.tfvars.example rename to terraform/litellm/gcp/examples/default/terraform.tfvars.example index 5c22a14c6d6..6358ec96e6d 100644 --- a/terraform/litellm/gcp/terraform.tfvars.example +++ b/terraform/litellm/gcp/examples/default/terraform.tfvars.example @@ -1,5 +1,5 @@ -project = "my-gcp-project" -region = "us-central1" +project_id = "my-gcp-project" +region = "us-central1" # Resource naming: every GCP resource the stack creates is named # `${tenant}-litellm-${env}` (or that plus a per-resource suffix). E.g. @@ -28,14 +28,14 @@ env = "stage" # cloudsql_deletion_protection = true # default: refuse destroy on the DB # gcs_force_destroy = false # default: refuse destroy on a non-empty bucket -# Component images. Defaults pin all four to the same GHCR release tag — -# bump them together when bumping LiteLLM. To use private images, mirror -# them into Artifact Registry first — Cloud Run only authenticates against -# AR / gcr.io. -# gateway_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/gateway:1.86.0-dev" -# backend_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/backend:1.86.0-dev" -# ui_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/ui:1.86.0-dev" -# migrations_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/migrations:1.86.0-dev" +# Images. Cloud Run rejects ghcr.io, so a real deploy must point +# image_registry at an Artifact Registry remote repo (see README "Image +# pulls"); image_tag is applied to all four litellm-* images. 
Per-component +# *_image overrides are NOT exposed here — set them directly on the +# `module "litellm"` block in main.tf (see ../../variables.tf) if you need +# to mix-and-match versions. +# image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai" +# image_tag = "v1.86.0-dev" # ---------- proxy_config (mirrors helm gateway.config.proxy_config) ---------- # proxy_config = { @@ -75,3 +75,13 @@ env = "stage" # OPENAI_API_KEY = "projects/my-gcp-project/secrets/openai-api-key" # ANTHROPIC_API_KEY = "projects/my-gcp-project/secrets/anthropic-api-key" # } + +# ---------- OpenTelemetry v2 ---------- +# OTel is gated on otel_endpoint: empty (default) and nothing is added to +# the container env; set it and both gateway and backend gain +# LITELLM_OTEL_V2=true plus the OTEL_* block (with OTEL_SERVICE_NAME +# stamped per component). These knobs aren't surfaced as wrapper vars in +# this example; set them directly on the `module "litellm"` block in +# main.tf (otel_endpoint, otel_exporter, otel_environment_name, +# otel_capture_message_content, otel_headers_secret). Full docs in +# ../../variables.tf. diff --git a/terraform/litellm/gcp/examples/default/variables.tf b/terraform/litellm/gcp/examples/default/variables.tf new file mode 100644 index 00000000000..56e5ec88ef8 --- /dev/null +++ b/terraform/litellm/gcp/examples/default/variables.tf @@ -0,0 +1,120 @@ +# Curated surface for the one-command deploy path. The module (../../) +# exposes far more knobs (per-component CPU/memory/instances, Cloud SQL +# tier/edition, Memorystore tier, per-component image overrides, …). To +# tune those, set them directly on the `module "litellm"` block in +# main.tf, or call the module from your own root config. Full per-variable +# docs live in ../../variables.tf — the module is the source of truth. + +variable "project_id" { + description = "GCP project ID." 
+ type = string +} + +variable "region" { + description = "GCP region for VPC, Cloud SQL, Memorystore, Cloud Run, and the LB IP." + type = string + default = "us-central1" +} + +variable "tenant" { + description = "Tenant slug — prefix for every resource (-litellm-)." + type = string +} + +variable "env" { + description = "Environment suffix (stage, prod, dev)." + type = string +} + +# Sensitive — prefer TF_VAR_litellm_master_key / TF_VAR_litellm_license / +# TF_VAR_ui_password so values stay out of any committed tfvars file. +variable "litellm_master_key" { + description = "Pre-existing LITELLM_MASTER_KEY (sk-…). Empty → auto-generated." + type = string + default = "" + sensitive = true +} + +variable "litellm_license" { + description = "LiteLLM enterprise license. Empty → OSS-only." + type = string + default = "" + sensitive = true +} + +variable "ui_password" { + description = "UI admin password. Empty → falls back to LITELLM_MASTER_KEY." + type = string + default = "" + sensitive = true +} + +# Image source. Cloud Run rejects ghcr.io, so a real deploy must point +# image_registry at an Artifact Registry remote repo (see README "Image +# pulls"). Per-component overrides live in ../../variables.tf. +variable "image_registry" { + description = "Registry path prefix; images composed as /litellm-:." + type = string + default = "ghcr.io/berriai" +} + +variable "image_tag" { + description = "Tag applied to all four litellm-* images. Bump in lockstep." + type = string + default = "v1.86.0-dev" +} + +# TLS — provide DNS names for a managed cert, or opt into HTTP-only for dev. +variable "lb_domains" { + description = "DNS names (already pointing at lb_ip) for a Google-managed cert. Empty → no TLS." + type = list(string) + default = [] +} + +variable "allow_plaintext_lb" { + description = "Opt into HTTP-only LB (trial/dev only)." 
+ type = bool + default = false +} + +variable "cloudsql_deletion_protection" { + description = "Cloud SQL deletion protection (writer + reader)." + type = bool + default = true +} + +variable "gcs_force_destroy" { + description = "Allow destroy of a non-empty GCS bucket (ephemeral/CI only)." + type = bool + default = false +} + +variable "proxy_config" { + description = "LiteLLM proxy config (contents of config.yaml). Empty → defaults." + type = any + default = {} +} + +variable "gateway_extra_env" { + description = "Plain-text env vars layered onto the gateway." + type = map(string) + default = {} +} + +variable "backend_extra_env" { + description = "Plain-text env vars layered onto the backend." + type = map(string) + default = {} +} + +variable "gateway_extra_secrets" { + description = "Gateway env vars sourced from Secret Manager (name → secret resource ID)." + type = map(string) + default = {} +} + +variable "backend_extra_secrets" { + description = "Backend env vars sourced from Secret Manager (name → secret resource ID)." 
+  type        = map(string)
+  default     = {}
+}
diff --git a/terraform/litellm/gcp/examples/default/versions.tf b/terraform/litellm/gcp/examples/default/versions.tf
new file mode 100644
index 00000000000..a630c59afd0
--- /dev/null
+++ b/terraform/litellm/gcp/examples/default/versions.tf
@@ -0,0 +1,18 @@
+terraform {
+  required_version = ">= 1.6.0"
+
+  required_providers {
+    google = {
+      source  = "hashicorp/google"
+      version = "~> 6.10"
+    }
+    google-beta = {
+      source  = "hashicorp/google-beta"
+      version = "~> 6.10"
+    }
+    random = {
+      source  = "hashicorp/random"
+      version = "~> 3.6"
+    }
+  }
+}
diff --git a/terraform/litellm/gcp/gcs.tf b/terraform/litellm/gcp/gcs.tf
index 86511d38a31..3ba1f482219 100644
--- a/terraform/litellm/gcp/gcs.tf
+++ b/terraform/litellm/gcp/gcs.tf
@@ -7,7 +7,7 @@ resource "random_id" "bucket_suffix" {
 }
 
 resource "google_storage_bucket" "this" {
-  name                        = "${var.project}-${local.name}-${random_id.bucket_suffix.hex}"
+  name                        = "${var.project_id}-${local.name}-${random_id.bucket_suffix.hex}"
   location                    = var.region
   uniform_bucket_level_access = true
   force_destroy               = var.gcs_force_destroy
@@ -18,7 +18,7 @@ resource "google_storage_bucket" "this" {
 
   public_access_prevention = "enforced"
 
-  labels = var.labels
+  labels = local.labels
 }
 
 # Cloud Run runtime SA gains object admin on this bucket only.
@@ -27,3 +27,43 @@ resource "google_storage_bucket_iam_member" "runtime" {
   role   = "roles/storage.objectAdmin"
   member = "serviceAccount:${google_service_account.runtime.email}"
 }
+
+# Dedicated bucket holding only config.yaml. Mounted read-only into the
+# gateway and backend via Cloud Run v2's gcsfuse volume. Kept separate from
+# the data-plane bucket above so the runtime SA can hold a narrower
+# objectViewer binding here (config is read-only at runtime) while keeping
+# objectAdmin on the data-plane bucket. Only created when proxy_config is
+# non-empty.
+resource "google_storage_bucket" "proxy_config" {
+  count = local.proxy_config_enabled ? 1 : 0
+
+  name                        = "${var.project_id}-${local.name}-config-${random_id.bucket_suffix.hex}"
+  location                    = var.region
+  uniform_bucket_level_access = true
+  force_destroy               = var.gcs_force_destroy
+
+  versioning {
+    enabled = true
+  }
+
+  public_access_prevention = "enforced"
+
+  labels = local.labels
+}
+
+resource "google_storage_bucket_object" "proxy_config" {
+  count = local.proxy_config_enabled ? 1 : 0
+
+  name         = local.proxy_config_file_name
+  bucket       = google_storage_bucket.proxy_config[0].name
+  content      = local.proxy_config_yaml
+  content_type = "application/yaml"
+}
+
+resource "google_storage_bucket_iam_member" "proxy_config_runtime" {
+  count = local.proxy_config_enabled ? 1 : 0
+
+  bucket = google_storage_bucket.proxy_config[0].name
+  role   = "roles/storage.objectViewer"
+  member = "serviceAccount:${google_service_account.runtime.email}"
+}
diff --git a/terraform/litellm/gcp/iam.tf b/terraform/litellm/gcp/iam.tf
index 93a9997ed7a..dc3ae5e0912 100644
--- a/terraform/litellm/gcp/iam.tf
+++ b/terraform/litellm/gcp/iam.tf
@@ -21,7 +21,7 @@ resource "google_service_account" "ui_runtime" {
 # Cloud SQL client — lets the Cloud Run services connect to the instance
 # over private IP via the VPC connector.
 resource "google_project_iam_member" "runtime_cloudsql" {
-  project = var.project
+  project = var.project_id
   role    = "roles/cloudsql.client"
   member  = "serviceAccount:${google_service_account.runtime.email}"
 }
@@ -69,3 +69,13 @@ resource "google_secret_manager_secret_iam_member" "extras" {
   role      = "roles/secretmanager.secretAccessor"
   member    = "serviceAccount:${google_service_account.runtime.email}"
 }
+
+# OTEL_HEADERS secret accessor — only created when var.otel_headers_secret
+# is set. Carries the OTLP collector's auth header(s).
+resource "google_secret_manager_secret_iam_member" "otel_headers" {
+  count = var.otel_headers_secret == "" ? 0 : 1
+
+  secret_id = var.otel_headers_secret
+  role      = "roles/secretmanager.secretAccessor"
+  member    = "serviceAccount:${google_service_account.runtime.email}"
+}
diff --git a/terraform/litellm/gcp/load_balancer.tf b/terraform/litellm/gcp/load_balancer.tf
index b0081786f13..11f30d0f944 100644
--- a/terraform/litellm/gcp/load_balancer.tf
+++ b/terraform/litellm/gcp/load_balancer.tf
@@ -14,7 +14,8 @@ locals {
 }
 
 resource "google_compute_global_address" "lb" {
-  name = "${local.name}-lb-ip"
+  name   = "${local.name}-lb-ip"
+  labels = local.labels
 }
 
 # Serverless NEGs — one per Cloud Run service.
@@ -148,6 +149,7 @@ resource "google_compute_global_forwarding_rule" "http" {
   load_balancing_scheme = "EXTERNAL_MANAGED"
   ip_address            = google_compute_global_address.lb.address
   target                = google_compute_target_http_proxy.this.id
+  labels                = local.labels
 }
 
 # ---------- HTTPS (gated on var.lb_domains) ----------
@@ -160,11 +162,22 @@ resource "google_compute_global_forwarding_rule" "http" {
 
 resource "google_compute_managed_ssl_certificate" "this" {
   count = local.tls_enabled ? 1 : 0
-  name  = "${local.name}-cert"
+
+  # A managed cert's `domains` is immutable, so changing var.lb_domains
+  # forces replacement, and the cert is referenced by the HTTPS target
+  # proxy — a destroy-then-create replacement fails with
+  # `resourceInUseByAnotherResource`. Hashing the domains into the name
+  # makes the name change with the domain set, so create_before_destroy
+  # builds the new cert + repoints the proxy before deleting the old one.
+  name = "${local.name}-cert-${substr(sha1(join(",", var.lb_domains)), 0, 8)}"
 
   managed {
     domains = var.lb_domains
   }
+
+  lifecycle {
+    create_before_destroy = true
+  }
 }
 
 resource "google_compute_target_https_proxy" "this" {
@@ -182,4 +195,5 @@ resource "google_compute_global_forwarding_rule" "https" {
   load_balancing_scheme = "EXTERNAL_MANAGED"
   ip_address            = google_compute_global_address.lb.address
   target                = google_compute_target_https_proxy.this[0].id
+  labels                = local.labels
 }
diff --git a/terraform/litellm/gcp/locals.tf b/terraform/litellm/gcp/locals.tf
index 2d1231fb197..732b4ce7d6b 100644
--- a/terraform/litellm/gcp/locals.tf
+++ b/terraform/litellm/gcp/locals.tf
@@ -8,6 +8,19 @@ locals {
   # the stack can reference local.name.
   name = "${var.tenant}-litellm-${var.env}"
 
+  # Mirrors the AWS stack's local.tags: the module stamps its own
+  # `litellm-stack` / `managed-by` labels onto every label-supporting
+  # resource (Cloud Run, Cloud SQL, Memorystore, Secret Manager, GCS) and
+  # merges var.labels on top. GCP label keys/values are lower-kebab/snake
+  # only, so the key is `litellm-stack`, not AWS's `litellm:stack`.
+  labels = merge(
+    {
+      "litellm-stack" = local.name
+      "managed-by"    = "terraform"
+    },
+    var.labels,
+  )
+
   gateway_path_prefixes = [
     "/v1/chat/*", "/chat/*",
     "/v1/completions*", "/completions*",
@@ -62,11 +75,18 @@ locals {
   ]
 
   proxy_config_enabled = length(keys(var.proxy_config)) > 0
-  proxy_config_b64     = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
+  proxy_config_yaml    = local.proxy_config_enabled ? yamlencode(var.proxy_config) : ""
+
+  proxy_config_mount_path = "/etc/litellm"
+  proxy_config_file_name  = "config.yaml"
+  proxy_config_volume     = "proxy-config"
 
   proxy_config_env = local.proxy_config_enabled ? [
-    { name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
-    { name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
+    { name = "CONFIG_FILE_PATH", value = "${local.proxy_config_mount_path}/${local.proxy_config_file_name}" },
+    # Forces a new Cloud Run revision when the YAML changes; gcsfuse only
+    # surfaces the new object on container restart, so without this an
+    # updated proxy_config would sit in the bucket unread.
+    { name = "PROXY_CONFIG_HASH", value = md5(local.proxy_config_yaml) },
   ] : []
 
   # Resolved image URIs: per-component override wins, otherwise compose
diff --git a/terraform/litellm/gcp/outputs.tf b/terraform/litellm/gcp/outputs.tf
index df25215adcc..6f1f1d5ccf4 100644
--- a/terraform/litellm/gcp/outputs.tf
+++ b/terraform/litellm/gcp/outputs.tf
@@ -59,6 +59,6 @@ output "migration_run_command" {
     "gcloud run jobs execute %s --region %s --project %s --wait",
     google_cloud_run_v2_job.migrations.name,
     var.region,
-    var.project,
+    var.project_id,
   )
 }
diff --git a/terraform/litellm/gcp/providers.tf b/terraform/litellm/gcp/providers.tf
deleted file mode 100644
index fd1584463f8..00000000000
--- a/terraform/litellm/gcp/providers.tf
+++ /dev/null
@@ -1,9 +0,0 @@
-provider "google" {
-  project = var.project
-  region  = var.region
-}
-
-provider "google-beta" {
-  project = var.project
-  region  = var.region
-}
diff --git a/terraform/litellm/gcp/redis.tf b/terraform/litellm/gcp/redis.tf
index f7e174ecbae..0e07c416e85 100644
--- a/terraform/litellm/gcp/redis.tf
+++ b/terraform/litellm/gcp/redis.tf
@@ -9,6 +9,8 @@ resource "google_redis_instance" "this" {
 
   redis_version = "REDIS_7_0"
 
+  labels = local.labels
+
   # In-transit encryption between Cloud Run and Memorystore. The instance
   # exposes its self-signed CA via `server_ca_certs` (read in cloudrun.tf
   # and passed to the proxy as REDIS_CA_PEM_B64); the proxy decodes it to
diff --git a/terraform/litellm/gcp/secrets.tf b/terraform/litellm/gcp/secrets.tf
index 80312e06a91..f93514bb70b 100644
--- a/terraform/litellm/gcp/secrets.tf
+++ b/terraform/litellm/gcp/secrets.tf
@@ -10,6 +10,7 @@ resource "random_password" "master_key" {
 # account gets accessor permission on it (see iam.tf).
 resource "google_secret_manager_secret" "master_key" {
   secret_id = "${local.name}-master-key"
+  labels    = local.labels
   replication {
     auto {}
   }
@@ -29,6 +30,7 @@ resource "google_secret_manager_secret" "license" {
   count = var.litellm_license == "" ? 0 : 1
 
   secret_id = "${local.name}-license"
+  labels    = local.labels
   replication {
     auto {}
   }
@@ -49,6 +51,7 @@ resource "google_secret_manager_secret" "ui_password" {
   count = var.ui_password == "" ? 0 : 1
 
   secret_id = "${local.name}-ui-password"
+  labels    = local.labels
   replication {
     auto {}
   }
diff --git a/terraform/litellm/gcp/variables.tf b/terraform/litellm/gcp/variables.tf
index fe726b0317a..4355192e9f1 100644
--- a/terraform/litellm/gcp/variables.tf
+++ b/terraform/litellm/gcp/variables.tf
@@ -1,4 +1,4 @@
-variable "project" {
+variable "project_id" {
   description = "GCP project ID."
   type        = string
 }
@@ -30,11 +30,9 @@ variable "env" {
 }
 
 variable "labels" {
-  description = "Resource labels merged into every label-supporting resource."
+  description = "Per-deployment labels applied to every label-supporting resource the module creates, on top of the module's own `litellm-stack` / `managed-by` labels. Mirrors the AWS stack's `tags` input."
   type        = map(string)
-  default = {
-    "managed-by" = "terraform"
-  }
+  default     = {}
 }
 
 # ---------- Tenant-supplied secrets ----------
@@ -171,6 +169,17 @@ variable "gateway_memory" {
   default     = "4Gi"
 }
 
+variable "gateway_num_workers" {
+  description = "uvicorn worker processes per gateway instance (passed as --workers). Size relative to gateway_cpu — a common starting point (from Gunicorn's guidance) is ~(2 × vCPU) + 1 workers. Mirrors the AWS stack's gateway_num_workers."
+  type        = number
+  default     = 1
+
+  validation {
+    condition     = var.gateway_num_workers >= 1
+    error_message = "gateway_num_workers must be >= 1."
+  }
+}
+
 # Cloud Run autoscales out of the box (request-rate driven). The min/max
 # bounds mirror the HPA replica bounds in helm/litellm/values.yaml so each
 # stack scales over the same range. Cloud Run has no direct CPU-utilization
@@ -394,12 +403,90 @@ variable "backend_extra_secrets" {
 variable "proxy_config" {
   description = <<-EOT
     LiteLLM proxy config (contents of config.yaml). Mirrors the helm chart's
-    `gateway.config.proxy_config`. Passed to gateway, backend, and the
-    migration job as a base64-encoded env var and decoded to
-    /tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
-    automatically. Reference env-injected secrets from the YAML via
-    `os.environ/`. Leave empty ({}) to skip.
+    `gateway.config.proxy_config`. YAML-encoded and uploaded to a dedicated
+    GCS bucket as `config.yaml`, then mounted read-only into the gateway
+    and backend at `/etc/litellm` via Cloud Run v2's gcsfuse volume;
+    CONFIG_FILE_PATH is set automatically. A hash of the YAML is wired in
+    as an env var so a config-only edit forces a new revision (gcsfuse
+    surfaces the new object on container restart). Reference env-injected
+    secrets from the YAML via `os.environ/`. Leave empty ({}) to
+    skip — the bucket isn't created and no volume is mounted.
   EOT
   type    = any
   default = {}
 }
+
+# ---------- OpenTelemetry v2 ----------
+#
+# https://docs.litellm.ai/docs/observability/opentelemetry_v2
+#
+# OTel v2 is opt-in and gated entirely on otel_endpoint, matching the AWS
+# stack. Leave otel_endpoint = "" and nothing OTel-related is added to the
+# container env. Set it and the gateway/backend gain LITELLM_OTEL_V2=true
+# plus the OTEL_* block (per-component OTEL_SERVICE_NAME, exporter, endpoint,
+# environment name, capture-content), with OTEL_HEADERS sourced from
+# otel_headers_secret when provided.
+
+variable "otel_endpoint" {
+  description = <<-EOT
+    OTLP collector URL (e.g. https://otel.example.com:4318 for HTTP, or
+    your collector's :4317 for gRPC). Empty disables OTel entirely (no
+    LITELLM_OTEL_V2, no OTEL_* env). When set, LITELLM_OTEL_V2=true plus
+    OTEL_EXPORTER / OTEL_ENDPOINT are injected and spans ship to the
+    collector.
+  EOT
+  type        = string
+  default     = ""
+}
+
+variable "otel_exporter" {
+  description = <<-EOT
+    OTel exporter protocol. Ignored when otel_endpoint is empty. `otlp_http`
+    is the safer default (works through a vanilla L7 ingress); `otlp_grpc`
+    needs the collector reachable over h2 and the `grpcio` extra installed
+    in the proxy image.
+  EOT
+  type        = string
+  default     = "otlp_http"
+  validation {
+    condition     = contains(["otlp_http", "otlp_grpc", "console"], var.otel_exporter)
+    error_message = "otel_exporter must be one of: otlp_http, otlp_grpc, console."
+  }
+}
+
+variable "otel_headers_secret" {
+  description = <<-EOT
+    Optional Secret Manager secret resource ID
+    (`projects//secrets/`) whose latest version is the
+    value of OTEL_HEADERS — used for collector auth, e.g.
+    `Authorization=Bearer `. Mounted as an env-var secret_key_ref;
+    the runtime SA auto-gains roles/secretmanager.secretAccessor.
+  EOT
+  type        = string
+  default     = ""
+}
+
+variable "otel_environment_name" {
+  description = <<-EOT
+    Value for OTEL_ENVIRONMENT_NAME (becomes `deployment.environment` on
+    every span). Defaults to var.env so spans land tagged with the
+    deployment env without extra wiring.
+  EOT
+  type        = string
+  default     = ""
+}
+
+variable "otel_capture_message_content" {
+  description = <<-EOT
+    Value for OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT. Default
+    `no_content` matches the litellm default; flip to `prompt_and_completion`
+    only when you've audited what's about to land in your observability
+    backend, because raw prompts/completions are typically sensitive.
+  EOT
+  type        = string
+  default     = "no_content"
+  validation {
+    condition     = contains(["no_content", "prompt_and_completion"], var.otel_capture_message_content)
+    error_message = "otel_capture_message_content must be one of: otlp_http".
+  }
+}