BerriAI · yassin-berriai · Jun 6, 2026 · May 17, 2026 · Jun 1, 2026 · Jun 1, 2026
diff --git a/terraform/litellm/README.md b/terraform/litellm/README.md
@@ -1,18 +1,34 @@
 # LiteLLM Terraform stacks
 
-Two self-contained Terraform root modules that deploy the **componentized**
-LiteLLM proxy — the gateway, backend, and UI as three independent containers
-(see `helm/litellm/` for the canonical chart with the same split).
+Two self-contained, reusable Terraform **modules** that deploy the
+**componentized** LiteLLM proxy — the gateway, backend, and UI as three
+independent containers (see `helm/litellm/` for the canonical chart with the
+same split).
+
+Each module declares **no `provider` block of its own**, so it can be called
+with `count` / `for_each` / `depends_on` and the caller controls region,
+assume-role / impersonation, aliases, and `default_tags`. A ready-to-run root
+that wires the provider lives at `<stack>/examples/default/` — that's the
+one-command deploy path. To embed a stack in your own config, call the module
+by source:
+
+```hcl
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+  # ... inputs ...
+}
+```
 
 | Stack  | Compute     | Database (writer + reader)         | Cache       | Object store | Public entrypoint  |
 | ------ | ----------- | ---------------------------------- | ----------- | ------------ | ------------------ |
 | `aws/` | ECS Fargate | Aurora Postgres (IAM auth)         | ElastiCache | S3           | Application LB     |
 | `gcp/` | Cloud Run   | Cloud SQL Postgres (password auth) | Memorystore | GCS          | External HTTPS LB  |
 
-Each stack creates its own VPC and managed data stores — drop in a tfvars
-file and run `terraform apply`. Both stacks support a typed `proxy_config`
-input (mirrors `helm/litellm`'s `gateway.config.proxy_config`) and per-component
-extra env vars / secret-manager refs.
+Each stack creates its own VPC and managed data stores. From
+`<stack>/examples/default/`, drop in a tfvars file and run `terraform apply`.
+Both stacks support a typed `proxy_config` input (mirrors `helm/litellm`'s
+`gateway.config.proxy_config`) and per-component extra env vars /
+secret-manager refs.
 
 ## Components
 
@@ -147,6 +163,39 @@ against the backend image:
 Run the migration job once after the first `terraform apply` and before the
 gateway/backend services start serving traffic.
 
+## Feature parity between stacks
+
+The two modules expose the same conceptual surface; concrete inputs differ
+only where the underlying cloud forces it.
+
+| Capability                       | AWS input(s)                                            | GCP input(s)                                              |
+| -------------------------------- | ------------------------------------------------------- | --------------------------------------------------------- |
+| Tenant + env naming              | `tenant`, `env`                                         | `tenant`, `env`                                           |
+| Pre-shared master key / license  | `litellm_master_key`, `litellm_license`                 | `litellm_master_key`, `litellm_license`                   |
+| UI admin password                | `ui_password`                                           | `ui_password`                                             |
+| Per-deployment tags / labels     | `tags` (`map(string)`)                                  | `labels` (`map(string)`)                                  |
+| TLS posture                      | `acm_certificate_arn`, `allow_plaintext_alb`            | `lb_domains`, `allow_plaintext_lb`                        |
+| Force destroy of object store    | `s3_force_destroy`                                      | `gcs_force_destroy`                                       |
+| Database deletion protection     | `skip_final_snapshot`                                   | `cloudsql_deletion_protection`                            |
+| `proxy_config` (typed YAML map)  | `proxy_config`                                          | `proxy_config`                                            |
+| Extra plain env per component    | `gateway_extra_env`, `backend_extra_env`                | `gateway_extra_env`, `backend_extra_env`                  |
+| Extra secret-backed env          | `gateway_extra_secrets`, `backend_extra_secrets` (ARNs) | `gateway_extra_secrets`, `backend_extra_secrets` (resource IDs) |
+| Uvicorn `--workers` on gateway   | `gateway_num_workers`                                   | `gateway_num_workers`                                     |
+| OpenTelemetry v2 (opt-in)        | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret_arn` | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret` |
+
+Each module stamps its own stack-identity tag (`litellm:stack` on AWS,
+`litellm-stack` on GCP — GCP label keys forbid colons) plus
+`managed-by = "terraform"` onto every taggable / labelable resource and
+merges `var.tags` / `var.labels` on top. On AWS, provider `default_tags` are
+merged in as well; on a key collision the resource-level tag wins.
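+
+A minimal sketch of how the two AWS tag layers might be combined from a
+caller's root, assuming the module's `region`, `azs`, `tenant`, `env`, and
+`tags` inputs as listed in this README. The tag keys and values are
+illustrative only. In the AWS provider, a tag set on a resource takes
+precedence over a `default_tags` entry with the same key.
+
+```hcl
+# Org-wide tags: set once on the provider, applied to every AWS resource.
+provider "aws" {
+  region = "us-west-2"
+
+  default_tags {
+    tags = {
+      cost-center = "platform" # hypothetical org-wide tag
+    }
+  }
+}
+
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+
+  region = "us-west-2"
+  azs    = ["us-west-2a", "us-west-2b"]
+  tenant = "acme"
+  env    = "prod"
+
+  # Per-deployment tags: merged on top of the module's own stack tags.
+  tags = {
+    owner = "ml-platform" # hypothetical per-deployment tag
+  }
+}
+```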
+
+OTel is opt-in on both clouds: leave `otel_endpoint` empty and nothing
+OTel-related is added to the container env; set it and both gateway and
+backend get `LITELLM_OTEL_V2=true` plus the full `OTEL_*` block, with
+`OTEL_SERVICE_NAME` stamped per component
+(`<tenant>-litellm-<env>-gateway` and `-backend`). Any `OTEL_*` key set
+in `gateway_extra_env` / `backend_extra_env` wins for that service.
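+
+As an illustration of the override rule, the sketch below enables OTel and
+then replaces the stamped service name for the gateway only. It assumes
+`gateway_extra_env` accepts a map of strings; check `variables.tf` for the
+actual type. The endpoint and the service name are placeholders.
+
+```hcl
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+
+  region = "us-west-2"
+  azs    = ["us-west-2a", "us-west-2b"]
+  tenant = "acme"
+  env    = "prod"
+
+  # A non-empty endpoint turns on LITELLM_OTEL_V2 and the OTEL_* block.
+  otel_endpoint = "http://otel-collector.internal:4318"
+  otel_exporter = "otlp_http"
+
+  # An OTEL_* key set here wins for the gateway only; the backend keeps
+  # its default <tenant>-litellm-<env>-backend service name.
+  gateway_extra_env = {
+    OTEL_SERVICE_NAME = "acme-edge-gateway" # hypothetical name
+  }
+}
+```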
+
 ## What's not included
 
 - TLS certificates / custom domains. Both stacks expose plain-HTTP load
@@ -156,4 +205,46 @@ gateway/backend services start serving traffic.
   backend block to `versions.tf` when graduating to a team environment.
 - Observability beyond the cloud provider's defaults (CloudWatch logs on
   AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse
-  via the `*_extra_env` variables.
+  via the `*_extra_env` variables, or turn on OTel v2 (see the parity
+  table above).
+
+## HCP Terraform no-code (1-click) deploy
+
+Both stacks are publishable as no-code modules in HCP Terraform's private
+registry. The end-user flow is: open the no-code launch URL, fill in a
+few inputs, hit *Create workspace*, and HCP runs plan/apply against your
+cloud account using a variable set of credentials (static keys or
+dynamic-credentials OIDC).
+
+Required overrides the launcher must supply per stack:
+
+- **AWS** (`terraform/litellm/aws`): `region`, `azs`, `tenant`, `env`.
+  The image vars (`gateway_image`, `backend_image`, `ui_image`,
+  `migrations_image`) can be left at their defaults — the GHCR images
+  are anonymous-readable and ECS Fargate pulls them without extra
+  credentials.
+
+- **GCP** (`terraform/litellm/gcp`): `project`, `tenant`, `env`, **and
+  one of**:
+  - `image_registry` pointed at an Artifact Registry **remote** repository
+    backed by `https://ghcr.io` (e.g.
+    `us-central1-docker.pkg.dev/<project>/litellm/berriai`), so Cloud Run
+    pulls the four upstream `litellm-*` images through it; or
+  - all four per-component `*_image` URIs pointing at images mirrored
+    into a regular Artifact Registry repo.
+
+  The defaults (`ghcr.io/berriai`) cause Cloud Run admission to reject
+  the service spec — Cloud Run only authenticates against Artifact
+  Registry, `[region.]gcr.io`, or `docker.io`. See
+  `terraform/litellm/gcp/README.md#image-pulls` for the
+  `gcloud artifacts repositories create … --mode=remote-repository`
+  command that sets up the passthrough repo (one-time, per project).
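+
+For reference, the GCP required inputs with the remote-repository option
+might look like this when written as a module call (in the no-code flow the
+same values go into the launch form). The source path mirrors the AWS
+example with `gcp` substituted and is an assumption; the project ID is a
+placeholder, and the module may require further inputs not listed here.
+
+```hcl
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/gcp?ref=<tag>"
+
+  project = "my-project" # placeholder project ID
+  tenant  = "acme"
+  env     = "prod"
+
+  # Artifact Registry remote repository backed by https://ghcr.io, so
+  # Cloud Run never pulls from ghcr.io directly.
+  image_registry = "us-central1-docker.pkg.dev/my-project/litellm/berriai"
+}
+```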
+
+What still requires a manual step regardless of HCP no-code:
+
+- The one-off migration task. The stacks auto-run it via `local-exec`
+  during `terraform apply`, but that requires the `aws` / `gcloud` CLI
+  on the runner. HCP-hosted runners don't have them; use an HCP agent
+  pool with a custom image that includes the relevant CLI, or run the
+  command printed in the `migration_run_command` output by hand after
+  the first apply.
diff --git a/terraform/litellm/aws/README.md b/terraform/litellm/aws/README.md
@@ -44,9 +44,12 @@ needs the `aws` CLI installed and authenticated.
 ### `proxy_config` (preferred)
 
 Mirrors the helm chart's `gateway.config.proxy_config`. The map is YAML-encoded
-and base64-passed to gateway, backend, and the migration task; each container
-decodes it to `/tmp/litellm-config.yaml` at startup and sets `CONFIG_FILE_PATH`
-to match.
+and uploaded to S3 (`config/litellm-config.yaml` in the stack's bucket); the
+gateway and backend container entrypoints download it to
+`/tmp/litellm-config.yaml` at task start via boto3 and set `CONFIG_FILE_PATH`
+to match. The S3 object's etag is wired into the task definition, so editing
+`proxy_config` produces a new task-def revision and a rolling redeploy of both
+services.
 
 ```hcl
 proxy_config = {
@@ -119,6 +122,42 @@ aws secretsmanager create-secret \
   --secret-string "sk-proj-..."
 ```
 
+### Observability (OpenTelemetry v2)
+
+OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is
+opt-in and gated entirely on `otel_endpoint`. When it is empty (the default),
+nothing OTel-related is added to the container env. When it is set, both gateway and
+backend gain `LITELLM_OTEL_V2=true` plus the `OTEL_*` block, with
+`OTEL_SERVICE_NAME` stamped per component (`${tenant}-litellm-${env}-gateway`
+and `-backend`) so spans land tagged with the right hop. Any `OTEL_*` key
+set in `gateway_extra_env` / `backend_extra_env` overrides the default for
+that service.
+
+```hcl
+otel_endpoint         = "http://otel-collector.internal:4318"
+otel_exporter         = "otlp_http"   # otlp_grpc, console
+otel_environment_name = "prod"        # defaults to var.env
+```
+
+For collectors that require an auth header, store the comma-separated
+`key=value` string in Secrets Manager and reference it via
+`otel_headers_secret_arn`. The execution role auto-gains
+`secretsmanager:GetSecretValue` on that ARN.
+
+```hcl
+otel_headers_secret_arn = "arn:aws:secretsmanager:us-west-2:111122223333:secret:honeycomb-otel-headers-AbCdEf"
+```
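+
+If you manage the header secret in Terraform as well, a sketch along these
+lines could create it and hand its ARN to the module. The secret name, the
+`otel_headers` variable, the endpoint, and the header key are hypothetical;
+`aws_secretsmanager_secret` and `aws_secretsmanager_secret_version` are
+standard AWS provider resources. Note that the secret value is stored in
+Terraform state when created this way.
+
+```hcl
+variable "otel_headers" {
+  description = "Comma-separated key=value header string for the collector."
+  type        = string
+  sensitive   = true
+}
+
+resource "aws_secretsmanager_secret" "otel_headers" {
+  name = "acme-litellm-prod-otel-headers" # hypothetical name
+}
+
+resource "aws_secretsmanager_secret_version" "otel_headers" {
+  secret_id     = aws_secretsmanager_secret.otel_headers.id
+  secret_string = var.otel_headers # e.g. "x-honeycomb-team=<api-key>"
+}
+
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+
+  region = "us-west-2"
+  azs    = ["us-west-2a", "us-west-2b"]
+  tenant = "acme"
+  env    = "prod"
+
+  otel_endpoint           = "https://otel.example.com:4318" # placeholder
+  otel_headers_secret_arn = aws_secretsmanager_secret.otel_headers.arn
+
+  # Make sure the secret has a value before any task tries to read it.
+  depends_on = [aws_secretsmanager_secret_version.otel_headers]
+}
+```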
+
+`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` defaults to
+`no_content`; flip `otel_capture_message_content = "prompt_and_completion"`
+only after auditing what lands in your tracing backend, since prompts and
+completions are typically sensitive.
+
+Vendor presets (Arize, Phoenix, Langfuse OTel, Weave, Langtrace, Levo,
+AgentOps) live under `proxy_config.litellm_settings.callbacks` and are
+orthogonal to the OTLP variables above; their credentials still go in
+`*_extra_secrets`.
+
 ## Tenant deployment
 
 Every resource the stack creates is named `${tenant}-litellm-${env}` (or
@@ -132,10 +171,11 @@ pair differs:
 | `acme`   | `prod`  | `acme-litellm-prod-master-key`     |
 | `globex` | `dev`   | `globex-litellm-dev-license`       |
 
-For a per-tenant instance, the only inputs that change are the tenant
-slug, env, and the two pre-issued secrets:
+For a per-tenant instance via the example root, the only inputs that
+change are the tenant slug, env, and the two pre-issued secrets:
 
 ```bash
+cd terraform/litellm/aws/examples/default
 export TF_VAR_litellm_master_key="sk-..."   # the tenant's master key
 export TF_VAR_litellm_license="lic-..."     # their LITELLM_LICENSE
 
@@ -146,6 +186,22 @@ terraform apply \
   -var "env=stage"
 ```
 
+To run *many* tenants from a single config, call the module with
+`for_each` instead of one root per tenant (see "Using as a module"):
+
+```hcl
+module "litellm" {
+  for_each = toset(["acme", "globex"])
+  source   = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+  tenant   = each.key
+  env      = "prod"
+  region   = "us-west-2"
+  azs      = ["us-west-2a", "us-west-2b"]
+}
+```
+
+This `for_each` form is only possible because the module declares no
+provider block; the original root-with-provider layout forbade it.
+
 Both `litellm_master_key` and `litellm_license` are optional:
 - Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
   value (trial/dev path).
@@ -159,14 +215,21 @@ example files.
 ## Quick start
 
 ```bash
-cd terraform/litellm/aws
+cd terraform/litellm/aws/examples/default
 cp terraform.tfvars.example terraform.tfvars
-# Edit: region, tenant, env, azs, *_image, proxy_config, gateway_extra_secrets.
+# Edit: region, tenant, env, azs, proxy_config, gateway_extra_secrets.
 
 terraform init
 terraform apply
 ```
 
+`examples/default/` is a thin root that configures the `aws` provider and
+calls the module (`../../`). It exposes a curated variable surface; for
+advanced knobs (per-component CPU/memory/workers, autoscaling, RDS/Redis
+sizing, per-component image pins) set them on the `module "litellm"` block
+in `examples/default/main.tf`, or call the module from your own config —
+see "Using as a module" below.
+
 That single apply provisions everything, runs the DB user bootstrap, runs the
 schema migration, and only then starts the gateway/backend services. When it
 returns, the stack is serving traffic.
@@ -179,6 +242,34 @@ aws secretsmanager get-secret-value \
   --query SecretString --output text
 ```
 
+## Using as a module
+
+The directory itself is a module with **no `provider` block** — the caller
+owns provider config. That means you can call it directly with `for_each`
+(many tenants from one config), `count` (conditional stacks), `depends_on`,
+an assume-role / aliased provider, etc.:
+
+```hcl
+provider "aws" {
+  region = "us-west-2"
+  assume_role { role_arn = "arn:aws:iam::111122223333:role/deployer" }
+}
+
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+
+  region = "us-west-2"
+  tenant = "acme"
+  env    = "prod"
+  azs    = ["us-west-2a", "us-west-2b"]
+  # ...any of the inputs in variables.tf...
+}
+```
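+
+The aliased-provider and `count` cases mentioned above could look like the
+following sketch. Passing `providers = { aws = aws.<alias> }` is standard
+Terraform syntax for handing an aliased configuration to a child module; the
+alias name, the role ARN, and the `enable_stack` toggle are hypothetical.
+
+```hcl
+variable "enable_stack" {
+  description = "Hypothetical toggle: set to false to skip the stack."
+  type        = bool
+  default     = true
+}
+
+# Aliased provider configuration that assumes a role in the target account.
+provider "aws" {
+  alias  = "tenant_account"
+  region = "us-west-2"
+
+  assume_role {
+    role_arn = "arn:aws:iam::111122223333:role/deployer"
+  }
+}
+
+module "litellm" {
+  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
+  count  = var.enable_stack ? 1 : 0 # conditional stack
+
+  # Aliased configurations are never inherited; pass them explicitly.
+  providers = {
+    aws = aws.tenant_account
+  }
+
+  region = "us-west-2"
+  tenant = "acme"
+  env    = "prod"
+  azs    = ["us-west-2a", "us-west-2b"]
+}
+```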
+
+Tags: the module applies its own `litellm:stack` / `managed-by` tags plus
+`var.tags` to every taggable resource. Any `default_tags` on your provider are
+merged in as well, with the resource-level tag winning on a key collision. Set
+org-wide tags there and per-deployment tags via the `tags` input.
+
 ## Image pulls
 
 The defaults pull from `ghcr.io/berriai/litellm-<component>:v1.86.0-dev`,
@@ -238,8 +329,8 @@ losing the contents.
 
 | File              | What's in it                                                          |
 | ----------------- | --------------------------------------------------------------------- |
-| `versions.tf`     | Terraform + provider version constraints                              |
-| `providers.tf`    | AWS provider (region + default tags)                                  |
+| `versions.tf`     | Terraform + `required_providers` constraints (module declares no provider config) |
+| `examples/default/` | Thin root: `aws` provider (with an optional `default_tags` slot for org-wide tags) + a call to the module. The one-command deploy path. |
 | `variables.tf`    | All input variables                                                   |
 | `locals.tf`       | Path-prefix lists for ALB routing (mirror of `helm/.../ingress.yaml`) |
 | `network.tf`      | VPC, subnets, IGW, NAT, route tables, security groups                 |

diff --git a/terraform/litellm/aws/alb.tf b/terraform/litellm/aws/alb.tf
@@ -6,6 +6,8 @@ resource "aws_lb" "this" {
   subnets            = aws_subnet.public[*].id
 
   idle_timeout = 120
+
+  tags = local.tags
 }
 
 locals {
@@ -35,6 +37,8 @@ resource "aws_lb_target_group" "gateway" {
   }
 
   deregistration_delay = 30
+
+  tags = local.tags
 }
 
 resource "aws_lb_target_group" "backend" {
@@ -54,6 +58,8 @@ resource "aws_lb_target_group" "backend" {
   }
 
   deregistration_delay = 30
+
+  tags = local.tags
 }
 
 resource "aws_lb_target_group" "ui" {
@@ -73,6 +79,8 @@ resource "aws_lb_target_group" "ui" {
   }
 
   deregistration_delay = 30
+
+  tags = local.tags
 }
 
 # HTTP listener. When TLS is enabled this only serves a permanent
@@ -106,6 +114,8 @@ resource "aws_lb_listener" "http" {
       error_message = "ALB has no HTTPS listener. Either set `acm_certificate_arn` to enable TLS, or set `allow_plaintext_alb = true` to opt into HTTP-only (trial / dev only)."
     }
   }
+
+  tags = local.tags
 }
 
 # HTTPS listener. Only created when an ACM cert ARN is supplied — terminates
@@ -122,6 +132,8 @@ resource "aws_lb_listener" "https" {
     type             = "forward"
     target_group_arn = aws_lb_target_group.backend.arn
   }
+
+  tags = local.tags
 }
 
 # UI exact paths (/, /favicon.ico, /ui) — priority 10.
@@ -139,6 +151,8 @@ resource "aws_lb_listener_rule" "ui_exact" {
       values = local.ui_exact_paths
     }
   }
+
+  tags = local.tags
 }
 
 # UI prefix paths (/_next/*, /litellm-asset-prefix/*, /assets/*, /ui/*) — priority 20.
@@ -156,6 +170,8 @@ resource "aws_lb_listener_rule" "ui_prefix" {
       values = local.ui_path_prefixes
     }
   }
+
+  tags = local.tags
 }
 
 # Gateway prefix rules — one per chunk-of-5 because ALB caps a path-pattern
@@ -176,4 +192,6 @@ resource "aws_lb_listener_rule" "gateway" {
       values = each.value
     }
   }
+
+  tags = local.tags
 }