terraform/litellm/README.md (107 changes: 99 additions, 8 deletions)
@@ -1,18 +1,34 @@
# LiteLLM Terraform stacks

Two self-contained, reusable Terraform **modules** that deploy the
**componentized** LiteLLM proxy — the gateway, backend, and UI as three
independent containers (see `helm/litellm/` for the canonical chart with the
same split).

Each module declares **no `provider` block of its own**, so it can be called
with `count` / `for_each` / `depends_on` and the caller controls region,
assume-role / impersonation, aliases, and `default_tags`. A ready-to-run root
that wires the provider lives at `<stack>/examples/default/` — that's the
one-command deploy path. To embed a stack in your own config, call the module
by source:

```hcl
module "litellm" {
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
  # ... inputs ...
}
```

| Stack | Compute | Database (writer + reader) | Cache | Object store | Public entrypoint |
| ------ | ----------- | ---------------------------------- | ----------- | ------------ | ------------------ |
| `aws/` | ECS Fargate | Aurora Postgres (IAM auth) | ElastiCache | S3 | Application LB |
| `gcp/` | Cloud Run | Cloud SQL Postgres (password auth) | Memorystore | GCS | External HTTPS LB |

Each stack creates its own VPC and managed data stores — from
`<stack>/examples/default/`, drop in a tfvars file and run `terraform apply`.
Both stacks support a typed `proxy_config` input (mirrors `helm/litellm`'s
`gateway.config.proxy_config`) and per-component extra env vars /
secret-manager refs.

## Components

@@ -147,6 +163,39 @@ against the backend image:
Run the migration job once after the first `terraform apply` and before the
gateway/backend services start serving traffic.

## Feature parity between stacks

The two modules expose the same conceptual surface; concrete inputs differ
only where the underlying cloud forces it.

| Capability | AWS input(s) | GCP input(s) |
| -------------------------------- | ------------------------------------------------------- | --------------------------------------------------------- |
| Tenant + env naming | `tenant`, `env` | `tenant`, `env` |
| Pre-shared master key / license | `litellm_master_key`, `litellm_license` | `litellm_master_key`, `litellm_license` |
| UI admin password | `ui_password` | `ui_password` |
| Per-deployment tags / labels | `tags` (`map(string)`) | `labels` (`map(string)`) |
| TLS posture | `acm_certificate_arn`, `allow_plaintext_alb` | `lb_domains`, `allow_plaintext_lb` |
| Force destroy of object store | `s3_force_destroy` | `gcs_force_destroy` |
| Database teardown safeguard      | `skip_final_snapshot`                                   | `cloudsql_deletion_protection`                            |
| `proxy_config` (typed YAML map) | `proxy_config` | `proxy_config` |
| Extra plain env per component | `gateway_extra_env`, `backend_extra_env` | `gateway_extra_env`, `backend_extra_env` |
| Extra secret-backed env | `gateway_extra_secrets`, `backend_extra_secrets` (ARNs) | `gateway_extra_secrets`, `backend_extra_secrets` (resource IDs) |
| Uvicorn `--workers` on gateway | `gateway_num_workers` | `gateway_num_workers` |
| OpenTelemetry v2 (opt-in) | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret_arn` | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret` |
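
As a sketch of the "Extra secret-backed env" row, this is how `gateway_extra_secrets` might look in the AWS example's tfvars. It assumes the input is a `map(string)` from env-var name to secret reference (check each stack's `variables.tf` for the real type); the account ID, secret names, and project are placeholders.

```hcl
# aws/examples/default/terraform.tfvars: values are Secrets Manager ARNs.
gateway_extra_secrets = {
  OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
}

# The GCP stack takes the same shape, but each value is a Secret Manager
# resource ID instead of an ARN (exact ID format assumed), e.g.:
#   OPENAI_API_KEY = "projects/my-project/secrets/openai-api-key"
```
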

Each module stamps its own stack-identity tag (`litellm:stack` on AWS,
`litellm-stack` on GCP, since GCP label keys forbid colons) plus
`managed-by = "terraform"` onto every taggable / labelable resource and
merges `var.tags` / `var.labels` on top. On AWS, provider `default_tags`
are combined with these, but resource-level tags win on a key conflict, so
`var.tags` can override an org-wide default.
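
The module-side merge order can be sketched with Terraform's built-in `merge()`, where later arguments win on key conflicts. This is a minimal illustration, not the module's actual `locals.tf`; the `litellm:stack` value shown is a hypothetical placeholder.

```hcl
locals {
  # Stack-identity and ownership tags stamped by the module itself.
  base_tags = {
    "litellm:stack" = "${var.tenant}-litellm-${var.env}" # hypothetical value
    "managed-by"    = "terraform"
  }

  # merge() gives later maps precedence, so caller-supplied var.tags
  # override the module defaults. Resources then set `tags = local.tags`.
  tags = merge(local.base_tags, var.tags)
}
```
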

OTel is opt-in on both clouds: leave `otel_endpoint` empty and nothing
OTel-related is added to the container env; set it and both gateway and
backend get `LITELLM_OTEL_V2=true` plus the full `OTEL_*` block, with
`OTEL_SERVICE_NAME` stamped per component
(`<tenant>-litellm-<env>-gateway` and `-backend`). Any `OTEL_*` key set
in `gateway_extra_env` / `backend_extra_env` wins for that service.
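
The opt-in gate and the "extra env wins" rule can be sketched as a filtered map merged underneath the per-component extras. This is an assumption-laden illustration, not the module's code: it assumes `gateway_extra_env` is a `map(string)` and that `otel_endpoint` defaults to `""`, and the `OTEL_ENDPOINT` / `OTEL_EXPORTER` / `OTEL_ENVIRONMENT_NAME` key names are guesses based on LiteLLM's OTel settings.

```hcl
locals {
  name_prefix = "${var.tenant}-litellm-${var.env}"

  # The for-expression filter yields an empty map when otel_endpoint is
  # unset, so no OTel keys reach the container. It also avoids the
  # inconsistent-type error a `cond ? {} : { ... }` expression can hit.
  gateway_otel_env = {
    for k, v in {
      LITELLM_OTEL_V2       = "true"
      OTEL_ENDPOINT         = var.otel_endpoint # assumed key name
      OTEL_EXPORTER         = var.otel_exporter # assumed key name
      OTEL_ENVIRONMENT_NAME = coalesce(var.otel_environment_name, var.env)
      OTEL_SERVICE_NAME     = "${local.name_prefix}-gateway"

      OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT = var.otel_capture_message_content
    } : k => v if var.otel_endpoint != ""
  }

  # Extras are merged last, so any OTEL_* key in gateway_extra_env wins.
  gateway_env = merge(local.gateway_otel_env, var.gateway_extra_env)
}
```
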

## What's not included

- TLS certificates / custom domains. Both stacks expose plain-HTTP load
@@ -156,4 +205,46 @@ gateway/backend services start serving traffic.
backend block to `versions.tf` when graduating to a team environment.
- Observability beyond the cloud provider's defaults (CloudWatch logs on
AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse
via the `*_extra_env` variables, or turn on OTel v2 (see the parity
table above).

## HCP Terraform no-code (1-click) deploy

Both stacks are publishable as no-code modules in HCP Terraform's private
registry. The end-user flow is: open the no-code launch URL, fill in a
few inputs, hit *Create workspace*, and HCP runs plan/apply against your
cloud account using a variable-set of credentials (static keys or
dynamic-credentials OIDC).

Required overrides the launcher must supply per stack:

- **AWS** (`terraform/litellm/aws`): `region`, `azs`, `tenant`, `env`.
  The image vars (`gateway_image`, `backend_image`, `ui_image`,
  `migrations_image`) can be left at their defaults: the GHCR images
  are publicly readable, so ECS Fargate pulls them without extra
  credentials.

- **GCP** (`terraform/litellm/gcp`): `project`, `tenant`, `env`, **and
  one of**:
  - `image_registry` pointed at an Artifact Registry **remote** repository
    backed by `https://ghcr.io` (e.g.
    `us-central1-docker.pkg.dev/<project>/litellm/berriai`), so Cloud Run
    pulls the four upstream `litellm-*` images through it; or
  - all four per-component `*_image` URIs pointing at images mirrored
    into a regular Artifact Registry repo.

  The defaults (`ghcr.io/berriai`) cause Cloud Run admission to reject
  the service spec, because Cloud Run only authenticates against Artifact
  Registry, `[region.]gcr.io`, or `docker.io`. See
  `terraform/litellm/gcp/README.md#image-pulls` for the
  `gcloud artifacts repositories create … --mode=remote-repository`
  command that sets up the passthrough repo (one-time, per project).
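
For illustration, the GCP inputs above written as `terraform.tfvars` entries (the no-code launcher asks for the same values). The project ID, tenant, and env are placeholders; the registry path follows the example pattern given above and assumes the remote repository has already been created.

```hcl
project = "my-project"
tenant  = "acme"
env     = "prod"

# Artifact Registry remote repository proxying https://ghcr.io; Cloud Run
# pulls the four litellm-* images through it.
image_registry = "us-central1-docker.pkg.dev/my-project/litellm/berriai"
```
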

What still requires a manual step regardless of HCP no-code:

- The one-off migration task. The stacks auto-run it via `local-exec`
during `terraform apply`, but that requires the `aws` / `gcloud` CLI
on the runner. HCP-hosted runners don't have them; use an HCP agent
pool with a custom image that includes the relevant CLI, or run the
command printed in the `migration_run_command` output by hand after
the first apply.
terraform/litellm/aws/README.md (109 changes: 100 additions, 9 deletions)
@@ -44,9 +44,12 @@ needs the `aws` CLI installed and authenticated.
### `proxy_config` (preferred)

Mirrors the helm chart's `gateway.config.proxy_config`. The map is YAML-encoded
and uploaded to S3 (`config/litellm-config.yaml` in the stack's bucket); the
gateway and backend container entrypoints download it to
`/tmp/litellm-config.yaml` at task start via boto3 and set `CONFIG_FILE_PATH`
to match. The S3 object's etag is wired into the task definition, so editing
`proxy_config` produces a new task-def revision and a rolling redeploy of both
services.
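
The etag-driven redeploy can be sketched as follows. This is a hypothetical illustration of the pattern, not the stack's code: `aws_s3_bucket.this` and the `PROXY_CONFIG_ETAG` env name are assumptions, while `yamlencode()` and the `aws_s3_object` `etag` attribute are standard.

```hcl
resource "aws_s3_object" "proxy_config" {
  bucket  = aws_s3_bucket.this.id # hypothetical bucket resource name
  key     = "config/litellm-config.yaml"
  content = yamlencode(var.proxy_config)
}

locals {
  # Editing proxy_config changes the object's etag. Putting the etag in the
  # container environment changes the container definition, which forces a
  # new task-definition revision and a rolling redeploy.
  proxy_config_env = {
    PROXY_CONFIG_ETAG = aws_s3_object.proxy_config.etag # hypothetical name
  }
}
```
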

```hcl
proxy_config = {
@@ -119,6 +122,42 @@ aws secretsmanager create-secret \
--secret-string "sk-proj-..."
```

### Observability (OpenTelemetry v2)

OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is
opt-in and gated entirely on `otel_endpoint`. Empty (default) and nothing
OTel-related is added to the container env. Set it and both gateway and
backend gain `LITELLM_OTEL_V2=true` plus the `OTEL_*` block, with
`OTEL_SERVICE_NAME` stamped per component (`${tenant}-litellm-${env}-gateway`
and `-backend`) so spans land tagged with the right hop. Any `OTEL_*` key
set in `gateway_extra_env` / `backend_extra_env` overrides the default for
that service.

```hcl
otel_endpoint         = "http://otel-collector.internal:4318"
otel_exporter         = "otlp_http" # or "otlp_grpc", "console"
otel_environment_name = "prod"      # defaults to var.env
```

For collectors that require an auth header, store the comma-separated
`key=value` string in Secrets Manager and reference it via
`otel_headers_secret_arn`. The stack automatically grants the execution
role `secretsmanager:GetSecretValue` on that ARN.

```hcl
otel_headers_secret_arn = "arn:aws:secretsmanager:us-west-2:111122223333:secret:honeycomb-otel-headers-AbCdEf"
```
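
A sketch of what that wiring might look like, assuming `otel_headers_secret_arn` defaults to `""` and that the secret is exposed as an `OTEL_HEADERS` env var (both assumptions, not confirmed by this README). The `name` / `valueFrom` shape is the standard ECS container-definition `secrets` entry.

```hcl
# Policy statement the execution role would need; attach it to that role.
data "aws_iam_policy_document" "otel_headers" {
  count = var.otel_headers_secret_arn == "" ? 0 : 1

  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [var.otel_headers_secret_arn]
  }
}

locals {
  # ECS resolves valueFrom at task start and injects the secret's value
  # into the container as the named env var.
  otel_secrets = var.otel_headers_secret_arn == "" ? [] : [
    { name = "OTEL_HEADERS", valueFrom = var.otel_headers_secret_arn }
  ]
}
```
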

`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` defaults to
`no_content`; flip `otel_capture_message_content = "prompt_and_completion"`
only after auditing what lands in the backend, since prompts and
completions are typically sensitive.

Vendor presets (Arize, Phoenix, Langfuse OTel, Weave, Langtrace, Levo,
AgentOps) live under `proxy_config.litellm_settings.callbacks` and are
orthogonal to the OTLP variables above; their credentials still go in
`*_extra_secrets`.
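
As an illustration of that split, a vendor preset enabled through `proxy_config` with its credentials supplied as secrets. The `langfuse_otel` callback name and the `LANGFUSE_*` env names are assumptions drawn from LiteLLM's Langfuse integration; confirm them against the LiteLLM docs for your version. The ARNs are placeholders, and the map shape of `gateway_extra_secrets` is assumed.

```hcl
proxy_config = {
  litellm_settings = {
    callbacks = ["langfuse_otel"]
  }
}

# Assumes gateway_extra_secrets maps env-var name -> Secrets Manager ARN.
gateway_extra_secrets = {
  LANGFUSE_PUBLIC_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:langfuse-public-AbCdEf"
  LANGFUSE_SECRET_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:langfuse-secret-AbCdEf"
}
```
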

## Tenant deployment

Every resource the stack creates is named `${tenant}-litellm-${env}` (or
@@ -132,10 +171,11 @@ pair differs:
| `acme` | `prod` | `acme-litellm-prod-master-key` |
| `globex` | `dev` | `globex-litellm-dev-license` |

For a per-tenant instance via the example root, the only inputs that
change are the tenant slug, env, and the two pre-issued secrets:

```bash
cd terraform/litellm/aws/examples/default
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE

@@ -146,6 +186,22 @@ terraform apply \
-var "env=stage"
```

To run *many* tenants from a single config, call the module with
`for_each` instead of one root per tenant (see "Using as a module"):

```hcl
module "litellm" {
  for_each = toset(["acme", "globex"])
  source   = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"

  tenant = each.key
  env    = "prod"
  region = "us-west-2"
  azs    = ["us-west-2a", "us-west-2b"]
}
```

(This `for_each` form is only possible because the module declares no
provider block; the original root-with-provider layout forbade it.)

Both `litellm_master_key` and `litellm_license` are optional:
- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
value (trial/dev path).
@@ -159,14 +215,21 @@ example files.
## Quick start

```bash
cd terraform/litellm/aws/examples/default
cp terraform.tfvars.example terraform.tfvars
# Edit: region, tenant, env, azs, proxy_config, gateway_extra_secrets.

terraform init
terraform apply
```

`examples/default/` is a thin root that configures the `aws` provider and
calls the module (`../../`). It exposes a curated variable surface; for
advanced knobs (per-component CPU/memory/workers, autoscaling, RDS/Redis
sizing, per-component image pins) set them on the `module "litellm"` block
in `examples/default/main.tf`, or call the module from your own config —
see "Using as a module" below.
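
A sketch of what setting an advanced knob in `examples/default/main.tf` might look like. It assumes the example root declares `region`, `tenant`, `env`, and `azs` variables; the real file may differ. `gateway_num_workers` is one of the module inputs listed in the parity table.

```hcl
provider "aws" {
  region = var.region
}

module "litellm" {
  source = "../../"

  region = var.region
  tenant = var.tenant
  env    = var.env
  azs    = var.azs

  # Advanced knob not in the example's curated variables: set it directly.
  gateway_num_workers = 4
}
```
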

That single apply provisions everything, runs the DB user bootstrap, runs the
schema migration, and only then starts the gateway/backend services. When it
returns, the stack is serving traffic.
@@ -179,6 +242,34 @@ aws secretsmanager get-secret-value \
--query SecretString --output text
```

## Using as a module

The directory itself is a module with **no `provider` block** — the caller
owns provider config. That means you can call it directly with `for_each`
(many tenants from one config), `count` (conditional stacks), `depends_on`,
an assume-role / aliased provider, etc.:

```hcl
provider "aws" {
  region = "us-west-2"

  assume_role {
    role_arn = "arn:aws:iam::111122223333:role/deployer"
  }
}

module "litellm" {
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"

  region = "us-west-2"
  tenant = "acme"
  env    = "prod"
  azs    = ["us-west-2a", "us-west-2b"]
  # ...any of the inputs in variables.tf...
}
```
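
For the `count` case, a minimal sketch of a conditional stack. `enable_litellm` is a hypothetical caller-side variable; `<tag>` and `<output>` are placeholders.

```hcl
variable "enable_litellm" {
  type    = bool
  default = true
}

module "litellm" {
  count  = var.enable_litellm ? 1 : 0
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"

  region = "us-west-2"
  tenant = "acme"
  env    = "dev"
  azs    = ["us-west-2a", "us-west-2b"]
}

# With count, outputs are addressed by index: module.litellm[0].<output>.
```
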

Tags: the module threads its own `litellm:stack` / `managed-by` / `var.tags`
onto every taggable resource. Any `default_tags` on your provider are added
alongside them (resource-level tags win on a key conflict), so set org-wide
tags there and per-deployment tags via the `tags` input.
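
A sketch of that split between provider-level and per-deployment tags. The tag keys and values here are placeholders.

```hcl
provider "aws" {
  region = "us-west-2"

  # Org-wide tags, applied by the provider to every AWS resource it manages.
  default_tags {
    tags = {
      CostCenter = "platform"
    }
  }
}

module "litellm" {
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"

  region = "us-west-2"
  tenant = "acme"
  env    = "prod"
  azs    = ["us-west-2a", "us-west-2b"]

  # Per-deployment tags, merged by the module into its own tag map.
  tags = {
    customer = "acme"
  }
}
```
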

## Image pulls

The defaults pull from `ghcr.io/berriai/litellm-<component>:v1.86.0-dev`,
@@ -238,8 +329,8 @@ losing the contents.

| File | What's in it |
| ----------------- | --------------------------------------------------------------------- |
| `versions.tf` | Terraform + `required_providers` constraints (module declares no provider config) |
| `examples/default/` | Thin root: `aws` provider (with an optional `default_tags` slot for org-wide tags) + a call to the module. The one-command deploy path. |
| `variables.tf` | All input variables |
| `locals.tf` | Path-prefix lists for ALB routing (mirror of `helm/.../ingress.yaml`) |
| `network.tf` | VPC, subnets, IGW, NAT, route tables, security groups |
terraform/litellm/aws/alb.tf (18 changes: 18 additions, 0 deletions)
@@ -6,6 +6,8 @@ resource "aws_lb" "this" {
  subnets = aws_subnet.public[*].id

  idle_timeout = 120

  tags = local.tags
}

locals {
@@ -35,6 +37,8 @@ resource "aws_lb_target_group" "gateway" {
  }

  deregistration_delay = 30

  tags = local.tags
}

resource "aws_lb_target_group" "backend" {
@@ -54,6 +58,8 @@ resource "aws_lb_target_group" "backend" {
  }

  deregistration_delay = 30

  tags = local.tags
}

resource "aws_lb_target_group" "ui" {
@@ -73,6 +79,8 @@ resource "aws_lb_target_group" "ui" {
  }

  deregistration_delay = 30

  tags = local.tags
}

# HTTP listener. When TLS is enabled this only serves a permanent
@@ -106,6 +114,8 @@ resource "aws_lb_listener" "http" {
      error_message = "ALB has no HTTPS listener. Either set `acm_certificate_arn` to enable TLS, or set `allow_plaintext_alb = true` to opt into HTTP-only (trial / dev only)."
    }
  }

  tags = local.tags
}

# HTTPS listener. Only created when an ACM cert ARN is supplied — terminates
@@ -122,6 +132,8 @@ resource "aws_lb_listener" "https" {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn
  }

  tags = local.tags
}

# UI exact paths (/, /favicon.ico, /ui) — priority 10.
@@ -139,6 +151,8 @@ resource "aws_lb_listener_rule" "ui_exact" {
      values = local.ui_exact_paths
    }
  }

  tags = local.tags
}

# UI prefix paths (/_next/*, /litellm-asset-prefix/*, /assets/*, /ui/*) — priority 20.
@@ -156,6 +170,8 @@ resource "aws_lb_listener_rule" "ui_prefix" {
      values = local.ui_path_prefixes
    }
  }

  tags = local.tags
}

# Gateway prefix rules — one per chunk-of-5 because ALB caps a path-pattern
@@ -176,4 +192,6 @@ resource "aws_lb_listener_rule" "gateway" {
      values = each.value
    }
  }

  tags = local.tags
}
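
The one-rule-per-chunk-of-5 approach described in the gateway comment can be sketched with Terraform's built-in `chunklist()`. This illustrates the pattern and is not the file's actual code: `local.gateway_path_prefixes`, `local.listener_arn`, the resource name, and the priority base of 100 are all hypothetical.

```hcl
locals {
  # Split the flat prefix list into groups of at most 5 patterns,
  # because an ALB listener rule caps its condition values at five.
  gateway_path_chunks = chunklist(local.gateway_path_prefixes, 5)
}

resource "aws_lb_listener_rule" "gateway_chunked" {
  # One rule per chunk, keyed by chunk index ("0", "1", ...).
  for_each = { for idx, chunk in local.gateway_path_chunks : tostring(idx) => chunk }

  listener_arn = local.listener_arn
  priority     = 100 + tonumber(each.key) # stays clear of the UI rules (10, 20)

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.gateway.arn
  }

  condition {
    path_pattern {
      values = each.value
    }
  }

  tags = local.tags
}
```
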