Text Embeddings API on Docker

Docker image to run a self-hosted text embeddings server, powered by Hugging Face Text Embeddings Inference (TEI). Provides an OpenAI-compatible /v1/embeddings API. Designed to be simple, private, and self-hosted.

Features:

  • OpenAI-compatible POST /v1/embeddings endpoint — any app using the OpenAI embeddings API can switch with a one-line change
  • Powered by Hugging Face TEI — a high-performance Rust-based embeddings server
  • Supports popular embedding models: BAAI/bge-small-en-v1.5, BAAI/bge-m3, nomic-embed-text-v1.5 and more
  • Model management via a helper script (embed_manage)
  • Text data stays on your server — no data sent to third parties
  • Offline/air-gapped mode — run without internet access using pre-cached models (EMBED_LOCAL_ONLY)
  • Automatically built and published via GitHub Actions
  • Persistent model cache via a Docker volume
  • Supported platform: linux/amd64

Tip: Whisper, Kokoro, Embeddings, LiteLLM, Ollama, and MCP Gateway can be used together to build a complete, self-hosted AI stack on your own server. See Docker AI Stack for ready-made configurations and pipeline examples.

Quick start

Use this command to set up a text embeddings server:

docker run \
    --name embeddings \
    --restart=always \
    -v embeddings-data:/var/lib/embeddings \
    -p 8000:8000 \
    -d hwdsl2/embeddings-server

Note: For internet-facing deployments, using a reverse proxy to add HTTPS is strongly recommended. In that case, also replace -p 8000:8000 with -p 127.0.0.1:8000:8000 in the docker run command above, to prevent direct access to the unencrypted port. Set EMBED_API_KEY in your env file when the server is accessible from the public internet.
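
For example, here is the Quick start command with the API port bound to localhost only:

docker run \
    --name embeddings \
    --restart=always \
    -v embeddings-data:/var/lib/embeddings \
    -p 127.0.0.1:8000:8000 \
    -d hwdsl2/embeddings-server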

The default model BAAI/bge-small-en-v1.5 (~130 MB) is downloaded and cached on first start. Check the logs to confirm the server is ready:

docker logs embeddings
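
If you are scripting the setup, one way to wait for startup is to poll the logs for the readiness message shown below (a minimal sketch):

# Wait (in 2-second intervals) until the server reports it is ready
until docker logs embeddings 2>&1 | grep -q "Text embeddings server is ready"; do
    sleep 2
done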

Once you see "Text embeddings server is ready", generate your first embeddings:

curl http://your_server_ip:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": "The quick brown fox", "model": "text-embedding-ada-002"}'

Response:

{"object":"list","data":[{"object":"embedding","embedding":[0.032,...,-0.017],"index":0}],"model":"BAAI/bge-small-en-v1.5","usage":{"prompt_tokens":5,"total_tokens":5}}

Requirements

  • A Linux server (local or cloud) with Docker installed
  • Supported architecture: amd64 (x86_64)
  • Minimum RAM: ~250 MB free for the default BAAI/bge-small-en-v1.5 model (see model table)
  • Internet access for the initial model download (the model is cached locally afterwards). Not required if using EMBED_LOCAL_ONLY=true with pre-cached models.

For internet-facing deployments, see Using a reverse proxy to add HTTPS.

Download

Get the trusted build from the Docker Hub registry:

docker pull hwdsl2/embeddings-server

Alternatively, you may download from Quay.io:

docker pull quay.io/hwdsl2/embeddings-server
docker image tag quay.io/hwdsl2/embeddings-server hwdsl2/embeddings-server

Supported platform: linux/amd64.

Environment variables

All variables are optional. Set EMBED_API_KEY to enable Bearer token authentication.

This Docker image uses the following variables, which can be declared in an env file (see example):

Variable         | Description                                                                                                                                          | Default
EMBED_MODEL      | HuggingFace model ID to use for embeddings. See model table for options.                                                                             | BAAI/bge-small-en-v1.5
EMBED_PORT       | HTTP port for the API (1–65535).                                                                                                                     | 8000
EMBED_API_KEY    | Optional Bearer token. If set, all API requests must include Authorization: Bearer <key>.                                                            | (not set)
EMBED_HF_TOKEN   | HuggingFace Hub token for accessing private or gated models. Not required for public models.                                                         | (not set)
EMBED_LOCAL_ONLY | When set to any non-empty value (e.g. true), disables all HuggingFace model downloads. For offline or air-gapped deployments with pre-cached models. | (not set)

Note: In your env file, you may enclose values in single quotes, e.g. VAR='value'. Do not add spaces around =. If you change EMBED_PORT, update the -p flag in the docker run command accordingly.
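
For illustration, an embed.env that changes the model and enables API key authentication might look like this (values are examples; embed.env.example is the authoritative template):

EMBED_MODEL='BAAI/bge-base-en-v1.5'
EMBED_API_KEY='your_secure_api_key'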

Example using an env file:

cp embed.env.example embed.env
# Edit embed.env with your settings, then:
docker run \
    --name embeddings \
    --restart=always \
    -v embeddings-data:/var/lib/embeddings \
    -v ./embed.env:/embed.env:ro \
    -p 8000:8000 \
    -d hwdsl2/embeddings-server

The env file is bind-mounted into the container, so changes are picked up on every restart without recreating the container.
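
For example, after editing embed.env, apply the changes with:

docker restart embeddings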

Alternatively, pass it with --env-file:

docker run \
    --name embeddings \
    --restart=always \
    -v embeddings-data:/var/lib/embeddings \
    -p 8000:8000 \
    --env-file=embed.env \
    -d hwdsl2/embeddings-server

Using docker-compose

cp embed.env.example embed.env
# Edit embed.env as needed, then:
docker compose up -d
docker logs embeddings

Example docker-compose.yml (already included):

services:
  embeddings:
    image: hwdsl2/embeddings-server
    container_name: embeddings
    restart: always
    ports:
      - "8000:8000/tcp"  # For a host-based reverse proxy, change to "127.0.0.1:8000:8000/tcp"
    volumes:
      - embeddings-data:/var/lib/embeddings
      - ./embed.env:/embed.env:ro

volumes:
  embeddings-data:

Note: For internet-facing deployments, using a reverse proxy to add HTTPS is strongly recommended. In that case, also change "8000:8000/tcp" to "127.0.0.1:8000:8000/tcp" in docker-compose.yml, to prevent direct access to the unencrypted port. Set EMBED_API_KEY in your env file when the server is accessible from the public internet.

API reference

The API is compatible with OpenAI's embeddings endpoint. Any application already calling https://api.openai.com/v1/embeddings can switch to self-hosted by setting:

OPENAI_BASE_URL=http://your_server_ip:8000
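
For example, in a shell, for a client that reads the standard OpenAI environment variables:

export OPENAI_BASE_URL=http://your_server_ip:8000
# Most OpenAI client libraries require a non-empty API key; the server
# only validates it when EMBED_API_KEY is set.
export OPENAI_API_KEY=your_api_key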

Generate embeddings

POST /v1/embeddings
Content-Type: application/json

Parameters:

Parameter | Type            | Required | Description
input     | string or array | Yes      | Text to embed. Pass a string for a single input or an array of strings for batch embedding.
model     | string          | Yes      | Pass any string (e.g. text-embedding-ada-002). The value is accepted for API compatibility; the active model set by EMBED_MODEL is always used.

Example — single input:

curl http://your_server_ip:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": "The quick brown fox", "model": "text-embedding-ada-002"}'

Example — batch input:

curl http://your_server_ip:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": ["First sentence", "Second sentence"], "model": "text-embedding-ada-002"}'

With API key authentication:

curl http://your_server_ip:8000/v1/embeddings \
    -H "Authorization: Bearer your_api_key" \
    -H "Content-Type: application/json" \
    -d '{"input": "Your text here", "model": "text-embedding-ada-002"}'

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.032, -0.018, ...],
      "index": 0
    }
  ],
  "model": "BAAI/bge-small-en-v1.5",
  "usage": { "prompt_tokens": 5, "total_tokens": 5 }
}
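
If EMBED_API_KEY is set, you can confirm that authentication is enforced by sending a request without the key and checking the status code (expected to be an auth error, typically HTTP 401):

curl -s -o /dev/null -w "%{http_code}\n" \
    http://your_server_ip:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": "test", "model": "text-embedding-ada-002"}'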

Model info

GET /info

Returns the active model ID, maximum input length, and server version.

curl http://your_server_ip:8000/info
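
To extract just the key fields, you can pipe the output to jq. The field names below (model_id, max_input_length, version) follow TEI's /info schema and are assumptions; inspect the raw output first if unsure:

curl -s http://your_server_ip:8000/info | jq '{model_id, max_input_length, version}'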

Interactive API docs

An interactive Swagger UI is available at:

http://your_server_ip:8000/docs

Persistent data

All server data is stored in the Docker volume (/var/lib/embeddings inside the container):

/var/lib/embeddings/
├── models--BAAI--bge-small-en-v1.5/   # Cached model files (downloaded from HuggingFace)
├── .port                # Active port (used by embed_manage)
├── .model               # Active model ID (used by embed_manage)
└── .server_addr         # Cached server IP (used by embed_manage)

Back up the Docker volume to preserve downloaded models. Models range from ~90 MB to ~1.3 GB and are only downloaded once; preserving the volume avoids re-downloading on container recreation.
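
One common way to back up a named volume is to archive its contents from a temporary container. A minimal sketch, which writes embeddings-data.tar.gz to the current directory:

docker run --rm \
    -v embeddings-data:/data \
    -v "$PWD":/backup \
    alpine tar czf /backup/embeddings-data.tar.gz -C /data .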

Managing the server

Use embed_manage inside the running container to inspect and manage the server.

Show server info:

docker exec embeddings embed_manage --showinfo

List recommended models:

docker exec embeddings embed_manage --listmodels

Pre-download a model:

docker exec embeddings embed_manage --pullmodel BAAI/bge-base-en-v1.5

Switching the model

To change the active model:

  1. (Optional but recommended) Pre-download the new model while the server is running:

    docker exec embeddings embed_manage --pullmodel BAAI/bge-base-en-v1.5
  2. Update EMBED_MODEL in your embed.env file (or add -e EMBED_MODEL=BAAI/bge-base-en-v1.5 to your docker run command).

  3. Restart the container:

    docker restart embeddings
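
After the restart, confirm the new model is active:

docker exec embeddings embed_manage --showinfo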

Recommended models:

Model                                  | Disk    | RAM (approx) | Notes
BAAI/bge-small-en-v1.5                 | ~130 MB | ~250 MB      | Fastest; English (default)
BAAI/bge-base-en-v1.5                  | ~440 MB | ~700 MB      | Good balance; English
BAAI/bge-large-en-v1.5                 | ~1.3 GB | ~2 GB        | High accuracy; English
BAAI/bge-m3                            | ~570 MB | ~1 GB        | Multilingual; cross-lingual retrieval
nomic-ai/nomic-embed-text-v1.5         | ~550 MB | ~1 GB        | Multilingual; long context (8192 tokens)
sentence-transformers/all-MiniLM-L6-v2 | ~90 MB  | ~200 MB      | Very small; fast; popular for semantic search

Tip: BAAI/bge-m3 and nomic-ai/nomic-embed-text-v1.5 are recommended for non-English or multilingual workloads. For English RAG pipelines, BAAI/bge-base-en-v1.5 offers a good accuracy-to-resource balance.

Models are cached in the /var/lib/embeddings Docker volume and only downloaded once. Any HuggingFace model supported by TEI can be used — see the TEI supported models list.

Using a reverse proxy

For internet-facing deployments, place a reverse proxy in front of the embeddings server to handle HTTPS termination. The server works without HTTPS on a local or trusted network, but HTTPS is recommended when the API endpoint is exposed to the internet.

Use one of the following addresses to reach the embeddings container from your reverse proxy:

  • embeddings:8000 — if your reverse proxy runs as a container in the same Docker network as the embeddings server (e.g. defined in the same docker-compose.yml).
  • 127.0.0.1:8000 — if your reverse proxy runs on the host and port 8000 is published (the default docker-compose.yml publishes it).

Example with Caddy running as a Docker container in the same Docker network (automatic TLS via Let's Encrypt):

Caddyfile:

embeddings.example.com {
  reverse_proxy embeddings:8000
}

Example with nginx (reverse proxy on the host):

server {
    listen 443 ssl;
    server_name embeddings.example.com;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass         http://127.0.0.1:8000;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_read_timeout 120s;
    }
}
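
Once the proxy is in place, the API is reachable over HTTPS, for example:

# Add -H "Authorization: Bearer your_api_key" if EMBED_API_KEY is set
curl https://embeddings.example.com/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": "The quick brown fox", "model": "text-embedding-ada-002"}'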

Set EMBED_API_KEY in your env file when the server is accessible from the public internet.

Update Docker image

To update the Docker image and container, first download the latest version:

docker pull hwdsl2/embeddings-server

If the Docker image is already up to date, you should see:

Status: Image is up to date for hwdsl2/embeddings-server:latest

Otherwise, it will download the latest version. Remove and re-create the container:

docker rm -f embeddings
# Then re-run the docker run command from Quick start with the same volume and port.

Your downloaded models are preserved in the embeddings-data volume.

Using with other AI services

The Whisper (STT), Embeddings, LiteLLM, Kokoro (TTS), Ollama (LLM), and MCP Gateway images can be combined to build a complete, self-hosted AI stack on your own server — from semantic document search and RAG to full voice I/O. Whisper, Kokoro, and Embeddings run fully locally. Ollama runs all LLM inference locally, so no data is sent to third parties. When using LiteLLM with external providers (e.g., OpenAI, Anthropic), your data will be sent to those providers.

Service       | Role                                                                          | Default port
Embeddings    | Converts text to vectors for semantic search and RAG                          | 8000
Whisper (STT) | Transcribes spoken audio to text                                              | 9000
LiteLLM       | AI gateway: routes requests to Ollama, OpenAI, Anthropic, and 100+ providers  | 4000
Kokoro (TTS)  | Converts text to natural-sounding speech                                      | 8880
Ollama (LLM)  | Runs local LLM models (llama3, qwen, mistral, etc.)                           | 11434
MCP Gateway   | Exposes AI services as MCP tools for AI assistants (Claude, Cursor, etc.)     | 3000

See also: Docker AI Stack — ready-made docker-compose configurations and pipeline examples. Learn more about deploying the full AI stack.

Technical details

  • Base image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest (Debian)
  • Embeddings engine: Hugging Face TEI (Rust-based, high-performance)
  • API: OpenAI-compatible /v1/embeddings endpoint (served directly by TEI)
  • Data directory: /var/lib/embeddings (Docker volume)
  • Model storage: HuggingFace Hub format inside the volume — downloaded once, reused on restarts
  • Model management: Python (huggingface_hub) for pre-download via embed_manage --pullmodel

License

Note: The software components inside the pre-built image (such as Hugging Face TEI and its dependencies) are under the respective licenses chosen by their copyright holders. As with any pre-built image, it is the user's responsibility to ensure that use of this image complies with the licenses of all software contained within.

Copyright (C) 2026 Lin Song
This work is licensed under the MIT License.

Hugging Face Text Embeddings Inference (TEI) is Copyright (C) Hugging Face, Inc., and is distributed under the Apache License 2.0.

This project is an independent Docker setup for Hugging Face TEI and is not affiliated with, endorsed by, or sponsored by Hugging Face, Inc.
