From the repo root, merge the base stack with the Windows CUDA llmservice file. Use the optional third file for the Qwen3.5 35B A3B GGUF preset; omit it for the default 9B chat weights (see README-llmservice.md).
9B (default chat GGUF + registry qwen3.5-9b-chat):

docker compose -f docker-compose.yml -f docker-compose.llmservice.windows.yml up -d --build

35B (overlay overrides llmservice-chat paths and tuning; use registry id qwen3.5-35b-a3b-chat in clients):

docker compose -f docker-compose.yml -f docker-compose.llmservice.windows.yml -f docker-compose.llmservice.windows.chat-35b.yml up -d --build

After switching presets, recreate llmservice-chat if the stack is already running (same -f list as above), e.g. ... up -d --force-recreate llmservice-chat. Rebuild llmservice-gateway when gateway code or models/registry/models.yaml schema expectations change.
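The preset choice above comes down to which -f files are passed, so it can be scripted. The following is a minimal sketch, not part of the repo: the `Preset` type and the helper names are hypothetical, while the file names and registry ids are copied from the commands above.

```typescript
// Hypothetical helper: builds `docker compose` arguments for a chat preset.
type Preset = "9b" | "35b";

const BASE_FILES: string[] = [
  "docker-compose.yml",
  "docker-compose.llmservice.windows.yml",
];
const OVERLAY_35B = "docker-compose.llmservice.windows.chat-35b.yml";

// Registry ids that clients should request for each preset.
const REGISTRY_IDS: Record<Preset, string> = {
  "9b": "qwen3.5-9b-chat",
  "35b": "qwen3.5-35b-a3b-chat",
};

// Returns ["-f", file, "-f", file, ...]; the 35B overlay is appended last
// so that it overrides the llmservice-chat settings from the base files.
function composeFileArgs(preset: Preset): string[] {
  const files = preset === "35b" ? [...BASE_FILES, OVERLAY_35B] : BASE_FILES;
  const args: string[] = [];
  for (const file of files) {
    args.push("-f", file);
  }
  return args;
}

// Arguments for the initial build-and-start command.
function upArgs(preset: Preset): string[] {
  return ["compose", ...composeFileArgs(preset), "up", "-d", "--build"];
}

// Arguments for recreating only llmservice-chat after switching presets.
function recreateChatArgs(preset: Preset): string[] {
  return [
    "compose",
    ...composeFileArgs(preset),
    "up",
    "-d",
    "--force-recreate",
    "llmservice-chat",
  ];
}

console.log(["docker", ...upArgs("35b")].join(" "));
console.log("registry id: " + REGISTRY_IDS["35b"]);
```

Keeping the file list in one place makes it harder to recreate llmservice-chat with a different -f list than the one used to start the stack.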
Compose stack for Qdrant, Qdrant MCP (SSE), a .NET-focused MCP (streamable HTTP), plus the reference filesystem and git MCP images. A small debug container (Alpine + curl + bash) is included for probing the network.
TypeScript CLI in [tools/file-indexer/](tools/file-indexer/README.md): add **indexer.config.json** to any repo, **watch** files with allow/deny globs and ignore files, run hook scripts, **sync-services** (Qdrant collection + mcp-router TOML), and **mcp status / mcp call** against the Compose services.
This workspace already has **[indexer.config.json](indexer.config.json)** at the repo root: Markdown and source files (see glob in config) run [tools/file-indexer/examples/openai-embed-qdrant.ts](tools/file-indexer/examples/openai-embed-qdrant.ts) via npx tsx to embed into Qdrant collection **INDEXER_QDRANT_COLLECTION** (default code-index, separate from **mcp-memories** / MiniLM). Set **OPENAI_API_KEY** and related vars in [.env](env.example), or point **OPENAI_BASE_URL** at a local OpenAI-compatible server (e.g. Ollama, or the Compose **openai-compat-stub** on port 1234) if your org blocks cloud embedding APIs. **Dockerfile** changes run [scripts/dockerfile-compose-rebuild.mjs](scripts/dockerfile-compose-rebuild.mjs) (see INDEXER_COMPOSE_* in [env.example](env.example)). From here:
cd C:\repos\assistants
npm install
npm run indexer:watch # watch + hooks
npm run indexer:sync # Qdrant collection + write orchestration/mcp-router.fragment.toml
npm run indexer:status # probe Qdrant + MCP URLs from config

Other repos (or a custom root):
npm run file-indexer:build
npm exec -w file-indexer -- file-indexer init --root C:\path\to\any\repo
npm exec -w file-indexer -- file-indexer watch --root C:\path\to\any\repo

For a global file-indexer command: npm install -g .\tools\file-indexer (after the build step), or put node C:\repos\assistants\tools\file-indexer\dist\cli.js in a shell alias.
Copy **[env.example](env.example)** to **.env** and keep **COMPOSE_PROJECT_NAME=mcp** so the Compose project name is **mcp** every time (not derived from the repo folder name). Container names use short names without the old **assistants-** prefix (e.g. **qdrant**, **mcp-debug**, **llmservice-gateway**). Networks and named volumes keep their existing names (**assistants-mcpnet**, **assistants-qdrant-storage**, …) so data is preserved. After this change, stop/remove old **assistants-*** containers if ports conflict, then **docker compose up** again.
| Service | Image / build | Ports (host) | Notes |
|---|---|---|---|
| qdrant | qdrant/qdrant | 6333, 6334 | Vector DB + gRPC |
| qdrant-mcp | ./docker/qdrant-mcp | 8000 | SSE MCP → Qdrant |
| dotnet-mcp | ./docker/dotnet-mcp | 8010 | dotnet restore/build/test in /workspace |
| llmservice-gateway (+ chat / embed) | ./docker/llmservice-gateway, ./docker/llmservice-llamacpp; merge **docker-compose.llmservice.windows.yml** or **docker-compose.llmservice.mac.yml** | 8888 | Real llama.cpp OpenAI **/v1**; see [README-llmservice.md](README-llmservice.md) |
| mcp-filesystem | mcp/filesystem:1.0.2 | none (stdio) | Profile stdio |
| mcp-git | mcp/git:latest | none (stdio) | Profile stdio |
| debug | ./docker/debug | none | docker compose exec debug bash (curl installed); repo ./debug mounted at **/debug** for validation scripts |

All application containers attach to the **mcpnet** bridge network.
- Qdrant: http://localhost:6333 (dashboard often at http://localhost:6333/dashboard)
- Qdrant MCP (SSE): http://localhost:8000. Use your client’s “remote MCP” / SSE URL (path is whatever the server prints on startup, often /sse).
- .NET MCP (streamable HTTP): http://localhost:8010/mcp. Tools: dotnet_restore, dotnet_build, dotnet_test. Mount your repo under **./workspace** (or bind a host path there) so paths stay under /workspace in the container.
- llmservice (optional): **http://localhost:8888/v1** serves local GGUF chat + embeddings via llama.cpp (does not use port 8000). See [README-llmservice.md](README-llmservice.md) and [docs/README.md](docs/README.md) (port map).

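Each HTTP service has two addresses: a localhost URL for clients on the host, and a Compose DNS name for containers on mcpnet. The sketch below is one hedged way to keep both in a single lookup table. `ENDPOINTS` and `endpointFor` are hypothetical names. The host URLs come from the list above; the in-network URLs are the ones this README uses elsewhere (http://qdrant:6333, http://qdrant-mcp:8000, http://dotnet-mcp:8010/mcp). No in-network URL is recorded for llmservice-gateway, because this README does not state its container port.

```typescript
// Hypothetical lookup table for host vs. in-network (Compose DNS) URLs.
type Scope = "host" | "compose";

interface Endpoint {
  host: string;
  compose?: string; // omitted when this README does not state the in-network URL
}

const ENDPOINTS: Record<string, Endpoint> = {
  qdrant: { host: "http://localhost:6333", compose: "http://qdrant:6333" },
  "qdrant-mcp": { host: "http://localhost:8000", compose: "http://qdrant-mcp:8000" },
  "dotnet-mcp": { host: "http://localhost:8010/mcp", compose: "http://dotnet-mcp:8010/mcp" },
  "llmservice-gateway": { host: "http://localhost:8888/v1" },
};

// Returns the URL for a service in the requested scope, or throws when the
// service is unknown or no URL is recorded for that scope.
function endpointFor(service: string, scope: Scope): string {
  const entry = ENDPOINTS[service];
  if (entry === undefined) {
    throw new Error("unknown service: " + service);
  }
  const url = scope === "compose" ? entry.compose : entry.host;
  if (url === undefined) {
    throw new Error("no " + scope + " URL recorded for " + service);
  }
  return url;
}

console.log(endpointFor("qdrant", "host"));
console.log(endpointFor("dotnet-mcp", "compose"));
```

A client on the host would ask for the "host" scope, while a script run from the debug container would ask for "compose".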
Put repositories and files you want exposed under **./workspace** so filesystem / git stdio containers see /workspace.
- Copy **env.example** to **.env** at the repo root when you need to override defaults.
- Qdrant and Qdrant MCP → Qdrant use plain HTTP on the Compose network (http://qdrant:6333). No API key is configured by default.

Upstream mcp-server-qdrant only supports FastEmbed. By default this repo’s **docker compose build** downloads the same MiniLM tarball FastEmbed uses from Google Cloud Storage (qdrant-fastembed/…) and installs it under **/root/.cache/fastembed/fast-all-MiniLM-L6-v2** — no Hugging Face at build time. Compose sets **FASTEMBED_CACHE_PATH=/root/.cache/fastembed** because FastEmbed’s default is otherwise **/tmp/fastembed_cache**, which would ignore the baked model.
- Custom CA / TLS: optional PEMs in **certs/**. **ca.crt** (stack) and **extra-ca.crt** (e.g. corporate root) are merged at build and runtime for **qdrant-mcp**, and at runtime for **mcp-orchestration** (catalog HTTPS). If the GCS download still fails TLS during **docker compose build**, set **HF_HUB_TLS_INSECURE=1** in **.env** (passes **curl -k** for that step only). At runtime, **HF_HUB_DISABLE_SSL_VERIFY=1** remains the last resort for any remaining HTTPS (hf_tls.py + certifi).
- Skip bake: **SKIP_EMBEDDING_PREFETCH=1** in **.env** skips the GCS RUN (for air-gapped builds). You must then build on a machine that can run that step once, or inject a model directory yourself (see below).

If GCS is blocked at build time (403 / timeout), set **SKIP_EMBEDDING_PREFETCH=1**, build the image, then either:
- Copy the baked layer from an image built elsewhere (docker create + docker cp from /root/.cache/fastembed), or
- Use a compose override to bind-mount a pre-filled **fast-all-MiniLM-L6-v2** directory onto **/root/.cache/fastembed/fast-all-MiniLM-L6-v2**.

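Both fallbacks depend on the model directory ending up where the cache path points. The sketch below only mirrors the path arithmetic described in this section; it is not FastEmbed's own lookup code, and the function names are hypothetical. It shows why leaving **FASTEMBED_CACHE_PATH** unset would miss the baked (or bind-mounted) model.

```typescript
// Hypothetical illustration of how the cache path and model directory combine.
type CacheEnv = Record<string, string | undefined>;

const FASTEMBED_DEFAULT_CACHE = "/tmp/fastembed_cache"; // default named in this README
const MODEL_DIR_NAME = "fast-all-MiniLM-L6-v2";
const BAKED_MODEL_DIR = "/root/.cache/fastembed/fast-all-MiniLM-L6-v2";

// Joins the effective cache root with the model directory name,
// trimming trailing slashes from the cache root first.
function modelDir(env: CacheEnv): string {
  const cache = env["FASTEMBED_CACHE_PATH"] || FASTEMBED_DEFAULT_CACHE;
  return cache.replace(/\/+$/, "") + "/" + MODEL_DIR_NAME;
}

// True when the effective model directory is the one baked into the image.
function usesBakedModel(env: CacheEnv): boolean {
  return modelDir(env) === BAKED_MODEL_DIR;
}

console.log(modelDir({}));
console.log(usesBakedModel({ FASTEMBED_CACHE_PATH: "/root/.cache/fastembed" }));
```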
For embeddings without this MCP server, see [tools/file-indexer/examples/openai-embed-qdrant.ts](tools/file-indexer/examples/openai-embed-qdrant.ts). If a proxy blocks hosted generative-AI URLs, use a local OpenAI-compatible endpoint (OPENAI_BASE_URL / model documented in [env.example](env.example)), remove the openai-embed-to-qdrant rule from [indexer.config.json](indexer.config.json), or ask IT for an exception.
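To make the endpoint switch concrete, here is a minimal sketch of resolving the embedding target from environment variables. It is not the code in openai-embed-qdrant.ts. `resolveEmbedTarget` is a hypothetical helper, and it assumes the script appends the standard OpenAI /embeddings path to **OPENAI_BASE_URL**. The local URL in the usage line assumes the stub on port 1234 serves under /v1.

```typescript
// Hypothetical resolver for the embedding endpoint and target collection.
type EmbedEnv = Record<string, string | undefined>;

interface EmbedTarget {
  embeddingsUrl: string;
  collection: string;
}

const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";
const DEFAULT_COLLECTION = "code-index"; // default for INDEXER_QDRANT_COLLECTION

// Falls back to the hosted API and the default collection when the
// corresponding variables are unset or empty.
function resolveEmbedTarget(env: EmbedEnv): EmbedTarget {
  const base = (env["OPENAI_BASE_URL"] || DEFAULT_OPENAI_BASE_URL).replace(/\/+$/, "");
  return {
    embeddingsUrl: base + "/embeddings",
    collection: env["INDEXER_QDRANT_COLLECTION"] || DEFAULT_COLLECTION,
  };
}

// Assumed local endpoint; adjust to whatever your OpenAI-compatible server exposes.
const local = resolveEmbedTarget({ OPENAI_BASE_URL: "http://localhost:1234/v1" });
console.log(local.embeddingsUrl + " -> " + local.collection);
```

Because the environment is passed in as a plain object, the same function can be checked against the values in **.env** without touching the network.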
- Full service catalog (ports, transports, env): [docs/README.md](docs/README.md) (per-service: [docs/](docs/)). VS Code + Continue (LM Studio–style base URL, sample config.yaml): [vscode-continue/](vscode-continue/).
- Qdrant MCP: configure as a remote MCP server using the SSE URL on port 8000 once the container is up.
- .NET MCP: configure URL **http://localhost:8010/mcp** if your client supports streamable HTTP MCP.
- Filesystem / Git: use stdio + Docker (docker compose run -i --rm …) per the Hub examples.

[docker-compose.yml](docker-compose.yml) starts task/state, Docker MCP Gateway (catalog), and gilwo/mcp-router with a plain docker compose up -d (no extra profile).
| Service | Role | Host |
|---|---|---|
| mcp-task-state | Task Orchestrator: persistent tasks / work graph (stdio) | none (use compose run) |
| mcp-orchestration | Docker MCP Gateway (image built from [docker/mcp-gateway/](docker/mcp-gateway/Dockerfile)); runs MCP servers from the catalog via **docker.sock** | streaming MCP on 8820 |
| mcp-agent-router | gilwo/mcp-router: aggregates many MCP servers behind one streamable-HTTP endpoint | MCP 9914 (/mcp), dashboard 9915 |

Orchestration gateway — mounting **/var/run/docker.sock** is powerful (containers on your engine). Set MCP_GATEWAY_AUTH_TOKEN in **.env** when exposing port 8820. For Docker Desktop quirks, see upstream notes on DOCKER_MCP_IN_CONTAINER.
Agent router — set **MCP_ROUTER_MASTER_KEY** (required) and optionally **MCP_ROUTER_ADMIN_KEY** in **.env** (see **env.example**). Upstream stores these in ~/.mcp-router/.env when run bare; with Compose they must be passed as environment variables. Configure downstream servers in /app/config, persisted on the host as **[orchestration/mcp-router-config/](orchestration/mcp-router-config)** (gitignored). From inside the stack, HTTP MCP services are reachable by Compose DNS (e.g. http://qdrant-mcp:8000/..., http://dotnet-mcp:8010/mcp); from the router container you can also use http://host.docker.internal:… for ports bound on the host. See the mcp-router docs for config.toml and auth keys.
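Since the router key is required and the gateway token matters only when port 8820 is exposed, a small pre-flight check can catch a missing variable before `docker compose up`. This is a sketch under stated assumptions: `checkOrchestrationEnv` and its report shape are hypothetical, and only the variable names come from this README.

```typescript
// Hypothetical pre-flight check for the orchestration-related variables.
type RouterEnv = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[]; // variables the stack cannot run without
  warnings: string[]; // optional or situational variables worth a look
}

function checkOrchestrationEnv(env: RouterEnv, exposeGatewayPort: boolean): EnvReport {
  const missingRequired: string[] = [];
  const warnings: string[] = [];

  if (!env["MCP_ROUTER_MASTER_KEY"]) {
    missingRequired.push("MCP_ROUTER_MASTER_KEY");
  }
  if (!env["MCP_ROUTER_ADMIN_KEY"]) {
    warnings.push("MCP_ROUTER_ADMIN_KEY is optional and not set");
  }
  if (exposeGatewayPort && !env["MCP_GATEWAY_AUTH_TOKEN"]) {
    warnings.push("port 8820 is exposed without MCP_GATEWAY_AUTH_TOKEN");
  }
  return { missingRequired, warnings };
}

const report = checkOrchestrationEnv({ MCP_ROUTER_MASTER_KEY: "example-key" }, true);
console.log(JSON.stringify(report));
```

In practice the first argument would be the parsed contents of **.env**; it is a plain object here so the check stays independent of how the file is loaded.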
Optional workflow YAML: add files under [orchestration/taskorchestrator/](orchestration/taskorchestrator) (see upstream docs). The task-state container sets AGENT_CONFIG_DIR=/project with that folder mounted read-only at /project/.taskorchestrator.
See [debug/README.md](debug/README.md) and run .\debug\run-all.ps1 from the repo root.
cd C:\repos\assistants
Copy-Item env.example .env # optional
docker compose up -d --build

Run this after the containers are up to snapshot CPU%, RAM, and (if available) GPU VRAM usage for llmservice-chat and llmservice-embed.
$chat = (docker ps --filter "name=llmservice-chat" --format "{{.ID}}" | Select-Object -First 1)
$embed = (docker ps --filter "name=llmservice-embed" --format "{{.ID}}" | Select-Object -First 1)
docker stats --no-stream --format "llmservice-chat CPU={{.CPUPerc}} MEM={{.MemUsage}}" $chat
docker stats --no-stream --format "llmservice-embed CPU={{.CPUPerc}} MEM={{.MemUsage}}" $embed
# GPU memory (requires `nvidia-smi` in the container)
docker exec $chat nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader
docker exec $embed nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader
# Per-process GPU memory (useful to see the model process)
docker exec $chat nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
docker exec $embed nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader

From the debug container, smoke-test in-network reachability:
docker compose exec debug bash
curl -sS http://qdrant:6333/healthz
curl -sS -o /dev/null -w "%{http_code}" http://qdrant-mcp:8000/

The official mcp/filesystem and mcp/git images speak MCP over stdin/stdout. They are not meant to stay up detached like HTTP services. This compose file defines them with profile **stdio** so they are not started by a plain docker compose up -d.
From the repo root:
docker compose --profile stdio run --rm -it mcp-filesystem
docker compose --profile stdio run --rm -it mcp-git

Point your MCP client at the same docker compose run … pattern (or plain docker run -i with the same image and volume flags) as in the filesystem and git image docs.
docker compose run --rm -it mcp-task-state