Assistants — Docker MCP stack

llmservice (Windows): 9B vs 35B chat model

From the repo root, merge the base stack with the Windows CUDA llmservice file. Use the optional third file for the Qwen3.5 35B A3B GGUF preset; omit it for the default 9B chat weights (see README-llmservice.md).

9B (default chat GGUF + registry qwen3.5-9b-chat):

docker compose -f docker-compose.yml -f docker-compose.llmservice.windows.yml up -d --build

35B (overlay overrides llmservice-chat paths and tuning; use registry id qwen3.5-35b-a3b-chat in clients):

docker compose -f docker-compose.yml -f docker-compose.llmservice.windows.yml -f docker-compose.llmservice.windows.chat-35b.yml up -d --build

After switching presets, recreate llmservice-chat if the stack is already running (same -f list as above), e.g. ... up -d --force-recreate llmservice-chat. Rebuild llmservice-gateway when gateway code or models/registry/models.yaml schema expectations change.
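The file-merging rule above can be sketched as a small helper that builds the `docker compose` argument list for each preset. The compose file names come from this README; the `Preset` type and the `composeArgs` function are hypothetical names used only for illustration.

```typescript
// Illustrative helper: the file names are from this README, everything else is hypothetical.
type Preset = "9b" | "35b";

const BASE_FILES: string[] = [
  "docker-compose.yml",
  "docker-compose.llmservice.windows.yml",
];
const OVERLAY_35B = "docker-compose.llmservice.windows.chat-35b.yml";

/** Build the argument list passed to `docker` for a preset plus a trailing subcommand. */
function composeArgs(preset: Preset, ...command: string[]): string[] {
  const files = preset === "35b" ? [...BASE_FILES, OVERLAY_35B] : BASE_FILES;
  const args: string[] = ["compose"];
  // Each file becomes a `-f <file>` pair; later files override earlier ones.
  for (const file of files) {
    args.push("-f", file);
  }
  return [...args, ...command];
}

// Default 9B stack, and a recreate of llmservice-chat after switching to 35B.
const up9b = composeArgs("9b", "up", "-d", "--build");
const recreate35b = composeArgs("35b", "up", "-d", "--force-recreate", "llmservice-chat");
```

Keeping the `-f` list in one place makes it harder to recreate `llmservice-chat` with a different file set than the one the stack was started with.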

Compose stack for Qdrant, Qdrant MCP (SSE), a .NET-focused MCP (streamable HTTP), plus the reference filesystem and git MCP images. A small debug container (Alpine + curl + bash) is included for probing the network.

Portable tooling (`file-indexer`)

TypeScript CLI in [tools/file-indexer/](tools/file-indexer/README.md): add **indexer.config.json** to any repo, **watch** files with allow/deny globs and ignore files, run hook scripts, **sync-services** (Qdrant collection + mcp-router TOML), and **mcp status / mcp call** against the Compose services.

This workspace already has **[indexer.config.json](indexer.config.json)** at the repo root. Changes to Markdown and source files (see the globs in the config) trigger [tools/file-indexer/examples/openai-embed-qdrant.ts](tools/file-indexer/examples/openai-embed-qdrant.ts) via npx tsx, which embeds them into the Qdrant collection named by **INDEXER_QDRANT_COLLECTION** (default code-index, separate from **mcp-memories** / MiniLM). Set **OPENAI_API_KEY** and related vars in [.env](env.example), or point **OPENAI_BASE_URL** at a local OpenAI-compatible server (e.g. Ollama, or the Compose **openai-compat-stub** on port 1234) if your org blocks cloud embedding APIs. Changes to a **Dockerfile** trigger [scripts/dockerfile-compose-rebuild.mjs](scripts/dockerfile-compose-rebuild.mjs) (see INDEXER_COMPOSE_* in [env.example](env.example)). From here:

cd C:\repos\assistants
npm install
npm run indexer:watch       # watch + hooks
npm run indexer:sync        # Qdrant collection + write orchestration/mcp-router.fragment.toml
npm run indexer:status      # probe Qdrant + MCP URLs from config
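To make the environment variables mentioned above more concrete, here is a rough sketch of how an embedding hook could resolve its target from them. It is not the logic of the bundled example script: the function name, the fallback to the public OpenAI base URL, and the `/v1` suffix on the local stub URL are assumptions.

```typescript
// Hypothetical resolver; only the variable names and the code-index default come from this README.
type IndexerEnv = Record<string, string | undefined>;

interface EmbedTarget {
  embeddingsUrl: string; // OpenAI-compatible embeddings endpoint
  collection: string;    // Qdrant collection to write into
}

function resolveEmbedTarget(env: IndexerEnv): EmbedTarget {
  // Assumption: fall back to the public OpenAI API when OPENAI_BASE_URL is unset or empty.
  const base = (env.OPENAI_BASE_URL || "https://api.openai.com/v1").replace(/\/+$/, "");
  return {
    embeddingsUrl: `${base}/embeddings`,
    collection: env.INDEXER_QDRANT_COLLECTION || "code-index",
  };
}

// Cloud default versus a local stub on port 1234 (the "/v1" path is an assumption).
const cloudTarget = resolveEmbedTarget({});
const localTarget = resolveEmbedTarget({ OPENAI_BASE_URL: "http://localhost:1234/v1/" });
```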

Other repos (or a custom root):

npm run file-indexer:build
npm exec -w file-indexer -- file-indexer init --root C:\path\to\any\repo
npm exec -w file-indexer -- file-indexer watch --root C:\path\to\any\repo

For a global file-indexer command: npm install -g .\tools\file-indexer (after the build step), or put node C:\repos\assistants\tools\file-indexer\dist\cli.js in a shell alias.

Configuration

Compose project name and container names

Copy **[env.example](env.example)** to **.env** and keep **COMPOSE_PROJECT_NAME=mcp** so the Compose project name is always **mcp** rather than being derived from the repo folder name. Containers use short names without the old **assistants-** prefix (e.g. **qdrant**, **mcp-debug**, **llmservice-gateway**). Networks and named volumes keep their existing names (**assistants-mcpnet**, **assistants-qdrant-storage**, …) so data is preserved. After this change, stop and remove any old containers that still carry the **assistants-** prefix if ports conflict, then run **docker compose up** again.

Services and ports

| Service | Image / build | Ports (host) | Notes |
| --- | --- | --- | --- |
| `qdrant` | `qdrant/qdrant` | 6333, 6334 | Vector DB + gRPC |
| `qdrant-mcp` | `./docker/qdrant-mcp` | 8000 | SSE MCP → Qdrant |
| `dotnet-mcp` | `./docker/dotnet-mcp` | 8010 | `dotnet` restore/build/test in `/workspace` |
| `llmservice-gateway` (+ chat / embed) | `./docker/llmservice-gateway`, `./docker/llmservice-llamacpp`; merge `docker-compose.llmservice.windows.yml` or `docker-compose.llmservice.mac.yml` | 8888 | Real llama.cpp OpenAI `/v1`; see [README-llmservice.md](README-llmservice.md) |
| `mcp-filesystem` | `mcp/filesystem:1.0.2` | none (stdio) | Profile `stdio` |
| `mcp-git` | `mcp/git:latest` | none (stdio) | Profile `stdio` |
| `debug` | `./docker/debug` | none | `docker compose exec debug bash` (`curl` installed); repo `./debug` mounted at `/debug` for validation scripts |

All application containers attach to the **mcpnet** bridge network.

Host URLs and workspace

Qdrant: http://localhost:6333 (dashboard often at http://localhost:6333/dashboard)
Qdrant MCP (SSE): http://localhost:8000 — use your client’s “remote MCP” / SSE URL (path is whatever the server prints on startup, often /sse).
.NET MCP (streamable HTTP): http://localhost:8010/mcp — tools: dotnet_restore, dotnet_build, dotnet_test. Mount your repo under **./workspace** (or bind a host path there) so paths stay under /workspace in the container.
llmservice (optional): **http://localhost:8888/v1** — local GGUF chat + embeddings via llama.cpp (does not use port 8000). See [README-llmservice.md](README-llmservice.md) and [docs/README.md](docs/README.md) (port map).
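The host URLs listed above map onto in-network URLs by swapping `localhost` for the Compose service name. The sketch below shows that mapping for the three services whose in-network addresses this README states elsewhere; the interface and function names are hypothetical, and llmservice is left out because its in-network port is not given here.

```typescript
// Illustrative mapping; URLs and service names are from this README, the helper is hypothetical.
interface HostEndpoint {
  service: string; // Compose service name, also its DNS name on the Compose network
  url: string;     // URL as seen from the host
}

const HOST_ENDPOINTS: HostEndpoint[] = [
  { service: "qdrant", url: "http://localhost:6333" },
  { service: "qdrant-mcp", url: "http://localhost:8000" },
  { service: "dotnet-mcp", url: "http://localhost:8010/mcp" },
];

/** Rewrite a host URL into the URL another container on the Compose network would use. */
function toInNetworkUrl(endpoint: HostEndpoint): string {
  // Assumes the container port equals the published host port, as it does for these three.
  return endpoint.url.replace("//localhost", `//${endpoint.service}`);
}

const inNetworkUrls = HOST_ENDPOINTS.map((e) => toInNetworkUrl(e));
```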

Put repositories and files you want exposed under **./workspace** so filesystem / git stdio containers see /workspace.

Environment and certificates

Copy **env.example** to **.env** at the repo root when you need to override defaults.
Qdrant itself, and the connection from Qdrant MCP to Qdrant, use plain HTTP on the Compose network (http://qdrant:6333). No API key is configured by default.

FastEmbed model (baked in the image)

Upstream mcp-server-qdrant only supports FastEmbed. By default this repo’s **docker compose build** downloads the same MiniLM tarball FastEmbed uses from Google Cloud Storage (qdrant-fastembed/…) and installs it under **/root/.cache/fastembed/fast-all-MiniLM-L6-v2** — no Hugging Face at build time. Compose sets **FASTEMBED_CACHE_PATH=/root/.cache/fastembed** because FastEmbed’s default is otherwise **/tmp/fastembed_cache**, which would ignore the baked model.
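The reason for setting the cache path can be illustrated with a simplified model of the lookup. This sketch assumes the model is resolved as `<cache dir>/<model dir>`, which is a simplification and not FastEmbed's actual code; the function names are hypothetical, while the paths are the ones quoted above.

```typescript
// Simplified model of the cache lookup; not FastEmbed's real implementation.
type Env = Record<string, string | undefined>;

const FASTEMBED_DEFAULT_CACHE = "/tmp/fastembed_cache";
const MODEL_DIR_NAME = "fast-all-MiniLM-L6-v2";
const BAKED_MODEL_DIR = "/root/.cache/fastembed/fast-all-MiniLM-L6-v2";

/** Directory where the model would be looked up for a given environment. */
function expectedModelDir(env: Env): string {
  const cache = (env.FASTEMBED_CACHE_PATH || FASTEMBED_DEFAULT_CACHE).replace(/\/+$/, "");
  return `${cache}/${MODEL_DIR_NAME}`;
}

/** True when the lookup lands on the directory baked into the image. */
function usesBakedModel(env: Env): boolean {
  return expectedModelDir(env) === BAKED_MODEL_DIR;
}

const withComposeEnv = usesBakedModel({ FASTEMBED_CACHE_PATH: "/root/.cache/fastembed" });
const withoutEnv = usesBakedModel({});
```

With the variable set as in Compose the lookup hits the baked directory; without it the lookup points at the default cache, where no baked model exists.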

Custom CA / TLS: optional PEMs in **certs/** (**ca.crt** for the stack and **extra-ca.crt** for e.g. a corporate root) are merged at build and runtime for **qdrant-mcp**, and at runtime for **mcp-orchestration** (catalog HTTPS). If the GCS download still fails TLS during **docker compose build**, set **HF_HUB_TLS_INSECURE=1** in **.env** (passes **curl -k** for that step only). At runtime, **HF_HUB_DISABLE_SSL_VERIFY=1** remains the last resort for any remaining HTTPS (hf_tls.py + certifi).
Skip bake: **SKIP_EMBEDDING_PREFETCH=1** in **.env** skips the GCS RUN (for air-gapped builds). You must then build on a machine that can run that step once, or inject a model directory yourself (see below).

Air-gapped runtime or blocked GCS

If GCS is blocked at build time (403 / timeout), set **SKIP_EMBEDDING_PREFETCH=1**, build the image, then either:

Copy the baked layer from an image built elsewhere (docker create + docker cp from /root/.cache/fastembed), or
Use a compose override to bind-mount a pre-filled **fast-all-MiniLM-L6-v2** directory onto **/root/.cache/fastembed/fast-all-MiniLM-L6-v2**.
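The bind-mount option could look roughly like the override generated below. The service name and container path are taken from this README; the host directory `./prefilled/fast-all-MiniLM-L6-v2` and the helper name are hypothetical. Because JSON is also valid YAML, the printed output should be usable as an extra `-f` file, but treat that as an assumption to verify against your Compose version.

```typescript
// Sketch of a Compose override object; the host path is a placeholder, not a path in this repo.
const MODEL_DIR_NAME = "fast-all-MiniLM-L6-v2";
const CONTAINER_CACHE = "/root/.cache/fastembed";

interface ComposeOverride {
  services: Record<string, { volumes: string[] }>;
}

/** Build an override that bind-mounts a pre-filled model directory into qdrant-mcp. */
function buildModelOverride(hostModelDir: string): ComposeOverride {
  return {
    services: {
      "qdrant-mcp": {
        // Short volume syntax: <host path>:<container path>
        volumes: [`${hostModelDir}:${CONTAINER_CACHE}/${MODEL_DIR_NAME}`],
      },
    },
  };
}

const modelOverride = buildModelOverride("./prefilled/fast-all-MiniLM-L6-v2");
const modelOverrideJson = JSON.stringify(modelOverride, null, 2);
```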

For embeddings without this MCP server, see [tools/file-indexer/examples/openai-embed-qdrant.ts](tools/file-indexer/examples/openai-embed-qdrant.ts). If a proxy blocks hosted generative-AI URLs, use a local OpenAI-compatible endpoint (OPENAI_BASE_URL / model documented in [env.example](env.example)), remove the openai-embed-to-qdrant rule from [indexer.config.json](indexer.config.json), or ask IT for an exception.

MCP clients (Cursor / VS Code)

Full service catalog (ports, transports, env): [docs/README.md](docs/README.md) (per-service: [docs/](docs/)). VS Code + Continue (LM Studio–style base URL, sample config.yaml): [vscode-continue/](vscode-continue/).
Qdrant MCP: configure as a remote MCP server using the SSE URL on port 8000 once the container is up.
.NET MCP: configure URL **http://localhost:8010/mcp** if your client supports streamable HTTP MCP.
Filesystem / Git: use stdio + Docker (docker compose run -i --rm …) per the Hub examples.
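As a sketch of what such a client configuration might contain, the snippet below builds a server map in the `mcpServers` shape that Cursor-style clients read. The exact schema depends on your client, and the `/sse` path is an assumption (the server prints the real path on startup); the URLs, service names, and `docker compose run` flags come from this README.

```typescript
// Illustrative client config; check your client's documentation for the exact schema.
interface RemoteServer {
  url: string;
}
interface StdioServer {
  command: string;
  args: string[];
}
type ServerEntry = RemoteServer | StdioServer;

const mcpServers: Record<string, ServerEntry> = {
  // Assumption: the SSE path is /sse.
  "qdrant-mcp": { url: "http://localhost:8000/sse" },
  "dotnet-mcp": { url: "http://localhost:8010/mcp" },
  // stdio server started on demand through Compose (run from the repo root).
  "mcp-filesystem": {
    command: "docker",
    args: ["compose", "--profile", "stdio", "run", "--rm", "-i", "mcp-filesystem"],
  },
};

const clientConfigJson = JSON.stringify({ mcpServers }, null, 2);
```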

Agent orchestration (default stack)

[docker-compose.yml](docker-compose.yml) starts the task-state service, the Docker MCP Gateway (catalog), and gilwo/mcp-router with a plain docker compose up -d (no extra profile).

| Service | Role | Host |
| --- | --- | --- |
| `mcp-task-state` | Task Orchestrator: persistent tasks / work graph (stdio) | none (use `compose run`) |
| `mcp-orchestration` | Docker MCP Gateway (image built from [docker/mcp-gateway/](docker/mcp-gateway/Dockerfile)); runs MCP servers from the catalog via `docker.sock` | streaming MCP on 8820 |
| `mcp-agent-router` | gilwo/mcp-router: aggregates many MCP servers behind one streamable-HTTP endpoint | MCP 9914 (`/mcp`), dashboard 9915 |

Orchestration gateway: mounting **/var/run/docker.sock** gives the gateway control over the containers on your Docker engine, so treat it as privileged. Set MCP_GATEWAY_AUTH_TOKEN in **.env** when exposing port 8820. For Docker Desktop quirks, see upstream notes on DOCKER_MCP_IN_CONTAINER.

Agent router — set **MCP_ROUTER_MASTER_KEY** (required) and optionally **MCP_ROUTER_ADMIN_KEY** in **.env** (see **env.example**). Upstream stores these in ~/.mcp-router/.env when run bare; with Compose they must be passed as environment variables. Configure downstream servers in /app/config, persisted on the host as **[orchestration/mcp-router-config/](orchestration/mcp-router-config)** (gitignored). From inside the stack, HTTP MCP services are reachable by Compose DNS (e.g. http://qdrant-mcp:8000/..., http://dotnet-mcp:8010/mcp); from the router container you can also use http://host.docker.internal:… for ports bound on the host. See the mcp-router docs for config.toml and auth keys.
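A pre-flight check for the variables named above might look like the following. It is only a sketch: the function name and the `exposesGatewayPort` flag are hypothetical, and the messages are illustrative; the variable names and port 8820 come from this README.

```typescript
// Hypothetical pre-flight check for orchestration-related variables in .env.
type Env = Record<string, string | undefined>;

/** Return a list of human-readable problems; an empty list means the check passed. */
function checkOrchestrationEnv(env: Env, exposesGatewayPort: boolean): string[] {
  const problems: string[] = [];
  if (!env.MCP_ROUTER_MASTER_KEY) {
    problems.push("MCP_ROUTER_MASTER_KEY is required by mcp-agent-router");
  }
  if (exposesGatewayPort && !env.MCP_GATEWAY_AUTH_TOKEN) {
    problems.push("MCP_GATEWAY_AUTH_TOKEN should be set when port 8820 is exposed");
  }
  // MCP_ROUTER_ADMIN_KEY is optional, so its absence is not reported.
  return problems;
}

const emptyEnvProblems = checkOrchestrationEnv({}, true);
const okProblems = checkOrchestrationEnv({ MCP_ROUTER_MASTER_KEY: "example-key" }, false);
```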

Optional workflow YAML: add files under [orchestration/taskorchestrator/](orchestration/taskorchestrator) (see upstream docs). The task-state container sets AGENT_CONFIG_DIR=/project with that folder mounted read-only at /project/.taskorchestrator.

Validation

See [debug/README.md](debug/README.md) and run .\debug\run-all.ps1 from the repo root.

Quick start

cd C:\repos\assistants
Copy-Item env.example .env   # optional
docker compose up -d --build

Check llmservice resource usage (CPU/RAM/GPU)

Run this after the containers are up to snapshot CPU%, RAM, and (if available) GPU VRAM usage for llmservice-chat and llmservice-embed.

$chat = (docker ps --filter "name=llmservice-chat" --format "{{.ID}}" | Select-Object -First 1)
$embed = (docker ps --filter "name=llmservice-embed" --format "{{.ID}}" | Select-Object -First 1)

docker stats --no-stream --format "llmservice-chat CPU={{.CPUPerc}} MEM={{.MemUsage}}" $chat
docker stats --no-stream --format "llmservice-embed CPU={{.CPUPerc}} MEM={{.MemUsage}}" $embed

# GPU memory (requires `nvidia-smi` in the container)
docker exec $chat nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader
docker exec $embed nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader

# Per-process GPU memory (useful to see the model process)
docker exec $chat nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
docker exec $embed nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
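If you want to log these snapshots over time, the `docker stats` lines produced by the format strings above can be parsed with a few lines of TypeScript. The sample line in the sketch is made up for illustration, and the interface and function names are hypothetical.

```typescript
// Parser for lines shaped like "<name> CPU=<n>% MEM=<used> / <limit>" (the format used above).
interface StatsSample {
  name: string;
  cpuPercent: number;
  memUsed: string;  // e.g. "1.5GiB", kept as text
  memLimit: string; // e.g. "15.6GiB", kept as text
}

function parseStatsLine(line: string): StatsSample | null {
  const match = /^(\S+) CPU=([\d.]+)% MEM=(\S+) \/ (\S+)$/.exec(line.trim());
  if (match === null) {
    return null; // Line does not follow the expected format.
  }
  return {
    name: match[1],
    cpuPercent: Number(match[2]),
    memUsed: match[3],
    memLimit: match[4],
  };
}

// Made-up sample line, not real output from this stack.
const statsSample = parseStatsLine("llmservice-chat CPU=12.50% MEM=1.5GiB / 15.6GiB");
```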

From the debug container, smoke-test in-network reachability:

docker compose exec debug bash
curl -sS http://qdrant:6333/healthz
curl -sS -o /dev/null -w "%{http_code}" http://qdrant-mcp:8000/

Filesystem and git MCP (stdio)

The official mcp/filesystem and mcp/git images speak MCP over stdin/stdout. They are not meant to stay up detached like HTTP services. This compose file defines them with profile **stdio** so they are not started by a plain docker compose up -d.

From the repo root:

docker compose --profile stdio run --rm -it mcp-filesystem
docker compose --profile stdio run --rm -it mcp-git

Point your MCP client at the same docker compose run … pattern (or plain docker run -i with the same image and volume flags) as in the filesystem and git image docs.

Task state (stdio MCP)

docker compose run --rm -it mcp-task-state
