
lynnvfrank/assistants


Assistants — Docker MCP stack

llmservice (Windows): 9B vs 35B chat model

From the repo root, merge the base stack with the Windows CUDA llmservice file. Use the optional third file for the Qwen3.5 35B A3B GGUF preset; omit it for the default 9B chat weights (see README-llmservice.md).

9B (default chat GGUF + registry qwen3.5-9b-chat):

docker compose -f docker-compose.yml -f docker-compose.llmservice.windows.yml up -d --build

35B (overlay overrides llmservice-chat paths and tuning; use registry id qwen3.5-35b-a3b-chat in clients):

docker compose -f docker-compose.yml -f docker-compose.llmservice.windows.yml -f docker-compose.llmservice.windows.chat-35b.yml up -d --build

After switching presets, recreate llmservice-chat if the stack is already running, using the same -f list as above, e.g. ... up -d --force-recreate llmservice-chat. Rebuild llmservice-gateway when the gateway code changes or when the schema it expects for models/registry/models.yaml changes.
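The -f list has to be identical between the first up and a later --force-recreate, so a script can build it from one place. A minimal TypeScript sketch: the file names and registry ids are the ones above, while the helper names (composeFiles, composeArgs, REGISTRY_ID) are hypothetical.

```typescript
// Hypothetical helpers; file names and registry ids come from this README.
type ChatPreset = "9b" | "35b";

const BASE_FILES = ["docker-compose.yml", "docker-compose.llmservice.windows.yml"];
const OVERLAY_35B = "docker-compose.llmservice.windows.chat-35b.yml";

// Model id a client should request for each preset.
const REGISTRY_ID: Record<ChatPreset, string> = {
  "9b": "qwen3.5-9b-chat",
  "35b": "qwen3.5-35b-a3b-chat",
};

function composeFiles(preset: ChatPreset): string[] {
  // Compose merges later files over earlier ones, so the 35B overlay goes last.
  return preset === "35b" ? [...BASE_FILES, OVERLAY_35B] : [...BASE_FILES];
}

function composeArgs(preset: ChatPreset, tail: string[]): string[] {
  const args = ["compose"];
  for (const f of composeFiles(preset)) args.push("-f", f);
  return [...args, ...tail];
}

// First start, then a recreate after switching presets on a running stack.
const upArgs = composeArgs("35b", ["up", "-d", "--build"]);
const recreateArgs = composeArgs("35b", ["up", "-d", "--force-recreate", "llmservice-chat"]);

console.log(["docker", ...upArgs].join(" "));
console.log(["docker", ...recreateArgs].join(" "));
console.log("client model id:", REGISTRY_ID["35b"]);
```

The argument arrays can be handed to a process launcher such as Node's child_process.spawn("docker", args), which avoids shell quoting differences between PowerShell and bash.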


Compose stack for Qdrant, Qdrant MCP (SSE), a .NET-focused MCP (streamable HTTP), plus the reference filesystem and git MCP images. A small debug container (Alpine + curl + bash) is included for probing the network.

Portable tooling (file-indexer)

TypeScript CLI in [tools/file-indexer/](tools/file-indexer/README.md): add **indexer.config.json** to any repo, **watch** files with allow/deny globs and ignore files, run hook scripts, **sync-services** (Qdrant collection + mcp-router TOML), and **mcp status / mcp call** against the Compose services.
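How allow/deny globs combine can be sketched as a simple filter. This is not the file-indexer's code, and its real glob semantics and config keys are documented in tools/file-indexer/README.md; here "**/" matches zero or more directories, "*" stays inside one path segment, and deny always wins over allow (an assumption of this sketch).

```typescript
// Simplified glob -> RegExp conversion (illustrative, not the CLI's matcher).
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          re += "(?:.*/)?"; // "**/" = zero or more whole directories
          i += 2;
        } else {
          re += ".*"; // trailing "**" = anything, including "/"
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" = anything within one segment
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// A path is watched if it matches any allow glob and no deny glob.
function isIncluded(path: string, allow: string[], deny: string[]): boolean {
  const p = path.replace(/\\/g, "/"); // normalize Windows separators
  const hit = (globs: string[]) => globs.some((g) => globToRegExp(g).test(p));
  return hit(allow) && !hit(deny);
}

// Hypothetical globs for illustration.
const allow = ["**/*.md", "**/*.ts"];
const deny = ["**/node_modules/**", "**/dist/**"];
console.log(isIncluded("tools\\file-indexer\\src\\cli.ts", allow, deny));
```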

This workspace already has **[indexer.config.json](indexer.config.json)** at the repo root. When Markdown and source files matching its glob change, the watcher runs [tools/file-indexer/examples/openai-embed-qdrant.ts](tools/file-indexer/examples/openai-embed-qdrant.ts) via npx tsx to embed them into the Qdrant collection named by **INDEXER_QDRANT_COLLECTION** (default code-index, separate from **mcp-memories** / MiniLM). Set **OPENAI_API_KEY** and related variables in **.env** (copied from [env.example](env.example)), or point **OPENAI_BASE_URL** at a local OpenAI-compatible server (e.g. Ollama, or the Compose **openai-compat-stub** on port 1234) if your org blocks cloud embedding APIs. **Dockerfile** changes run [scripts/dockerfile-compose-rebuild.mjs](scripts/dockerfile-compose-rebuild.mjs) (see INDEXER_COMPOSE_* in [env.example](env.example)). From the repo root:

cd C:\repos\assistants
npm install
npm run indexer:watch       # watch + hooks
npm run indexer:sync        # Qdrant collection + write orchestration/mcp-router.fragment.toml
npm run indexer:status      # probe Qdrant + MCP URLs from config
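The embedding hook's traffic can be pictured as two HTTP requests: an OpenAI-compatible POST {OPENAI_BASE_URL}/embeddings, then a Qdrant PUT /collections/{collection}/points upsert. This sketch is not the code in openai-embed-qdrant.ts; it only builds request descriptors (no network), and the model name and payload fields are hypothetical. The target collection (default code-index) must already exist with a vector size that matches the embedding model; npm run indexer:sync handles the collection.

```typescript
// Hypothetical descriptor type; only the URLs and JSON bodies are meant to
// follow the OpenAI embeddings API and Qdrant's REST upsert endpoint.
interface HttpRequest {
  method: "POST" | "PUT";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function trimSlash(url: string): string {
  return url.replace(/\/+$/, "");
}

// POST {baseUrl}/embeddings with { model, input }; the key is optional for local servers.
function embeddingsRequest(baseUrl: string, model: string, input: string[], apiKey?: string): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return { method: "POST", url: `${trimSlash(baseUrl)}/embeddings`, headers, body: JSON.stringify({ model, input }) };
}

// PUT {qdrant}/collections/{name}/points?wait=true with { points: [...] }.
// Qdrant point ids must be unsigned integers or UUID strings.
function upsertRequest(
  qdrantUrl: string,
  collection: string,
  points: { id: number | string; vector: number[]; payload: Record<string, unknown> }[],
): HttpRequest {
  return {
    method: "PUT",
    url: `${trimSlash(qdrantUrl)}/collections/${encodeURIComponent(collection)}/points?wait=true`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ points }),
  };
}

// Example: llmservice on 8888 as the embeddings server; "my-embed-model" is a placeholder id.
const embed = embeddingsRequest("http://localhost:8888/v1", "my-embed-model", ["hello"]);
const upsert = upsertRequest("http://localhost:6333", "code-index", [
  { id: 1, vector: [0.1, 0.2], payload: { path: "README.md" } },
]);
console.log(embed.method, embed.url);
console.log(upsert.method, upsert.url);
```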

Other repos (or a custom root):

npm run file-indexer:build
npm exec -w file-indexer -- file-indexer init --root C:\path\to\any\repo
npm exec -w file-indexer -- file-indexer watch --root C:\path\to\any\repo

For a global file-indexer command: npm install -g .\tools\file-indexer (after the build step), or put node C:\repos\assistants\tools\file-indexer\dist\cli.js in a shell alias.

Configuration

Compose project name and container names

Copy **[env.example](env.example)** to **.env** and keep **COMPOSE_PROJECT_NAME=mcp** so the Compose project name is always **mcp** rather than being derived from the repo folder name. Container names are short and no longer carry the old **assistants-** prefix (e.g. **qdrant**, **mcp-debug**, **llmservice-gateway**). Networks and named volumes keep their existing names (**assistants-mcpnet**, **assistants-qdrant-storage**, …) so existing data is preserved. After this change, if ports conflict, stop and remove any old containers whose names start with **assistants-**, then run **docker compose up** again.
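Finding the leftover containers can be sketched as a filter over the output of docker ps -a --format "{{.Names}}". The helper name and sample names are hypothetical. It only looks at container names: the **assistants-mcpnet** network and **assistants-qdrant-storage** volume keep that prefix on purpose and must not be removed.

```typescript
// Hypothetical helper: pick container names that still use the old prefix.
function legacyContainers(psOutput: string, prefix = "assistants-"): string[] {
  return psOutput
    .split(/\r?\n/) // docker ps prints one name per line (CRLF on some Windows shells)
    .map((line) => line.trim())
    .filter((name) => name.startsWith(prefix));
}

// Sample output; on a real machine this comes from `docker ps -a --format "{{.Names}}"`.
const sample = ["assistants-qdrant", "qdrant", "assistants-mcp-debug", "mcp-debug", ""].join("\n");
const stale = legacyContainers(sample);

// Review the list before running it: -f also stops running containers.
console.log(["docker", "rm", "-f", ...stale].join(" "));
```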

Services and ports

| Service | Image / build | Ports (host) | Notes |
|---|---|---|---|
| qdrant | qdrant/qdrant | 6333, 6334 | Vector DB (HTTP) + gRPC |
| qdrant-mcp | ./docker/qdrant-mcp | 8000 | SSE MCP → Qdrant |
| dotnet-mcp | ./docker/dotnet-mcp | 8010 | dotnet restore/build/test in /workspace |
| llmservice-gateway (+ chat / embed) | ./docker/llmservice-gateway, ./docker/llmservice-llamacpp; merge **docker-compose.llmservice.windows.yml** or **docker-compose.llmservice.mac.yml** | 8888 | Real llama.cpp OpenAI-compatible **/v1**; see [README-llmservice.md](README-llmservice.md) |
| mcp-filesystem | mcp/filesystem:1.0.2 | none (stdio) | Profile stdio |
| mcp-git | mcp/git:latest | none (stdio) | Profile stdio |
| debug | ./docker/debug | none | docker compose exec debug bash (curl installed); repo ./debug mounted at **/debug** for validation scripts |

All application containers attach to the **mcpnet** bridge network (named **assistants-mcpnet** on the Docker engine).

Host URLs and workspace

  • Qdrant: http://localhost:6333 (dashboard often at http://localhost:6333/dashboard)
  • Qdrant MCP (SSE): http://localhost:8000 — use your client’s “remote MCP” / SSE URL (path is whatever the server prints on startup, often /sse).
  • .NET MCP (streamable HTTP): http://localhost:8010/mcp — tools: dotnet_restore, dotnet_build, dotnet_test. Mount your repo under **./workspace** (or bind a host path there) so paths stay under /workspace in the container.
  • llmservice (optional): **http://localhost:8888/v1** — local GGUF chat + embeddings via llama.cpp (does not use port 8000). See [README-llmservice.md](README-llmservice.md) and [docs/README.md](docs/README.md) (port map).

Put repositories and files you want to expose under **./workspace** so the filesystem and git stdio containers can see them under /workspace.
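The same service is reached by a different hostname depending on where the caller runs: localhost from your machine, the Compose service name from another container on mcpnet, or host.docker.internal for host-bound ports from inside a container. A small sketch with hypothetical names; the ports are those listed above, and the qdrant-mcp path /sse is an assumption (use whatever the server prints on startup).

```typescript
// Hypothetical endpoint table built from the ports in this README.
interface Endpoint {
  service: string; // Compose service / DNS name
  port: number;
  path: string;
}

const ENDPOINTS: Record<"qdrant" | "qdrantMcp" | "dotnetMcp", Endpoint> = {
  qdrant: { service: "qdrant", port: 6333, path: "" },
  qdrantMcp: { service: "qdrant-mcp", port: 8000, path: "/sse" }, // path assumed; check startup log
  dotnetMcp: { service: "dotnet-mcp", port: 8010, path: "/mcp" },
};

type Vantage = "host" | "compose" | "docker-host";

function endpointUrl(e: Endpoint, from: Vantage): string {
  const hostname =
    from === "host" ? "localhost" : from === "compose" ? e.service : "host.docker.internal";
  return `http://${hostname}:${e.port}${e.path}`;
}

console.log(endpointUrl(ENDPOINTS.dotnetMcp, "host"));
console.log(endpointUrl(ENDPOINTS.dotnetMcp, "compose"));
```

The compose form is what the mcp-agent-router config would use for downstream servers; the host form is what Cursor or VS Code on your machine would use.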

Environment and certificates

  • Copy **env.example** to **.env** at the repo root when you need to override defaults.
  • Qdrant and Qdrant MCP → Qdrant use plain HTTP on the Compose network (http://qdrant:6333). No API key is configured by default.

FastEmbed model (baked in the image)

Upstream mcp-server-qdrant only supports FastEmbed. By default this repo’s **docker compose build** downloads the same MiniLM tarball FastEmbed uses from Google Cloud Storage (qdrant-fastembed/…) and installs it under **/root/.cache/fastembed/fast-all-MiniLM-L6-v2** — no Hugging Face at build time. Compose sets **FASTEMBED_CACHE_PATH=/root/.cache/fastembed** because FastEmbed’s default is otherwise **/tmp/fastembed_cache**, which would ignore the baked model.
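Why the environment variable matters can be shown as a simplified model of the lookup described above: FastEmbed uses FASTEMBED_CACHE_PATH when set, otherwise its /tmp/fastembed_cache default, and the baked model lives in fast-all-MiniLM-L6-v2 under that root. This is a hedged illustration of the README's description, not FastEmbed's source.

```typescript
// Simplified model of the cache-root resolution (illustrative only).
const FASTEMBED_DEFAULT = "/tmp/fastembed_cache";
const BAKED_MODEL_DIR = "fast-all-MiniLM-L6-v2";

function fastembedModelDir(env: Record<string, string | undefined>): string {
  const root = env["FASTEMBED_CACHE_PATH"] || FASTEMBED_DEFAULT; // empty counts as unset
  return `${root.replace(/\/+$/, "")}/${BAKED_MODEL_DIR}`;
}

// With the Compose setting, the lookup lands on the baked directory:
console.log(fastembedModelDir({ FASTEMBED_CACHE_PATH: "/root/.cache/fastembed" }));
// Without it, the lookup misses the baked model entirely:
console.log(fastembedModelDir({}));
```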

  • Custom CA / TLS: optional PEMs in **certs/** (**ca.crt** for the stack and **extra-ca.crt**, e.g. a corporate root) are merged at build time and runtime for **qdrant-mcp**, and at runtime for **mcp-orchestration** (catalog HTTPS). If the GCS download still fails TLS during **docker compose build**, set **HF_HUB_TLS_INSECURE=1** in **.env** (this passes **curl -k** for that step only). At runtime, **HF_HUB_DISABLE_SSL_VERIFY=1** remains the last resort for any remaining HTTPS (hf_tls.py + certifi).
  • Skip bake: **SKIP_EMBEDDING_PREFETCH=1** in **.env** skips the GCS RUN step (for air-gapped builds). You must then either build the image once on a machine that can run that step, or inject a model directory yourself (see below).

Air-gapped runtime or blocked GCS

If GCS is blocked at build time (403 / timeout), set **SKIP_EMBEDDING_PREFETCH=1**, build the image, then either:

  1. Copy the baked model directory out of an image built elsewhere (docker create + docker cp from /root/.cache/fastembed), or
  2. Use a compose override to bind-mount a pre-filled **fast-all-MiniLM-L6-v2** directory onto **/root/.cache/fastembed/fast-all-MiniLM-L6-v2**.
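Option 2 needs only a small override file. A sketch that generates one; the host directory ./models/fast-all-MiniLM-L6-v2 and the file name docker-compose.fastembed.yml are hypothetical, while the qdrant-mcp service and the container path come from this README.

```typescript
// Hypothetical generator for a Compose override that bind-mounts a
// pre-filled FastEmbed model directory into qdrant-mcp.
function fastembedOverride(hostDir: string, service = "qdrant-mcp"): string {
  const target = "/root/.cache/fastembed/fast-all-MiniLM-L6-v2";
  return [
    "services:",
    `  ${service}:`,
    "    volumes:",
    `      - ${hostDir}:${target}`,
    "",
  ].join("\n");
}

console.log(fastembedOverride("./models/fast-all-MiniLM-L6-v2"));
```

Save the output as, for example, docker-compose.fastembed.yml and add it last to the -f list (docker compose -f docker-compose.yml -f docker-compose.fastembed.yml up -d), keeping any llmservice files you normally merge.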

For embeddings without this MCP server, see [tools/file-indexer/examples/openai-embed-qdrant.ts](tools/file-indexer/examples/openai-embed-qdrant.ts). If a proxy blocks hosted generative-AI URLs, use a local OpenAI-compatible endpoint (OPENAI_BASE_URL / model documented in [env.example](env.example)), remove the openai-embed-to-qdrant rule from [indexer.config.json](indexer.config.json), or ask IT for an exception.

MCP clients (Cursor / VS Code)

  • Full service catalog (ports, transports, env): [docs/README.md](docs/README.md) (per-service: [docs/](docs/)). VS Code + Continue (LM Studio–style base URL, sample config.yaml): [vscode-continue/](vscode-continue/).
  • Qdrant MCP: configure as a remote MCP server using the SSE URL on port 8000 once the container is up.
  • .NET MCP: configure URL **http://localhost:8010/mcp** if your client supports streamable HTTP MCP.
  • Filesystem / Git: use stdio + Docker (docker compose run -i --rm …) per the Hub examples.
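The entries above can be collected into one client config. This sketch assumes Cursor's mcp.json convention ({ mcpServers: { name: { url } or { command, args } } }); VS Code and other clients use different keys, so check your client's docs. The repo root C:/repos/assistants is the path used elsewhere in this README, and the qdrant path /sse is an assumption.

```typescript
// ASSUMPTION: Cursor-style mcp.json shape; server keys are arbitrary labels.
type RemoteServer = { url: string };
type StdioServer = { command: string; args: string[] };

function clientConfig(repoRoot: string): { mcpServers: Record<string, RemoteServer | StdioServer> } {
  // An absolute -f path lets the client launch Compose from any working directory.
  const composeFile = `${repoRoot}/docker-compose.yml`;
  return {
    mcpServers: {
      qdrant: { url: "http://localhost:8000/sse" }, // use the path qdrant-mcp prints on startup
      dotnet: { url: "http://localhost:8010/mcp" },
      router: { url: "http://localhost:9914/mcp" },
      filesystem: {
        command: "docker",
        args: ["compose", "-f", composeFile, "--profile", "stdio", "run", "-i", "--rm", "mcp-filesystem"],
      },
    },
  };
}

console.log(JSON.stringify(clientConfig("C:/repos/assistants"), null, 2));
```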

Agent orchestration (default stack)

[docker-compose.yml](docker-compose.yml) starts task-state, the Docker MCP Gateway (catalog), and gilwo/mcp-router with a plain docker compose up -d (no extra profile).

| Service | Role | Host |
|---|---|---|
| mcp-task-state | Task Orchestrator: persistent tasks / work graph (stdio) | none (use compose run) |
| mcp-orchestration | Docker MCP Gateway (image built from [docker/mcp-gateway/](docker/mcp-gateway/Dockerfile)); runs MCP servers from the catalog via **docker.sock** | streaming MCP on 8820 |
| mcp-agent-router | gilwo/mcp-router; aggregates many MCP servers behind one streamable-HTTP endpoint | MCP 9914 (/mcp), dashboard 9915 |

Orchestration gateway: mounting **/var/run/docker.sock** gives the container control of your Docker engine (it can create, start, and remove containers there). Set MCP_GATEWAY_AUTH_TOKEN in **.env** when exposing port 8820. For Docker Desktop quirks, see the upstream notes on DOCKER_MCP_IN_CONTAINER.

Agent router: set **MCP_ROUTER_MASTER_KEY** (required) and optionally **MCP_ROUTER_ADMIN_KEY** in **.env** (see **env.example**). Upstream stores these in ~/.mcp-router/.env when run bare; with Compose they must be passed as environment variables. Configure downstream servers in /app/config, persisted on the host as **[orchestration/mcp-router-config/](orchestration/mcp-router-config)** (gitignored). From inside the stack, HTTP MCP services are reachable by Compose DNS (e.g. http://qdrant-mcp:8000/..., http://dotnet-mcp:8010/mcp); from the router container you can also use http://host.docker.internal:… for ports bound on the host. See the mcp-router docs for config.toml and auth keys.
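A pre-flight check for the keys this section calls for can be sketched as a tiny .env reader. The parser is deliberately simplified (KEY=VALUE lines and # comments only; no quoting, export, or multi-line values), the helper names are hypothetical, and "change-me" is a placeholder value.

```typescript
// Simplified .env parser: KEY=VALUE per line, # comments and blank lines skipped.
function parseDotEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "="
    vars.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return vars;
}

// MCP_ROUTER_MASTER_KEY is always required; the gateway token matters when 8820 is exposed.
function missingKeys(vars: Map<string, string>, exposeGateway: boolean): string[] {
  const required = ["MCP_ROUTER_MASTER_KEY"];
  if (exposeGateway) required.push("MCP_GATEWAY_AUTH_TOKEN");
  return required.filter((k) => !vars.get(k)); // empty values count as missing
}

const env = parseDotEnv("COMPOSE_PROJECT_NAME=mcp\n# MCP_ROUTER_ADMIN_KEY=\nMCP_ROUTER_MASTER_KEY=change-me\n");
console.log(missingKeys(env, true)); // [ 'MCP_GATEWAY_AUTH_TOKEN' ]
```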

Optional workflow YAML: add files under [orchestration/taskorchestrator/](orchestration/taskorchestrator) (see upstream docs). The task-state container sets AGENT_CONFIG_DIR=/project with that folder mounted read-only at /project/.taskorchestrator.

Validation

See [debug/README.md](debug/README.md) and run .\debug\run-all.ps1 from the repo root.

Quick start

cd C:\repos\assistants
Copy-Item env.example .env   # recommended: sets COMPOSE_PROJECT_NAME=mcp and the router keys (see Configuration)
docker compose up -d --build

Check llmservice resource usage (CPU/RAM/GPU)

Run this after the containers are up to snapshot CPU%, RAM, and (if available) GPU VRAM usage for llmservice-chat and llmservice-embed.

$chat = (docker ps --filter "name=llmservice-chat" --format "{{.ID}}" | Select-Object -First 1)
$embed = (docker ps --filter "name=llmservice-embed" --format "{{.ID}}" | Select-Object -First 1)

docker stats --no-stream --format "llmservice-chat CPU={{.CPUPerc}} MEM={{.MemUsage}}" $chat
docker stats --no-stream --format "llmservice-embed CPU={{.CPUPerc}} MEM={{.MemUsage}}" $embed

# GPU memory (requires `nvidia-smi` in the container)
docker exec $chat nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader
docker exec $embed nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader

# Per-process GPU memory (useful to see the model process)
docker exec $chat nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
docker exec $embed nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
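To log or compare the GPU snapshot rather than eyeball it, the --query-gpu output can be parsed. With --format=csv,noheader (no nounits), memory fields carry a MiB suffix. The sample line below is hypothetical, and the split on commas assumes the GPU name itself contains no comma.

```typescript
interface GpuMemory {
  index: number;
  name: string;
  totalMiB: number;
  usedMiB: number;
}

// "24564 MiB" -> 24564; anything else is treated as unexpected input.
function parseMiB(field: string): number {
  const m = /^(\d+(?:\.\d+)?)\s*MiB$/.exec(field.trim());
  if (!m) throw new Error(`unexpected memory field: ${field}`);
  return Number(m[1]);
}

// Parses `--query-gpu=index,name,memory.total,memory.used --format=csv,noheader`.
function parseGpuCsv(output: string): GpuMemory[] {
  return output
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [index, name, total, used] = line.split(",").map((f) => f.trim());
      return { index: Number(index), name, totalMiB: parseMiB(total), usedMiB: parseMiB(used) };
    });
}

const gpus = parseGpuCsv("0, NVIDIA GeForce RTX 4090, 24564 MiB, 9120 MiB\n");
for (const g of gpus) {
  const pct = ((100 * g.usedMiB) / g.totalMiB).toFixed(1);
  console.log(`GPU ${g.index} ${g.name}: ${g.usedMiB}/${g.totalMiB} MiB (${pct}%)`);
}
```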

From the debug container, smoke-test in-network reachability:

docker compose exec debug bash
curl -sS http://qdrant:6333/healthz
curl -sS -o /dev/null -w "%{http_code}" http://qdrant-mcp:8000/
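The same two probes can be scripted with a timeout, for example from a Node process attached to mcpnet (the debug container ships curl and bash, not Node). This is a sketch: fetchImpl defaults to the global fetch (Node 18+) and is a parameter only so the logic can be tested offline; the 3-second timeout is an arbitrary choice.

```typescript
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ status: number }>;

// Returns the HTTP status, or "unreachable" on connection errors and timeouts.
async function probe(url: string, fetchImpl: FetchLike = fetch, timeoutMs = 3000): Promise<number | "unreachable"> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return res.status;
  } catch {
    return "unreachable";
  } finally {
    clearTimeout(timer);
  }
}

// Same targets as the curl commands; call main() from inside the Compose network.
async function main(): Promise<void> {
  for (const url of ["http://qdrant:6333/healthz", "http://qdrant-mcp:8000/"]) {
    console.log(url, await probe(url));
  }
}
```

Probing qdrant-mcp at / rather than its SSE path keeps the check short, since an SSE response stays open.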

Filesystem and git MCP (stdio)

The official mcp/filesystem and mcp/git images speak MCP over stdin/stdout. They are not meant to stay up detached like HTTP services. This compose file defines them with profile **stdio** so they are not started by a plain docker compose up -d.

From the repo root:

docker compose --profile stdio run --rm -it mcp-filesystem
docker compose --profile stdio run --rm -it mcp-git

Configure your MCP client to launch the same docker compose run … command (or a plain docker run -i with the same image and volume flags), following the mcp/filesystem and mcp/git image docs.
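Over stdio, the client and server exchange newline-delimited JSON-RPC 2.0 messages, which is why the client-facing form keeps stdin open with -i. A sketch of the opening handshake a client writes; protocolVersion "2024-11-05" is one published MCP revision (the server may negotiate another), and the clientInfo values are placeholders. Per the MCP lifecycle, send the first frame, wait for the response with id 1, then send the rest.

```typescript
// Builds the first client-to-server frames for an MCP stdio session.
function handshakeFrames(clientName: string, clientVersion: string): string[] {
  const initialize = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: clientName, version: clientVersion },
    },
  };
  // A notification: no id, and the server sends no response to it.
  const initialized = { jsonrpc: "2.0", method: "notifications/initialized" };
  const listTools = { jsonrpc: "2.0", id: 2, method: "tools/list" };
  // JSON.stringify emits no raw newlines, so each frame is exactly one line.
  return [initialize, initialized, listTools].map((m) => JSON.stringify(m) + "\n");
}

const [init, ...rest] = handshakeFrames("probe-client", "0.0.1");
console.log(init.trim());
console.log(rest.length);
```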

Task state (stdio MCP)

docker compose run --rm -it mcp-task-state

About

Some exploration into building LLM and MCP systems in Docker on Linux.
