b00t-server: OpenAI-compat router + finetune pipeline — ready for sm3lly (RTX 3090) by elasticdotventures · Pull Request #528 · elasticdotventures/_b00t_

elasticdotventures · 2026-06-24T11:58:11Z

What

OpenAI-compatible REST API router built into b00t-mcp, with API-key authority, Spotlight usage telemetry, and a deterministic QLoRA finetune pipeline for b00t source code.

Architecture

b00t-mcp --http --llm :5273              ← axum proxy
  → soul-discovery (TCP probe)            ← ~/.b00t/server-soul.tomllm
    → llama.cpp container :8080           ← ghcr.io/ggml-org/llama.cpp:server
      → Qwen3.5-0.8B GGUF (0.5GB)        ← /v1/models, /v1/chat/completions

Key Features

b00t-server: transparent proxy at /v1/models, /v1/chat/completions, /v1/embeddings
Soul config: runtime backend registry (no hardcoded IPs) — local ports + remote keys with env var profiles
API keys: b00t server key create --consumer X → b00t-sk-* tokens validated per-request
Spotlight: per-consumer, per-endpoint, latency-tracked telemetry → ~/.b00t/spotlight.jsonl
Finetune: 4,474 training pairs from b00t source → unsloth QLoRA → GGUF export (505MB)
Stock container: ghcr.io/ggml-org/llama.cpp:server registered as datum
Exchange protocols: agent-to-agent discovery, key sharing, finetune delegation, telemetry sync
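
The Spotlight feature above is described as per-consumer, per-endpoint, latency-tracked events appended to a JSONL file. A minimal Python sketch of that shape follows. The field names (ts, consumer, endpoint, status, latency_ms) and both helper functions are hypothetical, since the PR description does not show the actual event schema.

```python
import json
import time
from pathlib import Path


def emit_spotlight_event(log_path, consumer, endpoint, status, latency_ms):
    """Append one usage event as a single JSON line (hypothetical schema)."""
    event = {
        "ts": time.time(),
        "consumer": consumer,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": latency_ms,
    }
    path = Path(log_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events; one JSON object per line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


def latency_by_consumer(log_path):
    """Average latency per consumer, read back from the JSONL log."""
    totals = {}
    with Path(log_path).expanduser().open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            ev = json.loads(line)
            count, total = totals.get(ev["consumer"], (0, 0.0))
            totals[ev["consumer"]] = (count + 1, total + ev["latency_ms"])
    return {c: total / count for c, (count, total) in totals.items()}
```

One JSON object per line means the file can be appended to without rewriting and read back line by line, which suits a usage log of this kind.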

For sm3lly (RTX 3090, 24GB)

just connect-sm3lly — discover the agent
b00t server start — start the proxy
Share backends via soul config
Delegate finetune: just delegate-finetune sm3lly_host=<ip>

Sharp Corners

GGUF tensor: blk.24.attn_norm.weight missing after merge — needs fix for stock server
Gate validator now works on Py3.10 (tomli fallback)
codebase-memory TOML duplicate key fixed

Files Changed

b00t-mcp/src/server_llm.rs (420 LOC) — proxy, soul config, key auth, Spotlight
b00t-cli/src/commands/server.rs — server start + key create/list
scripts/generate-b00t-training-data.py — source → training pairs
scripts/finetune-b00t.py — unsloth QLoRA pipeline
b00t/llama-cpp-server.container.toml — stock container datum
b00t/b00t-finetune.cli.toml — deterministic finetune command datum
b00t/b00t-server-soul.tomllm — project soul (canonical state)
justfile — connect-sm3lly, delegate-finetune, sync-spotlight recipes

elasticdotventures

Good-faith critical review — PR #528

Good engineering work overall. The transparent proxy, soul-config backend discovery, Spotlight telemetry, and key truncation in key list are all solid patterns. Three blocking concerns, one should-fix, and one DRY issue are filed as separate issues.

🚩 Blocking: Auth enforcement missing (#529)

server_llm.rs uses unwrap_or_else(|| "unknown".to_string()) on a failed key validation and then proxies the request anyway. This is access control by logging, not enforcement. Any client that reaches the port calls upstream LLMs at your cost.

Fix: early-return 401 before the proxy path in all three handlers. See #529 for exact patch.
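
The shape of that fix, sketched in Python for brevity (the server itself is Rust). The names require_auth and handle_chat, the keys mapping, and the return shapes are illustrative assumptions, not the patch in #529. The point is only that the 401 decision happens before any upstream call.

```python
def require_auth(authorization_header, keys):
    """Return (200, consumer) for a known key, else (401, None).

    `keys` maps issued b00t-sk-* tokens to consumer names (assumed shape).
    """
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return 401, None
    token = authorization_header[len(prefix):].strip()
    consumer = keys.get(token)
    if consumer is None:
        # Unknown key: reject instead of falling back to "unknown".
        return 401, None
    return 200, consumer


def handle_chat(authorization_header, keys, forward):
    """Reject before proxying; `forward` stands in for the upstream call."""
    status, consumer = require_auth(authorization_header, keys)
    if status != 200:
        return status, {"error": "invalid or missing API key"}
    return 200, forward(consumer)
```

With this ordering, a request with a missing or unknown key never reaches forward, so it cannot incur upstream cost.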

🚩 Blocking: `dev_mode=true` wipes auth (#530)

The bypass if state.dev_mode { return Some("dev-key") } is present, but the PR does not show where dev_mode is set to true. If it is a CLI flag, one wrong flag in production means an open relay. This needs an explicit startup warning and documentation, or removal. See #530.

🚩 Blocking: Command injection in `scripts/finetune-b00t.py` (#531)

The script calls os.system(f"... {merged_path} ...") where merged_path derives from the B00T_OUTPUT_DIR env var, so shell metacharacters in the path lead to RCE. Use subprocess.run([...]) with list args throughout. See #531.
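
A small demonstration of why list arguments close this hole. The child process here only echoes its first argument and stands in for the real conversion command, which the PR does not show. The helper name run_with_path is hypothetical.

```python
import subprocess
import sys


def run_with_path(path):
    """Pass `path` as a single argv element; no shell ever parses it."""
    # The child just prints argv[1]. In the real script this slot would be
    # the GGUF export command (an assumption; it is not shown in the PR).
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.rstrip("\n")


# A path containing shell metacharacters arrives in the child unchanged.
hostile = "/tmp/out; echo INJECTED"
print(run_with_path(hostile))
```

Because no shell is involved, the semicolon is just a character in the argument. The same string interpolated into os.system would be split into two commands.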

⚠️ Should-fix: CLI key create invisible to running server (#533)

b00t server key create writes server-keys.json to disk; the server loads it at boot and never re-reads it. New keys are silently ignored until restart. Fix with a POST /admin/reload-keys endpoint or a file watcher built on the notify crate. See #533.
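
One way to picture the reload behaviour, as a polling sketch in Python rather than an event-driven watcher. The KeyStore class and the assumed file format (a JSON object mapping key to consumer) are hypothetical. The actual layout of server-keys.json is not shown in the PR.

```python
import json
import os


class KeyStore:
    """In-memory key map refreshed when the keys file changes on disk."""

    def __init__(self, path):
        self.path = path
        self.keys = {}
        self._stamp = None
        self.maybe_reload()

    def _current_stamp(self):
        # (mtime, size) is a cheap change detector; None means no file yet.
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size)

    def maybe_reload(self):
        """Re-read the file if its mtime or size changed; return True if so."""
        stamp = self._current_stamp()
        if stamp == self._stamp:
            return False
        if stamp is None:
            self.keys = {}
        else:
            with open(self.path, encoding="utf-8") as fh:
                self.keys = json.load(fh)
        self._stamp = stamp
        return True
```

Calling maybe_reload() before each validation, or on a timer, makes a newly created key visible without a restart. A filesystem watcher achieves the same effect without polling.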

⚠️ DRY: Two training scripts duplicate the `fine-tune/` pipeline (#532)

scripts/generate-b00t-training-data.py and scripts/finetune-b00t.py are parallel reimplementations of fine-tune/generate_dataset.py and fine-tune/train_unsloth.py respectively. Both have less capability (no argparse, no dedup, no Layer 2/3 in the dataset script; hardcoded model, inline GGUF export with injection risk in the finetune script).

The one genuinely novel piece in generate-b00t-training-data.py is Rust /// doc comment extraction — that logic should be merged into fine-tune/generate_dataset.py as Layer 4. The 0.8B sm0l finetune scenario is valid but belongs as fine-tune/config.sm0l.yaml, not a parallel script. See #532.

✅ What's good

Transparent proxy preserving upstream status codes and raw body — correct
Soul-config backend discovery order (env → TCP probe → remote → fallback) is clean
Per-consumer Spotlight telemetry with latency is the right observability shape
key list truncates to 12 chars — good hygiene
No hardcoded secrets in soul config — all via key_env references
b00tyverse submodule update is housekeeping
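
The key-hygiene point in the list above (key list showing only the first 12 characters) can be sketched as follows. Both helpers are hypothetical, and the use of secrets.token_urlsafe is an assumption about how a b00t-sk-* token might be generated, not the CLI's actual implementation.

```python
import secrets


def create_key():
    """Generate a token with the b00t-sk- prefix (generation scheme assumed)."""
    return "b00t-sk-" + secrets.token_urlsafe(24)


def redact(key, visible=12):
    """Show only the first `visible` characters when listing keys."""
    return key[:visible] + "…" if len(key) > visible else key
```

Listing redacted keys lets an operator tell keys apart without the full secret ever appearing in terminal output or logs.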

DeepSeek-specific note

The DeepSeek backend (api.deepseek.com, key via DEEPSEEK_API_KEY) is wired identically to OpenAI — correct, since DeepSeek's API is OpenAI-compatible. No special handling needed. The soul config ordering matters: if DeepSeek is listed before OpenAI, it will be probed first for remote fallback. Ensure the ordering in server-soul.tomllm reflects your preference.

Verdict: HOLD on merge pending #529, #530, #531 (security). #532, #533 are strong-should-fix but could land in a follow-up if the security items are patched first.

elasticdotventures · 2026-06-24T12:28:58Z

Review posted (see full comment thread). Issues filed: #529 #530 #531 #532 #533

…andard MCPs
- vendor/rust-docs-mcp-server: re-pin from dead 7f7d0b0a (upstream force-pushed away -> upload-pack 'not our ref') to live HEAD 4303cbdd; builds rustdocs_mcp_server v1.3.1
- _b00t_/codebase-memory.mcp.toml: remove duplicate [b00t].version key (literal 0.8.1 collided with the detection command -> 'Failed to parse MCP config TOML'); point version command at the PATH-resolvable binary
- _b00t_/b00t-core-mcps.cli.toml: promote codebase-memory + rust-doc to the standard MCP_SERVERS set with a binary-backed build gate (b00t cli install before b00t mcp install) in both install and update scripts
- opencode.json: register both MCPs for the project runtime

…ht telemetry
b00t-mcp/src/server_llm.rs (NEW): transparent proxy with /v1/models, /v1/chat/completions, /v1/embeddings. Validates b00t-issued keys, forwards to upstream (env: B00T_SERVER_UPSTREAM_URL/KEY, default api.openai.com). Emits Spotlight usage events to ~/.b00t/spotlight.jsonl. Key persistence via shared ~/.b00t/server-keys.json.
b00t-mcp/src/main.rs: --llm flag starts HTTP mode + mounts llm_router. Dev mode (ACL bypass_oauth) skips key validation.
b00t-cli/src/commands/server.rs (NEW):
- 'b00t server start' → spawns b00t-mcp --http --llm
- 'b00t server key create --consumer X' → generates b00t-sk-* key
- 'b00t server key list' → reads registered keys
b00tyverse: symlink b00tyverse → vendor (b00t's forked-utility ecosystem)
Plan: _b00t_/plans/b00t-server-architecture.tomllmd

…profile env keys
- discover_local_backend(): TCP connect probes ports 8181 (mistral.rs), 8080 (llama.cpp), 8000/8001 (vLLM); no HTTP runtime needed (safe inside async)
- resolve_upstream(): priority chain: B00T_SERVER_UPSTREAM_URL (explicit) > local auto-discovery > remote keys (OPENAI_API_KEY, B00T_AI_CH0NKY_KEY, B00T_AI_FRONTIER_KEY, OPENROUTER_API_KEY)
- Empty env vars treated as unset (no false-positives)
- LlmState::new() auto-discovers; LlmState::from_config(url,key) for tests
- Startup logs the resolved upstream (or warns if nothing configured)
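
The priority chain in this commit can be sketched in Python as below. The function signatures, the returned tuples, and the injectable probe parameter are illustrative assumptions. Only the ordering (explicit URL, then local TCP probe, then remote keys) and the rule that empty env vars count as unset come from the commit message.

```python
import socket


def tcp_probe(port, host="127.0.0.1", timeout=0.2):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_upstream(env, local_ports, remote_key_envs, probe=tcp_probe):
    """Pick an upstream: explicit URL > local probe > remote key > None."""
    explicit = env.get("B00T_SERVER_UPSTREAM_URL", "")
    if explicit:  # an empty string counts as unset
        return ("explicit", explicit)
    for port in local_ports:
        if probe(port):
            return ("local", f"http://127.0.0.1:{port}")
    for name in remote_key_envs:
        if env.get(name, ""):  # first non-empty key env wins
            return ("remote", name)
    return None
```

Passing env and probe as arguments keeps the function testable without touching real environment variables or opening sockets, similar in spirit to the LlmState::from_config(url,key) constructor mentioned for tests.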

…ackends
Replace the hardcoded LOCAL_BACKENDS const with a runtime .tomllm config: ~/.b00t/server-soul.tomllm (local, uncommitted).
SoulConfig struct (Serialize+Deserialize via toml/serde):
- soul.hostname: auto-detected, soul.blessings: consumer tags
- backends.local: [{name, port, kind, enabled}], TCP probed on startup
- backends.remote: [{name, key_env, base_url?}], env var checked in order
Auto-seeds a default config on first run; operator edits to add/remove backends. No hardcoded IPs or server names in the binary. Multi-profile support: each remote backend declares its own key_env. Priority chain: B00T_SERVER_UPSTREAM_URL > soul/local > soul/remote. Empty env vars treated as unset.

- Qwen2.5-0.5B GGUF (630M params, 485MB) via prebuilt llama-server binary
- debian:bookworm-slim base (~80MB), no Python/pip build deps
- OpenAI-compat API at port 8080 (/v1/models, /v1/chat/completions)
- Verified end-to-end: b00t-server proxy → container → inference 'Hello!' response at 93ms latency, logged to Spotlight

…a.cpp:server
- Deterministic discovery: podman search + GitHub API release lookup
- Qwen3.5-4B-Q4_K_M served through b00t-server proxy chain
- 4.2B params, 2.7GB, 256K context, token prediction: ~11 tok/s CPU
- chat template: Qwen3.5 reasoning mode (reasoning_content)

…→ GGUF
- scripts/generate-b00t-training-data.py: extracts 4474 instruction pairs from AGENTS.md, Rust doc comments, datum tomls, skills, justfile
- scripts/finetune-b00t.py: unsloth QLoRA (4-bit + LoRA r=16) on Qwen3.5-4B, ChatML format, SFTTrainer, merge + GGUF export
- _b00t_/b00t-finetune.cli.toml: deterministic command datum: generate → install deps → finetune → serve, with HF lookup cmds

Training completed: 200 steps, loss 3.26→2.09, 6.4M trainable params.
LoRA saved: ~/.b00t/training/output/b00t-lora/
FP16 merged: ~/.b00t/training/output/b00t-merged-fp16/
GGUF: ~/.b00t/training/output/b00t-finetuned.Q4_K_M.gguf (504MB)
⚠️ GGUF conversion dependency: unsloth's fork produces GGUF format incompatible with stock ghcr.io/ggml-org/llama.cpp:server. Fix: build llama-quantize from llama.cpp master, or use unsloth's bundled server.

_b00t_/b00t-server-soul.tomllm: canonical state — architecture, services, environment, deterministic commands, sharp corners, sm3lly notes. _b00t_/datums/b00t-exchange-protocols.learn.tomllm: how two b00t agents connect and share compute — discovery, key exchange, backend sharing, finetune delegation, Spotlight sync, soul mirroring. justfile: connect-sm3lly, share-training, delegate-finetune, sync-spotlight, mirror-soul recipes.

… subprocess.run()
- #529: Add require_auth() guard returning 401 on missing/invalid tokens in list_models, proxy_chat, proxy_embeddings handlers
- #530: Remove dev_mode bypass entirely; dev iteration uses real keys via 'b00t server key create --consumer dev'
- #531: Replace os.system(f-string) with subprocess.run(list) in finetune-b00t.py to prevent shell injection via B00T_OUTPUT_DIR
Closes #529, #530, #531

- Add notify crate dependency for filesystem watching
- Spawn watcher on ~/.b00t/ directory in LlmState::from_config()
- On keys file change: re-read and swap in-memory HashMap
- Add reload_keys() method for manual reload
- Add test_reload_keys test with isolated temp state
- Unify dirs_next() to use dirs::home_dir() for path consistency
CLI 'b00t server key create' now takes effect immediately without server restart. File watcher triggers on Create/Modify events in the keys file parent directory.
Closes #533

elasticdotventures mentioned this pull request Jun 24, 2026

feat(eureka+neumann): complete E-series + NS-series — 20 tasks delivered #527

Merged

elasticdotventures added 11 commits June 26, 2026 11:31

elasticdotventures force-pushed the task/sm3lly-b00t-server branch from b3d4903 to 548116c Compare June 26, 2026 11:32

elasticdotventures merged commit 433afbe into main Jun 26, 2026
11 of 13 checks passed

elasticdotventures deleted the task/sm3lly-b00t-server branch June 26, 2026 11:33
