b00t-server: OpenAI-compat router + finetune pipeline, ready for sm3lly (RTX 3090) #528
elasticdotventures left a comment
Good-faith critical review: PR #528
Good engineering work overall. The transparent proxy, soul-config backend discovery, Spotlight telemetry, and key truncation in key list are all solid patterns. Three blocking concerns, one should-fix, and one DRY issue are filed as separate issues.
🚩 Blocking: Auth enforcement missing (#529)
server_llm.rs uses unwrap_or_else(|| "unknown".to_string()) on a failed key validation and then proxies the request anyway. This is access control by logging, not enforcement. Any client that reaches the port calls upstream LLMs at your cost.
Fix: early-return 401 before the proxy path in all three handlers. See #529 for exact patch.
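The enforcement pattern can be sketched as follows. This is a minimal illustration in Python for brevity (the real handlers are Rust, in server_llm.rs), not the patch from #529. The function signatures, the `keys` mapping, the `forward` callback, and the response shape are all assumptions made for the example.

```python
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status, body); hypothetical shape


def require_auth(authorization: Optional[str], keys: Mapping[str, str]) -> Optional[str]:
    """Return the consumer name for a valid bearer token, else None."""
    if not authorization or not authorization.startswith("Bearer "):
        return None
    token = authorization[len("Bearer "):].strip()
    return keys.get(token)


def proxy_chat(
    authorization: Optional[str],
    keys: Mapping[str, str],
    forward: Callable[[str], Response],
) -> Response:
    consumer = require_auth(authorization, keys)
    if consumer is None:
        # Early return: the upstream is never contacted for unauthenticated callers.
        return (401, '{"error": "invalid or missing API key"}')
    # Only authenticated requests reach the proxy path.
    return forward(consumer)
```

The important property is that there is no code path from a failed validation to `forward`; falling back to an "unknown" consumer label would reintroduce the open relay.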
🚩 Blocking: dev_mode=true wipes auth (#530)
The bypass if state.dev_mode { return Some("dev-key") } is present but the PR doesn't show where dev_mode is set to true. If it's a CLI flag, one wrong flag in production = open relay. Needs explicit startup warning and documentation, or removal. See #530.
🚩 Blocking: Command injection in scripts/finetune-b00t.py (#531)
os.system(f"... {merged_path} ...") where merged_path derives from B00T_OUTPUT_DIR env var. Shell metacharacters in the path = RCE. Use subprocess.run([...]) with list args throughout. See #531.
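A short sketch of the list-argument form. The `export-tool` command and its `--outfile` flag are placeholders, not a real CLI and not the command the script runs (that part is elided in the review); only `subprocess.run` with a list and the default `shell=False` is the point being shown.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_export_cmd(merged_path: Path, out_file: Path) -> List[str]:
    # "export-tool" and "--outfile" are placeholders; substitute whatever
    # the script actually invokes.
    return ["export-tool", str(merged_path), "--outfile", str(out_file)]


def run_checked(cmd: List[str]) -> subprocess.CompletedProcess:
    # shell=False is the default: each list element becomes exactly one argv
    # entry, so ';', '$(...)' and spaces in a path are never interpreted.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Demonstration: a hostile path reaches the child process as a literal string.
hostile = "out; touch /tmp/pwned"
echo = run_checked([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
```

With `os.system` and an f-string, the same value would be parsed by the shell and the text after `;` would run as a second command.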
⚠️ Should-fix: CLI key create invisible to running server (#533)
b00t server key create writes server-keys.json to disk, but the server loads that file at boot and never re-reads it. New keys are silently ignored until restart. Fix with a POST /admin/reload-keys endpoint or a notify-based file watcher. See #533.
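Either fix reduces to the same operation: re-read the file and swap the in-memory map. A minimal sketch in Python (the server itself is Rust), assuming the keys file is a JSON object mapping token to consumer name; the real server-keys.json layout is not shown in the PR, and the `KeyStore` class is hypothetical.

```python
import json
import threading
from pathlib import Path
from typing import Dict, Optional


class KeyStore:
    """In-memory key map that can be refreshed from disk without a restart."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._keys: Dict[str, str] = {}
        self.reload()

    def reload(self) -> int:
        """Re-read the keys file; return the number of keys now loaded."""
        # Assumed file shape: {"<token>": "<consumer>", ...}
        try:
            fresh = json.loads(self._path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # Keep the last good map if the file is missing or half-written.
            with self._lock:
                return len(self._keys)
        with self._lock:
            self._keys = dict(fresh)  # replace the whole map in one assignment
        return len(fresh)

    def consumer_for(self, token: str) -> Optional[str]:
        with self._lock:
            return self._keys.get(token)
```

A reload endpoint or a file-watcher callback would simply call `reload()`; keeping the previous map on a failed read avoids locking everyone out while the CLI is mid-write.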
⚠️ DRY: Two training scripts duplicate the fine-tune/ pipeline (#532)
scripts/generate-b00t-training-data.py and scripts/finetune-b00t.py are parallel reimplementations of fine-tune/generate_dataset.py and fine-tune/train_unsloth.py respectively. Both have less capability (no argparse, no dedup, no Layer 2/3 in the dataset script; hardcoded model, inline GGUF export with injection risk in the finetune script).
The one genuinely novel piece in generate-b00t-training-data.py is Rust /// doc comment extraction; that logic should be merged into fine-tune/generate_dataset.py as Layer 4. The 0.8B sm0l finetune scenario is valid but belongs as fine-tune/config.sm0l.yaml, not a parallel script. See #532.
✅ What's good
- Transparent proxy preserving upstream status codes and raw body: correct
- Soul-config backend discovery order (env -> TCP probe -> remote -> fallback) is clean
- Per-consumer Spotlight telemetry with latency is the right observability shape
- key list truncates to 12 chars: good hygiene
- No hardcoded secrets in soul config: all via key_env references
- b00tyverse submodule update is housekeeping
DeepSeek-specific note
The DeepSeek backend (api.deepseek.com, key via DEEPSEEK_API_KEY) is wired identically to OpenAI β correct, since DeepSeek's API is OpenAI-compatible. No special handling needed. The soul config ordering matters: if DeepSeek is listed before OpenAI, it will be probed first for remote fallback. Ensure the ordering in server-soul.tomllm reflects your preference.
Verdict: HOLD on merge pending #529, #530, #531 (security). #532, #533 are strong-should-fix but could land in a follow-up if the security items are patched first.
…andard MCPs
- vendor/rust-docs-mcp-server: re-pin from dead 7f7d0b0a (upstream force-pushed away -> upload-pack 'not our ref') to live HEAD 4303cbdd; builds rustdocs_mcp_server v1.3.1
- _b00t_/codebase-memory.mcp.toml: remove duplicate [b00t].version key (literal 0.8.1 collided with the detection command -> 'Failed to parse MCP config TOML'); point version command at the PATH-resolvable binary
- _b00t_/b00t-core-mcps.cli.toml: promote codebase-memory + rust-doc to the standard MCP_SERVERS set with a binary-backed build gate (b00t cli install before b00t mcp install) in both install and update scripts
- opencode.json: register both MCPs for the project runtime
…ht telemetry
b00t-mcp/src/server_llm.rs (NEW): transparent proxy with /v1/models, /v1/chat/completions, /v1/embeddings. Validates b00t-issued keys, forwards to upstream (env: B00T_SERVER_UPSTREAM_URL/KEY, default api.openai.com). Emits Spotlight usage events to ~/.b00t/spotlight.jsonl. Key persistence via shared ~/.b00t/server-keys.json.
b00t-mcp/src/main.rs: --llm flag starts HTTP mode + mounts llm_router. Dev mode (ACL bypass_oauth) skips key validation.
b00t-cli/src/commands/server.rs (NEW):
- 'b00t server start': spawns b00t-mcp --http --llm
- 'b00t server key create --consumer X': generates b00t-sk-* key
- 'b00t server key list': reads registered keys
b00tyverse: symlink b00tyverse -> vendor (b00t's forked-utility ecosystem)
Plan: _b00t_/plans/b00t-server-architecture.tomllmd
…profile env keys
- discover_local_backend(): TCP connect probes ports 8181 (mistral.rs), 8080 (llama.cpp), 8000/8001 (vLLM); no HTTP runtime needed (safe inside async)
- resolve_upstream(): priority chain: B00T_SERVER_UPSTREAM_URL (explicit) > local auto-discovery > remote keys (OPENAI_API_KEY, B00T_AI_CH0NKY_KEY, B00T_AI_FRONTIER_KEY, OPENROUTER_API_KEY)
- Empty env vars treated as unset (no false-positives)
- LlmState::new() auto-discovers; LlmState::from_config(url,key) for tests
- Startup logs the resolved upstream (or warns if nothing configured)
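The TCP connect probe described in this commit can be illustrated with a small Python sketch (the actual implementation is Rust). The port numbers come from the commit message; the function signatures, the timeout value, and the 127.0.0.1 host are assumptions.

```python
import socket
from typing import Iterable, Optional, Tuple

# Ports as listed in the commit message; the names are labels only.
CANDIDATES = [("mistral.rs", 8181), ("llama.cpp", 8080), ("vLLM", 8000), ("vLLM", 8001)]


def port_open(host: str, port: int, timeout: float = 0.2) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def discover_local_backend(
    candidates: Iterable[Tuple[str, int]] = CANDIDATES,
    host: str = "127.0.0.1",
) -> Optional[Tuple[str, int]]:
    """Return the first candidate whose port accepts a connection, else None."""
    for name, port in candidates:
        if port_open(host, port):
            return (name, port)
    return None
```

A bare connect only shows that something is listening; it does not confirm that the listener speaks the OpenAI-compatible API, so a wrong service on 8080 would still be selected.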
…ackends
Replace the hardcoded LOCAL_BACKENDS const with a runtime .tomllm config:
~/.b00t/server-soul.tomllm (local, uncommitted).
SoulConfig struct (Serialize+Deserialize via toml/serde):
- soul.hostname: auto-detected, soul.blessings: consumer tags
- backends.local: [{name, port, kind, enabled}], TCP probed on startup
- backends.remote: [{name, key_env, base_url?}], env var checked in order
Auto-seeds a default config on first run; operator edits to add/remove
backends. No hardcoded IPs or server names in the binary.
Multi-profile support: each remote backend declares its own key_env.
Priority chain: B00T_SERVER_UPSTREAM_URL > soul/local > soul/remote.
Empty env vars treated as unset.
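The priority chain above can be sketched in a few lines. This is an illustrative Python version, not the Rust resolve_upstream(): the dict-based backend records mirror the field names in the commit message, while B00T_SERVER_UPSTREAM_KEY as the override key variable, the `http://127.0.0.1:<port>/v1` URL shape, and the default remote base URL are assumptions.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Upstream = Tuple[str, Optional[str]]  # (base_url, api_key)


def nonempty(env: Mapping[str, str], name: str) -> Optional[str]:
    # An empty string counts as unset, matching the commit message.
    return env.get(name, "") or None


def resolve_upstream(
    env: Mapping[str, str],
    local_backends: Sequence[dict],
    remote_backends: Sequence[dict],
    probe: Callable[[int], bool],
) -> Optional[Upstream]:
    # 1. Explicit override wins.
    url = nonempty(env, "B00T_SERVER_UPSTREAM_URL")
    if url:
        return (url, nonempty(env, "B00T_SERVER_UPSTREAM_KEY"))
    # 2. soul/local: first enabled backend whose port answers the probe.
    for b in local_backends:
        if b.get("enabled", True) and probe(b["port"]):
            return (f"http://127.0.0.1:{b['port']}/v1", None)  # assumed URL shape
    # 3. soul/remote: first backend, in config order, whose key_env is set.
    for b in remote_backends:
        key = nonempty(env, b["key_env"])
        if key:
            return (b.get("base_url", "https://api.openai.com/v1"), key)
    return None
```

Because step 3 walks the list in order, the position of each remote entry in the config decides which provider is used when several keys are present.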
- Qwen2.5-0.5B GGUF (630M params, 485MB) via prebuilt llama-server binary
- debian:bookworm-slim base (~80MB), no Python/pip build deps
- OpenAI-compat API at port 8080 (/v1/models, /v1/chat/completions)
- Verified end-to-end: b00t-server proxy -> container -> inference 'Hello!' response at 93ms latency, logged to Spotlight
…a.cpp:server
- Deterministic discovery: podman search + GitHub API release lookup
- Qwen3.5-4B-Q4_K_M served through b00t-server proxy chain
- 4.2B params, 2.7GB, 256K context, token prediction: ~11 tok/s CPU
- chat template: Qwen3.5 reasoning mode (reasoning_content)
…-> GGUF
- scripts/generate-b00t-training-data.py: extracts 4474 instruction pairs from AGENTS.md, Rust doc comments, datum tomls, skills, justfile
- scripts/finetune-b00t.py: unsloth QLoRA (4-bit + LoRA r=16) on Qwen3.5-4B, ChatML format, SFTTrainer, merge + GGUF export
- _b00t_/b00t-finetune.cli.toml: deterministic command datum: generate -> install deps -> finetune -> serve, with HF lookup cmds
Training completed: 200 steps, loss 3.26 -> 2.09, 6.4M trainable params.
LoRA saved: ~/.b00t/training/output/b00t-lora/
FP16 merged: ~/.b00t/training/output/b00t-merged-fp16/
GGUF: ~/.b00t/training/output/b00t-finetuned.Q4_K_M.gguf (504MB)
⚠️ GGUF conversion dependency: unsloth's fork produces GGUF format incompatible with stock ghcr.io/ggml-org/llama.cpp:server. Fix: build llama-quantize from llama.cpp master, or use unsloth's bundled server.
_b00t_/b00t-server-soul.tomllm: canonical state (architecture, services, environment, deterministic commands, sharp corners, sm3lly notes).
_b00t_/datums/b00t-exchange-protocols.learn.tomllm: how two b00t agents connect and share compute (discovery, key exchange, backend sharing, finetune delegation, Spotlight sync, soul mirroring).
justfile: connect-sm3lly, share-training, delegate-finetune, sync-spotlight, mirror-soul recipes.
… subprocess.run()
- #529: Add require_auth() guard returning 401 on missing/invalid tokens in list_models, proxy_chat, proxy_embeddings handlers
- #530: Remove dev_mode bypass entirely; dev iteration uses real keys via 'b00t server key create --consumer dev'
- #531: Replace os.system(f-string) with subprocess.run(list) in finetune-b00t.py to prevent shell injection via B00T_OUTPUT_DIR
Closes #529, #530, #531
- Add notify crate dependency for filesystem watching
- Spawn watcher on ~/.b00t/ directory in LlmState::from_config()
- On keys file change: re-read and swap in-memory HashMap
- Add reload_keys() method for manual reload
- Add test_reload_keys test with isolated temp state
- Unify dirs_next() to use dirs::home_dir() for path consistency
CLI 'b00t server key create' now takes effect immediately without server restart. File watcher triggers on Create/Modify events in the keys file parent directory.
Closes #533
b3d4903 to 548116c
What
OpenAI-compatible REST API router built into b00t-mcp, with API-key authority, Spotlight usage telemetry, and a deterministic QLoRA finetune pipeline for b00t source code.
Architecture
Key Features
- b00t server key create --consumer X: b00t-sk-* tokens validated per-request
For sm3lly (RTX 3090, 24GB)
- just connect-sm3lly: discover the agent
- b00t server start: start the proxy
- just delegate-finetune sm3lly_host=<ip>
Sharp Corners
Files Changed