
b00t-server: OpenAI-compat router + finetune pipeline, ready for sm3lly (RTX 3090) #528

Merged
elasticdotventures merged 11 commits into main from task/sm3lly-b00t-server
Jun 26, 2026


@elasticdotventures

Owner

What

OpenAI-compatible REST API router built into b00t-mcp, with API-key authority, Spotlight usage telemetry, and a deterministic QLoRA finetune pipeline for b00t source code.

Architecture

b00t-mcp --http --llm :5273              ← axum proxy
  → soul-discovery (TCP probe)           ← ~/.b00t/server-soul.tomllm
    → llama.cpp container :8080          ← ghcr.io/ggml-org/llama.cpp:server
      → Qwen3.5-0.8B GGUF (0.5GB)        ← /v1/models, /v1/chat/completions

Key Features

  • b00t-server: transparent proxy at /v1/models, /v1/chat/completions, /v1/embeddings
  • Soul config: runtime backend registry (no hardcoded IPs), covering local ports + remote keys with env var profiles
  • API keys: b00t server key create --consumer X → b00t-sk-* tokens validated per-request
  • Spotlight: per-consumer, per-endpoint, latency-tracked telemetry → ~/.b00t/spotlight.jsonl
  • Finetune: 4,474 training pairs from b00t source → unsloth QLoRA → GGUF export (505MB)
  • Stock container: ghcr.io/ggml-org/llama.cpp:server registered as datum
  • Exchange protocols: agent-to-agent discovery, key sharing, finetune delegation, telemetry sync
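As a rough illustration of the Spotlight bullet above, one usage event per request could be appended as a JSON line. This is a Python sketch, not the Rust code in server_llm.rs; the helper name `record_usage` and the field names (`ts`, `consumer`, `endpoint`, `status`, `latency_ms`) are assumptions, since the PR only states that events are per-consumer, per-endpoint, and latency-tracked.

```python
import json
import time
from pathlib import Path


def record_usage(log_path: Path, consumer: str, endpoint: str,
                 status: int, latency_ms: float) -> dict:
    """Append one usage event as a single JSON line and return it."""
    # Field names are hypothetical; the real Spotlight schema is not shown in the PR.
    event = {
        "ts": time.time(),
        "consumer": consumer,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": latency_ms,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events; one JSON object per line (JSONL).
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

Append-only JSONL keeps each write independent, so a crashed request cannot corrupt earlier records, and the file can be tailed or synced line by line.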

For sm3lly (RTX 3090, 24GB)

  1. just connect-sm3lly: discover the agent
  2. b00t server start: start the proxy
  3. Share backends via soul config
  4. Delegate finetune: just delegate-finetune sm3lly_host=<ip>

Sharp Corners

  • GGUF tensor: blk.24.attn_norm.weight missing after merge; needs fix for stock server
  • Gate validator now works on Py3.10 (tomli fallback)
  • codebase-memory TOML duplicate key fixed

Files Changed

  • b00t-mcp/src/server_llm.rs (420 LOC): proxy, soul config, key auth, Spotlight
  • b00t-cli/src/commands/server.rs: server start + key create/list
  • scripts/generate-b00t-training-data.py: source → training pairs
  • scripts/finetune-b00t.py: unsloth QLoRA pipeline
  • b00t/llama-cpp-server.container.toml: stock container datum
  • b00t/b00t-finetune.cli.toml: deterministic finetune command datum
  • b00t/b00t-server-soul.tomllm: project soul (canonical state)
  • justfile: connect-sm3lly, delegate-finetune, sync-spotlight recipes

@elasticdotventures elasticdotventures left a comment

Owner Author


Good-faith critical review: PR #528

Good engineering work overall. The transparent proxy, soul-config backend discovery, Spotlight telemetry, and key truncation in key list are all solid patterns. Three blocking concerns (#529, #530, #531), one should-fix (#533), and one DRY issue (#532) are filed as separate issues.


🚩 Blocking: Auth enforcement missing (#529)

server_llm.rs uses unwrap_or_else(|| "unknown".to_string()) on a failed key validation and then proxies the request anyway. This is access control by logging, not enforcement. Any client that reaches the port calls upstream LLMs at your cost.

Fix: early-return 401 before the proxy path in all three handlers. See #529 for exact patch.
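The difference between logging and enforcing can be sketched as follows. This is an illustrative Python model of the guard, not the patch in #529 and not the Rust handler code; the `keys` layout (token mapped to consumer name), the `forward` callback, and the error body are assumptions.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple


def require_auth(headers: Mapping[str, str],
                 keys: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (200, consumer) for a known key, otherwise (401, None)."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return 401, None
    consumer = keys.get(token.strip())
    if consumer is None:
        # No "unknown" fallback: an unrecognised key is a hard failure.
        return 401, None
    return 200, consumer


def handle_chat(headers: Mapping[str, str], keys: Dict[str, str],
                forward: Callable[[str], tuple]) -> tuple:
    """Handler shape: reject first, reach the proxy path only after the guard passes."""
    status, consumer = require_auth(headers, keys)
    if status != 200:
        return status, {"error": "invalid or missing API key"}
    return forward(consumer)
```

The point of the shape is that `forward` is unreachable without a validated consumer, so telemetry can never record a request that was proxied for "unknown".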


🚩 Blocking: dev_mode=true wipes auth (#530)

The bypass if state.dev_mode { return Some("dev-key") } is present but the PR doesn't show where dev_mode is set to true. If it's a CLI flag, one wrong flag in production = open relay. Needs explicit startup warning and documentation, or removal. See #530.


🚩 Blocking: Command injection in scripts/finetune-b00t.py (#531)

os.system(f"... {merged_path} ...") where merged_path derives from B00T_OUTPUT_DIR env var. Shell metacharacters in the path = RCE. Use subprocess.run([...]) with list args throughout. See #531.
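A minimal sketch of the list-argument form, assuming nothing about the actual command in finetune-b00t.py: the helper `run_with_path` is hypothetical, and the child process here is just a Python one-liner that echoes its argument, standing in for the real conversion tool.

```python
import subprocess
import sys


def run_with_path(tool_argv: list, path: str) -> subprocess.CompletedProcess:
    """Run a tool with `path` as one literal argument; no shell parses it."""
    return subprocess.run(
        [*tool_argv, path],   # list form: each element is exactly one argv entry
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # A path that would run a second command under os.system(f"...").
    hostile = "/tmp/out; echo INJECTED"
    echo_tool = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    result = run_with_path(echo_tool, hostile)
    # The child receives the whole string, semicolon included, as a single argument.
    print(result.stdout.strip())
```

Because no shell is involved, metacharacters in B00T_OUTPUT_DIR are treated as ordinary path characters rather than command separators.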


⚠️ Should-fix: CLI key create invisible to running server (#533)

b00t server key create writes server-keys.json to disk, but the server loads that file once at boot and never re-reads it. New keys are silently ignored until restart. Fix with a POST /admin/reload-keys endpoint or a notify-based file watcher. See #533.
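One way to picture the reload behaviour is a key store that re-reads its file whenever the modification time changes. This Python sketch polls the mtime on lookup as a simple stand-in for a filesystem watcher; the class name `KeyStore` and the assumed file layout (a JSON object mapping token to consumer) are illustrative, not taken from the PR.

```python
import json
from pathlib import Path
from typing import Dict, Optional


class KeyStore:
    """In-memory key map that re-reads its file when the mtime changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None
        self._keys: Dict[str, str] = {}
        self.reload_if_changed()

    def reload_if_changed(self) -> bool:
        """Re-read the file if it changed; return True when a reload happened."""
        try:
            mtime_ns = self.path.stat().st_mtime_ns
        except FileNotFoundError:
            return False
        if mtime_ns == self._mtime_ns:
            return False
        # Parse fully before swapping, so a malformed file never replaces good state.
        new_keys = json.loads(self.path.read_text(encoding="utf-8"))
        self._keys, self._mtime_ns = dict(new_keys), mtime_ns
        return True

    def consumer_for(self, token: str) -> Optional[str]:
        """Look up a token, picking up on-disk changes first."""
        self.reload_if_changed()
        return self._keys.get(token)
```

A watcher-driven design avoids the per-request stat call, but the swap-after-parse step is the same either way.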


⚠️ DRY: Two training scripts duplicate the fine-tune/ pipeline (#532)

scripts/generate-b00t-training-data.py and scripts/finetune-b00t.py are parallel reimplementations of fine-tune/generate_dataset.py and fine-tune/train_unsloth.py respectively. Both have less capability (no argparse, no dedup, no Layer 2/3 in the dataset script; hardcoded model, inline GGUF export with injection risk in the finetune script).

The one genuinely novel piece in generate-b00t-training-data.py is Rust /// doc comment extraction; that logic should be merged into fine-tune/generate_dataset.py as Layer 4. The 0.8B sm0l finetune scenario is valid but belongs as fine-tune/config.sm0l.yaml, not a parallel script. See #532.


✅ What's good

  • Transparent proxy preserving upstream status codes and raw body: correct
  • Soul-config backend discovery order (env → TCP probe → remote → fallback) is clean
  • Per-consumer Spotlight telemetry with latency is the right observability shape
  • key list truncates to 12 chars: good hygiene
  • No hardcoded secrets in soul config; all via key_env references
  • b00tyverse submodule update is housekeeping

DeepSeek-specific note

The DeepSeek backend (api.deepseek.com, key via DEEPSEEK_API_KEY) is wired identically to OpenAI, which is correct since DeepSeek's API is OpenAI-compatible. No special handling needed. The soul config ordering matters: if DeepSeek is listed before OpenAI, it will be probed first for remote fallback. Ensure the ordering in server-soul.tomllm reflects your preference.


Verdict: HOLD on merge pending #529, #530, #531 (security). #532, #533 are strong-should-fix but could land in a follow-up if the security items are patched first.

@elasticdotventures

Owner Author

Review posted (see full comment thread). Issues filed: #529 #530 #531 #532 #533

…andard MCPs

- vendor/rust-docs-mcp-server: re-pin from dead 7f7d0b0a (upstream force-pushed
  away -> upload-pack 'not our ref') to live HEAD 4303cbdd; builds
  rustdocs_mcp_server v1.3.1
- _b00t_/codebase-memory.mcp.toml: remove duplicate [b00t].version key (literal
  0.8.1 collided with the detection command -> 'Failed to parse MCP config
  TOML'); point version command at the PATH-resolvable binary
- _b00t_/b00t-core-mcps.cli.toml: promote codebase-memory + rust-doc to the
  standard MCP_SERVERS set with a binary-backed build gate (b00t cli install
  before b00t mcp install) in both install and update scripts
- opencode.json: register both MCPs for the project runtime
…ht telemetry

b00t-mcp/src/server_llm.rs (NEW): transparent proxy with /v1/models,
  /v1/chat/completions, /v1/embeddings. Validates b00t-issued keys,
  forwards to upstream (env: B00T_SERVER_UPSTREAM_URL/KEY, default
  api.openai.com). Emits Spotlight usage events to ~/.b00t/spotlight.jsonl.
  Key persistence via shared ~/.b00t/server-keys.json.

b00t-mcp/src/main.rs: --llm flag starts HTTP mode + mounts llm_router.
  Dev mode (ACL bypass_oauth) skips key validation.

b00t-cli/src/commands/server.rs (NEW):
  - 'b00t server start' → spawns b00t-mcp --http --llm
  - 'b00t server key create --consumer X' → generates b00t-sk-* key
  - 'b00t server key list' → reads registered keys

b00tyverse: symlink b00tyverse → vendor (b00t's forked-utility ecosystem)

Plan: _b00t_/plans/b00t-server-architecture.tomllmd
…profile env keys

- discover_local_backend(): TCP connect probes ports 8181 (mistral.rs), 8080
  (llama.cpp), 8000/8001 (vLLM); no HTTP runtime needed (safe inside async)
- resolve_upstream(): priority chain: B00T_SERVER_UPSTREAM_URL (explicit) >
  local auto-discovery > remote keys (OPENAI_API_KEY, B00T_AI_CH0NKY_KEY,
  B00T_AI_FRONTIER_KEY, OPENROUTER_API_KEY)
- Empty env vars treated as unset (no false-positives)
- LlmState::new() auto-discovers; LlmState::from_config(url,key) for tests
- Startup logs the resolved upstream (or warns if nothing configured)
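The probe and the empty-variable rule described in this commit can be sketched in Python as below. The actual code is Rust; the function `env_nonempty`, the timeout value, and the `DEFAULT_CANDIDATES` list layout are assumptions, while the ports and backend names come from the commit message.

```python
import os
import socket
from typing import Iterable, Optional, Tuple

# Ports and backend names as listed in the commit message.
DEFAULT_CANDIDATES = [
    ("mistral.rs", 8181),
    ("llama.cpp", 8080),
    ("vLLM", 8000),
    ("vLLM", 8001),
]


def env_nonempty(name: str) -> Optional[str]:
    """Return the variable's value, treating empty or whitespace-only as unset."""
    value = os.environ.get(name, "").strip()
    return value or None


def discover_local_backend(
    candidates: Iterable[Tuple[str, int]] = DEFAULT_CANDIDATES,
    host: str = "127.0.0.1",
    timeout: float = 0.2,  # hypothetical value; keeps startup fast when nothing listens
) -> Optional[Tuple[str, int]]:
    """Return the first (name, port) that accepts a TCP connection, else None."""
    for name, port in candidates:
        try:
            # A bare TCP connect: no HTTP request is sent to the backend.
            with socket.create_connection((host, port), timeout=timeout):
                return name, port
        except OSError:
            continue
    return None
```

A TCP connect only shows that something is listening on the port, not that it speaks the OpenAI-compatible API, so a follow-up request to /v1/models is still a reasonable health check.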
…ackends

Replace the hardcoded LOCAL_BACKENDS const with a runtime .tomllm config:
~/.b00t/server-soul.tomllm (local, uncommitted).

SoulConfig struct (Serialize+Deserialize via toml/serde):
- soul.hostname: auto-detected, soul.blessings: consumer tags
- backends.local: [{name, port, kind, enabled}], TCP probed on startup
- backends.remote: [{name, key_env, base_url?}], env var checked in order

Auto-seeds a default config on first run; operator edits to add/remove
backends. No hardcoded IPs or server names in the binary.
Multi-profile support: each remote backend declares its own key_env.

Priority chain: B00T_SERVER_UPSTREAM_URL > soul/local > soul/remote.
Empty env vars treated as unset.
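The priority chain above can be modelled as a small pure function. This is a Python sketch under stated assumptions, not the Rust SoulConfig: the dataclass fields mirror the names in the commit message, but the `is_listening` callback, the loopback URL format, and the `remote:<name>` fallback label are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional


@dataclass
class LocalBackend:
    name: str
    port: int
    kind: str
    enabled: bool = True


@dataclass
class RemoteBackend:
    name: str
    key_env: str
    base_url: Optional[str] = None


def resolve_upstream(
    env: Mapping[str, str],
    local: List[LocalBackend],
    remote: List[RemoteBackend],
    is_listening: Callable[[int], bool],
) -> Optional[str]:
    """Apply the chain: explicit URL > first live local > first remote with a key."""
    explicit = env.get("B00T_SERVER_UPSTREAM_URL", "").strip()
    if explicit:  # an empty value counts as unset
        return explicit
    for backend in local:
        if backend.enabled and is_listening(backend.port):
            return f"http://127.0.0.1:{backend.port}"
    for backend in remote:  # list order is the operator's preference order
        if env.get(backend.key_env, "").strip():
            # Hypothetical label when base_url is omitted from the config.
            return backend.base_url or f"remote:{backend.name}"
    return None
```

Passing `env` and `is_listening` as parameters keeps the resolution logic testable without touching real environment variables or sockets.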
- Qwen2.5-0.5B GGUF (630M params, 485MB) via prebuilt llama-server binary
- debian:bookworm-slim base (~80MB), no Python/pip build deps
- OpenAI-compat API at port 8080 (/v1/models, /v1/chat/completions)
- Verified end-to-end: b00t-server proxy → container → inference
  'Hello!' response at 93ms latency, logged to Spotlight
…a.cpp:server

- Deterministic discovery: podman search + GitHub API release lookup
- Qwen3.5-4B-Q4_K_M served through b00t-server proxy chain
- 4.2B params, 2.7GB, 256K context, token prediction: ~11 tok/s CPU
- chat template: Qwen3.5 reasoning mode (reasoning_content)
…→ GGUF

- scripts/generate-b00t-training-data.py: extracts 4474 instruction pairs
  from AGENTS.md, Rust doc comments, datum tomls, skills, justfile
- scripts/finetune-b00t.py: unsloth QLoRA (4-bit + LoRA r=16) on
  Qwen3.5-4B, ChatML format, SFTTrainer, merge + GGUF export
- _b00t_/b00t-finetune.cli.toml: deterministic command datum covering
  generate → install deps → finetune → serve, with HF lookup cmds
Training completed: 200 steps, loss 3.26→2.09, 6.4M trainable params.
LoRA saved: ~/.b00t/training/output/b00t-lora/
FP16 merged: ~/.b00t/training/output/b00t-merged-fp16/
GGUF: ~/.b00t/training/output/b00t-finetuned.Q4_K_M.gguf (504MB)

⚠️ GGUF conversion dependency: unsloth's fork produces GGUF format
incompatible with stock ghcr.io/ggml-org/llama.cpp:server. Fix: build
llama-quantize from llama.cpp master, or use unsloth's bundled server.
_b00t_/b00t-server-soul.tomllm: canonical state covering architecture, services,
  environment, deterministic commands, sharp corners, sm3lly notes.

_b00t_/datums/b00t-exchange-protocols.learn.tomllm: how two b00t agents
  connect and share compute: discovery, key exchange, backend sharing,
  finetune delegation, Spotlight sync, soul mirroring.

justfile: connect-sm3lly, share-training, delegate-finetune,
  sync-spotlight, mirror-soul recipes.
… subprocess.run()

- #529: Add require_auth() guard returning 401 on missing/invalid tokens
  in list_models, proxy_chat, proxy_embeddings handlers
- #530: Remove dev_mode bypass entirely; dev iteration uses real keys
  via 'b00t server key create --consumer dev'
- #531: Replace os.system(f-string) with subprocess.run(list) in
  finetune-b00t.py to prevent shell injection via B00T_OUTPUT_DIR

Closes #529, #530, #531
- Add notify crate dependency for filesystem watching
- Spawn watcher on ~/.b00t/ directory in LlmState::from_config()
- On keys file change: re-read and swap in-memory HashMap
- Add reload_keys() method for manual reload
- Add test_reload_keys test with isolated temp state
- Unify dirs_next() to use dirs::home_dir() for path consistency

CLI 'b00t server key create' now takes effect immediately without
server restart. File watcher triggers on Create/Modify events in the
keys file parent directory.

Closes #533
@elasticdotventures elasticdotventures force-pushed the task/sm3lly-b00t-server branch from b3d4903 to 548116c June 26, 2026 11:32
@elasticdotventures elasticdotventures merged commit 433afbe into main Jun 26, 2026
11 of 13 checks passed
@elasticdotventures elasticdotventures deleted the task/sm3lly-b00t-server branch June 26, 2026 11:33