# miniLLM — Text-to-SQL Fine-tune + Infra Kickstart

## Overview

This repository kickstarts a practical "Algorithm + Infra" project:
- Fine-tune a 7B–8B instruct model for Text-to-SQL with QLoRA
- Provide a baseline FastAPI inference service with model caching + adapter loading
- Include structured evaluation (Exact Match + optional execution match)
- Prepare for later optimization with vLLM
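The two evaluation modes above can be sketched in a few lines (a minimal illustration only; the repo's actual evaluator in `scripts/eval.sh` may normalize and compare queries differently):

```python
import re
import sqlite3


def normalize_sql(sql: str) -> str:
    """Crude normalization: trim, drop a trailing semicolon, collapse whitespace, lowercase."""
    return re.sub(r"\s+", " ", sql.strip().rstrip(";")).lower()


def exact_match(pred: str, gold: str) -> bool:
    """Exact Match after normalization."""
    return normalize_sql(pred) == normalize_sql(gold)


def execution_match(pred: str, gold: str, schema: str, seed_rows: tuple = ()) -> bool:
    """Execution match: run both queries against an in-memory SQLite DB built from
    the schema (plus optional INSERT statements) and compare result sets
    order-insensitively. Returns False if either query fails to execute."""
    def run(sql: str):
        con = sqlite3.connect(":memory:")
        try:
            con.executescript(schema)
            for stmt in seed_rows:
                con.execute(stmt)
            return sorted(map(tuple, con.execute(sql).fetchall()))
        finally:
            con.close()

    try:
        return run(pred) == run(gold)
    except sqlite3.Error:
        return False
```

Execution match is the more forgiving metric: two syntactically different queries that return the same rows still count as equivalent.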
## Hardware & Software
- GPU: 24GB VRAM recommended
- OS: Linux
- Python: 3.10+
- CUDA: 12.1+ (for GPU inference/training)
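A quick preflight check against these requirements (a sketch, not part of the repo; the thresholds are the recommendations above, and GPU detection simply shells out to `nvidia-smi`):

```python
import shutil
import subprocess
import sys


def preflight() -> list:
    """Return a list of problem descriptions; an empty list means the environment looks OK."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; no visible NVIDIA GPU (CPU-only mode)")
    else:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout
        for line in out.splitlines():
            mib = line.strip()
            if mib.isdigit() and int(mib) < 24_000:
                problems.append(f"GPU reports {mib} MiB VRAM; 24 GB recommended")
    return problems


if __name__ == "__main__":
    for p in preflight():
        print("WARN:", p)
```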
## Quickstart

- Manage env and deps with uv (preferred)

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  uv sync
  ```

- Train with QLoRA (SFT)
  ```bash
  # Default uses Qwen/Qwen2.5-7B-Instruct and b-mc2/sql-create-context
  bash scripts/train.sh
  ```

- Evaluate on 20 samples (and save a JSON report)
  ```bash
  bash scripts/eval.sh
  # optional execution-based metric:
  WITH_EXECUTION=1 bash scripts/eval.sh
  ```

- Serve a baseline API (FastAPI + Transformers, with in-process model cache)
  ```bash
  bash scripts/serve.sh
  # POST http://localhost:8000/generate_sql with JSON:
  # {"schema": "CREATE TABLE ...", "question": "...", "adapter_path": "outputs/sft-qwen2.5-7b-instruct-sql"}
  ```
## Dataset (recommended start)

- b-mc2/sql-create-context — schema-in-context Text-to-SQL dataset
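Each row in this dataset carries the schema alongside the question — to the best of my knowledge the fields are `question`, `context` (the CREATE TABLE statements), and `answer` (the gold SQL); verify against the dataset card. A minimal prompt-formatting sketch for SFT (the prompt template is illustrative, not the one the repo uses):

```python
def format_example(row: dict) -> dict:
    """Turn one dataset row into a prompt/completion pair for SFT.
    Assumed fields: 'context' (schema), 'question', 'answer' (gold SQL)."""
    prompt = (
        "Given the following database schema, write a SQL query that answers the question.\n\n"
        f"Schema:\n{row['context']}\n\n"
        f"Question: {row['question']}\n\nSQL:"
    )
    return {"prompt": prompt, "completion": " " + row["answer"].strip()}
```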
## Notes
- Default base model is `Qwen/Qwen2.5-7B-Instruct` to avoid gated access; you can switch to Meta Llama 3 8B Instruct by passing `--model-name-or-path`.
- If running GPU training, ensure the correct CUDA-enabled torch is installed. See: https://download.pytorch.org/whl/
- You can still use pip: `bash scripts/install.sh`
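For reference, a QLoRA setup along the lines the Overview describes usually pairs a 4-bit NF4-quantized frozen base model with trainable LoRA adapters. This is a config sketch with assumed hyperparameters (`r`, `lora_alpha`, target module names) and may not match what `scripts/train.sh` actually configures:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Trainable low-rank adapters on the attention projections (module names assumed)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Typical usage (not executed here):
#   model = AutoModelForCausalLM.from_pretrained(
#       "Qwen/Qwen2.5-7B-Instruct", quantization_config=bnb_config, device_map="auto")
#   model = get_peft_model(model, lora_config)
```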