DataSwift is a hint-recommendation tool that speeds up query workloads while remaining safe and easy to deploy.
Traditional learned query optimizers often suffer from unpredictable slowdowns on unseen or rare query patterns. DataSwift addresses this by integrating:
- LLM‐derived SQL embeddings and GNN‐based plan encodings for rich query representations
- Inductive Matrix Completion with uncertainty estimation to predict hint performance
- Embedding‐Indexed Memory (EIM) to recall proven hints for similar past queries
- Thompson‐sampling bandit for balanced exploration/exploitation and safe fallback to the default optimizer

Highlights:

- Zero Catastrophic Regressions: Only 0.7% of queries ever slow down, and none exceed catastrophic thresholds.
- Tail‐Latency Speedups: Achieves a 1.4× speedup on the slowest 5% of queries and a 1.1× end‐to‐end workload improvement.
- Safe Hint Recommendation: Combines IMC’s calibrated predictions with a memory cache and bandit selector to ensure stability.
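
The memory recall described in the list above can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats and uses a brute-force L2 scan in place of a real vector index such as Faiss; the class name `EmbeddingIndexedMemory`, the `radius` cutoff, and the hint strings are hypothetical and not taken from DataSwift's code.

```python
import math


class EmbeddingIndexedMemory:
    """Toy memory of (embedding, best_hint) pairs (illustrative, not DataSwift's class)."""

    def __init__(self, radius):
        self.radius = radius  # assumed knob: max L2 distance that counts as "similar"
        self.entries = []     # list of (embedding, hint) pairs

    def add(self, embedding, hint):
        """Remember the best-known hint for a query embedding."""
        self.entries.append((list(embedding), hint))

    def lookup(self, embedding, k=3):
        """Return hints of up to k stored queries within `radius`, nearest first."""
        scored = sorted(
            ((math.dist(stored, embedding), hint) for stored, hint in self.entries),
            key=lambda pair: pair[0],
        )
        return [hint for dist, hint in scored[:k] if dist <= self.radius]


# Usage with made-up 2-dim embeddings and hypothetical hint names.
eim = EmbeddingIndexedMemory(radius=0.5)
eim.add([0.0, 0.0], "hash_join_off")
eim.add([1.0, 1.0], "index_scan_off")
print(eim.lookup([0.1, 0.0]))  # -> ['hash_join_off']
```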
Full details are in the Extended Abstract.
- Query Embedding
  - SQL text → SentenceTransformer → 120‐dim vector
  - Plan DAG → GNN → 512‐dim structural vector
- IMC Predictor
  - Concatenate embeddings → low‐rank IMC → mean latency (μ) + uncertainty (σ)
- Embedding‐Indexed Memory (EIM)
  - Faiss L2 index of past (embedding, best‐hint) → retrieve safe hints
- Bandit Selector
  - Arms: IMC suggestion, any EIM hints, default plan
  - Thompson sampling to choose hint → execute → update bandit & EIM
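
The choose, execute, update loop of the bandit selector might look roughly like the sketch below. It assumes each arm holds a Gaussian belief over latency, seeded from a predicted mean and uncertainty, and that lower latency is better. `GaussianArm`, `choose`, the pseudo-count update, and the example numbers are illustrative assumptions, not DataSwift's actual implementation.

```python
import math
import random


class GaussianArm:
    """Latency belief for one candidate: a hint set or the default plan."""

    def __init__(self, name, mu, sigma):
        self.name = name
        self.mu = mu        # current mean latency estimate (ms)
        self.sigma = sigma  # current uncertainty (ms)
        self.n = 1          # pseudo-count: the initial belief counts as one observation

    def sample(self, rng):
        """Draw one plausible latency from the current belief."""
        return rng.gauss(self.mu, self.sigma)

    def update(self, latency_ms):
        """Fold in an observed latency; uncertainty shrinks as 1/sqrt(n)."""
        base_sigma = self.sigma * math.sqrt(self.n)
        self.n += 1
        self.mu += (latency_ms - self.mu) / self.n
        self.sigma = base_sigma / math.sqrt(self.n)


def choose(arms, rng):
    """Thompson sampling: draw one latency per arm and pick the lowest draw."""
    return min(arms, key=lambda arm: arm.sample(rng))


# Usage: the default plan is well understood, the IMC suggestion is uncertain.
rng = random.Random(0)
arms = [
    GaussianArm("default", mu=100.0, sigma=5.0),
    GaussianArm("imc_hint", mu=80.0, sigma=30.0),
]
picked = choose(arms, rng)
picked.update(90.0)  # pretend the executed query took 90 ms
```

Giving the default plan a tight uncertainty means an uncertain hint is tried only when its sampled latency beats a well-known baseline, which is one simple way to approximate the fallback behaviour.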