-
Notifications
You must be signed in to change notification settings - Fork 467
Description
Summary
When joining vector_top_k() with a regular table using JOIN ... ON, the SQLite query planner inverts the join order when the regular table has WHERE predicates. This causes the virtual table xFilter to be called once per row of the regular table instead of once total, resulting in catastrophic performance degradation.
Reproduction
Database: ~12,000 entries with 512-dim float32 embeddings, WAL mode.
-- Index definition
CREATE INDEX idx_entries_embedding ON entries (
libsql_vector_idx(embedding, 'metric=cosine', 'compress_neighbors=float8', 'max_neighbors=50')
);
-- FAST (10ms): vector_top_k alone
SELECT id FROM vector_top_k('idx_entries_embedding', vector32(?), 50);
-- FAST (10ms): JOIN without WHERE clause on entries
SELECT e.id FROM vector_top_k('idx_entries_embedding', vector32(?), 50) AS v
JOIN entries AS e ON e.rowid = v.id;
-- SLOW (26,000ms, 100% CPU): JOIN with WHERE clause on entries
SELECT e.id FROM vector_top_k('idx_entries_embedding', vector32(?), 50) AS v
JOIN entries AS e ON e.rowid = v.id
WHERE e.superseded_by IS NULL;
-- FAST (6ms): CROSS JOIN forces correct order
SELECT e.id FROM vector_top_k('idx_entries_embedding', vector32(?), 50) AS v
CROSS JOIN entries AS e ON e.rowid = v.id
WHERE e.superseded_by IS NULL;
-- FAST (11ms): Subquery avoids join entirely
SELECT * FROM entries
WHERE rowid IN (SELECT id FROM vector_top_k('idx_entries_embedding', vector32(?), 50))
AND superseded_by IS NULL;Analysis
When the entries table has a WHERE predicate (superseded_by IS NULL), the query planner decides to scan entries first and probe vector_top_k in the inner loop. With ~11,500 qualifying rows, this means ~11,500 ANN searches instead of 50 rowid lookups.
V8 profiler confirms 99.9% of CPU time is spent in the libsql native module executing repeated vector searches.
The root cause appears to be that vector_top_k's xBestIndex does not report a high enough cost to prevent the planner from placing it in the inner loop when there are competing predicates on the joined table.
Workarounds
- Use
CROSS JOINinstead ofJOIN(forces left-to-right evaluation) - Use
WHERE rowid IN (SELECT id FROM vector_top_k(...))instead of JOIN - Split into two application-level queries
Environment
- libsql native: 0.5.22
- @libsql/client: 0.17.0
- macOS arm64 (Apple M2)
- Node.js v25.5.0
Notes
This is the same class of issue documented for FTS5 virtual tables on the SQLite forum:
- https://sqlite.org/forum/info/509bdbe534f58f20 (FTS5 JOIN: 170s vs subquery: 439ms)
- https://sqlite.org/forum/info/81e6aa8343402ba5 (virtual table covering index limitations)
The Turso DiskANN blog post (https://turso.tech/blog/approximate-nearest-neighbor-search-with-diskann-in-libsql) shows the WHERE rowid IN pattern first, suggesting awareness of this issue.