Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 684805a333
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sia_code/storage/usearch_backend.py
Outdated
    INSERT OR REPLACE INTO chunks (
        uri,
        symbol,
        chunk_type,
        file_path,
Purge old vectors when replacing chunk rows
Using INSERT OR REPLACE for chunks creates a new row id whenever an existing uri is reindexed, which deletes the old row and inserts a fresh one. The vector index is only ever appended to later in store_chunks_batch and there is no corresponding removal path for the old row id, so reindexing the same file leaves orphaned vectors in the HNSW index. When embeddings are enabled, semantic search can then return IDs that no longer exist in SQLite (they get filtered out), reducing recall and steadily bloating the vector index over time. Consider deleting the old vector entry before replace or switching to a deterministic row id/upsert that preserves IDs.
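The deterministic-rowid alternative the reviewer suggests can be sketched as follows. This is a minimal illustration, not the PR's actual code: the table schema and function name are assumptions. The key point is that an ON CONFLICT upsert updates the existing row in place, so the rowid that keys the HNSW vector entry never changes when the same uri is reindexed, whereas INSERT OR REPLACE deletes the old row and allocates a fresh rowid, orphaning the old vector.

```python
import sqlite3

# Hypothetical minimal schema: uri uniquely identifies a chunk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (uri TEXT PRIMARY KEY, symbol TEXT)")

def upsert_chunk(uri: str, symbol: str) -> int:
    """Insert or update a chunk row, returning its (stable) rowid.

    ON CONFLICT ... DO UPDATE edits the existing row in place, so the
    rowid survives reindexing; INSERT OR REPLACE would delete + reinsert
    and hand back a new rowid, stranding the old id in the vector index.
    """
    conn.execute(
        "INSERT INTO chunks (uri, symbol) VALUES (?, ?) "
        "ON CONFLICT(uri) DO UPDATE SET symbol = excluded.symbol",
        (uri, symbol),
    )
    (rowid,) = conn.execute(
        "SELECT rowid FROM chunks WHERE uri = ?", (uri,)
    ).fetchone()
    return rowid

first = upsert_chunk("src/app.py#main", "main")
second = upsert_chunk("src/app.py#main", "main_v2")  # reindex same uri
assert first == second  # rowid unchanged, so no orphaned vector id
```

With stable rowids, re-adding the vector under the same id can overwrite (or be preceded by a remove of) the old entry instead of appending a duplicate.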
Force-pushed from 684805a to 865e368.
Ensure sqlite-vec/usearch store and import commit_hash/commit_time consistently, apply schema migrations on open for writable indexes, and extend tests to cover parity and legacy schema.
Auto-detect linked git worktrees and store the index in the git common dir by default, while keeping per-repo for normal checkouts. Adds unit tests and updates docs.
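The worktree auto-detection described above can be sketched roughly like this. The function name, index directory name, and return convention are illustrative assumptions, not the PR's actual API; the underlying git layout fact is real: a linked worktree's .git is a pointer *file* into the main repository's .git/worktrees/<name>, while a normal checkout has a .git *directory*.

```python
from pathlib import Path

def resolve_index_dir(repo_path: str) -> Path:
    """Pick an index location shared across linked worktrees (sketch).

    Linked worktree: .git is a file "gitdir: <main>/.git/worktrees/<name>",
    so the shared common dir is two levels above that path.
    Normal checkout: .git is a directory; keep the index per-repo there.
    """
    dot_git = Path(repo_path) / ".git"
    if dot_git.is_file():  # linked worktree: .git is a pointer file
        gitdir = dot_git.read_text().strip().removeprefix("gitdir:").strip()
        worktree_git = Path(gitdir)
        if not worktree_git.is_absolute():
            worktree_git = (dot_git.parent / worktree_git).resolve()
        if worktree_git.parent.name == "worktrees":
            # .../.git/worktrees/<name> -> shared common dir is .../.git
            return worktree_git.parent.parent / "sia_index"
    return dot_git / "sia_index"  # normal checkout: per-repo index
```

A more robust variant would shell out to `git rev-parse --git-common-dir`, which handles bare repos and future layout changes; the pure-filesystem check above just avoids a subprocess.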
Motivation
Description
- Reworked UsearchSqliteBackend.store_chunks_batch with an INSERT OR REPLACE upsert to avoid UNIQUE constraint errors and DB corruption during reindexing (plus schema/column safety checks in _create_tables).
- Added get_git_commit_context in cli and _get_git_commit_context in indexer.coordinator, attaching commit_hash/commit_time to chunks and memory writes so GitSyncService and backend storage receive and persist them.
- Extended the data model to carry commit_hash and commit_time (changes in core.models, storage.base, and usearch_backend), added DB columns for these fields, and propagated them through memory import/export and sync flows.
- Fixed test setup (setting PYTHONPATH for subprocess runs) and converted memvid-specific test code to use the unified backend API (updates in tests/test_basic.py, tests/test_cli_integration.py, tests/test_empty_code_fix.py, tests/test_usearch_backend.py, and tests/unit/test_git_sync.py).
Testing
- Ran PYTHONPATH=. pytest, which completed successfully with 202 passed, 235 skipped and no failures.
- Ran tests/integration/test_watch_mode.py::TestWatchModeIndexing::test_chunk_index_tracks_stale_chunks, which passed after the upsert fix.
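One plausible shape for the get_git_commit_context helper named in the description is sketched below. The dataclass fields, format string, and error handling are assumptions for illustration, not the PR's actual implementation: read HEAD's hash and commit timestamp once per indexing run, then stamp them onto every chunk before storage.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommitContext:
    commit_hash: str
    commit_time: int  # unix timestamp of the HEAD commit

def get_git_commit_context(repo_path: str) -> Optional[CommitContext]:
    """Read HEAD's hash and commit time so chunks can be stamped with them."""
    try:
        out = subprocess.run(
            ["git", "log", "-1", "--format=%H %ct"],
            cwd=repo_path, capture_output=True, text=True, check=True,
        ).stdout.split()
        return CommitContext(commit_hash=out[0], commit_time=int(out[1]))
    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
        return None  # git missing, not a repo, or no commits yet
```

Fetching the context once per run, rather than per chunk, keeps the git subprocess cost constant regardless of how many files are reindexed.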
PYTHONPATH=. pytestwhich completed successfully with202 passed, 235 skippedand no failures.tests/integration/test_watch_mode.py::TestWatchModeIndexing::test_chunk_index_tracks_stale_chunkswhich passed after the upsert fix.Codex Task