Stabilize chunk upserts, attach git commit context to memory, and stabilize tests by DxTa · Pull Request #10 · DxTa/sia-code

DxTa · 2026-02-06T22:20:15Z

Motivation

Prevent SQLite corruption and inconsistent indexing during re-indexing by making chunk upserts idempotent.
Preserve git provenance (commit hash and commit time) for memory items (decisions, timeline events, changelogs) to improve traceability and auditing.
Make the test suite stable and offline-friendly by disabling embeddings in CI flows and aligning tests with the current backend API.

Description

Replace duplicate-chunk handling in UsearchSqliteBackend.store_chunks_batch with an INSERT OR REPLACE upsert to avoid UNIQUE constraint errors and DB corruption during reindexing (plus schema/column safety checks in _create_tables).
Thread git provenance through the stack by adding get_git_commit_context in cli, _get_git_commit_context in indexer.coordinator, and attaching commit_hash/commit_time to chunks and memory writes so GitSyncService and backend storage receive and persist them.
Extend models and storage APIs to include commit_hash and commit_time (changes in core.models, storage.base, and usearch_backend), add DB columns for these fields, and propagate them through memory import/export and sync flows.
Stabilize tests by updating imports to the current backend, disabling embeddings for CLI/integration tests and setting PYTHONPATH for their subprocess runs, and converting memvid-specific test code to use the unified backend API (updates in tests/test_basic.py, tests/test_cli_integration.py, tests/test_empty_code_fix.py, tests/test_usearch_backend.py, and tests/unit/test_git_sync.py).
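The upsert change in the first Description item can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch against a hypothetical three-column chunks table, not the actual UsearchSqliteBackend.store_chunks_batch code: only the UNIQUE uri constraint is modeled, and the function names are made up for illustration.

```python
import sqlite3

# Hypothetical, minimal schema: the real chunks table has more columns
# (symbol, chunk_type, file_path, ...); only the UNIQUE uri matters here.
SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uri TEXT NOT NULL UNIQUE,
    content TEXT
)
"""


def store_chunks_plain(conn, chunks):
    """Plain INSERT: storing an already-present uri violates the UNIQUE constraint."""
    conn.executemany("INSERT INTO chunks (uri, content) VALUES (?, ?)", chunks)


def store_chunks_upsert(conn, chunks):
    """INSERT OR REPLACE: an existing row with the same uri is replaced."""
    conn.executemany(
        "INSERT OR REPLACE INTO chunks (uri, content) VALUES (?, ?)", chunks
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
batch = [("src/a.py#foo", "def foo(): ..."), ("src/a.py#bar", "def bar(): ...")]

store_chunks_upsert(conn, batch)
store_chunks_upsert(conn, batch)  # simulated reindex of the same file
print(conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])  # prints 2

try:
    store_chunks_plain(conn, batch)  # the pre-fix behavior on reindex
except sqlite3.IntegrityError as exc:
    print("plain INSERT failed:", exc)
```

Running the upsert twice leaves one row per uri, which is what makes reindexing idempotent at the SQLite level.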
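For the provenance item, one plausible shape for a commit-context helper is a thin wrapper around git log. The name read_head_commit_context, its return type, and the choice of committer date (%cI, strict ISO 8601) are assumptions; the PR does not show how get_git_commit_context or _get_git_commit_context are implemented.

```python
import subprocess
from pathlib import Path
from typing import Optional, TypedDict


class CommitContext(TypedDict):
    commit_hash: Optional[str]
    commit_time: Optional[str]


def read_head_commit_context(repo: Path) -> CommitContext:
    """Return HEAD's full hash and committer time, or None for both.

    Hypothetical helper: any failure (not a repo, no commits yet, git not
    installed, missing directory) degrades to None instead of raising, so
    indexing can still proceed without git metadata.
    """
    try:
        out = subprocess.run(
            ["git", "log", "-1", "--format=%H%n%cI"],
            cwd=repo,
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return {"commit_hash": None, "commit_time": None}
    lines = out.splitlines()
    if len(lines) < 2:
        return {"commit_hash": None, "commit_time": None}
    return {"commit_hash": lines[0], "commit_time": lines[1]}


if __name__ == "__main__":
    # Merge the context into whatever metadata a chunk or memory write carries.
    print({"file_path": "src/a.py", **read_head_commit_context(Path.cwd())})
```

Returning Nones rather than raising keeps provenance optional, which matters for indexes built outside a git checkout.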
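Carrying commit_hash and commit_time through memory import/export mostly means treating them as optional fields, so records exported before this change still load. A sketch with a hypothetical MemoryItem dataclass (not the actual core.models class):

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class MemoryItem:
    """Hypothetical stand-in for a memory record (decision, timeline event, changelog)."""

    kind: str
    text: str
    commit_hash: Optional[str] = None  # git provenance; absent in legacy data
    commit_time: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        """Export, including the commit fields even when they are None."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MemoryItem":
        """Import, keeping only known keys.

        Exports written before commit_hash/commit_time existed simply fall
        back to the None defaults.
        """
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

Defaults of None let old and new exports share one loader instead of needing a versioned import path.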
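The test item mentions setting PYTHONPATH for subprocess runs. Below is a sketch of a helper a test could use to build the child environment. subprocess_env and run_python are hypothetical names, and the PR does not say how embeddings are disabled, so the `extra` mapping is only a placeholder for whatever setting does that.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, Mapping, Optional


def subprocess_env(
    repo_root: Path, extra: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy os.environ and put repo_root first on PYTHONPATH.

    Any PYTHONPATH the parent already had is kept after repo_root, so the child
    can import the package straight from the checkout without installing it.
    `extra` is a placeholder for test-only settings.
    """
    env = dict(os.environ)
    previous = env.get("PYTHONPATH")
    env["PYTHONPATH"] = str(repo_root) + (os.pathsep + previous if previous else "")
    if extra:
        env.update(extra)
    return env


def run_python(repo_root: Path, code: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process with the test environment."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=subprocess_env(repo_root),
        capture_output=True,
        text=True,
        check=True,
    )
```

Building the environment explicitly keeps subprocess tests from silently picking up an installed copy of the package instead of the working tree.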

Testing

Ran the full test suite with PYTHONPATH=. pytest, which completed successfully with 202 passed, 235 skipped, and no failures.
Re-ran the previously failing integration case tests/integration/test_watch_mode.py::TestWatchModeIndexing::test_chunk_index_tracks_stale_chunks, which passed after the upsert fix.
Verified targeted backend behaviors (chunk upserts and incremental indexing) with small repro scripts during debugging; all showed the expected outcomes.

Codex Task

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 684805a333

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-06T22:22:32Z

sia_code/storage/usearch_backend.py

+                INSERT OR REPLACE INTO chunks (
+                    uri,
+                    symbol,
+                    chunk_type,
+                    file_path,


Purge old vectors when replacing chunk rows

Using INSERT OR REPLACE for chunks deletes the existing row and inserts a fresh one whenever an existing uri is reindexed, so the chunk gets a new row id. The vector index is only ever appended to later in store_chunks_batch, and there is no corresponding removal path for the old row id, so reindexing the same file leaves orphaned vectors in the HNSW index. When embeddings are enabled, semantic search can then return IDs that no longer exist in SQLite (they get filtered out), reducing recall and steadily bloating the vector index over time. Consider deleting the old vector entry before the replace, or switching to a deterministic row id/upsert that preserves IDs.
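The row-id problem in this review comment can be reproduced with sqlite3 alone. The sketch uses a plain dict as a stand-in for the vector index (an assumption for illustration; it is not the usearch API) and compares INSERT OR REPLACE with SQLite's ON CONFLICT ... DO UPDATE upsert, which is one way to get the id-preserving upsert the review suggests.

```python
import sqlite3

# Hypothetical, minimal chunks table; AUTOINCREMENT means row ids are never reused.
SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uri TEXT NOT NULL UNIQUE,
    content TEXT
)
"""

# Deletes the conflicting row and inserts a new one, so the row id changes.
REPLACE_SQL = "INSERT OR REPLACE INTO chunks (uri, content) VALUES (?, ?)"

# Updates the conflicting row in place (SQLite 3.24+ upsert), so the row id is kept.
UPSERT_SQL = (
    "INSERT INTO chunks (uri, content) VALUES (?, ?) "
    "ON CONFLICT(uri) DO UPDATE SET content = excluded.content"
)


def store(conn, sql, vectors, uri, content):
    """Write a chunk, then add its vector under the current row id.

    `vectors` is a dict standing in for the HNSW index; like the append-only
    path the review describes, nothing here removes an old key.
    """
    conn.execute(sql, (uri, content))
    (row_id,) = conn.execute("SELECT id FROM chunks WHERE uri = ?", (uri,)).fetchone()
    vectors[row_id] = [0.0, 1.0]  # placeholder embedding
    return row_id


def orphaned(conn, vectors):
    """Vector keys whose row id no longer exists in SQLite."""
    live = {row[0] for row in conn.execute("SELECT id FROM chunks")}
    return sorted(set(vectors) - live)


for label, sql in (("replace", REPLACE_SQL), ("upsert", UPSERT_SQL)):
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    vectors = {}
    first = store(conn, sql, vectors, "src/a.py#foo", "v1")
    second = store(conn, sql, vectors, "src/a.py#foo", "v2")  # reindex
    # replace: ids differ and the first vector is orphaned; upsert: same id, no orphans
    print(label, first == second, orphaned(conn, vectors))
```

Removing the old vector key before the replace would also clear the orphan but keeps the id churn; the in-place upsert avoids both, at the cost of requiring SQLite 3.24 or newer.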


DxTa added the codex label Feb 6, 2026 — with ChatGPT Codex Connector

chatgpt-codex-connector bot reviewed Feb 6, 2026


Stabilize tests and chunk upserts

865e368

DxTa force-pushed the codex/find-multiple-git-worktrees-support branch from 684805a to 865e368 on February 7, 2026 14:18

DxTa added 2 commits February 7, 2026 18:42

fix: preserve commit context across backends

7093a79

Ensure sqlite-vec/usearch store and import commit_hash/commit_time consistently, apply schema migrations on open for writable indexes, and extend tests to cover parity and legacy schema.
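Applying schema migrations on open for a legacy index can be as simple as adding the missing columns. A minimal sketch, assuming the commit columns are nullable TEXT and that read-only opens must not write; the table name and function names are hypothetical, not the backends' actual code.

```python
import sqlite3

# Hypothetical column definitions for the provenance fields.
COMMIT_COLUMNS = {"commit_hash": "TEXT", "commit_time": "TEXT"}


def existing_columns(conn, table):
    """Column names of `table`; PRAGMA table_info rows are (cid, name, type, ...)."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def migrate_commit_columns(conn, table, readonly=False):
    """Add any missing commit columns; return the names that were added.

    `table` is interpolated into SQL, so it must be a trusted constant.
    Read-only opens are left untouched. Existing rows get NULL in new columns.
    """
    if readonly:
        return []
    present = existing_columns(conn, table)
    missing = [name for name in COMMIT_COLUMNS if name not in present]
    for name in missing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {COMMIT_COLUMNS[name]}")
    conn.commit()
    return missing


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decisions (id INTEGER PRIMARY KEY, text TEXT)")  # legacy
print(migrate_commit_columns(conn, "decisions"))  # prints ['commit_hash', 'commit_time']
print(migrate_commit_columns(conn, "decisions"))  # prints [] (idempotent)
```

Because the check reads the live schema each time, running it on every writable open is safe for both fresh and legacy databases.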

feat: default to shared index in worktrees

d87ae0c

Auto-detect linked git worktrees and store the index in the git common dir by default, while keeping per-repo for normal checkouts. Adds unit tests and updates docs.
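Linked-worktree detection can rely on git rev-parse --git-dir and --git-common-dir printing the same directory in a normal checkout but different ones in a linked worktree (whose git dir sits under the common dir's worktrees/). The sketch keeps that decision in a pure function; the .sia-code directory name and the function names are assumptions, not the project's actual layout.

```python
import subprocess
from pathlib import Path
from typing import Optional, Tuple

INDEX_DIRNAME = ".sia-code"  # hypothetical index directory name


def choose_index_root(repo_root: Path, git_dir: Path, common_dir: Path) -> Path:
    """Shared index for linked worktrees, per-repo index for normal checkouts."""
    if git_dir.resolve() != common_dir.resolve():
        # Linked worktree: every worktree of this repo maps to the same location.
        return common_dir.resolve() / INDEX_DIRNAME
    return repo_root / INDEX_DIRNAME


def read_git_dirs(repo_root: Path) -> Optional[Tuple[Path, Path]]:
    """Return (git dir, common dir), or None outside a repo or without git.

    Assumes repo_root is the worktree's top level: relative outputs are relative
    to cwd, and joining onto repo_root leaves absolute outputs unchanged.
    """
    try:
        lines = subprocess.run(
            ["git", "rev-parse", "--git-dir", "--git-common-dir"],
            cwd=repo_root,
            capture_output=True,
            text=True,
            check=True,
        ).stdout.splitlines()
    except (OSError, subprocess.CalledProcessError):
        return None
    if len(lines) < 2:
        return None
    return repo_root / lines[0], repo_root / lines[1]


def resolve_index_root(repo_root: Path) -> Path:
    """Fall back to a per-repo index when git information is unavailable."""
    dirs = read_git_dirs(repo_root)
    if dirs is None:
        return repo_root / INDEX_DIRNAME
    return choose_index_root(repo_root, *dirs)
```

Keeping the decision separate from the git call makes the worktree rule unit-testable without creating real worktrees.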

DxTa merged commit 2d688ee into main Feb 7, 2026
15 checks passed

DxTa deleted the codex/find-multiple-git-worktrees-support branch February 7, 2026 18:12


DxTa merged 3 commits into main from codex/find-multiple-git-worktrees-support
