Skip to content

Commit 474e507

Browse files
committed
fix: transaction safety, resource cleanup, and truncation signaling (#258, #259, #260, #261)
Four reliability hardening fixes that improve crash safety, resource cleanup, and API transparency across the graph pipeline. Fix 1: wrap incremental_trace_flows DELETEs in a transaction (#258) ------------------------------------------------------------------- File: code_review_graph/flows.py:499-502 The DELETE loop in incremental_trace_flows deleted flow_memberships and flows rows one at a time without an explicit transaction. A crash mid-loop could leave orphaned flow_memberships rows pointing at deleted flows. Fix: wrap the DELETE loop in BEGIN IMMEDIATE / COMMIT / ROLLBACK, matching the pattern already used by store_flows at line 406. Includes an in_transaction guard to flush any implicit transaction first. Test: test_incremental_trace_flows_delete_is_atomic — verifies no orphaned memberships exist after an incremental re-trace. Fix 2: atomic FTS index rebuild (#259) -------------------------------------- File: code_review_graph/search.py:39-54 rebuild_fts_index performed DROP, CREATE, and INSERT across three separate statements with commits between them. A crash after DROP but before CREATE would leave the DB without an FTS table entirely. Fix: wrap the entire DROP + CREATE + INSERT sequence in a single BEGIN IMMEDIATE transaction so the operation is all-or-nothing. Test: test_fts_rebuild_is_atomic — verifies double-rebuild leaves the table intact and queryable. Fix 3: EmbeddingStore context manager support (#260) ---------------------------------------------------- File: code_review_graph/embeddings.py:391+ EmbeddingStore opens a SQLite connection in __init__ but did not implement __enter__/__exit__. If an exception occurred during usage before close() was called, the connection leaked. Fix: add __enter__ and __exit__ methods following the same pattern as GraphStore (graph.py:154-157). Test: test_supports_context_manager, test_context_manager_closes_on_exception. Fix 4: truncation signal for find_dependents (#261) ---------------------------------------------------- File: code_review_graph/incremental.py:434-465 When find_dependents hit _MAX_DEPENDENT_FILES (500), it truncated the result and logged a warning, but the caller received a plain list with no indication that it was incomplete. Callers had to guess whether they got the full dependent set or a truncated one. Fix: introduce DependentList (a transparent list subclass with a .truncated boolean attribute). Existing callers that iterate, len(), or slice continue to work unchanged; only callers that specifically check .truncated benefit from the signal. Tests: test_truncated_flag_set_when_capped (600 deps, expects True), test_truncated_flag_false_when_not_capped (small chain, expects False). Files changed ------------- - code_review_graph/flows.py — transaction wrap around DELETE loop - code_review_graph/search.py — atomic FTS rebuild - code_review_graph/embeddings.py — __enter__/__exit__ on EmbeddingStore - code_review_graph/incremental.py — DependentList with .truncated flag - tests/test_flows.py — orphan membership check - tests/test_search.py — double-rebuild atomicity test - tests/test_embeddings.py — context manager tests - tests/test_incremental.py — truncation flag tests Test results ------------ Stage 1 (new targeted tests): 6/6 passed. Stage 2 (all affected test files): 137 passed, 3 pre-existing failures (find_repo_root flakiness + UTF-8 gitignore — covered by PRs #274/#242). Stage 3 (full suite): 784 passed, 8 pre-existing Windows failures (all covered by parallel PRs). Stage 4 (ruff check on all 8 changed files): clean. Stage 5 (mypy on all 4 changed source files): clean. Zero regressions.
1 parent 80d22bf commit 474e507

8 files changed

Lines changed: 206 additions & 25 deletions

File tree

code_review_graph/embeddings.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -418,6 +418,12 @@ def __init__(
418418

419419
self._conn.commit()
420420

421+
def __enter__(self) -> "EmbeddingStore":
422+
return self
423+
424+
def __exit__(self, exc_type, exc_val, exc_tb) -> None: # type: ignore[no-untyped-def]
425+
self.close()
426+
421427
def close(self) -> None:
422428
self._conn.close()
423429

code_review_graph/flows.py

Lines changed: 16 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -496,10 +496,22 @@ def incremental_trace_flows(
496496
# ------------------------------------------------------------------
497497
# 3. Delete affected flows and their memberships
498498
# ------------------------------------------------------------------
499-
for fid in affected_ids:
500-
conn.execute("DELETE FROM flow_memberships WHERE flow_id = ?", (fid,))
501-
conn.execute("DELETE FROM flows WHERE id = ?", (fid,))
502-
conn.commit()
499+
# Wrap in an explicit transaction so a crash mid-loop cannot leave
500+
# orphaned flow_memberships rows pointing at deleted flows. See #258.
501+
if affected_ids:
502+
if conn.in_transaction:
503+
conn.commit()
504+
conn.execute("BEGIN IMMEDIATE")
505+
try:
506+
for fid in affected_ids:
507+
conn.execute(
508+
"DELETE FROM flow_memberships WHERE flow_id = ?", (fid,),
509+
)
510+
conn.execute("DELETE FROM flows WHERE id = ?", (fid,))
511+
conn.commit()
512+
except BaseException:
513+
conn.rollback()
514+
raise
503515

504516
# ------------------------------------------------------------------
505517
# 4. Re-detect entry points and filter to relevant ones

code_review_graph/incremental.py

Lines changed: 30 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -431,15 +431,39 @@ def _single_hop_dependents(store: GraphStore, file_path: str) -> set[str]:
431431
return dependents
432432

433433

434+
class DependentList(list):
435+
"""A ``list[str]`` with a ``.truncated`` flag.
436+
437+
When :func:`find_dependents` hits ``_MAX_DEPENDENT_FILES`` it truncates
438+
the result and sets ``truncated = True`` so callers can distinguish a
439+
complete expansion from a capped one. See issue #261.
440+
441+
This is a transparent ``list`` subclass — existing callers that iterate,
442+
``len()``, or slice continue to work unchanged; only callers that
443+
specifically check ``.truncated`` benefit from the signal.
444+
"""
445+
446+
truncated: bool
447+
448+
def __init__(self, items: list, *, truncated: bool = False) -> None:
449+
super().__init__(items)
450+
self.truncated = truncated
451+
452+
434453
def find_dependents(
435454
store: GraphStore,
436455
file_path: str,
437456
max_hops: int = _MAX_DEPENDENT_HOPS,
438-
) -> list[str]:
457+
) -> DependentList:
439458
"""Find files that import from or depend on the given file.
440459
441460
Performs up to *max_hops* iterations of expansion (default 2).
442461
Stops early if the total exceeds 500 files.
462+
463+
Returns a :class:`DependentList` — a regular ``list[str]`` that also
464+
carries a ``.truncated`` flag. When ``truncated is True`` the
465+
returned list is capped at ``_MAX_DEPENDENT_FILES`` and the full
466+
set of dependents was not explored. See issue #261.
443467
"""
444468
all_dependents: set[str] = set()
445469
visited: set[str] = {file_path}
@@ -460,9 +484,11 @@ def find_dependents(
460484
"Dependent expansion capped at %d files for %s",
461485
len(all_dependents), file_path,
462486
)
463-
# Truncate to the cap
464-
return list(all_dependents)[:_MAX_DEPENDENT_FILES]
465-
return list(all_dependents)
487+
return DependentList(
488+
list(all_dependents)[:_MAX_DEPENDENT_FILES],
489+
truncated=True,
490+
)
491+
return DependentList(list(all_dependents))
466492

467493

468494
def _parse_single_file(

code_review_graph/search.py

Lines changed: 24 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -35,23 +35,30 @@ def rebuild_fts_index(store: GraphStore) -> int:
3535
# the FTS5 virtual table DDL, which is tightly coupled to SQLite internals.
3636
conn = store._conn
3737

38-
# Drop and recreate the FTS table to avoid content-sync mismatch issues
39-
conn.execute("DROP TABLE IF EXISTS nodes_fts")
40-
conn.execute("""
41-
CREATE VIRTUAL TABLE nodes_fts USING fts5(
42-
name, qualified_name, file_path, signature,
43-
tokenize='porter unicode61'
44-
)
45-
""")
46-
conn.commit()
47-
48-
# Populate from nodes table
49-
conn.execute("""
50-
INSERT INTO nodes_fts(rowid, name, qualified_name, file_path, signature)
51-
SELECT id, name, qualified_name, file_path, COALESCE(signature, '')
52-
FROM nodes
53-
""")
54-
conn.commit()
38+
# Wrap the full DROP + CREATE + INSERT sequence in an explicit transaction
39+
# so a crash mid-rebuild cannot leave the DB without an FTS table at all
40+
# (DROP succeeded but CREATE/INSERT didn't). See #259.
41+
if conn.in_transaction:
42+
conn.commit()
43+
conn.execute("BEGIN IMMEDIATE")
44+
try:
45+
conn.execute("DROP TABLE IF EXISTS nodes_fts")
46+
conn.execute("""
47+
CREATE VIRTUAL TABLE nodes_fts USING fts5(
48+
name, qualified_name, file_path, signature,
49+
tokenize='porter unicode61'
50+
)
51+
""")
52+
# Populate from nodes table
53+
conn.execute("""
54+
INSERT INTO nodes_fts(rowid, name, qualified_name, file_path, signature)
55+
SELECT id, name, qualified_name, file_path, COALESCE(signature, '')
56+
FROM nodes
57+
""")
58+
conn.commit()
59+
except BaseException:
60+
conn.rollback()
61+
raise
5562

5663
count = conn.execute("SELECT count(*) FROM nodes_fts").fetchone()[0]
5764
logger.info("FTS index rebuilt: %d rows indexed", count)

tests/test_embeddings.py

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -345,3 +345,30 @@ def test_get_provider_minimax_without_key_raises(self):
345345
with patch.dict("os.environ", {}, clear=True):
346346
with pytest.raises(ValueError, match="MINIMAX_API_KEY"):
347347
get_provider("minimax")
348+
349+
350+
class TestEmbeddingStoreContextManager:
351+
"""Regression tests for #260: EmbeddingStore must support the context
352+
manager protocol so connections are cleaned up on exception."""
353+
354+
def test_supports_context_manager(self, tmp_path):
355+
db = tmp_path / "embed_ctx.db"
356+
with EmbeddingStore(db) as store:
357+
assert store is not None
358+
assert store.db_path == db
359+
# After exiting, connection should be closed.
360+
# (Attempting another query would fail, but we don't test that
361+
# because close() doesn't invalidate the object — it just
362+
# closes the underlying sqlite3 connection.)
363+
364+
def test_context_manager_closes_on_exception(self, tmp_path):
365+
db = tmp_path / "embed_err.db"
366+
try:
367+
with EmbeddingStore(db) as store:
368+
assert store.db_path == db
369+
raise RuntimeError("simulated crash")
370+
except RuntimeError:
371+
pass
372+
# The connection was closed by __exit__ even though an exception
373+
# was raised. This is the whole point of #260 — without the
374+
# context manager, the connection would leak.

tests/test_flows.py

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -558,3 +558,31 @@ def test_incremental_trace_flows_no_affected_flows(self):
558558
assert count == 0
559559
# Original flows unchanged.
560560
assert len(get_flows(self.store)) == initial_count
561+
562+
def test_incremental_trace_flows_delete_is_atomic(self):
563+
"""Regression test for #258: the DELETE loop in incremental_trace_flows
564+
must be wrapped in a transaction so a crash mid-loop cannot leave
565+
orphaned flow_memberships rows."""
566+
self._add_func("handler", path="routes.py")
567+
self._add_func("service", path="services.py")
568+
self._add_call("routes.py::handler", "services.py::service", "routes.py")
569+
570+
flows = trace_flows(self.store)
571+
store_flows(self.store, flows)
572+
assert len(get_flows(self.store)) > 0
573+
574+
# Incremental trace touching routes.py should delete old flows and
575+
# re-trace them. The key assertion is that this does NOT raise
576+
# "cannot start a transaction within a transaction" and that the
577+
# DB ends in a consistent state.
578+
count = incremental_trace_flows(self.store, ["routes.py"])
579+
# The re-trace should find the same entry points.
580+
assert count >= 0
581+
# No orphaned memberships: every membership references a valid flow.
582+
conn = self.store._conn
583+
orphans = conn.execute(
584+
"SELECT fm.flow_id FROM flow_memberships fm "
585+
"LEFT JOIN flows f ON f.id = fm.flow_id "
586+
"WHERE f.id IS NULL"
587+
).fetchall()
588+
assert len(orphans) == 0, f"found {len(orphans)} orphaned memberships"

tests/test_incremental.py

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -592,3 +592,58 @@ def test_cap_triggers_on_many_files(self, tmp_path):
592592
assert len(deps) <= 500
593593
finally:
594594
store.close()
595+
596+
def test_truncated_flag_set_when_capped(self, tmp_path):
597+
"""Regression test for #261: find_dependents must set
598+
DependentList.truncated = True when the result is capped."""
599+
from code_review_graph.parser import EdgeInfo, NodeInfo
600+
601+
db_path = tmp_path / "trunc.db"
602+
store = GraphStore(db_path)
603+
try:
604+
store.upsert_node(NodeInfo(
605+
kind="File", name="/hub.py", file_path="/hub.py",
606+
line_start=1, line_end=10, language="python",
607+
))
608+
store.upsert_node(NodeInfo(
609+
kind="Function", name="hub_func", file_path="/hub.py",
610+
line_start=2, line_end=8, language="python",
611+
))
612+
for i in range(600):
613+
path = f"/dep{i}.py"
614+
store.upsert_node(NodeInfo(
615+
kind="File", name=path, file_path=path,
616+
line_start=1, line_end=10, language="python",
617+
))
618+
store.upsert_node(NodeInfo(
619+
kind="Function", name=f"func_{i}", file_path=path,
620+
line_start=2, line_end=8, language="python",
621+
))
622+
store.upsert_edge(EdgeInfo(
623+
kind="IMPORTS_FROM", source=f"{path}::func_{i}",
624+
target="/hub.py::hub_func", file_path=path, line=1,
625+
))
626+
store.commit()
627+
628+
deps = find_dependents(store, "/hub.py", max_hops=5)
629+
assert len(deps) <= 500
630+
# The key assertion: truncated flag must be set.
631+
assert deps.truncated is True, (
632+
"DependentList.truncated should be True when capped at "
633+
"_MAX_DEPENDENT_FILES, but it was False"
634+
)
635+
finally:
636+
store.close()
637+
638+
def test_truncated_flag_false_when_not_capped(self, tmp_path):
639+
"""Regression test for #261: find_dependents must set
640+
DependentList.truncated = False when the result is complete."""
641+
store = self._make_chain_store(tmp_path)
642+
try:
643+
deps = find_dependents(store, "/c.py", max_hops=2)
644+
assert deps.truncated is False, (
645+
"DependentList.truncated should be False when the "
646+
"expansion completed without hitting the cap"
647+
)
648+
finally:
649+
store.close()

tests/test_search.py

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -248,3 +248,23 @@ def test_fts_query_with_special_chars(self):
248248
results = hybrid_search(self.store, dangerous_query)
249249
# Just assert no exception was raised
250250
assert isinstance(results, list)
251+
252+
def test_fts_rebuild_is_atomic(self):
253+
"""Regression test for #259: rebuild_fts_index must wrap the DROP +
254+
CREATE + INSERT sequence in a single transaction so a crash between
255+
DROP and CREATE cannot leave the DB without an FTS table."""
256+
# Build, rebuild, then verify the table exists and is queryable.
257+
rebuild_fts_index(self.store)
258+
259+
# Verify the FTS table exists and has rows.
260+
conn = self.store._conn
261+
count = conn.execute("SELECT count(*) FROM nodes_fts").fetchone()[0]
262+
assert count > 0
263+
264+
# Rebuild again — must not raise and must leave the table intact.
265+
new_count = rebuild_fts_index(self.store)
266+
assert new_count == count
267+
268+
# Verify search still works after double-rebuild.
269+
results = hybrid_search(self.store, "auth")
270+
assert isinstance(results, list)

0 commit comments

Comments
 (0)