Conversation
liututu12 left a comment
test: add power failure recovery test cases
```python
if current_batch_size==batch_size:
    doc_ids= [str(i) for i in range(i*batch_size,(i+1)*batch_size)]
else:
    doc_ids= [str(i) for i in range(i*batch_size, num_docs_to_delete)]
```
Double-multiplication bug produces wrong document IDs
The outer loop variable i iterates as 0, 10, 20, ... 190 (step = batch_size), but each iteration then computes range(i * batch_size, (i+1) * batch_size). This squares the offset: when i = 10 the generated IDs are 100–109 instead of 10–19, when i = 20 the IDs are 200–209, etc. As a result the test will attempt to delete documents that were never inserted (or delete the wrong ones), and many documents in the 10–99 range are never touched.
The fix is to use i directly as the starting offset (it already represents the absolute start position):
Before:

```python
if current_batch_size==batch_size:
    doc_ids= [str(i) for i in range(i*batch_size,(i+1)*batch_size)]
else:
    doc_ids= [str(i) for i in range(i*batch_size, num_docs_to_delete)]
```

After:

```python
if current_batch_size==batch_size:
    doc_ids= [str(j) for j in range(i, i + batch_size)]
else:
    doc_ids= [str(j) for j in range(i, num_docs_to_delete)]
```
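A standalone sketch makes the drift concrete: with `i` already stepping by `batch_size`, the extra multiplication jumps past most of the inserted IDs after the first batch.

```python
# Reproduces the ID drift: the loop variable i already steps by batch_size,
# so multiplying again squares the offset past the first batch.
batch_size = 10
num_docs = 200

buggy = []
fixed = []
for i in range(0, num_docs, batch_size):
    buggy.extend(range(i * batch_size, (i + 1) * batch_size))  # as written in the PR
    fixed.extend(range(i, i + batch_size))                     # suggested fix

print(buggy[10:13])  # [100, 101, 102] -- IDs 10-99 are never generated
print(fixed[10:13])  # [10, 11, 12]
assert fixed == list(range(num_docs))
```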
```python
        single_doc,
        recovered_collection.schema,
    ), f"result doc={fetched_doc},doc_exp={single_doc}"
    break
```
fetched_doc is undefined — will raise NameError on assertion failure
In this branch, fetched_docs (plural) is assigned but the error message f-string references fetched_doc (singular), which is never bound in this scope. If the is_doc_equal assertion fails, Python will raise a NameError on fetched_doc instead of reporting the actual assertion failure, hiding the real error.
The same issue occurs at approximately line 452 in the second loop over query_result.
Before:

```python
        single_doc,
        recovered_collection.schema,
    ), f"result doc={fetched_doc},doc_exp={single_doc}"
    break
```

After:

```python
    ), f"result doc={fetched_docs},doc_exp={single_doc}"
```
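A minimal repro of the masking behavior: Python evaluates an assertion's message expression only when the assert fails, so the undefined name surfaces as a `NameError` that replaces the `AssertionError` the test meant to raise.

```python
# The message f-string is evaluated lazily, only on assertion failure;
# an undefined name inside it raises NameError before AssertionError can.
fetched_docs = {"1": "doc-1"}

try:
    assert False, f"result doc={fetched_doc}"  # typo: singular name never bound
except NameError as err:
    caught = err

print(type(caught).__name__)  # NameError -- the real failure is hidden
```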
```python
assert doc.id in fetched_docs
assert is_doc_equal(
    fetched_docs["1"], exp_doc, recovered_collection.schema
), f"result doc={fetched_docs},doc_exp={exp_doc}"
```
Hardcoded key "1" always looks up the wrong document
fetched_docs["1"] always fetches the document with ID "1" regardless of which document is currently being iterated (doc.id). For any document whose ID is not "1" this will either raise a KeyError (if doc "1" was deleted) or silently verify the wrong document, making the integrity check meaningless.
The same hardcoded key appears again at approximately line 462 in the second loop.
Before:

```python
assert doc.id in fetched_docs
assert is_doc_equal(
    fetched_docs["1"], exp_doc, recovered_collection.schema
), f"result doc={fetched_docs},doc_exp={exp_doc}"
```

After:

```python
assert doc.id in fetched_docs
assert is_doc_equal(
    fetched_docs[doc.id], exp_doc, recovered_collection.schema
), f"result doc={fetched_docs},doc_exp={exp_doc}"
```
```python
# Verify existing documents have correct structure
if len(query_result) > 0:
    for doc in query_result[:100]:  # Limit to first 100 for efficiency
        fetched_docs = recovered_collection.fetch([doc.id])
```
Hardcoded magic number 201 makes the test brittle
This assertion hardcodes the expected document count. If the setup ever changes (e.g., the number of initially inserted documents changes from 200 or the seed document ID 2001 is moved), this assertion will silently pass or fail for the wrong reason. The commented-out line directly above shows the intent was to use a dynamic value. Consider deriving the expected count from initial_doc_count + 1 (captured before closing the collection):
```python
fetched_docs = recovered_collection.fetch([doc.id])
assert recovered_collection.stats.doc_count == initial_doc_count + 1
```
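A hypothetical sketch of why the dynamic expectation survives setup changes while the literal `201` does not. The zvec objects are not modeled here; `recovered_doc_count` simply stands in for `recovered_collection.stats.doc_count` after recovery.

```python
# Stand-in for the recovery behavior described in the test: all initially
# inserted docs survive, plus the one seed document (id 2001).
def recovered_doc_count(initial_doc_count: int) -> int:
    return initial_doc_count + 1

initial_doc_count = 200
assert recovered_doc_count(initial_doc_count) == initial_doc_count + 1  # dynamic: passes

initial_doc_count = 300  # setup changed later
assert recovered_doc_count(initial_doc_count) == initial_doc_count + 1  # still passes
assert recovered_doc_count(initial_doc_count) != 201  # the hardcoded literal would now fail
```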
```python
    import psutil  # Used for more reliable process management
except ImportError:
    psutil = None  # If psutil is not installed, set it to None
from distance_helper import *
```
Unused threading import across all nine new test files
threading is imported at the top of every new test file but is never referenced anywhere in the file body. This applies to all 9 files introduced in this PR:
- test_collection_crash_recovery_addcolumn.py:29
- test_collection_crash_recovery_altercolumn.py:29
- test_collection_crash_recovery_createindex.py:29
- test_collection_crash_recovery_deletedoc.py:29
- test_collection_crash_recovery_deleteindex.py:29
- test_collection_crash_recovery_dropcolumn.py:29
- test_collection_crash_recovery_insertdoc.py:29
- test_collection_crash_recovery_updatedoc.py:29
- test_collection_crash_recovery_upsertdoc.py:29
Remove the import from all files.
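One way to spot the dead import mechanically across the nine files: parse each file and compare imported top-level names against names actually used. Linters such as pyflakes or ruff report the same thing; this is a dependency-free sketch, not the project's tooling.

```python
# Find `import X` statements whose name is never referenced in the module body.
import ast

def unused_imports(source: str) -> set:
    tree = ast.parse(source)
    imported = {
        alias.asname or alias.name.split(".")[0]
        for node in ast.walk(tree)
        if isinstance(node, ast.Import)
        for alias in node.names
    }
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return imported - used

sample = "import threading\nimport time\ntime.sleep(0)\n"
print(unused_imports(sample))  # {'threading'}
```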
```python
print(f"[Subprocess] Starting Zvec insert document operations on {collection_path} at: {time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"[Subprocess] Will insert {num_docs_to_delete} documents in batches of {batch_size}")
```
Misleading print statement says "insert" instead of "delete"
The log message reads "Starting Zvec insert document operations" inside run_zvec_deletedoc_operations, which is a deletion function. This makes debugging confusing. Likewise, the next line says "Will insert N documents" instead of "Will delete N documents". These copy-paste leftovers should be corrected to reference deletion.
Add power failure recovery related test cases
Greptile Summary
This PR adds nine new test files covering crash-recovery (power-failure simulation) scenarios for Zvec collection operations: document insert, update, upsert, delete, index creation/deletion, and column add/alter/drop. The shared pattern across all tests is sound — create a collection in the main process, spawn a subprocess that performs the target operation in a loop, kill it mid-flight with SIGKILL, then reopen the collection and assert it is usable.
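The crash-injection mechanics of that pattern can be sketched standalone: spawn a worker mid-operation, SIGKILL it, then inspect what survived. The real tests reopen a zvec collection at that point; the worker loop below is a stand-in, and the POSIX return-code convention is assumed.

```python
# Minimal sketch of the kill-mid-flight pattern used by these tests.
import signal
import subprocess
import sys
import time

worker = subprocess.Popen(
    [sys.executable, "-c", "import time\nwhile True: time.sleep(0.05)"]
)
time.sleep(0.5)   # let the operation get mid-flight (the tests wait 2-3 s)
worker.kill()     # SIGKILL on POSIX: no cleanup runs, like a power cut
worker.wait()
print(worker.returncode)  # -9 on POSIX (terminated by SIGKILL)
```

The key property is that the worker gets no chance to flush or close anything, so whatever the reopened collection sees is exactly the on-disk state at the moment of the "power failure".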
Key issues found:
- `test_collection_crash_recovery_deletedoc.py` — The subprocess script's document-ID generation has a double-multiplication bug: the outer loop variable `i` already steps by `batch_size` (values 0, 10, 20…), but IDs are computed as `range(i * batch_size, (i+1) * batch_size)`, producing ranges 0–9, 100–109, 200–209… instead of 0–9, 10–19, 20–29. Documents in the 10–99 range are never deleted, and IDs above the inserted range may be targeted.
- `test_collection_crash_recovery_deletedoc.py` — Two f-string error messages in the recovery-verification section reference `fetched_doc` (undefined) instead of `fetched_docs`, which would raise a secondary `NameError` on assertion failure, masking the real error.
- `test_collection_crash_recovery_deletedoc.py` — Two `is_doc_equal` calls use the hardcoded key `fetched_docs["1"]` instead of `fetched_docs[doc.id]`, meaning the check always validates document `"1"` regardless of the current iteration.
- `test_collection_crash_recovery_updatedoc.py` — The post-crash doc-count assertion hardcodes `== 201` rather than using the dynamically captured `initial_doc_count + 1`.
- `threading` is imported but never used.

Confidence Score: 2/5
`test_collection_crash_recovery_deletedoc.py` has three distinct correctness bugs: an off-by-a-factor-of-`batch_size` error in ID generation (deletes the wrong documents), undefined variable references in two error-message f-strings, and hardcoded dictionary keys that silently verify the wrong document. Together these mean the deletion crash-recovery test does not actually validate what it claims to validate. The `updatedoc` file also has a brittle hardcoded count assertion. These issues should be fixed before merging. The files needing the most attention are `test_collection_crash_recovery_deletedoc.py` (multiple logic bugs in both the embedded subprocess script and the recovery-verification section) and `test_collection_crash_recovery_updatedoc.py` (hardcoded magic number assertion).

Important Files Changed
- `test_collection_crash_recovery_addcolumn.py` — Unused `threading` import is the only issue.
- `test_collection_crash_recovery_altercolumn.py` — Covers `alter_column` operations. The post-crash `alter_column` in step 3.8 is intentionally wrapped in a try/except (expected to fail if already altered), which is acceptable. Unused `threading` import present.
- `test_collection_crash_recovery_createindex.py` — The `basic_schema` fixture parameter is accepted but unused in test methods. Unused `threading` import present.
- `test_collection_crash_recovery_deletedoc.py` — (1) double-multiplication bug in document-ID generation, (2) undefined `fetched_doc` in f-string error messages at two locations, and (3) hardcoded dictionary key `"1"` instead of `doc.id` at two locations. Also has misleading "insert" log messages in deletion code. Unused `threading` import present.
- `test_collection_crash_recovery_deleteindex.py` — Unused `threading` import present.
- `test_collection_crash_recovery_dropcolumn.py` — Covers `drop_column`. The subprocess does add-then-drop in a loop to ensure the column exists before each drop attempt, which is a sensible design. Post-crash drop is try/except guarded. Unused `threading` import present.
- `test_collection_crash_recovery_insertdoc.py` — Unused `threading` import present.
- `test_collection_crash_recovery_updatedoc.py` — Hardcoded `201` in the doc count assertion (should be `initial_doc_count + 1`). The `is_doc_equal` calls use query result docs as the expected value rather than the original inserted docs, which may not catch regressions. Unused `threading` import present.
- `test_collection_crash_recovery_upsertdoc.py` — Unused `threading` import present.

Sequence Diagram
```mermaid
sequenceDiagram
    participant T as Test (Main Process)
    participant F as Filesystem (tempdir)
    participant S as Subprocess (zvec operations)
    participant Z as Zvec Collection
    T->>F: Create collection path in tempdir
    T->>Z: zvec.create_and_open(path, schema, option)
    T->>Z: insert N documents
    T->>Z: del coll (close)
    T->>F: Write subprocess script to tempfile
    T->>S: subprocess.Popen([python, script, args_json])
    S->>Z: zvec.open(collection_path)
    S->>Z: Perform operation in loop (insert/delete/update/create_index/etc.)
    Note over T,S: Main process waits 2-3 seconds
    T--xS: SIGKILL (simulate power failure)
    T->>F: os.remove(script file)
    T->>Z: zvec.open(collection_path) -- recovery attempt
    T->>Z: Verify doc count >= 1
    T->>Z: Fetch & validate documents
    T->>Z: Query with filter
    T->>Z: Test insert/update/delete post-recovery
    T->>Z: Re-run the crashed operation (create_index / add_column / etc.)
    T->>Z: Final query to confirm functionality
```

Last reviewed commit: "test: add crash_reco..."