test(tethys): cascade-correctness regression fences (rivets-wsix) by dwalleck · Pull Request #75 · dwalleck/rivets

dwalleck · 2026-05-19T03:33:26Z

Summary

Three new integration tests in crates/tethys/tests/reindex_cascade.rs locking in the cascade-correctness mechanisms tethys relies on for re-index correctness.
Audit-only fix: zero new bugs found. The wsix issue asked "what other tables have the file_deps bug class?" The audit (gilfoyle prove-it-prototype, see .rivets-wsix/) discovered the answer is "none — the schema cascade chain catches it everywhere else."
The three fences lock in that cascade-correctness as permanent CI coverage so a future schema migration or indexing rewrite can't silently regress it.

The audit's key finding

refs, attributes, and symbols all rely on the schema's ON DELETE CASCADE chain rooted at symbols(id), triggered by the per-file DELETE FROM symbols WHERE file_id = ? in files.rs::upsert_file. This is the actual safety mechanism for re-index correctness on those tables — not the clear_all_X pattern lcb6 established for file_deps and call_edges. wsix's mental model ("UPSERT-only ⇒ needs clear_all_X") was a special case, not the general principle.

Full per-table inventory and probe results in .rivets-wsix/what-i-learned.md.

The three fences (one slice per design claim)

Slice	Test	Bug class fenced	TDD-inversion
1 (`66fde6e`)	`refs_cascade_on_call_removal`	`refs.in_symbol_id ON DELETE CASCADE` silently relaxed	Comment out `DELETE FROM symbols WHERE file_id` at files.rs:145 → refs_post = 5 (3 stale + 2 new), test fails on `assert_eq!(refs_post, 2)`
2 (`eb1762d`)	`attributes_cascade_on_symbol_removal`	`attributes.symbol_id ON DELETE CASCADE` silently relaxed	Change cascade FK to `ON DELETE NO ACTION` in schema.rs:108 → DELETE FROM symbols blocked by FK, test fails on `target_sym_post == 0`
3 (`5041b42`)	`clear_all_tables_stable_under_reindex`	`clear_all_file_deps` removed from `index_with_options`	Comment out `self.db.clear_all_file_deps()` at indexing.rs:139 → SUM(file_deps.ref_count) 1→2 across re-index, test fails on the SUM assertion (the row-count assertions correctly stay quiet — only SUM catches aggregate growth)

All three TDD-inversions confirmed empirically before commit (per the gilfoyle checkpointed-build skill discipline + the v1.0.3 falsifier-non-vacuity self-review rule from gilfoyle's recent PR #1).

Stress fixtures designed adversarially

Slice 1's fixture removes the middle call of three to defeat a head-only or tail-only cascade bug.
Slice 2's fixture has two attributed symbols, removes one, asserts the other's attributes survive — catches the "cascade too aggressive" bug.
Slice 3's fixture asserts both row count stability AND SUM(ref_count) stability. Only the SUM check catches the original lcb6 UPSERT-aggregate-growth bug class; the row-count check alone would miss it.

Audit trail

The complete gilfoyle workflow output is committed at .rivets-wsix/:

related-issues.md — tracker scan for prior art (5-min cap per prove-it-prototype step 0)
probe.sh, oracle.sh, probe_refs_bug.sh, probe_null_in_symbol.sh — the empirical probes that contradicted my initial mental model
what-i-learned.md — the one-sentence summary + per-table inventory + surprises
design.md — falsifiable design with 3 claims + falsifier-non-vacuity self-review
plan.md — budgeted plan with 3 slices, stress fixtures, and verification gates

Test plan

cargo nextest run -p tethys --test reindex_cascade → 3/3 pass
cargo nextest run -p tethys → 639/639 pass (full integration check)
cargo clippy --all-targets --all-features -- -D warnings → clean
cargo fmt --check → clean
TDD-inversion verified for each of the three fences

What this PR does NOT do

Does not fix any bug (audit found none).
Does not cover the orphan-file-from-disk case — that's rivets-dhxo, intentionally out of scope per .rivets-wsix/design.md.
Does not cover streaming-mode indexing — default full-index path only.
Does not add an architectural meta-fence (e.g., "every new INSERT site needs a paired clear"). Considered, declined as overengineering for an audit that found nothing — see design.md §6.

Audit-trail commit for rivets-wsix. Following the gilfoyle workflow, the diagnostic dir holds the probe scripts, oracle, design, and plan that preceded the regression-fence work landing in subsequent slices. Key finding from prove-it-prototype: ZERO new bugs. The schema's ON DELETE CASCADE chain (refs.in_symbol_id, attributes.symbol_id, arch_*.parent FKs) plus per-file `DELETE FROM symbols WHERE file_id` quietly handles re-index correctness for most tables — not the `clear_all_X` pattern that lcb6 established. wsix's mental model ("UPSERT-only ⇒ needs clear_all_X") was a special case, not the general principle. See `.rivets-wsix/what-i-learned.md` for the per-table inventory. Empirical probes (sqlite3 against a real tempdir workspace) contradicted my initial code-reading hypothesis that `refs` would accumulate without an explicit clear. The cascade via in_symbol_id catches it. Documented in `.rivets-wsix/probe_refs_bug.sh` and `.rivets-wsix/probe_null_in_symbol.sh`. Subsequent slices ship three integration tests that lock in the audited cascade-correctness (claims 1-3 in `.rivets-wsix/design.md`) so future schema changes can't silently regress the bug class wsix was looking for. Also includes the `.rivets/issues.jsonl` status update marking wsix in_progress.

Lock in design claim C1: removing a function-body call from a file's source produces exactly the expected row removals in `refs` after re-index, via the `refs.in_symbol_id REFERENCES symbols(id) ON DELETE CASCADE` chain triggered by the per-file `DELETE FROM symbols WHERE file_id` in `files.rs::upsert_file`. The wsix audit (see `.rivets-wsix/what-i-learned.md`) found that this cascade chain is the actual safety mechanism for refs re-index correctness — not the `clear_all_X` pattern lcb6 established. This test pins that mechanism. Stress fixture: starting `entry()` calls `helper::a()`, `helper::b()`, and `helper::c()`. Mutation removes the MIDDLE call. Defeats a hypothetical head-only / tail-only cascade bug: the assertions check specific surviving/removed refs by name, not just total count. TDD-inversion: commenting out files.rs:145 `DELETE FROM symbols WHERE file_id` produces refs_post=5 instead of 2 (3 stale + 2 new). The test's `assert_eq!(refs_post, 2)` fails with descriptive output. Confirms the fence is non-vacuous per the v1.0.3 falsifier-non-vacuity self-review rule. Per-gate verification: - `cargo nextest run -p tethys --test reindex_cascade` passes (~110ms) - `cargo clippy -p tethys --tests --all-features -- -D warnings` clean - `cargo fmt --check` clean - TDD-inversion fails the test as predicted

Lock in design claim C2: removing an attributed symbol from source cascade-deletes both the symbol AND its `attributes` rows via `attributes.symbol_id REFERENCES symbols(id) ON DELETE CASCADE`. Stress fixture: two attributed functions `target` and `keep`. Remove `target` from source. Defeats the "cascade too aggressive" bug class — if a regression cascade-cleared all of the file's attributes instead of just the removed symbol's, the `keep_attrs_post == keep_attrs_pre` assertion would catch it. TDD-inversion: relaxing the cascade FK to `ON DELETE NO ACTION` in schema.rs causes the symbol DELETE to fail entirely (FK constraint violation), and the test's `target_sym_post == 0` assertion fires. The fence catches the schema drift at one assertion level or another. Also adds two small helper fns (`count_attrs_for_symbol`, `count_symbols_by_name`) to keep individual test functions under clippy's 100-line limit. Per-gate verification: - `cargo nextest run -p tethys --test reindex_cascade` 2/2 pass - `cargo clippy -p tethys --tests --all-features -- -D warnings` clean - `cargo fmt --check` clean - TDD-inversion (cascade FK relaxed) fails the test as predicted

…(rivets-wsix slice 3) Lock in design claim C3: running `Tethys::index()` twice on an unchanged workspace produces identical row counts in `call_edges` and `file_deps`, AND a stable `SUM(file_deps.ref_count)`. Catches regression of the `clear_all_X` discipline established by lcb6 (file_deps) and its sibling call_edges precedent. The `SUM(file_deps.ref_count)` assertion defeats a specific bug class the row-count check alone misses: if `clear_all_file_deps` were removed, the same dep would be detected on each run, but the `ON CONFLICT DO UPDATE SET ref_count = ref_count + 1` clause would silently increment the aggregate. Row count stays equal; ref_count sum doubles. This is exactly the original lcb6 bug. TDD-inversion: commenting out `self.db.clear_all_file_deps()` at indexing.rs:139 produces SUM(ref_count) = 1 → 2 across re-index runs. The SUM assertion fires; the row-count assertions stay quiet (correctly — they would not catch this aggregate-growth class without the SUM check). Per-gate verification: - `cargo nextest run -p tethys --test reindex_cascade` 3/3 pass - `cargo nextest run -p tethys` 639/639 pass (final integration check) - `cargo clippy -p tethys --tests --all-features -- -D warnings` clean - `cargo fmt --check` clean - TDD-inversion (clear_all_file_deps disabled) fails the SUM assertion as predicted This completes the rivets-wsix gilfoyle loop. The audit found zero new bugs (see `.rivets-wsix/what-i-learned.md`); these three regression fences lock in the existing cascade and clear_all safety mechanisms so future schema or indexing changes can't silently regress them.

claude · 2026-05-19T03:36:54Z

Code Review: PR #75 — cascade-correctness regression fences (rivets-wsix)

Summary: Three integration tests added to crates/tethys/tests/reindex_cascade.rs locking in the cascade-correctness invariants discovered by the wsix audit. No production code changes. Pure regression fence.

What the PR does well

Adversarial fixture design — each test is specifically shaped to defeat the most likely false-positive:

Test 1 removes the middle call (not head/tail), ruling out head-only or tail-only cascade bugs
Test 2 has two attributed symbols and only removes one, catching "cascade too aggressive" bugs (the keep symbol's attributes must survive)
Test 3 asserts both row-count stability and SUM(ref_count) stability — the SUM check is the only thing that catches the UPSERT-aggregate-growth bug class from lcb6; the row-count check alone would miss it

Content-hash-based re-index triggering — the implementation wisely relies on content hash changes rather than mtime manipulation (the plan.md mentioned filetime as an option). Writing new content to the file is cleaner and removes the filetime dev-dependency concern entirely.

ClearAllSnapshot struct — clean encapsulation for the before/after comparison in test 3. Communicates intent clearly.

Explicit drop(conn) in test 2 before re-indexing is the right call — keeps the read connection from potentially conflicting with the write path before the second tethys.index() call.

Concurrency safety — each test uses its own tempfile::tempdir(), so tests get isolated DBs and don't race across nextest processes (which would be the concern per CLAUDE.md for shared-workspace tests).

Issues to raise

Minor: Missing c_refs assertion in refs_cascade_on_call_removal

The test asserts refs_post == 2, b_refs == 0, and a_refs == 1, but doesn't assert c_refs == 1. Logically implied (2 total − 0 b − 1 a = 1 c), but adding it would make the surviving-ref checks symmetric and fully explicit:

let c_refs = count_lib_refs_by_target_names(&conn, &["c"]);
assert_eq!(
    c_refs, 1,
    "ref to helper::c() must survive cascade (it's still in source) — got {c_refs}"
);

This is a minor gap, but since the test's stated purpose is verifying which specific refs survive, completing the pair strengthens the claim.

Worth noting: count_lib_refs_by_target_names is an indirect oracle for the cascade

The test's query joins on r.symbol_id (the target/callee symbol), but the cascade being tested fires on r.in_symbol_id (the container symbol, i.e., entry()). The test correctly validates the effect — after cascade-deleting entry()'s symbol row and re-inserting it, the stale ref to b is gone and the new refs to a and c are present. But refs where r.symbol_id IS NULL (unresolved refs) would be excluded from the count. The probes confirmed this isn't an issue in practice (the fixture's calls resolve cleanly), and the file-path filter constrains the scope. Just worth a note for future readers maintaining this test.

Non-issue, but plan vs. implementation divergence: plan.md describes using filetime::set_file_mtime and std::thread::sleep(1) for mtime bumping. The implementation discards this entirely in favor of content-hash-based change detection (simpler, no new dependency). The plan document is now stale on this point, but that's expected given the diagnostic-directory convention — point-in-time, not maintained.

Test coverage assessment

Claim	Covered	Falsifier verified
C1: `refs.in_symbol_id ON DELETE CASCADE` fires on re-index	✅ `refs_cascade_on_call_removal`	✅ PR description confirms
C2: `attributes.symbol_id ON DELETE CASCADE` fires on symbol removal	✅ `attributes_cascade_on_symbol_removal`	✅ PR description confirms
C3: `clear_all_X` discipline stable across unchanged-source re-index	✅ `clear_all_tables_stable_under_reindex`	✅ PR description confirms

The audit trail in .rivets-wsix/ (probe scripts, oracle, what-i-learned, design, plan) is exemplary. The falsifier-non-vacuity self-review (§3 in design.md) correctly identifies a concrete buggy implementation for each claim.

Verdict

Approve with the minor suggestion above. The c_refs == 1 assertion is a quick add that completes the surviving-ref coverage symmetrically. Everything else — fixture design, cascade mechanism understanding, test isolation, and audit documentation — is solid. This is a clean regression fence that will catch real schema drift.

gemini-code-assist

Code Review

This pull request introduces a suite of regression tests and comprehensive documentation following an audit of the database re-indexing logic. The audit verified that the ON DELETE CASCADE schema and clear_all_X patterns effectively prevent the accumulation of stale data during re-indexing. A new integration test file, crates/tethys/tests/reindex_cascade.rs, was implemented to lock in these behaviors for symbols, attributes, call edges, and file dependencies. The PR also includes various diagnostic scripts and design documents detailing the audit's methodology and findings. I have no feedback to provide as there were no review comments.

Round 1 of review feedback. Per the `.rivets-wsix/review-decisions-round-1.md` decision log: - **Accept (claim 1):** added explicit `c_refs == 1` assertion in `refs_cascade_on_call_removal`. Previously the third surviving ref was implicit (derivable from `refs_post − a_refs − b_refs = 2 − 1 − 0 = 1`). Spelling it out defends against future mutations to `count_lib_refs_by_target_names`'s IN-clause that would silently break that arithmetic. Symmetric with the existing `a_refs`/`b_refs` pattern. - **Reject (claim 2):** `count_lib_refs_by_target_names` joins on `r.symbol_id` (callee) rather than `r.in_symbol_id` (cascade trigger). The reviewer themselves noted this is a future-maintainer note, not actionable. The indirection is intentional — the regression fence verifies the *effect* of the cascade (refs disappear), not the trigger mechanism. Changing to `r.in_symbol_id` would weaken the fence by testing how the cascade fires instead of whether it produces the correct end state. - **Reject (claim 3):** plan.md / impl divergence on `filetime` vs. content-hash detection. Per CLAUDE.md "Issue diagnostic directories" convention, plan files are point-in-time, not maintained. `checkpointed-build` explicitly permits implementer deviation when the budget and oracle still hold; this is exactly that case.

claude · 2026-05-19T03:51:43Z

Code Review — PR #75: cascade-correctness regression fences (rivets-wsix)

Summary: Three new integration tests locking in SQLite cascade-correctness invariants for the tethys re-index pipeline. Zero production code changes; audit found zero bugs. This is a fence-only PR.

Overall Assessment

This is a well-executed regression-fence PR. The test design is adversarial, the oracles are independent (direct SQL, not tethys API), and the prior review round already addressed the asymmetric c_refs coverage gap. No blockers.

What works well

Adversarial fixture shapes
Each fixture is intentionally designed to defeat a specific class of false-pass:

Slice 1: middle-removal (a/b/c → a/c) catches a hypothetical head-only or tail-only cascade bug that a simpler "count decreased" assertion would miss.
Slice 2: two attributed symbols, remove one — the surviving-symbol assertions (keep_attrs_post == keep_attrs_pre) catch "cascade too aggressive," which is harder to test and easy to omit.
Slice 3: SUM(ref_count) alongside row count is the correct fence for the UPSERT-aggregate-growth bug class. Row count alone is famously insufficient for this failure mode (the lcb6 precedent).

Independent oracle
Each test reads SQLite directly via rusqlite, bypassing the indexer's own query layer. This is the right separation — if the indexer's read path were also broken, these tests would still catch the storage invariant violation.

SQLite connection management
attributes_cascade_on_symbol_removal correctly calls drop(conn) before the second tethys.index() call. With tethys's WAL-mode SQLite, an open read connection across a write is handled gracefully, but the explicit drop is the safer pattern and avoids file-locking surprises under nextest's process-per-test model.

Guard assertions
The assert_eq!(refs_pre, 3) in Slice 1 is load-bearing: if Pass 2 resolution fails for the fixture (e.g., due to a Cargo.toml misconfiguration), the test fails immediately with a diagnostic that says "fixture is wrong" rather than "cascade is broken." This is the right guard placement.

Observations (non-blocking)

Slice 1 oracle indirection
count_lib_refs_by_target_names joins on r.symbol_id (the callee/target) rather than r.in_symbol_id (the container where the cascade fires). The previous review flagged this and correctly rejected a change — verifying the effect (refs disappear) rather than the trigger mechanism is exactly right for a regression fence. Future maintainers reading this: the test is designed this way intentionally. The round-1 review-decisions doc explains the reasoning.

ClearAllSnapshot struct scope
The struct is used only in clear_all_tables_stable_under_reindex. It's fine as-is — the named fields (call_edges, file_deps, file_deps_ref_count_sum) make the two-snapshot comparison readable. A future maintainer adding a fourth field will naturally extend the struct.

Fixture relies on Pass 2 cross-file resolution succeeding
Slices 1 and 3 fixtures use cross-file calls (helper::a(), helper::do_thing()). The refs_pre and snap1.call_edges guard assertions ensure that if Pass 2 fails (e.g., on a stripped-down CI runner), the test fails early with a clear message rather than silently passing. This is correct, but worth noting: if Pass 2 resolution regresses for a different reason, Slice 1's guard assertion will absorb the failure. That's a feature, not a bug — it means the failure is fast and obvious.

Audit trail quality

The .rivets-wsix/ directory is thorough and follows the project's issue-diagnostic-directory convention correctly:

probe_refs_bug.sh / oracle.sh document the empirical work that preceded the design
design.md has the falsifier-non-vacuity self-review (sections 1-7) filled in rigorously
what-i-learned.md's per-table inventory is the kind of artifact that pays off when a future schema migration touches the cascade FKs
review-decisions-round-1.md correctly documents the c_refs accept and the plan.md filetime drift reject — both with rationale

Minor nit (truly optional)

count_lib_refs_by_target_names collects params_vec via names.iter().map(|n| n as &dyn rusqlite::ToSql).collect() — this is correct and idiomatic for dynamic parameter lists with rusqlite. No change needed; just noting that the pattern is non-obvious on first read and the expect("count refs by target names") error message is clear enough to diagnose failures.

Verdict

Approve. The three fences cover the three distinct invariants the wsix audit identified. The TDD-inversion verification was done before commit. CI passes. The prior review round's accepted finding (symmetric c_refs assertion) is in the current code. No outstanding issues.

dwalleck added 4 commits May 18, 2026 22:23

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(tethys): cascade-correctness regression fences (rivets-wsix)#75

test(tethys): cascade-correctness regression fences (rivets-wsix)#75
dwalleck wants to merge 5 commits into
mainfrom
feat/rivets-wsix-cascade-fences

dwalleck commented May 19, 2026

Uh oh!

claude Bot commented May 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

claude Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dwalleck commented May 19, 2026

Summary

The audit's key finding

The three fences (one slice per design claim)

Stress fixtures designed adversarially

Audit trail

Test plan

What this PR does NOT do

Uh oh!

claude Bot commented May 19, 2026

Code Review: PR #75 — cascade-correctness regression fences (rivets-wsix)

What the PR does well

Issues to raise

Test coverage assessment

Verdict

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

claude Bot commented May 19, 2026

Code Review — PR #75: cascade-correctness regression fences (rivets-wsix)

Overall Assessment

What works well

Observations (non-blocking)

Audit trail quality

Minor nit (truly optional)

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant