Skip to content

test(tethys): cascade-correctness regression fences (rivets-wsix)#75

Open
dwalleck wants to merge 5 commits into
mainfrom
feat/rivets-wsix-cascade-fences
Open

test(tethys): cascade-correctness regression fences (rivets-wsix)#75
dwalleck wants to merge 5 commits into
mainfrom
feat/rivets-wsix-cascade-fences

Conversation

@dwalleck
Copy link
Copy Markdown
Owner

Summary

  • Three new integration tests in crates/tethys/tests/reindex_cascade.rs locking in the cascade-correctness mechanisms tethys relies on for re-index correctness.
  • Audit-only fix: zero new bugs found. The wsix issue asked "what other tables have the file_deps bug class?" The audit (gilfoyle prove-it-prototype, see .rivets-wsix/) discovered the answer is "none — the schema cascade chain catches it everywhere else."
  • The three fences lock in that cascade-correctness as permanent CI coverage so a future schema migration or indexing rewrite can't silently regress it.

The audit's key finding

refs, attributes, and symbols all rely on the schema's ON DELETE CASCADE chain rooted at symbols(id), triggered by the per-file DELETE FROM symbols WHERE file_id = ? in files.rs::upsert_file. This is the actual safety mechanism for re-index correctness on those tables — not the clear_all_X pattern lcb6 established for file_deps and call_edges. wsix's mental model ("UPSERT-only ⇒ needs clear_all_X") was a special case, not the general principle.

Full per-table inventory and probe results in .rivets-wsix/what-i-learned.md.

The three fences (one slice per design claim)

Slice Test Bug class fenced TDD-inversion
1 (66fde6e) refs_cascade_on_call_removal refs.in_symbol_id ON DELETE CASCADE silently relaxed Comment out DELETE FROM symbols WHERE file_id at files.rs:145 → refs_post = 5 (3 stale + 2 new), test fails on assert_eq!(refs_post, 2)
2 (eb1762d) attributes_cascade_on_symbol_removal attributes.symbol_id ON DELETE CASCADE silently relaxed Change cascade FK to ON DELETE NO ACTION in schema.rs:108 → DELETE FROM symbols blocked by FK, test fails on target_sym_post == 0
3 (5041b42) clear_all_tables_stable_under_reindex clear_all_file_deps removed from index_with_options Comment out self.db.clear_all_file_deps() at indexing.rs:139 → SUM(file_deps.ref_count) 1→2 across re-index, test fails on the SUM assertion (the row-count assertions correctly stay quiet — only SUM catches aggregate growth)

All three TDD-inversions confirmed empirically before commit (per the gilfoyle checkpointed-build skill discipline + the v1.0.3 falsifier-non-vacuity self-review rule from gilfoyle's recent PR #1).

Stress fixtures designed adversarially

  • Slice 1's fixture removes the middle call of three to defeat a head-only or tail-only cascade bug.
  • Slice 2's fixture has two attributed symbols, removes one, asserts the other's attributes survive — catches the "cascade too aggressive" bug.
  • Slice 3's fixture asserts both row count stability AND SUM(ref_count) stability. Only the SUM check catches the original lcb6 UPSERT-aggregate-growth bug class; the row-count check alone would miss it.

Audit trail

The complete gilfoyle workflow output is committed at .rivets-wsix/:

  • related-issues.md — tracker scan for prior art (5-min cap per prove-it-prototype step 0)
  • probe.sh, oracle.sh, probe_refs_bug.sh, probe_null_in_symbol.sh — the empirical probes that contradicted my initial mental model
  • what-i-learned.md — the one-sentence summary + per-table inventory + surprises
  • design.md — falsifiable design with 3 claims + falsifier-non-vacuity self-review
  • plan.md — budgeted plan with 3 slices, stress fixtures, and verification gates

Test plan

  • cargo nextest run -p tethys --test reindex_cascade → 3/3 pass
  • cargo nextest run -p tethys → 639/639 pass (full integration check)
  • cargo clippy --all-targets --all-features -- -D warnings → clean
  • cargo fmt --check → clean
  • TDD-inversion verified for each of the three fences

What this PR does NOT do

  • Does not fix any bug (audit found none).
  • Does not cover the orphan-file-from-disk case — that's rivets-dhxo, intentionally out of scope per .rivets-wsix/design.md.
  • Does not cover streaming-mode indexing — default full-index path only.
  • Does not add an architectural meta-fence (e.g., "every new INSERT site needs a paired clear"). Considered, declined as overengineering for an audit that found nothing — see design.md §6.

dwalleck added 4 commits May 18, 2026 22:23
Audit-trail commit for rivets-wsix. Following the gilfoyle workflow, the
diagnostic dir holds the probe scripts, oracle, design, and plan that
preceded the regression-fence work landing in subsequent slices.

Key finding from prove-it-prototype: ZERO new bugs. The schema's
ON DELETE CASCADE chain (refs.in_symbol_id, attributes.symbol_id,
arch_*.parent FKs) plus per-file `DELETE FROM symbols WHERE file_id`
quietly handles re-index correctness for most tables — not the
`clear_all_X` pattern that lcb6 established. wsix's mental model
("UPSERT-only ⇒ needs clear_all_X") was a special case, not the
general principle. See `.rivets-wsix/what-i-learned.md` for the
per-table inventory.

Empirical probes (sqlite3 against a real tempdir workspace) contradicted
my initial code-reading hypothesis that `refs` would accumulate without
an explicit clear. The cascade via in_symbol_id catches it. Documented
in `.rivets-wsix/probe_refs_bug.sh` and `.rivets-wsix/probe_null_in_symbol.sh`.

Subsequent slices ship three integration tests that lock in the audited
cascade-correctness (claims 1-3 in `.rivets-wsix/design.md`) so future
schema changes can't silently regress the bug class wsix was looking for.

Also includes the `.rivets/issues.jsonl` status update marking wsix
in_progress.
Lock in design claim C1: removing a function-body call from a file's
source produces exactly the expected row removals in `refs` after
re-index, via the `refs.in_symbol_id REFERENCES symbols(id) ON DELETE
CASCADE` chain triggered by the per-file `DELETE FROM symbols WHERE
file_id` in `files.rs::upsert_file`.

The wsix audit (see `.rivets-wsix/what-i-learned.md`) found that this
cascade chain is the actual safety mechanism for refs re-index
correctness — not the `clear_all_X` pattern lcb6 established. This
test pins that mechanism.

Stress fixture: starting `entry()` calls `helper::a()`, `helper::b()`,
and `helper::c()`. Mutation removes the MIDDLE call. Defeats a
hypothetical head-only / tail-only cascade bug: the assertions check
specific surviving/removed refs by name, not just total count.

TDD-inversion: commenting out files.rs:145 `DELETE FROM symbols WHERE
file_id` produces refs_post=5 instead of 2 (3 stale + 2 new). The
test's `assert_eq!(refs_post, 2)` fails with descriptive output.
Confirms the fence is non-vacuous per the v1.0.3 falsifier-non-vacuity
self-review rule.

Per-gate verification:
- `cargo nextest run -p tethys --test reindex_cascade` passes (~110ms)
- `cargo clippy -p tethys --tests --all-features -- -D warnings` clean
- `cargo fmt --check` clean
- TDD-inversion fails the test as predicted
Lock in design claim C2: removing an attributed symbol from source
cascade-deletes both the symbol AND its `attributes` rows via
`attributes.symbol_id REFERENCES symbols(id) ON DELETE CASCADE`.

Stress fixture: two attributed functions `target` and `keep`. Remove
`target` from source. Defeats the "cascade too aggressive" bug class —
if a regression cascade-cleared all of the file's attributes instead
of just the removed symbol's, the `keep_attrs_post == keep_attrs_pre`
assertion would catch it.

TDD-inversion: relaxing the cascade FK to `ON DELETE NO ACTION` in
schema.rs causes the symbol DELETE to fail entirely (FK constraint
violation), and the test's `target_sym_post == 0` assertion fires.
The fence catches the schema drift at one assertion level or another.

Also adds two small helper fns (`count_attrs_for_symbol`,
`count_symbols_by_name`) to keep individual test functions under
clippy's 100-line limit.

Per-gate verification:
- `cargo nextest run -p tethys --test reindex_cascade` 2/2 pass
- `cargo clippy -p tethys --tests --all-features -- -D warnings` clean
- `cargo fmt --check` clean
- TDD-inversion (cascade FK relaxed) fails the test as predicted
…(rivets-wsix slice 3)

Lock in design claim C3: running `Tethys::index()` twice on an
unchanged workspace produces identical row counts in `call_edges` and
`file_deps`, AND a stable `SUM(file_deps.ref_count)`. Catches
regression of the `clear_all_X` discipline established by lcb6 (file_deps)
and its sibling call_edges precedent.

The `SUM(file_deps.ref_count)` assertion defeats a specific bug class
the row-count check alone misses: if `clear_all_file_deps` were
removed, the same dep would be detected on each run, but the
`ON CONFLICT DO UPDATE SET ref_count = ref_count + 1` clause would
silently increment the aggregate. Row count stays equal; ref_count
sum doubles. This is exactly the original lcb6 bug.

TDD-inversion: commenting out `self.db.clear_all_file_deps()` at
indexing.rs:139 produces SUM(ref_count) = 1 → 2 across re-index runs.
The SUM assertion fires; the row-count assertions stay quiet (correctly
— they would not catch this aggregate-growth class without the SUM
check).

Per-gate verification:
- `cargo nextest run -p tethys --test reindex_cascade` 3/3 pass
- `cargo nextest run -p tethys` 639/639 pass (final integration check)
- `cargo clippy -p tethys --tests --all-features -- -D warnings` clean
- `cargo fmt --check` clean
- TDD-inversion (clear_all_file_deps disabled) fails the SUM assertion
  as predicted

This completes the rivets-wsix gilfoyle loop. The audit found zero
new bugs (see `.rivets-wsix/what-i-learned.md`); these three regression
fences lock in the existing cascade and clear_all safety mechanisms
so future schema or indexing changes can't silently regress them.
@claude
Copy link
Copy Markdown

claude Bot commented May 19, 2026

Code Review: PR #75 — cascade-correctness regression fences (rivets-wsix)

Summary: Three integration tests added to crates/tethys/tests/reindex_cascade.rs locking in the cascade-correctness invariants discovered by the wsix audit. No production code changes. Pure regression fence.


What the PR does well

Adversarial fixture design — each test is specifically shaped to defeat the most likely false-positive:

  • Test 1 removes the middle call (not head/tail), ruling out head-only or tail-only cascade bugs
  • Test 2 has two attributed symbols and only removes one, catching "cascade too aggressive" bugs (the keep symbol's attributes must survive)
  • Test 3 asserts both row-count stability and SUM(ref_count) stability — the SUM check is the only thing that catches the UPSERT-aggregate-growth bug class from lcb6; the row-count check alone would miss it

Content-hash-based re-index triggering — the implementation wisely relies on content hash changes rather than mtime manipulation (the plan.md mentioned filetime as an option). Writing new content to the file is cleaner and removes the filetime dev-dependency concern entirely.

ClearAllSnapshot struct — clean encapsulation for the before/after comparison in test 3. Communicates intent clearly.

Explicit drop(conn) in test 2 before re-indexing is the right call — keeps the read connection from potentially conflicting with the write path before the second tethys.index() call.

Concurrency safety — each test uses its own tempfile::tempdir(), so tests get isolated DBs and don't race across nextest processes (which would be the concern per CLAUDE.md for shared-workspace tests).


Issues to raise

Minor: Missing c_refs assertion in refs_cascade_on_call_removal

The test asserts refs_post == 2, b_refs == 0, and a_refs == 1, but doesn't assert c_refs == 1. Logically implied (2 total − 0 b − 1 a = 1 c), but adding it would make the surviving-ref checks symmetric and fully explicit:

let c_refs = count_lib_refs_by_target_names(&conn, &["c"]);
assert_eq!(
    c_refs, 1,
    "ref to helper::c() must survive cascade (it's still in source) — got {c_refs}"
);

This is a minor gap, but since the test's stated purpose is verifying which specific refs survive, completing the pair strengthens the claim.


Worth noting: count_lib_refs_by_target_names is an indirect oracle for the cascade

The test's query joins on r.symbol_id (the target/callee symbol), but the cascade being tested fires on r.in_symbol_id (the container symbol, i.e., entry()). The test correctly validates the effect — after cascade-deleting entry()'s symbol row and re-inserting it, the stale ref to b is gone and the new refs to a and c are present. But refs where r.symbol_id IS NULL (unresolved refs) would be excluded from the count. The probes confirmed this isn't an issue in practice (the fixture's calls resolve cleanly), and the file-path filter constrains the scope. Just worth a note for future readers maintaining this test.


Non-issue, but plan vs. implementation divergence: plan.md describes using filetime::set_file_mtime and std::thread::sleep(1) for mtime bumping. The implementation discards this entirely in favor of content-hash-based change detection (simpler, no new dependency). The plan document is now stale on this point, but that's expected given the diagnostic-directory convention — point-in-time, not maintained.


Test coverage assessment

Claim Covered Falsifier verified
C1: refs.in_symbol_id ON DELETE CASCADE fires on re-index refs_cascade_on_call_removal ✅ PR description confirms
C2: attributes.symbol_id ON DELETE CASCADE fires on symbol removal attributes_cascade_on_symbol_removal ✅ PR description confirms
C3: clear_all_X discipline stable across unchanged-source re-index clear_all_tables_stable_under_reindex ✅ PR description confirms

The audit trail in .rivets-wsix/ (probe scripts, oracle, what-i-learned, design, plan) is exemplary. The falsifier-non-vacuity self-review (§3 in design.md) correctly identifies a concrete buggy implementation for each claim.


Verdict

Approve with the minor suggestion above. The c_refs == 1 assertion is a quick add that completes the surviving-ref coverage symmetrically. Everything else — fixture design, cascade mechanism understanding, test isolation, and audit documentation — is solid. This is a clean regression fence that will catch real schema drift.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a suite of regression tests and comprehensive documentation following an audit of the database re-indexing logic. The audit verified that the ON DELETE CASCADE schema and clear_all_X patterns effectively prevent the accumulation of stale data during re-indexing. A new integration test file, crates/tethys/tests/reindex_cascade.rs, was implemented to lock in these behaviors for symbols, attributes, call edges, and file dependencies. The PR also includes various diagnostic scripts and design documents detailing the audit's methodology and findings. I have no feedback to provide as there were no review comments.

Round 1 of review feedback. Per the
`.rivets-wsix/review-decisions-round-1.md` decision log:

- **Accept (claim 1):** added explicit `c_refs == 1` assertion in
  `refs_cascade_on_call_removal`. Previously the third surviving ref
  was implicit (derivable from `refs_post − a_refs − b_refs = 2 − 1 − 0 = 1`).
  Spelling it out defends against future mutations to
  `count_lib_refs_by_target_names`'s IN-clause that would silently break
  that arithmetic. Symmetric with the existing `a_refs`/`b_refs` pattern.

- **Reject (claim 2):** `count_lib_refs_by_target_names` joins on
  `r.symbol_id` (callee) rather than `r.in_symbol_id` (cascade trigger).
  The reviewer themselves noted this is a future-maintainer note, not
  actionable. The indirection is intentional — the regression fence
  verifies the *effect* of the cascade (refs disappear), not the trigger
  mechanism. Changing to `r.in_symbol_id` would weaken the fence by
  testing how the cascade fires instead of whether it produces the
  correct end state.

- **Reject (claim 3):** plan.md / impl divergence on `filetime` vs.
  content-hash detection. Per CLAUDE.md "Issue diagnostic directories"
  convention, plan files are point-in-time, not maintained.
  `checkpointed-build` explicitly permits implementer deviation when the
  budget and oracle still hold; this is exactly that case.
@claude
Copy link
Copy Markdown

claude Bot commented May 19, 2026

Code Review — PR #75: cascade-correctness regression fences (rivets-wsix)

Summary: Three new integration tests locking in SQLite cascade-correctness invariants for the tethys re-index pipeline. Zero production code changes; audit found zero bugs. This is a fence-only PR.


Overall Assessment

This is a well-executed regression-fence PR. The test design is adversarial, the oracles are independent (direct SQL, not tethys API), and the prior review round already addressed the asymmetric c_refs coverage gap. No blockers.


What works well

Adversarial fixture shapes
Each fixture is intentionally designed to defeat a specific class of false-pass:

  • Slice 1: middle-removal (a/b/c → a/c) catches a hypothetical head-only or tail-only cascade bug that a simpler "count decreased" assertion would miss.
  • Slice 2: two attributed symbols, remove one — the surviving-symbol assertions (keep_attrs_post == keep_attrs_pre) catch "cascade too aggressive," which is harder to test and easy to omit.
  • Slice 3: SUM(ref_count) alongside row count is the correct fence for the UPSERT-aggregate-growth bug class. Row count alone is famously insufficient for this failure mode (the lcb6 precedent).

Independent oracle
Each test reads SQLite directly via rusqlite, bypassing the indexer's own query layer. This is the right separation — if the indexer's read path were also broken, these tests would still catch the storage invariant violation.

SQLite connection management
attributes_cascade_on_symbol_removal correctly calls drop(conn) before the second tethys.index() call. With tethys's WAL-mode SQLite, an open read connection across a write is handled gracefully, but the explicit drop is the safer pattern and avoids file-locking surprises under nextest's process-per-test model.

Guard assertions
The assert_eq!(refs_pre, 3) in Slice 1 is load-bearing: if Pass 2 resolution fails for the fixture (e.g., due to a Cargo.toml misconfiguration), the test fails immediately with a diagnostic that says "fixture is wrong" rather than "cascade is broken." This is the right guard placement.


Observations (non-blocking)

Slice 1 oracle indirection
count_lib_refs_by_target_names joins on r.symbol_id (the callee/target) rather than r.in_symbol_id (the container where the cascade fires). The previous review flagged this and correctly rejected a change — verifying the effect (refs disappear) rather than the trigger mechanism is exactly right for a regression fence. Future maintainers reading this: the test is designed this way intentionally. The round-1 review-decisions doc explains the reasoning.

ClearAllSnapshot struct scope
The struct is used only in clear_all_tables_stable_under_reindex. It's fine as-is — the named fields (call_edges, file_deps, file_deps_ref_count_sum) make the two-snapshot comparison readable. A future maintainer adding a fourth field will naturally extend the struct.

Fixture relies on Pass 2 cross-file resolution succeeding
Slices 1 and 3 fixtures use cross-file calls (helper::a(), helper::do_thing()). The refs_pre and snap1.call_edges guard assertions ensure that if Pass 2 fails (e.g., on a stripped-down CI runner), the test fails early with a clear message rather than silently passing. This is correct, but worth noting: if Pass 2 resolution regresses for a different reason, Slice 1's guard assertion will absorb the failure. That's a feature, not a bug — it means the failure is fast and obvious.


Audit trail quality

The .rivets-wsix/ directory is thorough and follows the project's issue-diagnostic-directory convention correctly:

  • probe_refs_bug.sh / oracle.sh document the empirical work that preceded the design
  • design.md has the falsifier-non-vacuity self-review (sections 1-7) filled in rigorously
  • what-i-learned.md's per-table inventory is the kind of artifact that pays off when a future schema migration touches the cascade FKs
  • review-decisions-round-1.md correctly documents the c_refs accept and the plan.md filetime drift reject — both with rationale

Minor nit (truly optional)

count_lib_refs_by_target_names collects params_vec via names.iter().map(|n| n as &dyn rusqlite::ToSql).collect() — this is correct and idiomatic for dynamic parameter lists with rusqlite. No change needed; just noting that the pattern is non-obvious on first read and the expect("count refs by target names") error message is clear enough to diagnose failures.


Verdict

Approve. The three fences cover the three distinct invariants the wsix audit identified. The TDD-inversion verification was done before commit. CI passes. The prior review round's accepted finding (symmetric c_refs assertion) is in the current code. No outstanding issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant