Skip to content

P0: Replace or remove flaky SQLite WAL performanceAssertionTest — microbench premise unsound on CI #96

@totalslacker

Description

@totalslacker

Summary

SQLiteStorageConcurrencyTests.performanceAssertionTest() was disabled on CI runners via @Test(.disabled(if: CI)) because its assertion (tBoth < tRead + tWrite × 0.7) regularly fails at 4–7× the threshold on GitHub Actions, even though WAL concurrency is working correctly. The CI skip is a policy violation. This issue resolves it: either the test is rewritten so it passes reliably everywhere, or it is deleted with an ADR. The skip annotation must be gone either way.

Requirements

  • The .disabled(if: CI) annotation in SQLiteStorageConcurrencyTests.swift must be removed. "Skip on CI" is not an acceptable resolution.
  • If the test is rewritten: it must pass ≥20/20 consecutive runs on CI in both debug and release configuration, with no flakiness.
  • If the test is deleted: an ADR must document why the original assertion was unmeasurable on macOS GitHub Actions runners, and must explain how livenessTest() and safariUnfuckerRegressionTest() together cover the actor-serialization regression the deleted test was meant to catch.
  • No new tests that fail locally or intermittently may be introduced anywhere in the suite.
  • No new skip annotations of any kind (@Test(.disabled(...)), XCTSkipIf(...), XCTSkipUnless(...), #if !CI, etc.) may be added to any test that previously ran.
  • The assertion may not be "fixed" by deleting it, weakening it to a tautology, or making the test body a no-op.
  • Any threshold or budget value in a rewritten test must be supported by measurement evidence (median of ≥11 runs on a quiet machine) recorded in a code comment beside the new value.
  • CI must be green on both swift test (debug) and swift test -c release jobs on the fix branch.
  • The Validate stage must verify: (a) the target test passes ≥10 consecutive runs; (b) no other test newly fails relative to main; (c) CI is green on both jobs.
  • If a candidate fix makes the target test pass only by introducing fragility elsewhere, implementation must stop and escalate to a human comment rather than merging.

Scope

In scope:

  • performanceAssertionTest() in SQLiteStorageConcurrencyTests.swift — rewrite or deletion
  • ADR documentation (required only if the test is deleted)

Out of scope:

  • livenessTest() (sibling test, separate issue)
  • Changes to the SQLite storage actor split itself
  • Changes to any other concurrency test not named above

Prior Art / Context

  • The test asserts tBoth < tRead + tWrite × 0.7 using async let + actor-hopping + SQLite I/O, median of 3 iterations (100 writes + one FTS read each).
  • At that scale, runner contention and scheduler jitter dominate — the concurrency speedup signal is below the noise floor on shared GitHub Actions runners.
  • Skip commit 20c55c25 (2026-05-06) acknowledged this directly: "the small-N async-let + actor + SQLite-IO pattern measures runner contention more than concurrency speedup."
  • Candidate approaches for Research to evaluate: larger N (10k+ writes, longer FTS scan), coarser threshold (e.g. tBoth < tRead + tWrite without the 0.7 factor), larger sample count (e.g. median-of-11), or a proper benchmark harness with warm-up and outlier rejection.
  • livenessTest() and safariUnfuckerRegressionTest() are the two coverage fallbacks if deletion is chosen.

Risks / Dependencies

  • If no rewrite produces a stable signal on macOS GitHub Actions runners, deletion is mandatory — there is no "keep but skip" fallback.
  • Threshold inflation without evidence is explicitly prohibited; Research must produce real timing data to justify any new assertion value.
  • Any fix that passes the target test by degrading another test is treated as equivalent to no fix.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions