Improve test suite reliability with randomization, TSan, and flaky test fixes #2823

Draft
Valpertui wants to merge 4 commits into develop from valpertui/fix/tests-suite-improvements

@Valpertui
Member

What and why?

The iOS SDK test suite has been experiencing intermittent CI failures (~50% pass rate locally with make test-ios-all). This PR addresses the root causes of flakiness and adds infrastructure to detect order-dependent and thread-unsafe tests earlier.

How?

1. Enable test randomization across all 20 schemes
Adds randomExecutionOrdering = "YES" to every test scheme to surface order-dependent test failures that were hidden by deterministic execution.
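To illustrate what randomization is meant to surface, here is a minimal sketch of an order-dependent pair of tests. `SharedConfig`, both test functions, and `runAll` are hypothetical names for illustration only and are not taken from the SDK.

```swift
// Hypothetical shared state that two tests touch; not part of the SDK.
final class SharedConfig {
    var isEnabled = false
}

// Mutates the shared state and never restores it.
func testEnablesFeature(_ config: SharedConfig) -> Bool {
    config.isEnabled = true
    return config.isEnabled
}

// Silently assumes the default value is still in place.
func testAssumesDefault(_ config: SharedConfig) -> Bool {
    return config.isEnabled == false
}

// Runs the given "tests" in order against one fresh shared instance
// and reports whether every one of them passed.
func runAll(_ tests: [(SharedConfig) -> Bool]) -> Bool {
    let config = SharedConfig()
    return tests.map { $0(config) }.allSatisfy { $0 }
}

// The fixed order happens to pass; the reversed order exposes the dependency.
let fixedOrderPasses = runAll([testAssumesDefault, testEnablesFeature])
let shuffledOrderPasses = runAll([testEnablesFeature, testAssumesDefault])
print(fixedOrderPasses, shuffledOrderPasses)
```

With deterministic ordering only the first arrangement ever runs, so the hidden dependency goes unnoticed until the order changes.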

2. Extend Thread Sanitizer coverage
Enables TSan on DatadogInternal, DatadogRUM, DatadogLogs, and DatadogTrace (iOS + tvOS) where it was previously missing. TSan was already enabled on DatadogCore, DatadogCrashReporting, and IntegrationTests.

3. Fix timing-sensitive tests
Tests with timeouts too tight for CI environments:

  • AppHangsWatchdogThreadTests: threshold 0.1s→0.5s, wait multiplier 10x→15x, duration tolerance uses Constants.tolerance + ciPadding
  • Profiling concurrency tests (CTorProfiler, MachSamplingProfiler, SafeRead, AppLaunchProfiler): timeout: 0.1→2.0 for concurrentPerform waits
  • AppStateManagerTests: 0.1s→2.0s for async data store operations
  • DisplayLinkerTests: wait(during:) 0.1s→0.25s for CADisplayLink callbacks
  • VitalInfoSamplerTests: XCTAssertEqual(sampleCount, 2)→XCTAssertGreaterThanOrEqual (timer scheduling can produce extra samples)
  • WatchdogTerminationsMonitoringTests: added 10s deadline to unbounded polling loop
  • ViewHitchesIntegrationTests: increased wait for frame hitch generation
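As one example from the list above, a polling loop with a deadline could look roughly like the sketch below. The `poll` helper and its parameters are assumptions made for illustration; the actual change in WatchdogTerminationsMonitoringTests may be structured differently.

```swift
import Foundation

// Polls `condition` until it returns true or `timeout` elapses.
// Returns false on timeout so the caller can fail the test instead of hanging.
func poll(timeout: TimeInterval, interval: TimeInterval = 0.01, until condition: () -> Bool) -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while Date() < deadline {
        if condition() { return true }
        Thread.sleep(forTimeInterval: interval)
    }
    return condition() // one last check once the deadline has passed
}

// Becomes true on the third check, long before the 10s ceiling.
var attempts = 0
let becameTrue = poll(timeout: 10) {
    attempts += 1
    return attempts >= 3
}

// Never becomes true, so the loop gives up after the short deadline.
let neverTrue = poll(timeout: 0.05) { false }
print(becameTrue, neverTrue)
```

In an XCTest case the returned Bool would typically feed an XCTAssertTrue, turning a potential hang into an ordinary failure.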

4. Fix data races in URLSessionTaskStateSwizzlerTests
interceptedStates (plain Array) and interceptionCount (plain Int) were mutated from concurrent URLSession delegate callbacks — a genuine data race. Wrapped in ThreadSafeStates and ThreadSafeCounter. Also replaced Thread.sleep(1) with expectation-based waiting.
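A lock-guarded wrapper of the kind described might look like the following sketch. `LockedCounter` is a hypothetical stand-in; the PR's ThreadSafeCounter and ThreadSafeStates are not shown in this description and may be implemented differently.

```swift
import Foundation

// Hypothetical stand-in for a thread-safe counter; every access takes the lock.
final class LockedCounter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

// Simulates concurrent callbacks incrementing the same counter.
let counter = LockedCounter()
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.increment()
}
print(counter.current)
```

With a plain Int in place of the wrapper, the same concurrentPerform loop would be a data race that TSan reports and that can lose increments.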

5. Fix flaky KSCrashBacktraceTests
testGenerateBacktraceForBackgroundThread was asserting that DatadogCrashReportingTests and Foundation appear in binary images, but the background thread was blocked on semaphore.wait() (only system frames on stack). Fixed by busy-spinning the thread inside a @inline(never) function in the test module, so user code frames are on the stack at capture time. Restores the user image assertion.
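The busy-spin approach can be sketched as below, under the assumption that a lock-guarded stop flag ends the loop. `StopFlag` and `spinInTestModule` are hypothetical names, and the backtrace capture itself is left out because it depends on the crash reporting internals.

```swift
import Foundation

// Hypothetical stop flag, guarded by a lock so the spin loop is itself race-free.
final class StopFlag {
    private let lock = NSLock()
    private var stopped = false

    var isStopped: Bool {
        lock.lock()
        defer { lock.unlock() }
        return stopped
    }

    func stop() {
        lock.lock()
        defer { lock.unlock() }
        stopped = true
    }
}

// @inline(never) asks the compiler to keep this function as its own frame,
// so a frame from this module is on the stack while the thread spins.
@inline(never)
func spinInTestModule(until flag: StopFlag, started: DispatchSemaphore) {
    started.signal()
    while !flag.isStopped { }
}

let flag = StopFlag()
let started = DispatchSemaphore(value: 0)
let finished = DispatchSemaphore(value: 0)

let thread = Thread {
    spinInTestModule(until: flag, started: started)
    finished.signal()
}
thread.start()

started.wait() // the background thread is now inside spinInTestModule
// A backtrace of `thread` would be captured at this point.
flag.stop()
let exited = finished.wait(timeout: .now() + 5) == .success
print(exited)
```

By contrast, a thread parked in semaphore.wait() has only system frames at the top of its stack, which is why the original assertion was unreliable.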

Local performance impact (10-run average):

| Category | Before | After | Delta |
|---|---|---|---|
| Full suite (11 modules) | 224.5s | 241.1s | +16.6s (+7.4%) |
| TSan-new modules (4) | 53.9s | 69.9s | +16.0s (+29.7%) |
| Other modules (7) | 170.6s | 171.2s | +0.6s (+0.4%) |

The +16.6s overhead comes almost entirely from TSan instrumentation on the 4 newly-enabled modules (+16.0s). Timeout changes have near-zero wall-clock impact (they're ceilings, not delays).
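The "ceilings, not delays" point can be checked with a small sketch: a wait with a 2.0s timeout returns as soon as the signal arrives. The 0.01s delay and the 1.0s bound below are arbitrary values chosen for illustration.

```swift
import Foundation

let done = DispatchSemaphore(value: 0)

// Simulated async work that completes after roughly 10ms.
DispatchQueue.global().asyncAfter(deadline: .now() + 0.01) {
    done.signal()
}

let start = Date()
// The 2.0s timeout is only an upper bound; the wait ends when the signal arrives.
let result = done.wait(timeout: .now() + 2.0)
let elapsed = Date().timeIntervalSince(start)

print(result == .success, elapsed < 1.0)
```

Only a test that would previously have timed out actually spends longer under the raised ceiling.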

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes (N/A: internal test infrastructure only)
  • Add Objective-C interface for public APIs (N/A: no public API changes)
  • Run make api-surface (N/A: no API changes)

@Valpertui Valpertui force-pushed the valpertui/fix/tests-suite-improvements branch 2 times, most recently from c8fb123 to 94175d0 Compare April 14, 2026 13:11
…ble schemes

Enable randomExecutionOrdering on all test schemes to surface order-dependent test
failures. Enable Thread Sanitizer on DatadogInternal, DatadogRUM, DatadogLogs,
DatadogTrace, DatadogSessionReplay, DatadogWebViewTracking, and DatadogFlags.

DatadogProfiling is intentionally excluded: the mach_sampling_profiler uses Mach
thread suspension (thread_suspend/thread_get_state) to walk stack frames, which
conflicts with TSan's per-thread shadow memory. It also installs SIGBUS/SIGSEGV
handlers that conflict with TSan's own signal use. Document this in TESTING.md.

Increase timeouts for tests that use real threading, timers, or
concurrent dispatch where 0.1s ceilings are too tight for CI:

- AppHangsWatchdogThreadTests: raise threshold from 0.1s to 0.5s,
  widen wait multiplier from 10x to 15x, use Constants.tolerance + CI
  padding for duration assertions
- Profiling concurrency tests: 0.1s to 2.0s for concurrentPerform waits
- AppStateManagerTests: 0.1s to 2.0s for async data store operations
- DisplayLinkerTests: wait(during:) from 0.1s to 0.25s for CADisplayLink
- VitalInfoSamplerTests: use GreaterThanOrEqual for sample count
- AppHangsMonitoringTests: raise threshold and hang duration
- WatchdogTerminationsMonitoringTests: add 10s deadline to polling loop
- ViewHitchesTests: increase wait for frame hitch generation
…acktraceTests

URLSessionTaskStateSwizzlerTests: wrap interceptedStates and
interceptionCount in thread-safe types (ThreadSafeStates,
ThreadSafeCounter) to fix data races from concurrent URLSession
callbacks. Replace Thread.sleep(1) with expectation-based waiting.

KSCrashBacktraceTests: fix testGenerateBacktraceForBackgroundThread
by busy-spinning the background thread inside a @inline(never) user
code function so the backtrace captures user binary image frames.
Restores the assertion that DatadogCrashReportingTests appears in
binary images.

Each test called span.setActive() which enters an os_activity scope,
but never called span.finish() which leaves it. With randomized test
execution, accumulated nested os_activity scopes corrupted the activity
hierarchy, causing getActiveSpan() to return nil in subsequent tests.
@Valpertui Valpertui force-pushed the valpertui/fix/tests-suite-improvements branch from 94175d0 to f436891 Compare April 15, 2026 15:16