Crash in DatadogRUM VitalCPUReader.readVitalData(): Swift arithmetic overflow during CPU vitals sampling #221

@javier9010

Description

Stack trace

We are seeing production iOS crashes reported by Bugsnag. The crash appears to happen inside DatadogRUM CPU mobile-vitals sampling, not in our Unity app code.

EXC_BREAKPOINT:

0 DatadogRUM +0x96434 Swift runtime failure: arithmetic overflow ()
1 DatadogRUM +0x96434 VitalCPUReader.readVitalData() (VitalCPUReader.swift:31:46)
2 DatadogRUM +0x966ac protocol witness for SamplingBasedVitalReader.readVitalData() in conformance VitalCPUReader ()
3 DatadogRUM +0x9692c VitalInfoSampler.takeSample() (VitalInfoSampler.swift:90:41)
4 DatadogRUM +0x96a28 closure # 2 in VitalInfoSampler.init(cpuReader:memoryReader:refreshRateReader:frequency:maximumRefreshRate:) (VitalInfoSampler.swift:78:19)
5 DatadogRUM +0x5f534 thunk for @escaping @callee_guaranteed @sendable (@guaranteed NSTimer) -> () ()
6 Foundation +0x7c35c ___NSFireTimer
7 CoreFoundation +0xc1bac _CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION
8 CoreFoundation +0x82de0 ___CFRunLoopDoTimer
9 CoreFoundation +0x2c0f8 ___CFRunLoopDoTimers
10 CoreFoundation +0x755b8 ___CFRunLoopRun
11 CoreFoundation +0x79d1c _CFRunLoopRunSpecific
12 GraphicsServices +0x1994 _GSEventRunModal
13 UIKitCore +0x371348 -[UIApplication _run]
14 UIKitCore +0x370fc0 _UIApplicationMain
15 UnityFramework +0x18b80 -[UnityFramework runUIApplicationMainWithArgc:argv:] (main.mm:102:5)
16 YAHTZEE +0x4178 main (main.mm:26:9)
17 dyld +0x14340 start

Reproduction steps

We do not have a deterministic local reproduction yet. This is observed in production App Store builds.

Observed pattern:

  • Crash occurs during startup/bootstrap.
  • The Bugsnag context usually points to the startup phase.
  • is_startup_phase is always true.
  • In the latest sample, the app received "Scene Will Deactivate" shortly before the crash.
  • The crash happens from an NSTimer callback on the main run loop while DatadogRUM is sampling CPU vitals.

Datadog RUM is enabled through the Unity package. Mobile vitals are enabled with the Unity package setting VitalsUpdateFrequency = Average.

Volume

Bugsnag currently shows approximately 1,562 events for this grouped crash, across 610 distinct deviceToken values in the group's metadata. We do not have the exact percentage of total app sessions impacted, but the issue is recurring in production across multiple app versions, and we started noticing a spike in the latest release.

Affected SDK versions

Datadog Unity SDK: com.datadoghq.unity 1.4.3

Native iOS SDK versions pinned by this Unity package:

  • DatadogCore 2.29.0
  • DatadogLogs 2.29.0
  • DatadogRUM 2.29.0
  • DatadogCrashReporting 2.29.0

Latest working SDK version

Unknown. This is our first confirmed grouped crash investigation for this specific DatadogRUM CPU vitals crash signature.

Does the crash manifest in the latest SDK version?

No

Unity Version

Unity 6000.0.52f1

Build Specifics

Platform: iOS
Distribution: App Store
Architecture: arm64

Datadog Unity package: com.datadoghq.unity 1.4.3
Datadog iOS pods: 2.29.0
RUM enabled: yes
CrashReporting enabled: yes
VitalsUpdateFrequency: Average
AutomaticSceneTracking: disabled

Device Information

All reported events are iOS physical Apple devices.

Top OS versions in Bugsnag:

  • iOS 16.7.11: 943 events
  • iOS 16.7.12: 418 events
  • iOS 16.7.14: 87 events
  • iOS 16.7.10: 15 events
  • iOS 15.5: 14 events

Top device models include:

  • iPhone14,7
  • iPhone15,4
  • iPhone14,5
  • iPhone16,2
  • iPhone12,1

Notable metadata:

  • device.jailbroken = true for all events in this Bugsnag group
  • is_startup_phase = true for all events
  • lastLoadedUnityScene = Bootstrap for all events
  • releaseStage = AppStore

Other relevant information

We have not verified this in production with the latest Datadog Unity SDK yet. However, source inspection shows the same unchecked UInt64 arithmetic in VitalCPUReader.readVitalData() still exists in newer native iOS SDK sources, including 2.30.1 and 3.11.0, so we would appreciate guidance on whether this is already addressed elsewhere or if a defensive guard is recommended.

This looks similar in symptom to DataDog/dd-sdk-ios#1181 and PR DataDog/dd-sdk-ios#1177, but we are already on native DatadogRUM 2.29.0, so the previous RUM-thread initial sampling fix should already be included.

Our current hypothesis is that VitalCPUReader assumes the CPU tick values are monotonically non-decreasing, but in affected production cases the sampled tick value may be lower than a previously stored value. Because Swift traps on unsigned underflow or overflow rather than wrapping, that would crash in one of the following UInt64 operations:

VitalCPUReader.swift:
let ongoingInactiveTicks = ticks - (utilizedTicksWhenResigningActive ?? ticks)
let inactiveTicks = totalInactiveTicks + ongoingInactiveTicks
return Double(ticks - inactiveTicks)
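
To illustrate the hypothesis, here is a minimal sketch with made-up tick values (the numbers are assumptions, not captured from an affected device). It uses the standard library's subtractingReportingOverflow so the underflow can be observed without trapping; the plain `-` operator on the same operands would stop the process with the "arithmetic overflow" runtime failure seen in frame 0.

```swift
// Hypothetical values: the current reading is lower than the value that was
// stored when the app resigned active.
let ticks: UInt64 = 1_000
let utilizedTicksWhenResigningActive: UInt64? = 1_500

// Same operands as the first line of the snippet above, but using the
// overflow-reporting variant instead of the trapping `-` operator.
let (_, didUnderflow) = ticks.subtractingReportingOverflow(
    utilizedTicksWhenResigningActive ?? ticks
)

print(didUnderflow) // true: `ticks - 1_500` would have trapped here
```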

We are looking for recommended solutions that allow us to keep Datadog RUM enabled. Possible options we would like guidance on:

  • Is there a supported way in the Unity SDK to disable only CPU vitals while keeping RUM, logs, resources, actions, and other telemetry enabled?
  • Would Datadog accept a guard in VitalCPUReader to drop invalid CPU samples instead of crashing?
  • Is there a known version of dd-sdk-ios or dd-sdk-unity that already hardens this code path?
  • Is there a recommended workaround for Unity users before an SDK fix is available?

A defensive fix could be to return nil for the CPU sample when:

  • ticks < utilizedTicksWhenResigningActive
  • totalInactiveTicks + ongoingInactiveTicks would overflow
  • ticks < inactiveTicks
  • currentTicks < previouslyReadTicks in appDidBecomeActive()
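
The four conditions above could be sketched roughly as follows. This is not the DatadogRUM implementation: activeTicks and inactiveDelta are hypothetical helper names introduced for illustration, and only the variable names are taken from the snippet quoted earlier. Each arithmetic step uses the overflow-reporting variants from the Swift standard library and returns nil instead of trapping.

```swift
/// Hypothetical helper (not part of DatadogRUM): computes the active CPU
/// ticks and returns nil when the sample is inconsistent.
func activeTicks(
    ticks: UInt64,
    utilizedTicksWhenResigningActive: UInt64?,
    totalInactiveTicks: UInt64
) -> Double? {
    // Condition 1: ticks < utilizedTicksWhenResigningActive
    let (ongoingInactiveTicks, underflowedOngoing) =
        ticks.subtractingReportingOverflow(utilizedTicksWhenResigningActive ?? ticks)
    guard !underflowedOngoing else { return nil }

    // Condition 2: totalInactiveTicks + ongoingInactiveTicks would overflow
    let (inactiveTicks, overflowedSum) =
        totalInactiveTicks.addingReportingOverflow(ongoingInactiveTicks)
    guard !overflowedSum else { return nil }

    // Condition 3: ticks < inactiveTicks
    let (active, underflowedActive) = ticks.subtractingReportingOverflow(inactiveTicks)
    guard !underflowedActive else { return nil }

    return Double(active)
}

/// Hypothetical helper for condition 4 (the appDidBecomeActive() path):
/// returns nil when currentTicks < previouslyReadTicks.
func inactiveDelta(currentTicks: UInt64, previouslyReadTicks: UInt64) -> UInt64? {
    let (delta, underflowed) = currentTicks.subtractingReportingOverflow(previouslyReadTicks)
    return underflowed ? nil : delta
}
```

Dropping the sample loses one CPU data point for that interval, which seems preferable to terminating the host app; whether the reader should also reset its stored tick state in that case is a design question for the SDK maintainers.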

Metadata

Labels: crash (Crash caused by the SDK)
