RUM-15735: Offload Mach profile aggregation#2878
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 831d3b7b37
Force-pushed 0a068db to c878d78 (compare)
Force-pushed 831d3b7 to 562d076 (compare)
arroz
left a comment
Overall it seems OK! I took some time to validate locking as I'm not a C++ expert, I don't see any obvious deadlocks or anything like that.
Just left some minor comments.
```diff
@@ -49,6 +49,9 @@ void dd_delete_profiling_defaults(void);
 * - Default: 5000000000ULL (5 seconds)
 * - Timeout checking occurs during sample processing
 *
```
Nitpick: empty line between these two parameters but not the others.
```cpp
// Profiling sampling backstop (see `callback`).
// Typical profile span is ~1 minute; this cutoff includes additional slack beyond that.
// The extra time avoids stopping sampling while the profile is still being processed.
static constexpr int64_t DD_PROFILER_TIMEOUT_NS = 90000000000ULL; // 1:30 minutes
```
There is a mismatch between the value and the variable type (signed vs unsigned). Although the value fits within the signed bits, both types should match.
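A minimal sketch of the suggested fix, matching the literal's signedness to the declared type (the `_ALT` name below is illustrative, not from the PR):

```cpp
#include <cstdint>

// Sketch of the reviewer's suggestion: give the signed variable a signed
// literal. INT64_C yields an int64_t-typed constant; a plain LL suffix
// is equivalent here.
static constexpr int64_t DD_PROFILER_TIMEOUT_NS = INT64_C(90000000000); // 1:30 minutes
static constexpr int64_t DD_PROFILER_TIMEOUT_NS_ALT = 90000000000LL;    // same value
```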
```cpp
/// Recycled buffers retained for future producer handoff.
std::vector<std::vector<stack_trace_t>> reusable_buffers;

std::mutex work_mutex;
```
This mutex could have a bit of documentation regarding which variables it protects.
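A hypothetical sketch of the documentation style being requested; the field names are illustrative stand-ins, not the PR's actual layout:

```cpp
#include <deque>
#include <mutex>
#include <vector>

// Illustrative only: state explicitly which members a mutex guards.
struct aggregation_worker_state {
    // work_mutex guards every member declared below it; acquire it
    // before reading or writing any of them.
    std::mutex work_mutex;
    std::deque<int> pending_batches;                 // guarded by work_mutex
    std::vector<std::vector<int>> reusable_buffers;  // guarded by work_mutex
    bool stop_requested = false;                     // guarded by work_mutex
};
```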
awforsythe
left a comment
The level of complexity required here for proper synchronization is pretty impressive. I've done my best to understand the mental model for everything that's happening here re: concurrency, but I think this is the sort of code where you can't go wrong with more inline comments 🙂
As far as C++ code quality goes, LGTM.
```swift
func testStop_unblocksPendingFlushRequest() {
    // Given
    XCTAssertEqual(dd_profiler_start(), 1)
    Thread.sleep(forTimeInterval: 0.05) // Allow sampling to begin

    let flushReturned = expectation(description: "Flush returns after stop")

    // When
    DispatchQueue.global().async {
        let profile = dd_profiler_flush_and_get_profile()
        if let profile {
            dd_pprof_destroy(profile)
        }
        flushReturned.fulfill()
    }

    dd_profiler_stop()

    // Then
    wait(for: [flushReturned], timeout: 1.0)
    XCTAssertEqual(dd_profiler_get_status(), DD_PROFILER_STATUS_STOPPED)
}
```
This is a pre-existing test that wasn't modified in these changes, but it seems like it may be relevant, since both dd_profiler_flush_and_get_profile and dd_profiler_stop now require coordination with the aggregation worker.
The test name ("stop unblocks pending flush request") seems to suggest that when you call dd_profiler_stop, you expect any pending call to dd_profiler_flush_and_get_profile to be unblocked, as in canceled immediately... but the test doesn't seem to validate that behavior. A flush call now waits on flush_cv while holding g_dd_profiler_mutex, meaning that a subsequent stop call will block (waiting for g_dd_profiler_mutex) until the flush completes.
i.e. from what I understand, if Thread A calls dd_profiler_flush_and_get_profile() and that call takes 100ms, a call to dd_profiler_stop() in Thread B will block until that 100ms flush operation is finished. Is this the expected behavior?
(If so, the name of the test seems a bit misleading, as the call to stop has no effect on whether the pending flush is unblocked: the test is just asserting that when you call both functions concurrently, they both return and you end up in the stopped state.)
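This doesn't settle the question for the PR's actual code, but a minimal sketch of the condition-variable handshake under discussion may help (names loosely mirror g_dd_profiler_mutex and flush_cv but are purely hypothetical). One relevant detail: `condition_variable::wait` releases the mutex while blocked, so a concurrent stop() can acquire the lock and signal the waiter; stop() only waits for the stretches where the flush path actually holds the lock.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-ins for the PR's globals.
std::mutex g_mutex;
std::condition_variable flush_cv;
bool flush_done = false; // guarded by g_mutex

void flush_wait() {
    std::unique_lock<std::mutex> lock(g_mutex);
    // The lock is released while blocked and reacquired before returning.
    flush_cv.wait(lock, [] { return flush_done; });
}

void stop_signal() {
    {
        std::lock_guard<std::mutex> lock(g_mutex); // acquirable during the wait
        flush_done = true;
    }
    flush_cv.notify_all();
}
```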
fuzzybinary
left a comment
This is some good work! A few comments that I think should be addressed, but I won't hold up merge.
```cpp
struct work_item {
    enum class kind {
        batch,
        flush_barrier
    };
```
If we're able to use C++17 (which I think we are) this would be better served as an std::variant which would also likely save a bunch of memory.
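For reference, a sketch of the std::variant shape this comment suggests, with simplified stand-in types; the discriminant travels with the payload, and dispatch happens via std::visit instead of switching on a kind field:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <variant>
#include <vector>

// Simplified stand-ins for the PR's types.
struct stack_trace_t { std::vector<std::uintptr_t> frames; };
struct batch { std::vector<stack_trace_t> traces; };
struct flush_barrier {};

// The held alternative is the discriminant; no kind enum, and no unused
// fields for whichever payload is inactive.
using work_item = std::variant<batch, flush_barrier>;

std::size_t pending_samples(const work_item& item) {
    return std::visit([](const auto& v) -> std::size_t {
        if constexpr (std::is_same_v<std::decay_t<decltype(v)>, batch>) {
            return v.traces.size();
        } else {
            return 0; // a flush barrier carries no samples
        }
    }, item);
}
```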
```cpp
uint64_t dropped_batch_count = 0;
uint64_t dropped_sample_count = 0;
uint64_t max_pending_bytes = 0;
```
We already have a structure that stores these together - maybe just make these a dd_profiler_diagnostics object?
```cpp
namespace dd::profiler {

static void destroy_stack_trace_payload(stack_trace_t* trace) {
```
nit: This feels like a weird place for this. Shouldn't this live somewhere with the declaration of stack_trace_t?
```cpp
}

void aggregation_worker::stop() {
    if (is_worker_thread()) {
```
Does it really make sense to allow the worker thread to call stop?
Even if it does, I think this whole function could use some documentation, explaining what it's doing under what circumstances and why.
```cpp
lock.unlock();
if (execute_inline && action) {
    action(action_ctx);
}
```
I'm curious if this execute_inline special case can be avoided somehow. It would be nice if whatever is processing the work_items could reliably execute the flush action.
```cpp
    }
}

void aggregation_worker::enqueue_active_buffer(std::vector<stack_trace_t>& active_buffer) {
```
I had to stare at this function for quite a while to figure out what it was doing. Comments here would be greatly appreciated.
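As a guess at the pattern (not the PR's actual implementation), the swap-and-enqueue idiom usually looks like the following, with the kind of comments being requested; all names are illustrative:

```cpp
#include <mutex>
#include <utility>
#include <vector>

struct stack_trace_t { int depth = 0; }; // simplified stand-in

struct worker {
    std::mutex work_mutex;
    std::vector<std::vector<stack_trace_t>> pending_batches;  // guarded by work_mutex
    std::vector<std::vector<stack_trace_t>> reusable_buffers; // guarded by work_mutex

    void enqueue_active_buffer(std::vector<stack_trace_t>& active_buffer) {
        std::lock_guard<std::mutex> lock(work_mutex);
        // Hand the producer's full buffer to the worker queue...
        pending_batches.push_back(std::move(active_buffer));
        // ...then give the producer back a recycled (or empty) buffer so
        // the sampling hot path can keep writing without blocking on the
        // aggregation work.
        if (!reusable_buffers.empty()) {
            active_buffer = std::move(reusable_buffers.back());
            reusable_buffers.pop_back();
        } else {
            active_buffer = {};
        }
    }
};
```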
```cpp
    }
}

void aggregation_worker::destroy_batch(std::vector<stack_trace_t>& batch) {
```
nit: naming - this isn't destroying the batch, it's clearing the data in the individual items. It looks really weird to see destroy_batch immediately followed by recycle_batch.
```cpp
if (reusable_buffers.size() < max_reusable_buffers) {
    reusable_buffers.push_back(std::move(batch));
}
```
Did we do any profiling / checks on how much time we save performing recycling on these vectors? It seems we only allocate once during the buffer swap, which, because these are all C structs, should be a single allocation without any constructor calls. I feel like the added complexity of recycling the buffers might not be worth it, especially as the number of buffers available isn't a hard limit. You'll still get a new buffer even if we're at the max number of reusable buffers.
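The saving being weighed here is vector capacity reuse: clear() destroys the elements but keeps the allocation, so a recycled buffer refills without the reallocation a fresh vector would pay for. A tiny illustration of that capacity guarantee:

```cpp
#include <cstddef>
#include <vector>

// clear() sets size to 0 but leaves capacity unchanged, so a recycled
// buffer can be refilled up to its old size with no allocation.
std::size_t capacity_after_clear(std::size_t n) {
    std::vector<int> buf(n); // capacity >= n after construction
    buf.clear();             // size -> 0, capacity unchanged
    return buf.capacity();
}
```

Whether that allocation actually shows up in profiles for trivially-destructible C structs is exactly what the comment suggests measuring.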
```cpp
dd_profiler_diagnostics_t result = empty_diagnostics();

if (profiler) {
    profiler->consume_diagnostics(&result);
```
Since this is all C++, any reason this couldn't be passed by reference instead of taking the pointer?
```cpp
if (profiler) {
    profiler->request_flush(swap_profile_action, &swap_context);
```
Does request_flush ever take an action that isn't swap_profile_action? If not, I feel like we might be able to add some type safety here by letting this take an std::function over a C-style function pointer.
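A sketch of the suggested signature change, using a hypothetical wrapper (not the PR's actual class): a std::function captures its own context, so the void* parameter disappears along with the unchecked cast at the call site.

```cpp
#include <functional>
#include <utility>

// Hypothetical illustration of request_flush taking std::function.
struct profiler {
    std::function<void()> pending_flush_action;

    // Instead of request_flush(void (*action)(void*), void* ctx):
    void request_flush(std::function<void()> action) {
        pending_flush_action = std::move(action);
    }

    void run_pending_flush() {
        if (pending_flush_action) pending_flush_action();
    }
};
```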
What and why?
This PR improves the iOS Mach profiler so aggregation work no longer blocks the sampling loop. It also reduces aggregation cost and adds bounded overload handling so normal profiling can keep sampling while heavy profile processing catches up.
How?
Mach profiling now separates the sampling hot path from the aggregation cold path. The sampling thread only captures raw stack samples and hands off completed buffers, avoiding expensive profile mutation, binary image resolution and flush/profile-rotation work on the sampling loop.
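A toy model (not the PR's code) of the serialized stream this describes: batches and flush barriers are drained strictly in order, so a barrier runs only after every batch enqueued ahead of it, giving flush a deterministic profile boundary.

```cpp
#include <deque>
#include <string>
#include <vector>

enum class kind { batch, flush_barrier };

// Drain the queue in FIFO order; a flush_barrier is never reordered
// ahead of the batches enqueued before it.
std::vector<std::string> drain(std::deque<kind> queue) {
    std::vector<std::string> log;
    while (!queue.empty()) {
        kind item = queue.front();
        queue.pop_front();
        log.push_back(item == kind::batch ? "aggregate" : "swap_profile");
    }
    return log;
}
```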
Aggregation runs on a long-lived serialized aggregation_worker. The worker drains sample batches in order, executes flush requests as ordered barriers, and performs profile swaps on the same serialized stream, so flush_and_get_profile() produces a deterministic profile boundary while sampling continues.

Additionally, the aggregation_worker also:
- Profiling Session telemetry (internal spec)

Review checklist
- make api-surface when adding new APIs