
Add DDSketch implementation for latency distributions #2914

Open
tienquocbui wants to merge 2 commits into
feature/client-side-stats from
kelvin.bui/css-ddsketch

Conversation

@tienquocbui
Member

@tienquocbui tienquocbui commented May 12, 2026

What and why?

Adds a self-contained DDSketch implementation in DatadogTrace/Sources/DDSketch/ for computing approximate latency percentiles in client-side stats. The okSummary and errorSummary fields in the stats payload require DDSketch data serialized as protobuf bytes.

How?

Four source files, ported from the Go reference implementation:

  • ProtoEncoder - Minimal protobuf encoder supporting only the subset needed by ddsketch.proto: varint, fixed64, sint32/zigzag, length-delimited, and packed doubles.
  • LogarithmicMapping - Maps positive doubles to integer bin indices. 1% relative accuracy, index(for:) is a line-for-line match with Go's LogarithmicMapping.Index().
  • CollapsingLowestDenseStore - Contiguous bin array that collapses lowest bins when exceeding maxNumBins (2048). Trades lowest-quantile accuracy for bounded memory.
  • DDSketch - Internal struct with makeForStats() factory (1% accuracy, 2048 bins), add() for recording values, and toProtoBytes() for protobuf serialization. All proto field numbers verified against ddsketch.proto.

The code is intentionally self-contained with no SDK dependencies for potential future extraction.
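To make the mapping component concrete, here is a minimal sketch of the logarithmic-mapping idea (names and the exact rounding rule are illustrative, not the PR's code, which follows Go's `LogarithmicMapping.Index()`): with relative accuracy α, two values fall in the same bin only if they differ by at most a factor of γ = (1 + α)/(1 − α).

```swift
import Foundation

// Illustrative sketch of a logarithmic mapping (not the PR's exact code).
// With relative accuracy α, gamma = (1 + α) / (1 - α); bin i covers the
// value range (gamma^(i-1), gamma^i], so any value reported from a bin's
// bounds is accurate to within α relative error.
struct LogMappingSketch {
    let gamma: Double
    let multiplier: Double // cached 1 / log(gamma)

    init(relativeAccuracy: Double) {
        precondition(relativeAccuracy > 0 && relativeAccuracy < 1)
        self.gamma = (1 + relativeAccuracy) / (1 - relativeAccuracy)
        self.multiplier = 1.0 / log(gamma)
    }

    // One common formulation; the real implementation mirrors the Go
    // reference line for line, which may round differently.
    func index(for value: Double) -> Int {
        Int(ceil(log(value) * multiplier))
    }

    func lowerBound(ofBin index: Int) -> Double {
        pow(gamma, Double(index - 1))
    }
}
```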

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user-facing changes - N/A (internal, not user-facing)
  • Add Objective-C interface for public APIs - N/A (internal to DatadogTrace)
  • Run make api-surface when adding new APIs - N/A (internal to DatadogTrace)

@tienquocbui tienquocbui self-assigned this May 12, 2026
@tienquocbui tienquocbui force-pushed the kelvin.bui/css-ddsketch branch 3 times, most recently from 18ecc74 to 960fec1 Compare May 12, 2026 08:58
@tienquocbui
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 960fec14f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread DatadogInternal/Sources/DDSketch/CollapsingLowestDenseStore.swift Outdated
@tienquocbui tienquocbui force-pushed the kelvin.bui/css-ddsketch branch from 960fec1 to 786bb0c Compare May 12, 2026 09:13
@tienquocbui tienquocbui marked this pull request as ready for review May 12, 2026 12:47
@tienquocbui tienquocbui requested review from a team as code owners May 12, 2026 12:47
@maxep
Member

maxep commented May 12, 2026

QQ: Why host it in DatadogInternal? Who will be the consumers of DDSketch?


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 786bb0cdef


Comment thread DatadogInternal/Sources/DDSketch/CollapsingLowestDenseStore.swift Outdated
@tienquocbui
Member Author

@maxep The consumer is StatsConcentrator in DatadogTrace, which needs DDSketch to compute the okSummary/errorSummary latency distributions in client-side stats payloads. Since feature modules can only import DatadogInternal, it lives here. We discussed making it a standalone module but agreed with @arroz that a self-contained subdirectory in DatadogInternal avoids the extra build target overhead while keeping the code isolated for potential future extraction.

@tienquocbui tienquocbui force-pushed the kelvin.bui/css-ddsketch branch from 786bb0c to 54bb612 Compare May 12, 2026 13:14
@maxep
Member

maxep commented May 12, 2026

If it's only consumed by DatadogTrace, can we define it in DatadogTrace instead? DatadogInternal is only for shared definitions with minimal logic (NetworkInstrumentation being the exception).

@tienquocbui
Member Author

tienquocbui commented May 12, 2026

Good point! We discussed this with Miguel and Maciek earlier and agreed on DatadogInternal as a middle ground. The code is fully self-contained (no SDK dependencies, own subdirectory) so it's halfway to being extracted into a standalone Swift package later.

@tienquocbui tienquocbui force-pushed the kelvin.bui/css-ddsketch branch from 54bb612 to 7349ef4 Compare May 13, 2026 11:47
@tienquocbui tienquocbui force-pushed the kelvin.bui/css-ddsketch branch from 7349ef4 to 82f2925 Compare May 13, 2026 11:52
@tienquocbui
Member Author

tienquocbui commented May 13, 2026

Moved DDSketch from DatadogInternal to DatadogTrace per @maxep's feedback; since DatadogTrace is the only consumer, it fits better there. DatadogInternal should stay focused on shared interfaces, not standalone utilities. The code remains self-contained (no SDK dependencies), so future extraction is still straightforward.

extendRange(newMin: minIndex, newMax: index)
}

let arrayIndex = index - offset
Contributor


Nitpick: just a matter of style, but in the other assignments above, the index calculation was done inline, like bins[minIndex - offset]. Why do it differently here?


/// Returns the contiguous bin data for protobuf serialization.
/// The `offset` is the index of the first bin in the contiguous array.
func contiguousBins() -> (counts: [Double], indexOffset: Int32) {
Contributor


You can save the creation of an array (and copying memory, etc) by returning Array<Double>.SubSequence instead of [Double]. Then, in ProtoEncoder.encodePackedDoubles(…) you can change values type to Array<Double>.SubSequence as well.
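A sketch of that suggestion (simplified, with illustrative names; the real store's index bookkeeping differs): `ArraySlice<Double>` is `Array<Double>.SubSequence` and shares storage with the backing array, so the trim costs no allocation, and the encoder can accept the slice type directly.

```swift
// Illustrative sketch, not the PR's code: return a slice of the backing
// array instead of materializing a new [Double].
struct StoreSketch {
    var bins: [Double]
    var indexOffset: Int32

    func contiguousBins() -> (counts: ArraySlice<Double>, indexOffset: Int32) {
        // Trim leading/trailing empty bins without copying; the slice
        // shares storage with `bins`.
        guard let first = bins.firstIndex(where: { $0 != 0 }),
              let last = bins.lastIndex(where: { $0 != 0 }) else {
            return (bins[0..<0], indexOffset)
        }
        return (bins[first...last], indexOffset + Int32(first))
    }
}

// The encoder then accepts the slice type directly instead of [Double].
func encodePackedDoubles(_ values: ArraySlice<Double>) -> [UInt8] {
    var out: [UInt8] = []
    for v in values {
        // packed doubles are 8 little-endian bytes each
        withUnsafeBytes(of: v.bitPattern.littleEndian) { out.append(contentsOf: $0) }
    }
    return out
}
```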

Comment on lines +72 to +73
let startArrayIndex = minIndex - offset
let endArrayIndex = maxIndex - offset
Contributor


When calculating indexes, it's always good practice to base them on offsets from the collection start or end index. Although skipping that works here, this makes the code safer if, in the future, an optimization is added where the array turns into a subarray, like what I suggested above. Never assume the startIndex of a collection is 0 (and in many collections the Index type may be something other than an Int).

Suggested change
let startArrayIndex = minIndex - offset
let endArrayIndex = maxIndex - offset
let startArrayIndex = bins.startIndex.advanced(by: minIndex - offset)
let endArrayIndex = bins.startIndex.advanced(by: maxIndex - offset)

let adjustedMin = newMax - maxNumBins + 1

if bins.isEmpty || adjustedMin >= maxIndex {
let totalCount = count
Contributor


Why create an additional var instead of just using count directly below?

let dstIdx = max(oldIdx, adjustedMin)
let dstArrayIdx = dstIdx - adjustedMin
if dstArrayIdx >= 0 && dstArrayIdx < newBins.count {
newBins[dstArrayIdx] += bins[srcArrayIdx]
Contributor


Same comment as before about basing indexes on the array's startIndex.

}

var newBins = [Double](repeating: 0, count: maxNumBins)
for oldIdx in minIndex...maxIndex {
Contributor


Is this easier or harder than just calculating two offsets and the count of values to copy, and then copying those? Maybe even doing newBins = [Double](repeating: 0, count: x) + Array(bins[y..<z]) + [Double](repeating: 0, count: w). Might be a bit more performant as well, but I'm not 100% sure; profiling would be needed to confirm. But it would at least be easier to read and understand the algorithm (since the construction of the new array is clearly 0 or more zeros, then part of the older one, followed by 0 or more zeros).
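A sketch of what that concatenation-based construction could look like (illustrative and simplified; the real store's index bookkeeping differs). Note the dropped low bins still have to be summed into the first kept bin so no count mass is lost:

```swift
// Illustrative sketch, not the PR's code: collapse the lowest bins by
// concatenation instead of an element-by-element loop.
func collapseLowest(bins: [Double], keepLast maxNumBins: Int) -> [Double] {
    guard bins.count > maxNumBins else {
        // Nothing to collapse: pad with trailing zeros up to capacity.
        return bins + [Double](repeating: 0, count: maxNumBins - bins.count)
    }
    let cut = bins.count - maxNumBins
    // Mass of the dropped low bins is folded into the first kept bin,
    // preserving the sketch's total count.
    let collapsed = bins[..<cut].reduce(0, +)
    return [collapsed + bins[cut]] + Array(bins[(cut + 1)...])
}
```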

/// memory usage. This is the correct trade-off for latency distributions where
/// higher percentiles (p50, p90, p99) matter more than p1.
internal struct CollapsingLowestDenseStore {
private(set) var bins: [Double]
Contributor


This struct could use more documentation, on variables and functions, explaining what they are used for in the algorithm.

/// let protoBytes = sketch.toProtoBytes()
/// ```
internal struct DDSketch {
let mapping: LogarithmicMapping
Contributor


Same comment as above about documentation.

let relativeAccuracy: Double

init(relativeAccuracy: Double) {
precondition(relativeAccuracy > 0 && relativeAccuracy < 1, "relativeAccuracy must be in (0, 1)")
Contributor


Suggested change
precondition(relativeAccuracy > 0 && relativeAccuracy < 1, "relativeAccuracy must be in (0, 1)")
precondition(relativeAccuracy > 0 && relativeAccuracy < 1, "relativeAccuracy must be in [0, 1]")


self.maxIndexableValue = min(
exp((Double(Int32.max) - indexOffset) / multiplier - 1),
exp(709.0) / (2.0 * gamma / (1.0 + gamma))
Contributor


What is this 709 number? Can it be moved to a constant with a name that expresses its meaning?

Member Author


709 is the largest safe exponent for exp() before IEEE 754 double overflow (exp(709.78...) ≈ Double.greatestFiniteMagnitude). It comes directly from the Go reference. I'll extract it to a named constant.
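The extraction could look like this (the constant name is illustrative, not the PR's): exp(x) overflows a Double once x exceeds ~709.78, so 709 is the largest integer argument that stays finite.

```swift
import Foundation

// Illustrative sketch: name the magic number. exp(709.78...) is
// approximately Double.greatestFiniteMagnitude, so 709 is the largest
// integer exponent exp() can take without overflowing to +infinity.
let maxSafeExpArgument = 709.0

let stillFinite = exp(maxSafeExpArgument).isFinite         // true
let overflows = exp(maxSafeExpArgument + 1).isInfinite     // true
```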

///
/// This encoder is intentionally self-contained with no SDK dependencies
/// so the DDSketch code can be extracted to a standalone repository.
internal struct ProtoEncoder {
Contributor


Have we considered the pros and cons of using Swift Protobuf instead? Specifically, how much does the compiled binary grow in Release mode, after the usual dead-code removal, compared to our version? I'm not against leaving our implementation in, but if the resulting difference in binary size is not very significant, I would be happy not having to maintain this code.

Member Author


We considered it. The main reasons for a custom encoder:

  • (1) adding apple/swift-protobuf is a new dependency, which goes against the SDK's small-footprint principle: the library adds ~200KB to binary size even after dead-code stripping,
  • (2) we only need ~100 lines of write-only encoding for 5 wire types, no decoding,
  • (3) the maintenance burden is minimal since the DDSketch proto schema is stable and unlikely to change
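For context on how small that write-only surface is, here is a hedged sketch of the two non-trivial wire formats such an encoder needs (illustrative code, not the PR's): base-128 varints, and zigzag encoding for sint32 fields.

```swift
// Illustrative sketch of protobuf wire primitives, not the PR's code.

// Varint: 7 payload bits per byte, least-significant group first; the
// high bit is set on every byte except the last.
func encodeVarint(_ value: UInt64) -> [UInt8] {
    var v = value
    var out: [UInt8] = []
    while v >= 0x80 {
        out.append(UInt8(v & 0x7F) | 0x80)
        v >>= 7
    }
    out.append(UInt8(v))
    return out
}

// Zigzag: maps signed to unsigned so small-magnitude negatives stay
// small on the wire: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
func zigzag(_ value: Int32) -> UInt32 {
    UInt32(bitPattern: (value << 1) ^ (value >> 31))
}
```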

