Add DDSketch implementation for latency distributions #2914
Conversation
Force-pushed 18ecc74 to 960fec1
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 960fec14f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Force-pushed 960fec1 to 786bb0c
QQ: Why hosting it in
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 786bb0cdef
@maxep The consumer is
Force-pushed 786bb0c to 54bb612
If it's only consumed by
Good point! We discussed this with Miguel and Maciek earlier and agreed on
Force-pushed 54bb612 to 7349ef4
Force-pushed 7349ef4 to 82f2925
Moved
```swift
extendRange(newMin: minIndex, newMax: index)
}
```

```swift
let arrayIndex = index - offset
```
Nitpick: just a matter of style, but in the other assignments above, the index calculation was done inline, like `bins[minIndex - offset]`. Why do it differently here?
```swift
/// Returns the contiguous bin data for protobuf serialization.
/// The `offset` is the index of the first bin in the contiguous array.
func contiguousBins() -> (counts: [Double], indexOffset: Int32) {
```
You can save the creation of an array (and the memory copy) by returning `Array<Double>.SubSequence` instead of `[Double]`. Then, in `ProtoEncoder.encodePackedDoubles(…)`, you can change the `values` type to `Array<Double>.SubSequence` as well.
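For illustration, a minimal sketch of what this suggestion could look like. The `DenseStore` type, its stored properties, and the sample values here are hypothetical stand-ins for the real store:

```swift
// Hypothetical stand-in for the real store, to illustrate the suggestion:
// return a slice of the backing array instead of copying it into a new
// [Double]. ArraySlice<Double> is the same type as Array<Double>.SubSequence.
struct DenseStore {
    private(set) var bins: [Double] = [0, 0, 3, 1, 4, 0]
    var minIndex = 2
    var maxIndex = 4
    var offset = 0

    // The slice shares storage with `bins`, so no allocation or copy happens
    // here. Note that the slice keeps the original array's indices.
    func contiguousBins() -> (counts: ArraySlice<Double>, indexOffset: Int32) {
        let start = bins.startIndex.advanced(by: minIndex - offset)
        let end = bins.startIndex.advanced(by: maxIndex - offset)
        return (bins[start...end], Int32(minIndex))
    }
}
```

A consumer such as `encodePackedDoubles(…)` would then accept `ArraySlice<Double>` and iterate it without materializing a new array.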
```swift
let startArrayIndex = minIndex - offset
let endArrayIndex = maxIndex - offset
```
When calculating indexes, it's good practice to base them on offsets from the collection's `startIndex` or `endIndex`. Although skipping that works here, it makes the code safer if, in the future, an optimization turns the array into a subarray, like the one suggested above. Never assume a collection's `startIndex` is 0 (and in many collections the `Index` type is not even an `Int`).
Suggested change:

```diff
-let startArrayIndex = minIndex - offset
-let endArrayIndex = maxIndex - offset
+let startArrayIndex = bins.startIndex.advanced(by: minIndex - offset)
+let endArrayIndex = bins.startIndex.advanced(by: maxIndex - offset)
```
```swift
let adjustedMin = newMax - maxNumBins + 1
```

```swift
if bins.isEmpty || adjustedMin >= maxIndex {
    let totalCount = count
```
Why create an additional variable instead of using `count` directly below?
```swift
let dstIdx = max(oldIdx, adjustedMin)
let dstArrayIdx = dstIdx - adjustedMin
if dstArrayIdx >= 0 && dstArrayIdx < newBins.count {
    newBins[dstArrayIdx] += bins[srcArrayIdx]
```
Same comment as before about basing indexes on the array's `startIndex`.
```swift
}
```

```swift
var newBins = [Double](repeating: 0, count: maxNumBins)
for oldIdx in minIndex...maxIndex {
```
Is this easier or harder than just calculating two offsets and the count of values to copy, and then copying those? Maybe even `newBins = [Double](repeating: 0, count: x) + Array(bins[y..<z]) + [Double](repeating: 0, count: w)`. It might be a bit more performant as well, but I'm not 100% sure; profiling would be needed to confirm. It would at least be easier to read and understand the algorithm, since the construction of the new array is clearly zero or more zeros, then part of the old array, followed by zero or more zeros.
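For illustration, a rough sketch of the concatenation approach this comment describes. The function name, parameter list, and collapsing behavior are assumptions about the store's algorithm for the sake of a self-contained example, not the actual implementation:

```swift
// Hypothetical sketch: rebuild the collapsed bin array as
// (leading zeros) + (copied slice of the old bins) + (trailing zeros),
// instead of looping element by element. Assumes valid inputs where
// maxIndex - adjustedMin + 1 <= maxNumBins.
func rebuildBins(
    old bins: [Double],
    minIndex: Int, maxIndex: Int, offset: Int,
    adjustedMin: Int, maxNumBins: Int
) -> [Double] {
    // Old bin indices at or above `adjustedMin` survive uncollapsed;
    // everything below collapses into the first surviving bin.
    let copyFrom = max(minIndex, adjustedMin)
    let srcStart = copyFrom - offset
    let srcEnd = maxIndex - offset
    let collapsed = bins[(minIndex - offset)..<srcStart].reduce(0, +)

    let leading = copyFrom - adjustedMin
    let copied = Array(bins[srcStart...srcEnd])
    let trailing = maxNumBins - leading - copied.count

    var newBins = [Double](repeating: 0, count: leading)
        + copied
        + [Double](repeating: 0, count: trailing)
    // Fold the mass of the collapsed low bins onto the first copied bin.
    newBins[leading] += collapsed
    return newBins
}
```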
```swift
/// memory usage. This is the correct trade-off for latency distributions where
/// higher percentiles (p50, p90, p99) matter more than p1.
internal struct CollapsingLowestDenseStore {
    private(set) var bins: [Double]
```
This struct could use more documentation on its variables and functions, explaining their roles in the algorithm.
````swift
/// let protoBytes = sketch.toProtoBytes()
/// ```
internal struct DDSketch {
    let mapping: LogarithmicMapping
````
Same comment as above about documentation.
```swift
let relativeAccuracy: Double
```

```swift
init(relativeAccuracy: Double) {
    precondition(relativeAccuracy > 0 && relativeAccuracy < 1, "relativeAccuracy must be in (0, 1)")
```
Suggested change:

```diff
-precondition(relativeAccuracy > 0 && relativeAccuracy < 1, "relativeAccuracy must be in (0, 1)")
+precondition(relativeAccuracy > 0 && relativeAccuracy < 1, "relativeAccuracy must be in [0, 1]")
```
```swift
self.maxIndexableValue = min(
    exp((Double(Int32.max) - indexOffset) / multiplier - 1),
    exp(709.0) / (2.0 * gamma / (1.0 + gamma))
```
What is this 709 number? Can it be moved to a constant with a name that expresses its meaning?
709 is the largest safe exponent for `exp()` before IEEE 754 double overflow (`exp(709.78...)` ≈ `Double.greatestFiniteMagnitude`). It comes directly from the Go reference. I'll extract it to a named constant.
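As a sketch, the extraction could look like this (the enum and constant names are illustrative, not the names used in the PR):

```swift
import Foundation

enum MappingConstants {
    /// Largest whole exponent that `exp` can take before IEEE 754 double
    /// overflow: exp(709.78...) ≈ Double.greatestFiniteMagnitude, so
    /// exp(709) is still finite while exp(710) overflows to +inf.
    static let maxSafeExpArgument = 709.0
}

let safeUpperBound = exp(MappingConstants.maxSafeExpArgument)  // finite
```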
```swift
///
/// This encoder is intentionally self-contained with no SDK dependencies
/// so the DDSketch code can be extracted to a standalone repository.
internal struct ProtoEncoder {
```
Have we considered the pros and cons of using Swift Protobuf instead? Specifically, how much does the compiled binary grow in Release mode, after the usual dead-code removal, compared to our version? I'm not against keeping our implementation, but if the resulting difference in binary size is not very significant, I would be happy not having to maintain this code.
We considered it. The main reasons for a custom encoder:
1. Adding `apple/swift-protobuf` is a new dependency, which goes against the SDK's small-footprint principle; the library adds ~200KB to binary size even after dead-code stripping.
2. We only need ~100 lines of write-only encoding for 5 wire types, no decoding.
3. The maintenance burden is minimal, since the `DDSketch` proto schema is stable and unlikely to change.
What and why?

Adds a self-contained DDSketch implementation in `DatadogTrace/Sources/DDSketch/` for computing approximate latency percentiles in client-side stats. The `okSummary` and `errorSummary` fields in the stats payload require DDSketch data serialized as protobuf bytes.

How?

Four source files, ported from the Go reference implementation:

- `ProtoEncoder`: Minimal protobuf encoder supporting only the subset needed by `ddsketch.proto`: varint, fixed64, sint32/zigzag, length-delimited, and packed doubles.
- `LogarithmicMapping`: Maps positive doubles to integer bin indices. 1% relative accuracy; `index(for:)` is a line-for-line match with Go's `LogarithmicMapping.Index()`.
- `CollapsingLowestDenseStore`: Contiguous bin array that collapses lowest bins when exceeding `maxNumBins` (2048). Trades lowest-quantile accuracy for bounded memory.
- `DDSketch`: Internal struct with `makeForStats()` factory (1% accuracy, 2048 bins), `add()` for recording values, and `toProtoBytes()` for protobuf serialization. All proto field numbers verified against `ddsketch.proto`.

The code is intentionally self-contained with no SDK dependencies for potential future extraction.
Review checklist
- `make api-surface` when adding new APIs: N/A (internal to DatadogTrace)