[WIP / experiment] Specialization of `std.foldl(std.mergePatch, arr, target)` by JoshRosen · Pull Request #254 · databricks/sjsonnet

JoshRosen · 2025-01-03T04:02:32Z

Note

I have marked this PR as a draft because I want to do one more round of self-review on the added test cases and test coverage before merging, plus possibly clean up some of the comments. I'm opening the PR now for feedback.

Overview

This PR adds a specialization for std.foldl(std.mergePatch, patches, target), improving performance when applying large numbers of patches to an object.

Motivation

This pattern is sometimes used for flattening or reshaping large lists of configurations.

As a simplified toy example:

local inputData = [
  { cloud: 'aws', region: 'us-west-1', service: 'webapp', confs: { servers: 1 } },
  { cloud: 'aws', region: 'us-west-1', service: 'auth', confs: { servers: 2 } },
  { cloud: 'azure', region: 'us-west-2', service: 'auth', confs: { servers: 2 } },
];

std.foldl(
  std.mergePatch,
  [
    { [x.cloud]: { [x.region]: { [x.service]: x.confs } } }
    for x in inputData
  ],
  {}
)

yields

{
   "aws": {
      "us-west-1": {
         "auth": {
            "servers": 2
         },
         "webapp": {
            "servers": 1
         }
      }
   },
   "azure": {
      "us-west-2": {
         "auth": {
            "servers": 2
         }
      }
   }
}

In some cases we use this pattern to flatten very large lists of objects. In profiling, I noticed significant time spent in mergePatch and began looking for optimizations.
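
To make the baseline concrete, here is a minimal Python model of the unoptimized pattern. It is a sketch, not sjsonnet code: `merge_patch` follows RFC 7396 semantics on plain dicts and ignores Jsonnet-specific behavior such as hidden fields, `+:` modifiers, field ordering in the manifested output, and lazy evaluation. `functools.reduce` stands in for `std.foldl`.

```python
from functools import reduce


def merge_patch(target, patch):
    """RFC 7396-style merge patch on plain dicts (assumption: no hidden fields, no `+:`)."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target outright
    # Each call builds a fresh result object, copying the whole target first.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # an explicit null removes the field
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


input_data = [
    {"cloud": "aws", "region": "us-west-1", "service": "webapp", "confs": {"servers": 1}},
    {"cloud": "aws", "region": "us-west-1", "service": "auth", "confs": {"servers": 2}},
    {"cloud": "azure", "region": "us-west-2", "service": "auth", "confs": {"servers": 2}},
]
patches = [{x["cloud"]: {x["region"]: {x["service"]: x["confs"]}}} for x in input_data]

# Equivalent in spirit to std.foldl(std.mergePatch, patches, {}).
flattened = reduce(merge_patch, patches, {})
```

Every step of the fold copies the accumulated object before applying one more patch, which is where the repeated work comes from.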

This PR's optimized implementation

This PR adds a specialized FoldlMergePatch function to optimize this pattern. This function is not directly invokable by end users; instead, it's automatically injected using the specialization framework from #119 / 0bd255a.

In pseudocode, the optimized implementation does the following:

1. Check whether there are patches to apply. If not, return the target.
2. Determine the set of patches that might affect the output.
   - Non-Obj patches overwrite the target, hence the special handling here.
3. If the object was overwritten by a non-object, return the non-object.
4. Otherwise, execute a recursive function over the object:
   - Determine an upper bound on the set of potential output fields by unioning the visible fields of the target and the patches.
   - For each possibly-output field:
     - Check if the target has the value.
     - Iterate over the patches, collecting the field values that might participate in the output:
       - An explicit Null removes the field, so when we see one we can ignore the target and all earlier patch values.
     - At this point, either:
       - (a) no patches merged with the field, so return the target's value (if any)
       - (b) the last effective patch removed the field, so drop it from the output.
       - (c) the last effective patch set a non-object, so we can just store that in the field
       - (d) an earlier patch removed the field and only one patch adds to it, so we can simply use the patch's object as the field value (after first cleaning it to remove any hidden or +: fields present anywhere in the object or its children).
       - (e) at least two objects need to be merged, so recursively merge them
         - This step continues to distinguish between the target and patch objects in the sub-merges: this is necessary because hidden nested fields in target fields that don't merge with patches are preserved in output, while hidden fields in or targeted by patches are dropped. See mergePatch tests added in Fix a bug in hidden field handling in std.mergePatch #250 for details.
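
The steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the PR's Scala implementation: values are plain dicts, `None` plays the role of an explicit null, and hidden fields, `+:` modifiers, field ordering, and laziness are not modeled. The names `fold_merge_patch`, `_merge_objects`, and `_MISSING` are hypothetical. Case (d) is handled here by the general recursion rather than by a dedicated cleaning routine.

```python
_MISSING = object()  # sentinel: "field has no value" (distinct from None/null)


def _merge_objects(target, patches):
    """Merge a non-empty list of dict patches onto `target` (a dict or None)."""
    # Upper bound on output fields: union of the target's and the patches' keys.
    candidates = {}
    for source in ([target] if target is not None else []) + patches:
        for key in source:
            candidates.setdefault(key)

    out = {}
    for key in candidates:
        base = target.get(key, _MISSING) if target is not None else _MISSING
        pending = []  # object patches still to be merged on top of `base`
        for patch in patches:
            if key not in patch:
                continue
            value = patch[key]
            if value is None:
                base, pending = _MISSING, []   # null: forget everything earlier
            elif isinstance(value, dict):
                pending.append(value)          # may need a recursive merge
            else:
                base, pending = value, []      # non-object: overwrites earlier values
        if pending:
            # Cases (d)/(e): merge the surviving objects in a single recursive call.
            out[key] = _merge_objects(base if isinstance(base, dict) else None, pending)
        elif base is not _MISSING:
            out[key] = base                    # cases (a)/(c)
        # else case (b): the field was removed, so it is omitted.
    return out


def fold_merge_patch(patches, target):
    """Model of foldl(mergePatch, patches, target) computed in one pass per field."""
    if not patches:
        return target
    # Only patches after the last non-object patch can affect the output.
    last_non_obj = max(
        (i for i, p in enumerate(patches) if not isinstance(p, dict)), default=-1
    )
    if last_non_obj >= 0:
        target = patches[last_non_obj]
        patches = patches[last_non_obj + 1:]
        if not patches:
            return target
    return _merge_objects(target if isinstance(target, dict) else None, patches)
```

Unlike a step-by-step fold, this builds each output object once and never materializes the intermediate merge results.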

This optimized implementation has the same O(n²) worst-case asymptotic complexity as the unoptimized std.foldl(std.mergePatch, ..., ...) approach, but can be significantly faster in practice because it produces much less garbage: it avoids building and discarding intermediate merge results and therefore avoids having to repeatedly recompute the intermediate merge target's visible fields.
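
As a rough illustration of the garbage produced by the step-by-step fold, the following Python sketch counts how many existing entries are copied into intermediate results when folding single-key patches. It is only a model: copying a Python dict stands in for building a fresh intermediate object, the helper name `count_copied_entries` is hypothetical, and sjsonnet's actual allocation profile will differ.

```python
from functools import reduce


def count_copied_entries(num_patches):
    """Fold `num_patches` single-key patches and count entries copied along the way."""
    copied = 0

    def merge_patch(target, patch):
        nonlocal copied
        if not isinstance(patch, dict):
            return patch
        if isinstance(target, dict):
            copied += len(target)  # every existing entry is copied again
            result = dict(target)
        else:
            result = {}
        for key, value in patch.items():
            if value is None:
                result.pop(key, None)
            else:
                result[key] = merge_patch(result.get(key), value)
        return result

    # Each patch adds one distinct top-level key, so the accumulator keeps growing.
    patches = [{f"k{i}": i} for i in range(num_patches)]
    reduce(merge_patch, patches, {})
    return copied
```

In this model, `count_copied_entries(1000)` returns 499500, that is n(n-1)/2 copied entries for n patches, all of which end up in discarded intermediate objects except those in the final result.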

I also play several tricks to reduce object allocations, including:

- Reusing the same LinkedHashMap instance for both computing the set of visible keys and for the final value0 in the resulting Val.Obj.
- Reusing the same Array[Val] for holding patch values when computing the set of values that will participate in a sub-merge at a given ancestor path: I pass around a size field to denote the "valid" size of this array, avoiding copies for trimming it to size.
- Optimized cleanObject, which is a perhaps-overly-complex optimized equivalent of the regular mergePatch's recSingle helper function: this is used for recursively removing non-visible or explicitly-null keys from patch objects when they don't merge with the target, but it also has the side effect of dropping +: modifiers. My implementation optimizes for the common scenario where patches are purely additive by avoiding new object allocation when the cleaning would be a no-op.
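
The no-op-avoiding idea behind cleanObject can be sketched in Python. This is an illustrative model only: it removes null-valued keys from plain dicts and does not model hidden fields or `+:` modifiers, and the name `clean_object` is a stand-in rather than the PR's code. The copy is allocated lazily, so a purely additive patch is returned as the very same object.

```python
from itertools import islice


def clean_object(obj):
    """Return `obj` without null-valued fields (recursively).

    Returns `obj` itself, with no allocation, when there is nothing to remove.
    """
    cleaned = None  # allocated lazily, only once a change is actually needed
    for index, (key, value) in enumerate(obj.items()):
        new_value = clean_object(value) if isinstance(value, dict) else value
        changed = value is None or new_value is not value
        if changed and cleaned is None:
            # First change found: copy the entries that were already accepted.
            cleaned = dict(islice(obj.items(), index))
        if cleaned is not None and value is not None:
            cleaned[key] = new_value
    return obj if cleaned is None else cleaned


additive = {"a": 1, "b": {"c": 2}}
same = clean_object(additive)  # nothing to clean: same object comes back

with_nulls = {"a": 1, "b": {"c": None, "d": 2}, "e": None}
stripped = clean_object(with_nulls)  # new object; the input is left untouched
```

The identity check `same is additive` holds in this sketch, which is the property that lets the common additive case skip allocation entirely.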

Correctness testing

I added a new dedicated test suite for this optimization.

I noticed that our existing unit tests don't meaningfully exercise specialization because the StaticOptimizer usually ends up constant/static-folding the test cases. To address this, I added a new internal disableStaticApplyForBuiltinFunctions setting which we can set in specific tests.

I also added an internal disableBuiltinSpecialization setting for disabling specialization.

Combining these together, I added a test helper which compares specialized and non-specialized execution and asserts that the answers are equal with field ordering preserved.
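
The shape of such a helper can be sketched in Python. This is a hypothetical analogue, not the test code from this PR: the real helper runs sjsonnet with and without the internal settings described above, whereas here `reference` and `specialized` are simply two callables. Serializing with `json.dumps` (without `sort_keys`) preserves dict insertion order, so comparing the strings also checks field ordering.

```python
import json
from functools import reduce


def merge_patch(target, patch):
    """RFC 7396-style merge patch on plain dicts (assumption: no hidden fields)."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def fold_reference(patches, target):
    """Unspecialized path: apply the patches one at a time."""
    return reduce(merge_patch, patches, target)


def assert_same_result(reference, specialized, patches, target):
    """Assert both implementations agree, including field order."""
    expected = json.dumps(reference(patches, target))
    actual = json.dumps(specialized(patches, target))
    assert actual == expected, f"specialized result {actual} != reference {expected}"


def reordering_impl(patches, target):
    """Deliberately broken variant: same fields, different order."""
    return dict(sorted(fold_reference(patches, target).items()))


patches = [{"b": 1}, {"a": 2}]
assert_same_result(fold_reference, fold_reference, patches, {})  # passes
```

Calling `assert_same_result(fold_reference, reordering_impl, patches, {})` raises an `AssertionError` even though the two results compare equal as dicts, because the serialized field order differs.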

Warning

As we saw in #250, merge patch's implicit behaviors can be subtle and I'm still not 100% sure that I've faithfully covered all cases, which is why I've kept this marked as a draft to give me a chance to revisit the tests with fresh eyes. In particular, I'm not 100% certain that I've handled standard vs. unhide visibility correctly. For some reason, not yet clear to me, sjsonnet's mergePatch creates fields with Unhide visibility. This might constrain my ability to do the cleanObject optimizations, but it might also be a behavior difference w.r.t. the official jsonnet implementation and thus perhaps something we could change to more faithfully match their behavior.

Performance testing

🚧 I'll attach some of my microbenchmarks later.

In end-to-end tests on a very complex real-world input, this PR's patch cut ~20% of overall wallclock time.

JoshRosen · 2025-01-03T08:22:25Z

On further reflection, I think this ends up changing lazy evaluation semantics for mergePatch, something that's not covered in the test cases.

But I think there's still substantial room to speed this up via other potential optimizations that I spotted while digging into this, including optimizations to speed up field name checks.

I also spotted a pre-existing bug in our std.mergePatch's default field visibility.

I'll fix both of those in separate PRs.

JoshRosen · 2025-01-04T03:03:34Z

Closing this, as I managed to gain even larger speedups via the optimizations in #258 and those don't change semantics.

I may end up repurposing parts of the specialization testing bits in a future PR.

JoshRosen added 21 commits December 30, 2024 23:09

Add optimized mergePatchAll function

c12eea0

Add rewrite behind flag.

fe438ce

Temporary test coverage hack

b756ec2

Optimization to reuse objects array; this cuts significantly on garbage.

017f923

Optimize construction of the final object.

5ae71ad

Re-use outputFields LinkedHashMap to collect distinct keys

f0f1a65

This saves some extra allocations.

Handle empty input arrays at higher level; remove now-dead branch in recMerge

3e5dd13

Merge remote-tracking branch 'origin/master' into optimize-foldl-of-mergePatch; add new tests

f621967

Updates for hidden field handling.

032b221

Still need to add more tests for plus field handling; currently lack full mutation coverage for the added code there. Also need more cases for order preservation during field merges.

Port to existing specialization framework.

c13552b

Cleanups after specialization.

72cc865

Test coverage fixes; fix handling of non-object patches.

0a47bf9

don't need thunk in createMember

4373c28

more single patch test cases

9161985

Fix pos handling.

71df7cf

Handle non-empty targets.

966f8cd

unused cleanup

317b928

Remove user-facing flag.

d5e6a92

more test cases

83ab129

Capitalization.

293bc52

Backout settings change.

9136c4e

JoshRosen changed the title from "Specialization of std.foldl(std.mergePatch, arr, target)" to "[WIP / experiment] Specialization of std.foldl(std.mergePatch, arr, target)" on Jan 3, 2025

JoshRosen closed this Jan 4, 2025

JoshRosen wants to merge 21 commits into databricks:master from JoshRosen:optimize-foldl-of-mergePatch
