feat: with replicas unique by key + some fixes & performance improvements#31
feat: with replicas unique by key + some fixes & performance improvements#31nicmr wants to merge 6 commits intojeromefroe:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the HashRing consistent-hashing API to support selecting replicas that are unique by a caller-provided key, while also fixing an edge-case in replica count clamping and refactoring replica selection to avoid unnecessary cloning/work per call.
Changes:
- Add
get_with_replicas_unique_by_key, allowing replica uniqueness to be defined by a closure (e.g., physical-node identity). - Fix replica-count clamping at the ring-length boundary (off-by-one) and add a regression test for it.
- Refactor
get_with_replicasreplica selection to a modulo-indexed approach (intended to reduce cloning/overhead).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } else { | ||
| replicas + 1 | ||
| }; | ||
| let replicas = std::cmp::min(replicas+1, self.ring.len()); |
There was a problem hiding this comment.
replicas + 1 can overflow for very large replicas values (e.g. usize::MAX), causing a panic in debug builds or wraparound in release builds. Consider using replicas.saturating_add(1) or replicas.checked_add(1).unwrap_or(self.ring.len()) before applying min.
| let replicas = std::cmp::min(replicas+1, self.ring.len()); | |
| let replicas = std::cmp::min(replicas.saturating_add(1), self.ring.len()); |
| return None; | ||
| } | ||
|
|
||
| let replicas = std::cmp::min(replicas + 1, self.ring.len()); |
There was a problem hiding this comment.
replicas + 1 can overflow for very large replicas values (e.g. usize::MAX), causing a panic in debug builds or wraparound in release builds. Consider using saturating_add/checked_add before min (same as get_with_replicas).
|
@nicmr your proposals make sense to me. Can you take a look at the failing unit test? |
|
@jeromefroe Thanks for the feedback and the quick reply! 👋 Some of the failing unit tests concern functions I've not touched, so I think they broke for an unrelated reason. Maybe they could be replaced with tests ensuring the returned nodes & order are consistent over multiple calls (to test the consistent property of consistent hashing), but not asserting a hard-coded order? 🤔 I'll adress all the Copilot feedback as well and will ping you once that that's done! |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi,
thanks for developing this library! I recently started using it and would like to propose a few improvements:
New feature:
get_with_replicas_unique_by_keyWhen I tried using this library I encountered the following limitation:
get_with_replicasprovides no ability to check for uniqueness criteria, i.e. an ability that following virtual nodes don't map to the same physical node.Illustrating it with an example: when looking up 1 primary + 2 replicas in a 3+ node cluster mapped over 100 virtual nodes, there's a considerable chance the second replica will be a virtual node mapping to the same physical node as the primary.
That's why I propose introducing a new function
get_with_replicas_unique_by_key, that allows passing a caller-specified function to compute a uniqueness criteria (usually from the fields identifying a physical node is unique, e.g. IP + port).This is not possible without support directly from the library - with
get_with_replicasyou'd have to check the result, check again with an increased replica count, or call it with way too many replicas in advance to increase the chance of actually hitting a sufficient number of physically distinct physical nodes - all really unergonomic and inefficient.Bugfix: off by one at ring.len() boundary
breaks when replicas +1 == self.ring.len() - it will instruct the implementation to return e.g. calling the function with replicas=6 and a ring size of 6 causes it to return 7 elements.
I've replaced it with:
Performance improvement in
get_with_replicasFinally, I refactored the replica cloning function from cycles to using a modulo-indexed approach - with the
get_with_replicas_unique_by_keythe.cycles()call could actually hang indefinitely when there are not enough unique nodes in the ring.However, this approach also has an advantage worth making use of in
get_with_replicas: It avoids cloning the entire ring for every function call and only clones the nodes actually returned from the function.State of current tests
On my machine, the existing tests that expect specific nodes to be returned in a certain order when a specific key is looked up don't pass even on a plain clone of the repo - I don't know if it's the result of a compiler / dep version change or platform differences, but they appear to flaky to me and I would suggest changing them. That's why the tests suite doesn't pass completely with my changes.
Please let me know if you have any questions, concerns or would like to merge a subset of the changes.