Use `Iterator::nth()` for reservoir sampling algorithms.

The reservoir sampling algorithms will be implemented using a generic `Iterator` in the new `QueryResultIterator` (or Entities) code. The real performance advantage of this algorithm is that it _skips over_ elements of the sequence it doesn't need to consider for inserting into the reservoir. However, the way we are implementing it, we still iterate over every element. For the specific case that the `Iterator` has a constant time `nth` method implementation (e.g. `Vec<T>`), this leaves a ton of performance on the table unnecessarily. 

Instead of computing `next_pick_position`, then iterating over each element and checking if `position == next_pick_position`, we should compute `skip_count` and set `item = iter.nth(skip_count)` directly. (Beware off-by-one, as `iter.nth(0)` is equivalent to `iter.next()`.)

Unfortunately, this change won't give us much of an advantage in our most interesting use-case of sampling from index sets, because `HashSet`'s `Iterator::nth` implementation has to iterate over each "slot" to account for the fact that some slots might be empty. (This is why we haven't bothered to make this change yet.) _However,_ this could be very relevant if we use alternative data structures for different kinds of indexes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `Iterator::nth()` for reservoir sampling algorithms. #667

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Use Iterator::nth() for reservoir sampling algorithms. #667

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Use `Iterator::nth()` for reservoir sampling algorithms. #667