
chore: Sampling algorithms now use nth method. Fixes #667. #760

Merged
RobertJacobsonCDC merged 3 commits into main from RobertJacobsonCDC_667_sample_with_nth
Feb 13, 2026

Conversation

@RobertJacobsonCDC
Collaborator

This PR

  1. modifies our built-in sampling algorithms to use Iterator::nth instead of iterating over every element (see the sketch after this list).
  2. adds a missing check along one execution path to dispatch to a faster sampling algorithm, giving an ~8x speedup for that particular path.
  3. adds two benchmarks to the suite of sampling benchmarks to fill a gap in coverage.
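
For concreteness, the known-length path in item 1 works roughly like the sketch below: draw the target indices up front, sort them, and use `Iterator::nth` to jump from one selected element to the next. This is a minimal illustration (it borrows rand's `index::sample` to draw distinct indices); the function name and signature are not ixa's actual code.

```rust
use rand::seq::index::sample;
use rand::Rng;

/// Illustrative sketch (not ixa's exact code): sample `amount` distinct
/// elements from an iterator of known length by drawing the target indices
/// up front and skipping between them with `Iterator::nth`.
fn sample_from_known_length<I, R>(
    rng: &mut R,
    mut iter: I,
    len: usize,
    amount: usize,
) -> Vec<I::Item>
where
    I: Iterator,
    R: Rng + ?Sized,
{
    // Draw `amount` distinct indices in [0, len) and visit them in order.
    let mut indexes = sample(rng, len, amount).into_vec();
    indexes.sort_unstable();

    let mut selected = Vec::with_capacity(amount);
    let mut consumed = 0; // elements already pulled from `iter`
    for idx in indexes {
        // `nth(k)` discards `k` elements and yields the next one, so this
        // jumps straight to position `idx`.
        if let Some(item) = iter.nth(idx - consumed) {
            selected.push(item);
        }
        consumed = idx + 1;
    }
    selected
}
```

With an O(1) `nth` (Vec, slice, IndexSet) each jump is constant time; with an O(n) `nth` (e.g. a filtering iterator) the same loop still walks every skipped element, which is why only the indexed paths show real improvements in the benchmarks below.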

There are some dramatic speedups that do not represent actual performance improvements in the code, and there are small apparent regressions that are not real in the sense that the code in their execution path is unchanged. See notes on benchmarks below for details.

`EntitySetIterator` now selects sampling algorithm based on size hint (like `sample_entity` already does).
@RobertJacobsonCDC RobertJacobsonCDC linked an issue Feb 11, 2026 that may be closed by this pull request
@RobertJacobsonCDC
Collaborator Author

Sampling algorithm benchmarks

Summary

Some algorithm benches (which apply sampling functions directly to a Vec) show extreme speedups (up to 60×), but these are misleading: ixa never samples from Vec iterators. They are still useful, though, as they isolate the overhead of the reservoir algorithm itself from the cost of nth. The only real-world improvement is the ~8× speedup for sampling_multiple_known_length, where IndexSet's O(1) nth lets the algorithm skip between selected indices in constant time. The reservoir path shows no meaningful improvement (≤1.23×), because the filtering iterators that trigger reservoir sampling have O(n) nth regardless.

Benchmarks

Smaller (faster) time in bold, relative speedup = main ÷ dev

  • > 1 → dev is faster

  • < 1 → dev is slower

Algorithm & Sampling Benchmarks (main vs dev)

| benchmark | main | dev | relative speedup (main ÷ dev) |
|:---|---:|---:|---:|
| algorithm_sampling_single_known_length | **6.0698 ns** | 6.0833 ns | 1.00× |
| algorithm_sampling_single_l_reservoir | 30.041 µs | **498.42 ns** | ≈60.3× |
| algorithm_sampling_single_rand_reservoir | **155.04 µs** | 157.62 µs | 0.98× |
| algorithm_sampling_multiple_known_length | 32.474 µs | **1.2966 µs** | ≈25.1× |
| algorithm_sampling_multiple_l_reservoir | 69.347 µs | **17.879 µs** | 3.88× |

sample_entity Benchmarks (main vs dev)

| benchmark | main | dev | relative speedup (main ÷ dev) |
|:---|---:|---:|---:|
| whole_population / 1 000 | **11.380 ns** | 12.313 ns | 0.92× |
| whole_population / 10 000 | **11.420 ns** | 12.374 ns | 0.92× |
| whole_population / 100 000 | **11.395 ns** | 12.314 ns | 0.93× |
| single_property_indexed / 1 000 | **82.356 ns** | 83.488 ns | 0.99× |
| single_property_indexed / 10 000 | **81.921 ns** | 82.532 ns | 0.99× |
| single_property_indexed / 100 000 | **81.593 ns** | 83.125 ns | 0.98× |
| multi_property_indexed / 1 000 | 194.19 ns | **186.21 ns** | 1.04× |
| multi_property_indexed / 10 000 | **188.18 ns** | 193.71 ns | 0.97× |
| multi_property_indexed / 100 000 | 194.23 ns | **186.61 ns** | 1.04× |

The apparent regression in whole_population* isn't real: the code on this execution path is identical between the two branches. (Link order alone can cause ±40% swings in benchmark timings; see Emery Berger's work on performance measurement bias.)

sampling Benchmarks (main vs dev)

| benchmark | main | dev | relative speedup (main ÷ dev) |
|:---|---:|---:|---:|
| sampling_single_known_length | **83.469 µs** | 84.998 µs | 0.98× |
| sampling_single_l_reservoir | 6.0256 ms | **4.8935 ms** | 1.23× |
| sampling_multiple_known_length | 6.5298 ms | **778.81 µs** | ≈8.39× |
| sampling_multiple_l_reservoir | 6.7035 ms | **6.6157 ms** | 1.01× |

Running the benchmarks

# On main:
cargo bench -p ixa-bench 'sampl' -- --save-baseline 'main'
# On the dev branch:
cargo bench -p ixa-bench 'sampl' -- --baseline 'main'

@github-actions

Benchmark Results

Hyperfine

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `large_sir::baseline` | 2.9 ± 0.0 | 2.8 | 3.0 | 1.00 |
| `large_sir::entities` | 13.3 ± 0.5 | 12.9 | 15.9 | 4.65 ± 0.17 |

Criterion

Note: A comparison could not be generated. Maybe you added new benchmarks?

@RobertJacobsonCDC
Collaborator Author

A dump of some of my notes for posterity.

Sampling Benchmark Coverage Analysis

After switching the sampling algorithms to use Iterator::nth for skipping (instead of
iterating over every element and comparing positions), performance depends on two
independent properties of the iterator being sampled from:

  1. Known vs. unknown length. If the iterator is an ExactSizeIterator, we use the
    sample_*_from_known_length functions, which select random indices up front and
    collect them in one pass. Otherwise, we fall back to reservoir sampling
    (sample_*_l_reservoir). (See the dispatch sketch after this list.)

  2. Fast vs. slow nth. If the iterator supports O(1) nth (e.g. Vec, slice, or
    PopulationIterator), the skip steps are genuinely constant-time. If nth is O(n)
    (e.g. a filtering iterator, or HashSet's iterator), the skips still have to walk
    every element, and the nth optimization has no effect.
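
As a rough illustration of how property 1 drives the dispatch, a single-element version might look like the sketch below. The function is hypothetical (per the PR, the known-length functions are keyed on ExactSizeIterator and `EntitySetIterator` consults its size hint); the fallback branch here is just a plain single-item reservoir.

```rust
use rand::Rng;

/// Illustrative dispatch sketch (the function name is hypothetical, not
/// ixa's API): choose the sampling strategy from the iterator's size hint.
fn sample_one<I, R>(rng: &mut R, mut iter: I) -> Option<I::Item>
where
    I: Iterator,
    R: Rng + ?Sized,
{
    match iter.size_hint() {
        // Exact, nonzero length: draw one index and jump straight to it.
        // (This trusts the size hint; the known-length functions themselves
        // are keyed on ExactSizeIterator.)
        (lo, Some(hi)) if lo == hi && lo > 0 => iter.nth(rng.gen_range(0..lo)),
        // Length unknown (e.g. a filtering iterator): single-item reservoir,
        // keeping the i-th element with probability 1/i.
        _ => {
            let mut chosen = iter.next()?;
            let mut seen = 1usize;
            for item in iter {
                seen += 1;
                if rng.gen_range(0..seen) == 0 {
                    chosen = item;
                }
            }
            Some(chosen)
        }
    }
}
```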

These two properties combine into three execution paths that matter in practice:

| Execution path | When it arises | Benchmark coverage |
|:---|:---|:---|
| Known length + fast nth | Query is fully resolved by a single index (or multi-property index), and the backing store supports O(1) skip. | algorithm_sampling_single_known_length, algorithm_sampling_multiple_known_length (algorithm benches on Vec); all sample_entity_scaling benchmarks; sampling_single_known_length_entities, sampling_multiple_known_length_entities (sample_people) |
| Unknown length + fast nth | The reservoir algorithm is applied to an iterator with O(1) nth. This path never arises in practice: if the length is unknown, it is because the iterator filters, which makes nth linear. However, it is useful for isolating reservoir overhead from nth cost. | algorithm_sampling_single_l_reservoir, algorithm_sampling_multiple_l_reservoir (reservoir algorithm applied directly to a Vec) |
| Unknown length + slow nth | Query involves properties that are not jointly indexed, requiring an EntitySetIterator that filters one source against others. Each nth call must evaluate the filter for every skipped element. | sampling_single_l_reservoir_entities, sampling_multiple_l_reservoir_entities (sample_people: query on (Property10, Property100) where each property is individually indexed but the pair is not) |
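
The `*_l_reservoir` paths use a skip-based reservoir scheme: instead of visiting every element, the algorithm computes how many elements to pass over before the next reservoir replacement, and that jump is now a single `nth` call. The sketch below is the textbook form of that idea (Algorithm L), shown for illustration only and not as ixa's exact implementation.

```rust
use rand::Rng;

/// Illustrative sketch of Algorithm L-style reservoir sampling over an
/// iterator of unknown length, using `Iterator::nth` for the skips.
fn reservoir_sample_l<I, R>(rng: &mut R, mut iter: I, k: usize) -> Vec<I::Item>
where
    I: Iterator,
    R: Rng + ?Sized,
{
    // Fill the reservoir with the first k elements.
    let mut reservoir: Vec<I::Item> = iter.by_ref().take(k).collect();
    if reservoir.len() < k {
        return reservoir; // the whole input had fewer than k elements
    }

    let mut w: f64 = (rng.gen::<f64>().ln() / k as f64).exp();
    loop {
        // How many elements to skip before the next replacement.
        let skip = (rng.gen::<f64>().ln() / (1.0 - w).ln()).floor() as usize;
        // `nth(skip)` discards `skip` elements and yields the one after them.
        // With an O(1) `nth` this jump is constant time; a filtering iterator
        // still walks every skipped element, which is why the slow-nth rows
        // above show no improvement.
        match iter.nth(skip) {
            Some(item) => {
                let slot = rng.gen_range(0..k);
                reservoir[slot] = item;
                w *= (rng.gen::<f64>().ln() / k as f64).exp();
            }
            None => break,
        }
    }
    reservoir
}
```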

Gap in benchmark coverage

There are two ways to end up with an EntitySetIterator that has unknown length and
slow nth, corresponding to different SourceIterator variants:

  1. SourceIterator::IndexIter with nonempty sources -- The source is an indexed
    set, but additional unindexed filter constraints remain in EntitySetIterator::sources.
    The iterator walks the index set and filters each candidate against the remaining
    constraints. Both sampling_*_l_reservoir_entities benchmarks in sample_people.rs
    exercise this path.

  2. SourceIterator::PropertyVecIter -- No index is available at all. The source
    iterates over a property's value vector (via ConcretePropertySource or
    DerivedPropertySource), checking every entity. No benchmark currently exercises
    this path.

We add two benchmarks to cover case 2: sampling_single_unindexed_entities and sampling_multiple_unindexed_entities.

Running the benchmarks

# On main:
cargo bench -p ixa-bench 'sampl' -- --save-baseline 'main'
# On the dev branch:
cargo bench -p ixa-bench 'sampl' -- --baseline 'main'

// index `idx` we skip `idx - consumed` where `consumed` tracks how many
// elements have already been consumed.
for idx in indexes {
    if let Some(item) = iter.nth(idx - consumed) {
Collaborator


I'm confused by this change: why don't you break early anymore?

Collaborator Author


Previously we iterated over the elements of `iter` (the source set), checked whether we had landed on the next target index, and, if we had, pulled the next index from the list of precomputed indexes. We broke out of the loop once there were no more indexes.

In the new version, we iterate over the precomputed indexes instead, so there's no need to break: termination is implicit in `for idx in indexes`. We move through `iter` by calling `iter.nth` on it.
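
For contrast, here is a rough reconstruction of the old shape, pieced together from the description above rather than from the actual old code: walk every element, compare positions against the precomputed indexes, and break once no indexes are left.

```rust
/// Hypothetical reconstruction of the previous shape (not the exact old
/// code): walk every element of `iter`, compare positions against the
/// sorted, precomputed `indexes`, and break early once all are matched.
fn collect_at_indexes_old<I: Iterator>(iter: I, indexes: &[usize]) -> Vec<I::Item> {
    let mut wanted = indexes.iter().copied().peekable();
    let mut selected = Vec::new();
    for (pos, item) in iter.enumerate() {
        match wanted.peek() {
            // Reached the next target position: keep the element.
            Some(&next) if next == pos => {
                selected.push(item);
                wanted.next();
            }
            // Not a target position yet: keep walking.
            Some(_) => {}
            // No targets left: the early break the old code relied on.
            None => break,
        }
    }
    selected
}
```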

Collaborator

@k88hudson-cfa left a comment


LGTM, with the extra comment removed and a comment added about whether the size is known being runtime-dependent.

@github-actions

Benchmark Results

Hyperfine

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `large_sir::baseline` | 3.0 ± 0.0 | 3.0 | 3.2 | 1.00 |
| `large_sir::entities` | 11.8 ± 0.3 | 11.1 | 12.7 | 3.88 ± 0.11 |

Criterion

Note: A comparison could not be generated. Maybe you added new benchmarks?

@RobertJacobsonCDC RobertJacobsonCDC merged commit 752e12d into main Feb 13, 2026
3 checks passed


Development

Successfully merging this pull request may close these issues.

Use Iterator::nth() for reservoir sampling algorithms.
