Skip to content

Latest commit

 

History

History
102 lines (71 loc) · 2.54 KB

File metadata and controls

102 lines (71 loc) · 2.54 KB

API: HyperLogLog

Status: Ready

Purpose

Approximate cardinality (distinct-count) estimation.

Types

The library provides three HLL algorithm variants, each available at multiple precision levels.

Algorithm Variants

  • HyperLogLog<Classic, H> — classic HyperLogLog (Flajolet et al.).
  • HyperLogLog<ErtlMLE, H> — improved estimator (Ertl, arXiv:1702.01284).
  • HyperLogLogHIP — Historic Inverse Probability estimator (Lang, arXiv:1708.06839). Not mergeable.

Precision Aliases

Each variant is backed by a register storage with a configurable precision parameter p (number of address bits). Higher precision uses more memory but improves accuracy.

Alias Precision Registers Underlying Type
HyperLogLogP12<Variant, H> p=12 4,096 HyperLogLogImpl<Variant, HllBucketListP12, H>
HyperLogLogP14<Variant, H> p=14 16,384 HyperLogLogImpl<Variant, HllBucketListP14, H>
HyperLogLogP16<Variant, H> p=16 65,536 HyperLogLogImpl<Variant, HllBucketListP16, H>
HyperLogLog<Variant, H> p=14 (default) 16,384 = HyperLogLogP14<Variant, H>
HyperLogLogHIPP12 p=12 4,096 HyperLogLogHIPImpl<HllBucketListP12>
HyperLogLogHIPP14 p=14 16,384 HyperLogLogHIPImpl<HllBucketListP14>
HyperLogLogHIP p=14 (default) 16,384 = HyperLogLogHIPP14

H defaults to DefaultXxHasher and can be omitted in most usage.

Constructors

fn new() -> Self
fn default() -> Self

Insert / Update

fn insert(&mut self, obj: &DataInput)
fn insert_many(&mut self, items: &[DataInput])
fn insert_with_hash(&mut self, hashed: u64)
fn insert_many_with_hashes(&mut self, hashes: &[u64])

Query

fn estimate(&self) -> usize

Classic also exposes:

fn indicator(&self) -> f64

Merge

fn merge(&mut self, other: &Self)

Available on Classic and ErtlMLE variants. Not available on HIP.

Serialization

fn serialize_to_bytes(&self) -> Result<Vec<u8>, RmpEncodeError>
fn deserialize_from_bytes(bytes: &[u8]) -> Result<Self, RmpDecodeError>

Examples

use asap_sketchlib::{ErtlMLE, HyperLogLog, DataInput};

let mut hll = HyperLogLog::<ErtlMLE>::default();
for i in 0..1000u64 {
    hll.insert(&DataInput::U64(i));
}
let card = hll.estimate();
assert!(card > 900);

Caveats

  • HyperLogLogHIP is not mergeable.
  • P12 variants trade accuracy for lower memory; expect slightly higher error than the default P14.

Status

Canonical cardinality implementation in this library.