feat: expose CondensedTree via cluster_with_tree() + opt-in serde derives #46
Open
0Xjuanca wants to merge 1 commit into
…ives

Adds a public API for accessing the condensed cluster tree alongside an opt-in `serde` feature gate, enabling custom inference on new points (approximate_predict-style) in downstream applications.

* `Hdbscan::cluster_with_tree() -> Result<(Vec<i32>, CondensedTree<T>), HdbscanError>` is a new public method that returns both cluster labels and the condensed tree used internally to derive them. Mirrors the semantics of `cluster()` exactly; `cluster()` is refactored to delegate to a private internal helper, preserving its public signature unchanged.
* `CondensedNode<T>` and the `CondensedTree<T>` alias elevated from `pub(crate)` to `pub`, with `CondensedNode<T>` marked `#[non_exhaustive]` for forward compatibility.
* New `serde` feature gate provides opt-in `Serialize` / `Deserialize` derives on `CondensedNode<T>` (off by default; no new deps pulled in unless enabled).
* `#[cfg(all(test, feature = "serde"))]` bincode roundtrip test on `Vec<CondensedNode<f64>>` verifies the generic-bound serde derives work for `T: Float + Serialize`.

Default features and `cluster()`/`cluster_par()` behaviour are unchanged.
Summary
Small additive change that exposes the condensed cluster tree as a public output of clustering, alongside opt-in serde derives. Default behaviour is unchanged; existing public API is preserved exactly.
Use case
I'm using `hdbscan` in a BERTopic-style topic-modeling pipeline that needs to assign new (previously unseen) document embeddings to existing clusters without re-running the full algorithm. This is the same problem Python's `hdbscan.approximate_predict` solves: it walks the condensed tree from a new point's nearest training neighbour to find the smallest enclosing cluster.

For a Rust implementation, I need:
`condense_tree` is private and `CondensedNode<T>` is `pub(crate)`).

Changes
* `Hdbscan::cluster_with_tree() -> Result<(Vec<i32>, CondensedTree<T>), HdbscanError>` returns both labels and the condensed tree from a single fit pass.
* `cluster()` is refactored to delegate to a private `cluster_internal()` helper shared with `cluster_with_tree()`; the public `cluster()` signature is preserved exactly, with no behaviour change for existing callers.
* `CondensedNode<T>` (in `data_wrappers.rs`) and the `CondensedTree<T>` type alias (in `hdbscan.rs`) elevated from `pub(crate)` to `pub`.
* `CondensedNode<T>` marked `#[non_exhaustive]` so additional fields can be added in future revisions without breaking external consumers.
* `serde` feature gate: opt-in serde derives on `CondensedNode<T>`. Off by default, so no new dependencies are pulled in. Downstream users opt in via `features = ["serde"]`.
* `#[cfg(all(test, feature = "serde"))]` bincode roundtrip on `Vec<CondensedNode<f64>>` verifies the generic-bound serde derives work for `T: Float + Serialize`.
* `CondensedNode` and `CondensedTree` now exposed from the crate root.

Compatibility
`serde` is enabled.
* `cluster()` callers: unchanged. Signature preserved exactly; refactor is internal-only.
* `cluster_par()` callers: untouched. A symmetric `cluster_par_with_tree()` is straightforward but kept out of scope here to keep the PR surface minimal; happy to land it separately if you prefer.

LOC delta
~30 LOC across 4 files.
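
For reviewers who want the shape of the refactor at a glance, here is a minimal sketch of the delegation pattern described under Changes: two public entry points sharing one private helper, so the labels-only method keeps its signature. Everything in it (`ToyClusterer`, `ToyNode`, `ToyTree`, `ToyError`, and the sign-based "clustering") is a hypothetical stand-in for illustration, not the crate's code.

```rust
// Hypothetical stand-ins; the real crate types are Hdbscan, CondensedNode<T>,
// CondensedTree<T> and HdbscanError.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyNode {
    pub node_id: usize,
    pub parent_node_id: usize,
}

pub type ToyTree = Vec<ToyNode>;

#[derive(Debug, PartialEq)]
pub enum ToyError {
    EmptyDataset,
}

pub struct ToyClusterer {
    points: Vec<f64>,
}

impl ToyClusterer {
    pub fn new(points: Vec<f64>) -> Self {
        Self { points }
    }

    /// Labels only: same signature as before, now a thin wrapper that
    /// discards the tree.
    pub fn cluster(&self) -> Result<Vec<i32>, ToyError> {
        self.cluster_internal().map(|(labels, _tree)| labels)
    }

    /// Labels plus the tree produced by the same fit pass.
    pub fn cluster_with_tree(&self) -> Result<(Vec<i32>, ToyTree), ToyError> {
        self.cluster_internal()
    }

    /// Shared private helper. The "algorithm" is a toy: negative values go to
    /// cluster 0, everything else to cluster 1.
    fn cluster_internal(&self) -> Result<(Vec<i32>, ToyTree), ToyError> {
        if self.points.is_empty() {
            return Err(ToyError::EmptyDataset);
        }
        let root = self.points.len();
        let mut labels: Vec<i32> = Vec::with_capacity(root);
        let mut tree: ToyTree = Vec::with_capacity(root);
        for (i, &x) in self.points.iter().enumerate() {
            let label: i32 = if x < 0.0 { 0 } else { 1 };
            labels.push(label);
            // Cluster ids are placed after the point ids and the root id.
            tree.push(ToyNode {
                node_id: i,
                parent_node_id: root + 1 + label as usize,
            });
        }
        Ok((labels, tree))
    }
}
```

Because both public methods route through the same helper, the labels from `cluster()` and from `cluster_with_tree()` cannot drift apart.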
Test plan
* `cargo test` (default features): 14 integration tests + 5 doc-tests, all pass
* `cargo test --features serde` (serial + serde): existing tests + new bincode roundtrip, all pass
* `cargo build --features serde`: clean compile, no warnings

Happy to iterate on naming (`cluster_with_tree` vs alternatives) or scope (e.g., add the parallel symmetry now if you'd prefer one PR).
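
To make the use case concrete, below is a simplified sketch of the kind of tree walk a downstream consumer might run on the exposed tree. It is not the Python `approximate_predict` implementation. It assumes a local `Node` type whose field names (`node_id`, `parent_node_id`, `lambda_birth`, `size`) are guesses at the layout of `CondensedNode<T>`, and it assumes the caller has already found the nearest training neighbour and computed a lambda value for the new point. `enclosing_cluster` and `demo_tree` are hypothetical helpers.

```rust
/// Hypothetical stand-in for `CondensedNode<f64>`; field names are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Node {
    node_id: usize,        // a point index, or a cluster id for cluster rows
    parent_node_id: usize, // the cluster this child leaves or splits from
    lambda_birth: f64,     // density level (1 / distance) at which that happens
    size: usize,           // number of points under this child
}

/// Walk upward from the neighbour's parent cluster until reaching a cluster
/// that already exists at the new point's density level `lambda`, or the root.
/// Returns `None` if the neighbour or an intermediate cluster is not in the tree.
fn enclosing_cluster(
    tree: &[Node],
    root: usize,
    neighbour: usize,
    lambda: f64,
) -> Option<usize> {
    // Start at the cluster the neighbouring training point belongs to.
    let mut cluster = tree.iter().find(|n| n.node_id == neighbour)?.parent_node_id;
    while cluster != root {
        // The row whose child is `cluster` records when that cluster was born.
        let birth = tree.iter().find(|n| n.node_id == cluster)?;
        if lambda >= birth.lambda_birth {
            break; // the new point is dense enough to sit inside this cluster
        }
        cluster = birth.parent_node_id; // otherwise try the parent cluster
    }
    Some(cluster)
}

/// Six points (ids 0..=5), root id 6, which splits into clusters 7 and 8.
fn demo_tree() -> Vec<Node> {
    let mut tree = vec![
        Node { node_id: 7, parent_node_id: 6, lambda_birth: 0.5, size: 3 },
        Node { node_id: 8, parent_node_id: 6, lambda_birth: 0.5, size: 3 },
    ];
    for p in 0..3 {
        tree.push(Node { node_id: p, parent_node_id: 7, lambda_birth: 2.0, size: 1 });
    }
    for p in 3..6 {
        tree.push(Node { node_id: p, parent_node_id: 8, lambda_birth: 1.0, size: 1 });
    }
    tree
}
```

With `demo_tree()`, a new point whose nearest neighbour is point 0 and whose lambda is 1.0 resolves to cluster 7, while the same neighbour with lambda 0.2 falls back to the root, which a caller would likely treat as noise. Mapping tree cluster ids back to the labels returned by `cluster_with_tree()` is left out of this sketch.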