Release v0.5.1 by scal444 · Pull Request #215 · NVIDIA-BioNeMo/nvMolKit

scal444 · 2026-06-24T19:29:27Z

Cherry-picked a set of fixes from main, then added formatting and version-bump commits.

The pip-build path was globbing every .so in rdkit.libs/ into RDKit_LIBS and every libboost_* into Boost_LIBRARIES, then linking each into every nvmolkit Python module. That dragged libcairo and libquadmath onto each module's NEEDED list. libcairo in turn has NEEDED entries for libXrender/libX11/libXext; those are on the manylinux lib_whitelist, so rdkit-pypi's auditwheel pass legitimately left them external, but the nvidia/cuda runtime container we test in doesn't ship them, so `import nvmolkit.fingerprints` failed at load.

Narrow RDKit_LIBS to the 16 components nvmolkit actually uses (mirroring the conda path's explicit list) and Boost_LIBRARIES likewise to 4 (boost serialization / iostreams / python<ver> / numpy<ver>).

Also run patchelf with --force-rpath so the entry-point modules get DT_RPATH instead of DT_RUNPATH. The libs inside rdkit.libs/ have no rpath of their own and rely on RPATH inheritance to resolve second-level deps; rdkit's own python bindings do the same.

Drop the unused ${RDKit_LIBS} link from _arrayHelpers, which touches zero RDKit symbols.

Verified that the rdkit==2026.3.1 py3.12 wheel built with this change loads and passes the full pytest suite (390 passed, 10 long deselected) on an H200 in the manylinux+CUDA container with no system X/font libs.
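One way to check a built module for the kind of leak described above is to inspect its dynamic section. The sketch below parses the text printed by `readelf -d` and flags unwanted NEEDED entries. It is an illustration only: the helper names, the banned-prefix list, and the sample library names and rpath value are assumptions, not taken from the nvMolKit build.

```python
import re

# Matches `readelf -d` dynamic-section rows such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libcairo.so.2]
_DYN_ROW = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[([^\]]*)\]")


def summarize_dynamic_section(readelf_output):
    """Collect NEEDED, RPATH and RUNPATH values from `readelf -d` text."""
    summary = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _DYN_ROW.search(line)
        if match:
            summary[match.group(1)].append(match.group(2))
    return summary


def unexpected_needed(summary, banned_prefixes=("libcairo", "libquadmath", "libX")):
    """Return NEEDED entries that start with a banned prefix (hypothetical list)."""
    return [lib for lib in summary["NEEDED"] if lib.startswith(banned_prefixes)]


# Hypothetical sample output; real library names and paths will differ.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libRDKitGraphMol.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcairo.so.2]
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../rdkit.libs]
"""
```

In practice the text would come from running `readelf -d` on each extension module; an empty result from `unexpected_needed` and a populated `RPATH` (rather than `RUNPATH`) entry would match the intent of this change.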


…VIDIA-BioNeMo#183)

rdkit/rdkit#9298 (merged into RDKit 2026.03) fixed a bug where a negative energy value caused max(energy * gradScale, 1.0) to clamp to 1, artificially tightening the gradient-tolerance convergence test mid-minimisation. Force fields with stabilising electrostatic or dispersion terms (MMFF94, UFF) can produce negative intermediate energies, so this affected real workloads.

Apply the same fix in both nvMolKit BFGS paths, gated on the linked RDKit version so that behaviour is unchanged when built against older RDKit:

- src/minimizer/bfgs_minimize.cu (updateDGradKernel, batched path)
- src/minimizer/bfgs_minimize_permol_kernels.cu (updateDGrad, per-mol path)

Mirrors the version-conditional pattern already used for kRdkitHasGradScaleFix in scaleGradKernel.

Signed-off-by: Clay Moore <claytonwaynemoore@gmail.com>
(cherry picked from commit 7ad16cf)
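The clamping effect can be seen with a few numbers. The sketch below reproduces the quoted expression and compares it with a magnitude-based variant. The `abs()` variant and the division in `scaled_gradient_test` are assumptions made for illustration; the commit message does not spell out the exact form of the upstream fix.

```python
def tolerance_scale_old(energy, grad_scale):
    """Expression quoted in the commit message: clamps to 1.0 for any negative energy."""
    return max(energy * grad_scale, 1.0)


def tolerance_scale_abs(energy, grad_scale):
    """ASSUMPTION: magnitude-based variant, shown only to illustrate the effect."""
    return max(abs(energy) * grad_scale, 1.0)


def scaled_gradient_test(max_grad_term, scale):
    """ASSUMPTION: the convergence test divides the largest gradient term by the scale."""
    return max_grad_term / scale


energy, grad_scale, max_grad_term = -250.0, 0.5, 0.5

old_scale = tolerance_scale_old(energy, grad_scale)  # clamped, ignores |energy|
abs_scale = tolerance_scale_abs(energy, grad_scale)  # follows |energy|

# A smaller scale gives a larger test value, so the same gradient looks
# further from convergence: the check is effectively tighter.
old_test = scaled_gradient_test(max_grad_term, old_scale)
abs_test = scaled_gradient_test(max_grad_term, abs_scale)
```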

…VIDIA-BioNeMo#205)

GH issue 202: a non-negligible fraction of molecules embedded to zero conformers.

RDKIT_NEW_FLAG_API was a constexpr bool but was consumed with #if. The preprocessor does not see constexpr variables and evaluates an unknown identifier as 0. In addition, etkdg_stages and rdkit_dist_geom_flattened never linked nvmolkit_versions, so versions.h was not even visible. The gate was therefore always off.

findChiralSets never set d_structureFlags and loadChiralDataset never read it, so fused-small-ring tetrahedral centers used volScale 1.0 instead of the 0.25 that RDKit applies and were deterministically rejected at the tetrahedral check.

The same gate guards the constrained tight-bounds branch in addLongRangeDistanceConstraints, which assigns to l and u; those are now non-const so the branch compiles.

(cherry picked from commit 525286a)

…cessing (NVIDIA-BioNeMo#204) buildQueryBatchParallel ran addQueryToBatch inside an OpenMP parallel region. The validation error for fragment (disconnected) queries escaped the region and called std::terminate whenever more than one preprocessing thread was used, so the error only surfaced as a RuntimeError on the single-threaded path. Capture the first exception with OpenMPExceptionRegistry and rethrow once the region joins. (cherry picked from commit 6f967ed)
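The capture-then-rethrow pattern can be sketched outside OpenMP as well. The example below is a Python threading analogue, not the nvMolKit code: `FirstExceptionRegistry`, `validate_query`, and `build_batch_parallel` are hypothetical names, and the fragment check is a stand-in for the real validation.

```python
import threading


class FirstExceptionRegistry:
    """Remembers the first exception raised by any worker (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._first = None

    def capture(self, exc):
        with self._lock:
            if self._first is None:
                self._first = exc

    def rethrow(self):
        if self._first is not None:
            raise self._first


def validate_query(query):
    # Stand-in validation: treat a '.' as a disconnected (fragment) query.
    if "." in query:
        raise RuntimeError(f"fragment query not supported: {query}")
    return query


def build_batch_parallel(queries):
    registry = FirstExceptionRegistry()
    results = [None] * len(queries)

    def worker(index):
        try:
            results[index] = validate_query(queries[index])
        except Exception as exc:
            # Never let the error escape the worker; record it instead.
            registry.capture(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(queries))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Rethrow on the calling thread only after every worker has joined.
    registry.rethrow()
    return results
```

The point of the design is that the error surfaces in one place, on the calling thread, regardless of how many workers ran.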

…A-BioNeMo#212) The _mmffOptimization and _uffOptimization bindings declared BatchHardwareOptions default argument values. Boost.Python converts default values to Python when a function is registered, which needs a to-Python converter that only _embedMolecules registers, so importing either module before nvmolkit.types raised "No to_python converter found for nvMolKit::BatchHardwareOptions". The Python wrappers already supply every argument, so the native defaults are dropped, and a regression test imports each module first in a fresh interpreter. (cherry picked from commit 7dc0a38)
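A fresh-interpreter import check of the kind described here could look roughly like the following. This is a minimal sketch, not the project's actual regression test: the helper name is hypothetical, and the real test would pass the fully qualified binding module names, which are not given in this message.

```python
import subprocess
import sys


def imports_in_fresh_interpreter(module_name):
    """Import `module_name` first in a new Python process.

    Returns (succeeded, stderr_text). A new process guarantees that no
    other module has been imported beforehand and registered converters.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr
```

Running each binding module through such a helper, one process per module, would catch an import-order dependency that an in-process test can mask.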


This was an ABI issue: a transitive flag that affects the layout of RDKit data structures was not being included. It is an RDKit issue, but we can fix it by manually adding the flag we need to avoid the ABI mismatch. conda-forge builds were unaffected only by luck. (cherry picked from commit ad4124e)


evasnow1992

Verified that the set of commits is complete and self-contained, and confirmed that the release notes are clear and comprehensive. The changes look good to me. Thank you for packaging this into a release.

scal444 and others added 17 commits May 15, 2026 13:22

Improve autotune batch size and CPU count scanning (NVIDIA-BioNeMo#179)

3786668

Autotune now steps in 64 element increments by default, cutting down on the search space. CPU space is now physical core limited by default. (cherry picked from commit d0bce61)
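To show how a fixed step shrinks the search space, the sketch below enumerates candidate values. The function names, ranges, and the way physical cores are supplied are assumptions for illustration; the autotuner's real parameters are not described in this message.

```python
def batch_size_candidates(min_size, max_size, step=64):
    """Multiples of `step` within [min_size, max_size]."""
    first = ((min_size + step - 1) // step) * step  # round up to a multiple of step
    return list(range(first, max_size + 1, step))


def cpu_count_candidates(physical_cores, logical_cores, limit_to_physical=True):
    """Thread counts to scan, capped at the physical core count by default."""
    limit = min(physical_cores, logical_cores) if limit_to_physical else logical_cores
    return list(range(1, limit + 1))
```

For example, scanning batch sizes from 1 to 1024 one element at a time would give 1024 candidates, while a step of 64 gives 16.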

Fix GH issue 195 (NVIDIA-BioNeMo#196)

67d8547

(cherry picked from commit 3537c4c)

Fix grid dimension overflow in fused Butina neighbor count kernel (NV…

529dd26

…IDIA-BioNeMo#194) Giant clustering tasks were hitting signed int overflow; moved to a 2D grid to expand the limits by 5 orders of magnitude. (cherry picked from commit b19bf4f)
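The arithmetic behind a 2D launch can be sketched as follows. CUDA allows up to 2^31 - 1 blocks in the grid's x dimension and 65535 in y, so spreading a linear block count over both dimensions raises the ceiling by a factor of about 65535, which is roughly five orders of magnitude. The splitting scheme and helper names below are assumptions, not the kernel's actual indexing.

```python
INT32_MAX = 2**31 - 1
MAX_GRID_X = 2**31 - 1  # CUDA limit for gridDim.x
MAX_GRID_Y = 65535      # CUDA limit for gridDim.y


def split_grid(num_blocks, max_x=MAX_GRID_X, max_y=MAX_GRID_Y):
    """Return (grid_x, grid_y) with grid_x * grid_y >= num_blocks."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    grid_y = (num_blocks + max_x - 1) // max_x  # ceil(num_blocks / max_x)
    if grid_y > max_y:
        raise ValueError("num_blocks exceeds the 2D grid capacity")
    grid_x = (num_blocks + grid_y - 1) // grid_y  # ceil(num_blocks / grid_y)
    return grid_x, grid_y


def linear_block_index(block_x, block_y, grid_x):
    """Recover the linear block index; callers must skip indices >= num_blocks."""
    return block_y * grid_x + block_x
```

Because `grid_x * grid_y` can exceed the requested count, a kernel using this layout needs a bounds check on the recovered linear index, computed in a 64-bit type.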

Fix gh 197 (NVIDIA-BioNeMo#198)

3b860f8

Zero out a buffer that needed to be zeroed on every thread iteration, not just at the beginning of the batch. Wrote a reliable reproducer. (cherry picked from commit 4f897f3)

Unify array inputs for several functions (NVIDIA-BioNeMo#207)

ebc8fff

* Normalize array inputs for clustering APIs
* Simplify CUDA tensor input normalization

(cherry picked from commit b6f3f42)

Fix TFD pair decoding and empty ring torsions

d265234

Backport the TFD correctness subset of ea09e2d without the unrelated performance changes.

Backport PyPI package metadata fixes

d75ad0a

Backport the pyproject.toml metadata from 8767035 and the PyPI-facing URL updates from ec24d54 without README or Sphinx documentation changes.

Prepare v0.5.1 release

49d31f8

Update v0.5.1 release notes

c9fbb2c

Refine v0.5.1 release note wording

9f49b86

Link v0.5.1 release notes to issues and PRs

9508390

scal444 requested a review from evasnow1992 June 24, 2026 19:29

evasnow1992 approved these changes Jun 24, 2026

scal444 wants to merge 17 commits into NVIDIA-BioNeMo:release_0_5 from scal444:cand_v_0_5_1


3 participants
