Releases: asad/SMSD
SMSD Pro 7.1.1
Bug-fix patch on top of v7.1.0. No new features, no public API breakage.
Fixed
- Cross-language ECFP / FCFP fingerprint parity between Java, C++, and Python (two long-standing Java drifts at radius ≥ 1).
- Java canonical SMILES writer: bond symbol for aromatic-adjacent single bonds and implicit H count inside stereo brackets.
- Python `smsd.canonical_smiles(smi)` / `smsd.to_smiles(smi)` raised `TypeError` on string input. Both now accept `str` or `MolGraph`.
- `MatchResult.overlapCoefficient` returned the wrong similarity metric. Both `MatchResult.overlap` and `MatchResult.overlapCoefficient` now return the Szymkiewicz-Simpson overlap as documented; the new `MatchResult.tanimoto` attribute exposes the Jaccard value.
- Canonical SMILES writer now emits `[nH]` for pyrrole-type aromatic nitrogen, so output kekulizes cleanly in downstream readers.
- FP-level `smsd.overlapCoefficient` / `count_overlapCoefficient` camelCase aliases now return Simpson overlap as documented.
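The overlap and Tanimoto metrics separated by this fix differ only in their denominator. A minimal pure-Python sketch, assuming fingerprints are represented as sets of on-bit indices (an illustration only; it does not call the smsd API):

```python
def simpson_overlap(a: set[int], b: set[int]) -> float:
    """Szymkiewicz-Simpson overlap: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0  # convention used in this sketch for empty input
    return len(a & b) / min(len(a), len(b))


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard): |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# A query whose bits are a strict subset of the target's bits
query_bits = {1, 5, 9}
target_bits = {1, 5, 9, 12, 40, 77}
print(simpson_overlap(query_bits, target_bits))  # 1.0
print(tanimoto(query_bits, target_bits))         # 0.5
```

Overlap reaches 1.0 whenever one fingerprint is contained in the other, which is why it suits substructure-style screening, while Tanimoto penalises the extra bits in the larger molecule.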
Verified
- Python pytest: 603 passed, 6 skipped, 0 failures.
- Java JUnit: 581 tests, 0 failures, 0 errors.
Apache 2.0 — see NOTICE.
SMSD Pro 7.1.0
Copyright (c) 2018-2026 Syed Asad Rahman — BioInception PVT LTD
What's New
Comprehensive test suite — 597 tests across 9 test files covering MCS, substructure search, fingerprints, similarity metrics, file I/O, scaffolds, coordinate transforms, depiction, SMARTS, and batch operations.
Bug Fixes
- `tanimoto_coefficient` / `overlap_coefficient` now correctly handle sparse count input from `circular_fingerprint_counts()`.
- `counts_to_array()` now accepts both `dict` and `list[tuple]` formats.
- `fingerprint()` no longer raises `ValueError` for `kind='ecfp'`, `'fcfp'`, or `'torsion'`.
- Fingerprint radius is capped at molecule size, preventing bit-vector saturation for out-of-range radius values.
- Documentation API names verified and corrected across all public docs
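Sparse count fingerprints can arrive as `{bit: count}` or as `[(bit, count), ...]`. The sketch below shows one way to normalise both and compute the count-based Tanimoto (sum of minima over sum of maxima). The helper names are hypothetical, and summing duplicate bits in list input is an assumption of this sketch, not documented smsd behaviour.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def to_count_dict(counts: Mapping[int, int] | Iterable[tuple[int, int]]) -> dict[int, int]:
    """Normalise sparse counts given as {bit: count} or [(bit, count), ...]."""
    pairs = counts.items() if isinstance(counts, Mapping) else counts
    merged: dict[int, int] = {}
    for bit, count in pairs:
        merged[bit] = merged.get(bit, 0) + count  # duplicate bits are summed here
    return merged


def count_tanimoto(a, b) -> float:
    """Count-based Tanimoto: sum(min(a_i, b_i)) / sum(max(a_i, b_i))."""
    da, db = to_count_dict(a), to_count_dict(b)
    keys = da.keys() | db.keys()
    num = sum(min(da.get(k, 0), db.get(k, 0)) for k in keys)
    den = sum(max(da.get(k, 0), db.get(k, 0)) for k in keys)
    return num / den if den else 0.0


# Both input formats describe the same sparse fingerprint
print(count_tanimoto({3: 2, 17: 1}, [(3, 2), (17, 1)]))  # 1.0
print(count_tanimoto({3: 2, 17: 1}, {3: 1}))             # 1/3
```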
Install
Java (Maven):
```xml
<dependency>
  <groupId>com.bioinceptionlabs</groupId>
  <artifactId>smsd</artifactId>
  <version>7.1.0</version>
</dependency>
```
Python:
```bash
pip install smsd==7.1.0
```
JAR:
```bash
java -jar smsd-7.1.0-jar-with-dependencies.jar --help
```
Platforms
- Java 25+, C++17, Python 3.10–3.13
- macOS (arm64, x86_64), Linux (x86_64, aarch64), Windows (AMD64)
- GPU: Metal (Apple Silicon), CUDA (Volta+)
Apache-2.0 License
SMSD Pro 7.0.0
Syed Asad Rahman — BioInception PVT LTD
Major release: unified Python and Java API, clean break from legacy aliases,
full Java parity for convenience methods, all tests green.
What's New
Unified Python API — clean break
Two entry points replace all previous aliases:
```python
import smsd

# Inputs may be SMILES strings, MolGraph objects, or RDKit Mol objects
mol1, mol2 = "c1ccccc1", "c1ccc(O)cc1"
query, target = mol1, mol2

# MCS: returns dict (single) or list[dict] (multiple)
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1")
mcs = smsd.find_mcs(mol1, mol2, max_results=5)

# Substructure: returns dict (single) or list[dict] (multiple)
hit = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1")
hit = smsd.find_substructure(query, target, max_results=3)

# Boolean convenience check
if smsd.is_substructure(query, target):
    print("Query is a substructure of target")
```
Both functions accept SMILES strings, MolGraph objects, or RDKit Mol objects.
Removed legacy aliases
The following names are no longer available in v7.0.0:
| Removed | Replacement |
|---|---|
| `smsd.mcs()` | `smsd.find_mcs()` |
| `smsd.substructure_search()` | `smsd.find_substructure()` |
| `smsd.all_mcs()` | `smsd.find_mcs(mol1, mol2, max_results=N)` |
| `smsd.overlapCoefficient()` | `smsd.overlap_coefficient()` |
| `smsd.tanimoto()` | `smsd.tanimoto_coefficient()` |
| `smsd.count_overlap_coefficient()` | (use C++ binding directly) |
| `smsd.count_tanimoto()` | (use C++ binding directly) |
Java parity — unified convenience methods
```java
import com.bioinception.smsd.core.SearchEngine;
import java.util.Map;

// g1, g2, query, target: MolGraph or CDK IAtomContainer; chemOpts: ChemOptions

// MCS with default options
Map<Integer, Integer> mcs = SearchEngine.findMCS(g1, g2);

// Substructure with default options
Map<Integer, Integer> hit = SearchEngine.findSubstructure(query, target);

// With custom options and timeout
Map<Integer, Integer> tunedHit = SearchEngine.findSubstructure(query, target, chemOpts, 10_000L);
```
All overloads work with both MolGraph and CDK IAtomContainer inputs.
Internal improvements
- Raw C++ bindings renamed to `_native_find_mcs`, `_native_find_all_mcs`, `_native_is_substructure`, and `_native_find_substructure` to mark them clearly as internal.
- All internal calls use the unified API (`find_mcs`, `find_substructure`); `mcs_from_smiles()`, `mcs_rdkit()`, `substructure_rdkit()`, and `depict_mcs()` are all updated to the new names.
Migration Guide
Python
```diff
- mapping = smsd.mcs(mol1, mol2)
+ mapping = smsd.find_mcs(mol1, mol2)
- mapping = smsd.substructure_search(query, target)
+ mapping = smsd.find_substructure(query, target)
- results = smsd.all_mcs(mol1, mol2, max_results=5)
+ results = smsd.find_mcs(mol1, mol2, max_results=5)
```
Java
No breaking changes. New convenience methods added; existing methods unchanged.
Benchmark
Dalke NN dataset (1,000 high-similarity ChEMBL pairs):
| Metric | SMSD Pro 7.0.0 | RDKit FindMCS 2026.03 |
|---|---|---|
| Total time | 40 s | 213 s |
| Median time | 0.6 ms | 0.4 ms |
| Mean MCS size | 25.8 atoms | 25.0 atoms |
| Timeouts | 0 | 8 |
| Larger-MCS wins | 211 (21 %) | 29 (3 %) |
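About 5x faster in total time (40 s vs 213 s), larger MCS found 7x more often (211 vs 29 pairs), and zero timeouts. RDKit has the lower median per-pair time (0.4 ms vs 0.6 ms).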
Test Status
- Python: 310 passed, 0 failed
- Java: BUILD SUCCESS (581+ tests)
Compatibility
- Java 25+, C++17, Python 3.10-3.13
- GPU: Metal (Apple Silicon), CUDA (Volta+)
- Platforms: macOS (arm64, x86_64), Linux (x86_64, aarch64), Windows (AMD64)
Copyright
Copyright (c) 2018-2026 BioInception PVT LTD
Algorithm copyright (c) 2009-2026 Syed Asad Rahman
Licensed under Apache License 2.0. See NOTICE for details.
SMSD Pro 6.12.2
Syed Asad Rahman — BioInception PVT LTD
- MCS quality fix: ring-constrained recovery when loose defaults cause sub-optimal mappings on steroids, beta-blockers, and macrolides
- C++ coverage engine: zero losses on Dalke benchmark, 42-90x faster than RDKit
- Python MCS: thin C++ wrapper, no Python orchestration overhead
- VF2PP budget cap: prevents seed generation from draining the time budget
- Test fixes: assertions updated for 6.12.1 default settings
581 Java tests, 310 Python tests — all passing.
Drop-in replacement for 6.12.1.
Install
Python — pip install smsd
Java — Maven Central
Docker — docker pull ghcr.io/asad/smsd:latest
SMSD Pro 6.12.1
Syed Asad Rahman — BioInception PVT LTD
Defaults aligned with RDKit FMCS for fair benchmarking. Lightweight
coverage-driven MCS engine as default. Full Java-C++-Python parity.
What changed
RDKit-compatible defaults
ChemOptions now defaults to ringMatchesRingOnly=false and
matchFormalCharge=false, matching RDKit FindMCS out of the box.
Named profiles available for stricter settings:
- `default`: RDKit-compatible (new)
- `strict`: ring=ring, charge match, strict bond order
- `pharma`: drug discovery (ring, charge, complete rings)
- `reaction`: relaxed for AAM workflows
- `compat-fmcs`: explicit RDKit FMCS parity
Lightweight MCS engine as default
smsd.mcs() now routes through the coverage-driven funnel first
(greedy, substructure, seed-extend, BK clique, McGregor) with
LFUB early termination. Falls back to native C++ pipeline on error.
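The early-termination idea can be sketched in a few lines. This assumes the label-frequency upper bound (LFUB) is the sum over element labels of the smaller per-label atom count in the two molecules, and that each stage is a callable returning an atom mapping. The stage functions are hypothetical stand-ins rather than smsd internals, and the fallback to the native C++ pipeline is not shown.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Callable, Sequence


def label_frequency_upper_bound(labels_a: list[str], labels_b: list[str]) -> int:
    """No common subgraph can map more atoms of a label than the rarer side has."""
    ca, cb = Counter(labels_a), Counter(labels_b)
    return sum(min(n, cb[label]) for label, n in ca.items())


def run_funnel(stages: Sequence[Callable[[], dict[int, int]]],
               labels_a: list[str], labels_b: list[str]) -> dict[int, int]:
    """Run cheap-to-expensive stages; stop once a mapping reaches the bound."""
    bound = label_frequency_upper_bound(labels_a, labels_b)
    best: dict[int, int] = {}
    for stage in stages:
        candidate = stage()
        if len(candidate) > len(best):
            best = candidate
        if len(best) >= bound:  # provably maximum: skip the remaining stages
            break
    return best


calls: list[str] = []

def greedy() -> dict[int, int]:
    calls.append("greedy")
    return {i: i for i in range(6)}  # pretend all six ring carbons matched

def bk_clique() -> dict[int, int]:
    calls.append("bk_clique")
    return {i: i for i in range(6)}

phenol = ["C"] * 6 + ["O"]
aniline = ["C"] * 6 + ["N"]
result = run_funnel([greedy, bk_clique], phenol, aniline)
print(len(result), calls)  # 6 ['greedy']
```

Because the bound for phenol vs aniline is 6, the greedy result is already optimal and the clique stage never runs.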
Dalke NN benchmark (100 pairs, same settings):
85x faster than RDKit FMCS. 28 wins, 0 losses.
Fully configurable mcs() API
All matching flags settable directly — no need to construct
ChemOptions for common use cases:
```python
import smsd

mapping = smsd.mcs(mol1, mol2,
                   ring_matches_ring_only=False,
                   match_bond_order="loose",
                   max_stage=1)
```
maxStage pipeline control
Wired into both Java and C++ findMCSImpl with 4 stage gates:
- Stage 0: greedy only (sub-millisecond)
- Stage 1: + substructure + seed-extend (reaction mapping)
- Stage 2: + McSplit
- Stage 3: + Bron-Kerbosch
- Stage 5: full pipeline with extra seeds
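A hypothetical sketch of how cumulative stage gates could select solver stages from a `max_stage` value, assuming each listed stage number acts as a cumulative gate. Stage 4 is not listed above, so this sketch treats it like stage 3; the stage identifiers are illustrative, not smsd names.

```python
# Cumulative gates taken from the list above (identifiers are illustrative)
STAGE_GATES: list[tuple[int, list[str]]] = [
    (0, ["greedy"]),
    (1, ["substructure", "seed_extend"]),
    (2, ["mcsplit"]),
    (3, ["bron_kerbosch"]),
    (5, ["extra_seeds"]),
]


def enabled_stages(max_stage: int) -> list[str]:
    """Return every stage whose gate is at or below max_stage, in run order."""
    return [name for gate, names in STAGE_GATES if gate <= max_stage for name in names]


print(enabled_stages(1))  # ['greedy', 'substructure', 'seed_extend']
```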
Java parity
SmallExactMCSExplorer, FixedSizeBondMaximizer, SigKey ring-system
equivalence in BK, global reaction deadline, TargetCorpus,
screenAndMatch, overlapCoefficient, FP quality analysis, fluent
MCSOptions, boolean[] BK triedClasses, numRings, ensureRingSystems.
C++ optimisations
SMSD_LIKELY branch hints on bondOrder/bondInRing/bondAromatic.
Packed bond matrix (3 matrices to 1 uint8). sm_75 CUDA for T4.
Ring system accessors exposed in Python bindings.
Python fixes
Fixed overlapCoefficient formula (was Tanimoto, now correct
intersection/min). Added tanimoto_coefficient as separate function.
Fixed _ensure_native NameError in depiction. Suppressed RDKit
kekulisation warnings in lightweight engine.
Compatibility
- Java 25+, C++17, Python 3.10+
- GPU: Metal (Apple Silicon), CUDA (Volta+)
- Platforms: macOS (arm64, x86_64), Linux (x86_64, aarch64), Windows (AMD64)
Copyright
Copyright (c) 2018-2026 BioInception PVT LTD
Algorithm copyright (c) 2009-2026 Syed Asad Rahman
SMSD Pro 6.12.0
Syed Asad Rahman — BioInception PVT LTD
This release adds a lightweight clique-based MCS solver, standalone
fingerprint modules, and exposes the full set of search options to
Python users.
What changed
Clique solver
New header-only maximum clique finder (clique_solver.hpp) for
workflows where chemistry stays in Python/RDKit and only the
graph search runs in C++. Includes Bron-Kerbosch with Tomita
pivoting, k-core pruning, greedy seed, and McGregor bond-grow
extension. Portable across Mac/Linux/Windows (uses bitops.hpp
wrappers, no raw compiler builtins).
Python API: find_mcs_clique, match_substructure,
match_substructure_from_elements, score_mapping.
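In clique-based MCS, vertices of a compatibility graph are compatible atom pairs and a maximum clique corresponds to a largest common substructure. The sketch below shows Bron-Kerbosch with Tomita pivoting plus a simple size bound, in pure Python for illustration; it is not the `clique_solver.hpp` code, and k-core pruning, the greedy seed, and McGregor extension are omitted.

```python
def max_clique(adj: dict[int, set[int]]) -> set[int]:
    """Maximum clique via Bron-Kerbosch with Tomita pivoting."""
    best: set[int] = set()

    def expand(r: set[int], p: set[int], x: set[int]) -> None:
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        if len(r) + len(p) <= len(best):  # this branch cannot beat the incumbent
            return
        # Tomita pivot: the vertex of P ∪ X with the most neighbours in P
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Toy compatibility graph: a 4-clique {0, 1, 2, 3} plus a pendant vertex 4
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3, 4},
    3: {0, 1, 2},
    4: {2},
}
print(sorted(max_clique(adj)))  # [0, 1, 2, 3]
```

The pivot rule skips branching on neighbours of the pivot, since any maximal clique containing one of them is reached through another branch.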
Lightweight MCS engine
smsd.mcs_engine provides a coverage-driven funnel that accepts
SMILES strings, MolGraph objects, or RDKit Mol objects directly:
```python
from smsd.mcs_engine import find_mcs_lightweight

result = find_mcs_lightweight("c1ccc(O)cc1", "c1ccc(N)cc1")
```
Stages escalate automatically (greedy, substructure, seed-extend,
BK clique, McGregor) and stop as soon as the label-frequency upper
bound is reached. The C++ clique solver handles the heavy lifting.
SmallExactMCSExplorer
Exact branch-and-bound MCS for small molecule pairs (up to 20×40
atoms). Now wired into the native findMCS pipeline for
disconnected MCS mode, giving deterministic results on hard
cases instead of relying on heuristic early exits.
Fingerprint modules
Standalone fp/ headers separated from batch.hpp:
- `fp/mol/circular.hpp`: ECFP / FCFP (Morgan)
- `fp/mol/path.hpp`: path-based fingerprints
- `fp/mol/pharmacophore.hpp`: pharmacophore features
- `fp/mol/torsion.hpp`: topological torsion
- `fp/mol/mcs_fp.hpp`: MCS-aware path fingerprints
- `fp/similarity.hpp`: Tanimoto, Dice, cosine, Soergel, subset
- `fp/common.hpp`, `fp/format.hpp`: shared utilities
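For binary fingerprints, the Dice and cosine metrics listed for `fp/similarity.hpp` reduce to simple set formulas. A minimal sketch over sets of on-bit indices (an illustrative representation, not the header's API); for binary data, Soergel distance equals 1 minus Tanimoto.

```python
import math


def dice(a: set[int], b: set[int]) -> float:
    """Dice: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: set[int], b: set[int]) -> float:
    """Cosine on binary vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


a, b = {1, 2, 3, 4}, {3, 4}
print(dice(a, b))    # 2/3
print(cosine(a, b))  # about 0.707
```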
Support headers
- `hungarian.hpp`: O(n³) optimal assignment
- `periodic_table.hpp`: element data and valence tables
- `bond_energies.hpp`: bond dissociation energies
- `scaffold_library.hpp`: Murcko scaffold extraction
- `color_palette.hpp`: Jmol/CPK element colors
- `depictor.hpp`: publication-quality SVG rendering engine
Full options exposure
All C++ search options now accessible from Python:
MCSOptions (18 fields): max_stage, seed_neighborhood_radius,
seed_max_anchors, use_two_hop_nlf_in_extension,
use_three_hop_nlf_in_extension added to the existing set.
ChemOptions (19 fields): aromaticity_model, pH,
matcher_engine, induced, use_two_hop_nlf, use_three_hop_nlf,
use_bit_parallel_feasibility added.
Enums (6): AromaticityMode, AromaticityModel, BondOrderMode,
RingFusionMode, MatcherEngine, Solvent — all accessible as
smsd.MatcherEngine.VF2PP etc.
MolGraph.num_rings() added for SSSR ring count.
Global reaction deadline
global_deadline namespace in VF2++ enforces a single wall-clock
deadline across all MCS/substructure calls within a pipeline.
TimeBudget checks both local and global deadlines, and deadline
checks now run more often (every 256 calls instead of 1024).
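The release does not show the TimeBudget code; the following is a hypothetical Python sketch of the described behaviour: a local deadline and an optional pipeline-wide (global) deadline, with the clock read only every 256 calls to keep the hot loop cheap.

```python
from __future__ import annotations

import time


class TimeBudget:
    """Hypothetical sketch: local plus optional global deadline, polled every N calls."""

    def __init__(self, local_seconds: float, global_deadline: float | None = None,
                 check_every: int = 256) -> None:
        self.local_deadline = time.monotonic() + local_seconds
        self.global_deadline = global_deadline  # absolute time.monotonic() value
        self.check_every = check_every
        self.calls = 0
        self.expired = False

    def timed_out(self) -> bool:
        if self.expired:
            return True
        self.calls += 1
        if self.calls % self.check_every:  # skip the clock on most calls
            return False
        deadline = self.local_deadline
        if self.global_deadline is not None:
            deadline = min(deadline, self.global_deadline)
        self.expired = time.monotonic() >= deadline
        return self.expired


# A global deadline that has already passed is noticed on the 256th poll
budget = TimeBudget(local_seconds=60.0, global_deadline=time.monotonic() - 1.0)
polls = 0
while not budget.timed_out():
    polls += 1
print(polls)  # 255
```

Polling less often trades a small overshoot past the deadline for fewer clock reads; lowering the interval from 1024 to 256 shrinks that overshoot.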
FixedSizeBondMaximizer
Maximizes bond count for fixed-size atom mappings on small pairs.
Integrated into runValidatedMcsDirection for automatic bond
refinement when MCS matches the smaller molecule entirely.
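The objective behind fixed-size bond maximisation: among atom mappings of the same size, prefer the one that preserves the most bonds. A hypothetical scoring helper (not the FixedSizeBondMaximizer code, which these notes do not show) illustrates it on a toy three-atom chain, with bonds stored as frozensets of atom indices.

```python
def mapped_bond_count(mapping: dict[int, int],
                      bonds_a: set[frozenset[int]],
                      bonds_b: set[frozenset[int]]) -> int:
    """Count bonds of A whose mapped endpoints are also bonded in B."""
    count = 0
    for bond in bonds_a:
        i, j = tuple(bond)
        if i in mapping and j in mapping and frozenset((mapping[i], mapping[j])) in bonds_b:
            count += 1
    return count


bonds = {frozenset((0, 1)), frozenset((1, 2))}  # chain 0-1-2, used for both molecules
candidates = [{0: 0, 1: 1, 2: 2}, {0: 0, 1: 2, 2: 1}]  # same size, different bonds kept
best = max(candidates, key=lambda m: mapped_bond_count(m, bonds, bonds))
print(best, mapped_bond_count(best, bonds, bonds))  # {0: 0, 1: 1, 2: 2} 2
```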
Cross-platform
- MSVC: clique solver uses portable `smsd::popcount64` / `smsd::ctz64` (no raw `__builtin_*`). CMake adds `/W4 /EHsc`.
- ARM NEON: fused bitset ops dispatch automatically on Apple Silicon.
- CI wheels: Linux x86_64 + aarch64, macOS arm64 + x86_64,
Windows AMD64. OpenMP installed in build containers. macOS
wheels bundle libomp via delocate.
Includes 6.11.2 fixes
Memory leak fix (WeakHashMap), BK color-bound overflow,
CIP Rule 3 (Z > E), thread-safety on lazy MolGraph fields,
tautomer weight corrections, Se/I support, SAH SMILES fix,
reaction-aware charge relaxation, Mcs→MCS rename,
tanimoto→overlapCoefficient.
Compatibility
- Java 25+, C++17, Python 3.10+
- GPU: Metal (Apple Silicon), CUDA (Volta+)
- Platforms: macOS (arm64, x86_64), Linux (x86_64, aarch64), Windows (AMD64)
Copyright
Copyright (c) 2018-2026 BioInception PVT LTD
Algorithm copyright (c) 2009-2026 Syed Asad Rahman
SMSD v6.11.1 — Fingerprint Correctness & Thread Safety
Bug Fixes
- ECFP initial invariants: added the missing Rogers & Hahn (2010) invariants #3 (bond order sum / valence) and #4 (atomic mass number) to both binary and count ECFP in C++ and Java. Fingerprints now encode the full 7-invariant set per the original paper.
- Path fingerprint canonical hash: replaced double-bit-setting (forward + reverse) with a single canonical `min(fwd, rev)` hash, correcting bit-density inflation.
- FCFP pyrrole-N misclassification: aromatic nitrogen acceptor classification now uses direct hydrogen count instead of a bond-sum heuristic, fixing the incorrect classification of pyridine N as a non-acceptor (pyridine N has a free lone pair; pyrrole N does not).
- Thread safety: `prewarmGraph()` now calls `ensurePatternFP()` and `getPharmacophoreFeatures()` before OpenMP parallel regions, preventing data races on lazily initialised mutable caches.
- Dead code removal: removed the unused `seenHashes` set from binary ECFP; removed disabled tautomer rules T18 (sulfoxide S=O) and T20 (nitrile/isonitrile).
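The path-hash fix can be illustrated with a small sketch. Setting one bit for the forward hash and another for the reverse hash lets every path contribute two bits; hashing both directions and keeping `min(fwd, rev)` yields one direction-independent bit. The hash (BLAKE2b over `|`-joined atom labels) and the 2048-bit width are assumptions for this example, not SMSD's actual hashing.

```python
import hashlib


def path_hash(labels: tuple[str, ...]) -> int:
    """Stable 64-bit hash of a label sequence (BLAKE2b, reproducible across runs)."""
    digest = hashlib.blake2b("|".join(labels).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def canonical_path_bit(labels: tuple[str, ...], n_bits: int = 2048) -> int:
    """One bit per undirected path: hash both directions and keep the smaller."""
    fwd = path_hash(labels)
    rev = path_hash(tuple(reversed(labels)))
    return min(fwd, rev) % n_bits


# The same path read from either end maps to one identical bit
print(canonical_path_bit(("C", "C", "O")) == canonical_path_bit(("O", "C", "C")))  # True
```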
Java 25 Modernisation
- Records: 8 inner classes (`SubstructureStats`, `SubstructureResult`, `McsResult`, `Node`, `TemplateEntry`, `CanonResult`, `ScoredCandidate`, `Query`) converted to records, eliminating 134 lines of boilerplate.
- Arrow switches: 6 switch statements in `SMSDcli` and `ReactionAwareScorer` converted to arrow expressions.
- Unnamed variables: `catch (Exception _)` replaces `catch (Exception ignored)` across 5 sites.
- Pattern matching instanceof: stereo element dispatch in `MolGraph` uses pattern matching.
- Renames: `McsPostFilter`→`MCSPostFilter`, `CipAssigner`→`CIPAssigner` (standard IUPAC acronym capitalisation).
- Build: JDK 25 across Maven, CI, Dockerfile, and native installer workflow.
Note
ECFP fingerprint values will differ from v6.11.0 due to the added invariants. This is a correctness improvement aligned with the Rogers & Hahn 2010 specification.
Full Changelog: v6.11.0...v6.11.1
Apache-2.0. Copyright (c) 2018-2026 BioInception PVT LTD.
SMSD Pro v6.11.0 — Performance, Precision, and Depiction
Highlights
SMSD Pro 6.11.0 delivers three major improvements:
- Core engine hardening: cache-optimal data structures and pre-indexed candidate lookup reduce MCS search time by 20-40% on large molecule pairs.
- Publication-quality SVG depiction: a zero-dependency renderer conforming to the ACS 1996 standard (same specification used by Nature, Science, JACS, and Springer journals). Renders molecules, MCS comparisons, and substructure highlights directly from SMILES or MolGraph with no external tools required.
- Comprehensive layout engine: 8-phase 2D pipeline, distance geometry 3D, 40+ pharmaceutical scaffold templates, full coordinate transform suite.
Core Engine
- Converted all hot-path `vector<bool>` to `vector<uint8_t>` across McGregor DFS, Bron-Kerbosch partition bound, seed-extend, frontier expansion, k-core pruning, and tree detection. Yields 15-25% cache performance improvement.
- Pre-indexed candidate sets in McGregor DFS using precomputed `compatTargets_[]`. Eliminates O(n²) linear scan per frontier atom.
- CTZ-based fingerprint bit extraction replaces sequential scan.
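CTZ-based extraction visits only the set bits of each 64-bit word instead of testing every position. A Python sketch of the technique (the C++ code would use a hardware count-trailing-zeros; here `(x & -x).bit_length() - 1` computes the same value):

```python
def set_bits(words: list[int]) -> list[int]:
    """Indices of set bits in a fingerprint stored as 64-bit words.

    x & -x isolates the lowest set bit; its bit_length() - 1 is the count of
    trailing zeros (CTZ). x &= x - 1 then clears that bit, so the inner loop
    runs once per set bit rather than once per bit position.
    """
    out: list[int] = []
    for w, word in enumerate(words):
        x = word & 0xFFFFFFFFFFFFFFFF  # treat each word as unsigned 64-bit
        while x:
            ctz = (x & -x).bit_length() - 1
            out.append(w * 64 + ctz)
            x &= x - 1
    return out


print(set_bits([0b1010, 1 << 63]))  # [1, 3, 127]
```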
Depiction Engine (New)
SVG renderer produces journal-ready molecular structure diagrams:
- ACS 1996 proportions — all dimensions auto-scale from a single reference value
- Skeletal formula — carbon suppressed, heteroatom labels with H-count subscripts and charge superscripts
- Bond rendering — single, double (asymmetric toward ring interior), triple, wedge up (solid), wedge down (dashed stripes)
- MCS highlighting — green-filled circles, bold bonds, blue mapping numbers
- Side-by-side pair rendering with bidirectional arrow separator
- Jmol/CPK element colors — N=blue, O=red, S=amber, P=orange, etc.
- Full customization via DepictOptions
Layout Engine
- 8-phase 2D pipeline: template match → ring-first → chain zig-zag → force-directed → overlap resolution → crossing reduction → canonical orientation → bond-length normalisation
- Distance geometry 3D with power iteration eigendecomposition
- 40+ scaffold templates including pharmaceutical scaffolds, PAH, spiro, and bridged systems
- Full coordinate transform suite: translate, rotate, scale, mirror, center, align
Python Bindings
35+ new functions exposed with GIL release for thread safety:
depict_svg(), depict_pair(), depict_mapping(), save_svg(), find_nmcs(), find_scaffold_mcs(), validate_mapping(), decompose_rgroups(), generate_coords_2d(), generate_coords_3d(), and more.
Tests
1,512 tests passed across all platforms:
- C++ core: 114 | C++ layout: 42 | C++ CIP: 42 | C++ parser: 542
- Python: 170 | Java: 602
Compatibility
- Fully backward-compatible with v6.10.x
- No API breaking changes
- Requires C++17, Java 11+, Python 3.9+
SMSD Pro by BioInception PVT LTD. Algorithm Copyright 2009-2026 Syed Asad Rahman. Apache-2.0.
v6.10.2
SMSD Pro v6.10.2 — correctness release.
- Fixed MCS connectivity filter to enforce common-bond reachability in non-induced mode
- Added GOLDEN_843 regression tests (Python and Java)
- Version bump across all artifacts
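One way to read "common-bond reachability": in non-induced mode, a mapping is connected only if its atoms are reachable from one another using bonds that exist in both molecules under the mapping. A hypothetical BFS check (not the SMSD filter itself), with bonds stored as frozensets of atom indices:

```python
from collections import deque


def is_connected_via_common_bonds(mapping: dict[int, int],
                                  bonds_a: set[frozenset[int]],
                                  bonds_b: set[frozenset[int]]) -> bool:
    """True if all mapped atoms are reachable using only bonds common to A and B."""
    if not mapping:
        return True
    adj: dict[int, list[int]] = {i: [] for i in mapping}
    for bond in bonds_a:
        i, j = tuple(bond)
        if i in mapping and j in mapping and frozenset((mapping[i], mapping[j])) in bonds_b:
            adj[i].append(j)
            adj[j].append(i)
    start = next(iter(mapping))
    seen = {start}
    queue = deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(mapping)


bonds = {frozenset((0, 1)), frozenset((1, 2))}  # chain 0-1-2 in both molecules
print(is_connected_via_common_bonds({0: 0, 1: 1, 2: 2}, bonds, bonds))  # True
print(is_connected_via_common_bonds({0: 0, 1: 2, 2: 1}, bonds, bonds))  # False
```

In the second mapping, atoms 0 and 1 are bonded in A but their images (0 and 2) are not bonded in B, so atom 0 is isolated even though the mapped atoms of A alone form a connected chain.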
Java 21+ / C++17 / Python 3.9+. Apache-2.0 — see NOTICE for attribution.
v6.10.1
SMSD Pro v6.10.1 — stability and correctness release.
- Hardened MCS mapping repair pipeline (iterative bounded-loop, no unbounded recursion)
- Deterministic test suite (structural invariants, not wall-clock time)
- CI/CD reliability fixes
Apache-2.0 — BioInception PVT LTD.