PySIDT is a Python package for training and running inference on subgraph isomorphic decision trees (SIDTs) for molecular property prediction, as described in Johnson et al. 2025.
SIDTs are graph-based decision trees whose nodes are each associated with a molecular substructure. Inference descends a target molecular structure down the tree, moving at each step to a node whose substructure is subgraph isomorphic to the target, and bases the prediction on the final (most specific) node matched. SIDTs can perform significantly better than deep-neural-network-based approaches on smaller datasets (<10,000 datapoints). Because they are trees of molecular substructures, SIDTs are inherently readable and easy to visualize, which makes them easy to analyze. They are also straightforward to extend and retrain, facilitate uncertainty estimation, and enable integration of expert knowledge.
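The descent described above can be illustrated with a toy sketch. It does not use PySIDT's API: the `Graph`, `Node`, `make_graph`, `is_subgraph`, and `descend` names are hypothetical, and the substructure patterns and node values are made up for illustration. Matching is a brute-force check on small heavy-atom graphs that ignores bond orders, and the sketch assumes that the first matching child is followed at each level.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Graph:
    atoms: list  # element symbols, indexed by position
    bonds: set   # set of frozenset({i, j}) atom-index pairs


@dataclass
class Node:
    name: str
    pattern: Graph   # substructure associated with this node
    value: float     # illustrative prediction stored at this node
    children: list = field(default_factory=list)


def make_graph(atoms, bonds):
    """Build a Graph from element symbols and (i, j) index pairs."""
    return Graph(list(atoms), {frozenset(b) for b in bonds})


def is_subgraph(pattern, target):
    """Brute-force test: can every pattern atom and bond be mapped into target?"""
    n = len(pattern.atoms)
    for mapping in permutations(range(len(target.atoms)), n):
        # Elements must agree under the candidate atom mapping.
        if any(pattern.atoms[i] != target.atoms[mapping[i]] for i in range(n)):
            continue
        # Every pattern bond must exist between the mapped target atoms.
        if all(frozenset(mapping[i] for i in bond) in target.bonds
               for bond in pattern.bonds):
            return True
    return False


def descend(root, mol):
    """Return the most specific matching node, or None if the root does not match."""
    if not is_subgraph(root.pattern, mol):
        return None
    node = root
    while True:
        for child in node.children:
            if is_subgraph(child.pattern, mol):
                node = child  # assumption: follow the first matching child
                break
        else:
            return node  # no child matches, so this node is the final match


# Toy tree: C -> (C-O -> O-C-O), (C-N). Values are arbitrary placeholders.
o_c_o = Node("O-C-O", make_graph("OCO", [(0, 1), (1, 2)]), 3.0)
c_o = Node("C-O", make_graph("CO", [(0, 1)]), 2.0, [o_c_o])
c_n = Node("C-N", make_graph("CN", [(0, 1)]), 4.0)
root = Node("C", make_graph("C", []), 1.0, [c_o, c_n])

# Heavy-atom skeletons of a few small molecules.
methanol = make_graph("CO", [(0, 1)])
methanediol = make_graph("OCO", [(0, 1), (1, 2)])
ethane = make_graph("CC", [(0, 1)])
methylamine = make_graph("CN", [(0, 1)])
water = make_graph("O", [])

for label, mol in [("methanol", methanol), ("methanediol", methanediol),
                   ("ethane", ethane), ("methylamine", methylamine)]:
    match = descend(root, mol)
    print(label, "->", match.name, match.value)
```

In this sketch a molecule that matches none of a node's children is predicted from that node itself, which is why less specific nodes still need sensible values.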
The SIDT technique was originally developed in Johnson and Green 2024. This implementation incorporates uncertainty prepruning, as detailed in Pang et al. 2024.
Documentation for PySIDT is available here.
- Install PySIDT from source
    git clone https://github.com/zadorlab/PySIDT.git
    cd PySIDT
    conda env create -f environment.yml
    conda activate pysidt_env
    pip install -e .
- Install molecule from source
    git clone https://github.com/ReactionMechanismGenerator/molecule.git
    cd molecule
    conda activate pysidt_env
    make
    pip install -e .
