Cleanup works + bug fixes #15
Open
xiaoruiDong wants to merge 28 commits into
This can be useful for other software using RDKit
PR#4827 in PyG changes how scatter is called: dim now defaults to -2 instead of 0. This change calls scatter directly instead of global_add_pool to avoid the incompatibility.
1. Modularization
2. Able to use a user-provided CUDA version
3. Found a workaround for M-chip Macs
4. Removed the constraints for deps
5. Backed up the original dep constraints in environment_reproduce.yml
Originally, the model was assigned to the GPU whenever possible. This causes conflicts when running a model on the CPU on a machine that has a GPU.
angle_mask_ref and angle_combos are predefined without awareness of which device the operation will run on. Add a temporary workaround.
Previously, the inference device was more or less hardcoded. This commit makes sure that the device is correctly assigned, so that you can run inference on CPU or GPU on a machine with a GPU, or on CPU on a machine without a GPU.
Molecule CN1C2=C(C=C(C=C2)Cl)C(=NCC1=O)C3=CC=CC=C3 helps identify this problem
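To make the scatter change noted in the commit messages above more concrete, here is a minimal pure-Python sketch of what an add-pooling over dim 0 computes. It is an illustration only, not PyG's implementation and not the code in this PR: `scatter_add_dim0` is a hypothetical helper, and the toy `x2` and `batch` values are made up. For a 2-D node-feature matrix, dim=0 and dim=-2 refer to the same axis, but they differ as soon as the tensor has more than two dimensions, which is where a changed default can surface as a dimension mismatch.

```python
def scatter_add_dim0(src, index, dim_size=None):
    """Sum the rows of `src` that share the same entry in `index`.

    Hypothetical helper: a pure-Python stand-in for a scatter-sum that
    reduces along dim 0 (one output row per graph in the batch).
    """
    if len(src) != len(index):
        raise ValueError("src and index must have the same length along dim 0")
    if dim_size is None:
        dim_size = max(index) + 1 if index else 0
    width = len(src[0]) if src else 0
    out = [[0.0] * width for _ in range(dim_size)]
    for row, graph_id in zip(src, index):
        for col, value in enumerate(row):
            out[graph_id][col] += value
    return out


# Toy data (assumed): three nodes with two features each.
# Nodes 0 and 1 belong to graph 0, node 2 belongs to graph 1.
x2 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
batch = [0, 0, 1]

pooled = scatter_add_dim0(x2, batch)
print(pooled)
```

Pinning the reduction axis explicitly, rather than relying on a library default, is what keeps the result independent of the installed version.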
Motivation
Recently, I was trying to enable the GeoMol embedder in RDMC.conformer_generation on my M2 MacBook. Since I am using a completely different environment with different dependency versions, I inevitably ran into several issues (e.g., issue #14 and dimension mismatch errors) and tried to fix them.
Environment (most relevant dependencies)
Changes and why they were made this way
In order to understand what went wrong, I first cleaned up the featurization module (major effort) and then targeted two major causes:
1. `Batch.from_data_list` performs differently than expected.
2. There are changes in the newer version of PyG.

For (1), before the discussion, I need to say that I only tried `Batch.from_data_list` to form a batch of data, and all discussions are based on that. In my trials, I first featurize molecules into `torch_geometric.Data` (`data1`, `data2`, ...) and then use `Batch.from_data_list([data1, data2, ...])` to naively construct a `Batch` (usually termed `data` in this repo). According to the function `utils.get_neighbor_ids`, `data.neighbors` is expected to be a `list` of `dict`. However, I found that what `Batch.from_data_list` does is take the keys in `data1.neighbors` and extend the values according to `data2.neighbors`, `data3.neighbors`, and so on, so it actually ends up with a single `dict` object. I tried to identify how such differences are introduced but failed, and I ended up creating a customized `from_data_list` in commit "f341404".

To avoid introducing new bugs, I use a version > 2 check to decide whether special treatment of `data.neighbors` is needed (therefore, it won't influence version < 2, which is recommended). However, I believe the action in the `if` can also be applied to version < 1 without issues.

For (2), I found a dimension mismatch when calling `h_mol = self.h_mol_mlp(global_add_pool(x2, batch))` (model.py, line 248). It is due to PR#4827 in PyG changing how `scatter` is called in `global_add_pool`. In the new implementation, `dim` now defaults to -2 instead of 0. I simply avoid using `global_add_pool` and call `scatter` directly, which avoids the issue regardless of the version.

Other comments
I modified featurization.py a lot, mostly to increase readability for myself. I also changed module names and added a few miscellaneous files for my convenience in importing modules. I can create another PR that contains only changes (1) and (2) highlighted above if you think that is more appropriate.
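For reviewers who want to picture the `data.neighbors` difference described in (1), the following is a toy model in plain Python. It does not reproduce PyG's collation code or the customized `from_data_list` from this PR; `keep_as_list` and `merge_into_one` are hypothetical helpers, and the integer keys and neighbor lists are made-up stand-ins for the real per-molecule dictionaries.

```python
def keep_as_list(neighbors_per_mol):
    """Shape expected downstream: one dict per molecule, kept in a list."""
    return [dict(neighbors) for neighbors in neighbors_per_mol]


def merge_into_one(neighbors_per_mol):
    """Toy model of the observed behavior: keys are shared across molecules
    and the values of later molecules are appended to earlier ones."""
    merged = {}
    for neighbors in neighbors_per_mol:
        for key, values in neighbors.items():
            merged.setdefault(key, []).extend(values)
    return merged


# Made-up per-molecule neighbor dictionaries (assumed layout).
mol1_neighbors = {1: [0, 2, 3]}
mol2_neighbors = {1: [0, 2], 2: [1, 3, 4]}

as_list = keep_as_list([mol1_neighbors, mol2_neighbors])
as_dict = merge_into_one([mol1_neighbors, mol2_neighbors])

print(type(as_list).__name__, len(as_list))
print(type(as_dict).__name__, as_dict)
```

In the merged form, the entries for key 1 from the two molecules can no longer be told apart, which is why code that iterates over a list of per-molecule dictionaries breaks on it.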