Code for the study Predicting Hydrogen Storage in MOFs: Representation Matters.
H2MOF benchmarks geometric descriptors, energy voxels, and molecular point clouds for predicting hydrogen adsorption in metal-organic frameworks.
Requires Python >= 3.10.
pip install -e .
By default, the pipeline looks for these files at the repository root:
- hmof_data.csv: labels indexed by MOF_name
- train.json, validation.json, test.json: dataset splits
- hmof_voxels/<MOF_name>.npy: voxel representations
- hmof_point_clouds/<MOF_name>.npy: point-cloud representations
These paths can be overridden through CLI flags or environment variables where supported.
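Before running the pipeline, it can help to confirm that the default inputs are in place. The following is a minimal sketch, not part of the h2mof package: `missing_inputs` is a hypothetical helper that only checks the file and directory names listed above and does not inspect their contents.

```python
from pathlib import Path

# Files the README lists at the repository root.
EXPECTED_FILES = ["hmof_data.csv", "train.json", "validation.json", "test.json"]
# Directories expected to hold one <MOF_name>.npy file per structure.
EXPECTED_DIRS = ["hmof_voxels", "hmof_point_clouds"]


def missing_inputs(root: Path) -> list[str]:
    """Return the expected inputs that are absent under `root`.

    Hypothetical helper for illustration; h2mof may perform its own checks.
    """
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_inputs(Path.cwd())` from the repository root returns an empty list when all default inputs are present.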
- Build labels from MOFX-DB JSON files:
h2mof labels --input-dir hmof_json --output-csv hmof_data.csv
Input files are expected as hMOF-*.json and optionally tobmof-*.json.
- Generate representation assets:
# Energy voxels
moxel hmof_cif hmof_voxels --grid_size 25 --cutoff 10 --epsilon 50 --sigma 2.5
# Point clouds
aidsorb create hmof_cif --outname hmof_point_clouds
- Create train/validation/test splits:
aidsorb prepare hmof_point_clouds --split_ratio "(0.8, 0.1, 0.1)"
- Run the benchmark suite:
h2mof bench
Use h2mof <command> --help for full details.
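To illustrate what a split ratio of (0.8, 0.1, 0.1) means in the split step, here is a minimal sketch of a seeded random split over MOF names. It is not the aidsorb implementation: the function `split_names`, its `seed` parameter, and the rounding rule (truncate train and validation, give the remainder to test) are assumptions made for illustration.

```python
import random


def split_names(names, ratios=(0.8, 0.1, 0.1), seed=0):
    """Partition names into train/validation/test subsets.

    Illustrative only; aidsorb may shuffle and round differently.
    """
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)

    n_train = int(ratios[0] * len(shuffled))
    n_val = int(ratios[1] * len(shuffled))

    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        # The test split takes whatever remains after truncation.
        "test": shuffled[n_train + n_val:],
    }
```

With 100 names and the default ratios, this yields 80 training, 10 validation, and 10 test names, and the three subsets never overlap.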
h2mof labels
h2mof bench
h2mof curves
h2mof eval-external
- h2mof bench: trains and evaluates the selected methods on the default split
- h2mof curves: runs the learning-curve experiments
- h2mof eval-external: evaluates trained methods on an external dataset
All pipeline commands write to outputs/ by default, or to H2MOF_OUTPUTS_DIR if set.
- outputs/contracts/artifact_schema_version.json
- outputs/models/<method>/<target>/: trained models, predictions, metrics, and run metadata
- outputs/benchmarks/summary_metrics.csv
- outputs/curves/<target>/: learning-curve tables and summaries
- outputs/external/<dataset>/<method>/<target>/: external-evaluation predictions and metrics
- outputs/external/<dataset>/summary/summary_metrics.csv
- H2MOF_ROOT: override the project root
- H2MOF_OUTPUTS_DIR: override the output directory
- H2MOF_LOG_LEVEL: set console logging level, default INFO
- H2MOF_RICH: set to 0 to disable rich console output
- H2MOF_NUM_WORKERS: DataLoader workers, default 8
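The variables above could be read along the following lines. This is a sketch under stated assumptions, not the package's actual configuration code: `resolve_settings` is a hypothetical function, and it assumes the default output directory is `outputs/` under the project root and that the project root defaults to the current directory.

```python
import os
from pathlib import Path


def resolve_settings(env=None):
    """Resolve H2MOF_* settings from a mapping (defaults to os.environ).

    Hypothetical helper; the defaults mirror those documented above.
    """
    env = os.environ if env is None else env

    # Assumption: the project root defaults to the current directory.
    root = Path(env.get("H2MOF_ROOT", "."))

    # Assumption: outputs/ lives under the project root unless overridden.
    if "H2MOF_OUTPUTS_DIR" in env:
        outputs_dir = Path(env["H2MOF_OUTPUTS_DIR"])
    else:
        outputs_dir = root / "outputs"

    return {
        "root": root,
        "outputs_dir": outputs_dir,
        "log_level": env.get("H2MOF_LOG_LEVEL", "INFO"),
        # Rich output stays on unless the variable is exactly "0".
        "rich": env.get("H2MOF_RICH", "1") != "0",
        "num_workers": int(env.get("H2MOF_NUM_WORKERS", "8")),
    }
```

Passing a plain dictionary instead of `os.environ`, for example `resolve_settings({"H2MOF_RICH": "0"})`, makes the resolution easy to inspect without changing the shell environment.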
This project is released under the MIT License. See LICENSE.
