Transferable Force Fields for Six Common Solvents from Liquid Density and Surface Tension Data: A Gaussian Process Optimization Framework
Authors: Montana N. Carlozo, Alexander W. Dowling, and Edward J. Maginn
ES-FFO is a repository used to rapidly calibrate the LJ parameters of electrolyte solvent forcefields given experimental data. The key feature of this work is using machine learning tools in the form of Gaussian processes (GPs) which allow us to cheaply estimate the results of a molecular simulation given temperature state points and thermophysical property data. This workflow combines the approaches of Wang et al., 2023, J. Chem. Theory Comput. and Carlozo et al., 2025, Digital Discovery to create one repository to build GP models and optimize parameters for electrolyte solvent FFs.
NOTE: We use Signac and signac flow (<https://signac.io/>) to manage the setup and execution of each workflow. These instructions assume a working knowledge of that software.
This work is currently in preparation and has not yet been published. Please cite the following when referring to this work:
Carlozo, M. N., Dowling, A. W., & Maginn, E. J.. Transferable Force Fields for Six Common Solvents from Liquid Density and Surface Tension Data: A Gaussian Process Optimization Framework. In preparation.
The repository is organized as follows:
ES-FFO/ is the top level directory. It contains:
- .gitignore prevents large files from the signac workflow and plots from being tracked by git and prevents tracking of other unimportant files.
- utils/ contains functions used by all pieces of this workflow including molecule experimental data and reference parameters
- block_averages/ contains a code for block averaging via the methods in H. Flyvbjerg and H.G. Peterson. Error estimates on averages of correlated data. J. Chem. Phys. 91:461-466, 1989.
- fffit/ contains another set of utility functions used by multiple aspects of this workflow.
- Build_GPs/ contains the workflow runs and analyzes results from applying the methods modified from those in Wang et al. 2023 to create Base FFs
- Opt_ES/ contains the workflow runs and analyzes results from applying the methods in Carlozo et al. 2025. to create GP-optimized FFs
- Opt_ES/ Also contains the molecular simulation results and analysis of the FF(s) developed in steps 5 and 6.
- hfcs-fffit.yml is the environment for running this workflow.
- submit_jobs is a shell script for submitting jobs to the cluster.
- create_analysis_figs.ipynb is a jupyter notebook for creating the final analysis figures including all figures from running the scripts in 11-17.
- make_corr_figs_all_molec.py is a python script for creating correlation figures for all molecules. Figure 12, Figure 14, and SI Figure S12 in the main text.
- make_corr_figs_one_molec.py is a python script for creating correlation figures for individual molecules. SI Figures S1-S6 and SI Figures S13-S24 in the main text.
- make_GP_vs_sim_and_sens.py is a python script for creating figures comparing GP and actual predictions and generating the results for individual molecule sensitivity analyses. Figure 9, SI Figure S11, and SI Tables S2-S13 in the main text.
- make_ift_val_figs.py is a python script for creating visualizations of the interfactial tension boxes for all ST simulations.
- make_param_comp_figs.py is a python script for creating parameter comparison figures. Figures 10-11 and SI Figures S7-S10 in the main text.
- make_pareto_comp_figs.py is a python script for visualizing differences between pareto-optimal LJ parameter sets in ST (and LD) iterations. Figure 13 in the main text.
Note that all workspace/ subdirectories below include one example signac workspace with some example input data and key simulation validation figures where appropriate.
This directory contains:
- molec_class_files/esolvs.py: A .py file which loads all electrolyte solvent data into a class
- prep_ms_data.py: Contains scripts used to prepare data for GP training and get error data
This directory contains functions used by both workflows:
- models.py: Functions related to building GP models
- pareto.py: Functions for determining pareto efficient points
- plot.py: Functions for plotting data
- signac.py: Functions for extracting signac data
- utils.py: Functions for scaling and unscaling data and shuffling/splitting GP data
This directory contains all data related to running the GP building workflow, including:
- analysis/mol/ld_iters: Results for liquid density (LD) iterations for each molecule "mol"
- analysis/mol/vle_iters: Results for surface tension (ST) iterations
- utils/*.py: Functions for analyzing the results of LD and ST iterations
- ld_iters/ : The signac project directory for running the LD iterations including project.py, templates/, and workspace/
- vle_iters/ : The signac project directory for running the ST iterations including project.py, templates/, and workspace/
- gen_lhs_samples.py: A script for generating the LHS data necessary for LD and ST iterations
- init_LD.py: A script to generate the signac project for LD iterations
- init_IFT.py: A script to generate the signac project for ST iterations
- post_ld.py: A script to analyze LD iteration data
- post_vle.py: A script to analyze ST iteration data
- get_CPU_hrs.py : A script to calculate the total CPU hours used by the workflow
This directory contains all data related to running the optimization workflow of Carlozo et al., 2025, Digital Discovery including:
- analysis/: Data from analyzing the results of the optimization workflow
- utilsOpt/: Functions for running the optimization workflow
- gemc_val_opt/: The signac project directory for running the gemc validation for GP-optimized FF including project.py, templates/, and workspace/
- gemc_val_no_opt/: The signac project directory for running the gemc validation for the Base FF including project.py, templates/, and workspace/
- ift_val_opt/: The signac project directory for running the interfacial tension validation for GP-optimized FF including project.py, templates/, and workspace/
- ift_val_no_opt/: The signac project directory for running the interfacial tension validation for the Base FF including project.py, templates/, and workspace/
- opt_at_params_new/: The signac project directory for running the optimization of FF parameters with updated GP models including project.py, templates/, and workspace/
- compress_gemc.py: A script to compress the gemc validation data
- compress_ift.py: A script to compress the interfacial tension validation data
- get_CPU_hrs.py: A script to calculate the total CPU hours used by the optimization and validation workflows
- get_GPU_hrs.py: A script to calculate the total GPU hours used by the optimization and validation workflows
- init_gemc_val.py: A script to initialize the gemc validation workflow
- init_ift_val.py: A script to initialize the interfacial tension validation workflow
- init_opt_at.py: A script to initialize the optimization workflow
- post_analysis_opt.py: A script to analyze the results of the optimization workflow
- post_analysis_val.py: A script to analyze the results of the validation workflows. Generates Figures 3-8 and the data in Tables 4-5 of the main text
This directory contains data from analyzing the results of the optimization workflow for each molecule "mol".:
- at_00/mol/ExpVal/opt_res/jac_approx: Includes jacobian approximations
- at_00/mol/ExpVal/opt_res/hess_approx: Includes hessian approximations
- at_00/mol/ExpVal/opt_res/MAPD: Includes Mean Absolute Percentage Deviation values
- at_00/mol/ExpVal/opt_res/ms_val_opt: Includes the error data and validation results for the GP-Optimized FF
- at_00/mol/ExpVal/opt_res/ms_val_no_opt: Includes the error data and validation results for the Base FF
- at_00/mol/ExpVal/opt_res/prop_pred/: Includes comparisons of GP, simulated, and experimental values for the GP-optimized and Base FFs
- at_00/mol/ExpVal/opt_res/best_per_run.csv: The best FF optimization results for each optimization run
- at_00/mol/ExpVal/opt_res/pareto_info.csv: The pareto-optimal FF optimization results for each molecule
- at_00/mol/ExpVal/opt_res/unique_best_set.csv: The unique LJ parameter sets identified for each molecule during optimization
- AT-0/ms_val_no_opt/: Base FF property prediction plots for heat of vaporization and vapor pressure (h_p_vap.pdf), surface tension (surf_tens.pdf), and density (vle.pdf)
- AT-0/ms_val_opt/: GP-optimized FF property prediction plots for heat of vaporization and vapor pressure (h_p_vap.pdf), surface tension (surf_tens.pdf), and density (vle.pdf)
- AT-0/ms_val_opt_comp/: GP-optimized FF and Base FF property prediction plots. (h_p_vap.pdf, surf_tens.pdf, vle.pdf)
- comp_err_data.csv: Data comparing the errors of different FFs
- lit_err_data.csv: Data comparing the errors of the literature FFs
- lit_ff_data.csv: Data comparing the properties of the literature FFs
- lit_ff_data_w_no_opt.csv: Data comparing the properties of the literature FFs with the Base FFs
- lit_ff_data_w_opt.csv: Data comparing the properties of the literature FFs with the GP-optimized FFs
- lit_Hvap_est_w_no_opt.csv: Data comparing the heat of vaporization of the literature FFs and Base FFs
- lit_Hvap_est_w_opt.csv: Data comparing the heat of vaporization of the literature FFs and GP-optimized FFs
This directory contains all functions related to running the optimization workflow of Carlozo et al., 2025, Digital Discovery including:
- utilsOpt/atom_type.py: Class definitions for atom typing schemes
- utilsOpt/opt_atom_types.py: Other functions necessary for optimizing FF parameters and analyzing workflow results
- utilsOpt/plot.py: Plotting functions for analyzing optimization results
- utilsOpt/signac.py: Functions for extracting signac data
To run this software, you must have access to all packages in the hfcs-fffit environment (hfcs-fffit.yml) which can be installed using the instructions in the next section.
This package has a number of requirements that can be installed in different ways. We recommend using a conda environment to manage most of the installation and dependencies. However, some items will need to be installed from source or pip.
Running the simulations will also require an installation of GROMACS.
This can be installed separately (see installation instructions
here <https://manual.gromacs.org/documentation/2021.2/install-guide/index.html>).
To run liquid density iterations, follow the following steps:
- Initialize Signac workflow
conda activate hfcs-fffit cd Build_GPs/ python init_LD.py - Run LD iterations
- Note: rm -r workspace/ signac_project_document.json signac.rc will remove everything and allow you to start fresh if you mess up
cd Build_GPs/ld_iters/
python project.py submit -o LD --bundle=12 --parallel
- Check status a few times throughout the process
python project.py status
Note: Step 2 operation LD runs multiple operations in series. Alternatively, the following can be run one at a time: 4. Alternative submission
python project.py run -o create_forcefield
python project.py submit -o create_system
python project.py run -o fix_topology
python project.py submit -o em_sim --bundle=12 --parallel
python project.py submit -o nvt_eq_sim --bundle=12 --parallel
python project.py submit -o npt_eq_sim --bundle=12 --parallel
python project.py submit -o npt_prod_sim --bundle=12 --parallel
python project.py submit -o calculate_props --bundle=12 --parallel
- Extract densities, run GP optimization and get samples for the next iteration in the Build_GPs/ directory
- Note: vle_iters/ will be populated with parameter sets automatically once the termination criteria for LD iters is satisfied
qsub -N postLD submit_job post_ld.py
To run surface tension iterations, follow the following steps:
- Initialize Signac workflow
conda activate hfcs-fffit cd Build_GPs/ python init_IFT.py - Run ST iterations
- Note: rm -r workspace/ signac_project_document.json signac.rc will remove everything and allow you to start fresh if you mess up
cd Build_GPs/vle_iters/
python project.py submit -o IFT #--bundle=2 --parallel
- Check status a few times throughout the process
python project.py status
Note: Step 2 operation IFT runs multiple operations in series. Alternatively, the following can be run one at a time: 4. Alternative submission
python project.py run -o create_forcefield
python project.py submit -o create_system
python project.py run -o fix_topology
python project.py submit -o em_sim #--bundle=2 --parallel
python project.py submit -o nvt_eq_sim #--bundle=2 --parallel
python project.py submit -o npzzat_eq_sim #--bundle=2 --parallel
python project.py submit -o npzzat_prod_sim #--bundle=2 --parallel
python project.py submit -o init_inter_eq_sim #--bundle=2 --parallel
python project.py submit -o inter_eq_sim #--bundle=2 --parallel
python project.py submit -o inter_prod_sim #--bundle=2 --parallel
python project.py submit -o calculate_props #--bundle=2 --parallel
- Extract properties, run GP optimization and get samples for the next iteration in the Build_GPs/ directory
qsub -N postVLE submit_job post_vle.py
To run generalized FF calibration, follow the following steps:
- Initialize Signac workflow
conda activate hfcs-fffit cd Opt_ES/ python init_opt_at.py - Run LD iterations
- Note: rm -r workspace/ signac_project_document.json signac.rc will remove everything and allow you to start fresh if you mess up
cd Opt_ES/opt_ff/
python project.py submit -o OptGenFF --bundle=12 --parallel
- Check status a few times throughout the process
python project.py status
Note: Step 2 operation OptGenFF multiple operations in series. Alternatively, the following can be run one at a time: 4. Alternative submission
python project.py submit -o gen_pareto_sets --bundle=12 --parallel
python project.py submit -o run_obj_alg --bundle=12 --parallel
- Extract and analyze data:
qsub -N postOptAnaly ../submit_job post_analysis_opt.py
To validate the generalized FF from above, follow the following steps:
- Initialize Signac workflow
conda activate hfcs-fffit cd Opt_ES/ - Run GEMC validation simulations (Pvap, rho_v, Hvap) - Simulatenous with Steps 5-7
- Note: rm -r workspace/ signac_project_document.json signac.rc will remove everything and allow you to start fresh if you mess up
cd Opt_ES/gemc_val_xxx/
python project.py submit -o NPT_NVT --bundle=6 --parallel
python project.py submit -o EQ --bundle=6 --parallel
python project.py submit -o PROD --bundle=6 --parallel
Note: In step 2, "xxx" represents which FF is being validated. use "no_opt" for the Base FF and "opt" for the GP-optimized FF. 3. Check status a few times throughout the process
python project.py status
Note: Step 2 operation is multiple operations in series. Alternatively, the following can be run one at a time: 4. Alternative submission
python project.py run -o create_forcefield
python project.py run -o calc_boxes
python project.py submit -o NVT_liqbox --bundle=4 --parallel
python project.py run -o extract_final_NVT_config
python project.py submit -o NPT_liqbox --bundle=4 --parallel
python project.py run -o extract_final_NPT_config
python project.py submit -o run_gemc_eq --bundle=4 --parallel
python project.py submit -o gemc_eq_restart --bundle=4 --parallel (optional)
python project.py submit -o run_gemc_prod --bundle=4 --parallel
python project.py submit -o check_prod_data --bundle=4 --parallel
python project.py submit -o calculate_props --bundle=4 --parallel
python project.py run -o del_job (optional)
- Run validation MD interface simulations (rho_l and surface tension) - Simultaneously with Steps 2-4
- Note: rm -r workspace/ signac_project_document.json signac.rc will remove everything and allow you to start fresh if you mess up
cd Opt_ES/ift_val_xxx/
python project.py submit -o IFT #--bundle=2 --parallel
Note: In step 5, "xxx" represents which FF is being validated. use "no_opt" for the Base FF and "opt" for the GP-optimized FF. 6. Check status a few times throughout the process
python project.py status
Note: Step 2 operation IFT runs multiple operations in series. Alternatively, the following can be run one at a time: 7. Alternative submission
python project.py run -o create_forcefield
python project.py submit -o create_system
python project.py run -o fix_topology
python project.py submit -o em_sim #--bundle=2 --parallel
python project.py submit -o nvt_eq_sim #--bundle=2 --parallel
python project.py submit -o npzzat_eq_sim #--bundle=2 --parallel
python project.py submit -o npzzat_prod_sim #--bundle=2 --parallel
python project.py submit -o init_inter_eq_sim #--bundle=2 --parallel
python project.py submit -o inter_eq_sim #--bundle=2 --parallel
python project.py submit -o inter_prod_sim #--bundle=2 --parallel
python project.py submit -o calculate_props #--bundle=2 --parallel
- Extract and analyze data:
cd Opt_ES/ qsub -N postOptAnaly ../submit_job post_analysis_val.py cd ../ qsub -N makeCorrAll submit_job make_corr_figs_all_molec.py qsub -N makeCorrInd submit_job make_corr_figs_one_molec.py qsub -N makeGPSimSense submit_job make_GP_vs_sim_and_sens.py qsub -N makeIFTValfigs submit_job make_ift_val_figs.py qsub -N makeParamComp submit_job make_param_comp_figs.py qsub -N makeParetoComp submit_job make_pareto_comp_figs.py
Note: The second set of jobs create final figures for the SI and main text. Alternatively, these can be created by running the jupyter notebook create_analysis_figs.ipynb.
The instructions outlined above seem to be system-dependent. In some cases, users have the following error:
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found
If you observe this, please try the following in the terminal
export LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH
which should fix the problem. This is not an optimal solution and is something we would like to address. We found that related projects 1, 2 have similar issues. If you are aware of a robust solution to this issue, please let us know by raising an issue or sending an email!
This research is based upon work supported by the National Science Foundation under award number EEC-2330175 for the Engineering Research Center EARTH. Computing resources were provided by the Center for Research Computing (CRC) at the University of Notre Dame. MC acknowledges support from the Graduate Assistance in Areas of National Need fellowship from the Department of Education, grant number P200A210048.
Please contact Montana Carlozo (mcarlozo@nd.edu) with any questions, suggestions, or issues.