Skip to content

jmill1125/binding_free_energy

 
 

Repository files navigation

Binding Free Energy Workflow

This repository provides a complete pipeline for calculating the binding free energy of molecular systems using alchemical free energy methods. The workflow combines molecular dynamics (MD) simulations, free energy perturbation (FEP), and Bennett acceptance ratio (BAR) analysis to compute binding affinities for host-guest complexes.


Overview

The pipeline implements an annihilation method to calculate binding free energies:

  1. Guest in solution: Annihilate the guest's interactions in a solvated environment
  2. Host-guest complex: Annihilate the guest's interactions within the host-guest complex
  3. Binding free energy: Calculate the difference between these two processes

The workflow uses OpenMM for MD simulations with alchemical transformations, applying a lambda schedule to annihilate (turn off) van der Waals and electrostatic interactions. Free energy differences are calculated using the Bennett Acceptance Ratio (BAR) method via pymbar.


Host Assumption and Limitations

This project currently assumes the host is hydroxypropyl beta cyclodextrin (HP-BCD).

  • The repository includes host files in HP-BCD/ and the workflow explicitly loads hp-bcd.pdb and hp-bcd.xml.
  • Defaults and templates are tailored to HP-BCD, for example:
    • Workflow.py uses hp-bcd.xml during solvation/setup and sets Host restraint Group 1 to a curated HP-BCD index list by default (see Workflow Automatic Selections and Defaults).
    • Prepare.py copies hp-bcd.pdb, hp-bcd.xml, and related files into template directories.
    • Docked complex examples and paths use HP-BCD naming conventions.

Using a different host requires programmatic changes to this codebase (not just swapping files). At minimum you will need to:

  • Replace HP-BCD-specific file paths and filenames in Workflow.py and Prepare.py.
  • Update default restraint atom selections (e.g., the 1-217 host index range) and any host-size assumptions.
  • Provide the correct host PDB/XML/PRM files and adjust directory names and job templates as needed.

If you plan to generalize beyond HP-BCD, consider adding configuration options for host files, host index ranges for restraints, and template selection, and then updating the documentation accordingly.

Requirements

System Requirements

  • Linux/Unix environment (tested on cluster systems)
  • Python 3.x
  • GPU with CUDA or OpenCL support (OpenMM platform)
    • Production, BAR, and Energy scripts request the CUDA platform by default; Equil and Solvate let OpenMM choose the default platform unless you override it
    • CPU platform may work for small tests but is not recommended for production

Software Dependencies

This project uses a pinned conda environment defined in environment.yml (with a small pip section). The following packages are included:

Required Packages

  1. OpenMM (with OpenMM-Setup)
    Molecular dynamics simulation engine

  2. pymbar (version >= 4)
    Free energy calculations and BAR analysis https://pymbar.readthedocs.io/en/stable/

  3. MDTraj
    Trajectory analysis and manipulation https://www.mdtraj.org/1.9.8.dev0/index.html

  4. PDBFixer
    PDB file preparation and solvation https://github.com/openmm/pdbfixer

  5. Open Babel (Python bindings)
    Used to read SDF and write XYZ for docked guest coordinates
    https://open-babel.readthedocs.io/

  6. intspan
    Interval handling utilities https://pypi.org/project/intspan/

  7. Standard scientific Python stack

    • numpy
    • scipy
    • (typically included with conda environment)

Bundled Software

  • Force Field X (FFX): Version 1.0.0 is bundled in the ffx-1.0.0/ directory. Used for force field parameterization and file conversions. If a new version is needed, download from the official Force Field X website.
  • A minimum Java version of 21 is required to run this bundled tool:

Installation

Step 1: Install Miniconda

If not already installed, follow the official Miniconda installation guide.

Step 2: Clone Repository

git clone <repository_url>
cd binding_free_energy

Step 3: Create Conda Environment from environment.yml

conda env create -f environment.yml

This will create a conda environment named binding_free_energy (as defined in the file) with pinned versions for Python, OpenMM, MDTraj, PDBFixer, and other dependencies. A small set of packages is installed via pip as specified under the pip: section.

Step 4: Activate the Environment

conda activate binding_free_energy

Notes:

  • To update an existing environment to match the file: conda env update -f environment.yml --prune
  • OpenMM platforms: the environment provides a CUDA toolkit. Select your platform at runtime. Production, BAR, and Energy scripts explicitly request CUDA by default; Equil and Solvate let OpenMM choose the default platform. To force a platform globally, set OPENMM_DEFAULT_PLATFORM (e.g., CUDA or OpenCL).
  • Java requirement for FFX: Force Field X tools require Java 21. If Java 21 is not available on your system, install it (e.g., conda install -c conda-forge openjdk=21) or use a system-wide JDK 21.

Input Files

The workflow requires prepared input files for both the guest molecule and host-guest complex:

Guest Molecule

  • <guest>.xyz: Atomic coordinates from PolType (placed in Guests/ directory)
  • <guest>.prm: Force field parameters from PolType
  • <guest>.pdb: PDB format structure (generated by Prepare.py)
  • <guest>.xml: OpenMM force field XML (generated by Prepare.py)

Host-Guest Complex

  • hp-bcd.pdb: Host molecule structure
  • hp-bcd.xml: Host force field parameters
  • Docked structure: SDF file containing docked guest coordinates (host pose from hp-bcd.pdb is used as-is)

Note: The current workflow is tailored to HP-BCD as the host; using a different host requires code changes (see "Host Assumption and Limitations").


Scripts and Entry Points

Main Workflow Script

Workflow.py

Purpose: Main orchestrator for the complete binding free energy calculation workflow.

Usage:

python Workflow.py --guest_name <name> [options]

Required Arguments:

  • --guest_name: Name of guest molecule (must match XYZ file in Guests/ directory)

Optional Arguments:

  • --setup_only <true|false>: Run setup only without submitting jobs (default: false)
  • --start_at <method>: Start workflow at specific step (options: setup_directories, submit_equil, submit_thermo, submit_bar, collect_energy)
  • --run_equilibration <true|false>: Whether to run equilibration (default: true)
  • --alchemical_atoms <list>: Comma-separated atom indices for alchemical transformation (auto-detected from guest if not specified)
  • --restraint_type <COM|BORESCH>: Restraint model to use (default: COM)
  • --restraint_atoms1 <list>: Comma-separated indices for restraint group 1
  • --restraint_atoms2 <list>: Comma-separated indices for restraint group 2
  • --submission_system <SGE|SLURM>: Job scheduler to target for job scripts and submission (default: SGE)

Scheduler Selection and Job Templates

  • Use --submission_system to select the scheduler targeted by job scripts and templates.
    • SGE (default): templates are loaded from JobFiles/SGE
    • SLURM: templates are loaded from JobFiles/SLURM
  • The workflow copies the appropriate equil.job, thermo.job, and bar.job from the selected template folder into your working directories.

Automatic Selections and Defaults

  • If --alchemical_atoms is not provided, the workflow auto-detects the alchemical atom range from the first line of Guests/<NAME>.xyz and sets it to 1-<N>.
  • For the Host_Guest leg, if restraint atoms are not provided:
    • Group 1 defaults to a host index list tailored to HP-BCD: 11,16,17,20,23,26,31,32,39,40,51,63,64,70,71,82,94,95,101,102,113,125,126,132,133,144,156,157,163,164,175,187,188,191,198
    • Group 2 is auto-selected using the built-in calculate_restraint_subsection function with a 5.0 Å cutoff around the guest. The indices are reported to the console.

Usage

Full Workflow (Default Behavior):

To run the full workflow including job submission:

python Workflow.py --guest_name Diflunisal

Setup Only::

The workflow supports a --setup_only flag that allows you to run the workflow to only set up directories and files without submitting any jobs or scripts.

python Workflow.py --guest_name Diflunisal --setup_only true

Next Steps After Setup-Only:

After running in setup-only mode, you can:

  1. Review the generated files in:

    • workflow/<GuestName>/Guest_Workflow/
    • workflow/<GuestName>/Host_Guest_Workflow/
  2. Manually submit jobs if needed, or

  3. Continue the workflow from the equilibration step:

    python Workflow.py --guest_name Diflunisal --start_at submit_equil

Output Files

Simulation Outputs

  • output*.dcd: Trajectory files in DCD format
  • output*.xml: Simulation state files for restart
  • output*.log: Simulation log files
  • *_solv.pdb: Solvated structures

Analysis Outputs

  • BAR free energy differences printed to stdout
  • TODO: Add file-based output for BAR results and final binding free energies

The following scripts are utilized within Workflow.py

Preparation Scripts

Prepare.py

Purpose: Convert guest molecule files from PolType output to OpenMM-compatible formats and merge with host structures.

Usage:

python Prepare.py --guest_file <xyz> --prm_file <prm> --host_file_dir <dir> --target_dir <dir> --docked_file <sdf>

Required Arguments:

  • --guest_file: Guest XYZ file from PolType
  • --prm_file: PRM parameter file from PolType
  • --host_file_dir: Directory containing host PDB and XML files
  • --target_dir: Output directory for prepared files
  • --docked_file: SDF file containing the docked guest coordinates (only guest coordinates are used; host pose from hp-bcd.pdb is unchanged)

Note on coordinate handling:

  • The docked SDF should contain the guest conformer pose from docking. Only the guest coordinates are taken from this file.
  • Prepare.py replaces the guest PDB coordinates with those from the SDF before creating the host-guest complex. The host pose from hp-bcd.pdb remains unchanged.
  • After merging host and guest, no additional post-merge coordinate copying is performed.

Solvate.py

Purpose: Add explicit solvent to a molecular system using PDBFixer and perform energy minimization.

Usage:

python Solvate.py --pdb_file <pdb> --forcefield_file <xml> [<xml> ...]

Required Arguments:

  • --pdb_file: Input PDB file
  • --forcefield_file: One or more OpenMM XML force field files

Output: Creates <filename>_solv.pdb with solvated and minimized structure.

Simulation Scripts

Equil.py

Purpose: Run equilibration MD simulation.

Usage:

python Equil.py --pdb_file <pdb> --forcefield_file <xml> [options]

Common Arguments:

  • --pdb_file: Input PDB structure
  • --forcefield_file: Force field XML file(s)
  • --nonbonded_method: Method for nonbonded interactions (NoCutoff, CutoffNonPeriodic, PME, Ewald)
  • --nonbonded_cutoff: Cutoff distance in nm (default: 1.0)
  • --num_steps: Number of simulation steps
  • --step_size: Integration time step in fs
  • --use_restraints: Enable harmonic restraints
  • --restraint_type: Restraint model to use (COM or BORESCH, default: COM)
  • --restraint_atoms_1, --restraint_atoms_2: Atom indices for restraints
  • --restraint_constant: Force constant for restraints (kcal/mol/Ų)
  • --restraint_lower_distance, --restraint_upper_distance: Flat-bottom restraint distances (Å)

Production.py

Purpose: Run production MD simulation with alchemical transformations.

Usage:

python Production.py --pdb_file <pdb> --forcefield_file <xml> --vdw_lambda <value> --elec_lambda <value> --alchemical_atoms <list> [options]

Additional Arguments:

  • --vdw_lambda: Van der Waals lambda value (0.0 to 1.0)
  • --elec_lambda: Electrostatic lambda value (0.0 to 1.0)
  • --alchemical_atoms: Comma-separated atom indices to transform

Analysis Scripts

BAR.py

Purpose: Calculate free energy differences between adjacent lambda windows using Bennett Acceptance Ratio method.

Usage:

python BAR.py --traj_i <dcd> --traj_ip1 <dcd> --pdb_file <pdb> --forcefield_file <xml> --vdw_lambda_i <value> --vdw_lambda_ip1 <value> --elec_lambda_i <value> --elec_lambda_ip1 <value> --alchemical_atoms <list> [options]

Required Arguments:

  • --traj_i: Trajectory file at lambda i (DCD format)
  • --traj_ip1: Trajectory file at lambda i+1
  • --pdb_file: Reference PDB structure
  • --forcefield_file: Force field XML file(s)
  • --vdw_lambda_i, --vdw_lambda_ip1: VDW lambda values
  • --elec_lambda_i, --elec_lambda_ip1: Electrostatic lambda values
  • --alchemical_atoms: Atoms undergoing transformation

Optional Arguments:

  • --start: Starting frame index (default: 0)
  • --stop: Ending frame index (default: all frames)
  • --step_size: Frame sampling interval (default: 1)
  • --nonbonded_method: Nonbonded method
  • --nonbonded_cutoff: Cutoff distance
  • --restraint_lambda_i, --restraint_lambda_ip1: Restraint scaling parameters for states i and i+1 (default: 1.0)

Output: Prints forward/reverse FEP and BAR free energy differences with uncertainties.

Energy.py

Purpose: Calculate potential energy of a system at specific alchemical lambda values.

Usage:

python Energy.py --pdb_file <pdb> --forcefield_file <xml> --vdw_lambda <value> --elec_lambda <value> --alchemical_atoms <list> [options]

Supports the same optional restraint flags as Equil.py (e.g., --use_restraints, --restraint_type, --restraint_atoms_1, --restraint_atoms_2, etc.). By default, this script requests the CUDA platform.

Utility Scripts

alchemistry/Freefix.py

Purpose: Calculate analytical corrections for harmonic restraint contributions to binding free energy.

Usage:

python alchemistry/Freefix.py <ri> <fi> <ro> <fo> <temp>

Arguments:

  • ri: Inner flat-bottom radius (Å)
  • fi: Inner force constant (kcal/mol/Ų)
  • ro: Outer flat-bottom radius (Å)
  • fo: Outer force constant (kcal/mol/Ų)
  • temp: Temperature (K)

Note: Currently implemented with HARMONIC method; BORESCH method not yet implemented.


Troubleshooting

Common Issues

  1. OpenMM Platform Issues (OpenCL/CUDA)

    • Production, BAR, and Energy request CUDA by default. Verify available platforms:
      • python - <<'PY'\nimport openmm\nprint([openmm.Platform.getPlatform(i).getName() for i in range(openmm.Platform.getNumPlatforms())])\nPY
    • To force CUDA (if installed): set OPENMM_DEFAULT_PLATFORM=CUDA or modify the script to use CUDA.
    • If CUDA not found, ensure CUDA toolkit and OpenMM CUDA package are correctly installed.
  2. Restraint Warnings

    • If you see "WARNING! Restraints triggered" messages, the system may have drifted outside restraint boundaries
    • Consider adjusting restraint parameters or checking system stability
  3. Force Field X Errors

    • Ensure ffx-1.0.0/bin is accessible
    • Check Java installation for FFX execution
  4. Memory Issues

    • Large systems may require significant GPU memory
    • Reduce trajectory output frequency or system size if needed

Contributing

TODO: Add contribution guidelines


License

TODO: No license file is currently present in the repository. Please add appropriate license information.


Citation

TODO: Add citation information for publications using this workflow.


Contact and Support

TODO: Add contact information for questions and support.


Acknowledgments

This workflow uses the following software packages:

  • OpenMM for molecular dynamics simulations
  • pymbar for free energy analysis
  • MDTraj for trajectory processing
  • Force Field X for force field parameterization

References

TODO: Add relevant literature references for:

  • Alchemical free energy methods
  • Bennett Acceptance Ratio method
  • Annihilation approach
  • Force field parameterization

Last Updated: 2026-04-06 13:02

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 63.7%
  • Shell 18.1%
  • Batchfile 8.4%
  • Groovy 4.4%
  • Jupyter Notebook 4.0%
  • MATLAB 1.4%