This repository provides a collection of scripts to perform active learning for efficiently training highly accurate machine learning potentials with the MACE model.
Active learning trains machine learning models with fewer labeled data points by repeatedly selecting the most informative candidates (here, the configurations on which an ensemble of models disagrees most) and adding only those to the training set. This repository provides tools to perform the following active learning cycle:
- Ensemble of Models: Train an ensemble of models (typically 3-10) on an existing dataset, using a different random seed for each model. In the first iteration, use a small but diverse initial dataset.
- Molecular Dynamics (MD): Run MD simulations using the current MACE models to explore the potential energy surface of the system.
- Identify Informative Frames: Predict energies and forces for each MD frame with every model in the ensemble and measure how much the models disagree (the uncertainty). Select the most uncertain frames for labeling.
- Quantum Mechanical (QM) Calculations: Perform QM calculations (e.g., DFT) on the selected frames to obtain reference energies and forces.
- Add New Data to the Training Set: Add the new QM-labeled data to the existing training set.
- Repeat: Repeat the cycle until the model's performance meets the desired criteria.
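The cycle above can be sketched as a plain Python loop. This is a minimal illustration, not the repository's driver: every step is passed in as a callable (`train_ensemble`, `run_md`, `select_uncertain`, `label_with_qm` and `converged` are hypothetical names), so the same skeleton applies whether each step runs locally or as a cluster job.

```python
from typing import Any, Callable, List, Sequence


def active_learning_loop(
    train_set: List[Any],
    train_ensemble: Callable[[List[Any]], Sequence[Any]],
    run_md: Callable[[Sequence[Any], List[Any]], List[Any]],
    select_uncertain: Callable[[Sequence[Any], List[Any]], List[Any]],
    label_with_qm: Callable[[List[Any]], List[Any]],
    converged: Callable[[Sequence[Any]], bool],
    max_iterations: int = 10,
) -> List[Any]:
    """Train -> explore -> select -> label -> augment, until converged."""
    for _ in range(max_iterations):
        models = train_ensemble(train_set)                # ensemble, one seed per model
        if converged(models):                             # stop once the criteria are met
            break
        candidates = run_md(models, train_set)            # explore the PES with MD
        selected = select_uncertain(models, candidates)   # keep only the most uncertain frames
        if not selected:                                  # nothing informative left to add
            break
        train_set = train_set + label_with_qm(selected)   # QM labels join the training set
    return train_set
```

Because training happens at the top of each iteration, data added in the final pass of the loop is not yet in any model; a final ensemble would be trained on the returned set.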
This repository is organized into modules for each major step of the workflow. The submit_*.sh scripts are examples for submitting jobs to a cluster using a scheduler like Slurm and will need to be adapted to your specific computing environment.
This directory contains a script for training a MACE model.
submit_mace_example.sh: A shell script for submitting a single MACE model training job to a batch scheduling system.
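Ensemble members for the active learning cycle differ only in their random seed. The sketch below builds one `mace_run_train` command per seed. The flags are options of the MACE training CLI, but the values, the file name `train.xyz` and the helper `mace_train_command` are placeholder assumptions, not this repository's settings; check `mace_run_train --help` for the options supported by your installed version.

```python
import shlex
from typing import List


def mace_train_command(name: str, train_file: str, seed: int,
                       r_max: float = 5.0, max_num_epochs: int = 100) -> List[str]:
    """Build a mace_run_train command line for one ensemble member (placeholder values)."""
    return [
        "mace_run_train",
        f"--name={name}",
        f"--train_file={train_file}",
        "--valid_fraction=0.05",
        "--model=MACE",
        "--E0s=average",
        f"--r_max={r_max}",
        f"--max_num_epochs={max_num_epochs}",
        f"--seed={seed}",
    ]


# One command per ensemble member; only the seed and the run name differ.
ensemble = [mace_train_command(f"mace_seed{s}", "train.xyz", seed=s) for s in range(3)]
for cmd in ensemble:
    print(shlex.join(cmd))
```

Each command could be executed with `subprocess.run(cmd, check=True)` or pasted into a copy of the submission script.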
This directory contains scripts for performing hyperparameter optimization for the MACE model.
grid_search_example.py: Script for performing a grid search over a defined hyperparameter space. It can also be used to train an ensemble of models for active learning.
random_search_example.py: Script for performing a random search over a hyperparameter space.
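Grid search enumerates every combination of the listed values, while random search draws a fixed number of configurations. The helper names below are hypothetical and the search space is a placeholder whose keys mirror `mace_run_train` options. A grid that varies only `seed`, with everything else fixed, is exactly an ensemble of models, which is why a grid search script can double as an ensemble trainer.

```python
import itertools
import random
from typing import Dict, List, Sequence


def grid_search_configs(space: Dict[str, Sequence]) -> List[dict]:
    """Every combination of the listed hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


def random_search_configs(space: Dict[str, Sequence], n_trials: int,
                          seed: int = 0) -> List[dict]:
    """n_trials configurations, each value drawn independently (repeats are possible)."""
    rng = random.Random(seed)
    return [{k: rng.choice(list(v)) for k, v in space.items()}
            for _ in range(n_trials)]


# Hypothetical search space with placeholder values.
space = {"r_max": [4.0, 5.0, 6.0], "num_channels": [64, 128], "seed": [1, 2, 3]}
print(len(grid_search_configs(space)))  # 3 * 2 * 3 = 18 configurations
```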
This directory contains the core scripts for running the active learning cycle.
prepare_al_data.py: Randomly selects starting configurations for the active learning MD runs from the current training set.
active_learning_array.py: Runs MD simulations from the structures selected in the previous step, tracks the uncertainty across the model ensemble, and saves the candidate frames.
collect_al_frames.py: Gathers the most uncertain frames from all candidates. These frames are then labeled with QM calculations.
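The uncertainty metric is not specified here, so the following is one common choice, stated as an assumption: for each frame, take the standard deviation of the predicted forces across the ensemble, convert it to a per-atom magnitude, and keep the largest atomic value. The optional `max_uncertainty` cutoff is also an assumption; frames with very large disagreement often come from MD runs that have become unphysical and may be better discarded than labeled. The sketch assumes all frames have the same number of atoms so that the predictions stack into one array, and the function names are illustrative.

```python
import numpy as np


def force_uncertainty(ensemble_forces: np.ndarray) -> np.ndarray:
    """Per-frame uncertainty from ensemble force predictions.

    ensemble_forces has shape (n_models, n_frames, n_atoms, 3).
    """
    std = ensemble_forces.std(axis=0)          # spread across models, (n_frames, n_atoms, 3)
    per_atom = np.linalg.norm(std, axis=-1)    # per-atom magnitude, (n_frames, n_atoms)
    return per_atom.max(axis=-1)               # worst atom per frame, (n_frames,)


def select_most_uncertain(uncertainty: np.ndarray, n_select: int,
                          max_uncertainty: float = np.inf) -> np.ndarray:
    """Indices of the n_select most uncertain frames, most uncertain first."""
    candidates = np.flatnonzero(uncertainty < max_uncertainty)  # drop likely-unphysical frames
    order = np.argsort(uncertainty[candidates])[::-1]           # descending uncertainty
    return candidates[order[:n_select]]


# Random stand-in data: 4 models, 50 frames, 8 atoms.
rng = np.random.default_rng(0)
forces = rng.normal(scale=0.05, size=(4, 50, 8, 3))
print(select_most_uncertain(force_uncertainty(forces), n_select=5))
```

Candidate frames from every MD run would be concatenated before calling `select_most_uncertain`, so that the selection is global rather than per run.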
This directory provides scripts for comprehensive evaluation of the trained MACE potentials.
evaluate_mace_model.py: A general script to evaluate a trained MACE model on a test set, calculating metrics such as the MAE and RMSE of energies and forces.
evaluate_trajectory.py / evaluate_all_trajectories.py: Evaluate the model's performance on one or multiple trajectories, comparing MACE predictions to reference data.
evaluation_single_frame.py / evaluate_all_frames.py: Evaluate the model on specific single frames or on all frames of a trajectory file.
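A minimal sketch of the error metrics, assuming energies in eV and forces in eV/Å. Normalizing energy errors per atom and computing force errors over individual Cartesian components are common conventions, but the evaluation scripts may report them differently; the function names are illustrative.

```python
import numpy as np


def mae(pred, ref) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))


def rmse(pred, ref) -> float:
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2)))


def energy_force_metrics(e_pred, e_ref, n_atoms, f_pred, f_ref) -> dict:
    """Energy errors per atom and force errors per Cartesian component."""
    n_atoms = np.asarray(n_atoms, dtype=float)
    e_pred_pa = np.asarray(e_pred) / n_atoms
    e_ref_pa = np.asarray(e_ref) / n_atoms
    # Frames may differ in size, so force arrays are flattened into one list of components.
    f_pred_flat = np.concatenate([np.ravel(f) for f in f_pred])
    f_ref_flat = np.concatenate([np.ravel(f) for f in f_ref])
    return {
        "energy_mae_per_atom": mae(e_pred_pa, e_ref_pa),
        "energy_rmse_per_atom": rmse(e_pred_pa, e_ref_pa),
        "force_mae": mae(f_pred_flat, f_ref_flat),
        "force_rmse": rmse(f_pred_flat, f_ref_flat),
    }
```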
This directory contains scripts for running molecular dynamics simulations using a trained MACE potential.
run_nvt_md.py: Runs an MD simulation in the canonical (NVT) ensemble using the MACE potential as the calculator via the Atomic Simulation Environment (ASE).