
Tutorial Y


Workflows

A job is an individual unit of computation.

A workflow is a series of jobs that must be performed to produce a desired output; workflows are also called pipelines. To be truly high-throughput, we need automated workflows.

There are many workflow management tools available: AiiDA, Joblib, AFLOW, ASE, etc.

These are used to build automated databases such as the Materials Project, OQMD, Materials Cloud, JARVIS, etc.

Jobflow is the successor to FireWorks, which is what was used to build the Materials Project.

Workflows are executed by workflow engines; workflow libraries encode the steps of the workflows themselves.

Pymatgen is a tool for transformation, processing, and input preparation. Custodian is a tool for real-time error correction.

A job can be any process, but it must return some kind of serialisable output object (such as an int, float, bool, list, or dict).

MSONable types are also allowed; MSONable is the underlying data structure of pymatgen.

Jobs are defined using decorators.
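For instance, a minimal sketch using jobflow's @job decorator (the add function and its values are purely illustrative):

from jobflow import Flow, job
from jobflow.managers.local import run_locally

@job
def add(a, b):
    # Any plain function becomes a job; its return value must be serialisable.
    return a + b

# The output of the first job feeds the second, forming a two-step workflow.
job1 = add(1, 2)
job2 = add(job1.output, 3)
flow = Flow([job1, job2])

responses = run_locally(flow)  # execute the whole flow in the current process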

JSON (JavaScript Object Notation) is designed to be both human- and machine-readable.
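As a sketch of how these pieces fit together, any MSONable pymatgen object can round-trip through JSON via monty's encoder and decoder:

import json

from monty.json import MontyDecoder, MontyEncoder
from pymatgen.core import Lattice, Structure

si = Structure(Lattice.cubic(5.43), ["Si"], [[0, 0, 0]])
serialised = json.dumps(si, cls=MontyEncoder)        # Structure -> JSON string
restored = json.loads(serialised, cls=MontyDecoder)  # JSON string -> Structure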

Pydantic is used for validation and serialisation through type annotations.

"TaskDocuments" are Pydantic models with a function that initialises them from a directory or similar.

Model, schema, and document are considered interchangeable, as are workflow and pipeline.
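A minimal sketch of such a TaskDocument (the field names and the from_directory helper are hypothetical, not the actual atomate2 schema; assumes Pydantic v2):

from pathlib import Path

from pydantic import BaseModel
from pymatgen.io.vasp.outputs import Vasprun

class RelaxTaskDoc(BaseModel):
    # Fields are validated against the type annotations when the model is created.
    formula: str
    energy: float

    @classmethod
    def from_directory(cls, path: str) -> "RelaxTaskDoc":
        # Hypothetical initialiser that builds the document from a VASP run directory.
        vasprun = Vasprun(str(Path(path) / "vasprun.xml"))
        return cls(
            formula=vasprun.final_structure.composition.formula,
            energy=vasprun.final_energy,
        )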

Jobflow is the language used to construct workflows; jobflow-remote executes them. jobflow-remote is composed of a User, a Runner, and a Worker.

On POWER

To run this kind of workflow on the POWER system, we have our own database server. This server runs MongoDB, which can be accessed with the command "mongosh" (short for Mongo Shell). Unfortunately, this can't be installed on the login node, so once you are on POWER you will need to set up passwordless access to our compute node. Currently this is compute-0-422 (132.66.112.243).
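A minimal sketch of setting up that passwordless access from the login node (assuming standard OpenSSH tooling is available):

ssh-keygen -t rsa             # accept the defaults and leave the passphrase empty
ssh-copy-id compute-0-422     # copy your public key to the compute node
ssh compute-0-422             # should now log in without a password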

On a Windows machine with Jupyter

🧪 Atomate2 + VASP Setup on Windows (Step 1: Local Environment & POSCAR Generation)

This guide walks you through setting up a working Atomate2-compatible environment on Windows, using Anaconda, and generating your first VASP input file (a POSCAR) using pymatgen.

✅ By the end of this step, you'll be able to create crystal structures locally and write them to disk, with no kernel crashes.


✅ Prerequisites

  • Windows 10 or 11
  • Anaconda Distribution
  • Optional: Git Bash or WSL for UNIX-style tools
  • Access to a remote HPC cluster (via VPN and SSH keys)
  • Basic knowledge of Python and Jupyter Notebooks

📦 Step 1: Create a Conda Environment

Open Anaconda Prompt, then create and activate your environment:

conda create -n atomate2_env python=3.10 -y
conda activate atomate2_env

🔧 Step 2: Install Required Packages

Install all scientific libraries using the conda-forge channel. This ensures consistent binaries and avoids DLL issues (especially those involving Intel MKL):

conda install -c conda-forge pymatgen numpy scipy spglib mkl

This command will:

Install pymatgen for materials structure manipulation

Install spglib for symmetry analysis

Ensure compatible versions of numpy, scipy, and Intel MKL are included

🧪 Step 3: Launch Jupyter Notebook

Still in your activated environment, launch Jupyter Notebook:

jupyter notebook

A browser window should open with the Jupyter interface.

Create a new notebook and paste the following into the first cell:

from pymatgen.core import Lattice, Structure

# Create and write the conventional diamond-cubic silicon cell (a = 5.43 Å).
# from_spacegroup only needs the symmetrically distinct site at (0, 0, 0);
# the Fd-3m symmetry operations generate the remaining atoms of the cell.
structure = Structure.from_spacegroup(
    "Fd-3m",
    lattice=Lattice.cubic(5.43),
    species=["Si"],
    coords=[[0, 0, 0]],
)

structure.to(filename="POSCAR", fmt="poscar")
print("✅ POSCAR written locally.")

If successful, you should see the output:

✅ POSCAR written locally.

A new file named POSCAR will appear in the same directory as the notebook.

✅ Status Check

At this point, you've successfully:

Created a working Conda environment for Atomate2

Installed all required dependencies with no kernel crashes

Verified that pymatgen and spglib run correctly on Windows

Generated a VASP-compatible POSCAR file locally

You are now ready to move on to working with remote clusters, submitting jobs, and using the full Atomate2 workflow.

โญ๏ธ Next Steps Coming soon:

Step 2: Connecting to a remote HPC using Fabric and Paramiko

Step 3: Submitting VASP jobs via SLURM

Step 4: Parsing and storing results with Atomate2

🔗 Atomate2 + VASP Setup on Windows (Step 2: Remote Connection via SSH)

In this step, you'll connect from your Windows machine to a remote HPC cluster using Python code inside a Jupyter notebook. This is useful for scripting file transfers and job submissions using tools like Fabric.


✅ Prerequisites

  • You have completed Step 1 and are inside your atomate2_env Conda environment.
  • You can connect to your HPC manually via ssh your_username@your_hpc_hostname from a terminal.
  • You have an SSH private key located at: C:\Users\<YourUsername>\.ssh\id_rsa
  • You are connected to your HPC's network (e.g. via VPN)

🔧 Step 1: Install SSH Libraries

Use the following command in Anaconda Prompt to install fabric and its dependency paramiko:

conda install -c conda-forge fabric paramiko

💻 Step 2: Test SSH Connection from Python

Open your Jupyter Notebook and paste this into a new cell (adjust username and host as needed):

from fabric import Connection
import os

# Adjust this to your HPC login node
hpc_host = "your_hpc_hostname"  # e.g., "powerslurm-login"
username = "your_username"      # e.g., "lalbu"

# Path to your SSH private key
key_path = os.path.expanduser("~/.ssh/id_rsa")

# Create the SSH connection
conn = Connection(
    host=hpc_host,
    user=username,
    connect_kwargs={"key_filename": key_path}
)

# Test the connection by running 'hostname' on the remote machine
result = conn.run("hostname", hide=True)
print("✅ Connected to remote host:", result.stdout.strip())

If successful, you'll see output like:

✅ Connected to remote host: powerslurm-login

If not, double-check the following:

You're connected to VPN (if your HPC requires it)

key_path is correct and your private key has no passphrase (or use an agent)

The hostname and username are accurate

๐Ÿ“ Step 3: Upload a File (Optional) To test file upload from your local machine to the HPC:

conn.put("POSCAR", remote="POSCAR")  # remote paths are relative to your home directory; SFTP does not expand "~"
print("✅ POSCAR uploaded to remote home directory.")

🛑 Known Issues

If you get Permission denied (publickey), check your SSH key and permissions.

If you get kernel crashes, make sure you are not using paramiko.RSAKey.from_private_key(...) directly. Let fabric handle the key via key_filename.

✅ Status Check

At this point, you've:

Established SSH connectivity to your HPC from a Jupyter notebook

Verified execution of remote shell commands

(Optionally) Uploaded files using Python

You're now ready to move on to submitting VASP jobs and automating workflows.

โญ๏ธ Next Steps Coming soon:

Step 3: Submitting VASP jobs via SLURM

Step 4: Automating result parsing and storage using Atomate2

โš™๏ธ Atomate2 + VASP Setup on Windows (Step 3: Submitting VASP Jobs via SLURM)

In this step, you'll prepare and submit a VASP relaxation job to your HPC cluster using a SLURM batch script, all managed through Python and Fabric.


✅ Prerequisites

Before starting this step, make sure:

  • You've completed Step 2
  • You can connect to your HPC via Fabric from your Jupyter notebook
  • You have access to VASP on your cluster (via a module system or absolute path)
  • You have a working SSH connection with file transfer capability
  • A valid POSCAR file exists locally

📂 Step 1: Prepare VASP Input Files

You need the following files for a basic VASP job:

  • POSCAR (structure)
  • INCAR (calculation settings)
  • KPOINTS (k-point mesh)
  • POTCAR (pseudopotentials; usually copied manually or generated)

You can create these in Python (a sketch follows the upload example below) or upload them from your local machine.

For example, to upload files:

conn.run("mkdir -p vasp_job")  # make sure the remote directory exists first
conn.put("POSCAR", remote="vasp_job/POSCAR")
conn.put("INCAR", remote="vasp_job/INCAR")
conn.put("KPOINTS", remote="vasp_job/KPOINTS")
conn.put("POTCAR", remote="vasp_job/POTCAR")
print("✅ All VASP input files uploaded.")
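If you would rather generate the INCAR and KPOINTS in Python, a sketch using pymatgen (the INCAR tags are illustrative, not a converged setup):

from pymatgen.io.vasp.inputs import Incar, Kpoints

# A small relaxation INCAR; adjust the tags for your own system.
incar = Incar({"ENCUT": 520, "ISMEAR": 0, "SIGMA": 0.05, "IBRION": 2, "NSW": 50})
incar.write_file("INCAR")

# A Gamma-centred 4x4x4 k-point mesh.
kpoints = Kpoints.gamma_automatic((4, 4, 4))
kpoints.write_file("KPOINTS")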

๐Ÿ“ Step 2: Generate a SLURM Batch Script Create a run.slurm script locally or remotely using the following template:

slurm_script = """#!/bin/bash
#SBATCH --job-name=vasp_relax
#SBATCH --output=vasp.out
#SBATCH --error=vasp.err
#SBATCH --ntasks=16
#SBATCH --time=01:00:00
#SBATCH --partition=standard

module load vasp  # or use the full path to your VASP binary

srun vasp_std > vasp.log
"""

# Write it to file and upload
with open("run.slurm", "w") as f:
    f.write(slurm_script)

conn.put("run.slurm", remote="vasp_job/run.slurm")
print("✅ SLURM batch script uploaded.")

Customize the SLURM options according to your cluster configuration.

🚀 Step 3: Submit the SLURM Job

conn.run("cd ~/vasp_job && sbatch run.slurm")
print("✅ SLURM job submitted.")

You can also check the job queue using:

conn.run("squeue -u your_username")

Replace your_username with your actual HPC login name.

๐Ÿ” Step 4: Monitor Job Output Once the job is running or finished, you can fetch the output:

conn.get("vasp_job/vasp.out", local="vasp.out")  # remote paths are relative to your home directory
conn.get("vasp_job/vasp.err", local="vasp.err")
conn.get("vasp_job/vasp.log", local="vasp.log")
print("✅ Output files downloaded for inspection.")
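If you want the notebook to wait for the job to finish, here is a minimal polling sketch (the job ID is a placeholder; capture the real one from the sbatch output):

import time

job_id = "123456"  # placeholder; parse this from the "Submitted batch job ..." message
while True:
    result = conn.run(f"squeue -h -j {job_id}", hide=True, warn=True)
    if not result.stdout.strip():  # squeue prints nothing once the job leaves the queue
        print("✅ Job finished.")
        break
    time.sleep(30)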

If you're running a larger workflow, you can also automate parsing with pymatgen or Atomate2 in later steps.

✅ Status Check

At this point, you've:

Uploaded VASP input files to your cluster

Created a run.slurm batch file for SLURM

Submitted a job from Python via SSH

Optionally retrieved output files to your local machine

You're now ready to parse the results and store them in a structured database or MongoDB.

โญ๏ธ Next Steps Coming soon:

Step 4: Parsing and storing VASP results with Atomate2

Optional: Using make-relax-job.py and parse-and-store.py automation scripts

🧾 Atomate2 + VASP Setup on Windows (Step 4: Parsing and Storing VASP Results)

In this final step, you'll parse the output of your VASP relaxation job using atomate2 tools and optionally store the results into a MongoDB database for analysis and workflow tracking.


✅ Prerequisites

  • You've completed Step 3
  • Your VASP job completed successfully on your HPC
  • You have access to the output files (vasprun.xml, OUTCAR, etc.)
  • You have installed:
    • atomate2
    • jobflow
    • pymatgen
    • monty
    • (optional) MongoDB running locally or remotely

💾 Step 1: Install Atomate2 + MongoDB Tools

In your Conda environment (atomate2_env), run:

conda install -c conda-forge atomate2 jobflow monty pymatgen

If you plan to store results in MongoDB:

conda install -c conda-forge pymongo

📥 Step 2: Download VASP Output Files (if needed)

If your calculation ran remotely, download the outputs to your local machine:

conn.get("vasp_job/vasprun.xml", local="vasprun.xml")
conn.get("vasp_job/OUTCAR", local="OUTCAR")

🧪 Step 3: Parse the Results with Pymatgen

You can parse the vasprun.xml and OUTCAR locally using pymatgen:

from pymatgen.io.vasp.outputs import Vasprun, Outcar

vasprun = Vasprun("vasprun.xml")
outcar = Outcar("OUTCAR")

final_structure = vasprun.final_structure
energy = vasprun.final_energy
print("✅ Final energy:", energy)
print("✅ Final structure formula:", final_structure.composition.formula)

This gives you access to the relaxed structure, energy, band gap, and much more.
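For example, the band-edge properties come straight from the same Vasprun object:

# (band gap, CBM, VBM, direct-gap flag), with energies in eV
band_gap, cbm, vbm, is_direct = vasprun.eigenvalue_band_properties
print(f"✅ Band gap: {band_gap:.3f} eV (direct: {is_direct})")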

🧰 Step 4: Use Atomate2's VASP Makers (Optional)

If you're using Atomate2, you can use its built-in job makers, which also handle output parsing:

from atomate2.vasp.jobs.core import RelaxMaker

# A maker bundles input generation, execution, and output parsing for a job type.
maker = RelaxMaker()

Atomate2's makers are most useful inside full jobflow workflows; for simple result inspection, pymatgen is often enough.
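As a sketch of that fuller usage (assuming VASP and the atomate2 settings, such as pseudopotential paths, are configured on the machine running it):

from atomate2.vasp.jobs.core import RelaxMaker
from jobflow import run_locally

# Build a relaxation job for the silicon structure from Step 1 and run it;
# the outputs are parsed into a task document and stored in the configured jobflow store.
relax_job = RelaxMaker().make(structure)
responses = run_locally(relax_job, create_folders=True)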

๐Ÿ—ƒ๏ธ Step 5: Store Results in MongoDB (Optional) If youโ€™re running a MongoDB server (locally or remotely), you can insert your results:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")  # Or replace with your server address
db = client.vasp_database

# Example document
doc = {
    "formula": final_structure.composition.formula,
    "energy": energy,
    "structure": final_structure.as_dict(),
    "vasprun": vasprun.as_dict()
}

db.calculations.insert_one(doc)
print("✅ Stored result in MongoDB.")

You can later query this data, analyze it, or use it to construct further workflows.
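For example, a quick query back out of the collection (the formula filter is illustrative):

# Find stored calculations by formula and print their energies.
for record in db.calculations.find({"formula": "Si8"}):
    print(record["formula"], record["energy"])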

✅ Status Check

You've now:

Parsed the results of your VASP relaxation job using pymatgen

Optionally used atomate2 or jobflow tools

Optionally stored results into MongoDB

This completes your end-to-end local + remote Atomate2-style workflow on Windows.

📚 Optional: Automate All This

In your Atomate2 project, you can write a Python script like parse-and-store.py to do all the parsing and storage automatically. You can call this at the end of your SLURM job using:

python parse-and-store.py --job-dir .

Or include it in your run.slurm like this:

python parse-and-store.py --job-dir . > postprocess.log 2>&1

๐Ÿ Conclusion Youโ€™ve now completed a full pipeline for running and analyzing VASP jobs with Python:

Generated structure (POSCAR) using pymatgen

Connected to HPC using fabric

Submitted VASP jobs with sbatch

Parsed and optionally stored the results

You're ready to build higher-level workflows, generate datasets, or explore the full capabilities of Atomate2.

Happy computing! ⚛️
