Skip to content

Commit a2edc72

Browse files
authored
Merge pull request #782 from KaplanOpenSource/issue641
docs(dev-guide): Comprehensive Hermes workflow toolkit developer guide
2 parents 0ebeb80 + 9e13517 commit a2edc72

1 file changed

Lines changed: 347 additions & 8 deletions

File tree

Lines changed: 347 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,350 @@
1-
# Hermes workflow toolkit
1+
# Hermes Workflow Toolkit
22

3-
The `hermesWorkflowToolkit` is the base class for simulation toolkits that support workflow pipelines:
4-
5-
| Feature | Description |
6-
|---------|-------------|
7-
| Workflow groups | Organize simulations into named groups |
8-
| Chained steps | Chain multiple simulation steps (preprocessing → solving → post-processing) |
9-
| Template support | Use templates for reproducible case setup |
3+
The `hermesWorkflowToolkit` is the database-backed workflow orchestrator that wraps the [Hermes](https://github.com/KaplanOpenSource/hermes) library. It manages the full lifecycle of simulation workflows: creating them from JSON, storing them in MongoDB with auto-generated names, building them into Luigi task DAGs, executing them, and comparing parameter variations.
104

115
`OFToolkit` inherits from `hermesWorkflowToolkit`, gaining workflow support automatically.
6+
7+
---
8+
9+
## Architecture overview
10+
11+
### Relationship to Hermes
12+
13+
The toolkit is a **wrapper** around the Hermes workflow engine:
14+
15+
| Layer | Class | Role |
16+
|-------|-------|------|
17+
| **Hera toolkit** | `hermesWorkflowToolkit` | MongoDB storage, naming, retrieval, comparison, execution orchestration |
18+
| **Hermes workflow** | `hermes.workflow` | JSON → task DAG construction, parameter extraction, node management |
19+
| **Hermes engine** | `hermes.engines.luigi.builder` | Task DAG → Luigi Python module code generation |
20+
| **Hermes wrapper** | `hermes.taskwrapper.wrapper` | Wraps each node into a TaskWrapper with dependencies, parameters, and executer resolution |
21+
| **Luigi** | `luigi.Task` | Dependency-based execution engine with target-file state tracking |
22+
23+
### Class hierarchy
24+
25+
```
26+
abstractToolkit
27+
28+
hermesWorkflowToolkit (hera/simulations/hermesWorkflowToolkit.py)
29+
├─ OFToolkit (hera/simulations/openFoam/toolkit.py)
30+
│ └─ adds OFObjectHome, Analysis, Presentation, solver extensions
31+
└─ workflowToolkit [LSM variant] (hera/simulations/LSM/hermesWorkflowToolkit.py)
32+
└─ imports hermes handlers (handler_build, handler_expand, handler_execute)
33+
```
34+
35+
### Key constants
36+
37+
```python
38+
DOCTYPE_WORKFLOW = "hermesWorkflow" # MongoDB document type
39+
DESC_GROUPNAME = "groupName" # desc field keys
40+
DESC_GROUPID = "groupID"
41+
DESC_WORKFLOWNAME = "workflowName"
42+
DESC_PARAMETERS = "parameters"
43+
```
44+
45+
---
46+
47+
## Workflow JSON structure
48+
49+
A workflow is a directed acyclic graph (DAG) of **nodes**, where each node represents a simulation step (copy files, generate mesh, run solver, etc.):
50+
51+
```json
52+
{
53+
"workflow": {
54+
"solver": "simpleFoam",
55+
"root": "finalnode_xx",
56+
"nodeList": ["blockMesh", "decomposePar", "simpleFoam", "reconstructPar"],
57+
"nodes": {
58+
"blockMesh": {
59+
"type": "openFOAM.mesh.BlockMesh",
60+
"Execution": {
61+
"input_parameters": {
62+
"vertices": [[0,0,0], [100,0,0], ...],
63+
"cellCount": [50, 50, 20]
64+
}
65+
},
66+
"requires": []
67+
},
68+
"simpleFoam": {
69+
"type": "openFOAM.system.ControlDict",
70+
"Execution": {
71+
"input_parameters": {
72+
"executeDir": "{blockMesh.output}",
73+
"endTime": 1000
74+
}
75+
},
76+
"requires": ["blockMesh", "decomposePar"]
77+
}
78+
}
79+
}
80+
}
81+
```
82+
83+
### Parameter referencing
84+
85+
Nodes reference outputs from other nodes using curly-brace syntax:
86+
87+
| Pattern | Meaning |
88+
|---------|---------|
89+
| `{nodeName.param}` | Output parameter from another node |
90+
| `{workflow.param}` | Workflow-level parameter |
91+
| `{WebGui.formData.field}` | GUI form data (Hermes workbench) |
92+
93+
These references are resolved by the `TaskWrapper` to build the dependency graph automatically.
94+
95+
---
96+
97+
## Workflow lifecycle
98+
99+
### 1. Create: load workflow from JSON
100+
101+
```python
102+
# Dynamic class resolution based on solver field.
103+
# If solver is null → hermes.workflow (generic)
104+
# If solver is "simpleFoam" → hera.simulations.openFoam.OFWorkflow.workflow_simpleFoam
105+
wf = toolkit.getHermesWorkflowFromJSON(
106+
workflow="path/to/workflow.json", # or dict or JSON string
107+
name="my_workflow",
108+
resource="/path/to/case",
109+
)
110+
```
111+
112+
Internally, `pydoc.locate()` dynamically imports the workflow class based on the `solver` field. This allows solver-specific workflow subclasses to add custom node types and validation.
113+
114+
### 2. Add to database
115+
116+
```python
117+
doc = toolkit.addWorkflowToGroup(
118+
workflowJSON="path/to/workflow.json",
119+
groupName="dispersion",
120+
writeWorkflowToFile=False,
121+
resource=None, # auto-generated if None
122+
)
123+
# Creates document: dispersion_0001 (auto-incremented)
124+
```
125+
126+
**Idempotent add:** The method first queries the DB by parameter content. If an identical workflow already exists, it returns the existing document instead of creating a duplicate.
127+
128+
**What gets stored:**
129+
```python
130+
doc = self.addSimulationsDocument(
131+
resource=resource,
132+
dataFormat=datatypes.STRING,
133+
type="hermesWorkflow",
134+
desc={
135+
"groupName": groupName,
136+
"groupID": groupID,
137+
"workflowName": workflowName, # e.g. "dispersion_0001"
138+
"solver": hermesWF.solver,
139+
"workflow": hermesWF.json, # full workflow JSON
140+
"parameters": hermesWF.parametersJSON, # flattened for querying
141+
}
142+
)
143+
```
144+
145+
The `parameters` are stored separately from the full `workflow` JSON to enable efficient MongoDB queries via `dictToMongoQuery()` without parsing the workflow tree.
146+
147+
### 3. Build: generate Luigi tasks
148+
149+
```python
150+
# Inside executeWorkflowFromDB():
151+
build = hermesWF.build(buildername=workflow.BUILDER_LUIGI)
152+
```
153+
154+
The build process:
155+
156+
1. **`hermes.workflow._buildNetwork()`** traverses the JSON and creates `TaskWrapper` objects for each node
157+
2. Each `TaskWrapper` extracts its dependencies by scanning `{node.param}` references in input parameters
158+
3. **`LuigiBuilder.buildWorkflow()`** converts the task graph into a Python module containing Luigi Task classes
159+
4. The generated module has one class per task (e.g. `blockMesh_0`, `simpleFoam_0`) with:
160+
- `requires()` → returns upstream task instances
161+
- `output()` → returns `luigi.LocalTarget` for state tracking
162+
- `run()` → calls the Hermes executer for that node type
163+
164+
### 4. Execute: run Luigi pipeline
165+
166+
```python
167+
toolkit.executeWorkflowFromDB("dispersion_0001")
168+
```
169+
170+
Execution steps:
171+
172+
1. Retrieve workflow document from DB
173+
2. Reconstruct `hermes.workflow` object from stored JSON
174+
3. Build Luigi task DAG (generates Python module)
175+
4. Write workflow JSON + Python module to `filesDirectory`
176+
5. **Clean previous targets** — remove `{name}_targetFiles/` to force re-execution
177+
6. **Run Luigi** via command line: `python3 -m luigi --module {name} finalnode_xx_0 --local-scheduler`
178+
7. Clean up generated Python module (JSON stays)
179+
180+
The `finalnode_xx` is a synthetic terminal node that depends on all actual nodes — executing it triggers the entire DAG.
181+
182+
### 5. Compare and retrieve
183+
184+
**Cascading search**`getWorkflowDocumentFromDB(input)` tries four strategies in order:
185+
186+
1. **By name** — exact match on `workflowName` (e.g. `"dispersion_0001"`)
187+
2. **By resource** — match by file/directory path
188+
3. **By group name** — returns all workflows in the group (e.g. `"dispersion"`)
189+
4. **By JSON content** — parses input as JSON, extracts parameters, queries by flattened parameter values via `dictToMongoQuery()`
190+
191+
**Comparison:**
192+
193+
```python
194+
# Compare all workflows in a group
195+
diff = toolkit.compareWorkflowInGroup("dispersion")
196+
# Returns DataFrame: rows = differing parameters, columns = workflow names
197+
198+
# Compare specific workflows
199+
diff = toolkit.compareWorkflows(["dispersion_0001", "dispersion_0002"])
200+
201+
# List all workflows in a group
202+
toolkit.listWorkflows("dispersion", listNodes=True, listParameters=True)
203+
```
204+
205+
---
206+
207+
## MongoDB storage
208+
209+
### Document schema
210+
211+
```json
212+
{
213+
"_id": "ObjectId",
214+
"_cls": "Metadata.Simulations",
215+
"projectName": "MY_PROJECT",
216+
"type": "hermesWorkflow",
217+
"resource": "/path/to/dispersion_0001",
218+
"dataFormat": "string",
219+
"desc": {
220+
"groupName": "dispersion",
221+
"groupID": 1,
222+
"workflowName": "dispersion_0001",
223+
"solver": "simpleFoam",
224+
"workflow": { "workflow": { "nodeList": [...], "nodes": {...} } },
225+
"parameters": {
226+
"blockMesh": { "vertices": [...], "cellCount": [...] },
227+
"simpleFoam": { "endTime": 1000 }
228+
}
229+
}
230+
}
231+
```
232+
233+
### Querying
234+
235+
All queries filter by `type="hermesWorkflow"`. Parameters are queried using `dictToMongoQuery()` which flattens nested dicts into MongoEngine's double-underscore notation:
236+
237+
```python
238+
# {"parameters": {"simpleFoam": {"endTime": 1000}}}
239+
# → {"parameters__simpleFoam__endTime": 1000}
240+
```
241+
242+
---
243+
244+
## Group naming convention
245+
246+
**Format:** `<groupName>_<zero-padded-4-digit-id>`
247+
248+
```python
249+
# Examples: dispersion_0001, flow_0042, lsm_0003
250+
formatted_number = "{0:04d}".format(flowID)
251+
return f"{baseName}_{formatted_number}"
252+
```
253+
254+
**Counter mechanism:** Each group has its own counter stored in the project config via `getCounterAndAdd(groupName)`. The counter auto-increments on each `addWorkflowToGroup()` call.
255+
256+
**Parsing names:**
257+
```python
258+
base, id = toolkit.splitWorkflowName("dispersion_0001")
259+
# base = "dispersion", id = "0001"
260+
```
261+
262+
---
263+
264+
## LSM variant
265+
266+
**Source:** `hera/simulations/LSM/hermesWorkflowToolkit.py`
267+
268+
The LSM `workflowToolkit` extends the base with additional handler imports for the expand → build → execute pipeline:
269+
270+
| Handler | Purpose |
271+
|---------|---------|
272+
| `handler_expand` | Expand workflow templates with meteorological data |
273+
| `handler_build` | Build the workflow into executable tasks |
274+
| `handler_buildExecute` | Combined build + execute in one step |
275+
| `handler_execute` | Execute a pre-built workflow |
276+
277+
This supports LSM's pattern where workflows are first **expanded** with site-specific meteorological data before being built and executed. The base toolkit only uses build → execute.
278+
279+
---
280+
281+
## Key methods reference
282+
283+
### Workflow creation
284+
285+
| Method | Purpose |
286+
|--------|---------|
287+
| `getHermesWorkflowFromJSON(workflow, name, resource)` | Create workflow object from JSON (file/dict/string) |
288+
| `getHemresWorkflowFromDocument(documentList, returnFirst)` | Reconstruct workflow from DB document |
289+
| `updateDocumentWorkflow(document, workflow)` | Update stored JSON + parameters |
290+
291+
### Workflow storage
292+
293+
| Method | Purpose |
294+
|--------|---------|
295+
| `addWorkflowToGroup(workflowJSON, groupName, ...)` | Add workflow to DB with auto-naming |
296+
| `addWorkflowFileInGroup(workflowFilePath, write_file)` | Add from file, infer group from filename |
297+
| `findAvailableName(simulationGroup)` | Get next available ID + name |
298+
| `getworkFlowName(baseName, flowID)` | Format `<base>_<padded_id>` |
299+
| `splitWorkflowName(workflow_name)` | Parse `<base>_<id>` |
300+
301+
### Workflow retrieval
302+
303+
| Method | Purpose |
304+
|--------|---------|
305+
| `getHermesWorkflowFromDB(input, returnFirst)` | Retrieve as hermes.workflow object |
306+
| `getWorkflowDocumentFromDB(input, doctype, dockind)` | Retrieve raw document (cascading search) |
307+
| `getWorkflowDocumentByName(name, doctype, dockind)` | Retrieve by exact name |
308+
| `getWorkflowListDocumentFromDB(input)` | Retrieve as list of documents |
309+
| `getWorkflowDocumentsInGroup(groupName)` | All documents in a group |
310+
| `getWorkflowListOfSolvers(solverName)` | All documents for a solver |
311+
312+
### Comparison and listing
313+
314+
| Method | Purpose |
315+
|--------|---------|
316+
| `compareWorkflows(Workflow, longFormat, transpose)` | Compare workflows by name or list |
317+
| `compareWorkflowInGroup(workflowGroup)` | Compare all in a group |
318+
| `compareWorkflowObj(workflowList)` | Compare workflow objects directly |
319+
| `listWorkflows(workflowGroup, listNodes, listParameters)` | List workflows with optional details |
320+
| `listGroups(solver, workflowName)` | List all groups in the project |
321+
| `workflowTable(workflowGroup)` | Alias for `compareWorkflowInGroup` |
322+
323+
### Execution and cleanup
324+
325+
| Method | Purpose |
326+
|--------|---------|
327+
| `executeWorkflowFromDB(input)` | Build + execute via Luigi |
328+
| `deleteWorkflowInGroup(workflowGroup, deepDelete, resetCounter)` | Delete workflows, optionally remove files |
329+
330+
### Templates
331+
332+
| Method | Purpose |
333+
|--------|---------|
334+
| `listHermesSolverTemplates(solverName)` | List loaded templates for a solver |
335+
| `getHermesFlowTemplate(hermesFlowName)` | Get a flow template |
336+
| `listHermesNodesTemplates()` | List all node templates |
337+
| `getHermesNodeTemplate(hermesNodeName)` | Get a node template |
338+
339+
---
340+
341+
## Cross-references
342+
343+
| What | Where |
344+
|------|-------|
345+
| User guide | [Toolkits > Simulations > Hermes Workflows](../../toolkits/simulations/workflows.md) |
346+
| OpenFOAM toolkit (inherits from hermesWorkflowToolkit) | [Simulations > OpenFOAM](openfoam.md) |
347+
| LSM toolkit | [Simulations > LSM](lsm.md) |
348+
| Hermes library | `pyHermes/hermes/` (workflow, engines, taskwrapper) |
349+
| API reference | [API > Simulations](../api/simulations.md) |
350+
| Workflow examples | [Examples > Workflows](../../examples/workflows.md) |

0 commit comments

Comments
 (0)