Add new field inputs to the spec#444
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
4ead1d4 to
7d739b1
Compare
22c4a41 to
5585654
Compare
input_schema to the specinputs to the spec
5585654 to
81b3107
Compare
| * *id* (optional): graph identifier unique to a database of graphs (Default: "notspecified") | ||
| * *label* (optional): non-unique label to be used when identifying a graph for human consumption | ||
| * *schema_version* (optional): the schema version of this graph representation (Default: "1.0") | ||
| * *schema_version* (optional): the schema version of this graph representation (Default: "1.2") |
There was a problem hiding this comment.
Schema version was updated to 1.2
81b3107 to
a33bd7c
Compare
There was a problem hiding this comment.
The more I read the code, the more confusing it gets. I have no idea what the content of the graph attribute inputs is supposed to be. I see the demo example, I see the Pydantic model and they do not match.
Maybe we can start with what it needs to be in python at runtime. We are defining a list of parameters aliases. So I expect to see a list of dict's. Each alias has a name and one or more target (or alternatively the same alias name can be used more than once). A target is a node id and a parameter name.
I think the confusion comes from trying to mimic the CLI. But this is conceptually very different imo. For example
Input alias "IN" has a type and a description. It maps to parameter "in" from "node1", parameter "in" for "node2", parameter "in1" from "node3" and parameter "in2" from "node3". These are not typo's.
Output aliases would be only 1-to-1 mapping (unless 1-to-many would yield a dict but not sure what the keys would be, perhaps the node's but I don't see a use case)
Output alias "OUT" has a type and a description. It maps to output "result" of "node3".
We just need node id's here. The mapping is very explicit.
As I understand this PR is only the model. I suggest we try to actually implement it to understand what the best content of graph "inputs" is. My intuition tells me 1-to-many for inputs and 1-to-1 for outputs and only use node id's. But I would need to implement it it be sure.
If we want to introduce the concept of "required graph inputs" as you did in the demo example, I would first think about what this means, at what level this is checked and how this shows up in "ewoks show".
Talking about "ewoks show": we could think of
ewoks show: only show the workflow inputsewoks show --advanced: current table with in brackets the aliases if any
Most workflow won't have workflows inputs so not sure. I guess ewoks show and ewoks show --advanced would be the same in that case.
To be discussed in person.
Edit: In this context I use
- "input alias" as synonym for "workflow input parameter"
- "output alias" as synonym for "workflow output parameter"
| "properties": { | ||
| "task0a": { | ||
| "type": "number", | ||
| "__ewoks__": {"name": "a", "id": "task0"}, | ||
| }, | ||
| "task0b": { | ||
| "type": "number", | ||
| "default": 0, | ||
| "__ewoks__": {"name": "b", "id": 2}, | ||
| }, |
There was a problem hiding this comment.
Using the word "task" or "node" in the parameter name is confusing. I would propose something else. Maybe all-caps for workflow inputs and smallcaps for task inputs. Like global and local variable names in python. Just for the example I mean, this is not something enforced by ewoks.
What happens if I have parameter "B" and I want this to be parameter "b" of task1 and task2? Can I do that?
Also, "task0" does not have parameter "a" and "b" and there is no "id=2":
╒════════╤════════════════╤═══════════════════╤═══════╕
│ Name │ Value │ Task identifier │ Id │
╞════════╪════════════════╪═══════════════════╪═══════╡
│ list │ [0, 1, 2] │ SumList │ task0 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumList │ task0 │
├────────┼────────────────┼───────────────────┼───────┤
│ b │ <MISSING_DATA> │ SumTask │ task1 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumTask │ task1 │
├────────┼────────────────┼───────────────────┼───────┤
│ a │ 2 │ SumTask │ task2 │
├────────┼────────────────┼───────────────────┼───────┤
│ b │ <MISSING_DATA> │ SumTask │ task2 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumTask │ task2 │
├────────┼────────────────┼───────────────────┼───────┤
│ b │ 3 │ SumTask │ task3 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumTask │ task3 │
├────────┼────────────────┼───────────────────┼───────┤
│ b │ 4 │ SumTask │ task4 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumTask │ task4 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumTask │ task5 │
├────────┼────────────────┼───────────────────┼───────┤
│ b │ 6 │ SumTask │ task6 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay │ 0 │ SumTask │ task6 │
╘════════╧════════════════╧═══════════════════╧═══════╛
| "__ewoks__": {"name": "b", "id": 2}, | ||
| }, | ||
| }, | ||
| "required": ["task0a"], |
There was a problem hiding this comment.
Required in what sense? The workflow will fail if this parameter is not provided? What happens if I do not provide workflow inputs and I just provide node inputs?
There was a problem hiding this comment.
For the GUI it can be useful to know whether something is required or not.
| EwoksGraph(**graph_dict) | ||
|
|
||
|
|
||
| def test_input_schema(): |
There was a problem hiding this comment.
The test is only testing malformed inputs. Perhaps reflect this in the name of the test.
| except ValidationError: | ||
| raise |
There was a problem hiding this comment.
Capture and re-raise. What is the purpose?
| except SchemaError: | ||
| raise |
There was a problem hiding this comment.
Capture and re-raise. What is the purpose?
| requirements: Sequence[str] = [] | ||
| input_nodes: Sequence[EwoksNodeAlias] = [] | ||
| output_nodes: Sequence[EwoksNodeAlias] = [] | ||
| inputs: Dict[str, Any] = {} |
There was a problem hiding this comment.
What is this dict? It maps what to what? I would expect this to be a List[EwoksWorkflowInput].
Also when defining aliases, we'll need 1-to-many (1 parameter alias, many targets).
There was a problem hiding this comment.
Ok so inputs is a JSON Schema which is an instance of the JSON Schema meta-schema https://json-schema.org/draft/2020-12/schema which itself is a JSON Schema.
Draft 2020-12 meta-schema
(a JSON Schema)
↓
validates JSON Schemas
↓
which validate JSON instances
Examples
JSON schema for a single integer
meta_schema = {
"type": "object",
"properties": {
"type": {
"type": "string"
}
}
}schema = {
"type": "number"
}instance = 42JSON schema for a str->int dictionary with keys required value1 and optional value2
meta_schema = {
"type": "object",
"properties": {
"type": {
"type": "string"
}
}
}schema = {
"type": "object",
"properties": {
"value1": {
"type": "number"
},
"value2": {
"type": "number"
},
},
"required": ["value"]
}instance = {
"value1": 42,
"value2": 99, # optional
}Ewoks graph attribute inputs
So this is a schema, not meta_schema or instance. But it cannot be just any schema because ultimately we want to define some kind of parameter mapping on the one hand but then the user provides parameter values on the other hand.
There was a problem hiding this comment.
Suggestion after discussion: https://confluence.esrf.fr/display/AAWWK/Workflow+inputs+and+outputs#Workflowinputsandoutputs-LLMsuggestions
|
A json schema would look like this {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/product.schema.json",
"title": "Product",
"description": "A product from Acme's catalog",
"type": "object",
"properties": {
"productId": {
"description": "The unique identifier for a product",
"type": "integer"
},
"productName": {
"description": "Name of the product",
"type": "string"
},
"price": {
"description": "The price of the product",
"type": "number",
"exclusiveMinimum": 0
}
},
"required": [ "productId", "productName", "price" ]
}But we only have "properties". Is that still a valid JSON schema? Edit: it seems so. Providing a schema (which itself has a schema) in the instance of another schema (ewoks graph). My brain hurts. No human can be expected to write this by hand. Maybe if the spec in the docs clearly provide a simple example but more complex stuff like anyOf is not for humans. It should be generated from the json schema of the |
|
For reference, execution inputs/outputs (parameters of Execution inputs/outputs (execute_graph arguments)inputs = [
{"all": True, "name":"in1", "value":1}, # target all nodes
{"name":"in2", "value":2}, # target all start nodes
{"label": "My Label", "name":"in3", "value":3}, # target all nodes with a specific label
{"task_identifier": "MyTask", "name":"in4", "value":4}, # target all nodes with a specific task identifier
{"id": "mynode", "name":"in5", "value":5}, # target a node with a specific id
]
outputs = [
{"all": True, "name":"out1"}, # target all nodes
{"name":"out2"}, # target all end nodes
{"label": "My Label", "name":"out3"}, # target all nodes with a specific label
{"task_identifier": "MyTask", "name":"out4"}, # target all nodes with a specific task identifier
{"id": "mynode", "name":"in5"}, # target a node with a specific id
]Node aliases (graph attributes)input_nodes = [
{"id": "in1", "node": "id1"},
{"id": "in2", "node": "id2", "sub_node": "start"}
]
output_nodes = [
{"id": "out1", "node": "id3"},
{"id": "out2", "node": "id2", "sub_node": "end"}
]The "id" is the alias, the "node" is the target node id in the current graph, "sub_node" is the target node id in the sub-graph. Note that this is all id's, not labels, task identifiers, all=True or all=False. Workflow inputs/outputs (graph attributes)I was thinking about this in the same way as node aliases. You were thinking about this in the same way as execution inputs/outputs. Both are valid ideas so to be discussed. The choice would mostly depend on how workflow inputs/outputs are defined by a human. Secondary but not unimportant criterion is the implementation. |
|
Here is an example (LLM generated stuff) if we go for execute_graph arguments instead of parameter aliases workflow_schema = {
"type": "object",
"properties": {
# Fan-out to all nodes and one specific node
"GAIN": {
"type": "number",
"title": "Gain",
"x-ewoks-targets": [
{"name": "in1", "all": True},
{"name": "in7", "id": "node3"},
],
},
# Default/start-node targeting
"OFFSET": {
"type": "number",
"x-ewoks-targets": [
{"name": "in2"}
],
},
# Label-based targeting
"THRESHOLD": {
"type": "integer",
"x-ewoks-targets": [
{"name": "threshold", "label": "Detector"}
],
},
# Task identifier targeting
"EXPOSURE": {
"type": "number",
"x-ewoks-targets": [
{"name": "exposure_time", "task_identifier": "AcquireImage"}
],
},
# Multiple specific nodes, different input names
"FILENAME": {
"type": "string",
"x-ewoks-targets": [
{"name": "filename", "id": "reader1"},
{"name": "output_file", "id": "writer1"}
],
},
# Same workflow input sent to several kinds of targets
"DEBUG": {
"type": "boolean",
"default": False,
"x-ewoks-targets": [
{"name": "debug", "all": True,},
{"name": "verbose", "label": "Special"},
{"name": "enabled", "task_identifier": "Logger"},
],
},
},
"required": ["GAIN", "FILENAME"],
}User provides these workflow inputs when executing the workflow values = {
"GAIN": 2.5,
"OFFSET": 1.0,
"THRESHOLD": 100,
"EXPOSURE": 0.05,
"FILENAME": "data.h5",
"DEBUG": True,
}Ewoks generates inputs = [
{"all": True, "name": "in1", "value": 2.5},
{"id": "node3", "name": "in7", "value": 2.5},
{"name": "in2", "value": 1.0},
{"label": "Detector", "name": "threshold", "value": 100},
{"task_identifier": "AcquireImage", "name": "exposure_time", "value": 0.05},
{"id": "reader1", "name": "filename", "value": "data.h5"},
{"id": "writer1", "name": "output_file", "value": "data.h5"},
{"all": True, "name": "debug", "value": True},
{"label": "Special", "name": "verbose", "value": True},
{"task_identifier": "Logger", "name": "enabled", "value": True},
]And then the Pydantic model instead of inputs: Dict[str, Any] = {}inputs: WorkflowInputSchemafrom typing import Any, Literal
from pydantic import BaseModel, Field
class EwoksParameterTarget(BaseModel):
name: str
all: bool | None = None
id: str | None = None
label: str | None = None
task_identifier: str | None = None
class WorkflowInput(BaseModel):
type: Literal[
"string",
"number",
"integer",
"boolean",
"object",
"array",
]
title: str | None = None
default: Any = None
x_ewoks_targets: list[EwoksParameterTarget] = Field(
alias="x-ewoks-targets",
default_factory=list,
)
model_config = {
"populate_by_name": True,
}
class WorkflowInputSchema(BaseModel):
type: Literal["object"] = "object"
properties: dict[str, WorkflowInput]
required: list[str] = Field(default_factory=list)Something similar would need to be done for outputs. I would do both at the same time. This model might be too restrictive. We should probably allow anyof etc. Nevertheless it cannot be just any json schema. So maybe just this class WorkflowInputSchema(BaseModel):
type: Literal["object"] = "object"
properties: dict[str, Any]
required: list[str] = Field(default_factory=list) |
Add new field
inputsingraph. This new field is a JSON schema defining the inputs of the workflow, each input being an entry of thepropertiesfield. See https://json-schema.org/learn/getting-started-step-by-step#add-the-properties-object or the updateddemoworkflow for an example.The key of a
propertiesentry can be anything. We just need to be able to define inputs with distinguishable keys.But, since we need more info to address the inputs to execute, I also added a
__ewoks__field to the properties entries. This field will be ignored when validating the JSON schema.The
__ewoks__field should define a workflow input similarly to the CLI. The input name is mandatory since we need it to pass it to the workflow. Then, the input should be addressable by label, by task identifier, by node id or should be passed to all nodes. A specific Pydantic model was added to check this.I did not handle the
inputswhen executing for now since we first have to discuss the syntax and model itself.