Skip to content

Add new field inputs to the spec#444

Open
loichuder wants to merge 2 commits into
mainfrom
add-input-schema
Open

Add new field inputs to the spec#444
loichuder wants to merge 2 commits into
mainfrom
add-input-schema

Conversation

@loichuder

@loichuder loichuder commented Apr 28, 2026

Copy link
Copy Markdown
Member

Add new field inputs in graph. This new field is a JSON schema defining the inputs of the workflow, each input being an entry of the properties field. See https://json-schema.org/learn/getting-started-step-by-step#add-the-properties-object or the updated demo workflow for an example.

The key of a properties entry can be anything. We just need to be able to define inputs with distinguishable keys.

But, since we need more info to address the inputs to execute, I also added a __ewoks__ field to the properties entries. This field will be ignored when validating the JSON schema.

The __ewoks__ field should define a workflow input similarly to the CLI. The input name is mandatory since we need it to pass it to the workflow. Then, the input should be addressable by label, by task identifier, by node id or should be passed to all nodes. A specific Pydantic model was added to check this.

I did not handle the inputs when executing for now since we first have to discuss the syntax and model itself.

@codecov

codecov Bot commented Apr 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@loichuder loichuder moved this from Backlog to In progress in Ewoks dev week - June 2026 May 29, 2026
@loichuder loichuder force-pushed the add-input-schema branch 2 times, most recently from 22c4a41 to 5585654 Compare June 5, 2026 14:32
@loichuder loichuder changed the title Add new field input_schema to the spec Add new field inputs to the spec Jun 5, 2026
Comment thread doc/reference/specs.rst
* *id* (optional): graph identifier unique to a database of graphs (Default: "notspecified")
* *label* (optional): non-unique label to be used when identifying a graph for human consumption
* *schema_version* (optional): the schema version of this graph representation (Default: "1.0")
* *schema_version* (optional): the schema version of this graph representation (Default: "1.2")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema version was updated to 1.2

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI There is a ticket about this: #456

Comment thread src/ewokscore/graph/schema/model.py
@loichuder loichuder marked this pull request as ready for review June 5, 2026 15:17
@loichuder loichuder requested a review from woutdenolf June 5, 2026 15:17
Comment thread src/ewokscore/graph/schema/model.py

@woutdenolf woutdenolf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I read the code, the more confusing it gets. I have no idea what the content of the graph attribute inputs is supposed to be. I see the demo example, I see the Pydantic model and they do not match.

Maybe we can start with what it needs to be in python at runtime. We are defining a list of parameters aliases. So I expect to see a list of dict's. Each alias has a name and one or more target (or alternatively the same alias name can be used more than once). A target is a node id and a parameter name.

I think the confusion comes from trying to mimic the CLI. But this is conceptually very different imo. For example

Input alias "IN" has a type and a description. It maps to parameter "in" from "node1", parameter "in" for "node2", parameter "in1" from "node3" and parameter "in2" from "node3". These are not typo's.

Output aliases would be only 1-to-1 mapping (unless 1-to-many would yield a dict but not sure what the keys would be, perhaps the node's but I don't see a use case)

Output alias "OUT" has a type and a description. It maps to output "result" of "node3".

We just need node id's here. The mapping is very explicit.

As I understand this PR is only the model. I suggest we try to actually implement it to understand what the best content of graph "inputs" is. My intuition tells me 1-to-many for inputs and 1-to-1 for outputs and only use node id's. But I would need to implement it it be sure.

If we want to introduce the concept of "required graph inputs" as you did in the demo example, I would first think about what this means, at what level this is checked and how this shows up in "ewoks show".

Talking about "ewoks show": we could think of

  • ewoks show: only show the workflow inputs
  • ewoks show --advanced: current table with in brackets the aliases if any

Most workflow won't have workflows inputs so not sure. I guess ewoks show and ewoks show --advanced would be the same in that case.

To be discussed in person.

Edit: In this context I use

  • "input alias" as synonym for "workflow input parameter"
  • "output alias" as synonym for "workflow output parameter"

Comment on lines +11 to +20
"properties": {
"task0a": {
"type": "number",
"__ewoks__": {"name": "a", "id": "task0"},
},
"task0b": {
"type": "number",
"default": 0,
"__ewoks__": {"name": "b", "id": 2},
},

@woutdenolf woutdenolf Jun 8, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the word "task" or "node" in the parameter name is confusing. I would propose something else. Maybe all-caps for workflow inputs and smallcaps for task inputs. Like global and local variable names in python. Just for the example I mean, this is not something enforced by ewoks.

What happens if I have parameter "B" and I want this to be parameter "b" of task1 and task2? Can I do that?

Also, "task0" does not have parameter "a" and "b" and there is no "id=2":

╒════════╤════════════════╤═══════════════════╤═══════╕
│ Name   │ Value          │ Task identifier   │ Id    │
╞════════╪════════════════╪═══════════════════╪═══════╡
│ list   │ [0, 1, 2]      │ SumList           │ task0 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumList           │ task0 │
├────────┼────────────────┼───────────────────┼───────┤
│ b      │ <MISSING_DATA> │ SumTask           │ task1 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumTask           │ task1 │
├────────┼────────────────┼───────────────────┼───────┤
│ a      │ 2              │ SumTask           │ task2 │
├────────┼────────────────┼───────────────────┼───────┤
│ b      │ <MISSING_DATA> │ SumTask           │ task2 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumTask           │ task2 │
├────────┼────────────────┼───────────────────┼───────┤
│ b      │ 3              │ SumTask           │ task3 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumTask           │ task3 │
├────────┼────────────────┼───────────────────┼───────┤
│ b      │ 4              │ SumTask           │ task4 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumTask           │ task4 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumTask           │ task5 │
├────────┼────────────────┼───────────────────┼───────┤
│ b      │ 6              │ SumTask           │ task6 │
├────────┼────────────────┼───────────────────┼───────┤
│ delay  │ 0              │ SumTask           │ task6 │
╘════════╧════════════════╧═══════════════════╧═══════╛

"__ewoks__": {"name": "b", "id": 2},
},
},
"required": ["task0a"],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required in what sense? The workflow will fail if this parameter is not provided? What happens if I do not provide workflow inputs and I just provide node inputs?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the GUI it can be useful to know whether something is required or not.

EwoksGraph(**graph_dict)


def test_input_schema():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is only testing malformed inputs. Perhaps reflect this in the name of the test.

Comment on lines +153 to +154
except ValidationError:
raise

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capture and re-raise. What is the purpose?

Comment on lines +146 to +147
except SchemaError:
raise

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capture and re-raise. What is the purpose?

Comment thread src/ewokscore/graph/schema/model.py
requirements: Sequence[str] = []
input_nodes: Sequence[EwoksNodeAlias] = []
output_nodes: Sequence[EwoksNodeAlias] = []
inputs: Dict[str, Any] = {}

@woutdenolf woutdenolf Jun 8, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this dict? It maps what to what? I would expect this to be a List[EwoksWorkflowInput].

Also when defining aliases, we'll need 1-to-many (1 parameter alias, many targets).

@woutdenolf woutdenolf Jun 8, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so inputs is a JSON Schema which is an instance of the JSON Schema meta-schema https://json-schema.org/draft/2020-12/schema which itself is a JSON Schema.

Draft 2020-12 meta-schema
(a JSON Schema)

validates JSON Schemas

which validate JSON instances

Examples

JSON schema for a single integer

meta_schema = {
    "type": "object",
    "properties": {
        "type": {
            "type": "string"
        }
    }
}
schema = {
    "type": "number"
}
instance = 42

JSON schema for a str->int dictionary with keys required value1 and optional value2

meta_schema = {
    "type": "object",
    "properties": {
        "type": {
            "type": "string"
        }
    }
}
schema = {
    "type": "object",
    "properties": {
        "value1": {
            "type": "number"
        },
        "value2": {
            "type": "number"
        },
    },
    "required": ["value"]
}
instance = {
    "value1": 42,
    "value2": 99,  # optional
}

Ewoks graph attribute inputs

So this is a schema, not meta_schema or instance. But it cannot be just any schema because ultimately we want to define some kind of parameter mapping on the one hand but then the user provides parameter values on the other hand.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@woutdenolf

woutdenolf commented Jun 8, 2026

Copy link
Copy Markdown
Member

A json schema would look like this

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/product.schema.json",
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    },
    "productName": {
      "description": "Name of the product",
      "type": "string"
    },
    "price": {
      "description": "The price of the product",
      "type": "number",
      "exclusiveMinimum": 0
    }
  },
  "required": [ "productId", "productName", "price" ]
}

But we only have "properties". Is that still a valid JSON schema?

Edit: it seems so. Providing a schema (which itself has a schema) in the instance of another schema (ewoks graph). My brain hurts. No human can be expected to write this by hand. Maybe if the spec in the docs clearly provide a simple example

"graph": {
  "inputs": {
  "properties": {
    "A": {
      "description": "This is a description of workflow parameter A",
      "type": "integer"
    },
    "B": {
      "description": "This is a description of workflow parameter B",
      "type": "integer"
    },

but more complex stuff like anyOf is not for humans. It should be generated from the json schema of the BaseInputModel of the tasks.

@woutdenolf

woutdenolf commented Jun 8, 2026

Copy link
Copy Markdown
Member

For reference, execution inputs/outputs (parameters of execute_graph) vs. input_nodes/output_nodes (graph attributes referred to as node aliases)

Execution inputs/outputs (execute_graph arguments)

inputs = [
    {"all": True, "name":"in1", "value":1},  # target all nodes
    {"name":"in2", "value":2}, # target all start nodes
    {"label": "My Label", "name":"in3", "value":3}, # target all nodes with a specific label
    {"task_identifier": "MyTask", "name":"in4", "value":4}, # target all nodes with a specific task identifier
    {"id": "mynode", "name":"in5", "value":5}, # target a node with a specific id
]

outputs = [
    {"all": True, "name":"out1"},  # target all nodes
    {"name":"out2"}, # target all end nodes
    {"label": "My Label", "name":"out3"}, # target all nodes with a specific label
    {"task_identifier": "MyTask", "name":"out4"}, # target all nodes with a specific task identifier
    {"id": "mynode", "name":"in5"}, # target a node with a specific id
]

Node aliases (graph attributes)

input_nodes = [
    {"id": "in1", "node": "id1"},
    {"id": "in2", "node": "id2", "sub_node": "start"}
]

output_nodes = [
    {"id": "out1", "node": "id3"},
    {"id": "out2", "node": "id2", "sub_node": "end"}
]

The "id" is the alias, the "node" is the target node id in the current graph, "sub_node" is the target node id in the sub-graph. Note that this is all id's, not labels, task identifiers, all=True or all=False.

Workflow inputs/outputs (graph attributes)

I was thinking about this in the same way as node aliases. You were thinking about this in the same way as execution inputs/outputs.

Both are valid ideas so to be discussed. The choice would mostly depend on how workflow inputs/outputs are defined by a human. Secondary but not unimportant criterion is the implementation.

@woutdenolf

woutdenolf commented Jun 9, 2026

Copy link
Copy Markdown
Member

Here is an example (LLM generated stuff) if we go for execute_graph arguments instead of parameter aliases

workflow_schema = {
    "type": "object",
    "properties": {
        # Fan-out to all nodes and one specific node
        "GAIN": {
            "type": "number",
            "title": "Gain",
            "x-ewoks-targets": [
                {"name": "in1", "all": True},
                {"name": "in7", "id": "node3"},
            ],
        },

        # Default/start-node targeting
        "OFFSET": {
            "type": "number",
            "x-ewoks-targets": [
                {"name": "in2"}
            ],
        },

        # Label-based targeting
        "THRESHOLD": {
            "type": "integer",
            "x-ewoks-targets": [
                {"name": "threshold", "label": "Detector"}
            ],
        },

        # Task identifier targeting
        "EXPOSURE": {
            "type": "number",
            "x-ewoks-targets": [
                {"name": "exposure_time", "task_identifier": "AcquireImage"}
            ],
        },

        # Multiple specific nodes, different input names
        "FILENAME": {
            "type": "string",
            "x-ewoks-targets": [
                {"name": "filename", "id": "reader1"},
                {"name": "output_file", "id": "writer1"}
            ],
        },

        # Same workflow input sent to several kinds of targets
        "DEBUG": {
            "type": "boolean",
            "default": False,
            "x-ewoks-targets": [
                {"name": "debug", "all": True,},
                {"name": "verbose", "label": "Special"},
                {"name": "enabled", "task_identifier": "Logger"},
            ],
        },
    },
    "required": ["GAIN", "FILENAME"],
}

User provides these workflow inputs when executing the workflow

values = {
    "GAIN": 2.5,
    "OFFSET": 1.0,
    "THRESHOLD": 100,
    "EXPOSURE": 0.05,
    "FILENAME": "data.h5",
    "DEBUG": True,
}

Ewoks generates

inputs = [
    {"all": True, "name": "in1", "value": 2.5},
    {"id": "node3", "name": "in7", "value": 2.5},

    {"name": "in2", "value": 1.0},

    {"label": "Detector", "name": "threshold", "value": 100},

    {"task_identifier": "AcquireImage", "name": "exposure_time", "value": 0.05},

    {"id": "reader1", "name": "filename", "value": "data.h5"},

    {"id": "writer1", "name": "output_file", "value": "data.h5"},

    {"all": True, "name": "debug", "value": True},

    {"label": "Special", "name": "verbose", "value": True},

    {"task_identifier": "Logger", "name": "enabled", "value": True},
]

And then the Pydantic model instead of

inputs: Dict[str, Any] = {}
inputs: WorkflowInputSchema
from typing import Any, Literal

from pydantic import BaseModel, Field

class EwoksParameterTarget(BaseModel):
    name: str
    all: bool | None = None
    id: str | None = None
    label: str | None = None
    task_identifier: str | None = None

class WorkflowInput(BaseModel):
    type: Literal[
        "string",
        "number",
        "integer",
        "boolean",
        "object",
        "array",
    ]

    title: str | None = None
    default: Any = None

    x_ewoks_targets: list[EwoksParameterTarget] = Field(
        alias="x-ewoks-targets",
        default_factory=list,
    )

    model_config = {
        "populate_by_name": True,
    }


class WorkflowInputSchema(BaseModel):
    type: Literal["object"] = "object"

    properties: dict[str, WorkflowInput]

    required: list[str] = Field(default_factory=list)

Something similar would need to be done for outputs. I would do both at the same time.

This model might be too restrictive. We should probably allow anyof etc. Nevertheless it cannot be just any json schema. So maybe just this

class WorkflowInputSchema(BaseModel):
    type: Literal["object"] = "object"

    properties: dict[str, Any]

    required: list[str] = Field(default_factory=list)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

2 participants