Add new field `inputs` to the spec by loichuder · Pull Request #444 · ewoks-kit/ewokscore

loichuder · 2026-04-28T13:14:42Z

Add new field inputs in graph. This new field is a JSON schema defining the inputs of the workflow, each input being an entry of the properties field. See https://json-schema.org/learn/getting-started-step-by-step#add-the-properties-object or the updated demo workflow for an example.

The key of a properties entry can be anything. We just need to be able to define inputs with distinguishable keys.

But, since we need more info to address the inputs to execute, I also added a __ewoks__ field to the properties entries. This field will be ignored when validating the JSON schema.

The __ewoks__ field should define a workflow input similarly to the CLI. The input name is mandatory since we need it to pass it to the workflow. Then, the input should be addressable by label, by task identifier, by node id or should be passed to all nodes. A specific Pydantic model was added to check this.

I did not handle the inputs when executing for now since we first have to discuss the syntax and model itself.

codecov · 2026-04-28T13:17:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

loichuder · 2026-06-05T15:16:06Z

 * *id* (optional): graph identifier unique to a database of graphs (Default: "notspecified")
 * *label* (optional): non-unique label to be used when identifying a graph for human consumption
-* *schema_version* (optional): the schema version of this graph representation (Default: "1.0")
+* *schema_version* (optional): the schema version of this graph representation (Default: "1.2")


Schema version was updated to 1.2

FYI There is a ticket about this: #456

woutdenolf

The more I read the code, the more confusing it gets. I have no idea what the content of the graph attribute inputs is supposed to be. I see the demo example, I see the Pydantic model and they do not match.

Maybe we can start with what it needs to be in python at runtime. We are defining a list of parameters aliases. So I expect to see a list of dict's. Each alias has a name and one or more target (or alternatively the same alias name can be used more than once). A target is a node id and a parameter name.

I think the confusion comes from trying to mimic the CLI. But this is conceptually very different imo. For example

Input alias "IN" has a type and a description. It maps to parameter "in" from "node1", parameter "in" for "node2", parameter "in1" from "node3" and parameter "in2" from "node3". These are not typo's.

Output aliases would be only 1-to-1 mapping (unless 1-to-many would yield a dict but not sure what the keys would be, perhaps the node's but I don't see a use case)

Output alias "OUT" has a type and a description. It maps to output "result" of "node3".

We just need node id's here. The mapping is very explicit.

As I understand this PR is only the model. I suggest we try to actually implement it to understand what the best content of graph "inputs" is. My intuition tells me 1-to-many for inputs and 1-to-1 for outputs and only use node id's. But I would need to implement it it be sure.

If we want to introduce the concept of "required graph inputs" as you did in the demo example, I would first think about what this means, at what level this is checked and how this shows up in "ewoks show".

Talking about "ewoks show": we could think of

ewoks show: only show the workflow inputs
ewoks show --advanced: current table with in brackets the aliases if any

Most workflow won't have workflows inputs so not sure. I guess ewoks show and ewoks show --advanced would be the same in that case.

To be discussed in person.

Edit: In this context I use

"input alias" as synonym for "workflow input parameter"
"output alias" as synonym for "workflow output parameter"

woutdenolf · 2026-06-08T17:52:29Z

+            "properties": {
+                "task0a": {
+                    "type": "number",
+                    "__ewoks__": {"name": "a", "id": "task0"},
+                },
+                "task0b": {
+                    "type": "number",
+                    "default": 0,
+                    "__ewoks__": {"name": "b", "id": 2},
+                },


Using the word "task" or "node" in the parameter name is confusing. I would propose something else. Maybe all-caps for workflow inputs and smallcaps for task inputs. Like global and local variable names in python. Just for the example I mean, this is not something enforced by ewoks.

What happens if I have parameter "B" and I want this to be parameter "b" of task1 and task2? Can I do that?

Also, "task0" does not have parameter "a" and "b" and there is no "id=2":

╒════════╤════════════════╤═══════════════════╤═══════╕ │ Name │ Value │ Task identifier │ Id │ ╞════════╪════════════════╪═══════════════════╪═══════╡ │ list │ [0, 1, 2] │ SumList │ task0 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumList │ task0 │ ├────────┼────────────────┼───────────────────┼───────┤ │ b │ <MISSING_DATA> │ SumTask │ task1 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumTask │ task1 │ ├────────┼────────────────┼───────────────────┼───────┤ │ a │ 2 │ SumTask │ task2 │ ├────────┼────────────────┼───────────────────┼───────┤ │ b │ <MISSING_DATA> │ SumTask │ task2 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumTask │ task2 │ ├────────┼────────────────┼───────────────────┼───────┤ │ b │ 3 │ SumTask │ task3 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumTask │ task3 │ ├────────┼────────────────┼───────────────────┼───────┤ │ b │ 4 │ SumTask │ task4 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumTask │ task4 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumTask │ task5 │ ├────────┼────────────────┼───────────────────┼───────┤ │ b │ 6 │ SumTask │ task6 │ ├────────┼────────────────┼───────────────────┼───────┤ │ delay │ 0 │ SumTask │ task6 │ ╘════════╧════════════════╧═══════════════════╧═══════╛

woutdenolf · 2026-06-08T17:53:58Z

+                    "__ewoks__": {"name": "b", "id": 2},
+                },
+            },
+            "required": ["task0a"],


Required in what sense? The workflow will fail if this parameter is not provided? What happens if I do not provide workflow inputs and I just provide node inputs?

For the GUI it can be useful to know whether something is required or not.

woutdenolf · 2026-06-08T17:55:46Z

        EwoksGraph(**graph_dict)
+
+
+def test_input_schema():


The test is only testing malformed inputs. Perhaps reflect this in the name of the test.

woutdenolf · 2026-06-08T17:56:54Z

+            except ValidationError:
+                raise


Capture and re-raise. What is the purpose?

woutdenolf · 2026-06-08T17:57:56Z

+        except SchemaError:
+            raise


Capture and re-raise. What is the purpose?

woutdenolf · 2026-06-08T18:00:57Z

    requirements: Sequence[str] = []
    input_nodes: Sequence[EwoksNodeAlias] = []
    output_nodes: Sequence[EwoksNodeAlias] = []
+    inputs: Dict[str, Any] = {}


What is this dict? It maps what to what? I would expect this to be a List[EwoksWorkflowInput].

Also when defining aliases, we'll need 1-to-many (1 parameter alias, many targets).

Ok so inputs is a JSON Schema which is an instance of the JSON Schema meta-schema https://json-schema.org/draft/2020-12/schema which itself is a JSON Schema.

Draft 2020-12 meta-schema
(a JSON Schema)
↓
validates JSON Schemas
↓
which validate JSON instances

Examples

JSON schema for a single integer

meta_schema = { "type": "object", "properties": { "type": { "type": "string" } } }

schema = { "type": "number" }

instance = 42

JSON schema for a str->int dictionary with keys required value1 and optional value2

meta_schema = { "type": "object", "properties": { "type": { "type": "string" } } }

schema = { "type": "object", "properties": { "value1": { "type": "number" }, "value2": { "type": "number" }, }, "required": ["value"] }

instance = { "value1": 42, "value2": 99, # optional }

Ewoks graph attribute inputs

So this is a schema, not meta_schema or instance. But it cannot be just any schema because ultimately we want to define some kind of parameter mapping on the one hand but then the user provides parameter values on the other hand.

Suggestion after discussion: https://confluence.esrf.fr/display/AAWWK/Workflow+inputs+and+outputs#Workflowinputsandoutputs-LLMsuggestions

woutdenolf · 2026-06-08T18:35:13Z

A json schema would look like this

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/product.schema.json",
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    },
    "productName": {
      "description": "Name of the product",
      "type": "string"
    },
    "price": {
      "description": "The price of the product",
      "type": "number",
      "exclusiveMinimum": 0
    }
  },
  "required": [ "productId", "productName", "price" ]
}

But we only have "properties". Is that still a valid JSON schema?

Edit: it seems so. Providing a schema (which itself has a schema) in the instance of another schema (ewoks graph). My brain hurts. No human can be expected to write this by hand. Maybe if the spec in the docs clearly provide a simple example

"graph": {
  "inputs": {
  "properties": {
    "A": {
      "description": "This is a description of workflow parameter A",
      "type": "integer"
    },
    "B": {
      "description": "This is a description of workflow parameter B",
      "type": "integer"
    },

but more complex stuff like anyOf is not for humans. It should be generated from the json schema of the BaseInputModel of the tasks.

woutdenolf · 2026-06-08T22:57:08Z

For reference, execution inputs/outputs (parameters of execute_graph) vs. input_nodes/output_nodes (graph attributes referred to as node aliases)

Execution inputs/outputs (execute_graph arguments)

inputs = [
    {"all": True, "name":"in1", "value":1},  # target all nodes
    {"name":"in2", "value":2}, # target all start nodes
    {"label": "My Label", "name":"in3", "value":3}, # target all nodes with a specific label
    {"task_identifier": "MyTask", "name":"in4", "value":4}, # target all nodes with a specific task identifier
    {"id": "mynode", "name":"in5", "value":5}, # target a node with a specific id
]

outputs = [
    {"all": True, "name":"out1"},  # target all nodes
    {"name":"out2"}, # target all end nodes
    {"label": "My Label", "name":"out3"}, # target all nodes with a specific label
    {"task_identifier": "MyTask", "name":"out4"}, # target all nodes with a specific task identifier
    {"id": "mynode", "name":"in5"}, # target a node with a specific id
]

Node aliases (graph attributes)

input_nodes = [
    {"id": "in1", "node": "id1"},
    {"id": "in2", "node": "id2", "sub_node": "start"}
]

output_nodes = [
    {"id": "out1", "node": "id3"},
    {"id": "out2", "node": "id2", "sub_node": "end"}
]

The "id" is the alias, the "node" is the target node id in the current graph, "sub_node" is the target node id in the sub-graph. Note that this is all id's, not labels, task identifiers, all=True or all=False.

Workflow inputs/outputs (graph attributes)

I was thinking about this in the same way as node aliases. You were thinking about this in the same way as execution inputs/outputs.

Both are valid ideas so to be discussed. The choice would mostly depend on how workflow inputs/outputs are defined by a human. Secondary but not unimportant criterion is the implementation.

woutdenolf · 2026-06-09T00:01:43Z

Here is an example (LLM generated stuff) if we go for execute_graph arguments instead of parameter aliases

workflow_schema = {
    "type": "object",
    "properties": {
        # Fan-out to all nodes and one specific node
        "GAIN": {
            "type": "number",
            "title": "Gain",
            "x-ewoks-targets": [
                {"name": "in1", "all": True},
                {"name": "in7", "id": "node3"},
            ],
        },

        # Default/start-node targeting
        "OFFSET": {
            "type": "number",
            "x-ewoks-targets": [
                {"name": "in2"}
            ],
        },

        # Label-based targeting
        "THRESHOLD": {
            "type": "integer",
            "x-ewoks-targets": [
                {"name": "threshold", "label": "Detector"}
            ],
        },

        # Task identifier targeting
        "EXPOSURE": {
            "type": "number",
            "x-ewoks-targets": [
                {"name": "exposure_time", "task_identifier": "AcquireImage"}
            ],
        },

        # Multiple specific nodes, different input names
        "FILENAME": {
            "type": "string",
            "x-ewoks-targets": [
                {"name": "filename", "id": "reader1"},
                {"name": "output_file", "id": "writer1"}
            ],
        },

        # Same workflow input sent to several kinds of targets
        "DEBUG": {
            "type": "boolean",
            "default": False,
            "x-ewoks-targets": [
                {"name": "debug", "all": True,},
                {"name": "verbose", "label": "Special"},
                {"name": "enabled", "task_identifier": "Logger"},
            ],
        },
    },
    "required": ["GAIN", "FILENAME"],
}

User provides these workflow inputs when executing the workflow

values = {
    "GAIN": 2.5,
    "OFFSET": 1.0,
    "THRESHOLD": 100,
    "EXPOSURE": 0.05,
    "FILENAME": "data.h5",
    "DEBUG": True,
}

Ewoks generates

inputs = [
    {"all": True, "name": "in1", "value": 2.5},
    {"id": "node3", "name": "in7", "value": 2.5},

    {"name": "in2", "value": 1.0},

    {"label": "Detector", "name": "threshold", "value": 100},

    {"task_identifier": "AcquireImage", "name": "exposure_time", "value": 0.05},

    {"id": "reader1", "name": "filename", "value": "data.h5"},

    {"id": "writer1", "name": "output_file", "value": "data.h5"},

    {"all": True, "name": "debug", "value": True},

    {"label": "Special", "name": "verbose", "value": True},

    {"task_identifier": "Logger", "name": "enabled", "value": True},
]

And then the Pydantic model instead of

inputs: Dict[str, Any] = {}

inputs: WorkflowInputSchema

from typing import Any, Literal

from pydantic import BaseModel, Field

class EwoksParameterTarget(BaseModel):
    name: str
    all: bool | None = None
    id: str | None = None
    label: str | None = None
    task_identifier: str | None = None

class WorkflowInput(BaseModel):
    type: Literal[
        "string",
        "number",
        "integer",
        "boolean",
        "object",
        "array",
    ]

    title: str | None = None
    default: Any = None

    x_ewoks_targets: list[EwoksParameterTarget] = Field(
        alias="x-ewoks-targets",
        default_factory=list,
    )

    model_config = {
        "populate_by_name": True,
    }


class WorkflowInputSchema(BaseModel):
    type: Literal["object"] = "object"

    properties: dict[str, WorkflowInput]

    required: list[str] = Field(default_factory=list)

Something similar would need to be done for outputs. I would do both at the same time.

This model might be too restrictive. We should probably allow anyof etc. Nevertheless it cannot be just any json schema. So maybe just this

class WorkflowInputSchema(BaseModel):
    type: Literal["object"] = "object"

    properties: dict[str, Any]

    required: list[str] = Field(default_factory=list)

loichuder force-pushed the add-input-schema branch from 4ead1d4 to 7d739b1 Compare May 29, 2026 08:09

loichuder added this to Ewoks dev week - June 2026 May 29, 2026

github-project-automation Bot moved this to Backlog in Ewoks dev week - June 2026 May 29, 2026

loichuder moved this from Backlog to In progress in Ewoks dev week - June 2026 May 29, 2026

woutdenolf assigned loichuder Jun 4, 2026

loichuder force-pushed the add-input-schema branch 2 times, most recently from 22c4a41 to 5585654 Compare June 5, 2026 14:32

loichuder changed the title ~~Add new field input_schema to the spec~~ Add new field inputs to the spec Jun 5, 2026

loichuder force-pushed the add-input-schema branch from 5585654 to 81b3107 Compare June 5, 2026 15:04

loichuder commented Jun 5, 2026

View reviewed changes

loichuder marked this pull request as ready for review June 5, 2026 15:17

loichuder requested a review from woutdenolf June 5, 2026 15:17

woutdenolf reviewed Jun 5, 2026

View reviewed changes

Comment thread src/ewokscore/graph/schema/model.py

loichuder added 2 commits June 8, 2026 09:30

Add new field inputs to the spec

564f29e

Change EWOKS_KEY key and add serialization test

a33bd7c

loichuder force-pushed the add-input-schema branch from 81b3107 to a33bd7c Compare June 8, 2026 08:43

woutdenolf requested changes Jun 8, 2026

View reviewed changes

Uh oh!

Conversation

loichuder commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

woutdenolf left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

woutdenolf Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

woutdenolf Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

woutdenolf Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Examples

JSON schema for a single integer

JSON schema for a str->int dictionary with keys required value1 and optional value2

Ewoks graph attribute inputs

Uh oh!

Choose a reason for hiding this comment

Uh oh!

woutdenolf commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

woutdenolf commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Execution inputs/outputs (execute_graph arguments)

Node aliases (graph attributes)

Workflow inputs/outputs (graph attributes)

Uh oh!

woutdenolf commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

loichuder commented Apr 28, 2026 •

edited

Loading

codecov Bot commented Apr 28, 2026 •

edited

Loading

woutdenolf left a comment •

edited

Loading

woutdenolf Jun 8, 2026 •

edited

Loading

woutdenolf Jun 8, 2026 •

edited

Loading

woutdenolf Jun 8, 2026 •

edited

Loading

Ewoks graph attribute `inputs`

woutdenolf commented Jun 8, 2026 •

edited

Loading

woutdenolf commented Jun 8, 2026 •

edited

Loading

woutdenolf commented Jun 9, 2026 •

edited

Loading