In TROs created so far, TRPs reference arrangements directly:
```json
{
  "@id": "trp/0",
  "@type": "trov:TrustedResearchPerformance",
  "trov:accessedArrangement": [
    { "@id": "arrangement/0" },
    { "@id": "arrangement/1" }
  ],
  "trov:contributedToArrangement": { "@id": "arrangement/2" }
}
```
We want to be able to specify, optionally and via an additional term, where an arrangement was accessible to the TRP during execution (analogous to, but more general than, the mount point of a Unix file system). To support this, we're proposing an intermediate object, trov:ArrangementBinding, that binds an arrangement to a TRP at a specific path. This follows the same pattern as trov:ArtifactLocation, which binds an artifact to a path within an arrangement.
```json
{
  "@id": "trp/0",
  "@type": "trov:TrustedResearchPerformance",
  "trov:accessedArrangement": [
    {
      "@id": "trp/0/binding/0",
      "@type": "trov:ArrangementBinding",
      "trov:arrangement": { "@id": "arrangement/0" },
      "trov:boundTo": "/data"
    },
    {
      "@id": "trp/0/binding/1",
      "@type": "trov:ArrangementBinding",
      "trov:arrangement": { "@id": "arrangement/1" },
      "trov:boundTo": "/workspace"
    }
  ],
  "trov:contributedToArrangement": {
    "@id": "trp/0/binding/2",
    "@type": "trov:ArrangementBinding",
    "trov:arrangement": { "@id": "arrangement/2" },
    "trov:boundTo": "/workspace"
  }
}
```
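As a rough illustration of the mount-point analogy, the Python sketch below shows how a consumer might reconstruct the path at which a TRP process saw an artifact, by prefixing a binding's trov:boundTo onto the artifact's path within the arrangement (as recorded by a trov:ArtifactLocation). The function name `runtime_path`, the assumption that location paths are relative to the arrangement, and the treatment of an omitted boundTo as an empty prefix are illustrative assumptions, not part of the TROV vocabulary.

```python
import posixpath
from typing import Optional


def runtime_path(bound_to: Optional[str], location_path: str) -> str:
    """Hypothetical helper: path at which a TRP process would have seen an artifact.

    bound_to:      the binding's trov:boundTo value, or None if it was omitted.
    location_path: the artifact's path within the arrangement, taken from a
                   trov:ArtifactLocation (assumed relative to the arrangement).
    """
    if bound_to is None:
        # Assumption: an omitted boundTo is treated as an empty prefix, so the
        # arrangement-relative path is returned unchanged.
        return location_path
    # Treat boundTo like a mount point. Strip any leading "/" so that
    # posixpath.join does not discard the prefix.
    return posixpath.join(bound_to, location_path.lstrip("/"))
```

For example, an artifact at input/raw.csv in arrangement/0, bound to /data, would have been visible at /data/input/raw.csv.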
Two situations suggest that trov:boundTo should be optional on a binding. First, the paths associated with the artifacts in the arrangement may be exactly the paths that processes in the TRP used to access those artifacts (i.e., the effective value of trov:boundTo at run time is empty). Second, the TRP may have accessed the artifacts via paths sharing a common prefix that has been redacted from the TRO declaration (e.g., for confidentiality). In either case, the question is how to represent TRP-arrangement references that carry no trov:boundTo. There are two options:
Option 1: Uniform — always use ArrangementBinding
Every TRP-arrangement reference uses an ArrangementBinding, with trov:boundTo included when informative and omitted when not.
```json
"trov:accessedArrangement": [
  {
    "@id": "trp/0/binding/0",
    "@type": "trov:ArrangementBinding",
    "trov:arrangement": { "@id": "arrangement/0" },
    "trov:boundTo": "/data"
  },
  {
    "@id": "trp/0/binding/1",
    "@type": "trov:ArrangementBinding",
    "trov:arrangement": { "@id": "arrangement/1" }
  }
],
"trov:contributedToArrangement": {
  "@id": "trp/0/binding/2",
  "@type": "trov:ArrangementBinding",
  "trov:arrangement": { "@id": "arrangement/2" },
  "trov:boundTo": "/workspace"
}
```
Query for accessed arrangements:
```sparql
PREFIX trov: <https://w3id.org/trace/trov/0.1#>
SELECT ?trp ?arrangement ?boundTo
WHERE {
  ?trp trov:accessedArrangement ?binding .
  ?binding trov:arrangement ?arrangement .
  OPTIONAL { ?binding trov:boundTo ?boundTo }
}
```
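For consumers working directly with the JSON rather than through SPARQL, the same lookup under Option 1 might look like the following minimal Python sketch. It assumes the compact JSON-LD shape shown above (no JSON-LD expansion or context processing), and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, str, Optional[str]]  # (trp, arrangement, boundTo)


def as_list(value: Any) -> List[Any]:
    # A JSON-LD property may hold a single object or an array of objects.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def accessed_arrangements_uniform(trp: Dict[str, Any]) -> List[Row]:
    """Hypothetical helper mirroring the Option 1 query for one TRP."""
    rows: List[Row] = []
    for binding in as_list(trp.get("trov:accessedArrangement")):
        # Option 1: every entry is an ArrangementBinding, so one code path suffices.
        arrangement = binding["trov:arrangement"]["@id"]
        # Like OPTIONAL in the query: boundTo may be absent (None).
        rows.append((trp["@id"], arrangement, binding.get("trov:boundTo")))
    return rows
```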
Option 2: Mixed — bare references when no mount path is needed
Use ArrangementBinding only when trov:boundTo is needed. Otherwise, reference the arrangement directly.
```json
"trov:accessedArrangement": [
  {
    "@id": "trp/0/binding/0",
    "@type": "trov:ArrangementBinding",
    "trov:arrangement": { "@id": "arrangement/0" },
    "trov:boundTo": "/data"
  },
  { "@id": "arrangement/1" }
],
"trov:contributedToArrangement": {
  "@id": "trp/0/binding/2",
  "@type": "trov:ArrangementBinding",
  "trov:arrangement": { "@id": "arrangement/2" },
  "trov:boundTo": "/workspace"
}
```
The analogous query now requires handling both patterns:
```sparql
PREFIX trov: <https://w3id.org/trace/trov/0.1#>
SELECT ?trp ?arrangement ?boundTo
WHERE {
  {
    ?trp trov:accessedArrangement ?ref .
    ?ref trov:arrangement ?arrangement .
    OPTIONAL { ?ref trov:boundTo ?boundTo }
  }
  UNION
  {
    ?trp trov:accessedArrangement ?ref .
    FILTER NOT EXISTS { ?ref trov:arrangement ?any }
    BIND(?ref AS ?arrangement)
  }
}
```
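The equivalent Python sketch for Option 2, under the same assumptions (compact JSON-LD as written, hypothetical helper names), needs a second branch for bare references, mirroring the UNION in the query above.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, str, Optional[str]]  # (trp, arrangement, boundTo)


def as_list(value: Any) -> List[Any]:
    # A JSON-LD property may hold a single object or an array of objects.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def accessed_arrangements_mixed(trp: Dict[str, Any]) -> List[Row]:
    """Hypothetical helper mirroring the Option 2 query for one TRP."""
    rows: List[Row] = []
    for ref in as_list(trp.get("trov:accessedArrangement")):
        if "trov:arrangement" in ref:
            # ArrangementBinding: follow trov:arrangement; boundTo may be absent.
            rows.append((trp["@id"], ref["trov:arrangement"]["@id"], ref.get("trov:boundTo")))
        else:
            # Bare reference: the entry's own @id is the arrangement.
            rows.append((trp["@id"], ref["@id"], None))
    return rows
```

For the example declarations, this returns the same rows as the Option 1 sketch would for the equivalent Option 1 declaration; the cost is the extra branch that every consumer must remember to write.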
Tradeoffs
Option 2 produces smaller JSON-LD when the boundTo property isn't needed. When no TRP binds an arrangement, the declaration looks exactly like today's, with only bare references. The problem is that a declaration mixing bindings and bare references is confusing to read because of its heterogeneity: two different structures in the same array mean the same thing. This heterogeneity is reflected in the query complexity.
Option 1 might be considered more verbose than necessary when a particular TRO uses no boundTo values. But its structure has the advantage of always being the same (even though the boundTo property is optional): producers and consumers work with one structure regardless of whether boundTo is present.
The JSON Schema is also simpler for Option 1, but the schema is written once by us, so that's a one-time cost either way. The declaration readability and SPARQL query complexity of Option 2 (including the cost of executing the more complex queries) are paid repeatedly, however.
Questions
- Do we expect a TRS to commonly mix bound and unbound arrangements in the same TRP? If not, the poorer readability of Option 2 may not be an issue.
- How important is SPARQL query complexity for TRO consumers? If consumers query TROs through tooling that can abstract away the pattern, the mixed-case query complexity is a one-time cost of authoring queries (although the performance cost remains).
- Do we expect existing TROs (with bare arrangement references) to remain valid? Option 1 is a breaking change; Option 2 is backward-compatible. On the other hand, now is the time to make breaking changes.
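If Option 1 is adopted, existing TROs could plausibly be upgraded mechanically by wrapping each bare reference in an ArrangementBinding with no trov:boundTo. The sketch below assumes the input TRP uses only bare references and works on compact JSON-LD as written; the function name and its trp/N/binding/K identifier scheme (which simply mirrors the examples above) are hypothetical.

```python
import copy
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    # A JSON-LD property may hold a single object or an array of objects.
    return value if isinstance(value, list) else [value]


def wrap_bare_references(trp: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical migration: return an Option 1 copy of a TRP whose
    arrangement references are all bare. The input is not modified."""
    upgraded = copy.deepcopy(trp)
    counter = 0  # numbering runs across both properties, as in the examples
    for prop in ("trov:accessedArrangement", "trov:contributedToArrangement"):
        if prop not in upgraded:
            continue
        original = upgraded[prop]
        bindings = []
        for ref in as_list(original):
            # Hypothetical @id scheme: "<trp id>/binding/<n>".
            bindings.append({
                "@id": f"{trp['@id']}/binding/{counter}",
                "@type": "trov:ArrangementBinding",
                "trov:arrangement": {"@id": ref["@id"]},
                # trov:boundTo is omitted: the original TRO never recorded one.
            })
            counter += 1
        # Keep the original shape: an array stays an array, a single object stays single.
        upgraded[prop] = bindings if isinstance(original, list) else bindings[0]
    return upgraded
```

Applied to the first example in this note, this yields bindings trp/0/binding/0 through trp/0/binding/2 with the same arrangements and no boundTo values.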