Skip to content

Add flowerpress functionality to daffodil-sbt#183

Open
jadams-tresys wants to merge 11 commits into
apache:mainfrom
jadams-tresys:flowerpress
Open

Add flowerpress functionality to daffodil-sbt#183
jadams-tresys wants to merge 11 commits into
apache:mainfrom
jadams-tresys:flowerpress

Conversation

@jadams-tresys
Copy link
Copy Markdown

Notable improvements from the current released flowerpress:

  • Works on XSLT files as well
  • Only packages files that are referenced by the projects files

Notable improvements from the current released flowerpress:
 - Works on XSLT files as well
 - Only packages files that are referenced by the projects files
Copy link
Copy Markdown
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, mostly a bunch of suggestions for cleanups or just make thing a bit more SBT-y.

I think my one concern is how do we exclude all the things that need to be excluded that are currently commented out in flattenExcludes (i.e. random XML files that exist in dependencies. I don't really like hard coding a bunch of random paths, esepcially since we don't know what could be added as a project depdendncy. Maybe we need some approach, like we only include files that are referneced by files in src/main/resources?

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread VERSION Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Josh Adams added 2 commits May 8, 2026 14:38
Somewhat major rework to get things more SBT friendly and to use
existing classpath jars directly instead of extracting them all and
sorting through the files.
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
* file. Add this file to the accumulator list
*/
getReferences(
root,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this root stuff some more, I'm not sure the way we handle root is correct. For example, say we have two files in two separate resource directories in a project. The most common example would be managed and unmanaged resources, e.g.

unmanagedResourceDir/foo.dfdl.xsd
managedResourceDir/bar.dfdl.xsd

As we resolve references in foo.dfdl.xsd, root will be set to unmanagedResourceDir. And say foo.dfdl.xsd references bar.dfdl.xsd. I think in this case we will relativize bar.dfdl.xsd relative to foo's root, which is different and things will break.

One option is if something resolves to file then do not recursively call getReferences, with the assumption that it should exist in projectReferences and it's root should be correcte. Though, I'm not positive that assumption is always true, especially if users do weird things with excludeFilter, which might exclude a file.

Another option is if something resolves to a file then we just check if the files starts with all possible resource directories and remove which ever it starts with. It's a bit inefficient, but it should work. The only case I can think it would break is if a resource directory is a subdirectory of another resource directory and so a file could potentially star with multiple resource directories, but I don't think that should ever happen--feels like a project doing that would have lots of other issues if it had nested resource dirs.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one comment I didn't really address in my most recent changes. With the current code it will look for references relative to their current root, but if it fails to find them, it will use the classLoader to search the classpath for them, which will include all resource directories.

I think this should be sufficient for at least the vast majority of cases, but I'm not sure if we have any schema projects that make use of multiple resource directories to test against.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe some projects do exist that make use of resource generators. We also don't know what other project might exists. They probably aren't common, but they could exists. And in those cases we will have separate resource directories and those files could reference each other. I feel it's important to make sure this works correctly and I don't think adds too much complexity. My latest review adds some comments about how we might be able to do this.

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
No more recursion and a few other fixups
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
} else {
// Relative path
val relPath = contextRelPath.resolveSibling(origLocation).normalize()
if (Files.exists(rootPath.resolve(relPath))) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't have the same logic as Daffodil. If contextURI is a jar:file:/.. and it contains a relative path, then this will look for the relative path relative to rootPath. But instead it should look inside the same jar as contextURI.

For example, if the contextURI is

jar:file:/path/to/foo.jar!/path/to/schema.dfdl.xsd

and that schema.dfdl.xsd references schemaLocation="foo.dfdl.xsd", we should be looking for the file at:

jar:file:/path/to/foo.jar!/path/to/foo.dfdl.xsd

But this logic will check in

resourceDirectoryRoot/path/to/foo.dfdl.xsd

If the context is a jar, we shouldn't even consider looking at the filesystem.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about instances where we have multiple payload types? For example, using JREAP it defines it's own generic payload, but when combined with Link16 using the payload defined in JREAP would be incorrect as we would want to use the one from Link16, which shares the same classpath name.

I realize that we typically have a 3rd schema (link16-with-jreap) which means we are usually dealing with just a different jar file anyways, but I could definitely see a project where it schema via a jar dependency, but then overwrites an element with something defined in it's project.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention that Daffodil implements is that relative paths can only refer to files in the same jar and absolute paths must be used to reference files in another jar.

So in the case of Jreap, it defines that a payload schema must exist at a specific path, and it does so using an absolute path so that it can reference a file in another jar, like in the link16 jar. And then the link16 jar is responsible for containing that payload file at the absolute path that jreap expects.

Note that jreap doesn't actually define a payload schema in src/main/schema, it just says where it must exist as provided by another jar. It does have a payload schema it uses for testing, but that is in src/main/test so isn't reachable by jars that depend on it like link16.

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
val bytes = contextURI.toURL.openStream().readAllBytes()
val contextRelPath = contextURI.getScheme match {
case "jar" => Paths.get(contextURI.toString.split("!")(1).tail)
case "file" => Paths.get(rootURI.relativize(contextURI).getPath)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue with the using root mentioned above, it's possible that the contetxURI is a different root that rootURI because the are multple possible roots for a URI--it isn't necessarily the same one as this root we're looking at.

So I think this wants to be something like this:

case "file" => {
  val root = (Compile / resourceDirectories).value.find(contextURI.startsWith).get
  Paths.get(root.relativize(contextURI))
}

} else {
// Relative path
val relPath = contextRelPath.resolveSibling(origLocation).normalize()
if (Files.exists(rootPath.resolve(relPath))) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this review, I've thought of a new issue if the contextURI is a file:// and the origLocation is a relative path. In that case a relative resource path could access a file in a different resourceDirectory since they are would be in the same jar. So I think the logic for looking up relative paths when the context is a file:// wants to be something like this:

val resolvedRoot = (Compile / resourceDirectories).value.find(root => File.exists(root.resolve(relPath)))
if (resolvedRoot.isDefined) {
  Some(origLocation -> resolvedRoot.resolve(relPath))
}

So similar to what you have, but we have try to resolve the path relative to all possible roots instead of the current root. Similar to above, that is because a path relative to a file:// could resolve to any resource root directory, not just the current one.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a problem with the current behavior where if it doesn't find it in the current root it will use the URLClassLoader which contains URL's for all the roots? I just feel like this results in the same behavior.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Daffodil defines that relative schemaLocations only look within the jar, they can never be overridden or provided by another jar. So if we don't find a file in any of the resourceDirectories, it means the file is missing, we do not fall back to classpath and look inside some other jar--we can only look at files within this jar (which in this case is the files in the resourceDirectories).

Note that one execption is that deprecated (but allowed) behavior where if we don't find a relative path then we can look it up as if it were an absolute path where the schemaLocation is intended to be absolute but they forgot the leading slash. So we can look up the classpath in that case, but it must be the full originalLocation and not the relPath.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll make the change. I do still need the fallback check of the classpath for the exception you described. The Asterix schemas reference DFDLGeneralFormat without a leading '/'.

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread README.md Outdated
* file. Add this file to the accumulator list
*/
getReferences(
root,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe some projects do exist that make use of resource generators. We also don't know what other project might exists. They probably aren't common, but they could exists. And in those cases we will have separate resource directories and those files could reference each other. I feel it's important to make sure this works correctly and I don't think adds too much complexity. My latest review adds some comments about how we might be able to do this.

Copy link
Copy Markdown
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, most just just thoughts on refactor/cleanups, up to you to change or not.

Two issues:

  1. I still think we need consider the case of multiple referenceing files with the same flattened path. I can easily imagine hitting that and things breaking.
  2. I think the handling of seen/unseen at the end needs tweaking.
  3. Don't forget to revert the VERSION change

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
case "file" => {
val path = Paths.get(contextURI)
val root = (Compile / resourceDirectories).value
.find(file => contextURI.toString.contains(file.toString))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be startsWith instead of contains? Should be slightly more efficient since it wont' have to scan the whole string, and the contextURI should start with one of the resource directories.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it can't. The resource directories don't include "file:" whereas the URI's do.

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Copy link
Copy Markdown
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple change suggests, and I think resolution migh still not be exactly right in some edge cases.

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Copy link
Copy Markdown
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 minor cleanpus/suggestions

Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala
Comment thread src/main/scala/org/apache/daffodil/DaffodilPlugin.scala Outdated
@jadams-tresys jadams-tresys requested a review from olabusayoT May 19, 2026 15:29
Copy link
Copy Markdown

@olabusayoT olabusayoT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Small documentation question

Comment thread README.md
`test/` directory. Source files are those that end with `*.scala` or `*.java`,
and resource files are anything else.

### Flatten Schemas
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this contain explanations of the other flatten options for completeness ex daffodilFlattenTarget and the regex one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants