Add flowerpress functionality to daffodil-sbt#183
Conversation
Notable improvements from the current released flowerpress: - Works on XSLT files as well - Only packages files that are referenced by the projects files
stevedlawrence
left a comment
There was a problem hiding this comment.
Looks good, mostly a bunch of suggestions for cleanups or just make thing a bit more SBT-y.
I think my one concern is how do we exclude all the things that need to be excluded that are currently commented out in flattenExcludes (i.e. random XML files that exist in dependencies. I don't really like hard coding a bunch of random paths, esepcially since we don't know what could be added as a project depdendncy. Maybe we need some approach, like we only include files that are referneced by files in src/main/resources?
Somewhat major rework to get things more SBT friendly and to use existing classpath jars directly instead of extracting them all and sorting through the files.
| * file. Add this file to the accumulator list | ||
| */ | ||
| getReferences( | ||
| root, |
There was a problem hiding this comment.
Thinking about this root stuff some more, I'm not sure the way we handle root is correct. For example, say we have two files in two separate resource directories in a project. The most common example would be managed and unmanaged resources, e.g.
unmanagedResourceDir/foo.dfdl.xsd
managedResourceDir/bar.dfdl.xsd
As we resolve references in foo.dfdl.xsd, root will be set to unmanagedResourceDir. And say foo.dfdl.xsd references bar.dfdl.xsd. I think in this case we will relativize bar.dfdl.xsd relative to foo's root, which is different and things will break.
One option is if something resolves to file then do not recursively call getReferences, with the assumption that it should exist in projectReferences and it's root should be correcte. Though, I'm not positive that assumption is always true, especially if users do weird things with excludeFilter, which might exclude a file.
Another option is if something resolves to a file then we just check if the files starts with all possible resource directories and remove which ever it starts with. It's a bit inefficient, but it should work. The only case I can think it would break is if a resource directory is a subdirectory of another resource directory and so a file could potentially star with multiple resource directories, but I don't think that should ever happen--feels like a project doing that would have lots of other issues if it had nested resource dirs.
There was a problem hiding this comment.
This is one comment I didn't really address in my most recent changes. With the current code it will look for references relative to their current root, but if it fails to find them, it will use the classLoader to search the classpath for them, which will include all resource directories.
I think this should be sufficient for at least the vast majority of cases, but I'm not sure if we have any schema projects that make use of multiple resource directories to test against.
There was a problem hiding this comment.
I believe some projects do exist that make use of resource generators. We also don't know what other project might exists. They probably aren't common, but they could exists. And in those cases we will have separate resource directories and those files could reference each other. I feel it's important to make sure this works correctly and I don't think adds too much complexity. My latest review adds some comments about how we might be able to do this.
No more recursion and a few other fixups
| } else { | ||
| // Relative path | ||
| val relPath = contextRelPath.resolveSibling(origLocation).normalize() | ||
| if (Files.exists(rootPath.resolve(relPath))) { |
There was a problem hiding this comment.
This doesn't have the same logic as Daffodil. If contextURI is a jar:file:/.. and it contains a relative path, then this will look for the relative path relative to rootPath. But instead it should look inside the same jar as contextURI.
For example, if the contextURI is
jar:file:/path/to/foo.jar!/path/to/schema.dfdl.xsd
and that schema.dfdl.xsd references schemaLocation="foo.dfdl.xsd", we should be looking for the file at:
jar:file:/path/to/foo.jar!/path/to/foo.dfdl.xsd
But this logic will check in
resourceDirectoryRoot/path/to/foo.dfdl.xsd
If the context is a jar, we shouldn't even consider looking at the filesystem.
There was a problem hiding this comment.
What about instances where we have multiple payload types? For example, using JREAP it defines it's own generic payload, but when combined with Link16 using the payload defined in JREAP would be incorrect as we would want to use the one from Link16, which shares the same classpath name.
I realize that we typically have a 3rd schema (link16-with-jreap) which means we are usually dealing with just a different jar file anyways, but I could definitely see a project where it schema via a jar dependency, but then overwrites an element with something defined in it's project.
There was a problem hiding this comment.
The convention that Daffodil implements is that relative paths can only refer to files in the same jar and absolute paths must be used to reference files in another jar.
So in the case of Jreap, it defines that a payload schema must exist at a specific path, and it does so using an absolute path so that it can reference a file in another jar, like in the link16 jar. And then the link16 jar is responsible for containing that payload file at the absolute path that jreap expects.
Note that jreap doesn't actually define a payload schema in src/main/schema, it just says where it must exist as provided by another jar. It does have a payload schema it uses for testing, but that is in src/main/test so isn't reachable by jars that depend on it like link16.
| val bytes = contextURI.toURL.openStream().readAllBytes() | ||
| val contextRelPath = contextURI.getScheme match { | ||
| case "jar" => Paths.get(contextURI.toString.split("!")(1).tail) | ||
| case "file" => Paths.get(rootURI.relativize(contextURI).getPath) |
There was a problem hiding this comment.
This is an issue with the using root mentioned above, it's possible that the contetxURI is a different root that rootURI because the are multple possible roots for a URI--it isn't necessarily the same one as this root we're looking at.
So I think this wants to be something like this:
case "file" => {
val root = (Compile / resourceDirectories).value.find(contextURI.startsWith).get
Paths.get(root.relativize(contextURI))
}| } else { | ||
| // Relative path | ||
| val relPath = contextRelPath.resolveSibling(origLocation).normalize() | ||
| if (Files.exists(rootPath.resolve(relPath))) { |
There was a problem hiding this comment.
In this review, I've thought of a new issue if the contextURI is a file:// and the origLocation is a relative path. In that case a relative resource path could access a file in a different resourceDirectory since they are would be in the same jar. So I think the logic for looking up relative paths when the context is a file:// wants to be something like this:
val resolvedRoot = (Compile / resourceDirectories).value.find(root => File.exists(root.resolve(relPath)))
if (resolvedRoot.isDefined) {
Some(origLocation -> resolvedRoot.resolve(relPath))
}So similar to what you have, but we have try to resolve the path relative to all possible roots instead of the current root. Similar to above, that is because a path relative to a file:// could resolve to any resource root directory, not just the current one.
There was a problem hiding this comment.
Is there a problem with the current behavior where if it doesn't find it in the current root it will use the URLClassLoader which contains URL's for all the roots? I just feel like this results in the same behavior.
There was a problem hiding this comment.
Yes, Daffodil defines that relative schemaLocations only look within the jar, they can never be overridden or provided by another jar. So if we don't find a file in any of the resourceDirectories, it means the file is missing, we do not fall back to classpath and look inside some other jar--we can only look at files within this jar (which in this case is the files in the resourceDirectories).
Note that one execption is that deprecated (but allowed) behavior where if we don't find a relative path then we can look it up as if it were an absolute path where the schemaLocation is intended to be absolute but they forgot the leading slash. So we can look up the classpath in that case, but it must be the full originalLocation and not the relPath.
There was a problem hiding this comment.
Ok, I'll make the change. I do still need the fallback check of the classpath for the exception you described. The Asterix schemas reference DFDLGeneralFormat without a leading '/'.
| * file. Add this file to the accumulator list | ||
| */ | ||
| getReferences( | ||
| root, |
There was a problem hiding this comment.
I believe some projects do exist that make use of resource generators. We also don't know what other project might exists. They probably aren't common, but they could exists. And in those cases we will have separate resource directories and those files could reference each other. I feel it's important to make sure this works correctly and I don't think adds too much complexity. My latest review adds some comments about how we might be able to do this.
stevedlawrence
left a comment
There was a problem hiding this comment.
Looks good, most just just thoughts on refactor/cleanups, up to you to change or not.
Two issues:
- I still think we need consider the case of multiple referenceing files with the same flattened path. I can easily imagine hitting that and things breaking.
- I think the handling of seen/unseen at the end needs tweaking.
- Don't forget to revert the VERSION change
| case "file" => { | ||
| val path = Paths.get(contextURI) | ||
| val root = (Compile / resourceDirectories).value | ||
| .find(file => contextURI.toString.contains(file.toString)) |
There was a problem hiding this comment.
Can this be startsWith instead of contains? Should be slightly more efficient since it wont' have to scan the whole string, and the contextURI should start with one of the resource directories.
There was a problem hiding this comment.
Unfortunately it can't. The resource directories don't include "file:" whereas the URI's do.
stevedlawrence
left a comment
There was a problem hiding this comment.
Couple change suggests, and I think resolution migh still not be exactly right in some edge cases.
stevedlawrence
left a comment
There was a problem hiding this comment.
+1 minor cleanpus/suggestions
| `test/` directory. Source files are those that end with `*.scala` or `*.java`, | ||
| and resource files are anything else. | ||
|
|
||
| ### Flatten Schemas |
There was a problem hiding this comment.
Shouldn't this contain explanations of the other flatten options for completeness ex daffodilFlattenTarget and the regex one?
Notable improvements from the current released flowerpress: