feat: mobster enrich WIP#301
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
There was a problem hiding this comment.
Please remove .DS_Store files from this PR.
There was a problem hiding this comment.
Bump of this comment.
BorekZnovustvoritel
left a comment
There was a problem hiding this comment.
Thank you for your contribution. In the current state the code cannot be comprehensively reviewed, as I expect many things to change from the TODOs and WIP in title.
Make sure to use our static checks even during development so that you don't have to make a huge refactor after the whole functionality is ready, but it doesn't pass static checks. To do this, set the local environment and run tox.
My main points & questions:
- Is this PR linked to some ticket? Having a clear goal defined would be great for reviewing
- What is the benefit of using those json mappings? Do these files have a spec that could be followed and enforced?
- The SBOMs are handled as dictionaries, which makes it prone to typos in key names and makes the code harder to read. Please try to invest some time into trying to use typed libraries for SPDX and CycloneDX (already used in parts of Mobster) and if suitable, consider using these instead of dictionaries which just have
Anyas their value type - After everything is ready, please add a documentation to your code
| @@ -0,0 +1,74 @@ | |||
| { | |||
There was a problem hiding this comment.
Do I understand it correctly that this file is supposed to hold the translation map from CDX to SPDX 3.0?
BorekZnovustvoritel
left a comment
There was a problem hiding this comment.
Thank you for the refactor. The code is much better and easier to read, although I think there are still same rough edges we would like to tackle. With some additional effort we can make this a new and polished feature, which I am very much looking forward to.
| ```bash | ||
| mobster enrich \ | ||
| --output output-sbom.json \ | ||
| oci-image \ |
There was a problem hiding this comment.
Proposal: We will be adding other oci-related enrichment features later (for example I am currently working on license enrichment of oci-image SBOMs). Maybe a different command name like ai-bom may be more fitting for your feature? So the usage would be mobster enrich --output foo.jso ai-bom --sbom bar.json, --enrichment-file foobar.json
| index = 0 | ||
| for component in sbom.components: |
There was a problem hiding this comment.
Nitpick: Can be refactored to for index, component in enumerate(sbom.components). The line index += 1 can be deleted in such case.
| outputter = make_outputter( | ||
| bom=sbom, | ||
| output_format=OutputFormat.JSON, | ||
| schema_version=SchemaVersion.V1_6, | ||
| ) | ||
| sbom_json = outputter.output_as_string() | ||
| sbom_dict: dict[str, Any] = json.loads(sbom_json) | ||
| await EnrichImageCommand.dump_model_card_to_dict(sbom, sbom_dict) |
There was a problem hiding this comment.
The Modelcard fields don't seem to be fully supported in CycloneDX library. I think we should reuse CycloneDX1BomWrapper (currently located in mobster.cmd.generate.oci_image.cyclonedx_wrapper). A new field (the modelcard) should be added, serialization and deserialization should be updated and the whole file should be moved somewhere where this module can also use it.
This way we can avoid complicating the dump function here.
| """ | ||
| if ( | ||
| self.cli_args.sbom is None or self.cli_args.enrichment_file is None | ||
| # and self.cli_args.image_pullspec is None |
There was a problem hiding this comment.
Commented-out code should not stay in the final code.
| ArgumentError: If the base sbom or enrichment is not provided. | ||
| """ | ||
| if ( | ||
| self.cli_args.sbom is None or self.cli_args.enrichment_file is None |
There was a problem hiding this comment.
Nitpick: this check may be redundant as the cli specifies these args as required. But it's ok to leave it as-is IMO.
| package.description = field_value | ||
| elif field_name == "licenses" and package.license_concluded is None: | ||
| package.license_concluded = spdx_licensing.parse(field_value) | ||
|
|
There was a problem hiding this comment.
Suggestion: We can also append a licenseComment that the license was concluded by comparing the package to HuggingFace AIBom (I will leave the exact wording up to you). We should make sure not to erase any previous licenseComments though. If some exist for the package, we should just append a new line and add our comment after that.
|
|
||
| # check for any AI specific fields | ||
| if field_name in ai_mappings: | ||
| spdx_ai_field_name = ai_mappings[field_name]["SPDX_Equivalent"] |
There was a problem hiding this comment.
I'm worried that the ai_mapping may get outdated without us noticing and then using [field_name] instead of .get(field_name, <some default>) raises KeyError.
I propose handling this gracefully with some warning log and skipping this field.
| except Exception as e: | ||
| raise e |
There was a problem hiding this comment.
Again, this handles nothing. Constructs like these are useful for logging. If we don't want to log anything in the except clause, we may as well remove it.
There was a problem hiding this comment.
Bump of this comment.
|
|
||
| if hasattr(component, "model_card"): | ||
| model_card = component.model_card | ||
| for field in model_card["properties"]: |
There was a problem hiding this comment.
We should never use mapping[attribute_name] when its schema doesn't define it as required. Always prefer mapping.get(attribute_name, default), which doesn't raise KeyError and kills the whole script. Missing attribute must be handled gracefully (log + skip is the best approach IMO).
Documentation for the enrich feature lives in docs/sboms/enrich.md, and a comprehensive architecture explanation lives in docs/sboms/enrich_architecture.md.
Note about implementation: almost everything has been updated to use spdx and cdx python packages. However, cyclonedx-tools-lib does not currently support modelCard (which is where the AI related fields live in CycloneDX). Therefore, the enrich feature loads in the whole modelCard as a dictionary into the Bom object, then uses dictionary indexing later on to access parts of it.
While cyclonedx-tools-lib does not support model_card, the python package needed to be updated to the latest version to support parsing a CycloneDX SBOM that has a model_card (old versions raise an error when it tried to parse a model_card, new versions just ignore it).