feat: mobster enrich WIP#301

Open
gdozortsev wants to merge 2 commits into
konflux-ci:mainfrom
gdozortsev:dev

@gdozortsev gdozortsev commented Jan 8, 2026

Documentation for the enrich feature lives in docs/sboms/enrich.md, and a comprehensive architecture explanation lives in docs/sboms/enrich_architecture.md.

Implementation note: almost everything now uses the spdx and cdx Python packages. However, cyclonedx-tools-lib does not currently support modelCard (which is where the AI-related fields live in CycloneDX). The enrich feature therefore loads the whole modelCard into the Bom object as a dictionary and later uses dictionary indexing to access parts of it.
Although cyclonedx-tools-lib does not support model_card, the Python package had to be updated to the latest version to parse a CycloneDX SBOM that contains one: older versions raise an error when they try to parse a model_card, while newer versions simply ignore it.

snyk-io Bot commented Jan 8, 2026

Snyk checks have passed. No issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |
| Code Security | 0 | 0 | 0 | 0 | 0 issues |

Collaborator

Please remove .DS_Store files from this PR.

Contributor

Bump of this comment.

Contributor

@BorekZnovustvoritel BorekZnovustvoritel left a comment

Thank you for your contribution. In its current state the code cannot be reviewed comprehensively, as I expect many things to change given the TODOs and the WIP in the title.

Make sure to run our static checks even during development, so that you don't face a huge refactor once the functionality is complete but fails the checks. To do this, set up the local environment and run tox.

My main points & questions:

  • Is this PR linked to a ticket? Having a clearly defined goal would help the review.
  • What is the benefit of using those JSON mappings? Do these files have a spec that could be followed and enforced?
  • The SBOMs are handled as dictionaries, which makes the code prone to typos in key names and harder to read. Please invest some time in trying the typed libraries for SPDX and CycloneDX (already used in parts of Mobster) and, if suitable, use them instead of dictionaries that just have Any as their value type.
  • After everything is ready, please add documentation to your code.

Comment thread src/mobster/cmd/enrich/__init__.py Outdated
Comment thread src/mobster/sbom/enrich.py Outdated
Comment thread src/mobster/sbom/enrich.py Outdated
Comment thread src/mobster/sbom/enrich_tools/SPDXmappings2.3.json Outdated
@@ -0,0 +1,74 @@
{
Contributor

Do I understand it correctly that this file is supposed to hold the translation map from CDX to SPDX 3.0?

Comment thread src/mobster/sbom/enrich_tools/SPDXmappingAI.json Outdated
Comment thread src/mobster/sbom/enrich.py Outdated
Comment thread src/mobster/sbom/enrich.py Outdated
Comment thread src/mobster/sbom/enrich.py Outdated
Contributor

@BorekZnovustvoritel BorekZnovustvoritel left a comment

Thank you for the refactor. The code is much better and easier to read, although I think there are still some rough edges we would like to tackle. With some additional effort we can make this a new and polished feature, which I am very much looking forward to.

Comment thread docs/sboms/enrich.md
```bash
mobster enrich \
--output output-sbom.json \
oci-image \
```
Contributor

Proposal: We will be adding other OCI-related enrichment features later (for example, I am currently working on license enrichment of oci-image SBOMs). Maybe a different command name such as ai-bom would be more fitting for your feature? The usage would then be `mobster enrich --output foo.json ai-bom --sbom bar.json --enrichment-file foobar.json`.
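
A rough sketch of how the proposed layout could look with the standard library's argparse. How Mobster actually wires its CLI is not shown in this thread, so `build_parser` is a hypothetical helper; only the names `ai-bom`, `--output`, `--sbom` and `--enrichment-file` come from the proposal.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the proposed `mobster enrich ai-bom` layout."""
    parser = argparse.ArgumentParser(prog="mobster enrich")
    parser.add_argument("--output", type=Path, help="Where to write the enriched SBOM")
    subcommands = parser.add_subparsers(dest="target", required=True)

    ai_bom = subcommands.add_parser("ai-bom", help="Enrich an SBOM from an AI BOM")
    ai_bom.add_argument("--sbom", type=Path, required=True)
    ai_bom.add_argument("--enrichment-file", type=Path, required=True)

    # Further targets (for example license enrichment of oci-image SBOMs)
    # could be registered here later without renaming this one.
    return parser


args = build_parser().parse_args(
    ["--output", "foo.json", "ai-bom", "--sbom", "bar.json", "--enrichment-file", "foobar.json"]
)
print(args.target, args.sbom, args.enrichment_file, args.output)
```

Giving each enrichment its own sub-command keeps the option sets separate, so a later OCI-related enrichment would not have to share flags with the AI one.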

Comment on lines +40 to +41
index = 0
for component in sbom.components:
Contributor

Nitpick: this can be refactored to `for index, component in enumerate(sbom.components)`. The line `index += 1` can then be deleted.
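
For reference, a small self-contained comparison of the two loop styles. The `components` list is a stand-in for `sbom.components`, and the loop bodies are placeholders rather than the PR's code.

```python
components = ["model", "tokenizer", "dataset"]  # stand-in for sbom.components

# Manual counter, as in the reviewed lines.
manual = []
index = 0
for component in components:
    manual.append((index, component))
    index += 1

# Equivalent loop with enumerate: no counter to initialise or increment.
enumerated = []
for index, component in enumerate(components):
    enumerated.append((index, component))

print(manual == enumerated)
```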

Comment on lines +62 to +69
outputter = make_outputter(
    bom=sbom,
    output_format=OutputFormat.JSON,
    schema_version=SchemaVersion.V1_6,
)
sbom_json = outputter.output_as_string()
sbom_dict: dict[str, Any] = json.loads(sbom_json)
await EnrichImageCommand.dump_model_card_to_dict(sbom, sbom_dict)
Contributor

The modelCard fields don't seem to be fully supported in the CycloneDX library. I think we should reuse CycloneDX1BomWrapper (currently located in mobster.cmd.generate.oci_image.cyclonedx_wrapper). A new field (the model card) should be added, serialization and deserialization should be updated, and the whole file should be moved to a place where this module can also use it.

This way we can avoid complicating the dump function here.
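
One possible shape for such a wrapper, as a sketch only. This is not the existing CycloneDX1BomWrapper, whose interface is not shown in this thread: `BomWithModelCards`, `from_dict` and `to_dict` are hypothetical names, and a plain dictionary stands in for the library's Bom object so that the example runs without dependencies.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BomWithModelCards:
    """Hypothetical wrapper: a parsed BOM plus the model cards the library drops.

    `bom` is a plain dict here; in Mobster it would be the library's Bom object.
    """

    bom: dict[str, Any]
    # Raw modelCard dictionaries keyed by the owning component's bom-ref.
    model_cards: dict[str, dict[str, Any]] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, document: dict[str, Any]) -> "BomWithModelCards":
        """Split model cards off the raw JSON document before typed parsing."""
        document = copy.deepcopy(document)
        cards: dict[str, dict[str, Any]] = {}
        for component in document.get("components", []):
            bom_ref = component.get("bom-ref")
            # Only detach cards that can be re-attached later via their bom-ref.
            if bom_ref is not None and "modelCard" in component:
                cards[bom_ref] = component.pop("modelCard")
        return cls(bom=document, model_cards=cards)

    def to_dict(self) -> dict[str, Any]:
        """Serialize the BOM and put every model card back on its component."""
        document = copy.deepcopy(self.bom)
        for component in document.get("components", []):
            bom_ref = component.get("bom-ref")
            if bom_ref in self.model_cards:
                component["modelCard"] = copy.deepcopy(self.model_cards[bom_ref])
        return document


original = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "components": [
        {
            "bom-ref": "model-1",
            "type": "machine-learning-model",
            "name": "example",
            "modelCard": {"properties": [{"name": "task", "value": "text-generation"}]},
        }
    ],
}
wrapped = BomWithModelCards.from_dict(original)
print(wrapped.to_dict() == original)
```

Keeping the model cards next to the Bom means the command itself only calls the wrapper's serialization, instead of patching the JSON output afterwards.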

"""
if (
    self.cli_args.sbom is None or self.cli_args.enrichment_file is None
    # and self.cli_args.image_pullspec is None
Contributor

Commented-out code should not stay in the final code.

ArgumentError: If the base sbom or enrichment is not provided.
"""
if (
    self.cli_args.sbom is None or self.cli_args.enrichment_file is None
Contributor

Nitpick: this check may be redundant as the cli specifies these args as required. But it's ok to leave it as-is IMO.

    package.description = field_value
elif field_name == "licenses" and package.license_concluded is None:
    package.license_concluded = spdx_licensing.parse(field_value)

Contributor

Suggestion: We can also append a licenseComment that the license was concluded by comparing the package to HuggingFace AIBom (I will leave the exact wording up to you). We should make sure not to erase any previous licenseComments though. If some exist for the package, we should just append a new line and add our comment after that.
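
A minimal sketch of the append-without-erasing behaviour. `PackageStub` is a hypothetical stand-in for the SPDX package object, its `license_comment` attribute is assumed to match the library's name for licenseComments, and the note text is only example wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageStub:
    """Hypothetical stand-in for the SPDX package object used in the PR."""

    license_comment: Optional[str] = None


# Example wording only; the exact text is left to the author.
ENRICHMENT_NOTE = "License concluded by comparing the package to HuggingFace AIBom."


def append_license_comment(package: PackageStub, note: str = ENRICHMENT_NOTE) -> None:
    """Add `note` to the package's license comment, keeping earlier comments."""
    existing = package.license_comment
    if not existing:
        package.license_comment = note
    elif note not in existing.splitlines():
        # Keep what was there and add the note on a new line.
        package.license_comment = f"{existing}\n{note}"


package = PackageStub(license_comment="Reviewed manually.")
append_license_comment(package)
print(package.license_comment)
```

The `splitlines()` check is an extra design choice so that enriching the same SBOM twice does not duplicate the note.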


# check for any AI specific fields
if field_name in ai_mappings:
    spdx_ai_field_name = ai_mappings[field_name]["SPDX_Equivalent"]
Contributor

I'm worried that ai_mappings may become outdated without us noticing, in which case using [field_name] instead of .get(field_name, <some default>) raises a KeyError.

I propose handling this gracefully with some warning log and skipping this field.
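
The log-and-skip handling could look roughly like this. `resolve_spdx_field` is a hypothetical helper, and the sample mapping entries are invented for the example; only the `SPDX_Equivalent` key is taken from the reviewed code.

```python
import logging
from typing import Any, Optional

LOGGER = logging.getLogger(__name__)


def resolve_spdx_field(
    field_name: str, ai_mappings: dict[str, dict[str, Any]]
) -> Optional[str]:
    """Return the SPDX name mapped to `field_name`, or None if it is unusable."""
    mapping = ai_mappings.get(field_name)
    if mapping is None:
        LOGGER.warning("No AI mapping for field %r, skipping it.", field_name)
        return None
    spdx_name = mapping.get("SPDX_Equivalent")
    if not spdx_name:
        LOGGER.warning("Mapping for %r has no SPDX_Equivalent, skipping it.", field_name)
        return None
    return spdx_name


# Invented sample data: one complete entry and one outdated entry.
sample_mappings = {
    "exampleField": {"SPDX_Equivalent": "exampleSpdxField"},
    "outdatedField": {},
}
for name in ("exampleField", "outdatedField", "unknownField"):
    print(name, "->", resolve_spdx_field(name, sample_mappings))
```

A caller would `continue` to the next field whenever the helper returns None, so one stale mapping entry no longer aborts the whole enrichment.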

Comment on lines +469 to +470
except Exception as e:
    raise e
Contributor

Again, this handles nothing. Constructs like these are useful for logging. If we don't want to log anything in the except clause, we may as well remove it.
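
The two acceptable variants can be sketched as follows. What the original try block wraps is not visible in this thread, so JSON parsing is used as a placeholder, and both function names are hypothetical.

```python
import json
import logging
from typing import Any

LOGGER = logging.getLogger(__name__)


def load_enrichment_logged(raw: str) -> Any:
    """Variant 1: keep the except clause because it adds a log entry."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        LOGGER.exception("Enrichment file is not valid JSON.")
        raise  # a bare raise re-raises the same exception with its traceback


def load_enrichment_plain(raw: str) -> Any:
    """Variant 2: nothing to add, so there is no try/except at all."""
    return json.loads(raw)


print(load_enrichment_logged('{"a": 1}') == load_enrichment_plain('{"a": 1}'))
```

Both variants propagate the error to the caller; the first one additionally records where it happened.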

Contributor

Bump of this comment.


if hasattr(component, "model_card"):
    model_card = component.model_card
    for field in model_card["properties"]:
Contributor

We should never use mapping[attribute_name] when the schema doesn't define the attribute as required. Always prefer mapping.get(attribute_name, default), which does not raise a KeyError and kill the whole script. A missing attribute must be handled gracefully (log + skip is the best approach IMO).
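
Applied to the loop above, the approach might look like the sketch below. `iter_model_card_properties` is a hypothetical helper, and it assumes each property entry is a dictionary with `name` and `value` keys.

```python
import logging
from typing import Any, Iterator

LOGGER = logging.getLogger(__name__)


def iter_model_card_properties(model_card: dict[str, Any]) -> Iterator[tuple[str, Any]]:
    """Yield (name, value) pairs, skipping anything that is missing or unnamed."""
    properties = model_card.get("properties")
    if not properties:
        LOGGER.warning("Model card has no properties, nothing to enrich from it.")
        return
    for entry in properties:
        name = entry.get("name")
        if name is None:
            LOGGER.warning("Skipping model card property without a name: %r", entry)
            continue
        yield name, entry.get("value")


card = {"properties": [{"name": "task", "value": "text-generation"}, {"value": "orphan"}]}
print(list(iter_model_card_properties(card)))
print(list(iter_model_card_properties({})))
```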
