Conversation

@LalatenduMohanty
Member

@LalatenduMohanty LalatenduMohanty commented May 16, 2025

Replaces the Python email library parser with packaging.metadata.Metadata for parsing wheel/package metadata.

Fixes #561

@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from 8bd775c to 1247975 Compare May 19, 2025 10:45
@LalatenduMohanty LalatenduMohanty requested a review from a team as a code owner July 4, 2025 18:07
@LalatenduMohanty LalatenduMohanty changed the title [WIP] Replaceing the metadata parser from packaging.metadata Replaceing the metadata parser from packaging.metadata Jul 4, 2025
@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from cb3813b to bdada13 Compare July 4, 2025 18:35
@LalatenduMohanty
Member Author

@tiran @dhellmann PTAL

Member

@dhellmann dhellmann left a comment

This looks good to me. How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?

@tiran
Collaborator

tiran commented Jul 7, 2025

packaging.metadata.parse_email does not round-trip and loses any field that it does not understand. Is this okay? Do we need a round-trip-safe function?
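
To make the concern concrete, here is a minimal sketch of what happens to a field that packaging does not recognize. The sample metadata and the `X-Custom` header are invented for illustration. `parse_email` returns unrecognized fields in a second dictionary instead of the structured result, so a caller that discards that second value drops them.

```python
from packaging.metadata import parse_email

# Hypothetical METADATA content; "X-Custom" is an invented, non-standard field.
SAMPLE = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "X-Custom: hello\n"
)

# parse_email returns (raw, unparsed): known fields land in `raw`,
# anything it cannot map to a known field lands in `unparsed`.
raw, unparsed = parse_email(SAMPLE)

print(sorted(raw))  # the recognized fields only
print(unparsed)     # the unrecognized field, keyed by its lowercased header name
```

Code that only keeps `raw` cannot write the original file back out unchanged, which is the round-trip limitation raised here.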

@LalatenduMohanty
Member Author

How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?

My bad. I should have checked the rest of the code to see whether the same pattern exists elsewhere. I can see https://github.com/python-wheel-build/fromager/blob/main/src/fromager/candidate.py#L82. I do not think we need a common function yet. PTAL and let me know.

@LalatenduMohanty
Member Author

packaging.metadata.parse_email does not round-trip and loses any field that it does not understand. Is this okay? Do we need a round-trip-safe function?

Let me get back to you on this.

@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 3 times, most recently from 6e992d6 to a1a70e7 Compare July 7, 2025 19:42
@LalatenduMohanty
Member Author

@tiran Since fromager only reads metadata for dependency resolution and doesn't need to write it back, round trip safety isn't necessary. The benefits of type safety and validation from packaging.metadata outweigh the loss of unknown fields that aren't being used anyway.

Contributor

@rd4398 rd4398 left a comment

Looks good to me! I will wait for @tiran to approve, since he had clarification questions.

Comment on lines 787 to 788
raw_metadata, _ = parse_email(f.read())
metadata = Metadata.from_raw(raw_metadata)
Collaborator

Why are you using parse_email() + Metadata.from_raw() instead of Metadata.from_email()? Metadata.from_email() combines parse_email(), Metadata.from_raw(), and additional validation.

This code should probably use fromager.dependencies.parse_metadata(metadata_filename).
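
For comparison, a minimal sketch of the two-step and one-step forms side by side. It assumes the one-step constructor in packaging is the classmethod `Metadata.from_email`. The sample metadata is invented, and this is not fromager's `parse_metadata` implementation.

```python
from packaging.metadata import Metadata, parse_email

# Hypothetical METADATA content used only for this comparison.
SAMPLE = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Requires-Dist: requests>=2\n"
)

# Two-step form, as in the code under review: the unparsed fields are discarded.
raw, _unparsed = parse_email(SAMPLE)
two_step = Metadata.from_raw(raw)

# One-step form: parses, reports unparsed fields as errors, then validates.
one_step = Metadata.from_email(SAMPLE)

print(one_step.name, one_step.version)
print([str(req) for req in one_step.requires_dist])
```

For well-formed input both forms give the same result. They differ on input with unrecognized fields, which the one-step form rejects when validation is enabled.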

@LalatenduMohanty
Copy link
Member Author

@tiran PTAL when you have a chance

@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from 5554a83 to c8ee75b Compare January 26, 2026 15:14
@LalatenduMohanty LalatenduMohanty force-pushed the issue_561 branch 3 times, most recently from 85151fb to b5ab037 Compare January 27, 2026 03:45
Replaces the Python email library parser with packaging.metadata.Metadata
for parsing wheel/package metadata.

Fixes: python-wheel-build#561

Co-Authored-By: Claude <claude@anthropic.com>

Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
Comment on lines +478 to +481
wheel_name_parts = wheel_filename.stem.split("-")
dist_name = wheel_name_parts[0]
dist_version = wheel_name_parts[1]
predicted_dist_info = f"{dist_name}-{dist_version}.dist-info/METADATA"
Collaborator

Let's not invent our own wheel parsing algorithm. The function add_extra_metadata_to_wheels has some code to get the dist-info directory of a wheel file. Perhaps move the code into a common, shared helper?

p = BytesParser()
metadata = p.parse(f, headersonly=True)
return Version(metadata["Version"])
metadata = dependencies.parse_metadata(metadata_filename)
Collaborator

parse_metadata validates the metadata by default. This will raise an exception if the declared metadata version does not match the metadata content, e.g. a license-file field in metadata older than version 2.4.
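
A minimal sketch of that failure mode using packaging directly, not fromager's `parse_metadata`. To avoid depending on a packaging release that knows about metadata 2.4, it uses the `Dynamic` field (introduced in metadata 2.2) inside a file that declares version 2.1. The sample content and the helper name `validation_errors` are invented for illustration.

```python
from packaging.metadata import ExceptionGroup, Metadata

# Hypothetical METADATA: declares version 2.1 but uses a field added in 2.2.
SAMPLE = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Dynamic: requires-dist\n"
)


def validation_errors(data: str, validate: bool) -> list:
    """Return the validation messages raised while parsing, if any."""
    try:
        Metadata.from_email(data, validate=validate)
    except ExceptionGroup as group:
        return [str(exc) for exc in group.exceptions]
    return []


strict = validation_errors(SAMPLE, validate=True)
lenient = validation_errors(SAMPLE, validate=False)

print(strict)   # at least one message about the version mismatch
print(lenient)  # no errors are raised at parse time
```

With `validate=False` the checks are deferred, so a mismatch like this only surfaces if the offending attribute is accessed later.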

Comment on lines +485 to +487
if predicted_dist_info in whl.namelist():
    metadata_content = whl.read(predicted_dist_info)
    return parse_metadata(metadata_content, validate=validate)
Collaborator

whl.namelist() constructs a list of all files in the wheel, and the membership check then searches through that list. It's inefficient. Read the file directly and handle the error instead:

try:
    metadata_content = whl.read(metadata_file)
except ...
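
A runnable sketch of the same idea using only the standard library. It assumes the caught exception is `KeyError`, which `zipfile.ZipFile.read` raises when the archive has no member with the given name. The helper name `read_wheel_member` and the in-memory archive are invented for illustration.

```python
import io
import zipfile
from typing import Optional


def read_wheel_member(whl: zipfile.ZipFile, member: str) -> Optional[bytes]:
    """Read one archive member directly, without building the full name list."""
    try:
        return whl.read(member)
    except KeyError:
        # ZipFile.read raises KeyError when the member does not exist.
        return None


# Build a tiny in-memory archive that stands in for a wheel.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n",
    )

with zipfile.ZipFile(buf) as whl:
    found = read_wheel_member(whl, "demo-1.0.dist-info/METADATA")
    missing = read_wheel_member(whl, "other-1.0.dist-info/METADATA")

print(found is not None, missing is None)
```

The direct read is a lookup by name in the archive's index, so it avoids both the list construction and the linear scan.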

Comment on lines +489 to +493
# Fallback to iterating if prediction fails (e.g., non-standard naming)
for entry in whl.namelist():
    if entry.endswith(".dist-info/METADATA"):
        metadata_content = whl.read(entry)
        return parse_metadata(metadata_content, validate=validate)
Collaborator

This check is not necessary. A wheel file MUST contain a correct dist-info directory. If the dist-info directory does not match the wheel file name, then pip and uv will refuse to install the file.

$ mv fromager-0.75.0-py3-none-any.whl cheesemaker-0.75.0-py3-none-any.whl
$ pip install ./cheesemaker-0.75.0-py3-none-any.whl 
Processing ./cheesemaker-0.75.0-py3-none-any.whl

ERROR: cheesemaker has an invalid wheel, .dist-info directory 'fromager-0.75.0.dist-info' does not start with 'cheesemaker'

Development

Successfully merging this pull request may close these issues.

use packaging library to parse metadata instead of doing it ourselves

4 participants