-
Notifications
You must be signed in to change notification settings - Fork 42
Replaceing the metadata parser from packaging.metadata #607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Replaceing the metadata parser from packaging.metadata #607
Conversation
8bd775c to
1247975
Compare
1247975 to
2132d61
Compare
2132d61 to
42ef5bf
Compare
cb3813b to
bdada13
Compare
|
@tiran @dhellmann PTAL |
dhellmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?
|
|
My bad. I should have checked all code to see if same pattern exists else where. I can see https://github.com/python-wheel-build/fromager/blob/main/src/fromager/candidate.py#L82 . I do not think we need a common function yet. PTAL and let me know. |
Let me get back to you on this. |
6e992d6 to
a1a70e7
Compare
|
@tiran Since fromager only reads metadata for dependency resolution and doesn't need to write it back, round trip safety isn't necessary. The benefits of type safety and validation from packaging.metadata outweigh the loss of unknown fields that aren't being used anyway. |
a1a70e7 to
d948afc
Compare
rd4398
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I will wait for @tiran to approve since he had clarification questions
src/fromager/bootstrapper.py
Outdated
| raw_metadata, _ = parse_email(f.read()) | ||
| metadata = Metadata.from_raw(raw_metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using parse_email() + Metadata.from_raw() instead of Metadata.parse_email()? The Metadata.parse_email() combines parse_email(), Metadata.from_raw(), and additional validation.
This code should probably use fromager.dependencies.parse_metadata(metadata_filename).
7517935 to
93625a4
Compare
93625a4 to
a077307
Compare
|
@tiran PTAL when you have a chance |
18668f8 to
90afccf
Compare
90afccf to
56b0acd
Compare
5554a83 to
c8ee75b
Compare
85151fb to
b5ab037
Compare
Replaces the Python email library parser with packaging.metadata.Metadata for parsing wheel/package metadata. Fixes: python-wheel-build#561 Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
b5ab037 to
7e8504b
Compare
| wheel_name_parts = wheel_filename.stem.split("-") | ||
| dist_name = wheel_name_parts[0] | ||
| dist_version = wheel_name_parts[1] | ||
| predicted_dist_info = f"{dist_name}-{dist_version}.dist-info/METADATA" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not invent our own wheel parsing algorithm. The function add_extra_metadata_to_wheels has some code to get the dist-info directory of a wheel file. Perhaps move the code into a common, shared helper?
| p = BytesParser() | ||
| metadata = p.parse(f, headersonly=True) | ||
| return Version(metadata["Version"]) | ||
| metadata = dependencies.parse_metadata(metadata_filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_metadata validates the metadata by default. This will raise an exception if the metadata version does not match the metadata content, e.g. license-file field in Metadata < 2.4.
| if predicted_dist_info in whl.namelist(): | ||
| metadata_content = whl.read(predicted_dist_info) | ||
| return parse_metadata(metadata_content, validate=validate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructs a list of all files in a wheel and then searches through the list of files. It's inefficient.
try:
metadata_content = whl.read(metadata_file)
except ...
| # Fallback to iterating if prediction fails (e.g., non-standard naming) | ||
| for entry in whl.namelist(): | ||
| if entry.endswith(".dist-info/METADATA"): | ||
| metadata_content = whl.read(entry) | ||
| return parse_metadata(metadata_content, validate=validate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is not necessary. A wheel file MUST contain a correct dist-info directory. If the dist-info directory does not match the wheel file name, then pip and uv will refuse to install the file.
$ mv fromager-0.75.0-py3-none-any.whl cheesemaker-0.75.0-py3-none-any.whl
$ pip install ./cheesemaker-0.75.0-py3-none-any.whl
Processing ./cheesemaker-0.75.0-py3-none-any.whl
ERROR: cheesemaker has an invalid wheel, .dist-info directory 'fromager-0.75.0.dist-info' does not start with 'cheesemaker'
Replaces the Python email library parser with packaging.metadata.Metadata for parsing wheel/package metadata.
Fixes #561