-
Notifications
You must be signed in to change notification settings - Fork 102
DRAFT: fix: zip64 central header #629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @Its-Just-Nans, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue with the generation of ZIP64 central directory headers, specifically concerning the correct signaling of ZIP64 extensions. It introduces a mechanism to accurately determine when ZIP64 extensions are necessary for files exceeding standard ZIP limits and ensures that the central directory header properly indicates the presence of ZIP64 extra fields. Additionally, it refines the process of writing these extra fields to prevent potential errors or malformations, leading to more compliant ZIP archives. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
This PR attempts to fix ZIP64 central header handling but contains critical issues that must be addressed:
Critical Issues Found
-
Debug Code Left In: Line 2290 in
src/write.rscontains a debugprintln!statement that must be removed before merging. -
ZIP Specification Violation: The logic in
block()method (lines 972-995 insrc/types.rs) incorrectly setscompressed_size,uncompressed_size, andoffsetto 0xFFFFFFFF for ALL ZIP64 files, regardless of actual file sizes. According to the ZIP specification (APPNOTE 4.4.1.24), these fields should only be set to 0xFFFFFFFF when the actual values exceed the 32-bit threshold. The current implementation usesis_zip_64()which returns true based onlarge_fileflag or other conditions, not just actual size values. This creates non-compliant ZIP files.
Recommended Actions
- Remove the debug println statement
- Fix the ZIP64 field logic to only use 0xFFFFFFFF when actual field values exceed the threshold, not when
is_zip_64()returns true - Test with both small files that have
large_file=trueand actual large files to ensure correct behavior
The PR description mentions "two extra fields" which appears to be addressed by the write.rs change that avoids calling zip64_extra_field_block() twice, which is good. However, the core logic issue must be fixed first.
You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the ZIP64 handling logic, introducing an is_zip_64 helper function and modifying central directory block creation to resolve ZIP64 header issues. However, it introduces a significant logic error: header fields are globally set to the ZIP64 escape value (0xFFFFFFFF) when any field requires ZIP64, without ensuring all such fields are included in the ZIP64 extra field block. This, along with writing duplicate ZIP64 extra fields to the central directory, can lead to corrupted archives and security bypasses. Additionally, a debug print statement needs removal.
I am having trouble creating individual review comments. Click here to see my feedback.
src/types.rs (972-995)
When is_zip_64() is true, the code sets compressed_size, uncompressed_size, and offset to 0xFFFFFFFF (via spec::ZIP64_BYTES_THR as u32) in the ZipCentralEntryBlock. However, the corresponding ZIP64 extra field (created by zip64_extra_field_block()) only includes these fields if they individually exceed the threshold or if large_file is set. This creates an inconsistent ZIP file that violates the specification (APPNOTE.TXT 4.5.3), as fields marked with 0xFFFFFFFF in the main header MUST be present in the ZIP64 extra field. This will cause many ZIP readers to fail to parse the file or misinterpret the data.
src/types.rs (956-968)
The logic in is_zip_64 is inconsistent with how the ZIP64 extra field is created in zip64_extra_field_block. Specifically:
- It uses
>instead of>=for size and offset comparisons. The ZIP specification uses0xFFFFFFFFas a flag, so values equal to this threshold also require ZIP64 handling. - It doesn't account for the
self.large_fileflag, which also forces the use of a ZIP64 extra field for file sizes.
This inconsistency can lead to incorrect or non-optimal ZIP archives. The logic should be aligned with zip64_extra_field_block to ensure that whenever a ZIP64 extra field is created, the main header fields are correctly set as placeholders.
pub(crate) fn is_zip_64(&self) -> bool {
// TODO: Also check disk number (unsupported for now)
self.large_file
|| self.uncompressed_size >= spec::ZIP64_BYTES_THR
|| self.compressed_size >= spec::ZIP64_BYTES_THR
|| self.header_start >= spec::ZIP64_BYTES_THR
}src/write.rs (2290)
This println! statement appears to be a debug leftover and should be removed before merging. It will otherwise print to standard output for users of this library.
|
This MR is incorrect |
|
I think we should wait on pmqs/zipdetails#28 If it's merged, we should then be able to remove the code < line 1000 in |
Co-authored-by: amazon-q-developer[bot] <208079219+amazon-q-developer[bot]@users.noreply.github.com> Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
|
Amazon AI is correct but our current implementation is not This MR needs work and testing Also, we will need to think about large_file option which is causing our implementation to be incorrect |
Two ZIP64 extra fields is wrong. You only want one. The second one looks ok. The first is corrupt |
| } else { | ||
| None | ||
| }; | ||
| // TODO: (unsupported for now) |
Check notice
Code scanning / devskim
A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note
| } else { | ||
| None | ||
| }; | ||
| // TODO: (unsupported for now) |
Check notice
Code scanning / devskim
A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note
|
Lines 977 to 982 in 5fca23d
What's the reason of this code ? Are you sure about |
|
Well, if they reach 4GiB compressed size before the EOCD is written, then they have to be so that the EOCD offset can be written. |
On the zip64 attributes some can be optionnal In a case of a small file, placed inside the Zip file over 4GiB, only the But |
Fix #617
Currenlty there is also two extra field, that's not correct I think