@Its-Just-Nans
Member

Fix #617

Currently there are also two extra fields, which I don't think is correct.

125C00A50 CENTRAL HEADER #13    02014B50 (33639248)
125C00A54 Created Zip Spec      2D (45) '4.5'
125C00A55 Created OS            03 (3) 'Unix'
125C00A56 Extract Zip Spec      2D (45) '4.5'
125C00A57 Extract OS            00 (0) 'MS-DOS'
125C00A58 General Purpose Flag  0000 (0)
125C00A5A Compression Method    0000 (0) 'Stored'
125C00A5C Modification Time     5C461814 (1548097556) 'Thu Feb  5 20:00:40 2026'
125C00A60 CRC                   540AD08C (1409994892)
125C00A64 Compressed Size       FFFFFFFF (4294967295)
125C00A68 Uncompressed Size     FFFFFFFF (4294967295)
125C00A6C Filename Length       003C (60)
125C00A6E Extra Length          0038 (56)
125C00A70 Comment Length        0000 (0)
125C00A72 Disk Start            0000 (0)
125C00A74 Int File Attributes   0000 (0)
          [Bit 0]               0 'Binary Data'
125C00A76 Ext File Attributes   81ED0000 (2179792896)
          [Bits 16-24]          01ED (493) 'Unix attrib: rwxr-xr-x'
          [Bits 28-31]          08 (8) 'Regular File'
125C00A7A Local Header Offset   FFFFFFFF (4294967295)
125C00A7E Filename              'b803dca6-5fbc-57aa-be7f-cdf25bc43471/SUBDIR2/random_5M_9.bin'
125C00ABA Extra ID #1           0001 (1) 'ZIP64'
125C00ABC   Length              0018 (24)
125C00ABE   Uncompressed Size   00000000FFFFFFFF (4294967295)
125C00AC6   Compressed Size     00000000FFFFFFFF (4294967295)
125C00ACE   Offset to Local Dir 000000010680042A (4404020266)
125C00AD6 Extra ID #2           0001 (1) 'ZIP64'
125C00AD8   Length              0018 (24)
125C00ADA   Uncompressed Size   000000001F400000 (524288000)
125C00AE2   Compressed Size     000000001F400000 (524288000)
125C00AEA   Offset to Local Dir 000000010680042A (4404020266)

@gemini-code-assist
Contributor

Summary of Changes

Hello @Its-Just-Nans, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue with the generation of ZIP64 central directory headers, specifically concerning the correct signaling of ZIP64 extensions. It introduces a mechanism to accurately determine when ZIP64 extensions are necessary for files exceeding standard ZIP limits and ensures that the central directory header properly indicates the presence of ZIP64 extra fields. Additionally, it refines the process of writing these extra fields to prevent potential errors or malformations, leading to more compliant ZIP archives.

Highlights

  • Corrected ZIP64 Central Header Generation: Implemented logic to properly identify files requiring ZIP64 extensions and set the corresponding 32-bit size/offset fields in the central directory header to 0xFFFFFFFF to signal the presence of ZIP64 extra fields, as per the ZIP specification.
  • Optimized ZIP64 Extra Field Handling: Refactored the write_central_directory_header function to cache the result of zip64_extra_field_block() to ensure consistent calculation of extra field length and writing of the extra field, potentially resolving issues with duplicate or malformed ZIP64 extra fields.
  • Improved Path Display in Example: Updated the write_dir.rs example to use Path::display() for println! macros, providing more user-friendly and correct path representations for non-UTF-8 paths.
Changelog
  • examples/write_dir.rs
    • Changed println! macros to use path.display() and path_stripped.display() for better path representation.
    • Updated error message for non-UTF-8 paths to use path_stripped.display().
  • src/types.rs
    • Added is_zip_64 method to ZipFileData to check if a file requires ZIP64 extensions based on its compressed size, uncompressed size, or header start offset.
    • Modified ZipFileData::block to conditionally set compressed_size, uncompressed_size, and offset to spec::ZIP64_BYTES_THR (0xFFFFFFFF) if the file is ZIP64, otherwise truncating to u32.
  • src/write.rs
    • Stored the result of file.zip64_extra_field_block() in a local variable zip64_extra_field_block to avoid redundant calls and ensure consistency when calculating extra field length and writing the extra field.
    • Added a debug println!("UPDATE LOCAL"); in update_local_zip64_extra_field.
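
The point of caching the block can be sketched as follows. This is a minimal illustration, not the crate's code: `write_extra` and its closure parameter are hypothetical names, and the layout is simplified so that the 2-byte length sits directly before the data (in a real central header the filename lies between them). Taking an `FnOnce` makes it impossible to build the block twice, so the declared length and the written bytes always come from the same value.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes a 2-byte little-endian extra length followed
/// by the extra data. `make_block` stands in for a call such as
/// `zip64_extra_field_block()` and is invoked exactly once.
fn write_extra<W: Write>(
    writer: &mut W,
    make_block: impl FnOnce() -> Option<Vec<u8>>,
) -> io::Result<()> {
    // Build the block once; everything below derives from this single value.
    let block = make_block();
    let len = block.as_ref().map_or(0, |bytes| bytes.len());
    if len > u16::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "extra field does not fit in a 16-bit length",
        ));
    }
    writer.write_all(&(len as u16).to_le_bytes())?;
    if let Some(bytes) = &block {
        writer.write_all(bytes)?;
    }
    Ok(())
}
```

Calling it with `|| None` writes only the two zero length bytes; calling it with a 4-byte block writes `04 00` followed by those 4 bytes.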

Contributor

@amazon-q-developer amazon-q-developer bot left a comment

Summary

This PR attempts to fix ZIP64 central header handling but contains critical issues that must be addressed:

Critical Issues Found

  1. Debug Code Left In: Line 2290 in src/write.rs contains a debug println! statement that must be removed before merging.

  2. ZIP Specification Violation: The logic in block() method (lines 972-995 in src/types.rs) incorrectly sets compressed_size, uncompressed_size, and offset to 0xFFFFFFFF for ALL ZIP64 files, regardless of actual file sizes. According to the ZIP specification (APPNOTE 4.4.1.24), these fields should only be set to 0xFFFFFFFF when the actual values exceed the 32-bit threshold. The current implementation uses is_zip_64() which returns true based on large_file flag or other conditions, not just actual size values. This creates non-compliant ZIP files.

Recommended Actions

  • Remove the debug println statement
  • Fix the ZIP64 field logic to only use 0xFFFFFFFF when actual field values exceed the threshold, not when is_zip_64() returns true
  • Test with both small files that have large_file=true and actual large files to ensure correct behavior
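
A per-field version of that rule might look like the sketch below. The names `CentralFields`, `header_field`, and `central_fields` are illustrative assumptions, not identifiers from the crate; the idea is only that each 32-bit field is decided on its own value rather than on a file-wide flag.

```rust
/// Sentinel meaning "the real value is in the ZIP64 extra field".
const ZIP64_SENTINEL: u32 = 0xFFFF_FFFF;

/// The 32-bit size/offset fields of a central directory header
/// (illustrative type, not the crate's `ZipCentralEntryBlock`).
#[derive(Debug, PartialEq, Eq)]
struct CentralFields {
    compressed_size: u32,
    uncompressed_size: u32,
    local_header_offset: u32,
}

/// Stores the value as-is when it fits below the sentinel, otherwise the sentinel.
fn header_field(value: u64) -> u32 {
    if value >= u64::from(ZIP64_SENTINEL) {
        ZIP64_SENTINEL
    } else {
        value as u32
    }
}

/// Decides each field independently instead of using one file-wide flag.
fn central_fields(compressed: u64, uncompressed: u64, offset: u64) -> CentralFields {
    CentralFields {
        compressed_size: header_field(compressed),
        uncompressed_size: header_field(uncompressed),
        local_header_offset: header_field(offset),
    }
}
```

With the values from the dump in the PR description (sizes of 524288000 and a local header offset of 0x10680042A), this keeps both sizes as 0x1F400000 and replaces only the offset with 0xFFFFFFFF.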

The PR description mentions "two extra fields" which appears to be addressed by the write.rs change that avoids calling zip64_extra_field_block() twice, which is good. However, the core logic issue must be fixed first.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the ZIP64 handling logic, introducing an is_zip_64 helper function and modifying central directory block creation to resolve ZIP64 header issues. However, it introduces a significant logic error: header fields are globally set to the ZIP64 escape value (0xFFFFFFFF) when any field requires ZIP64, without ensuring all such fields are included in the ZIP64 extra field block. This, along with writing duplicate ZIP64 extra fields to the central directory, can lead to corrupted archives and security bypasses. Additionally, a debug print statement needs removal.

I am having trouble creating individual review comments. Click here to see my feedback.

src/types.rs (972-995)

security-high high

When is_zip_64() is true, the code sets compressed_size, uncompressed_size, and offset to 0xFFFFFFFF (via spec::ZIP64_BYTES_THR as u32) in the ZipCentralEntryBlock. However, the corresponding ZIP64 extra field (created by zip64_extra_field_block()) only includes these fields if they individually exceed the threshold or if large_file is set. This creates an inconsistent ZIP file that violates the specification (APPNOTE.TXT 4.5.3), as fields marked with 0xFFFFFFFF in the main header MUST be present in the ZIP64 extra field. This will cause many ZIP readers to fail to parse the file or misinterpret the data.

src/types.rs (956-968)

high

The logic in is_zip_64 is inconsistent with how the ZIP64 extra field is created in zip64_extra_field_block. Specifically:

  1. It uses > instead of >= for size and offset comparisons. The ZIP specification uses 0xFFFFFFFF as a flag, so values equal to this threshold also require ZIP64 handling.
  2. It doesn't account for the self.large_file flag, which also forces the use of a ZIP64 extra field for file sizes.

This inconsistency can lead to incorrect or non-optimal ZIP archives. The logic should be aligned with zip64_extra_field_block to ensure that whenever a ZIP64 extra field is created, the main header fields are correctly set as placeholders.

    pub(crate) fn is_zip_64(&self) -> bool {
        // TODO: Also check disk number (unsupported for now)
        self.large_file
            || self.uncompressed_size >= spec::ZIP64_BYTES_THR
            || self.compressed_size >= spec::ZIP64_BYTES_THR
            || self.header_start >= spec::ZIP64_BYTES_THR
    }

src/write.rs (2290)

high

This println! statement appears to be a debug leftover and should be removed before merging. It will otherwise print to standard output for users of this library.

@Its-Just-Nans
Member Author

This MR is incorrect

    The order of the fields in the zip64 extended information record is
    fixed, but the fields MUST only appear if the corresponding Local or
    Central directory record field is set to 0xFFFF or 0xFFFFFFFF.
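
Read literally, the quoted rule gives a record builder along these lines. It is a sketch under assumptions: `zip64_central_extra` is a hypothetical function, the disk start number is left out, and the field order (uncompressed size, compressed size, local header offset) is the one shown in the dump above.

```rust
/// Header ID of the ZIP64 extended information extra field.
const ZIP64_HEADER_ID: u16 = 0x0001;
/// Values at or above this cannot be stored in a 32-bit header field.
const ZIP64_THRESHOLD: u64 = 0xFFFF_FFFF;

/// Builds the ZIP64 record for a central directory entry, or an empty
/// vector when no field needs it. Only fields whose 32-bit header
/// counterpart would hold 0xFFFFFFFF are written, in the fixed order.
fn zip64_central_extra(uncompressed: u64, compressed: u64, offset: u64) -> Vec<u8> {
    let mut payload = Vec::new();
    for value in [uncompressed, compressed, offset] {
        if value >= ZIP64_THRESHOLD {
            payload.extend_from_slice(&value.to_le_bytes());
        }
    }
    if payload.is_empty() {
        return Vec::new();
    }
    // Record layout: 2-byte ID, 2-byte payload length, then the payload.
    let mut record = Vec::with_capacity(4 + payload.len());
    record.extend_from_slice(&ZIP64_HEADER_ID.to_le_bytes());
    record.extend_from_slice(&(payload.len() as u16).to_le_bytes());
    record.extend_from_slice(&payload);
    record
}
```

For the entry in the dump, this yields a single 12-byte record carrying only the offset, rather than a 24-byte record that repeats the sizes.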

@Its-Just-Nans
Member Author

Its-Just-Nans commented Feb 6, 2026

I think we should wait on pmqs/zipdetails#28

If it's merged, we should then be able to remove the code before line 1000 in types.rs (such as is_zip_64).

@Its-Just-Nans Its-Just-Nans marked this pull request as draft February 6, 2026 04:07
Its-Just-Nans and others added 3 commits February 5, 2026 22:59
Co-authored-by: amazon-q-developer[bot] <208079219+amazon-q-developer[bot]@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
@Its-Just-Nans
Member Author

Amazon AI is correct but our current implementation is not

This MR needs work and testing

Also, we will need to think about the large_file option, which is causing our implementation to be incorrect

@pmqs

pmqs commented Feb 6, 2026

Fix #617

Currently there are also two extra fields, which I don't think is correct.

125C00A50 CENTRAL HEADER #13    02014B50 (33639248)
125C00A54 Created Zip Spec      2D (45) '4.5'
125C00A55 Created OS            03 (3) 'Unix'
125C00A56 Extract Zip Spec      2D (45) '4.5'
125C00A57 Extract OS            00 (0) 'MS-DOS'
125C00A58 General Purpose Flag  0000 (0)
125C00A5A Compression Method    0000 (0) 'Stored'
125C00A5C Modification Time     5C461814 (1548097556) 'Thu Feb  5 20:00:40 2026'
125C00A60 CRC                   540AD08C (1409994892)
125C00A64 Compressed Size       FFFFFFFF (4294967295)
125C00A68 Uncompressed Size     FFFFFFFF (4294967295)
125C00A6C Filename Length       003C (60)
125C00A6E Extra Length          0038 (56)
125C00A70 Comment Length        0000 (0)
125C00A72 Disk Start            0000 (0)
125C00A74 Int File Attributes   0000 (0)
          [Bit 0]               0 'Binary Data'
125C00A76 Ext File Attributes   81ED0000 (2179792896)
          [Bits 16-24]          01ED (493) 'Unix attrib: rwxr-xr-x'
          [Bits 28-31]          08 (8) 'Regular File'
125C00A7A Local Header Offset   FFFFFFFF (4294967295)
125C00A7E Filename              'b803dca6-5fbc-57aa-be7f-cdf25bc43471/SUBDIR2/random_5M_9.bin'
125C00ABA Extra ID #1           0001 (1) 'ZIP64'
125C00ABC   Length              0018 (24)
125C00ABE   Uncompressed Size   00000000FFFFFFFF (4294967295)
125C00AC6   Compressed Size     00000000FFFFFFFF (4294967295)
125C00ACE   Offset to Local Dir 000000010680042A (4404020266)
125C00AD6 Extra ID #2           0001 (1) 'ZIP64'
125C00AD8   Length              0018 (24)
125C00ADA   Uncompressed Size   000000001F400000 (524288000)
125C00AE2   Compressed Size     000000001F400000 (524288000)
125C00AEA   Offset to Local Dir 000000010680042A (4404020266)

Two ZIP64 extra fields is wrong. You only want one.

The second one looks ok. The first is corrupt
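
The duplication is visible in the dump's arithmetic: the Extra Length of 0x38 (56) is exactly two records of 4 header bytes plus 24 payload bytes each. A regression test could count ZIP64 records in the raw extra area with something like the sketch below; `parse_extra` and `zip64_record_count` are hypothetical helpers, not part of the crate or of zipdetails.

```rust
/// Splits a raw extra-field area into (header ID, payload) records.
/// Returns `None` if a record header or payload is truncated.
fn parse_extra(mut data: &[u8]) -> Option<Vec<(u16, &[u8])>> {
    let mut records = Vec::new();
    while !data.is_empty() {
        if data.len() < 4 {
            return None;
        }
        let id = u16::from_le_bytes([data[0], data[1]]);
        let len = u16::from_le_bytes([data[2], data[3]]) as usize;
        let rest = &data[4..];
        if rest.len() < len {
            return None;
        }
        records.push((id, &rest[..len]));
        data = &rest[len..];
    }
    Some(records)
}

/// Counts records with the ZIP64 header ID (0x0001); a well-formed entry has at most one.
fn zip64_record_count(extra: &[u8]) -> Option<usize> {
    let records = parse_extra(extra)?;
    Some(records.iter().filter(|(id, _)| *id == 0x0001).count())
}
```

Fed the 56 bytes from the dump, `zip64_record_count` reports two records, which is the condition a test would reject.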

} else {
None
};
// TODO: (unsupported for now)

Check notice

Code scanning / devskim

A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note

Suspicious comment
// if self.header_start >= spec::ZIP64_BYTES_THR {
// return true;
// }
// // TODO: Also disk number (unsupported for now)

Check notice

Code scanning / devskim

A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note

Suspicious comment
@Its-Just-Nans
Member Author

@Pr0methean

zip2/src/write.rs

Lines 977 to 982 in 5fca23d

    // Check if we're close to the 4GB boundary and force ZIP64 if needed
    // This ensures we properly handle appending to files close to 4GB
    if header_start > spec::ZIP64_BYTES_THR {
        // Files that start on or past the 4GiB boundary are always ZIP64
        options.large_file = true;
    }

What's the reason for this code?

Are you sure that "Files that start on or past the 4GiB boundary are always ZIP64"?

@Pr0methean
Member

Well, if they reach 4GiB compressed size before the EOCD is written, then they have to be so that the EOCD offset can be written.

@Its-Just-Nans
Member Author

Its-Just-Nans commented Feb 8, 2026

Well, if they reach 4GiB compressed size before the EOCD is written, then they have to be so that the EOCD offset can be written.

Some of the ZIP64 attributes are optional.

For a small file placed past the 4GiB mark inside the ZIP file, only the local_offset needs to be in the ZIP64 attributes.

But large_file forces the compressed size and uncompressed size into the ZIP64 attributes as well.
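
The difference can be made concrete with a small model. Everything here is an assumption for illustration: `Zip64Fields` and `select_fields` are hypothetical, and `force_sizes` only mimics the effect attributed to large_file above, not the crate's actual logic.

```rust
/// Which 64-bit values go into a central-directory ZIP64 record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Zip64Fields {
    uncompressed_size: bool,
    compressed_size: bool,
    header_offset: bool,
}

const THRESHOLD: u64 = 0xFFFF_FFFF;

impl Zip64Fields {
    /// Payload length in bytes: 8 per included field.
    fn payload_len(self) -> u16 {
        8 * (self.uncompressed_size as u16
            + self.compressed_size as u16
            + self.header_offset as u16)
    }
}

/// Hypothetical selector. With `force_sizes` set, both sizes are included
/// even when they would fit in 32 bits, as described for large_file.
fn select_fields(
    uncompressed: u64,
    compressed: u64,
    offset: u64,
    force_sizes: bool,
) -> Zip64Fields {
    Zip64Fields {
        uncompressed_size: force_sizes || uncompressed >= THRESHOLD,
        compressed_size: force_sizes || compressed >= THRESHOLD,
        header_offset: offset >= THRESHOLD,
    }
}
```

For the entry in the dump (sizes 524288000, offset 0x10680042A), the minimal selection needs an 8-byte payload holding only the offset, while forcing the sizes gives 24 bytes, the 0x18 length seen in both records of the dump.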

Development

Successfully merging this pull request may close these issues.

Zipfile >4GB with complex dir tree is created incorrectly

3 participants