round-trip encoding for extra zero bytes#13
Conversation
|
Thank you for PR! Two notes though:
P.S. Some of source code lines are not formatted (according to cargo fmt). I have added |
There was a problem hiding this comment.
Pull request overview
This PR aims to preserve and round-trip “meaningless” 0x00 padding bytes that can appear before/between BER-TLV objects (per EMV guidance), so re-encoding after parsing can reproduce the original byte stream (important for EMV/DDA hash composition use cases).
Changes:
- Added
leading_zeroestracking onTlvand updated parsing to capture leading0x00bytes rather than discarding them. - Updated
to_vec()to emit any preserved leading zero bytes. - Adjusted quickcheck and unit tests to reflect the new round-trip behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/tlv.rs |
Stores and emits leading 0x00 padding around TLVs; updates parsing logic and unit tests. |
src/lib.rs |
Changes TlvError::InvalidTagNumber shape to support internal “is zero padding” detection. |
tests/lib.rs |
Updates quickcheck property to stop stripping leading zeros before comparing round-tripped bytes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| .next() | ||
| .cloned() | ||
| .ok_or_else(|| TlvError::TruncatedTlv)?; | ||
| let first: u8 = iter.next().cloned().ok_or_else(|| TlvError::TruncatedTlv)?; |
There was a problem hiding this comment.
read_tag() no longer skips leading 0x00 bytes (previously it did). from_iter() now re-implements skipping/counting, but other callers like parse_tag_list() still call read_tag() directly, so their behavior changes to error on padding zeros. If padding zeros should still be tolerated there, consider keeping a “skip 0x00” variant of tag parsing for non-roundtrip use cases, or move the zero-skipping logic back into read_tag() behind a mode/flag.
| let first: u8 = iter.next().cloned().ok_or_else(|| TlvError::TruncatedTlv)?; | |
| // Skip leading 0x00 bytes (padding) before reading the actual tag. | |
| let first: u8 = loop { | |
| let byte = iter.next().cloned().ok_or_else(|| TlvError::TruncatedTlv)?; | |
| if byte == 0x00 { | |
| continue; | |
| } else { | |
| break byte; | |
| } | |
| }; |
| pub enum TlvError { | ||
| TruncatedTlv, | ||
| InvalidLength, | ||
| InvalidTagNumber, | ||
| InvalidTagNumber { is_zero: bool }, | ||
| TooShortBody { expected: usize, found: usize }, |
There was a problem hiding this comment.
Renaming InvalidTagNumber from a unit variant to InvalidTagNumber { is_zero: bool } is a public API breaking change for downstream code that pattern-matches on TlvError. Since is_zero is only used internally to detect/record leading padding bytes (and isn’t reflected in the Display message), consider handling that case without encoding internal details into the public error type (e.g., by peeking/consuming 0x00 bytes in the parser loop, or by using a private/internal error and mapping to InvalidTagNumber).
| pub fn to_vec(&self) -> Vec<u8> { | ||
| let mut out: Vec<u8> = (self.tag as u64) | ||
| let mut out: Vec<u8> = vec![0; self.leading_zeroes]; | ||
|
|
||
| out.append(&mut (self.tag as u64) | ||
| .to_be_bytes() | ||
| .iter() | ||
| .skip_while(|&&x| x == 0) | ||
| .cloned() | ||
| .collect(); | ||
| .collect()); | ||
|
|
||
| out.append(&mut self.val.encode_len()); |
There was a problem hiding this comment.
to_vec() now prepends leading_zeroes, but Tlv::len()/Value::len() (used by encode_len() and by Tlv::new() when splitting a constructed Value::Val) do not account for those bytes. This makes constructed TLVs emit an incorrect length field and can break re-parsing/indexing when a child TLV has leading zeroes. Include leading_zeroes in the encoded-size calculation (and any other length math) or move the preserved padding out of the TLV byte stream produced by to_vec() so length remains consistent.
|
(Sorry for the delay here, will try to get through the rest of this tomorrow!) |
Unfortunately, I've realized that having round-trip encoding of extra zero bytes is important e.g. for being able to correctly compose the DDA data for checking the hash in the decoded ICC certificate.
It's not super elegant to add a fourth field to a "TLV" element, but I don't know a better way to do it. And I remembered to update the tests this time...