chore: release v0.1.2 by MagicalTux · Pull Request #4 · OxideAV/oxideav-pdf

MagicalTux · 2026-05-10T06:03:15Z

🤖 New release

oxideav-pdf: 0.1.1 -> 0.1.2

Changelog

0.1.2 - 2026-05-12

Other

PDF /Sig annotation writer (ISO 32000-1 §12.7.4.5 + §12.8.1 + RFC 5652 §5 + §5.4 + §11.2)

reading-order layout pass over Tagged PDF StructTreeRoot (ISO 32000-1 §14.6 + §14.7 + §14.8)

simple-font /Encoding /Differences resolver wired into text extraction (ISO 32000-1 §9.6.6.1 + §D.2 + AGL v2.0)

linearization param dict + hierarchy validator + PDF/A signals

annotations beyond Link (Text/FreeText/Stamp/markup/geometry/Widget) + XMP packet field extraction (DC/XMP/PDF/PDF-A)

PDF outline (bookmarks) tree + Link annotations

CMS KARI X448 ECDH (RFC 7748 §5 + RFC 8410 §3 + RFC 8418 §2.1+§2.2)

JPEG passthrough on /DCTDecode Image XObjects (ISO 32000-1 §7.4.8 + §8.9)

PDF text extraction (ISO 32000-1 §9 + §9.10)

PDF /Sig annotation reader (ISO 32000-1 §12.7.4.5 + §12.8.1)

Added

Round 30: PDF /Sig annotation writer (ISO 32000-1 §12.7.4.5 +
§12.8.1 + RFC 5652 §5 + §5.4 + §11.2). Symmetric encoder side of
the round-21 reader + round-27 verifier: given an
[oxideav_scene::Scene] + a [Signer] + a signer-cert chain, the
new writer emits a signed PDF whose AcroForm contains a /FT /Sig
terminal field whose /V points at a signature dictionary
(/Type /Sig /Filter /Adobe.PPKLite /SubFilter /adbe.pkcs7.detached)
carrying valid /ByteRange placeholders + a hex-encoded CMS
SignedData ContentInfo blob. The classic "ByteRange-placeholder
fill-in" pattern of §12.8.1.1 is implemented end-to-end:

Step 1 — the base PDF is rendered via the existing
write_pdf_from_scene.

Step 2 — an incremental-update revision (§7.5.6) is appended that
overrides the Catalog with /AcroForm <ref>, plus an AcroForm
dict (/Fields [<sig-field-ref>] /SigFlags 3), a Sig form
field (/FT /Sig /T (Signature1)), and a Sig dictionary with
fixed-width /ByteRange (4 × 10-digit slots) +
/Contents <0…0> (8192 hex chars = 4096 raw bytes — enough
for any RSA-2048 / ECDSA-P256 SHA-256 SignedData with a single
signer + cert).

Step 3 — /ByteRange is patched in place with the actual offsets
(the four integers themselves are inside the signed range, so they
reach their final value BEFORE the hash is computed).

Step 4 — the bytes named by /ByteRange are SHA-256-hashed, the
hash is wrapped into a CAdES-BES-style signedAttrs SET
(contentType 1.2.840.113549.1.9.3 = id-data, messageDigest
1.2.840.113549.1.9.4 = SHA-256(signed) per RFC 5652 §11.2), the
SET is canonical-re-tagged from [0] IMPLICIT to the universal
SET tag per §5.4 and hashed, and the resulting digest is signed by
the Signer.

Step 5 — the signature is wrapped in a CMS SignedData
ContentInfo (version=1, single SignerInfo,
IssuerAndSerialNumber slot, full cert chain in the
SET-of-CertificateChoices field, detached eContent),
hex-encoded, and overwritten into the /Contents placeholder
(length-preserving — the bytes between < and > are the
EXCLUDED range under /ByteRange, so this write does not
invalidate the hash computed in step 4).
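The placeholder arithmetic in steps 2, 3 and 5 can be illustrated with a minimal std-only sketch. It is not the crate's implementation: the helper names (`find_contents_gap`, `patch_byte_range`, `signed_bytes`, `fill_contents`) are hypothetical, and it assumes a single `/Contents <...>` placeholder and a single `/ByteRange [` key in the buffer. It follows the convention stated in the tests below (range 1 ends on `<`, range 2 starts on `>`).

```rust
/// Width of each /ByteRange integer slot (zero-padded decimal, 10 digits).
const SLOT: usize = 10;

/// Return the byte index of `<` and of `>` for the /Contents placeholder.
fn find_contents_gap(pdf: &[u8]) -> Option<(usize, usize)> {
    let key: &[u8] = b"/Contents <";
    let lt = pdf.windows(key.len()).position(|w| w == key)? + key.len() - 1;
    let gt = lt + pdf[lt..].iter().position(|&b| b == b'>')?;
    Some((lt, gt))
}

/// Overwrite the four fixed-width slots with [0, len1, start2, len2].
/// The file length does not change, so the offsets stay valid.
fn patch_byte_range(pdf: &mut [u8]) -> Option<[usize; 4]> {
    let (lt, gt) = find_contents_gap(pdf)?;
    let r = [0, lt + 1, gt, pdf.len() - gt];
    let key: &[u8] = b"/ByteRange [";
    let at = pdf.windows(key.len()).position(|w| w == key)? + key.len();
    let text = format!("{:0w$} {:0w$} {:0w$} {:0w$}", r[0], r[1], r[2], r[3], w = SLOT);
    pdf.get_mut(at..at + text.len())?.copy_from_slice(text.as_bytes());
    Some(r)
}

/// Concatenate the two ranges named by /ByteRange (the input to the hash).
fn signed_bytes(pdf: &[u8], r: [usize; 4]) -> Vec<u8> {
    let mut out = pdf[r[0]..r[0] + r[1]].to_vec();
    out.extend_from_slice(&pdf[r[2]..r[2] + r[3]]);
    out
}

/// Hex-encode the CMS blob into the gap; trailing zeros remain as padding.
fn fill_contents(pdf: &mut [u8], cms_der: &[u8]) -> Option<()> {
    let (lt, gt) = find_contents_gap(pdf)?;
    let hex: String = cms_der.iter().map(|b| format!("{:02x}", b)).collect();
    if hex.len() > gt - lt - 1 {
        return None; // blob does not fit the reserved placeholder
    }
    pdf[lt + 1..lt + 1 + hex.len()].copy_from_slice(hex.as_bytes());
    Some(())
}
```

Because `fill_contents` only touches bytes strictly between `<` and `>`, calling `signed_bytes` before and after it yields the same buffer, which is why the step-4 hash stays valid.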

New public surface under oxideav_pdf::sig:

pub trait Signer { fn algorithm() -> SigningAlgorithm; fn sign(&self, tbs_hash: &[u8]) -> Result<Vec<u8>, PdfError>; }
— abstract signing primitive; user plugs in whatever crypto
stack they want (ring, hardware token, HSM, ...). The trait
receives a SHA-2 digest and returns wire-form signature octets
(PKCS#1 v1.5 padded big-endian for RSA, DER-encoded
Ecdsa-Sig-Value for ECDSA).

SigningAlgorithm { RsaPkcs1v15Sha256, EcdsaP256Sha256 } —
enum of the two algorithm slots round 30 ships; the writer
picks the right CMS digestAlgorithm (SHA-256) +
signatureAlgorithm (rsaEncryption / ecdsa-with-SHA256) OIDs
based on the implementor's choice.

RsaPkcs1v15Sha256Signer / EcdsaP256Sha256Signer —
reference Signer impls that wrap the in-crate rsa / p256
deps (no new crypto deps added for the writer).

SignerIdentity { issuer_der, serial, cert_chain } —
decoupled identity bundle; from_signer_cert_der(der) is the
convenience constructor for the typical single-cert
self-signed deployment.

SigWriter::new(scene, signer, identity).sign() -> Vec<u8> —
the builder.

sign_pdf_from_scene(scene, signer, identity) -> Vec<u8> —
one-shot convenience wrapper.

pkcs7_wrap_signed_data(algorithm, issuer_der, serial, cert_chain, signed_attrs_body, signature_bytes) -> Vec<u8> —
standalone CMS DER builder; useful when stitching a signed PDF
together at a lower level than SigWriter.

Six integration tests under tests/sig_writer_round30.rs cover:

RSA-PKCS#1 v1.5 + SHA-256 round-trip (writer → round-21
reader → round-20 verify_signature end-to-end).

ECDSA-P256 + SHA-256 round-trip.

/ByteRange placeholder filled correctly (start = 0, second
range starts at the > after a fixed 8192-byte-wide
/Contents gap, two ranges cover everything but the gap, last
byte of range 1 is <, first byte of range 2 is >).

Tamper-detection (flipping a body byte fails the
messageDigest cross-check per RFC 5652 §11.2).

qpdf --check accepts the RSA-signed PDF.

qpdf --check accepts the ECDSA-signed PDF.

Provenance: ISO 32000-1 §12.7.4.5 + §12.8.1 + §7.5.6 (incremental
updates) + RFC 5652 §5 + §5.4 + §11.1 (contentType attribute) +
§11.2 (messageDigest attribute) + RFC 5754 §2 (SHA-256 with NULL
params in CMS) + RFC 5753 §2.1 (ECDSA Ecdsa-Sig-Value SEQUENCE).
No third-party PDF / CMS source consulted.

Round 29: Reading-order layout pass over Tagged PDF
StructTreeRoot (ISO 32000-1 §14.6 + §14.7 + §14.8). New
oxideav_pdf::reader::layout::read_in_logical_order(reader) — and
the convenience DocumentReader::read_in_logical_order() — walks
the catalog's /StructTreeRoot /K tree and emits text runs in
author-intended reading order rather than the painter's raster
order. For a 2-column document, raster extraction interleaves
column 1's first row, column 2's first row, column 1's second row,
…; the round-29 pass walks [Sect_col1, Sect_col2] and emits all
of column 1 before any of column 2. The walker handles every leaf
shape ISO 32000-1 §14.7.4.4 defines:

Bare-integer MCID kids resolve against the inheritable /Pg
field on the nearest ancestor.

<</Type /MCR /Pg p /MCID m>> marked-content references override
the inherited /Pg, supporting Tagged tables whose rows draw
from multiple pages.

<</Type /OBJR …>> object references (annotations, not content)
are skipped — they carry no text.

Nested /StructElem kids (Sect inside Div inside …) recurse;
indirect refs are followed with a 64-deep cycle guard.

Documents without a /StructTreeRoot (or a malformed / empty
tree) fall back to the existing raster-order extraction with
LayoutMode::Raster set on the return so callers can branch.
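The leaf handling described above can be sketched as a small recursive walk. This is an illustration under stated assumptions, not the crate's code: the `Kid` enum, the `walk` and `emit` functions, and the `(page, mcid) -> text` map are hypothetical stand-ins for the real structure tree and the marked-text extraction.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a structure-element kid.
enum Kid {
    /// Bare-integer MCID: the page comes from the nearest ancestor's /Pg.
    Mcid(u32),
    /// Marked-content reference; its own /Pg overrides the inherited one.
    Mcr { page: Option<u32>, mcid: u32 },
    /// Object reference (annotation): carries no text, so it is skipped.
    Objr,
    /// Nested structure element with an optional /Pg and its own kids.
    Elem { page: Option<u32>, kids: Vec<Kid> },
}

/// Depth limit mirroring the 64-deep guard mentioned in the text.
const MAX_DEPTH: usize = 64;

/// Push the text painted under (page, mcid), if both are known.
fn emit(page: Option<u32>, mcid: u32, text: &HashMap<(u32, u32), String>, out: &mut Vec<String>) {
    if let Some(p) = page {
        if let Some(t) = text.get(&(p, mcid)) {
            out.push(t.clone());
        }
    }
}

/// Depth-first walk in structure order, which is the logical reading order.
fn walk(
    kids: &[Kid],
    inherited_pg: Option<u32>,
    depth: usize,
    text: &HashMap<(u32, u32), String>,
    out: &mut Vec<String>,
) {
    if depth > MAX_DEPTH {
        return; // guard against cyclic or pathologically deep trees
    }
    for kid in kids {
        match kid {
            Kid::Mcid(m) => emit(inherited_pg, *m, text, out),
            Kid::Mcr { page, mcid } => emit((*page).or(inherited_pg), *mcid, text, out),
            Kid::Objr => {}
            Kid::Elem { page, kids } => {
                walk(kids, (*page).or(inherited_pg), depth + 1, text, out)
            }
        }
    }
}
```

With a two-column page whose raster order is A1, B1, A2, a tree of two section elements holding MCIDs `[0, 2]` and `[1]` yields A1, A2, B1, that is, all of column 1 before column 2.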

The pass piggybacks on a round-29 addition to the round-22 text
walker: the new extract_text_marked(reader) (and matching
DocumentReader::marked_text_extraction()) emits every text run
alongside the marked-content /MCID it was painted under (ISO
32000-1 §14.6 — BDC / BMC / EMC operators). The walker
recognises BDC / BMC / EMC / MP / DP keywords and parses
the /MCID slot out of inline <</MCID n>> property dicts at the
top level. New public surfaces under oxideav_pdf::reader:

MarkedTextRun { run, mcid, page_obj_num, page_index }

PdfMarkedTextExtraction { runs }

LayoutMode { Tagged, Raster }

ReadingOrderText { mode, runs } (with flat_text())

Seven fixtures under tests/reading_order_round29.rs cover:
two-column tagged-PDF logical reordering vs. raster baseline,
non-tagged fallback, cross-page MCRs (/MCR /Pg ... /MCID ...),
marked-text MCID accounting, and nested /Sect > /P > MCID
recursion. No external library was consulted.

Round 28: Simple-font /Encoding /Differences resolver wired into
text extraction (ISO 32000-1 §9.6.6.1 + §D.2 + Adobe Glyph List v2.0
public document). When a simple Type1 / TrueType / Type3 font carries
an encoding dictionary (not just a name) the reader now overlays the
/Differences array onto the /BaseEncoding map before mapping bytes
back to Unicode. Three new public surfaces under
oxideav_pdf::reader::encoding:

parse_encoding_differences(arr) -> EncodingDifferences walks the
flat [N name1 name2 … M nameK …] form per §9.6.6.1 — numeric
tokens reset the running code, names land at consecutive slots,
unknown tokens are tolerated. Honours Object::Integer AND
Object::Real numeric forms.

apply_encoding_differences(&base, &diffs) -> EncodingMap overlays
one parsed array on top of any of the six named BaseEncoding
variants (WinAnsi / MacRoman / MacExpert / Standard /
Symbol / ZapfDingbats). Unknown glyph names leave the slot
empty so the decoder emits U+FFFD as a marker (matching what
pdftotext --raw does for un-resolvable glyphs).

EncodingMap::from_base(BaseEncoding) ships a 256-entry table per
Annex D.2 / D.4 / D.5 / D.6 plus the Adobe Type 1 Standard
encoding. Multi-character glyph expansions (/fi → "fi", /fl →
"fl") are accommodated; the table slot is a short String rather
than a single char.
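A minimal sketch of the running-code rule and the overlay, assuming a simplified token type. The names `Tok`, `parse_differences`, `glyph_text` and `decode` are hypothetical and are not the crate's API. The base encoding is reduced to printable ASCII and the glyph table to four names for brevity, and the ligature expansion follows the multi-character convention described above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the elements of a /Differences array.
enum Tok {
    Int(i64),
    Real(f64),
    Name(String),
    Other,
}

/// Numeric tokens reset the running code; each name lands on the current
/// code, which then advances by one. Unknown tokens are ignored.
fn parse_differences(arr: &[Tok]) -> BTreeMap<u8, String> {
    let mut map = BTreeMap::new();
    let mut code: Option<i64> = None;
    for t in arr {
        match t {
            Tok::Int(n) => code = Some(*n),
            Tok::Real(r) => code = Some(*r as i64),
            Tok::Name(name) => {
                if let Some(c) = code {
                    if (0..=255).contains(&c) {
                        map.insert(c as u8, name.clone());
                    }
                    code = Some(c + 1);
                }
            }
            Tok::Other => {}
        }
    }
    map
}

/// Tiny illustrative glyph-name table (a real one is far larger).
fn glyph_text(name: &str) -> Option<&'static str> {
    match name {
        "fi" => Some("fi"),
        "fl" => Some("fl"),
        "quoteright" => Some("\u{2019}"),
        "alpha" => Some("\u{03B1}"),
        _ => None,
    }
}

/// Decode bytes: overridden slots go through the glyph table, other bytes
/// fall back to the base (here: printable ASCII only, an assumption).
fn decode(bytes: &[u8], diffs: &BTreeMap<u8, String>) -> String {
    let missing = || "\u{FFFD}".to_string();
    bytes
        .iter()
        .map(|b| match diffs.get(b) {
            Some(name) => glyph_text(name).map(str::to_string).unwrap_or_else(missing),
            None if b.is_ascii_graphic() || *b == b' ' => (*b as char).to_string(),
            None => missing(),
        })
        .collect()
}
```

For the array `[39 /quoteright 128.0 /fi /fl]`, codes 39, 128 and 129 are overridden; a name missing from the glyph table decodes to U+FFFD rather than aborting extraction.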

The Adobe Glyph List subset shipped with the resolver covers the
PostScript Latin character set, common Greek letters, smart-quote /
dash / fraction set, math operators, arrows, and the /fi and /fl
ligatures — about 320 glyph names. Extension to the full ~4280-line
AGL is round-29+. Glyph list staged under
docs/document/pdf/agl/subset.txt and the README there cites the AGL
v2.0 public-document source. Seven new fixtures under
tests/encoding_differences_round28.rs cover smart-quote overrides,
Greek glyph remap, /fi / /fl ligature expansion, multi-segment
arrays with running-code resets, unknown-glyph replacement-char
fallback, empty /Differences, and /MacRomanEncoding base
encoding. Three of them feed the fixture PDF to a system pdftotext
binary when available and assert the extracted text contains the
expected substring.

Round 27: Linearization Parameter Dictionary reader + Object
Hierarchy validator + PDF/A conformance detection beyond XMP
(ISO 32000-1 §F.2 + §7.7.2 + §7.7.3 / ISO 19005-1..4 §6.x).
Three new reader-side surfaces:

parse_linearization_dict(bytes) -> Result<Option<LinearizationParams>>
and DocumentReader::linearization() parse the
/Linearized 1 /L /H [off len] /O /E /N /T first-object dictionary that
every Fast-Web-View PDF emits at its head (§F.3.3: it lies entirely
within the first 1024 bytes).
Round 9's writer-side emission now has its reader-side complement.
LinearizationParams::verify(&bytes) cross-checks /L against the
actual file length and bounds-checks /T, /E, /H. The parser
returns Ok(None) for plain (non-linearized) files so callers can
branch on the Option. Hint-table decoding (Annex F.4) is round 28+.

verify_pdf_hierarchy(reader) -> Result<HierarchyReport> (and
DocumentReader::verify_hierarchy()) walks Catalog → Pages → Page
and collects every spec divergence as a HierarchyIssue with
IssueSeverity::Error or Warning: Catalog /Type + /Pages
presence (§7.7.2 Table 28), /Pages node /Type / /Kids /
/Count (§7.7.3.2 Table 29), /Page leaf /Parent back-reference
+ /MediaBox presence (§7.7.3.3 Table 30), cycle detection with
a 32-hop depth guard. Never aborts the walk — surfaces every issue
at once so a downstream tool can report.is_valid() or filter by
severity.

read_pdf_pdfa_signals(reader) -> Result<PdfACatalogSignals> (and
DocumentReader::pdfa_signals() + ::pdfa_conformance()) surface
the structural PDF/A signals from the catalog independently of the
XMP pdfaid:part claim: /MarkInfo /Marked|UserProperties|Suspects,
/StructTreeRoot presence, /Lang, /OutputIntents count, and
/Metadata presence. PdfAConformance::from_signals_and_xmp cross-
verifies the XMP-declared part + conformance against the structural
prerequisites ISO 19005-1 §6.2.2 / §6.7 / §6.8 require — an A-level
claim missing /MarkInfo /Marked true or /StructTreeRoot flags
claim_inconsistent = true with a free-form diagnostic.

Tested end-to-end with +33 tests (15 integration in tests/round27.rs
+ 10 unit in src/reader/linearize.rs + 4 unit in
src/reader/hierarchy.rs + 7 unit in src/reader/pdfa.rs).
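The kind of cross-check that LinearizationParams::verify performs can be sketched as follows. This is a hypothetical mirror, not the crate's struct or method: the `LinParams` type, its field names and the exact comparison rules are assumptions made for illustration. Like the hierarchy validator, it collects every issue instead of stopping at the first.

```rust
/// Hypothetical subset of the linearization parameters named above.
struct LinParams {
    /// /L: declared total file length.
    l: u64,
    /// /H: (offset, length) of the primary hint stream.
    h: (u64, u64),
    /// /E: offset of the end of the first page.
    e: u64,
    /// /T: offset related to the main cross-reference table.
    t: u64,
}

/// Compare the declared values against the actual file length and return
/// one message per problem found (empty means consistent).
fn verify(p: &LinParams, file_len: u64) -> Vec<String> {
    let mut issues = Vec::new();
    if p.l != file_len {
        issues.push(format!("/L {} does not match file length {}", p.l, file_len));
    }
    if p.e > file_len {
        issues.push(format!("/E {} lies past end of file", p.e));
    }
    if p.t >= file_len {
        issues.push(format!("/T {} lies past end of file", p.t));
    }
    // checked_add avoids overflow on hostile offset + length pairs.
    match p.h.0.checked_add(p.h.1) {
        Some(end) if end <= file_len => {}
        _ => issues.push(format!("/H [{} {}] is out of bounds", p.h.0, p.h.1)),
    }
    issues
}
```

A truncated download is the typical case this catches: every offset is still well-formed, but `/L` no longer equals the length of the bytes on disk.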

Round 26: Annotations beyond Link + XMP packet field extraction
(ISO 32000-1 §12.5.6 Tables 169..209 + §14.3.2 / Adobe XMP Spec
2012 / ISO 16684-1 / ISO 19005-1..3 §6.x). New reader entry
DocumentReader::annotations() (free function: read_pdf_annotations)
walks every page's /Annots array and surfaces every entry as a
PdfAnnotation. Per-subtype payload covers /Text (§12.5.6.4 Table
172 — /Open, /Name icon, /State, /StateModel), /FreeText
(§12.5.6.6 Table 174 — /DA, /Q quadding, /RC, /IT intent),
/Stamp (§12.5.6.12 Table 181: icon name), the four text-markup
variants /Highlight / /Underline / /Squiggly / /StrikeOut
(§12.5.6.10 Table 179 — /QuadPoints), /Square + /Circle
(§12.5.6.8 Table 177 — /IC, /RD), /Link (re-uses round-25's
go-to / URI dispatch), and /Widget (§12.5.6.19 Table 188 + §12.7.4
Table 220 — /FT, /T, /V). Unknown subtypes surface as
AnnotationKind::Other { subtype }. Common Table 164 fields
(/Rect, /Contents, /NM, /M, /F, /C, /Border) are
decoded for every subtype.

New DocumentReader::xmp_packet() (and XmpPacket::parse(bytes) for
callers with the raw bytes already in hand) parses the document-level
XMP packet round-19 surfaces into a structured view of the most-used
Dublin Core (dc:title through rdf:Alt / dc:creator through
rdf:Seq / dc:subject rdf:Bag / dc:rights / dc:format),
XMP Basic (xmp:CreateDate / xmp:ModifyDate / xmp:MetadataDate
/ xmp:CreatorTool), PDF schema (pdf:Producer / pdf:Keywords /
pdf:PDFVersion / pdf:Trapped), and PDF/A identification schema
(pdfaid:part / pdfaid:conformance) fields. Element-body and
attribute forms both recognised; the standard five XML entities
(&amp; / &lt; / &gt; / &quot; / &apos;) plus numeric
character references decode. XmpPacket::is_pdf_a() and
pdf_a_conformance() collapse the pair into a 1B-style designator
for PDF/A conformance detection. Tested end-to-end with +36 tests
(19 integration in tests/annotations_round26.rs covering every
subtype dispatch, common-field decode, page-without-annots baseline,
unified-reader round-trip of the writer's Link annotations, XMP
Dublin Core / XMP Basic / PDF / PDF/A identification, attribute-form
XMP, XML-entity decode, and absent-XMP None; +6 unit tests in
src/reader/annotation.rs and +11 unit tests in src/reader/xmp.rs).
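The entity handling mentioned for XMP field bodies can be sketched with the standard library alone. The function name `decode_entities` is hypothetical, and leaving unknown or malformed references untouched is an assumption of this sketch, not a statement about the crate's behaviour.

```rust
/// Decode the five predefined XML entities plus decimal (`&#233;`) and
/// hexadecimal (`&#xE9;`) character references. Anything that does not
/// parse is copied through unchanged.
fn decode_entities(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let tail = &rest[amp..];
        let decoded = tail.find(';').and_then(|semi| {
            let name = &tail[1..semi];
            let ch = match name {
                "amp" => Some('&'),
                "lt" => Some('<'),
                "gt" => Some('>'),
                "quot" => Some('"'),
                "apos" => Some('\''),
                _ => {
                    if let Some(hex) = name.strip_prefix("#x") {
                        u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
                    } else if let Some(dec) = name.strip_prefix('#') {
                        dec.parse::<u32>().ok().and_then(char::from_u32)
                    } else {
                        None
                    }
                }
            };
            ch.map(|c| (c, semi))
        });
        match decoded {
            Some((c, semi)) => {
                out.push(c);
                rest = &tail[semi + 1..];
            }
            None => {
                // Not a recognised reference: keep the '&' literally.
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

For example, `decode_entities("Tom &amp; Jerry")` yields `Tom & Jerry`, while a stray `&bogus;` passes through as written.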

Round 25: Document outline (bookmarks) + Link annotations
(ISO 32000-1 §12.3.3 Tables 152+153 + §12.5.6.5 Table 173 + §12.3.2
Table 151 destinations). New writer entry points
write_pdf_from_scene_with_outlines + …_with_outlines_and_links
attach a /Outlines tree to the catalog and per-page
/Annots [/Subtype /Link] arrays without disturbing the existing
single-/multi-page entry points. New reader functions read_pdf_outline
+ read_pdf_links walk the bookmark tree (the doubly-linked
/First//Last//Next//Prev shape collapses back into a
parent-owned children Vec) and per-page link list. Destinations
cover all eight Table 151 forms — Xyz / Fit / FitH / FitV
/ FitR / FitB / FitBH / FitBV — with null retain-current
semantics on the optional numerics. Link targets cover both
internal /Dest <explicit-array> go-to and external
/A << /S /URI /URI (...) >> action forms. Outline /Count
honours the open / closed sign per Table 153 (open ⇒
+visible_descendants; closed ⇒ -|hidden_descendants|), and the
reader's OutlineNode::is_open() / descendant_count() helpers
expose the same convention to callers. Tested end-to-end with
+19 tests (16 integration in tests/outline_round25.rs covering
three-bookmark catalog, nested open/closed chapters, every dest
variant, Unicode title, URI + go-to link, multi-page link
grouping, out-of-range writer rejection, combined outline+link
round-trip, and empty-input baseline; +13 unit tests across
src/outline.rs + src/reader/outline.rs + src/reader/link.rs).
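The /Count sign convention is easy to get wrong, so here is a small sketch of one way to compute it. The `Node` type and both functions are hypothetical and are not the crate's OutlineNode; omitting /Count for items without children is an assumption of this sketch.

```rust
/// Hypothetical outline item: an open/closed flag plus owned children.
struct Node {
    open: bool,
    children: Vec<Node>,
}

/// Number of descendants that would be visible if `node` itself were open:
/// every direct child, plus the visible descendants of each open child.
fn visible_if_open(node: &Node) -> i64 {
    node.children
        .iter()
        .map(|c| 1 + if c.open { visible_if_open(c) } else { 0 })
        .sum()
}

/// Value for the item's /Count entry: positive when open, negative when
/// closed, and None when there are no descendants to count.
fn count_entry(node: &Node) -> Option<i64> {
    match visible_if_open(node) {
        0 => None,
        n if node.open => Some(n),
        n => Some(-n),
    }
}
```

An open chapter with one closed section (two hidden sub-items) and one leaf gets `/Count 2`; the closed section itself gets `/Count -2`; opening that section raises the chapter to `/Count 4`.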

Round 24: CMS KARI X448 ECDH (RFC 7748 §5 + RFC 8410 §3 + RFC 8418
§2.1 + §2.2). New KariCurve::X448 joins the existing P-256/P-384/
P-521/X25519 dispatch — id-X448 (OID 1.3.101.111), 56-byte raw
u-coordinate keys, 224-bit security level. Default KDF binding is
X9.63-SHA-512 (security-strength match); HKDF SHA-256/384/512 are
also valid via the new KariRecipient::x448_hkdf_* constructors.
Reader (unwrap_kari / read_pdf_to_scene_with_certificate) and
writer (write_pdf_from_scene_pubsec_kari) both handle X448 KARI
envelopes through the existing entry points. RFC 7748 §6.2
Alice/Bob test vector cross-checked byte-for-byte. Backed by the
pure-Rust x448 (RustCrypto / ed448-goldilocks) crate.

Round 23: JPEG passthrough on /Filter /DCTDecode Image XObjects
(ISO 32000-1 §7.4.8 + §8.9). New DocumentReader::image_xobjects()
walks every page's /Resources /XObject subdict and surfaces every
Image XObject whose final filter is /DCTDecode. The returned
PdfImageXObject carries the unmodified JPEG bytes (ready for any
JPEG decoder), the /Width / /Height, the /ColorSpace
(DeviceRGB / DeviceCMYK / DeviceGray / Indexed / Other),
and the /BitsPerComponent. Wrapping /ASCII85Decode /
/ASCIIHexDecode / /FlateDecode filters preceding /DCTDecode are
unwrapped before the JPEG payload is returned. Cross-checked against
pdfimages -all (poppler-utils) as a black-box validator — extracted
bytes are byte-identical to both the source JPEG and pdfimages's
dump.
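The "final filter is /DCTDecode, unwrap whatever precedes it" rule can be sketched as below. The function names are hypothetical, only /ASCIIHexDecode is implemented as a wrapping filter to keep the example std-only, and the SOI-marker check at the end is an extra sanity test added for this sketch rather than something the changelog describes.

```rust
/// Decode an /ASCIIHexDecode stream: hex digit pairs, whitespace ignored,
/// `>` ends the data, and an odd trailing digit is padded with zero.
fn ascii_hex_decode(data: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut hi: Option<u8> = None;
    for &b in data {
        if b == b'>' {
            break;
        }
        if b.is_ascii_whitespace() {
            continue;
        }
        let v = (b as char).to_digit(16)? as u8;
        match hi.take() {
            Some(h) => out.push((h << 4) | v),
            None => hi = Some(v),
        }
    }
    if let Some(h) = hi {
        out.push(h << 4);
    }
    Some(out)
}

/// Return the raw JPEG bytes when the last filter is DCTDecode, undoing
/// the wrapping filters in the order they are listed.
fn jpeg_passthrough(filters: &[&str], stream: &[u8]) -> Option<Vec<u8>> {
    let (last, wrappers) = filters.split_last()?;
    if *last != "DCTDecode" {
        return None;
    }
    let mut data = stream.to_vec();
    for f in wrappers {
        data = match *f {
            "ASCIIHexDecode" => ascii_hex_decode(&data)?,
            _ => return None, // other wrappers are out of scope for this sketch
        };
    }
    // Sanity check (assumption of this sketch): JPEG starts with SOI, FF D8.
    if data.starts_with(&[0xFF, 0xD8]) {
        Some(data)
    } else {
        None
    }
}
```

The JPEG payload itself is never decoded; the function either hands back the bytes untouched or reports that the stream is not a passthrough candidate.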

Round 22: text extraction. DocumentReader::text_extraction() walks
every page's content stream and emits TextRuns (text + position +
font name + font size) for Tj / TJ / ' / " operators. Maps
encoded glyphs back to Unicode through embedded /ToUnicode CMaps
(bfchar / bfrange per ISO 32000-1 §9.10.3), Identity-H Type 0
CIDs, WinAnsiEncoding, and MacRomanEncoding (Annex D.2). Cross-checked
against pdftotext (poppler) as a black-box validator.
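How bfchar and bfrange entries map character codes back to Unicode can be sketched as follows. The types `BfRange` and `ToUnicode` are hypothetical, codes are limited to 16 bits, and only the bfrange form with a single starting destination is covered (the array form is left out).

```rust
/// One `lo hi dst` bfrange entry: codes lo..=hi map to dst, dst+1, ...
struct BfRange {
    lo: u16,
    hi: u16,
    dst: u32,
}

/// Hypothetical parsed /ToUnicode CMap.
struct ToUnicode {
    /// bfchar entries: a code maps to a (possibly multi-character) string.
    chars: Vec<(u16, String)>,
    /// bfrange entries.
    ranges: Vec<BfRange>,
}

impl ToUnicode {
    /// Look up one code: bfchar entries first, then the ranges.
    fn lookup(&self, code: u16) -> Option<String> {
        if let Some((_, s)) = self.chars.iter().find(|(c, _)| *c == code) {
            return Some(s.clone());
        }
        self.ranges
            .iter()
            .find(|r| r.lo <= code && code <= r.hi)
            .and_then(|r| char::from_u32(r.dst + u32::from(code - r.lo)))
            .map(|c| c.to_string())
    }

    /// Decode a code sequence, marking unmapped codes with U+FFFD.
    fn decode(&self, codes: &[u16]) -> String {
        codes
            .iter()
            .map(|&c| self.lookup(c).unwrap_or_else(|| "\u{FFFD}".to_string()))
            .collect()
    }
}
```

With a range `0041..005A -> 0061` and a bfchar entry `0001 -> "fi"`, the codes `0041 0043 0001` decode to `acfi`.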

This PR was generated with release-plz.

MagicalTux force-pushed the release-plz-2026-05-10T06-03-13Z branch 8 times, most recently from 4104ffc to 33dd3c0 on May 11, 2026 23:51

chore: release v0.1.2

b5282ac

MagicalTux force-pushed the release-plz-2026-05-10T06-03-13Z branch from 33dd3c0 to b5282ac on May 12, 2026 07:37

MagicalTux wants to merge 1 commit into master from release-plz-2026-05-10T06-03-13Z