Pure-Rust PDF writer + reader for the oxideav framework. The writer emits PDF 1.4 vector documents from `VectorFrame` / `Scene` inputs (paths stay paths, fills stay fills); the reader walks bytes back into a `Scene`, with optional decryption for password-protected files. Zero C dependencies.
Part of the oxideav framework — a pure-Rust media stack. Codec, container, and filter crates are implemented from the spec (no C codec libraries linked or wrapped, no *-sys crates).
- Paths: `MoveTo` (`m`), `LineTo` (`l`), `CubicCurveTo` (`c`), `QuadCurveTo` (lifted to cubic via the `2/3 * (control - endpoint)` trick), `ArcTo` (flattened to cubic per SVG 1.1 Appendix F.6.5), `Close` (`h`).
- Fills: `Paint::Solid` (DeviceRGB `sc`), `Paint::LinearGradient` (axial pattern shading, `Pattern Type 2` + `Function Type 2`), `Paint::RadialGradient` (radial shading, `Function Type 3`).
- Strokes: width (`w`), cap (`J`), join (`j`), miter limit (`M`), dash pattern (`d`).
- Transforms: every `Group::transform` emits one `cm` operator.
- Groups: `q ... Q` save/restore brackets around children. Group opacity becomes an `ExtGState` resource referenced via `/GSx gs`.
- Clip paths: emitted before the children's content stream as `W n` (or `W* n` for even-odd fill rule).
- Fill rules: `NonZero` (`f` / `B`) vs. `EvenOdd` (`f*` / `B*`).
- Embedded raster: `ImageRef` whose underlying `VideoFrame` is RGBA8 lands as a FlateDecode `Image` XObject and is painted with `Do`.
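The quadratic-to-cubic lift named in the Paths bullet can be sketched as follows. This is a minimal illustration, not the crate's code: the `Pt` type and the `quad_to_cubic` name are assumptions made for the example.

```rust
/// A 2-D point; a stand-in for whatever point type the scene graph uses.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pt {
    pub x: f64,
    pub y: f64,
}

/// Lift a quadratic Bezier (p0, q, p2) to the two inner control points of
/// the equivalent cubic: c1 = p0 + 2/3 (q - p0), c2 = p2 + 2/3 (q - p2).
/// The cubic's endpoints are p0 and p2, unchanged.
pub fn quad_to_cubic(p0: Pt, q: Pt, p2: Pt) -> (Pt, Pt) {
    // Move each endpoint two thirds of the way toward the quadratic control point.
    let lift = |end: Pt| Pt {
        x: end.x + 2.0 / 3.0 * (q.x - end.x),
        y: end.y + 2.0 / 3.0 * (q.y - end.y),
    };
    (lift(p0), lift(p2))
}
```

The resulting cubic traces exactly the same curve, so a writer that only emits the `c` operator loses nothing by the conversion.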
The reader handles password-protected PDFs under the standard security handler across the full revision range ISO 32000 defines:
- R=2: RC4-40 (V=1, `Length=40`).
- R=3: RC4-128 (V=2, `Length=128`).
- R=4: AES-128 CBC or RC4-128, picked from the crypt-filter `CFM` (`AESV2` vs `V2`).
- R=5: AES-256 CBC, V=5, `CFM=AESV3`. Adobe extension level 3 (PDF 1.7); plain SHA-256 password derivation with validation + key salts.
- R=6: AES-256 CBC, V=5, `CFM=AESV3`. ISO 32000-2:2020 (PDF 2.0); iterated SHA-256/384/512 hash chain (Algorithm 2.B) plus `/Perms` block validation (Algorithm 13).
Both user and owner passwords authenticate (Algorithms 6 + 7 for R≤4; Algorithms 11 + 12 for R≥5); the default empty user password is tried first so PDFs encrypted "just for permission flags" open with no caller intervention. Strings and stream payloads are decrypted via per-object keys (Algorithm 1) for R≤4 and via the file key directly (no per-object derivation) for R≥5.
let pdf = std::fs::read("locked.pdf")?;
// Default API tries the empty user password.
match oxideav_pdf::read_pdf_to_scene(&pdf) {
Ok(scene) => println!("opened: {} pages", scene.pages.unwrap().len()),
Err(_) => {
// Password-protected — supply one.
let scene = oxideav_pdf::read_pdf_to_scene_with_password(&pdf, b"hunter2")?;
}
}
# Ok::<(), Box<dyn std::error::Error>>(())

Per-stream crypt-filter overrides land in a follow-up round.
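The revision list earlier in this section reads naturally as a dispatch on `/R` and the crypt-filter `/CFM`. The sketch below restates that table in code; the `Cipher` enum and both function names are hypothetical and are not part of the crate's API.

```rust
/// Cipher families named in the revision list.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Cipher {
    Rc4 { key_bits: u16 },
    AesCbc { key_bits: u16 },
}

/// Map a standard-security-handler revision (`/R`) and, where relevant,
/// the crypt-filter method (`/CFM`) to a cipher. Returns `None` for
/// combinations the list does not mention.
pub fn cipher_for(revision: u8, cfm: Option<&str>) -> Option<Cipher> {
    match (revision, cfm) {
        (2, _) => Some(Cipher::Rc4 { key_bits: 40 }),
        (3, _) => Some(Cipher::Rc4 { key_bits: 128 }),
        (4, Some("AESV2")) => Some(Cipher::AesCbc { key_bits: 128 }),
        (4, Some("V2")) => Some(Cipher::Rc4 { key_bits: 128 }),
        (5, Some("AESV3")) | (6, Some("AESV3")) => Some(Cipher::AesCbc { key_bits: 256 }),
        _ => None,
    }
}

/// Strings and streams use per-object keys (Algorithm 1) for R<=4 and the
/// file key directly for R>=5.
pub fn uses_per_object_keys(revision: u8) -> bool {
    revision <= 4
}
```

Keeping the mapping in one `match` makes unsupported combinations fall through to a single `None` arm instead of being scattered across the decrypt path.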
The reader and writer both handle public-key-encrypted PDFs under
the adbe.pkcs7.s3 / s4 / s5 SubFilters of the public-key
security handler (ISO 32000-1 §7.6.4 + ISO 32000-2 §7.6.5):
- `adbe.pkcs7.s3`: RC4-40, V=1, SHA-1 file-key derivation.
- `adbe.pkcs7.s4`: RC4-128, V=2, SHA-1.
- `adbe.pkcs7.s5`, V=4: RC4-128 or AES-128 CBC via `CFM` (`V2` / `AESV2`).
- `adbe.pkcs7.s5`, V=5: AES-256 CBC, `CFM=AESV3`, SHA-256.

The trailer's /Recipients array (or /CF /<StmF> /Recipients for
s5) carries one CMS EnvelopedData (RFC 5652 §6.1) per
access-permission set; each envelope's KeyTransRecipientInfo SET wraps the
content-encryption key with RSAES-PKCS1-v1_5 to a recipient's RSA
public key. The reader matches by either IssuerAndSerialNumber (CMS
v0) or SubjectKeyIdentifier (CMS v2 — RFC 5280 §4.2.1.2 method 1
SHA-1 of the SPKI BIT STRING contents), RSA-decrypts the wrapped CEK,
decrypts the envelope contents (RC4 / AES-128 / AES-256 CBC), then
derives the file encryption key per §7.6.4.3 / §7.6.5.3.
use oxideav_pdf::{read_pdf_to_scene_with_certificate, PubSecCredential};
let cert_der = std::fs::read("user.cert.der")?;
let pkcs8_der = std::fs::read("user.key.pkcs8.der")?;
let credential = PubSecCredential::from_der(&cert_der, &pkcs8_der)?;
let scene = read_pdf_to_scene_with_certificate(&pdf_bytes, &credential)?;
# Ok::<(), Box<dyn std::error::Error>>(())

Round 11 lands the symmetric encoder side: the writer emits public-key-encrypted PDFs that round-trip through the reader.
use oxideav_pdf::{
write_pdf_from_scene_pubsec_encrypted, PubSecEncoderConfig, PubSecRecipient,
};
// One recipient — IssuerAndSerial form.
let recipient = PubSecRecipient::from_issuer_and_serial(
issuer_der, // recipient cert's `issuer` SEQUENCE bytes
serial_bytes, // recipient cert's serial INTEGER body
rsa_public_key,
);
let cfg = PubSecEncoderConfig::pkcs7_s5_v5_aes256(vec![recipient]);
let pdf = write_pdf_from_scene_pubsec_encrypted(&scene, &cfg)?;
# Ok::<(), oxideav_pdf::PdfError>(())

`PubSecRecipient` also exposes `from_subject_key_identifier(ski, key)` for the CMS v2 form.

Round 12 adds per-crypt-filter recipient lists: `write_pdf_from_scene_pubsec_multi_cf` + `PubSecMultiCfConfig` + `PubSecCfGroup` emit a doc with multiple permission sets (each its own envelope), and `open_with_certificate_with_permissions` surfaces the matched recipient's permission mask.

Round 12 lands the CMS KARI decoder (RFC 5652 §6.2.2): KeyAgree (ECDH/DH) recipients parse structurally. Round 14 closes the unwrap: P-256 ECDH + RFC 5753 §7.1.2 X9.63-SHA-256 KDF + RFC 3394 AES Key Wrap (128/192/256-bit) for the `dhSinglePass-stdDH-sha256kdf-scheme` KEA OID.

Round 15 extends the curve set: P-384 (`dhSinglePass-stdDH-sha384kdf-scheme`, X9.63-SHA-384) and X25519 (RFC 8418 §2.1, `secg-scheme…sha256kdf` + `id-X25519`) join P-256. Pass `PubSecCredential::from_parsed_ec(cert, KariCurve::P384, scalar)` (or `P256` / `X25519`) and the KARI envelope opens through the same `read_pdf_to_scene_with_certificate` entry point as KTRI. Round 15 also lands the writer-side KARI encode: `write_pdf_from_scene_pubsec_kari(scene, &PubSecKariConfig)` mirrors the round-11 KTRI writer, and each `KariRecipient { curve, … }` becomes one CMS KARI envelope with AES-256-WRAP.

Round 16 lands P-521 (`dhSinglePass-stdDH-sha512kdf-scheme`, X9.63-SHA-512) + RFC 8418 §2.2 HKDF binding for X25519 (`dhSinglePass-stdDH-hkdf-sha256/384/512-scheme`, smime-alg 19/20/21).

Round 24 closes the RFC 8418 curve set with X448 (RFC 7748 §5 / RFC 8410 §3: `id-X448` 1.3.101.111, 56-byte raw u-coordinate, 224-bit security level): pass `KariCurve::X448` and the same writer + reader entry points handle it. Default KDF is X9.63-SHA-512 (security-strength match); HKDF SHA-256/384/512 are also valid via the `KariRecipient::x448_hkdf_*` constructors. Cross-checked against the RFC 7748 §6.2 Alice/Bob shared-secret vector byte-for-byte.

Round 17 closes the long-term-cert originator gap: when a KARI envelope's `OriginatorIdentifierOrKey` is `IssuerAndSerial` or `SubjectKeyIdentifier` rather than the in-band `OriginatorPublicKey`, the recipient resolves the originator cert through a `TrustStore`, passed via `read_pdf_to_scene_with_certificate_and_trust_store(pdf, &cred, &store)`.

Round 17 also adds read-only decode for legacy RC2-CBC (RFC 2268 + RFC 3217) and DES-EDE3-CBC (3DES, RFC 3370 §5.2) envelope content algorithms so PDF 2.0-deprecated archives still open; there is no encode-side support, and the writer always uses AES.

Round 18 surfaces previously-discarded CMS metadata: the envelope's `OriginatorInfo` (RFC 5652 §10.2.1, `certs[]` / `crls[]`) is now exposed via `EnvelopedData::originator_info()`, and the `RecipientKeyIdentifier`'s OPTIONAL `date` (GeneralizedTime) + `other` (OtherKeyAttribute) fields are captured by the parser. New `TrustStore::find_with_temporal_validity(ski, instant)` uses the RKID `date` to pick the cert generation that was active when the envelope was authored, which is useful for long-lived archives where multiple cert generations exist for the same SKI. The `Certificate` parser now also extracts the `validity` window (notBefore / notAfter), normalising `UTCTime` to `GeneralizedTime` per RFC 5280 §4.1.2.5.1's 1950..2049 pivot for direct byte-comparison.

Round 19 ships two orthogonal additions. The first is the document-level XMP `/Metadata` stream end-to-end (ISO 32000-1 §14.3.2 + Adobe XMP Spec 2012): writer entry `write_pdf_from_scene_with_xmp(scene, xmp_bytes)` attaches the raw XMP RDF/XML packet to the catalog as a `/Type /Metadata /Subtype /XML` stream (no `/Filter`); reader accessor `DocumentReader::xmp_metadata()` returns `Some(bytes)` for documents that carry one. The second is CMS `SignedData` parser scaffolding (RFC 5652 §5, PKCS#7): `pubsec::signed_data::parse_signed_data` decodes `id-signedData` blobs into typed `SignedData { digest_algorithms, encap_content, certs, crls, signer_infos }` + `SignerInfo` (sid, digest / signature OIDs, signed / unsigned attribute lists with raw-DER values, raw `signature` octets).

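The `UTCTime` to `GeneralizedTime` normalisation with the 1950..2049 pivot can be illustrated with a small helper. This is a sketch under the assumption that the input is the 13-character `YYMMDDHHMMSSZ` form RFC 5280 requires; the function name is hypothetical and it does not validate calendar fields.

```rust
/// Expand an ASN.1 UTCTime body ("YYMMDDHHMMSSZ") into the GeneralizedTime
/// form ("YYYYMMDDHHMMSSZ") using the RFC 5280 pivot: YY >= 50 means 19YY,
/// YY < 50 means 20YY. Returns `None` for inputs of any other shape.
pub fn utc_time_to_generalized(utc: &str) -> Option<String> {
    let bytes = utc.as_bytes();
    if bytes.len() != 13 || bytes[12] != b'Z' {
        return None;
    }
    if !bytes[..12].iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Two-digit year decides the century.
    let yy = (bytes[0] - b'0') * 10 + (bytes[1] - b'0');
    let century = if yy >= 50 { "19" } else { "20" };
    Some(format!("{}{}", century, utc))
}
```

Once both bounds are in the 15-character form, plain byte-wise string comparison orders them chronologically, which is what makes the direct byte-comparison possible.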
Round 20 closes the round-19 verification deferral. New
pubsec::verify::verify_signature(signer, certs, content) resolves the
signer's certificate from a pool by IssuerAndSerial or
SubjectKeyIdentifier, hashes the canonical (universal-SET-tag)
re-encoding of signedAttrs per digestAlgorithm, and verifies the
hash against signature per signatureAlgorithm (RFC 5652 §5.4 +
§11.2). Hash side: SHA-1 / SHA-256 / SHA-384 / SHA-512. Signature
side: RSA-PKCS#1 v1.5 (the rsaEncryption + four sha*WithRSA OIDs
all map here), RSA-PSS (id-RSASSA-PSS), and ECDSA on P-256 / P-384
/ P-521 (curve dispatch by the cert SPKI's named-curve OID per RFC
5480 §2.1.1.1). When signedAttrs is present, the verifier also
cross-checks the messageDigest attribute against the eContent hash
(RFC 5652 §11.2) — so a tampered eContent fails even when the outer
signature still verifies. Detached signatures (PAdES — eContent absent)
feed the document bytes through AttachedContent::External(&[u8]).
Round-20 also extends x509::Certificate to capture
spki_algorithm_oid + spki_algorithm_params so the verifier can
route ECDSA on the named-curve OID without re-parsing the certificate.
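The "canonical (universal-SET-tag) re-encoding" of `signedAttrs` comes from RFC 5652 §5.4: on the wire the field carries an IMPLICIT `[0]` tag, but the digest is computed over the same bytes with a SET OF tag instead. A minimal sketch of that tag swap, with a hypothetical function name and assuming the caller already holds the complete `signedAttrs` TLV:

```rust
/// DER tag of the IMPLICIT [0] wrapper that `signedAttrs` carries on the wire.
const TAG_CONTEXT_0_CONSTRUCTED: u8 = 0xA0;
/// DER tag of a universal, constructed SET OF.
const TAG_UNIVERSAL_SET: u8 = 0x31;

/// Return the bytes that get hashed for signature verification: the
/// on-wire `signedAttrs` TLV with its leading tag swapped from [0] to SET.
/// Length and content octets are copied unchanged. Returns `None` if the
/// input is empty or does not start with the [0] tag.
pub fn signed_attrs_for_digest(wire: &[u8]) -> Option<Vec<u8>> {
    match wire.split_first() {
        Some((&tag, rest)) if tag == TAG_CONTEXT_0_CONSTRUCTED => {
            let mut out = Vec::with_capacity(wire.len());
            out.push(TAG_UNIVERSAL_SET);
            out.extend_from_slice(rest);
            Some(out)
        }
        _ => None,
    }
}
```

Hashing the on-wire bytes without this swap is a common reason a structurally valid signature fails to verify.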
Round 21 closes the reader half of the round-20 follow-up list:
PDF /Sig annotation reader (ISO 32000-1 §12.7.4.5 + §12.8.1).
DocumentReader::signatures() walks the catalog → /AcroForm /Fields
tree (honouring /FT inheritance through non-terminal /Kids
parents per §12.7.3.1) and surfaces one [PdfSignature] per /V
signature dictionary it can parse. Each value carries the
[a, b, c, d] /ByteRange, the hex-decoded /Contents blob, the
/SubFilter (adbe.pkcs7.detached / ETSI.CAdES.detached etc.),
the optional metadata fields (/Name, /Reason, /Location,
/ContactInfo, /M), and — for the CMS-detached SubFilters — the
parsed [pubsec::signed_data::SignedData]. PdfSignature::signed_message(pdf)
concatenates the two /ByteRange-named slices into the byte string
the signing tool hashed; pass it as AttachedContent::External(...)
to the existing [pubsec::verify::verify_signature] for a full
end-to-end verify.
use oxideav_pdf::reader::DocumentReader;
use oxideav_pdf::pubsec::verify::{verify_signature, AttachedContent};
use oxideav_pdf::pubsec::x509::parse_certificate;
let mut r = DocumentReader::open(&pdf_bytes)?;
for sig in r.signatures()? {
if !sig.is_cms_detached() { continue; }
let signed = sig.signed_message(&pdf_bytes)?;
let sd = sig.signed_data.as_ref().expect("CMS-detached parsed");
let certs: Vec<_> = sd.certs.iter()
.filter_map(|der| parse_certificate(der).ok())
.collect();
let ok = verify_signature(
&sd.signer_infos[0],
&certs,
AttachedContent::External(&signed),
)?;
println!("signature verifies: {ok}");
}
# Ok::<(), oxideav_pdf::PdfError>(())

The reader is tolerant of unsigned slots (a Sig form field whose /V
is absent — common for "approval line still pending" templates), of
non-terminal parent fields without their own /V, and of malformed
/Contents blobs (the dict surfaces but signed_data is None).
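For readers unfamiliar with `/ByteRange`, the concatenation that produces the signed message can be sketched in a few lines. This is an illustration of the rule only; the free function `concat_byte_range` is hypothetical and is not the crate's `PdfSignature::signed_message`.

```rust
/// Concatenate the two slices named by a `/ByteRange [a b c d]` array:
/// bytes `a..a+b` followed by bytes `c..c+d`. Returns `None` when either
/// range overflows or falls outside `pdf`.
pub fn concat_byte_range(pdf: &[u8], byte_range: [usize; 4]) -> Option<Vec<u8>> {
    let [a, b, c, d] = byte_range;
    let first = pdf.get(a..a.checked_add(b)?)?;
    let second = pdf.get(c..c.checked_add(d)?)?;
    let mut out = Vec::with_capacity(b + d);
    out.extend_from_slice(first);
    out.extend_from_slice(second);
    Some(out)
}
```

The gap between the two slices is the `/Contents` hex string itself, which is why the signature can be written into the file after the hash is computed.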
Round 30 closes the symmetric writer half: the new `oxideav_pdf::sig` module emits signed PDFs with valid `/ByteRange` + PKCS#7 / CMS `SignedData` `/Contents` blobs (ISO 32000-1 §12.7.4.5 + §12.8.1 + §7.5.6 + RFC 5652 §5 + §5.4 + §11.2). The classic "ByteRange-placeholder fill-in" pattern is implemented end-to-end: build the PDF with a fixed-width `/ByteRange [?? ?? ?? ??]` + a `/Contents <0…0>` placeholder (8192 hex chars = 4096 raw bytes, enough for any RSA-2048 / ECDSA-P256 SHA-256 SignedData with a single signer + cert), patch `/ByteRange` with the computed offsets, hash the bytes spanned by `/ByteRange`, wrap into a CAdES-BES-style CMS `SignedData` with `signedAttrs = { contentType, messageDigest }` per RFC 5652 §11.1+§11.2, hex-encode, overwrite the placeholder. A [Signer] trait decouples the crypto: bring your own `ring` / `rsa` / `p256` / HSM impl, or use the reference [RsaPkcs1v15Sha256Signer] / [EcdsaP256Sha256Signer] that wrap the in-crate deps.
use oxideav_pdf::{sign_pdf_from_scene, RsaPkcs1v15Sha256Signer, SignerIdentity};
let private_key = rsa::RsaPrivateKey::new(&mut rsa::rand_core::OsRng, 2048)?;
let signer = RsaPkcs1v15Sha256Signer::new(private_key);
let identity = SignerIdentity::from_signer_cert_der(cert_der)?;
let signed_pdf = sign_pdf_from_scene(&scene, &signer, identity)?;
# Ok::<(), Box<dyn std::error::Error>>(())

Round-30 ships RSA-PKCS#1 v1.5 + SHA-256 and ECDSA-P256 + SHA-256.
RSA-PSS, ECDSA on P-384 / P-521, and Ed25519 plug in through the same
[Signer] trait without touching the writer surface. The output is
accepted by qpdf --check and verifies end-to-end against the
round-27 PKCS#7 verify dispatch.
The writer emits password-protected PDFs across the same revision range
the reader handles. [oxideav_pdf::write_pdf_from_scene_encrypted]
takes a [Scene] and an [encrypt::EncryptionConfig] and produces
bytes that round-trip through read_pdf_to_scene_with_password:
use oxideav_pdf::encrypt::EncryptionConfig;
let cfg = EncryptionConfig::aes_256_r6(b"hunter2", b"FILE-ID-16-BYTES");
let pdf = oxideav_pdf::write_pdf_from_scene_encrypted(&scene, &cfg)?;
# Ok::<(), oxideav_pdf::PdfError>(())

Writer-side coverage matches the reader: R=2 (RC4-40), R=3 (RC4-128),
R=4 (AES-128 / RC4 via CFM), R=5 (Adobe ext L3), R=6 (ISO 2.0).
/O, /U, /OE, /UE, and /Perms come from the canonical
algorithms (3, 4, 5 for V≤4; 8, 9, 10 for V=5); per-object key
derivation is Algorithm 1 (V≤4) or the file key directly (V=5).
Both reader and writer support the binary cross-reference stream
form introduced in PDF 1.5 (ISO 32000-1 §7.5.8): a /Type /XRef
stream object whose body packs each entry into /W [w1 w2 w3]
big-endian fields, Flate-compressed with /Predictor 12 (PNG-Up).
The classical xref-keyword form (PDF 1.0..1.4) is also accepted
on input and remains the writer's default; opt into the stream form
via [oxideav_pdf::write_pdf_from_scene_xref_stream].
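To make the `/W` packing and the PNG-Up predictor concrete, here is a minimal sketch of both steps. The function names are hypothetical, field widths above 8 bytes are not handled, and the final Flate compression is left out because the standard library has no deflate encoder.

```rust
/// Pack one cross-reference entry into big-endian fields whose byte
/// widths come from `/W [w1 w2 w3]`. Values wider than their field are
/// truncated to the low bytes; this sketch does not check for that.
pub fn pack_entry(widths: [usize; 3], fields: [u64; 3]) -> Vec<u8> {
    let mut out = Vec::with_capacity(widths.iter().sum());
    for (&w, &value) in widths.iter().zip(fields.iter()) {
        assert!(w <= 8, "field width above 8 bytes is out of scope here");
        // Keep the low `w` bytes of the big-endian representation.
        out.extend_from_slice(&value.to_be_bytes()[8 - w..]);
    }
    out
}

/// Apply the PNG "Up" filter (what `/Predictor 12` expects) to rows of
/// `row_len` bytes: each output row is a filter-type byte `2` followed by
/// the byte-wise difference from the row above (zeros above the first row).
pub fn png_up_encode(data: &[u8], row_len: usize) -> Vec<u8> {
    assert!(row_len > 0 && data.len() % row_len == 0);
    let mut out = Vec::with_capacity(data.len() + data.len() / row_len);
    let mut prev = vec![0u8; row_len];
    for row in data.chunks(row_len) {
        out.push(2); // PNG filter type 2 = Up
        out.extend(row.iter().zip(&prev).map(|(cur, up)| cur.wrapping_sub(*up)));
        prev.copy_from_slice(row);
    }
    out
}
```

Consecutive xref entries tend to differ only in their low offset bytes, so the Up filter turns most of each row into zeros and the Flate stage compresses it well.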
Both reader and writer support PDF 1.5+ object streams
(/Type /ObjStm, ISO 32000-1 §7.5.7). The reader resolves
Compressed xref entries by fetching the containing object stream,
parsing its (obj_num offset) header, and returning the body bytes
from the matching slot. The writer packs every compressible
indirect object (every dict that isn't a stream and isn't the
Catalog) into one ObjStm container — opt in via
[oxideav_pdf::write_pdf_from_scene_object_stream]. Stream objects
(content streams, image XObjects, the xref stream itself) cannot be
compressed per §7.5.7 and remain at their own byte offsets.
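The header lookup described for the reader can be sketched as below. It assumes the stream has already been decoded and that `/N` and `/First` were read from the stream dictionary; both function names are hypothetical.

```rust
/// Parse the header of a decoded object stream: `n` whitespace-separated
/// `(obj_num, offset)` integer pairs held in the first `first` bytes.
pub fn parse_objstm_header(decoded: &[u8], n: usize, first: usize) -> Option<Vec<(u32, usize)>> {
    let header = std::str::from_utf8(decoded.get(..first)?).ok()?;
    let mut tokens = header.split_ascii_whitespace();
    let mut pairs = Vec::with_capacity(n);
    for _ in 0..n {
        let obj_num: u32 = tokens.next()?.parse().ok()?;
        let offset: usize = tokens.next()?.parse().ok()?;
        pairs.push((obj_num, offset));
    }
    Some(pairs)
}

/// Return the body bytes of the object in slot `index`: from its offset
/// (relative to `first`) up to the next object's offset or the stream end.
pub fn objstm_slot<'a>(
    decoded: &'a [u8],
    pairs: &[(u32, usize)],
    first: usize,
    index: usize,
) -> Option<&'a [u8]> {
    let start = first.checked_add(pairs.get(index)?.1)?;
    let end = match pairs.get(index + 1) {
        Some(&(_, next)) => first.checked_add(next)?,
        None => decoded.len(),
    };
    decoded.get(start..end)
}
```

A Compressed xref entry supplies the containing stream's object number and the slot index, so these two calls are all that is needed once the stream is decoded.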
[oxideav_pdf::write_pdf_incremental_update] appends new revisions
to a previously-written PDF per ISO 32000-1 §7.5.6 — the new
revision's body is appended verbatim, followed by a new xref
subsection that lists only the changed slots, plus a trailer
carrying /Prev <prev_xref_off> pointing back at the original
revision. The reader follows the /Prev chain and merges entries:
the newest revision wins on overlap.
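The "newest revision wins" merge can be modelled with a map per revision. This is a simplified sketch: the `Revision` struct and the index-based `/Prev` link are assumptions for illustration, whereas a real reader follows byte offsets.

```rust
use std::collections::BTreeMap;

/// One revision's xref section: object number -> byte offset, plus the
/// index of the previous revision named by `/Prev`, if any.
pub struct Revision {
    pub entries: BTreeMap<u32, u64>,
    pub prev: Option<usize>,
}

/// Walk the `/Prev` chain starting at `newest` and merge entries so that
/// the newest revision wins on overlap.
pub fn merge_xref(revisions: &[Revision], newest: usize) -> BTreeMap<u32, u64> {
    let mut merged = BTreeMap::new();
    let mut cursor = Some(newest);
    let mut hops = 0;
    while let Some(idx) = cursor {
        // A chain longer than the revision count must contain a loop.
        if hops >= revisions.len() {
            break;
        }
        // Stop on a dangling link.
        let Some(rev) = revisions.get(idx) else { break };
        for (&obj, &off) in &rev.entries {
            // Entries seen earlier come from newer revisions and are kept.
            merged.entry(obj).or_insert(off);
        }
        cursor = rev.prev;
        hops += 1;
    }
    merged
}
```

Walking from newest to oldest and inserting only missing keys gives the override rule without any explicit comparison of revision numbers.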
let original = oxideav_pdf::write_pdf_from_scene(&scene_v1)?;
// ... time passes; user adds two pages ...
let updated = oxideav_pdf::write_pdf_incremental_update(&original, &new_pages)?;
// `updated` starts with `original` byte-for-byte, then appends.

ISO 32000-1 §7.6.5 lets a single stream opt out of per-object
encryption by listing /Crypt as its first /Filter with
/DecodeParms /Name /Identity (or no /Name — the default per
§7.4.10 Table 24). The writer leaves such streams untouched while
encrypting the rest of the file; the reader applies the same rule
on input. The classic consumer is XMP metadata streams that need to
remain searchable in encrypted PDFs.
Round 9 emits Linearized PDF per ISO 32000-1 §7.5.6 + Annex F.
[write_pdf_from_scene_linearized] produces a PDF whose first 1024
bytes carry a complete linearization parameter dictionary
(/Linearized 1 + /L + /H + /O + /E + /N + /T); the
on-wire layout follows F.3.1 (header → lin-dict → first-page xref →
catalog → hint stream → first-page section → remaining pages →
main xref). startxref at EOF points at the first-page xref;
the first-page trailer's /Prev points at the main xref. The
output is also a valid plain PDF — readers ignoring /Linearized
walk the same Catalog + Pages tree + page content.
The hint stream emits the page offset table (F.4.1) with full
per-page entries (round 13: items 1, 2, 6, 7 — object count, page
length, content stream offset relative to page start, content stream
length) at fixed 32-bit width, plus minimal shared-object (F.4.2),
thumbnail (F.4.3), and outline (F.4.4) header sections. Entry counts
for the latter three are zero so no per-shared-object / per-thumbnail
/ per-outline bytes are generated. The hint dict carries /S, /T,
/O offsets into the decoded hint stream so a reader walking the
optional tables sees a fully-formed (if empty) layout. Extended
generic (F.4.5) and embedded-file-stream (F.4.6) tables are still
deferred — we generate no interactive forms / structure trees /
embedded files.
[DocumentReader::text_extraction] walks every page's content
stream and emits one [TextRun] per Tj / TJ / ' / " operator,
with the text-matrix origin and Tf font + size resolved per ISO
32000-1 §9.4.4. Encoded glyphs are mapped back to Unicode through
the font's /ToUnicode CMap when present (parsing the bfchar /
bfrange blocks defined in §9.10.3 + Adobe Tech Note #5014); for
Identity-H Type 0 fonts without /ToUnicode the walker falls back
to interpreting each 2-byte CID as a BMP code point. Simple fonts
honour /Encoding /WinAnsiEncoding and /Encoding /MacRomanEncoding
(Annex D.2), with a Latin-1 fallback for everything else.
use oxideav_pdf::reader::DocumentReader;
let pdf = std::fs::read("invoice.pdf")?;
let mut reader = DocumentReader::open(&pdf)?;
let extraction = reader.text_extraction()?;
for run in &extraction.runs {
println!("@({:.0},{:.0}) {}/{}: {}",
run.position.0, run.position.1,
run.font_name, run.font_size, run.text);
}
println!("flat: {}", extraction.flat_text());
# Ok::<(), Box<dyn std::error::Error>>(())

Runs come out in stream order, the rendering order the page would have laid down. Reading-order reconstruction (column / paragraph segmentation) is a future-round followup; round 22 gives the raw runs plus matrix positions so a downstream layout pass can do its own segmentation.
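The Identity-H fallback described above (each 2-byte CID read as a BMP code point) can be sketched as follows. The function name is hypothetical, and mapping surrogate-range CIDs or a stray trailing byte to U+FFFD is an assumption of this sketch rather than documented behaviour.

```rust
/// Interpret a string shown with an Identity-H font as a sequence of
/// 2-byte big-endian CIDs and treat each CID as a BMP code point.
/// CIDs in the surrogate range, and a trailing odd byte, become U+FFFD.
pub fn identity_h_fallback(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut chunks = bytes.chunks_exact(2);
    for pair in &mut chunks {
        let cid = u16::from_be_bytes([pair[0], pair[1]]);
        out.push(char::from_u32(u32::from(cid)).unwrap_or('\u{FFFD}'));
    }
    if !chunks.remainder().is_empty() {
        out.push('\u{FFFD}');
    }
    out
}
```

This only yields readable text when the font's CIDs happen to equal Unicode values, which is why a `/ToUnicode` CMap takes priority whenever one is present.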
[DocumentReader::image_xobjects] walks every page's
/Resources /XObject subdict and surfaces every Image XObject whose
final filter is /DCTDecode (ISO 32000-1 §7.4.8). The returned
[PdfImageXObject] carries the unmodified JPEG bytes — the exact
JPEG-1 / JFIF stream a JPEG decoder needs — plus the dictionary's
/Width, /Height, /ColorSpace (mapped to the [ColorSpace] tag:
DeviceRGB / DeviceCMYK / DeviceGray / Indexed / Other), and
/BitsPerComponent. Wrapping /ASCII85Decode / /ASCIIHexDecode /
/FlateDecode filters preceding /DCTDecode are unwrapped before
the JPEG payload is returned, so callers always get a self-contained
JPEG stream (the standard pdfimages -all shape).
use oxideav_pdf::reader::DocumentReader;
let pdf = std::fs::read("photos.pdf")?;
let mut reader = DocumentReader::open(&pdf)?;
for (id, image) in reader.image_xobjects()? {
let path = format!("xobj-{}.jpg", id.number);
std::fs::write(&path, &image.data)?;
println!("{} ({}x{} {:?}, {} bpc)", path,
image.width, image.height, image.color_space,
image.bits_per_component);
}
# Ok::<(), Box<dyn std::error::Error>>(())

The same XObject referenced from multiple pages is returned once
(deduplicated by ObjectId). Image XObjects with non-DCTDecode
filters (FlateDecode-only raster XObjects, JBIG2Decode, JPXDecode,
CCITTFaxDecode) are silently skipped — the round-23 walker is
JPEG-only. Cross-checked against pdfimages -all (poppler-utils):
the bytes are byte-identical.
[DocumentReader::annotations] walks every page's /Annots array and
surfaces every entry as a [PdfAnnotation] (ISO 32000-1 §12.5.6
Tables 169..209). Per-subtype payload covers /Text (sticky notes —
/Open, /Name icon, /State, /StateModel), /FreeText (/DA,
/Q quadding, /RC, /IT intent), /Stamp (icon name), the four
text-markup variants /Highlight / /Underline / /Squiggly /
/StrikeOut (/QuadPoints), /Square + /Circle (/IC, /RD),
/Link (re-uses the round-25 go-to / URI decoder), and /Widget
(/FT, /T, /V). Unknown subtypes (Movie, Sound, 3D, RichMedia,
…) surface as AnnotationKind::Other { subtype }. Common Table 164
fields (/Rect, /Contents, /NM, /M, /F, /C, /Border) are
decoded for every subtype.
use oxideav_pdf::{reader::DocumentReader, AnnotationKind};
let mut r = DocumentReader::open(&pdf_bytes)?;
for a in r.annotations()? {
println!("page {} {:?}: {}", a.source_page_index, a.rect,
a.contents.as_deref().unwrap_or(""));
if let AnnotationKind::Stamp { icon } = &a.kind {
println!(" stamp icon: {icon}");
}
}
# Ok::<(), oxideav_pdf::PdfError>(())

[DocumentReader::xmp_packet] parses the document-level XMP packet
round-19 surfaces into a structured [XmpPacket] (ISO 32000-1
§14.3.2 + Adobe XMP Spec 2012 / ISO 16684-1 / ISO 19005-1..3 §6.x).
Covers the most-used Dublin Core (dc:title through rdf:Alt,
dc:creator through rdf:Seq, dc:subject rdf:Bag, dc:rights,
dc:format), XMP Basic (xmp:CreateDate / xmp:ModifyDate /
xmp:MetadataDate / xmp:CreatorTool), PDF schema (pdf:Producer /
pdf:Keywords / pdf:PDFVersion / pdf:Trapped), and PDF/A
identification (pdfaid:part / pdfaid:conformance) fields. Element
and attribute forms both recognised; XML entities (&amp;amp; / &amp;lt; /
&amp;gt; / &amp;quot; / &amp;apos;) plus numeric character references
decode. XmpPacket::is_pdf_a() + pdf_a_conformance() collapse the
pair into a 1B-style PDF/A conformance designator.
let mut r = oxideav_pdf::reader::DocumentReader::open(&pdf_bytes)?;
if let Some(p) = r.xmp_packet()? {
println!("title: {:?}", p.dc_title);
println!("creator: {:?}", p.dc_creator);
println!("producer: {:?}", p.pdf_producer);
if p.is_pdf_a() {
println!("PDF/A conformance: {:?}", p.pdf_a_conformance());
}
}
# Ok::<(), oxideav_pdf::PdfError>(())

Simple Type 1 / TrueType / Type 3 fonts may carry their /Encoding as
a dictionary that overlays a /Differences array on top of a named
/BaseEncoding (ISO 32000-1 §9.6.6.1). The reader resolves this
properly: the array's flat [N name1 name2 … M nameK …] form is
parsed (numeric tokens reset the running code; names land at
consecutive slots), and each glyph name maps to its Unicode scalar
through the Adobe Glyph List (subset staged under
docs/document/pdf/agl/subset.txt, ~320 glyph names). The resolver
plugs into the [DocumentReader::text_extraction] path so a
/Differences-using font decodes correctly to Unicode.
use oxideav_pdf::reader::{
apply_encoding_differences, parse_encoding_differences, BaseEncoding,
EncodingMap,
};
// Imagine an inline encoding dict resolved from a PDF font:
// /Encoding << /BaseEncoding /WinAnsiEncoding
// /Differences [24 /breve /caron /circumflex] >>
let diffs = parse_encoding_differences(&diffs_array)?;
let base = EncodingMap::from_base(BaseEncoding::WinAnsi);
let map = apply_encoding_differences(&base, &diffs);
assert_eq!(map.decode(&[0x18]), "\u{02D8}"); // breve
# Ok::<(), oxideav_pdf::PdfError>(())

Unknown glyph names emit U+FFFD as a marker (matching what
pdftotext --raw does for un-resolvable glyphs). Multi-character
glyph expansions (/fi → "fi", /fl → "fl") are accommodated. Six
base encodings are recognised: WinAnsi / MacRoman / MacExpert /
Standard / Symbol / ZapfDingbats. Full AGL coverage (CJK,
Cyrillic, Devanagari) is round-29+.
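The running-code rule for the flat `/Differences` form can be sketched independently of the crate. `DiffToken` and `expand_differences` are hypothetical names used only for this illustration; the glyph-name to Unicode step through the Adobe Glyph List is not shown.

```rust
/// A token of a `/Differences` array: either a code reset or a glyph name.
#[derive(Debug, Clone, PartialEq)]
pub enum DiffToken {
    Code(u8),
    Name(String),
}

/// Expand the flat `[N name1 name2 ... M nameK ...]` form into
/// `(code, name)` pairs: a number resets the running code, each name takes
/// the current code and advances it by one. Names that appear before any
/// number, or past code 255, are dropped in this sketch.
pub fn expand_differences(tokens: &[DiffToken]) -> Vec<(u8, String)> {
    let mut out = Vec::new();
    let mut code: Option<u8> = None;
    for token in tokens {
        match token {
            DiffToken::Code(n) => code = Some(*n),
            DiffToken::Name(name) => {
                if let Some(c) = code {
                    out.push((c, name.clone()));
                    // `None` after 255 stops further assignments until a reset.
                    code = c.checked_add(1);
                }
            }
        }
    }
    out
}
```

Overlaying the resulting pairs on the base encoding's 256-slot table then gives the per-font decoding map.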
[DocumentReader::read_in_logical_order] walks the catalog's
/StructTreeRoot /K tree and emits text runs in author-intended
reading order rather than the painter's raster order (ISO 32000-1
§14.6 + §14.7 + §14.8 — Tagged PDF). For a 2-column document, naive
raster extraction interleaves column 1's first row, column 2's first
row, column 1's second row, …; the round-29 pass walks [Sect_col1, Sect_col2] and emits all of column 1 before any of column 2. The
walker handles every leaf shape ISO 32000-1 §14.7.4.4 defines:
bare-integer MCID kids (resolve against the ancestor's inheritable
/Pg), <</Type /MCR /Pg p /MCID m>> marked-content references with
their own /Pg overrides (cross-page tables), <</Type /OBJR …>>
object references (skipped — they reference annotations, not text),
and nested /StructElem kids which recurse with a 64-deep cycle
guard.
use oxideav_pdf::reader::{DocumentReader, LayoutMode};
let mut r = DocumentReader::open(&pdf_bytes)?;
let result = r.read_in_logical_order()?;
match result.mode {
LayoutMode::Tagged => println!("logical reading order:"),
LayoutMode::Raster => println!("raster fallback (no /StructTreeRoot):"),
}
for run in &result.runs {
println!(" {}", run.text);
}
# Ok::<(), oxideav_pdf::PdfError>(())

Documents without a /StructTreeRoot (or with a malformed / empty
tree) fall back to the existing raster-order extraction with
LayoutMode::Raster set on the return so callers can branch. The
pass also exposes extract_text_marked(reader) which emits every
text run alongside the marked-content /MCID it was painted under
(for callers that want to assemble a custom logical order outside the
StructTreeRoot — e.g. PDF/UA accessibility audits).
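The structure-tree walk with `/Pg` inheritance and a depth guard can be modelled on a simplified tree. The `Kid` enum and `collect_mcids` below are hypothetical stand-ins for the parsed PDF objects, and because this owned tree cannot actually contain a cycle, the guard here only demonstrates where the check sits.

```rust
/// A simplified structure-tree kid, modelling the leaf shapes listed above.
pub enum Kid {
    /// Bare-integer MCID; the page comes from the nearest ancestor `/Pg`.
    Mcid(u32),
    /// Marked-content reference carrying its own page override.
    Mcr { page: u32, mcid: u32 },
    /// Object reference (an annotation, for example); contributes no text.
    Objr,
    /// Nested structure element with an optional `/Pg` and its kids.
    Elem { page: Option<u32>, kids: Vec<Kid> },
}

const MAX_DEPTH: usize = 64;

/// Collect `(page, mcid)` pairs in logical (structure-tree) order.
pub fn collect_mcids(
    kid: &Kid,
    inherited_page: Option<u32>,
    depth: usize,
    out: &mut Vec<(u32, u32)>,
) {
    if depth > MAX_DEPTH {
        return; // depth guard against cyclic or pathological trees
    }
    match kid {
        Kid::Mcid(mcid) => {
            // Without an inherited page the MCID cannot be resolved; skip it.
            if let Some(page) = inherited_page {
                out.push((page, *mcid));
            }
        }
        Kid::Mcr { page, mcid } => out.push((*page, *mcid)),
        Kid::Objr => {}
        Kid::Elem { page, kids } => {
            let page = (*page).or(inherited_page);
            for child in kids {
                collect_mcids(child, page, depth + 1, out);
            }
        }
    }
}
```

Joining the resulting pairs against text runs tagged with their page and MCID yields the column-by-column order described above.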
- Text emission: writer-side `BT … Tj … ET` for `Node::Text` using Type 0 fonts with a CIDFont built via `oxideav-ttf` / `oxideav-otf`. The reader-side extraction surface landed in round 22 (see above).
- Writer-side JPEG passthrough on `ImageRef` (DCTDecode XObject): needs core IR support for "raw codec bytes" alongside the decoded VideoFrame so the writer can emit `/Filter /DCTDecode` instead of re-encoding every JPEG to FlateDecoded raw RGBA. The reader-side surface landed in round 23 (see above).
- Extended generic hint tables (F.4.5) and embedded-file-stream hint tables (F.4.6) for linearized output: we generate no interactive forms / structure trees / embedded files, so the per-table content would be empty anyway.
- Ed25519 / Ed448 signature dispatch in `pubsec::verify`: round 20 covers RSA-PKCS#1 v1.5 / RSA-PSS / ECDSA on P-256 / P-384 / P-521; EdDSA needs an `ed25519-dalek` (or `ed448-goldilocks`) dep.
- Transparency groups beyond a per-`Group` `/ca` + `/CA` opacity.

[dependencies]
oxideav-core = "0.1"
oxideav-pdf = "0.0"

use oxideav_core::{
FillRule, Group, Node, Paint, Path, PathNode, Point, Rgba, VectorFrame,
};
use oxideav_core::TimeBase;
let mut p = Path::new();
p.move_to(Point::new(10.0, 10.0))
.line_to(Point::new(110.0, 10.0))
.line_to(Point::new(110.0, 60.0))
.line_to(Point::new(10.0, 60.0))
.close();
let frame = VectorFrame {
width: 200.0,
height: 100.0,
view_box: None,
root: Group {
children: vec![Node::Path(PathNode {
path: p,
fill: Some(Paint::Solid(Rgba::opaque(0xFF, 0x80, 0x00))),
stroke: None,
fill_rule: FillRule::NonZero,
})],
..Group::default()
},
pts: None,
time_base: TimeBase::new(1, 1),
};
let pdf = oxideav_pdf::write_pdf(&frame).expect("vector → PDF");
std::fs::write("out.pdf", pdf).unwrap();
# Ok::<(), Box<dyn std::error::Error>>(())

MIT. See LICENSE.