import_page_as_xobject already resolves inherited /Rotate by walking the page tree and baking the rotation into the Form XObject's BBox and content stream CTM. The same approach would be valuable for /UserUnit (PDF 1.6+, §7.7.3.3).
Report
/UserUnit is a scaling factor on page dimensions (default 1.0 = 1/72 inch). A page with MediaBox [0 0 306 396] and /UserUnit 2 is physically US Letter (612 × 792 pt), but its content stream operates in 306 × 396 coordinates.
Currently hipdf reads the raw MediaBox without accounting for UserUnit, so it sees a 306 × 396 page. When embedding with full_page_options(612.0, 792.0), the content renders at half size instead of filling the page.
Suggested fix
In import_page_as_xobject (or the MediaBox resolution path), alongside the existing /Rotate handling:
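- Resolve /UserUnit from the page dict. Unlike /Rotate, /UserUnit is not listed as an inheritable attribute in the spec, so the parent /Pages walk used by resolve_page_rotate should only be needed as a leniency for non-conforming files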
- If UserUnit != 1.0:
  - Multiply the BBox dimensions by UserUnit so the XObject reports the correct physical size
  - Prepend uu 0 0 uu 0 0 cm to the content stream so user-space coordinates map correctly to the scaled BBox
This mirrors the rotation approach: bake the attribute into the XObject so callers don't need to know about it.
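A minimal sketch of the baking step, to make the proposal concrete. It assumes the /UserUnit value has already been read from the page dictionary. The name bake_user_unit and its signature are hypothetical and not part of hipdf's API, and treating invalid values as 1.0 is my own assumption about lenient handling:

```rust
/// Hypothetical helper (not an existing hipdf function): bakes a page's
/// /UserUnit into a Form XObject's BBox and content stream.
///
/// `bbox` is [llx, lly, urx, ury] in the page's own user-space units.
/// Returns the scaled BBox and the content stream with a scaling `cm`
/// operator prepended.
pub fn bake_user_unit(bbox: [f64; 4], content: &[u8], user_unit: f64) -> ([f64; 4], Vec<u8>) {
    // Assumption: non-finite or non-positive values are treated like the
    // default of 1.0, in which case nothing needs to change.
    if !user_unit.is_finite() || user_unit <= 0.0 || (user_unit - 1.0).abs() < f64::EPSILON {
        return (bbox, content.to_vec());
    }

    // Scale all four BBox coordinates so the XObject reports its physical
    // size in 1/72 inch units.
    let scaled = bbox.map(|v| v * user_unit);

    // Prepend "uu 0 0 uu 0 0 cm" so the original drawing operators, which
    // use unscaled coordinates, fill the scaled BBox.
    let mut out = format!("{uu} 0 0 {uu} 0 0 cm\n", uu = user_unit).into_bytes();
    out.extend_from_slice(content);

    (scaled, out)
}
```

For the example page above, calling bake_user_unit([0.0, 0.0, 306.0, 396.0], stream, 2.0) would yield a BBox of [0 0 612 792] and a stream that starts with 2 0 0 2 0 0 cm. Because the scale is uniform, it commutes with the rotation part of the existing /Rotate matrix, though any translation in that matrix would also need to be scaled.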
Context
/UserUnit is rare in practice but shows up in PDFs from medical records systems and high-resolution scans. We're using hipdf to normalize arbitrary incoming PDFs to US Letter for a document packet pipeline, and this is the one page-level attribute that currently requires manual pre-processing with lopdf before passing to hipdf.