[feature] embed_pdf: account for /UserUnit when reading page dimensions #1

@segovb

import_page_as_xobject already resolves inherited /Rotate by walking the page tree and baking the rotation into the Form XObject's BBox and content stream CTM. The same approach would be valuable for /UserUnit (PDF 1.6+, §7.7.3.3).

Report

/UserUnit is a scaling factor on page dimensions (default 1.0 = 1/72 inch). A page with MediaBox [0 0 306 396] and /UserUnit 2 is physically US Letter (612 × 792 pt), but its content stream operates in 306 × 396 coordinates.
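The arithmetic is small enough to show directly. This is a minimal sketch, not hipdf code: `physical_size_pt` is a hypothetical helper name, and the MediaBox is assumed to be already parsed into `[llx, lly, urx, ury]`.

```rust
/// Physical page size in points (1/72 inch).
/// `media_box` is [llx, lly, urx, ury] in default user space units;
/// `user_unit` is the page's /UserUnit value (1.0 when the key is absent).
/// Hypothetical helper for illustration only, not part of hipdf.
fn physical_size_pt(media_box: [f64; 4], user_unit: f64) -> (f64, f64) {
    // Width and height in user space units, tolerant of swapped corners.
    let width_units = (media_box[2] - media_box[0]).abs();
    let height_units = (media_box[3] - media_box[1]).abs();
    // Each user space unit is `user_unit` points wide.
    (width_units * user_unit, height_units * user_unit)
}
```

With the page from the example, `physical_size_pt([0.0, 0.0, 306.0, 396.0], 2.0)` gives `(612.0, 792.0)`, i.e. US Letter.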

Currently hipdf reads the raw MediaBox without accounting for UserUnit, so it sees a 306 × 396 page. When embedding with full_page_options(612.0, 792.0), the content renders at half size instead of filling the page.

Suggested fix

In import_page_as_xobject (or the MediaBox resolution path), alongside the existing /Rotate handling:

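  1. Resolve /UserUnit from the page dict, defaulting to 1.0 when the key is absent. Unlike /Rotate, /UserUnit is not listed as an inheritable attribute in the spec, so a direct lookup on the page dict should suffice; a parent /Pages walk in the style of resolve_page_rotate would only be needed to tolerate non-conforming files
  2. If UserUnit != 1.0:
     • Multiply the BBox dimensions by UserUnit so the XObject reports the correct physical size
     • Prepend uu 0 0 uu 0 0 cm (where uu is the UserUnit value) to the content stream so user-space coordinates map correctly to the scaled BBox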

This mirrors the rotation approach: bake the attribute into the XObject so callers don't need to know about it.
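The two steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not hipdf's actual code: `bake_user_unit` is a hypothetical name, the BBox is assumed to be a plain `[f64; 4]`, and the content stream is assumed to be already decoded bytes. It scales all four BBox coordinates rather than only the width and height, so a BBox with a non-zero origin stays aligned with the scaled content.

```rust
/// Bake a page's /UserUnit into a Form XObject's BBox and content stream.
/// Hypothetical helper for illustration; not part of hipdf's API.
/// Returns the (possibly scaled) BBox and the (possibly prefixed) content.
fn bake_user_unit(bbox: [f64; 4], content: &[u8], user_unit: f64) -> ([f64; 4], Vec<u8>) {
    // Treat missing, invalid, or default values as "nothing to do".
    if !user_unit.is_finite() || user_unit <= 0.0 || user_unit == 1.0 {
        return (bbox, content.to_vec());
    }
    // Scale every BBox coordinate so the XObject reports its physical size.
    let scaled_bbox = bbox.map(|v| v * user_unit);
    // Prepend "uu 0 0 uu 0 0 cm" so the original drawing operators,
    // written in unscaled user space, land inside the scaled BBox.
    let mut out = format!("{uu} 0 0 {uu} 0 0 cm\n", uu = user_unit).into_bytes();
    out.extend_from_slice(content);
    (scaled_bbox, out)
}
```

For the page in the report, a BBox of `[0, 0, 306, 396]` with UserUnit 2 becomes `[0, 0, 612, 792]`, and the stream gains a leading `2 0 0 2 0 0 cm` line. Setting the Form XObject's /Matrix instead of editing the stream would be an alternative design; the prepend approach is shown here only because it is what the issue proposes.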

Context

/UserUnit is rare in practice but shows up in PDFs from medical records systems and high-resolution scans. We're using hipdf to normalize arbitrary incoming PDFs to US Letter for a document packet pipeline, and this is the one page-level attribute that currently requires manual pre-processing with lopdf before passing to hipdf.
