Releases: mini-software/MiniPdf

v0.18.2

19 Mar 10:28

v0.18.1

Highlights

This release significantly improves DOCX header/footer rendering, refines CJK text layout accuracy, adds Excel accounting format support, and enhances image sizing fidelity for both XLSX and DOCX conversions.


DOCX Conversion

  • Header/Footer element rendering: Full support for rendering inline images, text runs, and tables inside DOCX headers and footers, including alignment, font sizing, word-wrap, and vertical alignment within table cells.
  • Improved inter-script spacing: Smarter CJK↔Latin spacing logic that avoids over-spacing for short labels, single-letter prefixes, and pure numeric tokens (e.g., "500字", "A期" no longer get unwanted spaces).
  • Table cell text justification: Added word spacing (Tw) for justified text in DOCX table cells, distributing extra space across words for "both" alignment.
  • Calibri width estimation improvements: Separated Latin and CJK width calculations; applied ~2.3% kerning reduction to Latin characters only, improving text width estimation accuracy.
  • Cell text horizontal scaling: Uses PDF Tz (horizontal scaling) to fit text within table cell boundaries when Helvetica rendering exceeds cell width.
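
Both adjustments above come down to simple arithmetic. The following is an illustrative Python sketch, not MiniPdf code. `justify_word_spacing` and `fit_horizontal_scale` are hypothetical helpers, and widths are assumed to be already measured in points. In PDF, `Tw` only affects the single-byte space character (code 32), so this style of justification typically does not stretch gaps inside two-byte CJK strings.

```python
def justify_word_spacing(line_width, natural_width, space_count):
    """Tw value (points) that stretches a justified line to line_width.

    The extra room is shared evenly by the inter-word gaps. Lines without
    spaces, or lines that already fill the width, get no extra spacing.
    """
    extra = line_width - natural_width
    if space_count <= 0 or extra <= 0:
        return 0.0
    return extra / space_count


def fit_horizontal_scale(cell_width, text_width):
    """Tz percentage that squeezes text into cell_width (100 = unscaled).

    Text that already fits stays at 100; this sketch never stretches text.
    """
    if text_width <= 0 or text_width <= cell_width:
        return 100.0
    return 100.0 * cell_width / text_width


# A 200pt line whose words measure 180pt across 4 gaps, and a 90pt label in a 60pt cell.
print(f"{justify_word_spacing(200.0, 180.0, 4):.2f} Tw")  # 5.00 Tw
print(f"{fit_horizontal_scale(60.0, 90.0):.2f} Tz")       # 66.67 Tz
```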

Excel (XLSX) Conversion

  • Accounting prefix rendering: Properly renders accounting number format prefixes (e.g., $, ¥) left-aligned in cells, separate from the right-aligned numeric value — matching Excel/LibreOffice behavior across title rows, overflow rows, and normal rows.
  • Formula evaluation: Added support for evaluating YEAR(TODAY()), MONTH(NOW()), DAY(TODAY()) date-part formulas and simple CONCATENATE/& formulas with quoted literals and date tokens.
  • Image sizing fidelity: Removed aggressive size capping (usableWidth * 0.95, pageHeight * 0.75) for images; now uses actual EMU dimensions and clamps only to remaining drawable space from the anchor point, producing more accurate image placement.
  • Column width handling: Explicit column widths from the spreadsheet are no longer capped by maxColWidth, honoring the author's layout intent.
  • twoCellAnchor image fallback: Images using twoCellAnchor that omit xdr:ext now fall back to a:xfrm/a:ext for size information.
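
Here is a sketch of the image sizing rule described above. It converts the EMU extent to points (12,700 EMU per point) and shrinks the image only when it would cross the drawable area left after its anchor. Two details are assumptions of this sketch rather than statements from the release notes: both sides shrink by the same factor to keep the aspect ratio, and y grows downward. `clamp_image_size` is a hypothetical name.

```python
EMU_PER_POINT = 12700  # 914,400 EMU per inch / 72 points per inch


def emu_to_points(emu):
    return emu / EMU_PER_POINT


def clamp_image_size(cx_emu, cy_emu, anchor_x, anchor_y, right_limit, bottom_limit):
    """Return (width, height) in points for an image whose extent is cx/cy EMU.

    The natural size is kept unless it would cross right_limit/bottom_limit
    (the edge of the drawable area). In that case both sides shrink by the
    same factor.
    """
    width, height = emu_to_points(cx_emu), emu_to_points(cy_emu)
    avail_w = max(0.0, right_limit - anchor_x)
    avail_h = max(0.0, bottom_limit - anchor_y)
    scale = 1.0
    if width > avail_w:
        scale = min(scale, avail_w / width)
    if height > avail_h:
        scale = min(scale, avail_h / height)
    return width * scale, height * scale


# A 2 in x 1 in image (144 x 72 pt) anchored 59 pt from the right edge of the drawable area.
print(clamp_image_size(1828800, 914400, 500.0, 100.0, 559.0, 806.0))
```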

PDF Writer

  • Character spacing state fix: Always emit the Tc (character spacing) operator so that a previous text block's Tc value cannot leak through the graphics state into subsequent blocks.
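
Tc has to be written every time because character spacing is part of the text state. The text state belongs to the graphics state and is not reset by `BT`/`ET`. The following is a minimal Python sketch of a content-stream emitter that follows this rule. The `text_block` helper and the `F1` resource name are hypothetical, and string escaping is reduced to backslashes and parentheses.

```python
def text_block(x, y, font, size, text, char_spacing=0.0):
    """Emit one PDF text object. Tc is written even when it is 0.

    Text state (Tc, Tw, Tz, ...) is part of the graphics state and survives
    BT/ET, so skipping Tc would let an earlier block's spacing leak into
    this one.
    """
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        "BT\n"
        f"/{font} {size:g} Tf\n"
        f"{char_spacing:g} Tc\n"
        f"{x:g} {y:g} Td\n"
        f"({escaped}) Tj\n"
        "ET\n"
    )


stream = text_block(72, 720, "F1", 12, "Spaced", char_spacing=1.5)
stream += text_block(72, 700, "F1", 12, "Normal")  # explicitly resets Tc to 0
print(stream)
```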

Documentation

  • Added MiniPdf.Cli badge and usage instructions (install, commands, examples) to all README files.
  • Added conversion limitation notice encouraging users to report issues or contribute PRs.

Stats

  • 5 source files changed, 1,049 insertions, 148 deletions

Full Changelog: v0.17.0...v0.18.0

v0.17.0 MiniPdf CLI Tool

18 Mar 11:31

v0.17.0

🚀 New: MiniPdf CLI Tool

A brand-new command-line tool (MiniPdf.Cli) for converting Excel (.xlsx) and Word (.docx) files to PDF — installable as a .NET global tool via NuGet.

```shell
dotnet tool install -g MiniPdf.Cli
minipdf convert input.xlsx -o output.pdf
minipdf convert report.docx --fonts /path/to/fonts
```

  • Supports --output, --fonts options and shorthand syntax
  • Packaged as a NuGet dotnet tool (PackAsTool)
  • Automated NuGet publish via new GitHub Actions workflow (nuget-publish-cli.yml, triggered by cli-v* tags)

📝 DOCX-to-PDF Improvements

  • CJK font resolution: Per-run font name resolution with East Asia theme font support — reads theme1.xml major/minor script fonts, resolves w:rFonts with East Asia language awareness
  • Preferred CJK font propagation: Document's DefaultEastAsiaFontName is now passed to the PDF writer for correct font selection and prioritization
  • Floating text boxes: Added support for wrapNone floating text boxes with absolute page positioning (new DocxFloatingTextBox record)
  • Field instruction parsing: New GetFieldInstructionType for PAGE/NUMPAGES field support in headers/footers
  • List numbering: Added FormatChineseCounting for Chinese-style ordered list numbering; improved list label font alignment (shares same embedded font slot as body text)
  • Character spacing: Per-run CharSpacing tracking and word-spacing propagation during line rendering
  • Line spacing: Correct handling of exact/atLeast spacing (not snapped to document grid per OOXML spec); auto-spacing uses max run font size for grid-snapped height
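
For reference, the Chinese counting style maps 1 to 一, 10 to 十, and 21 to 二十一. The sketch below covers only 1–99. It illustrates the numbering scheme and is not the actual `FormatChineseCounting` implementation, which may handle larger values (百, 千) differently. `format_chinese_counting` is a hypothetical name.

```python
DIGITS = "〇一二三四五六七八九"


def format_chinese_counting(n):
    """Chinese counting label for 1..99, e.g. 1 -> 一, 10 -> 十, 21 -> 二十一."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    prefix = "" if tens == 1 else DIGITS[tens]  # 10-19 are written 十, 十一, ... without 一
    suffix = "" if ones == 0 else DIGITS[ones]
    return prefix + "十" + suffix


print([format_chinese_counting(i) for i in (1, 10, 11, 20, 99)])
```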

🖨️ PDF Writer Enhancements

  • Font name normalization & alias matching: NormalizeFontName and IsFontAliasMatch for flexible font resolution
  • CJK font file mapping: New CjkFontFileMap dictionary for well-known CJK font-to-file mapping (Chinese, Japanese, Korean)
  • Font prioritization: PrioritizePreferredCjkFont reorders candidate font paths based on document preference
  • Multi-font slot support: FindPreferredFontIndex and fontNameToSlot dictionary for per-run font slot selection in content streams
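
The following is a rough sketch of how font-name normalization and preferred-font ordering can fit together. The alias table and the file map are illustrative assumptions, not the contents of `CjkFontFileMap`. msyh.ttc and simsun.ttc are the usual Windows file names for Microsoft YaHei and SimSun. `normalize_font_name` and `prioritize_preferred` are hypothetical helpers.

```python
import os
import re

# Illustrative alias: the Chinese name of Microsoft YaHei. Not MiniPdf's table.
ALIASES = {"微软雅黑": "microsoftyahei"}


def normalize_font_name(name):
    """Lower-case, drop spaces, hyphens and underscores, then apply aliases.

    After this, 'Microsoft YaHei', 'microsoft-yahei' and '微软雅黑' compare equal.
    """
    key = re.sub(r"[\s\-_]", "", name).lower()
    return ALIASES.get(key, key)


def prioritize_preferred(paths, preferred, file_map):
    """Stable partition: font files matching the preferred font come first.

    Everything else keeps its original relative order.
    """
    wanted = normalize_font_name(preferred)
    target_file = file_map.get(wanted, "").lower()

    def matches(path):
        base = os.path.basename(path)
        stem = os.path.splitext(base)[0]
        return normalize_font_name(stem) == wanted or (target_file != "" and base.lower() == target_file)

    return [p for p in paths if matches(p)] + [p for p in paths if not matches(p)]


# Example mapping (assumed): normalized font name -> typical Windows file name.
FILE_MAP = {"microsoftyahei": "msyh.ttc", "simsun": "simsun.ttc"}
candidates = ["/fonts/simsun.ttc", "/fonts/msyh.ttc", "/fonts/NotoSansCJK.ttc"]
print(prioritize_preferred(candidates, "Microsoft YaHei", FILE_MAP))
```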

📊 Excel-to-PDF

  • Improved print-scale precision: stores precise float scale to avoid integer rounding loss

🌐 MiniPdf.Web

  • Custom font upload: Users can upload their own .ttf / .otf font files for better CJK rendering
  • Font loading status indicators: Download progress, success, and failure states
  • i18n support: Added I18n.cs with English, Simplified Chinese, and Traditional Chinese translations
  • Enhanced CSS styling for the converter page

📄 Documentation & Licensing

  • Added open-source and licensing information (Apache 2.0) to all README files (EN, zh-CN, zh-TW, ja, ko, fr, it)
  • New README.nuget.cli.md for CLI NuGet package
  • Updated README to reflect lightweight design and serverless capabilities

🧪 Testing & Benchmarks

  • New DOCX-to-PDF unit test: Convert_FooterPageFieldWithSwitch_RendersPageNumber
  • Added test files: Academic Achievement Summary Table (xlsx), SA8000 ch sample, nthu_article (docx)
  • Updated DOCX and XLSX benchmark comparison reports with new test cases
  • Cleaned up outdated benchmark report files and removed obsolete test PDFs (wedding timeline planner, payroll calculator, etc.)

🔧 Other

  • Code refactoring for improved readability and maintainability
  • Updated .gitignore with new patterns
  • Solution file updated to include MiniPdf.Cli project

Full Changelog: v0.16.0...v0.17.0

v0.16.0

17 Mar 06:43

Release date: 2026-03-17
Compare: v0.15.0...393846f

Highlights

  • Broad .NET compatibility: Added netstandard2.0 and net462 target frameworks, enabling use on .NET Framework 4.6.2+ and any .NET Standard 2.0 consumer (Unity, legacy ASP.NET, Xamarin, etc.).
  • DOCX rendering — vector shapes & custom geometry: Full support for DrawingML anchor shapes, group shapes (wpg:wgp), ellipses, custom geometry paths (arcs, quadratic/cubic Bézier), and compound polygons with even-odd fill.
  • PNG alpha channel (RGBA) support: PNG images with transparency now emit a proper PDF /SMask object, preserving alpha in both XLSX and DOCX conversions.
  • Excel rendering quality improvements: Uniform-fill row painting eliminates hairline fill seams; improved single-line vertical centering; bold/underline pass-through for cell text; smarter border width scaling.
  • DOCX theme font resolution: Theme-referenced Latin fonts (majorHAnsi / minorHAnsi) are now correctly resolved via theme1.xml.
  • Custom fonts documentation: README files across all languages now include a "Custom Fonts" section with MiniPdf.RegisterFont usage.
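
RGBA support amounts to splitting interleaved pixels into two images: an RGB image, and a separate grayscale alpha image that the base image references with `/SMask n 0 R`. This Python sketch assumes the PNG has already been decompressed and unfiltered into 8-bit RGBA, and it omits stream compression. `split_rgba` and `smask_dict` are hypothetical helpers, not the `TryDecodePngToRgb` code.

```python
def split_rgba(rgba, width, height):
    """Split interleaved 8-bit RGBA pixels into two planes.

    Returns an RGB plane for the image XObject and a one-channel alpha
    plane for its /SMask.
    """
    expected = width * height * 4
    if len(rgba) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(rgba)}")
    alpha = bytes(rgba[3::4])  # every fourth byte, starting at the first A
    rgb = bytes(b for i, b in enumerate(rgba) if i % 4 != 3)
    return rgb, alpha


def smask_dict(width, height, length):
    """Dictionary for the soft-mask image (stream compression omitted).

    Same size as the base image, DeviceGray, 8 bits per component.
    """
    return (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
            f"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length {length} >>")


pixels = bytes([255, 0, 0, 128, 0, 255, 0, 255])  # 2x1: half-transparent red, opaque green
rgb, alpha = split_rgba(pixels, 2, 1)
print(rgb.hex(), alpha.hex(), smask_dict(2, 1, len(alpha)))
```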

Change Summary

  • Commits in range:
    • 7 total commits (6 non-merge commits)
  • Diff summary:
    • 467 files changed
    • 19,033 insertions
    • 4,977 deletions
  • Main affected areas:
    • MiniPdf (core converters, PDF writer, new Compat layer)
    • tests (benchmark images, issue files, comparison reports)
    • README.md + all language variants (custom fonts, DOCX feature)

Notable Technical Changes

  • New files:
    • Compat/NetFxPolyfills.cs — polyfills for Index, Range, GetSubArray, collection expressions, and init accessors for netstandard2.0 / net462
    • Compat/PlatformCompat.cs — portable Math.Clamp, HashCode.Combine, Encoding.Latin1, OperatingSystem.IsWindows/IsMacOS replacements
    • PdfPage.cs — new PdfEllipseBlock, PdfPolygonBlock, PdfPoint records; AddEllipse, AddPolygon, AddCompoundPolygon methods
  • DocxReader.cs (+686 lines):
    • ReadAnchorShapes — parses wp:anchor / wpg:wgp drawing elements into DocxShape models
    • Custom geometry path parser (ParseCustomGeometryPaths) with arc, quadratic Bézier, cubic Bézier, and guide formula evaluation
    • ResolveSolidFill — theme-aware fill color + alpha resolution
    • ReadThemeLatinFonts / ResolveFontNameFromRFonts — theme font lookup
    • New model types: DocxPolygonPoint, DocxCustomPath
  • DocxToPdfConverter.cs (+251 lines):
    • Shape rendering: frame borders, ellipse fills, custom polygon/compound polygon paths
    • CalculateCellContentHeight / CalculateRowInlineImageFloorHeight for improved table cell sizing
  • ExcelToPdfConverter.cs (+267 lines):
    • Uniform-fill row detection paints a single background rectangle to prevent vertical seams
    • Fill seam-overlap logic for neighboring same-fill cells (horizontal + vertical)
    • Single-line center-aligned text positioning improved to match LibreOffice baseline
    • ShouldUsePdfBold — suppresses synthetic bold for large headings (>20pt)
    • CellHasContentOrStyle — smarter empty-row/column trimming that ignores fill-only cells
    • Ascent compensation limited to larger-than-base fonts only
    • Border widths scaled via borderScaleFactor with 0.08f minimum
  • PdfWriter.cs (+232 lines):
    • Ellipse rendering via 4-arc cubic Bézier approximation
    • Polygon/compound polygon rendering with even-odd fill support
    • RGBA PNG alpha → /SMask XObject pipeline (IsRgbaPng, TryDecodePngToRgb alpha output)
    • CompressToZlib replaces WrapDeflateInZlib — uses ZLibStream on .NET 6+ and manual zlib framing + Adler-32 on older targets
    • ComputeAdler32 for net462 / netstandard2.0
  • MiniPdf.csproj:
    • Target frameworks: netstandard2.0;net462;net6.0;net8.0;net9.0
    • Conditional System.Memory, System.ValueTuple references for legacy TFMs
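
On targets without `ZLibStream`, a zlib stream (RFC 1950) can be assembled by hand. It consists of a 2-byte header, raw DEFLATE data, and then the big-endian Adler-32 checksum of the uncompressed input. The Python sketch below mirrors that idea. Python's own `zlib.compress` already does all of this, so the manual version exists only to show the framing. The function names are illustrative.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data):
    """Adler-32 checksum as defined in RFC 1950."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


def compress_to_zlib(data):
    """Wrap raw DEFLATE output in zlib framing, as used by /FlateDecode streams."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # negative wbits: raw deflate, no header
    raw = deflater.compress(data) + deflater.flush()
    header = b"\x78\x9c"  # CM=8 (deflate), 32K window; 0x789C is divisible by 31 as required
    return header + raw + adler32(data).to_bytes(4, "big")


payload = b"BT /F1 12 Tf (Hello) Tj ET\n" * 20
print(len(payload), "->", len(compress_to_zlib(payload)), "bytes")
```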

Commits (non-merge)

  • 6686d79 Update benchmark images for DOCX rendering tests
  • e9b70e1 Enhance .NET compatibility by adding support for .NET Standard 2.0 in polyfills and project configuration
  • f85263b Add .NET Framework compatibility polyfills and update existing methods to use them
  • 20edf6d Update benchmark images and remove unused diagnostic image
  • d35a480 Update README files to include Word to PDF conversion feature across multiple languages
  • 5dbe2a5 Add custom fonts section to README files for various languages

v0.15.0

16 Mar 11:16

MiniPdf v0.15.0

Release date: 2026-03-16
Compare: v0.14.0...af5da03

Highlights

  • Improved Excel and DOCX conversion pipeline and PDF rendering internals.
  • Added/expanded CJK font support and font registration flow (including rendering guidance).
  • Added filtering capability to benchmark and conversion scripts for faster targeted runs.
  • Performed major benchmark/tooling cleanup by removing many obsolete test scripts and artifacts.
  • Added NuGet-focused documentation and updated README links/report references.
  • Updated CI workflows for NuGet publish/version extraction and Pages flow.

Change Summary

  • Commits in range:
    • 28 total commits (23 non-merge commits)
  • Diff summary:
    • 1118 files changed
    • 39982 insertions
    • 38151 deletions
  • Main affected areas:
    • tests (benchmark assets, reports, and cleanup)
    • src (core converters and PDF writer/text/page internals)
    • scripts (benchmark automation and report/image updaters)
    • .github/workflows (nuget-publish, pages)
    • docs/readmes (README.md, README.zh-CN.md, new README.nuget.md)

Notable Technical Changes

  • Core library updates in conversion/rendering paths:
    • DOCX reader/converter updates
    • Excel reader/converter refactor
    • PDF page, text block, writer, and API surface adjustments
  • Benchmark workflow updates:
    • New filtering support in benchmark scripts
    • Deprecated/obsolete benchmark helper scripts removed
  • CI/documentation:
    • NuGet publish workflow version extraction fix
    • NuGet package documentation added

Commits (non-merge)

  • af5da03 Add README.nuget.md for NuGet package documentation and update project file references
  • 507fc5c Update comparison report and add diagnostic image
  • 2a40bf5 Refactor Excel to PDF conversion logic and update test files
  • 2ec8377 Update comparison report and images for MiniPdf vs Reference PDF
  • 8716752 Update comparison report and images for PDF tests
  • cbd6f8a Update comparison report and add new PDF files
  • 710fb53 Update benchmark images for MiniPdf tests
  • 7dd3204 Remove obsolete test files and scripts from MiniPdf.Scripts
  • 7ab517e feat: add filtering capability to benchmark and conversion scripts
  • 37b7c18 chore: add output_docx to .gitignore and document testing workflow
  • c89e9a5 refactor: simplify ItemGroup for Windows platform in MiniPdf.csproj
  • a7d3101 Remove obsolete test scripts and files related to PDF and Excel analysis, including checks for dimensions, styles, and scores. This cleanup enhances project maintainability by eliminating unused code and files.
  • 91eadf5 Remove deprecated benchmark scripts and related functionality from MiniPdf.Benchmark tests
  • e9152c5 Add anonymized PDF images for benchmarking tests
  • 02c01af feat: add DOCX issue file report links to README files
  • de0c6fe Refactor code structure for improved readability and maintainability
  • cf996f4 Refactor code structure for improved readability and maintainability
  • 0c04cfa Refactor code structure for improved readability and maintainability
  • 4e63b46 feat: add CJK font download and registration in test for PDF conversion
  • b302276 feat: add font warning message and styling for optimal rendering notice
  • 04a7cf4 docs: add warning about font limitations for optimal rendering in README files
  • 913b603 feat: add CJK font support for PDF generation and implement font registration
  • f89ef5d fix: enhance version extraction in NuGet publish workflow and add test for DOCX to PDF conversion

v0.14.0

14 Mar 06:25

DOCX Conversion Enhancements

  • Anchor shapes & images: Support for DOCX anchor-positioned shapes (filled rectangles) and anchor images with precise EMU offset positioning
  • Vertical merge (vMerge): Table cells with vertical merge (restart/continue) are now correctly rendered
  • Character spacing: DOCX w:spacing/@w:val (character spacing / letter-spacing) is now parsed and applied in PDF output
  • Theme colors: Read and resolve theme colors from theme1.xml for accurate shape/text coloring (including tint/shade transforms)
  • SDT unwrapping: Structured Document Tag (<w:sdt>) elements are now unwrapped, exposing their inner content for rendering
  • Border handling: Improved paragraph and table cell border parsing in DOCX-to-PDF conversion
  • Line spacing: Fixed absolute line spacing (exact/atLeast) vs. proportional line spacing handling
  • Inter-script spacing: Added automatic spacing between CJK and Latin/digit runs
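
Vertical merges can be resolved by scanning down each column from a `restart` cell for as long as the cells below are marked `continue`. In the XML, a bare `<w:vMerge/>` also means continue. This sketch assumes a uniform column grid with no `gridSpan`, and `vertical_spans` is a hypothetical name.

```python
def vertical_spans(vmerge):
    """Return how many rows each cell covers.

    vmerge[r][c] is 'restart', 'continue' or None for each cell of a table
    on a uniform column grid. Continuation cells get 0 because their area
    belongs to the cell above.
    """
    spans = [[0 if marker == "continue" else 1 for marker in row] for row in vmerge]
    for r, row in enumerate(vmerge):
        for c, marker in enumerate(row):
            if marker != "restart":
                continue
            below = r + 1
            while (below < len(vmerge) and c < len(vmerge[below])
                   and vmerge[below][c] == "continue"):
                spans[r][c] += 1
                below += 1
    return spans


table = [
    ["restart", None],
    ["continue", None],
    ["continue", "restart"],
    [None, "continue"],
]
print(vertical_spans(table))
```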

XLSX Conversion Enhancements

  • Conditional formatting: Parse <conditionalFormatting> rules and apply differential styles (dxf) for font/fill color overrides
  • Differential styles (dxf): Read dxf entries from styles.xml to support conditional formatting appearance
  • Image positioning: Improved sub-cell EMU offset calculation for fromColOff/fromRowOff/toColOff, producing more accurate image placement
  • Row height calculation: Better row height handling for image-anchored rows
  • Print title row images: Images anchored within print title rows are now rendered correctly on repeated pages
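
Sub-cell anchor offsets are stored in EMU. The image origin is therefore the sum of the preceding column widths (or row heights) plus the offset converted at 12,700 EMU per point. The sketch assumes widths and heights are already in points, with the conversion from character units done elsewhere. `anchor_origin` and `two_cell_extent` are hypothetical helpers.

```python
EMU_PER_POINT = 12700


def anchor_origin(col_widths_pt, row_heights_pt, col, col_off, row, row_off):
    """Position (points, relative to the sheet origin) of an xdr:from/xdr:to marker.

    Columns and rows are 0-based, as in the XML; offsets are in EMU.
    """
    x = sum(col_widths_pt[:col]) + col_off / EMU_PER_POINT
    y = sum(row_heights_pt[:row]) + row_off / EMU_PER_POINT
    return x, y


def two_cell_extent(col_widths_pt, row_heights_pt, frm, to):
    """Width and height of a twoCellAnchor: the 'to' point minus the 'from' point."""
    x0, y0 = anchor_origin(col_widths_pt, row_heights_pt, *frm)
    x1, y1 = anchor_origin(col_widths_pt, row_heights_pt, *to)
    return x1 - x0, y1 - y0


cols = [48.0, 48.0, 64.0]
rows = [15.0, 15.0, 15.0, 15.0]
print(anchor_origin(cols, rows, 1, 127000, 1, 0))
print(two_cell_extent(cols, rows, (1, 127000, 1, 0), (2, 254000, 3, 63500)))
```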

PDF Writer

  • Bold font support: Added built-in Helvetica-Bold (F1B) font; bold text in Latin-1 range now renders with proper bold typeface instead of faux-bold
  • Character spacing (Tc): PDF text rendering now applies Tc operator for character spacing when specified
  • Text width measurement: MeasureTextWidth accounts for character spacing for accurate layout
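
The PDF specification gives a glyph's horizontal displacement as tx = ((w0 − Tj/1000) × Tfs + Tc + Tw) × Th. Below is a minimal sketch that measures a string shown with a plain `Tj`, so there are no TJ adjustments. The release notes do not say whether MiniPdf's `MeasureTextWidth` counts `Tc` after the last glyph, and `measure_text_width` is a hypothetical name.

```python
def measure_text_width(glyph_widths, text, font_size, char_spacing=0.0,
                       word_spacing=0.0, horiz_scale=100.0):
    """Horizontal advance of a string shown with Tj.

    Follows tx = (w0/1000 * Tfs + Tc + Tw) * Th. Tc is added after every
    glyph, including the last; Tw is added only for the space character.
    """
    th = horiz_scale / 100.0
    total = 0.0
    for ch in text:
        advance = glyph_widths[ch] / 1000.0 * font_size + char_spacing
        if ch == " ":
            advance += word_spacing
        total += advance * th
    return total


widths = {"H": 722, "i": 222, " ": 278}  # Helvetica AFM widths, units per 1000 em
print(measure_text_width(widths, "Hi", 10, char_spacing=1.0))
```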

Other

  • Codebase refactoring for improved readability and maintainability across multiple modules
  • Updated benchmark images and comparison reports for DOCX and XLSX test suites
  • Enhanced README update script to process multiple benchmark reports

Full Changelog: v0.14.0...v0.15.0

v0.13.0 — Excel Print Fidelity, Theme Colors & Online Demo

12 Mar 06:34

Highlights

This release dramatically improves Excel-to-PDF conversion fidelity with print area support, page setup honoring, theme color rendering, and fit-to-page scaling. A new Blazor WASM online demo is introduced for browser-based conversion. The XLSX benchmark expands from 180 to 191 test cases, and the average overall score improves from 96.82% to 96.89%.

New Features

Excel Print Area Support

• Parse _xlnm.Print_Area defined names from workbook XML to determine printable cell ranges
• Trim rows, columns, merged cells, images, and charts to the defined print area before rendering
• Column-only ranges (e.g. $A:$L) handled with full-row expansion
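
Below is a sketch of print-area parsing for a single range. It does not handle multiple comma-separated ranges or row-only ranges such as $1:$5. `parse_print_area` and `col_index` are hypothetical names.

```python
import re

CELL = re.compile(r"^\$?([A-Z]+)\$?(\d+)?$")


def col_index(letters):
    """Column letters to a 1-based index: 'A' -> 1, 'L' -> 12, 'AA' -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_print_area(value, max_row):
    """Parse a _xlnm.Print_Area value such as 'Sheet1'!$A$1:$L$40.

    Returns (first_row, first_col, last_row, last_col), 1-based and inclusive.
    Column-only ranges like $A:$L expand to rows 1..max_row.
    """
    ref = value.split("!")[-1]
    start, _, end = ref.partition(":")
    m1, m2 = CELL.match(start), CELL.match(end or start)
    if not m1 or not m2:
        raise ValueError(f"unsupported print area: {value}")
    first_row = int(m1.group(2)) if m1.group(2) else 1
    last_row = int(m2.group(2)) if m2.group(2) else max_row
    return first_row, col_index(m1.group(1)), last_row, col_index(m2.group(1))


print(parse_print_area("'Sheet1'!$A$1:$L$40", max_row=120))
print(parse_print_area("Sheet1!$A:$L", max_row=120))
```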

Excel Page Setup & Paper Size

• Read pageSetup element: orientation (landscape/portrait), paper size (A4/A3/Letter), print scale, and custom margins
• Apply sheet-level margins (marginLeftPt, marginRightPt, marginTopPt, marginBottomPt) to PDF output
• Support A3 (842×1191pt) and A4 (595×842pt) paper sizes with landscape auto-swap
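
A lookup table is enough for the paper sizes listed above. The codes follow SpreadsheetML `pageSetup/@paperSize` (1 = Letter, 8 = A3, 9 = A4). Falling back to A4 for other codes is an assumption of this sketch, not necessarily MiniPdf's default, and `page_size` is a hypothetical name.

```python
# Portrait sizes in points (1 pt = 1/72 inch), keyed by SpreadsheetML paperSize code.
PAPER_SIZES = {
    1: (612.0, 792.0),    # Letter
    8: (842.0, 1191.0),   # A3
    9: (595.0, 842.0),    # A4
}


def page_size(paper_code, orientation="portrait"):
    """Return (width, height) in points, swapping the sides for landscape.

    Unknown codes fall back to A4.
    """
    width, height = PAPER_SIZES.get(paper_code, PAPER_SIZES[9])
    if orientation == "landscape" and width < height:
        width, height = height, width
    return width, height


print(page_size(8, "landscape"))
```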

Theme Color Support for Fills & Fonts

• Parse theme1.xml to extract theme color palette
• Resolve theme + tint color attributes on font and fill elements via HSL luminance adjustment (ECMA-376 §18.8.19)
• New ResolveColorElement() and ApplyTint() helpers for accurate themed color rendering
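
The ECMA-376 §18.8.19 tint rule adjusts only the HLS luminance. A negative tint darkens with L × (1 + tint), and a positive tint lightens with L × (1 − tint) + tint. Below is a Python sketch using the standard `colorsys` module. The `apply_tint` name is hypothetical, and its rounding may differ slightly from Excel's.

```python
import colorsys


def apply_tint(rgb_hex, tint):
    """Apply a SpreadsheetML tint (-1.0 .. 1.0) to an RRGGBB color via HLS luminance."""
    r, g, b = (int(rgb_hex[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, lum, s = colorsys.rgb_to_hls(r, g, b)
    if tint < 0:
        lum = lum * (1.0 + tint)          # darken toward black
    else:
        lum = lum * (1.0 - tint) + tint   # lighten toward white
    r, g, b = colorsys.hls_to_rgb(h, lum, s)
    return "".join(f"{round(c * 255):02X}" for c in (r, g, b))


print(apply_tint("4472C4", 0.4), apply_tint("4472C4", -0.25))
```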

Horizontal Centering

• Parse horizontalCentered print option from sheet XML
• Center sheet content horizontally within the usable page width when enabled

Online Demo (MiniPdf.Web)

• New Blazor WebAssembly project for browser-based XLSX/DOCX → PDF conversion
• Multi-language support with language switcher (EN/ZH-CN/ZH-TW/JA/KO/FR/IT)
• GitHub Pages deployment with CI workflow
• Accept specific MIME types for file input (.xlsx, .docx)
• Statcounter analytics integration

Improvements

Fit-to-Page Scaling

• fitToPage + fitToWidth: auto-scale column widths to fit all columns in one page width, with proportional row height reduction
• fitToPage + fitToHeight: recalculate print scale so all rows fit within the target number of vertical pages
• Cell font sizes optionally scaled by print scale factor (ScaleCellFonts) for accurate auto-row-height when compressed
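
Fit-to-page scaling reduces to picking the smallest scale that satisfies each constraint. In this sketch, a value of 0 means no constraint in that direction, as in SpreadsheetML, and the result stays a float rather than a whole percentage. Real page breaks fall on row and column boundaries, and the sketch ignores them. `fit_to_page_scale` is a hypothetical name.

```python
def fit_to_page_scale(content_width, content_height, usable_width, usable_height,
                      fit_to_width=1, fit_to_height=1):
    """Print scale (1.0 = 100%) that fits the content onto the requested pages.

    The content must fit fit_to_width pages across and fit_to_height pages
    down. A value of 0 disables that constraint. The result never scales up.
    """
    scale = 1.0
    if fit_to_width > 0 and content_width > 0:
        scale = min(scale, usable_width * fit_to_width / content_width)
    if fit_to_height > 0 and content_height > 0:
        scale = min(scale, usable_height * fit_to_height / content_height)
    return scale


# 1000pt of columns on a 500pt-wide page, height unconstrained.
print(fit_to_page_scale(1000, 500, 500, 800, fit_to_width=1, fit_to_height=0))
```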

Print Scale Rendering

• Apply print scale factor to font sizes, column padding, column widths, and row heights
• Explicit column widths from CharUnitsToPoints scaled to match print-scaled content
• Cell-level font size scaling for width calculations and text clipping

Chart Rendering Improvements

• Scale dominant twoCellAnchor charts (spanning >50% of sheet rows) to fill usable page width, matching LibreOffice output
• Inline charts (anchored within data area) rendered at anchor position without scale-up
• Overflow pages for dominant charts taller than one page
• Cap non-dominant chart height to 85% of page to prevent overflow

Column & Row Handling

• Trim trailing style-only columns (background/borders only, no text) to prevent excessive blank pages
• Skip hidden columns (width 0) entirely during rendering
• Skip column groups where no row has any text content (avoids blank pages for style-only ranges)
• Track customHeight rows and scope auto-row-height expansion to fitToPage sheets only
• Remove trailing empty pages (no text, images, rectangles, or lines) after column group rendering

Hidden Sheet Filtering

• Read state attribute from sheet elements; skip sheets marked as hidden or veryHidden

Benchmark

| Format | Cases | Avg Score | 🟢 Pass | 🟡 Warning | 🔴 Fail |
|---|---|---|---|---|---|
| DOCX | 180 | 97.62% | 178 | 2 | 0 |
| XLSX | 191 (+11) | 96.89% (+0.07%) | 169 | 22 | 0 |

Other Changes

• Added 11 new XLSX benchmark test cases (classic181–classic191) including payroll calculator scenarios
• New PdfDocument.RemoveEmptyPages() and RemoveLastPage() internal helpers
• Added online demo link to README files in multiple languages
• New Run-Benchmark_issues.ps1 script for issue-specific benchmark testing

Files Changed

• ExcelReader.cs — +467/−59 lines: theme colors, print area parsing, page setup, hidden sheet filtering, fill color resolution
• ExcelToPdfConverter.cs — +363/−17 lines: print area trimming, fit-to-page scaling, print scale rendering, chart scaling, horizontal centering, column trimming
• PdfDocument.cs — +17 lines: RemoveEmptyPages() and RemoveLastPage() methods
• MiniPdf.Web — +1,350 lines: new Blazor WASM online demo with i18n support
• 308 files changed total (including benchmark images, reports, test scripts, web project)

Full Changelog: v0.12.0...v0.13.0

v0.12.0 — Helvetica Font Metrics, Header/Footer & Paragraph Borders

08 Mar 12:46

Highlights

This release brings precision font metrics, header/footer support, and paragraph borders to the DOCX-to-PDF converter, while also improving Excel-to-PDF layout accuracy. The DOCX benchmark expands from 150 to 180 test cases, and the average overall score improves from 96.96% to 97.62%.

New Features

DOCX Header & Footer Support

  • Parse headerReference / footerReference from sectPr and read header/footer .xml parts from the DOCX archive
  • Render header text centered in the top margin and footer text centered in the bottom margin on every page (9pt gray)

Paragraph Borders

  • Parse pBdr element with top/bottom/left/right border edges including width (sz in eighths of a point) and color
  • Render paragraph borders as PDF lines around the paragraph bounding box
  • New model records: DocxBorders, DocxBorderEdge

Improvements

Helvetica Font Metrics Engine (replaces fixed-width estimation)

  • NEW: EstimateTextWidth() using real Helvetica character widths (ASCII 32–126 glyph table, CJK full-width 1000 units)
  • NEW: GetHelveticaCharWidth() lookup with proper handling of CJK Unified Ideographs, Hiragana, Katakana, and fullwidth forms
  • Replaced all avgCharWidth = fontSize * 0.47f approximations across paragraphs, tables, tab stops, and word wrapping
  • Tab stop expansion now uses actual text width measurement instead of column-count estimation
  • Word wrapping decisions based on real glyph widths instead of character count × fixed width
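
Below is a simplified version of the width estimation described above. The table includes only a handful of Helvetica AFM widths, whereas the real glyph table covers ASCII 32–126. The 556-unit fallback for other characters is an assumption of this sketch, and `estimate_text_width` is a hypothetical name.

```python
# A few advance widths from the standard Helvetica AFM (units per 1000 em).
HELVETICA_WIDTHS = {
    " ": 278, "H": 722, "e": 556, "l": 222, "o": 556, "W": 944,
    "r": 333, "d": 556, "i": 222, "0": 556, "1": 556, ".": 278,
}
DEFAULT_WIDTH = 556  # fallback for characters missing from this partial table (assumption)


def is_full_width(ch):
    """CJK ideographs, kana and fullwidth forms are treated as 1000 units wide."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xFF01 <= cp <= 0xFF60)  # Fullwidth ASCII variants and brackets


def estimate_text_width(text, font_size):
    units = 0
    for ch in text:
        units += 1000 if is_full_width(ch) else HELVETICA_WIDTHS.get(ch, DEFAULT_WIDTH)
    return units * font_size / 1000.0


# Compare with the old fixed estimate of fontSize * 0.47 per character.
print(estimate_text_width("Hello", 10), 5 * 10 * 0.47)
```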

Text Positioning & Layout

  • Ascent-aware positioning: text top aligns with margin using AscentRatio (1.075) at top-of-page boundaries
  • Font metrics factor tuned from 1.18 → 1.17 for tighter line spacing
  • Default paragraph spacing changed from fontSize * 0.35f to fixed 8f points for closer match to Word rendering
  • Bullet lists: render bullets as small filled rectangles instead of text glyphs — eliminates Helvetica vs Symbol U+F0B7 text-extraction discrepancy

Tab Stop & Leader Improvements

  • Leader dot/hyphen/underscore fill count computed from actual glyph widths with Calibri-equivalent 0.725× scale factor
  • Tab leader lines use maxWidth (Tz operator) to compress expanded dot runs to fit intended tab position
  • Tab-stop-aware word wrap: extends effective line width when tab positions exceed available width, preventing premature line breaks

Table Rendering

  • Grid-line border drawing: borders drawn once per row boundary (shared edges) instead of per-cell — eliminates double-line artifacts
  • Cell text alignment: center and right alignment now supported within table cells
  • Nested table flattening: each nested row's cell text is joined into a single paragraph (instead of one paragraph per cell)

Excel-to-PDF Improvements

  • Overflow page accumulation: virtual overflow pages now emitted at end of sheet (matching LibreOffice layout) instead of inline per-row
  • Default column width: uses sheet's DefaultColumnWidth when available instead of hardcoded 8.43
  • Wrap-width calculation: accounts for cell content margins (~11pt) and Calibri fitting scale for more accurate row-height estimation
  • Pie chart legend: now renders category name text labels alongside color swatches

Benchmark

| Format | Test Cases | Average Score | Excellent (≥90%) | Acceptable (80–90%) | Needs Improvement (<80%) |
|---|---|---|---|---|---|
| DOCX | 180 (+30) | 97.62% (+0.66%) | 178 | 2 | 0 |
| XLSX | 180 | 96.82% | 165 | 13 | 2 |

Other Changes

  • Removed 26 obsolete XLSX reference PDFs (classic35–classic60) to streamline the benchmark suite
  • Added 30 new DOCX benchmark test cases (classic121–classic150) covering additional paragraph, table, and formatting scenarios

Files Changed

  • DocxReader.cs — +103 lines: header/footer parsing, paragraph borders, nested table flattening
  • DocxToPdfConverter.cs — +299/−80 lines: Helvetica metrics engine, border rendering, ascent positioning, bullet rendering, tab leader improvements
  • ExcelToPdfConverter.cs — +62/−62 lines: overflow page accumulation, pie chart legend text, wrap-width fix
  • 576 files changed total (including benchmark images, reports, test scripts)

Full Changelog: v0.11.0...v0.12.0

v0.11.0

07 Mar 04:41

v0.11.0 — Word (.docx) to PDF Conversion

Highlights

This release adds DOCX-to-PDF conversion — a brand-new, zero-dependency Word document renderer. MiniPdf can now convert .docx files to PDF with paragraph, table, and image support, achieving a 96.96% average overall score across 150 benchmark test cases compared to LibreOffice reference output.

New Features

DOCX Reader (DocxReader.cs — 791 lines)

  • Full OOXML paragraph parsing: text runs, bold/italic/underline/strikethrough, font sizes, font colors, highlight colors
  • Heading styles (Heading1–Heading9) with automatic font size mapping from styles.xml
  • Paragraph alignment (left, center, right, justify) and indentation (left, right, hanging, firstLine)
  • Bullet and numbered list support with numId/ilvl detection
  • Tab stop parsing with position and alignment (left, center, right, decimal)
  • Paragraph shading / background color support
  • Table parsing with cell content, borders, shading, column spans (gridSpan), and grid column widths
  • Table border support: reads tblBorders and tcBorders from document and style definitions
  • Default paragraph properties (pPrDefault) applied from styles.xml
  • Embedded image extraction via relationships with EMU-to-point dimension conversion
  • Page layout reading from sectPr: page size, margins, orientation
  • Page break detection (w:br type page and lastRenderedPageBreak)

DOCX-to-PDF Converter (DocxToPdfConverter.cs — 730 lines)

  • Paragraph rendering with mixed formatting runs, line wrapping, and proper line spacing
  • Heading rendering with bold weight and scaled font sizes
  • Text alignment: left, center, right, justified
  • List rendering with bullet and numbered (1., 2., …) prefixes at the correct indentation
  • Tab stop handling with leader positioning
  • Paragraph shading rendered as filled rectangles behind text
  • Table rendering with cell borders, shading fills, column-width distribution, and automatic row height
  • Table border rendering with configurable stroke widths
  • Image embedding as inline JPEG XObjects with aspect-ratio-aware scaling
  • Page layout support: reads page dimensions and margins from DOCX sectPr
  • Automatic page breaks: content overflow and explicit w:br type="page" handling
  • Configurable ConversionOptions: font size, margins, line spacing, page dimensions

Unified API

  • MiniPdf.ConvertToPdf() now auto-detects .docx files by extension — no API change needed for existing callers
  • New MiniPdf.ConvertDocxToPdf(Stream) method for stream-based DOCX conversion
  • Updated NuGet description and tags to include word and docx

Tests

  • 9 new unit tests in DocxToPdfConverterTests.cs covering: simple documents, bold text, tables, empty documents, multi-paragraph, stream input, and file output
  • 150 DOCX benchmark test cases with visual comparison against LibreOffice reference PDFs

Benchmark

  • 150 DOCX test cases (classic01–classic120 + themed documents): single paragraph, multiple paragraphs, headings, bold/italic, font sizes, font colors, alignment, bullet lists, numbered lists, simple tables, table shading, merged cells, mixed content, images, long documents, multi-page tables, comprehensive reports, and more
  • Average Overall Score: 0.9696 (text similarity + visual comparison vs LibreOffice)
  • 147 Excellent (≥ 90%), 3 Acceptable (80–90%), 0 Needs Improvement (< 80%)
  • Benchmark scripts: Run-Benchmark_docx.ps1, generate_reference_pdfs_docx.py, compare_pdfs.py with DOCX mode

Other Changes

  • Added .gitattributes to configure GitHub Linguist for Python scripts
  • Updated README badges: replaced .NET badge with Gitee link across all language variants (EN, zh-CN, zh-TW, ja, ko, fr, it)
  • README now includes DOCX benchmark visual comparison table with MiniPdf vs Reference side-by-side images

Files Changed

  • DocxReader.cs — +791 lines (new): OOXML document parser
  • DocxToPdfConverter.cs — +730 lines (new): DOCX-to-PDF rendering engine
  • MiniPdf.cs — +26 lines: .docx auto-detection and ConvertDocxToPdf() API
  • MiniPdf.csproj — updated description and package tags
  • 340 files changed total (including benchmark images, reports, and scripts)

Full Changelog: v0.9.0...v0.10.0

v0.10.0 — Word (.docx) to PDF Conversion

06 Mar 01:15

Highlights

This release adds DOCX-to-PDF conversion — a brand-new, zero-dependency Word document renderer. MiniPdf can now convert .docx files to PDF with paragraph, table, and image support, achieving a 97.4% average overall score across 60 benchmark test cases compared to LibreOffice reference output.

New Features

DOCX Reader (DocxReader.cs — 727 lines)

• Full OOXML paragraph parsing: text runs, bold/italic/underline/strikethrough, font sizes, font colors, highlight colors
• Heading styles (Heading1–Heading9) with automatic font size mapping from styles.xml
• Paragraph alignment (left, center, right, justify) and indentation (left, right, hanging, firstLine)
• Bullet and numbered list support with numId/ilvl detection
• Tab stop parsing with position and alignment (left, center, right, decimal)
• Paragraph shading / background color support
• Table parsing with cell content, borders, shading, column spans (gridSpan), and grid column widths
• Embedded image extraction via relationships with EMU-to-point dimension conversion
• Page layout reading from sectPr: page size, margins, orientation
• Page break detection (w:br type page and lastRenderedPageBreak)

DOCX-to-PDF Converter (DocxToPdfConverter.cs — 682 lines)

• Paragraph rendering with mixed formatting runs, line wrapping, and proper line spacing
• Heading rendering with bold weight and scaled font sizes
• Text alignment: left, center, right, justified
• List rendering with bullet and numbered (1., 2., …) prefixes at the correct indentation
• Tab stop handling with leader positioning
• Paragraph shading rendered as filled rectangles behind text
• Table rendering with cell borders, shading fills, column-width distribution, and automatic row height
• Image embedding as inline JPEG XObjects with aspect-ratio-aware scaling
• Page layout support: reads page dimensions and margins from DOCX sectPr
• Automatic page breaks: content overflow and explicit w:br type="page" handling
• Configurable ConversionOptions: font size, margins, line spacing, page dimensions

Unified API

• MiniPdf.ConvertToPdf() now auto-detects .docx files by extension — no API change needed for existing callers
• New MiniPdf.ConvertDocxToPdf(Stream) method for stream-based DOCX conversion
• Updated NuGet description and tags to include word and docx

Tests

• 9 new unit tests in DocxToPdfConverterTests.cs covering: simple documents, bold text, tables, empty documents, multi-paragraph, stream input, and file output
• 60 DOCX benchmark test cases with visual comparison against LibreOffice reference PDFs

Benchmark

• 60 DOCX test cases (classic01–classic60): single paragraph, multiple paragraphs, headings, bold/italic, font sizes, font colors, alignment, bullet lists, numbered lists, simple tables, table shading, mixed content, images, long documents, multi-page tables, comprehensive reports, and more
• Average overall score: 0.9739 (text similarity + visual comparison vs LibreOffice)
• Benchmark scripts: Run-Benchmark_docx.ps1, generate_reference_pdfs_docx.py, compare_pdfs.py with DOCX mode

Other Changes

• Added .gitattributes to configure GitHub Linguist for Python scripts
• Updated README badges: replaced the .NET badge with a Gitee link in all language variants (EN, zh-CN, zh-TW, ja, ko, fr, it)
• README now includes DOCX benchmark visual comparison table with MiniPdf vs Reference side-by-side images

Files Changed

DocxReader.cs — +727 lines (new): OOXML document parser
DocxToPdfConverter.cs — +682 lines (new): DOCX-to-PDF rendering engine
MiniPdf.cs — +23 lines: .docx auto-detection and ConvertDocxToPdf() API
MiniPdf.csproj — updated description and package tags
192 files changed total (including benchmark images, reports, and scripts)

Full Changelog: v0.9.0...v0.10.0

v0.9.0 — Multi-Font Unicode & Horizontal Scaling

05 Mar 10:48

Highlights

This release introduces multi-font embedding for full Unicode coverage and horizontal text scaling to prevent column overflow, significantly expanding support for multilingual, emoji, and symbol-heavy spreadsheets.

New Features

Multi-Font Embedding Engine

  • Replaced the single hardcoded Arial CID font with a dynamic multi-font system that discovers and embeds multiple system fonts at runtime
  • Cross-platform font discovery: Windows (YaHei, JhengHei, Malgun Gothic, Segoe UI, Segoe UI Emoji, Segoe UI Symbol), macOS (PingFang, Apple SD Gothic Neo, Apple Color Emoji), Linux (Noto Sans CJK, Noto Color Emoji, WenQuanYi)
  • Characters are automatically split into runs by font slot — e.g. CJK in F2, Korean in F3, emoji in F4 — with proper Td advances within the same BT/ET block
  • Full TrueType/TTC font parsing: cmap formats 4 and 12, hmtx glyph widths, head, OS/2, and hhea metrics, and glyf table subsetting
  • CIDToGIDMap streams for correct glyph mapping with ZLib compression
  • ToUnicode CMap with UTF-16 surrogate pair support for non-BMP code points (emoji, CJK Ext-B)
  • Font subsetting: zeros out unused glyph outlines to reduce embedded font size
  • Glyph outline validation (HasGlyphOutline) to detect placeholder/empty glyphs, enabling proper font fallback
  • Emoji range detection (IsEmojiRange) to prefer dedicated emoji fonts over CJK fonts with placeholder glyphs
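The run-splitting idea above can be illustrated roughly as follows. For each code point, the code picks the first font slot whose glyph coverage includes it, trying emoji fonts first for emoji-range code points. It then merges neighbouring characters that share a slot into a single run. This is a simplified Python sketch. `FontSlot`, `covers`, and the rough emoji ranges are hypothetical stand-ins for MiniPdf's cmap lookup, HasGlyphOutline, and IsEmojiRange checks, and are not its actual code.

```python
from dataclasses import dataclass


@dataclass
class FontSlot:
    name: str            # PDF font resource name, e.g. "F2"
    covers: set[int]     # code points the font maps to a glyph with a real outline
    is_emoji: bool = False


def is_emoji_range(cp: int) -> bool:
    # Rough approximation of common emoji blocks; real detection needs more ranges.
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def pick_slot(cp: int, slots: list[FontSlot]) -> str:
    ordered = slots
    if is_emoji_range(cp):
        # Prefer dedicated emoji fonts over, e.g., CJK fonts that also map the code point.
        ordered = [s for s in slots if s.is_emoji] + [s for s in slots if not s.is_emoji]
    for slot in ordered:
        if cp in slot.covers:
            return slot.name
    # Nothing covers it: fall back to the first slot (likely rendered as .notdef).
    return slots[0].name


def split_runs(text: str, slots: list[FontSlot]) -> list[tuple[str, str]]:
    """Group consecutive characters that resolve to the same font slot."""
    runs: list[tuple[str, str]] = []
    for ch in text:  # Python str iterates by code point, not UTF-16 unit
        name = pick_slot(ord(ch), slots)
        if runs and runs[-1][0] == name:
            runs[-1] = (name, runs[-1][1] + ch)
        else:
            runs.append((name, ch))
    return runs


if __name__ == "__main__":
    slots = [
        FontSlot("F1", set(range(0x20, 0x7F))),             # Latin
        FontSlot("F2", {ord("你"), ord("好"), 0x1F600}),     # CJK font that also maps U+1F600
        FontSlot("F4", {0x1F600}, is_emoji=True),            # emoji font
    ]
    print(split_runs("Hi 你好😀", slots))
```

In this example the emoji preference sends U+1F600 to F4, even though F2 also maps that code point. Each resulting run would then be emitted with its own Tf and Td inside the same text object.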

Arabic Text Shaping

  • Built-in Arabic Presentation Forms-B shaping engine with contextual form selection (isolated, initial, medial, final)
  • Arabic joining type analysis (Non-Joining, Right-Joining, Dual-Joining, Join-Causing, Transparent)
  • Lam-Alef ligature handling (ﻻ ﻵ ﻷ ﻹ)
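Contextual form selection follows directly from the joining types. A letter takes its medial form when it connects on both sides, its final form when it connects only to the preceding letter, its initial form when it connects only to the following letter, and its isolated form otherwise. The Python sketch below applies this rule to a hand-picked subset of four letters, working in logical (reading) order. It deliberately ignores transparent marks and the Lam-Alef ligatures, and its tables are illustrative rather than MiniPdf's data.

```python
# Joining types for a hand-picked subset: "D" dual-joining, "R" right-joining.
JOINING = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
}

# Arabic Presentation Forms-B code points: (isolated, final, initial, medial).
# Right-joining letters have no initial or medial forms.
FORMS = {
    "\u0627": ("\uFE8D", "\uFE8E", None, None),
    "\u0628": ("\uFE8F", "\uFE90", "\uFE91", "\uFE92"),
    "\u0644": ("\uFEDD", "\uFEDE", "\uFEDF", "\uFEE0"),
    "\u0645": ("\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"),
}


def shape_arabic(text: str) -> str:
    """Replace each known letter with its contextual presentation form."""
    out = []
    for i, ch in enumerate(text):
        if ch not in FORMS:
            out.append(ch)  # characters outside the subset pass through unchanged
            continue
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        # Connect backwards only if the previous letter joins forwards (dual-joining).
        joins_prev = JOINING.get(prev) == "D"
        # Connect forwards only if this letter is dual-joining and the next one can join.
        joins_next = JOINING[ch] == "D" and JOINING.get(nxt) in ("D", "R")
        isolated, final, initial, medial = FORMS[ch]
        if joins_prev and joins_next:
            out.append(medial)
        elif joins_prev:
            out.append(final)
        elif joins_next:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)
```

For example, MEEM + ALEF + BEH shapes to initial MEEM, final ALEF, and isolated BEH. Because ALEF is right-joining, it breaks the connection to the letter that follows it.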

Horizontal Text Scaling (Tz Operator)

  • Added MaxWidth property to PdfTextBlock for per-cell width constraints
  • When text exceeds the column width, the PDF Tz (horizontal scaling) operator compresses text to fit — keeping all characters intact for text extraction while preventing visual overflow
  • Helvetica width table (MeasureTextWidth) with standard character widths in 1/1000 em units
  • Applied to both WinAnsi (F1) and Unicode (Fn) text rendering paths
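The Tz mechanism works as follows. First, measure the string with Helvetica widths expressed in 1/1000 em. If it is wider than the cell, compute the percentage that makes it fit and emit that value before the Tj operator. The Python sketch below illustrates this with a small subset of the standard Helvetica metrics. It assumes a fallback width of 556 for characters outside that subset. The function names are hypothetical and are not MiniPdf's MeasureTextWidth.

```python
# Helvetica advance widths in 1/1000 em (standard Helvetica AFM metrics); a small subset.
HELVETICA_WIDTHS = {
    " ": 278, "H": 722, "W": 944, "d": 556, "e": 556,
    "l": 222, "o": 556, "r": 333,
}
DEFAULT_WIDTH = 556  # assumption: fallback for characters outside this subset


def measure_text_width(text: str, font_size: float) -> float:
    """Width of text in points when set in Helvetica at font_size."""
    units = sum(HELVETICA_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)
    return units * font_size / 1000.0


def horizontal_scale(text: str, font_size: float, max_width: float) -> float:
    """Tz percentage that squeezes text into max_width (100 means no scaling)."""
    width = measure_text_width(text, font_size)
    if width <= max_width:
        return 100.0
    return max_width / width * 100.0


def escape_pdf_string(text: str) -> str:
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_ops(text: str, font_size: float, max_width: float, x: float, y: float) -> str:
    tz = horizontal_scale(text, font_size, max_width)
    ops = ["BT", f"/F1 {font_size:g} Tf", f"{x:g} {y:g} Td"]
    if tz < 100.0:
        ops.append(f"{tz:.2f} Tz")
    ops.append(f"({escape_pdf_string(text)}) Tj")
    if tz < 100.0:
        # Tz is part of the text state, which persists after ET, so restore it.
        ops.append("100 Tz")
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    # "Hello World" at 10 pt is 51.67 pt wide; squeeze it into a 40 pt cell.
    print(text_ops("Hello World", 10, 40, 54, 700))
```

Because Tz only changes how the glyphs are drawn, the string passed to Tj stays intact, and text extraction still returns "Hello World".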

Improvements

  • Adjusted default margins: left/right 50pt → 54pt, column padding 2pt → 3pt for better visual balance
  • Fill rectangles no longer include extra columnPadding width, matching LibreOffice cell boundary rendering more closely
  • Proper Unicode code point enumeration with surrogate pair handling
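.NET strings are UTF-16, so characters outside the Basic Multilingual Plane (most emoji, CJK Ext-B) take two code units. Both code point enumeration and the UTF-16BE hex values written into ToUnicode CMap entries come down to surrogate-pair arithmetic. The Python sketch below shows that arithmetic. It is not MiniPdf's implementation, and the function names are hypothetical.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text (the representation a .NET string uses)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points(units: list[int]) -> list[int]:
    """Enumerate Unicode code points, combining high/low surrogate pairs."""
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            result.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(u)  # BMP character, or an unpaired surrogate passed through as-is
            i += 1
    return result


def to_unicode_hex(cp: int) -> str:
    """Hex destination string for a ToUnicode CMap entry (UTF-16BE)."""
    if cp < 0x10000:
        return f"{cp:04X}"
    v = cp - 0x10000
    return f"{0xD800 + (v >> 10):04X}{0xDC00 + (v & 0x3FF):04X}"


if __name__ == "__main__":
    units = utf16_units("A😀")
    print([hex(u) for u in units])
    print([hex(cp) for cp in code_points(units)])
    print(to_unicode_hex(0x1F600))
```

For U+1F600, the sketch yields the surrogate pair D83D DE00. That is the value a ToUnicode CMap would carry, so text extraction can recover the emoji.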

Benchmark

  • 16 new multilingual & emoji test cases (classic151–classic166):
    • Multilingual greetings, emoji sampler, currency symbols, math symbols, diacritical marks, RTL/BiDi text, CJK extended, emoji skin tones, ZWJ emoji sequences, punctuation marks, box drawing, CJK+emoji styled, Cyrillic alphabets, Indic scripts, Southeast Asian scripts, emoji progress
  • 180 total test cases, average overall score: 0.9652

Files Changed

  • PdfWriter.cs — +676 lines: multi-font engine, Arabic shaping, TrueType parsing, font subsetting
  • ExcelToPdfConverter.cs — horizontal scaling integration, margin adjustments
  • PdfTextBlock.cs — MaxWidth property
  • PdfPage.cs — maxWidth parameter on AddText

Full Changelog: v0.8.0...v0.9.0