v0.10.0 — Word (.docx) to PDF Conversion
Highlights
This release adds DOCX-to-PDF conversion through a new, zero-dependency Word document renderer. MiniPdf can now convert .docx files to PDF with paragraph, table, and image support. Across 60 benchmark test cases, it achieves a 97.4% average overall score (0.9739) when compared against LibreOffice reference output.
New Features
DOCX Reader (DocxReader.cs — 727 lines)
• Full OOXML paragraph parsing: text runs, bold/italic/underline/strikethrough, font sizes, font colors, highlight colors
• Heading styles (Heading1–Heading9) with automatic font size mapping from styles.xml
• Paragraph alignment (left, center, right, justify) and indentation (left, right, hanging, firstLine)
• Bullet and numbered list support with numId/ilvl detection
• Tab stop parsing with position and alignment (left, center, right, decimal)
• Paragraph shading / background color support
• Table parsing with cell content, borders, shading, column spans (gridSpan), and grid column widths
• Embedded image extraction via relationships with EMU-to-point dimension conversion
• Page layout reading from sectPr: page size, margins, orientation
• Page break detection (w:br with type="page" and w:lastRenderedPageBreak)
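Two of the reader steps above are easy to illustrate in isolation. The following is a minimal Python sketch, not the code in DocxReader.cs: the function names are hypothetical, and it assumes the standard WordprocessingML namespace. The unit conversion follows from the OOXML definitions of 914,400 EMU per inch and 72 points per inch.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# 914,400 EMU per inch and 72 points per inch give 12,700 EMU per point.
EMU_PER_POINT = 914_400 // 72


def emu_to_points(emu: int) -> float:
    """Convert English Metric Units (EMU) to PDF points (hypothetical helper)."""
    return emu / EMU_PER_POINT


def has_page_break(paragraph: ET.Element) -> bool:
    """Return True if a w:p element carries an explicit or rendered page break."""
    # Explicit break: <w:br w:type="page"/> anywhere inside the paragraph.
    for br in paragraph.iter(f"{{{W_NS}}}br"):
        if br.get(f"{{{W_NS}}}type") == "page":
            return True
    # Rendered break: marker left by the application that last saved the file.
    return paragraph.find(f".//{{{W_NS}}}lastRenderedPageBreak") is not None


if __name__ == "__main__":
    sample = ET.fromstring(
        f'<w:p xmlns:w="{W_NS}"><w:r><w:br w:type="page"/></w:r></w:p>'
    )
    print(has_page_break(sample))
    print(emu_to_points(914_400))
```

A plain `<w:br/>` without a type attribute is a line break, so the sketch deliberately does not treat it as a page break.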
DOCX-to-PDF Converter (DocxToPdfConverter.cs — 682 lines)
• Paragraph rendering with mixed formatting runs, line wrapping, and proper line spacing
• Heading rendering with bold weight and scaled font sizes
• Text alignment: left, center, right, justified
• List rendering with bullet (•) and numbered (1., 2., …) prefixes at correct indentation
• Tab stop handling with leader positioning
• Paragraph shading rendered as filled rectangles behind text
• Table rendering with cell borders, shading fills, column-width distribution, and automatic row height
• Image embedding as inline JPEG XObjects with aspect-ratio-aware scaling
• Page layout support: reads page dimensions and margins from DOCX sectPr
• Automatic page breaks: content overflow and explicit w:br type="page" handling
• Configurable ConversionOptions: font size, margins, line spacing, page dimensions
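As an illustration of the aspect-ratio-aware image scaling listed above, here is a minimal Python sketch. It is an assumption about one reasonable approach, not the logic in DocxToPdfConverter.cs: the function name is hypothetical, and the choice never to upscale small images is ours.

```python
def fit_image(width_pt: float, height_pt: float,
              max_width_pt: float, max_height_pt: float) -> tuple[float, float]:
    """Scale an image to fit a box while preserving its aspect ratio.

    Hypothetical helper: shrinks images that exceed the available area
    and leaves smaller images at their original size (no upscaling).
    """
    if width_pt <= 0 or height_pt <= 0:
        raise ValueError("image dimensions must be positive")
    # One uniform factor for both axes keeps the aspect ratio intact.
    scale = min(1.0, max_width_pt / width_pt, max_height_pt / height_pt)
    return width_pt * scale, height_pt * scale


if __name__ == "__main__":
    # A 600x300 pt image placed in a 300x400 pt content area.
    print(fit_image(600, 300, 300, 400))
```

Using the smaller of the two axis ratios ensures the image fits in both directions, which matters when an image is taller than the space remaining on the page.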
Unified API
• MiniPdf.ConvertToPdf() now auto-detects .docx files by extension — no API change needed for existing callers
• New MiniPdf.ConvertDocxToPdf(Stream) method for stream-based DOCX conversion
• Updated NuGet description and tags to include word and docx
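The extension-based auto-detection described above can be pictured with a short Python sketch. This is not the MiniPdf.cs implementation: the function name and return labels are hypothetical, and case-insensitive matching is an assumption.

```python
from pathlib import Path


def pick_converter(path: str) -> str:
    """Choose a conversion path from the file extension (hypothetical helper).

    Returns "docx" for Word documents and "existing" for anything else,
    standing in for the converter that callers already used before this release.
    """
    # Assumption: the extension check ignores case, so "Report.DOCX" matches.
    if Path(path).suffix.lower() == ".docx":
        return "docx"
    return "existing"


if __name__ == "__main__":
    print(pick_converter("report.docx"))
```

Because the decision depends only on the file name, existing callers keep the same entry point and gain DOCX support without changing their code.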
Tests
• 9 new unit tests in DocxToPdfConverterTests.cs covering simple documents, bold text, tables, empty documents, multi-paragraph documents, stream input, and file output
• 60 DOCX benchmark test cases with visual comparison against LibreOffice reference PDFs
Benchmark
• 60 DOCX test cases (classic01–classic60): single paragraph, multiple paragraphs, headings, bold/italic, font sizes, font colors, alignment, bullet lists, numbered lists, simple tables, table shading, mixed content, images, long documents, multi-page tables, comprehensive reports, and more
• Average Overall Score: 0.9739 (based on text similarity and visual comparison against LibreOffice reference PDFs)
• Benchmark scripts: Run-Benchmark_docx.ps1, generate_reference_pdfs_docx.py, compare_pdfs.py with DOCX mode
Other Changes
• Added .gitattributes to configure GitHub Linguist for Python scripts
• Updated README badges: replaced .NET badge with Gitee link across all language variants (EN, zh-CN, zh-TW, ja, ko, fr, it)
• README now includes DOCX benchmark visual comparison table with MiniPdf vs Reference side-by-side images
Files Changed
• DocxReader.cs — +727 lines (new): OOXML document parser
• DocxToPdfConverter.cs — +682 lines (new): DOCX-to-PDF rendering engine
• MiniPdf.cs — +23 lines: .docx auto-detection and ConvertDocxToPdf() API
• MiniPdf.csproj — updated description and package tags
• 192 files changed total (including benchmark images, reports, and scripts)
Full Changelog: v0.9.0...v0.10.0