🎮 Hobby Project — built for fun and learning. Not production-hardened. PRs and issues welcome.
A lightweight .NET library for converting, reading, and creating documents across Word, PDF, Excel, CSV, HTML, and Image formats — with a clean service-based API, three input modes (file · stream · bytes), and full async support.
| Service | What it does |
|---|---|
| `IWordService` | Create, read, template-fill, convert to PDF |
| `IPdfService` | Extract text & images, convert to Word / Excel |
| `IExcelService` | Create, read, template-fill, convert to PDF |
| `ICsvService` | Create, read, convert to/from Excel and PDF |
| `IHtmlService` | Extract text & tables, convert to Word / PDF / Excel |
| `IImageService` | OCR text extraction, embed image in PDF / Word / Excel |
| `IConversionService` | Top-level orchestrator; routes all of the above |

Supported conversions:

- Word → PDF
- PDF → Word, PDF → Excel
- Excel → PDF, Excel → CSV
- CSV → Excel, CSV → PDF
- HTML → Word, HTML → PDF, HTML → Excel
- Image → PDF, Image → Word, Image → Excel

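
As a rough illustration of how an orchestrator could route these pairs, the sketch below keeps the supported (source, target) combinations in a set and rejects everything else. The `DocType` enum, the `ConversionRouter` class, and `IsSupported` are hypothetical names invented for this example; the real `IConversionService` may be implemented quite differently.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the library's DocumentType enum.
public enum DocType { Word, Pdf, Excel, Csv, Html, Image }

public static class ConversionRouter
{
    // The 13 conversion pairs listed above, stored as (source, target) tuples.
    private static readonly HashSet<(DocType, DocType)> Supported =
        new HashSet<(DocType, DocType)>
        {
            (DocType.Word,  DocType.Pdf),
            (DocType.Pdf,   DocType.Word),
            (DocType.Pdf,   DocType.Excel),
            (DocType.Excel, DocType.Pdf),
            (DocType.Excel, DocType.Csv),
            (DocType.Csv,   DocType.Excel),
            (DocType.Csv,   DocType.Pdf),
            (DocType.Html,  DocType.Word),
            (DocType.Html,  DocType.Pdf),
            (DocType.Html,  DocType.Excel),
            (DocType.Image, DocType.Pdf),
            (DocType.Image, DocType.Word),
            (DocType.Image, DocType.Excel),
        };

    // True when the pair is in the table; a caller could map false
    // to an "unsupported conversion" result instead of throwing.
    public static bool IsSupported(DocType from, DocType to) =>
        Supported.Contains((from, to));
}
```

With this table, `ConversionRouter.IsSupported(DocType.Csv, DocType.Pdf)` is true, while `ConversionRouter.IsSupported(DocType.Pdf, DocType.Html)` is false because that pair is not listed.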
Every method comes in three overloads plus async variants:

```csharp
// File-based
await _wordService.CreateWordAsync("output.docx", content);

// Stream-based
await _wordService.CreateWordAsync(outputStream, content);

// Bytes-based (great for HTTP responses / blob storage)
byte[] bytes = await _wordService.CreateWordAsync(content);
```

Clone and reference DocFlow.Core directly. There is no NuGet package yet (hobby project, maybe someday 🙂).

```bash
git clone https://github.com/forgeschema-star/docflow.git
```

Then add a project reference in your .csproj:

```xml
<ProjectReference Include="../docflow/DocFlow.Core/DocFlow.Core.csproj" />
```

DocFlow.Core targets net48 and netstandard2.0, so it works with:

- .NET Framework 4.8+
- .NET Core / .NET 5 / .NET 6 / .NET 8+
- ASP.NET Core

```csharp
// Program.cs / Startup.cs
builder.Services.AddDocFlowCore();

// Or with custom settings
var settings = new DocFlowSettings
{
    TempDirectory    = "/tmp/docflow",
    OcrDataPath      = "/usr/share/tessdata", // required for OCR
    MaxFileSizeBytes = 50_000_000,            // 50 MB
    LoggingEnabled   = true,
    AllowOverwrite   = false,
};
builder.Services.AddDocFlowCore(myLogger, settings);
```

```csharp
// Manual wiring, without a DI container
var settings = DocFlowSettings.CreateDefault();
var logger = new ConsoleLogger(); // your own ILogger implementation

var wordService  = new WordService(logger, settings);
var excelService = new ExcelService(logger, settings);
var pdfService   = new PdfService(wordService, excelService, logger, settings);
// ... wire up remaining services

var converter = new ConversionService(wordService, excelService, pdfService,
    csvService, htmlService, imageService, logger, settings);
```

```csharp
// Create a Word document from plain text
byte[] doc = await _wordService.CreateWordAsync("Hello, DocFlow!");
```

```csharp
// Template contains {{CustomerName}}, {{InvoiceDate}}, {{Total}}
var placeholders = new Dictionary<string, string>
{
    { "CustomerName", "Acme Corp" },
    { "InvoiceDate", "2024-06-10" },
    { "Total", "$4,500.00" },
};
byte[] filled = await _wordService.ReplacePlaceholdersAsync(templateBytes, placeholders);
```

```csharp
// Convert Word to PDF from in-memory bytes
var result = await _conversionService.ConvertAsync(
    DocumentType.Word, DocumentType.Pdf, inputBytes);

if (result.Success)
    File.WriteAllBytes("output.pdf", result.OutputBytes);
else
    Console.WriteLine($"[{result.ErrorCode}] {result.Message}");
```

```csharp
// Read an Excel file row by row
List<Dictionary<string, string>> rows =
    await _excelService.ReadExcelAsync("report.xlsx");

foreach (var row in rows)
    Console.WriteLine($"{row["Name"]} — {row["Department"]}");
```

```csharp
// Requires OcrDataPath set in DocFlowSettings
string text = await _imageService.ReadImageTextAsync("scan.png");
```

```csharp
// Branch on the error code of a conversion result
var result = await _conversionService.ConvertAsync(from, to, bytes);

switch (result.ErrorCode)
{
    case ConversionErrorCode.None:                  /* success */ break;
    case ConversionErrorCode.FileTooLarge:          /* exceeds MaxFileSizeBytes */ break;
    case ConversionErrorCode.FileNotFound:          /* input missing */ break;
    case ConversionErrorCode.OutputAlreadyExists:   break;
    case ConversionErrorCode.UnsupportedConversion: break;
    case ConversionErrorCode.ProcessingFailed:      /* general failure */ break;
}
```

DocFlow.Core is built on top of these open-source libraries:

| Library | Version | Purpose | License |
|---|---|---|---|
| DocumentFormat.OpenXml | 2.20.0 | Read/write .docx files | MIT |
| ClosedXML | 0.104.2 | Read/write .xlsx files | MIT |
| PdfSharpCore | 1.3.62 | Generate PDF documents (includes MigraDocCore for layouts) | MIT |
| HtmlAgilityPack | 1.12.4 | Parse HTML, extract text & tables | MIT |
| Tesseract | 5.2.0 | OCR text extraction from images | Apache 2.0 |
| UglyToad.PdfPig | 0.1.10 | Read PDF text & extract images | Apache 2.0 |

```
DocFlow.Core/
├── Interfaces/   IConversionService, IWordService, IExcelService, ...
├── Services/     ConversionService, WordService, ExcelService, ...
├── Models/       DocumentType, ConversionResult, DocFlowSettings, ...
├── Helpers/      CsvHelper, HtmlHelper, PlaceholderHelper, ...
├── Factory/      DocumentFactory
└── Extensions/   ServiceCollectionExtensions (AddDocFlowCore)
```

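
The tree lists a `PlaceholderHelper`, but its implementation is not shown in this README. To illustrate the `{{Key}}` template syntax from the template-fill example, here is a minimal sketch that works on a plain string. `PlaceholderSketch` and `Fill` are hypothetical names, and the choice to leave unknown keys untouched is an assumption. Real .docx templates are harder, because Word can split a single placeholder across several XML runs.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not the library's PlaceholderHelper.
public static class PlaceholderSketch
{
    // Matches {{Key}} where Key consists of letters, digits, or underscores.
    private static readonly Regex Token = new Regex(@"\{\{(\w+)\}\}");

    // Replaces each {{Key}} with its value; unknown keys are left as-is.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, match =>
        {
            string key = match.Groups[1].Value;
            // The evaluator's return value is inserted literally, so a
            // value such as "$4,500.00" is not treated as a regex substitution.
            return values.TryGetValue(key, out string value) ? value : match.Value;
        });
    }
}
```

For example, filling `"Dear {{CustomerName}}, total: {{Total}}"` with the dictionary from the template example yields `"Dear Acme Corp, total: $4,500.00"`.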
The docflow-src private repo contains DocFlow.ConsoleDemo — a smoke-test runner that exercises the full conversion pipeline:

```bash
dotnet run --project DocFlow.ConsoleDemo
```

Scenarios covered: valid workflow, file-not-found, overwrite protection, file-size limit, CSV/HTML structured formats.

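
Three of these scenarios (file-not-found, overwrite protection, file-size limit) amount to checks that can run before any conversion work starts. The sketch below shows one way such checks could map to error codes. `ErrorCode`, `PreflightSketch`, and `Check` are hypothetical names, and the order of the checks is an assumption; the library's own `ConversionErrorCode` handling may differ.

```csharp
using System.IO;

// Hypothetical mirror of a few error codes named in this README.
public enum ErrorCode { None, FileNotFound, FileTooLarge, OutputAlreadyExists }

public static class PreflightSketch
{
    // Returns the first failing check, or ErrorCode.None if all pass.
    public static ErrorCode Check(string inputPath, string outputPath,
                                  long maxFileSizeBytes, bool allowOverwrite)
    {
        var input = new FileInfo(inputPath);

        // Input must exist before its size can be read.
        if (!input.Exists)
            return ErrorCode.FileNotFound;

        // Reject inputs larger than the configured limit.
        if (input.Length > maxFileSizeBytes)
            return ErrorCode.FileTooLarge;

        // Refuse to clobber an existing output unless explicitly allowed.
        if (!allowOverwrite && File.Exists(outputPath))
            return ErrorCode.OutputAlreadyExists;

        return ErrorCode.None;
    }
}
```

Returning a code instead of throwing matches the result-object style of the `ConvertAsync` examples, where callers branch on `result.ErrorCode`.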
MIT — see LICENSE.
This is a personal hobby project built in spare time to explore .NET document processing.
- No SLA, no production support guarantee
- Breaking changes may happen without notice
- Issues and PRs are welcome — I'll do my best to respond
- Feel free to fork and adapt for your own projects
If you find it useful, drop a ⭐ — it means a lot!