A C# ASP.NET Core Web API wrapper for Microsoft's MarkItDown Python library. Converts various document formats to Markdown.
- Clean Architecture: Domain, Application, Infrastructure, and API layers
- CQRS Pattern: Commands and Queries with MediatR
- File Conversion: Upload files (PDF, DOCX, PPTX, XLSX, images, etc.) and get Markdown
- URL Conversion: Convert webpages and YouTube videos to Markdown
- IIS Compatible: Designed for Windows/IIS deployment
- Python.NET Integration: Direct Python integration without subprocess overhead
- Scalar Documentation: Interactive API documentation

Supported formats:

| Category | Extensions |
|---|---|
| Documents | .pdf, .docx, .doc, .pptx, .ppt, .xlsx, .xls |
| Text | .html, .htm, .csv, .json, .xml, .txt, .md |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp |
| Audio | .mp3, .wav, .m4a |
| Other | .zip, .epub, .msg, .eml |
| URLs | HTTP/HTTPS webpages, YouTube videos |
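Before uploading, a client can reject files the API will not accept. The sketch below is a client-side pre-check that copies the extension groups from the table above. `SUPPORTED_EXTENSIONS` and `category_for` are hypothetical names. The server's own list (`GET /api/convert/supported-formats`) remains authoritative.

```python
from pathlib import PurePath
from typing import Optional

# Extension groups copied from the table above (client-side copy only;
# the API's supported-formats endpoint is the source of truth).
SUPPORTED_EXTENSIONS = {
    "Documents": {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls"},
    "Text": {".html", ".htm", ".csv", ".json", ".xml", ".txt", ".md"},
    "Images": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"},
    "Audio": {".mp3", ".wav", ".m4a"},
    "Other": {".zip", ".epub", ".msg", ".eml"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the table category for a filename, or None if it is not listed."""
    # Only the last suffix counts, compared case-insensitively.
    suffix = PurePath(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["report.PDF", "notes.md", "clip.m4a", "archive.tar.gz"]:
        print(name, "->", category_for(name))
```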
Project structure:

```
MarkItDownAPI/
├── src/
│   ├── MarkItDownAPI.Domain/           # Entities, Value Objects
│   ├── MarkItDownAPI.Application/      # Use Cases, CQRS, Validators
│   ├── MarkItDownAPI.Infrastructure/   # Python Integration
│   └── MarkItDownAPI.Api/              # Controllers, Middleware
├── tests/
│   ├── MarkItDownAPI.UnitTests/
│   └── MarkItDownAPI.IntegrationTests/
├── Install.ps1                         # Installation script
├── start-dev.ps1                       # Development server script
└── CHANGELOG.md
```
Run the installation script. It automatically installs Python and MarkItDown if needed:

```powershell
.\Install.ps1
```

Script options:

```powershell
# Interactive mode (prompts for input)
.\Install.ps1

# Fully automated (for CI/CD)
.\Install.ps1 -NonInteractive

# Specify Python version
.\Install.ps1 -PythonVersion 3.11

# Skip Python installation (if already installed)
.\Install.ps1 -SkipPythonInstall
```

To install the prerequisites manually, download and install Python from python.org. Make sure to:
- Install for all users
- Add Python to PATH
- Note the installation path (e.g., `C:\Python312`)

Then install MarkItDown with all optional dependencies:

```powershell
pip install "markitdown[all]"
```

Install FFmpeg: download it from ffmpeg.org or use winget:

```powershell
winget install Gyan.FFmpeg
```

Install the .NET SDK: download it from dot.net.
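After a manual install, it helps to confirm that the interpreter the API will use can actually see everything. The sketch below is meant to be run with that interpreter (e.g., `C:\Python312\python.exe`). `check_prerequisites` is a hypothetical helper. The 3.10 minimum is an assumption about MarkItDown's requirements, so check the package metadata for the version you install. Tesseract is included because this README lists it as a separate requirement for image text extraction.

```python
import importlib.util
import shutil
import sys

# Assumption: MarkItDown needs Python 3.10 or newer.
MIN_PYTHON = (3, 10)


def check_prerequisites() -> dict:
    """Report which prerequisites this interpreter can see."""
    return {
        "python_version_ok": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec locates the package without importing it.
        "markitdown_importable": importlib.util.find_spec("markitdown") is not None,
        # FFmpeg and Tesseract are external executables; look them up on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "tesseract_on_path": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```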
Edit `src/MarkItDownAPI.Api/appsettings.json`:

```json
{
  "Python": {
    "PythonHome": "C:\\Python312",
    "PythonDll": "python312.dll",
    "EnableLlmDescriptions": false,
    "OpenAIApiKey": "",
    "LlmModel": "gpt-4o",
    "EnablePlugins": false,
    "DocumentIntelligenceEndpoint": ""
  }
}
```

| Setting | Description |
|---|---|
| `PythonHome` | Path to the Python installation. Leave empty for auto-detection. |
| `PythonDll` | Python DLL filename (e.g., `python312.dll` for Python 3.12) |
| `EnableLlmDescriptions` | Enable AI-generated image descriptions (requires an OpenAI API key) |
| `OpenAIApiKey` | OpenAI API key for LLM features |
| `LlmModel` | Model name used for LLM features (`gpt-4o` in the sample above) |
| `EnablePlugins` | Enable third-party MarkItDown plugins |
| `DocumentIntelligenceEndpoint` | Azure Document Intelligence endpoint (optional) |
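`PythonHome` and `PythonDll` must match the same interpreter. One way to get consistent values is to ask that interpreter directly, as in the sketch below. `python_dll_name` and `suggested_settings` are hypothetical helpers. The naming rule (`python` + major + minor + `.dll`) follows the `python312.dll` example in the table. `sys.base_prefix` is used so that running from a virtual environment still reports the base installation.

```python
import sys


def python_dll_name(version=sys.version_info) -> str:
    """Build the Windows DLL name, e.g. (3, 12) -> 'python312.dll'."""
    major, minor = version[0], version[1]
    return f"python{major}{minor}.dll"


def suggested_settings() -> dict:
    """Suggest 'Python' section values for the interpreter running this script."""
    return {
        # base_prefix is the installation root, even inside a venv.
        "PythonHome": sys.base_prefix,
        "PythonDll": python_dll_name(),
    }


if __name__ == "__main__":
    import json
    # json.dumps doubles backslashes, matching the escaping in appsettings.json.
    print(json.dumps({"Python": suggested_settings()}, indent=2))
```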
Use the development script:

```powershell
.\start-dev.ps1
```

This will:
- Validate the Python environment
- Build the solution
- Start the API in a new console window
- Open the Scalar documentation in your browser

The API will be available at:
- HTTPS: https://localhost:5001
- HTTP: http://localhost:5000
- Docs: https://localhost:5001/scalar/v1
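Because `start-dev.ps1` starts the API in a separate console window, a script that calls the API immediately afterwards may need to wait until it is listening. The sketch below is a minimal polling loop using only the Python standard library. `wait_until_healthy` is a hypothetical helper. It assumes the plain-HTTP address above and that the `GET /api/convert/health` route from the endpoint list returns a 2xx status once the service is ready.

```python
import time
import urllib.request

# Assumption: the HTTP address listed above (avoids dev-certificate errors).
BASE_URL = "http://localhost:5000"


def wait_until_healthy(base_url: str = BASE_URL, timeout: float = 60.0,
                       interval: float = 1.0) -> bool:
    """Poll GET /api/convert/health until it answers 2xx or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/convert/health",
                                        timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # URLError and HTTPError subclass OSError: refused connection,
            # non-2xx status, or timeout while the API is still starting.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("API ready" if wait_until_healthy() else "API did not become healthy")
```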
Development script options:

```powershell
.\start-dev.ps1              # Default (builds and opens browser)
.\start-dev.ps1 -NoBuild     # Skip build step
.\start-dev.ps1 -NoBrowser   # Don't open browser
.\start-dev.ps1 -Release     # Build in Release mode
.\start-dev.ps1 -Port 8080   # Use custom port
```

Convert a file:

```http
POST /api/convert/file
Content-Type: multipart/form-data
```

cURL example:

```bash
curl -X POST "https://localhost:5001/api/convert/file" \
  -F "file=@document.pdf"
```

Convert a URL:

```http
POST /api/convert/url
Content-Type: application/json

{
  "url": "https://example.com"
}
```

List supported formats:

```http
GET /api/convert/supported-formats
```

Health check:

```http
GET /api/convert/health
```

Example conversion response:

```json
{
  "success": true,
  "markdown": "# Document Title\n\nContent...",
  "title": "Document Title",
  "error": null,
  "processingTimeMs": 1234
}
```

To deploy to IIS, publish the API:

```powershell
dotnet publish src/MarkItDownAPI.Api -c Release -o ./publish
```

- Install the ASP.NET Core Hosting Bundle
- Create a new IIS site pointing to the `publish` folder
- Set the Application Pool to "No Managed Code"
- Ensure the App Pool identity has access to the Python installation
Edit `web.config` in the publish folder to set the correct Python paths (the `<environmentVariables>` element goes inside the `<aspNetCore>` element):

```xml
<environmentVariables>
  <environmentVariable name="PYTHONHOME" value="C:\Python312" />
  <environmentVariable name="PATH" value="C:\Python312;C:\Python312\Scripts;%PATH%" />
</environmentVariables>
```

```powershell
# Grant IIS_IUSRS access to Python directory
icacls "C:\Python312" /grant "IIS_IUSRS:(OI)(CI)RX" /T
```

Run the tests:

```powershell
dotnet test
```

MarkItDown uses Google's free speech recognition API to transcribe audio files (MP3, WAV, etc.) to text. This has several limitations:
- External API dependency: Requires internet access to Google's speech recognition service
- File size/duration limits: Long audio files may fail or be truncated
- Rate limits: Google's free API has usage limits
- Accuracy: Depends on audio quality, language, and background noise
If audio transcription fails, you'll receive an error: "Audio transcription failed. MarkItDown uses speech recognition to convert audio files..."
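A client can detect this case from the `success` and `error` fields of the conversion response. The sketch below uploads a file the same way as the cURL example (form field `file`) and raises when the API reports failure. It uses only the Python standard library. `ConversionResult`, `parse_result`, `build_multipart`, and `convert_file` are hypothetical names, and the base URL `http://localhost:5000` is an assumption. It is also an assumption that failed conversions may arrive with a non-2xx status but the same JSON shape, so the code falls back to the HTTP status when they do not.

```python
import json
import mimetypes
import urllib.error
import urllib.request
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ConversionResult:
    """Mirrors the documented response: success, markdown, title, error, processingTimeMs."""
    success: bool
    markdown: Optional[str]
    title: Optional[str]
    error: Optional[str]
    processing_time_ms: int


def parse_result(body: bytes) -> ConversionResult:
    data = json.loads(body)
    return ConversionResult(
        success=bool(data.get("success")),
        markdown=data.get("markdown"),
        title=data.get("title"),
        error=data.get("error"),
        processing_time_ms=int(data.get("processingTimeMs") or 0),
    )


def build_multipart(filename: str, content: bytes, boundary: str) -> bytes:
    """Encode one file under the form field 'file', as in the cURL example."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    return head + content + f"\r\n--{boundary}--\r\n".encode()


def convert_file(path: str, base_url: str = "http://localhost:5000") -> str:
    """Upload a file and return its Markdown; raise RuntimeError on failure."""
    boundary = uuid.uuid4().hex
    body = build_multipart(Path(path).name, Path(path).read_bytes(), boundary)
    request = urllib.request.Request(
        f"{base_url}/api/convert/file",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            result = parse_result(response.read())
    except urllib.error.HTTPError as exc:
        try:
            result = parse_result(exc.read())  # assumed same JSON shape
        except ValueError:
            raise RuntimeError(f"HTTP {exc.code}") from exc
    if not result.success:
        raise RuntimeError(result.error or "conversion failed")
    return result.markdown or ""
```

For example, `convert_file("meeting.mp3")` would raise `RuntimeError` carrying the API's transcription error text instead of returning empty Markdown.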
For production use with audio files, consider:
- Using the OpenAI Whisper API (configure via `OpenAIApiKey`)
- Pre-processing audio files into shorter segments
- Accepting that some audio files may not convert
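For WAV input, pre-processing into shorter segments can be done with Python's standard `wave` module, as in the sketch below. It is a minimal example under stated assumptions. `split_wav` is a hypothetical helper. It handles uncompressed PCM WAV only, so MP3 or M4A would need converting first (e.g., with FFmpeg). The 60-second default is an arbitrary choice, not a documented limit of the speech service. Fixed-length cuts can split a word across two segments.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(path: str, out_dir: str, segment_seconds: float = 60.0) -> List[Path]:
    """Split a PCM WAV file into consecutive segments of at most segment_seconds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = max(1, int(params.framerate * segment_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            target = out / f"{Path(path).stem}_{index:03d}.wav"
            # wave.open needs a str (or file object), not a Path.
            with wave.open(str(target), "wb") as dst:
                dst.setparams(params)  # frame count in the header is patched on close
                dst.writeframes(frames)
            written.append(target)
            index += 1
    return written
```

Each segment can then be uploaded separately and the resulting Markdown concatenated in order.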
Text extraction from images requires Tesseract OCR to be installed separately.
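Troubleshooting:
- Verify that `PythonHome` in `appsettings.json` points to the correct path
- Check that the environment variables are set in `web.config`
- Run `pip install "markitdown[all]"` with the Python installation the API is configured to use
- Ensure pip-installed packages are accessible to IIS
- Grant the IIS App Pool identity read/execute permissions on the Python folder
- Check the `stdout` logs in the `logs` folder (enable them with `stdoutLogEnabled="true"` on the `<aspNetCore>` element in `web.config`)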
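A common cause of "packages not accessible to IIS" is that pip installed MarkItDown into the per-user site-packages directory of whoever ran pip, which the App Pool identity does not search. The sketch below reports where an interpreter finds `markitdown` and whether that location is a per-user install. `markitdown_location` is a hypothetical helper. Run it with the `python.exe` under your configured `PythonHome`.

```python
import importlib.util
import site
import sys
from pathlib import Path


def markitdown_location() -> dict:
    """Report where this interpreter finds markitdown and whether it is a per-user install."""
    spec = importlib.util.find_spec("markitdown")  # locates without importing
    origin = Path(spec.origin).resolve() if spec and spec.origin else None
    user_site = Path(site.getusersitepackages()).resolve()
    return {
        "interpreter": sys.executable,
        "base_prefix": sys.base_prefix,  # compare with PythonHome
        "markitdown": str(origin) if origin else None,
        # Packages under the user site directory belong to the account that
        # ran pip; a different identity (such as an App Pool) won't see them.
        "per_user_install": bool(origin and user_site in origin.parents),
    }


if __name__ == "__main__":
    for key, value in markitdown_location().items():
        print(f"{key}: {value}")
```

If `per_user_install` is true, reinstalling from an elevated prompt with the all-users Python (`pip install "markitdown[all]"`) places the package where the App Pool identity can read it.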
MIT License. See LICENSE for details.
MarkItDown itself is © Microsoft Corporation, licensed under MIT.