🌉 Kreuzberg

Content Intelligence for AI Engineering Workflows: Open Source and Cloud

Kreuzberg is a polyglot document intelligence framework built around a high-performance Rust core. It helps developers extract text, structure, and metadata from 92+ document formats, and generate embeddings from the results, at native speed and without requiring GPUs.

Kreuzberg is an open-source library. We're currently building a hosted cloud service around it to make document processing reliable, scalable, and easy to integrate into modern pipelines.

What is Kreuzberg

1. Kreuzberg (Open Source)

A high-performance, extensible document intelligence engine.

  • Rust core with streaming parsers and full parallelism
  • Native bindings for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, WASM, and TypeScript (Node/Bun/Wasm/Deno)
  • Broad format support, including PDF, Office, images, HTML, XML, email, archives, and scientific formats
  • OCR with table extraction (Tesseract, EasyOCR, and PaddleOCR backends; extensible via plugins)
  • Built-in semantic chunking and optional embeddings for RAG pipelines
  • CLI, REST API, Docker images, and MCP server
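To show what chunking does in a RAG pipeline, here is a minimal sketch of a naive fixed-size splitter with overlap. It assumes plain extracted text as input. The function `chunk_text` and its `max_chars` and `overlap` parameters are hypothetical and are not part of Kreuzberg's API. Kreuzberg's built-in chunker is semantic, which this sketch does not attempt.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 200, overlap: int = 40) -> List[str]:
    """Hypothetical helper: split text into windows of at most max_chars characters.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks. This is a naive fixed-size splitter,
    not Kreuzberg's semantic chunker.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")

    step = max_chars - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once this window reaches the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap trades some duplicated storage for context continuity: a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk when it fits in the overlap.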

Read more: https://kreuzberg.dev/

2. Kreuzberg Cloud (Coming Soon)

A fully managed document intelligence API powered by the same engine.

Planned features include:

  • Hosted REST API
  • Async jobs and webhooks
  • Built-in chunking for RAG pipelines
  • Premium OCR backends
  • Usage dashboards and analytics
  • Simple pay-as-you-go pricing

3. html-to-markdown

A high-performance HTML → Markdown converter powered by Rust. It is available as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and a standalone CLI, with identical rendering behavior across platforms.

Why Choose Kreuzberg

  • Truly polyglot: the same Rust engine behind every language binding
  • High throughput: optimized for batch workloads and multi-GB documents
  • Memory efficient: streaming architecture keeps memory usage predictable
  • Flexible deployment: use it as a library or via the CLI, REST API, Docker images, MCP server, and more
  • MIT licensed: safe for enterprise, commercial, and closed-source use
  • Built for RAG: native chunking, embeddings, and customization

Community

Join our dev community to ask questions, share feedback, and show what you’re building.

Discord: https://discord.gg/xzx4KkAPED
Subreddit: https://www.reddit.com/r/kreuzberg_dev/
LinkedIn: https://www.linkedin.com/company/kreuzberg-dev/
X/Twitter: https://x.com/kreuzberg_dev

Contributing

Contributions are welcome.

  1. Open an issue to propose a change
  2. Submit a pull request
  3. Maintainers review and merge

See CONTRIBUTING.md in the relevant repository for details.
Kreuzberg repository: https://github.com/kreuzberg-dev/kreuzberg

Maintainers

Built with love in Kreuzberg, Berlin.