asukhodko · Jan 8, 2026
diff --git a/README.md b/README.md
@@ -1,24 +1,39 @@
 # Chunkana
 
-Intelligent Markdown chunking library for RAG systems.
+[![GitHub Repository](https://img.shields.io/badge/GitHub-Chunkana-181717?logo=github)](https://github.com/asukhodko/chunkana)
+[![PyPI version](https://img.shields.io/pypi/v/chunkana.svg)](https://pypi.org/project/chunkana/)
+[![Python versions](https://img.shields.io/pypi/pyversions/chunkana.svg)](https://pypi.org/project/chunkana/)
+[![License](https://img.shields.io/pypi/l/chunkana.svg)](LICENSE)
+[![Downloads](https://img.shields.io/pypi/dm/chunkana.svg)](https://pypi.org/project/chunkana/)
 
-## Features
+**Chunkana** is a high-precision Markdown chunking library for RAG pipelines, search indexing, and LLM ingestion. It produces semantically correct Markdown chunks by respecting headers, code blocks, tables, and LaTeX while keeping the output retrieval-ready.
 
-- 🧠 **Smart chunking**: Automatically selects optimal strategy based on content
-- 📦 **Atomic blocks**: Preserves code blocks, tables, and LaTeX formulas
-- 🌳 **Hierarchical**: Navigate chunks by header structure with tree invariant validation
-- 📊 **Rich metadata**: Header paths, content types, overlap context
-- 🔄 **Streaming**: Process large files (>10MB) efficiently
-- 🎯 **Multiple renderers**: JSON, inline metadata, Dify-compatible
-- ✅ **Quality assurance**: Automatic dangling header prevention and micro-chunk minimization
+If you're looking for a **semantic Markdown chunker**, **Markdown splitter**, or **Markdown document segmenter** that preserves structure for LLM context windows, Chunkana is built for exactly that.
+
+## Why Chunkana
+
+Chunkana turns messy Markdown into clean, structured chunks that retain meaning:
+
+- **Semantic correctness**: preserves headers, lists, tables, code blocks, and math without splitting them mid-block.
+- **RAG-ready metadata**: header paths, content types, line ranges, and overlap context.
+- **Smart strategy selection**: automatically adapts to code-heavy, list-heavy, or structural documents.
+- **Hierarchical navigation**: build a chunk tree for section-aware retrieval.
+- **Streaming for large files**: chunk multi-megabyte documents without loading everything into memory.
+- **Compatibility**: output formats for Dify and JSON APIs.
 
 ## Installation
 
 ```bash
 pip install chunkana
 ```
 
-## Quick Start
+Optional extras:
+
+```bash
+pip install "chunkana[docs]"
+```
+
+## Quick start
 
 ```python
 from chunkana import chunk_markdown
@@ -42,10 +57,12 @@ def hello():
 
 chunks = chunk_markdown(text)
 for chunk in chunks:
-    print(f"Lines {chunk.start_line}-{chunk.end_line}: {chunk.metadata['header_path']}")
+    print(f"{chunk.start_line}-{chunk.end_line}: {chunk.metadata['header_path']}")
 ```
 
-## Configuration
+## Usage examples
+
+### 1) Tune chunk sizes and overlap
 
 ```python
 from chunkana import chunk_markdown, ChunkerConfig
@@ -59,120 +76,100 @@ config = ChunkerConfig(
 chunks = chunk_markdown(text, config)
 ```
 
-### Hierarchical Chunking Configuration
-
-For hierarchical chunking with tree structure validation:
+### 2) Build a hierarchical chunk tree
 
 ```python
 from chunkana import MarkdownChunker, ChunkConfig
 
-config = ChunkConfig(
-    max_chunk_size=1000,
-    min_chunk_size=100,
-    overlap_size=100,
-    validate_invariants=True,  # Enable tree invariant validation (default: True)
-    strict_mode=False,         # Auto-fix violations vs raise exceptions (default: False)
-)
-
-chunker = MarkdownChunker(config)
+chunker = MarkdownChunker(ChunkConfig(validate_invariants=True))
 result = chunker.chunk_hierarchical(text)
 
-# Navigate the hierarchy
 root = result.get_chunk(result.root_id)
 children = result.get_children(result.root_id)
-flat_chunks = result.get_flat_chunks()
+flat_chunks = result.get_flat_chunks()  # leaf + significant parent chunks
 ```
 
-**Configuration options:**
-- `validate_invariants` (default: `True`): Validates tree invariants after construction
-- `strict_mode` (default: `False`): When `True`, raises exceptions on invariant violations; when `False`, auto-fixes issues and logs warnings
-
-## Exception Handling
-
-Chunkana provides a hierarchy of exceptions for error handling:
+### 3) Stream large Markdown files
 
 ```python
-from chunkana import (
-    ChunkanaError,              # Base exception for all chunkana errors
-    HierarchicalInvariantError, # Tree structure violations
-    ValidationError,            # Validation failures
-    ConfigurationError,         # Invalid configuration
-    TreeConstructionError,      # Tree building failures
-)
+from chunkana import MarkdownChunker
 
-try:
-    result = chunker.chunk_hierarchical(text)
-except HierarchicalInvariantError as e:
-    print(f"Invariant violation: {e.invariant}")
-    print(f"Chunk ID: {e.chunk_id}")
-    print(f"Suggested fix: {e.suggested_fix}")
-except ChunkanaError as e:
-    print(f"Chunking error: {e}")
+chunker = MarkdownChunker()
+for chunk in chunker.chunk_file_streaming("docs/handbook.md"):
+    print(chunk.metadata["chunk_index"], chunk.size)
 ```
 
-## Renderers
+### 4) Emit Dify-compatible output
 
 ```python
 from chunkana import chunk_markdown
-from chunkana.renderers import render_dify_style, render_json
+from chunkana.renderers import render_dify_style
 
 chunks = chunk_markdown(text)
-
-# JSON output
-json_output = render_json(chunks)
-
-# Dify-compatible format
-dify_output = render_dify_style(chunks)
+output = render_dify_style(chunks)
 ```
 
-## Quality Features
-
-### Dangling Header Prevention
-
-Chunkana automatically prevents headers from being separated from their content. When a chunk would end with a header (like `#### Details`), the header is moved to the next chunk to maintain semantic coherence.
-
-### Micro-Chunk Minimization
-
-Small chunks are intelligently merged with adjacent content when they lack structural significance, reducing fragmentation while preserving important standalone elements like code blocks and tables.
+### 5) Adaptive chunk sizing for mixed documents
 
-### Tree Invariant Validation
-
-Hierarchical chunking validates:
-- **is_leaf consistency**: Leaf status matches children presence
-- **Parent-child bidirectionality**: All relationships are symmetric
-- **No orphaned chunks**: Every chunk is reachable from root
-
-### Line Range Contract (Hierarchical Mode)
+```python
+from chunkana import chunk_markdown, ChunkerConfig
+from chunkana.adaptive_sizing import AdaptiveSizeConfig
 
-In hierarchical chunking mode, `start_line` and `end_line` follow a specific contract:
+config = ChunkerConfig(
+    use_adaptive_sizing=True,
+    adaptive_config=AdaptiveSizeConfig(
+        base_size=1500,
+        code_weight=0.4,
+        min_size=500,
+        max_size=8000,
+    ),
+)
 
-- **Leaf nodes**: Line range covers only the chunk's own content
-- **Internal nodes**: Line range covers only the node's own content (not children)
-- **Root node**: Line range covers the entire document (1 to last line)
+chunks = chunk_markdown(text, config)
+```
 
-**Important**: The sum of children's line ranges does NOT equal the parent's range. The parent contains only its "header" content, while children contain detailed content. This is by design for hierarchical navigation.
+## Renderers
 
 ```python
-result = chunker.chunk_hierarchical(text)
-root = result.get_chunk(result.root_id)
+from chunkana.renderers import (
+    render_dify_style,
+    render_json,
+    render_inline_metadata,
+    render_with_embedded_overlap,
+)
+```
 
-# Root covers entire document
-print(f"Root: lines {root.start_line}-{root.end_line}")
+- **render_dify_style**: `<metadata>` blocks for Dify.
+- **render_json**: list of dictionaries for JSON APIs.
+- **render_inline_metadata**: metadata inline as HTML comments.
+- **render_with_embedded_overlap**: embeds the overlap context in the chunk text for RAG windows.
 
-# Children cover their own sections
-for child in result.get_children(result.root_id):
-    print(f"Child: lines {child.start_line}-{child.end_line}")
-```
+## Integrations
+
+- [Dify](docs/integrations/dify.md)
+- [n8n](docs/integrations/n8n.md)
+- [Windmill](docs/integrations/windmill.md)
 
 ## Documentation
 
+- [Overview](docs/overview.md)
 - [Quick Start](docs/quickstart.md)
 - [Configuration](docs/config.md)
 - [Strategies](docs/strategies.md)
 - [Renderers](docs/renderers.md)
 - [Debug Mode](docs/debug_mode.md)
 - [Migration Guide](MIGRATION_GUIDE.md)
 
+## FAQ
+
+**Q: What makes Chunkana different from a basic Markdown splitter?**
+
+Chunkana is a **semantic Markdown chunker** that keeps structure intact (headers, lists, code blocks, tables, LaTeX) and enriches each chunk with retrieval metadata. Because chunks follow the document's structure, retrieval tends to return more coherent context than with naive line-based splitting.
+
+**Q: Does Chunkana work for RAG and LLM ingestion?**
+
+Yes. Chunkana is optimized for **RAG chunking**, **LLM context window preparation**, and **semantic Markdown segmentation**. It provides overlap metadata and consistent hierarchy paths for retrieval pipelines.
+
 ## License
 
 MIT
diff --git a/docs/config.md b/docs/config.md
@@ -2,7 +2,7 @@
 
 Chunkana uses `ChunkerConfig` (alias: `ChunkConfig`) to control chunking behavior.
 
-## Basic Parameters
+## Basic parameters
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
@@ -12,17 +12,17 @@ Chunkana uses `ChunkerConfig` (alias: `ChunkConfig`) to control chunking behavio
 | `preserve_atomic_blocks` | bool | True | Keep code blocks, tables, LaTeX intact |
 | `extract_preamble` | bool | True | Extract content before first header as preamble |
 
-## Strategy Selection Thresholds
+## Strategy selection thresholds
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `code_threshold` | float | 0.3 | Code ratio threshold for CodeAware strategy |
 | `structure_threshold` | int | 3 | Minimum headers for Structural strategy |
 | `list_ratio_threshold` | float | 0.4 | List content ratio for ListAware strategy |
 | `list_count_threshold` | int | 5 | Minimum lists for ListAware strategy |
-| `strategy_override` | str\|None | None | Force specific strategy: "code_aware", "list_aware", "structural", "fallback" |
+| `strategy_override` | str\|None | None | Force strategy: "code_aware", "list_aware", "structural", "fallback" |
 
-## Code-Context Binding
+## Code-context binding
 
 These parameters control how code blocks are bound to surrounding explanations:
 
@@ -35,7 +35,7 @@ These parameters control how code blocks are bound to surrounding explanations:
 | `bind_output_blocks` | bool | True | Bind code with its output blocks |
 | `preserve_before_after_pairs` | bool | True | Keep before/after code pairs together |
 
-## Adaptive Sizing
+## Adaptive sizing
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
@@ -55,7 +55,7 @@ adaptive_config = AdaptiveSizeConfig(
 )
 ```
 
-## Table Grouping
+## Table grouping
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
@@ -73,7 +73,7 @@ table_config = TableGroupingConfig(
 )
 ```
 
-## Overlap Behavior
+## Overlap behavior
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
@@ -82,21 +82,21 @@ table_config = TableGroupingConfig(
 
 The overlap is stored in metadata (`previous_content`, `next_content`), not embedded in `chunk.content`.
 
-## LaTeX Handling
+## LaTeX handling
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `preserve_latex_blocks` | bool | True | Keep LaTeX blocks intact |
 
 When enabled, LaTeX blocks (`$$...$$`, `\[...\]`, `\begin{...}...\end{...}`) are treated as atomic units.
 
-## Computed Fields
+## Computed fields
 
 | Field | Description |
 |-------|-------------|
 | `enable_overlap` | Computed as `overlap_size > 0` |
 
-## Factory Methods
+## Factory methods
 
 ```python
 from chunkana import ChunkerConfig
@@ -120,9 +120,19 @@ config = ChunkerConfig.from_dict(config_dict)
 
 Round-trip is guaranteed: `ChunkerConfig.from_dict(config.to_dict()) == config`
 
-## Example Configurations
+## Recommended presets
 
-### Documentation Sites
+### RAG pipelines
+
+```python
+config = ChunkerConfig(
+    max_chunk_size=4096,
+    min_chunk_size=512,
+    overlap_size=200,
+)
+```
+
+### Documentation sites
 
 ```python
 config = ChunkerConfig(
@@ -133,7 +143,7 @@ config = ChunkerConfig(
 )
 ```
 
-### Code Repositories
+### Code repositories
 
 ```python
 config = ChunkerConfig(
@@ -145,7 +155,7 @@ config = ChunkerConfig(
 )
 ```
 
-### Changelogs / Release Notes
+### Changelogs / release notes
 
 ```python
 config = ChunkerConfig(
@@ -156,7 +166,7 @@ config = ChunkerConfig(
 )
 ```
 
-### Scientific Documents (LaTeX)
+### Scientific documents (LaTeX)
 
 ```python
 config = ChunkerConfig(
@@ -166,6 +176,6 @@ config = ChunkerConfig(
 )
 ```
 
-## Plugin Compatibility
+## Plugin compatibility
 
 All 17 fields from dify-markdown-chunker's `ChunkConfig` are supported. See [Parity Matrix](migration/parity_matrix.md) for details.
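
Reviewer note: docs/config.md states two contracts that are easy to misread, the computed `enable_overlap` field (`overlap_size > 0`) and the `from_dict(to_dict())` round-trip guarantee. The sketch below illustrates both with a stand-in dataclass. `SketchConfig` is hypothetical and is not Chunkana's `ChunkerConfig`. The three fields are taken from the "RAG pipelines" preset, the default values are arbitrary, and whether the real `to_dict()` includes `enable_overlap` is an assumption this sketch does not settle.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ChunkerConfig (not the library's class)."""

    max_chunk_size: int = 4096  # arbitrary illustration values
    min_chunk_size: int = 512
    overlap_size: int = 200

    @property
    def enable_overlap(self) -> bool:
        # Computed on every access from overlap_size; never stored.
        return self.overlap_size > 0

    def to_dict(self) -> dict:
        # asdict() only serializes declared fields, so the computed
        # property is left out of the dictionary in this sketch.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SketchConfig":
        # Drop unknown keys so stray entries do not break construction.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


config = SketchConfig(overlap_size=0)
restored = SketchConfig.from_dict(config.to_dict())

# Round-trip equality holds, and overlap is disabled because overlap_size is 0.
print(restored == config, restored.enable_overlap)
```

Keeping `enable_overlap` as a property rather than a stored field means it cannot drift out of sync with `overlap_size`, which is one simple way to make the round-trip comparison hold.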