Commit 0a9aaa8

docs: reposition Inkforge as long-form document-level handwriting engine
- README: add Calligrapher AI comparison, document-level features table, 4-step generation pipeline (word chunking, LSTM state passing, typewriter layout, sine-wave drift)
- ARCHITECTURE: update to 4-tier arch with Document Layout Engine, expanded data flow
- API: add document layout params, realistic multi-paragraph example
- inference.yaml: add document_layout config section
1 parent 1e153bf commit 0a9aaa8

4 files changed

Lines changed: 257 additions & 40 deletions

README.md

Lines changed: 155 additions & 25 deletions
@@ -3,10 +3,10 @@
33
</p>
44

55
<h1 align="center">✍ INKFORGE</h1>
6-
<h3 align="center">Human-Like Handwriting Synthesis Engine</h3>
6+
<h3 align="center">Long-Form Human Handwriting Synthesis Engine</h3>
77

88
<p align="center">
9-
<strong>Stroke-level generative ML model trained on real human handwriting — not a font.</strong>
9+
<strong>Generate full pages of realistic handwriting — paragraphs, essays, letters — with natural fatigue, drift, and writer personality. Not a font. Not a single-line demo. A complete document-level handwriting engine.</strong>
1010
</p>
1111

1212
<p align="center">
@@ -41,39 +41,76 @@
4141

4242
## Executive Summary
4343

44-
Every existing text-to-handwriting tool is, under the hood, a **font renderer**. Static handwriting fonts produce a single fixed glyph per character — perfectly uniform stroke width, zero baseline drift, no ligatures, no pressure variance. The human eye detects this inauthenticity instantly.
44+
Every existing text-to-handwriting tool falls into one of two traps:
4545

46-
**Inkforge replaces the font rendering pipeline entirely** with a stroke-level generative ML model trained on real human handwriting corpora.
46+
1. **Font renderers** — Static handwriting fonts that produce identical glyphs every time. The human eye detects this instantly.
47+
2. **Single-line demo generators** (e.g., Calligrapher AI) — RNN-based tools that generate one short line at a time with basic style controls. They can produce a convincing sentence, but ask them for a full page? They have no concept of paragraphs, margins, page layout, or the way human handwriting evolves over long passages.
48+
49+
**Inkforge is different.** It generates **entire documents** — full paragraphs, multi-page letters, long-form essays — where the handwriting looks like it was written by a real human sitting at a desk for 20 minutes, not generated one line at a time and stitched together.
50+
51+
### How Inkforge Differs from Calligrapher AI & Others
52+
53+
| Feature | Calligrapher AI / Font Tools | **Inkforge** |
54+
|---------|-----------------------------|--------------|
55+
| Scope | Single line or short snippet | **Full documents — paragraphs, pages, essays** |
56+
| Line Wrapping | None — user manually splits lines | **Automatic word-wrap with natural margin awareness** |
57+
| Writing Fatigue | None | **Progressive degradation over long passages** — stroke quality, size, spacing all evolve |
58+
| Paragraph Structure | None | **Indentation, paragraph spacing, section breaks** |
59+
| Inter-line Consistency | Each line generated independently | **Lines are coherent within a page** — consistent writer personality with natural drift |
60+
| Character Memory | Stateless per generation | **Writer-consistent evolution** — the same character looks subtly different each time, but consistently "from the same hand" |
61+
| Page Layout | Not applicable | **Full page composition** — margins, headers, line spacing, multi-page support |
62+
| Output Length | ~1 line (typically <100 chars) | **2,000+ characters** (full-page A4/Letter output) |
63+
| Export | SVG only | **PNG (300 DPI), PDF (A4/US Letter), SVG** |
4764

4865
### Target Users
4966

5067
| Persona | Goal | Pain Point |
5168
|---------|------|------------|
52-
| **Portfolio Builder** | Technically impressive ML project with live demo | Existing generators all use fonts — no real ML differentiator |
53-
| **D2C Marketer** | Personalised handwritten notes in shipments at scale | Font-based tools look fake; robot pens cost $5–8/letter |
54-
| **Real Estate Agent** | Handwritten outreach letters for 3–5x response rates | No tool combines realistic generation with mail pipeline |
69+
| **Portfolio Builder** | Technically impressive ML project with live demo | Existing generators produce short demos — no long-form document generation |
70+
| **D2C Marketer** | Personalised handwritten notes in shipments at scale | Font-based tools look fake; single-line generators can't produce full letters |
71+
| **Real Estate Agent** | Handwritten outreach letters for 3–5x response rates | No tool generates a convincing full-page handwritten letter |
72+
| **Student / Creator** | Handwritten essays, assignments, or journal pages | Need realistic multi-paragraph output, not one line at a time |
5573

5674
---
5775

5876
## The Problem
5977

60-
Real human handwriting is defined by its **imperfections**:
78+
Real human handwriting over a **full page** is defined by its **imperfections at every scale**:
6179

80+
### Character Level
6281
- Pressure builds and releases mid-stroke
6382
- Letters lean inconsistently
64-
- The baseline wanders
83+
- The same letter is written slightly differently every time — but consistently "from the same hand"
84+
85+
### Line Level
86+
- The baseline wanders across a line
6587
- Adjacent characters influence each other through natural ligatures
66-
- Writing degrades in consistency over long passages
88+
- Word spacing varies naturally — tighter in fast sections, looser in deliberate ones
89+
90+
### Document Level (what no other tool handles)
91+
- Writing quality **degrades over long passages** — fatigue is real
92+
- Letter size subtly **grows or shrinks** over paragraphs
93+
- Margins aren't perfectly straight — the left edge drifts
94+
- Line spacing isn't uniform — it loosens as the writer reaches the bottom of a page
95+
- Paragraph indentation varies between paragraphs
96+
- The overall slant may shift across the page as the writer's hand position changes
6797

68-
**No font can replicate this** because fonts are context-free and deterministic by design.
98+
**No font can replicate this.** And no single-line generator even attempts it.
6999

70100
---
71101

72102
## The Solution
73103

74-
Inkforge synthesizes handwriting as **sequences of pen strokes** with learned distributions over pressure, velocity, slant, and inter-character spacing. Every generation is unique. Every line drifts naturally. Every character is subtly different from its previous instance.
104+
Inkforge synthesizes handwriting as **sequences of pen strokes** with learned distributions over pressure, velocity, slant, and inter-character spacing — but unlike short-snippet generators, it operates at the **document level**.
75105

76-
> **This is not a filter applied to a font. It is synthesized handwriting — generated stroke-by-stroke by a deep learning model trained on thousands of real human writers.**
106+
When you feed Inkforge a 500-word essay:
107+
- It plans the **page layout** — margins, line count, paragraph breaks
108+
- It generates each line within the context of the **full document** — the model knows where it is on the page
109+
- It simulates **writing fatigue** — the 30th line isn't as crisp as the 1st
110+
- It maintains **writer consistency** — every character comes from the same "hand", with natural per-instance variation
111+
- Every generation is **unique**: regenerating the same text produces a visibly different manuscript in the same hand
112+
113+
> **This is not a filter applied to a font. This is not a single-line demo. It is a full document synthesis engine — generating pages of handwriting stroke-by-stroke, with the realism of a human writer sitting at a desk.**
77114
78115
---
79116

@@ -90,17 +127,32 @@ Each parameter is implemented at the **model level**, not as post-processing. Th
90127
| **Slant Angle** | Global slant bias + per-word variance from learned distribution | -30° to +30° ||
91128
| **Baseline Drift** | Slow-varying sinusoidal noise on y-axis across a line | 0.0 – 1.0 | 0.3 |
92129
| **Ligature Formation** | Contextual stroke connections between adjacent characters | On / Off | On |
93-
| **Fatigue Simulation** | Increasing noise in latent space over token position | On / Off | Off |
130+
| **Fatigue Simulation** | Progressive degradation over long passages — stroke precision decreases, letter size drifts, spacing loosens | 0.0 – 1.0 | 0.3 |
94131
| **Ink Bleed** | Post-render Gaussian diffusion on stroke edges | 0.0 – 1.0 | 0.2 |
95132

133+
### Document-Level Layout Features
134+
135+
These features are what separate Inkforge from single-line generators. They enable full-page, multi-paragraph output.
136+
137+
| Feature | Description | Default |
138+
|---------|-------------|---------|
139+
| **Auto Line Wrapping** | Text automatically wraps at page margins with natural word-boundary detection | On |
140+
| **Paragraph Indentation** | First line of each paragraph indented with natural variation | 1.5 cm ± natural drift |
141+
| **Paragraph Spacing** | Variable vertical spacing between paragraphs | 1.2× line height |
142+
| **Margin Awareness** | Left/right margins with natural drift, not ruler-straight | 2 cm left, 1.5 cm right ± subtle variation |
143+
| **Inter-line Spacing** | Line spacing varies subtly across the page and loosens toward the bottom | ~8 mm ± 0.5 mm drift |
144+
| **Page Composition** | Full A4/Letter page layout with configurable margins and line density | A4, 25–30 lines/page |
145+
| **Writer Hand Position Shift** | Slant and baseline shift as the writer's hand moves down the page | Subtle, progressive |
146+
96147
### User Interface
97148

98-
- **Text Input** — Multi-line, up to 2,000 characters, with paste-from-clipboard support
149+
- **Text Input** — Multi-line, 2,000+ characters, with paste-from-clipboard support. Supports full paragraphs, essays, and letters
99150
- **Style Presets** — "Neat Cursive", "Casual Print", "Rushed Notes", "Doctor's Scrawl", "Elegant Formal"
100151
- **Paper Textures** — Lined, Blank, Graph, Aged Parchment
101152
- **Ink Colors** — Black, Blue, Dark Blue, Sepia
102-
- **Live Canvas Preview** — Animated stroke-by-stroke playback with speed control
103-
- **Export** — PNG (300 DPI), PDF (A4/US Letter), SVG (vector)
153+
- **Live Canvas Preview** — Full page animated stroke-by-stroke playback with speed control
154+
- **Page Preview** — WYSIWYG preview showing exactly how the full document will look on paper
155+
- **Export** — PNG (300 DPI), PDF (A4/US Letter), SVG (vector) — full page output, not just a single line
104156

105157
---
106158

@@ -147,15 +199,85 @@ p₃ = end-of-sequence sentinel
147199
| Layer | Type | Configuration | Purpose |
148200
|-------|------|---------------|---------|
149201
| Input | Embedding | Char one-hot → d=256 | Character encoding |
150-
| Style | Concat | Latent `z ∈ ℝ¹²⁸` | Style injection per timestep |
202+
| Style | Concat | Latent `z ∈ ℝ¹²⁸` | Writer personality vector |
203+
| Position | Encoding | Page position (line #, char position) | Document-level context |
151204
| Encoder L1 | LSTM | hidden=512, dropout=0.2 | Sequence context |
152205
| Encoder L2 | LSTM | hidden=512, dropout=0.2 | Higher-order patterns |
153-
| Encoder L3 | LSTM | hidden=512, dropout=0.2 | Long-range dependencies |
206+
| Encoder L3 | LSTM | hidden=512, dropout=0.2 | Long-range dependencies (cross-line coherence) |
154207
| Output | MDN | M=20 Gaussian mixtures | Stroke distribution sampling |
155208
| Pen State | Bernoulli | Sigmoid × 3 | Pen up/down/end |
156209

157210
**MDN output per timestep:** `(π, μx, μy, σx, σy, ρ, e)` for M=20 components. Temperature `τ` controls generation randomness.
158211

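To make the MDN output concrete, here is a minimal sampling sketch for one timestep. It is an illustration, not the Inkforge implementation: the function name `sample_mdn`, the argument layout, and the temperature convention (sharpening `π` by the exponent `1/τ` and scaling each variance by `τ`) are assumptions borrowed from common stroke-model practice. The pen-state value `e` is left out.

```python
import math
import random


def sample_mdn(pi, mu_x, mu_y, sigma_x, sigma_y, rho, tau=0.4, rng=random):
    """Draw one (dx, dy) pen offset from a bivariate Gaussian mixture.

    pi holds mixture probabilities; the other lists hold per-component
    parameters. tau < 1 makes sampling more conservative (assumed convention).
    """
    # Sharpen the mixture weights: pi_k ** (1 / tau), normalised by choices().
    weights = [p ** (1.0 / tau) for p in pi]
    k = rng.choices(range(len(pi)), weights=weights)[0]

    # Scale the variance by tau, i.e. the standard deviation by sqrt(tau).
    sx = sigma_x[k] * math.sqrt(tau)
    sy = sigma_y[k] * math.sqrt(tau)

    # Correlated bivariate normal built from two independent standard normals.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    dx = mu_x[k] + sx * z1
    dy = mu_y[k] + sy * (rho[k] * z1 + math.sqrt(1.0 - rho[k] ** 2) * z2)
    return dx, dy
```

In the architecture above, each of the six lists would have M=20 entries per timestep.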
212+
**Document-Level Generation Pipeline — How It Actually Works:**
213+
214+
Unlike single-line generators (Calligrapher AI, etc.) that generate each line in isolation and discard all state, Inkforge uses a **4-step pipeline** that maintains writer consistency across the entire document:
215+
216+
#### Step 1 — Smart Text Chunking
217+
218+
The full input text (2,000+ characters) is broken into **individual words or short phrases**. Each chunk becomes a separate inference call to the LSTM, but critically, these calls are **not independent**: state is carried from one call to the next.
219+
220+
```
221+
"Thank you for meeting with us last Thursday."
222+
223+
▼ Tokenizer
224+
["Thank", "you", "for", "meeting", "with", "us", "last", "Thursday."]
225+
```
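A minimal sketch of the chunking step, assuming whitespace-delimited words and a blank line as the paragraph separator. The function name `chunk_text` and the `PARAGRAPH_BREAK` marker are hypothetical and do not come from the Inkforge codebase.

```python
import re

PARAGRAPH_BREAK = None  # hypothetical marker for a paragraph boundary


def chunk_text(text):
    """Split text into word chunks, keeping paragraph boundaries as markers."""
    chunks = []
    # A blank line (optionally containing whitespace) separates paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p]
    for i, paragraph in enumerate(paragraphs):
        if i > 0:
            chunks.append(PARAGRAPH_BREAK)
        # Punctuation stays attached to its word, as in "Thursday.".
        chunks.extend(paragraph.split())
    return chunks
```

Keeping the paragraph marker in the chunk stream lets a later layout step decide on indentation without re-parsing the text.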
226+
227+
#### Step 2 — LSTM State Passing (The Secret Sauce)
228+
229+
This is what makes Inkforge fundamentally different from tools that generate text line by line. When the model generates strokes for "Word 1", the **final LSTM state** (the hidden state `h_t` together with the cell state `c_t`) is captured and used as the **initial state for "Word 2"**.
230+
231+
```
232+
Word 1: "Thank"
233+
LSTM processes → generates strokes → final state h₁
234+
235+
Word 2: "you" │
236+
LSTM starts with h₁ → generates strokes → final state h₂
237+
238+
Word 3: "for" │
239+
LSTM starts with h₂ → generates strokes → final state h₃
240+
241+
... and so on for the entire document
242+
```
243+
244+
**Why this matters:** The hidden state carries all the accumulated "writer personality" — slant tendencies, pressure habits, letter-formation quirks, and fatigue. Every word inherits the full writing history, so word 50 naturally looks like it was written by the same hand that wrote word 1 — just a bit more tired.
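The effect of state passing can be shown with a toy example. The sketch below does not use the real model: `toy_cell` is a hypothetical stand-in for one LSTM step with a `(h, c)` state pair, chosen only so the example runs with the standard library. It shows that carrying the state across word chunks gives the same result as one uninterrupted pass, while resetting the state makes every repeated word come out identical.

```python
import math


def toy_cell(state, char):
    """Stand-in for one LSTM step (not a real LSTM): returns (state, output)."""
    h, c = state
    x = (ord(char) % 32) / 32.0
    c = 0.9 * c + 0.1 * math.tanh(x + h)  # slowly changing "cell" memory
    h = math.tanh(c + 0.5 * x)            # "hidden" state, also the output
    return (h, c), h


def generate_word(word, state):
    """Run the cell over one word chunk, starting from the given state."""
    outputs = []
    for char in word:
        state, out = toy_cell(state, char)
        outputs.append(out)
    return outputs, state


def generate_document(words, carry_state=True):
    """Generate every chunk, optionally passing the final state forward."""
    state = (0.0, 0.0)
    per_word = []
    for word in words:
        if not carry_state:
            state = (0.0, 0.0)  # what a stateless generator effectively does
        outputs, state = generate_word(word, state)
        per_word.append(outputs)
    return per_word
```

With `carry_state=True`, two occurrences of the same word produce different outputs because each starts from a different state. With `carry_state=False` they are identical.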
245+
246+
#### Step 3 — 2D Typewriter Layout Algorithm
247+
248+
A **classical Python layout engine** (outside the ML model) acts like a typewriter to place each generated word on the page:
249+
250+
```
251+
For each generated word chunk:
252+
1. Measure the rendered stroke width of the word
253+
2. If x + word_width > right_margin (the word would overflow the line):
254+
→ Line break: reset x to left_margin + slight_random_offset
255+
→ Shift y down by: line_height + random_noise(±0.5mm)
256+
→ Apply subtle baseline drift to new line
257+
3. Place the word at the current cursor position (x, y)
258+
4. Advance x by: word_width + random_space(base=10px, noise=±3px)
259+
5. If paragraph break detected:
260+
→ Extra y shift (1.2× line height)
261+
→ Apply paragraph indent to x (1.5cm ± natural variation)
262+
```
263+
264+
This keeps all layout logic **explicit and debuggable**, and reproducible when the random seed is fixed, so the ML model does not spend capacity learning where to put spaces and line breaks.
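A minimal sketch of such a typewriter layout, working in pixels. All names (`layout`, `PlacedWord`, `PARAGRAPH_BREAK`, the `measure` callback) and the default numbers are assumptions for illustration. The sketch checks for overflow before placing a word so that no word crosses the right margin, and it treats a paragraph break as a single 1.2× line advance, which is one possible reading of the rule.

```python
import random
from dataclasses import dataclass

PARAGRAPH_BREAK = None  # hypothetical marker for a paragraph boundary


@dataclass
class PlacedWord:
    text: str
    x: float
    y: float


def layout(chunks, measure, left=60.0, right=560.0, line_height=30.0,
           indent=45.0, para_gap=1.2, seed=0):
    """Place word chunks left to right, wrapping at the right margin."""
    rng = random.Random(seed)      # fixed seed gives a reproducible page
    x, y = left + indent, 0.0      # the first paragraph starts indented
    at_line_start = True
    placed = []
    for chunk in chunks:
        if chunk is PARAGRAPH_BREAK:
            # Larger vertical step, then a slightly varied first-line indent.
            y += line_height * para_gap
            x = left + indent + rng.uniform(-2.0, 2.0)
            at_line_start = True
            continue
        width = measure(chunk)
        if not at_line_start and x + width > right:
            # Wrap before placing; margin and line spacing get small noise.
            x = left + rng.uniform(0.0, 2.0)
            y += line_height + rng.uniform(-1.5, 1.5)
        placed.append(PlacedWord(chunk, x, y))
        x += width + 10.0 + rng.uniform(-3.0, 3.0)  # word gap: 10px ± 3px
        at_line_start = False
    return placed
```

In practice `measure` would return the rendered stroke width of a generated word. A word wider than the whole line is still placed at the line start rather than wrapped forever.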
265+
266+
#### Step 4 — Global Baseline Variance (Sine-Wave Drift)
267+
268+
A slow-moving **sinusoidal function** is applied to the y-axis across the entire page, making lines gently curve up and down rather than sitting on perfectly ruled baselines:
269+
270+
```
271+
y_offset(line_n) = A × sin(2π × line_n / period + phase)
272+
273+
Where:
274+
A = amplitude (1–3px) — subtle enough to look natural
275+
period = 8–12 lines — one full wave across ~half a page
276+
phase = random per generation — so no two pages curve the same way
277+
```
278+
279+
This is applied **on top of** the per-line baseline drift from the LSTM, creating two layers of natural variation: the model's own stroke-level jitter, plus a global page-level undulation that mimics how a human's hand position shifts as they write down a page.
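The page-level drift formula translates directly into a few lines of code. This is a sketch under the stated parameter ranges. The function name `baseline_offsets` and its defaults are hypothetical.

```python
import math
import random


def baseline_offsets(n_lines, amplitude_px=2.0, period_lines=10.0,
                     phase=None, rng=random):
    """Return y_offset(line_n) = A * sin(2*pi * line_n / period + phase)."""
    if phase is None:
        # A random phase per generation, so no two pages curve the same way.
        phase = rng.uniform(0.0, 2.0 * math.pi)
    return [
        amplitude_px * math.sin(2.0 * math.pi * n / period_lines + phase)
        for n in range(n_lines)
    ]
```

Each offset would be added to the nominal y position of its line after layout. The offsets never exceed the amplitude in magnitude, so the drift stays within the 1–3px range when `amplitude_px` is chosen from it.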
280+
159281
**Training Data:** [IAM On-Line Handwriting Database](https://fki.tic.heia-fr.ch/databases/iam-on-line-handwriting-database): 13,049 labeled text lines from 221 writers. Writer-level train/val/test split (80/10/10).
160282

161283
### Backend Stack
@@ -275,11 +397,11 @@ Starts all services (API, Celery worker, Redis, frontend) in one command. Recomm
275397

276398
| Version | Milestone | Key Deliverables | Target |
277399
|---------|-----------|-----------------|--------|
278-
| **v1.0** | MVP — Portfolio Launch | LSTM+MDN model, 5 presets, 7 params, PNG/PDF/SVG export, Canvas preview | Week 4 |
279-
| **v1.5** | Style Transfer | CNN reference encoder, handwriting upload, multi-language (Latin) | Week 10 |
280-
| **v2.0** | Diffusion Upgrade | Diffusion backbone, conditional inpainting, quality leap | Week 18 |
281-
| **v2.5** | API & Integrations | Public REST API, Zapier/Make, Lob.com direct mail, bulk endpoint | Week 26 |
282-
| **v3.0** | Enterprise | White-label SDK, custom fine-tuning, on-premise, SLA | Week 40 |
400+
| **v1.0** | MVP — Long-Form Generation | LSTM+MDN model, full-page output, document layout engine, 5 presets, 7 params, fatigue simulation, PNG/PDF/SVG export, Canvas preview | Week 4 |
401+
| **v1.5** | Style Transfer + Upload | CNN reference encoder, handwriting sample upload for custom style cloning, multi-language (Latin) | Week 10 |
402+
| **v2.0** | Diffusion Upgrade | Diffusion backbone, conditional inpainting, quality leap for ultra-long documents | Week 18 |
403+
| **v2.5** | API & Integrations | Public REST API, bulk generation (100+ letters), Zapier/Make, Lob.com direct mail pipeline | Week 26 |
404+
| **v3.0** | Enterprise | White-label SDK, custom fine-tuning on client handwriting, on-premise, SLA | Week 40 |
283405

284406
---
285407

@@ -317,7 +439,15 @@ This project is licensed under the MIT License — see [LICENSE](LICENSE) for de
317439

318440
---
319441

442+
## How We Differ from Existing Tools
443+
444+
> **Calligrapher AI** and similar tools are impressive single-line demos. They generate a short sentence with style controls — and that's where they stop.
445+
>
446+
> **Inkforge generates documents.** Feed it an entire essay and get back a realistic handwritten manuscript — with natural paragraph breaks, margin awareness, writing fatigue, and the kind of page-level coherence that only comes from treating the document as a whole, not as a collection of independent lines.
447+
448+
---
449+
320450
<p align="center">
321451
<strong>Built with ❤️ using PyTorch · FastAPI · React</strong><br/>
322-
<sub>Inkforge — because handwriting should never be a font.</sub>
452+
<sub>Inkforge — because handwriting should never be a font, and a real letter is more than one line.</sub>
323453
</p>

configs/inference.yaml

Lines changed: 16 additions & 1 deletion
@@ -9,9 +9,10 @@ model:
99

1010
# --- Generation Defaults ---
1111
generation:
12-
max_seq_len: 2000 # Max stroke sequence length
12+
max_seq_len: 2000 # Max stroke sequence length per line
1313
temperature: 0.4 # Default sampling temperature τ
1414
batch_size: 1 # Single-sample inference
15+
fatigue_simulation: 0.3 # Progressive degradation (0.0–1.0)
1516

1617
# --- Style Presets ---
1718
# Precomputed style embeddings from clustered IAM writers
@@ -39,3 +40,17 @@ rendering:
3940
default_paper_texture: "lined"
4041
default_font_size: "medium"
4142
stroke_width_base: 1.5
43+
44+
# --- Document Layout (Long-Form Generation) ---
45+
document_layout:
46+
default_page_size: "a4" # a4 | us_letter
47+
margin_left_cm: 2.0 # Left margin (with natural drift ±0.1cm)
48+
margin_right_cm: 1.5 # Right margin
49+
margin_top_cm: 2.5 # Top margin
50+
margin_bottom_cm: 2.0 # Bottom margin
51+
line_spacing_mm: 8.0 # Base inter-line spacing (varies ±0.5mm)
52+
paragraph_indent_cm: 1.5 # First-line indent per paragraph (±natural variation)
53+
paragraph_spacing_multiplier: 1.2 # Vertical space between paragraphs (× line height)
54+
max_lines_per_page: 30 # Maximum lines before page break
55+
margin_drift_enabled: true # Left margin drifts naturally (not ruler-straight)
56+
line_spacing_drift_enabled: true # Spacing loosens toward bottom of page
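As a quick sanity check of these defaults, the sketch below converts the margins to pixels at 300 DPI and counts how many lines fit on an A4 page (21.0 × 29.7 cm). The helper names are hypothetical, and the dictionary simply copies the values from the `document_layout` section above.

```python
A4_CM = (21.0, 29.7)  # A4 width and height in centimetres

# Values copied from the document_layout section (assumed unchanged).
layout_cfg = {
    "margin_left_cm": 2.0,
    "margin_right_cm": 1.5,
    "margin_top_cm": 2.5,
    "margin_bottom_cm": 2.0,
    "line_spacing_mm": 8.0,
    "max_lines_per_page": 30,
}


def cm_to_px(cm, dpi=300):
    """Convert centimetres to whole pixels (1 inch = 2.54 cm)."""
    return round(cm / 2.54 * dpi)


def lines_that_fit(cfg, page_height_cm=A4_CM[1]):
    """Number of full line spacings that fit between top and bottom margins."""
    usable_mm = (page_height_cm - cfg["margin_top_cm"] - cfg["margin_bottom_cm"]) * 10.0
    return int(usable_mm // cfg["line_spacing_mm"])


def text_width_px(cfg, page_width_cm=A4_CM[0], dpi=300):
    """Usable text width in pixels between left and right margins."""
    usable_cm = page_width_cm - cfg["margin_left_cm"] - cfg["margin_right_cm"]
    return cm_to_px(usable_cm, dpi)
```

The usable height is 25.2 cm, which holds 31 full line spacings of 8 mm, so the `max_lines_per_page: 30` cap is consistent with the margins.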
