Add typography and glyph-group reconstruction track #10

## Goal
Add typography / glyph-group reconstruction as a dedicated Morphēa track: preserve letters, words, and text-like runs as coherent SVG groups without requiring OCR or editable font text.

## Why this matters
Many diagrams, charts, logos, UI screenshots, and brand marks contain text. Morphēa already has `text_like_fragment_group` handling for sparse glyph-sized fallback paths, but the current story is still defensive: identify text-like fragments so they do not count as random structural debt.

A stronger capability would go further: if a bitmap contains a word, the output should group its letter shapes together. The letters can remain vector paths. Morphēa does not need to recognize that the word says "Revenue" or "Coca-Cola". It only needs to preserve the fact that these shapes belong together and should move and edit as a unit.

## Scope boundaries
- Do not implement OCR as part of this issue.
- Do not require converting bitmap text back into live `<text>` elements.
- Do not promise font identification.
- Focus on grouping and editability: glyphs -> word groups -> text-line groups.
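The glyph -> word -> text-line hierarchy could be expressed as nested SVG `<g>` elements. The following is a minimal sketch using only the Python standard library; the `data-role` attribute, the id scheme, and the function name `build_text_line_group` are illustrative assumptions, not existing Morphēa conventions.

```python
import xml.etree.ElementTree as ET


def build_text_line_group(line_id, words):
    """Nest glyph paths inside word groups inside one text-line group.

    words: list of (word_id, [path_d, ...]) with one path string per glyph.
    Attribute names and ids are hypothetical, chosen for illustration only.
    """
    line = ET.Element("g", {"id": line_id, "data-role": "text-line"})
    for word_id, glyph_paths in words:
        word = ET.SubElement(line, "g", {"id": word_id, "data-role": "word"})
        for i, d in enumerate(glyph_paths):
            # Each glyph stays a plain vector path; no <text> element is emitted.
            ET.SubElement(
                word,
                "path",
                {"id": f"{word_id}-g{i}", "data-role": "glyph", "d": d},
            )
    return line


# Two words on one line: the first has two glyphs, the second has one.
line = build_text_line_group(
    "line0",
    [
        ("line0-w0", ["M0 0h4v10h-4z", "M6 0h4v10h-4z"]),
        ("line0-w1", ["M16 0h4v10h-4z"]),
    ],
)
svg_fragment = ET.tostring(line, encoding="unicode")
```

Because the grouping lives entirely in `<g>` nesting, an SVG editor can select or move a word or a whole line as one unit without any knowledge of what the text says.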

## Candidate synthetic data
Use generated text-like samples to cover:

- Sans-serif uppercase/lowercase words.
- Serif fonts with thin strokes and small counters.
- Bold, condensed, italic, and rounded fonts.
- Single words, short labels, chart legends, button labels, and logo-like wordmarks.
- Different antialiasing, scale, spacing, kerning, rotation, and contrast.
- Multi-color or outlined text later, after monochrome grouping is stable.

Synthetic text research such as SynthText and SynthTIGER shows that generating text images with bounding-box annotations is a mature approach. Morphēa can use a simpler version: local font rendering with known glyph, word, and line boxes plus the source SVG paths.
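One possible shape for the per-sample targets is sketched below, assuming each rendered glyph carries a pixel box and its word and line ids. The record layout and the names `GlyphTarget`, `union_bbox`, and `word_boxes` are hypothetical; the sketch stores grouping ids only and deliberately no characters, in line with the no-OCR scope.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class GlyphTarget:
    # Hypothetical target record: grouping ids and a box, no text content.
    glyph_id: int
    word_id: int
    line_id: int
    bbox: tuple  # (x0, y0, x1, y1) in pixels


def union_bbox(boxes):
    """Smallest box that encloses every box in the list."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def word_boxes(glyphs):
    """Derive word-level boxes from glyph-level targets."""
    by_word = {}
    for g in glyphs:
        by_word.setdefault(g.word_id, []).append(g.bbox)
    return {word_id: union_bbox(b) for word_id, b in by_word.items()}


glyphs = [
    GlyphTarget(0, 0, 0, (10, 20, 18, 40)),
    GlyphTarget(1, 0, 0, (20, 24, 27, 40)),
    GlyphTarget(2, 1, 0, (40, 20, 49, 40)),
]
boxes = word_boxes(glyphs)
# Serialized form that a generator could write next to the PNG and source SVG.
record = json.dumps({"glyphs": [asdict(g) for g in glyphs]})
```

Storing only glyph-level records and deriving word and line boxes from them keeps the three levels consistent by construction.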

## Proposed pipeline
- Add a text/glyph synthetic corpus generator that renders words from local fonts into PNG plus source SVG/targets.
- Store target boxes and grouping ids for glyph, word, and line levels.
- Extend scene metrics to distinguish:
  - isolated glyph-like fallbacks;
  - coherent word groups;
  - coherent text-line groups;
  - accidental merger with boxes, icons, or chart marks.
- Train/evaluate MLX raster-target models on text grouping targets, not text content.
- Add curated real-image checks only after synthetic grouping works across several font families.
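The scene-metric categories listed above could be derived by comparing each predicted group against the synthetic targets. This is a rough sketch under assumed inputs: every fragment in a predicted group is labeled either `("text", word_id, line_id)` or `("other", kind)`, and the category names are placeholders rather than existing Morphēa metric keys.

```python
from collections import Counter


def classify_group(member_labels):
    """Classify one predicted group from the target labels of its fragments.

    member_labels: list of ("text", word_id, line_id) or ("other", kind).
    The label format and category names are assumptions for illustration.
    """
    text = [m for m in member_labels if m[0] == "text"]
    other = [m for m in member_labels if m[0] != "text"]
    if text and other:
        return "accidental_merge"  # glyphs fused with a box, icon, or chart mark
    if not text:
        return "non_text"
    if len(text) == 1:
        return "isolated_glyph"
    if len({m[1] for m in text}) == 1:
        return "word_group"
    if len({m[2] for m in text}) == 1:
        return "text_line_group"
    return "cross_line_merge"


groups = [
    [("text", 0, 0), ("text", 0, 0), ("text", 0, 0)],
    [("text", 1, 0)],
    [("text", 2, 0), ("text", 3, 0)],
    [("text", 4, 1), ("other", "box")],
]
counts = Counter(classify_group(g) for g in groups)
```

This sketch only checks that a group is pure; a `word_group` that covers part of a word is still a split, which would need a separate completeness check against the target word's glyph count.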

## Acceptance criteria
- Add one synthetic typography smoke test with at least one sans-serif and one serif example.
- Reports show text-like group counts, glyph/word grouping quality, and accidental merge/split diagnostics.
- Existing UI screenshot text-like handling remains green.
- No OCR dependency is introduced.
- Documentation clearly says text is preserved as grouped vector geometry, not recognized as editable text.
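One way to report grouping quality together with merge/split diagnostics is a pairwise score over glyphs: precision drops when glyphs from different words are merged, and recall drops when one word is split. The sketch below is a generic pairwise clustering metric, not an existing Morphēa report field, and it assumes the target and predicted assignments cover the same glyph ids.

```python
from itertools import combinations


def same_group_pairs(assignment):
    """Set of unordered glyph pairs that share a group.

    assignment: dict mapping glyph_id -> group_id.
    """
    return {
        frozenset(pair)
        for pair in combinations(sorted(assignment), 2)
        if assignment[pair[0]] == assignment[pair[1]]
    }


def pairwise_grouping_scores(target, predicted):
    """Return (precision, recall) over same-group glyph pairs."""
    t = same_group_pairs(target)
    p = same_group_pairs(predicted)
    hit = len(t & p)
    precision = hit / len(p) if p else 1.0  # low precision: accidental merges
    recall = hit / len(t) if t else 1.0  # low recall: accidental splits
    return precision, recall


# Target: a three-glyph word and a two-glyph word.
target = {0: "w0", 1: "w0", 2: "w0", 3: "w1", 4: "w1"}
# Prediction: glyph 2 was split from its word and merged into the next one.
predicted = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
precision, recall = pairwise_grouping_scores(target, predicted)
```

The same function can be applied at the line level by passing line ids instead of word ids as the group assignment.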

## Notes
This should pair naturally with diagram and chart cases. Text labels can remain bounded/grouped first; later work can decide whether OCR or live text export is worth adding.

