
Add typography and glyph-group reconstruction track #10

@swernerx


Goal

Add typography / glyph-group reconstruction as a dedicated Morphēa track: preserve letters, words, and text-like runs as coherent SVG groups without requiring OCR or editable font text.

Why this matters

Many diagrams, charts, logos, UI screenshots, and brand marks contain text. Morphēa already has text_like_fragment_group handling for sparse glyph-sized fallback paths, but that handling is still defensive: it identifies text-like fragments only so they are not counted as random structural debt.

A stronger capability would be more useful: if a bitmap contains a word, the output should group the letter shapes together. The letters can remain vector paths. Morphēa does not need to recognize that the word says "Revenue" or "Coca-Cola". It just needs to preserve that these shapes belong together and should move/edit as a unit.
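To make "move/edit as a unit" concrete, here is a minimal sketch of the output shape this implies: glyph outlines stay plain `<path>` elements, nested inside a word `<g>`, nested inside a text-line `<g>`. The function name `build_text_group_svg`, the `id` scheme, and the `class` values are hypothetical illustrations, not existing Morphēa output conventions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise SVG elements without an "ns0:" prefix


def svg_tag(name: str) -> str:
    """Return the namespace-qualified ElementTree tag for an SVG element."""
    return f"{{{SVG_NS}}}{name}"


def build_text_group_svg(lines: list[list[list[str]]], width: int, height: int) -> str:
    """Hypothetical helper: nest glyph paths as line <g> > word <g> > glyph <path>.

    `lines[i][j][k]` is the SVG path data of glyph k in word j of line i.
    """
    svg = ET.Element(svg_tag("svg"), {
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    for li, words in enumerate(lines):
        line_g = ET.SubElement(svg, svg_tag("g"), {"id": f"line-{li}", "class": "text-line"})
        for wi, glyphs in enumerate(words):
            word_g = ET.SubElement(
                line_g, svg_tag("g"), {"id": f"line-{li}-word-{wi}", "class": "text-word"}
            )
            for gi, d in enumerate(glyphs):
                # Each glyph remains ordinary vector geometry; no <text> element is emitted.
                ET.SubElement(word_g, svg_tag("path"), {
                    "id": f"line-{li}-word-{wi}-glyph-{gi}",
                    "class": "glyph",
                    "d": d,
                })
    return ET.tostring(svg, encoding="unicode")


# Usage: one line with a two-glyph word and a one-glyph word (placeholder rectangles).
print(build_text_group_svg([[["M0 0h6v10h-6z", "M7 0h6v10h-6z"], ["M20 0h6v10h-6z"]]], 40, 12))
```

Selecting the word `<g>` in an SVG editor then moves all of its glyph paths together, which is the editability property the issue asks for without any OCR.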

Scope boundaries

  • Do not implement OCR as part of this issue.
  • Do not require converting bitmap text back into live <text> elements.
  • Do not promise font identification.
  • Focus on grouping and editability: glyphs -> word groups -> text-line groups.
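As a baseline for the glyphs -> word groups -> text-line groups hierarchy, the grouping can in principle be recovered from glyph bounding boxes alone, with no text recognition. The sketch below is a simple geometric heuristic (vertical-overlap line grouping, then gap-based word splitting), assuming horizontal, unrotated text and glyph boxes that are already extracted. `Box`, `group_into_lines`, `split_into_words`, and both thresholds are hypothetical; the pipeline below proposes learned models, which would replace these hand-set rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned glyph box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def group_into_lines(boxes: list[Box], min_overlap: float = 0.5) -> list[list[Box]]:
    """Greedily assign glyphs, sorted by vertical centre, to text lines.

    A glyph joins the current line when its vertical overlap with the line's
    extent is at least `min_overlap` times the smaller of the two heights.
    """
    lines: list[list[Box]] = []
    for b in sorted(boxes, key=lambda b: b.cy):
        if lines:
            top = min(g.y0 for g in lines[-1])
            bottom = max(g.y1 for g in lines[-1])
            overlap = min(bottom, b.y1) - max(top, b.y0)
            if overlap >= min_overlap * min(b.height, bottom - top):
                lines[-1].append(b)
                continue
        lines.append([b])
    return lines


def split_into_words(line: list[Box], gap_factor: float = 0.35) -> list[list[Box]]:
    """Split a non-empty line into words wherever the horizontal gap exceeds
    `gap_factor` times the line's median glyph height (upper median for even counts)."""
    glyphs = sorted(line, key=lambda b: b.x0)
    median_h = sorted(g.height for g in glyphs)[len(glyphs) // 2]
    words = [[glyphs[0]]]
    right = glyphs[0].x1  # rightmost edge seen so far, so kerned overlaps do not break words
    for g in glyphs[1:]:
        if g.x0 - right > gap_factor * median_h:
            words.append([g])
        else:
            words[-1].append(g)
        right = max(right, g.x1)
    return words
```

Scaling the word gap by glyph height keeps the rule roughly size-invariant, but condensed fonts, wide tracking, and rotation (all listed below) are exactly where such fixed thresholds are likely to break, which is an argument for learned grouping targets.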

Candidate synthetic data

Use generated text-like samples to cover:

  • Sans-serif uppercase/lowercase words.
  • Serif fonts with thin strokes and small counters.
  • Bold, condensed, italic, and rounded fonts.
  • Single words, short labels, chart legends, button labels, and logo-like wordmarks.
  • Different antialiasing, scale, spacing, kerning, rotation, and contrast.
  • Multi-color or outlined text later, after monochrome grouping is stable.
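One way to cover these variations reproducibly is to sample a rendering configuration per sample from a seeded RNG. The sketch below is monochrome only, matching the "multi-color later" ordering above. `TextSampleSpec`, `sample_spec`, the style and kind labels, and every numeric range are hypothetical placeholders; each style label would map to a local font file in a real generator, and extra tracking is only a crude stand-in for kerning variation.

```python
import random
from dataclasses import dataclass

# Hypothetical style labels; a real generator would map each to a local font file.
STYLES = ("sans-regular", "sans-bold", "sans-condensed", "sans-rounded", "serif-regular", "serif-italic")
KINDS = ("single_word", "short_label", "chart_legend", "button_label", "wordmark")


@dataclass(frozen=True)
class TextSampleSpec:
    style: str
    kind: str
    font_px: int          # nominal font size in output pixels
    scale: float          # extra raster scale applied after layout
    tracking_px: float    # extra letter spacing; crude stand-in for kerning variation
    rotation_deg: float
    antialias: bool
    fg_gray: int          # monochrome text colour (0 = black)
    bg_gray: int          # background colour, kept at least `min_contrast` lighter than the text


def sample_spec(rng: random.Random, min_contrast: int = 80) -> TextSampleSpec:
    """Draw one monochrome text-rendering configuration from a seeded RNG."""
    fg = rng.randint(0, 255 - min_contrast)
    return TextSampleSpec(
        style=rng.choice(STYLES),
        kind=rng.choice(KINDS),
        font_px=rng.choice((10, 12, 14, 18, 24, 36, 48)),
        scale=rng.uniform(0.75, 2.0),
        tracking_px=rng.uniform(-0.5, 2.0),
        rotation_deg=rng.uniform(-15.0, 15.0) if rng.random() < 0.3 else 0.0,
        antialias=rng.random() < 0.8,
        fg_gray=fg,
        bg_gray=rng.randint(fg + min_contrast, 255),
    )
```

Seeding per sample (for example `random.Random(sample_index)`) keeps any failing case regenerable from its index alone.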

Synthetic text research such as SynthText/SynthTIGER shows that generating text images with bounding boxes is a mature approach. Morphēa can use a simpler version: local font rendering with known glyph/word/line boxes and source SVG paths.

Proposed pipeline

  • Add a text/glyph synthetic corpus generator that renders words from local fonts into PNG images plus source SVG and target files.
  • Store target boxes and grouping ids for glyph, word, and line levels.
  • Extend scene metrics to distinguish:
    • isolated glyph-like fallbacks;
    • coherent word groups;
    • coherent text-line groups;
    • accidental merger with boxes, icons, or chart marks.
  • Train/evaluate MLX raster-target models on text grouping targets, not text content.
  • Add curated real-image checks only after synthetic grouping works across several font families.
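For the "target boxes and grouping ids for glyph, word, and line levels" step, a per-sample target record might look like the sketch below. The schema (`GlyphTarget`, field names, JSON Lines output) is a hypothetical assumption, not an existing Morphēa format. Word and line boxes are derived as the union of their glyph boxes, so the three levels stay consistent with each other by construction.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass

BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1) in output pixels


@dataclass(frozen=True)
class GlyphTarget:
    """Hypothetical per-glyph target produced by the synthetic generator."""
    glyph_id: int
    word_id: int     # assumed unique within its line
    line_id: int
    bbox: BBox
    path_d: str      # source SVG path data for this glyph outline


def union(boxes: list[BBox]) -> BBox:
    """Smallest box containing all given boxes."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def to_record(sample_id: str, glyphs: list[GlyphTarget]) -> dict:
    """Build a glyph/word/line target record; group boxes are unions of glyph boxes."""
    words: dict[tuple[int, int], list[BBox]] = defaultdict(list)
    lines: dict[int, list[BBox]] = defaultdict(list)
    for g in glyphs:
        words[(g.line_id, g.word_id)].append(g.bbox)
        lines[g.line_id].append(g.bbox)
    return {
        "sample_id": sample_id,
        "glyphs": [asdict(g) for g in glyphs],
        "words": [
            {"line_id": li, "word_id": wi, "bbox": list(union(bs))}
            for (li, wi), bs in sorted(words.items())
        ],
        "lines": [{"line_id": li, "bbox": list(union(bs))} for li, bs in sorted(lines.items())],
    }


def to_jsonl_line(record: dict) -> str:
    """Serialise one record as a single JSON Lines entry (no embedded newlines)."""
    return json.dumps(record, sort_keys=True)
```

Storing the source path data next to each glyph box lets the same record serve both raster-target training and SVG-level comparison.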

Acceptance criteria

  • Add one synthetic typography smoke with at least sans-serif and serif examples.
  • Reports show text-like group counts, glyph/word grouping quality, and accidental merge/split diagnostics.
  • Existing UI screenshot text-like handling remains green.
  • No OCR dependency is introduced.
  • Documentation clearly says text is preserved as grouped vector geometry, not recognized as editable text.
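The merge/split diagnostics in the report criteria could be computed by comparing predicted groups against target word ids, as in the sketch below. It assumes predicted elements can already be matched to target glyph ids (the matching step is not shown), and any predicted element without a target word is treated as a non-text shape such as a box, icon, or chart mark. `grouping_diagnostics` and its metric names are hypothetical.

```python
from collections import defaultdict


def grouping_diagnostics(
    target_word_of: dict[str, str],
    predicted_groups: dict[str, list[str]],
) -> dict[str, int]:
    """Hypothetical word-level grouping report.

    target_word_of: glyph element id -> target word id.
    predicted_groups: predicted group id -> element ids it contains
        (glyphs, or non-text shapes absent from `target_word_of`).
    """
    target_words: dict[str, set[str]] = defaultdict(set)
    for elem, word in target_word_of.items():
        target_words[word].add(elem)

    group_of = {m: gid for gid, members in predicted_groups.items() for m in members}

    # Split: a target word whose glyphs land in more than one group (ungrouped counts as None).
    split = sum(1 for elems in target_words.values() if len({group_of.get(e) for e in elems}) > 1)

    merged = contaminated = 0
    for members in predicted_groups.values():
        words = {target_word_of[m] for m in members if m in target_word_of}
        if len(words) > 1:
            merged += 1  # one predicted group spans several target words
        if words and any(m not in target_word_of for m in members):
            contaminated += 1  # text group accidentally merged with a non-text shape

    predicted_sets = {frozenset(members) for members in predicted_groups.values()}
    exact = sum(1 for elems in target_words.values() if frozenset(elems) in predicted_sets)

    return {
        "target_words": len(target_words),
        "predicted_groups": len(predicted_groups),
        "exact_word_matches": exact,
        "split_words": split,
        "merged_groups": merged,
        "contaminated_groups": contaminated,
    }
```

Reporting splits, merges, and non-text contamination separately keeps "a word broke into pieces" distinguishable from "a label fused with a chart mark", which call for different fixes.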

Notes

This should pair naturally with diagram and chart cases. Text labels can remain bounded/grouped first; later work can decide whether OCR or live text export is worth adding.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests