Status: PARKED (2026-04-29)
A new context-aware Umbraco.AI is in development and the current surface is acknowledged as not the right foundation to build against. Building now would mean rework when the new surface ships.
Pick this up when the new Umbraco.AI is available. First action is the spike (see planning doc). One hour with the new API will tell us whether this plan still fits.
Prerequisite issues to be created when this unparks:
- Targeted destination fields (targeted: boolean flag on destination.json).
- Web/markdown source consistency with PDF (invert from include-by-default to identify-via-rules).
The pinch point
Setting up a workflow takes hours, and almost all of that time is spent on one thing: identifying source content. That means looking at a sample extraction with hundreds of elements, working out which ones are headings vs body vs noise, and writing rules that pick them out reliably across font-size variation. The mapping at the end (this section goes in this field) is trivial in comparison. Any time saved authoring workflows comes from making source identification faster, not from making mapping faster.
The deliverable
A button on the Destination tab that does what the workflow author currently does manually: writes rules to source.json and mappings to map.json. The author marks which destination fields they want populated (a targeted: true flag), clicks "find this in the source", and AI proposes the section name, the rules that pick out the source elements, and the wiring to the destination field. The author reviews the proposed JSON edits and accepts or rejects. AI is just another client of the same JSON files the UI already reads and writes. No new format, no new persistence, no new schema.
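A minimal sketch of the propose, review, commit loop, to make the "just another client of the same JSON files" point concrete. The real shapes of source.json and map.json are not given here, so the `sections` and `mappings` keys, the `Proposal` fields, and `apply_proposal` are all hypothetical names used for illustration only.

```python
import copy
from dataclasses import dataclass


@dataclass
class Proposal:
    """One AI suggestion for a single targeted destination field (hypothetical shape)."""
    destination_field: str   # alias of the destination field to populate
    section_name: str        # proposed name for the source section
    rules: list              # proposed rules that pick out the source elements


def apply_proposal(source: dict, mapping: dict, proposal: Proposal, accepted: bool):
    """Return (source, mapping) after review. Nothing changes unless accepted."""
    if not accepted:
        # Rejected: hand back the original documents untouched.
        return source, mapping

    # Accepted: work on copies so the caller's in-memory JSON is never mutated.
    new_source = copy.deepcopy(source)
    new_mapping = copy.deepcopy(mapping)

    # Assumed layout: source.json holds named sections, each with its rules.
    new_source.setdefault("sections", []).append(
        {"name": proposal.section_name, "rules": proposal.rules}
    )
    # Assumed layout: map.json wires a section name to a destination field.
    new_mapping.setdefault("mappings", []).append(
        {"source": proposal.section_name, "destination": proposal.destination_field}
    )
    return new_source, new_mapping
```

The point of the sketch is that a proposal is plain data the author can read before anything is written, and that rejecting it is a no-op.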
Per-field button first (one field at a time, small blast radius). Batch "find all targeted fields" button second, sharing the same backend.
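One way the batch button could share the per-field backend is to treat it as a loop over the targeted fields. This is a sketch under assumptions: the `fields` and `alias` keys in destination.json and the `find_one` callable standing in for the per-field backend are hypothetical. Only the `targeted` flag comes from this plan.

```python
def targeted_fields(destination: dict) -> list:
    """Aliases of the destination fields the author has marked targeted: true."""
    return [
        f["alias"]
        for f in destination.get("fields", [])
        if f.get("targeted") is True
    ]


def find_all_targeted(destination: dict, find_one) -> dict:
    """Batch mode: call the same per-field backend once per targeted field.

    find_one(alias) is assumed to return a proposal, or None when nothing
    suitable was found in the source.
    """
    proposals = {}
    for alias in targeted_fields(destination):
        result = find_one(alias)
        if result is not None:
            proposals[alias] = result
    return proposals
```

Each proposal still goes through the same review step, so batch mode widens the blast radius only in how many proposals the author sees at once.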
Why this works with what we already have
AI complements the deterministic toolkit differently per source type. Web is the sharpest pain: there is no spatial picker for the DOM, and traversing it to find the right nodes is exactly the tedious task LLMs are good at. PDF is augmentation: the area picker, container overrides and column detection already let the author narrow the search region, and AI identifies elements within that pre-narrowed region with sensible tolerance (fontSizeRange, not fontSizeEquals). Markdown is the lightest touch: content is already structured, so AI is nice-to-have rather than load-bearing. Build order follows the pain: web first.
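To illustrate why a range is the sensible tolerance, here is a small sketch of how the two conditions might behave against extracted elements whose font sizes vary slightly. The rule and element shapes (a `fontSize` key, a two-item `[low, high]` list for `fontSizeRange`) are assumptions, not the actual rule format.

```python
def matches_font_size(element: dict, rule: dict) -> bool:
    """Check an element against a font-size condition (hypothetical rule shape)."""
    size = element["fontSize"]
    if "fontSizeEquals" in rule:
        # Exact match: brittle when extraction reports 17.9 in one place and 18.0 in another.
        return size == rule["fontSizeEquals"]
    if "fontSizeRange" in rule:
        # Inclusive range: tolerates small variation between documents.
        low, high = rule["fontSizeRange"]
        return low <= size <= high
    return True  # no font-size condition on this rule


elements = [
    {"text": "Intro", "fontSize": 17.9},
    {"text": "Scope", "fontSize": 18.0},
    {"text": "Body copy", "fontSize": 11.0},
]

exact = [e["text"] for e in elements if matches_font_size(e, {"fontSizeEquals": 18.0})]
ranged = [e["text"] for e in elements if matches_font_size(e, {"fontSizeRange": [17.5, 18.5]})]
```

With these sample values, the exact rule picks up only "Scope", while the range rule picks up both headings and still excludes the body text.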
Runtime is unchanged
AI is a one-shot authoring assistant. Once the workflow is committed, content editors using "Create from Source" hit the same deterministic pipeline as today. No AI calls per document creation, no API key required to run a workflow, no behavioural drift between runs. The runtime stays fast, predictable, fully repeatable, fully auditable. AI only ever runs in Settings, only ever during workflow setup.
Scope
- New targeted: true flag on destination fields (prerequisite, separate issue).
- Web/markdown consistency change to match PDF's identify-via-rules model (prerequisite, separate issue).
- Per-field "find this" button on Destination tab.
- Batch "find all targeted fields" button on Destination tab.
- Review UI showing proposed JSON edits before commit.
- Optional dependency on Umbraco.AI; package installs and runs without it.
- Web source first, PDF second, markdown third.
Out of scope
- Any AI involvement at content-creation time.
- New persistence formats or AI-specific config files.
- Auto-commit without review.
See planning/AI_SOURCE_IDENTIFICATION.md for full design detail, phasing, risks, and open questions.