# AI-assisted source identification when authoring workflows (#31)

> **Status: PARKED (2026-04-29)**
>
> A new, context-aware Umbraco.AI is in development, and the current surface is acknowledged as the wrong foundation to build on. Building now would mean rework once the new surface ships.
>
> Pick this up when the new Umbraco.AI is available. The first action is the spike (see the planning doc): one hour with the new API will tell us whether this plan still fits.
>
> Prerequisite issues to be created when this unparks:
> 1. Targeted destination fields (`targeted: boolean` flag on `destination.json`).
> 2. Web/markdown source consistency with PDF (invert from include-by-default to identify-via-rules).

---

## The pinch point

Setting up a workflow takes hours, and almost all of that time goes on one thing: **identifying source content**. That means looking at a sample extraction with hundreds of elements, working out which are headings, which are body text and which are noise, and writing rules that pick them out reliably despite font-size variation. The mapping step at the end (this section goes in this field) is trivial in comparison, so any time saved authoring workflows has to come from making source identification faster, not from making mapping faster.

## The deliverable

A button on the Destination tab that does what the workflow author currently does manually: writes rules to `source.json` and mappings to `map.json`. The author marks which destination fields they want populated (a `targeted: true` flag), clicks "find this in the source", and AI proposes the section name, the rules that pick out the source elements, and the wiring to the destination field. The author reviews the proposed JSON edits and accepts or rejects. AI is just another client of the same JSON files the UI already reads and writes. No new format, no new persistence, no new schema.
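A minimal sketch of that flow, under assumed file shapes: only fields flagged `targeted: true` are offered to the AI, and an accepted proposal becomes an ordinary edit to the same data the UI already writes. The interfaces (`DestinationField`, `SourceSection`, `Mapping`, `Proposal`, `WorkflowFiles`) and the helpers `targetedFields` and `applyProposal` are hypothetical; the real `destination.json`, `source.json` and `map.json` schemas are not described in this issue.

```typescript
// Hypothetical shapes: the real destination.json, source.json and map.json
// schemas are not specified here, so these field names are illustrative only.
interface DestinationField {
  alias: string;
  targeted?: boolean; // the proposed `targeted: true` flag
}

interface SourceRule {
  fontSizeRange?: [number, number];
  textPattern?: string;
}

interface SourceSection {
  name: string;
  rules: SourceRule[];
}

interface Mapping {
  section: string;
  field: string;
}

interface WorkflowFiles {
  sections: SourceSection[]; // stands in for source.json
  mappings: Mapping[]; // stands in for map.json
}

// What the AI returns for one field: a named section, its rules, and the wiring.
interface Proposal {
  section: SourceSection;
  mapping: Mapping;
}

// Only fields the author has flagged are eligible for "find this in the source".
function targetedFields(fields: DestinationField[]): DestinationField[] {
  return fields.filter((f) => f.targeted === true);
}

// Accepting a proposal produces new file contents; the inputs are not mutated.
// A section with the same name, or a mapping for the same field, is replaced.
function applyProposal(files: WorkflowFiles, p: Proposal): WorkflowFiles {
  return {
    sections: [...files.sections.filter((s) => s.name !== p.section.name), p.section],
    mappings: [...files.mappings.filter((m) => m.field !== p.mapping.field), p.mapping],
  };
}
```

Because `applyProposal` returns new contents rather than mutating, the review step can diff before and after, and rejecting a proposal is simply not saving the result.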

Per-field button first (one field at a time, small blast radius). Batch "find all targeted fields" button second, sharing the same backend.
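One way to keep both buttons on a single backend is to make the batch a loop over the per-field call. In this sketch the `proposeForField` parameter stands in for a hypothetical per-field service whose real shape depends on the new Umbraco.AI surface; `BatchResult` and `proposeForAllTargeted` are likewise illustrative names.

```typescript
// Result for one field in a batch run: either a proposal or an error message.
interface BatchResult<P> {
  field: string;
  proposal?: P;
  error?: string;
}

// Batch is just a loop over the per-field call, so both buttons share one backend.
// Fields are processed one at a time; a failure on one field does not abort the rest.
async function proposeForAllTargeted<P>(
  fieldAliases: string[],
  proposeForField: (alias: string) => Promise<P>, // hypothetical per-field service
): Promise<BatchResult<P>[]> {
  const results: BatchResult<P>[] = [];
  for (const field of fieldAliases) {
    try {
      results.push({ field, proposal: await proposeForField(field) });
    } catch (e) {
      results.push({ field, error: e instanceof Error ? e.message : String(e) });
    }
  }
  return results;
}
```

Sequential calls keep cost and rate limits predictable and return results in field order; `Promise.all` would be faster but would fire every request at once. Either way, the per-field path remains the only code that talks to the AI.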

## Why this works with what we already have

AI complements the deterministic toolkit differently per source type. **Web** is the sharpest pain: there is no spatial picker for the DOM, and traversing it to find the right nodes is exactly the tedious task LLMs are good at. **PDF** is augmentation: the area picker, container overrides and column detection already let the author narrow the search region, and AI identifies elements within that pre-narrowed region with sensible tolerance (`fontSizeRange`, not `fontSizeEquals`). **Markdown** is the lightest touch: content is already structured, so AI is nice-to-have rather than load-bearing. Build order follows the pain: web first.
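To illustrate why a range is the sensible tolerance: extraction can report slightly different font sizes for what is visually the same heading style. The `PdfElement` and `FontRule` shapes below are hypothetical stand-ins for the package's rule schema, and the sizes used in the example (17.9, 18 and 18.2 pt) are made up for illustration.

```typescript
// Hypothetical element and rule shapes; the package's real rule schema may differ.
interface PdfElement {
  text: string;
  fontSize: number;
}

type FontRule =
  | { kind: "fontSizeEquals"; value: number }
  | { kind: "fontSizeRange"; min: number; max: number };

// Exact equality is brittle against small extraction drift; a range absorbs it.
function matches(el: PdfElement, rule: FontRule): boolean {
  if (rule.kind === "fontSizeEquals") {
    return el.fontSize === rule.value;
  }
  return el.fontSize >= rule.min && el.fontSize <= rule.max; // inclusive bounds
}

const elements: PdfElement[] = [
  { text: "Overview", fontSize: 17.9 },
  { text: "Details", fontSize: 18 },
  { text: "Summary", fontSize: 18.2 },
  { text: "Body paragraph", fontSize: 10 },
];

const exact = elements.filter((e) => matches(e, { kind: "fontSizeEquals", value: 18 }));
const tolerant = elements.filter((e) => matches(e, { kind: "fontSizeRange", min: 17.5, max: 18.5 }));
```

Here `exact` contains only the 18 pt element, while `tolerant` picks up all three headings and still excludes the 10 pt body text.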

## Runtime is unchanged

AI is a one-shot authoring assistant. Once the workflow is committed, content editors using "Create from Source" hit the same deterministic pipeline as today: no AI calls during document creation, no API key required to run a workflow, no behavioural drift between runs. The runtime stays fast, predictable, fully repeatable and fully auditable. AI only ever runs in Settings, and only during workflow setup.
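A minimal sketch of what "runtime is unchanged" means in code: running a committed workflow is a pure function of the committed rules, the mappings and the extracted source, with no AI client anywhere in its signature. The shapes, the `runWorkflow` name and the single font-size-range rule type are simplifying assumptions, not the package's actual pipeline.

```typescript
// Hypothetical committed-workflow shapes; field names are illustrative.
interface SourceElement {
  text: string;
  fontSize: number;
}

interface CommittedSection {
  name: string;
  minFontSize: number;
  maxFontSize: number;
}

interface CommittedMapping {
  section: string;
  field: string;
}

// Pure function of committed rules plus source: no AI client parameter, so no
// API key or network call is involved, and the same inputs give the same output.
function runWorkflow(
  elements: SourceElement[],
  sections: CommittedSection[],
  mappings: CommittedMapping[],
): Record<string, string> {
  const values: Record<string, string> = {};
  for (const m of mappings) {
    const section = sections.find((s) => s.name === m.section);
    if (!section) continue; // mapping points at a missing section: leave the field unset
    values[m.field] = elements
      .filter((e) => e.fontSize >= section.minFontSize && e.fontSize <= section.maxFontSize)
      .map((e) => e.text)
      .join("\n");
  }
  return values;
}
```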

## Scope

- New `targeted: true` flag on destination fields *(prerequisite, separate issue)*.
- Web/markdown consistency change to match PDF's identify-via-rules model *(prerequisite, separate issue)*.
- Per-field "find this" button on Destination tab.
- Batch "find all targeted fields" button on Destination tab.
- Review UI showing proposed JSON edits before commit.
- Optional dependency on Umbraco.AI; package installs and runs without it.
- Web source first, PDF second, markdown third.
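For the optional dependency and the review gate, one approach is to treat the AI provider as an optional part of the authoring context, show the new buttons only when it is present, and never write a proposal without an explicit decision. `AiProvider`, `AuthoringContext`, `showFindButtons`, `ReviewDecision` and `commit` are hypothetical names; in practice the availability check might live server-side (for example, whether an Umbraco.AI service is registered) rather than in the backoffice.

```typescript
// Hypothetical provider interface; the real Umbraco.AI API is not yet known.
interface AiProvider {
  propose(fieldAlias: string): Promise<string>;
}

interface AuthoringContext {
  ai?: AiProvider; // undefined when Umbraco.AI is not installed
}

// Only the "find this" buttons depend on the provider; the rest of the
// Destination tab behaves the same with or without it.
function showFindButtons(ctx: AuthoringContext): boolean {
  return ctx.ai !== undefined;
}

type ReviewDecision = "accept" | "reject";

// No auto-commit: the proposed contents replace the current ones only on "accept".
function commit<T>(current: T, proposed: T, decision: ReviewDecision): T {
  return decision === "accept" ? proposed : current;
}
```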

## Out of scope

- Any AI involvement at content-creation time.
- New persistence formats or AI-specific config files.
- Auto-commit without review.

See `planning/AI_SOURCE_IDENTIFICATION.md` for full design detail, phasing, risks, and open questions.
