diff --git a/CHANGELOG.md b/CHANGELOG.md
index 50547de..ab47b85 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,12 +6,31 @@ All notable changes to RESYNTH are documented here. The format follows
 
 ## [Unreleased]
 
-### Changed
-- Default model for the Claude Code operator is now Fable 5
-  (`claude-fable-5`), Anthropic's newest and fastest top-tier model;
-  override per workspace in `operator.yaml` as before.
+## [0.2.0] - 2026-06-12
 
 ### Added
+- Source resolution (`resynth resolve`): follows links and file references
+  found inside ingested sources and registers what it fetches as new first
+  class sources with provenance. Five target kinds: html articles, pdf
+  links, local files, YouTube videos and Vimeo videos. Public video
+  captions become timestamped transcripts.
+- Transcript pending stubs: a video without public captions still becomes
+  a real source, and a later successful fetch upgrades the stub in place,
+  keeping its source id.
+- The resolution manifest at `index/resolution.jsonl`: records every
+  target and its outcome so re-runs are idempotent. Fetched and duplicate
+  targets are never retried; failed and pending ones are.
+- Schema v2 source frontmatter: `source_type`, `url`, `resolved_from` and,
+  for video sources, `transcript_status`.
+- Optional `source_locator` on claims, a deep link into the source built
+  from a url, page, timestamp or anchor, validated by `extract-verify`.
+- `resynth migrate`: explicit upgrade of a project's sources to schema v2.
+  Bodies and content hashes are untouched; re-sealing stays a separate
+  operator step.
+- `resynth --version`.
+- MASTER.json format `resynth-master/2` with a sources array, plus a
+  `load_master` reader that accepts both `/1` and `/2`.
+- The guided wizard offers source resolution straight after intake.
 - Completion ping: when a delegated AI step runs longer than 90 seconds,
   RESYNTH plays a sound and shows a desktop notification (Windows toast /
   macOS notification) when it finishes, and again when the master document
@@ -21,6 +40,13 @@ All notable changes to RESYNTH are documented here. The format follows
   the assistant's output streamed as it arrives, instead of a silent prompt
   until completion.
 
+### Changed
+- Default model for the Claude Code operator is now Fable 5
+  (`claude-fable-5`), Anthropic's newest and fastest top-tier model;
+  override per workspace in `operator.yaml` as before.
+- The MASTER.md source register gains Type and Link columns, so every
+  source's kind and origin url are visible in the sealed master.
+
 ### Fixed
 - Sealing failed with "paths are ignored by one of your .gitignore files"
   in workspaces where `projects/*` is gitignored (any workspace cloned from
diff --git a/DECISIONS.md b/DECISIONS.md
index b3f985f..ea92335 100644
--- a/DECISIONS.md
+++ b/DECISIONS.md
@@ -27,3 +27,9 @@ Every architectural decision with a one line rationale.
 - The wired assistant defaults to Claude Code with claude-opus-4-8 at high reasoning effort, adjustable with resynth operator.
 - Delegated operator steps verify against the stage gate and retry up to three times with the gate reasons fed back, then fall back to manual mode.
 - The brief step asks whether reports already exist and skips prompt generation when they do, consolidation only is a first class flow.
+- resolve is a stage 01 verb that re-evaluates the intake gate, not a sixth gate, so already sealed projects never grow a phantom PENDING gate.
+- Fetchers use only the Python standard library (urllib, html.parser, xml.etree), so the four dependency decision holds.
+- Video transcripts are best effort: a pending stub is a real source that upgrades in place and keeps its source id, so claim ids built on it stay stable.
+- Source schema v2 is adopted only through an explicit resynth migrate, never silently, and re-sealing afterwards stays an operator act.
+- MASTER.json format resynth-master/2 ships together with a load_master reader that accepts both /1 and /2, so downstream consumers never break on old exports.
+- Resolution is depth one by design: fetched sources are not scanned for further links unless the operator forces a re-scan with --source.
diff --git a/README.md b/README.md
index 7857c77..ce70980 100644
--- a/README.md
+++ b/README.md
@@ -78,11 +78,47 @@ agent then drives the pipeline below, and the result is one BEST master
 document, readable by humans, verifiable by machine, and exportable as JSON
 for a downstream AI agent to action.
 
+After intake there is one optional extra step. RESYNTH offers to fetch the
+links and file references found inside your reports and register them as
+extra sources, so the things your reports cite become evidence too.
+
 ```
 chat -> brief -> per platform prompts -> research reports -> intake ->
-extract -> reconcile -> synthesise -> audit -> seal -> MASTER.md + MASTER.json
+resolve (optional) -> extract -> reconcile -> synthesise -> audit -> seal ->
+MASTER.md + MASTER.json
+```
+
+## Fetching linked sources
+
+Research reports cite things. `resynth resolve <project>` follows those
+citations and turns them into first class sources of their own. It scans
+every ingested report for links and file references, fetches each one, and
+registers the result with provenance back to the report that mentioned it.
+It handles html articles, pdf links, local files, and YouTube and Vimeo
+videos. Public video captions become timestamped transcripts.
+
+When a video has no public captions, resolve still creates the source as a
+pending transcript stub. You can paste a transcript into the stub yourself,
+or re-run resolve later to retry. A later successful fetch upgrades the stub
+in place and keeps the same source id.
+
+Resolution is idempotent. Every outcome is recorded in
+`index/resolution.jsonl`, fetched and duplicate targets are never fetched
+twice, and failed or pending targets are retried on the next run. Fetching
+is polite: robots.txt is honoured, requests to the same host are spaced one
+second apart, and responses are capped at 10 MiB. Resolution goes one level
+deep only. Fetched sources are not scanned for further links unless you
+force a re-scan with `--source`.
+
+```
+resynth resolve myproject
+resynth resolve myproject --only youtube    # just targets matching a substring
+resynth resolve myproject --source S03      # re-scan one source, even a fetched one
 ```
 
+The full reference, including the manifest format, the source schema and
+the migration guide, lives in [docs/SOURCE-RESOLUTION.md](docs/SOURCE-RESOLUTION.md).
+
 ## Install
 
 ```
@@ -135,6 +171,7 @@ scripts/run_demo.py and in the end to end test.
 resynth init <project>            create project skeleton plus default merge-rules.yaml
 resynth brief <project> --topic   capture the research question, generate the prompt workspace
 resynth intake <project> --source <file> ...   stage 1, repeatable per file
+resynth resolve <project>         fetch links and file references inside sources as new first class sources
 resynth extract <project>         stage 2 workspace generation
 resynth extract-verify <project>  stage 2 gate
 resynth reconcile <project>       stage 3, also evaluates the gate
@@ -144,6 +181,7 @@ resynth audit <project>           stage 5 coverage, drift, traceability
 resynth seal <project>            hash everything, commit SEAL.yaml, tag the repo
 resynth export <project>          machine readable output/MASTER.json for agents
 resynth status <project>          gate dashboard
+resynth migrate <project>         upgrade a project's sources to the current schema (v2)
 resynth operator                  show or set the wired AI assistant, model and effort
 resynth doctor                    environment probe
 ```
@@ -173,6 +211,11 @@ Then run: resynth extract-verify <project> --json and fix every violation
 until the gate reports PASS.
 ```
 
+Each claim may also carry an optional `source_locator`, a deep link into the
+source built from a url, a page number, a timestamp or an anchor. The full
+claim and source schemas live in
+[docs/SOURCE-RESOLUTION.md](docs/SOURCE-RESOLUTION.md).
+
 ### Agent prompt for stage 3, reconciliation
 
 ```
diff --git a/docs/SOURCE-RESOLUTION.md b/docs/SOURCE-RESOLUTION.md
new file mode 100644
index 0000000..ce4b362
--- /dev/null
+++ b/docs/SOURCE-RESOLUTION.md
@@ -0,0 +1,280 @@
+# RESYNTH Source Resolution
+
+> [!abstract] Purpose
+> The full reference for `resynth resolve` and everything it touches: the
+> resolution flow, the manifest, transcript handling, source frontmatter
+> schema v2, the claim `source_locator`, MASTER.json formats and the
+> migration guide for pre 0.2.0 projects.
+
+## What resolve does
+
+Research reports cite things, and `resynth resolve <project>` turns those
+citations into evidence. It scans every ingested source for links and file
+references, fetches each one over the network or from disk, and registers
+the result as a new first class source with provenance back to the source
+that mentioned it. Fetched sources carry the same frontmatter, content hash
+and gate checks as any hand ingested report, so everything downstream of
+intake treats them identically. Resolve is a stage 1 verb. It re-evaluates
+gate 01-intake when it finishes and adds no gate of its own.
+
+## The resolution flow
+
+1. **Discover.** Every source without a `resolved_from` parent is scanned.
+   Targets come from markdown link destinations, bare urls and backtick
+   spans. A local path is only accepted when it has a supported suffix
+   (.md .txt .docx .pdf) and the file actually exists, either as an
+   absolute path or relative to the folder the parent source came from.
+   Nothing else is guessed.
+2. **Classify.** Each target becomes one of four kinds: `youtube` and
+   `vimeo` by hostname, `local` for an existing file on disk, and `url`
+   for every other http or https link.
+3. **Fetch.** The matching fetcher retrieves the content. Web fetching
+   respects the etiquette rules below. Failures are recorded, never fatal.
+4. **Register.** The fetched content goes through the same registration as
+   intake: it is hashed, deduplicated against every existing source, given
+   the next free source id, and written to `sources/` with schema v2
+   frontmatter including `resolved_from` set to the parent source id.
+5. **Manifest.** Every outcome is written to `index/resolution.jsonl` so
+   the next run knows what to skip and what to retry. Gate 01-intake is
+   then re-evaluated.
+
+Resolution is depth one by design. Fetched sources are never scanned for
+further links on a normal run. To go deeper deliberately, name the fetched
+source explicitly: `resynth resolve <project> --source S04`.
+
+## Supported targets
+
+| Target | What is fetched | Resulting source_type |
+| --- | --- | --- |
+| html page | readable article text, taken from the main or article region with navigation and boilerplate dropped | html-article |
+| pdf link | the pdf body converted with pdftotext (detected by content type or a .pdf path) | pdf |
+| local file | the file converted exactly as intake would convert it | pdf for .pdf, notes otherwise |
+| YouTube video | the public caption track as a timestamped transcript, English preferred | video-transcript |
+| Vimeo video | the public text track (WebVTT) as a timestamped transcript, English preferred | video-transcript |
+
+An html page that yields fewer than 200 characters of text fails with
+`page yielded no extractable text (login wall or script rendered)`. Any
+other content type fails with `unsupported content type`.
+
+## Network etiquette
+
+| Rule | Value |
+| --- | --- |
+| User agent | `resynth/<version> (+https://github.com/Markus-Doc/resynth) research consolidation tool` |
+| robots.txt | honoured per host, a disallowed url fails with `disallowed by robots.txt` |
+| Rate limit | at most one request per second to the same host |
+| Timeout | 30 seconds per request |
+| Size cap | 10 MiB per response, larger responses fail with `response exceeds 10 MiB` |
+
+Fetching uses only the Python standard library. There are no extra
+dependencies and no API keys.
+
+## The resolution manifest
+
+`index/resolution.jsonl` holds one JSON object per discovered target.
+Lines starting with `#` are comments. The target string is the key, so a
+target keeps a single record across runs.
+
+| Field | Meaning |
+| --- | --- |
+| target | the discovered url or absolute local path |
+| kind | url, local, youtube or vimeo |
+| status | fetched, duplicate, transcript_pending or failed |
+| source_id | the source the target became, null when no source exists yet |
+| resolved_from | the source id the target was discovered in |
+| sha256 | content hash of the registered body, null on failure |
+| fetched_at | ISO date the record last changed |
+| note | the short failure reason, null otherwise |
+
+Example line:
+
+```
+{"target": "https://example.com/articles/pipeline-reliability", "kind": "url", "status": "fetched", "source_id": "S04", "resolved_from": "S01", "sha256": "3f8c0d2ab1...", "fetched_at": "2026-06-12", "note": null}
+```
+
+Retry semantics are simple. `fetched` and `duplicate` are terminal: those
+targets are reported as cached and never fetched again. `failed` and
+`transcript_pending` are retried on every run. A re-run that changes
+nothing rewrites each record byte for byte, including its original
+`fetched_at` date, so unchanged projects stay diff clean.
+
+## Transcript handling
+
+For videos, resolve tries the platform's public caption sources. On
+YouTube that is the timedtext caption listing, preferring a track whose
+language code starts with `en`, otherwise the first track. On Vimeo it is
+the player text tracks fetched as WebVTT, with the same English preference.
+Cues become a `## Transcript` section of timestamped lines, for example
+`[00:14:32] ...`, with a paragraph break wherever there is a gap of more
+than eight seconds between cues.
+
+When no public captions exist, the video still becomes a real source: a
+pending stub with `transcript_status: pending` and this body.
+
+```
+# {title}
+
+> [!info] Video transcript pending
+> RESYNTH could not retrieve a public caption track for this video.
+> Link: {url}
+> Re-run `resynth resolve <project>` to retry, or paste the transcript
+> below this callout. The next resolve run can also upgrade this stub.
+```
+
+Re-running resolve retries pending stubs. When captions have appeared, the
+stub is upgraded in place: the same file gains the fetched transcript, a
+fresh `sha256` and `transcript_status: fetched`, while `source_id`,
+`date_ingested`, `recency_rank` and `resolved_from` are preserved. Because
+the source id never changes, any claim ids already extracted against the
+stub stay stable.
+
+To paste a transcript yourself, open the stub under `sources/`, paste the
+transcript below the callout, set `transcript_status: fetched` and update
+the `sha256` field to the SHA-256 hex digest of the new body (everything
+after the closing `---` line). Gate 01 reports `sha256 does not match body
+content` until the hash is correct, so the gate tells you when it is done.
+A wired AI assistant can make these edits for you. Note that a later
+resolve run will replace a pasted body if the platform fetch succeeds, so
+remove the manifest line for that target if you want your paste to stand.
+
+To force a refresh of a target already recorded as `fetched` or
+`duplicate`, delete its line from `index/resolution.jsonl` and remove the
+fetched source file from `sources/`, then re-run resolve. The target is
+discovered again and fetched fresh under a new source id. Deleting only
+the manifest line is not enough: the re-fetch would deduplicate against
+the existing file and record a `duplicate`.
+
+## Source frontmatter schema v2
+
+Every source written by RESYNTH 0.2.0 carries these fields.
+
+| Field | Since | Type | Meaning |
+| --- | --- | --- | --- |
+| source_id | v1 | string SNN | stable id, S01, S02 and so on |
+| title | v1 | string | first heading of the body, or the file or page title |
+| origin | v1 | string | the path or url the content came from |
+| author_or_tool | v1 | string | author, channel or generating tool, unknown when unstated |
+| date_authored | v1 | string | ISO date when known, otherwise unknown |
+| date_ingested | v1 | string | ISO date the source entered the project |
+| authority_tier | v1 | enum | primary, secondary, tertiary or unknown |
+| recency_rank | v1 | integer | intake order, used as a tie breaker |
+| sha256 | v1 | string | SHA-256 of the body, verified by gate 01 and the audit |
+| schema_version | v2 | integer | always 2 |
+| source_type | v2 | enum | one of the source types below |
+| url | v2 | string or null | the canonical url for fetched web content |
+| resolved_from | v2 | string or null | the source id this source was resolved out of |
+| transcript_status | v2 | enum | fetched or pending, present only on video-transcript sources |
+
+`source_type` is one of: `report`, `html-article`, `pdf`,
+`video-transcript`, `webinar`, `study-notes`, `dataset`, `notes`, `other`.
+A `resolved_from` value must name a source that exists in the project.
+
+A full example, a Vimeo transcript resolved out of report S01:
+
+```
+---
+source_id: S03
+title: Designing Reliable Pipelines
+origin: https://vimeo.com/76979871
+author_or_tool: Conference Channel
+date_authored: '2024-03-18'
+date_ingested: '2026-06-12'
+authority_tier: unknown
+recency_rank: 3
+sha256: 3f8c0d2ab1e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0
+schema_version: 2
+source_type: video-transcript
+url: https://vimeo.com/76979871
+resolved_from: S01
+transcript_status: fetched
+---
+# Designing Reliable Pipelines
+
+## Transcript
+
+[00:00:00] Welcome to the talk.
+```
+
+## Claim source_locator
+
+Claims may carry one optional field beyond the required schema: a
+`source_locator` object that deep links the claim into its source.
+
+| Key | Type | Meaning |
+| --- | --- | --- |
+| url | non empty string | the url the claim is anchored to |
+| page | positive integer | page number, for pdf sources |
+| timestamp | string H:MM or HH:MM:SS | position in a video transcript |
+| anchor | non empty string | a section slug or fragment identifier |
+
+Validation rules, enforced by `resynth extract-verify`:
+
+- `source_locator` must be an object with at least one of the four keys.
+- No other keys are allowed.
+- Each present key must match the type rules above.
+- A claim against a video-transcript source without a timestamp draws a
+  warning, not a failure.
+- A locator url that differs from the source's own `url` draws a warning.
+
+Example claim line:
+
+```
+{"claim_id": "S03-C002", "source_id": "S03", "claim_text": "Retry queues should cap at three attempts before alerting.", "claim_type": "recommendation", "topic_tags": ["reliability"], "supporting_quote_location": "Transcript at 14:32", "confidence_as_stated": "high", "depends_on": [], "source_locator": {"url": "https://vimeo.com/76979871", "timestamp": "00:14:32"}}
+```
+
+## MASTER.json formats
+
+`resynth export` writes format `resynth-master/2`. The only difference
+from `resynth-master/1` is a top level `sources` array carrying every
+source's frontmatter in a uniform v2 shape, sorted by source id. Sources
+that were never migrated appear with `schema_version` 1 and defaults of
+`source_type` report, `url` null and `resolved_from` null.
+
+Downstream consumers should read the file through `load_master`, which
+accepts both formats:
+
+```python
+from pathlib import Path
+from resynth.export import load_master
+
+master = load_master(Path("projects/myproject/output/MASTER.json"))
+master["format_version"]   # 1 or 2
+master["sources"]          # always present, empty for a /1 file
+```
+
+Any other format tag raises an error rather than guessing.
+
+## Migration guide
+
+Projects sealed before 0.2.0 hold schema v1 sources. They keep working as
+they are. Gate 01 reports a warning, not a failure, and suggests the
+migration. Upgrading is always an explicit act:
+
+```
+resynth migrate <project>
+```
+
+What migrate changes: each v1 source's frontmatter gains
+`schema_version: 2`, a `source_type` (pdf when the origin ends in .pdf,
+otherwise report), `url: null` and `resolved_from: null`.
+
+What it never touches: source bodies, the stored `sha256` (it hashes the
+body only, so it stays valid), claims, the index, the output, the seal
+file and the git tags. Migration is idempotent: sources already on v2 are
+reported as `already schema v2` and left alone.
+
+One consequence needs care. The seal hashes whole files, frontmatter
+included, so after migration the existing `SEAL.yaml` no longer matches
+the source files. The sealed git tag still pins the old state exactly.
+Re-sealing is deliberately left to the operator, because a seal is a
+statement that a human or agent verified the project at that point. The
+worked sequence for a sealed project is:
+
+```
+resynth migrate myproject --dry-run    # see what would change
+resynth migrate myproject
+resynth audit myproject
+resynth seal myproject                 # produces the next version tag
+```
+
+The new tag pins the migrated state and the old tag remains as history.
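
Note on the retry semantics described in docs/SOURCE-RESOLUTION.md above: the rules can be sketched in a few lines. This is an illustration only, not the code in `resynth.resolve`. The names `parse_manifest` and `plan_run` are hypothetical, and "the last record for a target wins" is an assumption, since the doc only states that a target keeps a single record across runs.

```python
import json

# Status groups taken from the "Retry semantics" paragraph of the doc.
TERMINAL = {"fetched", "duplicate"}            # reported as cached, never fetched again
RETRYABLE = {"failed", "transcript_pending"}   # retried on every run


def parse_manifest(text: str) -> dict[str, dict]:
    """Parse resolution.jsonl text into {target: record}.

    Blank lines and lines starting with '#' are skipped. If a target
    appears twice, the later record wins (an assumption of this sketch).
    """
    records: dict[str, dict] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rec = json.loads(line)
        records[rec["target"]] = rec
    return records


def plan_run(records: dict[str, dict], discovered: list[str]) -> tuple[list[str], list[str]]:
    """Split discovered targets into (skip, fetch), preserving input order."""
    skip, fetch = [], []
    for target in discovered:
        rec = records.get(target)
        if rec is not None and rec.get("status") in TERMINAL:
            skip.append(target)
        else:
            # Unknown targets and retryable statuses are both fetched.
            fetch.append(target)
    return skip, fetch
```

Keying on the target string is what makes a re-run idempotent: a target that is already `fetched` lands in `skip` no matter how many reports mention it.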
diff --git a/pyproject.toml b/pyproject.toml
index 4c3558e..427562d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "resynth"
-version = "0.1.0"
+version = "0.2.0"
 description = "CLI research consolidation platform built on systematic review gates"
 readme = "README.md"
 requires-python = ">=3.11"
diff --git a/src/resynth/__init__.py b/src/resynth/__init__.py
index b04e0f8..cb0f565 100644
--- a/src/resynth/__init__.py
+++ b/src/resynth/__init__.py
@@ -4,4 +4,4 @@
 AI agent, supplies judgement. RESYNTH has zero runtime AI dependency.
 """
 
-__version__ = "0.1.0"
+__version__ = "0.2.0"
diff --git a/src/resynth/cli.py b/src/resynth/cli.py
index b3b225e..e6e7266 100644
--- a/src/resynth/cli.py
+++ b/src/resynth/cli.py
@@ -10,13 +10,16 @@
 from rich.console import Console
 from rich.table import Table
 
+from . import __version__
 from . import audit as audit_mod
 from . import config
 from . import doctor as doctor_mod
 from . import export as export_mod
 from . import extract as extract_mod
+from . import migrate as migrate_mod
 from . import project as project_mod
 from . import reconcile as reconcile_mod
+from . import resolve as resolve_mod
 from . import synthesise as synth_mod
 from .errors import ResynthError
 from .gates import all_gates
@@ -105,6 +108,7 @@ def main(self, *args, standalone_mode=True, **kwargs):
 
 
 @click.group(cls=_GuardedGroup, invoke_without_command=True)
+@click.version_option(version=__version__, prog_name="resynth")
 @click.pass_context
 def main(ctx):
     """RESYNTH, research consolidation with systematic review gates.
@@ -151,6 +155,34 @@ def intake(project, sources, as_json, dry_run):
     _run("intake", project, as_json, dry_run, run_intake, project, list(sources), dry_run=dry_run)
 
 
+@main.command()
+@click.argument("project")
+@click.option("--source", "source_ids", multiple=True, help="Scan only these source ids (allows re-scanning resolved sources).")
+@click.option("--only", default=None, help="Only targets containing this substring.")
+@common
+def resolve(project, source_ids, only, as_json, dry_run):
+    """Fetch links and file references inside sources as new first class sources."""
+    _run(
+        "resolve",
+        project,
+        as_json,
+        dry_run,
+        resolve_mod.run_resolve,
+        project,
+        only=only,
+        source_ids=list(source_ids) or None,
+        dry_run=dry_run,
+    )
+
+
+@main.command()
+@click.argument("project")
+@common
+def migrate(project, as_json, dry_run):
+    """Upgrade a project's sources to the current schema (v2). Re-seal is a separate step."""
+    _run("migrate", project, as_json, dry_run, migrate_mod.run_migrate, project, dry_run=dry_run)
+
+
 @main.command()
 @click.argument("project")
 @common
diff --git a/src/resynth/export.py b/src/resynth/export.py
index 83d3b9f..e649d72 100644
--- a/src/resynth/export.py
+++ b/src/resynth/export.py
@@ -1,14 +1,37 @@
-"""Machine readable export of the sealed master for downstream AI agents."""
+"""Machine readable export of the sealed master for downstream AI agents.
+
+Downstream consumers read the file back with :func:`load_master`, which
+accepts both the resynth-master/1 and resynth-master/2 payload formats.
+"""
 
 from __future__ import annotations
 
 import json
+from pathlib import Path
 
 from . import config
+from .errors import ResynthError
 from .fsutil import safe_write
 from .gates import require_gate
+from .intake import load_sources
 from .synthesise import _plan, _split_sections
 
+FORMAT_V1 = "resynth-master/1"
+FORMAT_V2 = "resynth-master/2"
+
+
+def _export_sources(pdir: Path) -> list[dict]:
+    """Source frontmatter dicts in a uniform v2 shape, sorted by source_id."""
+    out = []
+    for fm in load_sources(pdir):
+        src = {k: v for k, v in fm.items() if k not in {"_file", "_body"}}
+        src.setdefault("schema_version", 1)
+        src.setdefault("source_type", "report")
+        src.setdefault("url", None)
+        src.setdefault("resolved_from", None)
+        out.append(src)
+    return sorted(out, key=lambda s: s["source_id"])
+
 
 def run_export(project: str, dry_run: bool = False) -> dict:
     pdir = config.project_dir(project)
@@ -20,8 +43,9 @@ def run_export(project: str, dry_run: bool = False) -> dict:
     ]
     payload = {
         "project": project,
-        "format": "resynth-master/1",
+        "format": FORMAT_V2,
         "sections": sections,
+        "sources": _export_sources(pdir),
         "claims": sorted(plan["claims"].values(), key=lambda c: c["claim_id"]),
         "decisions": plan["decisions"],
         "winning_claims": plan["winners"],
@@ -36,3 +60,17 @@ def run_export(project: str, dry_run: bool = False) -> dict:
         "events": [{"file": "MASTER.json", "action": outcome}],
         "messages": [f"output/MASTER.json: {outcome}"],
     }
+
+
+def load_master(path: Path) -> dict:
+    """Read a MASTER.json of format resynth-master/1 or /2."""
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    tag = data.get("format") if isinstance(data, dict) else None
+    if tag == FORMAT_V1:
+        data.setdefault("sources", [])
+        data["format_version"] = 1
+    elif tag == FORMAT_V2:
+        data["format_version"] = 2
+    else:
+        raise ResynthError(f"unsupported master format {tag}")
+    return data
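
Note on the `sources` array that `load_master` guarantees: a downstream consumer could use it to rebuild which sources were resolved out of which report. The sketch below assumes only the documented `source_id` and `resolved_from` fields. `children_by_parent` is a hypothetical helper, not part of `resynth.export`, and it takes the already loaded list so it runs without a project on disk.

```python
from collections import defaultdict


def children_by_parent(sources: list[dict]) -> dict[str, list[str]]:
    """Map each parent source id to the sorted ids resolved out of it.

    Sources with resolved_from set to None (hand ingested reports, or
    anything from a resynth-master/1 file) have no parent and are skipped.
    """
    tree: dict[str, list[str]] = defaultdict(list)
    for src in sources:
        parent = src.get("resolved_from")
        if parent is not None:
            tree[parent].append(src["source_id"])
    return {parent: sorted(ids) for parent, ids in tree.items()}


# Illustrative data in the v2 shape; the ids are made up for this example.
example_sources = [
    {"source_id": "S01", "source_type": "report", "resolved_from": None},
    {"source_id": "S03", "source_type": "video-transcript", "resolved_from": "S01"},
    {"source_id": "S02", "source_type": "html-article", "resolved_from": "S01"},
]
example_tree = children_by_parent(example_sources)
```

With a real export the input would be `load_master(path)["sources"]`. For a `/1` file that list is empty, so the result is an empty dict rather than an error.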
diff --git a/src/resynth/extract.py b/src/resynth/extract.py
index 555dffb..6341c8f 100644
--- a/src/resynth/extract.py
+++ b/src/resynth/extract.py
@@ -28,6 +28,9 @@
     "confidence_as_stated",
     "depends_on",
 }
+OPTIONAL_FIELDS = {"source_locator"}
+LOCATOR_KEYS = {"url", "page", "timestamp", "anchor"}
+TIMESTAMP_RE = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")
 COVERAGE_MIN_BYTES = 2048
 COVERAGE_MIN_CLAIMS = 3
 
@@ -61,6 +64,8 @@ def _workspace_header(sid: str) -> str:
         f"# One JSON object per line. Lines starting with # are ignored.\n"
         f"# Schema template, copy the line below, remove the leading #, fill it in:\n"
         f"# {example}\n"
+        f'# optional: "source_locator": {{"url": "https://...", "page": 12, '
+        f'"timestamp": "00:14:32", "anchor": "section-slug"}}\n'
     )
 
 
@@ -94,10 +99,32 @@ def run_extract(project: str, dry_run: bool = False) -> dict:
     }
 
 
+def _validate_locator(loc) -> list[str]:
+    if not isinstance(loc, dict):
+        return ["source_locator must be an object"]
+    errors = []
+    if not loc:
+        errors.append("source_locator must have at least one of url, page, timestamp, anchor")
+    errors.extend(f"unknown source_locator key {k}" for k in sorted(loc.keys() - LOCATOR_KEYS))
+    if "url" in loc and (not isinstance(loc["url"], str) or not loc["url"].strip()):
+        errors.append("source_locator.url must be a non-empty string")
+    if "page" in loc and (
+        not isinstance(loc["page"], int) or isinstance(loc["page"], bool) or loc["page"] < 1
+    ):
+        errors.append("source_locator.page must be a positive integer")
+    if "timestamp" in loc and (
+        not isinstance(loc["timestamp"], str) or not TIMESTAMP_RE.match(loc["timestamp"])
+    ):
+        errors.append("source_locator.timestamp must look like H:MM or HH:MM:SS")
+    if "anchor" in loc and (not isinstance(loc["anchor"], str) or not loc["anchor"].strip()):
+        errors.append("source_locator.anchor must be a non-empty string")
+    return errors
+
+
 def validate_claim(obj: dict, sid: str) -> list[str]:
     errors = []
     missing = REQUIRED_FIELDS - obj.keys()
-    extra = obj.keys() - REQUIRED_FIELDS
+    extra = obj.keys() - REQUIRED_FIELDS - OPTIONAL_FIELDS
     errors.extend(f"missing field {f}" for f in sorted(missing))
     errors.extend(f"unknown field {f}" for f in sorted(extra))
     if missing or extra:
@@ -134,6 +161,8 @@ def validate_claim(obj: dict, sid: str) -> list[str]:
         isinstance(d, str) and CLAIM_ID_RE.match(d) for d in deps
     ):
         errors.append("depends_on must be a list of claim ids in SNN-CNNN format")
+    if "source_locator" in obj:
+        errors.extend(_validate_locator(obj["source_locator"]))
     return errors
 
 
@@ -168,6 +197,8 @@ def run_extract_verify(project: str, dry_run: bool = False) -> dict:
             reasons.append(f"{sid}: claims file missing, run resynth extract")
             continue
         count = 0
+        src_type = fm.get("source_type")
+        src_url = fm.get("url")
         for lineno, _raw, obj, err in iter_jsonl(path):
             where = f"{path.name}:{lineno}"
             if err:
@@ -181,6 +212,12 @@ def run_extract_verify(project: str, dry_run: bool = False) -> dict:
                     reasons.append(f"{where}: duplicate claim_id {cid}, first seen {seen_ids[cid]}")
                 else:
                     seen_ids[cid] = where
+            loc = obj.get("source_locator")
+            loc = loc if isinstance(loc, dict) else {}
+            if src_type == "video-transcript" and not loc.get("timestamp"):
+                warnings.append(f"{cid}: video source claim without a timestamp locator")
+            if src_url and loc.get("url") and loc["url"] != src_url:
+                warnings.append(f"{cid}: locator url does not match the source url")
             count += 1
             all_claims.append(obj)
         claims_by_source[sid] = count
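
Note on `TIMESTAMP_RE`: the pattern is copied verbatim from the hunk above, and the sample strings below are illustrative. The sketch shows what the check accepts as written. It constrains the shape of the value only, not its range, and because it is applied with `match` and a `$` anchor, Python also lets a single trailing newline through.

```python
import re

# Copied from src/resynth/extract.py in this diff.
TIMESTAMP_RE = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")

samples = [
    "14:32",       # two fields
    "00:14:32",    # three fields
    "1:05",        # single leading digit is allowed
    "99:99",       # shape is valid, range is not checked
    "1:5",         # second field needs two digits
    "100:00",      # first field is capped at two digits
    "00:14:32.5",  # fractional seconds are not part of the pattern
    "14:32 ",      # trailing space
]
accepted = [s for s in samples if TIMESTAMP_RE.match(s)]

# With match(), '$' also matches just before a final newline.
newline_passes_match = TIMESTAMP_RE.match("14:32\n") is not None
# fullmatch() would reject the same input.
newline_passes_fullmatch = TIMESTAMP_RE.fullmatch("14:32\n") is not None
```

Whether the stricter behaviour matters depends on how claim lines reach the validator. Values parsed from one JSON object per line are unlikely to carry a trailing newline, so this is an observation rather than a reported bug.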
diff --git a/src/resynth/intake.py b/src/resynth/intake.py
index 83be453..aa3c1a7 100644
--- a/src/resynth/intake.py
+++ b/src/resynth/intake.py
@@ -29,6 +29,22 @@
 AUTHORITY_TIERS = {"primary", "secondary", "tertiary", "unknown"}
 SUPPORTED = {".md", ".txt", ".docx", ".pdf"}
 
+SCHEMA_VERSION = 2
+SOURCE_TYPES = {
+    "report",
+    "html-article",
+    "pdf",
+    "video-transcript",
+    "webinar",
+    "study-notes",
+    "dataset",
+    "notes",
+    "other",
+}
+TRANSCRIPT_STATUSES = {"fetched", "pending"}
+RESOLVED_FROM_RE = re.compile(r"^S\d{2}$")
+V2_FIELDS = ["schema_version", "source_type", "url", "resolved_from"]
+
 
 def slugify(name: str) -> str:
     slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
@@ -76,12 +92,71 @@ def load_sources(pdir: Path) -> list[dict]:
     return out
 
 
-def _frontmatter_block(fm: dict) -> str:
+def frontmatter_block(fm: dict) -> str:
+    """Render the YAML frontmatter body with keys in canonical order."""
     import yaml
 
-    ordered = {k: fm[k] for k in FRONTMATTER_FIELDS}
-    block = yaml.safe_dump(ordered, sort_keys=False, allow_unicode=True, default_flow_style=False)
-    return f"---\n{block}---\n"
+    keys = [*FRONTMATTER_FIELDS, *V2_FIELDS]
+    if "transcript_status" in fm:
+        keys.append("transcript_status")
+    ordered = {k: fm[k] for k in keys if k in fm}
+    return yaml.safe_dump(ordered, sort_keys=False, allow_unicode=True, default_flow_style=False)
+
+
+def register_source(
+    pdir: Path,
+    body: str,
+    *,
+    title: str,
+    origin: str,
+    source_type: str = "report",
+    url: str | None = None,
+    resolved_from: str | None = None,
+    author_or_tool: str = "unknown",
+    date_authored: str = "unknown",
+    transcript_status: str | None = None,
+    dry_run: bool = False,
+) -> dict:
+    """Number, dedup and write a source file with schema-v2 frontmatter."""
+    digest = sha256_text(body)
+    existing = load_sources(pdir)
+    for prior in existing:
+        if prior.get("sha256") == digest:
+            return {
+                "action": "duplicate",
+                "source_id": prior["source_id"],
+                "file": prior["_file"],
+                "sha256": digest,
+            }
+    numbers = [
+        int(m.group(1))
+        for f in (pdir / "sources").glob("S*.md")
+        if (m := re.match(r"^S(\d+)", f.name))
+    ]
+    n = max(numbers, default=0) + 1
+    sid = f"S{n:02d}"
+    fm = {
+        "source_id": sid,
+        "title": title,
+        "origin": origin,
+        "author_or_tool": author_or_tool,
+        "date_authored": date_authored,
+        "date_ingested": date.today().isoformat(),
+        "authority_tier": "unknown",
+        "recency_rank": n,
+        "sha256": digest,
+        "schema_version": SCHEMA_VERSION,
+        "source_type": source_type,
+        "url": url,
+        "resolved_from": resolved_from,
+    }
+    if transcript_status is not None:
+        fm["transcript_status"] = transcript_status
+    dest = pdir / "sources" / f"{sid}-{slugify(title)}.md"
+    if dry_run:
+        return {"action": "dry-run", "source_id": sid, "file": dest.name, "sha256": digest}
+    safe_write(dest, f"---\n{frontmatter_block(fm)}---\n" + body, pdir)
+    return {"action": "created", "source_id": sid, "file": dest.name, "sha256": digest}
 
 
 def _title_of(body: str, fallback: str) -> str:
@@ -91,35 +166,72 @@ def _title_of(body: str, fallback: str) -> str:
     return fallback
 
 
+def _check_v2(fm: dict, known_ids: set) -> list[str]:
+    problems = []
+    stype = fm.get("source_type")
+    if stype and stype not in SOURCE_TYPES:
+        problems.append(f"invalid source_type '{stype}'")
+    for key in ("url", "resolved_from"):
+        if key not in fm:
+            problems.append(f"missing frontmatter field {key}")
+    ref = fm.get("resolved_from")
+    if ref is not None:
+        if not isinstance(ref, str) or not RESOLVED_FROM_RE.match(ref):
+            problems.append(f"invalid resolved_from '{ref}'")
+        elif ref not in known_ids:
+            problems.append(f"resolved_from references unknown source {ref}")
+    if "transcript_status" in fm:
+        if fm["transcript_status"] not in TRANSCRIPT_STATUSES:
+            problems.append(f"invalid transcript_status '{fm['transcript_status']}'")
+        if stype != "video-transcript":
+            problems.append("transcript_status only allowed for video-transcript sources")
+    return problems
+
+
 def check_intake_gate(pdir: Path, dry_run: bool = False) -> dict:
     reasons: list[str] = []
+    warnings: list[str] = []
     checks: dict = {"sources": {}}
     sources = load_sources(pdir)
+    known_ids = {fm.get("source_id") for fm in sources}
     if not sources:
         reasons.append("no sources ingested")
+    legacy = 0
     for fm in sources:
         sid = fm.get("source_id", fm["_file"])
         problems = []
-        for field in FRONTMATTER_FIELDS:
+        version = fm.get("schema_version")
+        if "schema_version" not in fm:
+            legacy += 1
+        elif version != SCHEMA_VERSION:
+            problems.append(f"unsupported schema_version {version}")
+        required = list(FRONTMATTER_FIELDS)
+        if version == SCHEMA_VERSION:
+            required.append("source_type")
+        for field in required:
             if field not in fm or fm[field] in (None, ""):
                 problems.append(f"missing frontmatter field {field}")
         tier = fm.get("authority_tier")
         if tier and tier not in AUTHORITY_TIERS:
             problems.append(f"invalid authority_tier '{tier}'")
+        if version == SCHEMA_VERSION:
+            problems.extend(_check_v2(fm, known_ids))
         actual = sha256_text(fm["_body"])
         if fm.get("sha256") != actual:
             problems.append("sha256 does not match body content")
         checks["sources"][sid] = "ok" if not problems else problems
         reasons.extend(f"{sid}: {p}" for p in problems)
+    if legacy:
+        warnings.append(
+            f"{legacy} source(s) use the pre 0.2.0 schema, run: resynth migrate {pdir.name}"
+        )
     checks["source_count"] = len(sources)
-    return write_gate(pdir, "01-intake", reasons, checks, dry_run=dry_run)
+    return write_gate(pdir, "01-intake", reasons, checks, warnings=warnings, dry_run=dry_run)
 
 
 def run_intake(project: str, source_paths: list[str], dry_run: bool = False) -> dict:
     pdir = config.project_dir(project)
-    existing = load_sources(pdir)
-    by_hash = {fm["sha256"]: fm["source_id"] for fm in existing}
-    next_n = len(existing) + 1
+    by_hash = {fm["sha256"]: fm["source_id"] for fm in load_sources(pdir)}
     events = []
     for raw in source_paths:
         src = Path(raw)
@@ -141,23 +253,18 @@ def run_intake(project: str, source_paths: list[str], dry_run: bool = False) ->
                 }
             )
             continue
-        sid = f"S{next_n:02d}"
-        fm = {
-            "source_id": sid,
-            "title": _title_of(body, src.stem),
-            "origin": str(src),
-            "author_or_tool": "unknown",
-            "date_authored": "unknown",
-            "date_ingested": date.today().isoformat(),
-            "authority_tier": "unknown",
-            "recency_rank": next_n,
-            "sha256": digest,
-        }
-        dest = pdir / "sources" / f"{sid}-{slugify(src.stem)}.md"
-        outcome = safe_write(dest, _frontmatter_block(fm) + body, pdir, dry_run=dry_run)
-        events.append({"source": src.name, "action": outcome, "source_id": sid})
-        by_hash[digest] = sid
-        next_n += 1
+        result = register_source(
+            pdir,
+            body,
+            title=_title_of(body, src.stem),
+            origin=str(src),
+            source_type="pdf" if src.suffix.lower() == ".pdf" else "report",
+            dry_run=dry_run,
+        )
+        events.append(
+            {"source": src.name, "action": result["action"], "source_id": result["source_id"]}
+        )
+        by_hash[digest] = result["source_id"]
     gate = check_intake_gate(pdir, dry_run=dry_run)
     return {
         "ok": gate["status"] == "PASS",
diff --git a/src/resynth/migrate.py b/src/resynth/migrate.py
new file mode 100644
index 0000000..ef5adc9
--- /dev/null
+++ b/src/resynth/migrate.py
@@ -0,0 +1,67 @@
+"""Project upgrader for RESYNTH v0.2.0. Rewrites schema v1 source
+frontmatter to schema v2 in place, preserving each source body byte for
+byte so the stored content hashes remain valid."""
+
+from __future__ import annotations
+
+from . import config, intake
+from .errors import ResynthError
+from .fsutil import parse_frontmatter, safe_write
+
+
+def run_migrate(project: str, dry_run: bool = False) -> dict:
+    """Upgrade every schema v1 source in a project to schema v2.
+
+    Adds schema_version, source_type, url and resolved_from to the
+    frontmatter and leaves the body untouched, so the stored content hash
+    stays valid. Sources already on v2 are left unchanged, which makes
+    the command idempotent. The seal is never touched here; re-sealing
+    is a separate operator act. Returns ok, gate, events and messages.
+    """
+    pdir = config.project_dir(project)
+    files = sorted((pdir / "sources").glob("S*.md"))
+    if not files:
+        raise ResynthError("no sources to migrate, run resynth intake first")
+    events: list[dict] = []
+    messages: list[str] = []
+    upgraded = 0
+    for f in files:
+        fm, body = parse_frontmatter(f.read_text(encoding="utf-8"), f.name)
+        sid = fm.get("source_id", f.name)
+        if "schema_version" in fm:
+            events.append({"source": sid, "action": "unchanged"})
+            messages.append(f"{sid}: already schema v2")
+            continue
+        origin = str(fm.get("origin", ""))
+        stype = "pdf" if origin.lower().endswith(".pdf") else "report"
+        fm["schema_version"] = intake.SCHEMA_VERSION
+        fm["source_type"] = stype
+        fm["url"] = None
+        fm["resolved_from"] = None
+        content = f"---\n{intake.frontmatter_block(fm)}---\n" + body
+        outcome = safe_write(f, content, pdir, dry_run=dry_run)
+        events.append({"source": sid, "action": outcome, "source_type": stype})
+        if outcome == "dry-run":
+            messages.append(f"{sid}: would upgrade to schema v2 (source_type {stype})")
+        else:
+            upgraded += 1
+            messages.append(f"{sid}: upgraded to schema v2 (source_type {stype})")
+    if upgraded:
+        messages.extend(
+            [
+                "Frontmatter has changed, so the existing seal no longer matches these files.",
+                "The sealed git tag still pins the old state.",
+                f"When you are ready, re-seal with: resynth audit {project} "
+                f"then resynth seal {project}",
+            ]
+        )
+    gate = None
+    if not dry_run:
+        gate = intake.check_intake_gate(pdir)
+        messages.append(f"gate 01-intake: {gate['status']}")
+    return {
+        "ok": True if dry_run else gate["status"] == "PASS",
+        "gate": gate,
+        "events": events,
+        "messages": messages,
+    }
diff --git a/src/resynth/reconcile.py b/src/resynth/reconcile.py
index 70e3a4d..4cbec11 100644
--- a/src/resynth/reconcile.py
+++ b/src/resynth/reconcile.py
@@ -58,6 +58,20 @@ def _candidates(claims: list[dict]) -> list[dict]:
     return out
 
 
+def _locator_hint(claim: dict) -> str:
+    """Short deep link hint for the claims index, empty when absent."""
+    loc = claim.get("source_locator")
+    if not isinstance(loc, dict):
+        return ""
+    if loc.get("timestamp"):
+        return f" @ {loc['timestamp']}"
+    if loc.get("page"):
+        return f" p. {loc['page']}"
+    if loc.get("anchor"):
+        return f" #{loc['anchor']}"
+    return ""
+
+
 def _claims_index_md(project: str, claims: list[dict]) -> str:
     by_tag: dict[str, list[dict]] = {}
     for c in claims:
@@ -77,7 +91,7 @@ def _claims_index_md(project: str, claims: list[dict]) -> str:
         for c in sorted(by_tag[tag], key=lambda c: c["claim_id"]):
             lines.append(
                 f"- {c['claim_id']} ({c['source_id']}, {c['claim_type']}, "
-                f"confidence {c['confidence_as_stated']}) {c['claim_text']}"
+                f"confidence {c['confidence_as_stated']}){_locator_hint(c)} {c['claim_text']}"
             )
         lines.append("")
     return "\n".join(lines)
diff --git a/src/resynth/resolve/__init__.py b/src/resynth/resolve/__init__.py
new file mode 100644
index 0000000..e410b25
--- /dev/null
+++ b/src/resynth/resolve/__init__.py
@@ -0,0 +1,278 @@
+"""Source resolution, a stage 1 verb. Follow links inside ingested
+sources, fetch the linked material and register it as new sources with
+provenance. Resolution re-evaluates gate 01-intake rather than adding a
+gate of its own.
+
+Outcomes are tracked in index/resolution.jsonl so re-runs are cheap and
+byte identical when nothing changed.
+"""
+
+from __future__ import annotations
+
+import json
+from datetime import date
+from pathlib import Path
+
+from .. import config, intake
+from ..errors import ResynthError
+from ..fsutil import iter_jsonl, parse_frontmatter, safe_write, sha256_text
+from .discover import discover_targets
+from .fetchers import classify_target, fetch_local, fetch_url, fetch_vimeo, fetch_youtube
+from .net import FetchError
+
+MANIFEST = "resolution.jsonl"
+
+_HEADER = (
+    "# RESYNTH source resolution manifest. One JSON object per line records a\n"
+    "# discovered link target and its outcome: fetched, duplicate,\n"
+    "# transcript_pending or failed. Maintained by `resynth resolve`."
+)
+_KEYS = ["target", "kind", "status", "source_id", "resolved_from", "sha256", "fetched_at", "note"]
+_FETCHERS = {"local": fetch_local, "youtube": fetch_youtube, "vimeo": fetch_vimeo, "url": fetch_url}
+
+
+def manifest_path(pdir: Path) -> Path:
+    """Path to the resolution manifest inside a project directory."""
+    return pdir / "index" / MANIFEST
+
+
+def _load_manifest(pdir: Path) -> dict[str, dict]:
+    path = manifest_path(pdir)
+    out: dict[str, dict] = {}
+    if path.is_file():
+        for _lineno, _raw, obj, err in iter_jsonl(path):
+            if obj is not None and obj.get("target"):
+                out[obj["target"]] = obj
+    return out
+
+
+def _write_manifest(pdir: Path, records: list[dict]) -> None:
+    lines = [_HEADER]
+    lines += [json.dumps({k: rec.get(k) for k in _KEYS}, ensure_ascii=False) for rec in records]
+    safe_write(manifest_path(pdir), "\n".join(lines) + "\n", pdir)
+
+
+def _record(
+    target: str,
+    kind: str,
+    status: str,
+    *,
+    source_id: str | None = None,
+    resolved_from: str | None = None,
+    sha256: str | None = None,
+    note: str | None = None,
+    prior: dict | None = None,
+) -> dict:
+    rec = {
+        "target": target,
+        "kind": kind,
+        "status": status,
+        "source_id": source_id,
+        "resolved_from": resolved_from,
+        "sha256": sha256,
+        "fetched_at": None,
+        "note": note,
+    }
+    if (
+        prior is not None
+        and prior.get("fetched_at")
+        and all(prior.get(k) == rec[k] for k in _KEYS if k != "fetched_at")
+    ):
+        return prior
+    rec["fetched_at"] = date.today().isoformat()
+    return rec
+
+
+def _scan_targets(scan: list[dict]) -> list[dict]:
+    out: list[dict] = []
+    seen: set[str] = set()
+    for fm in scan:
+        origin = fm.get("origin")
+        origin = origin if isinstance(origin, str) else ""
+        for t in discover_targets(fm.get("_body", ""), origin):
+            if t["raw"] in seen:
+                continue
+            seen.add(t["raw"])
+            out.append(
+                {"raw": t["raw"], "kind": classify_target(t["raw"]), "parent": fm.get("source_id")}
+            )
+    return out
+
+
+def preview_targets(project: str) -> list[dict]:
+    """Discovery only, no network: targets with parent sid and kind."""
+    pdir = config.project_dir(project)
+    scan = [fm for fm in intake.load_sources(pdir) if not fm.get("resolved_from")]
+    return _scan_targets(scan)
+
+
+def _upgrade_source(pdir: Path, sid: str, doc: dict) -> str:
+    """Rewrite an existing pending stub in place, keeping its identity."""
+    matches = sorted((pdir / "sources").glob(f"{sid}-*.md"))
+    if not matches:
+        raise ResynthError(f"source file for {sid} not found, cannot upgrade transcript")
+    path = matches[0]
+    fm, _old = parse_frontmatter(path.read_text(encoding="utf-8"), path.name)
+    body = doc["body_markdown"]
+    fm["title"] = doc["title"]
+    fm["sha256"] = sha256_text(body)
+    fm["transcript_status"] = doc["transcript_status"]
+    safe_write(path, f"---\n{intake.frontmatter_block(fm)}---\n{body}", pdir)
+    return fm["sha256"]
+
+
+def run_resolve(
+    project: str,
+    only: str | None = None,
+    source_ids: list[str] | None = None,
+    dry_run: bool = False,
+) -> dict:
+    """Discover, fetch and register link targets found inside sources.
+
+    By default every source without a resolved_from parent is scanned, so
+    fetched sources are never scanned in turn. Passing source_ids scans
+    exactly those sources instead, including already resolved ones. The
+    only filter keeps just the targets containing that substring. Targets
+    recorded in the manifest as fetched or duplicate are skipped; failed
+    and transcript_pending targets are retried. Re-evaluates gate
+    01-intake and returns ok, gate, counts, events and messages.
+    """
+    pdir = config.project_dir(project)
+    sources = intake.load_sources(pdir)
+    if not sources:
+        raise ResynthError(
+            f"project '{project}' has no sources, run: resynth intake {project} <files>"
+        )
+    if source_ids:
+        by_id = {fm.get("source_id"): fm for fm in sources}
+        missing = [sid for sid in source_ids if sid not in by_id]
+        if missing:
+            raise ResynthError(f"unknown source id(s): {', '.join(missing)}")
+        scan = [by_id[sid] for sid in source_ids]
+    else:
+        scan = [fm for fm in sources if not fm.get("resolved_from")]
+    targets = _scan_targets(scan)
+    prior_manifest = _load_manifest(pdir)
+    manifest = dict(prior_manifest)
+    counts = {"fetched": 0, "cached": 0, "duplicate": 0, "transcript_pending": 0, "failed": 0}
+    messages: list[str] = []
+    events: list[dict] = []
+
+    for t in targets:
+        raw, kind, parent = t["raw"], t["kind"], t["parent"]
+        if only and only.lower() not in raw.lower():
+            continue
+        prior = prior_manifest.get(raw)
+        if prior and prior.get("status") in ("fetched", "duplicate"):
+            counts["cached"] += 1
+            messages.append(f"{raw}: cached ({prior.get('source_id')})")
+            events.append({"target": raw, "action": "cached", "source_id": prior.get("source_id")})
+            continue
+        if dry_run:
+            counts["fetched"] += 1
+            messages.append(f"{raw}: would fetch ({kind})")
+            events.append({"target": raw, "action": "would-fetch", "kind": kind})
+            continue
+        try:
+            doc = _FETCHERS[kind](raw)
+        except FetchError as err:
+            note = str(err)
+            manifest[raw] = _record(
+                raw,
+                kind,
+                "failed",
+                source_id=prior.get("source_id") if prior else None,
+                resolved_from=parent,
+                note=note,
+                prior=prior,
+            )
+            counts["failed"] += 1
+            messages.append(f"{raw}: failed ({note})")
+            events.append({"target": raw, "action": "failed", "note": note})
+            continue
+        pending = doc["transcript_status"] == "pending"
+        prior_sid = prior.get("source_id") if prior else None
+        if prior_sid and prior.get("status") in ("transcript_pending", "failed"):
+            resolved_from = prior.get("resolved_from") or parent
+            if pending:
+                manifest[raw] = _record(
+                    raw,
+                    kind,
+                    "transcript_pending",
+                    source_id=prior_sid,
+                    resolved_from=resolved_from,
+                    sha256=prior.get("sha256"),
+                    prior=prior,
+                )
+                counts["transcript_pending"] += 1
+                messages.append(f"{raw}: transcript still pending ({prior_sid})")
+                events.append(
+                    {"target": raw, "action": "transcript-pending", "source_id": prior_sid}
+                )
+                continue
+            digest = _upgrade_source(pdir, prior_sid, doc)
+            manifest[raw] = _record(
+                raw,
+                kind,
+                "fetched",
+                source_id=prior_sid,
+                resolved_from=resolved_from,
+                sha256=digest,
+                prior=prior,
+            )
+            counts["fetched"] += 1
+            messages.append(f"{raw}: fetched as {prior_sid} ({doc['source_type']})")
+            events.append({"target": raw, "action": "upgraded", "source_id": prior_sid})
+            continue
+        result = intake.register_source(
+            pdir,
+            doc["body_markdown"],
+            title=doc["title"],
+            origin=doc["origin"],
+            source_type=doc["source_type"],
+            url=doc["url"],
+            resolved_from=parent,
+            author_or_tool=doc["author_or_tool"],
+            date_authored=doc["date_authored"],
+            transcript_status=doc["transcript_status"],
+        )
+        sid = result["source_id"]
+        if result["action"] == "duplicate":
+            status = "duplicate"
+            messages.append(f"{raw}: duplicate of {sid}")
+        elif pending:
+            status = "transcript_pending"
+            messages.append(f"{raw}: transcript pending, stub created as {sid}")
+        else:
+            status = "fetched"
+            messages.append(f"{raw}: fetched as {sid} ({doc['source_type']})")
+        counts[status] += 1
+        manifest[raw] = _record(
+            raw, kind, status, source_id=sid, resolved_from=parent, sha256=result["sha256"],
+            prior=prior,
+        )
+        events.append({"target": raw, "action": status, "source_id": sid})
+
+    if not dry_run:
+        ordered: list[dict] = []
+        emitted: set[str] = set()
+        for t in targets:
+            rec = manifest.get(t["raw"])
+            if rec is not None and t["raw"] not in emitted:
+                ordered.append(rec)
+                emitted.add(t["raw"])
+        for raw, rec in prior_manifest.items():
+            if raw not in emitted:
+                ordered.append(rec)
+                emitted.add(raw)
+        if ordered:
+            _write_manifest(pdir, ordered)
+    gate = intake.check_intake_gate(pdir, dry_run=dry_run)
+    messages.append(f"gate 01-intake: {gate['status']}")
+    return {
+        "ok": gate["status"] == "PASS",
+        "gate": gate,
+        "counts": counts,
+        "events": events,
+        "messages": messages,
+    }
diff --git a/src/resynth/resolve/discover.py b/src/resynth/resolve/discover.py
new file mode 100644
index 0000000..5e26a07
--- /dev/null
+++ b/src/resynth/resolve/discover.py
@@ -0,0 +1,81 @@
+"""Discover fetchable targets referenced inside a source body.
+
+URLs are taken from bare links and markdown link destinations. Local paths
+are only accepted from markdown destinations or backtick spans, with a
+supported suffix, and only when the file actually exists. Nothing else is
+guessed.
+"""
+
+from __future__ import annotations
+
+import re
+from pathlib import Path
+
+from ..intake import SUPPORTED
+
+_TARGET_RE = re.compile(
+    r"\]\(((?:[^()\s]|\([^()]*\))+)\)"  # markdown link destination
+    r"|(https?://[^\s<>\"'`\]]+)"  # bare url
+    r"|`([^`\n]+)`"  # backtick span
+)
+_TRAILING = ")>,.;:]\"'"
+_ABS_RE = re.compile(r"^(?:[A-Za-z]:[\\/]|/)")
+_SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
+
+
+def _strip_url(url: str) -> str:
+    while url:
+        ch = url[-1]
+        if ch not in _TRAILING:
+            break
+        if ch == ")" and url.count("(") >= url.count(")"):
+            break
+        url = url[:-1]
+    return url
+
+
+def _resolve_local(raw: str, origin: str) -> str | None:
+    cand = raw.strip()
+    if not cand or _SCHEME_RE.match(cand):
+        return None
+    if Path(cand).suffix.lower() not in SUPPORTED:
+        return None
+    if _ABS_RE.match(cand):
+        path = Path(cand)
+        if path.is_file():
+            return str(path.resolve())
+    if origin and not _SCHEME_RE.match(origin):
+        path = Path(origin).parent / cand
+        if path.is_file():
+            return str(path.resolve())
+    return None
+
+
+def discover_targets(body: str, origin: str) -> list[dict]:
+    """Return ordered, deduped targets: {"raw": str, "kind": "url"|"local"}."""
+    out: list[dict] = []
+    seen: set[str] = set()
+
+    def add(raw: str, kind: str) -> None:
+        if raw and raw not in seen:
+            seen.add(raw)
+            out.append({"raw": raw, "kind": kind})
+
+    for match in _TARGET_RE.finditer(body):
+        dest, bare, span = match.groups()
+        if dest is not None:
+            if dest.lower().startswith(("http://", "https://")):
+                add(dest, "url")
+            elif not _SCHEME_RE.match(dest):
+                local = _resolve_local(dest, origin)
+                if local:
+                    add(local, "local")
+        elif bare is not None:
+            url = _strip_url(bare)
+            if url:
+                add(url, "url")
+        else:
+            local = _resolve_local(span, origin)
+            if local:
+                add(local, "local")
+    return out
diff --git a/src/resynth/resolve/fetchers.py b/src/resynth/resolve/fetchers.py
new file mode 100644
index 0000000..a5682f4
--- /dev/null
+++ b/src/resynth/resolve/fetchers.py
@@ -0,0 +1,277 @@
+"""Fetchers turn a resolved target into a FetchedDoc dict ready for intake."""
+
+from __future__ import annotations
+
+import html as htmllib
+import json
+import os
+import re
+import tempfile
+import xml.etree.ElementTree as ET
+from pathlib import Path
+from urllib.parse import parse_qs, quote, urljoin, urlsplit
+
+from .. import intake
+from ..errors import ResynthError
+from . import net
+from .net import FetchError
+from .reduce_html import reduce_html
+
+PENDING_STUB = (
+    "# {title}\n"
+    "\n"
+    "> [!info] Video transcript pending\n"
+    "> RESYNTH could not retrieve a public caption track for this video.\n"
+    "> Link: {url}\n"
+    "> Re-run `resynth resolve <project>` to retry, or paste the transcript\n"
+    "> below this callout. The next resolve run can also upgrade this stub.\n"
+)
+
+_YT_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
+_VIMEO_HOSTS = {"vimeo.com", "www.vimeo.com", "player.vimeo.com"}
+
+_VTT_TIME = re.compile(
+    r"(?:(\d+):)?(\d{1,2}):(\d{2})[.,](\d{3})\s+-->\s+(?:(\d+):)?(\d{1,2}):(\d{2})[.,](\d{3})"
+)
+
+
+def classify_target(raw: str) -> str:
+    """Classify a raw target as youtube, vimeo, url or local."""
+    if re.match(r"^https?://", raw, re.IGNORECASE):
+        host = (urlsplit(raw).hostname or "").lower()
+        if host in _YT_HOSTS:
+            return "youtube"
+        if host in _VIMEO_HOSTS:
+            return "vimeo"
+        return "url"
+    if Path(raw).exists():
+        return "local"
+    return "url"
+
+
+def _doc(
+    body: str,
+    title: str,
+    source_type: str,
+    url: str | None,
+    origin: str,
+    author_or_tool: str = "unknown",
+    date_authored: str = "unknown",
+    transcript_status: str | None = None,
+) -> dict:
+    return {
+        "body_markdown": body,
+        "title": title,
+        "source_type": source_type,
+        "url": url,
+        "origin": origin,
+        "author_or_tool": author_or_tool,
+        "date_authored": date_authored,
+        "transcript_status": transcript_status,
+    }
+
+
+def _heading_title(body: str) -> str | None:
+    for line in body.splitlines():
+        if line.startswith("# "):
+            return line[2:].strip()
+    return None
+
+
+def fetch_local(path: str) -> dict:
+    """Convert a local file to a doc. PDFs become source_type pdf;
+    everything else becomes notes."""
+    src = Path(path)
+    try:
+        body = intake._convert(src)
+    except ResynthError as err:
+        raise FetchError(str(err)) from err
+    source_type = "pdf" if src.suffix.lower() == ".pdf" else "notes"
+    return _doc(body, _heading_title(body) or src.stem, source_type, None, str(src))
+
+
+def fetch_url(url: str) -> dict:
+    """Fetch a web url. PDF responses are converted with pdftotext and
+    HTML pages are reduced to clean text. Anything else is an error."""
+    payload, content_type, final_url = net.http_get(url)
+    ct = (content_type or "").split(";")[0].strip().lower()
+    path = urlsplit(final_url or url).path.lower()
+    if ct == "application/pdf" or path.endswith(".pdf"):
+        handle, name = tempfile.mkstemp(suffix=".pdf")
+        tmp = Path(name)
+        try:
+            with os.fdopen(handle, "wb") as fh:
+                fh.write(payload)
+            try:
+                body = intake._convert(tmp)
+            except ResynthError as err:
+                raise FetchError(str(err)) from err
+        finally:
+            tmp.unlink(missing_ok=True)
+        title = _heading_title(body) or Path(path).stem or url
+        return _doc(body, title, "pdf", url, url)
+    if "html" in ct:
+        text = net.decode(payload, content_type)
+        body, title = reduce_html(text, final_url or url)
+        if len(body.strip()) < 200:
+            raise FetchError(
+                "page yielded no extractable text (login wall or script rendered)"
+            )
+        return _doc(body, title or url, "html-article", url, url)
+    raise FetchError(f"unsupported content type {ct or 'unknown'}")
+
+
+def _hms(seconds: float) -> str:
+    s = int(seconds)
+    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"
+
+
+def _render_transcript(cues: list[tuple[float, float, str]]) -> str:
+    paras: list[list[str]] = [[]]
+    prev_end: float | None = None
+    for start, end, text in cues:
+        text = re.sub(r"\s+", " ", text).strip()
+        if not text:
+            continue
+        if prev_end is not None and start - prev_end > 8:
+            paras.append([])
+        paras[-1].append(f"[{_hms(start)}] {text}")
+        prev_end = end
+    body = "\n\n".join("\n".join(p) for p in paras if p)
+    return f"## Transcript\n\n{body}\n"
+
+
+def _oembed(endpoint: str) -> dict:
+    try:
+        body, _ct, _final = net.http_get(endpoint)
+        data = json.loads(body.decode("utf-8", errors="replace"))
+        return data if isinstance(data, dict) else {}
+    except (FetchError, ValueError):
+        return {}
+
+
+def _video_doc(
+    url: str,
+    title: str,
+    author: str,
+    date_authored: str,
+    cues: list[tuple[float, float, str]],
+) -> dict:
+    if cues:
+        body = f"# {title}\n\n{_render_transcript(cues)}"
+        status = "fetched"
+    else:
+        body = PENDING_STUB.format(title=title, url=url)
+        status = "pending"
+    return _doc(body, title, "video-transcript", url, url, author, date_authored, status)
+
+
+def _youtube_id(url: str) -> str | None:
+    parts = urlsplit(url)
+    host = (parts.hostname or "").lower()
+    if host == "youtu.be":
+        seg = parts.path.strip("/").split("/")[0]
+        return seg or None
+    qs = parse_qs(parts.query)
+    if qs.get("v"):
+        return qs["v"][0]
+    match = re.match(r"^/(?:shorts|embed|live)/([^/?#]+)", parts.path)
+    return match.group(1) if match else None
+
+
+def fetch_youtube(url: str) -> dict:
+    """Fetch a YouTube video as a timestamped transcript source. Falls
+    back to a pending stub when no public caption track is available."""
+    vid = _youtube_id(url)
+    if not vid:
+        raise FetchError("could not determine youtube video id")
+    meta = _oembed(f"https://www.youtube.com/oembed?url={quote(url, safe='')}&format=json")
+    title = meta.get("title") or url
+    author = meta.get("author_name") or "unknown"
+    cues: list[tuple[float, float, str]] = []
+    try:
+        listing, _ct, _final = net.http_get(
+            f"https://www.youtube.com/api/timedtext?type=list&v={vid}"
+        )
+        codes = [t.get("lang_code") or "" for t in ET.fromstring(listing).findall(".//track")]
+        codes = [c for c in codes if c]
+        code = next((c for c in codes if c.lower().startswith("en")), codes[0] if codes else None)
+        if code:
+            track, _ct, _final = net.http_get(
+                f"https://www.youtube.com/api/timedtext?lang={quote(code)}&v={vid}"
+            )
+            for el in ET.fromstring(track).findall(".//text"):
+                start = float(el.get("start") or 0)
+                dur = float(el.get("dur") or 0)
+                cues.append((start, start + dur, htmllib.unescape("".join(el.itertext()))))
+    except (FetchError, ET.ParseError, ValueError):
+        cues = []
+    return _video_doc(url, title, author, "unknown", cues)
+
+
+def _vimeo_id(url: str) -> str | None:
+    for seg in urlsplit(url).path.split("/"):
+        if seg.isdigit():
+            return seg
+    return None
+
+
+def _vtt_seconds(h: str | None, m: str, s: str, ms: str) -> float:
+    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
+
+
+def _parse_vtt(text: str) -> list[tuple[float, float, str]]:
+    cues: list[tuple[float, float, str]] = []
+    lines = text.splitlines()
+    i = 0
+    while i < len(lines):
+        line = lines[i].strip()
+        if line.startswith(("NOTE", "STYLE", "REGION")):
+            i += 1
+            while i < len(lines) and lines[i].strip():
+                i += 1
+            continue
+        if not line or line.startswith("WEBVTT"):
+            i += 1
+            continue
+        match = _VTT_TIME.search(line)
+        if not match:
+            i += 1
+            continue
+        start = _vtt_seconds(*match.groups()[:4])
+        end = _vtt_seconds(*match.groups()[4:])
+        i += 1
+        texts = []
+        while i < len(lines) and lines[i].strip():
+            texts.append(lines[i].strip())
+            i += 1
+        cue_text = htmllib.unescape(re.sub(r"<[^>]+>", "", " ".join(texts)))
+        cues.append((start, end, cue_text))
+    return cues
+
+
+def fetch_vimeo(url: str) -> dict:
+    """Fetch a Vimeo video as a timestamped transcript source. Falls
+    back to a pending stub when no public text track is available."""
+    vid = _vimeo_id(url)
+    if not vid:
+        raise FetchError("could not determine vimeo video id")
+    meta = _oembed(f"https://vimeo.com/api/oembed.json?url={quote(url, safe='')}")
+    title = meta.get("title") or url
+    author = meta.get("author_name") or "unknown"
+    date_authored = str(meta.get("upload_date") or "unknown")[:10] or "unknown"
+    cues: list[tuple[float, float, str]] = []
+    try:
+        body, _ct, _final = net.http_get(f"https://player.vimeo.com/video/{vid}/config")
+        cfg = json.loads(body.decode("utf-8", errors="replace"))
+        tracks = (cfg.get("request") or {}).get("text_tracks") or []
+        track = next(
+            (t for t in tracks if str(t.get("lang", "")).lower().startswith("en")),
+            tracks[0] if tracks else None,
+        )
+        if track and track.get("url"):
+            vtt, _ct, _final = net.http_get(urljoin("https://player.vimeo.com", track["url"]))
+            cues = _parse_vtt(vtt.decode("utf-8", errors="replace"))
+    except (FetchError, ValueError, AttributeError):
+        cues = []
+    return _video_doc(url, title, author, date_authored, cues)
diff --git a/src/resynth/resolve/net.py b/src/resynth/resolve/net.py
new file mode 100644
index 0000000..687d7e5
--- /dev/null
+++ b/src/resynth/resolve/net.py
@@ -0,0 +1,107 @@
+"""HTTP access for source resolution.
+
+Every request goes through this module's `urlopen` attribute so tests can
+patch a single seam. robots.txt is honoured, and `http_get` calls to the
+same host are spaced at least `HOST_DELAY` seconds apart.
+"""
+
+from __future__ import annotations
+
+import re
+import time
+import urllib.error
+import urllib.request
+from urllib.parse import urlsplit
+from urllib.robotparser import RobotFileParser
+
+from .. import __version__
+
+urlopen = urllib.request.urlopen
+monotonic = time.monotonic
+sleep = time.sleep
+
+USER_AGENT = (
+    f"resynth/{__version__} (+https://github.com/Markus-Doc/resynth) "
+    "research consolidation tool"
+)
+TIMEOUT = 30
+MAX_BYTES = 10 * 1024 * 1024
+HOST_DELAY = 1.0
+
+_robots: dict[str, RobotFileParser | None] = {}
+_last_hit: dict[str, float] = {}
+
+_CHARSET_RE = re.compile(r"charset=[\"']?([\w.:-]+)", re.IGNORECASE)
+
+
+class FetchError(Exception):
+    """A target could not be fetched. str(err) is the short reason."""
+
+
+def _request(url: str) -> urllib.request.Request:
+    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
+
+
+def _robot_parser(base: str) -> RobotFileParser | None:
+    if base in _robots:
+        return _robots[base]
+    parser: RobotFileParser | None
+    try:
+        with urlopen(_request(base + "/robots.txt"), timeout=TIMEOUT) as resp:
+            raw = resp.read(MAX_BYTES)
+        parser = RobotFileParser()
+        parser.parse(raw.decode("utf-8", errors="replace").splitlines())
+    except (urllib.error.URLError, TimeoutError, OSError, ValueError):
+        parser = None
+    _robots[base] = parser
+    return parser
+
+
+def _check_robots(url: str) -> None:
+    parts = urlsplit(url)
+    parser = _robot_parser(f"{parts.scheme}://{parts.netloc}")
+    if parser is not None and not parser.can_fetch(USER_AGENT, url):
+        raise FetchError("disallowed by robots.txt")
+
+
+def _throttle(host: str) -> None:
+    last = _last_hit.get(host)
+    if last is not None:
+        wait = HOST_DELAY - (monotonic() - last)
+        if wait > 0:
+            sleep(wait)
+    _last_hit[host] = monotonic()
+
+
+def http_get(url: str) -> tuple[bytes, str, str]:
+    """GET a url, returning (body, content-type header, final url)."""
+    _check_robots(url)
+    _throttle(urlsplit(url).netloc.lower())
+    try:
+        with urlopen(_request(url), timeout=TIMEOUT) as resp:
+            body = resp.read(MAX_BYTES + 1)
+            content_type = resp.headers.get("Content-Type") or ""
+            final_url = getattr(resp, "url", None) or resp.geturl()
+    except urllib.error.HTTPError as err:
+        raise FetchError(f"http {err.code} {err.reason}") from err
+    except urllib.error.URLError as err:
+        raise FetchError(f"unreachable: {err.reason}") from err
+    except (TimeoutError, OSError) as err:
+        raise FetchError(str(err) or err.__class__.__name__) from err
+    if len(body) > MAX_BYTES:
+        raise FetchError("response exceeds 10 MiB")
+    return body, content_type, final_url
+
+
+def decode(body: bytes, content_type: str) -> str:
+    """Decode a response body using the declared or sniffed charset,
+    falling back to utf-8 with replacement."""
+    match = _CHARSET_RE.search(content_type or "")
+    if not match:
+        match = _CHARSET_RE.search(body[:2048].decode("ascii", errors="ignore"))
+    if match:
+        try:
+            return body.decode(match.group(1), errors="replace")
+        except LookupError:
+            pass
+    return body.decode("utf-8", errors="replace")
diff --git a/src/resynth/resolve/reduce_html.py b/src/resynth/resolve/reduce_html.py
new file mode 100644
index 0000000..c1a50b2
--- /dev/null
+++ b/src/resynth/resolve/reduce_html.py
@@ -0,0 +1,199 @@
+"""Reduce noisy HTML to markdown-ish clean text using the stdlib parser."""
+
+from __future__ import annotations
+
+import re
+from html.parser import HTMLParser
+from urllib.parse import urljoin
+
+DROP = {
+    "script",
+    "style",
+    "noscript",
+    "template",
+    "svg",
+    "form",
+    "nav",
+    "header",
+    "footer",
+    "aside",
+    "iframe",
+}
+REGION = {"main", "article"}
+HEADINGS = {f"h{n}": n for n in range(1, 7)}
+BLOCK_PREFIX = {"p": "", "blockquote": "> ", "li": "- "}
+
+
+def _collapse(text: str) -> str:
+    return re.sub(r"\s+", " ", text).strip()
+
+
+class _Reducer(HTMLParser):
+    def __init__(self, base_url: str):
+        super().__init__(convert_charrefs=True)
+        self.base_url = base_url
+        self.blocks: list[tuple[bool, str]] = []
+        self.cur: list | None = None  # [prefix, parts, is_pre]
+        self.drop = 0
+        self.region_seen = False
+        self.in_region = False
+        self.region_depth = 0
+        self.links: list[str | None] = []
+        self.cells: list[str] | None = None
+
+    def _open(self, prefix: str, pre: bool = False) -> None:
+        self._flush()
+        self.cur = [prefix, [], pre]
+
+    def _flush(self) -> None:
+        if self.cur is None:
+            return
+        prefix, parts, pre = self.cur
+        self.cur = None
+        raw = "".join(parts)
+        if pre:
+            text = raw.strip("\n")
+            if text.strip():
+                self.blocks.append((self.in_region, f"```\n{text}\n```"))
+            return
+        text = _collapse(raw)
+        if not text:
+            return
+        if self.cells is not None:
+            self.cells.append(text)
+            return
+        self.blocks.append((self.in_region, prefix + text))
+
+    def handle_starttag(self, tag, attrs):
+        if tag in DROP:
+            self.drop += 1
+            return
+        if self.drop:
+            return
+        if tag in REGION:
+            if not self.region_seen:
+                self.region_seen = True
+                self.in_region = True
+                self.region_depth = 1
+            elif self.in_region:
+                self.region_depth += 1
+            return
+        if tag in HEADINGS:
+            self._open("#" * HEADINGS[tag] + " ")
+        elif tag == "pre":
+            self._open("", pre=True)
+        elif tag in BLOCK_PREFIX:
+            if tag == "p" and self.cur is not None and self.cur[0] in ("> ", "- "):
+                self.cur[1].append(" ")
+            else:
+                self._open(BLOCK_PREFIX[tag])
+        elif tag == "tr":
+            self._flush()
+            self.cells = []
+        elif tag in ("td", "th"):
+            if self.cells is not None:
+                self._open("")
+        elif tag == "br":
+            if self.cur is not None:
+                self.cur[1].append(" ")
+        elif tag == "a":
+            href = dict(attrs).get("href")
+            self.links.append(urljoin(self.base_url, href) if href else None)
+
+    def handle_endtag(self, tag):
+        if tag in DROP:
+            if self.drop:
+                self.drop -= 1
+            return
+        if self.drop:
+            return
+        if tag in REGION:
+            if self.in_region:
+                self.region_depth -= 1
+                if self.region_depth <= 0:
+                    self.in_region = False
+            return
+        if tag in HEADINGS or tag == "pre":
+            self._flush()
+        elif tag == "p":
+            if self.cur is not None and self.cur[0] == "":
+                self._flush()
+        elif tag == "blockquote":
+            if self.cur is not None and self.cur[0] == "> ":
+                self._flush()
+        elif tag == "li":
+            if self.cur is not None and self.cur[0] == "- ":
+                self._flush()
+        elif tag in ("td", "th"):
+            self._flush()
+        elif tag == "tr":
+            self._flush()
+            if self.cells is not None:
+                row = " | ".join(cell for cell in self.cells if cell)
+                if row:
+                    self.blocks.append((self.in_region, row))
+                self.cells = None
+        elif tag == "table":
+            self.cells = None
+        elif tag == "a":
+            if self.links:
+                href = self.links.pop()
+                if href and self.cur is not None:
+                    self.cur[1].append(f" ({href})")
+
+    def handle_data(self, data):
+        if self.drop or self.cur is None:
+            return
+        self.cur[1].append(data)
+
+
+class _TitleParser(HTMLParser):
+    def __init__(self):
+        super().__init__(convert_charrefs=True)
+        self.og: str | None = None
+        self.title: str | None = None
+        self.h1: str | None = None
+        self._stack: list[list] = []
+
+    def handle_starttag(self, tag, attrs):
+        a = dict(attrs)
+        if tag == "meta" and self.og is None:
+            prop = a.get("property") or a.get("name")
+            if prop == "og:title" and a.get("content"):
+                self.og = _collapse(a["content"])
+        elif tag in ("title", "h1"):
+            self._stack.append([tag, []])
+
+    def handle_data(self, data):
+        if self._stack:
+            self._stack[-1][1].append(data)
+
+    def handle_endtag(self, tag):
+        if self._stack and self._stack[-1][0] == tag:
+            name, parts = self._stack.pop()
+            text = _collapse("".join(parts))
+            if name == "title" and self.title is None and text:
+                self.title = text
+            if name == "h1" and self.h1 is None and text:
+                self.h1 = text
+
+
+def extract_title(html: str) -> str | None:
+    """Page title, preferring og:title, then <title>, then the first h1."""
+    parser = _TitleParser()
+    parser.feed(html)
+    parser.close()
+    return parser.og or parser.title or parser.h1
+
+
+def reduce_html(html: str, base_url: str) -> tuple[str, str | None]:
+    """Return (markdown_body, title) for an HTML page."""
+    reducer = _Reducer(base_url)
+    reducer.feed(html)
+    reducer.close()
+    reducer._flush()
+    blocks = [
+        text for in_region, text in reducer.blocks if in_region or not reducer.region_seen
+    ]
+    body = "\n\n".join(blocks)
+    return (body + "\n" if body else ""), extract_title(html)
diff --git a/src/resynth/synthesise.py b/src/resynth/synthesise.py
index 98ecddf..a04bd9b 100644
--- a/src/resynth/synthesise.py
+++ b/src/resynth/synthesise.py
@@ -87,6 +87,8 @@ def run_synthesise(project: str, dry_run: bool = False, force: bool = False) ->
             {
                 "source_id": fm["source_id"],
                 "title": fm["title"],
+                "source_type": fm.get("source_type") or "report",
+                "url": fm.get("url"),
                 "authority_tier": fm["authority_tier"],
                 "date_authored": fm["date_authored"],
                 "sha256_short": str(fm["sha256"])[:12],
diff --git a/src/resynth/wizard.py b/src/resynth/wizard.py
index ee99b95..0c0c86f 100644
--- a/src/resynth/wizard.py
+++ b/src/resynth/wizard.py
@@ -30,6 +30,7 @@
 from .intake import SUPPORTED, run_intake
 from .project import run_brief, run_init
 from .reconcile import run_reconcile
+from .resolve import preview_targets, run_resolve
 from .synthesise import run_synth_verify, run_synthesise
 
 console = Console()
@@ -463,9 +464,42 @@ def _step_intake(project: str) -> bool:
     for event in result["events"]:
         console.print(f"  {event['source']}: {event['action']}")
     _show_reasons(result)
+    if result.get("ok"):
+        _offer_resolve(project)
     return True
 
 
+def _offer_resolve(project: str) -> None:
+    """Offer to fetch links and file references found inside the new sources."""
+    try:
+        targets = preview_targets(project)
+    except ResynthError as exc:
+        console.print(f"[red]{exc}[/red]")
+        return
+    if not targets:
+        return
+    n = len(targets)
+    if not Confirm.ask(
+        f"I found {n} target{'s' if n != 1 else ''} (links and file references) "
+        "inside your reports. Fetch them as extra sources now?",
+        default=True,
+    ):
+        return
+    try:
+        resolved = run_resolve(project)
+    except ResynthError as exc:
+        console.print(f"[red]{exc}[/red]")
+        return
+    for line in resolved["messages"]:
+        console.print(f"  {line}")
+    _show_reasons(resolved)
+    if resolved["counts"]["transcript_pending"] > 0:
+        console.print(
+            "Some videos have no public captions yet. You can paste a transcript\n"
+            "into the stub file, or re-run resolve later to retry."
+        )
+
+
 def _step_operator(
     project: str,
     pdir: Path,
diff --git a/templates/extraction-instructions.md.j2 b/templates/extraction-instructions.md.j2
index 02d0d90..b62d62c 100644
--- a/templates/extraction-instructions.md.j2
+++ b/templates/extraction-instructions.md.j2
@@ -27,6 +27,13 @@ Every claim line must contain exactly these fields.
 - confidence_as_stated: one of high, medium, low, unstated
 - depends_on: a list of claim ids this claim depends on, empty list if none
 
+One optional field may be added.
+
+- source_locator: a structured pointer to the exact origin, given as an
+  object with any of url, page (a PDF page number), timestamp (a video time,
+  H:MM or HH:MM:SS), anchor (an HTML heading slug). Add a timestamp for
+  video-transcript sources and a page number for PDF sources.
+
 ## Rules
 
 1. One claim per line. Split compound statements into separate claims.
diff --git a/templates/master.md.j2 b/templates/master.md.j2
index c20863a..a099a40 100644
--- a/templates/master.md.j2
+++ b/templates/master.md.j2
@@ -35,8 +35,8 @@ No conflicts were recorded during reconciliation.
 
 ## Appendix: Source Register
 
-| Source | Title | Authority | Authored | Content hash |
-| --- | --- | --- | --- | --- |
+| Source | Title | Type | Authority | Authored | Link | Content hash |
+| --- | --- | --- | --- | --- | --- | --- |
 {% for s in sources %}
-| {{ s.source_id }} | {{ s.title }} | {{ s.authority_tier }} | {{ s.date_authored }} | {{ s.sha256_short }} |
+| {{ s.source_id }} | {{ s.title }} | {{ s.source_type }} | {{ s.authority_tier }} | {{ s.date_authored }} | {{ s.url or "-" }} | {{ s.sha256_short }} |
 {% endfor %}
diff --git a/tests/fixtures/resolve/article.html b/tests/fixtures/resolve/article.html
new file mode 100644
index 0000000..39fcfdf
--- /dev/null
+++ b/tests/fixtures/resolve/article.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html>
+<html>
+<head>
+<meta charset="utf-8">
+<meta property="og:title" content="Field Guide to Widgets">
+<title>Field Guide to Widgets - Example Articles</title>
+<script>var scriptVar = "do not extract";</script>
+<style>.x { color: red }</style>
+</head>
+<body>
+<header>Site header boilerplate</header>
+<nav><a href="/home">Site navigation home</a></nav>
+<div class="promo"><p>Outside main, must be ignored.</p></div>
+<main>
+<article>
+<h1>Field Guide to Widgets</h1>
+<p>Widgets are small reusable components that appear in nearly every
+modern interface. This guide walks through selection, installation and
+maintenance of widgets in production systems, with enough detail to
+satisfy the extractable-text threshold used by the resolver.</p>
+<h2>Setup</h2>
+<p>Start by reading <a href="spec.html">the spec</a> and the upstream
+<a href="https://example-articles.test/manual">manual</a> before touching
+anything.</p>
+<ul>
+<li>First step: inventory existing widgets</li>
+<li>Second step: remove broken ones</li>
+</ul>
+<blockquote>Quoted wisdom about widgets from the maintainers.</blockquote>
+<pre>widgetctl install --all</pre>
+<table><tr><th>Name</th><th>Status</th></tr><tr><td>Alpha</td><td>stable</td></tr></table>
+</article>
+</main>
+<footer>Footer copyright notice</footer>
+<aside>Related links sidebar</aside>
+</body>
+</html>
diff --git a/tests/fixtures/resolve/notes-with-links.md b/tests/fixtures/resolve/notes-with-links.md
new file mode 100644
index 0000000..973d678
--- /dev/null
+++ b/tests/fixtures/resolve/notes-with-links.md
@@ -0,0 +1,15 @@
+# Study notes with links
+
+Collected references for the widget research project.
+
+Reading list:
+
+- Main article: https://example-articles.test/guide
+- Mirror copy: https://example-articles.test/guide-copy
+- Conference talk: https://vimeo.com/123456
+- Deep dive video: https://www.youtube.com/watch?v=abc123XYZ
+- Paywalled piece: https://blocked.test/secret
+- [Extra local notes](extra-notes.md)
+
+The article at https://example-articles.test/guide is the primary
+reference and appears twice on purpose.
diff --git a/tests/fixtures/resolve/robots_disallow.txt b/tests/fixtures/resolve/robots_disallow.txt
new file mode 100644
index 0000000..1f53798
--- /dev/null
+++ b/tests/fixtures/resolve/robots_disallow.txt
@@ -0,0 +1,2 @@
+User-agent: *
+Disallow: /
diff --git a/tests/fixtures/resolve/vimeo_config.json b/tests/fixtures/resolve/vimeo_config.json
new file mode 100644
index 0000000..19b91ee
--- /dev/null
+++ b/tests/fixtures/resolve/vimeo_config.json
@@ -0,0 +1 @@
+{"request": {"text_tracks": [{"lang": "en", "label": "English", "url": "/texttrack/123.vtt?token=x"}]}}
diff --git a/tests/fixtures/resolve/vimeo_oembed.json b/tests/fixtures/resolve/vimeo_oembed.json
new file mode 100644
index 0000000..3b658c0
--- /dev/null
+++ b/tests/fixtures/resolve/vimeo_oembed.json
@@ -0,0 +1 @@
+{"title": "Vimeo Talk", "author_name": "Speaker Person", "upload_date": "2024-05-01 10:00:00"}
diff --git a/tests/fixtures/resolve/vimeo_track.vtt b/tests/fixtures/resolve/vimeo_track.vtt
new file mode 100644
index 0000000..964951d
--- /dev/null
+++ b/tests/fixtures/resolve/vimeo_track.vtt
@@ -0,0 +1,15 @@
+WEBVTT
+
+NOTE auto generated
+
+1
+00:00:00.000 --> 00:00:04.000
+Welcome to the talk.
+
+2
+00:00:04.500 --> 00:00:08.000
+We discuss <c>widgets</c> at length.
+
+3
+00:00:20.000 --> 00:00:24.000
+Closing remarks after a pause.
diff --git a/tests/fixtures/resolve/youtube_oembed.json b/tests/fixtures/resolve/youtube_oembed.json
new file mode 100644
index 0000000..d31aed0
--- /dev/null
+++ b/tests/fixtures/resolve/youtube_oembed.json
@@ -0,0 +1 @@
+{"title": "Deep Dive Video", "author_name": "Chan Academy", "type": "video"}
diff --git a/tests/fixtures/resolve/youtube_timedtext_en.xml b/tests/fixtures/resolve/youtube_timedtext_en.xml
new file mode 100644
index 0000000..09bab96
--- /dev/null
+++ b/tests/fixtures/resolve/youtube_timedtext_en.xml
@@ -0,0 +1,5 @@
+<transcript>
+<text start="0" dur="4.2">Welcome to the deep dive.</text>
+<text start="4.2" dur="3.8">Today we cover widgets &amp;amp; gadgets.</text>
+<text start="20" dur="5">After a long pause we resume.</text>
+</transcript>
diff --git a/tests/fixtures/resolve/youtube_timedtext_list.xml b/tests/fixtures/resolve/youtube_timedtext_list.xml
new file mode 100644
index 0000000..cbfda17
--- /dev/null
+++ b/tests/fixtures/resolve/youtube_timedtext_list.xml
@@ -0,0 +1,4 @@
+<transcript_list docid="123">
+<track id="0" name="" lang_code="de" lang_original="Deutsch"/>
+<track id="1" name="" lang_code="en" lang_original="English" lang_default="true"/>
+</transcript_list>
diff --git a/tests/test_e2e.py b/tests/test_e2e.py
index cd8229d..763a3f3 100644
--- a/tests/test_e2e.py
+++ b/tests/test_e2e.py
@@ -2,7 +2,9 @@
 simulated operator inputs, finishing sealed with every gate PASS."""
 
 import json
+from pathlib import Path
 
+import pytest
 import yaml
 
 from click.testing import CliRunner
@@ -12,8 +14,9 @@
 from resynth import config, demo_operator
 from resynth.audit import run_audit, run_seal
 from resynth.cli import main as cli_main
-from resynth.export import run_export
-from resynth.extract import run_extract, run_extract_verify
+from resynth.errors import ResynthError
+from resynth.export import load_master, run_export
+from resynth.extract import load_all_claims, run_extract, run_extract_verify
 from resynth.gates import all_gates
 from resynth.intake import run_intake
 from resynth.project import run_brief, run_init
@@ -49,8 +52,22 @@ def test_full_pipeline(ws):
     assert (pdir / "output" / "SEAL.yaml").is_file()
     assert run_export("demo")["ok"]
     exported = json.loads((pdir / "output" / "MASTER.json").read_text(encoding="utf-8"))
-    assert exported["format"] == "resynth-master/1"
+    assert exported["format"] == "resynth-master/2"
     assert len(exported["claims"]) == 11
+    assert [s["source_id"] for s in exported["sources"]] == ["S01", "S02", "S03"]
+    for src in exported["sources"]:
+        assert src["schema_version"] == 2
+        assert src["source_type"] == "report"
+        assert "url" in src and "resolved_from" in src
+        assert "_file" not in src and "_body" not in src
+    # claims are dumped whole, so optional fields like source_locator ride along
+    assert exported["claims"] == sorted(
+        load_all_claims(pdir), key=lambda c: c["claim_id"]
+    )
+
+    loaded = load_master(pdir / "output" / "MASTER.json")
+    assert loaded["format_version"] == 2
+    assert loaded["sources"] == exported["sources"]
 
     runner = CliRunner()
     result = runner.invoke(cli_main, ["status", "demo", "--json"])
@@ -79,6 +96,24 @@ def test_dry_run_writes_nothing(ws):
     assert (config.runs_dir()).is_dir(), "dry runs still produce a run log"
 
 
+V1_FIXTURE = Path(__file__).resolve().parents[1] / "projects" / "demo" / "output" / "MASTER.json"
+
+
+def test_load_master_v1_fixture():
+    loaded = load_master(V1_FIXTURE)
+    assert loaded["format"] == "resynth-master/1"
+    assert loaded["format_version"] == 1
+    assert loaded["sources"] == []
+    assert loaded["claims"], "v1 payload content passes through untouched"
+
+
+def test_load_master_unknown_format(tmp_path):
+    bad = tmp_path / "MASTER.json"
+    bad.write_text(json.dumps({"format": "resynth-master/9"}), encoding="utf-8")
+    with pytest.raises(ResynthError, match="unsupported master format resynth-master/9"):
+        load_master(bad)
+
+
 def test_doctor_json(ws):
     result = CliRunner().invoke(cli_main, ["doctor", "--json"])
     payload = json.loads(result.output)
diff --git a/tests/test_extract.py b/tests/test_extract.py
index 5f1e29c..be06b71 100644
--- a/tests/test_extract.py
+++ b/tests/test_extract.py
@@ -57,6 +57,51 @@ def test_missing_and_unknown_fields_fail():
     assert any("unknown field extra" in e for e in errors)
 
 
+@pytest.mark.parametrize(
+    "locator",
+    [
+        {"url": "https://example.com/talk"},
+        {"page": 12},
+        {"timestamp": "00:14:32"},
+        {"timestamp": "4:05"},
+        {"anchor": "section-slug"},
+        {"url": "https://example.com/talk", "page": 3, "timestamp": "1:02:03", "anchor": "intro"},
+    ],
+)
+def test_valid_source_locator_accepted(locator):
+    claim = dict(VALID)
+    claim["source_locator"] = locator
+    assert validate_claim(claim, "S01") == []
+
+
+def test_claim_without_locator_still_valid():
+    claim = dict(VALID)
+    assert "source_locator" not in claim
+    assert validate_claim(claim, "S01") == []
+
+
+@pytest.mark.parametrize(
+    "locator,fragment",
+    [
+        ({"chapter": 3}, "unknown source_locator key chapter"),
+        ({}, "at least one of url, page, timestamp, anchor"),
+        ("page 12", "source_locator must be an object"),
+        ({"timestamp": "12m30s"}, "H:MM or HH:MM:SS"),
+        ({"timestamp": "1:2:03"}, "H:MM or HH:MM:SS"),
+        ({"page": 0}, "positive integer"),
+        ({"page": -4}, "positive integer"),
+        ({"page": "12"}, "positive integer"),
+        ({"url": ""}, "source_locator.url"),
+    ],
+)
+def test_bad_source_locator_rejected(locator, fragment):
+    claim = dict(VALID)
+    claim["source_locator"] = locator
+    errors = validate_claim(claim, "S01")
+    assert errors, f"expected violation for source_locator={locator!r}"
+    assert any(fragment in e for e in errors)
+
+
 def test_workspace_generation(ws):
     pdir = make_project()
     run_extract("demo")
@@ -112,3 +157,81 @@ def test_coverage_heuristic_warns(ws, tmp_path):
     result = run_extract_verify("cov")
     assert result["ok"]
     assert any("coverage" in w for w in result["gate"]["warnings"])
+
+
+VIDEO_URL = "https://example.com/talks/argon2"
+
+
+def _video_project(project="vid"):
+    """A project with one handcrafted schema v2 video-transcript source."""
+    from resynth import config
+    from resynth.fsutil import sha256_text
+    from resynth.intake import check_intake_gate
+    from resynth.project import run_init
+
+    run_init(project)
+    pdir = config.project_dir(project)
+    body = "# Argon2 conference talk\n\nThe speaker recommends Argon2id throughout.\n"
+    frontmatter = (
+        "---\n"
+        "source_id: S01\n"
+        "title: Argon2 conference talk\n"
+        "origin: test\n"
+        "author_or_tool: unknown\n"
+        "date_authored: unknown\n"
+        "date_ingested: '2026-06-12'\n"
+        "authority_tier: unknown\n"
+        "recency_rank: 1\n"
+        f"sha256: {sha256_text(body)}\n"
+        "schema_version: 2\n"
+        "source_type: video-transcript\n"
+        f"url: {VIDEO_URL}\n"
+        "resolved_from: null\n"
+        "transcript_status: fetched\n"
+        "---\n"
+    )
+    (pdir / "sources" / "S01-argon2-conference-talk.md").write_text(
+        frontmatter + body, encoding="utf-8"
+    )
+    check_intake_gate(pdir)
+    run_extract(project)
+    return pdir
+
+
+def test_verify_warns_video_claim_without_timestamp(ws):
+    pdir = _video_project()
+    (pdir / "claims" / "S01-claims.jsonl").write_text(
+        json.dumps(VALID) + "\n", encoding="utf-8"
+    )
+    result = run_extract_verify("vid")
+    assert result["ok"]
+    assert any(
+        "S01-C001: video source claim without a timestamp locator" in w
+        for w in result["gate"]["warnings"]
+    )
+
+
+def test_verify_warns_locator_url_mismatch(ws):
+    pdir = _video_project()
+    claim = dict(VALID)
+    claim["source_locator"] = {"timestamp": "00:14:32", "url": "https://elsewhere.example.com"}
+    (pdir / "claims" / "S01-claims.jsonl").write_text(
+        json.dumps(claim) + "\n", encoding="utf-8"
+    )
+    result = run_extract_verify("vid")
+    assert result["ok"]
+    warnings = result["gate"]["warnings"]
+    assert any("S01-C001: locator url does not match the source url" in w for w in warnings)
+    assert not any("without a timestamp" in w for w in warnings)
+
+
+def test_verify_no_url_warning_when_locator_url_matches(ws):
+    pdir = _video_project()
+    claim = dict(VALID)
+    claim["source_locator"] = {"timestamp": "00:14:32", "url": VIDEO_URL}
+    (pdir / "claims" / "S01-claims.jsonl").write_text(
+        json.dumps(claim) + "\n", encoding="utf-8"
+    )
+    result = run_extract_verify("vid")
+    assert result["ok"]
+    assert not any("locator url" in w for w in result["gate"]["warnings"])
diff --git a/tests/test_migrate.py b/tests/test_migrate.py
new file mode 100644
index 0000000..639d0bd
--- /dev/null
+++ b/tests/test_migrate.py
@@ -0,0 +1,155 @@
+"""Tests for the schema v1 to v2 project upgrader."""
+
+import pytest
+
+from helpers import snapshot
+from resynth import config
+from resynth.errors import ResynthError
+from resynth.fsutil import parse_frontmatter, sha256_text
+from resynth.intake import FRONTMATTER_FIELDS, SCHEMA_VERSION, V2_FIELDS, register_source
+from resynth.migrate import run_migrate
+from resynth.project import run_init
+
+BODY_A = "# Alpha Report\n\nArgon2id is preferred for password hashing.\n"
+BODY_B = "# Beta Paper\n\nA bcrypt work factor of at least 12 is required.\n"
+
+
+def write_v1(pdir, sid, origin, body, rank):
+    """Handcraft a pre-0.2.0 source file with the nine v1 fields only."""
+    lines = [
+        f"source_id: {sid}",
+        f"title: Source {sid}",
+        f"origin: {origin}",
+        "author_or_tool: unknown",
+        "date_authored: unknown",
+        "date_ingested: '2026-01-01'",
+        "authority_tier: unknown",
+        f"recency_rank: {rank}",
+        f"sha256: {sha256_text(body)}",
+    ]
+    path = pdir / "sources" / f"{sid}-source.md"
+    path.write_text(
+        "---\n" + "\n".join(lines) + "\n---\n" + body, encoding="utf-8", newline="\n"
+    )
+    return path
+
+
+def v1_project(ws, project="demo"):
+    run_init(project)
+    pdir = config.project_dir(project)
+    write_v1(pdir, "S01", "notes/alpha-report.md", BODY_A, 1)
+    write_v1(pdir, "S02", "papers/beta-paper.PDF", BODY_B, 2)
+    return pdir
+
+
+def test_migrate_adds_v2_fields_in_canonical_order(ws):
+    pdir = v1_project(ws)
+    res = run_migrate("demo")
+    assert res["ok"] is True
+    assert [e["action"] for e in res["events"]] == ["replaced", "replaced"]
+    for fname, stype in (("S01-source.md", "report"), ("S02-source.md", "pdf")):
+        fm, _body = parse_frontmatter(
+            (pdir / "sources" / fname).read_text(encoding="utf-8"), fname
+        )
+        assert list(fm) == [*FRONTMATTER_FIELDS, *V2_FIELDS]
+        assert fm["schema_version"] == SCHEMA_VERSION
+        assert fm["source_type"] == stype
+        assert fm["url"] is None
+        assert fm["resolved_from"] is None
+        assert "transcript_status" not in fm
+    assert res["events"][0]["source_type"] == "report"
+    assert res["events"][1]["source_type"] == "pdf"
+
+
+def test_v1_values_preserved(ws):
+    pdir = v1_project(ws)
+    before, _ = parse_frontmatter(
+        (pdir / "sources" / "S01-source.md").read_text(encoding="utf-8"), "S01"
+    )
+    run_migrate("demo")
+    after, _ = parse_frontmatter(
+        (pdir / "sources" / "S01-source.md").read_text(encoding="utf-8"), "S01"
+    )
+    for field in FRONTMATTER_FIELDS:
+        assert after[field] == before[field]
+
+
+def test_body_untouched_and_hash_still_valid(ws):
+    pdir = v1_project(ws)
+    res = run_migrate("demo")
+    raw = (pdir / "sources" / "S01-source.md").read_bytes()
+    assert raw.endswith(BODY_A.encode("utf-8"))
+    _fm, body = parse_frontmatter(raw.decode("utf-8"), "S01")
+    assert body == BODY_A
+    gate = res["gate"]
+    assert gate["status"] == "PASS"
+    assert gate["warnings"] == []
+    assert (pdir / "gates" / "01-intake.yaml").is_file()
+
+
+def test_messages_verbatim(ws):
+    v1_project(ws)
+    res = run_migrate("demo")
+    assert res["messages"] == [
+        "S01: upgraded to schema v2 (source_type report)",
+        "S02: upgraded to schema v2 (source_type pdf)",
+        "Frontmatter has changed, so the existing seal no longer matches these files.",
+        "The sealed git tag still pins the old state.",
+        "When you are ready, re-seal with: resynth audit demo then resynth seal demo",
+        "gate 01-intake: PASS",
+    ]
+
+
+def test_idempotent_second_run(ws):
+    pdir = v1_project(ws)
+    run_migrate("demo")
+    sources = sorted((pdir / "sources").glob("S*.md"))
+    mtimes = {f.name: f.stat().st_mtime_ns for f in sources}
+    before = snapshot(pdir / "sources", pdir / "gates")
+    res = run_migrate("demo")
+    assert res["ok"] is True
+    assert [e["action"] for e in res["events"]] == ["unchanged", "unchanged"]
+    assert res["messages"] == [
+        "S01: already schema v2",
+        "S02: already schema v2",
+        "gate 01-intake: PASS",
+    ]
+    assert snapshot(pdir / "sources", pdir / "gates") == before
+    for f in sources:
+        assert f.stat().st_mtime_ns == mtimes[f.name]
+
+
+def test_dry_run_writes_nothing(ws):
+    pdir = v1_project(ws)
+    before = snapshot(pdir)
+    res = run_migrate("demo", dry_run=True)
+    assert res["ok"] is True
+    assert res["gate"] is None
+    assert [e["action"] for e in res["events"]] == ["dry-run", "dry-run"]
+    assert snapshot(pdir) == before
+
+
+def test_mixed_project_migrates_only_v1(ws):
+    run_init("demo")
+    pdir = config.project_dir("demo")
+    write_v1(pdir, "S01", "notes/alpha-report.md", BODY_A, 1)
+    register_source(pdir, BODY_B, title="Beta Paper", origin="papers/beta-paper.md")
+    v2_file = next(f for f in (pdir / "sources").glob("S02*.md"))
+    v2_before = v2_file.read_bytes()
+    res = run_migrate("demo")
+    events = {e["source"]: e["action"] for e in res["events"]}
+    assert events["S01"] == "replaced"
+    assert events["S02"] == "unchanged"
+    assert v2_file.read_bytes() == v2_before
+    fm, _ = parse_frontmatter(
+        (pdir / "sources" / "S01-source.md").read_text(encoding="utf-8"), "S01"
+    )
+    assert fm["schema_version"] == SCHEMA_VERSION
+    assert res["gate"]["status"] == "PASS"
+    assert "S02: already schema v2" in res["messages"]
+
+
+def test_empty_project_raises(ws):
+    run_init("demo")
+    with pytest.raises(ResynthError, match="no sources to migrate"):
+        run_migrate("demo")
diff --git a/tests/test_reconcile.py b/tests/test_reconcile.py
index 5d8d7c3..3dbc18e 100644
--- a/tests/test_reconcile.py
+++ b/tests/test_reconcile.py
@@ -23,6 +23,30 @@ def test_workspace_and_candidates(ws):
     assert {"S01-C001", "S02-C001"} in pairs
 
 
+def test_claims_index_carries_locator_hint(ws):
+    pdir = to_extracted()
+    path = pdir / "claims" / "S03-claims.jsonl"
+    claim = {
+        "claim_id": "S03-C099",
+        "source_id": "S03",
+        "claim_text": "Deep linked claim",
+        "claim_type": "fact",
+        "topic_tags": ["locator-test"],
+        "supporting_quote_location": "Somewhere",
+        "confidence_as_stated": "unstated",
+        "depends_on": [],
+        "source_locator": {"timestamp": "00:14:32", "page": 12},
+    }
+    path.write_text(
+        path.read_text(encoding="utf-8") + json.dumps(claim) + "\n", encoding="utf-8"
+    )
+    run_reconcile("demo")
+    index = (pdir / "index" / "claims-index.md").read_text(encoding="utf-8")
+    line = next(row for row in index.splitlines() if "S03-C099" in row)
+    assert " @ 00:14:32" in line
+    assert line.index("@ 00:14:32") < line.index("Deep linked claim")
+
+
 def test_gate_fails_until_decisions_written(ws):
     to_extracted()
     result = run_reconcile("demo")
diff --git a/tests/test_resolve.py b/tests/test_resolve.py
new file mode 100644
index 0000000..32ff40c
--- /dev/null
+++ b/tests/test_resolve.py
@@ -0,0 +1,565 @@
+import io
+import shutil
+import urllib.error
+from pathlib import Path
+from urllib.parse import quote
+
+import pytest
+
+from helpers import snapshot
+
+from resynth import config
+from resynth.errors import ResynthError
+from resynth.fsutil import iter_jsonl, parse_frontmatter, sha256_text
+from resynth.intake import run_intake
+from resynth.project import run_init
+from resynth.resolve import manifest_path, preview_targets, run_resolve
+from resynth.resolve import net
+from resynth.resolve.discover import discover_targets
+from resynth.resolve.fetchers import (
+    classify_target,
+    fetch_local,
+    fetch_url,
+    fetch_vimeo,
+    fetch_youtube,
+)
+from resynth.resolve.net import FetchError
+from resynth.resolve.reduce_html import extract_title, reduce_html
+
+FIX = Path(__file__).parent / "fixtures" / "resolve"
+
+ARTICLE_URL = "https://example-articles.test/guide"
+COPY_URL = "https://example-articles.test/guide-copy"
+VIMEO_URL = "https://vimeo.com/123456"
+YT_URL = "https://www.youtube.com/watch?v=abc123XYZ"
+BLOCKED_URL = "https://blocked.test/secret"
+
+YT_OEMBED = f"https://www.youtube.com/oembed?url={quote(YT_URL, safe='')}&format=json"
+YT_LIST = "https://www.youtube.com/api/timedtext?type=list&v=abc123XYZ"
+YT_TRACK = "https://www.youtube.com/api/timedtext?lang=en&v=abc123XYZ"
+VIMEO_OEMBED = f"https://vimeo.com/api/oembed.json?url={quote(VIMEO_URL, safe='')}"
+VIMEO_CONFIG = "https://player.vimeo.com/video/123456/config"
+VIMEO_VTT = "https://player.vimeo.com/texttrack/123.vtt?token=x"
+
+
+class _Headers:
+    def __init__(self, ctype):
+        self.ctype = ctype
+
+    def get(self, name, default=None):
+        return self.ctype if name.lower() == "content-type" else default
+
+
+class _Resp:
+    def __init__(self, url, status, ctype, body):
+        self.url = url
+        self.status = status
+        self.headers = _Headers(ctype)
+        self._body = body
+
+    def read(self, n=-1):
+        return self._body if n is None or n < 0 else self._body[:n]
+
+    def geturl(self):
+        return self.url
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *exc):
+        return False
+
+
+class FakeNet:
+    """Maps url -> (status, content_type, bytes); an unmapped url fails an assertion."""
+
+    def __init__(self, mapping):
+        self.mapping = dict(mapping)
+        self.calls = []
+
+    def __call__(self, req, timeout=None, **kwargs):
+        url = getattr(req, "full_url", req)
+        self.calls.append(url)
+        assert url in self.mapping, f"unmapped url: {url}"
+        status, ctype, body = self.mapping[url]
+        if status >= 400:
+            raise urllib.error.HTTPError(url, status, "error", None, io.BytesIO(b""))
+        return _Resp(url, status, ctype, body)
+
+
+def robots_404(host):
+    return (f"https://{host}/robots.txt", (404, "text/plain", b""))
+
+
+def base_mapping():
+    return dict(
+        [
+            robots_404("example-articles.test"),
+            robots_404("www.youtube.com"),
+            robots_404("vimeo.com"),
+            robots_404("player.vimeo.com"),
+            (
+                "https://blocked.test/robots.txt",
+                (200, "text/plain", (FIX / "robots_disallow.txt").read_bytes()),
+            ),
+            (ARTICLE_URL, (200, "text/html; charset=utf-8", (FIX / "article.html").read_bytes())),
+            (COPY_URL, (200, "text/html; charset=utf-8", (FIX / "article.html").read_bytes())),
+            (YT_OEMBED, (200, "application/json", (FIX / "youtube_oembed.json").read_bytes())),
+            (YT_LIST, (200, "text/xml", b"<transcript_list></transcript_list>")),
+            (VIMEO_OEMBED, (200, "application/json", (FIX / "vimeo_oembed.json").read_bytes())),
+            (VIMEO_CONFIG, (404, "application/json", b"")),
+        ]
+    )
+
+
+def use_net(monkeypatch, mapping):
+    fake = FakeNet(mapping)
+    monkeypatch.setattr(net, "urlopen", fake)
+    return fake
+
+
+@pytest.fixture(autouse=True)
+def _clean_net(monkeypatch):
+    monkeypatch.setattr(net, "_robots", {})
+    monkeypatch.setattr(net, "_last_hit", {})
+    monkeypatch.setattr(net, "sleep", lambda _s: None)
+
+
+def make_links_project(ws):
+    srcdir = ws / "incoming"
+    srcdir.mkdir()
+    notes = srcdir / "notes-with-links.md"
+    shutil.copy(FIX / "notes-with-links.md", notes)
+    (srcdir / "extra-notes.md").write_text(
+        "# Extra notes\n\nLocal supporting notes for the resolve test suite.\n",
+        encoding="utf-8",
+    )
+    run_init("links")
+    run_intake("links", [str(notes)])
+    return config.project_dir("links")
+
+
+def local_target(ws):
+    return str((ws / "incoming" / "extra-notes.md").resolve())
+
+
+def load_manifest(pdir):
+    return {
+        rec["target"]: rec for _n, _raw, rec, _err in iter_jsonl(manifest_path(pdir)) if rec
+    }
+
+
+# --- classify ---------------------------------------------------------------
+
+
+@pytest.mark.parametrize(
+    ("raw", "kind"),
+    [
+        ("https://www.youtube.com/watch?v=abc", "youtube"),
+        ("https://youtu.be/abc", "youtube"),
+        ("https://m.youtube.com/shorts/abc", "youtube"),
+        ("https://vimeo.com/123456", "vimeo"),
+        ("https://player.vimeo.com/video/123456", "vimeo"),
+        ("https://example.com/page", "url"),
+        ("http://example.com/page.pdf", "url"),
+    ],
+)
+def test_classify_target_urls(raw, kind):
+    assert classify_target(raw) == kind
+
+
+def test_classify_target_local(tmp_path):
+    f = tmp_path / "doc.md"
+    f.write_text("x", encoding="utf-8")
+    assert classify_target(str(f)) == "local"
+    assert classify_target(str(tmp_path / "missing.md")) == "url"
+
+
+# --- reducer ----------------------------------------------------------------
+
+
+def test_reduce_html_main_only_and_block_forms():
+    html = (FIX / "article.html").read_text(encoding="utf-8")
+    body, title = reduce_html(html, ARTICLE_URL)
+    assert title == "Field Guide to Widgets"
+    assert "# Field Guide to Widgets" in body
+    assert "## Setup" in body
+    assert "- First step: inventory existing widgets" in body
+    assert "> Quoted wisdom about widgets from the maintainers." in body
+    assert "```\nwidgetctl install --all\n```" in body
+    assert "Name | Status" in body
+    assert "Alpha | stable" in body
+    assert "the spec (https://example-articles.test/spec.html)" in body
+    assert "manual (https://example-articles.test/manual)" in body
+    for noise in (
+        "Site navigation",
+        "Site header",
+        "Footer copyright",
+        "Related links",
+        "scriptVar",
+        "Outside main",
+    ):
+        assert noise not in body
+
+
+def test_extract_title_precedence():
+    og = (
+        '<html><head><meta property="og:title" content="OG Title">'
+        "<title>Doc Title</title></head><body><h1>H1 Title</h1></body></html>"
+    )
+    assert extract_title(og) == "OG Title"
+    titled = "<html><head><title>Doc Title</title></head><body><h1>H1 Title</h1></body></html>"
+    assert extract_title(titled) == "Doc Title"
+    assert extract_title("<html><body><h1>H1 Title</h1></body></html>") == "H1 Title"
+    assert extract_title("<html><body><p>nothing</p></body></html>") is None
+
+
+# --- net --------------------------------------------------------------------
+
+
+def test_robots_disallow_blocks_fetch(monkeypatch):
+    use_net(
+        monkeypatch,
+        dict([("https://blocked.test/robots.txt", (200, "text/plain", (FIX / "robots_disallow.txt").read_bytes()))]),
+    )
+    with pytest.raises(FetchError, match="robots"):
+        net.http_get(BLOCKED_URL)
+
+
+def test_size_cap(monkeypatch):
+    big = b"x" * (net.MAX_BYTES + 1)
+    use_net(
+        monkeypatch,
+        dict([robots_404("big.test"), ("https://big.test/file", (200, "text/html", big))]),
+    )
+    with pytest.raises(FetchError, match="10 MiB"):
+        net.http_get("https://big.test/file")
+
+
+def test_rate_limit_sleeps_between_same_host_requests(monkeypatch):
+    use_net(
+        monkeypatch,
+        dict(
+            [
+                robots_404("rate.test"),
+                robots_404("other.test"),
+                ("https://rate.test/a", (200, "text/plain", b"a")),
+                ("https://rate.test/b", (200, "text/plain", b"b")),
+                ("https://other.test/c", (200, "text/plain", b"c")),
+            ]
+        ),
+    )
+    naps = []
+    monkeypatch.setattr(net, "sleep", naps.append)
+    net.http_get("https://rate.test/a")
+    net.http_get("https://rate.test/b")
+    assert len(naps) == 1
+    assert 0 < naps[0] <= net.HOST_DELAY
+    net.http_get("https://other.test/c")
+    assert len(naps) == 1
+
+
+# --- fetchers ---------------------------------------------------------------
+
+
+def test_fetch_local(tmp_path):
+    f = tmp_path / "extra.md"
+    f.write_text("# Local Title\n\nBody text.\n", encoding="utf-8")
+    doc = fetch_local(str(f))
+    assert doc["title"] == "Local Title"
+    assert doc["source_type"] == "notes"
+    assert doc["url"] is None
+    assert doc["origin"] == str(f)
+    assert doc["transcript_status"] is None
+    assert "Body text." in doc["body_markdown"]
+    plain = tmp_path / "plain.txt"
+    plain.write_text("no heading here\n", encoding="utf-8")
+    assert fetch_local(str(plain))["title"] == "plain"
+
+
+def test_fetch_url_article(monkeypatch):
+    use_net(monkeypatch, base_mapping())
+    doc = fetch_url(ARTICLE_URL)
+    assert doc["source_type"] == "html-article"
+    assert doc["title"] == "Field Guide to Widgets"
+    assert doc["url"] == ARTICLE_URL
+    assert doc["origin"] == ARTICLE_URL
+    assert "# Field Guide to Widgets" in doc["body_markdown"]
+
+
+def test_fetch_url_no_extractable_text(monkeypatch):
+    tiny = b"<html><body><main><p>too short</p></main></body></html>"
+    use_net(
+        monkeypatch,
+        dict([robots_404("tiny.test"), ("https://tiny.test/p", (200, "text/html", tiny))]),
+    )
+    with pytest.raises(FetchError, match="no extractable text"):
+        fetch_url("https://tiny.test/p")
+
+
+def test_fetch_url_unsupported_content_type(monkeypatch):
+    use_net(
+        monkeypatch,
+        dict([robots_404("img.test"), ("https://img.test/x", (200, "image/png", b"\x89PNG"))]),
+    )
+    with pytest.raises(FetchError, match="unsupported content type"):
+        fetch_url("https://img.test/x")
+
+
+def test_fetch_youtube_happy_path(monkeypatch):
+    mapping = base_mapping()
+    mapping[YT_LIST] = (200, "text/xml", (FIX / "youtube_timedtext_list.xml").read_bytes())
+    mapping[YT_TRACK] = (200, "text/xml", (FIX / "youtube_timedtext_en.xml").read_bytes())
+    use_net(monkeypatch, mapping)
+    doc = fetch_youtube(YT_URL)
+    assert doc["transcript_status"] == "fetched"
+    assert doc["source_type"] == "video-transcript"
+    assert doc["title"] == "Deep Dive Video"
+    assert doc["author_or_tool"] == "Chan Academy"
+    body = doc["body_markdown"]
+    assert body.startswith("# Deep Dive Video\n\n## Transcript\n")
+    assert "[00:00:00] Welcome to the deep dive." in body
+    assert "[00:00:04] Today we cover widgets & gadgets." in body
+    # gap over 8 seconds starts a new paragraph
+    assert "gadgets.\n\n[00:00:20] After a long pause we resume." in body
+
+
+def test_fetch_youtube_no_captions_yields_pending_stub(monkeypatch):
+    use_net(monkeypatch, base_mapping())
+    doc = fetch_youtube(YT_URL)
+    assert doc["transcript_status"] == "pending"
+    assert doc["title"] == "Deep Dive Video"
+    body = doc["body_markdown"]
+    assert body.startswith("# Deep Dive Video\n")
+    assert "> [!info] Video transcript pending" in body
+    assert f"> Link: {YT_URL}" in body
+
+
+def test_fetch_vimeo_happy_path(monkeypatch):
+    mapping = base_mapping()
+    mapping[VIMEO_CONFIG] = (200, "application/json", (FIX / "vimeo_config.json").read_bytes())
+    mapping[VIMEO_VTT] = (200, "text/vtt", (FIX / "vimeo_track.vtt").read_bytes())
+    use_net(monkeypatch, mapping)
+    doc = fetch_vimeo(VIMEO_URL)
+    assert doc["transcript_status"] == "fetched"
+    assert doc["title"] == "Vimeo Talk"
+    assert doc["author_or_tool"] == "Speaker Person"
+    assert doc["date_authored"] == "2024-05-01"
+    body = doc["body_markdown"]
+    assert "[00:00:00] Welcome to the talk." in body
+    assert "[00:00:04] We discuss widgets at length." in body
+    assert "length.\n\n[00:00:20] Closing remarks after a pause." in body
+
+
+def test_fetch_vimeo_no_captions_yields_pending_stub(monkeypatch):
+    use_net(monkeypatch, base_mapping())
+    doc = fetch_vimeo(VIMEO_URL)
+    assert doc["transcript_status"] == "pending"
+    assert "> [!info] Video transcript pending" in doc["body_markdown"]
+    assert f"> Link: {VIMEO_URL}" in doc["body_markdown"]
+
+
+# --- discovery --------------------------------------------------------------
+
+
+def test_discover_targets_urls_and_punctuation():
+    body = (
+        "see https://example.com/x, then <https://example.com/y> and\n"
+        "[wiki](https://example.com/page_(1)) plus https://example.com/a#frag.\n"
+        "repeat https://example.com/x once more\n"
+    )
+    raws = [t["raw"] for t in discover_targets(body, "")]
+    assert raws == [
+        "https://example.com/x",
+        "https://example.com/y",
+        "https://example.com/page_(1)",
+        "https://example.com/a#frag",
+    ]
+
+
+def test_discover_targets_local_paths(tmp_path):
+    extra = tmp_path / "extra.md"
+    extra.write_text("x", encoding="utf-8")
+    origin = str(tmp_path / "notes.md")
+    body = f"see [extra](extra.md) and `missing.md` and `{extra}`\nplain extra.md mention\n"
+    targets = discover_targets(body, origin)
+    assert targets == [{"raw": str(extra.resolve()), "kind": "local"}]
+
+
+def test_preview_targets(ws):
+    make_links_project(ws)
+    targets = preview_targets("links")
+    assert [t["raw"] for t in targets] == [
+        ARTICLE_URL,
+        COPY_URL,
+        VIMEO_URL,
+        YT_URL,
+        BLOCKED_URL,
+        local_target(ws),
+    ]
+    assert [t["kind"] for t in targets] == ["url", "url", "vimeo", "youtube", "url", "local"]
+    assert all(t["parent"] == "S01" for t in targets)
+
+
+# --- run_resolve ------------------------------------------------------------
+
+
+def test_run_resolve_requires_project_and_sources(ws):
+    with pytest.raises(ResynthError, match="not found"):
+        run_resolve("nope")
+    run_init("empty")
+    with pytest.raises(ResynthError, match="no sources"):
+        run_resolve("empty")
+
+
+def test_run_resolve_unknown_source_id(ws):
+    make_links_project(ws)
+    with pytest.raises(ResynthError, match="unknown source"):
+        run_resolve("links", source_ids=["S99"])
+
+
+def test_run_resolve_integration(ws, monkeypatch):
+    pdir = make_links_project(ws)
+    use_net(monkeypatch, base_mapping())
+    result = run_resolve("links")
+    assert result["ok"] is True
+    assert result["gate"]["status"] == "PASS"
+    assert result["counts"] == {
+        "fetched": 2,
+        "cached": 0,
+        "duplicate": 1,
+        "transcript_pending": 2,
+        "failed": 1,
+    }
+    files = sorted(f.name for f in (pdir / "sources").glob("S*.md"))
+    assert len(files) == 5
+
+    article = next((pdir / "sources").glob("S02-*.md"))
+    fm, body = parse_frontmatter(article.read_text(encoding="utf-8"), article.name)
+    assert fm["schema_version"] == 2
+    assert fm["source_type"] == "html-article"
+    assert fm["url"] == ARTICLE_URL
+    assert fm["resolved_from"] == "S01"
+    assert fm["sha256"] == sha256_text(body)
+    assert "transcript_status" not in fm
+
+    vimeo = next((pdir / "sources").glob("S03-*.md"))
+    fm_v, body_v = parse_frontmatter(vimeo.read_text(encoding="utf-8"), vimeo.name)
+    assert fm_v["source_type"] == "video-transcript"
+    assert fm_v["transcript_status"] == "pending"
+    assert "> [!info] Video transcript pending" in body_v
+
+    local = next((pdir / "sources").glob("S05-*.md"))
+    fm_l, _body_l = parse_frontmatter(local.read_text(encoding="utf-8"), local.name)
+    assert fm_l["source_type"] == "notes"
+    assert fm_l["url"] is None
+    assert fm_l["resolved_from"] == "S01"
+
+    recs = load_manifest(pdir)
+    assert manifest_path(pdir).read_text(encoding="utf-8").startswith("#")
+    assert len(recs) == 6
+    assert recs[ARTICLE_URL]["status"] == "fetched"
+    assert recs[ARTICLE_URL]["source_id"] == "S02"
+    assert recs[COPY_URL] == {**recs[COPY_URL], "status": "duplicate", "source_id": "S02"}
+    assert recs[VIMEO_URL]["status"] == "transcript_pending"
+    assert recs[YT_URL]["status"] == "transcript_pending"
+    assert recs[BLOCKED_URL]["status"] == "failed"
+    assert recs[BLOCKED_URL]["note"] == "disallowed by robots.txt"
+    assert recs[BLOCKED_URL]["source_id"] is None
+    assert recs[local_target(ws)]["status"] == "fetched"
+    assert all(r["resolved_from"] == "S01" for r in recs.values())
+
+    msgs = result["messages"]
+    assert f"{ARTICLE_URL}: fetched as S02 (html-article)" in msgs
+    assert f"{COPY_URL}: duplicate of S02" in msgs
+    assert f"{VIMEO_URL}: transcript pending, stub created as S03" in msgs
+    assert f"{YT_URL}: transcript pending, stub created as S04" in msgs
+    assert f"{BLOCKED_URL}: failed (disallowed by robots.txt)" in msgs
+    assert f"{local_target(ws)}: fetched as S05 (notes)" in msgs
+    assert msgs[-1] == "gate 01-intake: PASS"
+
+
+def test_run_resolve_second_run_is_idempotent(ws, monkeypatch):
+    make_links_project(ws)
+    use_net(monkeypatch, base_mapping())
+    run_resolve("links")
+    before = snapshot(ws)
+    result = run_resolve("links")
+    assert snapshot(ws) == before
+    # fetched and duplicate targets are cached; pending and failed retry
+    assert result["counts"] == {
+        "fetched": 0,
+        "cached": 3,
+        "duplicate": 0,
+        "transcript_pending": 2,
+        "failed": 1,
+    }
+    msgs = result["messages"]
+    assert f"{ARTICLE_URL}: cached (S02)" in msgs
+    assert f"{COPY_URL}: cached (S02)" in msgs
+    assert f"{local_target(ws)}: cached (S05)" in msgs
+
+
+def test_transcript_upgrade_in_place(ws, monkeypatch):
+    pdir = make_links_project(ws)
+    fake = use_net(monkeypatch, base_mapping())
+    run_resolve("links")
+    stub = next((pdir / "sources").glob("S03-*.md"))
+    fm_before, _ = parse_frontmatter(stub.read_text(encoding="utf-8"), stub.name)
+    fake.mapping[VIMEO_CONFIG] = (
+        200,
+        "application/json",
+        (FIX / "vimeo_config.json").read_bytes(),
+    )
+    fake.mapping[VIMEO_VTT] = (200, "text/vtt", (FIX / "vimeo_track.vtt").read_bytes())
+    result = run_resolve("links")
+    assert result["counts"]["fetched"] == 1
+    assert result["counts"]["cached"] == 3
+    assert result["counts"]["transcript_pending"] == 1
+    assert f"{VIMEO_URL}: fetched as S03 (video-transcript)" in result["messages"]
+
+    upgraded = next((pdir / "sources").glob("S03-*.md"))
+    assert upgraded.name == stub.name
+    fm, body = parse_frontmatter(upgraded.read_text(encoding="utf-8"), upgraded.name)
+    assert fm["source_id"] == "S03"
+    assert fm["transcript_status"] == "fetched"
+    assert fm["sha256"] == sha256_text(body)
+    assert fm["recency_rank"] == fm_before["recency_rank"]
+    assert fm["date_ingested"] == fm_before["date_ingested"]
+    assert fm["resolved_from"] == "S01"
+    assert "## Transcript" in body
+    assert "[00:00:00] Welcome to the talk." in body
+    recs = load_manifest(pdir)
+    assert recs[VIMEO_URL]["status"] == "fetched"
+    assert recs[VIMEO_URL]["source_id"] == "S03"
+    assert len(list((pdir / "sources").glob("S*.md"))) == 5
+
+
+def test_run_resolve_dry_run_writes_nothing(ws, monkeypatch):
+    pdir = make_links_project(ws)
+    fake = use_net(monkeypatch, {})
+    before = snapshot(ws)
+    result = run_resolve("links", dry_run=True)
+    assert snapshot(ws) == before
+    assert fake.calls == []
+    assert not manifest_path(pdir).exists()
+    would = [m for m in result["messages"] if "would fetch" in m]
+    assert len(would) == 6
+    assert f"{VIMEO_URL}: would fetch (vimeo)" in result["messages"]
+    assert f"{YT_URL}: would fetch (youtube)" in result["messages"]
+    assert f"{local_target(ws)}: would fetch (local)" in result["messages"]
+
+
+def test_run_resolve_only_filter(ws, monkeypatch):
+    pdir = make_links_project(ws)
+    use_net(monkeypatch, base_mapping())
+    result = run_resolve("links", only="vimeo")
+    assert result["counts"] == {
+        "fetched": 0,
+        "cached": 0,
+        "duplicate": 0,
+        "transcript_pending": 1,
+        "failed": 0,
+    }
+    recs = load_manifest(pdir)
+    assert list(recs) == [VIMEO_URL]
diff --git a/tests/test_synthesis.py b/tests/test_synthesis.py
index 9730b8d..26057aa 100644
--- a/tests/test_synthesis.py
+++ b/tests/test_synthesis.py
@@ -18,6 +18,11 @@ def test_scaffold_generation(ws):
     assert "## Appendix: Source Register" in text
     assert "[!todo]" in text
     assert "[S01-C001, S02-C001]" in text
+    assert "| Source | Title | Type | Authority | Authored | Link | Content hash |" in text
+    row = next(line for line in text.splitlines() if line.startswith("| S01 |"))
+    cells = [c.strip() for c in row.strip("|").split("|")]
+    assert cells[2] == "report", "Type cell carries source_type"
+    assert cells[5] == "-", "Link cell renders a dash when url is absent"
 
 
 def test_full_synthesis_passes(ws):
diff --git a/tests/test_wizard.py b/tests/test_wizard.py
index 6feb4a3..999f53d 100644
--- a/tests/test_wizard.py
+++ b/tests/test_wizard.py
@@ -49,3 +49,19 @@ def test_state_done_after_seal(ws):
     pdir = run_full()
     run_brief("demo", "topic")
     assert project_state(pdir) == "done"
+
+
+def test_cli_version_and_new_commands(ws):
+    from click.testing import CliRunner
+
+    from resynth import __version__
+    from resynth.cli import main
+
+    runner = CliRunner()
+    res = runner.invoke(main, ["--version"])
+    assert res.exit_code == 0
+    assert f"resynth, version {__version__}" in res.output
+    assert "bye" not in res.output
+    for cmd in ("resolve", "migrate"):
+        res = runner.invoke(main, [cmd, "--help"])
+        assert res.exit_code == 0, res.output