Add source-type aware extraction and avoid binary/text fallbacks in extracted.md

## Summary

`wiki_capture_source` would be more robust with a source-type aware extraction pipeline. Today, fallback behavior can write non-markdown content into `extracted.md`, including raw PDF bytes or raw XML.

## Problem

The packet contract says:

```text
raw/sources/SRC-*/extracted.md — normalized markdown text
```

But current behavior can produce:

- raw PDF bytes if MarkItDown times out and the `curl` fallback succeeds,
- raw XML for local `.xml` files,
- potentially raw HTML or other text-like formats without normalization.

## Suggested design

Use a typed extraction pipeline:

1. Identify source type:
   - URL extension
   - `Content-Type`
   - file extension
   - magic bytes
2. Route to appropriate extractor:
   - PDF → download original, run MarkItDown/PDF extractor
   - HTML → readability/markdown extraction
   - XML → XML-to-markdown or project-provided converter
   - Markdown/text → copy as-is
   - binary/unknown → write a clear extraction failure message
3. Record extraction metadata in manifest:
   - `extractor`
   - `extraction_status: success | failed | timeout | unsupported`
   - `content_type`
   - `original_file`
   - optional error message
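
Steps 1 and 3 could be sketched roughly as follows. This is a minimal illustration, not the tool's existing code: the names `detect_source_type` and `manifest_entry`, the type labels, and the `error` key are all hypothetical. The precedence shown (magic bytes, then `Content-Type`, then extension) is one possible ordering.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical type labels used only in this sketch.
PDF, HTML, XML, TEXT, UNKNOWN = "pdf", "html", "xml", "text", "unknown"

_BY_MIME = {
    "application/pdf": PDF,
    "text/html": HTML,
    "application/xhtml+xml": HTML,
    "application/xml": XML,
    "text/xml": XML,
    "text/markdown": TEXT,
    "text/plain": TEXT,
}
_BY_EXT = {".pdf": PDF, ".html": HTML, ".htm": HTML,
           ".xml": XML, ".md": TEXT, ".txt": TEXT}

# Status values taken from the list above.
STATUSES = {"success", "failed", "timeout", "unsupported"}


def detect_source_type(location: str, head: bytes = b"",
                       content_type: Optional[str] = None) -> str:
    """Guess the source type from magic bytes, Content-Type, then extension."""
    # Magic bytes are checked first because servers and file names can lie.
    if head.startswith(b"%PDF-"):
        return PDF
    # Skip a UTF-8 BOM and leading whitespace before sniffing markup.
    stripped = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if stripped.startswith(b"<?xml"):
        return XML
    if stripped.startswith((b"<!doctype html", b"<html")):
        return HTML
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in _BY_MIME:
            return _BY_MIME[mime]
    # Works for both URLs and plain file paths.
    ext = PurePosixPath(urlparse(location).path).suffix.lower()
    return _BY_EXT.get(ext, UNKNOWN)


def manifest_entry(extractor: Optional[str], status: str,
                   content_type: Optional[str], original_file: str,
                   error: Optional[str] = None) -> dict:
    """Build the extraction metadata record for the manifest."""
    if status not in STATUSES:
        raise ValueError(f"unknown extraction_status: {status!r}")
    entry = {
        "extractor": extractor,
        "extraction_status": status,
        "content_type": content_type,
        "original_file": original_file,
    }
    if error:
        entry["error"] = error  # key name is an assumption
    return entry
```

With this ordering, a PDF served with a misleading `text/html` header is still routed to the PDF extractor, which is the case that currently leaks raw bytes into `extracted.md`.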

## Expected behavior

`extracted.md` should always be human-readable markdown/text, or a clear extraction failure note. It should never contain binary bytes.
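
One way to enforce this is a final guard applied to whatever a fallback returns before it is written. The sketch below is illustrative only: `looks_binary`, `failure_note`, and `safe_extracted_md` are hypothetical names, the failure-note wording is made up, and the heuristic (NUL bytes or invalid UTF-8 in the first few kilobytes) is an assumption rather than anything the tool does today.

```python
import codecs


def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Heuristic: NUL bytes or invalid UTF-8 in the leading sample."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return True
    # An incremental decoder with final=False tolerates a multi-byte
    # character cut off at the sample boundary but rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def failure_note(source: str, reason: str) -> str:
    """Readable placeholder written instead of raw bytes (wording is made up)."""
    return (
        "# Extraction failed\n\n"
        f"- source: `{source}`\n"
        f"- reason: {reason}\n"
    )


def safe_extracted_md(data: bytes, source: str) -> str:
    """Return text that is safe to write to extracted.md."""
    if data.startswith(b"%PDF-") or looks_binary(data):
        return failure_note(source, "content is binary, not text")
    return data.decode("utf-8", errors="replace")
```

A guard like this only prevents binary output. Raw XML or HTML would still pass it, so type-aware routing is needed as well.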

## Optional extension point

A pluggable extractor interface would let projects register domain-specific converters for structured XML or other specialized source formats.
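
As a rough sketch of what such an interface might look like, a decorator-based registry keyed by source type would be enough. Everything here is hypothetical: `register_extractor`, `run_extractor`, the type labels, and the toy XML converter, which only stands in for a project-provided one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

Extractor = Callable[[bytes], str]
_REGISTRY: Dict[str, Extractor] = {}


def register_extractor(source_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers a converter for one source type."""
    def decorator(func: Extractor) -> Extractor:
        _REGISTRY[source_type] = func
        return func
    return decorator


@register_extractor("text")
def passthrough(data: bytes) -> str:
    # Markdown and plain text are copied as-is.
    return data.decode("utf-8", errors="replace")


@register_extractor("xml")
def xml_outline(data: bytes) -> str:
    # Toy converter: root tag as heading, direct children as bullets.
    root = ET.fromstring(data)
    lines = [f"# {root.tag}"]
    for child in root:
        text = (child.text or "").strip()
        lines.append(f"- **{child.tag}**: {text}")
    return "\n".join(lines) + "\n"


def run_extractor(source_type: str, data: bytes) -> Tuple[str, dict]:
    """Return (markdown, metadata); never raises and never returns raw bytes."""
    extractor = _REGISTRY.get(source_type)
    if extractor is None:
        note = f"> Extraction failed: no extractor for type `{source_type}`.\n"
        return note, {"extractor": None, "extraction_status": "unsupported"}
    try:
        text = extractor(data)
    except Exception as exc:  # converters are third-party code
        note = f"> Extraction failed in `{extractor.__name__}`: {exc}\n"
        return note, {
            "extractor": extractor.__name__,
            "extraction_status": "failed",
            "error": str(exc),
        }
    return text, {"extractor": extractor.__name__, "extraction_status": "success"}
```

A project could then register its own converter with `@register_extractor("xml")` to override the default, and the caller would merge the returned metadata into the manifest.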

