Auto-detect Apple MPS device by chonknick · Pull Request #17 · feyninc/pulpie

chonknick · 2026-06-30T06:10:54Z

When called with no device, Extractor() and Pipeline() previously checked only for CUDA and fell straight back to CPU on Macs, even when MPS was available. This PR adds default_device() (cuda → mps → cpu) and uses it in both. Verified: Extractor() now lands on mps on Apple Silicon and produces output identical to CPU; 65 parity tests pass.

- Convert the emoji feature list to plain markdown bullets.
- Qualify the "20x faster/cheaper" claim as "on an L4 GPU" (it's 20.1x on L4 and 7.1x on A100 per the blog) so it doesn't read as universal.
- Remove all em/en dashes from prose per style preference.
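As a quick check on the qualified claim, the per-billion-page cost figures in the README's removed "Cheap" bullet (~$7,900 vs ~$159,000) reproduce the 20.1x L4 number. The GPU-hours line is a separate back-of-envelope estimate that assumes one L4 at a constant 13.7 pages/sec. It is not taken from the blog's cost model, and the function names are illustrative.

```python
def ratio(baseline: float, candidate: float) -> float:
    """How many times larger `baseline` is than `candidate`."""
    return baseline / candidate


def gpu_hours(pages: int, pages_per_sec: float) -> float:
    """Hours for a single GPU to process `pages` at a constant throughput."""
    return pages / pages_per_sec / 3600


# README figures per 1 billion pages: ~$159,000 for the leading decoder vs ~$7,900.
print(f"cost ratio: {ratio(159_000, 7_900):.1f}x")  # cost ratio: 20.1x
# README throughput: 13.7 pages/sec on one L4.
print(f"single-L4 hours: {gpu_hours(1_000_000_000, 13.7):,.0f}")  # single-L4 hours: 20,276
```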

Add default_device() (cuda -> mps -> cpu) and use it in Extractor and Pipeline when device is unspecified. Previously device=None fell back straight to CPU on Macs even when MPS was available, silently leaving Apple acceleration unused.
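The parity check from the description (MPS output identical to CPU) can be expressed as a small device-agnostic harness. This is a sketch, not pulpie's test suite. `run(device, html)` is a hypothetical callable, and `parity_mismatches` is an illustrative name.

```python
from typing import Callable, Iterable


def parity_mismatches(
    run: Callable[[str, str], str],
    inputs: Iterable[str],
    reference: str = "cpu",
    candidate: str = "mps",
) -> list[int]:
    """Return indices of inputs whose output on `candidate` differs from `reference`.

    `run(device, html)` is a hypothetical callable that extracts `html` on `device`.
    """
    mismatches = []
    for i, html in enumerate(inputs):
        # Exact equality: the PR reports identical output, not merely close output.
        if run(reference, html) != run(candidate, html):
            mismatches.append(i)
    return mismatches
```

With pulpie, `run` could wrap `Extractor(device=...).extract(html)`, assuming `extract` returns comparable output. In practice you would build one extractor per device and reuse it, because reloading the model for every page is slow.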

Copilot

Pull request overview

Adds a shared device auto-detection helper so Extractor() and Pipeline() select Apple MPS on macOS when CUDA is unavailable, instead of falling straight to CPU.

Changes:

- Introduce default_device() in model_utils (CUDA → MPS → CPU).
- Use default_device() when Extractor(device=None) and Pipeline(devices=None) resolve their default device(s).
- Update README formatting/copy and attribution text.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/pulpie/model_utils.py` | Adds `default_device()` helper for consistent device selection. |
| `src/pulpie/extractor.py` | Switches default device selection to `default_device()`. |
| `src/pulpie/pipeline.py` | Uses `default_device()` when CUDA is unavailable and `devices` is not provided. |
| `README.md` | Reformats and rewrites copy; also changes footer attribution. |
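A sketch of how the two call sites might resolve their defaults, based on the table above. The function names are hypothetical. For `Pipeline`, one further assumption is that with CUDA present it fans out over every visible GPU (the README mentions multi-GPU scaling). The `default_device()` fallback only applies when CUDA is unavailable.

```python
from typing import Optional, Sequence


def resolve_extractor_device(device: Optional[str], fallback: str) -> str:
    """Extractor(device=None) case: an explicit device wins, else the auto-detected one."""
    return device if device is not None else fallback


def resolve_pipeline_devices(
    devices: Optional[Sequence[str]], cuda_count: int, fallback: str
) -> list[str]:
    """Pipeline(devices=None) case (assumed behavior, not pulpie's code)."""
    if devices is not None:
        return list(devices)
    if cuda_count > 0:
        # Assumption: use every visible CUDA GPU.
        return [f"cuda:{i}" for i in range(cuda_count)]
    # No CUDA: a single auto-detected device (mps on Apple Silicon, else cpu).
    return [fallback]
```

In real use, `fallback` would come from `default_device()` and `cuda_count` from `torch.cuda.device_count()`.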


+Pulpie extracts the main content from raw HTML, stripping navigation, ads, sidebars, and footers. It uses small encoder models that label every block in a single forward pass, approaching state-of-the-art extraction quality while running up to 20x faster and 20x cheaper than autoregressive extractors on an L4 GPU.

-**⚡ Fast** — an encoder labels every block in one forward pass (13.7 pages/sec on an L4) </br>
-**🎯 Accurate** — matches SOTA quality: 0.862–0.873 ROUGE-5 F1 on WebMainBench </br>
-**🪶 Small** — the recommended model is 210M params, fits on any GPU </br>
-**💸 Cheap** — clean 1 billion pages for ~$7,900 vs ~$159,000 for the leading decoder </br>
-**📦 Simple** — `pip install pulpie`, then `Extractor().extract(html)` </br>
-**🔌 Batched** — overlapped CPU+GPU pipeline scales across multiple GPUs </br>
+- **Fast.** An encoder labels every block in one forward pass (13.7 pages/sec on an L4).
+- **Accurate.** Matches state-of-the-art quality: 0.862 to 0.873 ROUGE-5 F1 on WebMainBench.
+- **Small.** The recommended model is 210M parameters and fits on any GPU.



 <div align="center">
-Built by <a href="https://github.com/chonkie-inc">Chonkie</a>, the open-source work behind <a href="https://usefeyn.com">Feyn</a>.
+Built by <a href="https://usefeyn.com">Feyn</a>.


chonknick added 4 commits June 29, 2026 21:54

README footer: built by Feyn

b1d55ad

README: static 'python 3.9+' badge instead of per-version list

7817f99

Auto-detect Apple MPS device

b2c6bd3


Copilot AI review requested due to automatic review settings June 30, 2026 06:10

Copilot started reviewing on behalf of chonknick June 30, 2026 06:11

chonknick merged commit 0cb0f5b into main Jun 30, 2026
8 checks passed

chonknick deleted the dx-mps-autodetect branch June 30, 2026 06:12

Copilot AI reviewed Jun 30, 2026


chonknick mentioned this pull request Jun 30, 2026

DX fixes: markdown cleanup, error propagation, extract_batch, dataclass #18

Merged

chonknick merged 4 commits into main from dx-mps-autodetect.