feat: add lighthtml output format for minimal-weight HTML extraction#30

Merged
simonediroma merged 1 commit into main from claude/lightweight-html-extraction-5HF9Y on Apr 7, 2026

Conversation

@simonediroma (Owner)

New output_format="lighthtml" strips <style> tags, all <script> tags
except JSON-LD, HTML comments, and all tag attributes (inline style,
class, id, data-*, etc.). The result is the bare HTML structure with
text content preserved. JSON-LD <script> blocks are kept, along with
their type attribute.

https://claude.ai/code/session_015LSkMsBv6F16qSHkWgKo4A
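
The stripping rules described above can be illustrated with a minimal sketch built on Python's standard-library html.parser. This is not the code merged in this PR: the names LightHTMLSketch and to_lighthtml_sketch are hypothetical, and it assumes that "JSON-LD" means a <script> whose type is application/ld+json and that stripped <style>/<script> elements are dropped together with their contents.

```python
from html import escape
from html.parser import HTMLParser

JSON_LD = "application/ld+json"  # assumed marker for JSON-LD script blocks


class LightHTMLSketch(HTMLParser):
    """Hypothetical helper: re-emits tags without attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []           # output fragments, joined at the end
        self.skip = None        # "script" or "style" while dropping its content
        self.in_jsonld = False  # True while inside a kept JSON-LD block

    def handle_starttag(self, tag, attrs):
        if self.skip:
            return
        if tag == "style":
            self.skip = tag
        elif tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == JSON_LD:
                # Keep the block and only its type attribute.
                self.in_jsonld = True
                self.out.append(f'<script type="{JSON_LD}">')
            else:
                self.skip = tag
        else:
            self.out.append(f"<{tag}>")  # all attributes are dropped

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit one bare tag, no end tag.
        if not self.skip and tag not in ("script", "style"):
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.skip:
            if tag == self.skip:
                self.skip = None
            return
        if tag == "script":
            self.in_jsonld = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip:
            return
        # JSON-LD arrives as raw text; ordinary text was unescaped by the
        # parser, so it is re-escaped before being written back out.
        self.out.append(data if self.in_jsonld else escape(data, quote=False))

    # Comments are dropped because handle_comment is not overridden.


def to_lighthtml_sketch(html_text):
    parser = LightHTMLSketch()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(to_lighthtml_sketch(
        '<div class="card"><!-- note --><style>p{color:red}</style>'
        '<p style="margin:0">Tom &amp; Jerry</p></div>'
    ))
```

A streaming parser is used here so that text nodes pass through in document order without building a tree. The actual implementation may well use a different parser or handle edge cases (doctype, malformed markup) differently.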

@simonediroma simonediroma merged commit 0332936 into main Apr 7, 2026
0 of 2 checks passed