A script that extracts full content from X.com (Twitter) articles and converts them to markdown files. No AI summarization - preserves all original content.
Available in both Node.js (recommended) and Python versions.
✅ Full content extraction - No summarization, captures everything
✅ Automatic title detection - Uses article title for filename
✅ Preserves structure - Maintains headings, lists, and paragraphs
✅ Image extraction - Includes all images from the article
✅ Metadata capture - Author, timestamp, stats (likes, retweets, etc.)
✅ Browser automation - Handles dynamic content loading
Requirements: Node.js 16+ and npm
# Install dependencies
npm install
That's it! Puppeteer will automatically download the required browser.
Requirements: Python 3.7-3.13 (Note: Python 3.14 has compatibility issues)
# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install playwright
# Install browser
playwright install chromium
node x_to_markdown.js [-o output_dir] <x.com_url>
Example:
node x_to_markdown.js -o ./out_put https://x.com/bozhou_ai/status/2011738838767423983
python x_to_markdown.py [-o output_dir] <x.com_url>
Example:
python x_to_markdown.py -o ./out_put https://x.com/bozhou_ai/status/2011738838767423983
Both versions will create a markdown file named with an English-only, short, no-space slug (derived from the title; falls back to x_<handle>_<status_id> when needed) in $HOME/tmp by default. You can override this with -o <output_dir>. If the output directory does not exist, the script will exit with an error.
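The naming rule above can be illustrated with a short Python sketch. This is not the script's actual code: the function names, the underscore separator, and the 60-character cap are assumptions chosen for illustration; only the ASCII-only slug, the x_<handle>_<status_id> fallback, and the error on a missing directory come from the description above.

```python
import re
import unicodedata
from pathlib import Path
from urllib.parse import urlparse

MAX_SLUG_LEN = 60  # assumed cap; the scripts' real limit is not documented here


def fallback_name(url: str) -> str:
    """Build x_<handle>_<status_id> from a URL like https://x.com/<handle>/status/<id>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    handle = parts[0] if parts else "unknown"
    status_id = parts[2] if len(parts) >= 3 and parts[1] == "status" else "unknown"
    return f"x_{handle}_{status_id}"


def make_slug(title: str, url: str) -> str:
    """Return an ASCII-only, no-space slug, or the fallback when nothing usable remains."""
    # Decompose accented letters, then drop every remaining non-ASCII character.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse each run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_title).strip("_").lower()
    slug = slug[:MAX_SLUG_LEN].rstrip("_")
    return slug or fallback_name(url)


def output_path(title: str, url: str, output_dir: Path) -> Path:
    """Mirror the documented behaviour: fail when the output directory is missing."""
    if not output_dir.is_dir():
        raise FileNotFoundError(f"Output directory does not exist: {output_dir}")
    return output_dir / f"{make_slug(title, url)}.md"
```

A title with no ASCII letters or digits yields an empty slug, which is the case where the fallback name built from the URL takes over.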
The generated markdown file includes:
- Title (H1 heading)
- Metadata (Author, Date, Source URL, Stats)
- Full Content (All text, headings, lists, images)
Example output structure:
# Article Title
**Author:** Name (@handle)
**Date:** 2026-01-16
**Source:** [URL](URL)
**Stats:** 33 Replies | 105 Reposts | 471 Likes
---
## Section 1
Content here...

## Section 2
More content...
- Browser Automation: Uses Puppeteer (Node.js) or Playwright (Python) to load the X.com page
- Content Loading: Scrolls through the page to load all dynamic content
- Extraction: Uses JavaScript to extract text, headings, images, and metadata
- Conversion: Converts extracted content to properly formatted markdown
- File Creation: Saves as <title>.md in $HOME/tmp (errors if the directory is missing)
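As a rough illustration of the Conversion step, the sketch below assembles extracted fields into the output structure shown earlier. The Article dataclass and its field names are hypothetical, not taken from the scripts; the sketch also assumes the body has already been converted into markdown blocks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    # Hypothetical container for extracted data; field names are illustrative.
    title: str
    author_name: str
    handle: str
    date: str
    url: str
    replies: int = 0
    reposts: int = 0
    likes: int = 0
    # Each entry is one already-converted markdown block (heading, paragraph, list).
    body_blocks: List[str] = field(default_factory=list)


def render_markdown(article: Article) -> str:
    """Join the title, metadata lines, a separator, and the body blocks."""
    stats = (
        f"{article.replies} Replies | {article.reposts} Reposts | "
        f"{article.likes} Likes"
    )
    header = [
        f"# {article.title}",
        f"**Author:** {article.author_name} (@{article.handle})",
        f"**Date:** {article.date}",
        f"**Source:** [{article.url}]({article.url})",
        f"**Stats:** {stats}",
        "---",
    ]
    # Blank lines between blocks keep each one a separate markdown element.
    return "\n\n".join(header + article.body_blocks) + "\n"
```

Separating blocks with blank lines also keeps the --- line a horizontal rule rather than turning the preceding line into a heading.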
- Python 3.7-3.13
- Playwright
- Internet connection
Make sure you've installed Playwright browsers:
playwright install chromium
The page might be loading slowly. The script waits up to 30 seconds. If you have a slow connection, you can modify the timeout in the script.
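For the Python version, one way such a timeout could be made adjustable is sketched below using Playwright's synchronous API. The environment variable X_TO_MD_TIMEOUT_MS, the function names, and the scroll loop parameters are all assumptions for illustration; the actual script may structure its loading and waiting differently.

```python
import os

DEFAULT_TIMEOUT_MS = 30_000  # matches the 30-second wait described above


def resolve_timeout_ms(env=None) -> int:
    """Read an override from X_TO_MD_TIMEOUT_MS (a hypothetical variable name)."""
    env = os.environ if env is None else env
    raw = env.get("X_TO_MD_TIMEOUT_MS", "")
    try:
        value = int(raw)
    except ValueError:
        # Missing or non-numeric values fall back to the default.
        return DEFAULT_TIMEOUT_MS
    return value if value > 0 else DEFAULT_TIMEOUT_MS


def fetch_html(url: str, timeout_ms: int = DEFAULT_TIMEOUT_MS, max_scrolls: int = 20) -> str:
    """Load the page, scroll until its height stops growing, and return the HTML."""
    # Imported lazily so resolve_timeout_ms works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms, wait_until="domcontentloaded")
            last_height = 0
            for _ in range(max_scrolls):
                page.mouse.wheel(0, 4000)   # scroll down to trigger lazy loading
                page.wait_for_timeout(500)  # give new content time to render
                height = page.evaluate("document.body.scrollHeight")
                if height == last_height:
                    break  # nothing new was loaded by the last scroll
                last_height = height
            return page.content()
        finally:
            browser.close()
```

With this shape, a slow connection could be accommodated by calling fetch_html(url, timeout_ms=resolve_timeout_ms()) after exporting a larger value, without editing the script itself.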
Some X.com pages require login. The script works best with public articles and threads.
MIT License - Feel free to use and modify as needed.