X.com to Markdown Converter

A script that extracts full content from X.com (Twitter) articles and converts them to markdown files. No AI summarization - preserves all original content.

Available in both Node.js (recommended) and Python versions.

Features

✅ Full content extraction - No summarization, captures everything
✅ Automatic title detection - Uses article title for filename
✅ Preserves structure - Maintains headings, lists, and paragraphs
✅ Image extraction - Includes all images from the article
✅ Metadata capture - Author, timestamp, stats (likes, retweets, etc.)
✅ Browser automation - Handles dynamic content loading

Installation

Option 1: Node.js (Recommended)

Requirements: Node.js 16+ and npm

# Install dependencies
npm install

That's it! Puppeteer will automatically download the required browser.

Option 2: Python

Requirements: Python 3.7-3.13 (Note: Python 3.14 has compatibility issues)

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install playwright

# Install browser
playwright install chromium

Usage

Node.js Version

node x_to_markdown.js [-o output_dir] <x.com_url>

Example:

node x_to_markdown.js -o ./out_put https://x.com/bozhou_ai/status/2011738838767423983

Python Version

python x_to_markdown.py [-o output_dir] <x.com_url>

Example:

python x_to_markdown.py -o ./out_put https://x.com/bozhou_ai/status/2011738838767423983

Both versions create a markdown file in $HOME/tmp by default; pass -o <output_dir> to write somewhere else. The filename is a short, English-only slug with no spaces, derived from the title. When no usable slug can be derived, it falls back to x_<handle>_<status_id>. If the output directory does not exist, the script exits with an error.
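The naming rule above can be sketched in Python as follows. This is an illustration, not the script's actual code: the function names, the hyphen separator, the 60-character cap, and the last-resort name x_article are all assumptions.

```python
import re
import sys
import unicodedata
from pathlib import Path
from urllib.parse import urlparse

MAX_SLUG_LEN = 60  # assumed cap; the script's real limit is not documented


def fallback_name(url):
    """Build x_<handle>_<status_id> from https://x.com/<handle>/status/<id>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 3 and parts[1] == "status":
        return f"x_{parts[0]}_{parts[2]}"
    return "x_article"  # hypothetical last resort for unexpected URLs


def make_slug(title, url):
    """Return an ASCII-only, space-free slug, or the fallback name."""
    # Decompose accented characters, then drop everything outside ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    slug = slug[:MAX_SLUG_LEN].strip("-")
    return slug or fallback_name(url)


def output_path(title, url, out_dir=None):
    """Resolve the target file; exit if the directory is missing."""
    directory = Path(out_dir) if out_dir else Path.home() / "tmp"
    if not directory.is_dir():
        sys.exit(f"Error: output directory does not exist: {directory}")
    return directory / f"{make_slug(title, url)}.md"
```

A title written entirely in non-Latin characters produces an empty slug, which is the case where the x_<handle>_<status_id> fallback applies.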

Output Format

The generated markdown file includes:

Title (H1 heading)
Metadata (Author, Date, Source URL, Stats)
Full Content (All text, headings, lists, images)

Example output structure:

# Article Title

**Author:** Name (@handle)
**Date:** 2026-01-16
**Source:** [URL](URL)
**Stats:** 33 Replies | 105 Reposts | 471 Likes

---

## Section 1

Content here...

![Image description](image_url)

## Section 2

More content...
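As a rough illustration of how this layout could be produced, the sketch below renders a header and body from already-extracted data. The Article class, its field names, and the block dictionary shape are hypothetical; the real scripts may organize the extracted data differently.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical container for the extracted data.
    title: str
    author: str
    handle: str
    date: str
    url: str
    stats: dict = field(default_factory=dict)   # e.g. {"Likes": 471}
    blocks: list = field(default_factory=list)  # dicts with a "type" key


def render_block(block):
    """Turn one extracted block into its markdown form."""
    kind = block["type"]
    if kind == "heading":
        return f"## {block['text']}"
    if kind == "image":
        return f"![{block['alt']}]({block['src']})"
    if kind == "list":
        return "\n".join(f"- {item}" for item in block["items"])
    return block["text"]  # plain paragraph


def render_markdown(article):
    """Assemble title, metadata, separator, and body."""
    stats = " | ".join(
        f"{count} {label}" for label, count in article.stats.items()
    )
    lines = [
        f"# {article.title}",
        "",
        f"**Author:** {article.author} (@{article.handle})",
        f"**Date:** {article.date}",
        f"**Source:** [{article.url}]({article.url})",
        f"**Stats:** {stats}",
        "",
        "---",
        "",
    ]
    # Blocks are separated by blank lines so markdown keeps them apart.
    body = "\n\n".join(render_block(b) for b in article.blocks)
    return "\n".join(lines + [body]) + "\n"
```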

How It Works

Browser Automation: Uses a headless browser (Puppeteer in the Node.js version, Playwright in the Python version) to load the X.com page
Content Loading: Scrolls through the page to load all dynamic content
Extraction: Runs JavaScript in the page to extract text, headings, images, and metadata
Conversion: Converts the extracted content to properly formatted markdown
File Creation: Saves the result as a slug-named .md file in $HOME/tmp by default, or in the directory given with -o (errors if the directory is missing)
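The first three steps might look roughly like this with Playwright's synchronous Python API. It is a minimal sketch under stated assumptions: fetch_page_text and its parameters are hypothetical names, the 30-second default only mirrors the timeout mentioned under Troubleshooting, and the real script presumably extracts structured content rather than the whole page's visible text.

```python
def fetch_page_text(url, timeout_ms=30_000, max_scrolls=20):
    """Load url in headless Chromium, scroll down, return the visible text."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Step 1: load the page, failing after timeout_ms milliseconds.
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)

            # Step 2: scroll until the document stops growing.
            last_height = 0
            for _ in range(max_scrolls):
                page.mouse.wheel(0, 4000)
                page.wait_for_timeout(500)  # let lazy content render
                height = page.evaluate(
                    "document.documentElement.scrollHeight"
                )
                if height == last_height:
                    break
                last_height = height

            # Step 3: run JavaScript in the page to pull out the text.
            return page.evaluate("document.body.innerText")
        finally:
            browser.close()
```

Passing a larger timeout_ms is one way to accommodate a slow connection. Pages that unload off-screen content while scrolling would need text collected inside the loop instead of once at the end.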

Requirements

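Node.js 16+ and npm (Node.js version; Puppeteer is installed by npm install)
Python 3.7-3.13 and Playwright (Python version)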
Internet connection

Troubleshooting

"playwright: command not found"

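This error applies to the Python version and means the playwright command is not on your PATH. Activate the virtual environment, make sure the package is installed, then install the browser:

pip install playwright
playwright install chromium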

Timeout errors

The page might be loading slowly. The script waits up to 30 seconds. If you have a slow connection, you can modify the timeout in the script.

Missing content

Some X.com pages require login. The script works best with public articles and threads.

License

MIT License - Feel free to use and modify as needed.
