A powerful, production-ready Twitter/X data extraction actor built for the Apify platform.
Extract tweets, profiles, engagement metrics, media, and more – with intelligent scrolling, date filtering, and login support.
Apify Twitter/X Scraper is a specialized web scraping actor that runs on the Apify cloud platform. It uses Playwright for reliable browser automation, navigating Twitter/X pages, auto-scrolling through infinite timelines, and extracting structured data from tweets, profiles, and search results.
Why this scraper? Unlike API-based solutions limited by Twitter's expensive API tiers, this scraper works directly with the Twitter web interface – giving you access to public data without API key restrictions.
| Feature | Description |
|---|---|
| Twitter-Specific Extraction | Purpose-built selectors for tweets, profiles, and search results |
| Rich Data Output | Tweets, usernames, display names, timestamps, engagement metrics (likes, retweets, replies, views), and media URLs |
| Date Range Filtering | Extract tweets from specific time periods with smart `startDate` / `endDate` filtering |
| Infinite Scroll Support | Automatically scrolls through Twitter's infinite timeline with configurable scroll count and delay |
| Media Tab Navigation | Auto-navigates to profile Media tabs for chronologically sorted media tweets |
| Login Support | Optional Twitter login for accessing restricted content and avoiding rate limits |
| Proxy Support | Built-in Apify Proxy integration to distribute requests and bypass rate limiting |
| Memory Optimized | Intelligent DOM cleanup during long scroll sessions to prevent memory leaks |
| High Performance | Configurable concurrency, aggressive scroll strategies, and smart early-exit logic |
| Error Resilient | Graceful error handling with fallback selectors and retry mechanisms |
```json
{
  "url": "https://twitter.com/elonmusk",
  "scrapedAt": "2026-02-16T10:30:00.000Z",
  "text": "The future of AI is incredibly exciting...",
  "username": "elonmusk",
  "displayName": "Elon Musk",
  "timestamp": "2026-02-15T18:45:00.000Z",
  "tweetUrl": "https://twitter.com/elonmusk/status/1234567890",
  "replies": 4200,
  "retweets": 15000,
  "likes": 120000,
  "views": 5000000,
  "media": [
    {
      "type": "image",
      "url": "https://pbs.twimg.com/media/example.jpg"
    }
  ],
  "pageType": "profile"
}
```

```json
{
  "url": "https://twitter.com/elonmusk",
  "scrapedAt": "2026-02-16T10:30:00.000Z",
  "type": "profile",
  "username": "elonmusk",
  "bio": "Mars & Cars, Chips & Dips",
  "stats": "Following 800 · Followers 175M"
}
```

| URL Type | Example | Description |
|---|---|---|
| Profile | `https://twitter.com/username` | Scrapes tweets from a user's timeline |
| Media Tab | `https://twitter.com/username/media` | Auto-navigated; scrapes media tweets sorted by date |
| Single Tweet | `https://twitter.com/user/status/123` | Extracts data from a specific tweet |
| Search | `https://twitter.com/search?q=AI` | Scrapes search result tweets |
| X.com | `https://x.com/username` | Full support for the new X.com domain |
| Parameter | Type | Description |
|---|---|---|
| `startUrls` | array | Array of Twitter/X URLs to scrape |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxRequestsPerCrawl` | integer | 50 | Maximum pages to crawl |
| `maxConcurrency` | integer | 50 | Concurrent browser pages |
| `maxTweets` | integer | 0 (unlimited) | Maximum tweets to extract |
| `scrollCount` | integer | 10 | Number of scroll iterations |
| `scrollDelay` | integer | 2000 | Delay between scrolls (ms) |
| `scrapeMediaTab` | boolean | true | Auto-navigate to the Media tab |
| `startDate` | string | – | Filter: tweets from this date (YYYY-MM-DD) |
| `endDate` | string | – | Filter: tweets until this date (YYYY-MM-DD) |
| `twitterUsername` | string | – | Twitter login username/email |
| `twitterPassword` | string | – | Twitter login password |
| `waitForTimeout` | integer | 2000 | Wait time for content to load (ms) |
| `proxyConfiguration` | object | – | Apify Proxy settings |
```json
{
  "startUrls": [{ "url": "https://twitter.com/elonmusk" }],
  "maxRequestsPerCrawl": 20,
  "maxConcurrency": 1,
  "waitForTimeout": 3000
}
```

```json
{
  "startUrls": [
    { "url": "https://twitter.com/elonmusk" },
    { "url": "https://twitter.com/OpenAI" },
    { "url": "https://x.com/Google" }
  ],
  "maxRequestsPerCrawl": 100,
  "scrollCount": 20
}
```

```json
{
  "startUrls": [{ "url": "https://twitter.com/elonmusk" }],
  "startDate": "2025-01-01",
  "endDate": "2025-12-31",
  "maxRequestsPerCrawl": 200
}
```

```json
{
  "startUrls": [
    { "url": "https://twitter.com/search?q=artificial%20intelligence" }
  ],
  "maxRequestsPerCrawl": 50,
  "maxConcurrency": 1
}
```

```mermaid
flowchart TB
    subgraph APIFY["Apify Platform"]
        subgraph ACTOR["Twitter Scraper Actor"]
            IP["Input Parser"] --> PC["Playwright Crawler"] --> ED["Extract Data"]
            IP --> DF["Date Filter Config"]
            PC --> AS["Auto Scroll Engine"]
            ED --> DA["Date Filter Apply"]
            AS --> DS["Apify Dataset\n(JSON Output)"]
            DA --> DS
        end
        PP["Proxy Pool"]
        DC["Docker Container"]
        CB["Chromium Browser"]
    end
    PP & DC & CB -.-> ACTOR
```
- Input Parsing – Reads Twitter URLs and configuration from Apify input
- Optional Login – Authenticates with Twitter if credentials are provided (handles the multi-step login flow)
- Page Navigation – Opens each URL in a Playwright-controlled Chromium browser
- Media Tab Detection – Automatically navigates to the Media tab for profile URLs
- Infinite Scroll – Scrolls through Twitter's timeline with configurable depth and smart stopping logic
- Data Extraction – Parses tweet elements using Twitter's `data-testid` selectors with multiple fallbacks
- Date Filtering – Applies start/end date filters to extracted tweets
- Deduplication – Tracks tweet URLs to ensure no duplicate entries
- Memory Management – Periodically removes old DOM elements during long scroll sessions
- Output – Saves structured JSON data to the Apify Dataset
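Two of these steps reduce to small pure functions: the extraction step has to turn Twitter's abbreviated counters ("4.2K", "1.5M") into numbers, and the deduplication step tracks seen tweet URLs. A minimal sketch of that logic – function names are illustrative, not the actor's actual API:

```javascript
// Convert an abbreviated count string ("4.2K", "1.5M", "312") to an integer.
// Returns 0 for missing or unparseable values. (Illustrative sketch.)
function parseMetric(text) {
  if (!text) return 0;
  const match = String(text).trim().match(/^([\d.,]+)\s*([KM])?$/i);
  if (!match) return 0;
  const value = parseFloat(match[1].replace(/,/g, ''));
  const suffix = (match[2] || '').toUpperCase();
  const multiplier = suffix === 'M' ? 1_000_000 : suffix === 'K' ? 1_000 : 1;
  return Math.round(value * multiplier);
}

// Drop tweets whose canonical status URL has already been seen.
function dedupeTweets(tweets) {
  const seen = new Set();
  return tweets.filter((t) => {
    if (!t.tweetUrl || seen.has(t.tweetUrl)) return false;
    seen.add(t.tweetUrl);
    return true;
  });
}
```

Because the scroll loop re-reads the DOM on every pass, URL-based deduplication is what keeps re-rendered tweets from producing duplicate dataset rows.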
The scraper features an intelligent date filtering system:
- Unlimited scrolling when date filters are active – scrolls until all tweets in range are found
- Smart exit detection – stops when scrolled past the target date range
- Tolerance for gaps – continues scrolling even if temporary gaps appear in the timeline
- UTC-based comparison – consistent timezone handling across all date operations
```
Timeline:  ── Older ──────────────────── Newer ──
                │                        │
            startDate                 endDate
                ├────── Extracted ──────►│
```
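In code, the UTC range check and the smart-exit test reduce to a pair of small predicates. A sketch under the assumption that the timeline renders newest-first (names are illustrative):

```javascript
// True when a tweet's timestamp falls inside [startDate, endDate], UTC, inclusive.
// (Illustrative sketch of the date-filter logic described above.)
function inDateRange(timestamp, startDate, endDate) {
  const t = Date.parse(timestamp); // ISO-8601 timestamps with "Z" parse as UTC
  if (Number.isNaN(t)) return false;
  if (startDate && t < Date.parse(`${startDate}T00:00:00Z`)) return false;
  if (endDate && t > Date.parse(`${endDate}T23:59:59.999Z`)) return false;
  return true;
}

// Timelines render newest-first, so once the oldest tweet seen predates
// startDate, further scrolling cannot surface more in-range tweets.
function scrolledPastRange(oldestSeenTimestamp, startDate) {
  if (!startDate) return false;
  return Date.parse(oldestSeenTimestamp) < Date.parse(`${startDate}T00:00:00Z`);
}
```

The gap tolerance mentioned above would sit on top of this: rather than exiting on the first out-of-range tweet, the loop keeps a counter and only stops after several consecutive too-old tweets.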
```shell
# Install Apify CLI
npm install -g apify-cli

# Login to Apify
apify login

# Push actor to Apify
apify push
```

```shell
docker build -t twitter-scraper .
docker run -e APIFY_INPUT_JSON='{"startUrls":[{"url":"https://twitter.com/elonmusk"}]}' twitter-scraper
```

```shell
# Install dependencies
npm install
npx playwright install chromium

# Run with test input
APIFY_INPUT_JSON="$(cat test_input.json)" node main.js
```

- Use `maxConcurrency: 1` and `waitForTimeout: 2000-3000` for safe scraping
- Enable Apify Proxy to distribute requests across IPs
- Start with small `maxRequestsPerCrawl` values and scale gradually
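Putting those tips together, a conservative input that also enables Apify Proxy through the standard `proxyConfiguration` object might look like this (the proxy group is an example – use whichever groups your Apify plan includes):

```json
{
  "startUrls": [{ "url": "https://twitter.com/OpenAI" }],
  "maxRequestsPerCrawl": 10,
  "maxConcurrency": 1,
  "waitForTimeout": 3000,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```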
- Works with public content without login
- Login support available for accessing restricted content
- Store credentials securely using Apify Secrets
- Full support for both `twitter.com` and `x.com` domains
- URLs are automatically normalized regardless of domain
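Normalization can be as simple as rewriting the hostname before a URL is enqueued, so deduplication treats both domains as the same page. A minimal sketch, assuming canonicalization toward `twitter.com` (the direction is an implementation detail, and `normalizeTwitterUrl` is an illustrative name):

```javascript
// Rewrite x.com (and www/mobile variants) to a canonical twitter.com URL.
// (Illustrative sketch -- not necessarily the actor's exact logic.)
function normalizeTwitterUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^(www\.|mobile\.)/, '');
  if (host === 'x.com' || host === 'twitter.com') {
    url.hostname = 'twitter.com';
  }
  return url.toString();
}
```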
- Runtime: Node.js (ES Modules)
- Browser Automation: Playwright
- Crawler Framework: Crawlee
- Platform: Apify
- Container: Docker (apify/actor-node-playwright-chrome)
ISC
Built with ❤️ for the data extraction community
⭐ Star this repo if you find it useful!