
# 🐦 Apify Twitter/X Scraper

A powerful, production-ready Twitter/X data extraction actor built for the Apify platform. Extract tweets, profiles, engagement metrics, media, and more, with intelligent scrolling, date filtering, and login support.



## 🚀 Overview

Apify Twitter/X Scraper is a specialized web scraping actor that runs on the Apify cloud platform. It uses Playwright for reliable browser automation, navigating Twitter/X pages, auto-scrolling through infinite timelines, and extracting structured data from tweets, profiles, and search results.

**Why this scraper?** Unlike API-based solutions constrained by Twitter's expensive API tiers, this scraper works directly with the Twitter web interface, giving you access to public data without API key restrictions.


## ✨ Key Features

| Feature | Description |
| --- | --- |
| 🐦 **Twitter-Specific Extraction** | Purpose-built selectors for tweets, profiles, and search results |
| 📊 **Rich Data Output** | Tweets, usernames, display names, timestamps, engagement metrics (likes, retweets, replies, views), and media URLs |
| 📅 **Date Range Filtering** | Extract tweets from specific time periods with smart `startDate` / `endDate` filtering |
| 🔄 **Infinite Scroll Support** | Automatically scrolls through Twitter's infinite timeline with configurable scroll count and delay |
| 🖼️ **Media Tab Navigation** | Auto-navigates to profile Media tabs for chronologically sorted media tweets |
| 🔐 **Login Support** | Optional Twitter login for accessing restricted content and avoiding rate limits |
| 🌐 **Proxy Support** | Built-in Apify Proxy integration to distribute requests and bypass rate limiting |
| 🧠 **Memory Optimized** | Intelligent DOM cleanup during long scroll sessions to prevent memory leaks |
| ⚡ **High Performance** | Configurable concurrency, aggressive scroll strategies, and smart early-exit logic |
| 🛡️ **Error Resilient** | Graceful error handling with fallback selectors and retry mechanisms |

## 📦 Data You Can Extract

### Tweet Data

```json
{
  "url": "https://twitter.com/elonmusk",
  "scrapedAt": "2026-02-16T10:30:00.000Z",
  "text": "The future of AI is incredibly exciting...",
  "username": "elonmusk",
  "displayName": "Elon Musk",
  "timestamp": "2026-02-15T18:45:00.000Z",
  "tweetUrl": "https://twitter.com/elonmusk/status/1234567890",
  "replies": 4200,
  "retweets": 15000,
  "likes": 120000,
  "views": 5000000,
  "media": [
    {
      "type": "image",
      "url": "https://pbs.twimg.com/media/example.jpg"
    }
  ],
  "pageType": "profile"
}
```

### Profile Data

```json
{
  "url": "https://twitter.com/elonmusk",
  "scrapedAt": "2026-02-16T10:30:00.000Z",
  "type": "profile",
  "username": "elonmusk",
  "bio": "Mars & Cars, Chips & Dips",
  "stats": "Following 800 · Followers 175M"
}
```

## 🎯 Supported URL Types

| URL Type | Example | Description |
| --- | --- | --- |
| Profile | `https://twitter.com/username` | Scrapes tweets from a user's timeline |
| Media Tab | `https://twitter.com/username/media` | Auto-navigated; scrapes media tweets sorted by date |
| Single Tweet | `https://twitter.com/user/status/123` | Extracts data from a specific tweet |
| Search | `https://twitter.com/search?q=AI` | Scrapes search result tweets |
| X.com | `https://x.com/username` | Full support for the new X.com domain |

βš™οΈ Input Configuration

Required Parameters

Parameter Type Description
startUrls array Array of Twitter/X URLs to scrape

Optional Parameters

Parameter Type Default Description
maxRequestsPerCrawl integer 50 Maximum pages to crawl
maxConcurrency integer 50 Concurrent browser pages
maxTweets integer 0 (unlimited) Maximum tweets to extract
scrollCount integer 10 Number of scroll iterations
scrollDelay integer 2000 Delay between scrolls (ms)
scrapeMediaTab boolean true Auto-navigate to Media tab
startDate string β€” Filter: tweets from this date (YYYY-MM-DD)
endDate string β€” Filter: tweets until this date (YYYY-MM-DD)
twitterUsername string β€” Twitter login username/email
twitterPassword string β€” Twitter login password
waitForTimeout integer 2000 Wait time for content to load (ms)
proxyConfiguration object β€” Apify Proxy settings

## 📋 Usage Examples

### Basic Profile Scrape

```json
{
  "startUrls": [{ "url": "https://twitter.com/elonmusk" }],
  "maxRequestsPerCrawl": 20,
  "maxConcurrency": 1,
  "waitForTimeout": 3000
}
```

### Multi-Profile Scrape

```json
{
  "startUrls": [
    { "url": "https://twitter.com/elonmusk" },
    { "url": "https://twitter.com/OpenAI" },
    { "url": "https://x.com/Google" }
  ],
  "maxRequestsPerCrawl": 100,
  "scrollCount": 20
}
```

### Date-Filtered Extraction

```json
{
  "startUrls": [{ "url": "https://twitter.com/elonmusk" }],
  "startDate": "2025-01-01",
  "endDate": "2025-12-31",
  "maxRequestsPerCrawl": 200
}
```

### Search Results

```json
{
  "startUrls": [
    { "url": "https://twitter.com/search?q=artificial%20intelligence" }
  ],
  "maxRequestsPerCrawl": 50,
  "maxConcurrency": 1
}
```

πŸ—οΈ Architecture

flowchart TB
    subgraph APIFY["☁️ Apify Platform"]
        subgraph ACTOR["🐦 Twitter Scraper Actor"]
            IP["πŸ“₯ Input Parser"] --> PC["🎭 Playwright Crawler"] --> ED["πŸ“€ Extract Data"]
            IP --> DF["πŸ“… Date Filter Config"]
            PC --> AS["πŸ”„ Auto Scroll Engine"]
            ED --> DA["πŸ—‚οΈ Date Filter Apply"]
            AS --> DS["πŸ’Ύ Apify Dataset\n(JSON Output)"]
            DA --> DS
        end
        PP["🌐 Proxy Pool"]
        DC["🐳 Docker Container"]
        CB["🌍 Chromium Browser"]
    end

    PP & DC & CB -.-> ACTOR
Loading

## 🔧 How It Works

1. **Input Parsing** – reads Twitter URLs and configuration from the Apify input
2. **Optional Login** – authenticates with Twitter if credentials are provided (handles the multi-step login flow)
3. **Page Navigation** – opens each URL in a Playwright-controlled Chromium browser
4. **Media Tab Detection** – automatically navigates to the Media tab for profile URLs
5. **Infinite Scroll** – scrolls through Twitter's timeline with configurable depth and smart stopping logic
6. **Data Extraction** – parses tweet elements using Twitter's `data-testid` selectors with multiple fallbacks
7. **Date Filtering** – applies start/end date filters to extracted tweets
8. **Deduplication** – tracks tweet URLs to ensure no duplicate entries
9. **Memory Management** – periodically removes old DOM elements during long scroll sessions
10. **Output** – saves structured JSON data to the Apify Dataset
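As a concrete illustration of the extraction step, Twitter renders engagement counts as abbreviated strings such as `1.2K` or `5M`, while the output dataset stores plain integers. The sketch below shows the kind of conversion involved; `parseCount` is a hypothetical helper, not a function taken from the actor's source:

```javascript
// Hypothetical helper: convert Twitter's abbreviated engagement counts
// ("1.2K", "5M", "15,000") into plain integers for the output dataset.
function parseCount(text) {
  if (!text) return 0;
  const cleaned = text.trim().replace(/,/g, '');
  const match = cleaned.match(/^([\d.]+)([KkMm])?$/);
  if (!match) return 0;
  const value = parseFloat(match[1]);
  const suffix = (match[2] || '').toUpperCase();
  // Scale by the suffix: K = thousands, M = millions, none = as-is.
  const multiplier = suffix === 'K' ? 1_000 : suffix === 'M' ? 1_000_000 : 1;
  return Math.round(value * multiplier);
}

console.log(parseCount('1.2K'));   // 1200
console.log(parseCount('15,000')); // 15000
console.log(parseCount('5M'));     // 5000000
```

This is why fields like `likes` and `views` in the sample output appear as numbers rather than strings.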

## 📅 Smart Date Filtering

The scraper features an intelligent date-filtering system:

- **Unlimited scrolling** when date filters are active – scrolls until all tweets in range are found
- **Smart exit detection** – stops once it has scrolled past the target date range
- **Tolerance for gaps** – continues scrolling even if temporary gaps appear in the timeline
- **UTC-based comparison** – consistent timezone handling across all date operations

```text
Timeline: ←── Older ─────────────── Newer ──→
                    │                    │
              startDate              endDate
                    │◄── Extracted ──►│
```
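The UTC-based comparison can be sketched as a pure predicate. The function name `isInDateRange` and the inclusive end-of-day handling for `endDate` are assumptions for illustration, not details confirmed by the actor's source:

```javascript
// Sketch of a UTC-based date-range filter: a tweet passes when its
// ISO timestamp falls within [startDate, endDate]. Both bounds are
// optional; endDate is treated as inclusive of the whole day (UTC).
function isInDateRange(tweetTimestamp, startDate, endDate) {
  const ts = Date.parse(tweetTimestamp); // ISO 8601 strings parse as UTC
  if (Number.isNaN(ts)) return false;
  if (startDate && ts < Date.parse(`${startDate}T00:00:00.000Z`)) return false;
  if (endDate && ts > Date.parse(`${endDate}T23:59:59.999Z`)) return false;
  return true;
}

console.log(isInDateRange('2025-06-15T18:45:00.000Z', '2025-01-01', '2025-12-31')); // true
console.log(isInDateRange('2024-12-31T23:00:00.000Z', '2025-01-01', null));         // false
```

Comparing epoch milliseconds derived from UTC strings avoids the off-by-one-day bugs that local-timezone `Date` parsing can introduce.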

## 🚀 Deployment

### Deploy to Apify

```bash
# Install the Apify CLI
npm install -g apify-cli

# Log in to Apify
apify login

# Push the actor to Apify
apify push
```

### Run with Docker

```bash
docker build -t twitter-scraper .
docker run -e APIFY_INPUT_JSON='{"startUrls":[{"url":"https://twitter.com/elonmusk"}]}' twitter-scraper
```

### Local Development

```bash
# Install dependencies
npm install
npx playwright install chromium

# Run with test input
APIFY_INPUT_JSON="$(cat test_input.json)" node main.js
```

## ⚠️ Important Notes

### Rate Limiting

- Use `maxConcurrency: 1` and `waitForTimeout: 2000`–`3000` for safe scraping
- Enable Apify Proxy to distribute requests across IPs
- Start with small `maxRequestsPerCrawl` values and scale gradually

### Authentication

- Works with public content without login
- Login support is available for accessing restricted content
- Store credentials securely using Apify Secrets

### Domain Support

- Full support for both twitter.com and x.com domains
- URLs are automatically normalized regardless of domain
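Normalizing both domains to a single canonical host keeps deduplication consistent when the same profile is supplied under different domains. A minimal sketch using Node's built-in `URL`; the function name and the choice of `x.com` as the canonical host are illustrative assumptions, not the actor's actual implementation:

```javascript
// Sketch: collapse twitter.com / www variants onto one canonical host
// so the same profile URL always deduplicates to the same key.
function normalizeTwitterUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.toLowerCase();
  if (host === 'twitter.com' || host === 'www.twitter.com' || host === 'www.x.com') {
    url.hostname = 'x.com';
  }
  return url.toString();
}

console.log(normalizeTwitterUrl('https://twitter.com/OpenAI')); // https://x.com/OpenAI
console.log(normalizeTwitterUrl('https://x.com/Google'));       // https://x.com/Google
```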

πŸ› οΈ Tech Stack

  • Runtime: Node.js (ES Modules)
  • Browser Automation: Playwright
  • Crawler Framework: Crawlee
  • Platform: Apify
  • Container: Docker (apify/actor-node-playwright-chrome)

## 📄 License

ISC

---

Built with ❤️ for the data extraction community.
⭐ Star this repo if you find it useful!
