Degen Scraper

Pipeline for generating AI character files and training datasets by scraping public figures' online presence across Twitter and blogs.

⚠️ IMPORTANT: Create a new Twitter account for this tool. DO NOT use your main account as it may trigger Twitter's automation detection and result in account restrictions.

Setup

Install dependencies:
```
npm install
```

Copy .env.example to a .env file and fill in the values:

```
# (Required) Twitter Authentication
TWITTER_USERNAME=     # your twitter username
TWITTER_PASSWORD=     # your twitter password
TWITTER_EMAIL=        # your twitter email

# RapidAPI Configuration
RAPIDAPI_URL=
RAPIDAPI_KEY=

# Google Generative AI API Key. Required for summarizing tweets.
GOOGLE_GENERATIVE_AI_API_KEY=

# (Optional) Blog Configuration
BLOG_URLS_FILE=      # path to file containing blog URLs

# (Optional) Scraping Configuration
MAX_TWEETS=          # max tweets to scrape
MAX_RETRIES=         # max retries for scraping
RETRY_DELAY=         # delay between retries
MIN_DELAY=           # minimum delay between requests
MAX_DELAY=           # maximum delay between requests
```
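The optional scraping variables are easier to reason about once they are parsed into numbers with fallbacks. The following is a minimal sketch, not the tool's own loader: the helper names `readScrapingConfig` and `pickDelay`, the default values, and the assumption that delays are in milliseconds are all illustrative.

```javascript
// Hypothetical helper: parse the optional scraping variables from an env-like object.
// The default values below are placeholders, not this tool's real defaults.
function readScrapingConfig(env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    maxTweets: toInt(env.MAX_TWEETS, 1000),
    maxRetries: toInt(env.MAX_RETRIES, 3),
    retryDelay: toInt(env.RETRY_DELAY, 5000), // assumed to be milliseconds
    minDelay: toInt(env.MIN_DELAY, 1000),
    maxDelay: toInt(env.MAX_DELAY, 3000),
  };
}

// Pick a pause between minDelay (inclusive) and maxDelay (exclusive),
// so consecutive requests are not evenly spaced.
function pickDelay(config, random = Math.random) {
  const span = Math.max(0, config.maxDelay - config.minDelay);
  return config.minDelay + Math.floor(random() * span);
}

const config = readScrapingConfig(process.env);
console.log(config, pickDelay(config));
```

Unset or empty variables fall back to the placeholder defaults, so a .env file that only sets the required values still yields a complete configuration object.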

Update

RapidAPI support was added to retrieve more data.

Get the full text of a tweet:

```
const twitterCrawlAPI = new TwitterCrawlAPI();
twitterCrawlAPI.getFullTextTweet();
```

Use Puppeteer as a fallback to get the full text of tweets posted before Sep 29, 2022:

```
twitterCrawlAPI.fallbackGetFullTextTweet();
```
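The two calls above suggest a primary-then-fallback flow. The sketch below shows one way to combine them. It assumes that both methods accept a tweet ID and resolve to the tweet text (or a falsy value when nothing is found); the real signatures are not documented here and may differ. `getTextWithFallback` and `stubApi` are hypothetical names used only for illustration.

```javascript
// Try the primary lookup first, then fall back to the Puppeteer-based one.
// ASSUMPTION: both methods take a tweet ID and resolve to the tweet text,
// or to a falsy value when nothing is found.
async function getTextWithFallback(api, tweetId) {
  try {
    const text = await api.getFullTextTweet(tweetId);
    if (text) return text;
  } catch (error) {
    console.warn(`Primary lookup failed for ${tweetId}: ${error.message}`);
  }
  return api.fallbackGetFullTextTweet(tweetId);
}

// Stand-in object that mimics the two methods; it only demonstrates the control flow.
const stubApi = {
  getFullTextTweet: async () => null,
  fallbackGetFullTextTweet: async (id) => `full text of tweet ${id}`,
};

getTextWithFallback(stubApi, "123").then((text) => console.log(text));
```

With a real `TwitterCrawlAPI` instance in place of `stubApi`, the fallback would only run when the primary lookup throws or returns nothing.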

Get message examples:

```
const messageExamplesCrawler = new MessageExamplesCrawler();
messageExamplesCrawler.addExample();
```

Using Google Generative AI to summarize tweets

```
// Extract knowledge from longer tweets
const knowledgeGenerator = new KnowledgeGenerator();
await knowledgeGenerator.addKnowledge(uniqueTweets);
characterData.knowledge = knowledgeGenerator.getKnowledge();
```
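How `uniqueTweets` is built and what counts as a "longer" tweet are not shown in this README. As one possible reading, the sketch below deduplicates tweets by ID and keeps those at or above a length threshold. The `id` and `text` fields, the `selectLongTweets` name, and the 200-character cutoff are assumptions for illustration.

```javascript
// Hypothetical pre-filter: drop duplicate tweets, then keep only the longer ones.
// ASSUMPTION: each tweet is an object with string fields `id` and `text`.
function selectLongTweets(tweets, minLength = 200) {
  const seen = new Set();
  const selected = [];
  for (const tweet of tweets) {
    if (seen.has(tweet.id)) continue; // skip duplicates
    seen.add(tweet.id);
    if (tweet.text.length >= minLength) selected.push(tweet);
  }
  return selected;
}

const sample = [
  { id: "1", text: "short" },
  { id: "2", text: "x".repeat(250) },
  { id: "2", text: "x".repeat(250) },
];
console.log(selectLongTweets(sample).length); // prints 1
```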

Usage

Run as Server

```
npm run start
```

This starts an Express server.

APIs:

GET /api/characters/:username - get character data by username
POST /api/characters - scrape tweets and blogs by username

Request body for POST /api/characters:

```
{
  "username": "pmarca",
  "is_crawl": true
}
```

username is the Twitter username. Set is_crawl to true to scrape tweets.
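A client for these two endpoints might look like the sketch below. The base URL `http://localhost:3000`, the JSON content type, and the JSON response are assumptions, since the README states neither the port nor the response format. The function names are hypothetical.

```javascript
// ASSUMPTION: the server listens on http://localhost:3000; adjust to your setup.
const BASE_URL = "http://localhost:3000";

// Build the POST request that asks the server to scrape a user.
function buildScrapeRequest(username, isCrawl = true) {
  return {
    url: `${BASE_URL}/api/characters`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ username, is_crawl: isCrawl }),
    },
  };
}

// Build the GET URL for a user's character data.
function characterUrl(username) {
  return `${BASE_URL}/api/characters/${encodeURIComponent(username)}`;
}

// Trigger a scrape, then read the character data back.
// Requires Node 18+ for the built-in fetch. The response is assumed to be JSON.
async function scrapeAndFetch(username) {
  const { url, options } = buildScrapeRequest(username);
  const scrapeResponse = await fetch(url, options);
  if (!scrapeResponse.ok) throw new Error(`Scrape failed: ${scrapeResponse.status}`);
  const characterResponse = await fetch(characterUrl(username));
  if (!characterResponse.ok) throw new Error(`Lookup failed: ${characterResponse.status}`);
  return characterResponse.json();
}
```

With the server running, `scrapeAndFetch("pmarca").then(console.log)` would issue both requests in order.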

Collect Tweets and Blogs Using the CLI

Twitter Collection

```
npm run twitter -- username
```

Example: npm run twitter -- pmarca

Blog Collection

```
npm run blog
```

Generate Character

```
npm run character -- username
```

Example: npm run character -- pmarca

Finetune

```
npm run finetune
```

Finetune (with test)

```
npm run finetune:test
```

Generate Virtuals Character Card

Reference: https://whitepaper.virtuals.io/developer-documents/agent-contribution/contribute-to-cognitive-core#character-card-and-goal-samples

Run this after the Twitter Collection step:

```
npm run generate-virtuals -- username date
```

Example: npm run generate-virtuals -- pmarca 2024-11-29

Example without date: npm run generate-virtuals -- pmarca

The generated character file is written to characters/[username].json. Edit the clients and modelProvider fields to match your needs.

The generated tweet dataset file is written to pipeline/[username]/[date]/raw/tweets.json.
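For scripts that post-process the output, the two path patterns above can be expanded with a small helper. This is only a sketch: `outputPaths` is a hypothetical name, and the date is assumed to use the same YYYY-MM-DD form as the example command.

```javascript
// Hypothetical helper: expand the documented output path patterns.
// ASSUMPTION: `date` uses the YYYY-MM-DD form shown in the example command.
function outputPaths(username, date) {
  return {
    characterFile: `characters/${username}.json`,
    tweetsFile: `pipeline/${username}/${date}/raw/tweets.json`,
  };
}

console.log(outputPaths("pmarca", "2024-11-29"));
```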
