This project collects public post data from the Bluesky platform, giving you a straightforward way to analyze conversations, trends, and user activity. It focuses on clean, structured extraction so you can plug the results into any workflow with minimal effort. If you need reliable Bluesky post scraping for research, monitoring, or analytics, this tool has you covered.
Created by Bitbash to showcase our approach to scraping and automation.
The scraper collects detailed information from public posts and returns it in an organized dataset. It solves the challenge of manually gathering Bluesky content at scale and is ideal for analysts, developers, and teams tracking platform activity.
- Helps you analyze user interactions and engagement patterns.
- Supports monitoring specific topics, hashtags, or users.
- Offers structured, machine-friendly results for automation.
- Reduces time spent manually searching and collecting data.
- Works well for both one-off pulls and recurring research tasks.
| Feature | Description |
|---|---|
| Query-based scraping | Pull posts using specific search terms, hashtags, or user queries. |
| Time-range filtering | Limit results using since and until parameters for historical research. |
| Language targeting | Extract only posts written in your preferred languages. |
| Engagement metrics | Fetch like, reply, and repost counts for analysis. |
| Media extraction | Capture thumbnails, full images, and attachments. |
| Flexible sorting | Choose between latest or top-ranked posts. |
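
The options in the table above could be assembled into a single input object along the following lines. This is only a sketch: `since`, `until`, and `limit` are named in this README, but the keys `query`, `langs`, and `sort`, and the `buildQueryInput` helper itself, are assumptions made for illustration and may not match the names the scraper actually expects.

```javascript
// Hypothetical helper: validates and normalizes scraper options.
// Only `since`, `until`, and `limit` are named in this README; the other keys are assumed.
function buildQueryInput({ query, since, until, langs = [], sort = "latest", limit = 100 }) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  if (!["latest", "top"].includes(sort)) {
    throw new Error('sort must be "latest" or "top"');
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }

  const input = { query: query.trim(), sort, limit, langs: [...langs] };

  // Normalize the optional date bounds to ISO 8601 strings.
  for (const [key, value] of Object.entries({ since, until })) {
    if (value === undefined) continue;
    const date = new Date(value);
    if (Number.isNaN(date.getTime())) {
      throw new Error(`${key} is not a valid date: ${value}`);
    }
    input[key] = date.toISOString();
  }

  // ISO 8601 strings in UTC sort chronologically, so a string comparison is enough.
  if (input.since && input.until && input.since > input.until) {
    throw new Error("since must not be later than until");
  }
  return input;
}

const input = buildQueryInput({
  query: "#eurovision",
  since: "2024-05-11T00:00:00Z",
  until: "2024-05-13T00:00:00Z",
  langs: ["en"],
  sort: "top",
  limit: 50,
});
console.log(JSON.stringify(input, null, 2));
```

Validating up front means a typo in a date or sort mode fails before any request is made, rather than producing an empty result set.
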
| Field Name | Field Description |
|---|---|
| id | Unique identifier for each post. |
| authorId | Decentralized identifier (DID) of the post creator. |
| authorName | Display name of the author. |
| authorUsername | Username/handle associated with the user. |
| authorAvatar | URL to the author’s avatar image. |
| text | The full text of the post. |
| images | Array of image objects with URLs and metadata. |
| primaryImage | Full-size primary image when available. |
| link | Any attached external link. |
| createdAt | Timestamp of when the post was published, as an ISO 8601 string in UTC. |
| langs | Array of language codes associated with the post (for example, en). |
| replyCount | Number of replies. |
| repostCount | Number of reposts. |
| likeCount | Number of likes. |
| url | Direct link to the post on Bluesky. |

Example output:

    [
      {
        "id": "bafyreibqjfx2ejvxkd3okjtodoyqvoyk7wuberwonsakdjv4yahp2lrn4a",
        "authorId": "did:plc:pc2aiklrpzwgsiq3fuohbui4",
        "authorName": "Keri Warbis",
        "authorUsername": "keriwarbis.bsky.social",
        "authorAvatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:pc2aiklrpzwgsiq3fuohbui4/bafkreihgejbtckxrsgrba7ckx6mlsofe6nzvs4t2m54y2in6edcp65tlne@jpeg",
        "text": "Bit sunburnt from yesterday’s stint in the garden.\n\nBit hungover from Eurovision.\n\nAnother day of sun and entertaining ahead.\n\nSunday roast at The Grand will be happening to round of the day.",
        "images": [
          {
            "thumb": "https://cdn.bsky.app/img/feed_thumbnail/plain/did:plc:pc2aiklrpzwgsiq3fuohbui4/bafkreiejc6jc4z47urksbwn7owoyyi4o46ufi362e52bscl4bsjrj4izyq@jpeg",
            "fullsize": "https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:pc2aiklrpzwgsiq3fuohbui4/bafkreiejc6jc4z47urksbwn7owoyyi4o46ufi362e52bscl4bsjrj4izyq@jpeg",
            "alt": "",
            "aspectRatio": {
              "height": 820,
              "width": 828
            }
          }
        ],
        "link": null,
        "primaryImage": "https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:pc2aiklrpzwgsiq3fuohbui4/bafkreiejc6jc4z47urksbwn7owoyyi4o46ufi362e52bscl4bsjrj4izyq@jpeg",
        "createdAt": "2024-05-12T08:36:29.345Z",
        "langs": ["en"],
        "replyCount": 0,
        "repostCount": 0,
        "likeCount": 0,
        "url": "https://bsky.app/profile/keriwarbis.bsky.social/post/3kxkwdhu77o23"
      }
    ]

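
Once records in this shape are loaded, the engagement counters can be combined into a simple ranking. The sketch below assumes only the documented `likeCount`, `repostCount`, `replyCount`, `url`, and `authorUsername` fields. The `rankByEngagement` name, the unweighted sum, and the placeholder records with made-up counts are illustrative choices, not part of the scraper.

```javascript
// Sums the engagement counters of each post and returns the posts ranked by that total.
// Missing counters are treated as 0.
function rankByEngagement(posts) {
  return posts
    .map((post) => ({
      url: post.url,
      authorUsername: post.authorUsername,
      engagement: (post.likeCount ?? 0) + (post.repostCount ?? 0) + (post.replyCount ?? 0),
    }))
    .sort((a, b) => b.engagement - a.engagement);
}

// Placeholder records with made-up counts, shaped like the scraper output.
const samplePosts = [
  { url: "https://bsky.app/profile/a.example/post/1", authorUsername: "a.example", likeCount: 3, repostCount: 1, replyCount: 0 },
  { url: "https://bsky.app/profile/b.example/post/2", authorUsername: "b.example", likeCount: 10, repostCount: 4, replyCount: 2 },
  { url: "https://bsky.app/profile/c.example/post/3", authorUsername: "c.example" }, // counters missing
];

const ranked = rankByEngagement(samplePosts);
console.log(ranked);
```

An unweighted sum is the simplest possible score. If replies or reposts matter more for your analysis, multiply each counter by a weight before adding.
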

Directory structure:

    Bluesky Posts Scraper/
    ├── src/
    │   ├── runner.js
    │   ├── extractors/
    │   │   ├── bluesky_parser.js
    │   │   └── utils_time.js
    │   ├── outputs/
    │   │   └── exporters.js
    │   └── config/
    │       └── settings.example.json
    ├── data/
    │   ├── queries.sample.txt
    │   └── sample-output.json
    ├── package.json
    └── README.md

- Researchers track topic trends to understand how discussions evolve over time, helping them identify emerging patterns.
- Marketing teams follow influencers or brand mentions to refine campaigns and messaging.
- Analysts monitor user engagement on specific themes so they can assess audience reactions.
- Developers integrate Bluesky post data into dashboards to power real-time insights.
- Journalists keep an eye on public conversations to support data-driven reporting.
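
As one illustration of the trend-tracking use case, posts can be bucketed by calendar day using the documented `createdAt` field. This is a rough sketch rather than anything shipped with the project. The `countPostsPerDay` helper is hypothetical, and it groups by UTC day, which may not be the grouping your analysis needs.

```javascript
// Counts posts per UTC calendar day, returning a Map ordered by date.
// Records with a missing or unparseable createdAt are skipped.
function countPostsPerDay(posts) {
  const counts = new Map();
  for (const post of posts) {
    const time = Date.parse(post.createdAt);
    if (Number.isNaN(time)) continue;
    const day = new Date(time).toISOString().slice(0, 10); // "YYYY-MM-DD"
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  // "YYYY-MM-DD" keys sort chronologically when compared as strings.
  return new Map([...counts].sort(([a], [b]) => a.localeCompare(b)));
}

// Placeholder timestamps for illustration.
const perDay = countPostsPerDay([
  { createdAt: "2024-05-12T08:36:29.345Z" },
  { createdAt: "2024-05-12T23:59:59.000Z" },
  { createdAt: "2024-05-11T10:00:00.000Z" },
  { createdAt: "not-a-date" },
]);
console.log(Object.fromEntries(perDay));
```

The resulting day-to-count pairs can be fed directly into a chart or dashboard to show how a topic rises and falls.
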
How do I limit the number of posts returned?
Use the limit parameter to cap the number of posts per query.
Can I filter posts by language?
Yes. Set the language option to restrict results to posts in the languages you specify.
Does it support date-range filtering?
Yes. Use the since and until fields to set the time window for extraction.
What formats can I export the data to?
Results can be converted into JSON, CSV, or any format supported by your processing pipeline.
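
For the CSV case, a conversion could look roughly like the sketch below. It is not the project's exporter: `toCsv` and `escapeCsv` are hypothetical helpers, and joining array values such as `langs` with `;` is an arbitrary choice. Nested fields such as `images` would need to be flattened first.

```javascript
// Escapes one value for CSV: null/undefined become empty, arrays are joined with ";",
// and values containing quotes, commas, or line breaks are wrapped in double quotes.
function escapeCsv(value) {
  if (value === null || value === undefined) return "";
  const text = Array.isArray(value) ? value.join(";") : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Builds a CSV string with a header row from the chosen (flat) fields of each post.
function toCsv(posts, fields) {
  const header = fields.join(",");
  const rows = posts.map((post) => fields.map((field) => escapeCsv(post[field])).join(","));
  return [header, ...rows].join("\n");
}

// Placeholder record for illustration.
const csv = toCsv(
  [{ id: "a", text: 'say "hi", ok', langs: ["en", "de"], link: null }],
  ["id", "text", "langs", "link"]
);
console.log(csv);
```

Passing the field list explicitly keeps the column order stable even when some records omit optional fields.
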
Primary Metric: Processes several hundred posts per minute, depending on query complexity and network conditions.
Reliability Metric: Maintains a high completion rate with stable extraction across varying content types.
Efficiency Metric: Handles large batches with minimal overhead, allowing scalable automation.
Quality Metric: Extracted records are highly complete, with detailed metadata, accurate timestamps, and well-structured media fields.
