# SteamScraping

A powerful data pipeline for scraping, analyzing, and visualizing gaming market trends using SteamSpy data.
## Table of Contents

- Overview
- Features
- Project Structure
- Prerequisites
- Installation
- Usage
- Configuration
- Data Output
- Troubleshooting
- Roadmap
- Contributing
## Overview

This project provides an automated pipeline for collecting and analyzing gaming market data from SteamSpy. Built with modern async Python and interactive Marimo notebooks, it enables market research, trend analysis, and data-driven decision making for game developers and industry analysts.
Current Capabilities:
- 🚀 High-performance async scraping with rate limiting
- 📊 Interactive data visualization with Marimo notebooks
- 💾 Automatic daily data organization
- 🔄 Resumable scraping with progress tracking
- 📈 Genre and tag frequency analysis
## Features

- Async Architecture: Efficient concurrent data fetching with aiohttp
- Rate Limiting: Respectful API usage with configurable delays
- Progress Tracking: Resume interrupted scrapes without data loss
- Interactive Notebooks: Marimo-powered reactive analysis
- Clean Architecture: Modular design with base scraper class for extensibility
- Error Handling: Comprehensive logging and error recovery
- Daily Organization: Automatic date-based data storage
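The base-scraper extensibility can be illustrated with a short sketch. The `BaseScraper` interface shown here (async context manager plus an abstract `scrape()`) is an assumption for illustration only, not the actual API in `src/BaseScraper.py`:

```python
import asyncio
from abc import ABC, abstractmethod

class BaseScraper(ABC):
    """Hypothetical sketch of the abstract base; the real class in
    src/BaseScraper.py may expose a different interface."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    @abstractmethod
    async def scrape(self) -> int:
        """Fetch data and return the number of records scraped."""

class DemoScraper(BaseScraper):
    async def scrape(self) -> int:
        await asyncio.sleep(0)  # stand-in for real aiohttp network I/O
        return 3

async def main() -> int:
    async with DemoScraper() as scraper:
        return await scraper.scrape()

print(asyncio.run(main()))  # -> 3
```

New data sources would follow the same pattern: subclass the base, implement `scrape()`, and reuse the shared rate limiting and file handling.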
## Project Structure

```
steamscraping/
├── 📂 Data/                      # Generated during scraping
│   └── 📂 YYYY-MM-DD/            # Daily data folders
│       ├── steamspy_data.jsonl   # Main dataset (JSON Lines)
│       ├── scraped_appids.txt    # Progress tracking
│       ├── metadata.json         # Scrape session info
│       └── steamspy_errors.log   # Error logs
├── 📂 src/                       # Source code
│   ├── BaseScraper.py            # Abstract base scraper
│   ├── FileSystem.py             # File I/O operations
│   └── SteamSpyScraper.py        # SteamSpy implementation
├── 📂 .vscode/                   # VS Code configuration
│   ├── launch.json               # Debug configurations
│   ├── settings.json             # Editor settings
│   └── tasks.json                # Build tasks
├── main.py                       # Scraper notebook
├── visualization.py              # Analysis notebook
├── pyproject.toml                # Project dependencies
├── .python-version               # Python version (3.13)
└── README.md                     # You are here!
```
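The daily `Data/YYYY-MM-DD/` folders above can be generated with a few lines of standard-library Python; this helper is illustrative, not the project's actual `FileSystem.py` code:

```python
from datetime import date
from pathlib import Path

def daily_dir(root: str = "Data") -> Path:
    """Return (and create if needed) today's Data/YYYY-MM-DD folder."""
    folder = Path(root) / date.today().isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    return folder

day = daily_dir()
data_file = day / "steamspy_data.jsonl"
progress_file = day / "scraped_appids.txt"
print(day)  # e.g. Data/2025-01-15
```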
## Prerequisites

- Operating System: Windows, macOS, or Linux
- Python: 3.13+ (automatically managed by UV)
- Internet: Stable connection for API requests
- Disk Space: ~100MB per 10,000 apps scraped
## Installation

UV is a fast Python package installer and resolver. Choose your platform:

**Windows (PowerShell):**

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

**macOS / Linux:**

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Verify the install:

```sh
uv --version
```

💡 Tip: Restart your terminal after installation to ensure UV is in your PATH.
```sh
# Clone the repository
git clone https://github.com/Icewav3/SteamScraping
cd steamscraping
```

Or download and extract the ZIP file, then navigate to the folder:

```sh
cd path/to/steamscraping
```

UV will automatically create a virtual environment and install all dependencies:

```sh
# Install all dependencies
uv sync
```

This command:
- ✅ Creates a `.venv` folder with Python 3.13
- ✅ Installs `marimo`, `aiohttp`, and `tqdm`
- ✅ Sets up the project for immediate use
## Usage

Launch the interactive Marimo scraper notebook:

```sh
uv run marimo edit main.py
```

This opens an interactive notebook in your browser where you can:
- Configure scraping parameters (pages, delays)
- Start/stop the scraper
- Monitor real-time progress
- View scraping statistics
Command line alternative (headless mode):

```sh
uv run marimo run main.py
```

Analyze collected data with the visualization notebook:

```sh
uv run marimo edit visualization.py
```

Features:
- 📊 Genre distribution analysis
- 🏷️ Tag frequency charts
- 📈 Market trend visualization
- 🎨 Interactive Seaborn plots
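The tag-frequency analysis can be sketched outside the notebook with only the standard library. The `"tags"` field name follows SteamSpy's API, but treat the exact schema as an assumption:

```python
import json
from collections import Counter

def tag_frequencies(jsonl_path: str, top: int = 10) -> list:
    """Count how many games carry each tag in a steamspy_data.jsonl file."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            app = json.loads(line)
            # SteamSpy serves tags as {"Tag Name": vote_count, ...};
            # counting the keys gives a per-game tag frequency.
            counts.update((app.get("tags") or {}).keys())
    return counts.most_common(top)
```

Pointing it at `Data/<date>/steamspy_data.jsonl` yields the most common tags across that day's scrape.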
To debug in VS Code:

- Open the project in VS Code
- Install the Marimo extension
- Press `F5` to launch with the debugger attached
Advanced options:

```sh
# Run with auto-reload on file changes
uv run marimo edit main.py --watch --port 8888

# Run in sandbox mode (isolated execution)
uv run marimo edit main.py --sandbox
```

## Configuration

Edit these in `main.py` or pass them to `SteamSpyScraper()`:
| Parameter | Default | Description |
|---|---|---|
| `pages` | `10` | Number of pages to scrape (~1000 apps each) |
| `page_delay` | `15.0` | Seconds between page requests |
| `app_delay` | `0.1` | Seconds between app detail requests |
| `suppress_output` | `False` | Hide console output |
Example:

```python
async with SteamSpyScraper(
    fs,
    pages=20,       # Scrape 20 pages (~20,000 apps)
    page_delay=10,  # Wait 10s between pages
    app_delay=0.2   # Wait 0.2s between apps
) as scraper:
    await scraper.scrape()
```

## Data Output

**`steamspy_data.jsonl`** — JSON Lines format, one game per line:

```
{"appid": 730, "name": "Counter-Strike 2", "developer": "Valve", "owners": "100,000,000 .. 200,000,000", ...}
{"appid": 570, "name": "Dota 2", "developer": "Valve", "owners": "50,000,000 .. 100,000,000", ...}
```
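The `owners` field is a textual range rather than a number; a tiny helper (illustrative, not part of the project) can convert it into numeric bounds for analysis:

```python
def parse_owners(owners: str) -> tuple:
    """Turn SteamSpy's '100,000,000 .. 200,000,000' range into integer bounds."""
    low, high = owners.split("..")
    # int() tolerates surrounding whitespace once the commas are removed
    return int(low.replace(",", "")), int(high.replace(",", ""))

print(parse_owners("100,000,000 .. 200,000,000"))  # (100000000, 200000000)
```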
**`metadata.json`** — scrape session information:

```json
{
  "start_time": "2024-12-12T10:30:00",
  "end_time": 1702384500.123,
  "pages_scraped": 10,
  "apps_scraped": 8547
}
```

**`scraped_appids.txt`** — list of completed app IDs for resume functionality:
```
730
570
440
...
```
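Resume logic on top of this file can be as simple as loading the IDs into a set and skipping matches; the real implementation lives in `src/`, so this is only a sketch:

```python
from pathlib import Path

def load_scraped_ids(path: str = "scraped_appids.txt") -> set:
    """Return app IDs already scraped, or an empty set before the first run."""
    p = Path(path)
    if not p.exists():
        return set()
    return {int(line) for line in p.read_text().split()}

def pending(app_ids: list, done: set) -> list:
    """Drop IDs completed in a previous session."""
    return [a for a in app_ids if a not in done]

done = load_scraped_ids()
print(pending([730, 570, 440], done))  # IDs still to fetch
```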
## Troubleshooting

**`uv` command not found**

- Solution: Restart your terminal or add UV to PATH manually
  - Windows: `%USERPROFILE%\.cargo\bin`
  - Unix: `~/.cargo/bin`

**Rate limit errors from SteamSpy**

- Solution: Increase the `page_delay` and `app_delay` values
- SteamSpy allows ~1 request per second
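These delays correspond to sleeping between consecutive requests; a minimal pacing sketch (not the project's actual loop) looks like:

```python
import asyncio

async def paced(items, delay, handle):
    """Call handle(item) for each item, sleeping `delay` seconds between requests."""
    results = []
    for i, item in enumerate(items):
        if i:  # no delay before the very first request
            await asyncio.sleep(delay)
        results.append(await handle(item))
    return results

async def demo():
    async def fetch(appid):  # stand-in for a real aiohttp request
        return appid * 2
    return await paced([1, 2, 3], delay=0.01, handle=fetch)

print(asyncio.run(demo()))  # [2, 4, 6]
```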
**Progress bars not rendering**

- Known Issue: Marimo progress bars may not render in all browsers
- Workaround: Check console output or use `suppress_output=False`
**`Data/` folder missing**

- Automatically created on first run
- If deleted, it will be recreated
**Port already in use**

```sh
# Use a different port
uv run marimo edit main.py --port 8889
```

## Roadmap

- Complete Marimo migration for native progress bar support
- Implement data integrity checking script
- Create time-series comparative analysis notebook
- Remove unused dependencies
- Add multiple data source support (IGDB, Steam Store API)
- Implement async multi-scraper coordination
- Design data merging strategy for multi-source accuracy
- Enhanced UI with scraper control buttons
- Advanced genre/tag correlation analysis
- Export to CSV/Excel formats
- Cloud deployment for 24/7 scraping (free tier)
- Automated backup system (GitHub/cloud storage)
- Machine learning for market trend prediction
- Real-time dashboard with live updates
## Contributing

Contributions are welcome! Here's how to get started:
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Test thoroughly: `uv run marimo edit main.py`
5. Commit with clear messages: `git commit -m "Add amazing feature"`
6. Push and create a Pull Request: `git push origin feature/amazing-feature`
Code style:

- Follow PEP 8 guidelines
- Use type hints where possible
- Document all public functions
- Keep modules focused and modular
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

Having issues? Here's how to get help:
- Check the Troubleshooting section above
- Search existing issues on GitHub
- Create a new issue with:
  - Error message
  - Steps to reproduce
  - System info (`uv --version`, OS)
⭐ Star this repo if you find it helpful!
Made with ❤️ and ☕ for the gaming community