# SteamScraping

A powerful data pipeline for scraping, analyzing, and visualizing gaming market trends using SteamSpy data.
## Table of Contents

- Overview
- Features
- Project Structure
- Prerequisites
- Installation
- Usage
- Configuration
- Data Output
- Troubleshooting
- Roadmap
- Contributing
## Overview

This project provides an automated pipeline for collecting and analyzing gaming market data from SteamSpy. Built with modern async Python and interactive Marimo notebooks, it enables market research, trend analysis, and data-driven decision making for game developers and industry analysts.
Current Capabilities:
- 🚀 High-performance async scraping with rate limiting
- 📊 Interactive data visualization with Marimo notebooks
- 💾 Automatic daily data organization
- 🔄 Resumable scraping with progress tracking
- 📈 Genre and tag frequency analysis
## Features

- Async Architecture: Efficient concurrent data fetching with aiohttp
- Rate Limiting: Respectful API usage with configurable delays
- Progress Tracking: Resume interrupted scrapes without data loss
- Interactive Notebooks: Marimo-powered reactive analysis
- Clean Architecture: Modular design with base scraper class for extensibility
- Error Handling: Comprehensive logging and error recovery
- Daily Organization: Automatic date-based data storage
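The base-scraper extensibility can be illustrated with a short sketch. The `BaseScraper` interface shown here (async context manager plus an abstract `scrape()`) is an assumption for illustration only, not the actual API in `src/BaseScraper.py`:

```python
import asyncio
from abc import ABC, abstractmethod

class BaseScraper(ABC):
    """Hypothetical sketch of the abstract base; the real class in
    src/BaseScraper.py may expose a different interface."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    @abstractmethod
    async def scrape(self) -> int:
        """Fetch data and return the number of records scraped."""

class DemoScraper(BaseScraper):
    async def scrape(self) -> int:
        await asyncio.sleep(0)  # stand-in for real aiohttp network I/O
        return 3

async def main() -> int:
    async with DemoScraper() as scraper:
        return await scraper.scrape()

print(asyncio.run(main()))  # -> 3
```

New data sources would follow the same pattern: subclass the base, implement `scrape()`, and reuse the shared rate limiting and file handling.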
## Project Structure

```
steamscraping/
├── 📂 Data/                      # Generated during scraping
│   └── 📂 YYYY-MM-DD/            # Daily data folders
│       ├── steamspy_data.jsonl   # Main dataset (JSON Lines)
│       ├── scraped_appids.txt    # Progress tracking
│       ├── metadata.json         # Scrape session info
│       └── steamspy_errors.log   # Error logs
├── 📂 src/                       # Source code
│   ├── BaseScraper.py            # Abstract base scraper
│   ├── FileSystem.py             # File I/O operations
│   └── SteamSpyScraper.py        # SteamSpy implementation
├── 📂 .vscode/                   # VS Code configuration
│   ├── launch.json               # Debug configurations
│   ├── settings.json             # Editor settings
│   └── tasks.json                # Build tasks
├── main.py                       # Scraper notebook
├── visualization.py              # Analysis notebook
├── pyproject.toml                # Project dependencies
├── .python-version               # Python version (3.13)
└── README.md                     # You are here!
```
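The daily `Data/YYYY-MM-DD/` folders above can be generated with a few lines of standard-library Python; this helper is illustrative, not the project's actual `FileSystem.py` code:

```python
from datetime import date
from pathlib import Path

def daily_dir(root: str = "Data") -> Path:
    """Return (and create if needed) today's Data/YYYY-MM-DD folder."""
    folder = Path(root) / date.today().isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    return folder

day = daily_dir()
data_file = day / "steamspy_data.jsonl"
progress_file = day / "scraped_appids.txt"
print(day)  # e.g. Data/2025-01-15
```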
## Prerequisites

- Operating System: Windows, macOS, or Linux
- Python: 3.13+ (automatically managed by UV)
- Internet: Stable connection for API requests
- Disk Space: ~100MB per 10,000 apps scraped
## Installation

UV is a fast Python package installer and resolver. Choose your platform:

**Windows (PowerShell):**

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

**macOS / Linux:**

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Verify the install:

```sh
uv --version
```

💡 Tip: Restart your terminal after installation to ensure UV is in your PATH.
```sh
# Clone the repository
git clone https://github.com/Icewav3/SteamScraping
cd steamscraping
```

Or download and extract the ZIP file, then navigate to the folder:

```sh
cd path/to/steamscraping
```

UV will automatically create a virtual environment and install all dependencies:

```sh
# Install all dependencies
uv sync
```

This command:
- ✅ Creates a `.venv` folder with Python 3.13
- ✅ Installs `marimo`, `aiohttp`, and `tqdm`
- ✅ Sets up the project for immediate use
## Usage

Launch the interactive Marimo scraper notebook:

```sh
uv run marimo edit main.py
```

This opens an interactive notebook in your browser where you can:
- Configure scraping parameters (pages, delays)
- Start/stop the scraper
- Monitor real-time progress
- View scraping statistics
Command line alternative (headless mode):

```sh
uv run marimo run main.py
```

Analyze collected data with the visualization notebook:

```sh
uv run marimo edit visualization.py
```

Features:
- 📊 Genre distribution analysis
- 🏷️ Tag frequency charts
- 📈 Market trend visualization
- 🎨 Interactive Seaborn plots
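The tag-frequency analysis can be sketched outside the notebook with only the standard library. The `"tags"` field name follows SteamSpy's API, but treat the exact schema as an assumption:

```python
import json
from collections import Counter

def tag_frequencies(jsonl_path: str, top: int = 10) -> list:
    """Count how many games carry each tag in a steamspy_data.jsonl file."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            app = json.loads(line)
            # SteamSpy serves tags as {"Tag Name": vote_count, ...};
            # counting the keys gives a per-game tag frequency.
            counts.update((app.get("tags") or {}).keys())
    return counts.most_common(top)
```

Pointing it at `Data/<date>/steamspy_data.jsonl` yields the most common tags across that day's scrape.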
To debug in VS Code:

- Open the project in VS Code
- Install the Marimo extension
- Press `F5` to launch with the debugger attached
Advanced options:

```sh
# Run with auto-reload on file changes
uv run marimo edit main.py --watch --port 8888

# Run in sandbox mode (isolated execution)
uv run marimo edit main.py --sandbox
```

## Configuration

Edit these in `main.py` or pass them to `SteamSpyScraper()`:
| Parameter | Default | Description |
|---|---|---|
| `pages` | `10` | Number of pages to scrape (~1000 apps each) |
| `page_delay` | `15.0` | Seconds between page requests |
| `app_delay` | `0.1` | Seconds between app detail requests |
| `suppress_output` | `False` | Hide console output |
Example:

```python
async with SteamSpyScraper(
    fs,
    pages=20,       # Scrape 20 pages (~20,000 apps)
    page_delay=10,  # Wait 10s between pages
    app_delay=0.2   # Wait 0.2s between apps
) as scraper:
    await scraper.scrape()
```

## Data Output

**`steamspy_data.jsonl`** — JSON Lines format, one game per line:

```
{"appid": 730, "name": "Counter-Strike 2", "developer": "Valve", "owners": "100,000,000 .. 200,000,000", ...}
{"appid": 570, "name": "Dota 2", "developer": "Valve", "owners": "50,000,000 .. 100,000,000", ...}
```
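The `owners` field is a textual range rather than a number; a tiny helper (illustrative, not part of the project) can convert it into numeric bounds for analysis:

```python
def parse_owners(owners: str) -> tuple:
    """Turn SteamSpy's '100,000,000 .. 200,000,000' range into integer bounds."""
    low, high = owners.split("..")
    # int() tolerates surrounding whitespace once the commas are removed
    return int(low.replace(",", "")), int(high.replace(",", ""))

print(parse_owners("100,000,000 .. 200,000,000"))  # (100000000, 200000000)
```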
**`metadata.json`** — scrape session information:

```json
{
  "start_time": "2024-12-12T10:30:00",
  "end_time": 1702384500.123,
  "pages_scraped": 10,
  "apps_scraped": 8547
}
```

**`scraped_appids.txt`** — list of completed app IDs for resume functionality:
```
730
570
440
...
```
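Resume logic on top of this file can be as simple as loading the IDs into a set and skipping matches; the real implementation lives in `src/`, so this is only a sketch:

```python
from pathlib import Path

def load_scraped_ids(path: str = "scraped_appids.txt") -> set:
    """Return app IDs already scraped, or an empty set before the first run."""
    p = Path(path)
    if not p.exists():
        return set()
    return {int(line) for line in p.read_text().split()}

def pending(app_ids: list, done: set) -> list:
    """Drop IDs completed in a previous session."""
    return [a for a in app_ids if a not in done]

done = load_scraped_ids()
print(pending([730, 570, 440], done))  # IDs still to fetch
```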
## Troubleshooting

**`uv` command not found**

- Solution: Restart your terminal or add UV to PATH manually
  - Windows: `%USERPROFILE%\.cargo\bin`
  - Unix: `~/.cargo/bin`

**Rate limit errors from SteamSpy**

- Solution: Increase the `page_delay` and `app_delay` values
- SteamSpy allows ~1 request per second
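These delays correspond to sleeping between consecutive requests; a minimal pacing sketch (not the project's actual loop) looks like:

```python
import asyncio

async def paced(items, delay, handle):
    """Call handle(item) for each item, sleeping `delay` seconds between requests."""
    results = []
    for i, item in enumerate(items):
        if i:  # no delay before the very first request
            await asyncio.sleep(delay)
        results.append(await handle(item))
    return results

async def demo():
    async def fetch(appid):  # stand-in for a real aiohttp request
        return appid * 2
    return await paced([1, 2, 3], delay=0.01, handle=fetch)

print(asyncio.run(demo()))  # [2, 4, 6]
```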
**Progress bars not rendering**

- Known Issue: Marimo progress bars may not render in all browsers
- Workaround: Check console output or use `suppress_output=False`
**`Data/` folder missing**

- Automatically created on first run
- If deleted, it will be recreated
**Port already in use**

```sh
# Use a different port
uv run marimo edit main.py --port 8889
```

## Roadmap

- Complete Marimo migration for native progress bar support
- Implement data integrity checking script
- Create time-series comparative analysis notebook
- Remove unused dependencies
- Add multiple data source support (IGDB, Steam Store API)
- Implement async multi-scraper coordination
- Design data merging strategy for multi-source accuracy
- Enhanced UI with scraper control buttons
- Advanced genre/tag correlation analysis
- Export to CSV/Excel formats
- Cloud deployment for 24/7 scraping (free tier)
- Automated backup system (GitHub/cloud storage)
- Machine learning for market trend prediction
- Real-time dashboard with live updates
## Contributing

Contributions are welcome! Here's how to get started:
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Test thoroughly: `uv run marimo edit main.py`
5. Commit with clear messages: `git commit -m "Add amazing feature"`
6. Push and create a Pull Request: `git push origin feature/amazing-feature`
Code style:

- Follow PEP 8 guidelines
- Use type hints where possible
- Document all public functions
- Keep modules focused and modular
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

Having issues? Here's how to get help:
- Check the Troubleshooting section above
- Search existing issues on GitHub
- Create a new issue with:
  - Error message
  - Steps to reproduce
  - System info (`uv --version`, OS)
⭐ Star this repo if you find it helpful!
Made with ❤️ and ☕ for the gaming community