🇵🇪 GitHub Peru Analytics: Developer Ecosystem Dashboard

A data analytics platform that extracts, processes, and visualizes information about the Peruvian developer ecosystem using the GitHub API, GPT-4o-mini classification, and an interactive Streamlit dashboard.

🚀 Easter Egg

Before starting, run this in Python:

import antigravity

📊 Key Findings

JavaScript dominates: With 161 repositories, JavaScript is by far the most popular language in Peru's developer ecosystem, followed by Python (87) and CSS (60).
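Information & Communication rules: 67.3% of all repositories (673 out of 1,000) fall under the Information & Communication industry (section J of CIIU, the Spanish-language name for the ISIC industry classification). This is consistent with a software-focused ecosystem, although generic repositories are assigned to J by default (see Limitations).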
Education is second: 11.5% of repos (115) are classified under Education (P), showing significant activity in EdTech and learning platforms.
Top developer is devaige: With an impact score of 9,169 (5,513 stars and 1,178 followers, with the remaining 2,478 points coming from forks, which the impact formula counts twice), devaige leads the Peruvian GitHub ecosystem, primarily through Android UI libraries.
Most starred repo is financial: dcajasn/Riskfolio-Lib — a Portfolio Optimization library in Python — leads with 3,804 stars, showing strong quantitative finance activity from Peru.

🗂️ Data Collection

Metric	Value
Total developers	921
Total repositories	1,000
Total stars	18,317
Total forks	3,665
Data collected	March 2026
Search locations	Peru, Lima, Arequipa, Trujillo, Cusco
Rate limiting strategy	Exponential backoff with `tenacity`
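
The table names exponential backoff with `tenacity` as the rate-limiting strategy, but the project's actual retry settings are not shown here. The sketch below only illustrates the idea with the standard library; `RateLimitError`, `call_with_backoff`, and the delay values are hypothetical and are not taken from the project's code.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay * 2^attempt seconds, capped at max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

With `tenacity`, a comparable policy is usually written by decorating the request function with `@retry(wait=wait_exponential(...), stop=stop_after_attempt(...))` instead of hand-rolling the loop.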

✨ Features

Overview Dashboard — Key ecosystem stats, top 10 developers by impact score, industry distribution, top repositories
Developer Explorer — Searchable/filterable table with all metrics, CSV export
Repository Browser — Filter by industry, language, stars; view classification confidence and reasoning
Industry Analysis — CIIU distribution charts, top repos per industry, developer specialization
Language Analytics — Language distribution, top developers per language, Language × Industry heatmap
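
The Language × Industry heatmap is, underneath, a count of repositories per (language, industry) pair. A minimal sketch of that aggregation follows, assuming each repository record carries `language` and `industry_code` fields; those field names and both helper functions are hypothetical, and the dashboard may compute this differently.

```python
from collections import Counter


def language_industry_counts(repos):
    """Count repositories per (language, industry_code) pair."""
    return Counter((r["language"] or "Unknown", r["industry_code"]) for r in repos)


def to_matrix(counts):
    """Turn pair counts into row labels, column labels, and a 2-D list of counts."""
    languages = sorted({lang for lang, _ in counts})
    industries = sorted({ind for _, ind in counts})
    # A Counter returns 0 for pairs that never occurred, so empty cells stay 0.
    matrix = [[counts[(lang, ind)] for ind in industries] for lang in languages]
    return languages, industries, matrix
```

The resulting matrix can be handed to any plotting library as the heatmap's values, with the two label lists as its axes.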

Screenshots

Screenshots are provided for the Overview, Developers, Repositories, Industries, and Languages pages.

⚙️ Installation

Prerequisites

Python 3.10+
PostgreSQL running locally
GitHub Personal Access Token
OpenAI API Key

Steps

# 1. Clone the repository
git clone https://github.com/YOUR_USERNAME/github-peru-analytics.git
cd github-peru-analytics

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Edit .env and fill in your tokens and DATABASE_URL

# 5. Run setup (validates env + creates DB tables)
python setup_project.py

GitHub Token Setup

Go to GitHub → Settings → Developer Settings → Personal Access Tokens → Tokens (classic)
Click "Generate new token (classic)"
Scopes: public_repo, read:user
Copy the token into your .env file as GITHUB_TOKEN

OpenAI Key Setup

Go to platform.openai.com/api-keys
Create a new key and add it to .env as OPENAI_API_KEY
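
Step 5 of the installation says the setup script validates the environment. What that check covers is not shown here; a minimal sketch, assuming it only verifies that the three variables named in this README are set, could look like the following. `missing_env_vars` is a hypothetical helper, not a function from `setup_project.py`.

```python
import os

# Variable names taken from this README's setup instructions.
REQUIRED_VARS = ("GITHUB_TOKEN", "OPENAI_API_KEY", "DATABASE_URL")


def missing_env_vars(env=None):
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` before running the pipeline gives a quick list of anything still left to fill in `.env`, provided the file has already been loaded into the process environment.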

🏃 Usage

# Step 1: Extract data from GitHub (1000+ repos)
python scripts/extract_data.py

# Step 2: Classify repos into CIIU industries using GPT-4o-mini
python scripts/classify_repos.py

# Step 3: Calculate user and ecosystem metrics
python scripts/calculate_metrics.py

# Step 4: Launch the dashboard
streamlit run app/main.py

# Optional: Run the AI Classification Agent demo
python scripts/run_agent.py

📐 Metrics Documentation

User-Level Metrics

Metric	Formula	Description
`total_repos`	COUNT(repos)	Number of owned public repos
`total_stars_received`	SUM(stars)	Total stars across all repos
`total_forks_received`	SUM(forks)	Total forks across all repos
`avg_stars_per_repo`	stars / repos	Average popularity per repo
`account_age_days`	today − created_at	Days since account creation
`repos_per_year`	repos / (age / 365)	Repository creation rate
`follower_ratio`	followers / following	Influence ratio
`h_index`	h repos with ≥ h stars	GitHub h-index
`impact_score`	stars + forks×2 + followers	Composite influence score
`language_diversity`	COUNT(unique languages)	Technical breadth
`has_readme_pct`	repos_with_readme / total	Documentation quality
`has_license_pct`	repos_with_license / total	Professionalism indicator
`is_active`	today − last_push < 90 days	Active status (pushed within the last 90 days)
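
A few of the formulas above are easier to follow in code. The sketch below implements `h_index`, `impact_score`, `follower_ratio`, and `is_active` as described in the table; it is an illustration, not the contents of `scripts/calculate_metrics.py`. Returning 0.0 when `following` is zero is an assumption, since the table does not say how that case is handled.

```python
from datetime import datetime, timedelta, timezone


def h_index(stars_per_repo):
    """Largest h such that h repositories have at least h stars each."""
    h = 0
    for rank, stars in enumerate(sorted(stars_per_repo, reverse=True), start=1):
        if stars < rank:
            break
        h = rank
    return h


def impact_score(stars, forks, followers):
    """stars + forks x 2 + followers, as given in the table."""
    return stars + 2 * forks + followers


def follower_ratio(followers, following):
    """followers / following; the 0.0 result for following == 0 is an assumption."""
    return followers / following if following else 0.0


def is_active(last_push, now=None, window_days=90):
    """True if the most recent push is less than window_days old."""
    now = now or datetime.now(timezone.utc)
    return now - last_push < timedelta(days=window_days)
```

For example, a developer whose repositories have 10, 5, 3, and 1 stars has an h-index of 3: three repositories have at least 3 stars each, but there are not four with at least 4.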

Ecosystem Metrics

Metric	Value	Description
`total_developers`	921	Unique Peruvian developers
`total_repositories`	1,000	Total repos collected
`total_stars`	18,317	Sum of all stars
`total_forks`	3,665	Sum of all forks
`avg_repos_per_user`	1.09	Average repos per developer
`avg_account_age_days`	2,896	~7.9 years average tenure
`active_developer_pct`	1.41%	Active in last 90 days
`top_language`	JavaScript (161)	Most used language
`top_industry`	J — Information & Communication	Dominant industry
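
The derived values in this table can be rechecked from the raw totals. The short calculation below does that; the headcount of active developers is an inference from the reported percentage, not a figure stated in the source.

```python
total_developers = 921
total_repositories = 1_000
avg_account_age_days = 2_896
active_developer_pct = 1.41

# Average repositories per developer, rounded as in the table.
avg_repos_per_user = round(total_repositories / total_developers, 2)

# Average account age expressed in years.
avg_tenure_years = round(avg_account_age_days / 365, 1)

# Approximate number of developers behind the active percentage (inferred).
active_developers = round(total_developers * active_developer_pct / 100)
```

The first two values reproduce the table's 1.09 and ~7.9 years, and the last one shows that 1.41% of 921 corresponds to roughly 13 developers.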

🤖 AI Agent Documentation

Classification Agent (Option B)

The agent autonomously classifies repositories into 21 CIIU industry categories using a multi-step reasoning process.

Architecture:

Repository info → Agent decides if more context needed
                      ↓
              [get_readme tool]    ← if description is vague
              [get_languages tool] ← if tech stack unclear
                      ↓
              classify_industry tool → Final result

Tools available:

Tool	Description
`get_readme(owner, repo)`	Fetches README content (up to 3,000 chars)
`get_languages(owner, repo)`	Gets language breakdown in bytes
`classify_industry(...)`	Submits final classification with reasoning
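
The control flow in the architecture diagram can be sketched as a small loop that keeps dispatching tool calls until `classify_industry` is reached. In the real agent GPT-4o-mini decides which tool to call; here a rule-based function stands in for the model so the example runs offline. The stubbed tools, `rule_based_decide`, `run_agent`, and the keyword rule are all hypothetical, and only the tool names and the `industry_code`, `confidence`, and `reasoning` arguments come from this README.

```python
def get_readme(owner, repo):
    """Stub tool: the real one fetches README text (up to 3,000 chars)."""
    return "Portfolio optimization library for quantitative finance"[:3000]


def classify_industry(industry_code, confidence, reasoning):
    """Stub tool: records the final classification."""
    return {"industry_code": industry_code, "confidence": confidence, "reasoning": reasoning}


TOOLS = {"get_readme": get_readme, "classify_industry": classify_industry}


def rule_based_decide(context):
    """Stand-in for the model: fetch the README when the description is vague."""
    description = context.get("description") or ""
    if len(description) < 20 and "get_readme" not in context:
        return "get_readme", {"owner": context["owner"], "repo": context["repo"]}
    text = (description + " " + context.get("get_readme", "")).lower()
    if "portfolio" in text or "finance" in text:
        return "classify_industry", {
            "industry_code": "K", "confidence": "high",
            "reasoning": "Mentions portfolio or finance topics",
        }
    return "classify_industry", {
        "industry_code": "J", "confidence": "low",
        "reasoning": "No industry-specific signal found",
    }


def run_agent(repo_info, tools, decide, max_steps=5):
    """Dispatch the tool calls chosen by decide() until a classification is made."""
    context = dict(repo_info)
    for _ in range(max_steps):
        name, args = decide(context)
        result = tools[name](**args)
        if name == "classify_industry":
            return result
        context[name] = result  # make the tool output available to the next decision
    raise RuntimeError("agent did not reach a classification")
```

Capping the loop with `max_steps` is a design choice for the sketch: it keeps a misbehaving decision function from calling tools forever.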

Requirements met:

✅ Autonomy — makes decisions without human intervention
✅ Tool use — uses at least 2 different tools
✅ Reasoning — explains every classification decision
✅ Error handling — fallback to J on failures
✅ Logging — full log in logs/agent_classification.log
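
The error-handling and logging requirements above can be illustrated with a small wrapper. This is a sketch under assumptions: `classify_with_fallback` and the contents of `FALLBACK` (in particular the `low` confidence and the reasoning text) are hypothetical, and only the fallback to J and the log file name come from this README.

```python
import logging

logger = logging.getLogger("agent_classification")

# Assumed shape of the fallback result; only the code "J" is stated in the README.
FALLBACK = {
    "industry_code": "J",
    "confidence": "low",
    "reasoning": "Fallback after classification error",
}


def classify_with_fallback(classify, repo_name):
    """Return classify(repo_name), or the J fallback if it raises."""
    try:
        result = classify(repo_name)
        logger.info("Classified %s as %s", repo_name, result["industry_code"])
        return result
    except Exception:
        # Log the full traceback, then degrade to the default category.
        logger.exception("Classification failed for %s; falling back to J", repo_name)
        return dict(FALLBACK)
```

To write these messages to logs/agent_classification.log, the logger can be configured once at startup, for example with `logging.basicConfig(filename=..., level=logging.INFO)`.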

Example agent run:

🤖 Agent starting: dcajasn/Riskfolio-Lib
  → Tool call: get_readme({'owner': 'dcajasn', 'repo': 'Riskfolio-Lib'})
  → Tool call: classify_industry({'industry_code': 'K', 'confidence': 'high',
      'reasoning': 'Portfolio optimization library for quantitative finance...'})
  ✅ Classified as K (Financial & Insurance) [high]

Full agent run log: data/metrics/agent_run_log.json

⚠️ Limitations

Location bias: GitHub users who have not set a location are excluded, so the dataset likely undercounts Peruvian developers by a significant margin.
Star bias: The top-1,000-by-stars strategy overrepresents popular or older projects and may miss newer or less-starred talent.
Classification accuracy: Generic repositories (utilities, hello-world, course homework) are defaulted to J (Information & Communication), which inflates that category to 67.3%.
Low active developer rate (1.41%): This is likely caused by the star-based collection strategy, which captures older repos that are no longer maintained rather than currently active projects.
Language detection: GitHub's primary-language field reports only the dominant language of a repository, so polyglot repositories are counted under a single language.

📎 Video

Demo Video Link

👤 Author

Santiago Miguel Maldonado Vizcarra
Course: Prompt Engineering
Institution: Pontificia Universidad Católica del Perú
Date: March 2026
