Job Market Analytics Platform


A comprehensive web scraping and data analytics platform that analyzes job market trends for technology professionals. This project scrapes job listings from the.protocol job board to identify the most in-demand technologies for Python developers in Poland, providing actionable insights for career development and skill prioritization.

🎯 Project Overview

This platform combines web scraping, data analysis, and visualization to deliver market intelligence for job seekers and professionals looking to understand current technology trends. By analyzing job postings, it helps users make informed decisions about which skills to develop for maximum career impact.

🚀 Key Features

πŸ” Advanced Web Scraping

  • Multi-technology scraping using Scrapy and Selenium for dynamic content
  • Custom pipelines and middlewares for robust data extraction
  • CSS and XPath selectors for precise data targeting
  • Configurable position targeting (Python, Java, JavaScript, etc.)
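The spiders themselves use Scrapy's selectors (and Selenium for dynamic pages), which are not reproduced here. The stdlib sketch below only illustrates the XPath-targeting idea from the bullets above; the markup, class names, and field layout are invented for the example.

```python
# Illustration only: the real spiders use Scrapy/Selenium selectors.
# The sample markup and class names below are assumptions, not the site's HTML.
import xml.etree.ElementTree as ET

SAMPLE_LISTING = """
<ul>
  <li class="offer">
    <span class="title">Python Developer</span>
    <span class="skill">Django</span>
    <span class="skill">Docker</span>
  </li>
</ul>
"""

def extract_offers(html: str) -> list[dict]:
    """Pull the title and skill tags out of each offer via XPath-style queries."""
    root = ET.fromstring(html)
    offers = []
    for li in root.findall(".//li[@class='offer']"):
        offers.append({
            "title": li.find("./span[@class='title']").text,
            "skills": [s.text for s in li.findall("./span[@class='skill']")],
        })
    return offers

print(extract_offers(SAMPLE_LISTING))
```

In the actual pipeline this per-offer dict would be yielded as a Scrapy item and cleaned by the custom pipelines mentioned above.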

📊 Data Analytics & Visualization

  • Real-time data analysis using Pandas and Jupyter
  • Interactive visualizations with Matplotlib
  • Market trend identification and skill demand analysis
  • Custom analytics pipeline for job market insights
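The project runs its analysis with Pandas in Jupyter; as a minimal stand-in, the skill-demand count at the heart of that pipeline can be sketched with the stdlib. The record shape and field names below are assumptions for illustration.

```python
# Minimal sketch of the skill-demand analysis; the real pipeline uses Pandas.
# The record layout ("experience", "skills") is an assumed shape, not the
# project's actual schema.
from collections import Counter

scraped_jobs = [
    {"experience": "junior", "skills": ["Python", "SQL"]},
    {"experience": "mid", "skills": ["Python", "Django", "Docker"]},
    {"experience": "mid", "skills": ["Python", "Docker"]},
]

def skill_demand(jobs, experience=None):
    """Count how often each skill appears, optionally filtered by level."""
    counts = Counter()
    for job in jobs:
        if experience is None or job["experience"] == experience:
            counts.update(job["skills"])
    return counts

print(skill_demand(scraped_jobs).most_common(3))
print(skill_demand(scraped_jobs, experience="mid"))
```

The resulting counts are what the Matplotlib charts in the visualization step would plot as bar charts per experience level.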

πŸ—οΈ Scalable Architecture

  • Microservices architecture with Docker containerization
  • Asynchronous task processing with Celery and Redis
  • RESTful API with Flask backend
  • Nginx reverse proxy for production deployment
  • MySQL database for metadata management

βš™οΈ DevOps & Infrastructure

  • Docker Compose for easy deployment
  • Automated scraping schedules with Celery Beat
  • Environment-based configuration management
  • Production-ready setup with proper logging and monitoring

πŸ› οΈ Technology Stack

| Category | Technology | Version |
|----------|------------|---------|
| Web Scraping | Scrapy, Selenium | 2.11.2, 4.27.1 |
| Backend | Flask | 3.0.3 |
| Data Processing | Pandas, NLTK | 2.2.2 |
| Visualization | Matplotlib | 3.8.4 |
| Database | MySQL, SQLAlchemy | 9.0.0, 2.0.31 |
| Task Queue | Celery, Redis | 5.2.1 |
| Containerization | Docker, Docker Compose | Latest |
| Web Server | Nginx | Latest |

📈 Analytics Capabilities

The platform provides comprehensive analytics including:

  • Skill Demand Analysis: Most requested technologies by experience level
  • Market Trends: Employment types and contract preferences
  • Geographic Insights: Job distribution across locations
  • Technology Stack Analysis: Required vs. optional skills breakdown
  • Career Path Guidance: Experience level requirements and progression

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • Git for version control

Installation

  1. Clone the repository

    git clone https://github.com/bigdata5911/Jobsite-Scraper-and-Analyzer.git
    cd Jobsite-Scraper-and-Analyzer
  2. Configure environment variables

    cp .env.sample .env
    # Edit .env file with your configuration
  3. Launch the application

    docker-compose up --build
  4. Access the application

📊 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /scraping/diagrams | GET | Initiates analytics task and returns task ID |
| /scraping/diagrams/{task_id} | GET | Retrieves analytics results by task ID |
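The start-then-poll pattern behind these two endpoints can be sketched with an in-memory registry in place of Flask, Celery, and Redis. Function names, statuses, and the result payload below are illustrative assumptions, not the project's actual API.

```python
# In-memory stand-in for the Celery/Redis task flow behind the two endpoints.
# Statuses and helper names here are assumptions for illustration.
import uuid

TASKS: dict[str, dict] = {}

def start_diagrams_task() -> str:
    """GET /scraping/diagrams: enqueue the analytics job and return its ID."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "PENDING", "result": None}
    return task_id

def finish_task(task_id: str, result: dict) -> None:
    """Called by the worker process once the analytics job completes."""
    TASKS[task_id] = {"status": "SUCCESS", "result": result}

def get_diagrams_result(task_id: str) -> dict:
    """GET /scraping/diagrams/{task_id}: poll for the status and result."""
    return TASKS.get(task_id, {"status": "NOT_FOUND", "result": None})
```

A client would call the first endpoint once, then poll the second with the returned ID until the status flips from pending to success.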

📋 Configuration

Position Targeting

Modify the POSITION variable in config.py to target different job roles:

POSITION = "python"  # Options: python, java, javascript, dev, etc.

Scraping Schedule

Adjust the scraping frequency in your environment configuration:

SCRAPING_EVERY_DAYS=7  # Scrape every 7 days
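How this environment variable might feed Celery Beat's periodic schedule can be sketched as below; the schedule entry name and task path are assumptions, not the project's actual configuration.

```python
# Sketch: turning SCRAPING_EVERY_DAYS into a Celery Beat interval.
# The "scrape-job-listings" entry name and the task dotted path are assumed.
import os
from datetime import timedelta

interval = timedelta(days=int(os.environ.get("SCRAPING_EVERY_DAYS", "7")))

beat_schedule = {
    "scrape-job-listings": {
        "task": "main_celery.tasks.scrape",  # hypothetical task path
        "schedule": interval,
    },
}
```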

🎨 Sample Visualizations

Generated charts include: skills by experience level, required skills, optional skills, experience level distribution, employment type, UA support, contract types, and location distribution.

πŸ—οΈ Project Structure

Jobsite-Scraper-and-Analyzer/
β”œβ”€β”€ analyzing/           # Data analysis and visualization modules
β”œβ”€β”€ scraping/           # Web scraping spiders and pipelines
β”œβ”€β”€ web_server/         # Flask web application
β”œβ”€β”€ db/                 # Database models and connections
β”œβ”€β”€ main_celery/        # Celery task queue configuration
β”œβ”€β”€ static/             # Static assets and visualizations
β”œβ”€β”€ migrations/         # Database migration files
β”œβ”€β”€ nginx/              # Nginx configuration
└── scripts/            # Utility scripts

🔧 Development

Key Learning Outcomes

  • Web Scraping: Advanced techniques with Selenium, Scrapy, CSS/XPath selectors
  • Data Analysis: Comprehensive data processing with Pandas, NLTK, and visualization
  • Task Management: Distributed task processing with Celery and Redis
  • Containerization: Production-ready Docker deployment
  • API Development: RESTful API design with Flask
  • DevOps: Nginx configuration and container orchestration

Areas for Future Enhancement

  • Cloud Storage Integration: Migrate from local file storage to cloud solutions
  • Database Optimization: Implement proper database storage for scraped data
  • Container Optimization: Reduce Docker image sizes and improve build efficiency
  • Code Quality: Implement comprehensive testing and SOLID principles
  • Performance Monitoring: Add application performance monitoring and logging

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ‘¨β€πŸ’» Author

bigdata5911 - GitHub Profile


Built with ❤️ for the developer community

Empowering developers with data-driven career insights


Contact Us

Telegram
