The project is a software application designed for the automatic collection and analysis of job vacancy data from the hh.ru website. It provides users with a convenient graphical user interface (GUI) for interacting with the system, including configuring search parameters, viewing results, and exporting data.

unSoto/MarketScope

🏒 MarketScope - HH.ru Vacancy Scraper

A program for automated collection and analysis of job vacancy data from hh.ru with a convenient graphical interface

Description

This program allows you to:

  • Automatically collect job vacancies from hh.ru according to specified parameters
  • Save data to a local SQLite database
  • View and filter results in a convenient interface
  • Export data to various formats (CSV, Excel, JSON)
  • Analyze vacancy statistics

✨ Features

🎯 Smart Job Search

  • Search by keywords, location, and experience level
  • Support for all major Russian cities
  • Advanced filtering options
  • Real-time progress tracking

📊 Data Analysis

  • Comprehensive statistics and analytics
  • Salary range analysis
  • Location-based insights
  • Experience level distribution
  • Remote work opportunities tracking

💾 Data Management

  • Local SQLite database storage
  • Batch data processing
  • Data export to multiple formats
  • Search history tracking
  • Duplicate detection and removal

🖥️ User-Friendly Interface

  • Modern GUI built with tkinter
  • Intuitive search parameters
  • Interactive results table
  • Export functionality
  • Real-time status updates

🔧 Developer-Friendly

  • Clean, modular code structure
  • Comprehensive documentation
  • Type hints and docstrings
  • Error handling and logging
  • Configuration management

Project Architecture

Project Structure

MarketScope/
├── main.py                # Main application entry point
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── README.md              # Documentation
├── database/              # Database module
│   ├── __init__.py
│   └── db_manager.py      # SQLite database management
├── parser/                # Parsing module
│   ├── __init__.py
│   └── hh_parser.py       # Parser for hh.ru
├── gui/                   # Graphical interface
│   ├── __init__.py
│   └── main_window.py     # Main application window
└── utils/                 # Utility functions
    ├── __init__.py
    └── helpers.py         # Helper functions

Modules

Main Module (main.py)

  • Application entry point
  • Dependency checking
  • Database initialization
  • GUI launch
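
The README does not show how main.py checks dependencies. One way such a check might look, as a minimal sketch, uses importlib.util.find_spec from the standard library. The names REQUIRED_MODULES and find_missing_modules are assumptions made for illustration, as is the module list itself.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical list; a real check would mirror requirements.txt.
REQUIRED_MODULES = ("sqlite3", "json", "csv")


def find_missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install -r requirements.txt")
    else:
        print("All dependencies found.")
```

Checking before the GUI starts lets the program print a readable hint instead of failing with a traceback on the first import.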

Database Module (database/db_manager.py)

  • SQLite database management
  • Table creation and migration
  • CRUD operations for vacancies
  • Search and filtering functionality
  • Statistics collection

Parser Module (parser/hh_parser.py)

  • HTTP client for hh.ru
  • HTML page parsing
  • Vacancy data extraction
  • Pagination handling
  • Restriction bypass (user-agent rotation, delays)
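
The user-agent rotation and delays listed above could be sketched roughly as follows. This is not the code in hh_parser.py: the User-Agent strings, the delay value, and the function names rotating_headers and fetch are assumptions, and urllib stands in for whatever HTTP client the project actually uses.

```python
import itertools
import time
import urllib.request
from typing import Dict, Iterator, List

# Hypothetical values; the project's real list and delay live in its own code.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
DELAY_BETWEEN_REQUESTS = 2.0  # seconds, assumed


def rotating_headers(agents: List[str]) -> Iterator[Dict[str, str]]:
    """Yield request headers forever, cycling through the User-Agent strings."""
    for agent in itertools.cycle(agents):
        yield {"User-Agent": agent}


def fetch(url: str, headers: Dict[str, str], timeout: float = 10.0) -> bytes:
    """Fetch one page with the given headers, then pause before returning."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    time.sleep(DELAY_BETWEEN_REQUESTS)  # keep the request rate modest
    return body
```

A caller would create the generator once with rotating_headers(USER_AGENTS) and pass next(...) of it to each fetch call, so consecutive requests use different headers.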

Graphical Interface (gui/main_window.py)

  • Main application window using tkinter
  • Input fields for search parameters
  • Results table
  • Export buttons
  • Status bar and notifications

Configuration (config.py)

  • Application settings
  • Location and experience mapping
  • Parsing parameters
  • GUI settings

Utilities (utils/helpers.py)

  • Helper functions
  • Data validation
  • Salary formatting
  • Text normalization

Database

Vacancies Table Structure

CREATE TABLE vacancies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,                    -- Job vacancy title
    salary_min INTEGER,                     -- Minimum salary
    salary_max INTEGER,                     -- Maximum salary
    currency TEXT,                          -- Currency
    location TEXT,                          -- Location
    experience TEXT,                        -- Required experience
    key_skills TEXT,                        -- Key skills (JSON)
    company TEXT,                           -- Company
    link TEXT UNIQUE NOT NULL,              -- Vacancy link
    remote BOOLEAN DEFAULT FALSE,           -- Remote work
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
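
Because link is declared UNIQUE, duplicate detection can be left to SQLite itself. The sketch below shows one way to do that with INSERT OR IGNORE; it is not taken from db_manager.py. The function name save_vacancy is hypothetical, the table is trimmed to the columns the example touches, and the vacancy URL is a placeholder.

```python
import json
import sqlite3

# Trimmed copy of the schema: only the columns this example uses.
# DEFAULT 0 is used instead of FALSE so the statement also runs on older SQLite builds.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vacancies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    key_skills TEXT,
    link TEXT UNIQUE NOT NULL,
    remote BOOLEAN DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def save_vacancy(conn: sqlite3.Connection, vacancy: dict) -> bool:
    """Insert a vacancy; return False if a row with the same link already exists."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO vacancies (title, key_skills, link, remote) "
        "VALUES (?, ?, ?, ?)",
        (
            vacancy["title"],
            json.dumps(vacancy.get("key_skills", [])),  # stored as JSON text
            vacancy["link"],
            int(vacancy.get("remote", False)),
        ),
    )
    conn.commit()
    return cursor.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    sample = {"title": "Python Developer", "link": "https://hh.ru/vacancy/0"}
    print(save_vacancy(conn, sample), save_vacancy(conn, sample))
```

Storing key_skills with json.dumps matches the "Key skills (JSON)" comment in the schema; reading it back takes a json.loads call.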

Installation and Launch

Requirements

  • Python 3.7+
  • pip (Python package manager)
  • Internet connection (for downloading dependencies and scraping hh.ru)

Quick Start

  1. Clone or download the project files
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the application:
    python main.py

Installing Dependencies

pip install -r requirements.txt

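Note: sqlite3 and tkinter are part of the Python standard library and do not need to be installed with pip. On some Linux distributions, tkinter ships as a separate system package (for example, python3-tk on Debian and Ubuntu).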

Running the Program

python main.py

First Launch

  • The program will automatically create a SQLite database (vacancies.db)
  • Initial setup may take a few seconds
  • The main window will appear with search options

Usage

Main Features

  1. Job Search

    • Enter a keyword in the "Keyword" field
    • Select location from the dropdown list
    • Specify required work experience
    • Click "Start Search"
  2. Viewing Results

    • Results are displayed in a table
    • Sorting and scrolling are available
    • Double-clicking a vacancy opens it in the browser
  3. Data Export

    • File → Export to Excel
    • File → Export to CSV
    • Data is saved with a timestamp
  4. Statistics

    • View → Statistics
    • Shows general information about collected data
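
The Data Export step notes that files are saved with a timestamp. A minimal sketch of a CSV export that does this is shown below. The file name pattern, the prefix "vacancies", and the function names timestamped_name and export_csv are assumptions, not the project's actual export code.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Mapping, Optional


def timestamped_name(prefix: str, extension: str, now: Optional[datetime] = None) -> str:
    """Build a file name such as vacancies_20250131_142500.csv."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.{extension}"


def export_csv(rows: Iterable[Mapping], directory: Path, fields: List[str]) -> Path:
    """Write rows to a timestamped CSV file in directory and return its path."""
    path = directory / timestamped_name("vacancies", "csv")
    with path.open("w", newline="", encoding="utf-8") as handle:
        # Keys not listed in fields are skipped instead of raising an error.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Putting the timestamp in the file name means repeated exports do not overwrite each other, as long as they are at least a second apart.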

Search Parameters

  • Keyword: Required field for job search
  • Location: City or region (optional)
  • Work Experience: Required experience level (optional)
    • No experience
    • 1 to 3 years
    • 3 to 6 years
    • More than 6 years
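
These parameters could be validated and turned into query parameters along the following lines. This is a sketch under assumptions: the function name build_search_params, the query parameter names, and the experience identifiers are illustrative and should be checked against the project's config.py and against hh.ru itself.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

# Assumed mapping from the GUI labels to hh.ru experience identifiers.
EXPERIENCE_MAPPING = {
    "No experience": "noExperience",
    "1 to 3 years": "between1And3",
    "3 to 6 years": "between3And6",
    "More than 6 years": "moreThan6",
}


def build_search_params(
    keyword: str,
    area_id: Optional[int] = None,
    experience: Optional[str] = None,
    page: int = 0,
) -> Dict[str, Union[str, int]]:
    """Validate the inputs and return query parameters for one results page."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("Keyword is a required field")
    params: Dict[str, Union[str, int]] = {"text": keyword, "page": page}
    if area_id is not None:  # location is optional
        params["area"] = area_id
    if experience is not None:  # experience is optional
        params["experience"] = EXPERIENCE_MAPPING[experience]
    return params


if __name__ == "__main__":
    print(urlencode(build_search_params("python developer", experience="1 to 3 years")))
```

Only the keyword is checked strictly, mirroring the list above, where it is the one required field.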

Configuration

The main settings are located in the config.py file:

  • DATABASE_PATH: Path to the database file
  • DEFAULT_PAGE_LIMIT: Number of pages for parsing
  • REQUEST_TIMEOUT: HTTP request timeout
  • DELAY_BETWEEN_REQUESTS: Delay between requests
  • LOCATION_MAPPING: Mapping of city names to hh.ru IDs
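
For orientation, a config.py containing these settings might look like the fragment below. Only the setting names and the database file name come from this README; every other value is an assumption chosen for illustration.

```python
# config.py (illustrative values only)

DATABASE_PATH = "vacancies.db"    # file name from the First Launch section
DEFAULT_PAGE_LIMIT = 5            # assumed: result pages fetched per search
REQUEST_TIMEOUT = 10              # assumed: seconds before an HTTP request is abandoned
DELAY_BETWEEN_REQUESTS = 2.0      # assumed: seconds to wait between requests

# Assumed entries: city name -> hh.ru area ID.
LOCATION_MAPPING = {
    "Moscow": 1,
    "Saint Petersburg": 2,
}
```

Raising DELAY_BETWEEN_REQUESTS is the first thing to try if hh.ru starts rejecting requests.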

Security and Ethics

  • The program uses User-Agent rotation to avoid blocking
  • Delays between requests are implemented to comply with site rules
  • Does not collect users' personal data
  • Works only with publicly available information

Possible Problems

Module Import Error

pip install -r requirements.txt

Database Locked

Close all running instances of the program, then delete the vacancies.db.lock file if it is present.

hh.ru Blocks Requests

  • Increase the delay between requests in config.py
  • Use a VPN if necessary
  • Check request limits

Development

Adding New Features

  1. Create a new branch: git checkout -b feature/new-feature
  2. Implement the functionality
  3. Add tests
  4. Create a Pull Request

Code Structure

  • Use type hints
  • Add docstrings
  • Follow PEP 8
  • Separate logic into modules

License

⚠️ EDUCATIONAL AND PERSONAL USE ONLY

This project is licensed under the MarketScope Usage License - see the LICENSE file for full details.

Key Points:

  • ✅ Educational Use: Study the code, learn web scraping techniques
  • ✅ Personal Use: Job searching for personal employment
  • ✅ Code Analysis: Understand software architecture and Python patterns
  • ❌ Commercial Use: Prohibited without explicit written permission
  • ❌ Mass Data Collection: Prohibited to avoid overloading hh.ru servers
  • ❌ Service Abuse: Must respect hh.ru's Terms of Service and robots.txt

Important Legal Notice:

This software scrapes data from hh.ru (HeadHunter), a commercial job search platform. Users must:

  • Comply with hh.ru's Terms of Service
  • Respect their robots.txt file
  • Follow reasonable request rates
  • Not interfere with hh.ru's normal operations

Violation of hh.ru's terms may result in account suspension or legal action from hh.ru.

Support

If you encounter problems:

  1. Check the logs in the vacancy_scraper.log file
  2. Make sure all dependencies are installed: pip install -r requirements.txt
  3. Check your internet connection
  4. Ensure you're complying with hh.ru's Terms of Service
  5. Refer to the hh.ru robots.txt file for scraping guidelines

Getting Help:


👨‍💻 Developer

📧 Email: vlskrauch@mail.ru
🌐 Telegram: @worksoto


Version: 1.0.0
Date: 2025
Developer: Skrauch Vladislav Igorevich
