A program for automated collection and analysis of job vacancy data from hh.ru with a convenient graphical interface
This program allows you to:
- Automatically collect job vacancies from hh.ru according to specified parameters
- Save data to a local SQLite database
- View and filter results in a convenient interface
- Export data to various formats (CSV, Excel, JSON)
- Analyze vacancy statistics
- Search by keywords, location, and experience level
- Support for all major Russian cities
- Advanced filtering options
- Real-time progress tracking
- Comprehensive statistics and analytics
- Salary range analysis
- Location-based insights
- Experience level distribution
- Remote work opportunities tracking
- Local SQLite database storage
- Batch data processing
- Data export to multiple formats
- Search history tracking
- Duplicate detection and removal
- Modern GUI built with tkinter
- Intuitive search parameters
- Interactive results table
- Export functionality
- Real-time status updates
- Clean, modular code structure
- Comprehensive documentation
- Type hints and docstrings
- Error handling and logging
- Configuration management
MarketScope/
├── main.py              # Main application entry point
├── config.py            # Configuration settings
├── requirements.txt     # Python dependencies
├── README.md            # Documentation
├── database/            # Database module
│   ├── __init__.py
│   └── db_manager.py    # SQLite database management
├── parser/              # Parsing module
│   ├── __init__.py
│   └── hh_parser.py     # Parser for hh.ru
├── gui/                 # Graphical interface
│   ├── __init__.py
│   └── main_window.py   # Main application window
└── utils/               # Utility functions
    ├── __init__.py
    └── helpers.py       # Helper functions
- main.py
  - Application entry point
  - Dependency checking
  - Database initialization
  - GUI launch
- database/db_manager.py
  - SQLite database management
  - Table creation and migration
  - CRUD operations for vacancies
  - Search and filtering functionality
  - Statistics collection
- parser/hh_parser.py
  - HTTP client for hh.ru
  - HTML page parsing
  - Vacancy data extraction
  - Pagination handling
  - Restriction bypass (User-Agent rotation, delays)
- gui/main_window.py
  - Main application window using tkinter
  - Input fields for search parameters
  - Results table
  - Export buttons
  - Status bar and notifications
- config.py
  - Application settings
  - Location and experience mapping
  - Parsing parameters
  - GUI settings
- utils/helpers.py
  - Helper functions
  - Data validation
  - Salary formatting
  - Text normalization
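The salary formatting and text normalization helpers listed above could look roughly like the sketch below. The function names `format_salary` and `normalize_text` and the output wording are assumptions made for illustration; the real helpers.py may differ.

```python
from typing import Optional


def format_salary(salary_min: Optional[int],
                  salary_max: Optional[int],
                  currency: Optional[str] = None) -> str:
    """Render a salary range as a short human-readable string (hypothetical helper)."""
    unit = f" {currency}" if currency else ""
    if salary_min is not None and salary_max is not None:
        if salary_min == salary_max:
            return f"{salary_min:,}{unit}"
        return f"{salary_min:,} - {salary_max:,}{unit}"
    if salary_min is not None:
        return f"from {salary_min:,}{unit}"
    if salary_max is not None:
        return f"up to {salary_max:,}{unit}"
    return "not specified"


def normalize_text(text: str) -> str:
    """Collapse every run of whitespace into a single space and trim the ends."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which also covers the non-breaking spaces common in scraped HTML.
    return " ".join(text.split())
```

Both functions handle missing values explicitly because salary_min, salary_max and currency are all nullable in the vacancies table.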
CREATE TABLE vacancies (
id INTEGER PRIMARY KEY AUTOINCREMENT,
title TEXT NOT NULL, -- Job vacancy title
salary_min INTEGER, -- Minimum salary
salary_max INTEGER, -- Maximum salary
currency TEXT, -- Currency
location TEXT, -- Location
experience TEXT, -- Required experience
key_skills TEXT, -- Key skills (JSON)
company TEXT, -- Company
link TEXT UNIQUE NOT NULL, -- Vacancy link
remote BOOLEAN DEFAULT FALSE, -- Remote work
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
- Python 3.7+
- pip (Python package manager)
- Internet connection (for downloading dependencies and scraping hh.ru)
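Because the `link` column in the vacancies table is declared `UNIQUE`, duplicate detection can be delegated to SQLite itself. The sketch below shows one way to do that with the standard `sqlite3` module; the function name `save_vacancies` and the dictionary keys are assumptions, not the actual db_manager.py API. It uses `DEFAULT 0` instead of `DEFAULT FALSE` so that it also runs on SQLite builds older than 3.23, which do not recognize the `FALSE` keyword.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS vacancies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    salary_min INTEGER,
    salary_max INTEGER,
    currency TEXT,
    location TEXT,
    experience TEXT,
    key_skills TEXT,
    company TEXT,
    link TEXT UNIQUE NOT NULL,
    remote BOOLEAN DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""

INSERT_SQL = """
INSERT OR IGNORE INTO vacancies
    (title, salary_min, salary_max, currency, location,
     experience, key_skills, company, link, remote)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
"""


def save_vacancies(conn: sqlite3.Connection, vacancies: list) -> int:
    """Insert vacancies and return how many rows were actually new."""
    inserted = 0
    with conn:  # commits on success, rolls back if an exception is raised
        for v in vacancies:
            cur = conn.execute(INSERT_SQL, (
                v["title"],
                v.get("salary_min"),
                v.get("salary_max"),
                v.get("currency"),
                v.get("location"),
                v.get("experience"),
                # key_skills is stored as a JSON string, as noted in the schema
                json.dumps(v.get("key_skills", []), ensure_ascii=False),
                v.get("company"),
                v["link"],
                int(bool(v.get("remote", False))),
            ))
            # rowcount is 0 when the UNIQUE constraint on link skipped the row
            inserted += cur.rowcount
    return inserted
```

Calling `conn.executescript(SCHEMA)` once on a fresh connection creates the table; after that, saving the same vacancy twice leaves a single row.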
- Clone or download the project files
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python main.py
Note: sqlite3 and tkinter are part of the standard Python library and normally do not require separate installation. On some Linux distributions tkinter is packaged separately (for example, python3-tk on Debian and Ubuntu).
- The program will automatically create a SQLite database (vacancies.db)
- Initial setup may take a few seconds
- The main window will appear with search options
- Job Search
  - Enter a keyword in the "Keyword" field
  - Select a location from the dropdown list
  - Specify the required work experience
  - Click "Start Search"
- Viewing Results
  - Results are displayed in a table
  - Sorting and scrolling are available
  - Double-clicking a vacancy opens it in the browser
- Data Export
  - File → Export to Excel
  - File → Export to CSV
  - Data is saved with a timestamp
- Statistics
  - View → Statistics
  - Shows general information about collected data
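As an illustration of the timestamped export described above, here is a minimal sketch of a CSV export. The function name `export_to_csv`, the file name pattern and the chosen columns are assumptions; the application's own export code may name files differently.

```python
import csv
from datetime import datetime
from pathlib import Path

# Columns to export; names follow the vacancies table
EXPORT_FIELDS = ["title", "company", "location",
                 "salary_min", "salary_max", "currency", "link"]


def export_to_csv(rows, directory=".", prefix="vacancies"):
    """Write rows (dicts) to <prefix>_<YYYYmmdd_HHMMSS>.csv and return the path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(directory) / f"{prefix}_{stamp}.csv"
    # utf-8-sig adds a BOM so that Excel detects the encoding of Cyrillic text
    with path.open("w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=EXPORT_FIELDS,
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
    return path
```

Putting the timestamp in the file name means repeated exports never overwrite each other, provided they are at least a second apart.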
- Keyword: Required field for job search
- Location: City or region (optional)
- Work Experience: Required experience level (optional)
  - No experience
  - 1 to 3 years
  - 3 to 6 years
  - More than 6 years
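The search parameters above have to be translated into a query before a request can be made. The sketch below shows one way to do that. The query parameter names (`text`, `area`, `experience`, `page`), the experience codes and the two area IDs are assumptions about hh.ru's search format and should be checked against the real mappings in config.py.

```python
from urllib.parse import urlencode

# Assumed mapping from the GUI labels to hh.ru experience codes
EXPERIENCE_MAPPING = {
    "No experience": "noExperience",
    "1 to 3 years": "between1And3",
    "3 to 6 years": "between3And6",
    "More than 6 years": "moreThan6",
}

# Assumed area IDs, shown for two cities only
LOCATION_MAPPING = {"Moscow": 1, "Saint Petersburg": 2}


def build_search_query(keyword, location=None, experience=None, page=0):
    """Return a URL-encoded query string for one page of search results."""
    keyword = keyword.strip()
    if not keyword:
        # Keyword is the only required field
        raise ValueError("keyword is required")
    params = {"text": keyword, "page": page}
    if location is not None:
        params["area"] = LOCATION_MAPPING[location]
    if experience is not None:
        params["experience"] = EXPERIENCE_MAPPING[experience]
    return urlencode(params)
```

Optional fields are simply left out of the query when they are not set, which mirrors how Location and Work Experience are optional in the interface.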
The main settings are located in the config.py file:
- DATABASE_PATH: Path to the database file
- DEFAULT_PAGE_LIMIT: Number of pages for parsing
- REQUEST_TIMEOUT: HTTP request timeout
- DELAY_BETWEEN_REQUESTS: Delay between requests
- LOCATION_MAPPING: Mapping of city names to hh.ru IDs
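For orientation, a config.py built from these settings might look like the fragment below. Only the setting names and the vacancies.db file name come from this README; every value, including the units and the two area IDs, is an illustrative assumption.

```python
# config.py (illustrative sketch; all values are assumptions)

DATABASE_PATH = "vacancies.db"   # file created on first launch

DEFAULT_PAGE_LIMIT = 5           # assumed: result pages fetched per search
REQUEST_TIMEOUT = 10             # assumed: seconds before a request is abandoned
DELAY_BETWEEN_REQUESTS = 2.0     # assumed: seconds to wait between requests

# Assumed: city name -> hh.ru area ID; verify IDs before relying on them
LOCATION_MAPPING = {
    "Moscow": 1,
    "Saint Petersburg": 2,
}
```

Raising DELAY_BETWEEN_REQUESTS is the first thing to try if hh.ru starts rejecting requests.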
- The program uses User-Agent rotation to avoid blocking
- Delays between requests are implemented to comply with site rules
- Does not collect users' personal data
- Works only with publicly available information
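The first two points above can be sketched as a small pacing helper that enforces a minimum delay between requests and cycles through a list of User-Agent strings. The class name `RequestPacer` and the placeholder User-Agent values are hypothetical; this is not the actual hh_parser.py code, and it performs no network access itself.

```python
import itertools
import time

# Placeholder strings for illustration; real browser User-Agents would go here
USER_AGENTS = ["ExampleBrowser/1.0", "ExampleBrowser/2.0"]


class RequestPacer:
    """Hands out request headers no faster than one set per delay_seconds."""

    def __init__(self, user_agents, delay_seconds,
                 sleep=time.sleep, clock=time.monotonic):
        if not user_agents:
            raise ValueError("at least one User-Agent string is required")
        self._agents = itertools.cycle(user_agents)
        self._delay = delay_seconds
        self._sleep = sleep      # injectable so the pacing logic can be tested
        self._clock = clock
        self._last = None        # time of the previous call, if any

    def next_headers(self):
        """Block until the delay has elapsed, then return headers for one request."""
        if self._last is not None:
            remaining = self._delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
        return {"User-Agent": next(self._agents)}
```

A parser would call `next_headers()` immediately before each HTTP request, passing the delay from DELAY_BETWEEN_REQUESTS, so the pause is applied in one place rather than scattered through the scraping code.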
- Install the dependencies: pip install -r requirements.txt
- Close all program instances and delete the vacancies.db.lock file
- Increase the delay between requests in config.py
- Use VPN if necessary
- Check request limits
- Create a new branch: git checkout -b feature/new-feature
- Implement the functionality
- Add tests
- Create a Pull Request
- Use type hints
- Add docstrings
- Follow PEP 8
- Separate logic into modules
This project is licensed under the MarketScope Usage License - see the LICENSE file for full details.
- Educational Use: Study the code, learn web scraping techniques
- Personal Use: Job searching for personal employment
- Code Analysis: Understand software architecture and Python patterns
- Commercial Use: Prohibited without explicit written permission
- Mass Data Collection: Prohibited to avoid overloading hh.ru servers
- Service Abuse: Must respect hh.ru's Terms of Service and robots.txt
This software scrapes data from hh.ru (HeadHunter), a commercial job search platform. Users must:
- Comply with hh.ru's Terms of Service
- Respect their robots.txt file
- Follow reasonable request rates
- Not interfere with hh.ru's normal operations
Violation of hh.ru's terms may result in account suspension or legal action from hh.ru.
If you encounter problems:
- Check the logs in the vacancy_scraper.log file
- Make sure all dependencies are installed: pip install -r requirements.txt
- Check your internet connection
- Ensure you're complying with hh.ru's Terms of Service
- Refer to the hh.ru robots.txt file for scraping guidelines
- Email: vlskrauch@mail.ru
- Telegram: @worksoto
Version: 1.0.0
Date: 2025
Developer: Skrauch Vladislav Igorevich

