jobs-scraper

A Selenium-based job scraper written in Python. Automate your job search by retrieving job listings from popular job portals and storing them in a DuckDB database for easy management and analysis.

Have a look at the Quarto-based demo at hh-13.github.io/jobs-scraper!

Table of Contents

About The Project
- The What and the Why
- Project Structure
Features
Getting Started
Usage
- The video
- Snapshots
What's next
Keep Coding!

About The Project

The What and the Why

This project aims to speed up the search for relevant job listings on popular job portals by providing a simple interface that automates the task. With built-in functions for storing the scraped data in a database, it lets you keep up with the market by analysing hundreds of listings at once and monitoring the pulse of the industry.

Project Structure

src/
 -> indeed_scraper.py    - scraper object for Indeed
 -> utils.py             - generic classes for configuring portals and browsers
definitions.sql          - sample database definition for storing search results
poetry.lock              - lock file for poetry
pyproject.toml           - project configs
scraper_example.py       - example script for running the scraper, saving the data to the database and using it to view and analyse jobs

Features

Built on object-oriented principles to simplify adding more websites and job portals.
Saves search results to a database.
Lets you break down and analyse the collected data with both SQL and DataFrame-based tools, thanks to DuckDB.
Automatically detects and handles 'Are you a Human?' pages (upcoming!).
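
As a rough illustration of the object-oriented design mentioned above, a new portal could be added by subclassing a shared base class. This is a minimal sketch, not the project's actual code: `JobPortal`, `JobListing`, `ExamplePortal`, their methods, and the example.com URL are all hypothetical, and the real classes in src/utils.py and src/indeed_scraper.py may look quite different.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class JobListing:
    """Hypothetical container for one scraped listing."""
    title: str
    company: str
    location: str
    url: str


class JobPortal(ABC):
    """Hypothetical base class: everything portal-specific lives in subclasses."""

    base_url: str

    @abstractmethod
    def search_url(self, keywords: str, location: str, page: int = 0) -> str:
        """Return the URL of one page of search results."""

    @abstractmethod
    def parse(self, raw_cards: list) -> list:
        """Turn raw result cards (dicts) into JobListing objects."""


class ExamplePortal(JobPortal):
    """A made-up portal, showing what a new subclass has to provide."""

    base_url = "https://jobs.example.com/search"

    def search_url(self, keywords, location, page=0):
        query = urlencode({"q": keywords, "where": location, "page": page})
        return f"{self.base_url}?{query}"

    def parse(self, raw_cards):
        return [
            JobListing(c["title"], c["company"], c["location"], c["url"])
            for c in raw_cards
        ]
```

With a layout like this, supporting another portal would mean writing one more subclass, while storage and analysis code keeps working with `JobListing` objects regardless of where they came from.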

Getting Started

Requirements

Installation and Usage

# Clone this repo and cd into it
foo@bar:~$ git clone https://github.com/hH-13/jobs-scraper.git
foo@bar:~$ cd jobs-scraper

# Set up the virtual environment:
foo@bar:~$ poetry install

# Activate the virtual environment (alternatively, prefix every Python-related command with `poetry run`):
foo@bar:~$ poetry shell

# Run the program
foo@bar:~$ poetry run python scraper_example.py --help
usage: scraper_example.py [-h] -k KEYWORDS -l LOCATION [-r RADIUS] [--sort_by_date] [-n NUM_PAGES] [top_level_domain]

Scrape a job portal for job listings. Currently only supports Indeed.

positional arguments:
  top_level_domain      The regional indeed website to search, e.g. `www.indeed.com` for the USA or `in.indeed.com` for India

options:
  -h, --help            show this help message and exit
  -k KEYWORDS, --keywords KEYWORDS
                        Your search keywords, like "Software Engineer"
  -l LOCATION, --location LOCATION
                        Your search location, like "New York"
  -r RADIUS, --radius RADIUS
                        The search radius, in mi or km, depending on the location. Ignored if location is Remote.
  --sort_by_date        Sort the results by date. Default sorting method is by relevance
  -n NUM_PAGES, --num_pages NUM_PAGES
                        The number of pages of search results to return
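
To make the options above concrete, the sketch below rebuilds an equivalent argparse parser and shows one way the parsed arguments could map to search URLs. The flags mirror the help text; everything else is an assumption rather than the project's code: the defaults for `top_level_domain` and `--num_pages`, the integer type of `--radius`, the `build_search_urls` helper, and the query parameters `q`, `l`, `radius`, `sort` and `start` (with 10 results per page).

```python
import argparse
from urllib.parse import urlencode


def build_parser() -> argparse.ArgumentParser:
    """Recreate the command-line interface shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="scraper_example.py",
        description="Scrape a job portal for job listings. Currently only supports Indeed.",
    )
    # Optional positional argument; the default value is an assumption.
    parser.add_argument("top_level_domain", nargs="?", default="www.indeed.com")
    parser.add_argument("-k", "--keywords", required=True)
    parser.add_argument("-l", "--location", required=True)
    parser.add_argument("-r", "--radius", type=int)  # type is an assumption
    parser.add_argument("--sort_by_date", action="store_true")
    parser.add_argument("-n", "--num_pages", type=int, default=1)  # assumed default
    return parser


def build_search_urls(args: argparse.Namespace) -> list:
    """Hypothetical helper: one search URL per requested results page."""
    params = {"q": args.keywords, "l": args.location}
    # The help text says the radius is ignored for remote searches.
    if args.radius is not None and args.location.lower() != "remote":
        params["radius"] = args.radius
    if args.sort_by_date:
        params["sort"] = "date"
    urls = []
    for page in range(args.num_pages):
        page_params = dict(params, start=page * 10)  # assumed page size of 10
        urls.append(f"https://{args.top_level_domain}/jobs?{urlencode(page_params)}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(build_parser().parse_args()):
        print(url)
```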

Examples

Have a look at the usage.qmd notebook or the demo page at hh-13.github.io/jobs-scraper for usage examples.

What's next?

The possibilities are endless!

Extract specific details from job descriptions using natural-language processing tools, or LLMs with retrieval-augmented generation techniques.
Create an embeddings map of the descriptions to spot patterns.
Stay on top of the latest trends in the job market by monitoring the top keywords in a search session.
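
The keyword-monitoring idea can be prototyped with the standard library alone. The sketch below counts the most frequent words across a handful of job descriptions; the sample descriptions, the stop-word list and the `top_keywords` helper are made up for illustration and are not part of the project.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "with", "in", "of", "for", "to", "we", "are", "is"}


def top_keywords(descriptions, n=3):
    """Return the n most common non-stop-words across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Keep '+' and '#' so that tokens such as "c++" and "c#" survive.
        words = re.findall(r"[a-z][a-z+#]*", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "We are looking for a Python developer with SQL experience",
        "Data engineer: Python, SQL and Spark",
        "Backend developer with Python",
    ]
    print(top_keywords(sample))
```

In practice the descriptions would come from the scraped results stored in the database rather than from a hard-coded list.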
