auto-crawler

auto-crawler is an asynchronous Python service that collects user car reviews from auto.ria.com. It distributes crawl tasks across multiple workers, stores the results in a PostgreSQL database via SQLAlchemy, and supports incremental parsing with retry logic and configurable page selection.
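The worker distribution, page tracking, and retry behaviour described above can be illustrated with a small standard-library sketch. This is not the project's code: the real service uses aiohttp and tenacity, while the version below swaps in a hand-written backoff loop and an injected `fetch` coroutine so it runs without dependencies. All names (`TransientError`, `fetch_with_retry`, `worker`, `crawl`) are hypothetical.

```python
import asyncio


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable fetch failure (timeout, 5xx)."""


async def fetch_with_retry(fetch, page, attempts=3, base_delay=0.01):
    """Await fetch(page), retrying with exponential backoff on TransientError."""
    for attempt in range(1, attempts + 1):
        try:
            return await fetch(page)
        except TransientError:
            if attempt == attempts:
                raise  # give up after the last attempt
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def worker(queue, done, results, fetch):
    """Take pages from the shared queue until it is empty."""
    while True:
        try:
            page = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        if page in done:
            continue  # parsed in an earlier run, skip it
        results[page] = await fetch_with_retry(fetch, page)
        done.add(page)  # record the page so it is not parsed again


async def crawl(pages, done, fetch, n_workers=3):
    """Distribute pages across n_workers and return {page: content}."""
    queue = asyncio.Queue()
    for page in pages:
        queue.put_nowait(page)
    results = {}
    await asyncio.gather(
        *(worker(queue, done, results, fetch) for _ in range(n_workers))
    )
    return results


if __name__ == "__main__":
    async def fake_fetch(page):
        # Assumption: a real fetch would issue an HTTP request here.
        return f"<html>page {page}</html>"

    already_parsed = {2}
    print(asyncio.run(crawl(range(1, 6), already_parsed, fake_fetch)))
```

Passing the set of already-parsed pages in from outside keeps the sketch agnostic about where that state lives; in the actual service it would presumably come from the database.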

Features

Async web crawling with aiohttp and tenacity
HTML parsing via BeautifulSoup
Structured data persistence with SQLAlchemy and Alembic
Modular repository layer: supports DB or file storage
Worker distribution with page tracking to prevent re-parsing
Dockerized setup with dev/test environments
In-progress visual analytics service with seaborn and matplotlib
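One way to read the "modular repository layer" item is as a shared interface with interchangeable storage backends. The sketch below is an assumption about the shape of such a layer, not the project's actual classes: `Review`, `ReviewRepository`, `JsonLinesRepository`, `InMemoryRepository`, and `pages_to_crawl` are hypothetical names, and a JSON Lines file stands in for file storage. A database-backed variant would implement the same two methods on top of SQLAlchemy.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Review:
    """Hypothetical minimal review record."""
    page: int
    title: str
    text: str


class ReviewRepository(Protocol):
    """Interface that every storage backend is assumed to satisfy."""

    def save(self, reviews: Iterable[Review]) -> None: ...

    def parsed_pages(self) -> set[int]: ...


class InMemoryRepository:
    """Keeps reviews in a list; convenient for tests."""

    def __init__(self) -> None:
        self._reviews: list[Review] = []

    def save(self, reviews: Iterable[Review]) -> None:
        self._reviews.extend(reviews)

    def parsed_pages(self) -> set[int]:
        return {r.page for r in self._reviews}


class JsonLinesRepository:
    """Appends one JSON object per review to a text file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def save(self, reviews: Iterable[Review]) -> None:
        with self.path.open("a", encoding="utf-8") as fh:
            for review in reviews:
                fh.write(json.dumps(asdict(review), ensure_ascii=False) + "\n")

    def parsed_pages(self) -> set[int]:
        if not self.path.exists():
            return set()
        with self.path.open(encoding="utf-8") as fh:
            return {json.loads(line)["page"] for line in fh if line.strip()}


def pages_to_crawl(repo: ReviewRepository, requested: Iterable[int]) -> list[int]:
    """Drop pages the repository already holds, preserving request order."""
    done = repo.parsed_pages()
    return [page for page in requested if page not in done]
```

Because the crawler would depend only on the `ReviewRepository` protocol, swapping file storage for a database becomes a configuration choice rather than a code change in the crawling logic.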

Tech Stack

Python 3.12
aiohttp & BeautifulSoup
SQLAlchemy & Alembic
Tenacity (retry logic)
Docker + Docker Compose
Bandit, Flake8, isort, mypy, yamllint (for quality checks)
pandas, seaborn, matplotlib (for plotting, WIP)

Quick Start

1. Clone and build

git clone https://github.com/yakhoruzhenko/auto-crawler.git
cd auto-crawler
make up
