RentHunter

RentHunter is a self-hosted rental-listing monitor for the Trójmiasto (Tricity: Gdańsk, Gdynia, Sopot) area. It crawls apartment offers from several portals, applies hard filters and optional AI scoring, and sends a notification as soon as a crawl finds an interesting offer. A Svelte SPA lets you search semantically, browse with sortable infinite scroll, inspect each offer's change history, and reconfigure everything live. Offers, history, and settings are all stored in Postgres.

Everything runs in-process on your own host: there is no cloud queue and no external scheduler, and Postgres and Apprise are never exposed to the public network.

How it works

The pipeline is a single in-process flow, lock-guarded so only one crawl runs at a time (src/pipeline/):

Scrape — fetch list pages from the configured search URLs across every enabled source (src/scraper/sources/: trojmiasto, olx, otodom, nieruchomosci-online). Each source is a small adapter behind a shared registry. All fetching goes through a single fetchPage() (src/scraper/fetch.ts). Set BROWSERLESS_URL to route every request through a self-hosted browserless instance, which renders the page and returns the HTML from its own egress IP; this gets around IP blocks on a flagged server and handles JS-heavy portals. Leave it empty for a plain direct fetch.
Filter — drop offers failing the hard criteria (price / area / rooms range) before any expensive work (pipeline/filter.ts).
Enrich — for new offers, fetch the detail page (bounded concurrency via pipeline/pool.ts) and extract structured data (pipeline/enrich.ts): canonical district (keywords/gazetteer.ts), kind, and features (keywords/features.ts), plus an embedding of the listing text for semantic search (embeddings/).
Score — optionally rate each offer 0–100 against your free-text criteria with DeepSeek (scorer/deepseek.ts), storing the score and its reasoning.
Notify — when an offer's score clears the threshold, send it through Apprise (notify/apprise.ts) to whatever targets you configured.
Track changes — every crawl snapshots the tracked fields, so price, description, and parameter changes are recorded and viewable as per-offer history (db/snapshot.ts, offer_snapshots).
Reconcile — offers no longer present on the source are marked inactive.
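The Filter and Enrich steps rest on two small ideas: reject offers with cheap numeric checks first, then run the expensive detail-page fetches with a cap on how many are in flight. Below is a minimal TypeScript sketch, assuming a hypothetical `Offer` shape, `Criteria` ranges, and a `mapPool` helper; the real pipeline/filter.ts and pipeline/pool.ts may be structured differently.

```typescript
// Hypothetical offer and criteria shapes, for illustration only.
interface Offer {
  id: string;
  price: number | null; // monthly rent
  area: number | null;  // m²
  rooms: number | null;
}

interface Range {
  min?: number;
  max?: number;
}

interface Criteria {
  price: Range;
  area: Range;
  rooms: Range;
}

// A missing value passes (assumption: unknown data is not grounds for rejection);
// a known value must fall inside the inclusive [min, max] range when bounds are set.
function inRange(value: number | null, range: Range): boolean {
  if (value === null) return true;
  if (range.min !== undefined && value < range.min) return false;
  if (range.max !== undefined && value > range.max) return false;
  return true;
}

// Hard filter: cheap checks that run before any network or AI work.
function passesHardFilter(offer: Offer, c: Criteria): boolean {
  return inRange(offer.price, c.price) && inRange(offer.area, c.area) && inRange(offer.rooms, c.rooms);
}

// Bounded-concurrency map: at most `limit` calls to `fn` are in flight at once,
// and results keep the input order.
async function mapPool<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the queue is drained.
  // `next++` runs synchronously, so two workers never claim the same index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}
```

Pulling indices from a shared counter, rather than splitting the input into fixed batches, keeps every worker busy even when a few detail pages are slow to load.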

The crawl is triggered two ways: the in-process scheduler (pipeline/scheduler.ts) re-runs it every config.pollIntervalMin minutes (0 = off), and the "Uruchom crawler" ("Run crawler") button in the UI fires POST /api/run on demand. Live progress streams to the browser over a WebSocket (pipeline/progress.ts).
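Both triggers can share one guarded entry point so that a timer tick and a button press never start overlapping crawls. Below is a minimal sketch, assuming hypothetical `RunLock` and `startScheduler` names; in this sketch a trigger that arrives mid-crawl is skipped rather than queued, which may not match what pipeline/scheduler.ts actually does.

```typescript
// A one-slot, in-process lock: only one crawl may run at a time.
class RunLock {
  private running = false;

  get busy(): boolean {
    return this.running;
  }

  // Runs `job` and resolves to true if the lock was free; resolves to false
  // immediately, without running anything, if another crawl holds the lock.
  // The check-and-set happens before any await, so it cannot race.
  async tryRun(job: () => Promise<void>): Promise<boolean> {
    if (this.running) return false;
    this.running = true;
    try {
      await job();
      return true;
    } finally {
      this.running = false; // released even if the crawl throws
    }
  }
}

// Re-runs the crawl every `pollIntervalMin` minutes; 0 (or less) disables the timer.
// Returns a function that stops the scheduler.
function startScheduler(lock: RunLock, pollIntervalMin: number, crawl: () => Promise<void>): () => void {
  if (pollIntervalMin <= 0) return () => {};
  const timer = setInterval(() => {
    // A tick that lands mid-crawl is skipped; crawl errors are logged, not rethrown.
    lock.tryRun(crawl).catch((err) => console.error("scheduled crawl failed", err));
  }, pollIntervalMin * 60_000);
  return () => clearInterval(timer);
}
```

An on-demand handler for POST /api/run would call the same `lock.tryRun(crawl)` and could report "already running" when it resolves to false.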

Search & browsing

Facets (district / kind / features / source) are hard filters.
Keyword search turns your text into an embedding and ranks offers by cosine similarity (embeddings/cosine.ts). "Trafność" (relevance) returns the most relevant offers in similarity order; picking "Cena" (price), "Najnowsze" (newest), or "Powierzchnia" (area) re-sorts that same relevant subset by price, date, or area instead.
The list is server-paginated and DOM-virtualized (@tanstack/virtual-core), so it stays fast with thousands of offers — infinite scroll fetches the next page and only the visible rows are rendered, in both the card and table views.
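Cosine similarity is cos(a, b) = (a · b) / (|a| |b|), and the sort options only change the order of the subset it selects. Below is a minimal sketch of both orderings, assuming a hypothetical `topK` cut-off for the "relevant subset" and a reduced `Ranked` offer shape; how the real API decides the subset size is not documented here.

```typescript
// cos(a, b) = (a · b) / (|a| |b|); returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical, reduced offer shape: just what ranking needs.
interface Ranked {
  id: string;
  price: number;
  embedding: number[];
}

type Sort = "relevance" | "price";

// 1. Score every offer against the query and keep the top `topK` as the relevant subset.
// 2. "relevance" keeps similarity order; "price" re-sorts that same subset, cheapest first.
function search(query: number[], offers: Ranked[], topK: number, sort: Sort): string[] {
  const relevant = offers
    .map((o) => ({ o, sim: cosine(query, o.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, topK);
  if (sort === "price") relevant.sort((x, y) => x.o.price - y.o.price);
  return relevant.map((r) => r.o.id);
}
```

Sorting by price after the cut means a cheap but irrelevant offer never surfaces just because it is cheap.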

Tech stack

Layer	Choice
Runtime / bundler / test runner	Bun (server, `bun build`, `bun test` — no Node, webpack, vite, or jest)
Web UI	Svelte 5 (runes) SPA, Tailwind v4, `@tanstack/virtual-core` for virtualization; built to static assets and served by `Bun.serve()` with a WebSocket for live crawl progress
Database	Postgres via Drizzle ORM + `postgres-js`; tests run against in-memory PGlite
AI scoring	DeepSeek chat completions
Semantic search	Text embeddings via any OpenAI-compatible endpoint (e.g. self-hosted Ollama, or OpenAI) + in-process cosine ranking
Notifications	Apprise
Packaging	Docker Compose (dev + prod), in-process lock-guarded crawl scheduler

Run it with Docker

The stack is the same on a laptop and on a dedicated server — only the compose file differs. Both keep Postgres and Apprise off the public network.

Development (hot reload)

Full stack with the app hot-reloading from bind-mounted source:

bun run compose:dev          # docker compose -f docker-compose.dev.yml up

App: http://localhost:3000 · Postgres: 127.0.0.1:5432 · Apprise: 127.0.0.1:8000 (bound to loopback only, so bun test running on the host can reach them).
Trigger a crawl with the "Uruchom crawler" button in the UI.
Frontend (web/) edits need a rebuild; the server hot-reloads on its own.
Stop: bun run compose:dev:down.

Production (dedicated server)

docker-compose.yml is the production stack: db + apprise + app, self-contained. The scheduled crawl runs in-process inside the app (src/pipeline/scheduler.ts, driven by the DB pollIntervalMin setting) — there is no separate scheduler service. Postgres and Apprise are internal-only.

cp .env.production.example .env.production   # then fill in POSTGRES_PASSWORD etc.
bun run compose:prod                         # up -d --build, reads .env.production

Embeddings (semantic search) are opt-in. By default no Ollama container runs and the in-panel Embeddings toggle is off, so a plain deploy needs zero embedding setup. To enable semantic search you need both layers:

Run the provider: start the ollama and ollama-pull containers via the embeddings compose profile, by setting COMPOSE_PROFILES=embeddings in your env (or passing --profile embeddings to the up command).
Turn on Embeddings in the UI Konfiguracja (Configuration) panel.

Without the profile, the app still boots fine — it just skips embedding (it never blocks on or requires Ollama). Point EMBED_BASE_URL/EMBED_API_KEY/EMBED_MODEL at a paid OpenAI-compatible provider instead of Ollama if you prefer.
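Ollama and OpenAI both serve the OpenAI-compatible `POST {base}/embeddings` route, so one small client can cover either provider. Below is a minimal sketch, assuming the `EMBED_*` values are passed in as an `EmbedConfig` object and that any failure should degrade to "no embedding" instead of failing the crawl; `embedTexts` is a hypothetical name, not necessarily what embeddings/ exports.

```typescript
interface EmbedConfig {
  baseUrl: string; // EMBED_BASE_URL, e.g. https://api.openai.com/v1
  apiKey?: string; // EMBED_API_KEY; a local Ollama typically needs none
  model: string;   // EMBED_MODEL, e.g. text-embedding-3-small
}

// The fields of the OpenAI-compatible embeddings response used here.
interface EmbeddingsResponse {
  data: { index: number; embedding: number[] }[];
}

// Returns one vector per input text, or null when the provider is unreachable
// or answers with an error status, so callers can simply skip embedding.
async function embedTexts(
  texts: string[],
  cfg: EmbedConfig,
  fetchImpl: typeof fetch = fetch,
): Promise<number[][] | null> {
  if (texts.length === 0) return [];
  try {
    const res = await fetchImpl(`${cfg.baseUrl.replace(/\/+$/, "")}/embeddings`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        ...(cfg.apiKey ? { Authorization: `Bearer ${cfg.apiKey}` } : {}),
      },
      body: JSON.stringify({ model: cfg.model, input: texts }),
    });
    if (!res.ok) return null;
    const json = (await res.json()) as EmbeddingsResponse;
    // Order by `index` so vectors line up with the input texts.
    return [...json.data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
  } catch {
    return null; // network error: embedding is optional, never fatal
  }
}
```

Injecting `fetchImpl` keeps the client testable without a running provider, which suits a test suite that never touches real services.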

The app service listens on 3000 (no host port is published) — put a reverse proxy (Caddy/Traefik/nginx) in front of it for TLS, or publish a port yourself.
The app auto-applies migrations (drizzle-kit migrate) on start.
Logs: bun run compose:prod:logs · Stop: bun run compose:prod:down.

Production (Coolify)

docker-compose.yml is Coolify-ready (Docker Compose build pack):

Create a resource from this Git repo and pick Docker Compose (it uses docker-compose.yml by default).
In the app service, assign a domain — Coolify's reverse proxy terminates TLS and routes to the exposed port 3000; no host port is published, so the internal services (db, apprise) stay off the public network.
Set the environment variables in the app's Environment Variables tab — POSTGRES_PASSWORD is required; everything else (DeepSeek, browserless, embeddings) is optional. See .env.production.example for the full list. To enable semantic search, add COMPOSE_PROFILES=embeddings (starts the internal Ollama provider) and turn on Embeddings in the panel.
Deploy. Migrations run automatically on start; pgdata and ollama are persistent named volumes (the ollama volume is only populated when the embeddings profile runs).

Run on the host (without Docker)

bun install
cp .env.example .env          # point DATABASE_URL at a reachable Postgres
bun run db:push               # apply schema
bun run dev                   # build SPA + hot-reload API on PORT (default 3000)
bun test                      # run the test suite (PGlite, never touches your DB)

Configuration

Two kinds of configuration:

Live settings — search URLs, hard filters, AI criteria, score threshold, Apprise targets, poll interval, concurrency, list pages, request delay, and the extraction/embedding toggles — are edited in the UI Konfiguracja panel and stored in Postgres.

Environment — connection strings and secrets only (src/config.ts):

Var	Purpose	Default
`DATABASE_URL`	Postgres connection (required)	—
`PORT`	App HTTP port	`3000`
`DEEPSEEK_API_KEY` / `DEEPSEEK_BASE_URL`	AI scoring	`https://api.deepseek.com`
`EMBED_BASE_URL` / `EMBED_API_KEY` / `EMBED_MODEL`	Embeddings for semantic search	`https://api.openai.com/v1`, `text-embedding-3-small`
`APPRISE_URL`	Apprise API endpoint	`http://localhost:8000`
`BROWSERLESS_URL`	Self-hosted browserless base URL; when set, all scraping is routed through its `/content` endpoint (bypasses IP blocks). Empty = direct fetch	—
`BROWSERLESS_TOKEN`	`?token=` for browserless, if your instance requires auth	—
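The BROWSERLESS_URL switch amounts to choosing between two request shapes. Below is a minimal sketch of a `fetchPage()`-style helper, assuming browserless's `/content` API (POST a JSON body with the target `url`, receive the rendered HTML, pass `token` as a query parameter) and assuming BROWSERLESS_URL is a bare origin such as `http://browserless:3000`; `buildRequest` and `FetchEnv` are hypothetical, and the real src/scraper/fetch.ts may add headers, timeouts, or retries.

```typescript
interface FetchEnv {
  browserlessUrl?: string;   // BROWSERLESS_URL; empty or unset means direct fetch
  browserlessToken?: string; // BROWSERLESS_TOKEN, sent as ?token=
}

// Builds the request for a target page: either a plain GET of the page itself,
// or a POST to browserless /content, which renders the page and returns its HTML.
function buildRequest(target: string, env: FetchEnv): { url: string; init: RequestInit } {
  if (!env.browserlessUrl) {
    return { url: target, init: { method: "GET" } };
  }
  // "/content" is resolved against the origin, so any path in BROWSERLESS_URL is dropped.
  const endpoint = new URL("/content", env.browserlessUrl);
  if (env.browserlessToken) endpoint.searchParams.set("token", env.browserlessToken);
  return {
    url: endpoint.toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: target }),
    },
  };
}

// Fetches a page's HTML through whichever route the environment selects.
async function fetchPage(target: string, env: FetchEnv, fetchImpl: typeof fetch = fetch): Promise<string> {
  const { url, init } = buildRequest(target, env);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`fetch ${target} failed: HTTP ${res.status}`);
  return res.text();
}
```

Because every source adapter calls the same helper, switching a blocked server over to browserless is a one-variable change.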

See .env.example (host) and .env.production.example (prod compose).
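Since the environment carries only connection strings and secrets, loading it can be a single pure function over the env map. Below is a minimal sketch applying the defaults listed in the table above, assuming a hypothetical `loadConfig` name and `AppConfig` shape; src/config.ts may be organized differently.

```typescript
interface AppConfig {
  databaseUrl: string;
  port: number;
  deepseek: { apiKey?: string; baseUrl: string };
  embed: { baseUrl: string; apiKey?: string; model: string };
  appriseUrl: string;
  browserless: { url?: string; token?: string };
}

type Env = Record<string, string | undefined>;

// Treat empty or whitespace-only values like unset ones, so `FOO=` in a .env file falls back.
const get = (env: Env, key: string): string | undefined => {
  const v = env[key]?.trim();
  return v ? v : undefined;
};

function loadConfig(env: Env): AppConfig {
  const databaseUrl = get(env, "DATABASE_URL");
  if (!databaseUrl) throw new Error("DATABASE_URL is required");
  const port = Number(get(env, "PORT") ?? "3000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) throw new Error(`invalid PORT: ${env.PORT}`);
  return {
    databaseUrl,
    port,
    deepseek: {
      apiKey: get(env, "DEEPSEEK_API_KEY"),
      baseUrl: get(env, "DEEPSEEK_BASE_URL") ?? "https://api.deepseek.com",
    },
    embed: {
      baseUrl: get(env, "EMBED_BASE_URL") ?? "https://api.openai.com/v1",
      apiKey: get(env, "EMBED_API_KEY"),
      model: get(env, "EMBED_MODEL") ?? "text-embedding-3-small",
    },
    appriseUrl: get(env, "APPRISE_URL") ?? "http://localhost:8000",
    browserless: { url: get(env, "BROWSERLESS_URL"), token: get(env, "BROWSERLESS_TOKEN") },
  };
}
```

On the server this would be called once at startup, e.g. with `process.env`, so a missing DATABASE_URL fails fast instead of surfacing mid-crawl.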

Project layout

src/
  api/         Bun.serve HTTP + WebSocket server, routes
  scraper/     source adapters (trojmiasto, olx, otodom, nieruchomosci-online) +
               HTML parsing; fetch.ts routes via direct fetch or self-hosted browserless
  pipeline/    crawl orchestration: filter, enrich, score, notify, scheduler, run-lock
  scorer/      DeepSeek AI scoring
  embeddings/  embedding client + cosine ranking
  keywords/    district gazetteer + feature extraction
  notify/      Apprise integration
  db/          Drizzle schema, queries, change-snapshot tracking
  log/         DB-backed logger
web/           Svelte 5 SPA (cards/table, search, config, logs, history)
test/          bun test suite (runs on PGlite)

Notes

Tests run on in-memory PGlite, never on your real database; src/db/client.ts refuses to start under NODE_ENV=test unless PGlite is in use.
Run make db-backup before any risky DB work. make up-fresh destroys the DB volume; only make up is safe to rerun.
