RentHunter is a self-hosted rental-listing monitor for the Trójmiasto area. It crawls apartment offers from multiple portals, applies hard filters and optional AI scoring, and pushes a notification the moment an interesting offer appears. A Svelte SPA lets you search semantically, browse with sortable infinite scroll, inspect each offer's change history, and reconfigure everything live — all stored in Postgres.
Everything runs in-process on your own host: no cloud queue, no external scheduler, and Postgres/Apprise are never exposed to the public network.
The pipeline is a single in-process flow, lock-guarded so only one crawl runs at a time (`src/pipeline/`):

- Scrape: fetch list pages from the configured search URLs across every enabled source (`src/scraper/sources/`: `trojmiasto`, `olx`, `otodom`, `nieruchomosci-online`). Each source is a small adapter behind a shared registry. Fetching goes through a single `fetchPage()` (`src/scraper/fetch.ts`); set `BROWSERLESS_URL` to route every request through a self-hosted browserless instance. It renders the page and returns HTML from its own egress IP, which bypasses IP blocks on a flagged server and handles JS-heavy portals. Empty = plain direct fetch.
- Filter: drop offers failing the hard criteria (price / area / rooms range) before any expensive work (`pipeline/filter.ts`).
- Enrich: for new offers, fetch the detail page (bounded concurrency via `pipeline/pool.ts`) and extract structured data (`pipeline/enrich.ts`): canonical district (`keywords/gazetteer.ts`), kind, and features (`keywords/features.ts`), plus an embedding of the listing text for semantic search (`embeddings/`).
- Score: optionally rate each offer 0–100 against your free-text criteria with DeepSeek (`scorer/deepseek.ts`), storing the score and its reasoning.
- Notify: when an offer's score clears the threshold, send it through Apprise (`notify/apprise.ts`) to whatever targets you configured.
- Track changes: every crawl snapshots tracked fields, so price/description/parameter changes are recorded and viewable as per-offer history (`db/snapshot.ts`, `offer_snapshots`).
- Reconcile: offers no longer present on the source are marked inactive.
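
The Filter step above can be sketched as a pure range check. This is a minimal illustration, not the code in `pipeline/filter.ts`: the `Offer`, `Range`, and `HardFilters` shapes and the function names are hypothetical, and treating an unknown value as a failure is an assumption.

```typescript
// Hypothetical shapes; the real offer and filter types may differ.
interface Offer {
  url: string;
  price: number | null;
  areaM2: number | null;
  rooms: number | null;
}

interface Range {
  min?: number;
  max?: number;
}

interface HardFilters {
  price?: Range;
  area?: Range;
  rooms?: Range;
}

// True when `value` lies inside `range`; a missing range means "no criterion".
function inRange(value: number | null, range?: Range): boolean {
  if (!range) return true;
  if (value === null) return false; // assumption: unknown values fail a configured criterion
  if (range.min !== undefined && value < range.min) return false;
  if (range.max !== undefined && value > range.max) return false;
  return true;
}

// An offer survives only if it satisfies every configured range.
function passesHardFilters(offer: Offer, filters: HardFilters): boolean {
  return (
    inRange(offer.price, filters.price) &&
    inRange(offer.areaM2, filters.area) &&
    inRange(offer.rooms, filters.rooms)
  );
}
```

Running a check like this before enrichment means detail pages, embeddings, and AI scoring are only paid for on offers that can still match.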
The crawl is triggered two ways: the in-process scheduler (pipeline/scheduler.ts)
re-runs it every config.pollIntervalMin minutes (0 = off), and the
"Uruchom crawler" button in the UI fires POST /api/run on demand. Live progress
streams to the browser over a WebSocket (pipeline/progress.ts).
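
One way to picture the lock-guarded trigger is a module-level flag shared by the scheduler and the on-demand route. The sketch below is an assumption about the general shape, not the contents of `pipeline/scheduler.ts`; `runExclusive` and `startScheduler` are hypothetical names.

```typescript
// Hypothetical run-lock: one flag guards the single in-flight crawl.
let running = false;

// Runs `crawl` unless another crawl is already in flight; returns null when skipped.
async function runExclusive<T>(crawl: () => Promise<T>): Promise<T | null> {
  if (running) return null;
  running = true;
  try {
    return await crawl();
  } finally {
    running = false; // release the lock even if the crawl throws
  }
}

// Re-runs the crawl every `pollIntervalMin` minutes; 0 (or less) disables scheduling.
// Returns a stop function, or null when the scheduler is off.
function startScheduler(
  pollIntervalMin: number,
  crawl: () => Promise<void>,
): (() => void) | null {
  if (pollIntervalMin <= 0) return null;
  const timer = setInterval(() => {
    void runExclusive(crawl);
  }, pollIntervalMin * 60_000);
  return () => clearInterval(timer);
}
```

Because both triggers would go through the same guard, a button press during a scheduled crawl is skipped rather than started in parallel.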
- Facets (district / kind / features / source) are hard filters.
- Keyword search turns your text into an embedding and ranks offers by cosine similarity (`embeddings/cosine.ts`). "Trafność" (relevance) returns the most relevant offers in similarity order; picking Cena / Najnowsze / Powierzchnia orders that relevant subset by price / date / area instead.
- The list is server-paginated and DOM-virtualized (`@tanstack/virtual-core`), so it stays fast with thousands of offers: infinite scroll fetches the next page and only the visible rows are rendered, in both the card and table views.
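
The two ordering modes can be illustrated with a small in-process ranking sketch. It is not the code in `embeddings/cosine.ts`; the `EmbeddedOffer` shape, the function names, and the `limit` parameter are hypothetical.

```typescript
// Hypothetical shape: an offer with a precomputed embedding.
interface EmbeddedOffer {
  id: number;
  price: number;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 when either has zero norm.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Relevance mode: the `limit` most similar offers, most similar first.
function topRelevant(
  query: number[],
  offers: EmbeddedOffer[],
  limit: number,
): EmbeddedOffer[] {
  return offers
    .map((offer) => ({ offer, score: cosine(query, offer.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.offer);
}

// Price mode: keep the same relevant subset, but order it by price.
function relevantByPrice(
  query: number[],
  offers: EmbeddedOffer[],
  limit: number,
): EmbeddedOffer[] {
  return [...topRelevant(query, offers, limit)].sort((a, b) => a.price - b.price);
}
```

The point of the second function is that a non-relevance sort only reorders the already relevant subset; it does not pull in offers that were not similar to the query.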
| Layer | Choice |
|---|---|
| Runtime / bundler / test runner | Bun (server, bun build, bun test — no Node, webpack, vite, or jest) |
| Web UI | Svelte 5 (runes) SPA, Tailwind v4, @tanstack/virtual-core for virtualization; built to static assets and served by Bun.serve() with a WebSocket for live crawl progress |
| Database | Postgres via Drizzle ORM + postgres-js; tests run against in-memory PGlite |
| AI scoring | DeepSeek chat completions |
| Semantic search | Text embeddings via any OpenAI-compatible endpoint (e.g. self-hosted Ollama, or OpenAI) + in-process cosine ranking |
| Notifications | Apprise |
| Packaging | Docker Compose (dev + prod), in-process lock-guarded crawl scheduler |
The stack is the same on a laptop and on a dedicated server — only the compose file differs. Both keep Postgres and Apprise off the public network.
Full stack with the app hot-reloading from bind-mounted source:

    bun run compose:dev   # docker compose -f docker-compose.dev.yml up

- App: http://localhost:3000 · Postgres: `127.0.0.1:5432` · Apprise: `127.0.0.1:8000` (loopback-only, so host `bun test` can reach them).
- Trigger a crawl with the "Uruchom crawler" button in the UI.
- Frontend (`web/`) edits need a rebuild; the server hot-reloads on its own.
- Stop: `bun run compose:dev:down`.
docker-compose.yml is the production stack: db + apprise + app,
self-contained. The scheduled crawl runs in-process inside the app
(src/pipeline/scheduler.ts, driven by the DB pollIntervalMin setting) — there is
no separate scheduler service. Postgres and Apprise are internal-only.

    cp .env.production.example .env.production   # then fill in POSTGRES_PASSWORD etc.
    bun run compose:prod                         # up -d --build, reads .env.production

Embeddings (semantic search) are opt-in. By default no Ollama container runs and the in-panel Embeddings toggle is off, so a plain deploy needs zero embedding setup. To enable semantic search you need both layers:

- Run the provider: start the `ollama` + `ollama-pull` containers via the `embeddings` compose profile: `COMPOSE_PROFILES=embeddings` in your env (or `--profile embeddings` on the `up` command).
- Turn on Embeddings in the UI Konfiguracja panel.

Without the profile, the app still boots fine — it just skips embedding (it never
blocks on or requires Ollama). Point EMBED_BASE_URL/EMBED_API_KEY/EMBED_MODEL at
a paid OpenAI-compatible provider instead of Ollama if you prefer.
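
As a rough illustration of what "OpenAI-compatible" means here, the sketch below builds a request for the standard `/embeddings` route (JSON body with `model` and `input`, optional bearer token) and skips the call entirely when embeddings are disabled. The `EmbedConfig` shape and the function names are hypothetical, not the project's embedding client.

```typescript
// Hypothetical config shape; the real fields in src/config.ts may differ.
interface EmbedConfig {
  enabled: boolean; // the in-panel Embeddings toggle
  baseUrl: string; // EMBED_BASE_URL, e.g. "https://api.openai.com/v1"
  apiKey?: string; // EMBED_API_KEY, optional
  model: string; // EMBED_MODEL
}

interface EmbedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the HTTP request, or returns null when embeddings are switched off.
function buildEmbedRequest(cfg: EmbedConfig, text: string): EmbedRequest | null {
  if (!cfg.enabled) return null; // opt-in: skip instead of blocking
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  return {
    url: `${cfg.baseUrl.replace(/\/+$/, "")}/embeddings`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model: cfg.model, input: text }),
    },
  };
}

// Sends the request and returns the first embedding vector, or null when skipped.
async function embed(cfg: EmbedConfig, text: string): Promise<number[] | null> {
  const req = buildEmbedRequest(cfg, text);
  if (!req) return null;
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error(`embedding request failed: ${res.status}`);
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0]?.embedding ?? null;
}
```

Switching providers then only changes the three `EMBED_*` values; the request shape stays the same.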
- The `app` service listens on `3000` (no host port is published): put a reverse proxy (Caddy/Traefik/nginx) in front of it for TLS, or publish a port yourself.
- The app auto-applies migrations (`drizzle-kit migrate`) on start.
- Logs: `bun run compose:prod:logs` · Stop: `bun run compose:prod:down`.

docker-compose.yml is Coolify-ready (Docker Compose build pack):
- Create a resource from this Git repo and pick Docker Compose (it uses `docker-compose.yml` by default).
- In the `app` service, assign a domain. Coolify's reverse proxy terminates TLS and routes to the exposed port `3000`; no host port is published, so the internal services (`db`, `apprise`) stay off the public network.
- Set the environment variables in the app's Environment Variables tab. `POSTGRES_PASSWORD` is required; everything else (DeepSeek, browserless, embeddings) is optional. See `.env.production.example` for the full list. To enable semantic search, add `COMPOSE_PROFILES=embeddings` (starts the internal Ollama provider) and turn on Embeddings in the panel.
- Deploy. Migrations run automatically on start; `pgdata` and `ollama` are persistent named volumes (the `ollama` volume is only populated when the embeddings profile runs).

    bun install
    cp .env.example .env   # point DATABASE_URL at a reachable Postgres
    bun run db:push        # apply schema
    bun run dev            # build SPA + hot-reload API on PORT (default 3000)
    bun test               # run the test suite (PGlite, never touches your DB)

Two kinds of configuration:

- Live settings: search URLs, hard filters, AI criteria, score threshold, Apprise targets, poll interval, concurrency, list pages, request delay, and the extraction/embedding toggles are edited in the UI Konfiguracja panel and stored in Postgres.
- Environment: connection strings and secrets only (`src/config.ts`):

  | Var | Purpose | Default |
  |---|---|---|
  | `DATABASE_URL` | Postgres connection (required) | (none) |
  | `PORT` | App HTTP port | `3000` |
  | `DEEPSEEK_API_KEY` / `DEEPSEEK_BASE_URL` | AI scoring | `https://api.deepseek.com` |
  | `EMBED_BASE_URL` / `EMBED_API_KEY` / `EMBED_MODEL` | Embeddings for semantic search | `https://api.openai.com/v1`, `text-embedding-3-small` |
  | `APPRISE_URL` | Apprise API endpoint | `http://localhost:8000` |
  | `BROWSERLESS_URL` | Self-hosted browserless base URL; when set, all scraping is routed through its `/content` endpoint (bypasses IP blocks). Empty = direct fetch | (none) |
  | `BROWSERLESS_TOKEN` | `?token=` for browserless, if your instance requires auth | (none) |

  See `.env.example` (host) and `.env.production.example` (prod compose).

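
To make the defaults in the table concrete, here is a minimal sketch of an environment loader. It is not the contents of `src/config.ts`: the `AppEnv` shape and the `loadEnv` name are hypothetical, and only the variable names and default values come from the table.

```typescript
// Hypothetical result shape; field names are illustrative only.
interface AppEnv {
  databaseUrl: string;
  port: number;
  deepseekApiKey: string | undefined;
  deepseekBaseUrl: string;
  embedBaseUrl: string;
  embedApiKey: string | undefined;
  embedModel: string;
  appriseUrl: string;
  browserlessUrl: string | undefined; // undefined = plain direct fetch
  browserlessToken: string | undefined;
}

// Reads connection strings and secrets, applying the documented defaults.
// Empty strings are treated the same as unset variables.
function loadEnv(env: Record<string, string | undefined>): AppEnv {
  const databaseUrl = env["DATABASE_URL"];
  if (!databaseUrl) throw new Error("DATABASE_URL is required");
  return {
    databaseUrl,
    port: Number(env["PORT"] || "3000"),
    deepseekApiKey: env["DEEPSEEK_API_KEY"] || undefined,
    deepseekBaseUrl: env["DEEPSEEK_BASE_URL"] || "https://api.deepseek.com",
    embedBaseUrl: env["EMBED_BASE_URL"] || "https://api.openai.com/v1",
    embedApiKey: env["EMBED_API_KEY"] || undefined,
    embedModel: env["EMBED_MODEL"] || "text-embedding-3-small",
    appriseUrl: env["APPRISE_URL"] || "http://localhost:8000",
    browserlessUrl: env["BROWSERLESS_URL"] || undefined,
    browserlessToken: env["BROWSERLESS_TOKEN"] || undefined,
  };
}
```

Under Bun this could be called as `loadEnv(process.env)`. Taking the environment as a parameter keeps the function easy to exercise in tests without touching real variables.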
    src/
      api/         Bun.serve HTTP + WebSocket server, routes
      scraper/     source adapters (trojmiasto, olx, otodom, nieruchomosci-online) +
                   HTML parsing; fetch.ts routes via direct fetch or self-hosted browserless
      pipeline/    crawl orchestration: filter, enrich, score, notify, scheduler, run-lock
      scorer/      DeepSeek AI scoring
      embeddings/  embedding client + cosine ranking
      keywords/    district gazetteer + feature extraction
      notify/      Apprise integration
      db/          Drizzle schema, queries, change-snapshot tracking
      log/         DB-backed logger
    web/           Svelte 5 SPA (cards/table, search, config, logs, history)
    test/          bun test suite (runs on PGlite)

- Tests run on in-memory PGlite, never your real database; `src/db/client.ts` refuses to start under `NODE_ENV=test` without it.
- `make db-backup` before any risky DB work.
- `make up-fresh` destroys the DB volume; only `make up` is safe to rerun.