JobCrawler is a full-stack application that combines a modern Next.js frontend with a Python-based crawling backend. It crawls real job listings, stores them in Firebase Firestore, and allows users to browse and save jobs individually.
The system is designed to run locally, in production, and via CI/CD, with a clear separation between frontend, API, and crawler logic.
- 🔍 Crawl real job listings using Scrapy
- ⚡ FastAPI backend to control and monitor crawls
- 📦 Firebase Firestore for persistent storage
- 🔐 Firebase Authentication (per-user saved jobs)
- 📄 Live crawler logs (streamed to frontend)
- 🎨 Modern UI with HeroUI + Tailwind
- 🚀 Production-ready deployment (systemd / GitHub Actions)
```
job-tracker/
├── app/                       # Next.js frontend
├── crawler/                   # Python backend + Scrapy project
│   ├── jobsCrawler/           # Scrapy spiders, pipelines, settings
│   ├── main.py                # FastAPI entrypoint
│   ├── runner.py              # Scrapy subprocess runner
│   ├── logs/                  # Runtime logs (gitignored)
│   └── .venv/                 # Python virtual environment (local)
├── lib/
│   └── serviceAccountKey.json # Local only, never committed
├── .env.local                 # Local environment variables
└── README.md
```
The application uses environment variables to configure both the frontend (Next.js) and the backend/crawler (FastAPI + Scrapy).
Sensitive files (Firebase keys, secrets) are never committed to Git.
The following variables are used by the Next.js application and live in `.env.local`:
```env
NEXT_PUBLIC_JOB_API=http://localhost:8000
NEXT_PUBLIC_COMPANY_API=http://localhost:8000
NEXT_PUBLIC_CLOSESPIDER_ITEMCOUNT=5
```

- NEXT_PUBLIC_JOB_API: Base URL of the FastAPI backend.
- NEXT_PUBLIC_COMPANY_API: API endpoint for company-related requests.
- NEXT_PUBLIC_CLOSESPIDER_ITEMCOUNT: Maximum number of items the crawler is allowed to scrape per run.
The backend is a Python-based service built with FastAPI and Scrapy. It is responsible for starting and monitoring crawler runs and persisting job data into Firebase Firestore.
The crawler is executed as a subprocess and its logs are streamed live to the frontend.
The crawler requires a Firebase service account key. The path to this file is provided via an environment variable.
```env
FIREBASE_SERVICE_ACCOUNT=/absolute/path/to/serviceAccountKey.json
```

For local development, the Firebase service account key is usually stored inside the project directory but outside of the crawler folder.

Example path:

```env
FIREBASE_SERVICE_ACCOUNT=/Users/yourname/Projects/job-tracker/lib/serviceAccountKey.json
```

Make sure that:
- The file exists at the given path
- The path is absolute
- The file is readable by your local user
- The file is listed in .gitignore and never committed
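The checklist above can be expressed as a small startup check. This is a sketch, not part of the project: `load_service_account_path` is a hypothetical helper name.

```python
import os

def load_service_account_path() -> str:
    """Validate the FIREBASE_SERVICE_ACCOUNT env var before use
    (hypothetical helper, mirroring the checklist above)."""
    path = os.environ.get("FIREBASE_SERVICE_ACCOUNT", "")
    if not path:
        raise RuntimeError("FIREBASE_SERVICE_ACCOUNT is not set")
    if not os.path.isabs(path):
        raise RuntimeError(f"Path must be absolute: {path}")
    if not os.path.isfile(path):
        raise RuntimeError(f"Key file not found or not readable: {path}")
    return path
```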
Create and activate a Python virtual environment for the crawler:
```bash
cd crawler
python3 -m venv .venv
source .venv/bin/activate
```

Install the Python dependencies:

```bash
pip install -r requirements.txt
```

Start the FastAPI backend with auto-reload enabled:

```bash
uvicorn main:app --reload
```

The crawler is triggered via the backend API.
During a crawl run:
- Scrapy executes spiders as subprocesses
- Jobs are scraped and validated
- Duplicate jobs are skipped
- Data is written to Firebase Firestore
- Logs are written to crawler/logs/
- Logs are streamed live to the frontend
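The subprocess-plus-log-streaming flow above can be sketched with the standard library. This is a simplified illustration of the idea behind `runner.py`, not its actual code; `stream_subprocess` and the example spider name are assumptions.

```python
import subprocess
from pathlib import Path
from typing import Iterator

def stream_subprocess(cmd: list[str], log_path: Path) -> Iterator[str]:
    """Run a command, persist each output line to log_path, and yield it
    so a caller (e.g. a streaming API response) can forward it live.
    Hypothetical helper, illustrating the runner.py approach."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as fh, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            fh.write(line)              # persist to crawler/logs/
            yield line.rstrip("\n")     # stream to the frontend

# A crawl run might invoke something like (spider name is hypothetical):
# stream_subprocess(
#     ["scrapy", "crawl", "jobs", "-s", "CLOSESPIDER_ITEMCOUNT=5"],
#     Path("logs/jobs.log"),
# )
```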