JobCrawler (Next.js, TypeScript, Python, FastAPI, Scrapy)

JobCrawler is a full-stack application that combines a Next.js frontend with a Python crawling backend. It crawls real job listings, stores them in Firebase Firestore, and lets each signed-in user browse the listings and save jobs to their own account.

The system is designed to run locally, in production, and via CI/CD, with a clear separation between frontend, API, and crawler logic.

Features

🔍 Crawl real job listings using Scrapy
⚡ FastAPI backend to control and monitor crawls
📦 Firebase Firestore for persistent storage
🔐 Firebase Authentication (per-user saved jobs)
📄 Live crawler logs (streamed to frontend)
🎨 Modern UI with HeroUI + Tailwind
🚀 Production-ready deployment (systemd / GitHub Actions)

Technologies Used

Frontend

Next.js
TypeScript
HeroUI
Tailwind

Backend / Crawler

Python
FastAPI
Scrapy
Firebase Firestore

Project Structure

job-tracker/
├── app/                    # Next.js frontend
├── crawler/                # Python backend + Scrapy project
│   ├── jobsCrawler/        # Scrapy spiders, pipelines, settings
│   ├── main.py             # FastAPI entrypoint
│   ├── runner.py           # Scrapy subprocess runner
│   ├── logs/               # Runtime logs (gitignored)
│   └── .venv/              # Python virtual environment (local)
├── lib/
│   └── serviceAccountKey.json  # Firebase key (local only, not committed)
├── .env.local              # Local environment variables
└── README.md

Environment Variables

The application uses environment variables to configure both the frontend (Next.js) and the backend/crawler (FastAPI + Scrapy).

Sensitive files (Firebase keys, secrets) are never committed to Git.

Frontend (`.env.local`)

Used by the Next.js application.

NEXT_PUBLIC_JOB_API=http://localhost:8000
NEXT_PUBLIC_COMPANY_API=http://localhost:8000
NEXT_PUBLIC_CLOSESPIDER_ITEMCOUNT=5

NEXT_PUBLIC_JOB_API: Base URL of the FastAPI backend.
NEXT_PUBLIC_COMPANY_API: Base URL of the API used for company-related requests (the same backend in the example above).
NEXT_PUBLIC_CLOSESPIDER_ITEMCOUNT: Maximum number of items the crawler is allowed to scrape per run.
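
As a rough sketch of how a per-run item limit like this could reach Scrapy, the backend might translate it into Scrapy's `CLOSESPIDER_ITEMCOUNT` setting on the command line. The helper name `build_crawl_command`, the spider name `jobs`, and the validation rule are assumptions for illustration, not the project's actual code.

```python
from typing import List


def build_crawl_command(spider: str, item_count: int) -> List[str]:
    """Build a `scrapy crawl` command that stops after `item_count` items.

    Hypothetical helper: the real runner may assemble its command differently.
    """
    if item_count < 1:
        raise ValueError("item_count must be a positive integer")
    # `-s NAME=VALUE` overrides a Scrapy setting for this run only.
    return ["scrapy", "crawl", spider, "-s", f"CLOSESPIDER_ITEMCOUNT={item_count}"]


if __name__ == "__main__":
    # "jobs" is a placeholder spider name.
    print(build_crawl_command("jobs", 5))
```

Returning the command as a list (rather than a single string) lets it be passed to a subprocess without going through a shell.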

Backend / Crawler

The backend is a Python-based service built with FastAPI and Scrapy. It is responsible for starting and monitoring crawler runs and persisting job data into Firebase Firestore.

The crawler is executed as a subprocess and its logs are streamed live to the frontend.
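
One way such subprocess log streaming can work is sketched below, using only the standard library. The function name `stream_process_output` is hypothetical, and the demo command is a stand-in for a real crawl; the project's `runner.py` may be structured differently.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_process_output(command: List[str]) -> Iterator[str]:
    """Run `command` and yield its output line by line as it is produced."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in; Scrapy logs to stderr by default
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    assert process.stdout is not None
    try:
        for line in process.stdout:
            yield line.rstrip("\n")
    finally:
        # Always release the pipe and reap the child process.
        process.stdout.close()
        process.wait()


if __name__ == "__main__":
    # Stand-in for a real crawl: a child Python process that prints two lines.
    demo = [sys.executable, "-c", "print('spider opened'); print('spider closed')"]
    for log_line in stream_process_output(demo):
        print(log_line)
```

In a FastAPI service, a generator like this could plausibly be wrapped in a `StreamingResponse` so the frontend receives lines as they arrive; whether the project does so is not stated here.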

Required Environment Variables

The crawler requires a Firebase service account key. The path to this file is provided via an environment variable.

FIREBASE_SERVICE_ACCOUNT=/absolute/path/to/serviceAccountKey.json

Local example

For local development, the Firebase service account key is usually stored inside the project directory but outside the crawler folder, for example in lib/.

Example path:

FIREBASE_SERVICE_ACCOUNT=/Users/yourname/Projects/job-tracker/lib/serviceAccountKey.json

Make sure that:

The file exists at the given path
The path is absolute
The file is readable by your local user
The file is listed in .gitignore and never committed
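
The first three checks above can be automated at startup. The following is a minimal sketch under assumptions: the function name `resolve_service_account_path` and the error messages are illustrative, and the project may perform these checks differently or not at all.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_service_account_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the key path from FIREBASE_SERVICE_ACCOUNT, or raise if unusable."""
    raw = env.get("FIREBASE_SERVICE_ACCOUNT")
    if not raw:
        raise RuntimeError("FIREBASE_SERVICE_ACCOUNT is not set")
    path = Path(raw)
    if not path.is_absolute():
        raise RuntimeError(f"Path must be absolute: {raw}")
    if not path.is_file():
        raise RuntimeError(f"File does not exist: {raw}")
    if not os.access(path, os.R_OK):
        raise RuntimeError(f"File is not readable by the current user: {raw}")
    return path


if __name__ == "__main__":
    try:
        print(resolve_service_account_path())
    except RuntimeError as error:
        print(f"Configuration error: {error}")
```

Failing early with a specific message is usually easier to debug than a credentials error raised later, in the middle of a crawl.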

Setup (Local Development)

Create and activate a Python virtual environment for the crawler:

cd crawler
python3 -m venv .venv
source .venv/bin/activate

Install Python dependencies:

pip install -r requirements.txt

Running the Backend Locally

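From the crawler/ directory (with the virtual environment active), start the FastAPI backend with auto-reload enabled. By default uvicorn listens on http://127.0.0.1:8000, which matches the NEXT_PUBLIC_JOB_API value above: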

uvicorn main:app --reload

Running the Crawler Locally

The crawler is triggered via the backend API.

During a crawl run:

The backend launches Scrapy spiders as subprocesses
Jobs are scraped and validated
Duplicate jobs are skipped
Data is written to Firebase Firestore
Logs are written to crawler/logs/
Logs are streamed live to the frontend
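
The validate, skip-duplicates, and write steps above can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the field names (`title`, `company`, `url`), the hashing scheme, and the `InMemoryJobStore` class, which stands in for a Firestore collection. A real Scrapy pipeline would typically do this in `process_item` and write to Firestore instead.

```python
import hashlib
from typing import Dict


def job_key(job: Dict[str, str]) -> str:
    """Derive a stable ID from fields assumed to identify a listing."""
    raw = "|".join(job.get(field, "").strip().lower() for field in ("title", "company", "url"))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class InMemoryJobStore:
    """Stand-in for a Firestore collection keyed by document ID."""

    def __init__(self) -> None:
        self.documents: Dict[str, Dict[str, str]] = {}

    def save_if_new(self, job: Dict[str, str]) -> bool:
        """Validate and store `job`; return False if it is invalid or a duplicate."""
        if not job.get("title") or not job.get("url"):
            return False  # validation: required fields are missing
        key = job_key(job)
        if key in self.documents:
            return False  # duplicate: same key was already stored
        self.documents[key] = job
        return True


if __name__ == "__main__":
    store = InMemoryJobStore()
    job = {"title": "Backend Engineer", "company": "Example GmbH", "url": "https://example.com/jobs/1"}
    print(store.save_if_new(job))  # True: first time this listing is seen
    print(store.save_if_new(job))  # False: skipped as a duplicate
```

Using a deterministic key as the document ID means re-crawling the same listing maps to the same document, so duplicates can be detected without scanning the whole collection.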
