Personal experiments, data analysis projects, and automation tools.
- Job Postings Analysis: Web scraping and trend analysis of job market data
- Automatic Job Alerts: Real-time monitoring and notifications for new job postings
- Next.js 15 with App Router
- TypeScript
- Tailwind CSS
- Geist Font
- FastAPI (Python)
- SQLAlchemy ORM
- Playwright for web scraping
- APScheduler for job scheduling
- SQLite database
- Docker & Docker Compose
- Automated deployment
- Health checks and monitoring
- Docker and Docker Compose
- Node.js 20+ (for local development)
- Python 3.11+ (for local development)
# Build and start all services
docker-compose up --build
# Run in detached mode
docker-compose up -d
# View logs
docker-compose logs -f
# Stop all services
docker-compose down

Services will be available at:
- Frontend: http://localhost:3000
- Job Scraper API: http://localhost:8001
- API Docs: http://localhost:8001/docs
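If you want a scripted sanity check after starting the stack, something like the snippet below works, assuming the default ports above and the backend `/health` endpoint described later in this README (the `requests` package is not part of the project dependencies, so install it separately):

```python
import requests

# Default local endpoints from docker-compose (adjust if you changed the ports).
checks = {
    "Frontend": "http://localhost:3000",
    "Job Scraper API": "http://localhost:8001/health",
    "API Docs": "http://localhost:8001/docs",
}

for name, url in checks.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name:16} {url:35} -> HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name:16} {url:35} -> unreachable ({exc})")
```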
cd web
npm install
npm run dev

Frontend will be available at http://localhost:3000
cd backend/job-scraper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
uvicorn app.main:app --reload --port 8001

Or use the development script:
./scripts/start-dev.sh

API will be available at http://localhost:8001
faiz-lab/
├── web/ # Next.js frontend
│ ├── src/
│ │ ├── app/ # App router pages
│ │ │ ├── labs/ # Research projects
│ │ │ └── tools/ # Utility tools
│ │ ├── components/ # Reusable React components
│ │ ├── lib/ # Utilities and API clients
│ │ └── data/ # Mock data and types
│ ├── public/ # Static assets
│ └── Dockerfile
│
├── backend/
│ └── job-scraper/ # Job scraping service
│ ├── app/ # FastAPI application
│ ├── models/ # Database models (SQLAlchemy)
│ ├── scrapers/ # Company-specific web scrapers
│ ├── data/ # SQLite database storage
│ └── Dockerfile
│
├── docs/ # Documentation
│ ├── DEPLOYMENT.md # Deployment guide
│ ├── DEPLOYMENT_CHECKLIST.md # Pre-deployment checklist
│ ├── DEPLOYMENT_SYNC.md # Fixing localhost vs deployed discrepancies
│ ├── PRODUCTION_CONFIG.md # Production configuration
│ ├── VERCEL_SETUP.md # Vercel-specific setup
│ └── ISSUE_17_FIX.md # Fix for deployment sync issues
│
├── scripts/ # Utility scripts
│ ├── setup.sh # Initial project setup
│ ├── start-dev.sh # Start development environment
│ ├── test-cors.sh # Test CORS configuration
│ ├── test-deployment.sh # Test deployment
│ └── verify-deployment.sh # Verify production deployment
│
├── docker-compose.yml # Development orchestration
├── docker-compose.prod.yml # Production orchestration
├── env.example # Environment variables template
└── README.md # This file
- 🔍 Automated Scraping: Searches job boards every hour using specific keywords
- 🎯 Keyword Search: Searches for `intern` (exact word only)
- 🇨🇦 Canada Only: Filters for Canadian cities (Toronto, Vancouver, Ottawa, Montreal, etc.); see the filtering sketch after this list
- 📊 Database Storage: Tracks job postings over time
- 🔄 Smart Deduplication: Combines results from multiple searches
- 📈 Analytics: Statistics on new jobs, trends, and patterns
- 🚀 REST API: Query jobs programmatically
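The exact-word matching, city filtering, and deduplication live inside the scraper service; the snippet below is only a rough sketch of that logic under assumed field names (`title`, `location`, `url`), not the project's actual implementation:

```python
import re

CANADIAN_CITIES = {"Toronto", "Vancouver", "Ottawa", "Montreal"}  # extend as needed
INTERN_RE = re.compile(r"\bintern\b", re.IGNORECASE)  # exact word: "internal" won't match


def filter_and_dedupe(postings: list[dict]) -> list[dict]:
    """Keep Canadian intern postings and drop duplicates across searches."""
    seen: set[str] = set()
    kept: list[dict] = []
    for job in postings:
        if not INTERN_RE.search(job.get("title", "")):
            continue
        if not any(city in job.get("location", "") for city in CANADIAN_CITIES):
            continue
        key = job.get("url") or f'{job.get("title")}|{job.get("location")}'
        if key in seen:  # same posting returned by more than one keyword search
            continue
        seen.add(key)
        kept.append(job)
    return kept
```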
# Get all jobs
GET /api/jobs?company=Stripe&active_only=true&limit=100
# Filter by keywords (intern, internship, co-op, coop, software engineer, etc.)
GET /api/jobs?keywords=intern,internship,co-op,coop,software engineer,software engineering,software developer
# Get jobs first seen today
GET /api/jobs/new/today
# Get statistics
GET /api/stats
# Manually trigger scrape
POST /api/scrape?company=stripe
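These endpoints can be hit from any HTTP client; here is a small example using `requests` against the local API (response shapes are not documented here, so the script just prints the raw JSON):

```python
import requests

BASE = "http://localhost:8001"

# Active postings at one company, filtered by internship-related keywords
jobs = requests.get(
    f"{BASE}/api/jobs",
    params={
        "company": "Stripe",
        "active_only": "true",
        "keywords": "intern,internship,co-op",
        "limit": 50,
    },
    timeout=10,
)
print(jobs.json())

# Jobs first seen today, plus overall statistics
print(requests.get(f"{BASE}/api/jobs/new/today", timeout=10).json())
print(requests.get(f"{BASE}/api/stats", timeout=10).json())

# Manually trigger a scrape for one company
requests.post(f"{BASE}/api/scrape", params={"company": "stripe"}, timeout=60)
```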
To add a new company scraper:

- Create a new scraper in `backend/job-scraper/scrapers/`
- Follow the pattern in `stripe_scraper.py`
- Add to scheduler in `app/scheduler.py`
- Update API endpoints as needed
Example:
from typing import Dict, List

class GoogleScraper:
    async def scrape(self, query: str) -> List[Dict]:
        # Implementation
        pass
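A new scraper also has to be wired into the hourly schedule in `app/scheduler.py`. The sketch below shows one way to do that with APScheduler; the module and function names (`scrapers.google_scraper`, `start_scheduler`) are assumptions, not the project's actual code:

```python
import os

from apscheduler.schedulers.asyncio import AsyncIOScheduler

from scrapers.google_scraper import GoogleScraper  # hypothetical module for the class above

scheduler = AsyncIOScheduler()


async def scrape_google() -> None:
    # Run the scraper; persisting results to the database is left to the real job.
    jobs = await GoogleScraper().scrape("software engineer intern")
    print(f"Google scrape finished: {len(jobs)} postings")


def start_scheduler() -> None:
    hours = int(os.getenv("SCRAPE_INTERVAL_HOURS", "1"))
    scheduler.add_job(scrape_google, "interval", hours=hours, id="google_scrape")
    scheduler.start()
```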
# Frontend
NEXT_PUBLIC_API_URL=http://localhost:8001

# Backend (job scraper)
DATABASE_URL=sqlite:///./data/jobs.db
SCRAPE_INTERVAL_HOURS=1
API_PORT=8001
CORS_ORIGINS=http://localhost:3000

cd web
npm run dev # Start dev server
npm run build # Build for production
npm run lint  # Lint code

cd backend/job-scraper
# Run tests
python -m pytest
# Test scraper directly
python -m scrapers.stripe_scraper
# Format code
black .

# Build all services
docker-compose build
# Run in production mode
docker-compose up -d
# View logs
docker-compose logs -f

All services include health checks:
- Frontend: Next.js health endpoint
- Backend: `/health` endpoint (see the sketch below)
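On the backend, the health endpoint can be as small as the sketch below (the actual handler may report more detail, such as database or scheduler status):

```python
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
async def health() -> dict:
    # Polled by the docker-compose healthcheck.
    return {"status": "ok"}
```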
cd backend/job-scraper
playwright install --with-deps chromium

rm backend/job-scraper/data/jobs.db
docker-compose restart job-scraper

docker-compose down
docker system prune -a
docker-compose build --no-cache
docker-compose up

- Email/Slack notifications for new jobs
- More company scrapers (Google, Meta, Amazon, etc.)
- Job application tracking
- Advanced filtering and search
- Historical data visualization
- PostgreSQL for production
- Kubernetes deployment
This is a personal project, but feel free to fork and adapt for your own use!
MIT License - Feel free to use and modify as needed.