mwt2212/job_finder


Job Finder Dashboard

Local-first job intelligence pipeline with a FastAPI backend, React dashboard, and feedback-driven ranking loop.

Preview

[Image: Job Finder Dashboard preview (docs/dashboard-preview.png)]

Local Release Docs

  • Release notes: docs/release-notes-local-v1.0.0.md
  • Smoke checklist: docs/local-smoke-checklist.md
  • Data backup/restore/recovery: docs/local-data-and-recovery.md

Highlights

  • End-to-end pipeline: scout -> shortlist -> scrape -> eval (+ optional sort)
  • Unified operations UI for jobs, ratings, settings, pipeline runs, and cover letters
  • Guided onboarding wizard with setup checks, config save flow, and preflight gating
  • Local SQLite persistence with importable JSON/CSV artifacts
  • Feedback-to-tuning loop with guarded, idempotent behavior
  • Cost-aware AI eval and cover-letter generation tracking

Architecture

Core services:

  • backend/app.py: composition root (lifespan, middleware, router includes)
  • backend/api/handlers.py: API handler implementations and shared backend helpers
  • backend/api/routes/*: route registration by feature slice
  • backend/domain/services/*: extracted business/service logic
  • backend/infra/db/*: schema + repository modules
  • frontend/src/App.jsx: dashboard shell with feature modules under frontend/src/features/*

Pipeline scripts:

  • pipeline/scout.py: LinkedIn job metadata capture
  • pipeline/shortlist.py: rule + preference-based ranking
  • pipeline/scrape.py: full description scraping
  • pipeline/eval.py: structured AI fit analysis
  • pipeline/sort.py: bucket into apply/review/skip
  • Root scripts (job-scout.py, shortlist.py, deep-scrape-full.py, ai-eval.py, sort-results.py) are CLI compatibility wrappers
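The stage order can be sketched as a small driver around the root wrappers. This is an illustration only: it assumes the wrappers map to the stages by position in the list above and that each runs without arguments, neither of which is confirmed here. `build_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys

# Wrapper names come from the list above. Pairing them with stages by
# position and running them with no arguments are ASSUMPTIONS.
STAGES = [
    ("scout", "job-scout.py"),
    ("shortlist", "shortlist.py"),
    ("scrape", "deep-scrape-full.py"),
    ("eval", "ai-eval.py"),
]
OPTIONAL_SORT = ("sort", "sort-results.py")


def build_commands(include_sort: bool = False) -> list[list[str]]:
    """Return one argv list per stage, in pipeline order."""
    stages = (STAGES + [OPTIONAL_SORT]) if include_sort else STAGES
    return [[sys.executable, script] for _, script in stages]


def run_pipeline(include_sort: bool = False) -> None:
    """Run each stage from the repo root and stop at the first failure."""
    for cmd in build_commands(include_sort):
        subprocess.run(cmd, check=True)
```

Because `check=True` raises on a non-zero exit code, a failed scout run would stop the later stages from consuming stale artifacts.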

Data boundaries:

  • Runtime data: artifacts/
  • Database: artifacts/jobfinder.db
  • Source/config: repo files (backend/, frontend/, root config JSON)

Quick Start

Path A (Recommended): one-command install + one-click start (Windows)

  1. Install once (run from repo root):
scripts\setup-local.bat
  • Setup checks whether OPENAI_API_KEY is present and prints next steps if missing.
  2. Start app (backend + frontend + browser):
scripts\start-local.bat

This opens http://localhost:5173 and starts both services in separate terminals.

Path B (Manual): step-by-step install + manual start

Backend install:

cd backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m playwright install chromium
cd ..

Frontend install:

cd frontend
npm install
cd ..

Manual start:

backend\.venv\Scripts\python.exe run-backend.py
cd frontend
npm run dev

Then open http://localhost:5173.

After first launch, complete Onboarding Step 1 (bootstrap) before pipeline runs.

Onboarding (Recommended First Run)

Use the Onboarding tab in the dashboard to complete setup:

  • Step 1: environment status + bootstrap skeleton files
  • Step 2: LinkedIn session setup/status
  • Step 3: resume/profile capture + plain-English draft + resume upload parsing
  • Step 4: preferences + shortlist rules
  • Step 5: searches add/remove
  • Step 6: review + save
  • Step 7: preflight verification before first run

Pipeline start is preflight-gated. If hard checks fail, Start is blocked with actionable fix hints. For first run, execute POST /onboarding/bootstrap via the UI flow (Onboarding Step 1) before attempting pipeline runs.

One-Time LinkedIn Login Setup

To keep scraping isolated from your personal browser, this project uses a dedicated Chrome profile.

Run once:

python setup-linkedin-profile.py

Then:

  • Sign in to LinkedIn in the opened browser window
  • Complete any security/checkpoint prompts
  • Return to terminal and press Enter to verify session

After this, pipeline runs can reuse the saved session automatically.

LinkedIn setup runbook (exact checks):

  • Check current state:
    • GET /onboarding/linkedin/status
    • Expected success shape:
      • ok: true
      • profile_exists: true
      • message indicates LinkedIn session check passed
  • If ok: false, run:
    • python setup-linkedin-profile.py
    • Sign in to LinkedIn in the opened browser window
    • Press Enter in the terminal so the script can verify the session
    • Re-run GET /onboarding/linkedin/status
  • Final gate before pipeline:
    • POST /onboarding/preflight
    • ready must be true
    • checks should include playwright_runtime and linkedin_session with status: pass
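The runbook checks above can also be scripted. A minimal sketch, assuming the backend listens on the documented default http://127.0.0.1:8001 and that `checks` is a list of objects with `name` and `status` keys. The README names the fields but not the exact JSON layout, so that layout is an assumption, as is sending the preflight POST without a body. All function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8001"  # documented default backend URL
REQUIRED_CHECKS = ("playwright_runtime", "linkedin_session")


def call(method: str, path: str) -> dict:
    """Send a request to the local backend and decode the JSON body."""
    req = urllib.request.Request(API_BASE + path, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def linkedin_ok(status: dict) -> bool:
    """True when the status payload reports a usable LinkedIn session."""
    return status.get("ok") is True and status.get("profile_exists") is True


def preflight_ok(report: dict) -> bool:
    """True when ready is set and both required checks passed."""
    # ASSUMPTION: "checks" is a list of {"name": ..., "status": ...} objects.
    passed = {
        c.get("name") for c in report.get("checks", []) if c.get("status") == "pass"
    }
    return report.get("ready") is True and all(n in passed for n in REQUIRED_CHECKS)


def run_checks() -> None:
    """Query both endpoints; requires the backend to be running."""
    print("linkedin:", linkedin_ok(call("GET", "/onboarding/linkedin/status")))
    print("preflight:", preflight_ok(call("POST", "/onboarding/preflight")))
```

Call `run_checks()` with the backend running. If the real payload shape differs, only `preflight_ok` needs adjusting.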

Configuration

Environment variables:

  • OPENAI_API_KEY: required for AI eval and AI cover-letter generation
  • VITE_API_BASE: frontend API base URL (default http://127.0.0.1:8001)
  • JOBFINDER_CHROME_PROFILE: scraper browser profile directory
  • JOBFINDER_VIEWPORT: optional scraper viewport override as WIDTHxHEIGHT (example: 1280x1440)

Windows key setup example:

setx OPENAI_API_KEY "your_key_here"

Then open a new terminal before starting backend/frontend.

OpenAI API key setup (where/how):

  1. Go to https://platform.openai.com/ and sign in (or create an account).
  2. Create an API key at https://platform.openai.com/api-keys.
  3. Add billing/credits in the OpenAI platform billing section.
  4. Set the key locally:
setx OPENAI_API_KEY "your_key_here"
  5. Open a new terminal and start the app.

AI eval cost guide (approx jobs per $1):

  • Based on current ai_pricing.json and observed average ai_eval usage in artifacts/ai_usage_totals.json:
    • avg input tokens/job: ~1331
    • avg output tokens/job: ~244
  • Estimated jobs per $1 (about +/-20% token variance):
    • gpt-4.1-mini: ~1083 jobs (~902-1354)
    • gpt-5-mini: ~1217 jobs (~1014-1521)
    • gpt-4.1: ~217 jobs (~180-271)
    • gpt-5 / gpt-5.1: ~243 jobs (~203-304)
  • Notes:
    • Real cost depends on your selected model and job-description lengths.
    • Not every scraped job is eligible for AI eval, so total pipeline spend can be lower than raw scraped counts suggest.
    • Cover-letter generation is additional spend beyond eval.
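The jobs-per-dollar figures follow from simple arithmetic on the average token counts. A sketch of that calculation, where the per-million-token prices (0.40 USD input, 1.60 USD output) are illustrative assumptions and not values read from ai_pricing.json. Substitute the prices for your selected model. The function names are hypothetical.

```python
def jobs_per_dollar(in_tokens: float, out_tokens: float,
                    in_price_per_m: float, out_price_per_m: float) -> float:
    """Jobs affordable for 1 USD, given prices in USD per 1M tokens."""
    cost_per_job = (in_tokens * in_price_per_m
                    + out_tokens * out_price_per_m) / 1_000_000
    return 1.0 / cost_per_job


def jobs_range(in_tokens: float, out_tokens: float,
               in_price_per_m: float, out_price_per_m: float,
               variance: float = 0.20) -> tuple[float, float, float]:
    """Return (low, center, high) when token usage varies by +/- variance."""
    center = jobs_per_dollar(in_tokens, out_tokens, in_price_per_m, out_price_per_m)
    # Scaling both token counts by (1 + v) divides the job count by (1 + v).
    return center / (1 + variance), center, center / (1 - variance)


# ASSUMED prices for illustration; real values live in ai_pricing.json.
low, center, high = jobs_range(1331, 244, 0.40, 1.60)
print(int(center))  # 1083
```

With these assumed prices one job costs (1331 × 0.40 + 244 × 1.60) / 1,000,000 ≈ 0.00092 USD, which gives roughly 1083 jobs per dollar.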

Portability defaults:

  • If JOBFINDER_CHROME_PROFILE is unset, scripts use repo-local chrome-profile/
  • If JOBFINDER_VIEWPORT is unset, scrapers auto-size to half monitor width and full monitor height
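These defaults amount to a small resolution step at startup. A sketch under stated assumptions: the function names are hypothetical, the monitor size is passed in by the caller, and the project's actual parsing and validation rules may differ.

```python
import os
import re
from pathlib import Path


def parse_viewport(value: str) -> tuple[int, int]:
    """Parse a WIDTHxHEIGHT string such as '1280x1440'."""
    match = re.fullmatch(r"\s*(\d+)\s*[xX]\s*(\d+)\s*", value)
    if not match:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width <= 0 or height <= 0:
        raise ValueError("viewport dimensions must be positive")
    return width, height


def resolve_viewport(monitor_w: int, monitor_h: int, env=None) -> tuple[int, int]:
    """Use JOBFINDER_VIEWPORT if set, else half width and full height."""
    env = os.environ if env is None else env
    raw = env.get("JOBFINDER_VIEWPORT")
    if raw:
        return parse_viewport(raw)
    return monitor_w // 2, monitor_h


def resolve_profile_dir(repo_root: Path, env=None) -> Path:
    """Use JOBFINDER_CHROME_PROFILE if set, else repo-local chrome-profile/."""
    env = os.environ if env is None else env
    raw = env.get("JOBFINDER_CHROME_PROFILE")
    return Path(raw) if raw else repo_root / "chrome-profile"
```

For example, on a 2560x1440 monitor with no override, `resolve_viewport(2560, 1440, env={})` returns `(1280, 1440)`.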

Optional frontend env override:

  • frontend/.env is usually not required for local use.
  • Use it only when frontend should call a non-default backend URL.
  • Default already works locally: VITE_API_BASE=http://127.0.0.1:8001.
  • If you do need an override:
cd frontend
copy .env.example .env

Profile/template file precedence:

  • Resume profile: resume_profile.local.json -> resume_profile.json -> resume_profile.example.json
  • Cover-letter templates: cover_letter_templates.local.json -> cover_letter_templates.json -> cover_letter_templates.example.json
  • Preferences: preferences.local.json -> preferences.json -> preferences.example.json
  • Shortlist rules: shortlist_rules.local.json -> shortlist_rules.json -> shortlist_rules.example.json
  • Searches: searches.local.json -> searches.json -> searches.example.json
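The precedence chains above share one pattern: the first existing file wins. A minimal sketch of that lookup, assuming the files sit together in one directory. `resolve_config` is a hypothetical helper, not the project's loader.

```python
from pathlib import Path
from typing import Optional

# Highest priority first, matching the chains listed above.
SUFFIXES = (".local.json", ".json", ".example.json")


def resolve_config(stem: str, root: Path) -> Optional[Path]:
    """Return the first existing variant of a config file, or None."""
    for suffix in SUFFIXES:
        candidate = root / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    return None
```

For example, `resolve_config("preferences", Path("."))` returns `preferences.local.json` when that file exists, and falls back to `preferences.json` and then `preferences.example.json`.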

Personalize safely:

  • Keep your real data in *.local.json files (ignored by git)
  • Commit only sanitized *.json and *.example.json variants

Pipeline Sizing

Size presets are max_results / shortlist_k / final_top:

  • Test: 1 / 1 / 1
  • Small: 100 / 30 / 10
  • Medium: 500 / 60 / 20
  • Large: 1000 / 120 / 50
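As a lookup table, the presets could be represented like this. The sketch only restates the numbers above. The `SizePreset` type and `get_preset` helper are hypothetical, and reading the three values as successive narrowing limits (capture, shortlist, final selection) is an assumption based on the field names.

```python
from typing import NamedTuple


class SizePreset(NamedTuple):
    max_results: int   # presumably the cap on scouted jobs
    shortlist_k: int   # presumably how many jobs survive shortlisting
    final_top: int     # presumably how many jobs reach the final stage


SIZE_PRESETS = {
    "test": SizePreset(1, 1, 1),
    "small": SizePreset(100, 30, 10),
    "medium": SizePreset(500, 60, 20),
    "large": SizePreset(1000, 120, 50),
}


def get_preset(name: str) -> SizePreset:
    """Look up a preset by case-insensitive name."""
    try:
        return SIZE_PRESETS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown size preset: {name!r}") from None
```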

Onboarding API Surface

Key onboarding routes:

  • POST /onboarding/bootstrap
  • GET /onboarding/status
  • POST /onboarding/preflight
  • POST /onboarding/migrate
  • GET /onboarding/config
  • PUT /onboarding/config/resume-profile
  • PUT /onboarding/config/preferences
  • PUT /onboarding/config/shortlist-rules
  • PUT /onboarding/config/searches
  • POST /onboarding/profile-draft
  • POST /onboarding/resume-parse
  • GET/POST/PUT/DELETE /onboarding/searches...
  • GET /onboarding/linkedin/status
  • POST /onboarding/linkedin/init

Data Lifecycle

Generated outputs (safe to reset):

  • artifacts/tier2_metadata.json
  • artifacts/tier2_shortlist.json
  • artifacts/tier2_shortlist.csv
  • artifacts/tier2_full.json
  • artifacts/tier2_scored.json
  • artifacts/apply.json, artifacts/review.json, artifacts/skip.json
  • artifacts/*.csv exports, logs, and cover-letter outputs
  • artifacts/jobfinder.db

Persistent operator config:

  • preferences.json
  • shortlist_rules.json
  • searches.json
  • resume_profile.json
  • cover_letter_templates.json
  • ai_pricing.json

Local/private variants (preferred for personal data):

  • preferences.local.json
  • shortlist_rules.local.json
  • searches.local.json
  • resume_profile.local.json
  • cover_letter_templates.local.json

Backup/restore and recovery runbook:

  • docs/local-data-and-recovery.md

Tracked vs generated files:

  • Generated build output (frontend/dist/assets/index-*.js) and migration backups (*.bak.*) are treated as local artifacts, not source.
  • Keep source/config files tracked; keep generated runtime artifacts out of commits.

AI Cost Tracking

  • Pricing source: ai_pricing.json
  • Usage log: artifacts/ai_usage.jsonl
  • Rollups: artifacts/ai_usage_totals.json
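The relationship between the usage log and the rollups can be sketched as a per-model aggregation over JSONL records. The record fields used here (`model`, `input_tokens`, `output_tokens`) are hypothetical, and the real schema of artifacts/ai_usage.jsonl may differ.

```python
import json
from collections import defaultdict
from typing import Iterable


def rollup(lines: Iterable[str]) -> dict:
    """Aggregate JSONL usage records into per-model totals."""
    totals = defaultdict(
        lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        # ASSUMED field names; adjust to the actual log schema.
        bucket = totals[record.get("model", "unknown")]
        bucket["calls"] += 1
        bucket["input_tokens"] += int(record.get("input_tokens", 0))
        bucket["output_tokens"] += int(record.get("output_tokens", 0))
    return dict(totals)
```

Reading the file is then a matter of passing an open handle, for example `rollup(open("artifacts/ai_usage.jsonl", encoding="utf-8"))`.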

Troubleshooting

Frontend cannot reach backend:

  • Start backend on 127.0.0.1:8001
  • Or set VITE_API_BASE in frontend/.env

Scraper captures fewer jobs per page than expected:

  • Let auto viewport sizing run by default
  • Or force JOBFINDER_VIEWPORT to a known-good value

Chrome profile lock error:

  • Close Chrome instances sharing the same profile
  • Or set JOBFINDER_CHROME_PROFILE to a dedicated folder

LinkedIn login required during scout or scrape:

  • Run python setup-linkedin-profile.py once
  • Make sure the same JOBFINDER_CHROME_PROFILE path is used when running the backend/pipeline
  • If JOBFINDER_CHROME_PROFILE is unset, backend/scripts default to repo-local chrome-profile/
  • If /onboarding/linkedin/status says LinkedIn session cookie (li_at) was not found, rerun setup script and complete sign-in/checkpoint in that same profile

Pipeline start blocked by preflight:

  • Open the Onboarding or Pipeline tab preflight panel
  • Run checks and apply listed fix_hint steps
  • Ensure playwright_runtime and linkedin_session are both pass

Resume parse upload fails:

  • Supported formats: .txt, .docx, .pdf
  • Ensure backend dependencies are installed from backend/requirements.txt (includes python-docx and pypdf)

README preview image not showing on GitHub:

  • Ensure file exists in repo at docs/dashboard-preview.png
  • Check the letter case of the path (GitHub paths are case-sensitive)
  • Verify it is tracked by git:
    • git ls-files docs

AI calls fail:

  • Confirm OPENAI_API_KEY is set in the shell that runs the backend (after setx, open a new terminal so the value is picked up)

Editor shows unresolved Python imports but backend runs:

  • This is usually VS Code using a different interpreter than your runtime shell.
  • Optional fix: in VS Code, run the "Python: Select Interpreter" command and choose backend/.venv/Scripts/python.exe.

Quick Reset

Remove-Item -Recurse -Force artifacts
New-Item -ItemType Directory artifacts
backend\.venv\Scripts\python.exe run-backend.py

Privacy

  • Treat resume_profile.local.json, cover_letter_templates.local.json, and browser profile data as private
  • Keep runtime artifacts out of commits
  • Sanitize local personal content before publishing the repository
