Local-first job intelligence pipeline with a FastAPI backend, React dashboard, and feedback-driven ranking loop.
- Release notes: `docs/release-notes-local-v1.0.0.md`
- Smoke checklist: `docs/local-smoke-checklist.md`
- Data backup/restore/recovery: `docs/local-data-and-recovery.md`

- End-to-end pipeline: `scout -> shortlist -> scrape -> eval` (+ optional `sort`)
- Unified operations UI for jobs, ratings, settings, pipeline runs, and cover letters
- Guided onboarding wizard with setup checks, config save flow, and preflight gating
- Local SQLite persistence with importable JSON/CSV artifacts
- Feedback-to-tuning loop with guarded, idempotent behavior
- Cost-aware AI eval and cover-letter generation tracking
Core services:
- `backend/app.py`: composition root (lifespan, middleware, router includes)
- `backend/api/handlers.py`: API handler implementations and shared backend helpers
- `backend/api/routes/*`: route registration by feature slice
- `backend/domain/services/*`: extracted business/service logic
- `backend/infra/db/*`: schema + repository modules
- `frontend/src/App.jsx`: dashboard shell with feature modules under `frontend/src/features/*`
Pipeline scripts:
- `pipeline/scout.py`: LinkedIn job metadata capture
- `pipeline/shortlist.py`: rule + preference-based ranking
- `pipeline/scrape.py`: full description scraping
- `pipeline/eval.py`: structured AI fit analysis
- `pipeline/sort.py`: bucket into apply/review/skip
- Root scripts (`job-scout.py`, `shortlist.py`, `deep-scrape-full.py`, `ai-eval.py`, `sort-results.py`) are CLI compatibility wrappers
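The stage order can be sketched as a small command builder. This is an illustrative sketch, not the project's runner: it assumes each root wrapper can be invoked with no arguments, which this README does not confirm, and `build_pipeline_commands` is a hypothetical helper name.

```python
import sys
from typing import List

# Root-level CLI wrappers in pipeline order, as named in the list above.
STAGES = [
    ("scout", "job-scout.py"),
    ("shortlist", "shortlist.py"),
    ("scrape", "deep-scrape-full.py"),
    ("eval", "ai-eval.py"),
]
OPTIONAL_SORT = ("sort", "sort-results.py")


def build_pipeline_commands(include_sort: bool = True) -> List[List[str]]:
    """Return one argv list per stage, in execution order.

    Assumption: each wrapper runs without arguments; the real scripts
    may need flags that are not documented here.
    """
    stages = STAGES + ([OPTIONAL_SORT] if include_sort else [])
    return [[sys.executable, script] for _name, script in stages]


# Print the planned commands instead of executing them.
for argv in build_pipeline_commands():
    print(" ".join(argv))
```

To actually run the stages, each argv could be passed to `subprocess.run(argv, check=True)` so that a failing stage stops the sequence.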
Data boundaries:
- Runtime data: `artifacts/`
- Database: `artifacts/jobfinder.db`
- Source/config: repo files (`backend/`, `frontend/`, root config JSON)
Path A (Recommended): one-command install + one-click start (Windows)
- Install once (run from repo root): `scripts\setup-local.bat`
- Setup checks whether `OPENAI_API_KEY` is present and prints next steps if missing.
- Start app (backend + frontend + browser): `scripts\start-local.bat`

This opens http://localhost:5173 and starts both services in separate terminals.
Path B (Manual): step-by-step install + manual start
Backend install:

    cd backend
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    pip install -r requirements.txt
    python -m playwright install chromium
    cd ..

Frontend install:

    cd frontend
    npm install
    cd ..

Manual start (backend in one terminal):

    backend\.venv\Scripts\python.exe run-backend.py

Manual start (frontend in a second terminal):

    cd frontend
    npm run dev

Then open http://localhost:5173.
After first launch, complete Onboarding Step 1 (bootstrap) before pipeline runs.
Use the Onboarding tab in the dashboard to complete setup:
- Step 1: environment status + bootstrap skeleton files
- Step 2: LinkedIn session setup/status
- Step 3: resume/profile capture + plain-English draft + resume upload parsing
- Step 4: preferences + shortlist rules
- Step 5: searches add/remove
- Step 6: review + save
- Step 7: preflight verification before first run
Pipeline start is preflight-gated. If hard checks fail, Start is blocked with actionable fix hints.
For first run, execute POST /onboarding/bootstrap via the UI flow (Onboarding Step 1) before attempting pipeline runs.
To keep scraping isolated from your personal browser, this project uses a dedicated Chrome profile.
Run once:

    python setup-linkedin-profile.py

Then:
- Sign in to LinkedIn in the opened browser window
- Complete any security/checkpoint prompts
- Return to terminal and press Enter to verify session
After this, pipeline runs can reuse the saved session automatically.
LinkedIn setup runbook (exact checks):
- Check current state: `GET /onboarding/linkedin/status`
- Expected success shape:
  - `ok: true`
  - `profile_exists: true`
  - `message` indicates the LinkedIn session check passed
- If `ok: false`:
  - Run `python setup-linkedin-profile.py`
  - Sign in in the opened browser
  - Press Enter in the terminal for script verification
  - Re-run `GET /onboarding/linkedin/status`
- Final gate before pipeline: `POST /onboarding/preflight`
  - `ready` must be `true`
  - `checks` should include `playwright_runtime` and `linkedin_session` with `status: pass`
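The runbook checks can also be scripted. The sketch below is a minimal, unofficial example using only the standard library. It assumes the backend listens on the default `http://127.0.0.1:8001`, that `POST /onboarding/preflight` accepts an empty body, and that `checks` is a list of objects with `name` and `status` keys; the real payload shape may differ. `call_json`, `preflight_passes`, and `run_checks` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "http://127.0.0.1:8001"  # default backend address from this README
REQUIRED_CHECKS = ("playwright_runtime", "linkedin_session")


def call_json(method: str, path: str) -> Dict[str, Any]:
    """Send a request to the local backend and decode the JSON response."""
    req = urllib.request.Request(API_BASE + path, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def preflight_passes(payload: Dict[str, Any]) -> bool:
    """True when `ready` is true and both required checks report `pass`.

    Assumption: `checks` is a list of {"name": ..., "status": ...} objects.
    """
    statuses = {c.get("name"): c.get("status") for c in payload.get("checks", [])}
    return bool(payload.get("ready")) and all(
        statuses.get(name) == "pass" for name in REQUIRED_CHECKS
    )


def run_checks() -> bool:
    """Follow the runbook order: LinkedIn status first, then preflight."""
    status = call_json("GET", "/onboarding/linkedin/status")
    if not status.get("ok"):
        print("LinkedIn session not ready:", status.get("message"))
        return False
    return preflight_passes(call_json("POST", "/onboarding/preflight"))
```

Calling `run_checks()` requires the backend to be running; `preflight_passes` can be exercised on its own with a saved response.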
Environment variables:
- `OPENAI_API_KEY`: required for AI eval and AI cover-letter generation
- `VITE_API_BASE`: frontend API base URL (default `http://127.0.0.1:8001`)
- `JOBFINDER_CHROME_PROFILE`: scraper browser profile directory
- `JOBFINDER_VIEWPORT`: optional scraper viewport override as `WIDTHxHEIGHT` (example: `1280x1440`)
Windows key setup example:

    setx OPENAI_API_KEY "your_key_here"

Then open a new terminal before starting backend/frontend.
OpenAI API key setup (where/how):
- Go to https://platform.openai.com/ and sign in (or create an account).
- Create an API key at https://platform.openai.com/api-keys.
- Add billing/credits in the OpenAI platform billing section.
- Set the key locally: `setx OPENAI_API_KEY "your_key_here"`
- Open a new terminal and start the app.
AI eval cost guide (approx jobs per $1):
- Based on current `ai_pricing.json` and observed average `ai_eval` usage in `artifacts/ai_usage_totals.json`:
  - avg input tokens/job: ~1331
  - avg output tokens/job: ~244
- Estimated jobs per $1 (about +/-20% token variance):
  - `gpt-4.1-mini`: ~1083 jobs (~902-1354)
  - `gpt-5-mini`: ~1217 jobs (~1014-1521)
  - `gpt-4.1`: ~217 jobs (~180-271)
  - `gpt-5` / `gpt-5.1`: ~243 jobs (~203-304)
- Notes:
- Real cost depends on your selected model and job-description lengths.
- Not every scraped job is always eligible for AI eval; total pipeline spend can be lower than raw scraped counts suggest.
- Cover-letter generation is additional spend beyond eval.
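The arithmetic behind a jobs-per-dollar estimate can be sketched as follows. The token averages come from the guide above; the per-million-token prices are illustrative assumptions passed in by hand, not values read from `ai_pricing.json` (whose schema is not shown here). `jobs_per_dollar` and `variance_range` are hypothetical helper names.

```python
from typing import Tuple


def jobs_per_dollar(
    input_tokens: float,
    output_tokens: float,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Jobs that fit in $1, given average tokens per job and USD prices per 1M tokens."""
    cost_per_job = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return 1.0 / cost_per_job


def variance_range(jobs: float, variance: float = 0.20) -> Tuple[float, float]:
    """Low/high job counts when token usage runs `variance` above or below average."""
    return jobs / (1 + variance), jobs / (1 - variance)


# Assumed example prices (USD per 1M tokens); substitute your own ai_pricing.json values.
estimate = jobs_per_dollar(1331, 244, input_price_per_million=0.40, output_price_per_million=1.60)
low, high = variance_range(estimate)
print(f"~{estimate:.0f} jobs per $1 (~{low:.0f}-{high:.0f})")
```

Because cost scales linearly with tokens, 20% more tokens per job divides the job count by 1.2, and 20% fewer divides it by 0.8, which is why the ranges above are not symmetric around the estimate.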
Portability defaults:
- If `JOBFINDER_CHROME_PROFILE` is unset, scripts use repo-local `chrome-profile/`
- If `JOBFINDER_VIEWPORT` is unset, scrapers auto-size to half monitor width and full monitor height
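A minimal sketch of how these defaults could be resolved is shown below. It is not the project's code: `parse_viewport` and `chrome_profile_dir` are hypothetical helpers, and treating a malformed `JOBFINDER_VIEWPORT` as unset is an assumption (the real scrapers may reject it instead). Monitor-based auto-sizing is not shown.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional, Tuple


def parse_viewport(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse `WIDTHxHEIGHT` (e.g. `1280x1440`); return None when unset or malformed."""
    if not value:
        return None
    match = re.fullmatch(r"(\d+)x(\d+)", value.strip(), flags=re.IGNORECASE)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def chrome_profile_dir(env: Mapping[str, str], repo_root: Path) -> Path:
    """Use JOBFINDER_CHROME_PROFILE when set, else the repo-local chrome-profile/."""
    override = env.get("JOBFINDER_CHROME_PROFILE")
    return Path(override) if override else repo_root / "chrome-profile"


# None here means "fall back to auto-sizing".
viewport = parse_viewport(os.environ.get("JOBFINDER_VIEWPORT"))
profile = chrome_profile_dir(os.environ, Path.cwd())
print(viewport or "auto-size", profile)
```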
Optional frontend env override:
- `frontend/.env` is usually not required for local use.
- Use it only when the frontend should call a non-default backend URL.
- Default already works locally: `VITE_API_BASE=http://127.0.0.1:8001`.
- If you do need an override:

      cd frontend
      copy .env.example .env

Profile/template file precedence:
- Resume profile: `resume_profile.local.json` -> `resume_profile.json` -> `resume_profile.example.json`
- Cover-letter templates: `cover_letter_templates.local.json` -> `cover_letter_templates.json` -> `cover_letter_templates.example.json`
- Preferences: `preferences.local.json` -> `preferences.json` -> `preferences.example.json`
- Shortlist rules: `shortlist_rules.local.json` -> `shortlist_rules.json` -> `shortlist_rules.example.json`
- Searches: `searches.local.json` -> `searches.json` -> `searches.example.json`
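The precedence order above amounts to a "first existing file wins" lookup. The following is a small sketch of that idea, assuming the files sit together in one directory; `resolve_config` is a hypothetical helper, not a function from this repo.

```python
from pathlib import Path
from typing import Optional

# Highest priority first: private local file, tracked file, shipped example.
SUFFIXES = (".local.json", ".json", ".example.json")


def resolve_config(base_dir: Path, stem: str) -> Optional[Path]:
    """Return the first existing variant of `stem`, or None if none exist."""
    for suffix in SUFFIXES:
        candidate = base_dir / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    return None
```

For example, `resolve_config(Path("."), "preferences")` would return `preferences.local.json` when it exists and only fall back to `preferences.json` or `preferences.example.json` otherwise.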
Personalize safely:
- Keep your real data in `*.local.json` files (ignored by git)
- Commit only sanitized `*.json` and `*.example.json` variants
Size presets are `max_results / shortlist_k / final_top`:
- Test: `1 / 1 / 1`
- Large: `1000 / 120 / 50`
- Medium: `500 / 60 / 20`
- Small: `100 / 30 / 10`
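As a rough illustration, the presets can be modelled as a lookup table. The values are copied from the list above; the `SizePreset` type and the funnel check are hypothetical, and reading the three numbers as successively narrower stages is an assumption based on their names.

```python
from typing import Dict, NamedTuple


class SizePreset(NamedTuple):
    max_results: int
    shortlist_k: int
    final_top: int


PRESETS: Dict[str, SizePreset] = {
    "test": SizePreset(1, 1, 1),
    "small": SizePreset(100, 30, 10),
    "medium": SizePreset(500, 60, 20),
    "large": SizePreset(1000, 120, 50),
}


def funnel_is_valid(preset: SizePreset) -> bool:
    """Each stage should keep no more jobs than the stage before it."""
    return preset.max_results >= preset.shortlist_k >= preset.final_top >= 1
```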
Key onboarding routes:
- `POST /onboarding/bootstrap`
- `GET /onboarding/status`
- `POST /onboarding/preflight`
- `POST /onboarding/migrate`
- `GET /onboarding/config`
- `PUT /onboarding/config/resume-profile`
- `PUT /onboarding/config/preferences`
- `PUT /onboarding/config/shortlist-rules`
- `PUT /onboarding/config/searches`
- `POST /onboarding/profile-draft`
- `POST /onboarding/resume-parse`
- `GET/POST/PUT/DELETE /onboarding/searches...`
- `GET /onboarding/linkedin/status`
- `POST /onboarding/linkedin/init`
Generated outputs (safe to reset):
- `artifacts/tier2_metadata.json`
- `artifacts/tier2_shortlist.json`
- `artifacts/tier2_shortlist.csv`
- `artifacts/tier2_full.json`
- `artifacts/tier2_scored.json`
- `artifacts/apply.json`, `artifacts/review.json`, `artifacts/skip.json`
- `artifacts/*.csv` exports, logs, and cover-letter outputs
- `artifacts/jobfinder.db`
Persistent operator config:
- `preferences.json`
- `shortlist_rules.json`
- `searches.json`
- `resume_profile.json`
- `cover_letter_templates.json`
- `ai_pricing.json`
Local/private variants (preferred for personal data):
- `preferences.local.json`
- `shortlist_rules.local.json`
- `searches.local.json`
- `resume_profile.local.json`
- `cover_letter_templates.local.json`
Backup/restore and recovery runbook: `docs/local-data-and-recovery.md`
Tracked vs generated files:
- Generated build output (`frontend/dist/assets/index-*.js`) and migration backups (`*.bak.*`) are treated as local artifacts, not source.
- Keep source/config files tracked; keep generated runtime artifacts out of commits.

AI cost tracking:
- Pricing source: `ai_pricing.json`
- Usage log: `artifacts/ai_usage.jsonl`
- Rollups: `artifacts/ai_usage_totals.json`
Frontend cannot reach backend:
- Start backend on `127.0.0.1:8001`
- Or set `VITE_API_BASE` in `frontend/.env`
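Before changing any configuration, it can help to confirm whether anything is listening on the backend port at all. The sketch below is a generic TCP probe using the standard library; `backend_reachable` is a hypothetical helper, and a successful connection only shows that some process accepts connections on that port, not that it is this project's backend.

```python
import socket


def backend_reachable(host: str = "127.0.0.1", port: int = 8001, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `backend_reachable()` returns `False`, start the backend first; if it returns `True` and the dashboard still fails, check that `VITE_API_BASE` points at the same host and port.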
Scraper captures fewer jobs per page than expected:
- Let auto viewport sizing run by default
- Or force `JOBFINDER_VIEWPORT` to a known-good value

Chrome profile lock error:
- Close Chrome instances sharing the same profile
- Or set `JOBFINDER_CHROME_PROFILE` to a dedicated folder

LinkedIn login required during scout or scrape:
- Run `python setup-linkedin-profile.py` once
- Make sure the same `JOBFINDER_CHROME_PROFILE` path is used when running the backend/pipeline
- If `JOBFINDER_CHROME_PROFILE` is unset, backend/scripts default to repo-local `chrome-profile/`
- If `/onboarding/linkedin/status` says `LinkedIn session cookie (li_at) was not found`, rerun the setup script and complete sign-in/checkpoint in that same profile

Pipeline start blocked by preflight:
- Open the `Onboarding` or `Pipeline` tab preflight panel
- Run checks and apply the listed `fix_hint` steps
- Ensure `playwright_runtime` and `linkedin_session` are both `pass`
Resume parse upload fails:
- Supported formats: `.txt`, `.docx`, `.pdf`
- Ensure backend dependencies are installed from `backend/requirements.txt` (includes `python-docx` and `pypdf`)
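A quick local check of the file extension can rule out the most common cause before uploading. This is an illustrative sketch only; `is_supported_resume` is a hypothetical helper, and the backend's own validation may apply further rules.

```python
from pathlib import Path

SUPPORTED_RESUME_SUFFIXES = {".txt", ".docx", ".pdf"}


def is_supported_resume(filename: str) -> bool:
    """Case-insensitive check of the file extension against the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_RESUME_SUFFIXES
```

Note that legacy `.doc` files fail this check because only `.docx` is listed as supported.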
README preview image not showing on GitHub:
- Ensure the file exists in the repo at `docs/dashboard-preview.png`
- Check the path's letter case (GitHub is case-sensitive)
- Verify it is tracked by git: `git ls-files docs`

AI calls fail:
- Confirm `OPENAI_API_KEY` is set in the shell that runs the backend
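One way to confirm the key is visible is to run a short check from the same terminal that starts the backend. This is a minimal sketch; `openai_key_present` is a hypothetical helper and it only checks that the variable is non-blank, not that the key is valid.

```python
import os
from typing import Mapping


def openai_key_present(env: Mapping[str, str] = os.environ) -> bool:
    """True when OPENAI_API_KEY is set to a non-blank value in `env`."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


print("OPENAI_API_KEY visible:", openai_key_present())
```

If this prints `False` right after running `setx`, open a new terminal: `setx` does not change the environment of terminals that are already open.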
Editor shows unresolved Python imports but backend runs:
- This is usually VS Code using a different interpreter than your runtime shell.
- Optional fix: run `Python: Select Interpreter` and choose `backend/.venv/Scripts/python.exe`.
Reset generated runtime data (PowerShell):

    Remove-Item -Recurse -Force artifacts
    New-Item -ItemType Directory artifacts
    python run-backend.py

- Treat `resume_profile.local.json`, `cover_letter_templates.local.json`, and browser profile data as private
- Keep runtime artifacts out of commits
- Sanitize local personal content before publishing the repository
