A production-ready job search automation system. It analyzes your resume, finds matching jobs across multiple boards, researches each company's product, and drafts personalized outreach messages, all in a single real-time pipeline.
The project consists of a FastAPI backend running a LangGraph workflow and a React frontend that displays real-time updates via Server-Sent Events (SSE).
The "brain" of the project is a directed graph where each node handles a specific step. The graph loops over the jobs it finds, processing one job per iteration.

| Node Name | Purpose | Key Inputs | Key Outputs | LLM Used? |
|---|---|---|---|---|
| resume_analyzer | Extract skills & impact from PDF. | resume_text | resume_profile | ✅ (Sonnet) |
| job_scraper | Find jobs via Tavily/XING & extract JD. | job_title, location | raw_jobs (List) | ✅ (Haiku) |
| match_scorer | Score match (0-100) based on resume. | profile, job[idx] | match_result | ✅ (Sonnet) |
| product_analyzer | Research company & find product gaps. | company, JD | product_analysis | ✅ (Sonnet) |
| hr_finder | Find recruiters/hiring managers on LinkedIn. | company, JD | hr_contact | ✅ (GPT-4o-mini) |
| outreach_gen | Draft Email & LinkedIn messages. | All previous data | outreach_draft | ✅ (Sonnet) |
| results_persister | Save analyzed job to SQLite database. | All job data | completed_results | ❌ (No) |
| advance_job | Move to next job or end search. | current_index | current_index + 1 | ❌ (No) |
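
The per-job loop in the table above can be sketched in plain Python. This is not the LangGraph wiring itself, only an illustration of the control flow under assumptions: the function `run_pipeline`, its callable parameters, and the `fit` field are hypothetical, and the threshold of 60 is taken from the limits listed further down.

```python
from typing import Callable

MATCH_THRESHOLD = 60  # match score needed to trigger research and outreach


def run_pipeline(
    jobs: list[dict],
    score_job: Callable[[dict], float],
    research_and_draft: Callable[[dict], dict],
) -> list[dict]:
    """Plain-Python stand-in for the graph loop (hypothetical, not LangGraph)."""
    completed_results: list[dict] = []
    current_index = 0
    while current_index < len(jobs):
        job = jobs[current_index]
        score = score_job(job)  # match_scorer
        if score >= MATCH_THRESHOLD:
            # product_analyzer, hr_finder and outreach_gen, collapsed into one call
            enriched = research_and_draft(job)
            # results_persister: here we only collect in memory
            completed_results.append({**job, **enriched, "overall_score": score})
        current_index += 1  # advance_job
    return completed_results


# Stub callables stand in for the LLM-backed nodes.
demo_jobs = [{"title": "Backend Engineer", "fit": 82}, {"title": "Chef", "fit": 15}]
results = run_pipeline(
    demo_jobs,
    score_job=lambda job: job["fit"],
    research_and_draft=lambda job: {"outreach_draft": f"Hi, about the {job['title']} role"},
)
print([r["title"] for r in results])
```

Only the first demo job clears the threshold, so only it reaches the research and persistence steps.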
The system is designed to be efficient and "resume-safe." It stores data at three key points:
- Resume Profile: Saved to the `user_profile` table immediately after analysis.
- Job Results: Saved to the `run_results` table at the end of every loop iteration. If the search is interrupted, you still have the results for jobs processed so far.
- Research Caches: `product_cache` and `hr_cache` store company-specific research. If two different jobs are from the same company, we don't pay for research twice.
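
The research cache behaviour can be illustrated with a small get-or-compute helper. This is a minimal sketch, assuming the `product_cache` columns described in the schema section; the function `get_product_analysis`, the `research` callback, and the use of `strip().lower()` for normalization are assumptions, not the project's actual code.

```python
import json
import sqlite3
from typing import Callable


def get_product_analysis(
    conn: sqlite3.Connection,
    company: str,
    research: Callable[[str], dict],
) -> dict:
    """Return cached research for a company; call `research` only on a cache miss."""
    key = company.strip().lower()  # assumed normalization rule
    row = conn.execute(
        "SELECT product_json FROM product_cache WHERE company = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    analysis = research(company)
    conn.execute(
        "INSERT OR REPLACE INTO product_cache (company, product_json, created_at) "
        "VALUES (?, ?, datetime('now'))",
        (key, json.dumps(analysis)),
    )
    conn.commit()
    return analysis


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_cache (company TEXT PRIMARY KEY, product_json TEXT, created_at TEXT)"
)
calls: list[str] = []


def fake_research(company: str) -> dict:
    """Stand-in for the paid LLM and search calls."""
    calls.append(company)
    return {"product": f"{company.strip()} platform"}


first = get_product_analysis(conn, "Acme", fake_research)
second = get_product_analysis(conn, "  ACME ", fake_research)  # second job, same company
print(len(calls))
```

The second lookup is served from the cache, so the stand-in research function runs only once.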
The project uses a local SQLite database (path configured via SQLITE_DB_PATH) for persistent storage and caching.
The `user_profile` table stores the analyzed resume and user metadata. It is a single-row table (ID=1).

| Column | Type | Description |
|---|---|---|
| `id` | INTEGER | Primary Key (always 1). |
| `email` | TEXT | User's email address. |
| `resume_text` | TEXT | Raw extracted text from the PDF. |
| `profile_json` | TEXT | Structured JSON of skills, experience, and goals. |
| `role_category` | TEXT | The detected professional category (e.g., "Software Engineer"). |
| `updated_at` | TEXT | ISO timestamp of the last update. |

The `run_results` table stores every job that passed the match threshold in the current session.

| Column | Type | Description |
|---|---|---|
| `id` | INTEGER | Primary Key (Autoincrement). |
| `result_json` | TEXT | Complete job object (JD, analysis, outreach drafts). |
| `overall_score` | REAL | The match score (0-100). |
| `saved_at` | TEXT | Timestamp when the job was persisted. |

The `product_cache` table caches company research to avoid redundant LLM and Tavily API calls.

| Column | Type | Description |
|---|---|---|
| `company` | TEXT | Primary Key (Normalized lowercase company name). |
| `product_json` | TEXT | Analysis of product name, description, and gaps. |
| `created_at` | TEXT | Timestamp of the initial research. |

The `hr_cache` table caches recruiter contact information for companies.

| Column | Type | Description |
|---|---|---|
| `company` | TEXT | Primary Key (Normalized lowercase company name). |
| `hr_json` | TEXT | Found LinkedIn profiles, names, and titles of HR staff. |
| `created_at` | TEXT | Timestamp of the initial research. |
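
The four tables described above could be created with DDL along these lines. This is a sketch reconstructed from the column descriptions, not the project's actual migration: the `CHECK (id = 1)` constraint, the `SCHEMA` constant, and the `save_profile` helper are assumptions added to illustrate the single-row pattern.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_profile (
    id            INTEGER PRIMARY KEY CHECK (id = 1),  -- assumed guard for the single row
    email         TEXT,
    resume_text   TEXT,
    profile_json  TEXT,
    role_category TEXT,
    updated_at    TEXT
);
CREATE TABLE IF NOT EXISTS run_results (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    result_json   TEXT,
    overall_score REAL,
    saved_at      TEXT
);
CREATE TABLE IF NOT EXISTS product_cache (
    company      TEXT PRIMARY KEY,
    product_json TEXT,
    created_at   TEXT
);
CREATE TABLE IF NOT EXISTS hr_cache (
    company    TEXT PRIMARY KEY,
    hr_json    TEXT,
    created_at TEXT
);
"""


def save_profile(conn: sqlite3.Connection, email: str, resume_text: str,
                 profile: dict, role_category: str) -> None:
    """Write the profile to row 1, overwriting any previous profile."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO user_profile "
        "(id, email, resume_text, profile_json, role_category, updated_at) "
        "VALUES (1, ?, ?, ?, ?, ?)",
        (email, resume_text, json.dumps(profile), role_category, now),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # the real path would come from SQLITE_DB_PATH
conn.executescript(SCHEMA)
save_profile(conn, "a@example.com", "raw text", {"skills": ["python"]}, "Software Engineer")
save_profile(conn, "b@example.com", "raw text v2", {"skills": ["go"]}, "Software Engineer")
row_count = conn.execute("SELECT COUNT(*) FROM user_profile").fetchone()[0]
latest_email = conn.execute("SELECT email FROM user_profile WHERE id = 1").fetchone()[0]
print(row_count, latest_email)
```

Saving a second profile replaces the first, so the table never holds more than one row.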
The frontend uses Zustand for state management and @microsoft/fetch-event-source for SSE.
- Real-time Streaming: The backend pushes events (`node_complete`, `scrape_progress`, `result_saved`) to the frontend.
- No Polling: The UI updates automatically when a job is finished; it doesn't need to ask the database.
- Persisted UI: Your resume data is saved in your browser's `localStorage`, but the job results are ephemeral (cleared on refresh).
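
For reference, a Server-Sent Event is a small text frame: an `event:` line, a `data:` line, and a blank line. The sketch below shows how the backend might encode one; the helper name `format_sse` and the payload fields (`node`, `job_index`) are hypothetical, and only the event names come from this README.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Encode one SSE frame: event name, single-line JSON payload, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


frame = format_sse("node_complete", {"node": "match_scorer", "job_index": 0})
print(frame, end="")
```

In a FastAPI app, frames like this would typically be yielded from a generator wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; whether this project does exactly that is an assumption.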
To keep costs low and results high-quality, the following limits are in place:
- Job Limit: Maximum 8 jobs processed per search.
- Recency: Only jobs posted in the last 30 days are accepted.
- Language: Only English job descriptions are processed (detected before LLM calls).
- Threshold: Only jobs with a match score >= 60 trigger the research & outreach phase.
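
The first three limits amount to a pre-filter that runs before any scoring. A minimal sketch follows, assuming each job carries hypothetical `posted_on` and `language` fields filled in by earlier steps. Whether the cap of 8 is applied before or after the other filters, and whether the 30-day window is inclusive, are also assumptions.

```python
from datetime import date, timedelta

MAX_JOBS = 8
MAX_AGE_DAYS = 30


def apply_limits(jobs: list[dict], today: date) -> list[dict]:
    """Keep recent English postings, then cap the list at MAX_JOBS."""
    cutoff = today - timedelta(days=MAX_AGE_DAYS)
    accepted = [
        job for job in jobs
        if job["posted_on"] >= cutoff and job["language"] == "en"
    ]
    return accepted[:MAX_JOBS]


today = date(2024, 6, 30)
candidates = [
    {"title": "A", "posted_on": date(2024, 6, 20), "language": "en"},
    {"title": "B", "posted_on": date(2024, 4, 1), "language": "en"},   # older than 30 days
    {"title": "C", "posted_on": date(2024, 6, 25), "language": "de"},  # not English
]
kept = apply_limits(candidates, today)
print([job["title"] for job in kept])
```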
- Python 3.11+
- Node.js 18+
- API Keys: OpenRouter, Resend, Tavily, Proxycurl

Start the backend:

    cd backend
    pip install -r requirements.txt
    cp .env.example .env # Add your keys here
    python main.py

Then start the frontend:

    cd frontend
    npm install
    npm run dev

We have dedicated tests for core logic and integration:
- `pytest tests/test_graph_nodes.py`: Node logic tests.
- `pytest tests/test_duplicates.py`: Verifies that duplicate jobs are correctly filtered by URL and Ground Truth (Company/Title).

- LLM Routing: Heavy analysis uses Claude 3.5 Sonnet, while fast data extraction uses Claude 3.5 Haiku. Simple decisions use GPT-4o-mini.
- Multi-Pass Deduplication:
  - Pass 1 (Search Pass): Immediately filters out duplicate URLs and identical company/title strings from raw search results to save on LLM extraction costs.
  - Pass 2 (Ground Truth Pass): After the LLM extracts the actual company and title from the job page, the system performs a second check. This catches the same job posted on different platforms (e.g., Greenhouse vs LinkedIn) with different URLs.
- Stable Job IDs: Job IDs are generated using MD5 hashes of normalized URLs. This ensures that IDs remain identical across server restarts, preventing duplicate results in the frontend if the user refreshes.
- Parallelism: Product research and HR searching happen simultaneously using LangGraph's fan-out capability.
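
The stable IDs and the deduplication checks can be sketched together. The normalization rules here (lowercase scheme and host, drop the fragment and trailing slash, keep the query string) are assumptions, as are the helper names `normalize_url`, `job_id`, and `dedupe`. The same `dedupe` helper could be run once on raw search results and again after the LLM has extracted the real company and title.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed rules: lowercase scheme/host, strip trailing slash and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def job_id(url: str) -> str:
    """MD5 of the normalized URL; deterministic, so it survives server restarts."""
    digest = hashlib.md5(normalize_url(url).encode("utf-8"), usedforsecurity=False)
    return digest.hexdigest()


def dedupe(jobs: list[dict]) -> list[dict]:
    """Drop jobs whose URL-based ID or (company, title) pair was already seen."""
    seen_ids: set[str] = set()
    seen_roles: set[tuple[str, str]] = set()
    unique: list[dict] = []
    for job in jobs:
        jid = job_id(job["url"])
        role = (job["company"].strip().lower(), job["title"].strip().lower())
        if jid in seen_ids or role in seen_roles:
            continue
        seen_ids.add(jid)
        seen_roles.add(role)
        unique.append({**job, "id": jid})
    return unique


found = [
    {"url": "https://boards.greenhouse.io/acme/jobs/1", "company": "Acme", "title": "Backend Engineer"},
    {"url": "https://boards.greenhouse.io/acme/jobs/1/", "company": "Acme", "title": "Backend Engineer"},
    {"url": "https://www.linkedin.com/jobs/view/99", "company": "ACME", "title": "backend engineer"},
]
unique_jobs = dedupe(found)
print(len(unique_jobs))
```

The second entry is caught by its URL-based ID and the third by its company/title pair, so a single job remains. Unlike Python's built-in `hash()`, which is randomized per process for strings, an MD5 digest gives the same ID on every run.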