omkar-platform-ai/intelligent-candidate-discovery

🎯 Intelligent Candidate Discovery

INDIA.RUNS Hackathon — Track 01: Data & AI Challenge | Redrob AI

Python · FastAPI · Streamlit · License: MIT

Beyond keywords. Beyond filters. The AI brain for modern hiring.

Traditional keyword filters miss the hidden gems—candidates whose true potential is buried in behavioral signals and career trajectories. This system builds a multi-signal predictive ranking engine that finds them.


🏗 Architecture

JD (raw text)
     │
     ▼
[LLM JD Parser]─────────────────────────────────────────┐
(Gemini Flash)                                           │
     │                                                   │
     ▼                                                   ▼
[Embedding Engine]                            [Structured Requirements]
(all-mpnet-base-v2)                           required_skills, seniority,
     │                                        min/max experience, domain
     ▼
[Vector DB ANN Retrieval]  ← Candidate Pool (pre-indexed embeddings)
(ChromaDB + HNSW)
     │
     ▼ Top-200 candidates
[Multi-Signal Ranking Engine]
  ├── Semantic Score     (40%) — embedding cosine similarity
  ├── Skill Match Score  (25%) — required + nice-to-have skill overlap
  ├── Behavioral Score   (20%) — recency decay, intent signals, engagement
  └── Career Score       (15%) — velocity, trajectory, hidden gem bonus
     │
     ▼ Top-20
[LLM Re-Ranker + Explainer]  ← Gemini Flash generates match explanation
(optional, adds explainability)
     │
     ▼
[Ranked Output]
  ranked_output.csv | REST API | Streamlit Demo UI
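
The four weights in the diagram suggest a plain weighted sum over per-signal scores. The sketch below shows that reading under stated assumptions: each signal is already normalized to [0, 1], and the names `WEIGHTS`, `composite_score`, and `rank` are illustrative, not taken from `backend/core/ranking_engine.py`, which may combine the signals differently.

```python
# Weights copied from the architecture diagram; everything else is illustrative.
WEIGHTS = {
    "semantic": 0.40,
    "skill_match": 0.25,
    "behavioral": 0.20,
    "career": 0.15,
}


def composite_score(signals: dict[str, float]) -> float:
    """Weighted sum of the four signal scores (each assumed to be in [0, 1])."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def rank(candidates: dict[str, dict[str, float]], top_k: int = 20) -> list[tuple[str, float]]:
    """Return the top_k (candidate_id, composite_score) pairs, best first."""
    scored = [(cid, composite_score(sig)) for cid, sig in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Because the weights sum to 1.0, a candidate scoring 1.0 on every signal gets a composite of 1.0, which keeps the final score on the same 0-1 scale as its inputs.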

🚀 Quickstart

1. Install dependencies

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Set API key (optional: without a key, the JD parser falls back to keyword extraction)

cp .env.example .env
# Then edit .env and set your Gemini API key:
# GEMINI_API_KEY=your_key_here

3. Run the Streamlit demo

streamlit run frontend/streamlit_app.py

4. Run the API

uvicorn backend.api.main:app --reload
# Swagger UI: http://localhost:8000/docs

5. Run via Docker

docker-compose up

📊 Methodology

Signal 1: Semantic Similarity (40%)

Uses sentence-transformers/all-mpnet-base-v2 to encode both JD and candidate profiles into 768-dimensional vectors. Cosine similarity captures semantic meaning beyond keyword overlap — "built NLP pipelines" matches "natural language processing experience" even with zero shared keywords.
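
As a rough illustration of the similarity step, the sketch below computes cosine similarity in plain Python. The function name `cosine_similarity` is an assumption for this example. In the real pipeline the inputs would be the 768-dimensional vectors produced by the embedding model, not hand-written lists.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)
```

Cosine similarity depends only on direction, so scaling a vector does not change the score. That is why two texts of very different length can still score as near-identical in meaning.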

Signal 2: Skill Match (25%)

Fuzzy skill matching with alias normalization (tf → tensorflow, k8s → kubernetes). Separates required skills (weighted 3x) from nice-to-haves.
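
One way to read this signal is sketched below using only the standard library. The two aliases and the 3x weight come from the description above. The `FUZZY_THRESHOLD` value, the use of `difflib.SequenceMatcher`, and all function names are assumptions made for illustration and may not match the repo's skill matcher.

```python
from difflib import SequenceMatcher
from typing import Iterable

ALIASES = {"tf": "tensorflow", "k8s": "kubernetes"}  # a real alias table would be longer
REQUIRED_WEIGHT = 3.0   # "weighted 3x" relative to nice-to-haves
NICE_WEIGHT = 1.0
FUZZY_THRESHOLD = 0.85  # assumed cut-off for a fuzzy match


def normalize(skill: str) -> str:
    """Lower-case, trim, and expand known aliases."""
    key = skill.strip().lower()
    return ALIASES.get(key, key)


def has_skill(target: str, candidate_skills: set[str]) -> bool:
    """True if any candidate skill is close enough to the target."""
    return any(
        SequenceMatcher(None, target, skill).ratio() >= FUZZY_THRESHOLD
        for skill in candidate_skills
    )


def skill_match_score(
    candidate: Iterable[str],
    required: Iterable[str],
    nice_to_have: Iterable[str] = (),
) -> float:
    """Weighted fraction of JD skills the candidate covers, in [0, 1]."""
    cand = {normalize(s) for s in candidate}
    earned = total = 0.0
    for skills, weight in ((required, REQUIRED_WEIGHT), (nice_to_have, NICE_WEIGHT)):
        for skill in skills:
            total += weight
            if has_skill(normalize(skill), cand):
                earned += weight
    return earned / total if total else 0.0
```

With these weights, a candidate who covers the single required skill but misses the single nice-to-have scores 3 / (3 + 1) = 0.75.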

Signal 3: Behavioral Signals (20%)

Converts activity data into intent scores using exponential decay:

  • Recency: score = exp(-0.023 × days_since_active) — active today = 1.0, active 30 days ago ≈ 0.5
  • Engagement: Applications, profile views, job clicks
  • Intent: Open-to-work status, resume recency, active application behavior
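
The recency formula above can be checked directly. The sketch below implements it and derives the half-life it implies. The function names are illustrative, and how recency is blended with the engagement and intent components is not specified here, so that step is left out.

```python
import math

DECAY_RATE = 0.023  # per day, from the recency formula above


def recency_score(days_since_active: float) -> float:
    """Exponential decay: 1.0 for activity today, falling towards 0 over time."""
    if days_since_active < 0:
        raise ValueError("days_since_active must be non-negative")
    return math.exp(-DECAY_RATE * days_since_active)


def half_life_days() -> float:
    """Days of inactivity after which the recency score halves: ln(2) / rate."""
    return math.log(2) / DECAY_RATE
```

With a rate of 0.023 the half-life is ln(2) / 0.023, roughly 30 days, which matches the "30 days ago ≈ 0.5" figure quoted in the list.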

Signal 4: Career Trajectory (15%)

Detects growth velocity and hidden potential:

  • Velocity: Reached current seniority in fewer years than expected → fast-tracker bonus
  • Progression: Promotions count, distinct title changes
  • Hidden Gem Bonus: Open-source contributions (+6%), side projects (+4%), publications (+5%), recent career switcher (+8%)
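
The bonus percentages above could be applied in several ways. The sketch below assumes the simplest one: each bonus is added to a base career score and the result is capped at 1.0. The flag names and the function `apply_hidden_gem_bonus` are hypothetical, and the actual scorer may stack or cap bonuses differently.

```python
# Bonus sizes from the list above; the flag names are illustrative.
HIDDEN_GEM_BONUSES = {
    "open_source": 0.06,
    "side_projects": 0.04,
    "publications": 0.05,
    "recent_career_switch": 0.08,
}


def apply_hidden_gem_bonus(base_score: float, flags: set[str]) -> float:
    """Add the bonus for every flag the candidate has, capped at 1.0."""
    unknown = flags - HIDDEN_GEM_BONUSES.keys()
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    bonus = sum(HIDDEN_GEM_BONUSES[flag] for flag in flags)
    return min(1.0, base_score + bonus)
```

Under this reading, a candidate with a base career score of 0.50 plus open-source work and publications ends up at 0.50 + 0.06 + 0.05 = 0.61.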

LLM Re-Ranking

Top 20 candidates are sent to Gemini Flash for contextual re-scoring and human-readable match explanations. This adds explainability — each ranked candidate comes with a one-sentence reason.


📁 Output Format

data/ranked_output.csv columns:

| Column | Description |
| --- | --- |
| rank | Final rank (1 = best fit) |
| candidate_id | Unique identifier |
| name | Candidate name |
| current_title | Current job title |
| composite_score | Final weighted score (0-1) |
| semantic_score | JD-profile semantic similarity |
| skill_match_score | Skill overlap score |
| behavioral_score | Activity/intent signal score |
| career_score | Career trajectory score |
| confidence | HIGH / MEDIUM / LOW |
| match_explanation | LLM-generated reason for ranking |
| matched_skills | Skills that matched the JD |
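
For downstream use, the file can be read with the standard `csv` module. The sketch below assumes the header names match the columns listed above exactly. The helper names `load_ranked` and `high_confidence` are illustrative, and `matched_skills` is left as a raw string because its delimiter is not documented here.

```python
import csv
from typing import TextIO

SCORE_COLUMNS = (
    "composite_score",
    "semantic_score",
    "skill_match_score",
    "behavioral_score",
    "career_score",
)


def load_ranked(fileobj: TextIO) -> list[dict]:
    """Parse ranked output rows, converting rank and scores to numbers."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["rank"] = int(row["rank"])
        for col in SCORE_COLUMNS:
            row[col] = float(row[col])
        rows.append(row)
    return sorted(rows, key=lambda r: r["rank"])


def high_confidence(rows: list[dict]) -> list[dict]:
    """Keep only rows whose confidence label is HIGH."""
    return [r for r in rows if r["confidence"] == "HIGH"]
```

A typical call would open the file with `open("data/ranked_output.csv", newline="")` and pass the handle to `load_ranked`.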

🎯 Key Differentiators vs. Baseline

Feature Keyword Filter Cosine Similarity Only This System
Semantic understanding
Behavioral signals
Career trajectory
Hidden gem detection
Explainable ranking
Sub-second retrieval ⚠️ ✅ (HNSW ANN)
Tuneable weights

🛠 Tech Stack

| Component | Technology |
| --- | --- |
| LLM / Parsing | Gemini 1.5 Flash |
| Embeddings | sentence-transformers/all-mpnet-base-v2 |
| Vector DB | ChromaDB (HNSW-based ANN) |
| Backend API | FastAPI + Uvicorn |
| Demo UI | Streamlit |
| Containerization | Docker + docker-compose |

📁 Repository Structure

intelligent-candidate-discovery/
├── README.md
├── CLAUDE.md                      ← AI-assistant context for the repo
├── requirements.txt
├── Makefile                       ← make install · test · run-api · demo · index · rank
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── intelligent_candidate_discovery_architecture.svg              ← MVP pipeline
├── intelligent_candidate_discovery_production_architecture.svg   ← Production target
├── data/
│   └── ranked_output.csv          ← Submission file (output)
├── data_tier/
│   ├── vector_store.py            ← ChromaDB HNSW ANN wrapper
│   └── fixtures/
│       ├── sample_candidates.json
│       └── sample_jd.txt
├── backend/
│   ├── api/main.py                ← FastAPI: POST /rank · /index · GET /health
│   ├── core/
│   │   ├── jd_parser.py           ← Gemini Flash · keyword fallback
│   │   ├── candidate_parser.py    ← Raw record → internal schema
│   │   └── ranking_engine.py      ← 4-signal weighted fusion
│   └── services/
│       └── embedder.py            ← sentence-transformers wrapper
├── frontend/
│   └── streamlit_app.py           ← Demo UI
├── shared/
│   └── config.py                  ← All tuneable parameters
├── docs/
│   ├── architecture.md            ← MVP layers + module map
│   └── methodology.md             ← Per-signal scoring formulas
└── tests/
    ├── test_jd_parser.py
    ├── test_ranking_engine.py
    ├── test_behavioral_scorer.py
    └── test_skill_matcher.py

🗺 MVP vs Production

Two architecture diagrams ship in the repo:

| Diagram | Purpose |
| --- | --- |
| intelligent_candidate_discovery_architecture.svg | The MVP pipeline running in this repo |
| intelligent_candidate_discovery_production_architecture.svg | The production target: 18 MVP components running in a 10-service docker-compose stack · 8 production-shaped stubs · 6 honest deferrals (ATS webhooks, scrapers, HRIS CDC, BigQuery, Feast, drift pipelines) |

See docs/architecture.md for the layer-by-layer walkthrough.


🏆 Built for INDIA.RUNS Hackathon

Track 01: The Data & AI Challenge — Intelligent Candidate Discovery
Redrob AI × Hack2Skill | 42-Day Challenge | ₹10 Lakh Prize Pool

Built by Omkar — solving India's talent discovery problem with AI

About

Multi-signal AI candidate ranking engine — INDIA.RUNS Hackathon, Team Velocity Labs