🎯 Intelligent Candidate Discovery

INDIA.RUNS Hackathon — Track 01: Data & AI Challenge | Redrob AI

Beyond keywords. Beyond filters. The AI brain for modern hiring.

Keyword filters miss hidden gems: candidates whose potential shows up in behavioral signals and career trajectories rather than in matching terms. This system finds them with a multi-signal ranking engine that combines semantic similarity, skill overlap, behavioral activity, and career trajectory.

🏗 Architecture

JD (raw text)
     │
     ▼
[LLM JD Parser]─────────────────────────────────────────┐
(Gemini Flash)                                           │
     │                                                   │
     ▼                                                   ▼
[Embedding Engine]                            [Structured Requirements]
(all-mpnet-base-v2)                           required_skills, seniority,
     │                                        min/max experience, domain
     ▼
[Vector DB ANN Retrieval]  ← Candidate Pool (pre-indexed embeddings)
(ChromaDB + HNSW)
     │
     ▼ Top-200 candidates
[Multi-Signal Ranking Engine]
  ├── Semantic Score     (40%) — embedding cosine similarity
  ├── Skill Match Score  (25%) — required + nice-to-have skill overlap
  ├── Behavioral Score   (20%) — recency decay, intent signals, engagement
  └── Career Score       (15%) — velocity, trajectory, hidden gem bonus
     │
     ▼ Top-20
[LLM Re-Ranker + Explainer]  ← Gemini Flash generates match explanation
(optional, adds explainability)
     │
     ▼
[Ranked Output]
  ranked_output.csv | REST API | Streamlit Demo UI
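
The weighted fusion in the Multi-Signal Ranking Engine can be sketched as a plain weighted sum followed by a top-k cut. This is a minimal illustration, assuming each signal score already lies in [0, 1]; the names `WEIGHTS`, `composite_score`, and `rank_top_k` are hypothetical and not taken from `ranking_engine.py`.

```python
# Weights as listed in the architecture diagram; the dictionary keys are illustrative.
WEIGHTS = {
    "semantic": 0.40,
    "skill_match": 0.25,
    "behavioral": 0.20,
    "career": 0.15,
}


def composite_score(signals: dict) -> float:
    """Weighted sum of the four signal scores (each assumed to be in [0, 1])."""
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


def rank_top_k(candidates: list, k: int = 20) -> list:
    """Attach a composite score to each candidate and keep the k best."""
    scored = [
        {**cand, "composite_score": composite_score(cand["signals"])}
        for cand in candidates
    ]
    scored.sort(key=lambda cand: cand["composite_score"], reverse=True)
    return scored[:k]


example = {"semantic": 0.8, "skill_match": 0.6, "behavioral": 0.5, "career": 0.4}
print(round(composite_score(example), 2))  # 0.32 + 0.15 + 0.10 + 0.06, about 0.63
```

Because the weights sum to 1.0, the composite score stays in the same 0-1 range as its inputs.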

🚀 Quickstart

1. Install dependencies

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Set API key (optional; without it the system falls back to keyword-based JD parsing)

cp .env.example .env
# Then edit .env and set your Gemini API key:
#   GEMINI_API_KEY=your_key_here

3. Run the Streamlit demo

streamlit run frontend/streamlit_app.py

4. Run the API

uvicorn backend.api.main:app --reload
# Swagger UI: http://localhost:8000/docs
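
Once the API is up, a quick way to confirm it is reachable is to call the `GET /health` endpoint listed in the repository structure. The sketch below uses only the standard library and checks the HTTP status code; the response body format is not documented here, so it is deliberately not parsed. The helper names `endpoint` and `is_healthy` are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # default uvicorn address used in the Quickstart


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a path without doubling or dropping slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(endpoint("/health", base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, and timeouts
        return False
```

Calling `is_healthy()` from a Python shell after starting uvicorn should return `True` if the server is listening on the default port.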

5. Run via Docker

docker-compose up

📊 Methodology

Signal 1: Semantic Similarity (40%)

Uses sentence-transformers/all-mpnet-base-v2 to encode both JD and candidate profiles into 768-dimensional vectors. Cosine similarity captures semantic meaning beyond keyword overlap — "built NLP pipelines" matches "natural language processing experience" even with zero shared keywords.
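
Cosine similarity itself is a short computation. The sketch below uses tiny hand-made vectors as stand-ins for the 768-dimensional embeddings; in the real pipeline the vectors would come from the embedding model (for example via the sentence-transformers `SentenceTransformer(...).encode(...)` call), which is left out here so the example runs without extra dependencies.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (illustrative values only).
jd_vector = [0.9, 0.1, 0.3]
close_profile = [0.8, 0.2, 0.4]
distant_profile = [0.0, 1.0, 0.0]

print(cosine_similarity(jd_vector, close_profile) > cosine_similarity(jd_vector, distant_profile))
```

The score depends only on the direction of the vectors, not their length, which is why texts of very different sizes can still compare as similar.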

Signal 2: Skill Match (25%)

Fuzzy skill matching with alias normalization (tf → tensorflow, k8s → kubernetes). Separates required skills (weighted 3x) from nice-to-haves.
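
One way to picture this signal is a weighted coverage ratio: normalize aliases, fuzzy-match each JD skill against the candidate's skills, and count required matches three times as heavily as nice-to-haves. The sketch below is an assumption-laden illustration: the alias table holds only the two examples above, the fuzzy matcher is the standard library's `difflib.SequenceMatcher`, and the 0.85 threshold is a made-up value, not the project's setting.

```python
from difflib import SequenceMatcher

ALIASES = {"tf": "tensorflow", "k8s": "kubernetes"}  # the two aliases named above
REQUIRED_WEIGHT = 3.0   # required skills count 3x
NICE_WEIGHT = 1.0
FUZZY_THRESHOLD = 0.85  # assumed similarity cutoff for a fuzzy match


def normalize(skill: str) -> str:
    """Lowercase, trim, and expand known aliases."""
    key = skill.strip().lower()
    return ALIASES.get(key, key)


def has_skill(target: str, candidate_skills) -> bool:
    """True if any candidate skill is close enough to the target skill."""
    return any(
        SequenceMatcher(None, target, skill).ratio() >= FUZZY_THRESHOLD
        for skill in candidate_skills
    )


def skill_match_score(candidate_skills, required, nice_to_have=()) -> float:
    """Weighted share of JD skills covered by the candidate, in [0, 1]."""
    cand = {normalize(s) for s in candidate_skills}
    req = {normalize(s) for s in required}
    nice = {normalize(s) for s in nice_to_have}
    total = REQUIRED_WEIGHT * len(req) + NICE_WEIGHT * len(nice)
    if total == 0:
        return 0.0
    earned = REQUIRED_WEIGHT * sum(has_skill(s, cand) for s in req)
    earned += NICE_WEIGHT * sum(has_skill(s, cand) for s in nice)
    return earned / total


# Both required skills match via aliases; the nice-to-have is missing: 6 of 7 weighted points.
print(skill_match_score(["TF", "k8s"], ["tensorflow", "kubernetes"], ["docker"]))
```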

Signal 3: Behavioral Signals (20%)

Converts activity data into intent scores using exponential decay:

Recency: score = exp(-0.023 × days_since_active), a half-life of about 30 days: active today = 1.0, active 30 days ago ≈ 0.5
Engagement: Applications, profile views, job clicks
Intent: Open-to-work status, resume recency, active application behavior
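
The recency formula above translates directly into code. This sketch covers only the recency component, since the way engagement and intent are combined is not specified here; the function name and the clamping of negative day counts are assumptions.

```python
import math

DECAY_RATE = 0.023  # per day, from the recency formula above


def recency_score(days_since_active: float) -> float:
    """Exponential decay: 1.0 for activity today, falling toward 0 as inactivity grows."""
    days = max(days_since_active, 0.0)  # assumption: treat negative inputs as "today"
    return math.exp(-DECAY_RATE * days)


half_life_days = math.log(2) / DECAY_RATE  # roughly 30 days

for days in (0, 30, 90):
    print(days, round(recency_score(days), 2))
```

With this decay rate a candidate who was last active three months ago still scores well above zero, so stale profiles are down-weighted rather than excluded.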

Signal 4: Career Trajectory (15%)

Detects growth velocity and hidden potential:

Velocity: Reached current seniority in fewer years than expected → fast-tracker bonus
Progression: Promotions count, distinct title changes
Hidden Gem Bonus: Open-source contributions (+6%), side projects (+4%), publications (+5%), recent career switcher (+8%)
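
As an illustration of how the bonuses could be applied, the sketch below treats each percentage as additive points on a base trajectory score and caps the result at 1.0. Both of those choices are assumptions, as are the flag names and the `is_fast_tracker` helper; the actual formulas are described in docs/methodology.md.

```python
# Bonus values from the list above; the flag names are illustrative.
HIDDEN_GEM_BONUSES = {
    "open_source": 0.06,
    "side_projects": 0.04,
    "publications": 0.05,
    "recent_career_switch": 0.08,
}


def is_fast_tracker(years_to_current_level: float, expected_years: float) -> bool:
    """Velocity check: reached the current seniority faster than expected."""
    return years_to_current_level < expected_years


def hidden_gem_bonus(flags: dict) -> float:
    """Sum the bonuses for every flag that is set."""
    return sum(bonus for name, bonus in HIDDEN_GEM_BONUSES.items() if flags.get(name))


def career_score(base_score: float, flags: dict) -> float:
    """Base trajectory score plus bonuses, capped at 1.0 (additive model is an assumption)."""
    return min(1.0, base_score + hidden_gem_bonus(flags))


print(round(career_score(0.5, {"open_source": True, "publications": True}), 2))  # 0.5 + 0.06 + 0.05
```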

LLM Re-Ranking

The top 20 candidates from the ranking engine are sent to Gemini Flash for contextual re-scoring and a human-readable match explanation. This step is optional; when it runs, each re-ranked candidate comes with a one-sentence reason.
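
The shortlist-and-prompt step might look roughly like the sketch below. It only builds the text that would be sent to the model; the actual Gemini call is omitted, and the prompt wording, the function name, and the candidate fields used are all hypothetical.

```python
def build_rerank_prompt(jd_text: str, candidates: list, top_n: int = 20) -> str:
    """Keep the top_n candidates by composite score and format them into one prompt."""
    shortlist = sorted(
        candidates, key=lambda cand: cand["composite_score"], reverse=True
    )[:top_n]
    lines = [
        f"{i}. id={cand['candidate_id']} | {cand['current_title']} | "
        f"score={cand['composite_score']:.2f}"
        for i, cand in enumerate(shortlist, start=1)
    ]
    return (
        "Job description:\n"
        f"{jd_text}\n\n"
        "Candidates:\n"
        + "\n".join(lines)
        + "\n\nFor each candidate, return a score from 0 to 1 "
        "and a one-sentence reason for the match."
    )
```

Sending only a short, pre-ranked list keeps the LLM call small and leaves the bulk of the ranking to the cheaper signal scores.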

📁 Output Format

data/ranked_output.csv columns:

Column	Description
rank	Final rank (1 = best fit)
candidate_id	Unique identifier
name	Candidate name
current_title	Current job title
composite_score	Final weighted score (0-1)
semantic_score	JD-profile semantic similarity
skill_match_score	Skill overlap score
behavioral_score	Activity/intent signal score
career_score	Career trajectory score
confidence	HIGH / MEDIUM / LOW
match_explanation	LLM-generated reason for ranking
matched_skills	Skills that matched the JD
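
To consume the output, the standard `csv` module is enough. The sketch below assumes a comma-separated file with a header row using the column names above; `load_ranked` and `high_confidence` are hypothetical helper names.

```python
import csv
from typing import TextIO


def load_ranked(fh: TextIO) -> list:
    """Read ranked rows from an open CSV file, converting rank and score to numbers."""
    rows = []
    for row in csv.DictReader(fh):
        row["rank"] = int(row["rank"])
        row["composite_score"] = float(row["composite_score"])
        rows.append(row)
    return sorted(rows, key=lambda r: r["rank"])


def high_confidence(rows: list) -> list:
    """Keep only the rows whose confidence column is HIGH."""
    return [row for row in rows if row["confidence"] == "HIGH"]
```

Typical usage would be to open the file with `open("data/ranked_output.csv", newline="")` and pass the handle to `load_ranked`.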

🎯 Key Differentiators vs. Baseline

Feature	Keyword Filter	Cosine Similarity Only	This System
Semantic understanding	❌	✅	✅
Behavioral signals	❌	❌	✅
Career trajectory	❌	❌	✅
Hidden gem detection	❌	❌	✅
Explainable ranking	❌	❌	✅
Sub-second retrieval	✅	⚠️	✅ (HNSW ANN)
Tuneable weights	❌	❌	✅

🛠 Tech Stack

Component	Technology
LLM / Parsing	Gemini 1.5 Flash
Embeddings	sentence-transformers/all-mpnet-base-v2
Vector DB	ChromaDB (HNSW-based ANN)
Backend API	FastAPI + Uvicorn
Demo UI	Streamlit
Containerization	Docker + docker-compose

📁 Repository Structure

intelligent-candidate-discovery/
├── README.md
├── CLAUDE.md                      ← AI-assistant context for the repo
├── requirements.txt
├── Makefile                       ← make install · test · run-api · demo · index · rank
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── intelligent_candidate_discovery_architecture.svg              ← MVP pipeline
├── intelligent_candidate_discovery_production_architecture.svg   ← Production target
├── data/
│   └── ranked_output.csv          ← Submission file (output)
├── data_tier/
│   ├── vector_store.py            ← ChromaDB HNSW ANN wrapper
│   └── fixtures/
│       ├── sample_candidates.json
│       └── sample_jd.txt
├── backend/
│   ├── api/main.py                ← FastAPI: POST /rank · /index · GET /health
│   ├── core/
│   │   ├── jd_parser.py           ← Gemini Flash · keyword fallback
│   │   ├── candidate_parser.py    ← Raw record → internal schema
│   │   └── ranking_engine.py      ← 4-signal weighted fusion
│   └── services/
│       └── embedder.py            ← sentence-transformers wrapper
├── frontend/
│   └── streamlit_app.py           ← Demo UI
├── shared/
│   └── config.py                  ← All tuneable parameters
├── docs/
│   ├── architecture.md            ← MVP layers + module map
│   └── methodology.md             ← Per-signal scoring formulas
└── tests/
    ├── test_jd_parser.py
    ├── test_ranking_engine.py
    ├── test_behavioral_scorer.py
    └── test_skill_matcher.py

🗺 MVP vs Production

Two architecture diagrams ship in the repo:

Diagram	Purpose
`intelligent_candidate_discovery_architecture.svg`	The MVP pipeline running in this repo
`intelligent_candidate_discovery_production_architecture.svg`	The production target: 18 MVP components running in a 10-service `docker-compose` stack · 8 production-shaped stubs · 6 honest deferrals (ATS webhooks, scrapers, HRIS CDC, BigQuery, Feast, drift pipelines)

See docs/architecture.md for the layer-by-layer walkthrough.

🏆 Built for INDIA.RUNS Hackathon

Track 01: The Data & AI Challenge — Intelligent Candidate Discovery
Redrob AI × Hack2Skill | 42-Day Challenge | ₹10 Lakh Prize Pool

Built by Omkar — solving India's talent discovery problem with AI
