Satvik-art-creator/Onboarding-Engine

AI-Adaptive Onboarding Engine

Personalized learning paths built from your actual skill gap — not a generic curriculum.

A deterministic, explainable onboarding system that parses a candidate's resume and a target job description, identifies the exact skill gap, and generates a prerequisite-ordered learning roadmap from a fixed course catalog. Every recommendation is traceable. Nothing is hallucinated.


The Problem

Corporate onboarding is static. Every new hire — regardless of experience — receives the same training sequence. Experienced hires waste hours on things they already know. Beginners get dropped into advanced modules without foundations. Neither group reaches role-specific competency efficiently.

The Solution

Upload a resume. Upload a job description (or select a role template). The engine:

  1. Extracts skills and proficiency levels from both documents using a structured LLM call
  2. Normalizes extracted skills against a 104-skill canonical taxonomy (O*NET-grounded)
  3. Computes the exact gap using set operations — deterministic, no heuristics
  4. Generates a prerequisite-ordered roadmap from a fixed 61-course catalog
  5. Attaches a personalized reasoning trace to every recommended course

The LLM is used only for text extraction. All gap analysis and roadmap generation are deterministic code: the same extracted skills always produce the same roadmap.


Key Features

Hybrid Skill Extraction

  • AI-Powered: Groq (Llama 3.3) extracts skill names and proficiency levels (Strong vs Weak) from raw text
  • Deterministic Fallback: Automatically falls back to keyword matching if the LLM is unavailable
  • Proficiency Detection: Phrases like "familiar with Docker" or "some experience with PostgreSQL" are detected as weak proficiency — triggering refresher courses instead of foundational ones
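
The keyword fallback and weak-proficiency detection described above could look roughly like the sketch below. It is illustrative only: `KNOWN_SKILLS`, `WEAK_CUES`, and `extractSkillsByKeyword` are hypothetical names, and the real fallback may use different keyword lists and matching rules.

```javascript
// Hypothetical keyword list and hedge phrases (assumptions, not the project's real data).
const KNOWN_SKILLS = ["Docker", "PostgreSQL", "React", "Node.js"];
const WEAK_CUES = ["familiar with", "some experience with", "basic knowledge of", "exposure to"];

// Escape regex metacharacters so names like "Node.js" match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function extractSkillsByKeyword(text) {
  const cues = WEAK_CUES.map(escapeRegExp).join("|");
  const found = [];
  for (const skill of KNOWN_SKILLS) {
    const name = escapeRegExp(skill);
    // Match the skill as a whole token, case-insensitively.
    const skillPattern = new RegExp(`(^|[^A-Za-z0-9])${name}(?![A-Za-z0-9])`, "i");
    if (!skillPattern.test(text)) continue;
    // A hedge phrase directly before the skill marks it as weak.
    const weakPattern = new RegExp(`(${cues})\\s+${name}(?![A-Za-z0-9])`, "i");
    const weakMatch = text.match(weakPattern);
    found.push({
      name: skill,
      proficiency: weakMatch ? "weak" : "strong",
      evidence: weakMatch ? weakMatch[0] : skill,
    });
  }
  return found;
}

console.log(extractSkillsByKeyword("Familiar with Docker and 3 years of production PostgreSQL."));
```

In this sketch the matched hedge phrase is kept as `evidence`, so a later stage could quote it back to the candidate.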

Deterministic Pathing Engine

  • Normalization: Maps raw text signals to a fixed canonical taxonomy — "React.js", "ReactJS", "react-dom" all resolve to react
  • Gap Analysis: Pure set operations — confirmed (strong match), weak (needs refresher), missing (not present)
  • Prerequisite Expansion: Recursively pulls in foundation courses when gap skills depend on them
  • Topological Sort: Kahn's Algorithm guarantees prerequisites always appear before dependent courses

Personalized Reasoning Traces

Every roadmap course includes a candidate-specific explanation using evidence from the resume itself:

"Your resume mentions 'familiar with Docker' — the role expects containerization at a production level. This course closes the depth gap."

ROI Metrics

  • Hours Saved: Difference between the full 535-hour curriculum and your personalized roadmap
  • Redundancy Reduction: Percentage of standard training skipped because you already know it
  • Role Coverage: Real-time percentage of role requirements already met
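
As a rough illustration of the first two metrics, the sketch below derives them from a roadmap's total hours. The 535-hour figure comes from this README; the `hours` field, the function name, and the assumption that redundancy reduction is measured in hours (rather than in course count) are hypothetical.

```javascript
// Full curriculum duration as stated in this README.
const FULL_CURRICULUM_HOURS = 535;

// roadmapCourses: array of { id, hours } objects (hypothetical shape).
function computeRoiMetrics(roadmapCourses) {
  const roadmapHours = roadmapCourses.reduce((sum, course) => sum + course.hours, 0);
  const hoursSaved = FULL_CURRICULUM_HOURS - roadmapHours;
  // Assumption: redundancy reduction is the share of curriculum hours skipped,
  // rounded to one decimal place.
  const redundancyReductionPercent = Math.round((hoursSaved / FULL_CURRICULUM_HOURS) * 1000) / 10;
  return { roadmapHours, hoursSaved, redundancyReductionPercent };
}

// A 48-hour roadmap, matching the "Significant Gaps" demo scenario total.
console.log(computeRoiMetrics([{ id: "a", hours: 20 }, { id: "b", hours: 28 }]));
// → { roadmapHours: 48, hoursSaved: 487, redundancyReductionPercent: 91 }
```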

Cross-Domain Scalability

The same engine handles both technical and operational roles:

| Domain | Example Roles |
| --- | --- |
| Backend | Junior/Senior Software Engineer |
| Frontend | Frontend Developer |
| Data | Data Analyst |
| DevOps | DevOps Engineer |
| Operations | Warehouse Supervisor, Operations Manager |
| Soft Skills | HR Coordinator |

Demo Scenarios

Three verified test cases included in backend/data/demoScenarios.js:

| Scenario | Coverage | Result |
| --- | --- | --- |
| Strong Match | 100% | NO_GAP_FOUND (candidate is role-ready) |
| Significant Gaps | 42% | 6-course roadmap, 48h, full reasoning traces |
| Uncovered Gaps | 67% | 2 courses + catalog limitation notice |

Skill-Gap Analysis Logic

The engine runs in 4 deterministic stages after LLM extraction:

Stage 1: Normalization. Raw skill strings are matched against a 104-skill canonical taxonomy using alias lookup. Unrecognized skills are flagged and excluded, so they never enter the algorithm.
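
A minimal sketch of alias-based normalization follows. The alias table is a tiny hypothetical sample, and keeping the strongest proficiency when several aliases collapse to one ID is an assumption, not confirmed behavior of the engine.

```javascript
// Hypothetical alias table: lowercase alias -> canonical skill ID.
const ALIASES = new Map([
  ["react", "react"],
  ["react.js", "react"],
  ["reactjs", "react"],
  ["react-dom", "react"],
  ["postgres", "postgresql"],
  ["postgresql", "postgresql"],
]);

// rawSkills: array of { name, proficiency } as produced by extraction.
function normalizeSkills(rawSkills) {
  const normalized = new Map(); // canonical ID -> "strong" | "weak"
  const unrecognized = [];
  for (const { name, proficiency } of rawSkills) {
    const id = ALIASES.get(name.trim().toLowerCase());
    if (id === undefined) {
      unrecognized.push(name); // flagged and excluded from later stages
      continue;
    }
    // Assumption: when aliases collapse to one ID, the strongest proficiency wins.
    if (normalized.get(id) !== "strong") normalized.set(id, proficiency);
  }
  return { normalized, unrecognized };
}

const result = normalizeSkills([
  { name: "React.js", proficiency: "weak" },
  { name: "ReactJS", proficiency: "strong" },
  { name: "COBOL", proficiency: "strong" },
]);
console.log([...result.normalized], result.unrecognized);
// → [ [ 'react', 'strong' ] ] [ 'COBOL' ]
```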

Stage 2: Gap Analysis. Pure set operations on normalized skill IDs:

confirmedSkills = role skills where candidate proficiency = "strong"
weakSkills      = role skills where candidate proficiency = "weak"
gapSkills       = difference(roleSkills, candidateSkills)
coveragePercent = (count(confirmedSkills) + count(weakSkills)) / count(roleSkills) × 100
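
The set operations above can be sketched in a few lines of JavaScript. The function name, the input shapes, and the rounding to a whole percent are assumptions made for illustration.

```javascript
// candidateSkills: Map of canonical skill ID -> "strong" | "weak"
// roleSkills: array of canonical skill IDs required by the role
function analyzeGap(candidateSkills, roleSkills) {
  const confirmedSkills = [];
  const weakSkills = [];
  const gapSkills = [];
  for (const id of new Set(roleSkills)) {
    const proficiency = candidateSkills.get(id);
    if (proficiency === undefined) gapSkills.push(id); // not on the resume
    else if (proficiency === "weak") weakSkills.push(id); // needs a refresher
    else confirmedSkills.push(id); // strong match
  }
  const roleTotal = confirmedSkills.length + weakSkills.length + gapSkills.length;
  // Assumption: an empty role counts as fully covered; result is rounded.
  const coveragePercent =
    roleTotal === 0 ? 100 : Math.round(((confirmedSkills.length + weakSkills.length) / roleTotal) * 100);
  return { confirmedSkills, weakSkills, gapSkills, coveragePercent };
}

const candidate = new Map([["git", "strong"], ["postgresql", "strong"], ["docker", "weak"]]);
const role = ["python", "docker", "kubernetes", "git", "postgresql", "ci-cd"];
console.log(analyzeGap(candidate, role).coveragePercent);
// → 50
```

Candidate skills that the role does not require are simply ignored, which is why only role skill IDs are iterated.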

Stage 3: Roadmap Generation. For each gap or weak skill, the lowest-level matching course is selected from the fixed catalog. Prerequisites are resolved recursively, using a visited set to prevent duplicates. Kahn's algorithm then performs a topological sort, so no course appears before its dependencies.
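
The sketch below shows one way prerequisite expansion and Kahn's algorithm could fit together, starting from courses that have already been selected for the gap skills. The mini catalog and function names are hypothetical, and the alphabetical tie-break is an assumption added to keep the output order reproducible.

```javascript
// Hypothetical mini catalog: course ID -> prerequisite course IDs.
const CATALOG = {
  "linux-basics": [],
  "docker-fundamentals": ["linux-basics"],
  "kubernetes-intro": ["docker-fundamentals"],
  "python-basics": [],
};

// Recursively collect the selected courses plus all of their prerequisites.
function expandPrerequisites(selected, catalog) {
  const visited = new Set(); // prevents duplicates and repeated work
  const visit = (id) => {
    if (visited.has(id)) return;
    visited.add(id);
    for (const prereq of catalog[id]) visit(prereq);
  };
  selected.forEach(visit);
  return visited;
}

// Kahn's algorithm: repeatedly emit courses whose prerequisites are all emitted.
function topologicalSort(courseIds, catalog) {
  const inDegree = new Map();
  const dependents = new Map();
  for (const id of courseIds) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const id of courseIds) {
    for (const prereq of catalog[id]) {
      if (!courseIds.has(prereq)) continue;
      inDegree.set(id, inDegree.get(id) + 1);
      dependents.get(prereq).push(id);
    }
  }
  // Assumption: ties are broken alphabetically for a stable order.
  const ready = [...courseIds].filter((id) => inDegree.get(id) === 0).sort();
  const ordered = [];
  while (ready.length > 0) {
    const id = ready.shift();
    ordered.push(id);
    for (const next of dependents.get(id)) {
      inDegree.set(next, inDegree.get(next) - 1);
      if (inDegree.get(next) === 0) {
        ready.push(next);
        ready.sort();
      }
    }
  }
  if (ordered.length !== courseIds.size) throw new Error("Cycle detected in course prerequisites");
  return ordered;
}

const expanded = expandPrerequisites(["kubernetes-intro", "python-basics"], CATALOG);
console.log(topologicalSort(expanded, CATALOG));
// → [ 'linux-basics', 'docker-fundamentals', 'kubernetes-intro', 'python-basics' ]
```

If the sort emits fewer courses than it was given, some courses depend on each other in a loop, which is the condition a catalog validator would need to reject.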

Stage 4: Reasoning Trace. Every course gets a deterministic, candidate-specific reasoning string generated from templates, not from the LLM. Evidence phrases from the resume are embedded directly in the explanation.
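
A template-based trace generator might look like the following sketch. The wording of the templates, the `status` values, and the function name are hypothetical; the first template is loosely modeled on the Docker example quoted earlier in this README.

```javascript
// Build a reasoning string from fixed templates; no LLM call is involved.
// status: "weak" | "missing" | "prerequisite" (hypothetical values)
function buildReasoning({ courseTitle, skill, status, evidence }) {
  if (status === "weak") {
    return `Your resume mentions '${evidence}', but the role expects ${skill} at a production level. ${courseTitle} closes the depth gap.`;
  }
  if (status === "missing") {
    return `Your resume shows no evidence of ${skill}, which the role requires. ${courseTitle} covers it from the foundations.`;
  }
  // Courses pulled in only by prerequisite expansion.
  return `${courseTitle} is a prerequisite for a later course on your roadmap.`;
}

console.log(
  buildReasoning({
    courseTitle: "Docker in Production",
    skill: "containerization",
    status: "weak",
    evidence: "familiar with Docker",
  })
);
```

Because the output depends only on the template and the extracted evidence phrase, identical inputs yield identical explanations.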


Architecture

Resume + JD (text or file)
         │
         ▼
extractionService.js      ← ONLY LLM call in the system
         │ { resumeSkills: [{name, proficiency}], jdSkills: [...] }
         ▼
adaptivePathingService.js ← fully deterministic from here
         ├── Stage 1: Normalize against 104-skill taxonomy
         ├── Stage 2: Set-operation gap analysis
         ├── Stage 3A: Map gaps to catalog courses
         ├── Stage 3B: Recursive prerequisite expansion
         ├── Stage 3C: Topological sort (Kahn's algorithm)
         └── Stage 4: Candidate-specific reasoning trace
         │
         ▼
MongoDB  ← session persistence (24h TTL)
         │
         ▼
Ordered roadmap with reasoning traces

AI Boundary: The LLM never touches recommendation logic. All course recommendations come exclusively from the seeded catalog via deterministic algorithms. Same input always produces the same roadmap.


Tech Stack

| Layer | Technology | Version |
| --- | --- | --- |
| Frontend | React | 18.2.0 |
| Build Tool | Vite | 5.1.4 |
| Styling | Tailwind CSS | 3.4.1 |
| Icons | Lucide React | 0.383.0 |
| Backend | Node.js | 22.12.0 |
| Framework | Express | 4.18.2 |
| Database | MongoDB | 7.0 |
| ODM | Mongoose | 8.2.1 |
| File Parsing | pdf-parse | 1.1.1 |
| LLM Primary | Groq (Llama 3.3 70B) | llama-3.3-70b-versatile |

Dependencies

Backend

{
  "express": "4.18.2",
  "mongoose": "8.2.1",
  "multer": "1.4.5-lts.1",
  "express-validator": "7.0.1",
  "express-rate-limit": "7.2.0",
  "cors": "2.8.5",
  "helmet": "7.1.0",
  "morgan": "1.10.0",
  "pdf-parse": "1.1.1",
  "openai": "4.29.0",
  "groq-sdk": "0.3.3",
  "uuid": "9.0.0",
  "dotenv": "16.4.1"
}

Frontend

{
  "react": "18.2.0",
  "react-dom": "18.2.0",
  "lucide-react": "0.383.0",
  "vite": "5.1.4",
  "tailwindcss": "3.4.1"
}

Data Sources & Citations

| Asset | Source | Link |
| --- | --- | --- |
| Skill Taxonomy (104 skills) | O*NET 28.2 Database | onetcenter.org/db_releases.html |
| SOC codes referenced | 15-1252, 15-1254, 15-2051, 15-1244, 11-1021, 13-1082, 43-5061, 13-1071 | US Dept of Labor |
| Extraction validation dataset | Kaggle Resume Dataset | kaggle.com/snehaanbhawal/resume-dataset |
| LLM | Groq API, Llama 3.3 70B (open-weight) | groq.com |

Catalog Statistics

| Metric | Value |
| --- | --- |
| Total skills in taxonomy | 104 |
| Active courses in catalog | 61 |
| Role templates | 8 |
| Full curriculum duration | 535 hours |
| Catalog coverage rate | 85.6% (89/104 skills have courses) |
| Max prerequisite chain depth | 3 levels |
| Domains covered | 6 |

Setup Instructions

Prerequisites

  • Node.js 22.12.0+
  • MongoDB 7.0 (local or Docker)
  • Groq API key — free at console.groq.com (no credit card required)

1. Clone the Repository

git clone https://github.com/Satvik-art-creator/Onboarding-Engine.git
cd Onboarding-Engine

2. Install Dependencies

cd backend && npm install
cd ../frontend && npm install

3. Configure Environment

cd backend
cp .env.example .env

Open backend/.env and fill in your API key:

PORT=5000
NODE_ENV=development
MONGODB_URI=mongodb://localhost:27017/onboarding-engine
DB_NAME=onboarding-engine
SEED_ON_STARTUP=true

LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your-key-here
GROQ_MODEL=llama-3.3-70b-versatile

CORS_ORIGIN=http://localhost:5173

4. Start MongoDB

# Using Docker (recommended)
docker run -d -p 27017:27017 --name mongo-onboarding mongo:7.0

# OR local MongoDB
mongod --dbpath ./data/db

5. Seed the Database

cd backend && npm run seed

Expected:

✓ Skills validated: 104 entries, no duplicates
✓ Courses validated: 61 entries, no cycles
✓ Role templates validated: 8 entries
✓ Seed complete — Skills: 104 | Courses: 61 | Templates: 8

6. Run

# Terminal 1
cd backend && npm run dev

# Terminal 2
cd frontend && npm run dev

Open http://localhost:5173


Docker

cp .env.example .env
# Add your GROQ_API_KEY to .env

docker-compose up --build

Open http://localhost:80


API Endpoints

| Method | Route | Description |
| --- | --- | --- |
| POST | /api/analyze | Full pipeline: resume + JD → roadmap |
| GET | /api/results/:sessionId | Retrieve previous analysis |
| GET | /api/catalog/skills | Browse skill taxonomy |
| GET | /api/catalog/courses | Browse course catalog |
| GET | /api/catalog/templates | List role templates |
| GET | /api/health | Health check |

Quick Test

curl http://localhost:5000/api/health

curl -X POST http://localhost:5000/api/analyze \
  -F "mode=text" \
  -F "resume=Experienced React developer with 3 years using Node.js, PostgreSQL, and Git." \
  -F "jobDescription=We need a backend engineer with Python, Docker, Kubernetes, and CI/CD experience."
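
The same request can be issued from Node.js. The sketch below assumes Node.js 18 or later (for the global `fetch` and `FormData`) and a backend listening on port 5000; the field names mirror the curl call above, while the function names are hypothetical and the response shape is not documented here, so the result is returned as parsed JSON without further assumptions.

```javascript
// Build the multipart form with the same fields as the curl example.
function buildAnalyzeForm(resumeText, jobDescriptionText) {
  const form = new FormData();
  form.append("mode", "text");
  form.append("resume", resumeText);
  form.append("jobDescription", jobDescriptionText);
  return form;
}

// POST the form to /api/analyze and return the parsed JSON body.
async function analyze(resumeText, jobDescriptionText, baseUrl = "http://localhost:5000") {
  const response = await fetch(`${baseUrl}/api/analyze`, {
    method: "POST",
    body: buildAnalyzeForm(resumeText, jobDescriptionText),
  });
  if (!response.ok) {
    throw new Error(`Analyze request failed with status ${response.status}`);
  }
  return response.json();
}
```

With the backend running, `analyze("Experienced React developer ...", "We need a backend engineer ...").then(console.log)` prints whatever JSON the server returns.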

Project Structure

onboarding-engine/
├── backend/
│   ├── data/
│   │   ├── skills.seed.json           ← 104-skill O*NET-grounded taxonomy
│   │   ├── courses.seed.json          ← 61-course catalog, max depth 3
│   │   ├── roleTemplates.seed.json    ← 8 role profiles across 6 domains
│   │   └── demoScenarios.js           ← 3 verified demo inputs
│   ├── models/                        ← Mongoose schemas
│   ├── routes/analyze.js              ← API route definitions
│   ├── services/
│   │   ├── extractionService.js       ← LLM call (only AI in the system)
│   │   ├── adaptivePathingService.js  ← 4-stage deterministic algorithm
│   │   └── catalogService.js          ← MongoDB data access layer
│   ├── middleware/                    ← validation, rate limiting
│   ├── utils/fileParser.js            ← PDF/TXT text extraction
│   └── scripts/
│       ├── seed.js                    ← Database seeder with validation
│       └── validateCatalog.js         ← Integrity checker
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── UploadForm.jsx          ← File/text/template input modes
│       │   ├── SkillsPanel.jsx         ← Confirmed/weak/missing breakdown
│       │   ├── MetricsBar.jsx          ← ROI metrics
│       │   ├── RoadmapGraph.jsx        ← Prerequisite dependency graph
│       │   ├── RoadmapView.jsx         ← Ordered course list
│       │   └── RoadmapCard.jsx         ← Course card with reasoning trace
│       └── api/api.js                  ← All fetch calls to backend
├── Dockerfile.backend
├── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
└── README.md

Design Principles

  1. AI for extraction only — LLM never touches recommendation logic
  2. Deterministic output — same resume + same JD = same roadmap, always
  3. Catalog-bound — zero hallucinations, every course exists in the seeded catalog
  4. Prerequisite-aware — foundation courses auto-included when needed
  5. Evidence-based reasoning — traces use actual phrases from the candidate's resume

Validate Before Demo

cd backend && npm run validate
# Expected: ✓ Catalog valid — 104 skills, 61 active courses

Built for the AI-Adaptive Onboarding Hackathon.
