AI-Adaptive Onboarding Engine

Personalized learning paths built from your actual skill gap — not a generic curriculum.

A deterministic, explainable onboarding system that parses a candidate's resume and a target job description, identifies the exact skill gap, and generates a prerequisite-ordered learning roadmap from a fixed course catalog. Every recommendation is traceable. Nothing is hallucinated.

The Problem

Corporate onboarding is static. Every new hire — regardless of experience — receives the same training sequence. Experienced hires waste hours on things they already know. Beginners get dropped into advanced modules without foundations. Neither group reaches role-specific competency efficiently.

The Solution

Upload a resume. Upload a job description (or select a role template). The engine:

Extracts skills and proficiency levels from both documents using a structured LLM call
Normalizes extracted skills against a 104-skill canonical taxonomy (O*NET-grounded)
Computes the exact gap using set operations — deterministic, no heuristics
Generates a prerequisite-ordered roadmap from a fixed 61-course catalog
Attaches a personalized reasoning trace to every recommended course

The LLM is used only for text extraction. All gap analysis and roadmap generation is deterministic code: the same extracted skills always produce the same roadmap.

Key Features

Hybrid Skill Extraction

AI-Powered: Groq (Llama 3.3) extracts skill names and proficiency levels (Strong vs Weak) from raw text
Deterministic Fallback: Automatically falls back to keyword matching if the LLM is unavailable
Proficiency Detection: Phrases like "familiar with Docker" or "some experience with PostgreSQL" are detected as weak proficiency — triggering refresher courses instead of foundational ones
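
To make the fallback and proficiency detection concrete, here is a minimal sketch of a keyword matcher that flags hedged mentions as weak. The keyword table, the hedge-phrase list, and the function name `extractSkillsByKeyword` are illustrative assumptions, not the project's actual code. The plain substring match is a simplification.

```javascript
// Hypothetical keyword table: surface form -> canonical skill ID.
const KEYWORDS = {
  "docker": "docker",
  "postgresql": "postgresql",
  "postgres": "postgresql",
  "node.js": "nodejs",
};

// Hypothetical hedge phrases that signal weak proficiency.
const WEAK_PHRASES = ["familiar with", "some experience with", "basic knowledge of"];

function extractSkillsByKeyword(text) {
  const lower = text.toLowerCase();
  const found = new Map(); // skillId -> { name, proficiency, evidence }
  for (const [keyword, skillId] of Object.entries(KEYWORDS)) {
    const idx = lower.indexOf(keyword);
    if (idx === -1 || found.has(skillId)) continue;
    // Inspect the short window of text right before the keyword for a hedge phrase.
    const before = lower.slice(Math.max(0, idx - 30), idx).trimEnd();
    const hedge = WEAK_PHRASES.find((phrase) => before.endsWith(phrase));
    const mention = text.slice(idx, idx + keyword.length);
    found.set(skillId, {
      name: skillId,
      proficiency: hedge ? "weak" : "strong",
      evidence: hedge ? `${hedge} ${mention}` : mention,
    });
  }
  // Sort by skill ID so the output order never depends on table order.
  return [...found.values()].sort((a, b) => a.name.localeCompare(b.name));
}

const skills = extractSkillsByKeyword(
  "Familiar with Docker, some experience with PostgreSQL, 3 years of Node.js."
);
console.log(skills);
```

In this sketch the evidence phrase is kept alongside each skill so that a later stage can quote it back to the candidate.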

Deterministic Pathing Engine

Normalization: Maps raw text signals to a fixed canonical taxonomy — "React.js", "ReactJS", "react-dom" all resolve to react
Gap Analysis: Pure set operations — confirmed (strong match), weak (needs refresher), missing (not present)
Prerequisite Expansion: Recursively pulls in foundation courses when gap skills depend on them
Topological Sort: Kahn's Algorithm guarantees prerequisites always appear before dependent courses

Personalized Reasoning Traces

Every roadmap course includes a candidate-specific explanation using evidence from the resume itself:

"Your resume mentions 'familiar with Docker' — the role expects containerization at a production level. This course closes the depth gap."

ROI Metrics

Hours Saved: Difference between the full 535-hour curriculum and your personalized roadmap
Redundancy Reduction: Percentage of standard training skipped because you already know it
Role Coverage: Real-time percentage of role requirements already met
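
As a rough illustration of how the first two metrics could be derived, the sketch below uses the 535-hour full curriculum stated in this README. The `durationHours` field name and the redundancy formula (hours skipped divided by full-curriculum hours) are assumptions, not confirmed details of the engine.

```javascript
const FULL_CURRICULUM_HOURS = 535; // full curriculum duration stated in this README

function computeRoiMetrics(roadmapCourses, fullHours = FULL_CURRICULUM_HOURS) {
  const roadmapHours = roadmapCourses.reduce((sum, c) => sum + c.durationHours, 0);
  const hoursSaved = fullHours - roadmapHours;
  // Assumed formula: share of the full curriculum that the candidate skips, to one decimal.
  const redundancyPercent = Math.round((hoursSaved / fullHours) * 1000) / 10;
  return { roadmapHours, hoursSaved, redundancyPercent };
}

// Six hypothetical courses totalling 48 hours.
const metrics = computeRoiMetrics([
  { durationHours: 6 },
  { durationHours: 10 },
  { durationHours: 8 },
  { durationHours: 8 },
  { durationHours: 4 },
  { durationHours: 12 },
]);
console.log(metrics); // { roadmapHours: 48, hoursSaved: 487, redundancyPercent: 91 }
```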

Cross-Domain Scalability

The same engine handles both technical and operational roles:

Domain	Example Roles
Backend	Junior/Senior Software Engineer
Frontend	Frontend Developer
Data	Data Analyst
DevOps	DevOps Engineer
Operations	Warehouse Supervisor, Operations Manager
Soft Skills	HR Coordinator

Demo Scenarios

Three verified test cases are included in backend/data/demoScenarios.js:

Scenario	Coverage	Result
Strong Match	100%	NO_GAP_FOUND — candidate is role-ready
Significant Gaps	42%	6-course roadmap, 48h, full reasoning traces
Uncovered Gaps	67%	2 courses + catalog limitation notice

Skill-Gap Analysis Logic

The engine runs in 4 deterministic stages after LLM extraction:

Stage 1: Normalization. Raw skill strings are matched against a 104-skill canonical taxonomy using alias lookup. Unrecognized skills are flagged and excluded, so they never enter the algorithm.
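
A minimal sketch of alias-based normalization follows. The two-entry taxonomy, the `aliases` field, and the rule that a strong mention wins over a weak one for the same skill are all assumptions made for illustration.

```javascript
// Hypothetical slice of the taxonomy: canonical ID plus accepted aliases.
const TAXONOMY = [
  { id: "react", aliases: ["react", "react.js", "reactjs", "react-dom"] },
  { id: "postgresql", aliases: ["postgresql", "postgres", "psql"] },
];

// Build the alias -> canonical ID lookup once.
const ALIAS_INDEX = new Map();
for (const skill of TAXONOMY) {
  for (const alias of skill.aliases) ALIAS_INDEX.set(alias.toLowerCase(), skill.id);
}

function normalizeSkills(rawSkills) {
  const normalized = new Map(); // canonical ID -> proficiency
  const unrecognized = [];      // flagged, never passed to later stages
  for (const { name, proficiency } of rawSkills) {
    const id = ALIAS_INDEX.get(name.trim().toLowerCase());
    if (!id) {
      unrecognized.push(name);
      continue;
    }
    // Assumption: if a skill is mentioned twice, the stronger signal is kept.
    if (normalized.get(id) !== "strong") normalized.set(id, proficiency);
  }
  return { normalized, unrecognized };
}

const result = normalizeSkills([
  { name: "ReactJS", proficiency: "strong" },
  { name: "react-dom", proficiency: "weak" },
  { name: "Underwater Basket Weaving", proficiency: "strong" },
]);
console.log([...result.normalized], result.unrecognized);
```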

Stage 2: Gap Analysis. Pure set operations on normalized skill IDs:

confirmedSkills = intersection(candidateSkills, roleSkills) where candidate proficiency = "strong"
weakSkills      = role skills where candidate proficiency = "weak"
gapSkills       = difference(roleSkills, candidateSkills)
coveragePercent = (confirmed + weak) / roleTotal × 100
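
The pseudocode above could be realized roughly as follows. This sketch assumes that "confirmed" means strong matches only, so the three buckets partition the role's skills, and that coverage is rounded to a whole percent. Both are assumptions, as is the function name `analyzeGap`.

```javascript
// candidateSkills: Map of skill ID -> "strong" | "weak"
// roleSkills: Set of skill IDs required by the role
function analyzeGap(candidateSkills, roleSkills) {
  const confirmedSkills = [];
  const weakSkills = [];
  const gapSkills = [];
  // Iterate in sorted order so the output never depends on insertion order.
  for (const id of [...roleSkills].sort()) {
    const proficiency = candidateSkills.get(id);
    if (proficiency === "strong") confirmedSkills.push(id);
    else if (proficiency === "weak") weakSkills.push(id);
    else gapSkills.push(id);
  }
  const coveragePercent =
    roleSkills.size === 0
      ? 100
      : Math.round(((confirmedSkills.length + weakSkills.length) / roleSkills.size) * 100);
  return { confirmedSkills, weakSkills, gapSkills, coveragePercent };
}

const gap = analyzeGap(
  new Map([["python", "strong"], ["docker", "weak"], ["git", "strong"]]),
  new Set(["python", "docker", "kubernetes"])
);
console.log(gap);
// { confirmedSkills: ["python"], weakSkills: ["docker"], gapSkills: ["kubernetes"], coveragePercent: 67 }
```

Candidate skills that the role does not require (here `git`) simply drop out, since only role skills are iterated.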

Stage 3: Roadmap Generation. For each gap or weak skill, the lowest-level matching course is selected from the fixed catalog. Prerequisites are resolved recursively, using a visited set to prevent duplicates. Kahn's algorithm then performs the topological sort, so no course appears before its prerequisites.
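
The two graph steps in this stage can be sketched as below: a recursive expansion with a visited set, followed by Kahn's algorithm. The four-course catalog and its IDs are hypothetical, and the alphabetical tie-break among ready courses is an assumption added to keep the order reproducible.

```javascript
// Hypothetical catalog slice: course ID -> prerequisite course IDs.
const CATALOG = {
  "linux-basics": [],
  "docker-fundamentals": ["linux-basics"],
  "kubernetes-intro": ["docker-fundamentals"],
  "ci-cd-pipelines": ["docker-fundamentals"],
};

// Recursively pull in prerequisites; the visited set prevents duplicates.
function expandPrerequisites(selectedIds, catalog) {
  const visited = new Set();
  const visit = (id) => {
    if (visited.has(id)) return;
    visited.add(id);
    for (const prereq of catalog[id]) visit(prereq);
  };
  for (const id of selectedIds) visit(id);
  return visited;
}

// Kahn's algorithm: repeatedly emit courses whose prerequisites are all emitted.
function topologicalSort(courseIds, catalog) {
  const inDegree = new Map();
  const dependents = new Map();
  for (const id of courseIds) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const id of courseIds) {
    for (const prereq of catalog[id]) {
      if (!inDegree.has(prereq)) continue; // prerequisite outside the roadmap
      inDegree.set(id, inDegree.get(id) + 1);
      dependents.get(prereq).push(id);
    }
  }
  const ready = [...courseIds].filter((id) => inDegree.get(id) === 0).sort();
  const ordered = [];
  while (ready.length > 0) {
    const id = ready.shift();
    ordered.push(id);
    for (const dependent of dependents.get(id)) {
      inDegree.set(dependent, inDegree.get(dependent) - 1);
      if (inDegree.get(dependent) === 0) {
        ready.push(dependent);
        ready.sort(); // assumed tie-break: alphabetical by course ID
      }
    }
  }
  if (ordered.length !== inDegree.size) throw new Error("Cycle detected in prerequisites");
  return ordered;
}

const expanded = expandPrerequisites(["kubernetes-intro", "ci-cd-pipelines"], CATALOG);
const roadmap = topologicalSort(expanded, CATALOG);
console.log(roadmap);
// ["linux-basics", "docker-fundamentals", "ci-cd-pipelines", "kubernetes-intro"]
```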

Stage 4: Reasoning Trace. Every course gets a deterministic, candidate-specific reasoning string generated from templates, not by the LLM. Evidence phrases from the resume are embedded directly in the explanation.
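
A template-based trace might look like the sketch below. The status values, the wording of each template, and the function name `buildReasoningTrace` are assumptions. The only point being illustrated is that the text comes from fixed templates filled with resume evidence.

```javascript
// status: "weak" | "missing" | "prerequisite" (assumed labels)
function buildReasoningTrace({ courseTitle, skillName, status, evidence }) {
  switch (status) {
    case "weak":
      return `Your resume mentions '${evidence}', but the role expects ${skillName} at a production level. "${courseTitle}" closes the depth gap.`;
    case "missing":
      return `Your resume does not mention ${skillName}, which the role requires. "${courseTitle}" covers it from the ground up.`;
    case "prerequisite":
      return `"${courseTitle}" is included because a later course in your roadmap depends on ${skillName}.`;
    default:
      throw new Error(`Unknown status: ${status}`);
  }
}

const trace = buildReasoningTrace({
  courseTitle: "Docker in Production", // hypothetical course title
  skillName: "containerization",
  status: "weak",
  evidence: "familiar with Docker",
});
console.log(trace);
```

Because the function is a pure string template, calling it twice with the same arguments yields the same trace.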

Architecture

Resume + JD (text or file)
         │
         ▼
extractionService.js      ← ONLY LLM call in the system
         │ { resumeSkills: [{name, proficiency}], jdSkills: [...] }
         ▼
adaptivePathingService.js ← fully deterministic from here
         ├── Stage 1: Normalize against 104-skill taxonomy
         ├── Stage 2: Set-operation gap analysis
         ├── Stage 3A: Map gaps to catalog courses
         ├── Stage 3B: Recursive prerequisite expansion
         ├── Stage 3C: Topological sort (Kahn's algorithm)
         └── Stage 4:  Candidate-specific reasoning trace
         │
         ▼
MongoDB  ← session persistence (24h TTL)
         │
         ▼
Ordered roadmap with reasoning traces

AI Boundary: The LLM never touches recommendation logic. All course recommendations come exclusively from the seeded catalog via deterministic algorithms. Same input always produces the same roadmap.

Tech Stack

Layer	Technology	Version
Frontend	React	18.2.0
Build Tool	Vite	5.1.4
Styling	Tailwind CSS	3.4.1
Icons	Lucide React	0.383.0
Backend	Node.js	22.12.0
Framework	Express	4.18.2
Database	MongoDB	7.0
ODM	Mongoose	8.2.1
File Parsing	pdf-parse	1.1.1
LLM Primary	Groq — Llama 3.3 70B	llama-3.3-70b-versatile

Dependencies

Backend

{
  "express": "4.18.2",
  "mongoose": "8.2.1",
  "multer": "1.4.5-lts.1",
  "express-validator": "7.0.1",
  "express-rate-limit": "7.2.0",
  "cors": "2.8.5",
  "helmet": "7.1.0",
  "morgan": "1.10.0",
  "pdf-parse": "1.1.1",
  "openai": "4.29.0",
  "groq-sdk": "0.3.3",
  "uuid": "9.0.0",
  "dotenv": "16.4.1"
}

Frontend

{
  "react": "18.2.0",
  "react-dom": "18.2.0",
  "lucide-react": "0.383.0",
  "vite": "5.1.4",
  "tailwindcss": "3.4.1"
}

Data Sources & Citations

Asset	Source	Link
Skill Taxonomy (104 skills)	O*NET 28.2 Database	onetcenter.org/db_releases.html
SOC codes referenced	15-1252, 15-1254, 15-2051, 15-1244, 11-1021, 13-1082, 43-5061, 13-1071	US Dept of Labor
Extraction validation dataset	Kaggle Resume Dataset	kaggle.com/snehaanbhawal/resume-dataset
LLM	Groq API — Llama 3.3 70B (open-weight)	groq.com

Catalog Statistics

Metric	Value
Total skills in taxonomy	104
Active courses in catalog	61
Role templates	8
Full curriculum duration	535 hours
Catalog coverage rate	85.6% (89/104 skills have courses)
Max prerequisite chain depth	3 levels
Domains covered	6
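
For readers who want to recompute two of these figures from seed data, here is a small sketch. The `skillsCovered` and `prerequisites` field names are assumptions about the seed format, and depth is counted so that a course with no prerequisites has depth 1, which may differ from how the project counts levels.

```javascript
// Share of taxonomy skills that at least one course covers, to one decimal.
function coverageRate(skillIds, courses) {
  const covered = new Set(courses.flatMap((c) => c.skillsCovered));
  const hits = skillIds.filter((id) => covered.has(id)).length;
  return Math.round((hits / skillIds.length) * 1000) / 10;
}

// Longest prerequisite chain, assuming the catalog has no cycles.
function maxChainDepth(courses) {
  const prereqs = new Map(courses.map((c) => [c.id, c.prerequisites]));
  const memo = new Map();
  const depth = (id) => {
    if (memo.has(id)) return memo.get(id);
    const parents = prereqs.get(id) ?? [];
    const value = 1 + Math.max(0, ...parents.map((p) => depth(p)));
    memo.set(id, value);
    return value;
  };
  return Math.max(0, ...courses.map((c) => depth(c.id)));
}

// 89 of 104 skills covered, matching the ratio in the table above.
const allSkills = Array.from({ length: 104 }, (_, i) => `skill-${i}`);
const rate = coverageRate(allSkills, [{ skillsCovered: allSkills.slice(0, 89) }]);
console.log(rate); // 85.6

const chainDepth = maxChainDepth([
  { id: "a", prerequisites: [] },
  { id: "b", prerequisites: ["a"] },
  { id: "c", prerequisites: ["b"] },
]);
console.log(chainDepth); // 3
```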

Setup Instructions

Prerequisites

Node.js 22.12.0+
MongoDB 7.0 (local or Docker)
Groq API key — free at console.groq.com (no credit card required)

1. Clone the Repository

git clone https://github.com/Satvik-art-creator/Onboarding-Engine.git
cd Onboarding-Engine

2. Install Dependencies

cd backend && npm install
cd ../frontend && npm install

3. Configure Environment

cd backend
cp .env.example .env

Open backend/.env and fill in your API key:

PORT=5000
NODE_ENV=development
MONGODB_URI=mongodb://localhost:27017/onboarding-engine
DB_NAME=onboarding-engine
SEED_ON_STARTUP=true

LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your-key-here
GROQ_MODEL=llama-3.3-70b-versatile

CORS_ORIGIN=http://localhost:5173

4. Start MongoDB

# Using Docker (recommended)
docker run -d -p 27017:27017 --name mongo-onboarding mongo:7.0

# OR local MongoDB
mongod --dbpath ./data/db

5. Seed the Database

cd backend && npm run seed

Expected:

✓ Skills validated: 104 entries, no duplicates
✓ Courses validated: 61 entries, no cycles
✓ Role templates validated: 8 entries
✓ Seed complete — Skills: 104 | Courses: 61 | Templates: 8
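
The "no duplicates" and "no cycles" checks reported in that output could be implemented along these lines. This is a sketch under assumed field names (`id`, `prerequisites`), not the actual contents of seed.js or validateCatalog.js.

```javascript
// Return every ID that appears more than once.
function findDuplicateIds(entries) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { id } of entries) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}

// Depth-first search with "visiting"/"done" marks.
// Returns one cycle as a path of IDs, or null if the graph is acyclic.
function findCycle(courses) {
  const prereqs = new Map(courses.map((c) => [c.id, c.prerequisites]));
  const state = new Map();
  const path = [];
  const dfs = (id) => {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") return [...path.slice(path.indexOf(id)), id];
    state.set(id, "visiting");
    path.push(id);
    for (const next of prereqs.get(id) ?? []) {
      const cycle = dfs(next);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  };
  for (const id of prereqs.keys()) {
    const cycle = dfs(id);
    if (cycle) return cycle;
  }
  return null;
}

const cyclic = [
  { id: "a", prerequisites: ["b"] },
  { id: "b", prerequisites: ["c"] },
  { id: "c", prerequisites: ["a"] },
];
console.log(findCycle(cyclic)); // ["a", "b", "c", "a"]
console.log(findDuplicateIds([{ id: "x" }, { id: "y" }, { id: "x" }])); // ["x"]
```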

6. Run

# Terminal 1
cd backend && npm run dev

# Terminal 2
cd frontend && npm run dev

Open http://localhost:5173

Docker

cp .env.example .env
# Add your GROQ_API_KEY to .env

docker-compose up --build

Open http://localhost:80

API Endpoints

Method	Route	Description
`POST`	`/api/analyze`	Full pipeline — resume + JD → roadmap
`GET`	`/api/results/:sessionId`	Retrieve previous analysis
`GET`	`/api/catalog/skills`	Browse skill taxonomy
`GET`	`/api/catalog/courses`	Browse course catalog
`GET`	`/api/catalog/templates`	List role templates
`GET`	`/api/health`	Health check

Quick Test

curl http://localhost:5000/api/health

curl -X POST http://localhost:5000/api/analyze \
  -F "mode=text" \
  -F "resume=Experienced React developer with 3 years using Node.js, PostgreSQL, and Git." \
  -F "jobDescription=We need a backend engineer with Python, Docker, Kubernetes, and CI/CD experience."
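
The same request can be sent from Node.js 18 or later, which ships global `fetch` and `FormData`. The sketch below reuses the form field names from the curl command. The helper names are hypothetical, and the response is returned as parsed JSON without assuming its shape.

```javascript
// Build the multipart form with the same fields as the curl example.
function buildAnalyzeForm(resumeText, jobDescriptionText) {
  const form = new FormData();
  form.append("mode", "text");
  form.append("resume", resumeText);
  form.append("jobDescription", jobDescriptionText);
  return form;
}

// POST the form to the analyze endpoint and return the parsed JSON body.
async function analyze(baseUrl, resumeText, jobDescriptionText) {
  const response = await fetch(`${baseUrl}/api/analyze`, {
    method: "POST",
    body: buildAnalyzeForm(resumeText, jobDescriptionText),
  });
  if (!response.ok) {
    throw new Error(`Analyze request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (requires the backend to be running on port 5000):
// analyze("http://localhost:5000", "Experienced React developer...", "We need a backend engineer...")
//   .then((result) => console.log(result))
//   .catch((err) => console.error(err.message));
```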

Project Structure

onboarding-engine/
├── backend/
│   ├── data/
│   │   ├── skills.seed.json           ← 104-skill O*NET-grounded taxonomy
│   │   ├── courses.seed.json          ← 61-course catalog, max depth 3
│   │   ├── roleTemplates.seed.json    ← 8 role profiles across 6 domains
│   │   └── demoScenarios.js           ← 3 verified demo inputs
│   ├── models/                        ← Mongoose schemas
│   ├── routes/analyze.js              ← API route definitions
│   ├── services/
│   │   ├── extractionService.js       ← LLM call (only AI in the system)
│   │   ├── adaptivePathingService.js  ← 4-stage deterministic algorithm
│   │   └── catalogService.js          ← MongoDB data access layer
│   ├── middleware/                    ← validation, rate limiting
│   ├── utils/fileParser.js            ← PDF/TXT text extraction
│   └── scripts/
│       ├── seed.js                    ← Database seeder with validation
│       └── validateCatalog.js         ← Integrity checker
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── UploadForm.jsx          ← File/text/template input modes
│       │   ├── SkillsPanel.jsx         ← Confirmed/weak/missing breakdown
│       │   ├── MetricsBar.jsx          ← ROI metrics
│       │   ├── RoadmapGraph.jsx        ← Prerequisite dependency graph
│       │   ├── RoadmapView.jsx         ← Ordered course list
│       │   └── RoadmapCard.jsx         ← Course card with reasoning trace
│       └── api/api.js                  ← All fetch calls to backend
├── Dockerfile.backend
├── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
└── README.md

Design Principles

AI for extraction only — LLM never touches recommendation logic
Deterministic output — same resume + same JD = same roadmap, always
Catalog-bound — zero hallucinations, every course exists in the seeded catalog
Prerequisite-aware — foundation courses auto-included when needed
Evidence-based reasoning — traces use actual phrases from the candidate's resume

Validate Before Demo

cd backend && npm run validate
# Expected: ✓ Catalog valid — 104 skills, 61 active courses

Built for the AI-Adaptive Onboarding Hackathon.
