Personalized learning paths built from your actual skill gap — not a generic curriculum.
A deterministic, explainable onboarding system that parses a candidate's resume and a target job description, identifies the exact skill gap, and generates a prerequisite-ordered learning roadmap from a fixed course catalog. Every recommendation is traceable. Nothing is hallucinated.
Corporate onboarding is static. Every new hire — regardless of experience — receives the same training sequence. Experienced hires waste hours on things they already know. Beginners get dropped into advanced modules without foundations. Neither group reaches role-specific competency efficiently.
Upload a resume. Upload a job description (or select a role template). The engine:
- Extracts skills and proficiency levels from both documents using a structured LLM call
- Normalizes extracted skills against a 104-skill canonical taxonomy (O*NET-grounded)
- Computes the exact gap using set operations — deterministic, no heuristics
- Generates a prerequisite-ordered roadmap from a fixed 61-course catalog
- Attaches a personalized reasoning trace to every recommended course
The LLM is used only for text extraction. All gap analysis and roadmap generation are deterministic code: the same extracted skills always produce the same roadmap.
- AI-Powered: Groq (Llama 3.3) extracts skill names and proficiency levels (Strong vs Weak) from raw text
- Deterministic Fallback: Automatically falls back to keyword matching if the LLM is unavailable
- Proficiency Detection: Phrases like "familiar with Docker" or "some experience with PostgreSQL" are detected as weak proficiency — triggering refresher courses instead of foundational ones
- Normalization: Maps raw text signals to a fixed canonical taxonomy, so "React.js", "ReactJS", and "react-dom" all resolve to react
- Gap Analysis: Pure set operations that classify each role skill as confirmed (strong match), weak (needs refresher), or missing (not present)
- Prerequisite Expansion: Recursively pulls in foundation courses when gap skills depend on them
- Topological Sort: Kahn's Algorithm guarantees prerequisites always appear before dependent courses
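As a rough illustration of the deterministic fallback and proficiency detection listed above, the sketch below matches keywords and checks whether a hedging phrase directly precedes the skill. The keyword table, the cue phrases, and the function names are hypothetical and are not taken from extractionService.js.

```javascript
// Hypothetical keyword table: canonical skill ID -> surface forms to search for.
const KEYWORDS = {
  docker: ["docker"],
  postgresql: ["postgresql", "postgres"],
  react: ["react.js", "reactjs", "react"],
};

// Assumed hedging phrases; when one directly precedes a skill, mark it as weak.
const WEAK_CUES = ["familiar with", "some experience with", "basic knowledge of"];

// Escape regex metacharacters so forms like "react.js" match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function extractSkillsByKeyword(text) {
  const lower = text.toLowerCase();
  const found = [];
  for (const [id, forms] of Object.entries(KEYWORDS)) {
    for (const form of forms) {
      // Require non-alphanumeric boundaries so "react" does not match inside "reactive".
      const pattern = new RegExp(`(^|[^a-z0-9])${escapeRegExp(form)}(?![a-z0-9])`);
      const match = pattern.exec(lower);
      if (!match) continue;
      const start = match.index + match[1].length;
      // Look at the 30 characters before the skill for a hedging phrase.
      const before = lower.slice(Math.max(0, start - 30), start).trimEnd();
      const weak = WEAK_CUES.some((cue) => before.endsWith(cue));
      found.push({ name: id, proficiency: weak ? "weak" : "strong" });
      break; // one hit per skill is enough
    }
  }
  return found;
}

console.log(extractSkillsByKeyword("Familiar with Docker. Built APIs on PostgreSQL for 3 years."));
```

Only the first occurrence of each skill is inspected here, which keeps the sketch short; a fuller version would likely weigh every mention.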
Every roadmap course includes a candidate-specific explanation using evidence from the resume itself:
"Your resume mentions 'familiar with Docker' — the role expects containerization at a production level. This course closes the depth gap."
- Hours Saved: Difference between the full 535-hour curriculum and your personalized roadmap
- Redundancy Reduction: Percentage of standard training skipped because you already know it
- Role Coverage: Real-time percentage of role requirements already met
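The first two metrics above reduce to simple arithmetic over course hours. The sketch below assumes redundancy is measured in hours rather than in course count, and that each roadmap course carries an `hours` field; both are assumptions, and `computeRoiMetrics` is a hypothetical name.

```javascript
const FULL_CURRICULUM_HOURS = 535; // full curriculum duration from the catalog stats

function computeRoiMetrics(roadmapCourses, fullHours = FULL_CURRICULUM_HOURS) {
  // Total hours the candidate actually has to spend.
  const roadmapHours = roadmapCourses.reduce((sum, course) => sum + course.hours, 0);
  const hoursSaved = fullHours - roadmapHours;
  // Share of the standard curriculum skipped, rounded to one decimal place.
  const redundancyReduction = Math.round((hoursSaved / fullHours) * 1000) / 10;
  return { roadmapHours, hoursSaved, redundancyReduction };
}

// Six hypothetical 8-hour courses, matching the 48h total of the "Significant Gaps" scenario.
const roadmap = Array.from({ length: 6 }, () => ({ hours: 8 }));
console.log(computeRoiMetrics(roadmap));
// → { roadmapHours: 48, hoursSaved: 487, redundancyReduction: 91 }
```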
The same engine handles both technical and operational roles:
| Domain | Example Roles |
|---|---|
| Backend | Junior/Senior Software Engineer |
| Frontend | Frontend Developer |
| Data | Data Analyst |
| DevOps | DevOps Engineer |
| Operations | Warehouse Supervisor, Operations Manager |
| Soft Skills | HR Coordinator |
Three verified test cases are included in backend/data/demoScenarios.js:
| Scenario | Coverage | Result |
|---|---|---|
| Strong Match | 100% | NO_GAP_FOUND — candidate is role-ready |
| Significant Gaps | 42% | 6-course roadmap, 48h, full reasoning traces |
| Uncovered Gaps | 67% | 2 courses + catalog limitation notice |
The engine runs in 4 deterministic stages after LLM extraction:
Stage 1: Normalization. Raw skill strings are matched against a 104-skill canonical taxonomy using alias lookup. Unrecognized skills are flagged and excluded, so they never enter the algorithm.
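Alias lookup of this kind can be sketched with a single map from lowercased surface forms to canonical IDs. The taxonomy entries, the field names (`id`, `aliases`), and the rule that a strong signal wins over a weak one for the same skill are all assumptions for illustration, not the shape of skills.seed.json.

```javascript
// Hypothetical taxonomy slice; the real seed file holds 104 skills.
const TAXONOMY = [
  { id: "react", aliases: ["React.js", "ReactJS", "react-dom"] },
  { id: "postgresql", aliases: ["Postgres", "PostgreSQL"] },
];

// Build one lookup table: lowercased alias -> canonical skill ID.
function buildAliasIndex(taxonomy) {
  const index = new Map();
  for (const skill of taxonomy) {
    for (const form of [skill.id, ...skill.aliases]) {
      index.set(form.trim().toLowerCase(), skill.id);
    }
  }
  return index;
}

function normalizeSkills(rawSkills, aliasIndex) {
  const normalized = new Map(); // canonical ID -> proficiency
  const unrecognized = [];      // flagged and excluded from later stages
  for (const { name, proficiency } of rawSkills) {
    const id = aliasIndex.get(name.trim().toLowerCase());
    if (!id) {
      unrecognized.push(name);
      continue;
    }
    // Assumption: once a skill is seen as strong, a later weak mention does not downgrade it.
    if (normalized.get(id) !== "strong") normalized.set(id, proficiency);
  }
  return { normalized, unrecognized };
}

const index = buildAliasIndex(TAXONOMY);
const result = normalizeSkills(
  [
    { name: "React.js", proficiency: "strong" },
    { name: "ReactJS", proficiency: "weak" },
    { name: "COBOL", proficiency: "strong" },
  ],
  index
);
console.log([...result.normalized], result.unrecognized);
```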
Stage 2: Gap Analysis. Pure set operations on normalized skill IDs:
confirmedSkills = intersection(strongCandidateSkills, roleSkills)
weakSkills = intersection(weakCandidateSkills, roleSkills)
gapSkills = difference(roleSkills, candidateSkills)
coveragePercent = (|confirmedSkills| + |weakSkills|) / |roleSkills| × 100
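The set operations above can be sketched in a few lines of JavaScript. The function name, the input shapes (a `Map` of candidate proficiencies and a `Set` of role skills), and the rounding are assumptions; the classification follows the confirmed, weak, and missing buckets described earlier.

```javascript
// candidateSkills: Map of canonical skill ID -> "strong" | "weak"
// roleSkills: Set of canonical skill IDs required by the role
function analyzeGap(candidateSkills, roleSkills) {
  const confirmed = [];
  const weak = [];
  const missing = [];
  // Sort IDs so the output order never depends on insertion order.
  for (const id of [...roleSkills].sort()) {
    const proficiency = candidateSkills.get(id);
    if (proficiency === "strong") confirmed.push(id);
    else if (proficiency === "weak") weak.push(id);
    else missing.push(id);
  }
  const covered = confirmed.length + weak.length;
  // Assumption: an empty role profile reports 0% rather than dividing by zero.
  const coveragePercent =
    roleSkills.size === 0 ? 0 : Math.round((covered / roleSkills.size) * 100);
  return { confirmed, weak, missing, coveragePercent };
}

const candidate = new Map([
  ["react", "strong"],
  ["git", "strong"],
  ["docker", "weak"],
]);
const role = new Set(["python", "docker", "kubernetes", "git"]);
console.log(analyzeGap(candidate, role));
// → { confirmed: ["git"], weak: ["docker"], missing: ["kubernetes", "python"], coveragePercent: 50 }
```

Skills the candidate has but the role does not require (here, react) simply drop out of the result.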
Stage 3: Roadmap Generation. For each gap or weak skill, the lowest-level matching course is selected from the fixed catalog. Prerequisites are resolved recursively using a visited set to prevent duplicates. Kahn's algorithm then performs a topological sort, so no course appears before its dependencies.
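A minimal sketch of the two graph steps, recursive prerequisite expansion with a visited set followed by Kahn's algorithm, is shown below. The course IDs, the `prerequisites` field, and the alphabetical tie-break between courses that become available at the same time are assumptions; the actual catalog schema may differ.

```javascript
// Hypothetical catalog slice: course ID -> direct prerequisites.
const CATALOG = {
  "linux-basics": { prerequisites: [] },
  "docker-fundamentals": { prerequisites: ["linux-basics"] },
  "kubernetes-intro": { prerequisites: ["docker-fundamentals"] },
  "ci-cd-pipelines": { prerequisites: ["docker-fundamentals"] },
};

// Recursively pull in prerequisites; the visited set prevents duplicates.
function expandPrerequisites(courseIds, catalog, visited = new Set()) {
  for (const id of courseIds) {
    if (visited.has(id)) continue;
    visited.add(id);
    expandPrerequisites(catalog[id].prerequisites, catalog, visited);
  }
  return visited;
}

// Kahn's algorithm: repeatedly emit a course whose prerequisites are all emitted.
function topologicalSort(courseIds, catalog) {
  const inDegree = new Map();
  const dependents = new Map();
  for (const id of courseIds) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const id of courseIds) {
    for (const pre of catalog[id].prerequisites) {
      if (!inDegree.has(pre)) continue; // prerequisite outside the selected set
      inDegree.set(id, inDegree.get(id) + 1);
      dependents.get(pre).push(id);
    }
  }
  // Keep the ready list sorted so ties are always broken the same way.
  const ready = [...courseIds].filter((id) => inDegree.get(id) === 0).sort();
  const ordered = [];
  while (ready.length > 0) {
    const id = ready.shift();
    ordered.push(id);
    for (const next of dependents.get(id)) {
      inDegree.set(next, inDegree.get(next) - 1);
      if (inDegree.get(next) === 0) {
        ready.push(next);
        ready.sort();
      }
    }
  }
  if (ordered.length !== inDegree.size) {
    throw new Error("Cycle detected in course prerequisites");
  }
  return ordered;
}

const selected = expandPrerequisites(["kubernetes-intro", "ci-cd-pipelines"], CATALOG);
console.log(topologicalSort(selected, CATALOG));
// → ["linux-basics", "docker-fundamentals", "ci-cd-pipelines", "kubernetes-intro"]
```

The same length check that guards against cycles here is one way a seed validator could confirm the catalog has none.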
Stage 4: Reasoning Trace. Every course gets a deterministic, candidate-specific reasoning string generated from templates, not from the LLM. Evidence phrases from the resume are embedded directly in the explanation.
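Template-based traces like this can be as simple as one string function per gap type. The template wording, the three kinds (`weak`, `missing`, `prerequisite`), and the field names below are hypothetical; only the idea of filling fixed templates with resume evidence comes from the description above.

```javascript
// Hypothetical templates, one per reason a course lands on the roadmap.
const TEMPLATES = {
  weak: ({ evidence, skill, course }) =>
    `Your resume mentions '${evidence}', but the role expects ${skill} at a production level. ${course} closes the depth gap.`,
  missing: ({ skill, course }) =>
    `Your resume shows no evidence of ${skill}, which the role requires. ${course} covers it from the ground up.`,
  prerequisite: ({ course, unlocks }) =>
    `${course} is a prerequisite for ${unlocks}, so it is scheduled first.`,
};

// Pure function: the same kind and fields always yield the same string.
function buildReasoningTrace(kind, fields) {
  const template = TEMPLATES[kind];
  if (!template) throw new Error(`Unknown trace kind: ${kind}`);
  return template(fields);
}

console.log(
  buildReasoningTrace("weak", {
    evidence: "familiar with Docker",
    skill: "containerization",
    course: "Docker Fundamentals", // hypothetical course title
  })
);
```

Because no model call is involved, traces stay reproducible and can be unit-tested like any other string formatting.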
Resume + JD (text or file)
│
▼
extractionService.js ← ONLY LLM call in the system
│ { resumeSkills: [{name, proficiency}], jdSkills: [...] }
▼
adaptivePathingService.js ← fully deterministic from here
├── Stage 1: Normalize against 104-skill taxonomy
├── Stage 2: Set-operation gap analysis
├── Stage 3A: Map gaps to catalog courses
├── Stage 3B: Recursive prerequisite expansion
├── Stage 3C: Topological sort (Kahn's algorithm)
└── Stage 4: Candidate-specific reasoning trace
│
▼
MongoDB ← session persistence (24h TTL)
│
▼
Ordered roadmap with reasoning traces
AI Boundary: The LLM never touches recommendation logic. All course recommendations come exclusively from the seeded catalog via deterministic algorithms. Same input always produces the same roadmap.
| Layer | Technology | Version |
|---|---|---|
| Frontend | React | 18.2.0 |
| Build Tool | Vite | 5.1.4 |
| Styling | Tailwind CSS | 3.4.1 |
| Icons | Lucide React | 0.383.0 |
| Backend | Node.js | 22.12.0 |
| Framework | Express | 4.18.2 |
| Database | MongoDB | 7.0 |
| ODM | Mongoose | 8.2.1 |
| File Parsing | pdf-parse | 1.1.1 |
| LLM Primary | Groq — Llama 3.3 70B | llama-3.3-70b-versatile |
{
"express": "4.18.2",
"mongoose": "8.2.1",
"multer": "1.4.5-lts.1",
"express-validator": "7.0.1",
"express-rate-limit": "7.2.0",
"cors": "2.8.5",
"helmet": "7.1.0",
"morgan": "1.10.0",
"pdf-parse": "1.1.1",
"openai": "4.29.0",
"groq-sdk": "0.3.3",
"uuid": "9.0.0",
"dotenv": "16.4.1"
}

{
"react": "18.2.0",
"react-dom": "18.2.0",
"lucide-react": "0.383.0",
"vite": "5.1.4",
"tailwindcss": "3.4.1"
}

| Asset | Source | Link |
|---|---|---|
| Skill Taxonomy (104 skills) | O*NET 28.2 Database | onetcenter.org/db_releases.html |
| SOC codes referenced | 15-1252, 15-1254, 15-2051, 15-1244, 11-1021, 13-1082, 43-5061, 13-1071 | US Dept of Labor |
| Extraction validation dataset | Kaggle Resume Dataset | kaggle.com/snehaanbhawal/resume-dataset |
| LLM | Groq API — Llama 3.3 70B (open-weight) | groq.com |
| Metric | Value |
|---|---|
| Total skills in taxonomy | 104 |
| Active courses in catalog | 61 |
| Role templates | 8 |
| Full curriculum duration | 535 hours |
| Catalog coverage rate | 85.6% (89/104 skills have courses) |
| Max prerequisite chain depth | 3 levels |
| Domains covered | 6 |
- Node.js 22.12.0+
- MongoDB 7.0 (local or Docker)
- Groq API key — free at console.groq.com (no credit card required)
git clone https://github.com/Satvik-art-creator/Onboarding-Engine.git
cd Onboarding-Engine

cd backend && npm install
cd ../frontend && npm install

cd backend
cp .env.example .env

Open backend/.env and fill in your API key:
PORT=5000
NODE_ENV=development
MONGODB_URI=mongodb://localhost:27017/onboarding-engine
DB_NAME=onboarding-engine
SEED_ON_STARTUP=true
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your-key-here
GROQ_MODEL=llama-3.3-70b-versatile
CORS_ORIGIN=http://localhost:5173

# Using Docker (recommended)
docker run -d -p 27017:27017 --name mongo-onboarding mongo:7.0
# OR local MongoDB
mongod --dbpath ./data/db

cd backend && npm run seed

Expected:
✓ Skills validated: 104 entries, no duplicates
✓ Courses validated: 61 entries, no cycles
✓ Role templates validated: 8 entries
✓ Seed complete — Skills: 104 | Courses: 61 | Templates: 8
# Terminal 1
cd backend && npm run dev
# Terminal 2
cd frontend && npm run dev

cp .env.example .env
# Add your GROQ_API_KEY to .env
docker-compose up --build

Open http://localhost:80
| Method | Route | Description |
|---|---|---|
| POST | /api/analyze | Full pipeline: resume + JD → roadmap |
| GET | /api/results/:sessionId | Retrieve previous analysis |
| GET | /api/catalog/skills | Browse skill taxonomy |
| GET | /api/catalog/courses | Browse course catalog |
| GET | /api/catalog/templates | List role templates |
| GET | /api/health | Health check |
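The /api/analyze route can also be called from Node using the built-in `fetch` and `FormData` globals. The sketch below mirrors the form fields used in the curl example (`mode`, `resume`, `jobDescription`) and assumes the backend is running on port 5000; the helper names are hypothetical, and the response shape is not assumed.

```javascript
const API_BASE = "http://localhost:5000"; // matches PORT=5000 in backend/.env

// Build the multipart request separately so it can be inspected without a running server.
function buildAnalyzeRequest(resume, jobDescription) {
  const body = new FormData();
  body.append("mode", "text");
  body.append("resume", resume);
  body.append("jobDescription", jobDescription);
  return { url: `${API_BASE}/api/analyze`, options: { method: "POST", body } };
}

// Send the request and return the parsed JSON body as-is.
async function analyze(resume, jobDescription) {
  const { url, options } = buildAnalyzeRequest(resume, jobDescription);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Analyze failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the server up, `analyze("Experienced React developer...", "We need a backend engineer...").then(console.log)` would print whatever the backend returns.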
curl http://localhost:5000/api/health
curl -X POST http://localhost:5000/api/analyze \
-F "mode=text" \
-F "resume=Experienced React developer with 3 years using Node.js, PostgreSQL, and Git." \
-F "jobDescription=We need a backend engineer with Python, Docker, Kubernetes, and CI/CD experience."

onboarding-engine/
├── backend/
│ ├── data/
│ │ ├── skills.seed.json ← 104-skill O*NET-grounded taxonomy
│ │ ├── courses.seed.json ← 61-course catalog, max depth 3
│ │ ├── roleTemplates.seed.json ← 8 role profiles across 6 domains
│ │ └── demoScenarios.js ← 3 verified demo inputs
│ ├── models/ ← Mongoose schemas
│ ├── routes/analyze.js ← API route definitions
│ ├── services/
│ │ ├── extractionService.js ← LLM call (only AI in the system)
│ │ ├── adaptivePathingService.js ← 4-stage deterministic algorithm
│ │ └── catalogService.js ← MongoDB data access layer
│ ├── middleware/ ← validation, rate limiting
│ ├── utils/fileParser.js ← PDF/TXT text extraction
│ └── scripts/
│ ├── seed.js ← Database seeder with validation
│ └── validateCatalog.js ← Integrity checker
├── frontend/
│ └── src/
│ ├── components/
│ │ ├── UploadForm.jsx ← File/text/template input modes
│ │ ├── SkillsPanel.jsx ← Confirmed/weak/missing breakdown
│ │ ├── MetricsBar.jsx ← ROI metrics
│ │ ├── RoadmapGraph.jsx ← Prerequisite dependency graph
│ │ ├── RoadmapView.jsx ← Ordered course list
│ │ └── RoadmapCard.jsx ← Course card with reasoning trace
│ └── api/api.js ← All fetch calls to backend
├── Dockerfile.backend
├── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
└── README.md
- AI for extraction only — LLM never touches recommendation logic
- Deterministic output — same resume + same JD = same roadmap, always
- Catalog-bound — zero hallucinations, every course exists in the seeded catalog
- Prerequisite-aware — foundation courses auto-included when needed
- Evidence-based reasoning — traces use actual phrases from the candidate's resume
cd backend && npm run validate
# Expected: ✓ Catalog valid — 104 skills, 61 active courses

Built for the AI-Adaptive Onboarding Hackathon.