This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
SoupBoss is a comprehensive AI-powered job matching platform with both CLI and web interfaces. It intelligently matches resumes to job postings using semantic embeddings and provides detailed similarity scoring.
ONLY use uv for package management, NEVER pip:
- Install packages:
uv add package - Run tools:
uv run tool - Run the main application:
uv run python main.py
FORBIDDEN commands:
uv pip install@latestsyntax- Any direct pip usage
# Start the web interface - provides full GUI experience
uv run python webapp_manager.py restart
# Or use the legacy script: ./restart_webapp.sh
# Access at: http://localhost:5000# Run the main CLI application
uv run python main.py# Check system status and configuration
uv run python main.py status
# Test embedding functionality
uv run python main.py test-embedding
# Generate comprehensive reports
uv run python main.py report --format html
# Run speed tests and model comparisons
uv run python main.py match speed-test
uv run python main.py match compare-models# 1. Add companies and test job board availability
uv run python main.py companies add spacex --source greenhouse
uv run python main.py companies add leverdemo --source lever
uv run python main.py companies add dynatrace1 --source smartrecruiters
# 2. Fetch jobs from all sources
uv run python main.py jobs fetch --source greenhouse --company spacex
uv run python main.py jobs fetch --source lever --company leverdemo
uv run python main.py jobs fetch --source smartrecruiters --company dynatrace1
# 3. Add and manage resumes
uv run python main.py resumes add /path/to/resume.pdf --name "Senior Developer Resume"
uv run python main.py resumes add /path/to/resume2.docx --name "DevOps Resume"
uv run python main.py resumes list
# 4. Generate embeddings and run matching
uv run python main.py match generate # Generate embeddings for all content
uv run python main.py match generate --force # Force regenerate (fixes model compatibility)
uv run python main.py match run # Run similarity matching
# 5. View and export results
uv run python main.py match show 1 # Show matches for resume ID 1
uv run python main.py match show 1 --limit 10 # Show top 10 matches
uv run python main.py match export --format json --limit 50
uv run python main.py match export --format csv --output results.csv
uv run python main.py match export --format html --output report.html
# 6. System maintenance
uv run python main.py maintenance cleanup # Clean up old data
uv run python main.py maintenance backup # Backup database
uv run python main.py maintenance optimize # Optimize database| Feature | Web Interface | CLI |
|---|---|---|
| Ease of Use | ✅ Point-and-click, visual | ⚡ Command-based, fast |
| Resume Upload | ✅ Drag-and-drop, instant | 📁 File paths required |
| Company Testing | ✅ One-click testing | 🔧 Manual command execution |
| Progress Updates | ✅ Real-time WebSocket updates | 📊 Terminal progress bars |
| Results Display | ✅ Beautiful cards, sorting | 📋 Table format |
| Error Handling | ✅ Smart suggestions, auto-fix | 🔍 Manual troubleshooting |
| Remote Access | ✅ Any device on network | 🖥️ Server access required |
| Batch Operations | ⚡ Manual, step-by-step | ✅ Scriptable, automation |
| Advanced Features | 🔧 Core features only | ✅ Full feature access |
| Learning Curve | 🟢 Beginner-friendly | 🟡 Requires CLI knowledge |
Recommendation: Use the web interface for daily operations and the CLI for automation, batch processing, and advanced features.
# Test if company has job board
uv run python greenhouse_fetch.py -test spacex
uv run python lever_fetch.py -test leverdemo
uv run python smartrecruiters_fetch.py -test dynatrace1
# Fetch jobs from APIs
uv run python greenhouse_fetch.py -fetch spacex
uv run python lever_fetch.py -fetch leverdemo
uv run python smartrecruiters_fetch.py -fetch dynatrace1
# Import Disney data files
uv run python disney_data_importer.py -file disney_workday_html_100.json
# Bulk process from file
uv run python greenhouse_fetch.py -in companies.txt -split -out ./data/SoupBoss is an intelligent job matching system with a complete CLI interface organized into 7 command groups:
- cli.py - Complete Click-based CLI with 40+ commands across 7 groups
- db.py - SQLite database with vector support using SoupBossDB class
- embeddings.py - Ollama client integration with nomic-embed-text model
- ingestion.py - Unified job fetching from Greenhouse and Lever APIs
- matching.py - Intelligence engine with embedding generation and cosine similarity
- export.py - Professional export system (CSV/JSON/HTML)
- maintenance.py - Data cleanup, backup, optimization, and system reset
- config.py - Configuration management with .env and JSON persistence
- Ingestion: Jobs fetched from APIs (Greenhouse, Lever, SmartRecruiters) and Disney data files
- Storage: SQLite database with vector extensions for embeddings
- Processing: Multi-format resume processing (PDF/DOCX/TXT/MD)
- AI Matching: Semantic embeddings via Ollama + cosine similarity scoring
- Export: Professional reporting in CSV/JSON/HTML formats
jobs- Job management and ingestion from APIscompanies- Company source managementresumes- Resume file management and processingmatch- AI matching operations and similarity scoringmaintenance- System maintenance and data cleanupconfig- Configuration management (.env and JSON)
.env- Environment variables for runtime configurationsoupboss.config.json- Persistent JSON configuration settingsdata/soupboss.db- SQLite database with vector extensions
- Python 3.13+ required
- Key dependencies: click, rich, ollama, sqlite-vec, pandas, PyPDF2, python-docx
- Entry point:
soupboss = "soupboss.cli:main"
Access the web interface at: http://localhost:5000
# New consolidated manager (recommended)
uv run python webapp_manager.py restart # Restart webapp
uv run python webapp_manager.py start # Start webapp
uv run python webapp_manager.py stop # Stop webapp
uv run python webapp_manager.py status # Check status
# Legacy script (still works)
./restart_webapp.sh
# Direct start (minimal output)
uv run python webapp.pyThe SoupBoss web interface provides a complete graphical workflow for job matching:
- ✅ Resume Upload: Drag-and-drop file upload (PDF, DOCX, TXT, MD)
- ✅ Company Testing: Test if companies have job boards before adding
- ✅ Company Management: Add companies from all three sources
- ✅ Job Fetching: Real-time job retrieval with progress monitoring
- ✅ AI Processing: Generate embeddings and run matching algorithms
- ✅ Force Regeneration: Fix embedding compatibility issues
- ✅ Professional Job Cards: Beautiful cards with similarity scores
- ✅ Smart Sorting: Sort by similarity, company, title, or date
- ✅ Detailed Information: Job title, company, location, department, resume name
- ✅ Color-coded Scores: Green (high), orange (medium), red (low) similarity
- ✅ Summary Statistics: Total matches, average similarity, current sort method
- Upload multiple resumes in various formats (PDF, DOCX, TXT, MD)
- Automatic text extraction and preprocessing
- Resume naming and management
- Compatible with existing CLI resume database
- Multi-source Support: Greenhouse, Lever, SmartRecruiters
- Company Testing: Verify job board availability before fetching
- Real-time Fetching: Live progress updates via WebSocket
- Bulk Processing: Handle thousands of job postings efficiently
- Semantic Embeddings: Uses Ollama with embeddinggemma:300m model
- Intelligent Scoring: Cosine similarity scoring (0.0-1.0 range)
- Force Regeneration: Automatically fixes embedding dimension mismatches
- Smart Error Handling: Detects and suggests fixes for common issues
- Professional Cards: Each job shows:
- Job title (prominent heading)
- Company name with building icon
- Location with map marker
- Department/team with org chart icon
- Matching resume with user icon
- Color-coded similarity percentage
- Interactive Sorting: Live reordering by similarity, company, title, or date
- Summary Information: Match count, average score, current sort method
Network Access: Works from any computer on your network
- Local:
http://localhost:5000 - Network:
http://[server-ip]:5000(e.g.,http://192.168.1.115:5000)
The ./restart_webapp.sh script provides bulletproof process management:
- Kills all existing webapp processes using multiple methods
- Frees port 5000 completely
- Starts webapp cleanly with proper error handling
- Provides clear status feedback throughout
- WebSocket Integration: Live progress updates for long operations
- Progress Monitoring: Real-time logs in the activity panel
- Error Handling: Smart error detection with suggested solutions
- Status Updates: Connection status and system health monitoring
- Real-time operation logs with timestamps
- Color-coded messages (success=green, error=red, info=blue, warning=orange)
- Automatic scrolling and log size management
- Clear operation status and progress tracking
- Embedding Mismatch Detection: Automatically detects model compatibility issues
- Smart Suggestions: Provides actionable solutions for common problems
- Graceful Fallbacks: Falls back to alternative methods when primary approaches fail
- User-friendly Messages: Clear error descriptions without technical jargon
- Flask 3.1.2+: Web framework with JSON API endpoints
- Flask-SocketIO 5.5.1+: Real-time WebSocket communication
- Subprocess Integration: Calls existing SoupBoss CLI commands
- File Management: Temporary file handling with automatic cleanup
- JSON Processing: Smart parsing of CLI output into structured data
- Bootstrap 5.1.3: Professional UI framework (CDN-loaded)
- Socket.IO 4.0.1: Client-side real-time communication
- JavaScript Classes: Modular, maintainable code organization
- Responsive Design: Works on desktop, tablet, and mobile devices
- User Input: Upload resumes, test companies, configure settings
- API Calls: Frontend makes requests to Flask API endpoints
- CLI Integration: Backend calls SoupBoss CLI commands via subprocess
- Real-time Updates: WebSocket events provide live progress feedback
- Data Processing: JSON export files parsed and formatted for display
- Results Presentation: Professional job cards with interactive sorting
- Efficient Processing: Handles 1000+ jobs with real-time progress
- Smart Caching: Temporary file system with automatic cleanup
- Memory Management: Streaming data processing for large datasets
- Network Optimized: Compressed data transfer and efficient APIs
- Target: Single-user local application with network access (not enterprise production)
- AI Model: Uses Ollama with embeddinggemma:300m for semantic embeddings (384-dimensional vectors)
- Database: SQLite with sqlite-vec extension for vector similarity search
- Environment: Runs on headless Ubuntu server with local compute resources
- Interfaces: Dual interface - Web GUI and CLI for different use cases
- Resume Formats: PDF, DOCX, TXT, MD with automatic text extraction
- Export Formats: CSV, JSON, HTML, PDF with customizable templates
- Data Storage: SQLite database with efficient vector indexing
- Web Server: Flask development server on port 5000
- Remote Access: Full network accessibility from any device
- Real-time Communication: WebSocket support for live updates
- Cross-platform: Works on desktop, tablet, and mobile browsers
- Scalability: Handles 1000+ job postings efficiently
- Processing Speed: Real-time embedding generation with progress tracking
- Memory Efficiency: Streaming processing for large datasets
- Network Optimized: Compressed data transfer and smart caching
- Start: Run
./restart_webapp.sh - Access: Open
http://localhost:5000in any browser - Upload: Drag-and-drop your resume files
- Test: Enter company name and click "Test Company"
- Add: Click "Add Company" if test succeeds
- Fetch: Click "Fetch Jobs" to retrieve job postings
- Process: Click "Generate Embeddings" then "Run Matching"
- View: Switch to "Results & Matches" tab to see professional job cards
- Sort: Use dropdown to sort by similarity, company, title, or date
# Complete workflow in 6 commands
uv run python main.py companies add spacex --source greenhouse
uv run python main.py jobs fetch --source greenhouse --company spacex
uv run python main.py resumes add /path/to/resume.pdf --name "My Resume"
uv run python main.py match generate
uv run python main.py match run
uv run python main.py match show 1 --limit 20- Embedding Issues: Use "Force Regenerate" button or
--forceflag - Process Conflicts: Run
./restart_webapp.shto clean restart - Remote Access: Use server IP address (e.g.,
http://192.168.1.115:5000) - Performance: Check
uv run python main.py statusfor system health
- ✅ Multi-source Job Fetching: Greenhouse, Lever, SmartRecruiters
- ✅ AI-Powered Matching: Semantic similarity scoring with 74%+ accuracy
- ✅ Professional Results: Color-coded match cards with detailed information
- ✅ Real-time Processing: Live progress updates and WebSocket communication
- ✅ Cross-platform Access: Works from any device on your network
- ✅ Robust Error Handling: Smart detection and automatic recovery suggestions
- Greenhouse:
https://boards-api.greenhouse.io/v1/boards/{company}/jobs - Lever:
https://api.lever.co/v0/postings/{company} - SmartRecruiters:
https://api.smartrecruiters.com/v1/companies/{company}/postings - Disney: JSON data files from Workday scraper
- Converts HTML job descriptions to both text and preserves HTML
- Semantic embedding generation for all content
- Cosine similarity scoring for job-resume matching
- Complete audit trail and data integrity validation