A production-style SQL portfolio project that upgrades a resume analyzer into a PostgreSQL-backed job intelligence platform for technical and scientific careers.
This version is aligned with my other projects in the following domains:
- quantum networking and scheduling
- experiment automation and benchmarking
- QAOA / QEC research prototypes
- FastAPI, Docker, Linux, CI/CD
- ML/NLP-assisted application tooling
Instead of stopping at notebook-style similarity scoring, this repo models a complete data product:
- PostgreSQL as the system of record
- Alembic migrations for schema evolution
- repeatable ingestion scripts for jobs and resume versions
- SQL analytics views/materialized views
- FastAPI endpoints for ingestion + reporting
- Streamlit dashboard for quick demos
- pytest for core scoring logic
- Docker Compose for local startup
Key features:
- raw documents are ingested
- data is normalized into relational tables
- skills are extracted into reusable feature tables
- matching runs are versioned
- reports are exposed through SQL views and an API
That makes it much more credible for backend, data, analytics engineering, platform, or technical product roles.
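As an illustration of the skill-extraction step listed above, here is a minimal sketch of alias-based matching. The alias map and the `extract_skills` helper are hypothetical and stand in for whatever the `skills` and `skill_aliases` tables actually hold; the repo's real extraction logic may differ.

```python
import re

# Hypothetical alias map (alias -> canonical skill name). In the real project
# this would presumably be loaded from the skills / skill_aliases tables.
SKILL_ALIASES = {
    "fastapi": "FastAPI",
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "k8s": "Kubernetes",
    "ci/cd": "CI/CD",
}


def extract_skills(text: str, aliases: dict[str, str]) -> list[str]:
    """Return the sorted canonical skills whose aliases appear in the text."""
    lowered = text.lower()
    found = set()
    for alias, canonical in aliases.items():
        # Lookarounds instead of \b so aliases containing "/" or "+" still
        # match as whole tokens.
        pattern = r"(?<!\w)" + re.escape(alias.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(canonical)
    return sorted(found)


if __name__ == "__main__":
    sample = "Built FastAPI services on Postgres and k8s with CI/CD."
    print(extract_skills(sample, SKILL_ALIASES))
```

Mapping every alias to one canonical name is what lets several spellings of a skill count as the same feature in later reports.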
Career Intelligence Platform answers questions like:
- Which jobs fit the current CV best?
- Which missing skills appear most often by role family?
- How does fit change after a new resume version?
- Which roles best match a background in quantum software, experiment automation, scientific ML, and backend systems?
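The question about fit changing between resume versions comes down to comparing two match runs. A minimal sketch, assuming each run can be reduced to a mapping from job id to score (the `fit_change` helper and that shape are illustrative, not the repo's API):

```python
def fit_change(previous: dict[int, float], current: dict[int, float]) -> dict[int, float]:
    """Per-job score change for jobs that were scored in both runs."""
    shared_jobs = previous.keys() & current.keys()
    return {job_id: round(current[job_id] - previous[job_id], 4) for job_id in shared_jobs}


if __name__ == "__main__":
    run_v1 = {1: 0.5, 2: 0.75, 3: 0.25}   # scores from an earlier resume version
    run_v2 = {1: 0.75, 2: 0.5, 4: 1.0}    # scores after a new resume version
    print(fit_change(run_v1, run_v2))
```

Jobs that appear in only one run are left out, since there is nothing to compare them against.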
flowchart LR
A[data/sample_resume.md or uploaded resume] --> B[scripts/ingest_resume.py]
C[data/sample_jobs.csv or external jobs CSV] --> D[scripts/ingest_jobs.py]
B --> E[(PostgreSQL)]
D --> E
E --> F[match_runs + job_matches + match_skill_gaps]
F --> G[analytics views / materialized views]
E --> H[FastAPI]
H --> I[Streamlit dashboard]
Core tables:
- candidate_profiles
- resume_versions
- skills
- skill_aliases
- resume_skill_mentions
- companies
- role_families
- job_postings
- job_skill_requirements
- match_runs
- job_matches
- match_skill_gaps
- application_events
Professional touches included:
- primary / foreign keys
- uniqueness constraints
- indexed timestamps
- role family normalization
- weighted skill requirements
- analytics views in a dedicated schema
See docs/schema.md for the detailed model.
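To show how a gap table can feed a reporting view, here is a small sketch that uses SQLite from the Python standard library as a stand-in for PostgreSQL. The column layout of `match_skill_gaps` below is a simplified, denormalized assumption, and the view is created without the `analytics` schema prefix; the authoritative model is the one in docs/schema.md.

```python
import sqlite3

# Assumption: a hypothetical, denormalized match_skill_gaps layout for demo use.
SCHEMA = """
CREATE TABLE match_skill_gaps (
    match_run_id     INTEGER NOT NULL,
    job_posting_id   INTEGER NOT NULL,
    role_family_slug TEXT    NOT NULL,
    skill_name       TEXT    NOT NULL,
    weight           REAL    NOT NULL,
    UNIQUE (match_run_id, job_posting_id, skill_name)
);
CREATE INDEX ix_gaps_role_family ON match_skill_gaps (role_family_slug);
CREATE VIEW v_top_missing_skills AS
SELECT role_family_slug,
       skill_name,
       COUNT(*)    AS missing_count,
       SUM(weight) AS total_missing_weight
FROM match_skill_gaps
GROUP BY role_family_slug, skill_name;
"""

# Made-up demo rows: (match_run_id, job_posting_id, role_family_slug, skill_name, weight)
DEMO_GAPS = [
    (1, 10, "quantum-software", "qiskit", 1.0),
    (1, 11, "quantum-software", "qiskit", 1.0),
    (1, 11, "quantum-software", "docker", 0.5),
    (1, 10, "quantum-software", "kubernetes", 1.0),
    (1, 12, "backend", "docker", 1.0),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo schema and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO match_skill_gaps VALUES (?, ?, ?, ?, ?)", DEMO_GAPS)
    return conn


def top_missing_skills(conn: sqlite3.Connection, role_family_slug: str, limit: int = 10):
    """Rank missing skills for one role family, most frequent first."""
    query = """
        SELECT skill_name, missing_count, total_missing_weight
        FROM v_top_missing_skills
        WHERE role_family_slug = ?
        ORDER BY missing_count DESC, total_missing_weight DESC
        LIMIT ?
    """
    return conn.execute(query, (role_family_slug, limit)).fetchall()


if __name__ == "__main__":
    for row in top_missing_skills(build_demo_db(), "quantum-software"):
        print(row)
```

Keeping the aggregation in a view means the API and the dashboard can share one definition of "top missing skills" instead of each recomputing it.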
career-intelligence-postgres-api/
app/
alembic/
dashboard/
data/
docs/
scripts/
sql/
tests/
docker-compose.yml
README.md
requirements.txt
cp .env.example .env
docker compose up --build
docker compose exec api alembic upgrade head
docker compose exec api python scripts/seed_data.py
docker compose exec api python scripts/create_views.py

- FastAPI: http://localhost:8000/docs
- Streamlit: http://localhost:8501
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
alembic upgrade head
python scripts/seed_data.py
uvicorn app.main:app --reload

curl -X POST http://localhost:8000/api/v1/resumes/ingest \
-H "Content-Type: application/json" \
-d '{
"file_path": "data/sample_resume.md",
"version_label": "industry-v2"
}'

curl -X POST http://localhost:8000/api/v1/jobs/ingest \
-H "Content-Type: application/json" \
-d '{"csv_path": "data/sample_jobs.csv"}'

curl -X POST http://localhost:8000/api/v1/match/run \
-H "Content-Type: application/json" \
-d '{"resume_version_id": 1}'

curl "http://localhost:8000/api/v1/reports/latest-matches?limit=5"

curl "http://localhost:8000/api/v1/reports/top-missing-skills?role_family=quantum-software&limit=10"

SELECT *
FROM analytics.v_latest_job_matches
ORDER BY score DESC
LIMIT 10;

SELECT role_family_slug, skill_name, missing_count
FROM analytics.v_top_missing_skills
WHERE role_family_slug = 'quantum-software'
ORDER BY missing_count DESC, total_missing_weight DESC
LIMIT 10;

REFRESH MATERIALIZED VIEW analytics.mv_role_family_fit;
SELECT role_family_slug, avg_score, jobs_evaluated
FROM analytics.mv_role_family_fit
ORDER BY avg_score DESC;

- The scoring logic is explainable and intentionally transparent.
- The SQL model is stronger than a single-script ML demo because it preserves history and supports reporting.
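To make "explainable" concrete, here is a minimal sketch of one transparent scoring rule: weighted coverage of a job's required skills. This is an assumed formula for illustration, with a hypothetical `score_job` helper; the repo's actual scoring logic is not reproduced here.

```python
def score_job(resume_skills: set[str], requirements: dict[str, float]) -> tuple[float, list[str]]:
    """Return (score, missing_skills) for one job.

    score is the matched requirement weight divided by the total weight,
    so every point of the score traces back to a named skill.
    """
    total_weight = sum(requirements.values())
    if total_weight == 0:
        return 0.0, []
    matched_weight = sum(w for skill, w in requirements.items() if skill in resume_skills)
    missing = sorted(skill for skill in requirements if skill not in resume_skills)
    return matched_weight / total_weight, missing


if __name__ == "__main__":
    resume = {"python", "docker"}
    job = {"python": 2.0, "docker": 1.0, "kubernetes": 1.0}  # skill -> weight
    print(score_job(resume, job))
```

Returning the missing skills alongside the score is what allows each match to be stored with its gaps and explained later, rather than reported as a bare number.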