dbt Project: Job Market Intelligence Pipeline

Overview

This dbt project implements an ELT pipeline that transforms raw job posting data into scored job listings. Each listing receives a relevance score based on keywords in its title and on the technical skills mentioned in its description (SQL, Python, dbt).

Project Architecture

1. Data Ingestion (Seeds)

  • jobsjumble_sliced: Raw job postings data including titles, companies, locations, and full descriptions.

2. Staging Layer (models/staging/)

  • stg_job_postings:
    • Description: Standardizes raw seed data.
    • Operations: Renames columns for clarity, cleans whitespace.
    • Tests: not_null checks on job_title and company_name.
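
The staging step can be illustrated with a minimal sketch that runs a staging-style SELECT against an in-memory SQLite table. The raw column names (title, company) and the sample rows are hypothetical, since the seed's actual headers are not listed here, and trim() is only one plausible reading of "cleans whitespace". The not_null_failures helper mimics what a not_null test would flag; it is not dbt's implementation.

```python
import sqlite3

# Hypothetical raw rows and column names; the real seed headers are not shown in this README.
RAW_ROWS = [
    ("  Senior Data Analyst ", " Acme Corp  "),
    ("Data Engineer", "Globex"),
]

# Staging-style query: rename columns and strip surrounding whitespace.
STAGING_SQL = """
select
    trim(title)   as job_title,
    trim(company) as company_name
from raw_job_postings
"""


def build_staging(rows):
    """Load raw rows into an in-memory table and return the staged rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table raw_job_postings (title text, company text)")
    conn.executemany("insert into raw_job_postings values (?, ?)", rows)
    staged = conn.execute(STAGING_SQL).fetchall()
    conn.close()
    return staged


def not_null_failures(staged):
    """Return rows that would fail a not_null check on job_title or company_name."""
    return [row for row in staged if row[0] is None or row[1] is None]


staged = build_staging(RAW_ROWS)
print(staged)
print("not_null failures:", not_null_failures(staged))
```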

3. Marts Layer (models/marts/)

  • fct_job_scoring:
    • Description: The core fact table that calculates a relevance_score for each posting.
    • Logic:
      • Title Scoring: Assigns points for specific keywords like 'Analyst', 'Engineer', and 'Senior'.
      • Skill Scoring: Parses the job_description for technical keywords like 'SQL', 'Python', and 'dbt'.
    • Materialization: Table.
    • Tests: not_null check on relevance_score.
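
The scoring logic above can be sketched as additive CASE expressions, run here against an in-memory SQLite table so the example is self-contained. The point values and sample postings are hypothetical: this README names the keywords but not their weights, and the actual model may combine them differently.

```python
import sqlite3

# Point values are hypothetical; only the keywords come from the README.
SCORING_SQL = """
select
    job_title,
    -- Title scoring: one CASE per keyword, so points add up
      case when lower(job_title) like '%analyst%'  then 3 else 0 end
    + case when lower(job_title) like '%engineer%' then 3 else 0 end
    + case when lower(job_title) like '%senior%'   then 1 else 0 end
    -- Skill scoring: keyword search in the description
    + case when lower(job_description) like '%sql%'    then 2 else 0 end
    + case when lower(job_description) like '%python%' then 2 else 0 end
    + case when lower(job_description) like '%dbt%'    then 2 else 0 end
      as relevance_score
from stg_job_postings
"""

# Made-up sample postings for illustration only.
POSTINGS = [
    ("Senior Data Analyst", "Strong SQL and dbt experience required."),
    ("Data Engineer", "Python pipelines on AWS."),
    ("Office Manager", "Scheduling and vendor management."),
]


def score_postings(postings):
    """Return {job_title: relevance_score} for the given (title, description) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table stg_job_postings (job_title text, job_description text)")
    conn.executemany("insert into stg_job_postings values (?, ?)", postings)
    scores = dict(conn.execute(SCORING_SQL).fetchall())
    conn.close()
    return scores


scores = score_postings(POSTINGS)
for title, score in scores.items():
    print(f"{title}: {score}")
```

Because LIKE performs substring matching, a keyword such as 'sql' also matches 'NoSQL'. A stricter variant would match on word boundaries.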

4. Data Reliability (Snapshots)

  • job_postings_snapshot:
    • Strategy: check, which compares relevance_score between snapshot runs and records a new row version when the value changes.
    • Purpose: Tracks how job scoring changes over time as data is updated or refined.
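
To make the check strategy concrete, here is a simplified Python sketch of the bookkeeping a snapshot run performs. It is not dbt's implementation: job_id is a hypothetical unique key, deleted records are ignored, and only the dbt_valid_from and dbt_valid_to column names are taken from dbt's snapshot output.

```python
from datetime import datetime


def apply_snapshot(snapshot, current_rows, now, key="job_id", check_col="relevance_score"):
    """Apply one snapshot run: close changed rows and append their new versions.

    Mutates and returns `snapshot`, a list of dicts. `job_id` is a hypothetical key.
    """
    # Index the currently open version (dbt_valid_to is None) of each record.
    open_rows = {r[key]: r for r in snapshot if r["dbt_valid_to"] is None}
    for row in current_rows:
        active = open_rows.get(row[key])
        if active is not None and active[check_col] == row[check_col]:
            continue  # checked column unchanged: nothing to record
        if active is not None:
            active["dbt_valid_to"] = now  # close out the superseded version
        snapshot.append({**row, "dbt_valid_from": now, "dbt_valid_to": None})
    return snapshot


history = []
apply_snapshot(history, [{"job_id": 1, "relevance_score": 5}], datetime(2024, 1, 1))
apply_snapshot(history, [{"job_id": 1, "relevance_score": 5}], datetime(2024, 1, 2))  # no change
apply_snapshot(history, [{"job_id": 1, "relevance_score": 8}], datetime(2024, 1, 3))  # score changed

for version in history:
    print(version)
```

After the three runs, the history holds two versions of the posting: the original score, closed on the date of the change, and the current score with an open dbt_valid_to.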

Macros

  • select_state: A utility macro for filtering data by state (included for legacy/utility demonstration).

Key Features for Resume

  • Automated Testing: Implemented not_null schema tests on key columns in the staging and marts models to ensure data integrity.
  • SCD Type 2 Modeling: Used snapshots to capture history of transformed data.
  • Complex Transformations: Logic-driven scoring system using SQL CASE statements and string parsing.
  • Layered Architecture: Separation of concerns between staging (cleaning) and marts (business logic).

Execution

Run the full pipeline using Docker. The commands are chained with &&, so each step runs only if the previous one succeeded:

# Seed, Run, Test, Snapshot
dbt seed && dbt run && dbt test && dbt snapshot