Contact Center Analytics Pipeline

A production-grade Jupyter notebook pipeline for daily contact center reporting. It ingests raw telephony platform exports and produces a supervisor coaching alert PDF, a daily performance email (HTML), and an Excel summary, fully automated with no manual steps.

Note on data: This repository contains no real agent data, PII, or company identifiers. All sample outputs were generated using the included synthetic data generator. See Data & Privacy below.


What It Does

The pipeline ingests five Excel exports from an 8x8-style telephony platform and runs them through a structured sequence of transformations:

  • Call classification — wrap codes are matched against defined sets (live contact, voicemail, dead air, excluded) to produce clean OB/IB channel splits
  • KPI calculation — per-agent metrics including productive DPH, median handle time by disposition type, wrap compliance gap, away time, on-break %, and IB contact rate
  • Flag logic — agents are tiered (Immediate Attention / Watch / On Track) based on DPH target, handle time zones, away time threshold, and wrap compliance
  • DPH suppression — agents below a productive-time threshold have DPH shown as N/A with an explanatory note, avoiding misleading ratios
  • Output generation — coaching alert PDF (supervisor use), daily email HTML (distribution-ready), Excel summary

The pipeline handles a multi-team contact center with both outbound (dialer-driven) and inbound (queue-driven) channels, including agents working both simultaneously.
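The classification step can be sketched as an exact-set lookup. The wrap code sets and row values below are illustrative assumptions, not the notebook's real configuration:

```python
# Illustrative wrap code sets -- the notebook's config cells define
# the real ones for the 8x8-style platform.
LIVE_CONTACT = {"Sale", "Callback Scheduled", "Not Interested"}
VOICEMAIL = {"Left VM", "VM - No Message"}
DEAD_AIR = {"Dead Air"}
EXCLUDED = {"Test Call"}

def classify_wrap(code):
    """Map a wrap code to a call category by exact match,
    never by call-duration heuristics."""
    if code in LIVE_CONTACT:
        return "contact"
    if code in VOICEMAIL:
        return "voicemail"
    if code in DEAD_AIR:
        return "dead_air"
    if code in EXCLUDED:
        return "excluded"
    return "unknown"  # surfaced later by the validation suite

# Each (agent, direction, wrap_code) row gets a category, which later
# cells use to build clean OB/IB channel splits.
rows = [("A01", "OB", "Sale"), ("A01", "OB", "Left VM"),
        ("B02", "IB", "Dead Air")]
classified = [(agent, direction, classify_wrap(code))
              for agent, direction, code in rows]
```

Because the match is exact rather than duration-based, an unusually short live contact or a long voicemail lands in the right bucket regardless of call length.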


Architecture

input/                        # Raw telephony platform exports (Excel, skiprows=6)
  login.xlsx                  # Agent login/logout times
  interactions.xlsx           # Wrap code dispositions (OB + IB calls)
  statusCode.xlsx             # Per-agent status code activity
  timeStatus.xlsx             # Time-on-status aggregates
  leads_daily.xlsx            # Presented/Accepted/Rejected/Abandoned by agent

staging/                      # Intermediate files (cleaned, merged)
output/                       # Final deliverables + master history

contact_center_analysis_patched_v9.1.ipynb   # Main pipeline (44 cells, paired code/markdown)
generate_synthetic_data.py                   # Synthetic data generator
sanitize_outputs.py                          # Output sanitizer

Notebook Structure (44 cells, 1:1 code/markdown pairs)

Cell    Role
1–3     Config, imports, wrap code classification sets
4–9     Data ingestion and cleaning (one cell per source file)
10      KPI calculation — all per-agent metrics derived here
11      Flag logic and agent tiering
12      58-test validation suite — data quality checks across 9 sections
13      Coaching alert PDF generator
14–16   Excel output, master history append, agent reconciliation checks
17      Email HTML + weekly rollup generator
18+     Ad hoc analysis cells

Key Design Decisions

Wrap code classification over duration inference. Contact vs. voicemail is determined by exact wrap code match, not call length heuristics. This eliminates misclassification at the tails (very short contacts, unusually long voicemails).

Productive DPH, not raw. The primary DPH metric excludes working offline, meetings, 1:1s, and training from the denominator. Raw DPH is shown for reference. Agents below a productive-time floor have DPH suppressed to N/A rather than showing a misleadingly inflated number.
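A minimal sketch of that rule, assuming illustrative status names and a 2-hour productive-time floor (the notebook defines the real statuses and threshold):

```python
# Statuses excluded from the productive denominator (illustrative).
NON_PRODUCTIVE = {"Working Offline", "Meeting", "1:1", "Training"}
MIN_PRODUCTIVE_HOURS = 2.0  # assumed floor, not the notebook's actual value

def productive_dph(dispositions, status_hours):
    """Dispositions per productive hour, or None (rendered as N/A)
    when the agent's productive time is below the floor."""
    productive = sum(hours for status, hours in status_hours.items()
                     if status not in NON_PRODUCTIVE)
    if productive < MIN_PRODUCTIVE_HOURS:
        return None  # suppress rather than show an inflated ratio
    return dispositions / productive
```

An agent with 30 dispositions over 5 productive hours scores 6.0; an agent with 5 dispositions but only 1 productive hour is suppressed to N/A instead of showing a misleading 5.0.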

OB/IB channel separation throughout. Metrics, flags, and outputs treat outbound and inbound as distinct workstreams. Agents on both channels appear in both sections of every output.

Wrap compliance gap sourced from Leads Daily, not wrap file. The presented call count comes from the telephony platform's Leads Daily export (authoritative), not summed from the wrap code file. This avoids drift from partial dispositions and correctly scopes the gap to OB calls only.
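A sketch of that gap calculation; the column names and sample values are assumptions based on this description:

```python
import pandas as pd

# Presented counts from Leads Daily (authoritative for OB volume).
leads = pd.DataFrame({"agent_id": ["A01", "B02"], "presented": [40, 25]})

# Wrap file rows: some presented calls may never get a disposition.
wraps = pd.DataFrame({
    "agent_id": ["A01", "A01", "B02"],
    "direction": ["OB", "OB", "OB"],
    "wrap_code": ["Sale", "Left VM", "Sale"],
})

# Count OB dispositions per agent, then diff against the
# authoritative presented count rather than summing the wrap file.
dispositioned = (wraps[wraps["direction"] == "OB"]
                 .groupby("agent_id").size().rename("dispositioned"))
gap = leads.set_index("agent_id").join(dispositioned).fillna(0)
gap["wrap_gap"] = gap["presented"] - gap["dispositioned"]
```

Filtering to OB rows before counting is what scopes the gap to outbound calls only.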

Master history append. Each daily run appends to output/master_history.xlsx, enabling weekly rollup emails without a separate data store.
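The append can be sketched as a read-concat-write cycle against the workbook, with the pure merge step separated out for clarity:

```python
from pathlib import Path
import pandas as pd

def merge_history(existing, daily):
    """Concatenate today's KPI rows onto the existing history."""
    if existing is None:
        return daily.copy()
    return pd.concat([existing, daily], ignore_index=True)

def append_history(daily, path):
    """Read the master workbook if present, append today's rows,
    and rewrite it in place (e.g. output/master_history.xlsx)."""
    path = Path(path)
    existing = pd.read_excel(path) if path.exists() else None
    history = merge_history(existing, daily)
    history.to_excel(path, index=False)
    return history
```

The weekly rollup reads the same workbook back, which is why no separate data store is needed.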


Sample Outputs

The output/ directory contains sample outputs generated from synthetic data:

Coaching Alert PDF — one page per supervisor showing agent tiers, flag reasons, and a team metrics snapshot. Sorted by tier within each channel section.

Daily Email HTML — KPI boxes for OB and IB snapshots, per-agent performance tables with color-coded handle time zones and DPH flags, hourly call volume breakdown, and methodology footnotes.

Handle time color coding:

Color      Range                  Meaning
🟥 Red     < 6 min or > 18 min    Outside acceptable range
🟨 Yellow  6–9 min or 15–18 min   Approaching boundary
🟩 Green   9–15 min               Target range
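The table reduces to a small zone function; treating the shared boundaries (exactly 9 and 15 minutes) as green is an assumption, since the table leaves them ambiguous:

```python
def handle_time_zone(minutes):
    """Map a median handle time (in minutes) to its color zone."""
    if minutes < 6 or minutes > 18:
        return "red"     # outside acceptable range
    if minutes < 9 or minutes > 15:
        return "yellow"  # approaching a boundary
    return "green"       # target range (boundaries counted as green)
```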

Data & Privacy

No real data is present anywhere in this repository. The repository was sanitized before publication:

  • All real agent names replaced with Agent_A01, Agent_B02, etc.
  • All real team names replaced with Team_Alpha, Team_Bravo, etc.
  • All agent IDs replaced with synthetic hashes
  • All company names, file paths, and internal references removed from notebook cells and outputs

Synthetic Data Generator

generate_synthetic_data.py produces realistic telephony platform exports that exactly mirror the structure the notebook expects — including the 6-row platform metadata header that the notebook skips on ingest.

# Generate one day of data (default: today)
python generate_synthetic_data.py

# Generate a specific date
python generate_synthetic_data.py --date 2026-01-15

# Generate multiple days (archives dated copies, writes current set)
python generate_synthetic_data.py --days 5 --date 2026-01-13

The generator produces five files matching the notebook's expected input schema: login.xlsx, interactions.xlsx, statusCode.xlsx, timeStatus.xlsx, leads_daily.xlsx.

A shared absence set ensures consistent agent rosters across all five files for a given day — roughly 10% of agents are absent on any given generated day.
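The matching ingest side can be sketched as one helper that drops the 6-row metadata header on every export (file names from the Architecture section; everything past `skiprows` is the platform's own column layout):

```python
import pandas as pd

# The five exports the notebook ingests from input/.
INPUT_FILES = ["login.xlsx", "interactions.xlsx", "statusCode.xlsx",
               "timeStatus.xlsx", "leads_daily.xlsx"]

def load_export(name, input_dir="input"):
    """Read one telephony export, skipping the 6-row platform
    metadata header that precedes the real column row."""
    return pd.read_excel(f"{input_dir}/{name}", skiprows=6)
```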


Running the Pipeline

Requirements

pip install -r requirements.txt

Dependencies include: pandas, openpyxl, jinja2, reportlab, jupyter, numpy.

Steps

  1. Generate synthetic input data (or drop real exports into input/):

    python generate_synthetic_data.py
  2. Open the notebook:

    jupyter notebook contact_center_analysis_patched_v9.1.ipynb
  3. Run all cells. The pipeline is designed to run top-to-bottom without manual intervention. After Cell 12, review the test suite output — all 58 tests should pass before proceeding.

  4. Outputs are written to output/:

    • coaching_alert_YYYY-MM-DD.pdf
    • email_daily_YYYY-MM-DD.html
    • email_weekly_YYYY-MM-DD.html (if master history exists)
    • master_history.xlsx (appended)

Validation Suite (Cell 12)

Cell 12 runs 58 unit tests across 9 sections before any output is produced:

Section                Checks
Data Load              Row counts, required columns, null agent IDs
Call Classification    Coverage, contact/VM presence, unknown call rate
Handle Time            Duration bounds, median ranges by disposition
Daily Aggregation      Deduplication, call count reconciliation
Calls Per Hour         CPH bounds and spot-checks
KPI Flags              Flag column existence and internal consistency
Median Handle Time     Column existence, skew direction
Time-on-Status         Required columns present
Raw-to-KPI Fidelity    OB/IB totals, contact rates, logged hours within tolerance

If any test fails, a summary is printed with the failure reason. Do not generate outputs while any test is failing.
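The suite's shape can be sketched as named (check, passed, reason) triples collected into a failure summary; the check names here are illustrative, not the notebook's actual 58 tests:

```python
def run_checks(checks):
    """Run (name, passed, reason) triples; print and return failures."""
    failures = [(name, reason) for name, passed, reason in checks
                if not passed]
    for name, reason in failures:
        print(f"FAIL: {name} -- {reason}")
    return failures

# Two illustrative checks in the style of the Data Load and
# Call Classification sections.
checks = [
    ("Data Load: login rows > 0", True, "login.xlsx returned 0 rows"),
    ("Classification: unknown rate < 2%", True, "too many unknown codes"),
]
failures = run_checks(checks)  # empty list when everything passes
```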


Production Context

This pipeline mirrors a production system serving a multi-team contact center operation. The production version runs on a schedule against live 8x8 exports, with outputs distributed to supervisors via automated email. The architecture deliberately avoids a database dependency — all state lives in Excel files and the master history workbook — to match the operational constraints of the environment it was built for.

The patterns here (wrap code classification, KPI flag logic, DPH suppression, OB/IB channel separation) are directly portable to a Databricks/Delta Lake stack, where the notebook cells map naturally to a medallion architecture: raw ingest → cleaned staging → KPI gold layer → output generation.


License

MIT