A production-grade Jupyter notebook pipeline for daily contact center reporting. Processes raw telephony platform exports and produces a supervisor coaching alert PDF, a daily performance email (HTML), and an Excel output — fully automated, no manual steps.
Note on data: This repository contains no real agent data, PII, or company identifiers. All sample outputs were generated using the included synthetic data generator. See Data & Privacy below.
The pipeline ingests five Excel exports from an 8x8-style telephony platform and runs them through a structured sequence of transformations:
- Call classification — wrap codes are matched against defined sets (live contact, voicemail, dead air, excluded) to produce clean OB/IB channel splits
- KPI calculation — per-agent metrics including productive DPH, median handle time by disposition type, wrap compliance gap, away time, on-break %, and IB contact rate
- Flag logic — agents are tiered (Immediate Attention / Watch / On Track) based on DPH target, handle time zones, away time threshold, and wrap compliance
- DPH suppression — agents below a productive-time threshold have DPH shown as N/A with an explanatory note, avoiding misleading ratios
- Output generation — coaching alert PDF (supervisor use), daily email HTML (distribution-ready), Excel summary
The pipeline handles a multi-team contact center with both outbound (dialer-driven) and inbound (queue-driven) channels, including agents working both simultaneously.
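The tiering step above can be sketched as a small function. This is a hypothetical illustration: the threshold values and the exact precedence of flags here are assumptions, not the notebook's actual targets.

```python
# Illustrative tiering sketch. All thresholds are made up for the example;
# the real notebook derives them from its config cell.
def tier_agent(dph, dph_target, away_min, away_threshold_min, wrap_gap):
    """Return a coaching tier from a few per-agent KPIs.

    dph may be None when suppressed (agent below the productive-time floor).
    """
    if dph is not None and dph < 0.7 * dph_target:
        return "Immediate Attention"
    if (dph is not None and dph < dph_target) \
            or away_min > away_threshold_min or wrap_gap > 0:
        return "Watch"
    return "On Track"
```

An agent well under target lands in "Immediate Attention"; one slightly under target, or over the away-time threshold, lands in "Watch".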
```
input/                                      # Raw telephony platform exports (Excel, skiprows=6)
    login.xlsx                              # Agent login/logout times
    interactions.xlsx                       # Wrap code dispositions (OB + IB calls)
    statusCode.xlsx                         # Per-agent status code activity
    timeStatus.xlsx                         # Time-on-status aggregates
    leads_daily.xlsx                        # Presented/Accepted/Rejected/Abandoned by agent
staging/                                    # Intermediate files (cleaned, merged)
output/                                     # Final deliverables + master history
contact_center_analysis_patched_v9.1.ipynb  # Main pipeline (44 cells, paired code/markdown)
generate_synthetic_data.py                  # Synthetic data generator
sanitize_outputs.py                         # Output sanitizer
```
| Cell | Role |
|---|---|
| 1–3 | Config, imports, wrap code classification sets |
| 4–9 | Data ingestion and cleaning (one cell per source file) |
| 10 | KPI calculation — all per-agent metrics derived here |
| 11 | Flag logic and agent tiering |
| 12 | 58-test validation suite — data quality checks across 9 sections |
| 13 | Coaching alert PDF generator |
| 14–16 | Excel output, master history append, agent reconciliation checks |
| 17 | Email HTML + weekly rollup generator |
| 18+ | Ad hoc analysis cells |
Wrap code classification over duration inference. Contact vs. voicemail is determined by exact wrap code match, not call length heuristics. This eliminates misclassification at the tails (very short contacts, unusually long voicemails).
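Set-membership classification can be sketched as below. The wrap code strings here are invented for the example; the notebook's config cell defines the real sets.

```python
# Illustrative wrap-code sets; the actual codes live in the notebook's
# config cell and will differ.
LIVE_CONTACT = {"Sale", "Callback Scheduled", "Not Interested"}
VOICEMAIL = {"Left VM", "VM - No Message"}
DEAD_AIR = {"Dead Air"}
EXCLUDED = {"Test Call", "Training"}

def classify_call(wrap_code: str) -> str:
    """Classify by exact wrap code match, never by call duration."""
    if wrap_code in LIVE_CONTACT:
        return "contact"
    if wrap_code in VOICEMAIL:
        return "voicemail"
    if wrap_code in DEAD_AIR:
        return "dead_air"
    if wrap_code in EXCLUDED:
        return "excluded"
    return "unknown"  # surfaced by the unknown-call-rate check in Cell 12
```

Because classification is exact-match, a 20-second live contact and a 3-minute voicemail are both classified correctly, which a duration cutoff cannot guarantee.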
Productive DPH, not raw. The primary DPH metric excludes working offline, meetings, 1:1s, and training from the denominator. Raw DPH is shown for reference. Agents below a productive-time floor have DPH suppressed to N/A rather than showing a misleadingly inflated number.
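A minimal sketch of the suppression rule, assuming a hypothetical 2-hour productive-time floor (the notebook's actual floor may differ):

```python
def productive_dph(dispositions: int, productive_hours: float,
                   floor_hours: float = 2.0):
    """Dispositions per productive hour, suppressed below a time floor.

    floor_hours is an assumed value for illustration. Returning None lets
    downstream outputs render "N/A" with an explanatory note instead of a
    misleadingly inflated ratio.
    """
    if productive_hours < floor_hours:
        return None
    return dispositions / productive_hours
```

For example, 3 dispositions in half an hour would otherwise read as 6.0 DPH, far above what a full shift would show; suppression avoids reporting that number.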
OB/IB channel separation throughout. Metrics, flags, and outputs treat outbound and inbound as distinct workstreams. Agents on both channels appear in both sections of every output.
Wrap compliance gap sourced from Leads Daily, not wrap file. The presented call count comes from the telephony platform's Leads Daily export (authoritative), not summed from the wrap code file. This avoids drift from partial dispositions and correctly scopes the gap to OB calls only.
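The gap computation can be sketched with pandas. Column names (`agent_id`, `presented`) and the sample counts are illustrative assumptions; the point is that the presented side comes from Leads Daily, not from summing the wrap file.

```python
import pandas as pd

# Hypothetical data: Leads Daily says how many OB calls were presented;
# the wrap file has one row per dispositioned OB call.
leads = pd.DataFrame({"agent_id": ["A01", "A02"], "presented": [40, 55]})
wraps = pd.DataFrame({"agent_id": ["A01"] * 38 + ["A02"] * 55})

dispositioned = wraps.groupby("agent_id").size().rename("dispositioned")
gap = leads.set_index("agent_id").join(dispositioned).fillna(0)
gap["wrap_gap"] = gap["presented"] - gap["dispositioned"]
print(gap["wrap_gap"].to_dict())  # {'A01': 2, 'A02': 0}
```

Agent A01 was presented 40 calls but dispositioned only 38, so the gap is 2; summing the wrap file alone could never reveal those two missing dispositions.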
Master history append. Each daily run appends to output/master_history.xlsx, enabling weekly rollup emails without a separate data store.
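The append step reduces to a small pure function plus Excel I/O. This sketch separates the concat logic so it can be shown without touching files; in the notebook the history would be loaded with `pd.read_excel` and written back with `DataFrame.to_excel`.

```python
import pandas as pd

def append_history(history, day_df):
    """Append one day's KPI rows to the running master history.

    history is None on the first run (no master file yet); afterwards it
    is the DataFrame read from output/master_history.xlsx.
    """
    if history is None:
        return day_df.copy()
    return pd.concat([history, day_df], ignore_index=True)
```

Keeping state in one appended workbook is what lets the weekly rollup email run without any database.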
The output/ directory contains sample outputs generated from synthetic data:
Coaching Alert PDF — one page per supervisor showing agent tiers, flag reasons, and a team metrics snapshot. Sorted by tier within each channel section.
Daily Email HTML — KPI boxes for OB and IB snapshots, per-agent performance tables with color-coded handle time zones and DPH flags, hourly call volume breakdown, and methodology footnotes.
Handle time color coding:
| Color | Range | Meaning |
|---|---|---|
| 🟥 Red | < 6 min or > 18 min | Outside acceptable range |
| 🟨 Yellow | 6–9 min or 15–18 min | Approaching boundary |
| 🟩 Green | 9–15 min | Target range |
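The table above maps directly to a zone function. The boundary handling (values exactly at 9 or 15 counted as green) is an assumption for the sketch.

```python
def handle_time_zone(median_min: float) -> str:
    """Color-code a median handle time (minutes) per the zone table."""
    if median_min < 6 or median_min > 18:
        return "red"     # outside acceptable range
    if median_min < 9 or median_min > 15:
        return "yellow"  # approaching boundary
    return "green"       # target range
```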
No real data is present anywhere in this repository. The repository was sanitized before publication:
- All real agent names replaced with `Agent_A01`, `Agent_B02`, etc.
- All real team names replaced with `Team_Alpha`, `Team_Bravo`, etc.
- All agent IDs replaced with synthetic hashes
- All company names, file paths, and internal references removed from notebook cells and outputs
`generate_synthetic_data.py` produces realistic telephony platform exports that exactly mirror the structure the notebook expects — including the 6-row platform metadata header that the notebook skips on ingest.
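The skipped header works the same way as pandas' `skiprows` on any tabular source. The real inputs are Excel and would be read with `pd.read_excel(..., skiprows=6)`; an in-memory CSV is used here only so the example is self-contained, and the column names are illustrative.

```python
import io
import pandas as pd

# Six metadata rows (report name, timestamp, blanks) precede the real header,
# mimicking the platform export layout.
raw = "\n".join([
    "8x8 Export", "Generated: 2026-01-15", "", "Report: Interactions", "", "",
    "agent_id,wrap_code,duration_sec",
    "A01,Left VM,42",
    "A01,Sale,910",
])

# skiprows=6 discards the metadata block so row 7 becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=6)
print(df.shape)  # (2, 3)
```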
```bash
# Generate one day of data (default: today)
python generate_synthetic_data.py

# Generate a specific date
python generate_synthetic_data.py --date 2026-01-15

# Generate multiple days (archives dated copies, writes current set)
python generate_synthetic_data.py --days 5 --date 2026-01-13
```

The generator produces five files matching the notebook's expected input schema: `login.xlsx`, `interactions.xlsx`, `statusCode.xlsx`, `timeStatus.xlsx`, `leads_daily.xlsx`.
A shared absence set ensures consistent agent rosters across all five files for a given day — roughly 10% of agents are absent on any given generated day.
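The shared-absence idea can be sketched as drawing one per-day absent set and reusing it for every file. The roster names, seeding scheme, and exact 10% math here are assumptions, not the generator's actual implementation.

```python
import random

# Build an illustrative 30-agent roster.
roster = [f"Agent_A{i:02d}" for i in range(1, 31)]

# Seed by date so the same day always produces the same absences,
# keeping all five generated files consistent with each other.
rng = random.Random("2026-01-15")
absent = set(rng.sample(roster, k=max(1, len(roster) // 10)))  # ~10% absent
present = [a for a in roster if a not in absent]
```

Because each file generator filters on the same `absent` set, an agent missing from `login.xlsx` is also missing from the other four exports for that day.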
```bash
pip install -r requirements.txt
```

Dependencies include: pandas, openpyxl, jinja2, reportlab, jupyter, numpy.
1. Generate synthetic input data (or drop real exports into `input/`):

   ```bash
   python generate_synthetic_data.py
   ```

2. Open the notebook:

   ```bash
   jupyter notebook contact_center_analysis_patched_v9.1.ipynb
   ```

3. Run all cells. The pipeline is designed to run top-to-bottom without manual intervention. After Cell 12, review the test suite output — all 58 tests should pass before proceeding.

4. Outputs are written to `output/`:
   - `coaching_alert_YYYY-MM-DD.pdf`
   - `email_daily_YYYY-MM-DD.html`
   - `email_weekly_YYYY-MM-DD.html` (if master history exists)
   - `master_history.xlsx` (appended)
Cell 12 runs 58 unit tests across 9 sections before any output is produced:
| Section | Checks |
|---|---|
| Data Load | Row counts, required columns, null agent IDs |
| Call Classification | Coverage, contact/VM presence, unknown call rate |
| Handle Time | Duration bounds, median ranges by disposition |
| Daily Aggregation | Deduplication, call count reconciliation |
| Calls Per Hour | CPH bounds and spot-checks |
| KPI Flags | Flag column existence and internal consistency |
| Median Handle Time | Column existence, skew direction |
| Time-on-Status | Required columns present |
| Raw-to-KPI Fidelity | OB/IB totals, contact rates, logged hours within tolerance |
If any test fails, a summary is printed with the failure reason. Do not generate outputs while any test is failing.
This pipeline mirrors a production system running on a multi-team contact center operation. The production version runs on a scheduled basis against live 8x8 exports, with outputs distributed to supervisors via automated email. The architecture deliberately avoids a database dependency — all state lives in Excel files and the master history workbook — to match the operational constraints of the environment it was built for.
The patterns here (wrap code classification, KPI flag logic, DPH suppression, OB/IB channel separation) are directly portable to a Databricks/Delta Lake stack, where the notebook cells map naturally to a medallion architecture: raw ingest → cleaned staging → KPI gold layer → output generation.
MIT