Contact Center Analytics Pipeline

A production-grade Jupyter notebook pipeline for daily contact center reporting. It ingests raw telephony platform exports and produces a supervisor coaching alert PDF, a daily performance email (HTML), and an Excel summary, fully automated with no manual steps.

Note on data: This repository contains no real agent data, PII, or company identifiers. All sample outputs were generated using the included synthetic data generator. See Data & Privacy below.


What It Does

The pipeline ingests five Excel exports from an 8x8-style telephony platform and runs them through a structured sequence of transformations:

  • Call classification — wrap codes are matched against defined sets (live contact, voicemail, dead air, excluded) to produce clean OB/IB channel splits
  • KPI calculation — per-agent metrics including productive DPH, median handle time by disposition type, wrap compliance gap, away time, on-break %, and IB contact rate
  • Flag logic — agents are tiered (Immediate Attention / Watch / On Track) based on DPH target, handle time zones, away time threshold, and wrap compliance
  • DPH suppression — agents below a productive-time threshold have DPH shown as N/A with an explanatory note, avoiding misleading ratios
  • Output generation — coaching alert PDF (supervisor use), daily email HTML (distribution-ready), Excel summary

The pipeline handles a multi-team contact center with both outbound (dialer-driven) and inbound (queue-driven) channels, including agents working both simultaneously.
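The classification step can be sketched as an exact-set lookup. The wrap code sets and row values below are illustrative assumptions, not the notebook's real configuration:

```python
# Illustrative wrap code sets -- the notebook's config cells define
# the real ones for the 8x8-style platform.
LIVE_CONTACT = {"Sale", "Callback Scheduled", "Not Interested"}
VOICEMAIL = {"Left VM", "VM - No Message"}
DEAD_AIR = {"Dead Air"}
EXCLUDED = {"Test Call"}

def classify_wrap(code):
    """Map a wrap code to a call category by exact match,
    never by call-duration heuristics."""
    if code in LIVE_CONTACT:
        return "contact"
    if code in VOICEMAIL:
        return "voicemail"
    if code in DEAD_AIR:
        return "dead_air"
    if code in EXCLUDED:
        return "excluded"
    return "unknown"  # surfaced later by the validation suite

# Each (agent, direction, wrap_code) row gets a category, which later
# cells use to build clean OB/IB channel splits.
rows = [("A01", "OB", "Sale"), ("A01", "OB", "Left VM"),
        ("B02", "IB", "Dead Air")]
classified = [(agent, direction, classify_wrap(code))
              for agent, direction, code in rows]
```

Because the match is exact rather than duration-based, an unusually short live contact or a long voicemail lands in the right bucket regardless of call length.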


Architecture

input/                        # Raw telephony platform exports (Excel, skiprows=6)
  login.xlsx                  # Agent login/logout times
  interactions.xlsx           # Wrap code dispositions (OB + IB calls)
  statusCode.xlsx             # Per-agent status code activity
  timeStatus.xlsx             # Time-on-status aggregates
  leads_daily.xlsx            # Presented/Accepted/Rejected/Abandoned by agent

staging/                      # Intermediate files (cleaned, merged)
output/                       # Final deliverables + master history

contact_center_analysis_patched_v9.1.ipynb   # Main pipeline (44 cells, paired code/markdown)
generate_synthetic_data.py                   # Synthetic data generator
sanitize_outputs.py                          # Output sanitizer

Notebook Structure (44 cells, 1:1 code/markdown pairs)

Cell    Role
1–3     Config, imports, wrap code classification sets
4–9     Data ingestion and cleaning (one cell per source file)
10      KPI calculation — all per-agent metrics derived here
11      Flag logic and agent tiering
12      58-test validation suite — data quality checks across 9 sections
13      Coaching alert PDF generator
14–16   Excel output, master history append, agent reconciliation checks
17      Email HTML + weekly rollup generator
18+     Ad hoc analysis cells

Key Design Decisions

Wrap code classification over duration inference. Contact vs. voicemail is determined by exact wrap code match, not call length heuristics. This eliminates misclassification at the tails (very short contacts, unusually long voicemails).

Productive DPH, not raw. The primary DPH metric excludes working offline, meetings, 1:1s, and training from the denominator. Raw DPH is shown for reference. Agents below a productive-time floor have DPH suppressed to N/A rather than showing a misleadingly inflated number.
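A minimal sketch of that rule, assuming illustrative status names and a 2-hour productive-time floor (the notebook defines the real statuses and threshold):

```python
# Statuses excluded from the productive denominator (illustrative).
NON_PRODUCTIVE = {"Working Offline", "Meeting", "1:1", "Training"}
MIN_PRODUCTIVE_HOURS = 2.0  # assumed floor, not the notebook's actual value

def productive_dph(dispositions, status_hours):
    """Dispositions per productive hour, or None (rendered as N/A)
    when the agent's productive time is below the floor."""
    productive = sum(hours for status, hours in status_hours.items()
                     if status not in NON_PRODUCTIVE)
    if productive < MIN_PRODUCTIVE_HOURS:
        return None  # suppress rather than show an inflated ratio
    return dispositions / productive
```

An agent with 30 dispositions over 5 productive hours scores 6.0; an agent with 5 dispositions but only 1 productive hour is suppressed to N/A instead of showing a misleading 5.0.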

OB/IB channel separation throughout. Metrics, flags, and outputs treat outbound and inbound as distinct workstreams. Agents on both channels appear in both sections of every output.

Wrap compliance gap sourced from Leads Daily, not wrap file. The presented call count comes from the telephony platform's Leads Daily export (authoritative), not summed from the wrap code file. This avoids drift from partial dispositions and correctly scopes the gap to OB calls only.
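A sketch of that gap calculation; the column names and sample values are assumptions based on this description:

```python
import pandas as pd

# Presented counts from Leads Daily (authoritative for OB volume).
leads = pd.DataFrame({"agent_id": ["A01", "B02"], "presented": [40, 25]})

# Wrap file rows: some presented calls may never get a disposition.
wraps = pd.DataFrame({
    "agent_id": ["A01", "A01", "B02"],
    "direction": ["OB", "OB", "OB"],
    "wrap_code": ["Sale", "Left VM", "Sale"],
})

# Count OB dispositions per agent, then diff against the
# authoritative presented count rather than summing the wrap file.
dispositioned = (wraps[wraps["direction"] == "OB"]
                 .groupby("agent_id").size().rename("dispositioned"))
gap = leads.set_index("agent_id").join(dispositioned).fillna(0)
gap["wrap_gap"] = gap["presented"] - gap["dispositioned"]
```

Filtering to OB rows before counting is what scopes the gap to outbound calls only.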

Master history append. Each daily run appends to output/master_history.xlsx, enabling weekly rollup emails without a separate data store.
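The append can be sketched as a read-concat-write cycle against the workbook, with the pure merge step separated out for clarity:

```python
from pathlib import Path
import pandas as pd

def merge_history(existing, daily):
    """Concatenate today's KPI rows onto the existing history."""
    if existing is None:
        return daily.copy()
    return pd.concat([existing, daily], ignore_index=True)

def append_history(daily, path):
    """Read the master workbook if present, append today's rows,
    and rewrite it in place (e.g. output/master_history.xlsx)."""
    path = Path(path)
    existing = pd.read_excel(path) if path.exists() else None
    history = merge_history(existing, daily)
    history.to_excel(path, index=False)
    return history
```

The weekly rollup reads the same workbook back, which is why no separate data store is needed.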


Sample Outputs

The output/ directory contains sample outputs generated from synthetic data:

Coaching Alert PDF — one page per supervisor showing agent tiers, flag reasons, and a team metrics snapshot. Sorted by tier within each channel section.

Daily Email HTML — KPI boxes for OB and IB snapshots, per-agent performance tables with color-coded handle time zones and DPH flags, hourly call volume breakdown, and methodology footnotes.

Handle time color coding:

Color      Range                  Meaning
🟥 Red     < 6 min or > 18 min    Outside acceptable range
🟨 Yellow  6–9 min or 15–18 min   Approaching boundary
🟩 Green   9–15 min               Target range
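The table reduces to a small zone function; treating the shared boundaries (exactly 9 and 15 minutes) as green is an assumption, since the table leaves them ambiguous:

```python
def handle_time_zone(minutes):
    """Map a median handle time (in minutes) to its color zone."""
    if minutes < 6 or minutes > 18:
        return "red"     # outside acceptable range
    if minutes < 9 or minutes > 15:
        return "yellow"  # approaching a boundary
    return "green"       # target range (boundaries counted as green)
```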

Data & Privacy

No real data is present anywhere in this repository. The repository was sanitized before publication:

  • All real agent names replaced with Agent_A01, Agent_B02, etc.
  • All real team names replaced with Team_Alpha, Team_Bravo, etc.
  • All agent IDs replaced with synthetic hashes
  • All company names, file paths, and internal references removed from notebook cells and outputs

Synthetic Data Generator

generate_synthetic_data.py produces realistic telephony platform exports that exactly mirror the structure the notebook expects — including the 6-row platform metadata header that the notebook skips on ingest.

# Generate one day of data (default: today)
python generate_synthetic_data.py

# Generate a specific date
python generate_synthetic_data.py --date 2026-01-15

# Generate multiple days (archives dated copies, writes current set)
python generate_synthetic_data.py --days 5 --date 2026-01-13

The generator produces five files matching the notebook's expected input schema: login.xlsx, interactions.xlsx, statusCode.xlsx, timeStatus.xlsx, leads_daily.xlsx.

A shared absence set ensures consistent agent rosters across all five files for a given day — roughly 10% of agents are absent on any given generated day.
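The matching ingest side can be sketched as one helper that drops the 6-row metadata header on every export (file names from the Architecture section; everything past `skiprows` is the platform's own column layout):

```python
import pandas as pd

# The five exports the notebook ingests from input/.
INPUT_FILES = ["login.xlsx", "interactions.xlsx", "statusCode.xlsx",
               "timeStatus.xlsx", "leads_daily.xlsx"]

def load_export(name, input_dir="input"):
    """Read one telephony export, skipping the 6-row platform
    metadata header that precedes the real column row."""
    return pd.read_excel(f"{input_dir}/{name}", skiprows=6)
```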


Running the Pipeline

Requirements

pip install -r requirements.txt

Dependencies include: pandas, openpyxl, jinja2, reportlab, jupyter, numpy.

Steps

  1. Generate synthetic input data (or drop real exports into input/):

    python generate_synthetic_data.py
  2. Open the notebook:

    jupyter notebook contact_center_analysis_patched_v9.1.ipynb
  3. Run all cells. The pipeline is designed to run top-to-bottom without manual intervention. After Cell 12, review the test suite output — all 58 tests should pass before proceeding.

  4. Outputs are written to output/:

    • coaching_alert_YYYY-MM-DD.pdf
    • email_daily_YYYY-MM-DD.html
    • email_weekly_YYYY-MM-DD.html (if master history exists)
    • master_history.xlsx (appended)

Validation Suite (Cell 12)

Cell 12 runs 58 unit tests across 9 sections before any output is produced:

Section                Checks
Data Load              Row counts, required columns, null agent IDs
Call Classification    Coverage, contact/VM presence, unknown call rate
Handle Time            Duration bounds, median ranges by disposition
Daily Aggregation      Deduplication, call count reconciliation
Calls Per Hour         CPH bounds and spot-checks
KPI Flags              Flag column existence and internal consistency
Median Handle Time     Column existence, skew direction
Time-on-Status         Required columns present
Raw-to-KPI Fidelity    OB/IB totals, contact rates, logged hours within tolerance

If any test fails, a summary is printed with the failure reason. Do not generate outputs while any test is failing.
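The suite's shape can be sketched as named (check, passed, reason) triples collected into a failure summary; the check names here are illustrative, not the notebook's actual 58 tests:

```python
def run_checks(checks):
    """Run (name, passed, reason) triples; print and return failures."""
    failures = [(name, reason) for name, passed, reason in checks
                if not passed]
    for name, reason in failures:
        print(f"FAIL: {name} -- {reason}")
    return failures

# Two illustrative checks in the style of the Data Load and
# Call Classification sections.
checks = [
    ("Data Load: login rows > 0", True, "login.xlsx returned 0 rows"),
    ("Classification: unknown rate < 2%", True, "too many unknown codes"),
]
failures = run_checks(checks)  # empty list when everything passes
```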


Production Context

This pipeline mirrors a production system serving a multi-team contact center operation. The production version runs on a schedule against live 8x8 exports, with outputs distributed to supervisors via automated email. The architecture deliberately avoids a database dependency — all state lives in Excel files and the master history workbook — to match the operational constraints of the environment it was built for.

The patterns here (wrap code classification, KPI flag logic, DPH suppression, OB/IB channel separation) are directly portable to a Databricks/Delta Lake stack, where the notebook cells map naturally to a medallion architecture: raw ingest → cleaned staging → KPI gold layer → output generation.


License

MIT