# Urban Cities Live: Intelligence and Reporting

## Contents
- Project Overview
- Problem Statement
- Solution Summary
- High-Level Architecture
- Architecture Diagram
- Data Pipeline Logic
- Project Structure
- Prerequisites
- How to Run Locally
- Access Services
- Outputs
- Roadmap & Enhancements
## Project Overview

Urban Intelligence (CivicPulse) is an end-to-end cloud data pipeline built to deliver reliable, transparent, and explainable city-service performance metrics.
The platform ingests public NYC 311 non-emergency service request data and transforms it into near real-time operational intelligence for city agencies, field operations, dispatch teams, and executive leadership.
This project is designed to help stakeholders monitor city-service performance using trusted and analytics-ready datasets built through a modern cloud architecture.
NYC 311 API → Apache Airflow → Azure Blob Storage → Azure Data Factory → Azure PostgreSQL → Power BI
Strong data-quality checks, lineage tracking, and clearly defined KPIs ensure that insights are accurate, auditable, and trusted.
The final Power BI dashboards provide timely visibility into service trends and operational performance, supporting fair and data-driven decision-making with a resident-first focus.
## Problem Statement

City agencies face increasing demand for live, explainable, and trustworthy insights into non-emergency service delivery.
However, public service operations often face several data challenges:
- Fragmented and delayed reporting from raw 311 datasets
- Limited visibility into service backlog, SLA compliance, and service equity
- Data latency during peak periods such as storms, holidays, and major incidents
- Lack of automated monitoring, validation, and data lineage
- Reduced confidence in operational reporting due to inconsistent or delayed data
Without a robust analytics pipeline, operational decisions become slower, riskier, and harder to justify.
CivicPulse addresses these issues by creating a scalable, transparent, and cloud-native data platform that transforms raw public data into trusted operational intelligence.
## Solution Summary

CivicPulse delivers a modern end-to-end analytics solution for city-service reporting:
- Scalable near real-time ingestion from the NYC 311 API
- Automated orchestration, retries, and SLA monitoring using Apache Airflow
- Raw and clean landing zones in Azure Blob Storage
- Managed data movement and transformation using Azure Data Factory
- Curated analytics-ready tables in Azure PostgreSQL
- Self-service Power BI dashboards for operational and leadership reporting
- Infrastructure provisioning and environment consistency using Terraform
The solution prioritises:
- Data trust
- Transparency
- Operational usability
- Reproducibility
- Faster decision-making
- Explainable KPI reporting
CivicPulse helps convert raw resident requests into actionable insights across:
- Request volumes
- Backlogs
- SLA performance
- Resolution times
- Geographic service distribution
## High-Level Architecture

The pipeline follows a modern API → Lake → Warehouse → Analytics architecture pattern.
| Component | Purpose |
|---|---|
| NYC 311 API | Source system for non-emergency service request data |
| Apache Airflow | Workflow orchestration, retries, scheduling, SLA monitoring |
| Azure Blob Storage | Raw and clean landing zone for pipeline data |
| Azure Data Factory | Managed ETL and movement into curated layers |
| Azure PostgreSQL | Analytics-ready database for structured reporting |
| Power BI | Dashboarding and operational insight consumption |
| Terraform | Infrastructure as Code for Azure resource provisioning |
## Architecture Diagram

API → Airflow → Azure Blob Storage → Azure Data Factory → Azure PostgreSQL → Power BI
- Python – API integration, schema handling, validation, and transformations
- Apache Airflow – orchestration backbone for scheduling, retries, and alerts
- Azure Blob Storage – cloud landing zone for raw and clean files
- Azure Data Factory – managed orchestration for transformation and database loading
- Azure PostgreSQL – curated storage for typed staging and analytics marts
- Power BI – operational dashboards and KPI reporting
- Terraform – Azure infrastructure provisioning using code
## Data Pipeline Logic

The CivicPulse pipeline is designed as a staged data workflow that supports reliability, traceability, and analytics readiness.
### Extraction

Python scripts connect to the NYC 311 public API and extract service request records.
Typical extraction responsibilities include:
- API connection handling
- Pagination
- Error handling
- Schema capture
- Raw response retrieval
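The pagination and error-handling responsibilities above can be sketched as a small, testable helper. This is an illustrative sketch, not the project's actual `api/extract.py`: the page-fetching function is injectable, and `fetch_311_page` shows a hypothetical Socrata-style call (the real endpoint comes from the `API_URL` environment variable).

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], list[dict]],
             page_size: int = 1000) -> Iterator[dict]:
    """Yield records from an offset-paginated API until an empty or short page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            # A short page means the source is exhausted.
            break
        offset += page_size


def fetch_311_page(limit: int, offset: int) -> list[dict]:
    """Hypothetical Socrata-style fetch; endpoint and params are assumptions."""
    import requests
    resp = requests.get(
        "https://data.cityofnewyork.us/resource/erm2-nwe9.json",
        params={"$limit": limit, "$offset": offset},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors to the orchestrator
    return resp.json()
```

Keeping the fetcher injectable makes the pagination loop unit-testable without network access.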
### Raw Storage

The extracted raw data is stored in the raw landing zone in Azure Blob Storage.
Purpose of raw storage:
- Preserve original source records
- Support auditability and reprocessing
- Maintain lineage from source to reporting layer
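Auditability and reprocessing depend on predictable, date-partitioned blob names. The path builder below is a sketch of that idea (the `landing` container name and the commented upload snippet are assumptions, not the project's actual code):

```python
from datetime import datetime, timezone


def raw_blob_path(run_time: datetime, prefix: str = "raw2") -> str:
    """Build a date-partitioned blob name so each extract run is
    individually addressable for lineage checks and replays."""
    return f"{prefix}/{run_time:%Y/%m/%d}/requests_{run_time:%H%M%S}.json"


# Upload sketch (assumes azure-storage-blob and the connection string
# from the .env file); commented out to keep the example self-contained:
#
# from azure.storage.blob import BlobServiceClient
# service = BlobServiceClient.from_connection_string(conn_str)
# blob = service.get_blob_client(container="landing",
#                                blob=raw_blob_path(datetime.now(timezone.utc)))
# blob.upload_blob(payload, overwrite=True)
```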
Example raw path: `storage/blob/raw2/`
### Transformation

The raw dataset is cleaned and standardised before loading into downstream services.
Typical transformation activities include:
- Column standardisation
- Data type casting
- Null handling
- Filtering invalid records
- Date formatting
- Basic business rule enforcement
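The transformation activities above can be illustrated with a small pandas pass. This is a sketch only: the column names (`Unique Key`, `Created Date`, `Status`) are assumptions based on the public 311 schema, not the project's actual `api/transform_data.py`:

```python
import pandas as pd


def clean_requests(raw: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass over raw 311 records."""
    df = raw.copy()
    # Column standardisation: lower-case, snake_case headers.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Data type casting and date formatting; unparseable dates become NaT.
    df["created_date"] = pd.to_datetime(df["created_date"], errors="coerce")
    # Null handling / filtering invalid records: a request without a key
    # or a creation date cannot be reported on.
    df = df.dropna(subset=["unique_key", "created_date"])
    # Basic business rule enforcement: normalise status casing.
    df["status"] = df["status"].str.title()
    return df.reset_index(drop=True)
```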
The clean dataset is then stored in the clean landing zone: `storage/blob/clean2/`
### Load into PostgreSQL

Azure Data Factory moves clean data from Blob Storage into Azure PostgreSQL.
This stage typically includes:
- Data mapping
- Incremental or batch loading
- Structured table loading
- Curated layer preparation
- Error logging and monitoring
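For illustration, a trimmed sketch of what an ADF copy activity for this stage could look like. The activity and dataset names here are placeholders, not taken from the project's `adf/` folder:

```json
{
  "name": "CopyCleanToPostgres",
  "type": "Copy",
  "inputs": [{ "referenceName": "CleanBlobDataset", "type": "DatasetReference" }],
  "outputs": [{ "referenceName": "PostgresStagingDataset", "type": "DatasetReference" }],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzurePostgreSqlSink", "writeBatchSize": 10000 }
  }
}
```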
Azure PostgreSQL stores the curated analytics-ready datasets used by Power BI.
### Reporting

Power BI connects to Azure PostgreSQL and presents dashboards for operational and leadership consumption.
Example insight areas include:
- Service volumes by borough
- Complaint type trends
- Open vs closed request status
- SLA compliance performance
- Resolution time analysis
- Backlog ageing
- Daily and weekly trend analysis
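As an illustration, the SLA compliance insight could be backed by a DAX measure along these lines; the `Requests` table and `ClosedWithinSLA` column are hypothetical names, not the project's actual model (see `powerbi/measures.md`):

```dax
SLA Compliance % =
DIVIDE (
    CALCULATE ( COUNTROWS ( Requests ), Requests[ClosedWithinSLA] = TRUE () ),
    COUNTROWS ( Requests )
)
```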
## Project Structure

```text
urban-intelligence/
├── README.md
├── .gitignore
├── .env
├── requirements.txt
│
├── api/
│   ├── api_connect.py
│   ├── auth.py
│   ├── extract.py
│   ├── load_to_data_lake.py
│   ├── transform_data.py
│   └── load_clean_data_to_data_lake.py
│
├── airflow/
│   ├── dags/
│   │   └── civicpulse_dag.py
│   └── airflow_config.md
│
├── storage/
│   └── blob/
│       ├── raw2/
│       └── clean2/
│
├── adf/
│   ├── pipelines/
│   ├── datasets/
│   └── linked_services/
│
├── database/
│   └── postgres/
│       ├── schema.sql
│       ├── staging.sql
│       └── marts.sql
│
├── powerbi/
│   ├── dashboards/
│   ├── model.md
│   └── measures.md
│
├── terraform/
│   ├── main.tf
│   ├── provider.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── terraform.tfvars.example
│
└── docs/
    ├── architecture.md
    ├── data_lineage.md
    ├── runbook.md
    └── images/
        └── architecture.png
```
- api/ – contains Python scripts for extraction, transformation, and loading
- airflow/ – contains DAGs for orchestration
- storage/ – represents Blob landing zones
- adf/ – stores Azure Data Factory assets
- database/ – stores SQL scripts for PostgreSQL schema and curated models
- powerbi/ – contains reporting logic and dashboard documentation
- terraform/ – contains Infrastructure as Code definitions
- docs/ – stores project documentation and architecture assets
## Prerequisites

Before running this project locally, make sure the following tools and services are available.

### Tools
- Python 3.10 or above
- Git
- Docker Desktop
- Terraform
- Azure CLI
- Power BI Desktop
- Apache Airflow environment (local or Docker-based)

### Azure Services
- Azure Subscription
- Azure Storage Account
- Azure Data Factory
- Azure Database for PostgreSQL
- Power BI environment

### Recommended Knowledge
- Basic Python scripting
- SQL
- Azure fundamentals
- Power BI reporting
- Terraform basics

### Accounts
- Azure Subscription
- GitHub
## How to Run Locally

### Clone the Repository
```bash
git clone https://github.com/madesina2025/urban-intelligence.git
cd urban-intelligence
```
### Create Python Environment
```bash
python -m venv venv
source venv/bin/activate
```
### Install Dependencies
```bash
pip install -r requirements.txt
```
### Configure Environment Variables
Create a `.env` file in the project root and add the following variables:
```bash
API_URL=
AZURE_STORAGE_CONNECTION_STRING=
POSTGRES_HOST=
POSTGRES_DB=
POSTGRES_USER=
POSTGRES_PASSWORD=
```
### Provision Azure Infrastructure
Navigate to the Terraform directory:
```bash
cd terraform
```
Initialize Terraform:
```bash
terraform init
```
Preview infrastructure changes:
```bash
terraform plan
```
Apply the infrastructure:
```bash
terraform apply
```
This will provision:
- Resource Group
- Blob Storage
- Azure Data Factory
- Azure PostgreSQL
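For reference, a minimal sketch of the style of resources `terraform/main.tf` might declare; the resource names, region, and SKU below are placeholder assumptions, not the project's actual values:

```hcl
resource "azurerm_resource_group" "civicpulse" {
  name     = "rg-civicpulse"
  location = "eastus"
}

resource "azurerm_storage_account" "landing" {
  name                     = "civicpulselanding"
  resource_group_name      = azurerm_resource_group.civicpulse.name
  location                 = azurerm_resource_group.civicpulse.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```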
### Run Data Extraction
```bash
python api/api_connect.py
python api/extract.py
python api/load_to_data_lake.py
```

### Run Data Transformation
```bash
python api/transform_data.py
python api/load_clean_data_to_data_lake.py
```

### Trigger the Airflow DAG
Enable and trigger the `civicpulse_dag` DAG; Airflow orchestrates the pipeline execution.
