From 75cf4e49b05577c74aac4698ad9814f84b66f545 Mon Sep 17 00:00:00 2001 From: Bryan Melvida <126201239+BLMgithub@users.noreply.github.com> Date: Tue, 19 May 2026 01:49:07 +0800 Subject: [PATCH] docs: simplify and update all project docs --- README.md | 238 +++++++----------- assets/benchmarks/polars/README.md | 4 +- docs/data_extract/drive_extractor.md | 78 +++--- docs/data_pipeline/assembly_stage.md | 100 ++++---- docs/data_pipeline/contract_stage.md | 102 ++++---- docs/data_pipeline/pipeline_orchestrator.md | 96 +++---- docs/data_pipeline/publishing_stage.md | 66 ++--- docs/data_pipeline/semantic_stage.md | 86 +++---- docs/data_pipeline/validation_stage.md | 91 ++++--- docs/terraform/gcp-iac.md | 105 ++++---- .../customer_experience/dax_dictionary.md | 54 ++-- .../customer_experience/operational_guide.md | 40 +-- .../technical_architecture.md | 60 ++--- .../fulfillment_monitor/dax_dictionary.md | 76 +++--- .../fulfillment_monitor/operational_guide.md | 45 ++-- .../technical_architecture.md | 68 ++--- .../docs/product_friction/dax_dictionary.md | 134 ++++------ .../product_friction/operational_guide.md | 43 ++-- .../technical_architecture.md | 75 ++++++ .../technical_architecutre.md | 71 ------ 20 files changed, 757 insertions(+), 875 deletions(-) create mode 100644 power_bi/docs/product_friction/technical_architecture.md delete mode 100644 power_bi/docs/product_friction/technical_architecutre.md diff --git a/README.md b/README.md index 91cfe62..a30e98b 100644 --- a/README.md +++ b/README.md @@ -6,195 +6,137 @@ [![CD * Data Extractor](https://github.com/BLMgithub/operations-analytics-pipeline/actions/workflows/cd-extract.yml/badge.svg)](https://github.com/BLMgithub/operations-analytics-pipeline/actions/workflows/cd-extract.yml) ## Overview -Small to mid-sized organizations are trapped in a cycle where they outgrow the capabilities of spreadsheets but lack the technical infrastructure to migrate to databases. This creates a "structural debt" where the very tools that allow a business to be agile (spreadsheets) are the same tools that make their data fundamentally untrustworthy for scaling or reporting. +Organizations outgrowing spreadsheet-based workflows often face technical barriers when migrating to relational databases. This transition can lead to inconsistent data structures that limit scaling and reduce reporting reliability. -## The Result: Operational Decision Systems -The ultimate purpose of this defensive architecture is to serve the **Presentation Layer** with absolute reliability. By the time data reaches Power BI, it has been stripped of anomalies, structurally validated, and semantically flattened. +## System Architecture: Event-Driven Integrity -Because the pipeline utilizes **Atomic View Swaps** in BigQuery, the semantic models are updated instantaneously. This guarantees that Power BI scheduled refreshes (Import Mode) never encounter locked tables or partial data, resulting in zero downtime for business consumers. +This project delivers a highly resilient, event-driven data pipeline on Google Cloud Platform designed to defend analytical integrity through a strict Medallion architecture and automated validation gates. -> **Dynamic Sensitivity Calibration** -> ->The reporting suite features interactive "Smoke Detectors." Built with dynamic What-If parameters, these dashboards allow operators to manually adjust alert sensitivity thresholds to match changing business realities (e.g., loosening thresholds during peak holiday seasons). -> -> Explore the **[Power BI Directory](/power_bi)** to read detailed [operational guides](/power_bi/docs) or download the `.pbix` [releases](/power_bi/releases/). - -### Customer Experience & Revenue Exposure -A decision-support system designed to monitor financial risk driven by fulfillment failures. It correlates delivery delays with buyer drop-off rates, allowing leadership to quantify the "cost of friction." - -* **Calibration Mechanism:** Utilizes a **Delay Threshold Parameter** to dynamically define what constitutes a critical "Danger Zone" for segment and regional prioritization. - -![Customer Experince Image](/assets/screenshots/customer-experience-monitor.png) - -### Fulfillment Decision Monitor -An operational early-warning system that identifies logistics partners and sellers requiring immediate intervention by focusing on statistical deviations in network speed rather than absolute failure. - -* **Calibration Mechanism:** Controlled via **Standard Deviation & Slippage Thresholds** to filter out normal variance and flag true systemic slowdowns. - -![Fulfillment Decision Monitor](/assets/screenshots/seller-fulfillment-monitor.png) +### Isolated Stateless Orchestration +![pipeline-orchestration-diagram](assets/diagrams/01-pipeline-orchestration-diagram.png) -### Product Friction Monitor - -Designed to identify physical and structural fulfillment bottlenecks driven by product specifications (e.g., weight-driven outliers). - -* **Calibration Mechanism:** Leverages **Standard Deviation & Slippage Thresholds** applied to *lead-time volatility*, alerting operations teams to route oversize items to specialized freight before delays impact the customer. - -![Product Friction Monitor](/assets/screenshots/product-friction-monitor.png) - -## The Solution -This project solves that challenge by delivering a highly resilient, event-driven data pipeline on Google Cloud Platform for reliable operational analytics. It guarantees data integrity through a strict Medallion architecture (Bronze, Silver, Gold) that relies on rigid data contracts and validation gates to catch and isolate bad data early in the lifecycle. - -### Defensive Pipeline Architecture -![pipeline-orchestration-diagram](/assets/diagrams/01-pipeline-orchestration-diagram.png) +To eliminate the risk of cross-run data contamination and memory exhaustion, the system employs isolated execution environments where local compute state is strictly temporary: +* **Stateless Workspace:** Each run operates in a deterministic `run_id` workspace cleared immediately after completion. +* **Memory-Optimized Joins:** Maps 36-byte UUID strings to 4-byte UInt32 surrogates, reducing join-key memory overhead by ~16x to maintain serverless resource limits. +* **Cloud-Native Sync:** After processing the Silver (Contract) layer, the system syncs results to Cloud Storage and purges the local environment. +* **Linear Integrity Gating:** Stages are strictly gated; failure at any tier (Ingestion, Contract, or Assembly) stops downstream processing to prevent the promotion of partial or malformed data. +* **Lazy Streaming Engine:** Leverages the Polars Rust engine to process large-scale datasets within the strict memory constraints of serverless Cloud Run instances. -To eliminate the risk of cross-run data contamination and memory bloat, the pipeline employs a defensive state-management strategy where local compute environments are strictly temporary: -* **Stateless Orchestration:** Every execution operates within an isolated, deterministic `run_id` workspace that is aggressively purged post-run. -* **Primitive Integer Pipeline:** Optimizes high-volume joins by mapping 36-byte UUID strings to 4-byte UInt32 surrogates, reducing join-key memory overhead by ~16x and protecting the serverless memory ceiling. -* **Cloud Sync & Purge:** After processing data into the Silver layer, the system syncs the output to Cloud Storage, purging the local environment. -* **Historical Context Pull:** It then safely streams the complete historical state for Gold layer aggregation, ensuring every run builds analytical models in a clean, untainted environment. -* **Linear Gating:** Stages are strictly gated; failure at any tier (Ingestion, Contract, or Assembly) prevents downstream processing and ensures partial data is never promoted. -* **BigQuery Atomic Swap:** Final semantic models are delivered via Authorized Views that atomically swap pointers to new data versions, providing zero-downtime connectivity for BI consumers. -* **Resource-Optimized Compute:** Leverages a highly efficient lazy-evaluation engine to process large-scale datasets seamlessly within the strict memory constraints of serverless environments. - -### Event-Driven Cloud Infrastructure +### Serverless Infrastructure & Eventarc Triggers ![gcp-orchestration-diagram](assets/screenshots/gcp-orchestration-diagram.png) -The underlying infrastructure is entirely serverless, decoupled, and codified via Terraform to ensure reproducibility and security: -* **Orchestrated Compute:** Cloud Scheduler initiates daily extraction via Cloud Run, separating the extraction layer from the main processing logic. -* **Event-Driven Triggers:** Eventarc monitors Cloud Storage for `.success` flags, triggering the main processing job via Cloud Workflows only when extraction succeeds. -* **Zero-Trust CI/CD:** GitHub Actions leverage Workload Identity Federation (WIF) for keyless, secure deployments of all infrastructure and Cloud Run jobs. -* **Integrated Observability:** Native Cloud Logging and Cloud Monitoring provide comprehensive telemetry and automated responder alerts for pipeline health. - -## Architecture & System Design +The underlying infrastructure is entirely serverless, decoupled, and codified via Terraform: +* **Orchestrated Extraction:** Cloud Scheduler initiates daily extraction via Cloud Run, separating the extraction layer from the main processing logic. +* **Event-Driven Dispatch:** Eventarc monitors Cloud Storage for `.success` flags, triggering the main processing job via Cloud Workflows only when extraction succeeds. +* **Zero-Trust Deployment:** GitHub Actions leverage Workload Identity Federation (WIF) for secure, keyless deployments of all infrastructure and containerized jobs. -### The Medallion Data Model & Contracts +## Data Defense: The Registry Rule Engine -The pipeline does not just move data; it actively defends the analytical layer from upstream anomalies. It enforces a strict Medallion architecture governed by a registry-driven rule engine. +The pipeline actively manages upstream anomalies by enforcing a Medallion architecture governed by a registry-driven validation suite. **Bronze (Raw Snapshots)** -* **Purpose:** Immutable, un-typed snapshots of source systems. -* **State:** Temporarily downloaded into the isolated workspace. Data here is assumed to be structurally untrustworthy, containing nulls, duplicates, and orphaned records. +* **Role:** Immutable snapshots of source systems. Data here is assumed to be structurally untrustworthy, containing nulls, duplicates, or orphaned records. **Silver (The Contract Layer)** -* **Philosophy (Subtractive-Only Logic):** The pipeline never guesses, imputes, or "repairs" bad data. If a record violates the contract, it is explicitly dropped, and the loss is logged in the telemetry report. -* **Primitive Integer Pipeline:** Optimizes downstream high-volume joins by mapping 36-byte UUID strings to 4-byte UInt32 surrogates, reducing join-key memory overhead by ~16x and ensuring the pipeline stays within serverless memory constraints. -* **Role-Based Rules:** Tables are classified by role (`event_fact`, `transaction_detail`, `entity_reference`) and subjected to specific registry rules (e.g., deduplication, non-null assertions). -* **Referential Integrity (Cascade Cleanup):** The pipeline tracks invalidated parent IDs (e.g., malformed `order_id`s) and propagates them downstream. If an order is dropped, all associated child records (like line items) are cascade-dropped to prevent orphan data from polluting joins. -* **Schema Freeze:** Output files are strictly cast to predefined data types and projected to contain only approved columns before being written to Cloud Storage. +* **Primitive Integer Pipeline:** Maps 36-byte UUID strings to 4-byte UInt32 surrogates. This reduces join-key memory overhead by ~16x and ensuring the pipeline stays within serverless memory constraints. +* **Subtractive-Only Logic:** The pipeline never guesses or "repairs" bad data. Records violating the contract are explicitly dropped and logged in the telemetry report. +* **Cascade Cleanup:** The system tracks invalidated parent IDs and propagates drops downstream, ensuring child records (like line items) are removed to prevent orphaned data in joins. +* **Schema Enforcement:** Output files are strictly cast to predefined types and projected to approved columns before storage. **Gold (The Semantic Layer)** -* **Purpose:** High-fidelity analytical modeling through advanced integration and entity-centric aggregation. The Gold layer is partitioned into two distinct stages to maintain a strict separation between integration logic and business metrics. -* **Stage I: Assembly (The Analytical Backbone)** - * **Role:** Integrates normalized relational tables (`orders`, `items`, `payments`) into a unified, analytical "Event" dataset. - * **Invariants:** Guaranteed 1:1 grain per `order_id_int`. It performs analytical flattening and calculates fulfillment lead times while enforcing referential integrity (e.g., purging orders without items). - * **Dimension Extraction:** Generates strictly deduplicated reference tables for Customers and Products, ensuring a single source of truth for entity attributes. -* **Stage II: Semantic (The Business Logic Engine)** - * **Role:** Transforms unified order-grain events into specialized Fact and Dimension modules tailored for cohort and entity-centric analysis (Sellers, Customers, Products). - * **Strict Grain Enforcement:** - * **Temporal:** All fact tables are deterministically aligned to an ISO-Week grain (`W-MON`). - * **Entity-Fact:** Strictly 1 row per `(Entity_ID, order_year_week)`. - * **Entity-Dim:** Strictly 1 row per `Entity_ID`. -* **Technical Invariants:** - * **Integer Key Optimization:** Both stages leverage the Primitive Integer Pipeline for grouping and joins, maintaining a constant memory profile by avoiding string-based hash tables. - * **Schema Freeze:** Both stages output files are strictly cast to predefined data types and projected to contain only approved columns - -### Validation Gates & Deployment Integrity +* **Assembly Stage:** Integrates normalized relational tables into a unified analytical dataset, enforcing a 1:1 grain per order. +* **Semantic Stage:** Transforms events into specialized Fact and Dimension modules tailored for entity-centric analysis (Sellers, Customers, Products). +* **Strict Grain Enforcement:** Fact tables are deterministically aligned to an ISO-Week grain (`W-MON`) with exactly one row per `(Entity_ID, order_year_week)`. + +### Integrity Gates & Atomic Deployment * **Dual-Pass Validation Strategy:** - * **Initial Validation (Raw Gate):** The orchestrator evaluates raw snapshots. At this stage, `warnings` (like duplicate IDs or nulls) are tolerated and passed down to the Contract Stage for subtractive cleanup. Only fatal structural errors abort the run. - * **Post-Contract Revalidation (Silver Gate):** After contract rules are applied, the system re-runs validation. In this phase, `warnings` are escalated to fatal. Because the contract stage guarantees a clean schema, any remaining warnings trigger a terminal `RuntimeError`, halting the pipeline immediately to prevent downstream corruption. -* **Atomic Publishing Lifecycle:** - * **Staged Execution (Isolated Buffer):** The pipeline protects the Gold layer by writing intermediate analytical models to isolated temporary directories during computation. Only when all semantic modules successfully finish processing does the system execute a multi-system atomic publish. - * **Atomic Deployment (BigQuery View Swap):** This multi-system swap redirects BigQuery Authorized Views to fresh External Tables and updates the latest_version.json manifest, ensuring BI tools like Power BI always query complete, validated datasets without downtime. -* **Comprehensive Telemetry:** - * **End-to-End Traceability:** A single `run_id` is propagated through all raw snapshots, metadata logs, and published artifacts to provide absolute lineage tracking. - * **Resilient Logging:** Even in the event of a fatal crash, the orchestrator's `finally` block guarantees that partial logs and stage reports are synced back to cloud storage before the local workspace is purged, ensuring debuggability. + * **Initial Raw Gate:** Evaluates raw snapshots. Structural warnings are tolerated but passed to the Silver stage for subtractive cleanup. + * **Post-Contract Silver Gate:** Re-validates data after contract rules are applied. Remaining warnings are escalated to fatal errors, triggering a `RuntimeError` to stop downstream corruption. +* **Atomic BigQuery Publishing:** Final semantic models are delivered via Authorized Views that atomically swap pointers to new data versions. This ensures BI tools always query complete, validated datasets with no downtime during updates. ## Performance & Scalability (Cloud-Native Benchmarks) -The pipeline is explicitly engineered to process massive datasets within the rigid memory constraints of serverless compute (Cloud Run). By leveraging the Polars Rust engine (Lazy API & Streaming), the system achieves near-perfect memory density, operating consistently at the physical hardware ceiling. +By leveraging the Polars Rust engine (Lazy API), the system achieves near-optimal resource utilization within the rigid memory constraints of serverless compute. -### GCP Stress-Test Metrics (Scaling Efficiency) +### GCP Stress-Test Metrics -| 40M Snapshot (8GB / 4 vCPU) with mounted temporary disk| +| 40M Snapshot (8GB / 4 vCPU) | | :---: | -| ![engine-performance-8gb](/assets/screenshots/engine-performance-8gb-4cpu.png) | - -> Benchmark data: [`40m_stats_log.csv`](/assets/benchmarks/polars/40mrows_dataset_stats_log.csv) -> Dataset : [`Dataset Information`](/data/README.md) +| ![engine-performance-8gb](assets/screenshots/engine-performance-8gb-4cpu.png) | | Metric | Data | |:---|:---| -| Dataset |~40 Million Rows / ~5.3 GB Parquet| +| Dataset | ~40 Million Rows / ~5.3 GB Parquet | | Provision Spec | 8 GB RAM / 4 vCPU | | Efficiency (Processing) | ~307k Rows / Second | | Total Runtime (Wall-Clock) | 130 Seconds | -* **Maximized Memory Density:** Enabled by the **Primitive Integer Pipeline**, mapping 36-byte UUID strings to 4-byte UInt32 keys shrunk join-key memory overhead by ~16x. This allowed a ~5.34GB analytical model (40M rows) to easily process entirely within the 8GB RAM limit -* **Near-Linear Performance Scaling:** The Polars engine saturates the available vCPUs, yielding ultra-high throughput (307k rows/s) during streaming execution. -* **Zero-Idle Economics:** 100% serverless execution ensures zero billable time during idle periods, significantly reducing the Total Cost of Ownership (TCO) compared to dedicated cluster solutions. +* **Maximized Memory Density:** The **Primitive Integer Pipeline** allows a ~5.34GB analytical model to process within the 8GB RAM limit by shrinking join-key overhead by ~16x. +* **Near-Linear Performance Scaling:** The engine saturates available vCPUs, yielding high throughput during streaming execution. +* **Zero-Idle Economics:** 100% serverless execution ensures zero billable time during idle periods. + +### Measurement Methodology +* **Performance Profiling:** Captured from production telemetry via the pipeline's native `run_duration` metadata, calculating the precise delta between `started_at` and `completed_at` timestamps. +* **Memory Utilization:** Monitored via an integrated [`psutil.virtual_memory().used`](assets/benchmarks/polars/) profiling implementation to verify the actual resource footprint and confirm the physical ceiling for 8GB provision. -### Cost Efficiency & Free-Tier +### **Scaling Roadmap: From Serverless to Enterprise Lakehouse** -The pipeline's processing speed allows for a full analytical rebuild of 40M rows while remaining comfortably within the **GCP Cloud Run Free Tier** (180k vCPU-sec, 360k GB-sec). This means a small-to-mid-sized organization can run this production-grade pipeline multiple times a day with **zero compute costs.** +#### **Stage 1: Incremental Delta Propagation** +* **Strategy:** Transition to a "Stateless Delta Propagation" model using Polars' streaming engine to process only new `.parquet` deltas, reducing I/O and CPU time by 80-90%. -| Compute Provision | Dataset | vCPU-Seconds / Run | GB-Seconds / Run | Monthly Free-Tier Runs | -| :--- | :--- | :--- | :--- | :--- | -| **8 GB / 4 vCPU** | ~40M rows | 520 | 1,040 | **~346 Runs / Month** | -| **16 GB / 6 vCPU** | ~80M rows | 1040 | 2,773 | **~129 Runs / Month** | -| **32 GB / 8 vCPU** | ~160M rows | 2,080 | 8,320 | **~43 Runs / Month** | +#### **Stage 2: Event-Driven Real-Time Streaming** +* **Strategy:** Integrate GCS Pub/Sub notifications with Cloud Run streaming sinks to trigger sub-minute validation and assembly as files are uploaded. -> *Calculations based on verified benchmarks. Even at the highest 32GB tier, the pipeline can execute a full state rebuild over 43 times per month for $0 within the GCP free tier.* +#### **Stage 3: BigQuery "Engine-as-a-Service"** +* **Strategy:** Offload high-volume compute layers entirely to BigQuery using SQL-driven logic. Provides petabyte-scale capacity while the Python pipeline manages integrity gates. -### Measurement Methodology -* **Performance Profiling:** Captured from production telemetry via the pipeline's native `run_duration` metadata, calculating the precise delta between `started_at` and `completed_at` timestamps. -* **Memory Utilization:** Monitored via an integrated [`psutil.virtual_memory().used`](/assets/benchmarks/polars/README.md) profiling implementation to verify the actual resource footprint and confirm the physical ceiling for 8GB provision. +## System Health & Observability -### **Scaling Roadmap: From Serverless to Enterprise Lakehouse** +![ops_dashboard_monitoring](assets/screenshots/ops-analytics-pipeline-db.png) -To ensure the architecture survives the transition from millions to billions of rows, the pipeline is designed to evolve across three validated scaling paths. This roadmap prioritizes cost-efficiency at low volumes while providing a clear architectural pivot for enterprise-scale workloads. +The pipeline features a comprehensive observability suite managed natively via Google Cloud Monitoring and Cloud Logging, codified entirely in Terraform. -#### **Stage 1: Incremental Delta Propagation (Efficiency Pivot)** -* **Strategy:** Transition from a "Full Rebuild" batch model to a **Stateless Delta Propagation** model using Polars' streaming engine to process only newly arrived `.parquet` deltas. -* **Optimization:** Leverages the existing BigQuery View infrastructure to perform "Last-Mile" merging of incremental updates with the historical state, eliminating the need for redundant full-table re-reads. -* **Trade-off:** **Operational Complexity vs. Compute Cost.** Reduces GCS I/O and CPU time by 80-90% for daily runs, but requires more sophisticated state-tracking in the metadata layer. +### Monitored Telemetry & Alerting +The system tracks granular operational metrics to proactively identify resource bottlenecks and execution failures: + +* **Pipeline Job Metrics:** Tracks execution status (Success/Fail), workflow traffic, and memory allocation bottlenecks against the 8GB threshold. +* **Extractor Job Metrics:** Monitors Drive API latencies and instance billable time to track API usage costs. +* **Automated Responders:** Dispatches `CRITICAL` email alerts for ingestion failures, extractor crashes, or pipeline fatal errors (OOMs), ensuring debuggability through resilient lineage tracking. + +## Operational Intelligence: BI Decision Support + +This architecture serves the Presentation Layer with high reliability, ensuring that dashboards are built upon validated, semantically flattened data models. + +> **Dynamic Sensitivity Calibration** +> +>The reporting suite features interactive "Smoke Detectors." Built with dynamic What-If parameters, these dashboards allow operators to manually adjust alert sensitivity thresholds to match changing business realities. +> +> Explore the **[Power BI Directory](/power_bi)** to read detailed [operational guides](power_bi/docs) or download the `.pbix` [releases](power_bi/releases/). -#### **Stage 2: Event-Driven Real-Time Streaming (Latency Pivot)** -* **Strategy:** Integrate GCS Pub/Sub notifications with **Cloud Run streaming sinks** to trigger sub-minute validation and assembly. -* **Architecture:** Moves from a daily batch schedule to a continuous ingestion loop where each file upload triggers a micro-run. The BigQuery Atomic View Swap acts as the transactional boundary, ensuring dashboards always see the latest validated data without waiting for the daily window. -* **Trade-off:** **Responsiveness vs. Throughput.** Provides near real-time insights but increases the frequency of small I/O operations. +### Customer Experience & Revenue Exposure +Monitors financial risk by correlating delivery delays with buyer drop-off rates, allowing leadership to quantify the "cost of friction." -#### **Stage 3: BigQuery "Engine-as-a-Service" (The Enterprise Pivot)** -* **Strategy:** Offload the high-volume `Assemble` and `Semantic` compute layers entirely to **BigQuery (ELT Pattern)** using SQL-driven logic. -* **Scalability:** Provides an infinite scaling ceiling (Petabyte-scale) and removes all local infrastructure bounds, while the Python pipeline acts as an "Air-Traffic Controller" managing integrity gates and view swaps. -* **Trade-off:** **Scalability vs. Vendor Lock-in.** Simplifies the compute environment but moves the primary cost from serverless RAM to BigQuery slot usage. +![Customer Experince Image](assets/screenshots/customer-experience-monitor.png) +### Fulfillment Decision Monitor +An operational early-warning system focusing on statistical deviations in network speed rather than total failure to identify partners requiring intervention. -## Observability & Alerting +![Fulfillment Decision Monitor](assets/screenshots/seller-fulfillment-monitor.png) -![ops_dashboard_monitoring](/assets/screenshots/ops-analytics-pipeline-db.png) +### Product Friction Monitor +Identifies structural fulfillment bottlenecks driven by product specifications (e.g., weight outliers) to route items to specialized freight. -Operational maturity requires assuming things will eventually break. The pipeline features a comprehensive observability suite managed natively via Google Cloud Monitoring and Cloud Logging, codified entirely in Terraform. +![Product Friction Monitor](assets/screenshots/product-friction-monitor.png) -### Monitored Telemetry & Dashboards -The custom Cloud Monitoring dashboard tracks granular operational metrics to proactively identify resource bottlenecks and execution failures: +## CI/CD & Security -**Pipeline Job Metrics:** -1. **Workflow Execution Traffic:** Measures the volume of finished pipeline runs. -2. **Execution Status Ratio:** Tracks the count of `SUCCESS` vs. `FAILED` runs to monitor overall reliability. -3. **Memory Allocation Bottlenecks:** Plots the actual Cloud Run memory usage against a hardcoded 8GB horizontal threshold to visualize proximity to OOM exhaustion. +The project adheres to a strict **Zero-Trust** deployment model. -**Extractor Job Metrics:** -1. **Drive Extractor Latency:** Tracks the billable instance time of the extractor job (the most accurate proxy for API usage cost, as the extractor utilizes the Drive API continuously during runtime). -2. **Drive API Latencies (Median):** Monitors the median response times for core Google Workspace API calls (e.g., `google.apis.drive.v3.DriveFiles.Get` for extraction and `DriveFiles.List` for directory parsing). -3. **Memory Allocation Bottlenecks:** Plots the extractor's memory usage against its specific 1GB hardcoded Cloud Run threshold. +* **Workload Identity Federation (WIF):** Authenticates GitHub Actions to Google Cloud via short-lived OIDC tokens. +* **Infrastructure as Code:** IAM bindings and infrastructure are strictly managed via automated Terraform workflows. +* **Containerized Artifacts:** Codebases are packaged into Docker images and pushed to the GCP Artifact Registry only after passing CI checks. -### Automated Responders & Alerts -The system monitors specific log payloads across the infrastructure and dispatches `CRITICAL` severity email alerts to on-call responders with actionable, markdown-formatted runbooks. Alerts are configured for: -* **Ingestion Failures (`midnight_scheduler_failed`):** Detects IAM permission revokes (403) or token expiries (401) preventing the daily trigger. -* **Extraction Crashes (`extractor_crashed`):** Captures Python tracebacks if the Drive Extractor fails to pull raw data or plant the `.success` flag. -* **Orchestration Breakdowns (`pipeline_dispatch_failed`):** Catches Eventarc workflow failures if downstream routing breaks. -* **Pipeline Fatals (`pipeline_crashed`):** Detects out-of-memory (OOM) events or unhandled exceptions within the main processing logic, ensuring dashboard consumers are never silently served stale data. ## Repository Structure @@ -216,7 +158,7 @@ operations-analytics-pipeline/ │ └── run_extract.py # The Drive extractor orchestrator ├── data_pipeline/ │ ├── .shared/ # Storage adapters, IO wrappers, and registry configurations -│ ├── assembly/ # Delta merging and event mapping logic (Gold Pre-processing) +│ ├── assembly/ # Delta merging and event mapping logic │ ├── contract/ # Subtractive filtering logic (Silver Layer) │ ├── publish/ # Manages the atomic publish lifecycle of semantic datasets │ ├── semantic/ # Fact/Dimension table builders (Gold Layer) @@ -228,13 +170,5 @@ operations-analytics-pipeline/ └── power_bi/ ├── .shared/ # Global Dashboards BI assets (e.g. Themes, .json files, etc.) ├── dashboards/ # Source Control (PBIP) - └──releases # Deliverables (PBIX) -``` - -## CI/CD & Security - -The project adheres to a strict "Zero-Trust" deployment model. No permanent service account keys are generated, downloaded, or stored as GitHub Secrets. - -* **Workload Identity Federation (WIF):** GitHub Actions is authenticated to Google Cloud via short-lived, dynamically requested OIDC tokens. -* **Infrastructure as Code:** The deployment of the infrastructure and the configuration of IAM bindings are strictly managed via automated `terraform plan` and `terraform apply` workflows. -* **Containerized Artifacts:** Upon passing CI checks, the pipeline and extractor codebases are packaged into Docker images and pushed to the GCP Artifact Registry. \ No newline at end of file + └── releases # Deliverables (PBIX) +``` \ No newline at end of file diff --git a/assets/benchmarks/polars/README.md b/assets/benchmarks/polars/README.md index d79caab..4390e09 100644 --- a/assets/benchmarks/polars/README.md +++ b/assets/benchmarks/polars/README.md @@ -1,6 +1,6 @@ # Measurement Methodology -This section details the methodology used to capture the memory metrics in the [`GCP Stress-Test Metrics (Scaling Efficiency)`](/README.md#gcp-stress-test-metrics-scaling-efficiency) +This section details the methodology used to capture the memory metrics in the [`GCP Stress-Test Metrics (Scaling Efficiency)`](../../../README.md#gcp-stress-test-metrics-scaling-efficiency) The telemetry logger below was added to the orchestrator for a specific benchmarking run. @@ -28,7 +28,7 @@ finally: stop_event.set() logger_thread.join() ``` -Since `psutil` requires C-extensions to compile, the **Dockerfile** was modified to include the necessary build tools and the package itself. This allowed for benchmarking without altering the project's permanent [`requirements.txt`](/data_pipeline/requirements.txt). +Since `psutil` requires C-extensions to compile, the **Dockerfile** was modified to include the necessary build tools and the package itself. This allowed for benchmarking without altering the project's permanent [`requirements.txt`](../../../data_pipeline/requirements.txt). ```docker FROM python:3.11-slim diff --git a/docs/data_extract/drive_extractor.md b/docs/data_extract/drive_extractor.md index 37f3ccf..555c2a0 100644 --- a/docs/data_extract/drive_extractor.md +++ b/docs/data_extract/drive_extractor.md @@ -1,65 +1,63 @@ -# **Data Extractor Stage** +# Data Extractor Stage **Files:** * **Executor:** [`run_extract.py`](../../data_extract/run_extract.py) * **Logic:** [`extract_logic.py`](../../data_extract/shared/extract_logic.py) * **Utilities:** [`utils.py`](../../data_extract/shared/utils.py) -**Role:** Source Ingestion and Cloud Mirroring Gateway. +**Role:** Source Ingestion and Storage Mirroring. -## **System Contract** +## System Contract **Purpose** -Automates the secure transfer of source data from Google Drive to Google Cloud Storage (GCS). It ensures that raw inputs are preserved in an immutable archival zone while simultaneously providing a clean trigger-point for the downstream data pipeline. +Automates the transfer of source data from Google Drive to Google Cloud Storage (GCS). It preserves raw inputs in an archival zone and provides a trigger-point for the downstream data pipeline. **Invariants** -* **Folder-Level Deduplication:** Every source folder is processed exactly once. Re-execution is blocked by the existence of a `.success` marker in the archival bucket. -* **Dual-Mirroring Guarantee:** Every extracted file must be successfully written to both the **Archival Bucket** (compliance/audit) and the **Pipeline Bucket** (raw landing zone) before the extraction is considered successful. -* **Namespace Protection:** The extractor only operates on subfolders belonging to the strictly defined `PARENT_FOLDER`. It cannot "see" or extract files from the wider Drive environment. -* **Metadata Lineage:** Every extraction run generates a unique `execution_id` (UUID) and a JSON manifest documenting file names, timestamps, and status. +* **Idempotency:** Each source folder is processed once. Re-execution is prevented by checking for a `.success` marker in the archival bucket. +* **Storage Mirroring:** Extracted files are written to both the **Archival Bucket** and the **Pipeline Bucket** for a transfer to be considered successful. +* **Access Scoping:** The extractor only operates on subfolders within the defined `PARENT_FOLDER`. It cannot access files outside this scope. +* **Metadata Logging:** Each extraction run generates a unique `execution_id` and a JSON manifest documenting file names, timestamps, and status. **Inputs** -* `target_child_folder`: `str` (The identifier of the operational batch to ingest). -* **Drive Service Account**: (Credentials with read-access to the `operations-upload-folder`). +* `target_child_folder`: Identifier of the folder to ingest. +* **Drive Service Account**: Credentials with read-access to the source folder. **Outputs** -* **Archival Artifacts:** Mirror of source files in `gs://ops-archival-storage-dev/archive/{folder_name}/`. -* **Pipeline Artifacts:** Mirror of source files in `gs://ops-pipeline-storage-dev/raw/`. -* **Success Marker:** An empty `gs://.../{folder_name}.success` file used for idempotency. -* **Extraction Log:** A JSON metadata file summarizing the run. +* **Archival Artifacts:** Mirror of source files in the archival bucket. +* **Pipeline Artifacts:** Mirror of source files in the pipeline's raw landing zone. +* **Success Marker:** An empty `.success` file used for idempotency. +* **Extraction Log:** JSON metadata file summarizing the run. -## **Execution Workflow** +## Execution Workflow -The **Extractor** manages the ingestion lifecycle through the following steps: +The extractor manages the ingestion lifecycle through these steps: -1. **Deduplication Check:** Queries GCS for the success marker. If present, the job terminates immediately with a "Skipped" status. -2. **Hierarchy Resolution:** Uses the Drive API to locate the `folder_id` of the target child, ensuring it resides strictly under the authorized parent root. -3. **Manifest Fetching:** Retrieves a list of all files in the target folder, filtering out system-reserved files (e.g., `instruction.txt`). -4. **Extraction Loop:** For each valid file: - * Downloads the binary content from Google Drive into memory. - * Uploads the content to the **Archival Bucket** (long-term persistence). - * Uploads the content to the **Pipeline Bucket** (transient raw landing). -5. **Audit Persistence:** Generates and uploads the run metadata log. -6. **Marker Placement:** Upon 100% success of the file loop, writes the `.success` file to GCS. +1. **Duplicate Check:** Queries GCS for the success marker. If present, the job terminates with a "Skipped" status. +2. **Path Resolution:** Uses the Drive API to locate the target folder ID and verifies its parent root. +3. **File Discovery:** Lists files in the target folder, filtering out non-data files. +4. **Extraction Loop:** For each file: + * Downloads content from Google Drive to memory. + * Uploads content to the archival and pipeline buckets. +5. **Logging:** Generates and uploads the run metadata log. +6. **Finalization:** Writes the `.success` file to GCS after all files are successfully processed. +## Boundaries -## **Boundaries** - -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Extract files from Google Drive to GCS. | Modify or delete any files in the source Drive. | -| Mirror files across two separate administrative buckets. | Validate the internal schema or data quality of files. | -| Enforce folder-grain idempotency. | Rename or transform file content. | -| Log every file-level transfer result. | Trigger the main pipeline directly (Triggered via GCS events). | -| Filter out non-data files (instruction files). | Handle multi-part Drive uploads (expects completed files). | +| Extracts files from Google Drive to GCS. | Modify or delete files in the source Drive. | +| Mirrors files across separate buckets. | Validate internal schemas or data quality. | +| Enforces folder-level idempotency. | Rename or transform file content. | +| Logs file transfer results. | Trigger the main pipeline directly. | +| Filters non-data files. | Handle multi-part Drive uploads. | -## **Failure & Severity Model** +## Failure & Severity Model -### **Operational Failures (System Level)** -* **Missing Hierarchy:** If the `PARENT_FOLDER` or `target_child_folder` cannot be resolved, the orchestrator returns `exit 1`. -* **API Throttling/Auth:** If Drive or GCS credentials fail, the process halts. No success marker is written, allowing for a clean retry. +### System Failures +* **Resolution Failure:** If folders cannot be identified, the orchestrator returns an error code. +* **API/Auth Failure:** If credentials fail, the process stops without writing a success marker, allowing for retries. -### **Functional Findings (Data Level)** -* **Partial Extraction:** If a single file in a batch fails to upload to either bucket, the entire batch is marked as `failed`. The success marker is **not** written, ensuring that a subsequent run will attempt to re-process the *entire* folder. -* **Empty Source:** If the target folder contains no valid data files, the extractor logs a warning but terminates with `exit 1` (Failure) to prevent a "phantom" successful run from triggering downstream processes. \ No newline at end of file +### Data Findings +* **Partial Extraction:** If any file in a batch fails to upload, the entire batch is marked as failed. The success marker is not written to ensure the entire folder is re-processed. +* **Empty Source:** If no valid data files are found, the extractor logs a warning and returns an error code to prevent triggering downstream processes. diff --git a/docs/data_pipeline/assembly_stage.md b/docs/data_pipeline/assembly_stage.md index 957aaae..2c5ff00 100644 --- a/docs/data_pipeline/assembly_stage.md +++ b/docs/data_pipeline/assembly_stage.md @@ -1,4 +1,4 @@ -# **Assembly Stage** +# Assembly Stage **Files:** * **Executor:** [`assembly_executor.py`](../../data_pipeline/assembly/assembly_executor.py) @@ -8,71 +8,71 @@ ![assembled-stage-diagram](/assets/diagrams/04-assemble-stage-diagram.png) -## **System Contract** +## System Contract **Purpose** -Integrates multiple normalized relational tables into a unified, analytical "Event" dataset and extracts high-fidelity "Dimension" references. It transforms raw business facts into a ready-to-model state by enforcing cardinality rules, leveraging the Primitive Integer Pipeline for memory efficiency, and calculating temporal performance metrics. +Integrates normalized relational tables into a unified analytical dataset and extracts dimension references. This stage enforces cardinality rules, applies integer mapping for memory efficiency, and calculates temporal performance metrics. **Invariants** -* **Strict Order-ID Grain:** The primary event output is guaranteed to be exactly 1 row per `order_id_int`. Any operation causing cardinality explosion triggers a terminal failure. -* **Inner-Join Priority:** To maintain analytical integrity, orders without corresponding items are purged. -* **Temporal Determinism:** All lead times, lags, and delays are calculated as integer-day durations based on validated UTC timestamps pre-normalized to microsecond resolution. -* **Reference Uniqueness:** Dimension reference tables (Customers, Products) are strictly deduplicated by their primary keys. +* **Order-ID Grain:** The primary event output maintains a 1:1 grain per `order_id_int`. Cardinality issues trigger a terminal failure. +* **Join Logic:** Orders without corresponding items are removed to maintain integrity. +* **Temporal Calculations:** Lead times and delays are calculated as integer-day durations using UTC timestamps. +* **Reference Uniqueness:** Dimension tables (Customers, Products) are deduplicated by their primary keys. **Inputs** -* `run_context`: `RunContext` (Path resolution for Silver/contracted and Gold/assembled zones). -* **Source Tables:** `df_orders`, `df_order_items`, `df_payments` (from the contracted layer). +* `run_context`: Path resolution for Silver and Gold zones. +* **Source Tables:** `df_orders`, `df_order_items`, `df_payments` from the Silver layer. **Outputs** -* **Assembly Report:** `dict` (Step-level status and informational logs). -* **Assembled Events:** `parquet` (The unified analytical order-grain table). -* **Dimension Refs:** `parquet` (Unique snapshots of customer and product attributes). +* **Assembly Report:** Status and informational logs for each step. +* **Assembled Events:** Unified analytical table at the order grain. +* **Dimension Refs:** Deduplicated snapshots of customer and product attributes. -## **Execution Workflow** +## Execution Workflow -The **Executor** coordinates two distinct sub-orchestrations: +The executor manages two distinct workflows: -### **Workflow I: Event Assembly** -1. **Batch Load:** Fetches the required triplet (`orders`, `items`, `payments`) from the Silver zone. -2. **Merge:** Joins datasets using `merge_data`. It performs an inner join on items and a left join on payments to preserve financial data without losing order context. - * **Optimization:** Employs **Integer-Joins** on pre-mapped `UInt32/UInt64` IDs (e.g., `order_id_int`) provided by the Contract Registrar to drastically reduce memory overhead. Utilizes pre-aggregation on payments and items to ensure a strict 1:1 grain, preventing row explosions. -3. **Derivation:** Executes `derive_fields` to calculate fulfillment lead times and extract ISO-calendar attributes. - * **Optimization:** Applies memory-efficient casting (e.g., `Int16` for durations, `Categorical` for repetitive strings) and drops intermediate columns early to minimize row width. -4. **Schema Freeze:** Projects the final `ASSEMBLE_SCHEMA` and casts all columns to `ASSEMBLE_DTYPES`. - * **Optimization:** Omitted sorting to enable zero-copy streaming, maintaining a non-blocking execution plan compatible with `sink_parquet()`. -5. **Export & Clean:** Persists the table using `sink_parquet()` for streaming output and triggers `gc.collect()` to free memory before dimension processing. +### Workflow I: Event Assembly +1. **Load:** Retrieves `orders`, `items`, and `payments` from the Silver zone. +2. **Merge:** Joins datasets using an inner join for items and a left join for payments. + * **Optimization:** Uses integer joins on `UInt32/UInt64` IDs to reduce memory overhead. Pre-aggregates payments and items to ensure a 1:1 grain. +3. **Derivation:** Calculates fulfillment lead times and extracts ISO-calendar attributes. + * **Optimization:** Applies data type casting (e.g., `Int16` for durations, `Categorical` for repetitive strings) and drops intermediate columns to reduce memory footprint. +4. **Schema Enforcement:** Projects the final `ASSEMBLE_SCHEMA` and casts to `ASSEMBLE_DTYPES`. + * **Optimization:** Skips sorting to enable streaming and non-blocking execution. +5. **Export:** Saves the table using `sink_parquet()` for streaming output and triggers garbage collection. -### **Workflow II: Dimension Reference Extraction** +### Workflow II: Dimension Reference Extraction 1. **Selection:** Iterates through the `DIMENSION_REFERENCES` registry. -2. **Deduplication:** Extracts the required column subset and drops duplicate primary keys. -3. **Export:** Persists each dimension (e.g., `df_customers`) as an independent artifact. +2. **Deduplication:** Extracts required columns and removes duplicate primary keys. +3. **Export:** Saves each dimension table as an independent artifact. -## **Optimization & Memory Invariants** +## Optimization & Resource Management -* **Primitive Integer Pipeline:** To operate within 4GB RAM, the pipeline converts 36-byte UUID strings into 8-byte `UInt64` hashes for joins, and 4-byte `UInt32` categoricals for payloads. This is the primary driver of memory efficiency for 36M+ row datasets. -* **Streaming-First Join:** By deferring aggregations until after raw joins on `order_id`, leveraging Polars' streaming engine to avoid massive, materialized hash tables. -* **Low-Level Memory Reclamation:** The executor utilizes `ctypes.CDLL('libc.so.6').malloc_trim(0)` at high-water mark transitions. This forces the Linux allocator to release free memory back to the OS, preventing Cloud Run from terminating the process due to bloated (but unused) heap memory. -* **Zero-Copy Streaming:** `sink_parquet()` is used to prevent the pipeline from fully materializing the assembly result set in memory. +* **Integer Mapping:** Converts UUID strings to `UInt64` hashes for joins and `UInt32` for payloads. This reduces memory overhead when processing large datasets. +* **Streaming Joins:** Defers aggregations until after joins, leveraging the Polars streaming engine to avoid large materialized tables. +* **Memory Reclamation:** Uses `malloc_trim(0)` during high-water mark transitions to release free memory back to the OS. +* **Zero-Copy Streaming:** Employs `sink_parquet()` to avoid materializing the entire result set in memory. -## **Boundaries** +## Boundaries -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Join multiple relational tables into a flat grain. | Perform data cleaning (handled in Contract stage). | -| Calculate time-deltas (e.g., `lead_time_days`). | Perform complex multi-stage aggregations (delegated to Semantic stage). | -| Enforce 1:1 cardinality for the final event grain. | Handle schema validation of raw data. | -| Deduplicate dimension attributes. | Manage partitioning logic (managed by the loader/exporter). | -| Manage peak memory via explicit `gc` triggers and concurrency control. | Change historical values or re-map IDs. | -| Utilize Hash-Joins for high-cardinality keys. | Perform blocking sorts on large datasets. | - -## **Failure & Severity Model** - -### **Operational Failures (System Level)** -* **Task Failure:** Individual transformation steps (Merge, Derive, Freeze) are wrapped in a fail-safe `task_wrapper`. Exceptions are trapped, logged, and return a `failed` status for that step. -* **Executor Trapping:** The top-level orchestration in `assembly_executor.py` uses `try-except-finally` blocks to catch and log unexpected pipeline crashes while ensuring memory reclamation. -* **Loading Missing Table:** If a required table (e.g., `df_orders`) is missing from the Silver zone, the stage returns `failed` immediately. -* **Export Failure:** Disk I/O errors or path resolution issues during the `export_file` call halt the lifecycle. - -### **Functional Findings (Data Level)** -* **Partial Payments:** Orders without payments are allowed (via Left Join); the system fills these with `None/NaN`, which is considered a valid business state rather than a failure. \ No newline at end of file +| Joins relational tables into a flat grain. | Perform data cleaning (handled in Contract stage). | +| Calculates time-deltas (e.g., lead times). | Perform multi-stage aggregations (delegated to Semantic stage). | +| Enforces 1:1 cardinality for the event grain. | Validate raw data schemas. | +| Deduplicates dimension attributes. | Manage partitioning logic. | +| Manages peak memory and garbage collection. | Change historical values or re-map IDs. | +| Uses Hash-Joins for high-cardinality keys. | Perform blocking sorts on large datasets. | + +## Failure & Severity Model + +### System Failures +* **Task Failure:** Transformation steps are wrapped in a handler that logs exceptions and returns a failure status. +* **Executor Safety:** Top-level orchestration uses `try-except-finally` blocks to catch crashes and ensure resource cleanup. +* **Missing Data:** If a required table is missing from the Silver zone, the stage fails. +* **I/O Failure:** Storage or path errors during export halt the lifecycle. + +### Data Findings +* **Optional Joins:** Orders without payments are allowed; the system fills missing values with nulls, which is treated as a valid state. diff --git a/docs/data_pipeline/contract_stage.md b/docs/data_pipeline/contract_stage.md index ab7218c..5a70a82 100644 --- a/docs/data_pipeline/contract_stage.md +++ b/docs/data_pipeline/contract_stage.md @@ -1,4 +1,4 @@ -# **Data Contract Stage** +# Data Contract Stage **Files:** * **Executor:** [`contract_executor.py`](../../data_pipeline/contract/contract_executor.py) @@ -6,72 +6,72 @@ * **Registrar:** [`id_registrar.py`](../../data_pipeline/contract/id_registrar.py) * **Registry:** [`registry.py`](../../data_pipeline/contract/registry.py) -**Role:** Structural Enforcement, Subtractive Filtering, and Discovery-First ID Mapping. +**Role:** Structural Enforcement, Subtractive Filtering, and ID Mapping. ![contract-stage-diagram](/assets/diagrams/03-contract-stage-diagram.png) -## **System Contract** +## System Contract **Purpose** -Enforces role-based structural rules and referential integrity on raw snapshots to ensure that only "contract-compliant" records reach the Silver layer. It acts as a gate that prunes malformed data, enforces referential integrity via ID propagation, and freezes the technical schema using a discovery-first integer mapping approach. +Enforces role-based structural rules and referential integrity on raw snapshots. This stage removes non-compliant records, propagates invalidated IDs to maintain integrity across tables, and applies technical schemas through an integer mapping process. **Invariants** -* **Subtractive-Only Row Logic:** With the exception of type casting, this stage never modifies business values or "repairs" data. If a row is non-compliant, it is dropped. -* **Structural Parity:** Every file within a logical table's contracted zone MUST share an identical schema width and data types to support high-speed vertical concatenation in the Assembly stage. -* **ID Propagation:** If an `order_id` is invalidated (e.g., due to nulls or unparsable dates), that ID is propagated to child tables to ensure a clean cascade drop. -* **Discovery-First Mapping:** Guarantees that all UUIDs are resolved and mapped to deterministic `UInt32` integers BEFORE table enforcement begins, preventing join collisions and schema drift. -* **Final Schema Freeze:** The terminal step for every role always executes `enforce_schema` to project only required columns and cast to strictly defined types. +* **Subtractive Row Logic:** This stage does not modify business values or impute data. Non-compliant rows are removed. +* **Structural Parity:** Files within a logical table's contracted zone share identical schema widths and data types to support concatenation in the Assembly stage. +* **ID Propagation:** If an ID is invalidated (e.g., due to nulls or unparsable dates), that ID is propagated to related tables to ensure a clean cascade removal. +* **Deterministic Mapping:** Resolves and maps UUIDs to `UInt32` integers before table enforcement to prevent join collisions and schema drift. +* **Schema Enforcement:** The final step for every role executes `enforce_schema` to project required columns and cast to defined types. **Inputs** -* `run_context`: `RunContext` (Path resolution for source raw snapshots and destination contracted zone). -* `table_name`: `str` (Logical identifier used to look up role-based rules). -* `master_mappings`: `dict[str, pl.LazyFrame]` (The pre-resolved dictionary of UUID-to-Integer mappings). -* `invalid_order_ids`: `set` (Blacklist of IDs from preceding tables to be dropped). -* `valid_order_ids`: `set` (Whitelist of IDs used to ensure child-parent referential integrity). +* `run_context`: Path resolution for raw snapshots and the contracted zone. +* `table_name`: Logical identifier for role-based rule lookup. +* `master_mappings`: Pre-resolved dictionary of UUID-to-Integer mappings. +* `invalid_order_ids`: IDs from preceding tables to be removed. +* `valid_order_ids`: IDs used to maintain child-parent referential integrity. **Outputs** -* **Contract Report:** `dict` (Telemetry including `initial_rows`, `final_rows`, and counts for each rule applied). -* **Invalidated IDs:** `set` (New IDs discovered to be non-compliant during this run). -* **Valid IDs:** `set` (Emitted specifically by the `orders` table to act as a parent whitelist). -* **Side Effect:** Writes a schema-enforced and integer-mapped Parquet file to the `contracted/` directory. +* **Contract Report:** Telemetry including `initial_rows`, `final_rows`, and rule application counts. +* **Invalidated IDs:** IDs identified as non-compliant during the current run. +* **Valid IDs:** Whitelist of parent IDs emitted by the `orders` table. +* **Side Effect:** Writes schema-enforced and integer-mapped Parquet files to the `contracted/` directory. -## **Execution Workflow** +## Execution Workflow -The Contract stage is split into a global Discovery phase and a table-specific Enforcement phase: +The Contract stage consists of a global Discovery phase and a table-specific Enforcement phase: -### **Phase A: Global Discovery** -1. **Discover:** Scans all raw sources (CSV/Parquet) for the unique set of UUIDs in the current run. -2. **Lookup:** Surgically retrieves existing mappings from Cloud Storage. -3. **Generate:** Maps truly new UUIDs to a continuous integer sequence. -4. **Promote:** Persists new mapping deltas to local disk and synchronizes them to central storage. +### Phase A: Global Discovery +1. **Scan:** Identifies unique UUIDs across all raw sources in the current run. +2. **Lookup:** Retrieves existing mappings from Cloud Storage. +3. **Generate:** Maps new UUIDs to a continuous integer sequence. +4. **Persist:** Saves new mapping deltas to local disk and synchronizes them to central storage. -### **Phase B: Table Enforcement** -1. **Hydrate:** Fetches the raw snapshot from the lake's snapshot zone. -2. **Logic Sequencing:** Fetches rules (dedupe, null-checks, cascade drops) from `ROLE_STEPS`. -3. **Atomic Filtering:** Iteratively applies rules. For `event_fact` roles, it captures IDs triggering violations. -4. **Structural Freeze:** Executes `enforce_schema` as the final step in the registry sequence to project the required columns. -5. **ID Mapping:** Joins the filtered and projected DataFrame against the `master_mappings` to attach integer IDs. -6. **Persistence:** Saves the resulting compliant and integer-mapped dataset to the Silver layer. +### Phase B: Table Enforcement +1. **Load:** Retrieves the raw snapshot from the source zone. +2. **Rule Application:** Executes rules (deduplication, null checks, cascade removals) defined in `ROLE_STEPS`. +3. **Filtering:** Iteratively applies rules and captures IDs triggering violations for `event_fact` roles. +4. **Schema Projection:** Executes `enforce_schema` to project the required columns. +5. **ID Mapping:** Joins the filtered DataFrame against `master_mappings` to attach integer IDs. +6. **Persistence:** Saves the compliant and mapped dataset to the Silver layer. -## **Boundaries** +## Boundaries -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Discover UUIDs across all raw sources (CSV/Parquet) before processing. | Calculate business metrics, KPIs, or aggregates. | -| Subtractively filter rows violating structural or temporal rules. | Impute missing values or repair malformed records. | -| Propagate `order_id` invalidations to child tables (Cascade Drop). | Perform cross-table business joins (delegated to Assembly). | -| Guarantee fixed-width schemas via terminal `enforce_schema`. | Alter business definitions or rename columns. | -| Map UUIDs to UInt32 primitives for optimized joins. | Handle cross-run global state (delegated to Storage Adapter). | - -## **Failure & Severity Model** - -### **Operational Failures (Fatal)** -* **Discovery Failure:** If mappings cannot be resolved, the pipeline halts to prevent schema corruption. -* **Schema Breach:** If `enforce_schema` is called but a required column is missing from the source data. -* **Persistence Failure:** If disk I/O or GCS promotion fails during the write phase. - -### **Functional Findings (Warnings)** -* **Contract Violations:** Data issues (duplicates, nulls) result in row removal and are logged in the telemetry report. -* **Referential Cleanup:** +| Identifies UUIDs across raw sources before processing. | Calculate business metrics or aggregates. | +| Removes rows violating structural or temporal rules. | Impute missing values or repair records. | +| Propagates ID invalidations to child tables. | Perform cross-table business joins. | +| Enforces fixed-width schemas. | Alter business definitions or rename columns. | +| Maps UUIDs to UInt32 primitives. | Manage global state across runs. | + +## Failure & Severity Model + +### System Failures +* **Mapping Failure:** If UUID mappings cannot be resolved, the pipeline halts to prevent corruption. +* **Schema Violation:** Occurs if a required column is missing from the source data during `enforce_schema`. +* **I/O Failure:** Triggered by disk or cloud storage errors during the write phase. + +### Data Findings +* **Contract Violations:** Data issues result in row removal and are logged in the report. +* **Referential Integrity:** * **Cascade:** Dropped child records are logged under `removed_cascade_rows`. - * **Orphans:** Records without parent references are dropped and logged under `removed_ghost_orphan_rows`. + * **Orphans:** Records without parent references are removed and logged under `removed_ghost_orphan_rows`. diff --git a/docs/data_pipeline/pipeline_orchestrator.md b/docs/data_pipeline/pipeline_orchestrator.md index eb91d28..40fd8c5 100644 --- a/docs/data_pipeline/pipeline_orchestrator.md +++ b/docs/data_pipeline/pipeline_orchestrator.md @@ -1,4 +1,4 @@ -# **Pipeline Orchestrator** +# Pipeline Orchestrator **File:** [`run_pipeline.py`](../../data_pipeline/run_pipeline.py) @@ -6,71 +6,71 @@ ![pipeline-orchestration-diagram](/assets/diagrams/01-pipeline-orchestration-diagram.png) -## **System Contract** +## System Contract **Purpose** -Serves as the central nervous system of the pipeline. It synchronizes data between cloud storage and local compute, manages the strict chronological execution of processing stages, and ensures the absolute cleanup of system resources regardless of run outcome. +The orchestration layer coordinates data movement between cloud storage and local compute, enforces the chronological execution of processing stages, and manages system resource cleanup. **Invariants** -* **Linearity Guarantee:** Stages are strictly gated. No stage can execute unless the preceding required stage returns a "SUCCESS" signal. -* **Resource Isolation:** Every execution is isolated within a unique `run_id` workspace. -* **Silver Persistence Invariant:** The local `contracted/` runtime directory is treated as transient. The **Cloud Silver Store** is the only authoritative source of truth for the Silver layer. Data must be synchronized to the cloud before the local environment is purged. -* **Mandatory Cleanup:** The local `workspace_root` is deterministically purged at the end of every lifecycle (via `finally` blocks) to prevent disk saturation and cross-run data contamination. -* **Lineage Consistency:** A single `run_id` is propagated through every stage, metadata file, and published artifact to ensure 100% traceability. +* **Stage Gating:** Stages execute sequentially. Processing halts if a preceding stage does not return a "SUCCESS" signal. +* **Resource Isolation:** Each execution occurs within an isolated `run_id` workspace. +* **Silver Persistence:** The local `contracted/` directory is transient. Data must synchronize to the authoritative Cloud Silver Store before the local environment is cleared. +* **Resource Cleanup:** The local `workspace_root` is cleared at the end of every lifecycle via `finally` blocks to prevent resource saturation and data contamination. +* **Lineage Tracking:** The `run_id` is propagated through every stage, metadata file, and published artifact to maintain lineage. **Inputs** -* `RunContext`: (The global configuration object containing IDs and Path mappings). -* **Cloud State**: Raw snapshots and historical Silver (contracted) deltas stored in Google Cloud Storage. +* `RunContext`: Configuration object containing IDs and path mappings. +* **Cloud State**: Raw snapshots and historical Silver deltas stored in Google Cloud Storage. **Outputs** -* **Operational Telemetry**: `run_metadata.json` and a collection of `.json` stage reports. -* **Persistent Silver Store**: Updated contract-compliant datasets in the cloud `contract/` directory. -* **Semantic Artifacts**: Validated Fact and Dimension tables in the production zone. -* **System State**: An updated `latest_version.json` pointer in the production zone. +* **Operational Telemetry**: `run_metadata.json` and stage-specific reports. +* **Persistent Silver Store**: Validated datasets in the cloud `contract/` directory. +* **Semantic Artifacts**: Fact and Dimension tables in the production zone. +* **System State**: Updated `latest_version.json` pointer in the production zone. -## **Execution Workflow** +## Execution Workflow -The orchestrator manages the lifecycle through a strictly gated 13-step sequence, emphasizing memory efficiency and cloud-local synchronization: +The orchestrator manages the lifecycle through a 13-step sequence: -### **Phase I: Environment Initialization** -1. **Resolve**: Instantiates the `RunContext` and initializes background memory telemetry for real-time benchmarking. -2. **Hydrate (Raw)**: Synchronizes the required raw data snapshot from Cloud Storage to the local workspace. -3. **Initialize**: Registers the run commencement by generating `run_metadata.json` with initial "RUNNING" status. +### Phase I: Environment Initialization +1. **Resolve**: Instantiates `RunContext` and starts background memory telemetry for benchmarking. +2. **Hydrate (Raw)**: Synchronizes the raw data snapshot from Cloud Storage to the local workspace. +3. **Initialize**: Logs run start by creating `run_metadata.json` with a "RUNNING" status. -### **Phase II: Processing & Memory Reclamation** -4. **Validate (Raw)**: Asserts the health of the raw data snapshot; fail-fast on structural errors. -5. **Contract Processing**: Executes subtractive filtering and freezes schemas into the local `contracted/` path (Silver layer). -6. **Gate II (Revalidation)**: Defensive check to ensure contracted data meets downstream semantic requirements. -7. **Promote (Silver)**: Persists the newly contracted datasets to **Cloud Silver Storage**. -8. **Synchronize (BQ)**: Forces a metadata cache refresh for BigQuery External Tables via system procedures (`BQ.REFRESH_EXTERNAL_METADATA_CACHE`) for immediate visibility. -9. **Purge (Local)**: Deterministically deletes local `raw/` and `contracted/` directories and invokes `force_gc()` to reclaim RAM before the high-compute Assembly stage. -10. **Assemble**: Flattens relational data into a unified Gold-layer event grain using the **BigQuery Storage Read API** (bypassing the need for local Silver restoration). -11. **Modeling (Semantic)**: Builds entity-centric analytical modules (Fact/Dim tables). +### Phase II: Processing & Memory Reclamation +4. **Validate (Raw)**: Verifies raw data snapshots; fails on structural errors. +5. **Contract Processing**: Executes subtractive filtering and saves results to the local `contracted/` path. +6. **Gate II (Revalidation)**: Verifies contracted data meets downstream semantic requirements. +7. **Promote (Silver)**: Synchronizes contracted datasets to Cloud Silver Storage. +8. **Synchronize (BQ)**: Refreshes BigQuery External Table metadata for immediate visibility. +9. **Purge (Local)**: Deletes local `raw/` and `contracted/` directories and triggers garbage collection to reclaim RAM before the Assembly stage. +10. **Assemble**: Flattens relational data into a Gold-layer dataset using the BigQuery Storage Read API. +11. **Modeling (Semantic)**: Constructs entity-centric analytical modules. -### **Phase III: Activation & Finalization** -12. **Publish**: Executes final integrity gates, performs the **BigQuery View Swap** for the BI layer, and triggers the atomic pointer swap (`_latest.json`) to activate the new version. -13. **Finalize**: Updates terminal metadata (status, duration), uploads all telemetry/stage reports to Cloud Storage, and purges the entire local workspace. +### Phase III: Activation & Finalization +12. **Publish**: Executes final integrity gates, performs BigQuery View swaps, and updates the atomic pointer (`_latest.json`). +13. **Finalize**: Updates status and duration metadata, uploads reports to Cloud Storage, and clears the local workspace. -## **Boundaries** +## Boundaries -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Coordinate the sequence of high-level executors. | Modify rows, columns, or data values. | -| Manage local/cloud data synchronization and BQ caching. | Implement business logic or aggregation rules. | -| Enforce the "Purge-before-Assembly" memory optimization. | Define the technical schema for Fact/Dim tables. | -| Manage the `finally` block for resource safety. | Direct file-level I/O within a stage (Delegated). | -| Aggregate stage-level reports into a run summary. | Perform granular row-level validation. | -| Monitor and log real-time memory telemetry. | Execute SQL transformations directly (Delegated). | +| Coordinates executor sequencing. | Modify data values or logic. | +| Manages cloud synchronization and BQ caching. | Implement business or aggregation rules. | +| Enforces memory optimization strategies. | Define technical schemas for Fact/Dim tables. | +| Manages resource safety via `finally` blocks. | Direct file-level I/O within stages. | +| Aggregates reports into a run summary. | Perform granular row-level validation. | +| Monitors memory telemetry. | Execute SQL transformations directly. | -## **Failure & Severity Model** +## Failure & Severity Model -### **System Failures (Fatal)** -* **Sync Failure**: If the pipeline cannot upload to or download from the Cloud Silver Store, it raises an exception and halts. This prevents the Assembly stage from running on stale or missing data. -* **Stage Crash**: Any unhandled exception within an executor is caught. The pipeline terminates immediately to prevent "partial processing." +### System Failures +* **Sync Failure**: If cloud synchronization fails, the pipeline halts to prevent processing with stale or missing data. +* **Stage Failure**: Unhandled exceptions within executors trigger immediate termination to prevent partial data processing. -### **Resource Recovery** -* **The "Finally" Contract**: Even if a stage crashes mid-execution, the orchestrator guarantees that the local workspace is purged. This prevents a failed run from leaving "ghost" data that could be picked up by the next run. +### Resource Recovery +* **Workspace Purge**: The orchestrator clears the local workspace at completion. This ensures subsequent runs do not encounter residual data from previous executions. -### **Telemetry on Failure** -* **Partial Logs**: Even on failure, the orchestrator attempts to upload available stage reports to the cloud. This ensures you have the diagnostic data to see which rule in the "Contract" stage triggered the failure before the "Purge" occurred. \ No newline at end of file +### Telemetry on Failure +* **Log Retention**: The orchestrator attempts to upload available stage reports to cloud storage upon failure to preserve diagnostic data. diff --git a/docs/data_pipeline/publishing_stage.md b/docs/data_pipeline/publishing_stage.md index 57a6380..6e6ba6e 100644 --- a/docs/data_pipeline/publishing_stage.md +++ b/docs/data_pipeline/publishing_stage.md @@ -1,4 +1,4 @@ -# **Publish Stage** +# Publish Stage **Files:** * **Executor:** [`publish_executor.py`](../../data_pipeline/publish/publish_executor.py) @@ -6,50 +6,50 @@ **Role:** Production Promotion and Versioning. -## **System Contract** +## System Contract **Purpose** -Serves as the final gate and deployment mechanism for the pipeline. It transitions validated semantic artifacts into a permanent, versioned storage layer and updates a dual-pointer system: a `latest_version.json` manifest for automated systems and BigQuery Authorized Views for Power BI/Business Intelligence tools. +The publish stage serves as the final deployment mechanism for the pipeline. It transitions validated semantic artifacts into versioned storage and updates a dual-pointer system: a `latest_version.json` manifest for automated systems and BigQuery Authorized Views for BI tools. **Invariants** -* **Integrity-Gated Promotion:** Promotion to the production zone is strictly prohibited if any table defined in the `SEMANTIC_MODULES` registry is missing or inaccessible. -* **Atomic Multi-System Swap:** The "switch" to the new version must happen across both GCS and BigQuery. The BigQuery View swap ensures Power BI never experiences "partial data" reads during the file promotion phase. -* **Version Immutability:** Once a run is archived in a `v{run_id}` directory, the files are treated as read-only snapshots; they are never updated or overwritten by subsequent runs. -* **SQL Decoupling:** Dashboards connect to "Stable" Views (e.g., `published_seller_weekly_fact`) which are dynamically redirected to version-specific External Tables (e.g., `seller_weekly_fact_v20260413`). +* **Integrity Gates:** Promotion to the production zone is prevented if any table defined in the `SEMANTIC_MODULES` registry is missing or inaccessible. +* **Coordinated Update:** The transition to a new version occurs across both Cloud Storage and BigQuery. The BigQuery View swap ensures that BI tools do not encounter partial data during the file promotion phase. +* **Version Immutability:** Files archived in a `v{run_id}` directory are read-only and are not overwritten by subsequent runs. +* **SQL Decoupling:** Dashboards connect to stable Views (e.g., `published_seller_weekly_fact`) which are redirected to version-specific External Tables. **Inputs** -* `run_context`: `RunContext` (Contains the unique `run_id` and the semantic/published path configurations). -* `SEMANTIC_MODULES`: `Registry` (The source of truth for which artifacts must exist to pass the integrity gate). +* `run_context`: Configuration containing the `run_id` and path settings. +* `SEMANTIC_MODULES`: Registry defining the required artifacts for a successful release. **Outputs** -* **Publish Report:** `dict` (Telemetry for the integrity check, file promotion status, and SQL/JSON pointer updates). -* **Versioned Artifacts:** A new directory `/published/v{run_id}/` containing the full suite of semantic Fact and Dimension tables. -* **BigQuery Pointers:** Updated External Tables and Authorized Views reflecting the new version. -* **Latest Pointer:** An updated `latest_version.json` file in the root of the published zone. +* **Publish Report:** Status of integrity checks, file promotions, and pointer updates. +* **Versioned Artifacts:** A new `/published/v{run_id}/` directory containing semantic Fact and Dimension tables. +* **BigQuery Pointers:** Updated External Tables and Authorized Views. +* **Latest Pointer:** Updated `latest_version.json` file in the published zone root. -## **Execution Workflow** +## Execution Workflow -The **Executor** ensures the production release follows a fail-fast, four-phase sequence: +The executor manages the production release through a four-phase sequence: -1. **Integrity Gate:** `run_integrity_gate` scans the semantic zone to verify that 100% of the expected tables (defined in the registry) were successfully produced. -2. **Promotion:** `promote_semantic_version` transfers all verified artifacts from the transient run-scoped directory to a permanent versioned path (`/published/v{run_id}`). -3. **SQL Sync:** `swap_bigquery_view` executes DDL commands to create versioned External Tables and atomically redirect the "Published" Views used by dashboards. -4. **Activation:** `activate_published_version` performs the terminal swap of the `latest_version.json` file, effectively "going live" for downstream file-system consumers. +1. **Integrity Gate:** Scans the semantic zone to verify that all expected tables defined in the registry exist. +2. **Promotion:** Transfers verified artifacts from the run-scoped directory to a permanent versioned path (`/published/v{run_id}`). +3. **SQL Sync:** Executes DDL commands to create versioned External Tables and redirect the Authorized Views. +4. **Activation:** Updates the `latest_version.json` file to point to the new version. -## **Boundaries** +## Boundaries -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Verify the physical existence of semantic artifacts. | Re-validate data quality (handled in Validation/Contract). | -| Copy or upload files to a versioned production path. | Perform any data transformation or aggregation. | -| Manage BigQuery DDL for External Tables and Views. | Manage historical version cleanup (Garbage collection). | -| Update the atomic production pointers (SQL and JSON). | Handle automated rollbacks (pointers must be reverted manually). | -| Capture lifecycle metadata (Publication timestamps). | Modify the contents of the `.parquet` files. | - -## **Failure & Severity Model** - -### **Operational Failures (System Level)** -* **Storage Access Denied:** If the service account lacks write permissions to the published zone (Local or GCS), the lifecycle halts before activation. -* **BigQuery DDL Error:** If the SQL swap fails (e.g., dataset permissions or syntax), the `latest_version.json` is never updated, ensuring systems stay in sync. -* **Network/IO Exception:** Interrupted file transfers during the promotion phase result in an immediate `failed` status, ensuring the pointers remain on the previous stable version. \ No newline at end of file +| Verifies the existence of semantic artifacts. | Re-validate data quality (handled in prior stages). | +| Copies files to the versioned production path. | Perform data transformation or aggregation. | +| Manages BigQuery DDL for tables and views. | Manage historical version cleanup. | +| Updates production pointers (SQL and JSON). | Handle automated rollbacks. | +| Captures publication metadata. | Modify the contents of Parquet files. | + +## Failure & Severity Model + +### System Failures +* **Storage Access Failure:** If write permissions are missing for the published zone, the lifecycle halts before activation. +* **BigQuery DDL Error:** If the SQL swap fails, the `latest_version.json` is not updated to keep pointers synchronized. +* **I/O Interruptions:** Errors during file transfer result in a failure status, leaving pointers on the previous stable version. diff --git a/docs/data_pipeline/semantic_stage.md b/docs/data_pipeline/semantic_stage.md index 718b9bf..42393ff 100644 --- a/docs/data_pipeline/semantic_stage.md +++ b/docs/data_pipeline/semantic_stage.md @@ -1,4 +1,4 @@ -# **Semantic Modeling Stage** +# Semantic Modeling Stage **Files:** * **Executor:** [`semantic_executor.py`](../../data_pipeline/semantic/semantic_executor.py) @@ -9,67 +9,67 @@ ![semantic-stage-diagram](/assets/diagrams/05-semantic-stage-diagram.png) -## **System Contract** +## System Contract **Purpose** -Transforms the unified Gold-layer "Order-Grain" event table into entity-centric Fact and Dimension modules. It performs temporal aggregations, calculates long-term performance metrics, and leverages the Primitive Integer Pipeline for efficient, high-fidelity analytical modeling. +Transforms the unified analytical table into entity-centric Fact and Dimension modules. This stage performs temporal aggregations, calculates long-term performance metrics, and applies integer mapping for analytical modeling. **Invariants** -* **Temporal Grain:** All fact tables are aggregated at the ISO-Week level, aligned deterministically to the Monday of each week (`W-MON`). +* **Temporal Grain:** Fact tables are aggregated at the ISO-Week level (`W-MON`). * **Entity Grain:** - * **Fact Tables:** Strictly 1 row per `(Entity_ID, order_year_week)`. - * **Dimension Tables:** Strictly 1 row per `Entity_ID`. -* **Technical Contract:** Every output table is subject to a "Freeze" pass that guarantees 1:1 schema matching and strict dtype casting as defined in the `SEMANTIC_MODULES` registry. + * **Fact Tables:** One row per `(Entity_ID, order_year_week)`. + * **Dimension Tables:** One row per `Entity_ID`. +* **Technical Enforcement:** Output tables are subject to a schema enforcement pass that ensures matching columns and data types as defined in the `SEMANTIC_MODULES` registry. **Inputs** -* `run_context`: `RunContext` (Path resolution and `run_id` lineage). -* `assembled_events`: `pl.LazyFrame` (The unified analytical source from the Assembly stage, optimized for streaming). -* `SEMANTIC_MODULES`: `Registry` (Defines builders, expected tables, grains, and technical schemas). +* `run_context`: Path resolution and lineage tracking. +* `assembled_events`: Unified analytical source from the Assembly stage. +* `SEMANTIC_MODULES`: Registry defining builders, tables, grains, and schemas. **Outputs** -* **Semantic Report:** `dict` (Hierarchical status of module-level and table-level processing). -* **Semantic Modules:** `parquet` (Fact and Dimension tables for Sellers, Customers, and Products). +* **Semantic Report:** Status of module and table processing. +* **Semantic Modules:** Fact and Dimension tables for Sellers, Customers, and Products in Parquet format. -## **Execution Workflow** +## Execution Workflow -The **Executor** coordinates the semantic build through a modular, registry-driven loop: +The executor manages the semantic build through a registry-driven loop: -1. **Source Verification:** Loads the `assembled_events` `LazyFrame`. If the source is missing or cannot be scanned, the stage terminates with a `failed` status. -2. **Module Initialization:** Iterates through `SEMANTIC_MODULES` defined in the registry. -3. **Builder Execution:** Dispatches the `LazyFrame` data to the module's `builder` function (e.g., `build_seller_semantic`). -4. **Contract Enforcement:** For every table returned by a builder, the executor calls `validate_and_freeze_table` to: - * Assert the uniqueness of the defined **Grain**. - * Project the exact **Schema** (dropping internal helper columns). - * Enforce strict **Data Types**. -5. **Partitioned Export:** Persists artifacts into module-specific subdirectories within the semantic zone, using a date-partitioned naming convention. - * **Optimization:** Utilizes `sink_parquet` for `LazyFrame` exports, ensuring zero-copy streaming and constant memory usage. -6. **Memory Management:** Explicitly deletes `LazyFrames` and triggers `gc.collect()` after every individual table export (Fact and Dim) to purge intermediate memory usage. +1. **Source Verification:** Loads the `assembled_events` dataset. If the source is missing, the stage fails. +2. **Module Initialization:** Iterates through the `SEMANTIC_MODULES` registry. +3. **Builder Execution:** Dispatches data to module-specific builder functions. +4. **Enforcement:** For each table produced, the executor: + * Verifies grain uniqueness. + * Projects the defined schema and removes internal helper columns. + * Enforces data types. +5. **Export:** Saves artifacts into module-specific subdirectories using a date-partitioned convention. + * **Optimization:** Uses `sink_parquet` for streaming exports to maintain a low memory footprint. +6. **Memory Management:** Triggers garbage collection after each table export to clear intermediate memory. -## **Optimization & Memory Invariants** +## Optimization & Resource Management -* **Integer Key Optimization:** To optimize memory during grouping operations, builders leverage pre-mapped `UInt32/UInt64` keys (e.g., `seller_id_int`). This maintains a constant memory profile during non-blocking aggregation and eliminates the overhead of string-based hash tables. -* **Narrow Aggregation Payloads:** All aggregation results (counts, sums) are immediately downcast to `Int16` or `Float32` within the `agg()` block. This prevents the materialized result set from expanding in memory. -* **Metric Downcasting:** Durations, counts, and years are forced to `Int16` (2 bytes) to minimize row width during streaming. -* **Streaming Export:** `sink_parquet()` is utilized for all fact and dimension table exports, enabling zero-copy streaming of results directly from the query plan to storage. +* **Integer Keys:** Builders use pre-mapped `UInt32/UInt64` keys for grouping operations to reduce memory overhead compared to string-based joins. +* **Aggregated Data Types:** Aggregation results are downcast to smaller types (e.g., `Int16`, `Float32`) within the `agg()` block to reduce the size of the result set. +* **Type Casting:** Durations, counts, and year attributes are cast to `Int16` to reduce row width. +* **Streaming Export:** Employs `sink_parquet()` to stream results directly to storage. -## **Boundaries** +## Boundaries -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Perform multi-level aggregations (Sum, Mean, Count). | Filter "bad" data (handled in Validation/Contract stages). | -| Derive entity-level attributes (e.g., `first_order_date`). | Resolve order-item join cardinality. | -| Align all temporal metrics to the ISO Week grain. | Mutate the "Assembled Events" source. | -| Utilize Integer-Key grouping for constant memory. | Manage the physical publish/pointer logic. | -| Organize data into Fact/Dimension modules via streaming. | Perform cross-module joins. | +| Performs multi-level aggregations. | Filter malformed data (handled in prior stages). | +| Derives entity-level attributes. | Resolve join cardinality issues. | +| Aligns metrics to the ISO Week grain. | Mutate the "Assembled Events" source. | +| Uses integer keys for grouping. | Manage physical publishing or pointer logic. | +| Organizes data into Fact/Dimension modules. | Perform cross-module joins. | -## **Failure & Severity Model** +## Failure & Severity Model -### **Operational Failures (System Level)** -* **Missing Source:** Failure to load `assembled_events` results in an immediate stage failure. -* **Trapped Exceptions:** Unexpected errors during module building or table processing are caught by `try-except` blocks in the executor. These are logged to the report, and the stage status is set to `failed`. -* **Registry Mismatch:** If a builder returns a table name not defined in the `SEMANTIC_MODULES` registry, the executor raises a `RuntimeError`. +### System Failures +* **Missing Source:** Failure to load the input dataset results in an immediate stage failure. +* **Handled Exceptions:** Errors during building or processing are caught, logged to the report, and the stage is marked as failed. +* **Registry Mismatch:** The executor raises an error if a builder returns a table not defined in the registry. -### **Functional Findings (Data Level)** -* **Schema Violation:** If a required column defined in the registry is missing from the builder's output, the freeze step raises a `KeyError` or `RuntimeError`, which is trapped. \ No newline at end of file +### Data Findings +* **Schema Violation:** If a required column is missing from a builder's output, the enforcement step raises an error that is caught and logged. diff --git a/docs/data_pipeline/validation_stage.md b/docs/data_pipeline/validation_stage.md index b6d5a87..81eee46 100644 --- a/docs/data_pipeline/validation_stage.md +++ b/docs/data_pipeline/validation_stage.md @@ -1,4 +1,4 @@ -# **Validation Stage** +# Validation Stage **Files:** * **Executor:** [`validation_executor.py`](../../data_pipeline/validation/validation_executor.py) @@ -8,67 +8,60 @@ ![validation-stage-diagram](/assets/diagrams/02-validation-stage-diagram.png) -## **System Contract** +## System Contract **Purpose** -Evaluates raw datasets against declared structural contracts before any mutation or transformation occurs. It prevents "garbage-in" scenarios by detecting schema violations, structural inconsistencies, and referential integrity issues. In the modern Polars-native architecture, it also serves as a verification gate for the 'Normalize-at-Source' I/O strategy. +The validation stage evaluates raw datasets against structural contracts before processing. It identifies schema violations, structural inconsistencies, and referential integrity issues to prevent malformed data from entering the pipeline. **Invariants** -* **Non-Mutation Guarantee:** This stage is strictly read-only. It never modifies values, removes rows, or casts types in the source data. -* **Resolution Verification:** Asserts that all timestamps are pre-normalized to microsecond (us) resolution by the I/O layer. +* **Read-Only Operation:** This stage is strictly read-only. It does not modify values, remove rows, or cast data types. +* **Resolution Verification:** Verifies that timestamps are normalized to microsecond (us) resolution. * **Severity Hierarchy:** - * `errors`: Fatal structural violations (e.g., missing columns, duplicate PKs). - * `warnings`: Admissible integrity issues (e.g., orphan records, chronological anomalies). -* **Execution Sequence:** Base structural validations are mandatory and must pass for a table before role-specific or cross-table validations are attempted. + * `errors`: Fatal structural violations (e.g., missing columns, duplicate Primary Keys). + * `warnings`: Integrity issues (e.g., orphaned records, chronological anomalies). +* **Sequential Execution:** Base structural validations must pass for a table before role-specific or cross-table validations occur. **Inputs:** -* `run_context`: `RunContext` (Object containing path resolution for the raw snapshot). -* `TABLE_CONFIG`: `Registry` (Defines Primary Keys, Required Columns, and Entity Roles). -* `base_path`: `Path` (Optional override; defaults to the run-scoped snapshot directory). +* `run_context`: Configuration object for path resolution. +* `TABLE_CONFIG`: Registry defining Primary Keys, Required Columns, and Entity Roles. +* `base_path`: Optional override; defaults to the run-scoped snapshot directory. **Outputs** -* **Validation Report:** `dict` (Standardized telemetry object containing `status`, `errors`, `warnings`, and `info`). +* **Validation Report:** Standardized dictionary containing `status`, `errors`, `warnings`, and `info`. -## **Execution Workflow** +## Execution Workflow -The **Executor** coordinates the validation lifecycle through the following deterministic steps: +The executor coordinates the validation lifecycle through these steps: -1. **Table Discovery:** Iterates through all logical tables registered in `TABLE_CONFIG`. -2. **Data Loading:** Attempts to load each table as a DataFrame. If a table is missing, an `error` is logged to the report. -3. **Base Validation:** Dispatches the DataFrame to `run_base_validations` to check for: - * Presence of required columns. - * Uniqueness of Primary Keys and column names using Polars-native expressions. - * Compliance with non-nullable constraints. -4. **Role-Specific Dispatch:** If base validations pass, the executor applies specialized rules: - * `event_fact`: Triggers `run_event_fact_validations` (temporal chronology and microsecond resolution verification). - * `transaction_detail`: Triggers `run_transaction_detail_validations` (numeric range checks). -5. **Cross-Table Integrity:** Once all tables are processed individually, `run_cross_table_validations` evaluates Foreign Key relationships (e.g., ensuring all Items belong to an existing Order). +1. **Table Discovery:** Iterates through logical tables registered in `TABLE_CONFIG`. +2. **Data Loading:** Loads each table as a DataFrame. Logs an error if a table is missing. +3. **Base Validation:** Checks for required columns, Primary Key uniqueness, and non-nullable constraints using Polars expressions. +4. **Role-Specific Rules:** + * `event_fact`: Verifies temporal chronology and microsecond resolution. + * `transaction_detail`: Performs numeric range checks. +5. **Cross-Table Integrity:** Evaluates Foreign Key relationships once individual table processing is complete. -## **Boundaries** +## Boundaries -| This component **DOES** | This component **DOES NOT** | +| This component | This component does NOT | | :--- | :--- | -| Load logical tables from the snapshot zone. | Remove rows or filter data. | -| Detect schema and primary key violations. | Correct or impute missing values. | -| Verify microsecond (us) timestamp resolution. | Deduplicate records (delegated to Contract stage). | -| Evaluate temporal chronology using clean Polars syntax. | Perform data type casting. | -| Detect numeric anomalies (negative prices/lags). | Mutate the physical state of the data lake. | -| Produce structured, machine-readable reports. | Halt the pipeline (Decision owned by global orchestrator). | - -## **Failure & Severity Model** - -### **Operational Failures (System Level)** -* **Missing Logical Table:** Logged as a fatal `error` in the report; the table is marked as unprocessable. -* **Load Failure:** If a Parquet file is corrupted, the executor traps the exception and logs it as an `error`. - -* **Functional Findings (Data Level)** -* **Structural Errors & Integrity Warnings:** - * Fatal structural violations (e.g., PK duplicates) OR integrity issues (e.g., orphan records) set the stage status to `failed`. - * This informs the orchestrator that the data contains quality issues, though the diagnostic pass will still complete for all tables. -* **The Halting Caveat:** - * **Initial Validation:** The orchestrator allows the pipeline to proceed if only `warnings` are present, delegating the cleanup to the `Contract Stage`. - * **Post-Contract (Revalidation):** In this phase, `warnings` are treated as fatal. Since the contract stage should have already pruned orphans and anomalies, any remaining warning triggers a terminal `RuntimeError` to prevent downstream corruption. - -* **Incomplete Context:** - * If a parent table (e.g., `df_orders`) is missing, cross-table validation is skipped and logged as `info` rather than a failure. \ No newline at end of file +| Loads tables from the snapshot zone. | Remove rows or filter data. | +| Detects schema and primary key violations. | Correct or impute missing values. | +| Verifies microsecond (us) timestamp resolution. | Deduplicate records. | +| Evaluates temporal chronology. | Perform data type casting. | +| Detects numeric anomalies. | Mutate data state. | +| Produces structured reports. | Halt the pipeline independently. | + +## Failure & Severity Model + +### System Failures +* **Missing Table:** Logged as a fatal error; the table is marked as unprocessable. +* **Corrupted Data:** Catches file-loading exceptions and logs them as errors. + +### Data Validation Findings +* **Status Flags:** Structural violations or integrity issues set the stage status to `failed`. The orchestrator determines whether to proceed based on the run phase. +* **Phase-Based Halting:** + * **Initial Validation:** The orchestrator allows the run to continue if only `warnings` are present, delegating cleanup to the Contract Stage. + * **Post-Contract Revalidation:** Any remaining `warnings` are treated as fatal errors to prevent downstream corruption. +* **Contextual Dependencies:** Cross-table validation is skipped and logged as `info` if a required parent table is missing. diff --git a/docs/terraform/gcp-iac.md b/docs/terraform/gcp-iac.md index 5e62211..51ac729 100644 --- a/docs/terraform/gcp-iac.md +++ b/docs/terraform/gcp-iac.md @@ -1,40 +1,40 @@ # GCP Infrastructure: Operations Analytics Pipeline -This repository contains the Terraform configuration for the Operations Analytics data pipeline. The infrastructure is designed to be serverless, event-driven, and highly secure, utilizing Google Cloud Run, Workflows, and Eventarc. +This repository contains the Terraform configuration for the Operations Analytics data pipeline. The infrastructure is serverless and event-driven, using Cloud Run, Workflows, and Eventarc. ## Architecture Overview -The pipeline follows a **Trigger-Action-Archive** flow: -1. **Extraction:** A Cloud Scheduler job triggers the `drive-extractor` Cloud Run job at midnight (PHT). -2. **Archival:** The extractor saves raw data into the **Archival Bucket** (Coldline storage for 3 years). +The pipeline follows this execution flow: +1. **Extraction:** Cloud Scheduler triggers the `drive-extractor` Cloud Run job daily. +2. **Archival:** The extractor saves raw data to the **Archival Bucket** (Coldline storage). 3. **Dispatch:** An Eventarc trigger detects the new file and invokes a Google Workflow (`pipeline-dispatcher`). -4. **Processing:** The Workflow triggers the main `operations-pipeline` Cloud Run job (2 vCPU, 8Gi RAM) for heavy-duty data processing. -5. **Transient Storage:** Intermediate files are stored in the **Pipeline Bucket** with a 7-day TTL on raw data to minimize costs and exposure. -6. **Serving Layer:** The final semantic models are published as **BigQuery External Tables** and presented via stable **Authorized Views** for Power BI and dashboard consumers. +4. **Processing:** The Workflow triggers the main `operations-pipeline` Cloud Run job for data processing. +5. **Transient Storage:** Intermediate files are stored in the **Pipeline Bucket** with a 7-day TTL on raw data. +6. **Serving Layer:** Semantic models are published as **BigQuery External Tables** and accessed via **Authorized Views**. ## Prerequisites * **Terraform:** Version `~> 1.5.0` * **Provider:** `hashicorp/google` version `~> 7.0` -* **Backend:** GCS bucket `operations-terraform-state-vault-2026` must exist for state management. +* **Backend:** GCS bucket `operations-terraform-state-vault-2026` for state management. -## Post-Provisioning (CI/CD Handshake) -The integration between GCP and GitHub Actions requires a one-time "Bootstrap" extraction to populate Repository Secrets. This process completes the cryptographic trust relationship established by Workload Identity Federation (WIF). +## Post-Provisioning (CI/CD Setup) +Integrating GCP with GitHub Actions requires a bootstrap process to populate Repository Secrets and establish the trust relationship via Workload Identity Federation (WIF). -### Secret Injection Matrix -| GitHub Secret | Source / Origin | Purpose | +### Required GitHub Secrets +| GitHub Secret | Source | Purpose | | :--- | :--- | :--- | -| `WIF_PROVIDER` | `terraform output -raw GITHUB_WIF_PROVIDER_NAME` | Logical path for the WIF identity provider handshake. | -| `DEPLOYER_SA_EMAIL` | `github-actions-deployer@...` | Target identity for GitHub OIDC impersonation. | -| `GCP_PROJECT_ID` | `var.project_id` | Project scoping for GCP API and resource discovery. | +| `WIF_PROVIDER` | `terraform output -raw GITHUB_WIF_PROVIDER_NAME` | WIF identity provider path. | +| `DEPLOYER_SA_EMAIL` | `github-actions-deployer@...` | Identity for GitHub OIDC impersonation. | +| `GCP_PROJECT_ID` | `var.project_id` | GCP Project ID. | -### Bootstrapping Constraint -The initial infrastructure provisioning must be executed by a maintainer with `Project IAM Admin` or `Owner` privileges. This "privileged apply" is required to establish the WIF provider and assign the administrative roles to the `github-actions-deployer` service account. Subsequent updates are autonomously managed by the CI/CD identity. +### Provisioning Requirements +The initial infrastructure must be provisioned by a user with `Project IAM Admin` or `Owner` privileges to establish the WIF provider and assign roles to the `github-actions-deployer` service account. Subsequent updates are managed by the CI/CD pipeline. ## Infrastructure Components ### Compute & Jobs (`jobs.tf`) | Resource Name | Type | Memory | Timeout | Purpose | | :--- | :--- | :--- | :--- | :--- | -| `operations-pipeline` | Cloud Run Job | 8Gi | 30m | Main Polars-based processing engine. Includes 10Gi Local SSD mount at `/tmp`. | +| `operations-pipeline` | Cloud Run Job | 8Gi | 30m | Processing engine. Includes 10Gi Local SSD mount at `/tmp`. | | `drive-extractor` | Cloud Run Job | 1Gi | 15m | Pulls source data from external APIs. | | `ops-repo` | Artifact Registry | n/a | n/a | Docker repository for pipeline images. | @@ -43,39 +43,39 @@ The initial infrastructure provisioning must be executed by a maintainer with `P | :--- | :--- | :--- | | `ops-archival-storage` | GCS Bucket | Move to Coldline after 400 days; Delete after 3 years. | | `ops-pipeline-storage` | GCS Bucket | Delete files with prefix `raw/` after 7 days. | -| `seller_semantic` | BQ Dataset | **Protected:** `prevent_destroy = true`; Logical container for Seller fact/dim views. | -| `customer_semantic` | BQ Dataset | **Protected:** `prevent_destroy = true`; Logical container for Customer fact/dim views. | -| `product_semantic` | BQ Dataset | **Protected:** `prevent_destroy = true`; Logical container for Product fact/dim views. | +| `seller_semantic` | BQ Dataset | Protected from destruction. | +| `customer_semantic` | BQ Dataset | Protected from destruction. | +| `product_semantic` | BQ Dataset | Protected from destruction. | -## Infrastructure-as-Code Workarounds +## Implementation Details -### Cloud Run Local SSD Strategy (Preview) -The `operations-pipeline` utilizes a **Local SSD** mount at `/tmp` (10Gi) **by provisioning manually** to offload memory pressure from Polars streaming joins. -* **The Problem:** As of April 2026, the Google Terraform provider does not natively support the `DISK` medium for `empty_dir` volumes (it defaults to `MEMORY`). -* **The Resolution:** Provision manually and utilize lifecycle `ignore_changes` on the `medium` attribute. This allows the job to be created with the SSD partition enabled via the CLI or UI, while preventing Terraform from "correcting" it back to RAM-based storage during subsequent runs. +### Cloud Run Local SSD +The `operations-pipeline` uses a Local SSD mount at `/tmp` to offload memory pressure during joins. +* **Constraint:** The Google Terraform provider does not natively support the `DISK` medium for `empty_dir` volumes (it defaults to `MEMORY`). +* **Configuration:** Provision the SSD partition manually and use `ignore_changes` on the `medium` attribute in Terraform to prevent reversion to RAM-based storage. -### BigQuery Accidental Deletion Protection -To safeguard analytical history, all semantic datasets are configured with: -* `delete_contents_on_destroy = false`: Ensures data/views remain even if the resource is deleted. -* `prevent_destroy = true`: Forces a manual override to destroy the dataset, protecting it from `terraform destroy` or accidental refactoring. +### BigQuery Deletion Protection +To safeguard data, semantic datasets are configured with: +* `delete_contents_on_destroy = false`: Ensures data and views persist if the resource is deleted. +* `prevent_destroy = true`: Requires a manual override to destroy the dataset. ### Orchestration (`orchestration.tf`) -* **Cloud Scheduler:** `0 0 * * *` (Daily 12AM PHT) triggers the Extractor. -* **Eventarc:** Monitors `object.v1.finalized` on the Archival bucket. -* **Workflows:** `pipeline-dispatcher` evaluates logic to trigger the main pipeline. +* **Cloud Scheduler:** Triggers the Extractor daily at 12AM PHT. +* **Eventarc:** Monitors `object.v1.finalized` events on the Archival bucket. +* **Workflows:** `pipeline-dispatcher` triggers the main pipeline. -## IAM & Security Matrix (`iam_bindings.tf`, `wif.tf`) +## IAM & Security (`iam_bindings.tf`, `wif.tf`) -This project implements **Zero Trust** via Workload Identity Federation and granular Service Account (SA) permissions. +This project uses Workload Identity Federation and granular Service Account permissions. -### Identity Registry -| Identity Name | Role/Purpose | +### Service Accounts +| Identity Name | Purpose | | :--- | :--- | -| `github-actions-deployer` | CI/CD automation for infra and code updates. | -| `drive-extractor-sa` | I/O identity for data extraction and archival. | -| `ops-pipeline-sa` | Compute identity for the main processing pipeline. | -| `eventarc-invoker-sa` | Orchestration identity to receive events and trigger workflows. | -| `job-invoker-sa` | Scheduler identity to trigger Cloud Run jobs. | +| `github-actions-deployer` | CI/CD automation. | +| `drive-extractor-sa` | Data extraction and archival. | +| `ops-pipeline-sa` | Main processing pipeline. | +| `eventarc-invoker-sa` | Event routing and workflow triggers. | +| `job-invoker-sa` | Triggering Cloud Run jobs. | ### Permission Bindings | Identity | Target | Roles | Rationale | @@ -89,21 +89,20 @@ This project implements **Zero Trust** via Workload Identity Federation and gran ### Workload Identity Federation * **Pool:** `github-pool` -* **Trust Policy:** Restricted to `${var.github_repo}` to prevent unauthorized repository access. +* **Trust Policy:** Restricted to `${var.github_repo}` to prevent unauthorized access. ## Inputs & Variables (`variables.tf`) -| Name | Type | Sensitive | Description | -| :--- | :--- | :--- | :--- | -| `project_id` | `string` | No | Target Google Cloud Project ID. | -| `region` | `string` | No | The Project GCP region. | -| `environment` | `string` | No | Deployment environment (dev, prod). | -| `github_repo` | `string` | No | Format: `owner/repository`. | -| `bq_dataset_id` | `string` | No | BigQuery dataset containing externalized GCS tables. | -| `alert_email_map` | `map` | **Yes** | Monitoring notification recipients. | - +| Name | Type | Description | +| :--- | :--- | :--- | +| `project_id` | `string` | Target Google Cloud Project ID. | +| `region` | `string` | GCP region. | +| `environment` | `string` | Deployment environment (dev, prod). | +| `github_repo` | `string` | Format: `owner/repository`. | +| `bq_dataset_id` | `string` | BigQuery dataset ID. | +| `alert_email_map` | `map` | Monitoring notification recipients (Sensitive). | ## State Management -State is managed remotely in GCS to ensure consistency and locking. +State is stored remotely in GCS. ```hcl terraform { backend "gcs" { diff --git a/power_bi/docs/customer_experience/dax_dictionary.md b/power_bi/docs/customer_experience/dax_dictionary.md index 0ae2e0c..b7df552 100644 --- a/power_bi/docs/customer_experience/dax_dictionary.md +++ b/power_bi/docs/customer_experience/dax_dictionary.md @@ -1,10 +1,10 @@ -# DAX Data Dictionary: Customer Experience & Revenue Exposure Dashboard +# DAX Data Dictionary: Customer Experience & Revenue Exposure -### Display Folder - _prerequisite_measures -*Fundamental building blocks and base aggregations required for higher-level logic. This section includes primary counts, temporal baselines, and period-over-period comparison logic.* +### Base Measures +*Foundational aggregations and counts used for period-over-period comparison and KPI calculation.* - Measure Name: **`total_revenue`** -- Description: *Gross revenue generated across all customer segments and regions.* +- Description: *Total revenue across all customer segments and regions.* ```dax SUM(customer_weekly_fact[weekly_revenue]) ``` @@ -12,7 +12,7 @@
- Measure Name: **`total_order`** -- Description: *Aggregate count of orders within the filtered period.* +- Description: *Total count of orders within the selected period.* ```dax SUM(customer_weekly_fact[weekly_order_count]) ``` @@ -20,7 +20,7 @@
- Measure Name: **`total_delivered`** -- Description: *Count of orders successfully delivered to customers.* +- Description: *Count of successfully delivered orders.* ```dax SUM(customer_weekly_fact[weekly_delivered_orders]) ``` @@ -28,7 +28,7 @@
- Measure Name: **`total_cancelled`** -- Description: *Volume of orders cancelled by the customer or system.* +- Description: *Total count of cancelled orders.* ```dax SUM(customer_weekly_fact[weekly_cancelled_orders]) ``` @@ -47,7 +47,7 @@
- Measure Name: **`active_customer_prev_month`** -- Description: *Baseline count of unique active customers from the previous calendar month.* +- Description: *Distinct count of unique customers who placed an order in the previous calendar month.* ```dax CALCULATE( DISTINCTCOUNT(customer_weekly_fact[customer_id_int]), @@ -58,7 +58,7 @@
- Measure Name: **`MoM_drop_off_customers`** -- Description: *Identifies customers active in the previous month who have not placed an order in the current month.* +- Description: *Count of customers active in the previous month who have not placed an order in the current month.* ```dax VAR prev_period_customer = CALCULATETABLE( @@ -84,11 +84,11 @@
-### Display Folder - KPI_measures -*High-level performance indicators used for proactive monitoring. These measures quantify financial exposure and customer retention health.* +### KPI Measures +*Performance indicators used to track financial exposure and customer retention.* - Measure Name: **`revenue_at_risk`** -- Description: *Total financial value currently impacted by fulfillment delays exceeding the dynamic threshold.* +- Description: *Total revenue associated with delivery delays exceeding the defined threshold.* ```dax VAR threshold = SELECTEDVALUE(delay_threshold_parameter[threshold_parameter], 3) @@ -108,7 +108,7 @@
- Measure Name: **`MoM_drop_off_rate`** -- Description: *The percentage of last month's active buyers who failed to return in the current month.* +- Description: *Percentage of buyers from the previous month who did not return in the current month.* ```dax DIVIDE([MoM_drop_off_customers], [active_customer_prev_month], 0) ``` @@ -116,7 +116,7 @@
- Measure Name: **`cancellation_rate`** -- Description: *The ratio of cancelled orders to total order volume.* +- Description: *Ratio of cancelled orders to total order volume.* ```dax DIVIDE([total_cancelled], [total_order], 0) ``` @@ -127,11 +127,11 @@
-### Display Folder - visual_measures -*Support measures designed for specific charts and UI elements, including weighted averages and conditional formatting logic.* +### Visual and UI Measures +*Measures supporting visualizations, including weighted averages and conditional formatting logic.* - Measure Name: **`weighted_avg_delivery_delay`** -- Description: *Volume-weighted average of delivery delays to ensure accuracy across segments with varying order counts.* +- Description: *Volume-weighted average of delivery delays.* ```dax DIVIDE( SUMX(customer_weekly_fact, @@ -144,7 +144,7 @@
- Measure Name: **`weighted_avg_approval_lag`** -- Description: *Volume-weighted average of the time taken for order approval.* +- Description: *Volume-weighted average of order processing time (lag).* ```dax DIVIDE( SUMX(customer_weekly_fact, @@ -157,7 +157,7 @@
- Measure Name: **`weighted_avg_lead_time`** -- Description: *Volume-weighted average of total fulfillment duration (Creation to Delivery).* +- Description: *Volume-weighted average of total fulfillment duration.* ```dax DIVIDE( SUMX(customer_weekly_fact, @@ -170,7 +170,7 @@
- Measure Name: **`bubble_size_sensitivity`** -- Description: *Scales scatter plot markers to highlight high-revenue segments experiencing extreme delays.* +- Description: *Scales markers in the scatter plot visualization.* ```dax [revenue_at_risk] ^ 0.8 ``` @@ -178,7 +178,7 @@
- Measure Name: **`color_format_revenue_at_risk`** -- Description: *Calculates the percentage of revenue at risk for conditional formatting rules.* +- Description: *Calculates the ratio of revenue at risk for conditional formatting rules.* ```dax DIVIDE([revenue_at_risk], [total_revenue]) ``` @@ -186,7 +186,7 @@
- Measure Name: **`last_source_update`** -- Description: *Displays the most recent data sync timestamp from the BigQuery source.* +- Description: *Most recent data refresh timestamp from the BigQuery source.* ```dax MAX(source_last_update[Last_Update_Time]) ``` @@ -197,11 +197,11 @@
-### Display Folder - visual_measures_tooltip -*Context-specific measures optimized for on-hover interactivty, providing granular diagnostics and clean string formatting.* +### Tooltip Measures +*Measures optimized for on-hover interactivty and formatting.* - Measure Name: **`tooltip_revenue_at_risk`** -- Description: *Smart currency formatting (K/M) for revenue at risk in tooltips.* +- Description: *Currency formatting for revenue at risk in tooltips.* ```dax IF( [revenue_at_risk] < 1000000, @@ -246,7 +246,7 @@
- Measure Name: **`tooltip_MoM_drop_off_customer`** -- Description: *Formatted volume of dropped-off customers (K) for hover details.* +- Description: *Formatted count of dropped-off customers.* ```dax IF( FORMAT([MoM_drop_off_customers], "0"), @@ -257,7 +257,7 @@
- Measure Name: **`tooltip_highlight_top_3`** -- Description: *Conditional formatting hex codes to highlight the top 3 states by revenue risk in bar chart tooltips.* +- Description: *Hex codes to highlight the top 3 states by revenue risk.* ```dax VAR state_rank = RANKX( diff --git a/power_bi/docs/customer_experience/operational_guide.md b/power_bi/docs/customer_experience/operational_guide.md index 26c6325..f337908 100644 --- a/power_bi/docs/customer_experience/operational_guide.md +++ b/power_bi/docs/customer_experience/operational_guide.md @@ -1,35 +1,35 @@ -# Strategic & Operational Guide: Customer Experience & Revenue Exposure +# Operational Guide: Customer Experience & Revenue Exposure ## Strategic Intent -The **Customer Experience & Revenue Exposure** dashboard is a decision-support system designed to monitor financial risk driven by fulfillment failures. By correlating delivery delays with buyer drop-off rates and cancellations, it allows leadership to quantify the "cost of friction" and prioritize interventions in high-value segments or regions. +The **Customer Experience & Revenue Exposure** dashboard monitors financial risk resulting from fulfillment delays. By correlating delivery delays with buyer drop-off and cancellation rates, it allows leadership to quantify the impact on revenue and prioritize interventions in specific segments or regions. -## Navigation & Interface Guide -The dashboard uses a "Head-Up Display" (HUD) style with interactive fly-out panes for a clean workspace. +## Interface and Navigation +The dashboard uses interactive panes to maintain a clean workspace. -* **Information Pane:** Click the "i" icon in the top header to view the performance legend and KPI interpretation guides. +* **Information Pane:** Click the **"i"** icon to view the performance legend and KPI definitions. * **Filter Pane:** Click the funnel icon to open global controls. - * **Threshold Control:** Use the **Delay Threshold Parameter** to define what constitutes a "Danger Zone" (Default: 3 Days). - * **Time Selection:** Select the specific Month and Year for historical review. + * **Threshold Control:** Use the **Delay Threshold Parameter** to define the delivery delay required to flag revenue as "at risk" (Default: 3 Days). + * **Time Selection:** Select the Month and Year for review. * **Action:** Click **Apply Filters** to update the dashboard. -* **Closing Panes:** To return to the main view, click the **"X"** button on the pane or click anywhere in the darkened workspace background. +* **Closing Panes:** Click the **"X"** button on the pane or click the workspace background to return to the main view. -## Smoke Detector Configuration (KPI Thresholds) -| KPI | Green (Natural) | Orange (Alarming) | Red (Critical) | +## KPI Thresholds +| KPI | Green (Target) | Orange (Warning) | Red (Critical) | | :--- | :--- | :--- | :--- | | **Revenue at Risk** | < 5% of Total Revenue | 6% – 15% | > 15% | | **MoM Buyer Drop-off** | < 15% | 16% – 30% | > 30% | | **Cancellation Rate** | < 2% | 3% – 5% | > 5% | -## Decision Matrix -| Visual | Signal to Watch | Response Action | +## Decision Support +| Visual | Indicator | Recommended Action | | :--- | :--- | :--- | -| **Active Buyer Trend** | Widening gap between current and previous month lines. | Identifying mass abandonment; trigger LTV recovery strategy. | -| **Segment Health Matrix** | High-revenue segments (e.g., D2C) drifting right (higher delay). | Critical revenue threat; investigate segment-specific fulfillment hubs. | -| **Danger Zone Breakdown** | Heavy concentration in "5+ Days Late" bracket. | Systemic failure; escalate for carrier reallocation or capacity audit. | -| **Geographic Risk** | High revenue exposure concentrated in specific States. | Regional bottleneck; check for local warehouse or courier disruptions. | +| **Active Buyer Trend** | Increasing gap between current and previous month lines. | Investigate customer retention and recovery strategies. | +| **Segment Health Matrix** | High-revenue segments with increasing delays. | Investigate segment-specific fulfillment or logistics issues. | +| **Delay Breakdown** | High concentration in the "5+ Days Late" bracket. | Assess carrier performance or fulfillment capacity. | +| **Geographic Risk** | High revenue exposure in specific regions. | Investigate regional warehouse or courier disruptions. | ## Operational Workflow -1. **Daily Pulse:** Check the 3 core KPIs to identify if the system is within healthy baselines. -2. **Exposure Analysis:** Use the **Severity Exposure** bar chart to see if delays are escalating or clearing. -3. **Segment Prioritization:** Use the **Segment Health** scatter plot to identify which customer group (SMB, D2C, Enterprise) requires immediate communication. -4. **Regional Deep-Dive:** Use the **Geographic Risk** bar chart and its city-level tooltips to pinpoint the exact failure hub. +1. **Performance Check:** Review core KPIs against target baselines. +2. **Delay Analysis:** Use the **Severity Exposure** chart to determine if delays are increasing or decreasing. +3. **Segment Review:** Use the **Segment Health** scatter plot to identify customer groups requiring attention. +4. **Regional Review:** Use the **Geographic Risk** chart to identify specific locations causing bottlenecks. diff --git a/power_bi/docs/customer_experience/technical_architecture.md b/power_bi/docs/customer_experience/technical_architecture.md index af2d108..8084018 100644 --- a/power_bi/docs/customer_experience/technical_architecture.md +++ b/power_bi/docs/customer_experience/technical_architecture.md @@ -2,61 +2,61 @@ ## Technical Architecture * **Data Source:** Google BigQuery (`customer_dim`, `customer_weekly_fact`). -* **Update Logic:** The `Source Last Update` metric is driven by the BigQuery schema metadata (`source_last_update` table), ensuring data freshness transparency. +* **Update Logic:** The `Source Last Update` metric uses BigQuery schema metadata from the `source_last_update` table. * **Storage Mode:** Import Mode (Power BI). -* **Time Grain:** Weekly aggregation for fact data; Monthly aggregation for retention/drop-off analysis. +* **Time Grain:** Weekly aggregation for fact data; monthly aggregation for retention and drop-off analysis. ## Core Logic & Calculations -### Proactive Danger Zone (Revenue at Risk) -The dashboard identifies financial exposure using a **parameter-driven** filtering logic (`revenue_at_risk` measure): -- **Dynamic Delay Threshold:** Uses the `[delay_threshold_parameter]` (range 1 – 14 days) to define the minimum delivery delay required to flag revenue as "At Risk." +### Revenue At Risk +The dashboard identifies financial exposure using parameter-driven filtering logic in the `revenue_at_risk` measure: +- **Delay Threshold:** Uses the `[delay_threshold_parameter]` (1–14 days) to define the delivery delay required to flag revenue as at risk. -### MoM Buyer Drop-off (Retention Monitor) -Tracks short-term customer abandonment using month-over-month set-based logic: -- **Set-Based Identification:** Employs `EXCEPT` logic to isolate customers active in the previous month who are absent in the current month selection. +### Buyer Drop-off Monitoring +Tracks customer abandonment using month-over-month (MoM) logic: +- **Identification:** Uses `EXCEPT` logic to identify customers active in the previous month who are absent in the current month. ## DAX Data Dictionary *(See [`dax_dictionary.md`](./dax_dictionary.md) for full expressions)* -### Measures Group: Prerequisites Measures (Base Measures) +### Base Measures | **Measure Name** | Description | | --- | --- | -| **total_revenue** | Gross financial value across all transactions. | -| **total_order** | Aggregate count of orders placed. | -| **total_delivered** | Count of orders with a successful delivery status. | -| **total_cancelled** | Volume of orders cancelled by customers or the system. | -| **active_customer_current** | Distinct count of customers buying in the selected month. | -| **active_customer_prev_month** | Distinct count of customers buying in the previous month. | -| **MoM_drop_off_customers** | Headcount of customers active last month but not this month. | +| **total_revenue** | Gross financial value of all transactions. | +| **total_order** | Total count of orders. | +| **total_delivered** | Count of orders with a delivery status. | +| **total_cancelled** | Count of cancelled orders. | +| **active_customer_current** | Distinct count of customers in the selected month. | +| **active_customer_prev_month** | Distinct count of customers in the previous month. | +| **MoM_drop_off_customers** | Count of customers active last month but not the current month. | -### Measures Group: Key Performance Indicator (KPI) +### Key Performance Indicators (KPI) | **Measure Name** | Description | | --- | --- | -| **revenue_at_risk** | Total revenue trapped in delivery delays > threshold. | -| **MoM_drop_off_rate** | Percentage of last month's buyers who haven't returned. | -| **cancellation_rate** | Ratio of lost orders to total order volume. | +| **revenue_at_risk** | Revenue associated with delivery delays exceeding the threshold. | +| **MoM_drop_off_rate** | Percentage of last month's buyers who have not returned. | +| **cancellation_rate** | Ratio of cancelled orders to total order volume. | -### Measures Group: Visual Measures (UX) +### Visual and UX Measures | **Measure Name** | Description | | --- | --- | | **weighted_avg_delivery_delay** | Volume-weighted mean of delivery delays. | | **weighted_avg_approval_lag** | Volume-weighted mean of order processing time. | | **weighted_avg_lead_time** | Volume-weighted mean of total fulfillment duration. | -| **bubble_size_sensitivity** | Power-scaled markers for high-impact scatter plot points. | -| **color_format_revenue_at_risk** | Drive-ratio for conditional formatting (Green/Orange/Red). | -| **last_source_update** | Timestamp of the most recent BigQuery data refresh. | +| **bubble_size_sensitivity** | Scaled markers for scatter plot visualization. | +| **color_format_revenue_at_risk** | Logic for conditional formatting. | +| **last_source_update** | Timestamp of the most recent data refresh. | -### Measures Group: Visual Tooltip Measures (UX) +### Tooltip Measures | **Measure Name** | Description | | --- | --- | -| **tooltip_revenue_at_risk** | Formatted currency (K/M) for hover details. | -| **tooltip_avg_delivery_delay** | String-formatted delay diagnostic. | -| **tooltip_avg_approval_lag** | String-formatted processing diagnostic. | -| **tooltip_avg_lead_time** | String-formatted fulfillment diagnostic. | +| **tooltip_revenue_at_risk** | Formatted currency for hover details. | +| **tooltip_avg_delivery_delay** | Formatted string for delay diagnostics. | +| **tooltip_avg_approval_lag** | Formatted string for processing diagnostics. | +| **tooltip_avg_lead_time** | Formatted string for fulfillment diagnostics. | | **tooltip_MoM_drop_off_customer** | Formatted headcount for retention tooltips. | -| **tooltip_highlight_top_3** | Logic for state-level ranking highlights. | +| **tooltip_highlight_top_3** | Logic for ranking highlights. | diff --git a/power_bi/docs/fulfillment_monitor/dax_dictionary.md b/power_bi/docs/fulfillment_monitor/dax_dictionary.md index 642b2a0..11db421 100644 --- a/power_bi/docs/fulfillment_monitor/dax_dictionary.md +++ b/power_bi/docs/fulfillment_monitor/dax_dictionary.md @@ -1,10 +1,10 @@ -# DAX Data Dictionary: Fulfillment Decision Monitor Dashboard +# DAX Data Dictionary: Fulfillment Decision Monitor -### Display Folder - Paging & Navigation Measures -*Supports dynamic paging and item ranking to handle large datasets without overwhelming the visual interface.* +### Paging and Navigation +*Measures supporting dynamic paging and ranking for large datasets.* - Measure Name: **`item_rank`** -- Description: *Ranks sellers by latency slippage (descending) specifically for those with revenue at risk.* +- Description: *Ranks sellers by latency slippage (descending) for those with revenue at risk.* ```dax CALCULATE( RANKX( @@ -17,7 +17,7 @@
- Measure Name: **`item_rank_filter`** -- Description: *Binary flag (1/0) that identifies if a seller's rank falls within the currently selected page and page size.* +- Description: *Binary flag (1/0) identifying if a seller's rank falls within the selected page and page size.* ```dax VAR _page = [paging_helper Value] VAR _page_size = [table_items Value] @@ -37,7 +37,7 @@
- Measure Name: **`max_page`** -- Description: *Calculates the total number of pages available based on the current count of sellers at risk and the chosen page size.* +- Description: *Calculates total pages based on the count of sellers at risk and the selected page size.* ```dax ROUNDUP( DIVIDE(VALUE([seller_at_risk]), [table_items Value]), @@ -48,7 +48,7 @@
- Measure Name: **`current_viewed_filter`** -- Description: *Conditional formatting logic used to highlight sellers who are both at risk and on the currently viewed page.* +- Description: *Logic used to highlight sellers who are at risk and on the currently viewed page.* ```dax VAR _is_at_risk = [seller_at_risk] VAR _is_on_page = [item_rank_filter] @@ -64,7 +64,7 @@
- Measure Name: **`page_filter`** -- Description: *Global filter logic used to restrict the paging slicer to the actual range of data available.* +- Description: *Restricts the paging slicer to the available data range.* ```dax VAR _total_items = VALUE([seller_at_risk]) VAR items_per_page = [table_items Value] @@ -85,7 +85,7 @@
- Measure Name: **`revenue_at_risk`** -- Description: *Sum of revenue for sellers identified as outliers. This version is additive across time for paging and intervention lists.* +- Description: *Total revenue for sellers identified as outliers.* ```dax SUMX( VALUES(calendar_dates[Week Start Date]), @@ -103,11 +103,11 @@
-### Display Folder - Prerequisites Measures -*Fundamental building blocks and base aggregations required for higher-level logic. This section includes primary counts, temporal baselines, and the core outlier detection engine.* +### Base Measures +*Aggregations and counts used for temporal baselines and outlier detection.* - Measure Name: **`latency`** -- Description: *Basic average of the delivery delay (Estimated vs. Actual delivery dates).* +- Description: *Average delivery delay (Actual vs. Estimated delivery dates).* ```dax AVERAGE(seller_weekly_fact[weekly_avg_delivery_delay]) ``` @@ -115,7 +115,7 @@
- Measure Name: **`latency_current_week`** -- Description: *Retrieves the average latency for the most recent week in the current filter context.* +- Description: *Average latency for the selected week.* ```dax CALCULATE([latency], calendar_dates[Date]) ``` @@ -123,7 +123,7 @@
- Measure Name: **`latency_previous_week`** -- Description: *Retrieves the latency from exactly seven days prior to the current selection for trend analysis.* +- Description: *Latency from seven days prior to the selection for trend analysis.* ```dax CALCULATE( [latency], @@ -134,7 +134,7 @@
- Measure Name: **`latency_4w_rolling_avg`** -- Description: *Smoothed fulfillment performance baseline using a 28-day trailing window to establish "Normal" behavior.* +- Description: *Fulfillment performance baseline using a 28-day trailing window.* ```dax AVERAGEX( DATESINPERIOD('calendar_dates'[Date], @@ -146,7 +146,7 @@
- Measure Name: **`is_outlier`** -- Description: *Core "Smoke Detector" logic; flags sellers where current latency exceeds the 4-week baseline + 1 StdDev and the slip is > 0.5 days.* +- Description: *Flags sellers where current latency exceeds the 28-day baseline + 1 StdDev and the delay is > 0.5 days.* ```dax VAR current_latency = [latency_current_week] VAR _4w_avg_latency = [latency_4w_rolling_avg] @@ -181,7 +181,7 @@
- Measure Name: **`matrix_slippage_tracker`** -- Description: *Ensures sellers remain visible in history heatmaps if they are outliers in the current global selection.* +- Description: *Maintains seller visibility in history heatmaps for currently identified outliers.* ```dax VAR global_flag = [is_outlier_global] VAR cell_slippage = [latency_current_week] - [latency_4w_rolling_avg] @@ -199,7 +199,7 @@
- Measure Name: **`is_outlier_global`** -- Description: *Calculates outlier status ignoring local visual date filters to support matrix row persistence.* +- Description: *Calculates outlier status ignoring local visual date filters.* ```dax CALCULATE( [is_outlier], @@ -210,7 +210,7 @@
- Measure Name: **`logistics_delay`** -- Description: *Isolates the courier's portion of the delay by removing internal warehouse processing time.* +- Description: *Calculates courier delay by removing internal warehouse processing time.* ```dax VAR total_delay = [latency_current_week] VAR result = total_delay - [internal_delay] @@ -221,7 +221,7 @@
- Measure Name: **`latency_slippage`** -- Description: *Quantifies the exact "speed loss" in days only for sellers currently flagged as outliers.* +- Description: *Quantifies delivery delay (in days) for sellers identified as outliers.* ```dax VAR delta = [latency_current_week] - [latency_4w_rolling_avg] RETURN @@ -234,11 +234,11 @@
-### Display Folder - KPI Measures -*High-level performance indicators used in cards and executive summaries. These measures quantify systemic health and the scale of active bottlenecks.* +### KPI Measures +*Performance indicators quantifying systemic health and bottlenecks.* - Measure Name: **`seller_at_risk`** -- Description: *Count of unique Seller IDs currently flagged by the outlier detection logic.* +- Description: *Count of unique sellers currently identified as outliers.* ```dax COUNTROWS( FILTER( @@ -251,7 +251,7 @@
- Measure Name: **`network_slowdown`** -- Description: *Percentage of total order volume currently losing speed compared to the 4-week baseline.* +- Description: *Percentage of total order volume currently delayed compared to the 28-day baseline.* ```dax VAR _slipping_vol = SUMX( @@ -268,7 +268,7 @@
- Measure Name: **`delivery_stability`** -- Description: *Measures statistical unpredictability; high values indicate delivery dates are becoming inconsistent.* +- Description: *Statistical measure of delivery date variability.* ```dax VAR _4w_avg_latency = [latency_4w_rolling_avg] VAR _stddev = @@ -287,11 +287,11 @@
-### Display Folder - Visual Measures -*Support measures designed for specific charts and UI elements, including diagnostic isolation, ranking, and conditional formatting.* +### Visual and UI Measures +*Measures designed for specific charts and UI elements.* - Measure Name: **`internal_delay`** -- Description: *Calculates the average time taken by sellers to approve/process orders (Warehouse Lag).* +- Description: *Average time taken by sellers to approve or process orders (Warehouse Lag).* ```dax VAR result = AVERAGE(seller_weekly_fact[weekly_avg_approval_lag]) RETURN @@ -300,7 +300,7 @@
- Measure Name: **`is_new_seller`** -- Description: *Flags sellers in their first 30 days of operation to provide onboarding context.* +- Description: *Flags sellers in their first 30 days of operation.* ```dax VAR first_order = SELECTEDVALUE(seller_dim[first_order_date]) VAR current_date = MAX(calendar_dates[Date]) @@ -315,7 +315,7 @@
- Measure Name: **`last_source_update`** -- Description: *Displays the most recent data sync timestamp from the source metadata.* +- Description: *Timestamp of the most recent data refresh from source metadata.* ```dax MAX(source_last_update_time[Last_Update_Time]) ``` @@ -323,7 +323,7 @@
- Measure Name: **`bubble_size_sensivity`** -- Description: *Exponentially scales scatter plot markers to improve visual differentiation of high-slippage sellers.* +- Description: *Scales scatter plot markers based on latency slippage.* ```dax [latency_slippage] ^ 4.5 ``` @@ -331,7 +331,7 @@
- Measure Name: **`total_order_volume`** -- Description: *Aggregate count of all order items across the filtered dataset.* +- Description: *Total count of all order items.* ```dax SUM(seller_weekly_fact[weekly_order_count]) ``` @@ -339,7 +339,7 @@
- Measure Name: **`wrapper_is_outlier`** -- Description: *String-based flag ("Active" / "No") for cleaner display of outlier status in intervention tables.* +- Description: *Status flag ("Active" / "No") for outlier status in intervention tables.* ```dax IF([is_outlier], "Active", "No") ``` @@ -349,11 +349,11 @@
-### Display Folder -Tooltip Visual Measures -*Context-specific measures optimized for tooltips to provide granular impact details without cluttering the primary UI.* +### Tooltip Measures +*Measures optimized for tooltips and hover interactivity.* - Measure Name: **`tooltip_revenue_exposed`** -- Description: *Total financial value currently handled by sellers flagged as outliers.* +- Description: *Total revenue handled by sellers identified as outliers.* ```dax VAR result = SUMX( @@ -374,7 +374,7 @@
- Measure Name: **`tooltip_week_start_date`** -- Description: *Provides specific week context during hover-over analysis on trend lines.* +- Description: *Week context for trend line tooltips.* ```dax SELECTEDVALUE(calendar_dates[Week Start Date]) ``` @@ -382,7 +382,7 @@
- Measure Name: **`tooltip_seller_id`** -- Description: *Retrieves the specific Seller ID in context for detailed tooltip labels.* +- Description: *Retrieves the Seller ID for tooltip labels.* ```dax SELECTEDVALUE(seller_dim[seller_id_int]) ``` diff --git a/power_bi/docs/fulfillment_monitor/operational_guide.md b/power_bi/docs/fulfillment_monitor/operational_guide.md index 19f0555..ddcfa9c 100644 --- a/power_bi/docs/fulfillment_monitor/operational_guide.md +++ b/power_bi/docs/fulfillment_monitor/operational_guide.md @@ -1,37 +1,36 @@ -# Strategic & Operational Guide: Fulfillment Decision Monitor +# Operational Guide: Fulfillment Decision Monitor ## Strategic Intent -The **Fulfillment Decision Monitor** is an operational "Smoke Detector" designed to identify logistics partners and sellers requiring immediate intervention. By focusing on statistical deviations rather than absolute failure, it allows teams to catch performance slippage early, preventing systemic delays before they impact the final customer. +The **Fulfillment Decision Monitor** identifies logistics partners and sellers requiring immediate attention. By focusing on statistical deviations in performance, it allows teams to identify delays early and intervene before they impact customers. -## Navigation & Interface Guide -The dashboard utilizes an interactive "Head-Up Display" (HUD) architecture for rapid diagnostic review and **Dynamic Sensitivity Calibration**. +## Interface and Navigation +The dashboard uses an interactive interface for diagnostic review and sensitivity adjustment. -**Information Pane :** Click the "i" icon in the top header to view the performance legend and KPI interpretation guides. -* **Sensitivity Controls (Slicer Pane):** - * **Slippage Threshold:** Adjust the minimum "speed loss *in Delivery Delay*" (0.5 to 3.0 days) required to trigger an alert. - * **Statistical Boundary:** Adjust the Standard Deviation multiplier (1.0 to 3.0) to tighten or loosen the detection logic. -* **Paging Controls:** Navigate large lists of at-risk sellers using the "Page" and "Items per Page" (Inside Filter Pane) selectors. -* **Closing Panes:** To return to the main view, click the **"X"** button on the pane or click anywhere in the darkened workspace background. +* **Information Pane:** Click the **"i"** icon to view the performance legend and KPI definitions. +* **Sensitivity Controls:** + * **Slippage Threshold:** Adjust the minimum delivery delay (0.5 to 3.0 days) required to trigger an alert. + * **Statistical Boundary:** Adjust the Standard Deviation multiplier (1.0 to 3.0) to modify the detection logic. +* **Paging Controls:** Navigate the list of at-risk sellers using the "Page" and "Items per Page" (located in the Filter Pane) selectors. +* **Closing Panes:** Click the **"X"** button on the pane or click the workspace background to return to the main view. -## Smoke Detector Configuration (KPI Thresholds) +## KPI Thresholds | KPI | Green (Stable) | Orange (Warning) | Red (Critical) | | :--- | :--- | :--- | :--- | | **Delivery Stability** | 0% – 10% | 11% – 25% | > 25% | | **Network Slowdown** | 0% – 2% | 3% – 5% | > 5% | | **Sellers at Risk** | 0 – 10 | 11 – 50 | > 50 | -## Decision Matrix -| Visual | Signal to Watch | Response Action | +## Decision Support +| Visual | Indicator | Recommended Action | | :--- | :--- | :--- | -| **Network Speed Trend** | Line ascending toward 0. | Systemic slowdown; check for carrier strikes or weather events. | -| **Lost Speed History** | Deep blue cells appearing for 3+ weeks. | Chronic failure; initiate Performance Improvement Plan (PIP). | -| **Delay Diagnostics** | Long "Warehouse Lag" bar. | Seller internal issue; audit warehouse staffing/processes. | -| **Priority Targeting** | Large dots in the Top-Right. | High-impact crisis; immediate call to high-volume partner. | -| **Action List** | Specific IDs with "Active" status. | Direct intervention; use Tooltip revenue to prioritize calls. | - +| **Network Speed Trend** | Increasing trend line. | Assess for systemic issues such as carrier disruptions or regional events. | +| **Lost Speed History** | Consistent high-delay markers for 3+ weeks. | Address chronic performance failures with the partner. | +| **Delay Diagnostics** | Long "Warehouse Lag" bar. | Investigate seller internal issues, such as warehouse staffing or processes. | +| **Priority Targeting** | Large markers in the top-right quadrant. | Prioritize high-impact partners for immediate review. | +| **Action List** | IDs with "Active" status. | Use revenue impact tooltips to prioritize direct intervention. | ## Operational Workflow -1. **Check the Pulse:** Review the 3 core KPIs. If **Network Slowdown** is > 5%, the issue is systemic (Global). -2. **Identify Outliers:** If **Sellers at Risk** is high but network slowdown is low, the issues are seller-specific. -3. **Prioritize Impact:** Use the **Impact vs. Delay Mapping** scatter plot to find high-volume sellers with high slippage. -4. **Execute Audit:** Select a seller from the **Operation Intervention List** and use the **Delay Diagnostics** to determine if the call should focus on their warehouse (Internal) or their courier (Logistics). +1. **Performance Check:** Review core KPIs. A high **Network Slowdown** indicates a systemic issue. +2. **Outlier Identification:** If **Sellers at Risk** is high but network slowdown is low, the issues are likely seller-specific. +3. **Impact Prioritization:** Use the **Impact vs. Delay** scatter plot to identify high-volume sellers with significant delays. +4. **Root Cause Audit:** Select a seller from the **Operation Intervention List** and use **Delay Diagnostics** to determine if the issue is internal (warehouse) or logistical (courier). diff --git a/power_bi/docs/fulfillment_monitor/technical_architecture.md b/power_bi/docs/fulfillment_monitor/technical_architecture.md index a742627..ebf2c14 100644 --- a/power_bi/docs/fulfillment_monitor/technical_architecture.md +++ b/power_bi/docs/fulfillment_monitor/technical_architecture.md @@ -2,66 +2,66 @@ ## Technical Architecture * **Data Source:** Google BigQuery (`seller_dim`, `seller_weekly_fact`). -* **Update Logic:** Performance snapshots are generated weekly; the `last_source_update` metric tracks data freshness from the BigQuery metadata. +* **Update Logic:** Performance snapshots are generated weekly. The `last_source_update` metric tracks data refresh timestamps from BigQuery metadata. * **Storage Mode:** Import Mode (Power BI). -## Dynamic Outlier Detection (The "Smoke Detector") -The dashboard identifies risk using a dual-layer **parameter-driven** statistical filter (`is_outlier` measure): -- **Dynamic Slippage Threshold:** Uses the `[slippage_threshold Value]` parameter (range 0.5 – 3.0 days) to set the minimum speed loss *(Delivery Delay)* required for an alert. -- **Dynamic Statistical Boundary:** Uses the `[std_dev_boundary Value]` parameter (range 1.0 – 3.0) as a multiplier for the standard deviation boundary. -- **Exclusion Logic:** The `is_valid_year` variable excludes 2022 records to prevent incomplete historical data from skewing the baseline. +## Outlier Detection Logic +The dashboard identifies fulfillment risk using a statistical filter based on two parameters: +- **Slippage Threshold:** Uses the `[slippage_threshold Value]` parameter (0.5 – 3.0 days) to define the minimum delivery delay required for an alert. +- **Statistical Boundary:** Uses the `[std_dev_boundary Value]` parameter (1.0 – 3.0) as a multiplier for the standard deviation boundary. +- **Data Filtering:** Historical data from 2022 is excluded to prevent incomplete records from affecting the baseline. ## DAX Data Dictionary *(See [`dax_dictionary.md`](./dax_dictionary.md) for full expressions)* -### Measures Group: Paging & Navigation +### Paging and Navigation | **Measure Name** | Description | | --- | --- | | **item_rank** | Ranks sellers by slippage intensity for sorted lists. | -| **item_rank_filter** | Logic for identifying sellers within the current page range. | -| **max_page** | Dynamic calculation of total pages based on at-risk count. | -| **page_filter** | Global logic used to restrict the paging slicer range. | -| **current_viewed_filter** | Color-based logic for highlighting current page items. | -| **revenue_at_risk** | Additive revenue impact calculation for at-risk sellers. | +| **item_rank_filter** | Identifies sellers within the current page range. | +| **max_page** | Calculates total pages based on the count of sellers at risk. | +| **page_filter** | Restricts the paging slicer range. | +| **current_viewed_filter** | Highlights items on the current page. | +| **revenue_at_risk** | Calculates the revenue impact of at-risk sellers. | -### Measures Group: Prerequisites Measures +### Base Measures | **Measure Name** | Description | | --- | --- | -| **latency** | Basic average of the delivery delay (Estimated vs. Actual). | -| **latency_current_week** | Context-aware latency for the specific date selection. | -| **latency_previous_week** | Prior week latency for trend analysis. | +| **latency** | Average delivery delay (Actual vs. Estimated). | +| **latency_current_week** | Latency for the selected week. | +| **latency_previous_week** | Latency for the prior week used for trend analysis. | | **latency_4w_rolling_avg** | 28-day trailing baseline for fulfillment performance. | -| **is_outlier** | Core logic flagging sellers deviating from historical norms. | -| **matrix_slippage_tracker** | Logic for maintaining seller visibility in historical heatmaps. | +| **is_outlier** | Flags sellers deviating from historical performance baselines. | +| **matrix_slippage_tracker** | Maintains seller visibility in historical heatmaps. | | **is_outlier_global** | Outlier status ignoring local visual date filters. | -### Measures Group: Key Performance Indicator (KPI) +### Key Performance Indicators (KPI) | **Measure Name** | Description | | --- | --- | -| **seller_at_risk** | Count of unique Seller IDs flagged for intervention. | -| **network_slowdown** | Percentage of volume currently trending slower than baseline. | -| **delivery_stability** | Statistical measure of delivery date unpredictability. | +| **seller_at_risk** | Count of unique sellers flagged for intervention. | +| **network_slowdown** | Percentage of volume trending slower than the baseline. | +| **delivery_stability** | Statistical measure of delivery date variability. | -### Measures Group: Visual Measures (UX) +### Visual and UX Measures | **Measure Name** | Description | | --- | --- | -| **latency_slippage** | The magnitude of speed loss (days) for at-risk sellers. | -| **total_order_volume** | Aggregate count of all order items. | -| **logistics_delay** | Isolated courier performance (Total - Warehouse). | -| **internal_delay** | Average warehouse processing/approval lag. | +| **latency_slippage** | Magnitude of delay (days) for at-risk sellers. | +| **total_order_volume** | Total count of order items. | +| **logistics_delay** | Average courier performance delay. | +| **internal_delay** | Average warehouse processing or approval lag. | | **is_new_seller** | Flags sellers in their first 30 days of operation. | -| **bubble_size_sensivity** | Exponential scaling for scatter plot markers. | -| **last_source_update** | Displays the most recent data sync timestamp. | -| **wrapper_is_outlier** | String-based status flag ("Active" / "No"). | +| **bubble_size_sensivity** | Scales scatter plot markers based on revenue risk. | +| **last_source_update** | Timestamp of the most recent data sync. | +| **wrapper_is_outlier** | Status flag ("Active" / "No"). | -### Measures Group: Tooltip Visual Measures (UX) +### Tooltip Measures | **Measure Name** | Description | | --- | --- | -| **revenue_exposed** | Total financial value handled by sellers currently at risk. | -| **tool_tip_week_start_date** | Temporal context for trend line hover details. | -| **tooltip_seller_id** | Specific Seller ID associated with a data point. | \ No newline at end of file +| **revenue_exposed** | Total revenue handled by sellers currently at risk. | +| **tool_tip_week_start_date** | Date context for trend line tooltips. | +| **tooltip_seller_id** | Seller ID associated with a data point. | diff --git a/power_bi/docs/product_friction/dax_dictionary.md b/power_bi/docs/product_friction/dax_dictionary.md index 6d57fc2..10842ec 100644 --- a/power_bi/docs/product_friction/dax_dictionary.md +++ b/power_bi/docs/product_friction/dax_dictionary.md @@ -1,7 +1,7 @@ -# DAX Data Dictionary: Product Fulfillment Friction Dashboard +# DAX Data Dictionary: Product Fulfillment Friction -### Display Folder - Paging & Navigation Measures -*Supports dynamic paging and item ranking to handle large datasets without overwhelming the visual interface.* +### Paging and Navigation +*Measures supporting dynamic paging and ranking for large datasets.* - Measure Name: **`item_rank`** - Description: *Ranks products by revenue at risk (descending).* @@ -22,7 +22,7 @@
- Measure Name: **`item_rank_filter`** -- Description: *Binary flag (1/0) that identifies if a products's rank falls within the currently selected page and page size.* +- Description: *Binary flag (1/0) identifying if a product's rank falls within the selected page and page size.* ```dax VAR _page = [paging_helper Value] VAR _page_size = [table_items Value] @@ -47,10 +47,10 @@
- Measure Name: **`max_page`** -- Description: *Calculates the total number of pages available based on the current count of products at risk and the chosen page size.* +- Description: *Calculates total pages based on the count of products at risk and the selected page size.* ```dax ROUNDUP( - DIVIDE(VALUE([seller_at_risk]), [table_items Value]), + DIVIDE(VALUE([product_at_risk]), [table_items Value]), 0 ) ``` @@ -58,7 +58,7 @@
- Measure Name: **`current_viewed_filter`** -- Description: *Conditional formatting logic used to highlight products who are both at risk and on the currently viewed page.* +- Description: *Logic used to highlight products that are at risk and on the currently viewed page.* ```dax VAR _is_at_risk = [product_at_risk] VAR _is_on_page = [item_rank_filter] @@ -74,7 +74,7 @@
- Measure Name: **`page_filter`** -- Description: *Global filter logic used to restrict the paging slicer to the actual range of data available.* +- Description: *Restricts the paging slicer to the available data range.* ```dax VAR _total_items = VALUE([product_at_risk]) VAR items_per_page = [table_items Value] @@ -94,11 +94,11 @@
-### Display Folder - Prerequisites Measures -*Fundamental building blocks and base aggregations required for higher-level logic. This section includes primary counts, temporal baselines, and the core outlier detection engine.* +### Base Measures +*Aggregations and counts used for temporal baselines and outlier detection.* - Measure Name: **`total_orders`** -- Description: *Aggregate count of order items within the filtered period.* +- Description: *Total count of order items within the selected period.* ```dax SUM(product_weekly_fact[weekly_order_count]) ``` @@ -106,7 +106,7 @@
- Measure Name: **`avg_delivery_delay`** -- Description: *Mean days of delay for delivered orders; negative values indicate early delivery.* +- Description: *Average delivery delay in days.* ```dax AVERAGE(product_weekly_fact[weekly_avg_delivery_delay]) ``` @@ -114,7 +114,7 @@
- Measure Name: **`total_cancelled`** -- Description: *Aggregate volume of orders cancelled during the reporting week.* +- Description: *Total volume of cancelled orders.* ```dax SUM(product_weekly_fact[weekly_cancelled_orders]) ``` @@ -122,7 +122,7 @@
- Measure Name: **`total_revenue`** -- Description: *Gross revenue generated across all product categories.* +- Description: *Gross revenue across all product categories.* ```dax SUM(product_weekly_fact[weekly_revenue]) ``` @@ -130,7 +130,7 @@
- Measure Name: **`avg_lead_time`** -- Description: *Mean duration from order creation to final delivery.* +- Description: *Average duration from order creation to delivery.* ```dax AVERAGE(product_weekly_fact[weekly_avg_lead_time]) ``` @@ -138,7 +138,7 @@
- Measure Name: **`current_lead_time`** -- Description: *Context-aware lead time for the specific date selection in visual filters.* +- Description: *Average lead time for the selected date.* ```dax CALCULATE([avg_lead_time], calendar_date[Date]) ``` @@ -146,7 +146,7 @@
- Measure Name: **`lead_time_4w_rolling_avg`** -- Description: *Smoothed fulfillment performance baseline using a 28-day trailing window.* +- Description: *Fulfillment performance baseline using a 28-day trailing window.* ```dax AVERAGEX( DATESINPERIOD('calendar_date'[Date], @@ -158,7 +158,7 @@
- Measure Name: **`is_outlier`** -- Description: *Core "Smoke Detector" logic; flags products where current lead time exceeds the 4-week baseline + 1 StdDev.* +- Description: *Flags products where lead time exceeds the 28-day baseline + 1 StdDev and the delay is > 0.5 days.* ```dax VAR slippage_delta = [current_lead_time] - [lead_time_4w_rolling_avg] VAR is_meaningful = IF(slippage_delta > 0.5, 1, 0) @@ -182,7 +182,7 @@
- Measure Name: **`lead_time_slippage`** -- Description: *Calculates the lead time "Gap" only for products currently flagged as outliers.* +- Description: *Quantifies lead-time delay in days for products identified as outliers.* ```dax VAR delta = [current_lead_time] - [lead_time_4w_rolling_avg] RETURN @@ -195,11 +195,11 @@
-### Display Folder - KPI Measures -*High-level performance indicators used in cards and executive summaries. These measures quantify the scale of friction by aggregating risk across products and revenue.* +### KPI Measures +*Performance indicators used to track fulfillment friction and financial exposure.* - Measure Name: `product_at_risk` -- Description: *Count of unique Product IDs currently flagged by the outlier detection logic.* +- Description: *Count of unique Product IDs identified as outliers.* ```dax COUNTROWS( FILTER( @@ -212,7 +212,7 @@
- Measure Name: `lead_time_volatility` -- Description: *Statistical unpredictability (StdDev) restricted only to the subset of at-risk products.* +- Description: *Statistical measure of lead-time variability for at-risk products.* ```dax VAR at_risk_products = FILTER( @@ -232,7 +232,7 @@
- Measure Name: `revenue_at_risk` -- Description: *Total financial value currently impacted by fulfillment friction and outliers.* +- Description: *Total revenue associated with fulfillment delays and outliers.* ```dax CALCULATE( [total_revenue], @@ -249,11 +249,11 @@
-### Display Folder - Visual Measures -*Support measures designed for specific charts and UI elements. This includes conditional formatting logic, dynamic axis scaling, and ranking filters for top-impact visuals.* +### Visual and UI Measures +*Measures supporting specific charts and UI elements.* - Measure Name: `product_weight` -- Description: *Retrieves the static gram weight for the current product in the visual context.* +- Description: *Weight (in grams) for the selected product.* ```dax LOOKUPVALUE( product_dim[product_weight_g], @@ -265,7 +265,7 @@
- Measure Name: `product_volume_categ` -- Description: *Retrieves the weight-based classification (Small, Heavy, etc.) for categorical grouping.* +- Description: *Weight-based classification (e.g., Small, Heavy) for grouping.* ```dax LOOKUPVALUE( product_dim[size_bucket], @@ -277,7 +277,7 @@
- Measure Name: `category_range_view` -- Description: *Dynamic Y-Axis buffer calculation for the Category Revenue bar chart.* +- Description: *Calculates the Y-Axis range for the Category Revenue chart.* ```dax VAR highest_value = MAXX( @@ -292,7 +292,7 @@
- Measure Name: `bubble_size_sensitivity` -- Description: *Exponentially scales marker sizes in scatter plots to enhance visibility of high-impact outliers.* +- Description: *Scales markers in scatter plots based on revenue risk.* ```dax [revenue_at_risk] ^ 1.5 ``` @@ -300,7 +300,7 @@
- Measure Name: `top_10_filter_color` -- Description: *Conditional formatting hex codes for highlighting the Top 10 revenue-at-risk outliers.* +- Description: *Conditional formatting for the top 10 revenue-at-risk outliers.* ```dax VAR CurrentRank = RANKX( @@ -317,26 +317,8 @@
-- Measure Name: `top_20_filter_color` -- Description: *Conditional formatting hex codes for highlighting the Top 20 revenue-at-risk outliers.* - ```dax - VAR CurrentRank = - RANKX( - ALLSELECTED(product_dim[product_id_int]), - [revenue_at_risk], - , - DESC, - Dense - ) - - RETURN - IF(CurrentRank <= 20, "#5C86A8", "#999999") - ``` - -
- - Measure Name: `top_10_product_id` -- Description: *Visual-level filter logic that enforces a strict 10-row limit regardless of ranking ties.* +- Description: *Visual filter enforcing a 10-row limit for the top outliers.* ```dax VAR _rank = RANK( @@ -361,34 +343,8 @@
-- Measure Name: `top_20_product_id` -- Description: *Visual-level filter logic that enforces a strict 20-row limit regardless of ranking ties.* - ```dax - VAR _rank = - RANK( - DENSE, - FILTER( - ALLSELECTED(product_dim[product_id_int]), - [product_at_risk] = 1 - ), - ORDERBY([revenue_at_risk], DESC) - ) - - VAR _top_20 = - IF( - _rank <= 20 && - [product_at_risk] = 1, - 1, BLANK() - ) - - RETURN - _top_20 - ``` - -
- - Measure Name: `format_product_id_count` -- Description: *String-formatted count of at-risk products for use in text-based slicers.* +- Description: *Formatted count of at-risk products for slicers.* ```dax VAR _distinct_count = [product_at_risk] VAR format_count = @@ -404,7 +360,7 @@
- Measure Name: `format_total_orders` -- Description: *Smart-formatted order volume (K/M) for clean label display.* +- Description: *Formatted order volume (K/M) for display.* ```dax VAR order_at_risk = CALCULATE([total_orders], @@ -427,7 +383,7 @@
- Measure Name: `last_source_update` -- Description: *Displays the most recent data sync timestamp from the BigQuery source.* +- Description: *Timestamp of the most recent data sync.* ```dax MAX(source_last_update_time[Last_Update_Time]) ``` @@ -438,11 +394,11 @@
-### Display Folder: Tooltip Visual Measures -*Context-specific measures optimized for tooltips. These provide granular details, such as segment breakdowns by weight bucket and formatted currency values, to enhance on-hover interactivity.* +### Tooltip Measures +*Measures optimized for formatting and tooltip interactivity.* - Measure Name: `tooltip_format_revenue` -- Description: Precision formatting for revenue metrics specifically for hover-over details. +- Description: *Formatted revenue metrics for tooltips.* ```dax IF( [revenue_at_risk] < 1000000, @@ -454,7 +410,7 @@
- Measure Name: `tooltip_oversize` -- Description: *Count of outliers within the "Oversize" weight bucket for tooltip summaries.* +- Description: *Count of outliers in the "Oversize" weight bucket.* ```dax COUNTROWS( FILTER( @@ -468,7 +424,7 @@
- Measure Name: `tooltip_heavy` -- Description: *Count of outliers within the "Heavy" weight bucket for tooltip summaries.* +- Description: *Count of outliers in the "Heavy" weight bucket.* ```dax COUNTROWS( FILTER( @@ -482,7 +438,7 @@
- Measure Name: `tooltip_standard` -- Description: *Count of outliers within the "Standard" weight bucket for tooltip summaries.* +- Description: *Count of outliers in the "Standard" weight bucket.* ```dax COUNTROWS(FILTER(VALUES(product_dim), product_dim[size_bucket] = "Standard" && [is_outlier] = 1)) ``` @@ -490,7 +446,7 @@
- Measure Name: `tooltip_small` -- Description: *Count of outliers within the "Small" weight bucket for tooltip summaries.* +- Description: *Count of outliers in the "Small" weight bucket.* ```dax COUNTROWS( FILTER( @@ -504,7 +460,7 @@
- Measure Name: `tooltip_week_start` -- Description: *Provides the specific week start date context during bar/line hover.* +- Description: *Week start date context for trend line tooltips.* ```dax SELECTEDVALUE(calendar_date[Week Start Date]) ``` @@ -512,7 +468,7 @@
- Measure Name: `tooltip_product_id` -- Description: *Displays the specific Product ID associated with a scatter plot data point.* +- Description: *Retrieves the Product ID for tooltip labels.* ```dax SELECTEDVALUE(product_dim[product_id_int]) - ``` \ No newline at end of file + ``` diff --git a/power_bi/docs/product_friction/operational_guide.md b/power_bi/docs/product_friction/operational_guide.md index 3264897..04b83b1 100644 --- a/power_bi/docs/product_friction/operational_guide.md +++ b/power_bi/docs/product_friction/operational_guide.md @@ -1,36 +1,35 @@ -# Strategic & Operational Guide: Product Friction Monitor +# Operational Guide: Product Friction Monitor ## Strategic Intent -The **Product Friction Monitor** is an early-warning system designed to identify physical and structural fulfillment bottlenecks driven by product specifications. The proactive "friction" detection, allows operations teams to intervene before systemic delays impact customer satisfaction and revenue. +The **Product Friction Monitor** identifies physical and structural fulfillment bottlenecks caused by product specifications. This monitoring allows operations teams to intervene before delays impact customer satisfaction and revenue. -## Navigation & Interface Guide -The dashboard utilizes an interactive "Head-Up Display" (HUD) architecture for rapid diagnostic review and **Dynamic Sensitivity Calibration**. +## Interface and Navigation +The dashboard uses an interactive interface for diagnostic review and sensitivity adjustment. -* **Information Pane :** Click the "i" icon in the top header to view the performance legend and KPI interpretation guides. -* **Sensitivity Controls (Slicer Pane):** - * **Slippage Threshold:** Adjust the minimum "speed loss *in Lead Time*" (0.5 to 5.0 days) required to trigger an alert. - * **Statistical Boundary:** Adjust the Standard Deviation multiplier (1.0 to 3.0) to tighten or loosen the detection logic. -* **Paging Controls:** Navigate large lists of at-risk products using the "Page" and "Items per Page" (Inside Filter Pane) selectors. -* **Closing Panes:** To return to the main view, click the **"X"** button on the pane or click anywhere in the darkened workspace background. +* **Information Pane:** Click the **"i"** icon to view the performance legend and KPI definitions. +* **Sensitivity Controls:** + * **Slippage Threshold:** Adjust the minimum lead-time delay (0.5 to 5.0 days) required to trigger an alert. + * **Statistical Boundary:** Adjust the Standard Deviation multiplier (1.0 to 3.0) to modify the detection logic. +* **Paging Controls:** Navigate the list of at-risk products using the "Page" and "Items per Page" (located in the Filter Pane) selectors. +* **Closing Panes:** Click the **"X"** button on the pane or click the workspace background to return to the main view. -## Smoke Detector Configuration (KPI Thresholds) +## KPI Thresholds | KPI | Green (Stable) | Orange (Warning) | Red (Critical) | | :--- | :--- | :--- | :--- | | **Products at Risk** | 0 – 50 | 51 – 150 | > 150 | | **Lead Time Volatility** | < 10% | 10% – 25% | > 25% | | **Revenue at Risk** | < 5% of Total Revenue | 6% – 15% | > 15% | -## Decision Matrix -| Visual | Signal to Watch | Response Action | +## Decision Support +| Visual | Indicator | Recommended Action | | :--- | :--- | :--- | -| **Lead Time Trend** | Rolling average drifting upward. | Systemic slowdown; check for carrier-wide issues. | -| **Breaking Point Curve** | Outliers clustering above specific weights. | Update shipping terms; route to specialized freight. | -| **Revenue at Risk** | High revenue tied to Tier 3 suppliers. | High-risk cash flow; prioritize Tier 3 inventory audits. | -| **Action List** | Specific Product IDs with High Z-Scores. | Immediate intervention; contact supplier/courier. | - +| **Lead Time Trend** | Rolling average increasing. | Investigate systemic slowdowns or carrier-wide issues. | +| **Weight Analysis** | Outliers clustering in high-weight categories. | Update shipping terms or route to specialized freight services. | +| **Revenue at Risk** | High revenue risk associated with specific suppliers. | Prioritize inventory audits for high-risk suppliers. | +| **Action List** | Product IDs with high statistical deviation (Z-Score). | Initiate direct follow-up with the supplier or courier. | ## Operational Workflow -1. **Daily Pulse:** Check **Network Stability** (Details via Info Pane) and the 3 core KPIs. -2. **Impact Check:** Use the **Bar Chart** to see if a specific Category (e.g., Electronics) is driving the Revenue at Risk. -3. **Root Cause:** Use the **Scatter Plot** to see if the friction is weight-driven (e.g., Oversize items). -4. **Execute:** Use the **Intervention Action List** to identify the specific Product IDs requiring follow-up. +1. **Performance Check:** Review core KPIs to assess overall network stability. +2. **Category Review:** Use the **Revenue at Risk Bar Chart** to identify specific product categories driving financial exposure. +3. **Friction Analysis:** Use the **Scatter Plot** to determine if delays are related to product weight or volume (e.g., oversize items). +4. **Targeted Follow-up:** Use the **Intervention Action List** to identify the exact Product IDs requiring attention. diff --git a/power_bi/docs/product_friction/technical_architecture.md b/power_bi/docs/product_friction/technical_architecture.md new file mode 100644 index 0000000..36f37a9 --- /dev/null +++ b/power_bi/docs/product_friction/technical_architecture.md @@ -0,0 +1,75 @@ +# Technical Documentation: Product Friction Monitor + +## Technical Architecture +* **Data Source:** Google BigQuery (`published_product_dim`, `published_product_weekly_fact`). +* **Update Logic:** The `Source Last Update` metric tracks data refresh timestamps from the `source_last_update` BigQuery metadata table. +* **Storage Mode:** Import Mode (Power BI). +* **Weight Classifications:** + * Small: < 1kg + * Standard: < 5kg + * Heavy: < 10kg + * Oversize: > 10kg + +## Outlier Detection Logic +The dashboard identifies fulfillment risk using a statistical filter based on two parameters: +- **Slippage Threshold:** Uses the `[slippage_threshold Value]` parameter (0.5 – 5.0 days) to define the minimum lead-time delay required for an alert. +- **Statistical Boundary:** Uses the `[std_dev_boundary Value]` parameter (1.0 – 3.0) as a multiplier for the standard deviation boundary. + +## DAX Data Dictionary +*(See [`dax_dictionary.md`](../product_friction/dax_dictionary.md) for full expressions)* + +### Paging and Navigation + +| **Measure Name** | Description | +| --- | --- | +| **item_rank** | Ranks products by slippage intensity for sorted lists. | +| **item_rank_filter** | Identifies products within the current page range. | +| **max_page** | Calculates total pages based on the count of products at risk. | +| **page_filter** | Restricts the paging slicer range. | +| **current_viewed_filter** | Highlights items on the current page. | + +### Base Measures + +| **Measure Name** | Description | +| --- | --- | +| **total_orders** | Total count of order items. | +| **avg_delivery_delay** | Mean days of delay for delivered orders. | +| **total_cancelled** | Total volume of cancelled orders. | +| **total_revenue** | Gross revenue across all categories. | +| **avg_lead_time** | Mean duration from order creation to delivery. | +| **lead_time_4w_rolling_avg** | 28-day trailing baseline for fulfillment performance. | +| **is_outlier** | Flags products deviating from historical performance baselines. | +| **current_lead_time** | Lead time for the selected date. | +| **lead_time_slippage** | Magnitude of lead-time delay (days) for at-risk products. | + +### Key Performance Indicators (KPI) + +| **Measure Name** | Description | +| --- | --- | +| **product_at_risk** | Count of unique products flagged for intervention. | +| **lead_time_volatility** | Statistical measure of delivery date variability. | +| **revenue_at_risk** | Total revenue associated with fulfillment delays. | + +### Visual and UX Measures + +| **Measure Name** | Description | +| --- | --- | +| **product_weight** | Product weight in grams. | +| **product_volume_categ** | Weight-based classification for grouping. | +| **category_range_view** | Logic for the Category Revenue chart Y-axis. | +| **bubble_size_sensitivity** | Scales scatter plot markers based on revenue risk. | +| **format_product_id_count** | Formatted count of at-risk products. | +| **format_total_orders** | Formatted order volume for display. | +| **last_source_update** | Timestamp of the most recent data refresh. | + +### Tooltip Measures + +| **Measure Name** | Description | +| --- | --- | +| **tooltip_format_revenue** | Formatted revenue metrics for tooltips. | +| **tooltip_oversize** | Outlier count within the "Oversize" weight bucket. | +| **tooltip_heavy** | Outlier count within the "Heavy" weight bucket. | +| **tooltip_standard** | Outlier count within the "Standard" weight bucket. | +| **tooltip_small** | Outlier count within the "Small" weight bucket. | +| **tooltip_week_start** | Date context for trend line tooltips. | +| **tooltip_product_id** | Product ID associated with a data point. | diff --git a/power_bi/docs/product_friction/technical_architecutre.md b/power_bi/docs/product_friction/technical_architecutre.md deleted file mode 100644 index 61a3ce2..0000000 --- a/power_bi/docs/product_friction/technical_architecutre.md +++ /dev/null @@ -1,71 +0,0 @@ -# Technical Documentation: Product Friction Monitor - -## Technical Architecture -* **Data Source:** Google BigQuery (`published_product_dim`, `published_product_weekly_fact`). -* **Update Logic:** The `Source Last Update` metric is driven by the BigQuery schema metadata, (`source_last_update` table), ensuring users know the exact freshness of the snapshot. -* **Storage Mode:** Import Mode (Power BI). -* **Size Buckets:** Standardized weight classifications (Small < 1kg, Standard < 5kg, Heavy < 10kg, Oversize > 10kg). - -## Dynamic Outlier Detection (The "Smoke Detector") -The dashboard identifies risk using a dual-layer **parameter-driven** statistical filter (`is_outlier` measure): -- **Dynamic Slippage Threshold:** Uses the `[slippage_threshold Value]` parameter (range 0.5 – 5.0 days) to set the minimum speed loss *(Lead Time)* required for an alert. -- **Dynamic Statistical Boundary:** Uses the `[std_dev_boundary Value]` parameter (range 1.0 – 3.0) as a multiplier for the standard deviation boundary. - -## DAX Data Dictionary -*(See [`dax_data_dictionary.md`](../product_friction/dax_dictionary.md) for full expressions)* - -### Measures Group: Paging & Navigation - -| **Measure Name** | Description | -| --- | --- | -| **item_rank** | Ranks products by slippage intensity for sorted lists. | -| **item_rank_filter** | Logic for identifying products within the current page range. | -| **max_page** | Dynamic calculation of total pages based on at-risk count. | -| **page_filter** | Global logic used to restrict the paging slicer range. | -| **current_viewed_filter** | Color-based logic for highlighting current page items. | - -### Measures Group: Prerequisites Measures (Base Measures) - -| **Measure Name** | Description | -| --- | --- | -| **total_orders** | Aggregate count of order items within the filtered period. | -| **avg_delivery_delay** | Mean days of delay for delivered orders. | -| **total_cancelled** | Aggregate volume of cancelled orders. | -| **total_revenue** | Gross revenue generated across all categories. | -| **avg_lead_time** | Mean duration from order creation to delivery. | -| **lead_time_4w_rolling_avg** | 28-day trailing baseline for fulfillment performance. | -| **is_outlier** | Core logic flagging products deviating from historical norms. | -| **current_lead_time** | Context-aware lead time for the specific date selection. | -| **lead_time_slippage** | The magnitude of speed loss (days) for at-risk products. | - -### Measures Group: Key Performance Indicator (KPI) - -| **Measure Name** | Description -| --- | --- | -| **product_at_risk** | Count of unique Product IDs flagged for intervention. | -| **lead_time_volatility** | Statistical measure of delivery date unpredictability. | -| **revenue_at_risk** | Total financial value currently impacted by fulfillment friction. | - -### Measures Group: Visual Measures (UX) - -| **Measure Name** | Description | -| --- | --- | -| **product_weight** | Static gram weight for the product in visual context. | -| **product_volume_categ** | Weight-based classification (Small, Heavy, etc.) for grouping. | -| **category_range_view** | Dynamic Y-Axis buffer for the Category Revenue chart. | -| **bubble_size_sensitivity** | Exponential scaling for scatter plot markers. | -| **format_product_id_count** | String-formatted count of at-risk products for slicers. | -| **format_total_orders** | Smart-formatted order volume (K/M) for display. | -| **last_source_update** | Displays the most recent data sync timestamp. | - -### Measures Group: Visual Tooltip Measures (UX) - -| **Measure Name** | Description | -| --- | --- | -| **tooltip_format_revenue** | Precision formatting for revenue metrics in tooltips. | -| **tooltip_oversize** | Outlier count within the "Oversize" weight bucket. | -| **tooltip_heavy** | Outlier count within the "Heavy" weight bucket. | -| **tooltip_standard** | Outlier count within the "Standard" weight bucket. | -| **tooltip_small** | Outlier count within the "Small" weight bucket. | -| **tooltip_week_start** | Temporal context for trend line hover details. | -| **tooltip_product_id** | Specific Product ID associated with a data point. |