From f0deac19adb6d6574204c731e920acb4b5a4fc92 Mon Sep 17 00:00:00 2001
From: Bryan Melvida <126201239+BLMgithub@users.noreply.github.com>
Date: Tue, 19 May 2026 12:30:32 +0800
Subject: [PATCH] docs: update broken link paths

---
 README.md                          | 3 +++
 assets/benchmarks/polars/README.md | 2 +-
 data/README.md                     | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a30e98b..46fe833 100644
--- a/README.md
+++ b/README.md
@@ -64,6 +64,8 @@ By leveraging the Polars Rust engine (Lazy API), the system achieves near-optima
 | 40M Snapshot (8GB / 4 vCPU) |
 | :---: |
 | ![engine-performance-8gb](assets/screenshots/engine-performance-8gb-4cpu.png) |
+> Benchmark data: [`40m_stats_log.csv`](assets/benchmarks/polars/)
+> Dataset : [`Dataset Information`](data/)
 
 | Metric | Data |
 |:---|:---|
@@ -72,6 +74,7 @@ By leveraging the Polars Rust engine (Lazy API), the system achieves near-optima
 | Efficiency (Processing) | ~307k Rows / Second |
 | Total Runtime (Wall-Clock) | 130 Seconds |
 
+
 * **Maximized Memory Density:** The **Primitive Integer Pipeline** allows a ~5.34GB analytical model to process within the 8GB RAM limit by shrinking join-key overhead by ~16x.
 * **Near-Linear Performance Scaling:** The engine saturates available vCPUs, yielding high throughput during streaming execution.
 * **Zero-Idle Economics:** 100% serverless execution ensures zero billable time during idle periods.
diff --git a/assets/benchmarks/polars/README.md b/assets/benchmarks/polars/README.md
index 4390e09..fbac0ef 100644
--- a/assets/benchmarks/polars/README.md
+++ b/assets/benchmarks/polars/README.md
@@ -1,6 +1,6 @@
 # Measurement Methodology
 
-This section details the methodology used to capture the memory metrics in the [`GCP Stress-Test Metrics (Scaling Efficiency)`](../../../README.md#gcp-stress-test-metrics-scaling-efficiency)
+This section details the methodology used to capture the memory metrics in the [`GCP Stress-Test Metrics (Scaling Efficiency)`](../../../README.md###gcp-stress-test-metrics-scaling-efficiency)
 
 The telemetry logger below was added to the orchestrator for a specific benchmarking run.
 
diff --git a/data/README.md b/data/README.md
index 85adc48..be904eb 100644
--- a/data/README.md
+++ b/data/README.md
@@ -3,7 +3,7 @@
 This directory serves as the local state provider for the pipeline when executing in a non-cloud environment. It mimics the structure of the Google Cloud Storage (GCS) buckets.
 
 ## Synthetic Dataset
 
-To replicate the high-volume environment described in the [GCP Stress-Test Metrics (Scaling Efficiency)](/README.md#gcp-stress-test-metrics-scaling-efficiency) section, you can download the 40M-row synthetic dataset here: [**Kaggle Dataset Link**](https://www.kaggle.com/datasets/melvidabryan/e-commerce-synthetic-dataset)
+To replicate the high-volume environment described in the [GCP Stress-Test Metrics (Scaling Efficiency)](/README.md###GCP-Stress-Test-Metrics) section, you can download the 40M-row synthetic dataset here: [**Kaggle Dataset Link**](https://www.kaggle.com/datasets/melvidabryan/e-commerce-synthetic-dataset)
 
 > *Note: This upload contains the **Contracted Version** of the dataset. The original "Raw" state, totaling approximately ~26GB of unrefined CSVs was omitted to prioritize transfer efficiency.*