From 7f3edcd923f57e1c16980ef023e79164bb4fc1bc Mon Sep 17 00:00:00 2001
From: Deep Patel
Date: Fri, 8 May 2026 16:54:18 -0700
Subject: [PATCH] docs: add Apache Pinot sink connector documentation

Adds a dedicated configuration page for the Pinot OFFLINE segment sink
connector and links it from the top-level connectors overview.

Co-Authored-By: Claude Sonnet 4.6
Signed-off-by: Deep Patel
---
 .../docs/configuration-engine/pinot.md | 43 +++++++++++++++++++
 documentation/docs/connectors.md       |  1 +
 2 files changed, 44 insertions(+)
 create mode 100644 documentation/docs/configuration-engine/pinot.md

diff --git a/documentation/docs/configuration-engine/pinot.md b/documentation/docs/configuration-engine/pinot.md
new file mode 100644
index 000000000..0793fa730
--- /dev/null
+++ b/documentation/docs/configuration-engine/pinot.md
@@ -0,0 +1,43 @@
+# Apache Pinot Engine Configuration
+
+Apache Pinot is a real-time OLAP datastore designed for low-latency analytical queries at scale. DataSQRL writes to Pinot OFFLINE tables by building and uploading segments via the Pinot controller REST API.
+
+## Connector Options
+
+| Option | Required | Default | Description |
+|---|---|---|---|
+| `controller.url` | yes | — | HTTP URL of the Pinot controller, e.g. `http://pinot-controller:9000` |
+| `table.name` | yes | — | Name of the target OFFLINE table (without the `_OFFLINE` suffix) |
+| `segment.flush.rows` | no | `500000` | Number of rows to buffer before flushing a segment; also flushes on every Flink checkpoint |
+
+## DDL Example
+
+```sql
+CREATE TABLE OrderMetrics (
+  order_id BIGINT,
+  customer STRING,
+  total DECIMAL(10, 2),
+  ordered_at TIMESTAMP_LTZ(3)
+) WITH (
+  'connector' = 'pinot',
+  'controller.url' = 'http://pinot-controller:9000',
+  'table.name' = 'OrderMetrics',
+  'segment.flush.rows' = '500000'
+);
+```
+
+## Prerequisites
+
+The target Pinot schema and OFFLINE table must exist before the pipeline starts. Create them once via the Pinot controller REST API or the Pinot console — DataSQRL does not create Pinot schemas or tables automatically.
+
+## Delivery Guarantee
+
+At-least-once. Segments are uploaded on every Flink checkpoint and when `segment.flush.rows` is reached. Enable Pinot's built-in deduplication if exactly-once semantics are required.
+
+## Usage Notes
+
+- Only **sink** (write) is supported; Pinot cannot be used as a Flink source in DataSQRL
+- Column names in the DDL must match the field names in your Pinot schema
+- `TIMESTAMP_LTZ` columns are stored as epoch milliseconds (`LONG`) in Pinot; define the corresponding field with `dataType: LONG` in the Pinot schema
+- `DECIMAL` columns are converted to `DOUBLE`
+- The connector JAR (`pinot-connector-.jar`) must be present in the Flink `lib/` directory or on the job classpath
diff --git a/documentation/docs/connectors.md b/documentation/docs/connectors.md
index 62bd740b9..ea78bfd8d 100644
--- a/documentation/docs/connectors.md
+++ b/documentation/docs/connectors.md
@@ -7,6 +7,7 @@ DataSQRL uses Apache Flink connectors and formats. To find a connector for your
 * **[The Official Apache Flink connectors](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/connectors/table/overview/)** for Kafka, Filesystem, Kinesis, and many more.
 * **DataSQRL provided connectors**
   * **[Safe Kafka Source Connectors](https://github.com/DataSQRL/flink-sql-runner?tab=readme-ov-file#dead-letter-queue-support-for-kafka-sources)** which support dead-letter queues for faulty messages.
+  * **[Apache Pinot Sink Connector](configuration-engine/pinot)** for writing to Apache Pinot OFFLINE tables with at-least-once delivery.
 * **[Apache Flink CDC connectors](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.5/docs/connectors/flink-sources/overview/)** for Postgres, MySQL, Oracle, SqlServer, and other databases.
 
 ## Connector Management