Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions documentation/docs/configuration-engine/pinot.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Apache Pinot Engine Configuration

Apache Pinot is a real-time OLAP datastore designed for low-latency analytical queries at scale. DataSQRL writes to Pinot OFFLINE tables by building and uploading segments via the Pinot controller REST API.

## Connector Options

| Option | Required | Default | Description |
|---|---|---|---|
| `controller.url` | yes | — | HTTP URL of the Pinot controller, e.g. `http://pinot-controller:9000` |
| `table.name` | yes | — | Name of the target OFFLINE table (without the `_OFFLINE` suffix) |
| `segment.flush.rows` | no | `500000` | Number of rows to buffer before flushing a segment; also flushes on every Flink checkpoint |

## DDL Example

```sql
CREATE TABLE OrderMetrics (
order_id BIGINT,
customer STRING,
total DECIMAL(10, 2),
ordered_at TIMESTAMP_LTZ(3)
) WITH (
'connector' = 'pinot',
'controller.url' = 'http://pinot-controller:9000',
'table.name' = 'OrderMetrics',
'segment.flush.rows' = '500000'
);
```

## Prerequisites

The target Pinot schema and OFFLINE table must exist before the pipeline starts. Create them once via the Pinot controller REST API or the Pinot console — DataSQRL does not create Pinot schemas or tables automatically.

## Delivery Guarantee

At-least-once. Segments are uploaded on every Flink checkpoint and when `segment.flush.rows` is reached. Enable Pinot's built-in deduplication if exactly-once semantics are required.

## Usage Notes

- Only **sink** (write) is supported; Pinot cannot be used as a Flink source in DataSQRL
- Column names in the DDL must match the field names in your Pinot schema
- `TIMESTAMP_LTZ` columns are stored as epoch milliseconds (`LONG`) in Pinot; define the corresponding field with `dataType: LONG` in the Pinot schema
- `DECIMAL` columns are converted to `DOUBLE`
- The connector JAR (`pinot-connector-<version>.jar`) must be present in the Flink `lib/` directory or on the job classpath
1 change: 1 addition & 0 deletions documentation/docs/connectors.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ DataSQRL uses Apache Flink connectors and formats. To find a connector for your
* **[The Official Apache Flink connectors](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/connectors/table/overview/)** for Kafka, Filesystem, Kinesis, and many more.
* **DataSQRL provided connectors**
* **[Safe Kafka Source Connectors](https://github.com/DataSQRL/flink-sql-runner?tab=readme-ov-file#dead-letter-queue-support-for-kafka-sources)** which support dead-letter queues for faulty messages.
* **[Apache Pinot Sink Connector](configuration-engine/pinot)** for writing to Apache Pinot OFFLINE tables with at-least-once delivery.
* **[Apache Flink CDC connectors](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.5/docs/connectors/flink-sources/overview/)** for Postgres, MySQL, Oracle, SqlServer, and other databases.

## Connector Management
Expand Down
Loading