AGP is a system for managing the asynchronous execution of SQL queries against a ClickHouse cluster. It efficiently handles query orchestration, execution, and result management, making it ideal for long-running analytical workloads.
AGP relies on PostgreSQL for query orchestration and ClickHouse for execution. The provided Docker Compose file builds AGP and starts all required dependencies.
docker compose up -dAGP orchestrates SQL query execution asynchronously, ensuring efficient resource utilization.
- Execution: A container for an SQL query scheduled for execution.
- Queueing: Executions are created via the API and stored in PostgreSQL.
- Workers: A fleet of workers processes executions against ClickHouse.
- Tracking: The API provides status updates (PENDING, RUNNING, CANCELED, FAILED, SUCCEEDED), query progress, and result availability.
- Result Storage: Successfully completed executions are stored in an object store for retrieval until expiration.
To prevent redundant execution of identical queries, AGP supports query deduplication:
- A query_id can be assigned at execution creation.
- At any given time, only one in-flight Execution (PENDING or RUNNING) exists per
query_id. - If no
query_idis specified, it defaults to a hash of the SQL query.
AGP supports long-running analytical queries where some degree of data staleness is acceptable:
- Execution results are stored in an object store or local filesystem.
- The API allows listing past executions for a given
query_idand retrieving their results. - Users can flexibly choose to use recent results instead of re-executing queries.
AGP dynamically manages execution priority based on user typology:
- Customizable Tiers: Executions are assigned a tier at creation (a string value).
- Resource Allocation: Higher-tier users (e.g., experienced analysts) may access more CPU and longer query times, while lower-tier users have more restricted execution limits.
- Flexible Worker Configurations: Different workers handle different tiers and can be configured with distinct ClickHouse settings or separate clusters.
The API server allows users to:
- Create executions
- List executions
- Poll execution statuses
- Retrieve results
Workers are responsible for:
- Polling PostgreSQL for new executions
- Running queries against the ClickHouse cluster
- Updating execution status and storing results
The Bookkeeper maintains system stability by:
- Detecting and recovering from dead workers to prevent stuck executions
- Expiring old executions and their results
AGP's API is built with OpenAPI, and the documentation can be viewed using:
go run github.com/swaggest/swgui/cmd/swgui internal/api/v1/async/api.yamlAlternatively, the API server exposes an embedded Swagger UI at: http://localhost:8888/v1/async/docs/