Short description and motivation for the proposed feature
Summary
The Data Catalog feature currently supports 7 integrations (Snowflake, Salesforce, BigQuery, MS SQL Server, MySQL, Oracle, PostgreSQL), but not ClickHouse, one of the most widely used columnar databases for analytics and time-series workloads. As a result, agents cannot accurately discover or reason about schema metadata when ClickHouse is the data source.
Motivation
When using MindsDB as an AI-powered query engine with ClickHouse as the backend data source, agents lack the metadata context (table structure, column types, statistics, constraints) needed to generate accurate and consistent SQL queries.
This results in:
- Inaccurate schema inference: agents cannot discover table and column structure automatically.
- Reduced query reliability: without metadata, generated queries are prone to syntax and semantic errors.
- Poor ontology/entity modeling: when MindsDB is used to instantiate domain models over ClickHouse data, the lack of schema introspection makes it hard to keep the data accurate and consistent.
ClickHouse is widely adopted for analytics, time-series, and OLAP workloads. First-class Data Catalog support would significantly improve the agent experience for this integration.
Current State
The ClickHouse handler (mindsdb/integrations/handlers/clickhouse_handler/clickhouse_handler.py):
- Inherits from DatabaseHandler (not MetaDatabaseHandler)
- Implements only the basic get_tables() (via SHOW TABLES) and get_columns() (via DESCRIBE)
- Does not implement any of the meta_get_* methods required for Data Catalog integration
Describe some possible solutions
Upgrade the ClickHouse handler to inherit from MetaDatabaseHandler and implement the required metadata methods using ClickHouse's system.* tables:
| Method | ClickHouse Source |
|---|---|
| meta_get_tables() | system.tables (name, engine, total_rows, total_bytes, comment) |
| meta_get_columns() | system.columns (name, type, default_kind, default_expression, comment) |
| meta_get_column_statistics() | Per-table queries using uniqExact, min, max, countIf against the actual data |
| meta_get_primary_keys() | Return empty (ClickHouse MergeTree uses ORDER BY sorting keys, not traditional PKs) |
| meta_get_foreign_keys() | Return empty (ClickHouse does not enforce foreign key constraints) |
| meta_get_handler_info() | Return ClickHouse-specific SQL dialect guidance |
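As a rough sketch of the first two rows of the table, the queries below could back meta_get_tables() and meta_get_columns(). This is an illustration under assumptions, not the handler's actual code: the helper names (quote_literal, build_tables_query, build_columns_query) and the output column aliases are hypothetical, and the aliases would need to match whatever schema the Data Catalog expects from the other handlers.

```python
from typing import Optional, Sequence


def quote_literal(value: str) -> str:
    """Render a Python string as a ClickHouse string literal (hypothetical helper)."""
    # ClickHouse string literals accept backslash escapes, so escape \ first, then '.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def _table_filter(column: str, table_names: Optional[Sequence[str]]) -> str:
    """Optional ' AND <column> IN (...)' clause limiting results to the requested tables."""
    if not table_names:
        return ""
    names = ", ".join(quote_literal(name) for name in table_names)
    return f" AND {column} IN ({names})"


def build_tables_query(database: str, table_names: Optional[Sequence[str]] = None) -> str:
    """Hypothetical SQL for meta_get_tables(); the aliases are illustrative only."""
    # total_rows / total_bytes are Nullable and can be NULL (e.g. for views).
    return (
        "SELECT name AS table_name, engine AS table_type, "
        "comment AS table_description, total_rows AS row_count, total_bytes "
        "FROM system.tables "
        f"WHERE database = {quote_literal(database)}"
        + _table_filter("name", table_names)
    )


def build_columns_query(database: str, table_names: Optional[Sequence[str]] = None) -> str:
    """Hypothetical SQL for meta_get_columns(); the aliases are illustrative only."""
    # system.columns has no nullability flag, so derive it from the type string,
    # e.g. Nullable(String) or LowCardinality(Nullable(String)).
    return (
        "SELECT table AS table_name, name AS column_name, type AS data_type, "
        "comment AS column_description, default_kind, "
        "default_expression AS column_default, "
        "(startsWith(type, 'Nullable(') OR startsWith(type, 'LowCardinality(Nullable(')) "
        "AS is_nullable, is_in_primary_key "
        "FROM system.columns "
        f"WHERE database = {quote_literal(database)}"
        + _table_filter("table", table_names)
        + " ORDER BY table, position"
    )
```

For example, build_tables_query("analytics", ["events"]) yields a query filtered with `WHERE database = 'analytics' AND name IN ('events')`. Quoting values explicitly is only a stand-in here; if the ClickHouse client in use supports server-side parameter binding, that would be the safer choice.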
Key Implementation Notes
- meta_get_tables(): Use system.tables, which provides database, name, engine, total_rows, total_bytes, and comment. Note that total_rows and total_bytes are Nullable and are NULL for engines that cannot report them cheaply (e.g. views).
- meta_get_columns(): Use system.columns, which provides table, name, type, default_kind, default_expression, comment, and is_in_primary_key. There is no separate nullability column; nullability has to be derived from the type (e.g. Nullable(String)).
- meta_get_column_statistics(): ClickHouse has no equivalent of PostgreSQL's pg_stats, so statistics must be computed per table using aggregate functions (uniqExact, min, max, countIf(isNull(col))). Consider implementing meta_get_column_statistics_for_table() to leverage the base class's concurrent execution pattern.
- Primary/Foreign Keys: ClickHouse does not enforce traditional primary or foreign key constraints, so these methods should return an empty Response(RESPONSE_TYPE.TABLE, pd.DataFrame()).
- Handler Info: Provide guidance on the ClickHouse SQL dialect (e.g. ENGINE clause requirements, the MergeTree engine family, the lack of standard UPDATE/DELETE in older versions, where changes go through ALTER TABLE ... UPDATE/DELETE mutations instead, array functions, etc.).
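For meta_get_column_statistics(), one possible approach is a single aggregate query per table, with the tables fanned out over a thread pool. The sketch below is not MindsDB's implementation: it assumes a caller-supplied run_query callable that returns the one result row as a dict, uses concurrent.futures as a stand-in for the base class's concurrent execution pattern, treats table names as bare names in the current database, and uses hypothetical function and output column names throughout. It also assumes only columns whose types support min/max are passed in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

import pandas as pd


def quote_identifier(name: str) -> str:
    """Backtick-quote a ClickHouse identifier, backslash-escaping \\ and `."""
    return "`" + name.replace("\\", "\\\\").replace("`", "\\`") + "`"


def build_column_stats_query(table: str, columns: Sequence[str]) -> str:
    """One aggregate query per table covering every requested column (hypothetical)."""
    select_list = ["count() AS row_count"]
    for i, col in enumerate(columns):
        c = quote_identifier(col)
        # Positional aliases avoid clashes with arbitrary column names.
        # min/max ignore NULLs; toString gives every column a uniform output type.
        select_list += [
            f"uniqExact({c}) AS distinct_{i}",
            f"toString(min({c})) AS min_{i}",
            f"toString(max({c})) AS max_{i}",
            f"countIf(isNull({c})) AS nulls_{i}",
        ]
    return f"SELECT {', '.join(select_list)} FROM {quote_identifier(table)}"


def parse_stats_row(table: str, columns: Sequence[str], row: Dict[str, object]) -> List[dict]:
    """Turn the single aggregate row back into one record per column."""
    return [
        {
            "table_name": table,
            "column_name": col,
            "row_count": row["row_count"],
            "distinct_values_count": row[f"distinct_{i}"],
            "minimum_value": row[f"min_{i}"],
            "maximum_value": row[f"max_{i}"],
            "null_count": row[f"nulls_{i}"],
        }
        for i, col in enumerate(columns)
    ]


def collect_column_statistics(
    tables: Dict[str, Sequence[str]],
    run_query: Callable[[str], Dict[str, object]],
    max_workers: int = 4,
) -> pd.DataFrame:
    """Run one statistics query per table concurrently and combine the results."""

    def one_table(item):
        table, columns = item
        row = run_query(build_column_stats_query(table, columns))
        return parse_stats_row(table, columns, row)

    # pool.map preserves input order, so records stay grouped by table.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_table = list(pool.map(one_table, tables.items()))
    return pd.DataFrame([record for records in per_table for record in records])
```

Grouping all columns into one query per table means one scan per table rather than one per column. On very large tables, swapping uniqExact() for the approximate uniq() would cut the cost further at the price of exactness.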
Files to Modify
- mindsdb/integrations/handlers/clickhouse_handler/clickhouse_handler.py: change the base class and add the meta_get_* methods
- mindsdb/integrations/handlers/clickhouse_handler/tests/test_clickhouse_handler.py: add tests for the metadata methods
- docs/data_catalog/integrations/overview.mdx: add ClickHouse to the supported integrations list
Impact
This change is backward compatible: existing ClickHouse handler functionality remains unchanged, and the new methods are additive and invoked only when the Data Catalog feature is enabled.
Environment
- MindsDB version: latest (main branch)
- ClickHouse versions: 22.x+ (system tables are stable since this version)