Add Data Catalog Support for ClickHouse Integration #12408

@toddyLee

Description

Short description and motivation for the proposed feature

Summary

The Data Catalog feature currently supports 7 integrations (Snowflake, Salesforce, BigQuery, MS SQL Server, MySQL, Oracle, PostgreSQL), but ClickHouse — one of the most widely used columnar/time-series databases — is not supported. This limits the ability of agents to accurately discover and reason about schema metadata when ClickHouse is used as a data source.

Motivation

When using MindsDB as an AI-powered query engine with ClickHouse as the backend data source, agents lack the metadata context (table structure, column types, statistics, constraints) needed to generate accurate and consistent SQL queries.
This results in:

  • Inaccurate schema inference — agents cannot discover table/column structure automatically
  • Reduced query reliability — without metadata, generated queries are prone to syntax and semantic errors
  • Poor ontology/entity modeling — when MindsDB is used to instantiate domain models over ClickHouse data, the absence of schema introspection makes it difficult to maintain data accuracy and consistency

ClickHouse is widely adopted for analytics, time-series, and OLAP workloads. First-class Data Catalog support would significantly improve the agent experience for this integration.

Current State

The ClickHouse handler (mindsdb/integrations/handlers/clickhouse_handler/clickhouse_handler.py):

  • Inherits from DatabaseHandler (not MetaDatabaseHandler)
  • Only implements basic get_tables() (via SHOW TABLES) and get_columns() (via DESCRIBE)
  • Does not implement any of the meta_get_* methods required for Data Catalog integration

Describe some possible solutions

Upgrade the ClickHouse handler to inherit from MetaDatabaseHandler and implement the required metadata methods using ClickHouse's system.* tables:

| Method | ClickHouse Source |
| --- | --- |
| meta_get_tables() | system.tables (name, engine, total_rows, total_bytes, comment) |
| meta_get_columns() | system.columns (name, type, default_kind, default_expression, comment) |
| meta_get_column_statistics() | Per-table queries using uniqExact, min, max, countIf against actual data |
| meta_get_primary_keys() | Return empty — ClickHouse MergeTree uses ORDER BY keys, not traditional PKs |
| meta_get_foreign_keys() | Return empty — ClickHouse does not enforce foreign key constraints |
| meta_get_handler_info() | Return ClickHouse-specific SQL dialect guidance |
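
As a rough illustration of the first two rows, the metadata queries could be assembled as below. This is only a sketch: the helper names (quote_literal, build_tables_query, build_columns_query) are hypothetical, and the result columns the Data Catalog expects from a MetaDatabaseHandler are not spelled out in this issue, so the selected columns simply mirror the system.* columns listed above.

```python
from typing import Optional, Sequence


def quote_literal(value: str) -> str:
    """Return value as a single-quoted ClickHouse string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def _table_filter(column: str, table_names: Optional[Sequence[str]]) -> str:
    """Build an optional ' AND <column> IN (...)' clause (hypothetical helper)."""
    if not table_names:
        return ""
    names = ", ".join(quote_literal(name) for name in table_names)
    return f" AND {column} IN ({names})"


def build_tables_query(database: str,
                       table_names: Optional[Sequence[str]] = None) -> str:
    """Query table-level metadata from system.tables for one database."""
    return (
        "SELECT name, engine, total_rows, total_bytes, comment "
        "FROM system.tables "
        f"WHERE database = {quote_literal(database)}"
        f"{_table_filter('name', table_names)}"
    )


def build_columns_query(database: str,
                        table_names: Optional[Sequence[str]] = None) -> str:
    """Query column-level metadata from system.columns for one database."""
    return (
        "SELECT table, name, type, default_kind, default_expression, "
        "comment, is_in_primary_key "
        "FROM system.columns "
        f"WHERE database = {quote_literal(database)}"
        f"{_table_filter('table', table_names)}"
    )


if __name__ == "__main__":
    print(build_tables_query("default", ["events"]))
    print(build_columns_query("default"))
```

Inside the handler, the resulting strings would presumably be passed to the existing native_query() path, with the returned DataFrame renamed to whatever column names the Data Catalog requires.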

Key Implementation Notes

  1. meta_get_tables() — Use system.tables which provides database, name, engine, total_rows, total_bytes, and comment.

  2. meta_get_columns() — Use system.columns which provides table, name, type, default_kind, default_expression, comment, and is_in_primary_key.

  3. meta_get_column_statistics() — ClickHouse has no equivalent of PostgreSQL's pg_stats view. Statistics must be computed per table using aggregate functions (uniqExact, min, max, countIf(x IS NULL)). Consider implementing meta_get_column_statistics_for_table() to leverage the base class's concurrent execution pattern.

  4. Primary/Foreign Keys — ClickHouse does not enforce traditional primary/foreign key constraints. These methods should return an empty Response(RESPONSE_TYPE.TABLE, pd.DataFrame()).

  5. Handler Info — Provide guidance on ClickHouse-specific SQL dialect (e.g., ENGINE clause requirements, the MergeTree family, UPDATE/DELETE being available only as ALTER TABLE mutations in older versions, array functions).
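
The per-table statistics query from note 3 could be generated along these lines. This is a minimal sketch under stated assumptions: build_column_statistics_query and quote_identifier are hypothetical names, the distinct_N / min_N / max_N / nulls_N aliases are invented for illustration, and min/max may not be meaningful for every column type, so a real implementation would likely filter columns by type first.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Return name as a backtick-quoted ClickHouse identifier."""
    escaped = name.replace("\\", "\\\\").replace("`", "\\`")
    return f"`{escaped}`"


def build_column_statistics_query(database: str, table: str,
                                  columns: List[str]) -> str:
    """Build one aggregate query covering all given columns of a table.

    A single pass over the table computes, per column: exact distinct
    count, minimum, maximum, and the number of NULL values.
    """
    if not columns:
        raise ValueError("at least one column is required")

    parts = []
    for index, column in enumerate(columns):
        col = quote_identifier(column)
        # Aliases are positional so that unusual column names cannot
        # produce invalid or colliding alias identifiers.
        parts.extend([
            f"uniqExact({col}) AS distinct_{index}",
            f"min({col}) AS min_{index}",
            f"max({col}) AS max_{index}",
            f"countIf({col} IS NULL) AS nulls_{index}",
        ])

    select_list = ", ".join(parts)
    source = f"{quote_identifier(database)}.{quote_identifier(table)}"
    return f"SELECT count() AS row_count, {select_list} FROM {source}"


if __name__ == "__main__":
    print(build_column_statistics_query("default", "events", ["user_id", "ts"]))
```

uniqExact scans and tracks every distinct value, which can be expensive on large tables; the approximate uniq function is a possible substitute if exact counts are not required.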

Files to Modify

  • mindsdb/integrations/handlers/clickhouse_handler/clickhouse_handler.py — Change base class, add meta_get_* methods
  • mindsdb/integrations/handlers/clickhouse_handler/tests/test_clickhouse_handler.py — Add tests for metadata methods
  • docs/data_catalog/integrations/overview.mdx — Add ClickHouse to the supported integrations list

Impact

This change is backward compatible — existing ClickHouse handler functionality remains unchanged. The new methods are additive and only invoked when the Data Catalog feature is enabled.

Environment

  • MindsDB version: latest (main branch)
  • ClickHouse versions: 22.x+ (system tables have been stable since this version)
