
Add support for a ClickHouse sink #38

Open
fbegyn wants to merge 9 commits into ti-mo:master from fbegyn:clickhouse-sink

fbegyn commented Sep 30, 2025

This adds support for a ClickHouse sink to conntracct.

The implementation also adds a boolean sink configuration option, latestValues. This allows the ClickHouse sink to be configured in two ways:

  1. Timestamped value storage (latestValues: false). This creates a new row for every entry recorded by conntracct, so the user can later plot a flow throughout its lifetime and track its progress over time.

  2. Latest value storage (latestValues: true). This is quite similar to how the ES sink is currently implemented: only the latest values of each flow are stored in ClickHouse. This is achieved with a ReplacingMergeTree table engine, which keeps a single entry per flow by replacing older rows with newer inserts as ClickHouse merges the data in the background.

I think we discussed that only option 1 should really be implemented, but since the ES sink works similarly to option 2, I wonder whether this PR is good as is or whether I should strip option 2 out of it.

I may also add support for a DuckDB sink after this PR.

fbegyn added 9 commits March 21, 2025 23:00
ClickHouse is a good competitor to OpenSearch/Elasticsearch that just
lets you use SQL against your tables.
Maybe it's not a terrible idea to see what an implementation as a sink
would look like.

Signed-off-by: Francis Begyn <francis@begyn.be>
This commit makes it possible to set up a functional ClickHouse interface
through the config files. This still needs some cleanup and formatting
work, but it should work as a PoC.

Signed-off-by: Francis Begyn <francis@begyn.be>
Signed-off-by: Francis Begyn <francis@begyn.be>
This makes it so we only have a single entry for each network flow. This
could cause some performance issues on the ClickHouse side, which needs
some further review.

The behaviour is eventually consistent, so duplicate entries will exist
until ClickHouse merges them in the background.
Depending on the use case, it might be handy to be able to choose whether
we keep all the events (more storage required, but the option to graph the
progress of flows over time) or only keep track of the latest state of each
network flow (less storage, but no data about the progress of the flow).
Signed-off-by: Francis Begyn <francis@begyn.be>
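To illustrate the eventual consistency described in the commit messages above, here is a toy in-memory model of ReplacingMergeTree-style deduplication in Go. It is a deliberate simplification, not ClickHouse's implementation: inserts only append rows, and duplicates for a flow disappear only when merge runs. The row and table types are hypothetical. The model assumes a version column that orders the rows of a flow, with ties going to the row inserted last.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a simplified flow record; Version plays the role of the
// ReplacingMergeTree version column (for example, a timestamp).
type row struct {
	FlowID  uint64
	Bytes   uint64
	Version int64
}

// table loosely models a ReplacingMergeTree: inserts only append.
type table struct {
	rows []row
}

func (t *table) insert(r row) { t.rows = append(t.rows, r) }

// merge keeps, for every FlowID, the row with the highest Version.
// With equal versions, the row inserted later wins (>= comparison).
func (t *table) merge() {
	latest := make(map[uint64]row)
	for _, r := range t.rows {
		if cur, ok := latest[r.FlowID]; !ok || r.Version >= cur.Version {
			latest[r.FlowID] = r
		}
	}
	merged := make([]row, 0, len(latest))
	for _, r := range latest {
		merged = append(merged, r)
	}
	// Sort for deterministic output, mirroring ORDER BY flow_id.
	sort.Slice(merged, func(i, j int) bool { return merged[i].FlowID < merged[j].FlowID })
	t.rows = merged
}

func main() {
	var t table
	t.insert(row{FlowID: 1, Bytes: 100, Version: 1})
	t.insert(row{FlowID: 2, Bytes: 50, Version: 1})
	t.insert(row{FlowID: 1, Bytes: 900, Version: 2})
	fmt.Println("before merge:", len(t.rows)) // 3: flow 1 appears twice
	t.merge()
	fmt.Println("after merge:", len(t.rows)) // 2: one row per flow
	fmt.Println("flow 1 bytes:", t.rows[0].Bytes) // 900
}
```

In ClickHouse itself, merges run at unspecified times. A reader that needs deduplicated results before then can add the FINAL modifier to a SELECT, at some query-time cost. OPTIMIZE TABLE ... FINAL forces a merge, but it is expensive to run routinely.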