Add support for a clickhouse sink #38
Open
fbegyn wants to merge 9 commits into
Clickhouse is a good competitor for Opensearch/Elasticsearch that just lets you use SQL against some tables. Maybe it's not a terrible idea to see how an implementation as a sink would look. Signed-off-by: Francis Begyn <francis@begyn.be>
This commit makes it possible to set up a functional clickhouse interface through the config files. It still needs some cleanup and formatting work, but it might work as a PoC. Signed-off-by: Francis Begyn <francis@begyn.be>
Signed-off-by: Francis Begyn <francis@begyn.be>
This will make it so we only have a single entry for each network flow. This could cause some performance issues on the Clickhouse side, but that needs further review. The behaviour is "eventual" consistency, so duplicate entries will exist while Clickhouse reconciles them.
Depending on the use case, it might be handy to be able to select whether to keep all the events (more storage required, but with the option to graph a flow's progress over time) or to only keep track of the latest state of the network flow (less storage, but no data about the progress of the flow).
Signed-off-by: Francis Begyn <francis@begyn.be>
This adds support for a clickhouse sink to conntracct.
The implementation also adds the boolean sink configuration option `latestValues`. This allows the clickhouse sink to be configured in 2 possible ways:

1. Timestamped values storage: `latestValues: false`. This makes a new entry for every event recorded by conntracct. It allows the user to later "plot" the flow throughout its lifetime and track the progress of a flow over time.
2. Latest value storage: `latestValues: true`. This is quite similar to how the ES sink is currently implemented. It will only store the latest values of a flow in Clickhouse. This is achieved by using a `ReplacingMergeTree` schema in Clickhouse that keeps only a single entry per flow and updates it on insert.

I think we discussed that only 1. should be truly implemented, but seeing that the ES sink works in a similar way makes me wonder whether this PR is good as is or whether I should strip 2. out of it.
I might also add support for a DuckDB sink after this PR.