Kafka Hardlimit#102
Conversation
Upgrading to 0.132-tw-0.16
Upgrade to 0.139-tw-17
…scovery Add a few logs in DiscoveryNodeManager to debug 'No worker nodes' error.
…_queries Add query logging of all presto queries
Upgrade to 0.139-tw-0.18
…sitive_parquet_column_match Look up parquet columns by name case-insensitive
…are_parquet_column_match Handle hive keywords when doing a name-based parquet field lookup
Upgrade to 0.139-tw-0.19
[maven-release-plugin] copy for tag 0.141 # Conflicts: # pom.xml
Upgrade to 0.141-tw-0.20
upgrade Presto to 0.179
…at-off-in-default Set default value for hive statistic feature to false
…function signature
Upgrade to 181
Kafka07 plugin
Fix build and style
…iver-node-version Tableau Presto ODBC Driver fix
| @@ -76,7 +76,17 @@ public class KafkaConnectorConfig | |||
| /** | |||
| * Fetch size | |||
There was a problem hiding this comment.
Can you elaborate more about what Fetch Size is
| private DataSize fetchSize = new DataSize(10, Unit.MEGABYTE); | ||
|
|
||
| /** | ||
| * Default Query Interval |
There was a problem hiding this comment.
This is a variable. So:
- Don't call it
defaultXXXX. - Default value should be a
finalvalue. Here, it is good to have one final Duration value calledDEFAULT_QUERY_RANGEand a variable calledqueryInterval. And then assign the default value of the variable with the constant value.
Also, can you elaborate more about what this variable means? I have to read through the logic code using this variable to understand it.
There was a problem hiding this comment.
I think based on the code style in Presto, it's better just assign the default value here. I agree that the variable name here is a little bit confusing. But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
There was a problem hiding this comment.
Basing on the code logic, i.e. this value can be assigned. It is not a default value. Default value is a constant value.
There was a problem hiding this comment.
defaultQueryInterval is the variable and Duration.valueOf("10m") is the default value for this variable which is a constant value.
There was a problem hiding this comment.
As above:
This is a variable. So:
- Don't call it
defaultXXXX.- Default value should be a
finalvalue. Here, it is good to have one final Duration value calledDEFAULT_QUERY_RANGEand a variable calledqueryInterval. And then assign the default value of the variable with the constant value.
Call it queryInterval if it is a variable. (Or queryRange can be more descriptive). Otherwise, it is confusing; people need to go through the code logic to understand it.
There was a problem hiding this comment.
As above:
But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
| private Duration defaultQueryInterval = Duration.valueOf("10m"); | ||
|
|
||
| /** | ||
| * Hard limit on |
There was a problem hiding this comment.
Also, can you elaborate more about what this variable means? I have to read through the logic code using this variable to understand it.
| /** | ||
| * Hard limit on | ||
| */ | ||
| private boolean hardLimitOn = true; |
There was a problem hiding this comment.
No. "isHardLimitOn" should be a method instead of a variable.
There was a problem hiding this comment.
Then the variable name could be hardLimited
|
|
||
| @Config("kafka.fetch-size") | ||
| public KafkaConnectorConfig setFetchSize(int fetchSize) | ||
| public KafkaConnectorConfig setFetchSize(String fetchSize) |
There was a problem hiding this comment.
I have a feeling you want to use the Builder pattern to achieve the immutable Config class. But it seems self-finished; not that great.
There was a problem hiding this comment.
No, this class is reading the config files
| } | ||
|
|
||
| log.info("startTs: %s, endTs: %s", startTs, endTs); | ||
| if (config.isHardLimitOn() && startTs == null && endTs == null) { |
There was a problem hiding this comment.
I don't understand how the hard_limit is applied. Can you add more comments to make it clear? Especially for the for (Map.Entry<ColumnHandle, Domain> entry : columnHandleDomainMap.entrySet()) loop above.
Also, could we check whether it meets the expectations:
- If users specify a time range exceeding the limitation, an exception should be thrown directly. Otherwise, the behavior is changed silently and unexpectedly.
- If users do not specify the range, we could either throw an exception directly, or trim to the limited time range with an explicit logging/print-out notifying users.
There was a problem hiding this comment.
Here are some thoughts about your expectations:
- If the user doesn't know how to specify a range, they will fall into default query interval.
- If the user knows how to specify a range, they must agree with default query interval and must have a specific reason to go beyond the default query interval. It would be costly to serve the users if they actually have this specific request. It will involve code review and service deploy processes which wouldn't the ideal way for serving our customers for urgent needs.
There was a problem hiding this comment.
-
If the user doesn't know how to specify a range, Presto should either throw an exception directly, or trim to the limited time range with an explicit logging/print-out notifying users. Silently unexpected behavior is bad.
-
A design should be robust to users' mistakes/errors, rather than to rely on users always providing safe input. Also, this is what limitation means: it should not happen.
If users want to query exceeding the limitation, they need to notify all stakeholders, for instance, messaging team, since it could potentially break the whole kafka service, leading to a SEV0 or SEV1.
| { | ||
| return testSessionBuilder() | ||
| .setCatalog("kafka") | ||
| .setCatalog("kafka07") |
There was a problem hiding this comment.
This is good. I like the version tag attached.
| private DataSize fetchSize = new DataSize(10, Unit.MEGABYTE); | ||
|
|
||
| /** | ||
| * Default Query Interval |
There was a problem hiding this comment.
I think based on the code style in Presto, it's better just assign the default value here. I agree that the variable name here is a little bit confusing. But the "defaultQueryInterval" means that this value is the default query interval if the query interval is not specified in SQL.
|
|
||
| @Config("kafka.fetch-size") | ||
| public KafkaConnectorConfig setFetchSize(int fetchSize) | ||
| public KafkaConnectorConfig setFetchSize(String fetchSize) |
There was a problem hiding this comment.
No, this class is reading the config files
| /** | ||
| * Hard limit on | ||
| */ | ||
| private boolean hardLimitOn = true; |
There was a problem hiding this comment.
No. "isHardLimitOn" should be a method instead of a variable.
| } | ||
|
|
||
| log.info("startTs: %s, endTs: %s", startTs, endTs); | ||
| if (config.isHardLimitOn() && startTs == null && endTs == null) { |
There was a problem hiding this comment.
Here are some thoughts about your expectations:
- If the user doesn't know how to specify a range, they will fall into default query interval.
- If the user knows how to specify a range, they must agree with default query interval and must have a specific reason to go beyond the default query interval. It would be costly to serve the users if they actually have this specific request. It will involve code review and service deploy processes which wouldn't the ideal way for serving our customers for urgent needs.
No description provided.