
Configurable max rows per streaming request #237

Open
Francesco Capponi (FreCap) wants to merge 2 commits into confluentinc:master from FreCap:max-rows-per-request


@FreCap

Due to BigQuery streaming insert limits, the maximum request size is 10MB.

Hence, given that on average a record takes at least 20 bytes, large batches (e.g. 500,000 records) can require several requests to BigQuery that each fail with Request Too Large before the connector finds a batch size that fits.

This config lets the connector start from a lower batch size in the first place, which reduces the number of failed requests. It only applies to the simple TableWriter (not the GCS batch-loading path).

Without it, oversized requests can fail with errors such as:

BigQueryException
Request payload size exceeds the limit: 10485760 bytes.

BigQueryException
Unexpected end of file from server

BigQueryException
Remote host terminated the handshake

BigQueryException
Error writing request body to server
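To make the size argument concrete: at the 20-byte floor, 500,000 rows already come to 10,000,000 bytes, just under the 10,485,760-byte limit in the first error, so any larger average row pushes such a batch over. The sketch below counts how many oversized requests would be sent before a batch fits. It assumes, hypothetically, that the writer halves the batch after each Request Too Large and that every row serializes to the same size; BatchSizeSearch and failedAttempts are illustrative names, not connector code.

```java
// Hypothetical illustration, not connector code. Assumes the writer halves the
// batch after each "Request Too Large" and that all rows have the same size.
final class BatchSizeSearch {
    // Request size limit quoted in the BigQuery error message.
    static final long MAX_REQUEST_BYTES = 10_485_760L;

    // Counts the oversized requests sent before a batch fits under the limit.
    // Stops at a batch of 1 row; a single row that is still too large is not handled.
    static int failedAttempts(int startBatchSize, int bytesPerRow, long maxRequestBytes) {
        int batchSize = startBatchSize;
        int failures = 0;
        while ((long) batchSize * bytesPerRow > maxRequestBytes && batchSize > 1) {
            failures++;      // this request would be rejected as too large
            batchSize /= 2;  // assumed back-off: halve the batch and retry
        }
        return failures;
    }

    public static void main(String[] args) {
        int bytesPerRow = 100; // hypothetical average row size
        System.out.println(failedAttempts(500_000, bytesPerRow, MAX_REQUEST_BYTES)); // 3
        System.out.println(failedAttempts(50_000, bytesPerRow, MAX_REQUEST_BYTES));  // 0
    }
}
```

With a hypothetical 100-byte row, starting at 500,000 rows costs three rejected requests (500,000, 250,000 and 125,000 rows) before 62,500 rows fit, while starting at 50,000 rows costs none.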

@FreCap Francesco Capponi (FreCap) requested a review from a team as a code owner September 15, 2022 00:24
Member

@b-goyal Bhagyashree (b-goyal) left a comment


Thanks for the PR, Francesco Capponi (@FreCap), and apologies for the delay in reviewing. I have left a few comments; please take a look when you get a chance.

"The interval, in seconds, in which to attempt to run GCS to BQ load jobs. Only relevant "
+ "if enableBatchLoad is configured.";

public static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG = "bqStreamingMaxRowsPerRequest";

Can we rename this to maxRowsPerRequest?


public static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG = "bqStreamingMaxRowsPerRequest";
private static final ConfigDef.Type BQ_STREAMING_MAX_ROWS_PER_REQUEST_TYPE = ConfigDef.Type.INT;
private static final Integer BQ_STREAMING_MAX_ROWS_PER_REQUEST_DEFAULT = 50000;

Let's keep the default behaviour the same. We can use -1 to mean the limit is disabled and make that the default.
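A minimal sketch of the -1 sentinel, assuming the setting only caps the size of the first request and the existing retry logic takes over from there. MaxRowsPerRequest and initialRowsPerRequest are hypothetical names.

```java
// Hypothetical helper: -1 (the proposed default) disables the limit, so the
// connector keeps its current behaviour of sending the whole batch.
final class MaxRowsPerRequest {
    static final int DISABLED = -1;

    // Returns how many rows the first request should carry.
    static int initialRowsPerRequest(int maxRowsPerRequest, int batchSize) {
        if (maxRowsPerRequest == DISABLED) {
            return batchSize;                          // unchanged default behaviour
        }
        return Math.min(batchSize, maxRowsPerRequest); // cap the starting size
    }

    public static void main(String[] args) {
        System.out.println(initialRowsPerRequest(-1, 500_000));     // 500000
        System.out.println(initialRowsPerRequest(50_000, 500_000)); // 50000
        System.out.println(initialRowsPerRequest(50_000, 1_200));   // 1200
    }
}
```

Using a sentinel rather than a large positive default means existing deployments see no change unless they opt in.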

private static final ConfigDef.Type BQ_STREAMING_MAX_ROWS_PER_REQUEST_TYPE = ConfigDef.Type.INT;
private static final Integer BQ_STREAMING_MAX_ROWS_PER_REQUEST_DEFAULT = 50000;
private static final ConfigDef.Importance BQ_STREAMING_MAX_ROWS_PER_REQUEST_IMPORTANCE = ConfigDef.Importance.LOW;
private static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_DOC =

The maximum number of rows to send in a single request payload to BigQuery.
This can reduce the number of calls that fail with Request Too Large when the payload exceeds BigQuery's quota limits (https://cloud.google.com/bigquery/quotas#write-api-limits).
Setting it to a low value can degrade the connector's performance.
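One way a per-request row limit could be applied, sketched under the assumption that a batch is split into consecutive requests of at most maxRowsPerRequest rows. This is not necessarily how the PR implements it, and RequestChunker is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one batch into consecutive requests of at most
// maxRowsPerRequest rows; -1 keeps the whole batch in a single request.
final class RequestChunker {
    static <T> List<List<T>> chunk(List<T> rows, int maxRowsPerRequest) {
        List<List<T>> requests = new ArrayList<>();
        if (rows.isEmpty()) {
            return requests;
        }
        int size = maxRowsPerRequest == -1 ? rows.size() : maxRowsPerRequest;
        if (size < 1) {
            throw new IllegalArgumentException("maxRowsPerRequest must be -1 or >= 1");
        }
        for (int from = 0; from < rows.size(); from += size) {
            int to = Math.min(from + size, rows.size());
            // Copy so each request is independent of the source list.
            requests.add(new ArrayList<>(rows.subList(from, to)));
        }
        return requests;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            rows.add(i);
        }
        System.out.println(chunk(rows, 50_000).size()); // 3
    }
}
```

With 120,000 rows and a limit of 50,000, this yields three requests of 50,000, 50,000 and 20,000 rows.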

"that would return a `Request Too Large` before finding the right size. " +
"This config allows starting from a lower value altogether and reduce the amount of failed requests. " +
"Only works with simple TableWriter (no GCS)";


Let's also add a validator with the minimum and maximum allowed values:
-1 -> default
1 -> min
50,000 -> max (https://cloud.google.com/bigquery/quotas#write-api-limits)
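A plain-Java sketch of the check described above. In the connector it would presumably live in a Kafka ConfigDef.Validator, whose ensureValid(String name, Object value) method throws ConfigException; IllegalArgumentException is used here only to keep the example dependency-free. The class name is hypothetical.

```java
// Hypothetical validator: accepts -1 (disabled) or a value in [1, 50,000].
// A Kafka version would implement ConfigDef.Validator and throw ConfigException.
final class MaxRowsPerRequestValidator {
    static final int DISABLED = -1;
    static final int MIN = 1;
    static final int MAX = 50_000; // per-request row limit cited in the review

    static void ensureValid(String name, Object value) {
        if (!(value instanceof Integer)) {
            throw new IllegalArgumentException(name + " must be an integer, got " + value);
        }
        int rows = (Integer) value;
        if (rows != DISABLED && (rows < MIN || rows > MAX)) {
            throw new IllegalArgumentException(
                name + " must be -1 (disabled) or between " + MIN + " and " + MAX + ", got " + rows);
        }
    }

    // Convenience wrapper used by the example below.
    static boolean isValid(Object value) {
        try {
            ensureValid("maxRowsPerRequest", value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(-1));     // true
        System.out.println(isValid(0));      // false
        System.out.println(isValid(50_000)); // true
        System.out.println(isValid(50_001)); // false
    }
}
```

A custom check is needed because a plain ConfigDef.Range.between(1, 50000) would reject the -1 default.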

@FreCap
Author


Labels

None yet

Projects

None yet


2 participants