Add plan for coalescing operations during streaming #770
Conversation
> ## Expected impact
>
> For the customer's workload (9,737 DELETEs in a single batch):
Did we see other columns in the WHERE condition in users' WAL events? In the future we could make it more flexible and batch DELETE statements with different column names. Example:

```sql
DELETE FROM t1 WHERE my_column IN ($1, $2, $3)
```
WALs always refer to changes by their identity columns, so we don't have to support complex WHERE conditions.
kvch left a comment:
This is a good improvement. Hopefully, it will solve most of the issues for us. If not, there are several ways we can improve the process.
batch_writer_coalesce_ops_plan.md (Outdated)
> Batch raw WAL events instead of pre-built SQL strings, then build bulk SQL at execution time.
>
> - N DELETEs on the same table become: `DELETE FROM t WHERE "id" IN ($1, $2, ..., $N)`
Should we also consider composite primary keys?
Ah, I see that they are included.
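For composite primary keys, the bulk DELETE can use row constructors in the IN list. A minimal sketch of such a builder, assuming hypothetical names (`buildCompositeDelete` and its parameters are illustrations, not the PR's actual code), could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// buildCompositeDelete builds one bulk DELETE for rows identified by a
// composite key, using row constructors in the IN list:
//   DELETE FROM t WHERE ("a", "b") IN (($1, $2), ($3, $4), ...)
// Identifiers are quoted naively here; a real implementation would reuse
// the writer's existing quoting helpers.
func buildCompositeDelete(table string, keyCols []string, rows [][]any) (string, []any) {
	quoted := make([]string, len(keyCols))
	for i, c := range keyCols {
		quoted[i] = fmt.Sprintf("%q", c)
	}
	var tuples []string
	var args []any
	n := 1
	for _, row := range rows {
		ph := make([]string, len(row))
		for i, v := range row {
			ph[i] = fmt.Sprintf("$%d", n)
			n++
			args = append(args, v)
		}
		tuples = append(tuples, "("+strings.Join(ph, ", ")+")")
	}
	sql := fmt.Sprintf("DELETE FROM %s WHERE (%s) IN (%s)",
		table, strings.Join(quoted, ", "), strings.Join(tuples, ", "))
	return sql, args
}

func main() {
	sql, args := buildCompositeDelete("t", []string{"a", "b"},
		[][]any{{1, "x"}, {2, "y"}})
	fmt.Println(sql)      // DELETE FROM t WHERE ("a", "b") IN (($1, $2), ($3, $4))
	fmt.Println(len(args)) // 4
}
```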
> 1. Separate DDL and DML messages
> 2. For DDL: build and execute queries via existing `ddlAdapter`
> 3. For DML: walk messages in order, building "runs" of consecutive same-(schema, table, action) events
Given that we already walk the messages in order, we can make the query aggregator logic a bit smarter and consider coalescing interleaved DELETEs, as long as there is no INSERT or UPDATE statement that conflicts with them.
Yes, we can add this later. In some situations we can prove that it's still correct.
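The in-order walk that builds runs of consecutive same-(schema, table, action) events could be sketched roughly as follows. This is a simplified illustration under assumed types (`walMessage` here carries only the grouping keys; the plan's real `walMessage` holds raw `wal.Data` plus `schemaInfo`):

```go
package main

import "fmt"

// walMessage is a stand-in for illustration; the real message carries the
// raw WAL data and cached schema info as well.
type walMessage struct {
	Schema, Table, Action string
}

// run groups consecutive messages that share (schema, table, action).
type run struct {
	Schema, Table, Action string
	Messages              []walMessage
}

// buildRuns walks DML messages in order. A change in any of the three keys
// starts a new run, so original ordering is preserved across runs.
func buildRuns(msgs []walMessage) []run {
	var runs []run
	for _, m := range msgs {
		last := len(runs) - 1
		if last >= 0 && runs[last].Schema == m.Schema &&
			runs[last].Table == m.Table && runs[last].Action == m.Action {
			runs[last].Messages = append(runs[last].Messages, m)
			continue
		}
		runs = append(runs, run{m.Schema, m.Table, m.Action, []walMessage{m}})
	}
	return runs
}

func main() {
	msgs := []walMessage{
		{"public", "t", "DELETE"},
		{"public", "t", "DELETE"},
		{"public", "t", "INSERT"},
		{"public", "t", "DELETE"},
	}
	// Three runs: DELETE x2, INSERT x1, DELETE x1. The interleaving
	// discussed above is exactly why the last DELETE starts a new run.
	for _, r := range buildRuns(msgs) {
		fmt.Println(r.Action, len(r.Messages))
	}
}
```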
kvch left a comment:
Solid plan, small comments
> 3. For DML: look up `schemaInfo` via schema observer (cached), create `walMessage` with raw `wal.Data` + `schemaInfo`
> 4. Send `walMessage` to batch sender
>
> ### 4. Refactor BatchWriter.sendBatch — bulk query building
What happens if the sending fails? Do we fall back to separate DELETE statements?
Maybe for simplicity, for now we log it as DATALOSS; I think this is similar to what we do for batch inserts in snapshot mode.
batch_writer_coalesce_ops_plan.md (Outdated)
> In practice this works well for the target workload: WAL events from bulk operations on the source database (batch purges, accounting reconciliation, ETL loads) naturally produce long runs of the same operation on the same table, which coalesce effectively.
>
> 5. Execute via existing `flushQueries` / `execQueries`
> 6. Respect PostgreSQL's 65,535 parameter limit — split runs at ~60,000 params
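Step 6 above could be sketched as a simple chunking pass over a run's key tuples. `splitAtParamLimit` is a hypothetical helper name for illustration; PostgreSQL's extended protocol caps a statement at 65,535 bind parameters, and the plan splits well under that:

```go
package main

import "fmt"

// splitAtParamLimit splits a run's key tuples into chunks so that each
// resulting statement binds at most maxParams parameters. Composite keys
// contribute one parameter per key column, hence counting len(row).
func splitAtParamLimit(rows [][]any, maxParams int) [][][]any {
	var chunks [][][]any
	var cur [][]any
	used := 0
	for _, row := range rows {
		if used+len(row) > maxParams && len(cur) > 0 {
			chunks = append(chunks, cur)
			cur, used = nil, 0
		}
		cur = append(cur, row)
		used += len(row)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	// Three single-column keys with a limit of 2 parameters per statement
	// split into two chunks: [k1 k2] and [k3].
	chunks := splitAtParamLimit([][]any{{"k1"}, {"k2"}, {"k3"}}, 2)
	fmt.Println(len(chunks))
}
```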
I guess we can avoid this limit by using `WHERE id = ANY($1::bigint[])`, because the array is just one parameter.
So the code would be:

```go
fmt.Sprintf("DELETE FROM %s WHERE id = ANY($1::%s[])", table, idType)
args := []any{idArray}
```

This would perform better for multiple reasons:
- only one parameter is passed to PostgreSQL, so the parameter-binding overhead is smaller
- in the planning phase the original suggestion takes longer, because the planner has to optimize a long condition with many `OR` operators; with the array, there is no such optimization
Great suggestion! Seems like we can do this when the types are well known (scalar types) and fall back to `IN` for the corner cases.
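The "`ANY` for known scalar types, `IN` otherwise" idea could be sketched like this. All names here (`buildDelete`, `scalarArrayTypes`) are assumptions for illustration, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// scalarArrayTypes lists id column types for which the single-parameter
// ANY(array) form is considered safe; the real set would come from the
// schema info the writer already caches.
var scalarArrayTypes = map[string]bool{
	"bigint": true, "integer": true, "uuid": true, "text": true,
}

// buildDelete prefers DELETE ... WHERE id = ANY($1::type[]) for well-known
// scalar types (one bound parameter, no long OR chain for the planner) and
// falls back to IN ($1, $2, ...) for corner cases. Identifiers are assumed
// already quoted by the caller for brevity.
func buildDelete(table, idCol, idType string, ids []any) (string, []any) {
	if scalarArrayTypes[idType] {
		sql := fmt.Sprintf("DELETE FROM %s WHERE %s = ANY($1::%s[])",
			table, idCol, idType)
		return sql, []any{ids} // the whole slice is a single parameter
	}
	ph := make([]string, len(ids))
	for i := range ids {
		ph[i] = fmt.Sprintf("$%d", i+1)
	}
	sql := fmt.Sprintf("DELETE FROM %s WHERE %s IN (%s)",
		table, idCol, strings.Join(ph, ", "))
	return sql, ids
}

func main() {
	sql, args := buildDelete("t", "id", "bigint", []any{int64(1), int64(2)})
	fmt.Println(sql, len(args)) // ANY form, 1 bound parameter

	sql, args = buildDelete("t", "id", "custom_enum", []any{1, 2})
	fmt.Println(sql, len(args)) // IN fallback, 2 bound parameters
}
```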
Trying something new: this adds only the plan for solving #769, so we can discuss it before implementing.
Putting it in Draft mode because this is only meant for discussion.