**Issue**
Currently, the EPQ (EvalPlanQual) mechanism doesn't work correctly in ORCA.
**Example**
```sql
set gp_enable_global_deadlock_detector = on;
create table test as select 0 as i distributed randomly;
-- Start transaction
-- First session increments the value from 0 to 1
1: begin;
1: update test set i = i + 1;
-- Second session attempts to delete the old value 0
2: delete from test where i = 0;
1: end;
-- The updated row ends up being deleted
-- This is not the expected behavior
```
**Root cause**
As shown in the following plan tree, `MODIFYTABLE->resultRelations` indicates that rte 1 is the result relation. However, `SEQSCAN->scanrelid` points to rte 2. This causes the wrong slot (in this case the cache slot with the old value) to be handed to the delete operation. The rte links are correct at the query parsing stage (parse tree), but the information isn't transferred to the algebrized query (logical expression).
```
{PLANNEDSTMT
...
:planTree
{MODIFYTABLE
...
:resultRelations (i 1)
...
:plans (
{SEQSCAN
...
:plan_node_id 1
...
:scanrelid 2
...
}
)
...
}
:rtable (
{RTE
:alias <>
:eref
{ALIAS
:aliasname test
:colnames ("i")
}
:relid 16548
:requiredPerms 8
...
}
{RTE
:alias <>
:eref
{ALIAS
:aliasname test
:colnames ("i")
}
:relid 16548
:requiredPerms 2
...
}
)
:resultRelations (i 1)
:relationOids (o 16548 16548 16548)
...
}
```
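The effect of the mismatched `scanrelid` can be illustrated with a small Python model. This is only a sketch, not the executor's C code: `exec_scan_fetch`, `snapshot_scan`, and `delete_qual` are hypothetical names, and the lists stand in for the per-rte EPQ tuple cache, assuming the EPQ tuple is stored under the result relation (rte 1).

```python
def exec_scan_fetch(scanrelid, epq_tuple_set, epq_tuple, access_method):
    """Toy model of the EPQ branch of a scan node (hypothetical helper)."""
    if epq_tuple_set[scanrelid - 1]:
        # EPQ has substituted a re-fetched tuple for this range-table entry.
        return epq_tuple[scanrelid - 1]
    # Otherwise fall through to the regular scan, which sees the old snapshot.
    return access_method()


# Assumption: EPQ stores the updated row under the result relation, rte 1.
epq_tuple_set = [True, False]
epq_tuple = [{"i": 1}, None]


def snapshot_scan():
    # The row version visible to the DELETE's snapshot (before the UPDATE).
    return {"i": 0}


def delete_qual(row):
    # The qual of `delete from test where i = 0`.
    return row["i"] == 0


# Scan pointing at rte 2 (the buggy plan): the EPQ tuple is never consulted.
buggy = exec_scan_fetch(2, epq_tuple_set, epq_tuple, snapshot_scan)
# Scan pointing at rte 1 (same rte as the result relation): EPQ tuple is used.
fixed = exec_scan_fetch(1, epq_tuple_set, epq_tuple, snapshot_scan)

print(delete_qual(buggy))  # old row still matches i = 0, so the delete proceeds
print(delete_qual(fixed))  # updated row no longer matches, so nothing is deleted
```

In this model, the scan that points at rte 2 re-checks the qual against the old row, which is why the updated row ends up deleted in the example above.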
**Solution**
We assign a query id to each query structure, including subqueries. The query id is attached to the table descriptors and directs us to the target relation of the DML. The idea is to maintain a `query id --> rte index` map. A new rte is added only if it's not already in the map. An existing rte is reused, potentially having its permissions (SELECT/INSERT/UPDATE/DELETE) updated. Multiple query ids may be directed to the same rte. With this patch, the rte's in the plan tree are deduplicated.
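The map described above can be sketched in Python as follows. This is a simplified model rather than the ORCA translator code: the `RangeTable` class and its `rte_index` method are hypothetical. It also assumes that permissions are merged with a bitwise OR and that a descriptor without a query id always gets a fresh rte. The permission values 2 and 8 are the `requiredPerms` values from the plan dump (`ACL_SELECT` and `ACL_DELETE`).

```python
from dataclasses import dataclass, field

ACL_SELECT = 2  # requiredPerms of the scan rte in the plan dump
ACL_DELETE = 8  # requiredPerms of the result-relation rte in the plan dump


@dataclass
class RangeTable:
    """Hypothetical model of an rtable with a query id -> rte index map."""
    entries: list = field(default_factory=list)      # rtes: {"relid", "perms"}
    by_query_id: dict = field(default_factory=dict)  # query id -> 1-based index

    def rte_index(self, query_id, relid, perms):
        idx = None if query_id is None else self.by_query_id.get(query_id)
        if idx is None:
            # Not in the map yet (or no query id at all): add a new rte.
            self.entries.append({"relid": relid, "perms": perms})
            idx = len(self.entries)
            if query_id is not None:
                self.by_query_id[query_id] = idx
        else:
            # Reuse the existing rte; assumption: permissions are OR-ed in.
            self.entries[idx - 1]["perms"] |= perms
        return idx


rtable = RangeTable()
# The DML target and the scan of the same table carry the same query id.
dml_idx = rtable.rte_index(query_id=1, relid=16548, perms=ACL_DELETE)
scan_idx = rtable.rte_index(query_id=1, relid=16548, perms=ACL_SELECT)
print(dml_idx, scan_idx, rtable.entries)
```

With both descriptors resolved through the map, the result relation and `scanrelid` share one index, and the rtable holds a single entry for `test` instead of two.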
Propagate epqState so that the EPQ mechanism can be used, as is done in the pg planner. Also add epqParam to the `DML` node, as is done for `ModifyTable`, and fill it the same way the pg planner does: `make_modifytable(..., SS_assign_special_param(root))`, where `SS_assign_special_param` post-increments `root->glob->nParamExec`.
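The post-increment behaviour mentioned above determines which parameter slot the node receives. A minimal Python sketch, with hypothetical names (`PlannerGlobal`, `assign_special_param`, `make_dml`) standing in for the planner's C structures:

```python
class PlannerGlobal:
    """Hypothetical stand-in for the planner's global state."""

    def __init__(self):
        self.n_param_exec = 0  # number of executor parameter slots so far


def assign_special_param(glob):
    """Models the post-increment: return the current count, then bump it."""
    param_id = glob.n_param_exec
    glob.n_param_exec += 1
    return param_id


def make_dml(glob):
    # The DML node remembers its own slot in epq_param (hypothetical field).
    return {"node": "DML", "epq_param": assign_special_param(glob)}


glob = PlannerGlobal()
first = make_dml(glob)
second = make_dml(glob)
print(first["epq_param"], second["epq_param"], glob.n_param_exec)
```

Each node gets the slot number that was current before the increment, so slots are zero-based and never shared between two nodes.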
**Problem description**

Currently, `ExecDML()` passes a `NULL` value for the `epqstate` parameter in the `ExecDelete()` call, marking it as `DestReceiver`. As a consequence, EPQ functionality is not available for heap tables when a `DML` node (instead of the postgres-specific `ModifyTable`) is planned.

The `DestReceiver` name was used in the interface of `ExecDelete()` in gpdb-5x releases, so we may conclude that the EPQ feature was not ported appropriately when gpdb 6 was prepared. The problem can be tested with a case that requires the global deadlock detector to be enabled.

This patch adds support for the `EPQ` mechanism to the `orca` planner (in the second commit) for `DML` operations: the `DML` and `DMLState` objects were extended with the `epqParam` and `epqstate` fields respectively. It also adds a fix to `orca` (in the first commit) that makes the target relation of a `DML` operation and the main scanned relation refer to the same `RTE`; this is a backport from 7X.

The latter problem shows up in the plan of a `delete` query: the `DML` and `SEQSCAN` nodes have different `scanrelid` values (1 and 2 respectively), but they should point to the same RTE (as with the pg planner).

The EPQ mechanism reaches `ExecScanFetch` through a chain of function calls. Because of the incorrect `scanrelid` at the `SEQSCAN` node, the branch `if (estate->es_epqTupleSet[scanrelid - 1])` is not executed. So instead of the slot from the `es_epqTuple` cache, the slot with the old row is returned from the access method routine `(*accessMtd) (node)` (which uses the snapshot of the delete operation). As a result, `EvalPlanQual` returns a non-null slot with the old version of the row to `ExecDelete`, while tupleid points to the updated version, and the deletion is performed.

The initial Query structure (the parsed and analyzed query) has the right links to the `RTE`s, so to fix the problem with `scanrelid` we want to transfer this information to the initial algebrized plan tree of orca. We assign a `queryid` to each query structure (subqueries get a `queryid` too), and we also assign the `queryid` to the table descriptors (`CDXLTableDescr`) that explicitly refer to the target relation of the DML. After the orca planning transformations, this value shows which `DML` target relation the current table descriptor refers to. This makes it possible to assign `RTE` indexes correctly in the resulting plan during the translation from the `DXL` representation.

**Forced changes**
There are four groups of tests that were fixed after the patch changes.

**Tests on locks (with one relation)**

In these tests the DML operation is performed on a single table. With the fixes in this patch, the DML operations in these tests have only one RTE, the target relation. Previously the `scanrelid` of the DML and `*SCAN` plan nodes differed and the plan contained two rtable entries where there should be only one. So the call to `ExecOpenScanRelation` (on the target relation) no longer takes an excess `AccessShareLock`, and the relations still hold the more privileged `Exclusive` lock.

**Tests with explain (with one relation)**

As in the previous test group, the `DML` operations in these tests are performed on a single table, the target relation, so the plan now has the correct `scanrelid` at the `DML` and `*SCAN` nodes, and the `rtable` list contains only one `RTE`. Prefixes on column names are not shown for queries that use a single table, as in postgres. The relevant code is where the flag to show the prefix is set, and the function `get_variable`, where the prefix may be appended.

**Tests on partition locking**

As in the previous test group, the `DML` operations in this test are performed on a single table, the target relation. Thus `index_open` opens the index with `NoLock`.

**Tests with explain (used more than one relation)**

This test group also changed because of the `scanrelid` fixes. The `scanrelid` of the DML node and of its associated `*SCAN` node now point to the same rte, and the rtable list doesn't contain a duplicate of the DML target relation. As a result, `ExplainPrintPlan` computes the list of relation names for `explain` slightly differently than before. The computation is based on the `rels_used` bitmap, which is now correct because the plan contains no unused relation, so the positions of table names with suffixes changed.