Skip to content

Adbdev 2849 v3#464

Open
HustonMmmavr wants to merge 2 commits intoadb-6.xfrom
ADBDEV-2849-v3
Open

Adbdev 2849 v3#464
HustonMmmavr wants to merge 2 commits intoadb-6.xfrom
ADBDEV-2849-v3

Conversation

@HustonMmmavr
Copy link

@HustonMmmavr HustonMmmavr commented Jan 26, 2023

Problem description

Currently, ExecDML() passes NULL value to parameter epqstate in ExecDelete() call marking it as DestReceiver. As a consequence, the EPQ functionality is not possible for heap tables in case when DML node (instead of postgres-specific ModifyTable) is planned.
The DestReceiver name was used in interface of ExecDelete() in gpdb-5x releases therefore we might conclude that preparing gpdb 6 the EPQ feature was not appropriately ported.

For testing we might use the following case (work when global deadlock detector is enabled):

-- Create table with one row
create table test as select 1 as i distributed randomly;

-- Start update transaction
1: begin;
1: set optimizer to off;
1: update test set i = i where i = 1;

-- Second session starts deletion of target row
-- it waits for result of update
2&: delete from test where i = 1;

-- Commit the first transaction;
-- Deletion is performed on second version of row
-- As this version is checked on predicate `i = 1` using EPQ routine we end up with:
-- ```ERROR:  could not serialize access due to concurrent update
-- HINT:  Use PostgreSQL Planner instead of Optimizer for this query via optimizer=off GUC setting```
-- Correct behavior have to successfully remove the row
1: commit;

This patch adds support of EPQ mechanism for the orca planner (in the second commit) for DML operations: the DML and DMLState objects were extended with epqParam and epqstate fields respectively. Also there was added the fix to orca (in the first commit) that makes target relation of DML operation and main scanning relation to refer to the same RTE - this is a backport from 7X. The last problem reveals in the following scenario

create table test as select 0 as i distributed randomly;

-- Start update transaction
1: begin;
1: set optimizer = off;
1: update test set i = i + 1;

-- Second session tries to delete modified row and
-- it waits for result of update
2&: delete from test where i = 0;
1: end;
-- as a result updated row was deleted - this is
-- not vaild behaviour (there must be no deletion)
DELETE 1

Here is a plan of delete query, there is a problem: DML and SEQSCAN node has different scanrelid (1 and 2 respectively), but they should point to the same RTE (like at pg planner).

{PLANNEDSTMT 
    ...
   :planTree 
      {DML 
      :scanrelid 1 
      ...
      :lefttree 
         {RESULT 
         :plan_node_id 1 
         ...
         :lefttree 
            {SEQSCAN 
            :plan_node_id 2
            ... 
            :scanrelid 2
            }
        ...
         }
      ...
      }
   :rtable (
      {RTE 
      :alias <> 
      :eref 
         {ALIAS 
         :aliasname test 
         :colnames (""i"")
         }
      :rtekind 0 
      :relid 16387 
      :requiredPerms 8 
      ...
      }
      {RTE 
      :alias <> 
      :eref 
         {ALIAS 
         :aliasname test 
         :colnames (""i"")
         }
      :rtekind 0 
      :relid 16387 
      :requiredPerms 2 
      ...
      }
   )
   :resultRelations (i 1)
   :relationOids (o 16387 16387 16387)
   ...
   }

EPQ mechanism through function call chain calls ExecScanFetch.
Due to incorrect scanrelid at SEQSCAN node next branch if (estate->es_epqTupleSet[scanrelid - 1]) won't be executed. Thus instead of working with slot from es_epqTuple cache - slot with an old row is returned from access method routine - (*accessMtd) (node) (which uses snapshot for delete operation). As a result EvalPlanQual returns not null slot with an old version of row to ExecDelete, but tupleid points to the updated version and deletion is performed

static inline TupleTableSlot *
ExecScanFetch(ScanState *node,
			  ExecScanAccessMtd accessMtd,
			  ExecScanRecheckMtd recheckMtd)
{
	EState	   *estate = node->ps.state;
	if (estate->es_epqTuple != NULL)
	{
		//...
		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
		//...
		if (estate->es_epqTupleSet[scanrelid - 1])
		{
			TupleTableSlot *slot = node->ss_ScanTupleSlot;

			/* Return empty slot if we already returned a tuple */
			if (estate->es_epqScanDone[scanrelid - 1])
				return ExecClearTuple(slot);
			//...
			/* Store test tuple in the plan node's scan slot */
			ExecStoreHeapTuple(estate->es_epqTuple[scanrelid - 1],
						   slot, InvalidBuffer, false);
			//...
			return slot;
		}
	}

	/*
	 * Run the node-type-specific access method function to get the next tuple
	 */
	return (*accessMtd) (node);
}

At the initial Query structure (parsed and analyzed query) we have right links to RTE's, so to fix the problem with scanrelid - we want to transfer this information to the initial algebrized plan tree of orca. So we assign the queryid to each query structure (for subqueruies queryid assigned too) and, also, we assign queryid to table descriptors CDXLTableDescr which explicitly refers to target relation of dml. After orca planning transformations this value exposes to which target relation of DML operation is referred the current table descriptor. And this allows to correctly assign RTE indexes in result plan under transformation from DXL representation.

Forced changes

There are three groups of tests which was fixed after patch changes

Tests on locks (with one relation)

  • src/test/regress/expected/ao_locks_optimizer.out (1, 2)
  • src/test/isolation2/output/uao/parallel_delete_optimizer.source (1)
  • src/test/isolation2/output/uao/parallel_update_optimizer.source )(1)

At these tests DML operation is performed on the one table. Due to fixes of this patch, DML operations at these tests has only one (single) RTE - target relation, unlike it was before (scanrelid of DML and *SCAN plan nodes were different and plan contained two rtables, but there should be only one). So call ExecOpenScanRelation (on target relation) now doesn't set excess AccessShareLock, and relations still have more privileged Exclusive lock.

Tests with explain (with one relation)

  • src/test/regress/expected/updatable_views_optimizer.out (1, 2)
  • src/test/regress/expected/update_optimizer.out (1)
  • src/test/isolation2/expected/modify_table_data_corrupt_optimizer.out (1)

Like at previous test group, DML operations at these tests performed on the one table - target relation, so now plan has correct scanrelid at DML and *SCAN nodes, and only one RTE contains at rtable list. Prefixes to column names at queries, that uses one table is not shown, like it is in postgres. Here is some links to code, related to showing prefix (setting flag to show prefix, and function get_variable, where prefix may be appended).

Tests on partition locking

  • src/test/regress/expected/partition_locking_optimizer.out (1)

Like at previous test group, DML operations at this test performed on the one table - target relation. Thus index_open opens index with NoLock.

Tests with explain (used more than one relation)

  • src/test/regress/expected/bfv_dml_optimizer.out (1)
  • src/test/regress/expected/updatable_views_optimizer.out (1)

This test group was changed due to fixes of scanrelid too. scanrelid for DML and associated to it *SCAN node points to the same rte (and rtable list doesn't contain duplicate of dml target relation). Thus ExplainPrintPlan now calculates list of relation names for explain a bit different (in comparsion to what was before) (calculation is based on rels_used bitmap - now it calculates correctly - there is no unused relation in a plan), so position of table names with suffixes changed due to fixes of scanrelid's.

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

@HustonMmmavr HustonMmmavr force-pushed the ADBDEV-2849-v3 branch 2 times, most recently from 15a6f77 to 44246a8 Compare January 27, 2023 01:03
@HustonMmmavr HustonMmmavr marked this pull request as ready for review January 27, 2023 06:19
**Issue**
Currently EPQ mechanism doesn't work correctly in ORCA.

**Example**
```sql
set gp_enable_global_deadlock_detector = on
create table test as select 0 as i distributed randomly;

-- Start transaction
-- First session increments the value from 0 to 1
1: begin;
1: update test set i = i + 1;

-- Second session attempts to delete the old value 0
2: delete from test where i = 0;
1: end;

-- The updated row ends up being deleted
-- This is not the expected behavior
```

**Root cause**
As shown in the following plan tree, `MODIFYTABLE->resultRelations` indicates rte 1 is the result relation. However, `SEQSCAN->scanrelid` points to rte 2. This causes the wrong slot (in this case the cache slot with the old value) being handed for deletion operation. Whereas the rte links are correct in the query parsing stage (parse tree), the information isn't transferred to the algebrized query (logical expression).

```
{PLANNEDSTMT
    ...
   :planTree
      {MODIFYTABLE
      ...
      :resultRelations (i 1)
      ...
      :plans (
         {SEQSCAN
         ...
         :plan_node_id 1
         ...
         :scanrelid 2
         ...
         }
      )
      ...
      }
   :rtable (
      {RTE
      :alias <>
      :eref
         {ALIAS
         :aliasname test
         :colnames (""i"")
         }
      :relid 16548
      :requiredPerms 8
      ...
      }
      {RTE
      :alias <>
      :eref
         {ALIAS
         :aliasname test
         :colnames (""i"")
         }
      :relid 16548
      :requiredPerms 2
      ...
      }
   )
   :resultRelations (i 1)
   :relationOids (o 16548 16548 16548)
   ...
   }
```

**Solution**
We assign a query id to each query structure, including subqueries. The query id would be attached to the table descriptors, and is used to direct us to the target relation of DML. The idea is to maintain a `query id --> rte index` map. A new rte would be added only if it's not already in the map. An old rte would be reused, potentially having its permission (SELECT/INSERT/UPDATE/DELETE) updated. Multiple query id's are allowed to be directed to the same rte. With this patch, the rte's in the plan tree are deduplicated.
Propagate epqState, to use eqp mechanism, like at it's done at
pg planner. Also add epqParam to `DML` node, like it's done at
`ModifyTable`, and fill it like it's done at pg planner:
`make_modifytable(..., SS_assign_special_param(root))`
where SS_assign_special_param post-increments root->glob->nParamExec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant