forked from postgres/postgres
cf 5556 #16
Draft: gburd wants to merge 5 commits into master from cf-5556
This commit refactors the interaction between heap_tuple_update(), heap_update(), and simple_heap_update() to improve code organization and flexibility. The changes are functionally equivalent to the previous implementation and have no performance impact. The primary motivation is to prepare for upcoming modifications to how and where modified attributes are identified during the update path, particularly for catalog updates. As part of this reorganization, the handling of replica identity key attributes has been adjusted. Instead of fetching a second copy of the bitmap during an update operation, the caller is now required to provide it. This change applies to both heap_update() and heap_delete(). No user-visible changes.
Refactor the executor's update logic so that it determines which indexed columns have actually changed during an UPDATE, rather than leaving this to HeapDetermineColumnsInfo in heap_update. This lets the comparison happen without taking a lock on the page and opens the door to reuse in other code paths.
Because heap_update now requires the caller to provide the modified indexed columns, simple_heap_update has become somewhat more complex. It is frequently called from CatalogTupleUpdate, which updates heap tuples either via their form or using heap_modify_tuple. In both cases the caller knows the modified set of attributes, but that information is lost before it reaches simple_heap_update. As a result, the "simple" path has to retain the old HeapDetermineColumnsInfo logic for now. For that to work, the overly large heap_update had to be split up: part of what used to live in heap_update moves up into simple_heap_update and heap_tuple_update. Ideally this will be cleaned up once all CatalogTupleUpdate paths record modified attributes correctly; at that point the "simple" path can be simplified again.
ExecCheckIndexedAttrsForChanges replaces HeapDetermineColumnsInfo and tts_attr_equal replaces heap_attr_equal, which changes the equality test used when calling into heap_tuple_update (but not simple_heap_update). Previously we used datumIsEqual(), essentially a binary comparison using memcmp(). Now tts_attr_equal uses a type-specific equality function when one is available and falls back to datumIsEqual() when not. This change in equality testing has some intended implications and opens the door to more HOT updates. For instance, an index with collation information that makes it case insensitive can now allow more HOT updates.
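The difference between the two equality tests can be illustrated with a small standalone sketch. This is not the patch's code: `binary_equal`, `ci_equal`, and `attr_equal` are hypothetical names, plain C strings stand in for datums, and an ASCII-only case fold stands in for a type-specific equality function under a case-insensitive collation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Binary equality: same length and identical bytes (a memcmp-style test). */
bool
binary_equal(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);

    return la == lb && memcmp(a, b, la) == 0;
}

/*
 * Hypothetical stand-in for a type-specific equality function under a
 * case-insensitive collation. ASCII only, for illustration.
 */
bool
ci_equal(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

typedef bool (*eq_fn) (const char *, const char *);

/*
 * Use the type-specific equality function when one is supplied,
 * otherwise fall back to the binary comparison.
 */
bool
attr_equal(const char *old_val, const char *new_val, eq_fn type_eq)
{
    if (type_eq != NULL)
        return type_eq(old_val, new_val);
    return binary_equal(old_val, new_val);
}
```

With no type-specific function, `attr_equal("Apple", "APPLE", NULL)` reports a change; with `ci_equal` supplied, the same pair compares equal, which is the situation in which an update could stay HOT.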
This change forced some logic changes in execReplication: the update paths are now required to know the set of attributes that are both changed and referenced by indexes. Luckily, this is available within calls to slot_modify_data(), where LogicalRepTupleData is processed and carries a set of updated attributes. In this case, rather than using ExecCheckIndexedAttrsForChanges, we can preserve what slot_modify_data() identifies as the modified set, intersect it with the set of indexed attributes on the relation, and get the correct set of modified indexed attributes required by heap_update().
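The intersection step described above amounts to a set operation on attribute numbers. A minimal sketch, assuming a 64-bit mask as a simplified stand-in for the bitmap type used in the real code; `attr_set`, `attr_set_add`, and `modified_indexed_attrs` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means attribute number n is a member (n must be in 0..63). */
typedef uint64_t attr_set;

/* Return the set with attribute attnum added. */
attr_set
attr_set_add(attr_set s, int attnum)
{
    return s | ((attr_set) 1 << attnum);
}

/*
 * Attributes that were both modified by the update and referenced by
 * at least one index: the intersection of the two sets.
 */
attr_set
modified_indexed_attrs(attr_set modified, attr_set indexed)
{
    return modified & indexed;
}
```

If the update touched attributes 2 and 5 and the indexes reference attributes 1, 2, and 3, the result contains only attribute 2. An empty result means no indexed attribute changed.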
In execIndexing on updates we'd like to pass a hint to the indexing code when the indexed attributes are unchanged. This commit replaces the now redundant code in index_unchanged_by_update with the same information found earlier in the update path.
Currently, PostgreSQL conservatively prevents HOT (Heap-Only Tuple) updates whenever any indexed column changes, even if the indexed portion of that column remains identical. This is overly restrictive for expression indexes (where f(column) might not change even when column changes) and partial indexes (where both old and new tuples might fall outside the predicate). In addition, index AMs play no role in deciding when they need a new index entry on update; the rules are based on binary equality and on the heap's model for MVCC and the related HOT optimization. Here we open that door a bit to enable more nuanced control over the process. This lets index AMs that require binary equality (as is the case for nbtree) keep it without disallowing type-specific equality checking for other indexes.
This patch introduces several improvements to enable HOT updates in these cases:
Add an amcomparedatums() callback to IndexAmRoutine. This allows index access methods like GIN to provide custom logic for comparing datums by extracting and comparing index keys rather than comparing the raw datums. GIN indexes now implement gincomparedatums(), which extracts keys from both datums and compares the resulting key sets. As mentioned earlier, nbtree also implements this API and uses datumIsEqual() for equality, so that the manner in which it deduplicates TIDs on page split doesn't have to change. This is not a required API. When it is not implemented, the executor compares TupleTableSlot datums for equality using type-specific operators and takes collation into account, so that an update from "Apple" to "APPLE" on a case-insensitive index can now be HOT.
ExecWhichIndexesRequireUpdates() is rewritten to find the set of modified indexed attributes that trigger new index tuples on update. For partial indexes, this checks whether both old and new tuples satisfy or fail the predicate. For expression indexes, this uses type-specific equality operators to compare computed values.
For extraction-based indexes (GIN/RUM) that implement amcomparedatums(), it uses that callback.
Importantly, table access methods can still signal via TU_Update whether all, none, or only summarizing indexes should be updated. While the executor layer now owns determining what has changed due to an update and aims to update the minimum number of indexes possible, the table AM can override that while performing table_tuple_update(), which is what heap does. This signal is very specific to how the heap implements MVCC and its HOT optimization, but we'll leave replacing it for another day.
This optimization trades some new overhead for the potential for more updates to use the HOT path and avoid index and heap bloat. This should significantly improve update performance for tables with expression indexes, partial indexes, and GIN/GiST indexes on complex data types like JSONB and tsvector, while maintaining correct index semantics. The minimal additional overhead of type-specific equality checking should be outweighed by the benefit of updating indexes fewer times. One notable trade-off is that there are more calls to FormIndexDatum() as a result. Caching these might reduce some of that overhead, but not all. This led to a change in how often expressions in the spec update test output notice messages, but it does not impact correctness.
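The partial-index check mentioned in this commit message can be sketched as a small decision function. This is an illustration, not the patch's logic: `Row`, `index_predicate`, and `partial_index_needs_update` are hypothetical, the index is imagined as covering `score` with the predicate `WHERE active`, and treating any predicate transition as requiring an index update is a conservative assumption made here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical row with one predicate column and one indexed column. */
typedef struct Row
{
    bool        active;     /* used by the index predicate: WHERE active */
    int         score;      /* the indexed column */
} Row;

/* Hypothetical partial-index predicate. */
bool
index_predicate(const Row *row)
{
    return row->active;
}

/* Decide whether the update needs a new entry in the partial index. */
bool
partial_index_needs_update(const Row *old_row, const Row *new_row)
{
    bool        old_in = index_predicate(old_row);
    bool        new_in = index_predicate(new_row);

    /* Both versions fall outside the predicate: the index is unaffected. */
    if (!old_in && !new_in)
        return false;

    /* Assumption: any transition in or out of the predicate needs an update. */
    if (old_in != new_in)
        return true;

    /* Both versions satisfy the predicate: only the indexed column matters. */
    return old_row->score != new_row->score;
}
```

A row that stays inactive can change `score` freely without touching the index, which is the case the commit message describes as both tuples falling outside the predicate.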
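The idea behind comparing extracted key sets rather than raw datums can be shown with a standalone sketch. It does not reproduce gincomparedatums(): `extract_keys` and `key_sets_equal` are hypothetical, integer arrays stand in for datums, and "extraction" is assumed to mean sorting and de-duplicating the elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints, written to avoid overflow from subtraction. */
int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical key extraction: copy the values into out (which must hold
 * at least n ints), sort them, and drop duplicates. Returns the number
 * of distinct keys.
 */
size_t
extract_keys(const int *vals, size_t n, int *out)
{
    size_t      nkeys = 0;

    if (n == 0)
        return 0;
    memcpy(out, vals, n * sizeof(int));
    qsort(out, n, sizeof(int), cmp_int);
    for (size_t i = 0; i < n; i++)
    {
        if (nkeys == 0 || out[nkeys - 1] != out[i])
            out[nkeys++] = out[i];
    }
    return nkeys;
}

/* Two values are index-equivalent if they produce the same key set. */
bool
key_sets_equal(const int *a, size_t na, const int *b, size_t nb)
{
    int        *ka = malloc((na ? na : 1) * sizeof(int));
    int        *kb = malloc((nb ? nb : 1) * sizeof(int));
    bool        equal = false;  /* on allocation failure, report "changed" */

    if (ka != NULL && kb != NULL)
    {
        size_t      ca = extract_keys(a, na, ka);
        size_t      cb = extract_keys(b, nb, kb);

        equal = (ca == cb) && memcmp(ka, kb, ca * sizeof(int)) == 0;
    }
    free(ka);
    free(kb);
    return equal;
}
```

Under these assumptions, `{1, 2, 2, 3}` and `{3, 2, 1}` differ as raw byte sequences yet yield the same keys, so an index built on the extracted keys would not need new entries for such an update.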