cf 5556 #16

gburd · 2025-10-24T18:01:53Z

No description provided.

This commit refactors the interaction between heap_tuple_update(), heap_update(), and simple_heap_update() to improve code organization and flexibility. The changes are functionally equivalent to the previous implementation and have no performance impact. The primary motivation is to prepare for upcoming modifications to how and where modified attributes are identified during the update path, particularly for catalog updates. As part of this reorganization, the handling of replica identity key attributes has been adjusted. Instead of fetching a second copy of the bitmap during an update operation, the caller is now required to provide it. This change applies to both heap_update() and heap_delete(). No user-visible changes.

Refactor the executor update logic so that it determines which indexed columns actually changed during an UPDATE, rather than leaving this to HeapDetermineColumnsInfo() in heap_update(). This lets the comparison happen without holding a lock on the page and opens the door to reuse in other code paths.

Because heap_update() now requires the caller to provide the modified indexed columns, simple_heap_update() has become a bit more complex. It is frequently called from CatalogTupleUpdate(), which updates heap tuples either via their form or using heap_modify_tuple(). In both cases the caller knows the modified set of attributes, but that information is lost before it reaches simple_heap_update(). As a result, the "simple" path has to retain the old HeapDetermineColumnsInfo() logic for now. To make that work, the overly large heap_update() had to be split up, moving part of what used to live in heap_update() up into simple_heap_update() and heap_tuple_update(). Ideally this will be cleaned up once all CatalogTupleUpdate() paths record modified attributes correctly; at that point the "simple" path can be simplified again.

ExecCheckIndexedAttrsForChanges() replaces HeapDetermineColumnsInfo(), and tts_attr_equal() replaces heap_attr_equal(), changing the equality test when calling into heap_tuple_update() (but not simple_heap_update()). Previously we used datumIsEqual(), essentially a binary comparison using memcmp(). Now tts_attr_equal() uses a type-specific equality function when one is available and falls back to datumIsEqual() when not. This change in equality testing is intentional and opens the door to more HOT updates (foreshadowing). For instance, an index with collation information can allow more HOT updates when it is specified to be case insensitive.
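To make the equality change concrete, here is a minimal standalone sketch of "use a type-specific equality function when available, otherwise fall back to a binary comparison". It is a toy, not code from this patch: `eq_fn`, `ci_text_eq`, `binary_eq`, and `attr_equal` are hypothetical names, plain C strings stand in for Datums, and the case-insensitive comparison handles ASCII only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a type-specific equality function. */
typedef bool (*eq_fn) (const char *a, size_t alen, const char *b, size_t blen);

/* Case-insensitive equality, imitating a case-insensitive collation (ASCII only). */
bool
ci_text_eq(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return false;
    for (size_t i = 0; i < alen; i++)
    {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return false;
    }
    return true;
}

/* Binary equality: same length and same bytes, in the spirit of datumIsEqual(). */
bool
binary_eq(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* Use the type-specific function when one is supplied, else fall back to binary. */
bool
attr_equal(eq_fn type_eq, const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t alen = strlen(a);
    size_t blen = strlen(b);

    if (type_eq != NULL)
        return type_eq(a, alen, b, blen);
    return binary_eq(a, alen, b, blen);
}
```

With the binary fallback, "Apple" and "APPLE" differ, so the attribute counts as modified. With the case-insensitive function they compare equal, which is the situation in which an update could remain HOT.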
This change forced some logic changes in execReplication: the update paths are now required to know the set of attributes that are both changed and referenced by indexes. Luckily, this is available within calls to slot_modify_data(), where LogicalRepTupleData is processed and carries a set of updated attributes. In this case, rather than using ExecCheckIndexedAttrsForChanges(), we can preserve what slot_modify_data() identifies as the modified set, intersect it with the set of indexed attributes on the relation, and get the correct set of modified indexed attributes required by heap_update().
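The intersection step can be sketched as follows. This is an illustration only: a 64-bit mask stands in for a PostgreSQL Bitmapset, and `attrset`, `attrset_add`, `attrset_is_member`, and `modified_indexed_attrs` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy attribute set: bit (attno - 1) is set when attribute attno is a member.
 * A simplified stand-in for a Bitmapset; supports attribute numbers 1..64 only.
 */
typedef uint64_t attrset;

/* Return a copy of 'set' with attribute 'attno' added. */
attrset
attrset_add(attrset set, int attno)
{
    assert(attno >= 1 && attno <= 64);
    return set | (UINT64_C(1) << (attno - 1));
}

/* Report whether attribute 'attno' is in 'set'. */
bool
attrset_is_member(attrset set, int attno)
{
    assert(attno >= 1 && attno <= 64);
    return (set & (UINT64_C(1) << (attno - 1))) != 0;
}

/*
 * Modified indexed attributes: attributes that are both in the modified set
 * (as reported by the replication stream) and referenced by some index.
 */
attrset
modified_indexed_attrs(attrset modified, attrset indexed)
{
    return modified & indexed;
}
```

For example, if attributes 2 and 5 were modified and attributes 1 and 5 are indexed, only attribute 5 ends up in the result. An empty result means no indexed attribute changed.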

In execIndexing, on updates we'd like to pass a hint to the indexing code when the indexed attributes are unchanged. This commit replaces the now-redundant code in index_unchanged_by_update() with the same information computed earlier in the update path.

Currently, PostgreSQL conservatively prevents HOT (Heap-Only Tuple) updates whenever any indexed column changes, even if the indexed portion of that column remains identical. This is overly restrictive for expression indexes (where f(column) might not change even when column changes) and partial indexes (where both old and new tuples might fall outside the predicate). Finally, index AMs play no role in deciding when they need a new index entry on update; the rules for that are based on binary equality and the heap's model for MVCC and the related HOT optimization. Here we open that door a bit to enable more nuanced control over the process. This lets index AMs that require binary equality (as is the case for nbtree) keep it without disallowing type-specific equality checking for other indexes.

This patch introduces several improvements to enable HOT updates in these cases:

- Add an amcomparedatums() callback to IndexAmRoutine. This allows index access methods like GIN to provide custom logic for comparing datums by extracting and comparing index keys rather than comparing the raw datums. GIN indexes now implement gincomparedatums(), which extracts keys from both datums and compares the resulting key sets. As mentioned earlier, nbtree also implements this API and uses datumIsEqual() for equality, so that the manner in which it deduplicates TIDs on page split doesn't have to change. This is not a required API; when it is not implemented, the executor compares TupleTableSlot datums for equality using type-specific operators and takes collation into account, so that an update from "Apple" to "APPLE" on a case-insensitive index can now be HOT.

- ExecWhichIndexesRequireUpdates() is rewritten to find the set of modified indexed attributes that trigger new index tuples on update. For partial indexes, it checks whether both old and new tuples satisfy or fail the predicate. For expression indexes, it uses type-specific equality operators to compare computed values. For extraction-based indexes (GIN/RUM) that implement amcomparedatums(), it uses that callback.

Importantly, table access methods can still signal using TU_Update whether all, none, or only summarizing indexes should be updated. While the executor layer now owns determining what has changed due to an update and is interested in updating only the minimum number of indexes possible, the table AM can override that while performing table_tuple_update(), which is what heap does. While this signal is very specific to how the heap implements MVCC and its HOT optimization, we'll leave replacing it for another day.

This optimization trades some new overhead for the potential for more updates to use the HOT-optimized path and avoid index and heap bloat. It should significantly improve update performance for tables with expression indexes, partial indexes, and GIN/GiST indexes on complex data types like JSONB and tsvector, while maintaining correct index semantics. The minimal additional overhead of type-specific equality checking should be washed out by the benefit of updating indexes fewer times. One notable trade-off is that there are more calls to FormIndexDatum() as a result. Caching these might reduce some of that overhead, but not all. This led to a change in how frequently expressions in the spec update test output notice messages, but it does not impact correctness.
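The per-index decision described above can be illustrated with a small standalone model. Everything here is hypothetical and simplified: `ToyIndex`, `index_requires_update`, and the helper functions are invented for illustration, integers stand in for tuples and index keys, and treating a change in predicate membership as "requires update" is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an index; integers stand in for tuples and index keys. */
typedef struct ToyIndex
{
    bool (*predicate) (int value);          /* NULL unless a partial index */
    int  (*expression) (int value);         /* NULL for a plain column index */
    bool (*am_compare) (int old_key, int new_key);  /* NULL if the AM has no callback */
} ToyIndex;

/* Example helpers: a predicate, an expression, and an AM-style comparison. */
bool is_positive(int v) { return v > 0; }
int  tens_bucket(int v) { return v / 10; }
bool same_low_byte(int a, int b) { return (a & 0xFF) == (b & 0xFF); }

/* Decide whether the index needs a new entry for an old -> new value change. */
bool
index_requires_update(const ToyIndex *idx, int old_val, int new_val)
{
    assert(idx != NULL);

    if (idx->predicate != NULL)
    {
        bool old_in = idx->predicate(old_val);
        bool new_in = idx->predicate(new_val);

        if (!old_in && !new_in)
            return false;       /* neither version is covered by the index */
        if (old_in != new_in)
            return true;        /* assumption: membership change needs an update */
    }

    /* Compare what the index actually stores, not the raw column value. */
    int old_key = idx->expression ? idx->expression(old_val) : old_val;
    int new_key = idx->expression ? idx->expression(new_val) : new_val;

    if (idx->am_compare != NULL)
        return !idx->am_compare(old_key, new_key);
    return old_key != new_key;
}

/* Sample indexes used to exercise the function. */
const ToyIndex expr_index = {NULL, tens_bucket, NULL};
const ToyIndex partial_index = {is_positive, NULL, NULL};
const ToyIndex am_index = {NULL, NULL, same_low_byte};
```

In this model, changing 41 to 47 leaves `expr_index` untouched because both values fall in the same bucket, and changing -3 to -8 leaves `partial_index` untouched because neither value satisfies the predicate.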

gburd force-pushed the cf-5556 branch 7 times, most recently from e777a6e to 650f621 Compare November 1, 2025 17:22

gburd force-pushed the cf-5556 branch 6 times, most recently from 16e0007 to 331cd76 Compare November 7, 2025 20:56

gburd force-pushed the cf-5556 branch 4 times, most recently from 9558f42 to 05c4e60 Compare November 16, 2025 18:53

gburd force-pushed the cf-5556 branch 4 times, most recently from ae8af13 to 9f584af Compare November 19, 2025 18:18

gburd force-pushed the cf-5556 branch from 9f584af to b142c27 Compare November 26, 2025 20:22

gburd added 5 commits December 1, 2025 11:22

dev setup v16

a48896d

gburd force-pushed the cf-5556 branch from b142c27 to 94e88c7 Compare December 1, 2025 18:10


2 participants
