This repository was archived by the owner on Oct 2, 2025. It is now read-only.
ADBDEV-8282: Fix inefficient sql when restoring statistics #133
Merged
KnightMurloc merged 3 commits into master on Sep 22, 2025
This patch fixed inefficient SQL when restoring statistics by replacing the IN operator with =. To fix existing backups, nested loop join was enabled when restoring statistics, but only for a certain range of patchsets. However, the binaries supplied to customers do not carry a patchset number, so this optimization was never activated. The patch has therefore been reworked in the following commits. This reverts commit bbbd801.
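To illustrate why a patchset-based check can silently stay off, here is a minimal Python sketch. It is not the gpbackup code: the `_arenadata<N>` suffix pattern is modelled on the version string `1.30.5_arenadata16` mentioned in this PR, and the function names and the range bounds are hypothetical.

```python
import re
from typing import Optional

# Assumption: the patchset number is the numeric suffix after "_arenadata",
# as in "1.30.5_arenadata16". The real parsing logic is not shown in this PR.
_PATCHSET_RE = re.compile(r"_arenadata(\d+)")


def patchset_number(version: str) -> Optional[int]:
    """Return the patchset number, or None if the version string has none."""
    match = _PATCHSET_RE.search(version)
    return int(match.group(1)) if match else None


def nestloop_enabled_by_version(version: str, first: int, last: int) -> bool:
    """Hypothetical range check: enable the workaround only for patchsets
    in [first, last]. A version without a patchset never qualifies."""
    patchset = patchset_number(version)
    return patchset is not None and first <= patchset <= last
```

With a version string that has no patchset suffix, `patchset_number` returns `None`, so the range check is false regardless of the bounds. That matches the failure mode described above.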
Statistics are restored in three stages. In the first stage, we update reltuples in pg_class. In the second stage, we delete the statistics for a specific attribute from pg_statistic. In the last stage, we insert new statistics for this attribute. In the second stage, the attribute number is looked up in pg_attribute using a subquery, which was previously combined with the IN operator. This led to an inefficient plan containing a seq scan on pg_statistic, which can significantly slow down statistics restore. Fix this by replacing the IN operator with = in the attribute statistics deletion query. We can be sure that the subquery returns at most one row, since pg_attribute has a unique index on the attrelid and attname attributes.
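The difference between the two forms of the deletion query can be sketched as follows. This is an illustration only: the exact SQL text that gpbackup writes into a backup is not shown in this PR, so the statement layout, the `::regclass::oid` cast, and the helper names below are assumptions.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def _attnum_subquery(relation: str, attname: str) -> str:
    # Looks up the attribute number by (attrelid, attname), the pair that is
    # covered by a unique index on pg_attribute.
    return (
        "SELECT attnum FROM pg_attribute "
        f"WHERE attrelid = {quote_literal(relation)}::regclass::oid "
        f"AND attname = {quote_literal(attname)}"
    )


def delete_stats_query_in(relation: str, attname: str) -> str:
    """Old form (assumed shape): subquery combined with IN."""
    return (
        "DELETE FROM pg_statistic "
        f"WHERE starelid = {quote_literal(relation)}::regclass::oid "
        f"AND staattnum IN ({_attnum_subquery(relation, attname)});"
    )


def delete_stats_query_eq(relation: str, attname: str) -> str:
    """New form (assumed shape): scalar subquery compared with =."""
    return (
        "DELETE FROM pg_statistic "
        f"WHERE starelid = {quote_literal(relation)}::regclass::oid "
        f"AND staattnum = ({_attnum_subquery(relation, attname)});"
    )
```

The `=` form is only safe because the subquery returns at most one row. A scalar subquery that returned several rows would fail at run time.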
The third commit's description contains an invalid commit hash. I assume this is supposed to be the second commit's hash; however, since the PR will be rebased, the commit hashes will change anyway. I suggest replacing the hash with "previous commit" instead. Other than that, the patch seems good and doesn't have any performance degradation compared to the previous version.
Statistics backups created by gpbackup versions starting from 1.30.5_arenadata16 contain inefficient SQL statements for deleting existing statistics. For new backups, this issue has been fixed in the previous commit. This patch fixes the issue for existing backups by enabling nested loop join, which should lead to a more efficient plan with an index scan instead of a seq scan.
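One way to picture the workaround for existing backups is to turn on the planner setting around the statistics statements. The sketch below assumes the standard `enable_nestloop` setting is used and that it is switched per batch with `SET`/`RESET`. How gprestore actually applies it (per session, per statement, or otherwise) is not shown in this PR, and the function name is hypothetical.

```python
from typing import List


def with_nested_loop(statements: List[str]) -> List[str]:
    """Wrap statistics statements so they run with nested loop join enabled.

    Assumption: enabling the join method via the enable_nestloop setting is
    enough for the planner to choose an index scan for the old IN-based query.
    """
    if not statements:
        return []
    return ["SET enable_nestloop = on;", *statements, "RESET enable_nestloop;"]
```

Because the setting is applied unconditionally to statistics statements, this approach does not depend on knowing which gpbackup version produced the backup.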
c02f0a7 to e2ed65e
silent-observer approved these changes Sep 19, 2025
In general it looks good, performance remains the same, but:
dkovalev1 approved these changes Sep 22, 2025
Commit bbbd801 fixed inefficient SQL for restoring statistics. For new backups, this was done by replacing the IN operator with = in queries that delete existing statistics. For existing backups, this was done by enabling nested loop join for the specific range of gpbackup versions whose backups contain the inefficient SQL. The version was checked by its patchset number. However, gpbackup binaries shipped to customers do not contain the patchset number, which prevents this optimization from being activated.
This commit was reverted and split into two.
The first commit fixes the SQL for new backups by replacing the IN operator with = in queries to delete statistics.
The second commit fixes the issue for existing backups by enabling nested loop join in gprestore.
Do not squash.