ADBDEV-8282: Fix inefficient sql when restoring statistics by KnightMurloc · Pull Request #133 · arenadata/gpbackup

KnightMurloc · 2025-09-18T12:15:48Z

Commit bbbd801 fixed inefficient SQL for restoring statistics. For new backups, it replaced the IN operator with = in the queries that delete existing statistics. For existing backups, it enabled nested loop join for a specific range of gpbackup versions whose backups contained the inefficient SQL. The version check relied on the patchset number. However, the gpbackup binaries shipped to customers do not contain a patchset number, so the optimization was never activated.
That commit has been reverted and split into two.
The first commit fixes the SQL for new backups by replacing the IN operator with = in the queries that delete statistics.
The second commit fixes the issue for existing backups by enabling nested loop join in gprestore.

Do not squash.

This patch fixed inefficient SQL when restoring statistics by replacing the IN operator with =. To fix existing backups, nested loop join was enabled when restoring statistics, but only for a certain range of patchsets. It turned out that the binaries supplied to customers do not have a patchset number, so this optimization was never activated. Therefore, the patch is reworked in the following commits. This reverts commit bbbd801.

Statistics are restored in three stages. In the first stage, we update reltuples in pg_class. In the second stage, we delete the statistics for a specific attribute from pg_statistic. In the last stage, we insert the new statistics for this attribute. In the second stage, the attribute number is looked up in pg_attribute, which was done with a subquery and the IN operator. This led to an inefficient plan containing a seq scan on pg_statistic, which can significantly slow down statistics restore. Fix this by replacing the IN operator with = in the attribute statistics deletion query. The subquery is guaranteed to return at most one row, since pg_attribute has a unique index on the attrelid and attname columns.
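To make the change concrete, here is a minimal sketch of the second-stage delete statement in both forms. The helper names `quote_literal` and `delete_attribute_stats_sql`, the example relation `public.t`, and the exact statement text are illustrative assumptions, not the statements gpbackup actually writes. Only the catalog columns (`starelid` and `staattnum` in pg_statistic; `attrelid`, `attname`, and `attnum` in pg_attribute) come from the PostgreSQL catalogs.

```python
def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes (simplified)."""
    return "'" + value.replace("'", "''") + "'"


def delete_attribute_stats_sql(relation: str, attname: str, use_equals: bool = True) -> str:
    """Build a DELETE that removes one attribute's statistics from pg_statistic.

    Hypothetical helper: the statement shape is an assumption for illustration.
    """
    rel = quote_literal(relation) + "::regclass::oid"
    # "=" turns the subquery into a scalar subquery; "IN" is the older form.
    operator = "=" if use_equals else "IN"
    return (
        f"DELETE FROM pg_statistic WHERE starelid = {rel} "
        f"AND staattnum {operator} ("
        f"SELECT attnum FROM pg_attribute "
        f"WHERE attrelid = {rel} AND attname = {quote_literal(attname)});"
    )


# Old form (IN) versus new form (=) for a hypothetical table and column.
old_sql = delete_attribute_stats_sql("public.t", "id", use_equals=False)
new_sql = delete_attribute_stats_sql("public.t", "id")
print(old_sql)
print(new_sql)
```

The two statements differ only in the operator. The = form is safe here only because of the unique index on (attrelid, attname) mentioned above: a scalar subquery that returned more than one row would raise an error.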

silent-observer · 2025-09-18T15:00:07Z

The third commit's description contains an invalid commit hash. I assume it is meant to be the second commit's hash; however, since the PR will be rebased, the commit hashes will change anyway. I suggest replacing the hash with "previous commit".

Other than that, the patch looks good and shows no performance degradation compared to the previous version.

Statistics backups created by gpbackup versions starting from 1.30.5_arenadata16 contain inefficient SQL in the statements that delete existing statistics. For new backups, this issue was fixed in the previous commit. This patch fixes the issue for existing backups by enabling nested loop join, which should lead to a more efficient plan with an index scan instead of a seq scan.
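As a rough illustration of the approach, the sketch below wraps a batch of statistics statements so that nested loop joins are enabled only while they run. `enable_nestloop` is a standard PostgreSQL/Greenplum planner setting. The helper name `with_nested_loop` and the use of `SET`/`RESET` around the batch are assumptions, since the commit message does not show how gprestore applies the setting.

```python
from typing import Iterable, List


def with_nested_loop(statements: Iterable[str]) -> List[str]:
    """Wrap statements so nested loop joins are enabled while they run.

    Hypothetical helper; gprestore's real mechanism may differ.
    """
    wrapped = ["SET enable_nestloop = on;"]
    wrapped.extend(statements)
    # RESET restores the session default so later statements are unaffected.
    wrapped.append("RESET enable_nestloop;")
    return wrapped


# Placeholder text stands in for a statement read from an existing backup.
batch = with_nested_loop(["-- statistics statement from an existing backup"])
for statement in batch:
    print(statement)
```

Scoping the setting to the statistics statements keeps the planner change from affecting the rest of the restore session.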

dkovalev1 · 2025-09-22T12:36:07Z

In general it looks good, performance remains the same, but:

* Please format description to 80 columns

* Perhaps by `pathset` you meant `patchset`?

* Do you think all tests under "Restore statistic" are not relevant anymore?

KnightMurloc · 2025-09-22T12:43:29Z

> In general it looks good, performance remains the same, but:
>
> * Please format description to 80 columns
>
> * Perhaps by `pathset` you meant `patchset`?
>
> * Do you think all tests under "Restore statistic" are not relevant anymore?

* The PR will be merged via rebase, so there is no need to wrap its description at 80 characters.

* Fixed.

* Those tests covered enabling nested loop join for specific gpbackup versions. That logic has been removed, so the tests are no longer relevant.

KnightMurloc added 2 commits September 18, 2025 18:37

KnightMurloc force-pushed the ADBDEV-8282 branch from c02f0a7 to e2ed65e Compare September 19, 2025 04:11

silent-observer approved these changes Sep 19, 2025

dkovalev1 approved these changes Sep 22, 2025

KnightMurloc merged commit ab9477f into master Sep 22, 2025
3 checks passed

KnightMurloc deleted the ADBDEV-8282 branch September 22, 2025 12:51

KnightMurloc merged 3 commits into master from ADBDEV-8282
