
DO NOT MERGE: CI-3898: Add rules, build and allow-failure tests for adb-8.x#1455

Open
Daniil290620 wants to merge 32 commits into adb-8.x from ADBDEV-7257-ci

Conversation

@Daniil290620

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

higuoxing and others added 27 commits April 2, 2025 19:31
This is the last patch for replacing pygresql with psycopg2 in Greenplum. This patch mainly targets the gpMgmt tools.

Benefits of replacing pygresql with psycopg2:
- Psycopg2 is actively maintained; with pygresql we have encountered bugs that haven't been fixed by the upstream yet, e.g., https://github.com/greenplum-db/gpdb/pull/13953.
- Psycopg2 is provided by Rocky Linux and Ubuntu. That is to say, we don't need to vendor it ourselves.
- Last but not least, we got a chance to clean up legacy code during the removal process, e.g., https://github.com/greenplum-db/gpdb/pull/15983.

After this patch, we need to do the following things.
- Add psycopg2 as a dependency of the rpm/deb package.
- Remove the pygresql source code tarball from the gpdb repo.
- Tidy up READMEs and requirements.txt files.

---------

Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Co-authored-by: Xiaoxiao He <hxiaoxiao@vmware.com>
Co-authored-by: zhrt123 <hzhang2@vmware.com>
Co-authored-by: Piyush Chandwadkar <pchandwadkar@vmware.com>
Co-authored-by: Praveen Kumar <36772398+kpraveen457@users.noreply.github.com>
* Fully support Rocky9 platform for GP7
* Align the format of exttable outfile to Rocky9
* Fix compile_gpdb_clients_windows failure
* Fix gpMgmt test failures to support rocky9/oel9/rhel9
* Fix gpMgmt/bin/gpload_test/gpload2 tests on Rocky9

Authored-by: Ning Wu <ningw@vmware.com>
The PX layer in pgcrypto is handling digest padding on its own uniformly
for all backend implementations. Starting with OpenSSL 3.0.0, DecryptUpdate
doesn't flush the last block in case padding is enabled, so explicitly
disable it as we don't use it.

This will be backpatched to all supported versions once there is sufficient
testing in the buildfarm of OpenSSL 3.

Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/FEF81714-D479-4512-839B-C769D2605F8A@yesql.se
Backpatch-through: 9.6
OpenSSL 3 introduced the concept of providers to support modularization,
and moved the outdated ciphers to the new legacy provider. In case it's
not loaded in the user's openssl.cnf file there will be a lot of regress
test failures, so add alternative outputs covering those.

Also document the need to load the legacy provider in order to use older
ciphers with OpenSSL-enabled pgcrypto.
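
As a minimal illustration (assuming the pgcrypto extension is installed; the plaintext and key here are arbitrary), Blowfish is one of the ciphers moved to the legacy provider:

```
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- 'bf' (Blowfish) lives in OpenSSL 3's legacy provider; without that provider
-- loaded in openssl.cnf, this fails at cipher initialization instead of
-- returning the ciphertext.
SELECT encrypt('secret data'::bytea, 'my key'::bytea, 'bf');
```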

This will be backpatched to all supported versions once there is sufficient
testing in the buildfarm of OpenSSL 3.

Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/FEF81714-D479-4512-839B-C769D2605F8A@yesql.se
Backpatch-through: 9.6
The following routines are called within pgcrypto when handling digests
but there were no checks for failures:
- EVP_MD_CTX_size (can fail with -1 as of 3.0.0)
- EVP_MD_CTX_block_size (can fail with -1 as of 3.0.0)
- EVP_DigestInit_ex
- EVP_DigestUpdate
- EVP_DigestFinal_ex

A set of elog(ERROR) is added by this commit to detect such failures,
that should never happen except in the event of a processing failure
internal to OpenSSL.

Note that it would be possible to use ERR_reason_error_string() to get
more context about such errors, but these refer mainly to the internals
of OpenSSL, so it is not really obvious how useful that would be.  This
is left out for simplicity.

Per report from Coverity.  Thanks to Tom Lane for the discussion.

Backpatch-through: 9.5
This has previously not been a problem (that anyone ever reported),
but in future OpenSSL versions (3.0.0), where legacy ciphers are/can
be disabled, this is the place where this is reported.  So we need to
catch the error here, otherwise the higher-level functions would
return garbage.  The nearby encryption code already handled errors
similarly.

Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/9e9c431c-0adc-7a6d-9b1a-915de1ba3fe7@enterprisedb.com
Backpatch-through: 9.6
When the fault `checkpoint_after_redo_calculated` or
`checkpoint_control_file_updated` is injected in a newly promoted mirror
node, a connection failure occurs due to the fatal log 'mirror is being
promoted'. That is, when the connection state is MIRROR_READY but the
role changes to primary during promotion, the connection is declined to
avoid confusion.

Wait for segment promotion to finish and for connections to be accepted
before injecting the fault.

Co-authored-by: Xing Guo higuoxing@gmail.com
This issue is environment-sensitive, where the SSL tests could fail in
various ways by feeding on defaults provided by sslcert, sslkey,
sslrootkey, sslrootcert, sslcrl and sslcrldir coming from a local setup,
as of ~/.postgresql/ by default.  Horiguchi-san has reported two
failures, but more advanced testing from me (aka inclusion of garbage
SSL configuration in ~/.postgresql/ for all the configuration
parameters) has shown dozens of failures that can be triggered in the
whole test suite.

History has shown that we are not good at addressing such
issues, fixing them locally like in dd87799, and such problems keep
appearing.  This commit strengthens the entire test suite to put an end
to this set of problems by embedding invalid default values in all the
connection strings used in the tests.  The invalid values are prefixed
in each connection string, relying on the follow-up values passed in the
connection string to override any invalid value previously set.  Note
that two tests related to CRLs are required to fail with certain pre-set
configurations, but we can rely on enforcing an empty value instead
after the invalid set of values.

Reported-by: Kyotaro Horiguchi
Reviewed-by: Andrew Dunstan, Daniel Gustafsson, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20220316.163658.1122740600489097632.horikyota.ntt@gmail.com
Backpatch-through: 10
Backport of commit f0d2c65 to releases 11 and 12

This means the SSL tests will fail on machines with extremely old
versions of OpenSSL, but we don't know of anything trying to run such
tests. The ability to build is not affected.

Discussion: https://postgr.es/m/73e646d3-8653-1a1c-0a39-739872b591b0@dunslane.net
…250)

In the previous PR, https://github.com/greenplum-db/gpdb/pull/14144, we finished the support of compressed transmission for readable gpfdist external tables. In this PR, I developed the code for zstd-compressed transmission for writable gpfdist external tables.

The PR involves three kinds of changes. The first part adds decompression code to gpfdist. The second part adds compression code to gpdb. Other modifications mainly involve the regression tests for the writable gpfdist external table.

Finally, to use zstd compression to transfer data for a writable external table, we need to use the flag --compress to turn on compressed transmission for gpfdist, as sketched below.
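
A hedged sketch of the workflow (table, host, and file names are hypothetical): gpfdist is started with the new flag, e.g. `gpfdist -d /data -p 8081 --compress`, and the writable external table is then used as usual:

```
-- Data written through this external table is zstd-compressed on the wire.
CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
    LOCATION ('gpfdist://etlhost:8081/sales.out')
    FORMAT 'TEXT' (DELIMITER '|');

INSERT INTO sales_out SELECT * FROM sales;
```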
fix a compiler warning.
rename master to coordinator in gpfdist

Signed-off-by: Yongtao Huang <yongtaoh@vmware.com>
In an extreme case, two gpfdist processes can occupy one port (IPv4/IPv6). There are two reasons for this problem. The first is that 'bind' is not mutually exclusive. The second is that when listening on a port fails, gpfdist tries the same port with a different protocol (IPv4 or IPv6).

So this PR fixes the problem by changing the handling of failed listen attempts.
Previously, header_callback(), used to get and parse HTTP headers, took only
the first status header from all header blocks and showed it to the user. The
problem is that we can have multiple messages with multiple header blocks. In
case of an error, only the first (successful) status is shown to the user,
which is not informative. This patch fixes header_callback() so that it now
processes all status headers.
…#14698)

Compression is very useful for data transmission over low-bandwidth links, but it is a computationally intensive task and may add time in the compression stage. To improve the efficiency of compressed data transmission, multi-threaded data compression and transmission are introduced in this PR.

The feature has each segment served by a single thread. In the main thread, requests are read and parsed into specific data structures. The data is read into a buffer, then a sub-thread is created to compress the data and send the compressed result.

To avoid conflicts between threads, each request can only be served by one sub-thread. What's more, all shared resources are accessed by the main thread to avoid waiting caused by thread synchronization. To reduce the extra cost incurred by thread switching, we limit the maximum number of threads according to the number of CPU cores.

A benchmark in a local development environment showed two improvements:

    gpfdist saves about 60%~70% of bandwidth occupation when compression is enabled.
    multi-threaded compression transmits data more efficiently.

If you want multi-threaded transmission with compression, both the '--compress' and '--multi_thread' flags are required. To gain more efficiency, we recommend setting the max-length flag (-m) to 1M; see the sketch below.
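
A hypothetical end-to-end sketch: gpfdist is started as `gpfdist -p 8081 -m 1048576 --compress --multi_thread`, and a readable external table pointing at it is queried as usual (table and host names are made up):

```
CREATE EXTERNAL TABLE lineitem_ext (LIKE lineitem)
    LOCATION ('gpfdist://etlhost:8081/lineitem.tbl')
    FORMAT 'TEXT' (DELIMITER '|');

-- Each segment's request is served by its own compressing sub-thread.
SELECT count(*) FROM lineitem_ext;
```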

Co-authored by: @water32
This commit fixes some typos in gpfdist, such as:
- a request just start -> a request just start**s**
- ret will be a error number -> ret will be **an** error number
- It is for compress data in buffer. -> It is for compress**ing** data in buffer.
It also deletes an outdated comment.

Signed-off-by: Yongtao Huang <yongtaoh@vmware.com>
…7X) (#15724)

The default value for the session clean-up timeout is 300s, but under heavy
workload this default is not long enough to wait for the segment's next POST
request to arrive. It then causes a "400 invalid request due to wrong
sequence number" failure.

So this commit adds an option '-k' to set this value for gpfdist.
The default is 300, and the valid range is (300..86400).
…fication. (#15980)

To simplify the SSL authentication of gpfdist, we provide one more configuration option to gpfdist and change the gpdb-side SSL identity authentication behavior when verify_gpfdists_cert=off.

The new command flag is ssl_verify_peer. If the flag is 'on', gpfdist is forced to check the identity of clients, and the CA certificate file (root.crt) must be placed in the SSL certificate directory.

If the flag is 'off', gpfdist will not check the identity of clients even if the GUC verify_gpfdists_cert=on, and it doesn't matter whether the CA certificate file exists on the gpfdist side. SSL certificate files and SSL private key files on the gpdb segments' side are also no longer necessary. This greatly simplifies the configuration burden when using the gpfdists protocol for data transfer.

Note that the default value of ssl_verify_peer is 'on'.

Another change concerns gpdb identity authentication when verify_gpfdists_cert=off: in that case, the CA certificate file is unnecessary on the gpdb segments.

Finally, the existence checks for the certificate files and private key files on the gpdb segments are no longer mandatory. This means we can only determine whether the files are configured correctly from the error message received during the SSL connection.

The table below summarizes which SSL files are required for each combination of the two settings:

| verify_gpfdists_cert | ssl_verify_peer | gpfdist side | gpdb segment side |
| --- | --- | --- | --- |
| on | on | certificate, CA certificate, private key | certificate, CA certificate, private key |
| on | off | certificate, private key | CA certificate |
| off | on | certificate, CA certificate, private key | certificate, private key (no CA certificate) |
| off | off | certificate, private key | none |
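
For example, the most relaxed row of the table might look like this on the database side (a hedged sketch; host and table names are hypothetical, and gpfdist itself is started with ssl_verify_peer=off):

```
-- With both checks off, no SSL files are needed on the gpdb segments.
SET verify_gpfdists_cert = off;

CREATE EXTERNAL TABLE secure_sales (LIKE sales)
    LOCATION ('gpfdists://etlhost:8082/sales.txt')
    FORMAT 'TEXT' (DELIMITER '|');
```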
Gpdb is going to support rocky9/rhel9/oel9. On these operating systems, the default OpenSSL version is 3.*, which doesn't support TLS 1.0 and TLS 1.1. Thus, we have to make some adjustments to the gpfdist regression tests.
OpenSSL 3.0 does not allow using self-signed certificates with an IP address as
the host. Adding an alternative IP address or hostname requires generating the
certificates via a config file (the '-config' option). All additional IPs or
hosts should be listed in the "alt_names" section.

To avoid generating a config file for OpenSSL, the script uses "localhost"
instead of 127.0.0.1.

The patch changes the host "127.0.0.1" to "localhost" in gpfdist's SSL test and
in the generate_certs.sh script. Such changes also work with OpenSSL 1.1.1.
Attach gpdb cluster logs to allure report

This is a partial cherry-pick from commit 85efddc; other parts were ported in
commit 1e1b6c8.
This is needed since gp-concourse-cluster-provisioner has changed its code
to set up a cluster with cdw instead of mdw and to change standby_master to
standby_coordinator; it also changed its MASTER_DIRECTORY from
/data/gpdata/master to /data/gpdata/coordination.

[GPR-1052]

Authored-by: Shaoqi Bai <bshaoqi@vmware.com>
Some gpexpand tests check whether the expansion is interrupted after a set
time, but sometimes gpexpand may expand faster than the set time and the test
will fail.

This patch adds a 5-second sleep after the expansion of the first table to
simulate a long expansion without increasing the amount of data. To add the
sleep, a fault injector at reindex_relation is used, which is reached during
the expansion of the table; see the sketch below.
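
A sketch of that fault injection (assuming the gp_inject_fault extension; the exact argument list and fault-point name may differ between Greenplum versions):

```
CREATE EXTENSION IF NOT EXISTS gp_inject_fault;

-- Sleep 5 seconds the first time reindex_relation is reached on each primary:
SELECT gp_inject_fault('reindex_relation', 'sleep', '', '', '', 1, 1, 5, dbid)
FROM gp_segment_configuration WHERE role = 'p';

-- ... run the gpexpand scenario, then clear the fault:
SELECT gp_inject_fault('reindex_relation', 'reset', dbid)
FROM gp_segment_configuration WHERE role = 'p';
```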
…ch) (#14978)

This PR fixes the bug reported in [issue 14644](https://github.com/greenplum-db/gpdb/issues/14644).

**Bug Detail:**
After analyzing a table whose subclass has been dropped, the value of relhassubclass on the coordinator may differ from that on the segments. See the buggy SQL result below:
```
gpadmin=# CREATE TYPE test_type2 AS (a int, b text);
CREATE TYPE
gpadmin=# CREATE TABLE test_tbl2 OF test_type2;
CREATE TABLE
gpadmin=# CREATE TABLE test_tbl2_subclass () INHERITS (test_tbl2);
CREATE TABLE
gpadmin=# DROP TABLE test_tbl2_subclass;
DROP TABLE
gpadmin=# analyze;
ANALYZE
gpadmin=# select gp_segment_id, relhassubclass from pg_class where relname = 'test_tbl2';        
 gp_segment_id | relhassubclass
---------------+----------------
            -1 | t
(1 row)
 
gpadmin=# select gp_segment_id, relhassubclass from gp_dist_random('pg_class') where relname = 'test_tbl2';
 gp_segment_id | relhassubclass
---------------+----------------
             2 | f
             1 | f
             0 | f
(3 rows)
```

**Correct Behavior:**
The value of `relhassubclass` on the coordinator should be the same as on the segments:
```
gpadmin=# DROP TABLE test_tbl2_subclass;
DROP TABLE
gpadmin=# analyze;
ANALYZE
gpadmin=# select gp_segment_id, relhassubclass from pg_class where relname = 'test_tbl2';        
 gp_segment_id | relhassubclass
---------------+----------------
            -1 | f
(1 row)
 
gpadmin=# select gp_segment_id, relhassubclass from gp_dist_random('pg_class') where relname = 'test_tbl2';
 gp_segment_id | relhassubclass
---------------+----------------
             2 | f
             1 | f
             0 | f
(3 rows)
```
Signed-off-by: Yongtao Huang <yongtaoh@vmware.com>
@Daniil290620

bender build

@Daniil290620

bender build

Tao-T and others added 5 commits April 22, 2025 11:08
* Added GUCs to support JIT using ORCA costing model

1. CREATING NEW GUCs

The following GUCs were added for ORCA, specifically for JIT compilation. Now JIT flags can
be set using ORCA specific JIT GUCs or Planner specific JIT GUCs.

            1. optimizer_jit
            2. optimizer_jit_above_cost
            3. optimizer_jit_inline_above_cost
            4. optimizer_jit_optimize_above_cost

JIT compilation is triggered based on the cost of a plan. Earlier, the cost of plan
generated by ORCA/planner was compared with the following
JIT specific costing GUCs (defined for planner):

            1.  jit_above_cost
            2.  jit_optimize_above_cost
            3.  jit_inline_above_cost

Based on certain conditions, JIT flags were set in the planned statement.
Thus, we did not have individual control for ORCA, as the GUCs were common to both.

Since the costing models of ORCA and Planner are different (Planner costs are usually higher),
setting the JIT flags based on the common JIT costing GUCs could lead to false triggering of JIT.
To prevent such situations and gain control over triggering JIT compilation for ORCA,
separate GUCs based on ORCA costing were created.

2. Setting default values of GUCs

Based on TPC-H and other runs, for queries which perform better with JIT,
the following default values were finalized for the GUCs.
The complete analysis is available at the JIRA link (GPQP-204).

            GUC NAME                               DEFAULT VALUE(plan cost)
            1. optimizer_jit_above_cost                  7500
            2. optimizer_jit_inline_above_cost           37500
            3. optimizer_jit_optimize_above_cost         37500

* combining compute_Jit_flag() for optimizer and planner

* Made format changes and addressed review comments

* Addressing review comments- reverted changes in pg_jit_available()

* Addressing review comments- variable name change

Updated variable name from "optimizer_jit"  to "optimizer_jit_enabled".

* Updating default values of JIT GUCs for ORCA

Estimated and assigned default values to JIT GUCs for ORCA. Based on TPC-H and other runs,
for queries which perform better with JIT, the following default values were finalized for the GUCs.
The complete analysis is available at the JIRA link (GPQP-204).

            GUC NAME                               DEFAULT VALUE(plan cost)
            1. optimizer_jit_above_cost                  7500
            2. optimizer_jit_inline_above_cost           37500
            3. optimizer_jit_optimize_above_cost         37500
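
A minimal usage sketch with the GUCs introduced above (using the renamed optimizer_jit_enabled switch; big_table is a hypothetical table):

```
SET optimizer = on;                          -- plan with ORCA
SET optimizer_jit_enabled = on;              -- ORCA-specific JIT switch
SET optimizer_jit_above_cost = 7500;
SET optimizer_jit_inline_above_cost = 37500;
SET optimizer_jit_optimize_above_cost = 37500;

-- JIT flags are set in the planned statement once the ORCA plan cost
-- crosses the thresholds above.
EXPLAIN SELECT count(*) FROM big_table;
```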

---------

Co-authored-by: nishant sharma <nishants4@nishants46M6RY.vmware.com>
GPDB 7 will only support Python 3, so we also update
gpMgmt/requirements-dev.txt to remove the pinning of modules to the Python 2-
compatible versions introduced by commit 68a8237.

[GPR-1153]

Co-authored-by: Anusha Shakarad <shakarada@vmware.com>
Co-authored-by: Ning Wu <ningw@vmware.com>
In the scenario "gprecoverseg should not give warning if pg_basebackup is
running for the up segments", after stopping the primary segments there is no
wait for the switch to mirrors, and therefore the step "the user suspend the
walsender on the primary on content 2" fails.

This patch adds the "user can start transactions" step after stopping the
primary segments to wait for the switch to mirrors, as is done in other scenarios.
Minor fixes for running tests on commit 32fba77
The Dockerfile and scripts for running behave tests are taken from adb-7.2.0.
The README.Ubuntu.bash required for the container build has also been updated.
Additional packages and configuration flags have been added to Dockerfile.ubuntu.
SSH key verification is disabled, which increases the stability of behave tests.
The inclusion of dependencies during the build in compile_gpdb.bash is disabled
because we don't use them.
In the script for running ORCA unit tests, the xerces build is disabled because
it is not required.
The use of an old Python version in mirrors_mgmt_utils is fixed.
In explain_format, the test with ORCA in tests with JIT is fixed to pass; the
changes are taken from 7e86a6b.
@Daniil290620

bender build

8 similar comments