PT-2237 - fixed PostgreSQL DB logs collection (Reopen)#1099
Conversation
This fix ensures `pt-k8s-debug-collector` collects PostgreSQL database logs, which are stored in `/pgdata/<cluster_name>/pg_log`. The test `TestIndividualFiles` was refactored and extended with a new case covering the new feature.
This refactor also replaces all of the `kubectl` CLI calls with the Go Kubernetes SDK (client-go). Additionally, the dumper now has a new structure, a new logger, tar file path control, and a multithreaded approach for downloading and exporting files from multiple pods.
svetasmirnova
left a comment
pgo test fails:
```
=== RUN   TestCollectorRunner/Operator_pgo/TestIndividualFiles/Resource_pgo
    main_test.go:401:
        Error Trace: /home/sveta/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector/main_test.go:401
                     /home/sveta/go/pkg/mod/github.com/stretchr/testify@v1.11.1/suite/suite.go:115
        Error:       Received unexpected error:
                     exit status 2
        Test:        TestCollectorRunner/Operator_pgo/TestIndividualFiles/Resource_pgo
    main_test.go:403:
        Error Trace: /home/sveta/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector/main_test.go:403
                     /home/sveta/go/pkg/mod/github.com/stretchr/testify@v1.11.1/suite/suite.go:115
        Error:       Preprocessor Check
        Test:        TestCollectorRunner/Operator_pgo/TestIndividualFiles/Resource_pgo
        Messages:    test pgo_pg_logs_exist
                     resource: pgo
                     namespace: pgo
                     output is not as expected
                     Output:
                     Wanted: [.log]
=== RUN   TestCollectorRunner/Operator_pgo/TestIndividualFiles/Resource_auto
    main_test.go:401:
        Error Trace: /home/sveta/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector/main_test.go:401
                     /home/sveta/go/pkg/mod/github.com/stretchr/testify@v1.11.1/suite/suite.go:115
        Error:       Received unexpected error:
                     exit status 2
        Test:        TestCollectorRunner/Operator_pgo/TestIndividualFiles/Resource_auto
    main_test.go:403:
        Error Trace: /home/sveta/src/percona/percona-toolkit/src/go/pt-k8s-debug-collector/main_test.go:403
                     /home/sveta/go/pkg/mod/github.com/stretchr/testify@v1.11.1/suite/suite.go:115
        Error:       Preprocessor Check
        Test:        TestCollectorRunner/Operator_pgo/TestIndividualFiles/Resource_auto
        Messages:    test pgo_pg_logs_exist
                     resource: auto
                     namespace: pgo
                     output is not as expected
                     Output:
                     Wanted: [.log]
```
Since pgo will be deprecated soon per the EOL note here: https://docs.percona.com/percona-operator-for-postgresql/1.6.0/, you can simply remove this test.
Pull request overview
This PR fixes PostgreSQL log collection in the Kubernetes debug collector by enabling extraction of all files from a log directory (where filenames vary at runtime), and updates the integration test to validate that at least one PostgreSQL .log file is included in the produced dump archive.
Changes:
- Added support for dumping an entire directory of “individual files” (used for PostgreSQL `pg_log`) instead of only known, fixed filenames.
- Added pg log directory mappings for PG v1 (“pgo”) and PG v2 (“pgv2”) resources.
- Refactored `TestIndividualFiles` to run namespace-specific assertions and added checks for PostgreSQL log presence.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/go/pt-k8s-debug-collector/main_test.go | Refactors individual-file tests and adds checks that PG .log files exist in the archive. |
| src/go/pt-k8s-debug-collector/dumper/resources.go | Registers PostgreSQL log directories (pg_log) for PG v1 and PG v2 dump collection. |
| src/go/pt-k8s-debug-collector/dumper/kube_utils.go | Introduces streaming tar execution helpers used for directory/file extraction. |
| src/go/pt-k8s-debug-collector/dumper/individual_files.go | Reworks individual file dumping to support env substitution and directory extraction via tar streaming. |
| src/go/pt-k8s-debug-collector/dumper/dumper.go | Normalizes CR type handling when matching pods and dumping summaries/files. |
```go
func (d *Dumper) executeInPodStream(ctx context.Context, command []string, pod corev1.Pod, container string, stdin io.Reader) (io.ReadCloser, bytes.Buffer, error) {
	stdinFlag := stdin != nil
	var stderr bytes.Buffer
```
executeInPodStream returns stderr as a bytes.Buffer value while the goroutine passed &stderr to exec.StreamWithContext may still be writing to it. Copying/returning a bytes.Buffer concurrently with writes is a data race and can also cause callers to see incomplete/empty stderr. Consider allocating stderr as *bytes.Buffer (or io.Writer) and returning that pointer instead of copying the buffer.
I've removed the stderr buffer completely. Error logging inside the streaming goroutine will be enough.
```go
	defer pw.Close()

	if err := exec.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdin:  stdin,
		Stdout: pw,
		Stderr: &stderr,
		Tty:    false,
	}); err != nil && !errors.Is(err, context.Canceled) {
		log.Errorf("error while streaming files from pod: %s", err.Error())
	}
```
In the streaming goroutine, exec.StreamWithContext errors are only logged and the pipe is closed with pw.Close(), so callers may see a clean EOF and treat failures (e.g., missing dir/file, tar exit != 0) as success. Consider propagating the stream error via pw.CloseWithError(err) and/or returning an error channel / waiting for the goroutine so tarFromPod/callers can reliably detect failures.
Suggested change:
```diff
-	defer pw.Close()
-	if err := exec.StreamWithContext(ctx, remotecommand.StreamOptions{
-		Stdin:  stdin,
-		Stdout: pw,
-		Stderr: &stderr,
-		Tty:    false,
-	}); err != nil && !errors.Is(err, context.Canceled) {
-		log.Errorf("error while streaming files from pod: %s", err.Error())
-	}
+	err := exec.StreamWithContext(ctx, remotecommand.StreamOptions{
+		Stdin:  stdin,
+		Stdout: pw,
+		Stderr: &stderr,
+		Tty:    false,
+	})
+	if err != nil && !errors.Is(err, context.Canceled) {
+		log.Errorf("error while streaming files from pod: %s", err.Error())
+		_ = pw.CloseWithError(err)
+		return
+	}
+	_ = pw.Close()
```
```go
func (d *Dumper) getContainerEnvMap(ctx context.Context, namespace, podName, containerName string) (map[string]string, error) {
	pod, err := d.clientSet.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
```
getContainerEnvMap does a live Pods().Get(...) call per pod/container even though exportJob already includes a full corev1.Pod (with Spec.Containers[].Env) from the informer/lister. This adds avoidable API-server load during dumps; consider parsing env vars from job.Pod.Spec instead of re-fetching the Pod.
```go
	dst := d.PodIndividualFilesPath(
		job.Pod.Namespace,
		job.Pod.Name,
		path.Join(tarFolder, path.Base(hdr.Name)),
```
processDir flattens all extracted files into tarFolder using only path.Base(hdr.Name). If the directory contains nested subdirectories or duplicate basenames, files can be overwritten and structure is lost. Consider preserving the relative path from the tar header (e.g., clean hdr.Name and join it under tarFolder) to avoid collisions.
Suggested change:
```diff
-	dst := d.PodIndividualFilesPath(
-		job.Pod.Namespace,
-		job.Pod.Name,
-		path.Join(tarFolder, path.Base(hdr.Name)),
+	// Preserve the relative path from the tar header while ensuring it
+	// cannot escape the intended destination directory.
+	relPath := path.Clean(hdr.Name)
+	// Normalize common tar prefixes like "./"
+	relPath = strings.TrimPrefix(relPath, "./")
+	// Prevent path traversal outside tarFolder by stripping leading "../"
+	for strings.HasPrefix(relPath, "../") {
+		relPath = strings.TrimPrefix(relPath, "../")
+	}
+	// Skip entries that do not resolve to a meaningful relative path
+	if relPath == "" || relPath == "." {
+		continue
+	}
+	dst := d.PodIndividualFilesPath(
+		job.Pod.Namespace,
+		job.Pod.Name,
+		path.Join(tarFolder, relPath),
```
```go
func (d *Dumper) processDir(
	ctx context.Context,
	job exportJob,
	container, tarFolder, dir string,
) error {

	tr, rc, _, err := d.tarFromPod(ctx, job.Pod, container, "-C", dir, ".")
	if err != nil {
		return err
```
PR description mentions adding a getAllFilesFromDirectory function in dumper.go, but the implementation here introduces processDir/tarFromPod instead. Consider updating the PR description (or naming) to match the actual approach so future maintainers can find the relevant code quickly.
What was added:
To keep changes minimal, the existing test `TestIndividualFiles` was modified to cover the new case. The test was refactored because the previous implementation couldn't handle files with variable names. In `/pgdata/<cluster_name>/pg_log`, logs are named according to the day/week, so filenames are not known at runtime. A new function `getAllFilesFromDirectory` was added to `dumper.go`. Its purpose is the same as the existing `getIndividualFiles`, but it supports extracting multiple files from a directory. It is useful because pg log filenames can vary, and there can be multiple log files. (`/lib` has not been changed.)