42 changes: 42 additions & 0 deletions .github/resources/manifests/base/driver-plugin-cm-path.yaml

This is the fifth instance of this file. Please use a kustomize component instead, for example in the argo folder.


Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it in ten places and to keep it once in the common base.

@@ -0,0 +1,42 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: ml-pipeline-driver-agent
data:
  sidecar.container: |
    name: driver-plugin
    image: kind-registry:5000/driver:ci
    imagePullPolicy: IfNotPresent
    env:
      - name: LOG_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: mlpipeline-minio-artifact
            key: accesskey
      - name: LOG_SECRET_KEY
        valueFrom:
          secretKeyRef:
            name: mlpipeline-minio-artifact
            key: secretkey
    ports:
      - containerPort: 8080
    resources:
      requests:
        cpu: "0.1"
        memory: "64Mi"
      limits:
        cpu: "0.5"
        memory: "0.5Gi"
    securityContext:
      runAsNonRoot: true
      runAsUser: 65534
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      seccompProfile:
        type: RuntimeDefault
    volumeMounts:
      - name: var-run-argo
        mountPath: /kfp/log
        readOnly: false
@juliusvonkohout, May 16, 2026


Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -42,6 +42,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: ../../base/grpc-specs.yaml
target:
kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -46,6 +46,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: ../../base/grpc-specs.yaml
target:
kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -46,6 +46,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: cache-env.yaml
target:
kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -46,6 +46,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: ../../base/grpc-specs.yaml
target:
kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -9,3 +9,7 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -42,6 +42,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: cache-env.yaml
target:
kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -42,6 +42,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: ../../base/grpc-specs.yaml
target:
kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -46,6 +46,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: ../../base/driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: ../../base/grpc-specs.yaml
target:
kind: Deployment
@@ -0,0 +1,45 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: ml-pipeline-driver-agent
data:
  sidecar.container: |
    name: driver-plugin
    image: kind-registry:5000/driver:ci
    imagePullPolicy: IfNotPresent
    ports:
      - containerPort: 8080
    env:
      - name: LOG_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: mlpipeline-minio-artifact
            key: accesskey
      - name: LOG_SECRET_KEY
        valueFrom:
          secretKeyRef:
            name: mlpipeline-minio-artifact
            key: secretkey
    resources:
      requests:
        cpu: "0.1"
        memory: "64Mi"
      limits:
        cpu: "0.5"
        memory: "0.5Gi"
    securityContext:
      runAsNonRoot: true
      runAsUser: 65534
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      seccompProfile:
        type: RuntimeDefault
    volumeMounts:
      - name: argo-workflows-agent-ca-certificates
        mountPath: /kfp/certs
        readOnly: true
      - name: var-run-argo
        mountPath: /kfp/log
        readOnly: false

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to avoid patching it everywhere and to keep it once in the common base.

@@ -46,6 +46,10 @@ patches:
target:
kind: Deployment
name: ml-pipeline
- path: driver-plugin-cm-path.yaml
target:
kind: ConfigMap
name: ml-pipeline-driver-agent
- path: ../../base/grpc-specs.yaml
target:
kind: Deployment
39 changes: 38 additions & 1 deletion .github/resources/scripts/collect-logs.sh
@@ -29,6 +29,36 @@ function check_namespace {
return 0
}

function describe_argo_workflows {
local NAMESPACE=$1
echo "===== Argo Workflows Inspection ====="
for wf in $(kubectl get wf -n "$NAMESPACE" -o json | jq -r '.items[] | select(.status.phase=="Failed" or .status.phase=="Running") | .metadata.name'); do

I recommend proper long, expressive, human-readable, and pronounceable variable names.

echo "Inspected workflow: $wf"
kubectl get wf "$wf" -n "$NAMESPACE" || true
pods=$(kubectl get po -n "$NAMESPACE" -l "workflows.argoproj.io/workflow=$wf" -o jsonpath='{.items[*].metadata.name}')
for pod in $pods; do
phase=$(kubectl get po "$pod" -n "$NAMESPACE" -o jsonpath='{.status.phase}')
echo "Inspect Pod: $pod, Status: $phase"
if [[ "$phase" != "Pending" && "$phase" != "Succeeded" ]]; then
echo " ---> $pod Logs:"
if [[ "$pod" == *-agent ]]; then
kubectl logs "$pod" -n "$NAMESPACE" -c driver-plugin || true
else
kubectl logs "$pod" -n "$NAMESPACE" || true
fi
fi
echo " ---> Describe $pod:"
if [[ "$phase" != "Succeeded" ]]; then
echo " ---> Describe:"
kubectl describe po "$pod" -n "$NAMESPACE"
fi
done
done
echo "===== Argo Workflows data ====="
kubectl get events -n "${NAMESPACE}" --field-selector involvedObject.kind=Workflow --sort-by='.metadata.creationTimestamp'
echo "==============================="
}

function display_pod_info {
local NAMESPACE=$1

@@ -52,7 +82,13 @@ function display_pod_info {
kubectl describe pod "${POD_NAME}" -n "${NAMESPACE}" | grep -A 100 Events || echo "No events found for pod ${POD_NAME}."

echo "----- LOGS -----"
- kubectl logs "${POD_NAME}" -n "${NAMESPACE}" || echo "No logs found for pod ${POD_NAME}."
+ if [[ "${POD_NAME}" == *-agent* ]]; then
+ kubectl logs "${POD_NAME}" -n "${NAMESPACE}" -c driver-plugin || \
+ echo "No logs found for pod ${POD_NAME}."
+ else
+ kubectl logs "${POD_NAME}" -n "${NAMESPACE}" || \
+ echo "No logs found for pod ${POD_NAME}."
+ fi

echo "==========================="
echo ""
@@ -64,6 +100,7 @@ function display_pod_info {

if check_namespace "$NS"; then
display_pod_info "$NS"
describe_argo_workflows "$NS"
else
exit 0
fi
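
Note on the two agent checks above: `describe_argo_workflows` tests `*-agent` (suffix only), while `display_pod_info` tests `*-agent*` (anywhere in the name). The sketch below shows where the two can diverge. It uses Python's `fnmatch.fnmatchcase` as a stand-in for the Bash `[[ ... == pattern ]]` test, which behaves the same way for these simple `*` patterns. The helper name and the pod names are hypothetical and chosen only for illustration.

```python
from fnmatch import fnmatchcase

SUFFIX_PATTERN = "*-agent"      # pattern used in describe_argo_workflows
SUBSTRING_PATTERN = "*-agent*"  # pattern used in display_pod_info


def reads_driver_plugin_logs(pod_name: str, pattern: str) -> bool:
    """Return True when the script would pass `-c driver-plugin` for this pod."""
    return fnmatchcase(pod_name, pattern)


# Hypothetical pod names, not taken from a real cluster.
pod_names = [
    "my-workflow-1234-agent",
    "some-agent-5f7c9",
    "ml-pipeline-persistenceagent-abcde",
]

for pod_name in pod_names:
    by_suffix = reads_driver_plugin_logs(pod_name, SUFFIX_PATTERN)
    by_substring = reads_driver_plugin_logs(pod_name, SUBSTRING_PATTERN)
    print(f"{pod_name}: suffix={by_suffix} substring={by_substring}")
```

If only pods whose names end in `-agent` carry the `driver-plugin` container, using the suffix pattern in both functions would keep them consistent.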
3 changes: 2 additions & 1 deletion .github/resources/scripts/kfp-readiness/wait_for_pods.py
@@ -30,7 +30,8 @@ def get_pod_statuses():
statuses = {}
for pod in pods.items:
pod_name = pod.metadata.name
- if "system" not in pod_name:
+ # This filter is safe: 'ml-pipeline-persistenceagent-<guid>' will not be excluded and will be processed.
+ if not ("system" in pod_name or pod_name.endswith("-agent")):
pod_status = pod.status.phase
container_statuses = pod.status.container_statuses or []
ready = 0
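
The claim in the new comment can be checked in isolation. This is a minimal sketch that copies the updated condition into a standalone helper. The name `is_monitored` and the pod names are assumptions for illustration and do not appear in the script.

```python
def is_monitored(pod_name: str) -> bool:
    """Mirror the updated condition from get_pod_statuses()."""
    return not ("system" in pod_name or pod_name.endswith("-agent"))


# Hypothetical pod names, not taken from a real cluster.
example_pod_names = [
    "ml-pipeline-persistenceagent-7d9c5b6f4-abcde",  # contains "agent" but has no "-agent" suffix
    "my-workflow-1234-agent",                        # ends with "-agent"
    "kube-system-proxy",                             # contains "system"
]

for pod_name in example_pod_names:
    outcome = "checked" if is_monitored(pod_name) else "skipped"
    print(f"{pod_name}: {outcome}")
```

Because the check is a suffix match, a pod is skipped only when its name ends in `-agent`. A name that merely contains `agent` elsewhere is still processed.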
2 changes: 1 addition & 1 deletion .github/workflows/api-server-tests.yml
@@ -114,7 +114,7 @@ jobs:
shell: bash
if: ${{ matrix.pod_to_pod_tls_enabled == 'true'}}
run: |
- kubectl get secret kfp-api-tls-cert -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
+ kubectl get secret argo-workflows-agent-ca-certificates -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
echo "CA_CERT_PATH=${{ github.workspace }}/ca.crt" >> "$GITHUB_ENV"
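
For readers unfamiliar with the step above: the `jsonpath` expression selects the `ca.crt` key from the Secret's `data` map (the dot in the key is escaped as `ca\.crt`), and `base64 -d` decodes it, because Secret values are stored base64-encoded. The sketch below reproduces that extraction on a made-up Secret. The function name and the certificate content are placeholders, not values from the cluster.

```python
import base64
import json


def extract_ca_certificate(secret_json: str) -> bytes:
    """Return the decoded `ca.crt` entry from a Secret serialized as JSON."""
    secret = json.loads(secret_json)
    # The key itself contains a dot, which is why the jsonpath writes ca\.crt.
    return base64.b64decode(secret["data"]["ca.crt"])


# Placeholder certificate body; a real Secret would hold an actual PEM block.
placeholder_pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"

sample_secret = json.dumps(
    {
        "kind": "Secret",
        "metadata": {
            "name": "argo-workflows-agent-ca-certificates",
            "namespace": "kubeflow",
        },
        "data": {"ca.crt": base64.b64encode(placeholder_pem).decode("ascii")},
    }
)

print(extract_ca_certificate(sample_secret).decode("ascii"))
```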


2 changes: 1 addition & 1 deletion .github/workflows/e2e-test.yml
@@ -113,7 +113,7 @@ jobs:
shell: bash
if: ${{ matrix.pod_to_pod_tls_enabled == 'true'}}
run: |
- kubectl get secret kfp-api-tls-cert -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
+ kubectl get secret argo-workflows-agent-ca-certificates -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
echo "CA_CERT_PATH=${{ github.workspace }}/ca.crt" >> "$GITHUB_ENV"
- name: Configure Input Variables
shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/legacy-v2-api-integration-tests.yml
@@ -77,7 +77,7 @@ jobs:
shell: bash
if: ${{ matrix.pod_to_pod_tls_enabled == 'true' }}
run: |
- kubectl get secret kfp-api-tls-cert -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
+ kubectl get secret argo-workflows-agent-ca-certificates -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
echo "CA_CERT_PATH=${{ github.workspace }}/ca.crt" >> "$GITHUB_ENV"

- name: Forward MLMD port
2 changes: 1 addition & 1 deletion backend/Dockerfile.driver
@@ -27,7 +27,7 @@ RUN GO111MODULE=on go mod download

COPY . .

- RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -tags netgo -gcflags="${GCFLAGS}" -ldflags '-extldflags "-static"' -o /bin/driver ./backend/src/v2/cmd/driver/*.go
+ RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -tags netgo -gcflags="${GCFLAGS}" -ldflags '-extldflags "-static"' -o /bin/driver ./backend/src/driver/*.go

FROM alpine:3.21

73 changes: 73 additions & 0 deletions backend/src/common/util/context_logger.go
@@ -0,0 +1,73 @@
package util

import (
"context"
"fmt"
"io"
"os"

"github.com/sirupsen/logrus"
)

type CtxKey string

const (
contextLoggerKey CtxKey = "driver_log_key"
)

func newFileLogger(logFile string) (*logrus.Logger, io.Closer, error) {
f, err := os.Create(logFile)

I recommend proper long, expressive, human-readable, and pronounceable variable names.

if err != nil {
return nil, nil, err
}

logger := logrus.New()
logger.Out = io.MultiWriter(os.Stdout, f)
logger.Formatter = &logrus.TextFormatter{}
return logger, f, nil
}

// WithExistingLogger For testing only
func WithExistingLogger(ctx context.Context, logger *logrus.Logger) context.Context {
return context.WithValue(ctx, contextLoggerKey, logger)
}

func WithLogger(ctx context.Context, logFile string) (context.Context, io.Closer, error) {
if ctx == nil {

I recommend proper long, expressive, human-readable, and pronounceable variable names.

return nil, nil, fmt.Errorf(
"error during creation of the logger for logId: %v. ctx can not be nil",
logFile,
)
}

if GetLoggerFrom(ctx) != nil {
return nil, nil, fmt.Errorf("logger already exists in context")
}

logger, f, err := newFileLogger(logFile)
if err != nil {
return nil, nil, fmt.Errorf(
"error during creation of the logger for logId: %v details: %w",
logFile,
err,

I recommend proper long, expressive, human-readable, and pronounceable variable names.

)
}

ctx = context.WithValue(ctx, contextLoggerKey, logger)

return ctx, f, nil
}

func GetLoggerFrom(ctx context.Context) *logrus.Logger {
v := ctx.Value(contextLoggerKey)
if v == nil {
return nil
}

logger, ok := v.(*logrus.Logger)
if !ok {
return nil
}

return logger
}