167 changes: 167 additions & 0 deletions actions/shared/collect_diag_info/README.md
@@ -0,0 +1,167 @@
# 🚀 Collect Diagnostics Action

GitHub Action to collect logs, resource manifests, and artifacts from a Kubernetes namespace
and upload them as a workflow artifact for post-deploy or post-failure investigation.

---

## Features

- Creates an `artifacts/` working directory for all collected files
- Collects pod list, per-pod YAML manifests, namespace events, PVC and PV information
- Captures container logs for all containers in all pods, with a pending-retry loop
(up to 300 seconds) for containers that are not yet running or terminated
- Captures previous-run logs for containers that have restarted at least once
- Lists all resources (`kubectl get all`) and secret names in the namespace
- Optionally collects VictoriaMetrics Custom Resource YAML files for monitoring pipelines
- Generates a timestamped artifact name automatically, or uses a caller-supplied name
- Uploads all collected files as a single GitHub Actions artifact

---

## 📌 Inputs

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `namespace` | Kubernetes namespace to collect diagnostics from. | Yes | - |
| `service_branch` | Branch or tag name of the service being tested. Used in the auto-generated artifact name — slashes are replaced with underscores, whitespace is stripped. Ignored when `artifact_name` is provided. | No | - |
| `artifact_name` | Custom base name for the uploaded artifact. When provided, the final name is `<artifact_name>_<timestamp>`. When empty, the name is generated as `<job>_<namespace>_<branch>_<version>_artifacts_<timestamp>`. | No | `""` |
| `version` | Matrix dimension value (e.g. `${{ matrix.service_image }}`) appended as a suffix in the auto-generated artifact name. Has no effect when `artifact_name` is provided. | No | `""` |
| `monitoring_pipeline` | When `true`, dumps VictoriaMetrics CRs (`VMAlertManager`, `VMAlert`, `VMAgent`, `VMSingle`, `VMAuth`, `PlatformMonitoring`) as YAML files into `artifacts/`. | No | `false` |

---

## 📌 Outputs

This action produces no step outputs. All collected data is uploaded as a workflow artifact
whose name is derived from the inputs (see Additional Information).

---

## How it works

The action gathers diagnostics in the following order, with most steps running under
`if: always()` so they execute even when previous steps failed:

1. **Monitoring resources** (only when `monitoring_pipeline: true`, runs under `if: always()`): exports the five
VictoriaMetrics CRs and the `PlatformMonitoring` CR as YAML files into `artifacts/`.
2. **Pod list**: runs `kubectl get pods` and saves output to
`artifacts/<namespace>_get_pods.txt`.
3. **Pod YAML manifests**: saves each pod's full YAML to
`artifacts/pod_yamls/<pod-name>.yaml`. Failures write a `_FAILED.txt` marker instead.
4. **Namespace events**: saves `kubectl events` output to
`artifacts/<namespace>_get_events.txt`.
5. **PVC YAML**: saves all PersistentVolumeClaim manifests to
`artifacts/<namespace>_get_pvc_yaml.yaml`.
6. **PV list**: filters cluster-wide PVs for those bound to the namespace and saves to
`artifacts/<namespace>_get_pv.txt`.
7. **Container logs**: for every container in every pod, collects current logs to
`artifacts/logs/<pod>__<container>.log`. Containers not yet in `running` or `terminated`
state are queued and retried every 5 seconds until ready or the 300-second hard timeout
is reached. Containers with `restartCount > 0` also get their previous logs saved to
`artifacts/logs/<pod>__<container>__previous.log`.
8. **All resources**: prints `kubectl get all` to the workflow log (not saved to a file).
9. **Secrets list**: prints `kubectl get secrets` to the log (names only, not values).
10. **Artifact upload**: generates a timestamped artifact name and uploads the entire
`artifacts/` folder via `actions/upload-artifact@v4`.
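
Step 6 describes filtering cluster-wide PVs down to those bound to the namespace. The following is a minimal sketch of one way such a filter could work, not the action's actual script. It assumes the default `kubectl get pv` table output, where the CLAIM column has the form `<namespace>/<pvc-name>`; the function name `filter_pvs_for_namespace` is hypothetical.

```bash
# Hypothetical helper (not taken from the action): keep the header row plus
# every PV row that has a field starting with "<namespace>/".
filter_pvs_for_namespace() {
  local ns="$1"
  awk -v ns="$ns" '
    NR == 1 { print; next }                         # always keep the header row
    {
      for (i = 1; i <= NF; i++) {
        if (index($i, ns "/") == 1) { print; next } # field begins with "ns/"
      }
    }
  '
}

# Usage (requires cluster access):
#   kubectl get pv | filter_pvs_for_namespace "$NAMESPACE" > "artifacts/${NAMESPACE}_get_pv.txt"
```

Matching on the `ns/` prefix rather than a plain substring avoids picking up PVs whose claim lives in a namespace that merely contains the same text.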

---

## Additional Information

### Artifact naming

When `artifact_name` is empty (the default), the artifact name is built as:

```text
<github.job>_<namespace>_<service_branch>_<version>_artifacts_<YYYYMMDDHHmmssSSS>
```

- `service_branch` has whitespace stripped and `/` replaced with `_`.
- `<version>_` is included only when the `version` input is non-empty.
- The timestamp uses UTC with millisecond precision.

When `artifact_name` is provided, the name is:

```text
<artifact_name>_<YYYYMMDDHHmmssSSS>
```

Slashes in `artifact_name` are replaced with `_`.
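
The naming rules above can be sketched as a small shell function. This is an illustration under stated assumptions, not the action's real script: the function name `build_artifact_name` and its argument order are hypothetical, and the timestamp is passed in so the example stays deterministic.

```bash
# Hypothetical helper illustrating the naming rules; the real script may differ.
# Arguments: job, namespace, branch, version, custom name, timestamp.
build_artifact_name() {
  local job="$1" namespace="$2" branch="$3" version="$4" custom="$5" ts="$6"
  local name

  if [ -n "$custom" ]; then
    # Caller-supplied name: replace slashes, then append the timestamp.
    custom="$(printf '%s' "$custom" | tr '/' '_')"
    printf '%s_%s\n' "$custom" "$ts"
    return 0
  fi

  # Strip all whitespace from the branch, then replace slashes.
  branch="$(printf '%s' "$branch" | tr -d '[:space:]' | tr '/' '_')"

  name="${job}_${namespace}_${branch}"
  # The version segment is added only when the input is non-empty.
  if [ -n "$version" ]; then
    name="${name}_${version}"
  fi
  printf '%s_artifacts_%s\n' "$name" "$ts"
}

# A UTC timestamp with millisecond precision could be produced like this
# (assumes GNU date, as found on ubuntu-latest runners):
#   ts="$(date -u +%Y%m%d%H%M%S%3N)"

build_artifact_name "deploy" "consul" "feature/my fix" "1.2.3" "" "20240101120000123"
# prints: deploy_consul_feature_myfix_1.2.3_artifacts_20240101120000123
```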

### Container log collection

The log-collection step uses a two-pass approach:

- **First pass**: iterates all pods and containers. Running or terminated containers have
logs collected immediately. All others are added to a pending list.
- **Retry loop**: pending containers are re-checked every 5 seconds. The loop exits when all
pending containers have been processed or the 300-second timeout is reached; timed-out
containers are skipped with a log message.

Log files for the first pass use a double-underscore separator (`pod__container.log`).
Log files collected in the retry loop use a single-underscore separator (`pod_container.log`).
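
The two-pass approach can be sketched roughly as follows. This is a simplified illustration, not the action's code: all function names are hypothetical, `NAMESPACE` is assumed to be set, `kubectl` and `jq` are assumed to be available on the runner, and previous-run logs for restarted containers are left out for brevity.

```bash
# Sketch only: function names are hypothetical and the real script may differ.

# Prints the state key of a container: running, terminated, or waiting.
container_state() {
  kubectl get pod "$1" -n "$NAMESPACE" -o json |
    jq -r --arg c "$2" \
      '.status.containerStatuses[]? | select(.name == $c) | (.state // {}) | keys[0] // empty'
}

# Saves the current logs of one container to the given file; never fails the step.
collect_logs() {
  kubectl logs "$1" -c "$2" -n "$NAMESPACE" > "$3" 2>&1 || true
}

is_ready() {
  case "$(container_state "$1" "$2")" in
    running | terminated) return 0 ;;
    *) return 1 ;;
  esac
}

# Arguments: "pod/container" pairs. TIMEOUT and INTERVAL are in seconds.
collect_all_logs() {
  local timeout="${TIMEOUT:-300}" interval="${INTERVAL:-5}"
  local pending="" still pair pod container start
  mkdir -p artifacts/logs

  # First pass: collect immediately or queue for later.
  for pair in "$@"; do
    pod="${pair%%/*}"
    container="${pair#*/}"
    if is_ready "$pod" "$container"; then
      collect_logs "$pod" "$container" "artifacts/logs/${pod}__${container}.log"
    else
      pending="$pending $pair"
    fi
  done

  # Retry loop: re-check queued containers until all are done or time runs out.
  start="$(date +%s)"
  while [ -n "$pending" ]; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s, skipping:${pending}"
      break
    fi
    sleep "$interval"
    still=""
    for pair in $pending; do
      pod="${pair%%/*}"
      container="${pair#*/}"
      if is_ready "$pod" "$container"; then
        # Single-underscore separator, as described for the retry loop.
        collect_logs "$pod" "$container" "artifacts/logs/${pod}_${container}.log"
      else
        still="$still $pair"
      fi
    done
    pending="$still"
  done
}

# Usage (requires cluster access):
#   NAMESPACE=consul collect_all_logs "consul-server-0/consul" "consul-server-1/consul"
```

Keeping the state lookup and the log fetch in separate functions makes the loop easy to exercise without a cluster by overriding those two functions.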

### Monitoring YAML files

When `monitoring_pipeline: true`, the following files are written to `artifacts/`:

| File | Resource |
| --- | --- |
| `vmalertmanagers.yaml` | `VMAlertManager` |
| `vmalerts.yaml` | `VMAlert` |
| `vmagent.yaml` | `VMAgent` |
| `vmsingles.yaml` | `VMSingle` |
| `vmauths.yaml` | `VMAuth` |
| `PlatformMonitoring.yaml` | `PlatformMonitoring platformmonitoring` |

All commands use `|| true` — missing CRDs do not fail the step.
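
A best-effort dump of this kind might look like the sketch below. It is an assumption about the shape of the step rather than a copy of it: the helper names `dump_cr` and `dump_monitoring_crs` are hypothetical, `NAMESPACE` is assumed to be set, and the resource arguments are taken from the table above.

```bash
# Sketch only; the action's real commands may differ.

# $1 = output file name, remaining arguments = what to pass to `kubectl get`.
dump_cr() {
  local file="$1"
  shift
  # `|| true` keeps the step from failing when the CRD or object does not exist.
  kubectl get "$@" -n "$NAMESPACE" -o yaml > "artifacts/${file}" || true
}

dump_monitoring_crs() {
  mkdir -p artifacts
  dump_cr vmalertmanagers.yaml VMAlertManager
  dump_cr vmalerts.yaml VMAlert
  dump_cr vmagent.yaml VMAgent
  dump_cr vmsingles.yaml VMSingle
  dump_cr vmauths.yaml VMAuth
  dump_cr PlatformMonitoring.yaml PlatformMonitoring platformmonitoring
}

# Usage (requires cluster access):
#   NAMESPACE=monitoring dump_monitoring_crs
```

With this shape, a missing CRD leaves an empty output file behind instead of stopping the remaining dumps.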

---

## Usage

```yaml
name: Collect Diagnostics

on:
  workflow_dispatch:

jobs:
  diagnostics:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout pipeline repo
        uses: actions/checkout@v4
        with:
          path: qubership-test-pipelines

      - name: Collect diagnostics
        if: always()
        uses: Netcracker/qubership-test-pipelines/actions/shared/collect_diag_info@905b88900dc8c14291eaeff4eddcf4d4f734aee1 # v1.9.0
        with:
          namespace: consul
          service_branch: ${{ github.ref_name }}
```

---

## Notes

- All diagnostic steps run with `if: always()` — they collect data even when earlier steps
in the caller's job have failed. Call this action with `if: always()` as well to ensure
it runs after a failed Helm deploy or verification step.
- The `artifacts/` directory is created but **not** cleared by this action. If a prior step
in the same job already populated `artifacts/`, those files will be included in the upload.
- The `Get Secrets` step logs secret names to the workflow run log (not values). Anyone with
read access to the workflow run can see the secret names in the namespace.
- The `version` input is used as a suffix in the auto-generated artifact name. Pass the
relevant matrix dimension (e.g. `${{ matrix.service_image }}`) explicitly — omit the input
to leave the suffix empty.
- Pin to a full 40-character commit SHA with the release tag as a trailing comment, e.g.
`@905b88900dc8c14291eaeff4eddcf4d4f734aee1 # v1.9.0`. The SHA is the immutable pin; the
comment shows which release it points to. Tags alone are mutable and can be moved —
acceptable only when callers explicitly want auto-updates within a minor version. Never
use `@main` or short SHAs.
11 changes: 9 additions & 2 deletions actions/shared/collect_diag_info/action.yml
@@ -17,6 +17,13 @@ inputs:
     required: false
     default: ""

+  version:
+    description: |
+      Matrix version value used as a suffix in the auto-generated artifact name.
+      Pass any matrix dimension that identifies this run (e.g. matrix.version or matrix.service_image).
+    required: false
+    default: ""
+
   monitoring_pipeline:
     description: |
       Only for Monitoring. Enable monitoring CRs checks (true) or disable them (false)
@@ -33,7 +40,7 @@ runs:
       run: mkdir -p artifacts

     - name: Get monitoring specific resources
-      if: ${{ inputs.monitoring_pipeline == 'true' }}
+      if: always() && inputs.monitoring_pipeline == 'true'
       shell: bash
       env:
         NAMESPACE: ${{ inputs.namespace }}
@@ -236,7 +243,7 @@ runs:
       env:
         SERVICE_BRANCH: ${{ inputs.service_branch }}
         INPUT_ARTIFACT_NAME: ${{ inputs.artifact_name }}
-        MATRIX_VERSION: ${{ matrix.version }}
+        MATRIX_VERSION: ${{ inputs.version }}
         GITHUB_JOB: ${{ github.job }}
         NAMESPACE: ${{ inputs.namespace }}
       run: |
134 changes: 134 additions & 0 deletions actions/shared/create_ingress/README.md
@@ -0,0 +1,134 @@
# 🚀 Create Ingress-Controller

GitHub Action that installs ingress-nginx into a `kind` cluster and configures CoreDNS
with a wildcard `*.testdomain.local` DNS zone pointing to the ingress controller's ClusterIP.

---

## Features

- Deploys the official `ingress-nginx` controller for `kind` clusters
- Patches the `nginx` IngressClass to be the cluster-wide default
- Configures CoreDNS to resolve all `*.testdomain.local` hostnames to the nginx ClusterIP,
enabling in-cluster DNS-based routing without external DNS
- Restarts CoreDNS pods and waits up to 180 seconds for them to become ready
- Prints pod and event status for the `ingress-nginx` namespace after setup

---

## 📌 Inputs

This action has no inputs.

---

## 📌 Outputs

This action produces no outputs.

---

## How it works

The action runs four steps. The cluster-level changes they make persist for the lifetime
of the `kind` cluster:

1. **Install ingress-nginx**: applies the official `kind` ingress manifest from
`https://kind.sigs.k8s.io/examples/ingress/deploy-ingress-nginx.yaml` and patches the
`nginx` IngressClass to be the default class. Any Ingress resource without an explicit
`ingressClassName` will be handled by this controller.
2. **Configure wildcard DNS**: reads the ClusterIP of the `ingress-nginx-controller` service
in the `ingress-nginx` namespace, then replaces the `kube-system/coredns` ConfigMap with
an updated Corefile. The new Corefile adds a `testdomain.local:53` zone that answers any
query matching `*.testdomain.local` with the nginx ClusterIP:

```text
testdomain.local:53 {
    template ANY ANY {
        match .*\.testdomain\.local\.$
        answer "{{ .Name }} 60 IN A <nginx-clusterip>"
        fallthrough
    }
}
```

3. **Restart CoreDNS**: deletes all pods with label `k8s-app=kube-dns` in `kube-system` to
force a reload of the new ConfigMap. Kubernetes recreates them automatically.
4. **Wait for readiness**: runs `kubectl wait --for=condition=Ready` on the new CoreDNS pods
with a 180-second timeout. The action fails if pods do not become ready in time.

After setup, any workload in the cluster can reach the ingress controller by pointing an
Ingress resource to a `*.testdomain.local` hostname — no external DNS configuration is needed.
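
As an illustration of step 2, the zone stanza could be rendered from the ClusterIP with a small shell function. This is a sketch under assumptions, not the action's script: `render_wildcard_zone` is a hypothetical name, and the real action embeds this stanza in a full Corefile before running `kubectl replace`.

```bash
# Hypothetical helper: prints the wildcard zone stanza for a given ClusterIP.
render_wildcard_zone() {
  local ip="$1"
  # Unquoted heredoc: "\\" yields a literal backslash and "\$" a literal "$".
  cat <<EOF
testdomain.local:53 {
    template ANY ANY {
        match .*\\.testdomain\\.local\\.\$
        answer "{{ .Name }} 60 IN A ${ip}"
        fallthrough
    }
}
EOF
}

# Usage (requires cluster access):
#   ip="$(kubectl get svc ingress-nginx-controller -n ingress-nginx -o jsonpath='{.spec.clusterIP}')"
#   render_wildcard_zone "$ip"
```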

---

## Additional Information

### testdomain.local DNS zone

The wildcard zone is cluster-internal only. `*.testdomain.local` hostnames resolve to the
nginx ClusterIP, which is not routable outside the cluster. This is designed for in-cluster
test scenarios where services communicate via Ingress resources using fixed hostnames.

The CoreDNS ConfigMap is replaced (not patched) using `kubectl replace`. The full Corefile,
including the original `cluster.local` zone configuration, is embedded in the action and
printed to the workflow log during the replace step.

### kind cluster requirement

The ingress manifest is sourced directly from `https://kind.sigs.k8s.io/examples/ingress/deploy-ingress-nginx.yaml`.
This action is designed exclusively for `kind` clusters and will not work correctly on other
cluster types without modification.

### No waiting for ingress-nginx readiness

The action does not wait for the `ingress-nginx-controller` deployment to become ready before
reading its ClusterIP. The ClusterIP is assigned at Service creation time and is available
immediately after `kubectl apply`, but the controller pods themselves may still be starting.
If subsequent steps depend on the controller being fully functional, add a `kubectl wait`
step after this action.
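
A follow-up wait step could look like the sketch below. It assumes the controller pods carry the `app.kubernetes.io/component=controller` label used by the official `kind` ingress manifest; the wrapper name `wait_for_ingress_controller` and the 90-second default are illustrative choices, not part of this action.

```bash
# Hypothetical wrapper: blocks until the ingress-nginx controller pod is Ready.
# $1 = optional timeout in kubectl duration syntax (default: 90s).
wait_for_ingress_controller() {
  local timeout="${1:-90s}"
  kubectl wait --namespace ingress-nginx \
    --for=condition=ready pod \
    --selector=app.kubernetes.io/component=controller \
    --timeout="$timeout"
}

# Usage in a workflow `run:` step placed after this action:
#   wait_for_ingress_controller 120s
```

Because `kubectl wait` exits non-zero on timeout, a step using this wrapper fails the job when the controller never becomes ready.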

---

## Usage

```yaml
name: Setup Kind Ingress

on:
  workflow_dispatch:

jobs:
  setup:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout pipeline repo
        uses: actions/checkout@v4
        with:
          path: qubership-test-pipelines

      - name: Create ingress controller
        uses: Netcracker/qubership-test-pipelines/actions/shared/create_ingress@905b88900dc8c14291eaeff4eddcf4d4f734aee1 # v1.9.0
```

---

## Notes

- This action modifies the `kube-system/coredns` ConfigMap using `kubectl replace` — it
overwrites any previous custom CoreDNS configuration. Run it before any step that adds
additional CoreDNS rules.
- The full Corefile content (including the nginx ClusterIP) is printed to the workflow log
during the CoreDNS ConfigMap replace step.
- Designed for `kind` clusters only. The ingress manifest URL is hard-coded to the official
`kind.sigs.k8s.io` example and may not suit production or other cluster types.
- If CoreDNS pods do not reach `Ready` within 180 seconds, the action fails and subsequent
steps will not run. Check `kubectl get pods -n kube-system` and `kubectl describe` output
in the workflow log for the root cause.
- Pin to a full 40-character commit SHA with the release tag as a trailing comment, e.g.
`@905b88900dc8c14291eaeff4eddcf4d4f734aee1 # v1.9.0`. The SHA is the immutable pin; the
comment shows which release it points to. Tags alone are mutable and can be moved —
acceptable only when callers explicitly want auto-updates within a minor version. Never
use `@main` or short SHAs.