.agents/skills/debug-openshell-cluster/SKILL.md
Lines changed: 9 additions & 0 deletions
@@ -138,6 +138,15 @@ kubectl -n openshell rollout status statefulset/openshell
 
 Look for failed installs, unexpected values, missing namespace, wrong image tag, TLS settings that do not match the registered endpoint, and scheduling failures.
 
+For HA or PostgreSQL-backed installs, also check the service-binding Secret and
+bundled PostgreSQL workload:
+
+```bash
+kubectl -n openshell get secret -l app.kubernetes.io/instance=openshell
+kubectl -n openshell get statefulset,pod,pvc -l app.kubernetes.io/instance=openshell
.agents/skills/helm-dev-environment/SKILL.md
Lines changed: 6 additions & 1 deletion
@@ -1,6 +1,6 @@
 ---
 name: helm-dev-environment
-description: Start up, tear down, and configure the local Kubernetes development environment for OpenShell. Uses k3d (Docker-backed k3s) + Skaffold + Helm. Covers cluster lifecycle, optional add-ons (Keycloak OIDC, Envoy Gateway), and port mappings. Trigger keywords - local k8s, local cluster, k3d, skaffold, helm dev, start cluster, stop cluster, tear down cluster, delete cluster, create cluster, helm:k3s, helm:skaffold, local dev environment, dev cluster, k8s dev, envoy gateway local, keycloak local.
+description: Start up, tear down, and configure the local Kubernetes development environment for OpenShell. Uses k3d (Docker-backed k3s) + Skaffold + Helm. Covers cluster lifecycle, optional add-ons (Keycloak OIDC, Envoy Gateway), HA testing, and port mappings. Trigger keywords - local k8s, local cluster, k3d, skaffold, helm dev, start cluster, stop cluster, tear down cluster, delete cluster, create cluster, helm:k3s, helm:skaffold, local dev environment, dev cluster, k8s dev, envoy gateway local, keycloak local, high availability, HA.
 ---
 
 # Helm Dev Environment
@@ -65,6 +65,10 @@ generates mTLS secrets on first install. Envoy Gateway opt-in; see the Optional
 
 The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`.
 
+**HA test deploy** (two gateway replicas + bundled PostgreSQL): uncomment
+`#- ci/values-high-availability.yaml` in `deploy/helm/openshell/skaffold.yaml`,
+then run `mise run helm:skaffold:run` or `mise run helm:skaffold:dev`.
+
 ### TLS behaviour
 
 `ci/values-skaffold.yaml` sets `server.disableTls: true`, so Skaffold-based deploys run
@@ -198,6 +202,7 @@ mise run helm:k3s:status
 | `deploy/helm/openshell/ci/values-skaffold.yaml` | Dev overrides (image pull policy, TLS disabled for local Skaffold) |
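
As a rough illustration of the HA test deploy step added to this SKILL.md, the overlay line could be uncommented with a small helper before running Skaffold. The function name `enable_ha_overlay` is hypothetical, and the sketch assumes the commented entry appears in `skaffold.yaml` exactly as `#- ci/values-high-availability.yaml`, possibly indented.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): uncomment the HA values overlay
# in a skaffold.yaml so the next Skaffold run picks it up.
enable_ha_overlay() {
  local file="$1"
  # Preserve leading indentation and drop only the '#' before the list item.
  # 'sed -i.bak' works with both GNU and BSD sed; the backup is removed after.
  sed -i.bak \
    's|^\([[:space:]]*\)#- ci/values-high-availability\.yaml|\1- ci/values-high-availability.yaml|' \
    "$file"
  rm -f "$file.bak"
}

# Usage, with the path and tasks named in the SKILL.md text:
#   enable_ha_overlay deploy/helm/openshell/skaffold.yaml
#   mise run helm:skaffold:run
```

Editing the file by hand has the same effect; the helper only makes the step repeatable. Remember to re-comment the line (or discard the change) before committing.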
 instructions="Open $workflow_link, find the run for commit \`$short_pr\`, and click **Re-run all jobs** to execute with the label set."
 fi
-body="Label \`$LABEL_NAME\` applied for \`$short_pr\`. $instructions The run will execute $suite_summary after building the required $build_summary once. The matching required CI gate status on this PR will flip green automatically once the run finishes."
+body="Label \`$LABEL_NAME\` applied for \`$short_pr\`. $instructions The run will execute $suite_summary after building the required $build_summary once. $status_summary"
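
The change above replaces the fixed gate sentence with `$status_summary`, but the lines that set that variable are not visible in this diff. A minimal sketch of how it might be chosen, assuming (per the CI.md change) that only `test:e2e` and `test:e2e-gpu` publish required gate statuses. The function name and the wording of the Kubernetes message are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real script's logic is not shown in this diff.
status_summary_for() {
  case "$1" in
    test:e2e|test:e2e-gpu)
      # Wording taken from the removed line of the diff.
      printf '%s\n' "The matching required CI gate status on this PR will flip green automatically once the run finishes."
      ;;
    test:e2e-kubernetes)
      # Assumed wording: CI.md says this suite publishes no required gate status.
      printf '%s\n' "This suite is optional: failures are visible in the workflow run but do not publish a required CI gate status."
      ;;
    *)
      printf 'unknown label: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Usage inside the comment-building script would look like:
#   status_summary="$(status_summary_for "$LABEL_NAME")"
```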
CI.md
Lines changed: 7 additions & 5 deletions
@@ -10,13 +10,15 @@ PR CI that runs on NVIDIA self-hosted runners uses NVIDIA's copy-pr-bot. The bot
 
 `Branch Checks` run automatically after copy-pr-bot mirrors the PR. `Required CI Gates` posts PR-head statuses that verify the mirror exists, is current, and ran the expected push-based workflows. E2E suites are opt-in because they are more expensive and publish temporary images.
 
-Two opt-in labels enable the long-running E2E suites:
+Three opt-in labels enable the long-running E2E suites:
 
 - `test:e2e` runs the standard E2E suite in `Branch E2E Checks`
 - `test:e2e-gpu` runs GPU E2E in `Branch E2E Checks`
+- `test:e2e-kubernetes` runs Kubernetes E2E with the HA Helm overlay
+  (`replicaCount: 2` and bundled PostgreSQL) in `Branch E2E Checks`
 
-When both labels are present, `Branch E2E Checks` builds the shared gateway and supervisor images once and fans out all enabled suites in parallel.
-The `OpenShell / E2E` and `OpenShell / GPU E2E` required statuses are evaluated from separate suite result jobs inside that workflow, so the expensive GPU suite stays independently gated.
+When multiple labels are present, `Branch E2E Checks` builds the shared gateway and supervisor images once and fans out all enabled suites in parallel.
+The `OpenShell / E2E` and `OpenShell / GPU E2E` required statuses are evaluated from separate suite result jobs inside that workflow. `test:e2e-kubernetes` is optional while HA behavior is under active iteration: failures are visible in the workflow run but do not publish a required CI gate status.
 
 The GitHub ruleset should require the `OpenShell / ...` statuses published by `Required CI Gates`, not the push-triggered workflow jobs directly.
 
@@ -69,7 +71,7 @@ Flow:
 
 1. Open the PR. copy-pr-bot mirrors it to `pull-request/<N>` automatically.
 2. The mirror push runs `Branch Checks` automatically. `Required CI Gates` keeps the PR blocked until the mirror exists, matches the PR head SHA, and the required push-based workflow succeeds. The first `Branch E2E Checks` run only resolves metadata and skips expensive jobs unless an E2E label is already set.
-3. A maintainer applies `test:e2e` and/or `test:e2e-gpu`. `E2E Label Help` posts a comment with a link to the existing gated workflow run.
+3. A maintainer applies `test:e2e`, `test:e2e-gpu`, and/or `test:e2e-kubernetes`. `E2E Label Help` posts a comment with a link to the existing gated workflow run.
 4. The maintainer opens that link and clicks **Re-run all jobs**. This time `pr_metadata` sees the label and the build/E2E jobs run.
 5. When the run finishes, the matching `OpenShell / ...` gate status flips to green automatically.
 6. New commits push to the mirror automatically and re-trigger `Branch Checks` plus any labeled E2E jobs in `Branch E2E Checks`.
@@ -108,7 +110,7 @@ The bot's full administrator documentation is internal to NVIDIA. The only comma
 | File | Role |
 |---|---|
 | `.github/workflows/branch-checks.yml` | Required non-E2E PR checks. Triggers on `push: pull-request/[0-9]+`. |
-| `.github/workflows/branch-e2e.yml` | Opt-in standard and GPU E2E. Triggers on `push: pull-request/[0-9]+` and runs jobs selected by `test:e2e` / `test:e2e-gpu`. |
+| `.github/workflows/branch-e2e.yml` | Opt-in standard, GPU, and Kubernetes HA E2E. Triggers on `push: pull-request/[0-9]+` and runs jobs selected by `test:e2e`, `test:e2e-gpu`, or `test:e2e-kubernetes`. |
 | `.github/workflows/helm-lint.yml` | Helm chart validation. Triggers on `push: pull-request/[0-9]+` and skips lint jobs unless Helm inputs changed. |
 | `.github/actions/pr-gate/action.yml` | Composite action that resolves PR metadata and verifies the required label is set. |
 | `.github/actions/pr-merge-base/action.yml` | Composite action that resolves and fetches the merge-base commit for `pull-request/<N>` push workflows. |
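
For step 3 of the flow in the CI.md changes, a maintainer who prefers the command line could apply a label with the GitHub CLI instead of the web UI. The sketch below is a dry run only: it validates the label against the three names from this PR and prints the `gh pr edit` command rather than executing it. The helper name `e2e_label_command` and the PR number `123` are placeholders, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: check that the label is one of the E2E opt-in
# labels, then print (not run) the GitHub CLI command that would apply it.
e2e_label_command() {
  local pr="$1" label="$2"
  case "$label" in
    test:e2e|test:e2e-gpu|test:e2e-kubernetes) ;;
    *)
      printf 'not an E2E label: %s\n' "$label" >&2
      return 1
      ;;
  esac
  printf 'gh pr edit %s --add-label %s\n' "$pr" "$label"
}

# Placeholder PR number; prints the command for the Kubernetes HA suite.
e2e_label_command 123 test:e2e-kubernetes
```

Applying the label does not start the suites by itself: as step 4 describes, the maintainer still opens the linked workflow run and clicks **Re-run all jobs**.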
CONTRIBUTING.md
Lines changed: 1 addition & 1 deletion
@@ -302,4 +302,4 @@ DCO sign-off is separate from cryptographic commit signing. CI requires signing
 
 ## CI
 
-How PR CI runs, the `test:e2e` / `test:e2e-gpu` labels, copy-pr-bot, and commit-signing setup are documented in [CI.md](CI.md).
+How PR CI runs, the `test:e2e`, `test:e2e-gpu`, and `test:e2e-kubernetes` labels, copy-pr-bot, and commit-signing setup are documented in [CI.md](CI.md).