Commit ace2a80

Merge remote-tracking branch 'upstream/main' into refactor/split-sandbox
Resolutions:
- openshell-core: union new pub mods (denial, driver_mounts, provider_credentials, proto_struct); keep tonic features ["channel", "tls-native-roots"] and add tonic-prost + tokio.
- openshell-sandbox: keep slim orchestrator (deps moved to leaves); promote provider env tuple to four components to carry dynamic_credentials; rely on supervisor-process for skill install, TLS state, netns creation, and supervisor hardening.
- openshell-supervisor-process: union both test sets in process.rs; keep existing imports in ssh.rs and add is_supervisor_only_env_var; bump tonic to tls-native-roots; add the new mount-namespace prep call (prepare_supervisor_identity_mount_namespace_from_env) before apply_supervisor_startup_hardening in run.rs.
- openshell-supervisor-network: take main's token_grant_injection.rs; relocate token_grant.rs and spiffe_endpoint.rs from openshell-sandbox into this crate, declare the modules, add reqwest and spiffe deps, and switch to openshell_ocsf::ctx::ctx().
- openshell-core: rewrite the four self-references openshell_core::proto:: -> crate::proto:: in grpc_client.rs and provider_credentials.rs.

Signed-off-by: Radoslav Hubenov <rrhubenov@gmail.com>
2 parents 1ac626e + fb83d1a commit ace2a80

133 files changed

Lines changed: 14443 additions & 1012 deletions

Some content is hidden

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 33 additions & 8 deletions
@@ -54,7 +54,7 @@ Use gateway metadata, deployment values, or the user's setup notes to identify t
 |---|---|
 | Docker | Gateway process logs, Docker daemon health, sandbox containers, image pulls. |
 | Podman | Podman socket, rootless networking, sandbox containers, image pulls. |
-| Kubernetes | Helm release, StatefulSet, service, secrets, sandbox pods, events. |
+| Kubernetes | Helm release, gateway workload, service, secrets, sandbox pods, events. |
 | VM | VM driver logs, rootfs availability, host virtualization support. |

 ### Step 3: Check Docker-Backed Gateways
@@ -131,12 +131,17 @@ Common findings:
 ```bash
 helm -n openshell status openshell
 helm -n openshell get values openshell
-kubectl -n openshell get statefulset,pod,svc,pvc
-kubectl -n openshell logs statefulset/openshell --tail=200
+kubectl -n openshell get deployment,statefulset,pod,svc,pvc
+kubectl -n openshell logs deployment/openshell -c openshell-gateway --tail=200
+kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=200
+kubectl -n openshell rollout status deployment/openshell
 kubectl -n openshell rollout status statefulset/openshell
 ```

-Look for failed installs, unexpected values, missing namespace, wrong image tag, TLS settings that do not match the registered endpoint, and scheduling failures.
+Use the log and rollout commands for the workload kind that exists in the
+release. Look for failed installs, unexpected values, missing namespace, wrong
+image tag, TLS settings that do not match the registered endpoint, and
+scheduling failures.

 For HA or PostgreSQL-backed installs, also check the external database Secret
 referenced by `server.externalDbSecret` and the PostgreSQL workload if the test
@@ -169,15 +174,34 @@ Secrets but does not create the sandbox JWT signing Secret.

 If the gateway exits with `failed to read sandbox JWT signing key from
 /etc/openshell-jwt/signing.pem`, verify that `openshell-jwt-keys` contains
-`signing.pem`, `public.pem`, and `kid`, and that the StatefulSet mounts the
+`signing.pem`, `public.pem`, and `kid`, and that the gateway workload mounts the
 `sandbox-jwt` secret at `/etc/openshell-jwt`. The sandbox JWT mount is required
 even when local Helm values disable TLS.

+If `server.providerTokenGrants.spiffe.enabled=true`, the gateway should still
+render `[openshell.gateway.gateway_jwt]` and mount the `sandbox-jwt` Secret.
+SPIRE is used only by sandbox pods for dynamic provider token grants. Verify
+that SPIRE is installed, the CSI driver is available, and the Kubernetes driver
+config includes `provider_spiffe_workload_api_socket_path`:
+
+```bash
+helm -n openshell get values openshell | grep -E 'providerTokenGrants|workloadApiSocketPath'
+kubectl get pods -A | grep -E 'spire|spiffe'
+kubectl -n openshell get configmap openshell-config -o yaml | grep provider_spiffe_workload_api_socket_path
+```
+
+Sandbox pods using provider token grants should have an
+`openshell.io/sandbox-id` annotation, an `openshell.ai/managed-by=openshell`
+label, supervisor env vars `OPENSHELL_K8S_SA_TOKEN_FILE` and
+`OPENSHELL_PROVIDER_SPIFFE_WORKLOAD_API_SOCKET`, plus both the projected
+`openshell-sa-token` volume and the `spiffe-workload-api` CSI volume.
+
 Check the image references currently used by the gateway deployment:

 ```bash
+kubectl -n openshell get deployment openshell -o jsonpath="{.spec.template.spec.containers[*].image}{\"\n\"}{.spec.template.spec.containers[*].env[?(@.name==\"OPENSHELL_SUPERVISOR_IMAGE\")].value}{\"\n\"}"
 kubectl -n openshell get statefulset openshell -o jsonpath="{.spec.template.spec.containers[*].image}{\"\n\"}{.spec.template.spec.containers[*].env[?(@.name==\"OPENSHELL_SUPERVISOR_IMAGE\")].value}{\"\n\"}"
-helm -n openshell get values openshell | grep -E 'repository|tag|supervisorImage'
+helm -n openshell get values openshell | grep -E 'repository|tag|supervisorImage|workload'
 ```

 The gateway image built from `deploy/docker/Dockerfile.gateway` and the scratch supervisor image built from `deploy/docker/Dockerfile.supervisor` should use the same build tag in branch and E2E deploys. A stale supervisor image can make sandbox behavior lag behind gateway policy or proto changes.
@@ -220,7 +244,8 @@ If the gateway is healthy but sandbox creation fails:
 ```bash
 kubectl -n openshell get pods
 kubectl -n openshell get events --sort-by=.lastTimestamp | tail -n 50
-kubectl -n openshell logs statefulset/openshell --tail=200
+kubectl -n openshell logs deployment/openshell -c openshell-gateway --tail=200
+kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=200
 ```

 Check the configured sandbox namespace:
@@ -268,7 +293,7 @@ openshell logs <sandbox-name>
 | Docker or Podman sandbox never registers | Wrong callback endpoint or supervisor startup failure | Gateway logs and sandbox container logs |
 | Docker GPU e2e fails before GPU sandbox comparison | NVIDIA CDI specs are missing or Docker has not discovered them | `docker info --format '{{json .DiscoveredDevices}}'`, `/etc/cdi`, `/var/run/cdi`, `nvidia-cdi-refresh.service` |
 | Kubernetes gateway pod pending | PVC unbound, taint, selector, or insufficient resources | `kubectl -n openshell describe pod <pod>` |
-| Kubernetes gateway pod crash loops | Missing secret, bad DB URL, bad TLS config | `kubectl -n openshell logs statefulset/openshell` |
+| Kubernetes gateway pod crash loops | Missing secret, bad DB URL, bad TLS config | `kubectl -n openshell logs deployment/openshell -c openshell-gateway` or `kubectl -n openshell logs statefulset/openshell -c openshell-gateway` |
 | CLI TLS error | Local mTLS bundle does not match server cert/CA | Check `~/.config/openshell/gateways/<name>/mtls/` |
 | Image pull failure | Gateway or sandbox image cannot be pulled | Runtime events and image pull credentials |
 | `K8s namespace not ready` with `envoy-gateway-openshell.yaml: the server could not find the requested resource` | Optional Gateway API manifest was applied without Envoy Gateway CRDs, or k3s Helm controller startup exceeded the namespace wait | Apply `deploy/kube/manifests/envoy-gateway-openshell.yaml` manually only after Envoy Gateway is installed and `grpcRoute` is enabled |
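Reviewer sketch (not part of the commit): the hunks in this file list both `deployment/openshell` and `statefulset/openshell` variants and ask the reader to use the one that exists in the release. The selection could be scripted roughly as below. `pick_workload` is a hypothetical helper name, and the sketch assumes the workload is named `openshell` and that `kubectl get ... -o name` prints `deployment.apps/<name>` or `statefulset.apps/<name>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the OpenShell repo): read `kubectl get -o name`
# output on stdin and print the workload reference to use for logs/rollout.
# Prefers a Deployment named "openshell", then falls back to a StatefulSet.
pick_workload() {
  local names
  names="$(cat)"
  if grep -qx 'deployment.apps/openshell' <<<"$names"; then
    echo 'deployment/openshell'
  elif grep -qx 'statefulset.apps/openshell' <<<"$names"; then
    echo 'statefulset/openshell'
  else
    # Neither workload kind was listed.
    return 1
  fi
}

# Usage (needs cluster access, so it is left commented out):
#   workload="$(kubectl -n openshell get deployment,statefulset -o name | pick_workload)"
#   kubectl -n openshell logs "$workload" -c openshell-gateway --tail=200
#   kubectl -n openshell rollout status "$workload"
```

Keeping the selection in a pure stdin-to-stdout function means it can be exercised without a cluster.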

.agents/skills/helm-dev-environment/SKILL.md

Lines changed: 19 additions & 0 deletions
@@ -178,6 +178,23 @@ To remove Keycloak:
 mise run keycloak:k8s:teardown
 ```

+### SPIRE / SPIFFE Provider Token Grants
+
+Skaffold can install SPIRE with the SPIFFE hardened Helm charts. To activate
+SPIFFE JWT-SVIDs for dynamic provider token grants:
+
+1. Uncomment the `spire-crds` and `spire` releases in `deploy/helm/openshell/skaffold.yaml`
+2. Uncomment `#- ci/values-spire.yaml` in the OpenShell release values files
+3. Redeploy: `mise run helm:skaffold:run`
+
+`ci/values-spire-stack.yaml` configures the local SPIRE trust domain as
+`openshell.local` and adds a `ClusterSPIFFEID` that maps sandbox pod
+annotations to `spiffe://openshell.local/openshell/sandbox/<sandbox-id>`.
+OpenShell mounts the SPIFFE CSI Workload API socket at
+`/spiffe-workload-api/spire-agent.sock` into sandbox pods for provider token
+grants. Supervisor-to-gateway authentication remains on the Kubernetes
+ServiceAccount bootstrap and gateway-minted sandbox JWT path.
+
 ---

 ## Cluster Lifecycle (suspend/resume)
@@ -206,6 +223,8 @@ mise run helm:k3s:status
 | `deploy/helm/openshell/ci/values-gateway.yaml` | Envoy Gateway GRPCRoute + Gateway overlay |
 | `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with external PostgreSQL Secret) |
 | `deploy/helm/openshell/ci/values-keycloak.yaml` | Keycloak OIDC overlay |
+| `deploy/helm/openshell/ci/values-spire.yaml` | SPIFFE/SPIRE provider token grant overlay |
+| `deploy/helm/openshell/ci/values-spire-stack.yaml` | SPIRE hardened chart values for local dev |
 | `deploy/helm/openshell/ci/values-tls-disabled.yaml` | Lint-only: TLS + auth disabled (reverse-proxy edge termination) |
 | `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass for Envoy Gateway (`mise run helm:gateway:apply`) |
 | `tasks/scripts/helm-k3s-local.sh` | k3d cluster create/delete/start/stop/status |
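Reviewer sketch (not part of the commit): the new SPIRE section in this file says sandbox pod annotations map to `spiffe://openshell.local/openshell/sandbox/<sandbox-id>`. A small helper can make that expected identity explicit when comparing against what SPIRE actually issues. `sandbox_spiffe_id` is a hypothetical name. The default trust domain is the local-dev value quoted in the diff; other environments may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SPIFFE ID that the local-dev ClusterSPIFFEID
# mapping described in the diff would yield for a given sandbox ID.
# $1 = sandbox ID (required), $2 = trust domain (defaults to openshell.local).
sandbox_spiffe_id() {
  local sandbox_id="$1"
  local trust_domain="${2:-openshell.local}"
  # Refuse to build an ID with an empty path segment.
  [ -n "$sandbox_id" ] || return 1
  printf 'spiffe://%s/openshell/sandbox/%s\n' "$trust_domain" "$sandbox_id"
}
```

For example, `sandbox_spiffe_id abc123` prints `spiffe://openshell.local/openshell/sandbox/abc123`, which can be compared against the subject of a JWT-SVID fetched inside the sandbox pod.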

.agents/skills/openshell-cli/SKILL.md

Lines changed: 3 additions & 2 deletions
@@ -495,8 +495,9 @@ openshell gateway remove local # Remove local registrati
 ```bash
 # Inspect a Kubernetes Helm release and gateway pod
 helm -n openshell status openshell
-kubectl -n openshell get pods,svc
-kubectl -n openshell logs statefulset/openshell --tail=100
+kubectl -n openshell get deployment,statefulset,pods,svc
+kubectl -n openshell logs deployment/openshell -c openshell-gateway --tail=100
+kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=100
 ```

 For Docker, Podman, and VM-backed gateways, inspect the gateway process or container logs and the selected runtime directly.

.github/workflows/snap-package.yml

Lines changed: 4 additions & 4 deletions
@@ -77,19 +77,19 @@ jobs:
           sudo snap install snapcraft --classic

       - name: Download prebuilt CLI binary
-        uses: actions/download-artifact@3e5f45b2cfb9172054b40e67214ff5f5447ce83dd # v8
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
         with:
           name: cli-linux-${{ matrix.arch }}
           path: prebuilt/cli

       - name: Download prebuilt gateway binary
-        uses: actions/download-artifact@3e5f45b2cfb9172054b40e67214ff5f5447ce83dd # v8
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
         with:
           name: gateway-binary-linux-${{ matrix.arch }}
           path: prebuilt/gateway

       - name: Download prebuilt sandbox binary
-        uses: actions/download-artifact@3e5f45b2cfb9172054b40e67214ff5f5447ce83dd # v8
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
         with:
           name: supervisor-binary-linux-${{ matrix.arch }}
           path: prebuilt/sandbox
@@ -119,7 +119,7 @@ jobs:
           cp prebuilt/gateway/openshell-gateway snap/prebuilt/openshell-gateway
           cp prebuilt/sandbox/openshell-sandbox snap/prebuilt/openshell-sandbox

-          cp deploy/snap/bin/openshell-gateway-wrapper snap/prebuilt/
+          cp tasks/scripts/snap-gateway-wrapper.sh snap/prebuilt/openshell-gateway-wrapper
           cp LICENSE snap/prebuilt/
           cp README.md snap/prebuilt/

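Reviewer sketch (not part of the commit): the `actions/download-artifact` pin replaced in this workflow is 41 hex characters long as rendered here, one more than a full Git commit SHA, while the new pin is 40. A check along these lines could catch a malformed pin before it reaches CI. `is_sha_pinned` is a hypothetical helper name, and the sketch only validates the shape of the reference, not that the commit exists upstream.

```shell
#!/usr/bin/env bash
# Hypothetical lint helper: succeed only when a `uses:` reference is pinned
# to a full 40-character lowercase hex commit SHA after the last '@'.
is_sha_pinned() {
  local ref="${1##*@}"   # everything after the last '@' (whole string if none)
  [[ "$ref" =~ ^[0-9a-f]{40}$ ]]
}

# Usage, taking the two pins from the hunk above:
#   is_sha_pinned 'actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c'   # succeeds
#   is_sha_pinned 'actions/download-artifact@3e5f45b2cfb9172054b40e67214ff5f5447ce83dd'  # fails (41 chars)
```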

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ ocsf_emit!(event);
 - Always use [Conventional Commits](https://www.conventionalcommits.org/) format for commit messages
 - Format: `<type>(<scope>): <description>` (scope is optional)
 - Common types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `ci`, `perf`
-- Sign each commit for DCO compliance (`git commit -s`, or include a `Signed-off-by: Name <email>` trailer)
+- Sign off on each commit for DCO compliance. Use the `--signoff` option to `git commit` to add the `Signed-off-by` footer to ensure the user's configured email address is used.
 - Never mention Claude or any AI agent in commits (no author attribution, no Co-Authored-By, no references in commit messages)

 ## Pre-commit
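Reviewer sketch (not part of the commit): the revised AGENTS.md rule asks for a `Signed-off-by` footer added through `git commit --signoff`. A quick way to confirm that a commit message carries such a trailer is sketched below. `has_signoff` is a hypothetical helper name, and the pattern is a loose shape check rather than a full DCO validator.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the commit message on stdin contains a line
# of the form "Signed-off-by: Name <user@host>".
has_signoff() {
  grep -Eq '^Signed-off-by: .+ <[^<>@ ]+@[^<>@ ]+>$'
}

# Usage inside a Git checkout (left commented out):
#   git commit --signoff -m 'docs: update debug skill'
#   git log -1 --format=%B | has_signoff && echo 'DCO trailer present'
```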
