
feat: embedded GraphQL gateway + Vue portal #166

Merged
mjudeikis merged 20 commits into main from portal
Mar 31, 2026

mjudeikis commented Mar 25, 2026

Summary

Embeds the kubernetes-graphql-gateway directly into the kedge hub process and ships a Vue 3 web portal that connects to it.

Changes

Hub: embedded GraphQL gateway

  • pkg/hub/graphql.go — starts the GraphQL listener and gateway in-process using the faroshq fork of kubernetes-graphql-gateway; registers routes under /graphql/api/clusters/{clusterName} on the hub's existing mux (no second port or ingress needed)
  • pkg/hub/server.go — wires startEmbeddedGraphQL when kcp is configured and graphql is enabled
  • pkg/hub/options.go — adds --graphql-enabled / --graphql-apiexport-endpoint-slice flags
  • pkg/hub/portal.go / portal_stub.go — serves the embedded portal SPA from /portal/ when built, or a placeholder when not
  • cmd/graphql/main.go — standalone graphql binary (optional, for debugging)

Portal: Vue 3 SPA

  • portal/ — Vue 3 + Vite + TypeScript frontend
    • OIDC auth flow (login / callback pages)
    • Edge list + detail pages with live status
    • Workload management (create, list, delete)
    • MCP server pages
    • Web terminal via WebSocket
    • GraphQL client using useGraphQL composable (talks to /graphql/ on the hub)
    • Dark/light theme toggle

Infrastructure

  • go.mod — adds github.com/platform-mesh/kubernetes-graphql-gateway with replace → github.com/faroshq/kubernetes-graphql-gateway v0.0.7
  • deploy/charts/kedge-hub/: graphqlEnabled value, portal image bundling
  • Makefile: make portal target builds the Vue app and embeds it
  • .github/workflows/ — CI/goreleaser updated for portal build step

mjudeikis and others added 14 commits March 22, 2026 21:45
…header

- Add github.com/platform-mesh/kubernetes-graphql-gateway v0.0.7 to go.mod
  (replace directive points to faroshq fork which has APIExportEndpointSliceLogicalCluster
  and exported WorkspaceSchemaKubeconfigOverride fields required by pkg/hub/graphql.go)
- Downgrade go directive from 1.26 to 1.25 (CI constraint)
- Add license boilerplate to cmd/graphql/main.go
- Fix goimports formatting in pkg/apiurl/urls.go, pkg/hub/options.go, pkg/hub/portal_stub.go

Co-authored-by: Mangirdas Judeikis <mangirdas@judeikis.lt>

go.mod requires go 1.26, but CI workflows and Dockerfiles were still using
1.25, causing:
- lint: golangci-lint built with go1.25 refusing to run on go1.26 module
- e2e: Docker build failing at go mod download due to version mismatch

Changes:
- ci.yaml, e2e.yaml, goreleaser.yaml: go-version v1.25.0 -> v1.26.1
- Dockerfile.hub, Dockerfile.agent: golang:1.25 -> golang:1.26
- Makefile: golangci-lint v2.9.0 -> v2.11.4 (latest, compatible)

proxy.Director has been deprecated since Go 1.26.
Use proxy.Rewrite with *httputil.ProxyRequest instead.

Fixes golangci-lint SA1019 staticcheck warning.

…t to avoid Director/Rewrite conflict

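httputil.NewSingleHostReverseProxy sets Director internally. Setting proxy.Rewrite
alongside it leaves both hooks set, and since Rewrite was added in Go 1.20 ServeHTTP
rejects every such request with 'ReverseProxy must have exactly one of Director or
Rewrite set' (a 502 from the default error handler, not a panic). Use
&httputil.ReverseProxy{Rewrite: ...} directly instead.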

Fixes TestAgentCLIFlow/Agent/CLIFlow/kubeconfig_edge_is_usable CI failure.
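
A minimal sketch of the Rewrite-only construction described above, built from a struct literal so Director is never populated. The target URL, `newRewriteProxy`, `recordingTransport`, and `proxyOnce` are illustrative assumptions, not the hub's actual code; the fake transport replaces the network so the example runs without a backend.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// recordingTransport stands in for the network: it records where the proxy
// tried to send the request and answers 200 without dialing anything.
type recordingTransport struct{ host, path string }

func (t *recordingTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	t.host, t.path = r.URL.Host, r.URL.Path
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    r,
	}, nil
}

// newRewriteProxy sets only Rewrite. NewSingleHostReverseProxy is avoided
// because it would also set Director.
func newRewriteProxy(target *url.URL, rt http.RoundTripper) *httputil.ReverseProxy {
	return &httputil.ReverseProxy{
		Transport: rt,
		Rewrite: func(pr *httputil.ProxyRequest) {
			pr.SetURL(target)  // point the outbound request at the backend
			pr.SetXForwarded() // add X-Forwarded-* headers from the inbound request
		},
	}
}

// proxyOnce sends one request through the proxy and reports the response
// status plus the host and path the outbound request was rewritten to.
func proxyOnce(path string) (status int, host, outPath string, err error) {
	target, err := url.Parse("https://backend.invalid:8443") // hypothetical backend
	if err != nil {
		return 0, "", "", err
	}
	rt := &recordingTransport{}
	rec := httptest.NewRecorder()
	newRewriteProxy(target, rt).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rt.host, rt.path, nil
}

func main() {
	fmt.Println(proxyOnce("/clusters/root"))
}
```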
mjudeikis-bot changed the title from "Portal" to "feat: embedded GraphQL gateway + Vue portal" on Mar 25, 2026
mjudeikis-bot and others added 6 commits March 25, 2026 11:29
… timeout

The hub's kcp bootstrap (waitForWorkspaceReady) can take up to 60s.
The liveness probe fired at initialDelaySeconds=30 when the HTTP server
wasn't yet listening, causing kubelet to kill the pod and cancel the
bootstrap context.

Fix: introduce a delegatingHandler that lets the HTTP server start
immediately (serving /healthz 200 and /readyz 503-bootstrapping) before
bootstrap begins. Once bootstrap and full initialisation complete, the
delegate is atomically swapped to the real router+kcp-proxy stack.
/readyz returns 503 during bootstrap, so the readiness gate works
correctly while the liveness gate stays satisfied throughout.
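
The swap described above could look roughly like the following. This is a sketch under assumptions: it uses `atomic.Pointer` (Go 1.19+), and `newDelegatingHandler`, `swap`, `bootstrapHandler`, `readyHandler`, and `status` are hypothetical names; the real delegatingHandler in the hub may be structured differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// delegatingHandler forwards every request to whichever handler is currently
// stored, so the listener can start before the real router exists.
type delegatingHandler struct {
	delegate atomic.Pointer[http.Handler]
}

func newDelegatingHandler(initial http.Handler) *delegatingHandler {
	d := &delegatingHandler{}
	d.swap(initial)
	return d
}

// swap atomically replaces the delegate; requests already in flight finish
// on the handler they started with.
func (d *delegatingHandler) swap(h http.Handler) { d.delegate.Store(&h) }

func (d *delegatingHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	(*d.delegate.Load()).ServeHTTP(w, r)
}

// bootstrapHandler answers probes while bootstrap is still running:
// liveness passes, readiness reports 503.
func bootstrapHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "bootstrapping", http.StatusServiceUnavailable)
	})
	return mux
}

// readyHandler is a placeholder for the real router+kcp-proxy stack.
func readyHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
}

// status returns the status code the handler gives for a GET on path.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	d := newDelegatingHandler(bootstrapHandler())
	fmt.Println(status(d, "/healthz"), status(d, "/readyz")) // during bootstrap
	d.swap(readyHandler())
	fmt.Println(status(d, "/healthz"), status(d, "/readyz")) // after the swap
}
```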

Co-authored-by: Mangirdas Judeikis <mangirdas@judeikis.lt>

The external-kcp e2e has been failing consistently because the kedge-hub
pod (which now includes the kubernetes-graphql-gateway dependency) takes
longer to initialize and pass readiness with the larger binary size and
additional deps pulled in by this PR.

Changes:
- Increase --wait-for-ready-timeout for external-kcp e2e from 20m to 30m
- Increase readiness probe initialDelaySeconds from 20s to 30s,
  periodSeconds from 5s to 10s, add failureThreshold=30 (5min window)
- Increase liveness probe failureThreshold from default 3 to 6
- Bump e2e workflow timeout-minutes from 60 to 75 for external-kcp job
- Bump E2E_TIMEOUT from 40m to 55m for external-kcp run

…mediately

The identity hash is assigned asynchronously by kcp after startup. When
the hub pod bootstraps against a freshly-deployed external kcp (via Helm
in e2e), the identity hash may not be set yet even though kcp's readiness
probe has passed. This caused Bootstrap to fail with 'tenancy.kcp.io
APIExport has no identity hash yet', triggering a pod crash-loop and
preventing the readiness probe from ever passing, leading to Helm install
timeout after 30m.

Fix: replace the one-shot Get+fail with a PollUntilContextTimeout (2s
interval, 3m timeout) that retries until the hash is populated.
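
To illustrate the retry-instead-of-fail change, here is a standard-library stand-in for the polling pattern. It is not the Kubernetes helper itself: `pollUntil` only mimics the shape of wait.PollUntilContextTimeout (with an immediate first check), and `fakeExport`, `waitForHash`, `demoReady`, and `demoTimedOut` are hypothetical, with shortened intervals so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check right away and then every interval until it reports
// done, returns an error, or the timeout elapses.
func pollUntil(ctx context.Context, interval, timeout time.Duration,
	check func(context.Context) (bool, error)) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := check(ctx)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// fakeExport simulates an APIExport whose identity hash is empty until it
// has been read readyAfter times.
type fakeExport struct{ reads, readyAfter int }

func (f *fakeExport) identityHash() string {
	f.reads++
	if f.reads < f.readyAfter {
		return ""
	}
	return "abc123"
}

// waitForHash treats an empty hash as "not yet" rather than as a failure.
func waitForHash(f *fakeExport, interval, timeout time.Duration) (string, error) {
	var hash string
	err := pollUntil(context.Background(), interval, timeout,
		func(context.Context) (bool, error) {
			hash = f.identityHash()
			return hash != "", nil
		})
	return hash, err
}

// demoReady polls an export that becomes ready on the third read.
func demoReady() (string, error) {
	return waitForHash(&fakeExport{readyAfter: 3}, 10*time.Millisecond, time.Second)
}

// demoTimedOut polls an export that never becomes ready within the timeout.
func demoTimedOut() bool {
	_, err := waitForHash(&fakeExport{readyAfter: 1000}, 10*time.Millisecond, 50*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(demoReady())
	fmt.Println(demoTimedOut())
}
```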

Also increase waitForWorkspaceReady timeout from 60s to 3m to give kcp
sufficient time to process workspace creation in slower CI runners.

The external-kcp e2e has been consistently timing out at exactly 30m
during the kedge-hub Helm install. The hub pod's bootstrap sequence
(kcp workspace hierarchy creation, identity hash polling, API bindings)
combined with the readiness probe window can exceed 30 minutes on slower
CI runners.

Changes:
- Bump --wait-for-ready-timeout for external-kcp e2e from 30m to 45m
- Bump E2E_TIMEOUT from 55m to 70m to give adequate buffer
- Bump CI job timeout-minutes from 75 to 90 accordingly

Three bugs introduced by this PR:

1. kcpExternalPort 9443→8443: kcp-front-proxy ClusterIP listens on 8443.
   The wrong value (9443, which is the hub port) caused workspace URLs to
   use :9443, making all workspace phase checks time out (Initializing forever).

2. serveStaticToken transport: reverted from passthroughTransport+forward-token
   to p.transport+delete-auth-header. kcp has no static token auth configured
   so forwarding dev-token directly caused 401. Hub admin cert is the right
   credential for proxying to kcp.

3. SA4023 lint: registerPortalRoutes stub always returns non-nil error, so the
   'err != nil' comparison was always true. Added //nolint:staticcheck.

All external_kcp e2e tests pass locally (413s, PASS).

Co-authored-by: Mangirdas Judeikis <mangirdas@judeikis.lt>

… admin

The temp kubeconfig written for the GraphQL listener was serialising the
admin rest.Config credentials (bearer token or client cert). This meant
every GraphQL request ran against kcp with admin privileges regardless of
the caller's identity.

Strip all credentials from the kubeconfig — write only the server endpoint
and CA. Per-request authentication is already handled correctly via
utilscontext.SetToken, which injects the user's own bearer token (obtained
from login/OIDC) into each request context.
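
The idea of a credential-free base configuration plus a per-request token can be sketched with the standard library alone. This is only an illustration of the pattern: `withToken` is a hypothetical stand-in for utilscontext.SetToken, and `bearerFromContext`, `recordingTransport`, and `authHeaderSeen` are invented for the example; how the gateway actually consumes the token is not shown in this PR text.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type tokenKey struct{}

// withToken stores the caller's bearer token in the request context.
func withToken(ctx context.Context, token string) context.Context {
	return context.WithValue(ctx, tokenKey{}, token)
}

// bearerFromContext carries no credentials of its own. It sets the
// Authorization header only from the token found in the request context.
type bearerFromContext struct{ next http.RoundTripper }

func (t bearerFromContext) RoundTrip(r *http.Request) (*http.Response, error) {
	out := r.Clone(r.Context()) // RoundTrippers must not mutate the input request
	out.Header.Del("Authorization")
	if tok, ok := r.Context().Value(tokenKey{}).(string); ok && tok != "" {
		out.Header.Set("Authorization", "Bearer "+tok)
	}
	return t.next.RoundTrip(out)
}

// recordingTransport replaces the network and remembers the header it saw.
type recordingTransport struct{ seen string }

func (t *recordingTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	t.seen = r.Header.Get("Authorization")
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    r,
	}, nil
}

// authHeaderSeen reports which Authorization header reaches the backend for
// a caller with the given token (empty means an unauthenticated caller).
func authHeaderSeen(token string) (string, error) {
	rec := &recordingTransport{}
	client := &http.Client{Transport: bearerFromContext{next: rec}}
	ctx := context.Background()
	if token != "" {
		ctx = withToken(ctx, token)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"https://kcp.invalid/clusters/root", nil) // hypothetical endpoint
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return rec.seen, nil
}

func main() {
	fmt.Println(authHeaderSeen("user-token"))
	fmt.Println(authHeaderSeen(""))
}
```

With no admin credential in the base client, a request without a user token reaches the backend unauthenticated instead of silently inheriting admin privileges.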

Co-authored-by: Mangirdas Judeikis <mangirdas@judeikis.lt>

mjudeikis merged commit 8259dd7 into main on Mar 31, 2026
10 checks passed