Headline: 3-node Kubernetes (kind) HA deployment by default #25
dshapi
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
v1.0.1 — May 2026
This release moves dev from a single-node kind cluster to a
production-shaped HA topology that mirrors the prod target one-for-one.
A single
./deploy/scripts/bootstrap-cluster.shbrings up:kind. No worker nodes — control-plane taints lifted on dev so
application pods can schedule cluster-wide.
automatic failover, streaming replication, and a
spm-db-rwService that always points to the current primary.
Flink checkpoints and other large blobs. Replaces Longhorn.
AuthorizationPolicies per service, and a single ingress gateway at
https://aispm.local(port 443, browser-trusted via mkcert).runsc) RuntimeClass as the default for customer-uploadedagent pods — per-pod kernel sandboxing for untrusted code.
localhost:5001, mirrored into everykind node so dev image rebuilds land in seconds.
The chart applies in 6 phased tiers (infra → data → data-init →
platform → compute → frontend), serially, with auto-recovery from
immutable-PVC errors, failed Jobs, and stale registry state.
A single canonical seeder (
scripts/seed_all.py) populates models,posture history, integrations, cases, alerts, and policies via one
idempotent K8s Job.
Bootstrap & operability
./deploy/scripts/bootstrap-cluster.sh. Helperscripts (
kind-cluster.sh,kind-databases-ha.sh,install-gvisor.sh,build-images.sh) are now invoked frombootstrap; never directly.
FORCE_DESTROY/FORCE_KEEP/FORCE_CREATEenv overrides for CI./tmp/bootstrap-xtrace-<pid>.logon bash 4.1+.pod templates, stale Docker registry containers, and racing
cert-manager Certificate reconcile loops.
.envkeys (Postgres password, Anthropic, Tavily, Groq, etc.)auto-merge into
platform-secretsevery bootstrap — no morere-entering them in the Integrations UI after a rebuild.
TLS / WSS
aispm-tls-certificate.yamlnow gates oningress.certManager, so cert-manager doesn't fight thebootstrap's mkcert Secret in dev.
re-upserts the mkcert Secret, restarts
istio-ingressgateway,verifies the wire issuer is
mkcert development CA. SafariWebSocket Secure connections now work end-to-end with no manual
steps.
Schema drift fixes (alembic 010 / 011 / 012)
agent_kindenum +agents.kindcolumn.agents.riskmigrated fromrisk_leveltomodel_risk_tier;model_risk_tierextended withlow/medium/critical.agents.policy_statusmigrated frompolicy_statusenum topolicy_coverage; values mapped (covered→full).model_providerextended withaws,azure,gcp,internal.database states.
Resource sizing
spm-apiships with explicit2Gimemory limits (was inheritingthe namespace LimitRange's
512Miand getting OOMKilled every~8 minutes).
db-seedJob memory bumped to2Gi.macOS dev caveat
gVisor's
runscdoesn't work on Docker Desktop's Linuxkit kernel —sandbox init crashes.
values.dev.yamloverridesagentRuntime.runtimeClassNameto""so agent pods userunconMac. On a Linux dev host or in prod the gVisor sandbox is enforced
unchanged.
This discussion was created from the release Headline: 3-node Kubernetes (kind) HA deployment by default.
Beta Was this translation helpful? Give feedback.
All reactions