Releases: dshapi/AI-SPM
Headline: 3-node Kubernetes (kind) HA deployment by default
v1.0.1 — May 2026
This release moves dev from a single-node kind cluster to a
production-shaped HA topology that mirrors the prod target one-for-one.
A single ./deploy/scripts/bootstrap-cluster.sh brings up:
- 3 control-plane Kubernetes nodes running on Docker Desktop via
  kind. No worker nodes; control-plane taints are lifted on dev so
  application pods can schedule cluster-wide.
- CNPG (CloudNativePG) Postgres with 1 primary + 2 replicas,
  automatic failover, streaming replication, and an `aspm-db-rw`
  Service that always points to the current primary.
- Bitnami Redis Sentinel with 3 nodes for cache + session HA.
- MinIO distributed object storage (4 servers, 4 disks) backing
  Flink checkpoints and other large blobs. Replaces Longhorn.
- Istio service mesh with sidecar injection, mTLS PeerAuthentication,
  AuthorizationPolicies per service, and a single ingress gateway at
  https://aispm.local (port 443, browser-trusted via mkcert).
- gVisor (`runsc`) RuntimeClass as the default for customer-uploaded
  agent pods: per-pod kernel sandboxing for untrusted code.
- Local container registry at `localhost:5001`, mirrored into every
  kind node so dev image rebuilds land in seconds.
The chart applies in 6 phased tiers (infra → data → data-init →
platform → compute → frontend), serially, with auto-recovery from
immutable-PVC errors, failed Jobs, and stale registry state.
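The serial, phased apply with recovery described above can be pictured with a small Python sketch. Only the six tier names come from these notes; `apply_phases`, the `apply` and `recover` callbacks, and the retry limit are hypothetical stand-ins for whatever the bootstrap script actually does.

```python
from typing import Callable, List

# Tier names come from the release notes; everything else is illustrative.
PHASES = ["infra", "data", "data-init", "platform", "compute", "frontend"]


def apply_phases(
    apply: Callable[[str], None],
    recover: Callable[[str, Exception], bool],
    max_attempts: int = 3,
) -> List[str]:
    """Apply each tier in order; abort at the first tier that cannot be recovered."""
    applied: List[str] = []
    for phase in PHASES:
        for attempt in range(1, max_attempts + 1):
            try:
                apply(phase)
                applied.append(phase)
                break  # tier is up, move on to the next one
            except Exception as exc:
                # recover() returns True when it repaired something worth a retry
                # (for example, deleting a failed Job before re-applying).
                if attempt == max_attempts or not recover(phase, exc):
                    raise RuntimeError(f"phase {phase!r} failed: {exc}") from exc
    return applied
```

Because a later tier never starts before the earlier one succeeds, a failure in `data` leaves `platform` and everything after it untouched.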
A single canonical seeder (scripts/seed_all.py) populates models,
posture history, integrations, cases, alerts, and policies via one
idempotent K8s Job.
Bootstrap & operability
- One entry point: `./deploy/scripts/bootstrap-cluster.sh`. Helper
  scripts (`kind-cluster.sh`, `kind-databases-ha.sh`,
  `install-gvisor.sh`, `build-images.sh`) are now invoked from
  bootstrap; never directly.
- Cluster-destroy prompt at the start. `FORCE_DESTROY` /
  `FORCE_KEEP` / `FORCE_CREATE` env overrides for CI.
- Verbose-by-default: every step prints what it does. xtrace goes to
  `/tmp/bootstrap-xtrace-<pid>.log` on bash 4.1+.
- Phased apply runs serially to keep kind-on-Docker load manageable.
- Auto-recovery from immutable-PVC errors, failed/stuck Jobs, stuck
  pod templates, stale Docker registry containers, and racing
  cert-manager Certificate reconcile loops.
- `.env` keys (Postgres password, Anthropic, Tavily, Groq, etc.)
  auto-merge into `platform-secrets` every bootstrap; no more
  re-entering them in the Integrations UI after a rebuild.
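To make the `.env` auto-merge concrete, here is a rough Python sketch of one way such a merge could work. The function names, the key names in the usage note, and the rule that empty `.env` values never overwrite an existing secret are all assumptions for illustration; the bootstrap script's real behaviour may differ.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    result: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key.strip()] = value
    return result


def merge_secrets(existing: Dict[str, str], env: Dict[str, str]) -> Dict[str, str]:
    """Overlay non-empty .env values on the existing secret data (assumed rule)."""
    merged = dict(existing)
    merged.update({k: v for k, v in env.items() if v})
    return merged
```

For example, merging a `.env` that sets a hypothetical `GROQ_API_KEY` into an existing secret keeps every other key as it was and only adds or replaces that one entry.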
TLS / WSS
- Chart's `aispm-tls-certificate.yaml` now gates on
  `ingress.certManager`, so cert-manager doesn't fight the
  bootstrap's mkcert Secret in dev.
- Bootstrap actively repairs the WSS cert chain on every run:
  re-upserts the mkcert Secret, restarts `istio-ingressgateway`, and
  verifies the wire issuer is `mkcert development CA`. Safari
  WebSocket Secure connections now work end-to-end with no manual
  steps.
Schema drift fixes (alembic 010 / 011 / 012)
- New `agent_kind` enum + `agents.kind` column.
- `agents.risk` migrated from `risk_level` to `model_risk_tier`;
  `model_risk_tier` extended with `low` / `medium` / `critical`.
- `agents.policy_status` migrated from `policy_status` enum to
  `policy_coverage`; values mapped (`covered` → `full`).
- `model_provider` extended with `aws`, `azure`, `gcp`, `internal`.
- All migrations idempotent across fresh, hand-patched, and re-run
  database states.
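As an illustration of what "idempotent across fresh, hand-patched, and re-run states" means in practice, the sketch below uses SQLite from the Python standard library as a stand-in. The real migrations are alembic revisions against Postgres enums, which this does not reproduce; the table layout and the `upgrade` function are hypothetical, and only the `agents.kind` column and the `covered` → `full` mapping come from the notes above.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """Add agents.kind and remap policy_status; safe to run repeatedly."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(agents)")}
    if "kind" not in columns:
        # Skipped when a previous run (or a hand patch) already added it.
        conn.execute("ALTER TABLE agents ADD COLUMN kind TEXT")
    # Becomes a no-op once no 'covered' rows remain.
    conn.execute(
        "UPDATE agents SET policy_status = 'full' WHERE policy_status = 'covered'"
    )
    conn.commit()
```

The pattern is to inspect the current state before each change, so a second run finds nothing left to do instead of failing.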
Resource sizing
- `spm-api` ships with explicit `2Gi` memory limits (was inheriting
  the namespace LimitRange's `512Mi` and getting OOMKilled every
  ~8 minutes).
- `db-seed` Job memory bumped to `2Gi`.
macOS dev caveat
gVisor's runsc doesn't work on Docker Desktop's Linuxkit kernel —
sandbox init crashes. values.dev.yaml overrides
agentRuntime.runtimeClassName to "" so agent pods use runc on
Mac. On a Linux dev host or in prod the gVisor sandbox is enforced
unchanged.
AI-SPM v1.0.0 — AI Security Posture Management
Release date: 2026-04-25
Codename: "MCP"
First production release of the AI Security Posture Management platform. Customers
can now upload their own AI agents as a single Python file, deploy
them into sandboxed containers, and have them chat through the full
security pipeline — prompt-guard → policy decider → Kafka → output-guard
— with attached policies, conversation memory, web search, and a live
activity timeline visible in the admin UI.
Highlights
- End-to-end agent chat through the existing AI-SPM security pipeline,
  with attached per-agent policies enforced on every turn.
- Drop-in agent uploads. Operator drops in a single `agent.py`, the
  platform validates it, mints per-agent tokens, spawns a sandboxed
  Docker container, and routes traffic through Kafka. No custom image
  required for the five example agent shapes we ship.
- Provider-agnostic LLM proxy. Native dispatch for Anthropic and
  Ollama (both OpenAI-compatible and native modes); operators switch
  providers in the UI without restarts or code changes.
- Live observability. Every chat turn, web-search call, and LLM
  call emits a lineage event that lands in `session_events` and tails
  in the per-agent Activity tab in the admin UI within 5 seconds.
- DB-backed configuration. The agent SDK fetches its connection
  bundle from the controller at boot; no platform secrets in the
  agent's container env.
What's new
Agent runtime control plane
- `POST /api/spm/agents`: upload `agent.py` (multipart) with
  `deploy_after=true`. Validates syntax, top-level `async def main`,
  and dry-import; mints per-agent `mcp_token` + `llm_api_key`; creates
  the per-agent Kafka topics; spawns the runtime container; polls for
  the SDK's `aispm.ready()` handshake.
- `POST /api/spm/agents/{id}/start | /stop`: idempotent kick;
  UI surfaces a persistent "working…" spinner until the polled
  `runtime_state` actually changes.
- `DELETE /api/spm/agents/{id}`: stops the container, drops the
  topics, deletes the row.
- `POST /api/spm/agents/{id}/chat`: full pipeline, SSE response.
- `GET /api/spm/agents/{id}/bootstrap`: DB-backed SDK boot. The
  agent's container only needs three env vars (`AGENT_ID`,
  `MCP_TOKEN`, `CONTROLLER_URL`); everything else is fetched here.
- `GET /api/spm/agents/{id}/policies` + `PUT`: atomic-replace
  attach/detach. The chat handler reads `linked_policies` per turn
  and forwards them to OPA so policies can scope evaluation.
- `GET /api/spm/agents/{id}/activity`: unified timeline (chat
  turns + `AgentToolCall` + `AgentLLMCall`), newest-first, capped at
  200 rows. Polled by the Activity tab.
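The first two upload checks (syntax and a top-level `async def main`) can be approximated with Python's standard `ast` module. This is a sketch of the idea only: `validate_agent_source` and its error strings are hypothetical, and the dry-import step is not covered here.

```python
import ast
from typing import List


def validate_agent_source(source: str) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    has_main = any(
        isinstance(node, ast.AsyncFunctionDef) and node.name == "main"
        for node in tree.body  # module-level statements only, not nested defs
    )
    return [] if has_main else ["missing top-level 'async def main'"]
```

Parsing without executing means a malformed or hostile upload is rejected before any of its code runs; a plain `def main` or a `main` method inside a class does not satisfy the check.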
Agent-side SDK (agent_runtime/aispm)
- `aispm.ready()`: lifecycle handshake.
- `aispm.chat.subscribe()` / `reply()`: Kafka I/O. Consumer uses
  `auto_offset_reset="earliest"` so the very first message after deploy
  is never silently dropped during consumer-group join.
- `aispm.chat.history(session_id, limit)`: replay persisted turns;
  example agents use this for conversation memory across turns.
- `aispm.mcp.call("web_fetch", ...)`: JSON-RPC over HTTP to the MCP
  server; `web_fetch` is Tavily-backed.
- `aispm.llm.complete(messages=, model=…)`: OpenAI-compatible call
  through `spm-llm-proxy`; the SDK no longer pins a default model so
  the operator's chosen provider model wins.
- `aispm.get_secret(name)`: per-agent secret store.
- `aispm.log("step", trace=…)`: structured lineage line on stdout.
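Why the `auto_offset_reset` setting matters for the first message can be shown with a toy model of a single partition. This is a simplified simulation in plain Python, not Kafka client code; `first_poll` is a hypothetical helper that mimics where a consumer group with no committed offset starts reading.

```python
from typing import List


def first_poll(log: List[str], policy: str) -> List[str]:
    """Messages a group with no committed offset sees on its first poll.

    `log` is the partition content at the moment the consumer is assigned.
    """
    if policy == "earliest":
        start = 0         # begin at the oldest retained message
    elif policy == "latest":
        start = len(log)  # begin after everything already written
    else:
        raise ValueError(f"unknown policy: {policy}")
    return log[start:]
```

If a chat message is produced before the consumer finishes joining, `first_poll(["hello"], "latest")` returns nothing while `"earliest"` returns the message, which is the gap the SDK setting closes.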
Provider dispatch (spm-llm-proxy)
| `connector_type` | Endpoint | Auth header | Model source |
|---|---|---|---|
| `anthropic` | `{base_url}/v1/messages` | `x-api-key` + `anthropic-version: 2023-06-01` | integration model (payload `model` honoured only when it starts with `claude`) |
| `ollama` (`/v1`) | `{base_url}/chat/completions` (OpenAI-compatible) | none | payload `model` > integration model > `llama3.1:8b` fallback |
| `ollama` (other) | `{base_url}/api/chat` (native) | none | payload `model` > integration model > `llama3.1:8b` fallback |
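The routing rules in the table can be read as a small pure function. The sketch below is an interpretation, not the proxy's code: `resolve_route` is a hypothetical name, and it assumes the "`/v1`" row means a `base_url` that ends in `/v1`.

```python
from typing import Optional, Tuple

FALLBACK_OLLAMA_MODEL = "llama3.1:8b"


def resolve_route(
    connector_type: str,
    base_url: str,
    integration_model: Optional[str],
    payload_model: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (endpoint URL, model name) for one chat-completion request."""
    base = base_url.rstrip("/")
    if connector_type == "anthropic":
        # The payload model is honoured only when it names a claude model.
        if payload_model and payload_model.startswith("claude"):
            return f"{base}/v1/messages", payload_model
        return f"{base}/v1/messages", integration_model
    if connector_type == "ollama":
        model = payload_model or integration_model or FALLBACK_OLLAMA_MODEL
        # Assumption: a base URL ending in /v1 selects OpenAI-compatible mode.
        if base.endswith("/v1"):
            return f"{base}/chat/completions", model
        return f"{base}/api/chat", model
    raise ValueError(f"unsupported connector_type: {connector_type}")
```

Under these assumptions, an agent that still sends an Ollama model name keeps working after the operator switches to Anthropic, because the integration's model is used instead.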
Switching provider is a UI dropdown change on the AI-SPM Agent Runtime
Control Plane (MCP) integration row — no restart, no agent re-deploy.
Observability (AgentToolCallEvent, AgentLLMCallEvent)
- `spm-mcp` emits `AgentToolCallEvent` after every `web_fetch`,
  capturing tool name, args, ok/error, and `duration_ms`.
- `spm-llm-proxy` emits `AgentLLMCallEvent` after every chat-completion
  call (Anthropic and Ollama paths), capturing model, prompt and
  completion token counts, and ok/error.
- Both events publish to `cpm.global.lineage_events`. The existing
  `lineage_consumer` persists them into `session_events` automatically.
- Best-effort by design: a producer init failure never blocks the
  serving path. A `lineage_producer.send failed` warning is the only
  signal when Kafka is unreachable; chat keeps working.
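The best-effort behaviour can be sketched as a thin wrapper that swallows producer errors and logs a warning instead. `BestEffortProducer`, its lazy-initialisation strategy, and the `factory` callback are assumptions made for this example; only the warning text and the topic name come from the notes above.

```python
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger("lineage")

SendFn = Callable[[str, Dict[str, Any]], None]


class BestEffortProducer:
    """Emit lineage events without ever raising into the serving path."""

    def __init__(self, factory: Callable[[], SendFn]) -> None:
        self._factory = factory
        self._send: Optional[SendFn] = None

    def emit(self, topic: str, event: Dict[str, Any]) -> bool:
        try:
            if self._send is None:
                # Lazy init: a broker outage at startup is retried on later calls.
                self._send = self._factory()
            self._send(topic, event)
            return True
        except Exception as exc:
            log.warning("lineage_producer.send failed: %s", exc)
            return False
```

A caller can ignore the boolean result entirely; it exists only so tests or metrics can tell whether the event left the process.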
Admin UI
- Inventory → Agents tab lists live agents alongside mock rows
  with a runtime-state pip and risk tint.
- PreviewPanel (right-side panel on row click) carries the
  Run/Stop toggle, Open Chat, View Detail, and Delete asset
  actions.
- AgentChatPanel (300px inline panel) opens from PreviewPanel's
  Open Chat. Composer pinned to bottom (`min-h-0` + `max-h(100vh-120px)`
  so it can never be pushed off-screen by long chat history).
- AgentDetailDrawer (560px overlay) opens from PreviewPanel's
  View Detail button. Five tabs: Overview, Configure, Activity
  (live tail, polls every 5s), Sessions, Lineage.
- PolicySelector lets operators attach/detach policies on a live
  agent without leaving the panel.
- Add Integration modal: `enum_integration` fields render as real
  dropdowns of existing integrations (no more pasting UUIDs).
- Run/Stop button stays in a "working…" state until the next poll
  observes the actual runtime-state change.
Examples
A new top-level Example agents/ folder ships five
ready-to-deploy agents — one per agent_type enum value:
| File | `agent_type` | Demonstrates |
|---|---|---|
| `custom_agent.py` | `custom` | Bare-SDK happy path with `aispm.chat.history()` conversation memory and a strong web-search prompt. |
| `langchain_agent.py` | `langchain` | Off-the-shelf LangChain `AgentExecutor` + `@tool` calling our MCP / LLM proxies. |
| `llamaindex_agent.py` | `llamaindex` | LlamaIndex chat-engine routed through `aispm.llm`, with a hand-rolled retrieval fallback. |
| `autogpt_agent.py` | `autogpt` | Self-prompting plan → execute → reflect loop, capped at 3 hops. |
| `openai_assistant_agent.py` | `openai_assistant` | OpenAI Assistants-style request shape (system + user + tools), no framework. |
The runtime image now has langchain==0.3.*, langchain-openai==0.2.*,
llama-index-core==0.11.*, and llama-index-llms-openai-like==0.2.*
baked in, so langchain_agent.py and llamaindex_agent.py deploy
cleanly without bringing your own image.
Bug fixes
- `paused` agent immediately after deploy. The upload route's
  `_wait_for_ready` was reading a stale identity-mapped Agent row
  from its own SQLAlchemy session and timing out, then overwriting
  the (correctly running) row to `crashed`. Fixed with `db.expire_all()`
  on every poll iteration.
- First message after deploy silently dropped. The agent's Kafka
  consumer joined the group with the default
  `auto_offset_reset="latest"`, so any message produced between
  `aispm.ready()` flipping the row to `running` and the consumer
  registering with the broker was skipped. Fixed by switching to
  `earliest`.
- `Prompt blocked by safety guard. (S2)` on the literal word "yes".
  Three different code sites (two adapters and one module-level
  function injected via `guard_fn=`) had the same anti-pattern that
  forced `verdict=block` whenever any S1–S15 category appeared, even
  when the guard's own verdict was `allow`. Replaced with a length-based
  bypass for inputs under `GUARD_MIN_TEXT_LEN=8` chars and a
  score-threshold (`GUARD_BLOCK_SCORE=0.6`) gate on the
  category-escalation path.
- 502 `Load failed` on chat. The `agent_chat.py` SSE handler was
  importing `aiokafka` lazily but the package wasn't in `spm-api`'s
  requirements. Added the dep.
- 500 `ModuleNotFoundError: No module named 'services.spm_api'` in
  both `spm-llm-proxy` and `spm-mcp`. Both fell back to a brittle
  cross-service import. Inlined `_decode_secret` and dropped the
  cross-service registry lookup so each service is self-contained.
- `POST /v1/chat/completions` returning 500. The proxy was hardcoded
  to Ollama's `/api/chat` shape; pointing Default LLM at Anthropic
  produced a 404 from `api.anthropic.com`. Now branches on
  `connector_type` and translates request + response shape per provider.
- **
web_fetch...