Releases · dshapi/AI-SPM

05 May 20:31

dshapi

cb89f86

Headline: 3-node Kubernetes (kind) HA deployment by default

v1.0.1 — May 2026

This release moves dev from a single-node kind cluster to a
production-shaped HA topology that mirrors the prod target one-for-one.
A single run of ./deploy/scripts/bootstrap-cluster.sh brings up:

3 control-plane Kubernetes nodes running on Docker Desktop via
kind. No worker nodes — control-plane taints lifted on dev so
application pods can schedule cluster-wide.
CNPG (CloudNativePG) Postgres with 1 primary + 2 replicas,
automatic failover, streaming replication, and a spm-db-rw
Service that always points to the current primary.
Bitnami Redis Sentinel with 3 nodes for cache + session HA.
MinIO distributed object storage (4 servers, 4 disks) backing
Flink checkpoints and other large blobs. Replaces Longhorn.
Istio service mesh with sidecar injection, mTLS PeerAuthentication,
AuthorizationPolicies per service, and a single ingress gateway at
https://aispm.local (port 443, browser-trusted via mkcert).
gVisor (runsc) RuntimeClass as the default for customer-uploaded
agent pods — per-pod kernel sandboxing for untrusted code.
Local container registry at localhost:5001, mirrored into every
kind node so dev image rebuilds land in seconds.

The chart applies in 6 phased tiers (infra → data → data-init →
platform → compute → frontend), serially, with auto-recovery from
immutable-PVC errors, failed Jobs, and stale registry state.
A single canonical seeder (scripts/seed_all.py) populates models,
posture history, integrations, cases, alerts, and policies via one
idempotent K8s Job.
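The serial, phased apply can be pictured as a simple loop. The sketch below is illustrative only: the tier names are taken from the notes above, while `apply_tier`, `recover`, and the two-attempt retry policy are hypothetical stand-ins and not the bootstrap script's actual logic.

```python
from typing import Callable

# Tier order as listed in the release notes.
TIERS = ["infra", "data", "data-init", "platform", "compute", "frontend"]


def apply_phased(
    apply_tier: Callable[[str], bool],
    recover: Callable[[str], None],
    max_attempts: int = 2,
) -> list[str]:
    """Apply each tier serially; on failure run a recovery hook and retry.

    Returns the tiers that applied successfully. Stops at the first tier
    that still fails after `max_attempts` attempts.
    """
    applied: list[str] = []
    for tier in TIERS:
        for attempt in range(1, max_attempts + 1):
            if apply_tier(tier):
                applied.append(tier)
                break
            if attempt < max_attempts:
                # Hypothetical hook, e.g. delete a failed Job before retrying.
                recover(tier)
        else:
            # Every attempt failed; later tiers depend on this one, so stop.
            return applied
    return applied
```

Running the tiers one at a time trades total wall-clock time for a lower peak load, which matches the stated goal of keeping kind-on-Docker manageable.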

Bootstrap & operability

One entry point: ./deploy/scripts/bootstrap-cluster.sh. Helper
scripts (kind-cluster.sh, kind-databases-ha.sh,
install-gvisor.sh, build-images.sh) are now invoked only by
bootstrap and should never be run directly.
Cluster-destroy prompt at the start. FORCE_DESTROY /
FORCE_KEEP / FORCE_CREATE env overrides for CI.
Verbose-by-default: every step prints what it does. xtrace goes to
/tmp/bootstrap-xtrace-<pid>.log on bash 4.1+.
Phased apply runs serially to keep kind-on-Docker load manageable.
Auto-recovery from immutable-PVC errors, failed/stuck Jobs, stuck
pod templates, stale Docker registry containers, and racing
cert-manager Certificate reconcile loops.
.env keys (Postgres password, Anthropic, Tavily, Groq, etc.)
auto-merge into platform-secrets every bootstrap — no more
re-entering them in the Integrations UI after a rebuild.

TLS / WSS

Chart's aispm-tls-certificate.yaml now gates on
ingress.certManager, so cert-manager doesn't fight the
bootstrap's mkcert Secret in dev.
Bootstrap actively repairs the WSS cert chain on every run —
re-upserts the mkcert Secret, restarts istio-ingressgateway,
verifies that the issuer served on the wire is the mkcert development CA. Safari
WebSocket Secure connections now work end-to-end with no manual
steps.

Schema drift fixes (alembic 010 / 011 / 012)

New agent_kind enum + agents.kind column.
agents.risk migrated from risk_level to model_risk_tier;
model_risk_tier extended with low/medium/critical.
agents.policy_status migrated from policy_status enum to
policy_coverage; values mapped (covered → full).
model_provider extended with aws, azure, gcp, internal.
All migrations idempotent across fresh, hand-patched, and re-run
database states.
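One common way to make an enum extension safe to re-run on PostgreSQL is `ALTER TYPE ... ADD VALUE IF NOT EXISTS`. The helper below is a minimal sketch of that idea and only builds the SQL strings; `add_enum_values_sql` is a hypothetical name, and the notes do not say whether migrations 010 to 012 use this exact statement.

```python
def add_enum_values_sql(enum_name: str, values: list[str]) -> list[str]:
    """Build PostgreSQL statements that extend an enum and are safe to re-run.

    `IF NOT EXISTS` turns the statement into a no-op when the value is
    already present, e.g. on a hand-patched or previously migrated database.
    """
    statements = []
    for value in values:
        escaped = value.replace("'", "''")  # escape single quotes in the literal
        statements.append(
            f"ALTER TYPE {enum_name} ADD VALUE IF NOT EXISTS '{escaped}'"
        )
    return statements


# The model_provider extension described above, expressed with the helper.
for stmt in add_enum_values_sql("model_provider", ["aws", "azure", "gcp", "internal"]):
    print(stmt)
```

Inside an Alembic migration, each string could be passed to `op.execute(...)`.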

Resource sizing

spm-api ships with explicit 2Gi memory limits (was inheriting
the namespace LimitRange's 512Mi and getting OOMKilled every
~8 minutes).
db-seed Job memory bumped to 2Gi.

macOS dev caveat

gVisor's runsc doesn't work on Docker Desktop's Linuxkit kernel —
sandbox init crashes. values.dev.yaml overrides
agentRuntime.runtimeClassName to "" so agent pods use runc on
Mac. On a Linux dev host or in prod the gVisor sandbox is enforced
unchanged.
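The effect of the empty override can be shown with a small sketch. It assumes the chart only sets the pod's `runtimeClassName` field when the value is non-empty; the function name and the `"gvisor"` RuntimeClass name are illustrative assumptions, not taken from the chart.

```python
def agent_pod_spec(image: str, runtime_class_name: str = "gvisor") -> dict:
    """Build a minimal pod spec for an agent container.

    When `runtime_class_name` is empty the field is omitted entirely, so
    the node's default runtime handler (runc on a stock kind node) is used.
    """
    spec = {"containers": [{"name": "agent", "image": image}]}
    if runtime_class_name:
        spec["runtimeClassName"] = runtime_class_name
    return spec


# Linux host or prod: sandboxed. macOS dev (override set to ""): plain runc.
sandboxed = agent_pod_spec("localhost:5001/agent:dev")
mac_dev = agent_pod_spec("localhost:5001/agent:dev", runtime_class_name="")
```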


25 Apr 19:00

dshapi

#HelpNeeded

c5c6b2b

AI-SPM v1.0.0 — AI Security Posture Management

Release date: 2026-04-25
Codename: "MCP"

First production release of AI Security Posture Management (AI-SPM). Customers
can now upload their own AI agents as a single Python file, deploy
them into sandboxed containers, and have them chat through the full
security pipeline — prompt-guard → policy decider → Kafka → output-guard
— with attached policies, conversation memory, web search, and a live
activity timeline visible in the admin UI.

Highlights

End-to-end agent chat through the existing AI-SPM security pipeline,
with attached per-agent policies enforced on every turn.
Drop-in agent uploads. Operator drops in a single agent.py, the
platform validates it, mints per-agent tokens, spawns a sandboxed
Docker container, and routes traffic through Kafka. No custom image
required for the five example agent shapes we ship.
Provider-agnostic LLM proxy. Native dispatch for Anthropic and
Ollama (both OpenAI-compatible and native modes); operators switch
providers in the UI without restarts or code changes.
Live observability. Every chat turn, web-search call, and LLM
call emits a lineage event that lands in session_events and tails
in the per-agent Activity tab in the admin UI within 5 seconds.
DB-backed configuration. The agent SDK fetches its connection
bundle from the controller at boot — no platform secrets in the
agent's container env.

What's new

Agent runtime control plane

POST /api/spm/agents — upload agent.py (multipart) with
deploy_after=true. Validates syntax, top-level async def main,
and dry-import; mints per-agent mcp_token + llm_api_key; creates
the per-agent Kafka topics; spawns the runtime container; polls for
the SDK's aispm.ready() handshake.
POST /api/spm/agents/{id}/start | /stop — idempotent kick;
UI surfaces a persistent "working…" spinner until the polled
runtime_state actually changes.
DELETE /api/spm/agents/{id} — stops the container, drops the
topics, deletes the row.
POST /api/spm/agents/{id}/chat — full pipeline, SSE response.
GET /api/spm/agents/{id}/bootstrap — DB-backed SDK boot. The
agent's container only needs three env vars (AGENT_ID,
MCP_TOKEN, CONTROLLER_URL); everything else is fetched here.
GET /api/spm/agents/{id}/policies + PUT — atomic-replace
attach/detach. The chat handler reads linked_policies per turn
and forwards them to OPA so policies can scope evaluation.
GET /api/spm/agents/{id}/activity — unified timeline (chat
turns + AgentToolCall + AgentLLMCall), newest-first, capped at
200 rows. Polled by the Activity tab.
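The first two upload checks (syntax and a top-level `async def main`) can be approximated with the standard-library `ast` module. This is a sketch under that assumption; `validate_agent_source` is a hypothetical name, and the dry-import step is not reproduced here.

```python
import ast


def validate_agent_source(source: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    try:
        tree = ast.parse(source, filename="agent.py")
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    # Only module-level statements count; a nested or method `main` does not.
    has_async_main = any(
        isinstance(node, ast.AsyncFunctionDef) and node.name == "main"
        for node in tree.body
    )
    if not has_async_main:
        return ["missing top-level `async def main`"]
    return []
```

Parsing with `ast` never executes the uploaded file, so this part of the validation is safe to run before the agent is sandboxed.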

Agent-side SDK (`agent_runtime/aispm`)

aispm.ready() — lifecycle handshake.
aispm.chat.subscribe() / reply() — Kafka I/O. Consumer uses
auto_offset_reset="earliest" so the very first message after deploy
is never silently dropped during consumer-group join.
aispm.chat.history(session_id, limit) — replay persisted turns;
example agents use this for conversation memory across turns.
aispm.mcp.call("web_fetch", ...) — JSON-RPC over HTTP to the MCP
server; web_fetch is Tavily-backed.
aispm.llm.complete(messages=, model=…) — OpenAI-compatible call
through spm-llm-proxy; the SDK no longer pins a default model so
the operator's chosen provider model wins.
aispm.get_secret(name) — per-agent secret store.
aispm.log("step", trace=…) — structured lineage line on stdout.

Provider dispatch (spm-llm-proxy)

| `connector_type` | Endpoint | Auth header | Model source |
| --- | --- | --- | --- |
| `anthropic` | `{base_url}/v1/messages` | `x-api-key` + `anthropic-version: 2023-06-01` | integration `model` (payload `model` honoured only when it starts with `claude`) |
| `ollama` (`/v1`) | `{base_url}/chat/completions` (OpenAI-compatible) | none | payload `model` > integration `model` > `llama3.1:8b` fallback |
| `ollama` (other) | `{base_url}/api/chat` (native) | none | payload `model` > integration `model` > `llama3.1:8b` fallback |

Switching provider is a UI dropdown change on the AI-SPM Agent Runtime
Control Plane (MCP) integration row — no restart, no agent re-deploy.
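The dispatch rules in the table can be read as a small pure function. The sketch below is one interpretation, not the proxy's code: `resolve_dispatch` is a hypothetical name, and it assumes that the "`/v1`" Ollama row means a `base_url` ending in `/v1`. Auth headers are left out.

```python
from typing import Optional

FALLBACK_OLLAMA_MODEL = "llama3.1:8b"


def resolve_dispatch(
    connector_type: str,
    base_url: str,
    integration_model: Optional[str],
    payload_model: Optional[str] = None,
) -> tuple[str, Optional[str]]:
    """Return (endpoint URL, model name) for one chat-completion request."""
    base = base_url.rstrip("/")
    if connector_type == "anthropic":
        # The payload model wins only when it looks like a Claude model.
        if payload_model and payload_model.startswith("claude"):
            return f"{base}/v1/messages", payload_model
        return f"{base}/v1/messages", integration_model
    if connector_type == "ollama":
        # Assumption: a base_url ending in /v1 selects OpenAI-compatible mode.
        path = "/chat/completions" if base.endswith("/v1") else "/api/chat"
        model = payload_model or integration_model or FALLBACK_OLLAMA_MODEL
        return f"{base}{path}", model
    raise ValueError(f"unsupported connector_type: {connector_type!r}")
```

Because the function is evaluated per request, a changed integration row takes effect on the next call, which is consistent with the no-restart behaviour described above.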

Observability (`AgentToolCallEvent`, `AgentLLMCallEvent`)

spm-mcp emits AgentToolCallEvent after every web_fetch,
capturing tool name, args, ok/error, and duration_ms.
spm-llm-proxy emits AgentLLMCallEvent after every chat-completion
call (Anthropic and Ollama paths), capturing model, prompt and
completion token counts, and ok/error.
Both events publish to cpm.global.lineage_events. The existing
lineage_consumer persists them into session_events automatically.
Best-effort by design: a producer init failure never blocks the
serving path. A lineage_producer.send failed warning is the only
signal when Kafka is unreachable; chat keeps working.
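The best-effort behaviour can be sketched as a thin wrapper that swallows producer errors. This is an illustration, not the shipped code: `BestEffortLineage` and `producer_factory` are hypothetical names, and the producer is assumed to expose a `send(topic, event)` method. The topic name and the warning text come from the notes above.

```python
import logging

logger = logging.getLogger("lineage")

LINEAGE_TOPIC = "cpm.global.lineage_events"


class BestEffortLineage:
    """Wrap a producer so lineage failures never reach the serving path."""

    def __init__(self, producer_factory):
        try:
            self._producer = producer_factory()
        except Exception:
            # Init failure is tolerated; emit() will warn on each attempt.
            self._producer = None

    def emit(self, topic: str, event: dict) -> bool:
        """Try to send one event; return False instead of raising."""
        try:
            if self._producer is None:
                raise RuntimeError("producer unavailable")
            self._producer.send(topic, event)
            return True
        except Exception:
            logger.warning("lineage_producer.send failed", exc_info=True)
            return False
```

A caller would invoke `emit(LINEAGE_TOPIC, {...})` after serving the response and ignore the return value, so chat keeps working when Kafka is down.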

Admin UI

Inventory → Agents tab lists live agents alongside mock rows
with a runtime-state pip and risk tint.
PreviewPanel (right-side panel on row click) carries the
Run/Stop toggle, Open Chat, View Detail, and Delete asset
actions.
AgentChatPanel (300px inline panel) opens from PreviewPanel's
Open Chat. Composer pinned to bottom (min-h-0 + max-h(100vh-120px)
so it can never be pushed off-screen by long chat history).
AgentDetailDrawer (560px overlay) opens from PreviewPanel's
View Detail button. Five tabs: Overview, Configure, Activity
(live tail, polls every 5s), Sessions, Lineage.
PolicySelector lets operators attach/detach policies on a live
agent without leaving the panel.
Add Integration modal: enum_integration fields render as real
dropdowns of existing integrations (no more pasting UUIDs).
Run/Stop button stays in a "working…" state until the next poll
observes the actual runtime-state change.

Examples

A new top-level Example agents/ folder ships five
ready-to-deploy agents — one per agent_type enum value:

| File | `agent_type` | Demonstrates |
| --- | --- | --- |
| `custom_agent.py` | `custom` | Bare-SDK happy path with `aispm.chat.history()` conversation memory and a strong web-search prompt. |
| `langchain_agent.py` | `langchain` | Off-the-shelf LangChain `AgentExecutor` + `@tool` calling our MCP / LLM proxies. |
| `llamaindex_agent.py` | `llamaindex` | LlamaIndex chat-engine routed through `aispm.llm`, with a hand-rolled retrieval fallback. |
| `autogpt_agent.py` | `autogpt` | Self-prompting plan → execute → reflect loop, capped at 3 hops. |
| `openai_assistant_agent.py` | `openai_assistant` | OpenAI Assistants-style request shape (system + user + tools), no framework. |

The runtime image now has langchain==0.3.*, langchain-openai==0.2.*,
llama-index-core==0.11.*, and llama-index-llms-openai-like==0.2.*
baked in, so langchain_agent.py and llamaindex_agent.py deploy
cleanly without bringing your own image.

Bug fixes

paused agent immediately after deploy. The upload route's
_wait_for_ready was reading a stale identity-mapped Agent row
from its own SQLAlchemy session and timing out, then overwriting
the (correctly running) row to crashed. Fixed with db.expire_all()
on every poll iteration.
First message after deploy silently dropped. The agent's Kafka
consumer joined the group with the default auto_offset_reset="latest",
so any message produced between aispm.ready() flipping the row to
running and the consumer registering with the broker was skipped.
Fixed by switching to earliest.
Prompt blocked by safety guard. (S2) on the literal word "yes".
Three different code sites (two adapters and one module-level
function injected via guard_fn=) had the same anti-pattern that
forced verdict=block whenever any S1–S15 category appeared, even
when the guard's own verdict was allow. Replaced with a length-based
bypass for inputs under GUARD_MIN_TEXT_LEN=8 chars and a
score-threshold (GUARD_BLOCK_SCORE=0.6) gate on the
category-escalation path.
502 Load failed on chat. The agent_chat.py SSE handler was
importing aiokafka lazily but the package wasn't in spm-api's
requirements. Added the dep.
500 ModuleNotFoundError: No module named 'services.spm_api' in
both spm-llm-proxy and spm-mcp. Both fell back to a brittle
cross-service import. Inlined _decode_secret and dropped the
cross-service registry lookup so each service is self-contained.
POST /v1/chat/completions returning 500. The proxy was hardcoded
to Ollama's /api/chat shape; pointing Default LLM at Anthropic
produced a 404 from api.anthropic.com. Now branches on
connector_type and translates request + response shape per provider.
**web_fetch...
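The safety-guard fix in the list above (length bypass plus score gate) can be sketched as a single decision function. The two constants and their values come from the notes; everything else is an assumption made for illustration: the function name, the `>=` comparison, the idea that a higher score means higher risk, and that short inputs skip the guard result entirely.

```python
GUARD_MIN_TEXT_LEN = 8
GUARD_BLOCK_SCORE = 0.6


def final_verdict(text: str, guard_verdict: str, categories: list[str], score: float) -> str:
    """Combine the guard's own verdict with category escalation."""
    # Very short inputs such as "yes" bypass the guard result entirely.
    if len(text) < GUARD_MIN_TEXT_LEN:
        return "allow"
    # The guard's own block verdict is always respected.
    if guard_verdict == "block":
        return "block"
    # A flagged category alone no longer forces a block; the score must
    # also reach the threshold.
    if categories and score >= GUARD_BLOCK_SCORE:
        return "block"
    return "allow"
```

Under these assumptions, `final_verdict("yes", "allow", ["S2"], 0.9)` returns `"allow"`, which is the case the original bug got wrong.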
