stage0-agent-runtime-guard is a customer-facing proof-of-value demo for SignalPulse, showing why AI agents need a runtime policy authority before they can publish, deploy, keep retrying autonomously, or otherwise create side effects.
This repository is intentionally lightweight:
- a small autonomous agent
- a Stage0 API client aligned with the current hosted runtime contract
- side-by-side guarded vs unguarded demos
- customer-oriented scenarios you can show in a sales call, pilot, or internal evaluation
For the full product surface (dashboards, billing flows, API-key lifecycle, and the hosted runtime), see the broader SignalPulse app. This repo is the shortest path to understanding the core runtime-guard value proposition.
Most teams already know how to make an agent produce text. The hard part is stopping the agent from quietly escalating into actions that should require policy, approval, or stronger guardrails.
Without a runtime guard, an agent can:
- turn research into advice
- turn drafting into publishing
- turn analysis into deployment
- turn "help me think" into "I already executed it"
Stage0 addresses that by validating execution intent before the action happens and returning an external verdict: ALLOW, DENY, or DEFER.
The current hosted contract also carries runtime metadata such as:
- decision (GO, NO_GO, DEFER, ERROR)
- request_id
- policy_pack_version / policy_version
- runtime context fields for approvals, roles, environments, channels, and loop state
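As a rough illustration of the metadata listed above, a response could be modeled as follows. The class name `GuardResponse`, the helper `is_blocking`, and the sample values are hypothetical; only the field names and decision values come from the list above, and the actual client in stage0/ may structure this differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Decision values named in the hosted contract list above.
DECISIONS = {"GO", "NO_GO", "DEFER", "ERROR"}


@dataclass
class GuardResponse:
    """Hypothetical container for the runtime metadata fields."""

    decision: str
    request_id: str
    policy_version: str
    context: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the documented decision set.
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")

    def is_blocking(self) -> bool:
        # Assumption: anything other than GO stops the step.
        return self.decision != "GO"


sample = GuardResponse(
    decision="DEFER",
    request_id="req-123",         # made-up identifier
    policy_version="example-v1",  # made-up version string
    context={"environment": "production", "approval_status": "pending"},
)
print(sample.is_blocking())
```

Treating ERROR as blocking is a fail-closed choice made for this sketch, not something the contract list above states.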
This repo demonstrates the boundary between:
- useful bounded assistance
- unsafe autonomous escalation
The demo intentionally compares two modes:
- WITHOUT Stage0: The agent executes every planned step, including higher-risk steps that go beyond the user's original request.
- WITH Stage0: The agent still completes safe informational work, but Stage0 denies or defers steps that attempt to publish, deploy, or keep looping without explicit guardrails.
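The difference between the two modes can be sketched with a toy executor. Everything here is hypothetical (`PLAN`, `toy_guard`, and `run` are illustrative names, and the guard is a local stand-in rather than the real Stage0 policy logic); the point is only that the guard is consulted before each step runs.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Optional[str]]  # (description, side effect or None)

# Hypothetical plan: two informational steps, then an escalation.
PLAN: List[Step] = [
    ("research frameworks", None),
    ("draft summary", None),
    ("publish summary", "publish"),
]


def toy_guard(step: Step) -> str:
    """Stand-in for an external policy check; not the real Stage0 rules."""
    _, side_effect = step
    return "DENY" if side_effect in {"publish", "deploy"} else "ALLOW"


def run(plan: List[Step], guard: Optional[Callable[[Step], str]] = None) -> List[str]:
    """Execute steps in order, skipping any step the guard does not allow."""
    executed = []
    for step in plan:
        if guard is not None and guard(step) != "ALLOW":
            continue  # blocked before any side effect happens
        executed.append(step[0])
    return executed


unguarded = run(PLAN)
guarded = run(PLAN, guard=toy_guard)
print("unguarded:", unguarded)
print("guarded:  ", guarded)
```

In this sketch the guarded run still completes the research and drafting steps and only drops the publish step.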
- An external runtime guard can block escalation before it happens
- Safe work (research, analysis, drafting) still proceeds
- The agent cannot self-approve sensitive actions
- High-risk side effects (publish, deploy) require explicit guardrails
- Runtime context (approvals, environment, roles) shapes policy decisions
- This is a demo, not a production integration
- The agent is simulated; real agents may behave differently
- Policy decisions depend on your Stage0 configuration
- This does not cover all possible escalation vectors
- Live API behavior may differ from simulated mode
This repository now includes four scenarios designed to map to common buyer conversations:
- frameworks: Research assistant for AI product teams. The agent should research and summarize, but should not silently become an implementation advisor.
- policy_publish: Customer-facing content publisher. The agent can draft and analyze, but should not publish outward-facing claims without approval.
- deployment: Production ops assistant. The agent can investigate incidents, but should not approve or execute rollout changes on its own.
- agent_loops: Runtime loop guard. The agent can inspect retry history and prepare an operator handoff, but should not keep extending its own retry budget or repeat sensitive tool paths unattended.
These scenarios are useful for:
- AI SaaS founders
- internal tools teams
- platform and DevOps teams
- buyers evaluating runtime safety controls for agents
- Python 3.10 or newer
- A Stage0 API key from SignalPulse if you want live policy decisions
git clone https://github.com/Starlight143/stage0-agent-runtime-guard.git
cd stage0-agent-runtime-guard
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
Then set:
STAGE0_API_KEY=your_api_key_here
STAGE0_BASE_URL=https://api.signalpulse.org
If no API key is configured, the guarded demo falls back to simulated Stage0 responses so you can still demonstrate the control flow.
python run_demo.py --scenario frameworks
python run_demo.py --scenario frameworks --concise --auto
The --concise flag shows a clean side-by-side comparison of guarded vs unguarded execution, which suits sales calls and presentations.
python run_demo.py --scenario frameworks --concise --simulated --auto
The --simulated flag forces simulated Stage0 responses even with an API key configured, ensuring deterministic output for recordings and CI.
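The choice between live and simulated responses described above might reduce to a check like the one below. This is a sketch, not the logic in run_demo.py: the function name `select_mode` and its `force_simulated` parameter are hypothetical, and only the `STAGE0_API_KEY` variable name comes from this README.

```python
import os
from typing import Mapping


def select_mode(env: Mapping[str, str], force_simulated: bool = False) -> str:
    """Return "live" only when a non-empty key is set and simulation is not forced."""
    api_key = env.get("STAGE0_API_KEY", "").strip()
    if force_simulated or not api_key:
        return "simulated"
    return "live"


# The result depends on whatever is set in the current environment.
print(select_mode(os.environ))
```

Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.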
python run_demo.py --scenario deployment --auto
python run_demo.py --scenario all --auto
Available scenario keys: frameworks, policy_publish, deployment, agent_loops
Pre-generated sample outputs are available in artifacts/:
- artifacts/frameworks.txt: Research Assistant scenario
- artifacts/policy_publish.txt: Content Publisher scenario
- artifacts/deployment.txt: Production Ops scenario
- artifacts/agent_loops.txt: Loop Guard scenario
- artifacts/all-scenarios.txt: All scenarios combined
This demo repo now includes lightweight customer-facing reference docs derived from the main product:
- docs/runtime-contract.md
- docs/reference-scenarios.md
- docs/service-overview.md
For each scenario, you will see:
- an unguarded run
- a guarded run
- the exact steps that Stage0 denied or deferred
- a summary of why those denied steps matter commercially and operationally
The important observation is not "the agent was blocked." The real value is:
- safe work still proceeds
- risky work is denied or deferred before execution
- the guard is external to the agent
DEFER is especially useful when the agent is drifting into loop-like behavior, under-specified follow-up actions, or situations that need a human checkpoint instead of a silent retry.
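One way to act on the three verdicts is sketched below, with DEFER routed to a human checkpoint instead of a retry. The `Verdict` enum here is a local stand-in defined for this example (the repo's own `Verdict` lives in stage0.client), and `handle` and `review_queue` are hypothetical names.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    """Local stand-in for the three verdicts named in this README."""

    ALLOW = "ALLOW"
    DENY = "DENY"
    DEFER = "DEFER"


def handle(step: str, verdict: Verdict, review_queue: List[str]) -> str:
    """Map a verdict to an outcome; deferred steps wait for a human."""
    if verdict is Verdict.ALLOW:
        return "executed"
    if verdict is Verdict.DEFER:
        review_queue.append(step)  # human checkpoint, no silent retry
        return "awaiting_human"
    return "blocked"


queue: List[str] = []
print(handle("summarize incident", Verdict.ALLOW, queue))
print(handle("retry rollout again", Verdict.DEFER, queue))
print(handle("publish claims", Verdict.DENY, queue))
print(queue)
```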
The minimum integration is intentionally small:
from stage0 import Stage0Client
from stage0.client import Verdict

client = Stage0Client()

# Ask Stage0 for a verdict before the side effect runs.
response = client.check_goal(
    goal="Publish the weekly changelog",
    success_criteria=["Post to the public changelog"],
    constraints=["human approval required"],
    tools=["shell"],
    side_effects=["publish"],
    # Runtime context that shapes the policy decision.
    context={
        "actor_role": "platform_admin",
        "approval_status": "approved",
        "environment": "production",
        "request_channel": "dashboard",
    },
)

# Anything other than ALLOW (DENY or DEFER) stops the action.
if response.verdict != Verdict.ALLOW:
    raise RuntimeError(response.reason)

In a real implementation, you should validate every execution step, not just the top-level task. The client in this repo now supports both the request context object and richer response fields like decision, request_id, policy_version, guardrail_checks, and decision_trace_summary.
For loop-style workloads, the same pattern should be applied to runtime state as well. In practice that means passing stable execution identifiers and constraints such as retry budgets, elapsed-time ceilings, or repeated-tool thresholds into the policy layer before the agent decides to continue.
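To make the loop-state idea concrete, the sketch below builds a context dictionary from retry and timing state before the agent continues. All names and keys (`LoopState`, `loop_context`, `retry_budget_remaining`, and so on) are assumptions for illustration, not the hosted contract's field names, and `toy_policy` is a local stand-in for the external policy layer.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class LoopState:
    """Hypothetical runtime state tracked across retries."""

    run_id: str  # stable execution identifier
    attempt: int
    max_attempts: int
    elapsed_seconds: float
    max_elapsed_seconds: float


def loop_context(state: LoopState) -> Dict[str, Any]:
    """Build the context to send with a policy check (key names are assumptions)."""
    return {
        "run_id": state.run_id,
        "attempt": state.attempt,
        "retry_budget_remaining": max(state.max_attempts - state.attempt, 0),
        "over_time_ceiling": state.elapsed_seconds > state.max_elapsed_seconds,
    }


def toy_policy(ctx: Dict[str, Any]) -> str:
    """Stand-in for the external verdict; the real decision belongs to the policy layer."""
    if ctx["retry_budget_remaining"] == 0 or ctx["over_time_ceiling"]:
        return "DEFER"
    return "ALLOW"


early = LoopState("run-1", attempt=1, max_attempts=3,
                  elapsed_seconds=12.0, max_elapsed_seconds=300.0)
exhausted = LoopState("run-1", attempt=3, max_attempts=3,
                      elapsed_seconds=40.0, max_elapsed_seconds=300.0)
print(toy_policy(loop_context(early)))
print(toy_policy(loop_context(exhausted)))
```

Keeping the same `run_id` across attempts is what lets a policy layer see the retries as one execution, not as unrelated requests.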
agent/ Agent planning and execution logic
demo/ Guarded and unguarded scenario runners
stage0/ Stage0 API client
docs/ Buyer-facing contract and scenario references
tests/ Unit and smoke tests
artifacts/ Pre-generated sample outputs
run_demo.py Multi-scenario demo entrypoint
This repo has been updated to better support customer conversion and live evaluation:
- richer customer-facing demo scenarios
- a new agent-loop scenario aligned to runtime retry controls
- non-interactive CLI mode for recordings and scripted demos
- --concise flag for buyer-friendly comparison output
- pre-generated sample outputs in artifacts/
- smoke tests for all scenarios
- clearer environment configuration for hosted Stage0
- stronger buyer-oriented README positioning around ALLOW / DENY / DEFER
Use this repo when you want to:
- show the concept quickly
- demo the difference between guarded and unguarded execution
- validate runtime guard behavior before a larger integration
- explain why loop guards matter before integrating runtime state into production agents
- support customer conversations with a minimal example
Use the broader SignalPulse application when you want:
- account management
- hosted billing flows
- API key lifecycle and issuance
- dashboards, logs, and analytics
- a fuller product experience
To try the real Stage0 runtime:
- Visit signalpulse.org
- Create an account
- Generate an API key
- Re-run this demo with your key in .env
See the repository license and remote project terms before production use.