agent-safety

Here are 35 public repositories matching this topic...

corv89 / shannot

Human-in-the-loop execution for LLM agents

python linux cli security devops automation mcp sandbox sysadmin python3 developer-tools human-in-the-loop llm llm-agents agent-safety supervised-execution

Updated Jan 11, 2026
Python

Pro-GenAI / Agent-Action-Guard

🛡️ Safe AI Agents through Action Classifier

Updated Feb 17, 2026
Python

SafellmHub / hguard-go

Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.

middleware machine-learning ai language-models ai-safety prompt-engineering llms toolformer hallucination-detection tool-calling agent-safety

Updated Jul 18, 2025
Go

oathe-ai / otc

Open Threat Classification (OTC) — 10 threat patterns for AI agent skills, MCP servers, and plugins. CC-BY-4.0.

ai-security behavioral-analysis mcp-security agent-safety threat-taxonomy

Updated Feb 26, 2026

shcherbak-ai / tethered

Runtime network egress control for Python. One function call to restrict which hosts your code can connect to.

security egress-filtering network-security devsecops egress supply-chain-security llms agent-safety

Updated Feb 26, 2026
Python

hexitlabs / vigil

🛡️ Open-source safety guardrail for AI agent tool calls. <2ms, zero dependencies.

security ai mcp guardrails llm langchain agent-safety tool-validation

Updated Feb 15, 2026
TypeScript

Agent-Sudo-Org / agent-sudo

The missing safety layer for AI Agents. Adaptive High-Friction Guardrails (Time-locks, Biometrics) for critical operations to prevent catastrophic errors.

ai-safety human-in-the-loop ai-agents guardrails llm-security agent-security agent-safety

Updated Jan 28, 2026
TypeScript

aerosta / rewardhackwatch

Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).

nlp machine-learning monitoring deep-learning transformers pytorch alignment ai-safety fastapi huggingface streamlit distilbert llm rlhf llm-agents agent-safety reward-hacking misalignment

Updated Dec 11, 2025
Python

lemnk / Sudo-agent

A runtime authorization layer for LLM tool calls: policy, approval, audit logs.

python agent security authorization developer-tools human-in-the-loop policy-engine jsonl runtime-security audit-logging guardrails llm agent-safety

Updated Feb 6, 2026
Python

paolosyloslabini / ethics

ETHICS.md — A statement of ethical principles for AI agents. Drop it in your repo root.

readme developer-tools ai-safety ethics ai-agents claude ai-ethics ai-alignment responsible-ai llm prompt-injection agent-safety ethics-md

Updated Feb 19, 2026

Maxbanker / negentropy-constellation

Safety-first agentic toolkit: 10 packages for collapse detection, governance, and reproducible runs.

benchmark time-series simulation reliability observability governance ethics anomaly-detection mlops agent-safety

Updated Dec 9, 2025
Python

imran-siddique / awesome-ai-governance

🛡️ A curated list of tools, frameworks, standards, and resources for AI agent governance, safety, and compliance

awesome owasp awesome-list compliance ai-safety ai-agents ai-ethics guardrails responsible-ai ai-governance mcp-security agent-safety agent-governance

Updated Feb 25, 2026
Shell

Pro-GenAI / A2A-Agent-Action-Guard

A2A version of Agent Action Guard: Safe AI Agents through Action Classifier

Updated Dec 14, 2025
Python

KarmaKoala / The-Agent-Genome-Project

An open-source engineering blueprint for defining and designing the core capabilities, boundaries, and ethics of any AI agent.

protocol specification standard autonomous-agents dev-tools agp ai-ethics agent-framework ai-agent agent-design llm llm-agents agent-architecture agent-safety

Updated Sep 6, 2025

craig-whitfield / think-mcp

MCP server for intent security pre-flight checks for autonomous AI agents

mcp model-context-protocol mcp-server agent-safety intent-security

Updated Feb 28, 2026
TypeScript

Skwert001 / Reams-Legality-Gate

Energy-based legality-gating SDK for AI reasoning. Predicts, repairs, and audits collapse before it happens; reduces hallucinations and provides numeric audit logs.

middleware reliability audit compliance observability control-theory ai-safety llm reasoning-language-models agent-safety

Updated Oct 25, 2025

sherifkozman / afl

Runtime-agnostic hook harness that catches unverifiable prompts, enforces failure-mode templates, and gates task completion on passing tests.

hooks ai-safety ai-agents guardrails failure-modes llm-agents agent-safety

Updated Feb 25, 2026
Python

minrescue / safe-superintelligence-framework

Canonical texts and implementation primitives for the Safe Superintelligence Framework (v1.2.1): Constitution, Minimum Rescue Protocol, system prompt, decision matrix.

ai-safety risk-management ai-alignment responsible-ai ai-governance system-prompt auditability agent-safety minimum-rescue

Updated Jan 3, 2026

craig-whitfield / undo-mcp

MCP server for reversibility intelligence — check if actions can be undone

mcp reversibility model-context-protocol mcp-server agent-safety

Updated Feb 28, 2026
TypeScript

craig-whitfield / context-mcp

MCP server for situational awareness — holidays, business hours, platform status

mcp situational-awareness model-context-protocol mcp-server agent-safety

Updated Feb 28, 2026
TypeScript
