Agent Operating Intelligence Layer for Production AI Systems
Langfuse tells you what your LLM said. Agentability tells you why your agent decided it.
It shows not just what your agents did, but why they decided it, how they reasoned through it, and where a failure originated.
Production multi-agent systems fail in ways standard monitoring cannot explain. Existing tools record LLM calls; they do not understand that an agent made a low-confidence decision because memory retrieval returned six-month-old embeddings, setting off a causal chain that ended in an escalation.
Agentability is built specifically for agents:
| Capability | Standard APM | LLM Tracers | Agentability |
|---|---|---|---|
| Decision provenance (why) | — | Partial | ✓ Complete |
| Memory subsystem tracking | — | — | ✓ All five types |
| Causal graph (decision → decision) | — | — | ✓ Native |
| Multi-agent conflict analysis | — | — | ✓ Game-theoretic |
| Confidence drift detection | — | — | ✓ Statistical |
| Offline mode (zero infra) | — | — | ✓ SQLite |
| Open-source core | — | Partial | ✓ MIT |
```bash
pip install agentability
```

Optional extras:

```bash
pip install "agentability[langchain]"         # LangChain integration
pip install "agentability[crewai]"            # CrewAI integration
pip install "agentability[autogen]"           # AutoGen integration
pip install "agentability[llamaindex]"        # LlamaIndex integration
pip install "agentability[all-integrations]"  # All framework integrations
pip install "agentability[dev]"               # Development tools
```

```python
from agentability import Tracer, DecisionType

# Zero infrastructure — stores to local SQLite file
tracer = Tracer(offline_mode=True)

with tracer.trace_decision(
    agent_id="risk_agent",
    decision_type=DecisionType.CLASSIFICATION,
    input_data={"loan_amount": 50_000, "credit_score": 680},
) as ctx:
    ctx.set_confidence(0.74)
    ctx.add_reasoning_step("Credit score meets minimum threshold")
    ctx.add_reasoning_step("Income verification pending — data is 90 days old")

tracer.record_decision(
    output={"approved": False},
    uncertainties=["Employment stability not verified"],
    constraints_violated=["income_freshness_days <= 30"],
    data_sources=["credit_bureau", "income_api"],
)

decisions = tracer.query_decisions(agent_id="risk_agent", limit=10)
tracer.close()
```

Every decision records the complete reasoning chain: inputs, outputs, reasoning steps, uncertainties, assumptions, constraints checked vs violated, and data sources consulted.
```python
tracer.record_decision(
    output={"approved": False},
    confidence=0.42,
    reasoning=["Income data is stale (90 days)", "Cannot verify employment"],
    uncertainties=["No recent bank statements"],
    assumptions=["Reported income figure is accurate"],
    constraints_checked=["min_credit_score >= 650"],
    constraints_violated=["income_freshness_days <= 30"],
)
```

Track all five memory subsystems — the only observability library that does this.
```python
# Vector memory — RAG staleness detection
tracer.record_memory_operation(
    agent_id="rag_agent",
    memory_type=MemoryType.VECTOR,
    operation=MemoryOperation.RETRIEVE,
    latency_ms=38.5,
    items_processed=10,
    avg_similarity=0.61,           # low — stale embeddings
    oldest_item_age_hours=4320.0,  # 180 days old — staleness signal
)

# Episodic memory — context window saturation alert
tracer.record_memory_operation(
    agent_id="chat_agent",
    memory_type=MemoryType.EPISODIC,
    operation=MemoryOperation.RETRIEVE,
    latency_ms=12.1,
    items_processed=5,
    context_tokens_used=3_840,
    context_tokens_limit=4_096,    # 93% full — truncation imminent
    temporal_coherence=0.93,
)
```

Record multi-agent conflicts and how they were resolved:

```python
tracer.record_conflict(
    session_id="session_42",
    conflict_type=ConflictType.GOAL_CONFLICT,
    involved_agents=["risk_agent", "sales_agent"],
    agent_positions={
        "risk_agent": {"decision": "deny", "confidence": 0.82},
        "sales_agent": {"decision": "approve", "confidence": 0.78},
    },
    severity=0.71,
    resolution_strategy="confidence_based",
)
```

Record individual LLM calls with latency and cost:

```python
tracer.record_llm_call(
    agent_id="summariser",
    provider="anthropic",
    model="claude-sonnet-4",
    prompt_tokens=1_500,
    completion_tokens=340,
    latency_ms=1_180.0,
    cost_usd=0.0083,
    finish_reason="end_turn",
)
```

Anthropic SDK auto-instrumentation:

```python
from agentability.integrations.anthropic_sdk import AnthropicInstrumentation
import anthropic

tracer = Tracer(offline_mode=True)
client = anthropic.Anthropic()
client = AnthropicInstrumentation(tracer=tracer, agent_id="my_agent").wrap_client(client)

# All subsequent client.messages.create() calls are now tracked automatically
```

LangChain:

```python
from agentability.integrations.langchain import AgentabilityLangChainCallback

callback = AgentabilityLangChainCallback(tracer=tracer, agent_id="langchain_agent")
chain.invoke({"input": "Summarise this document"}, config={"callbacks": [callback]})
```

CrewAI:

```python
from agentability.integrations.crewai import CrewAIInstrumentation

crew = CrewAIInstrumentation(tracer=tracer).instrument_crew(crew)
result = crew.kickoff()
```

AutoGen:

```python
from agentability.integrations.autogen import AutoGenInstrumentation

agent = AutoGenInstrumentation(tracer=tracer).instrument_agent(agent)
```

The Agentability dashboard provides a Datadog-quality dark UI for agent observability.

```bash
# Start the API server
AGENTABILITY_DB=agentability.db \
  uvicorn platform.api.main:app --host 0.0.0.0 --port 8000

# Start the dashboard (separate terminal)
cd dashboard && npm install && npm run dev -- --host 0.0.0.0
```

Open http://localhost:3000 to see:
- Overview — KPI cards, confidence trend, latency, cost, conflicts
- Decisions — Full decision explorer with reasoning chain drill-down
- Agents — Per-agent confidence timeline + drift alerts
- Conflicts — Multi-agent conflict hotspot map and timeline
- Cost & LLM — Token spend breakdown by model and provider
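
The `confidence_based` resolution strategy recorded in the conflict example above can be illustrated with a standalone sketch. This is illustrative only, not Agentability's actual resolver; the `agent_positions` shape follows the `record_conflict` example:

```python
def resolve_by_confidence(agent_positions: dict) -> tuple:
    """Pick the position held with the highest confidence.

    `agent_positions` has the shape used in record_conflict:
    {"agent_id": {"decision": ..., "confidence": float}, ...}
    """
    return max(agent_positions.items(), key=lambda kv: kv[1]["confidence"])

positions = {
    "risk_agent": {"decision": "deny", "confidence": 0.82},
    "sales_agent": {"decision": "approve", "confidence": 0.78},
}
agent, position = resolve_by_confidence(positions)
# risk_agent wins with "deny" at confidence 0.82
```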
```
Your Agent System
        │
        ▼
Agentability SDK (Python — TypeScript & Go coming in v0.5/v0.7)
        │
        ▼
Storage Layer
┌───────────┬─────────────┬──────────────┐
│  SQLite   │   DuckDB    │ TimescaleDB  │
│ (offline) │ (analytics) │ (production) │
└───────────┴─────────────┴──────────────┘
        │
        ▼
Analytics Engine
┌──────────────┬──────────────┬───────────────┐
│ Causal Graph │ Drift Detect │ Conflict Anal │
└──────────────┴──────────────┴───────────────┘
        │
        ▼
FastAPI Platform → React Dashboard
```
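
To make the offline path concrete, here is a minimal standalone sketch of the idea behind SQLite-backed decision storage. The schema is hypothetical, for illustration only, and is not Agentability's actual table layout:

```python
import json
import sqlite3

# Hypothetical single-table schema illustrating offline decision storage
conn = sqlite3.connect(":memory:")  # offline_mode=True would use a local file
conn.execute(
    """CREATE TABLE decisions (
        agent_id TEXT,
        decision_type TEXT,
        confidence REAL,
        payload TEXT  -- full record (reasoning, constraints, ...) as JSON
    )"""
)
record = {
    "output": {"approved": False},
    "reasoning": ["Income data is stale (90 days)"],
}
conn.execute(
    "INSERT INTO decisions VALUES (?, ?, ?, ?)",
    ("risk_agent", "classification", 0.42, json.dumps(record)),
)

# A query like query_decisions(agent_id=..., limit=...) reduces to a SELECT
rows = conn.execute(
    "SELECT agent_id, confidence, payload FROM decisions "
    "WHERE agent_id = ? ORDER BY rowid DESC LIMIT 10",
    ("risk_agent",),
).fetchall()
```

Zero infrastructure here means exactly this: one local database file, no server process.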
| Feature | Agentability | Langfuse | AgentOps | Arize Phoenix |
|---|---|---|---|---|
| Memory tracking (all 5 types) | ✓ | — | — | — |
| Decision provenance (why) | ✓ Complete | Partial | Partial | — |
| Multi-agent conflict analysis | ✓ Game-theoretic | — | — | — |
| Temporal causal graphs | ✓ | — | — | — |
| Confidence drift detection | ✓ | — | — | — |
| Offline / SQLite mode | ✓ | — | — | Limited |
| Open-source core | ✓ MIT | Partial | — | ✓ Apache 2 |
- AsyncTracer with contextvars.ContextVar (asyncio-safe)
- OpenTelemetry OTLP exporter backend
- DuckDB analytics backend + Parquet export
- OpenAI Agents SDK integration
- LangGraph node-level tracing
- Pydantic AI integration
- CUSUM drift detection algorithm
- TypeScript SDK
- WebSocket real-time streaming to dashboard
- D3 causal graph visualiser in dashboard
- Docker Compose one-command deploy
- Counterfactual analysis
- A/B testing framework for agent versions
- RBAC + Audit logs (SOC2 alignment)
- SSO (OIDC + SAML2)
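
The CUSUM drift detection item on the roadmap can be prototyped in a few lines. This standalone sketch flags a sustained drop in agent confidence; the target, slack, and threshold values are illustrative, not the library's defaults:

```python
def cusum_downward_drift(values, target, slack=0.05, threshold=0.5):
    """One-sided CUSUM: accumulate shortfalls below (target - slack).

    Returns the index at which cumulative drift crosses `threshold`,
    or None if no drift is detected.
    """
    s = 0.0
    for i, x in enumerate(values):
        s = max(0.0, s + (target - slack) - x)  # grows only on shortfall
        if s > threshold:
            return i
    return None

# Confidence holds near 0.85, then drifts downward
scores = [0.86, 0.84, 0.85, 0.78, 0.70, 0.66, 0.64, 0.62]
alert_at = cusum_downward_drift(scores, target=0.85)
# drift flagged at index 7
```

Unlike a simple moving-average alarm, CUSUM accumulates small persistent deviations, so it catches slow confidence decay long before any single score looks alarming.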
```bash
git clone https://github.com/inteleion-ai/Agentability.git
cd Agentability

# Install SDK in editable mode with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run the test suite (379 tests, 94% coverage)
make test

# Quality gate
make lint && make type-check

# Seed demo data and start the stack
python3 sdk/python/examples/seed_demo.py
make api        # http://localhost:8000/docs
make dashboard  # http://localhost:3000
```

Contributions are welcome. See CONTRIBUTING.md for setup, style guide, and the quality gate. See CODE_OF_CONDUCT.md for community standards.
MIT — see LICENSE.
Commercial cloud and enterprise features: see COMMERCIAL.md.