Agent Health

Open-source AI Agent Evaluation & Observability

Agent Health helps you evaluate, monitor, and optimize AI agents. For agents ranging from autonomous root cause analysis (RCA) agents to coding assistants, it provides real-time execution streaming, LLM-based evaluation with trajectory comparison, batch experiments, and deep observability through OpenTelemetry traces, all backed by OpenSearch.

Website • Slack • Twitter/X • Demo Video • Documentation • Changelog

What is Agent Health? • AI Skills • Installation • Features • Configuration • Contributing

Side-by-side comparison of agent evaluation runs with pass rate, accuracy, cost, and performance metrics over time.

What is Agent Health?

Agent Health is an evaluation and observability framework for AI agents, built on OpenSearch. It helps you measure agent performance through "Golden Path" trajectory comparison — where an LLM judge evaluates agent actions against expected outcomes — and provides deep observability into agent execution via OpenTelemetry traces.
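To make the idea of comparing a trajectory against a "Golden Path" concrete, here is a minimal TypeScript sketch. It is not the LLM judge that Agent Health uses: it swaps in a deterministic longest-common-subsequence check as a stand-in, and the function names and tool names are hypothetical.

```typescript
// Length of the longest common subsequence of two tool-name sequences.
function lcsLength(a: string[], b: string[]): number {
  const dp: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    dp.push(new Array<number>(b.length + 1).fill(0));
  }
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] =
        a[i - 1] === b[j - 1]
          ? dp[i - 1][j - 1] + 1
          : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Fraction of the expected steps that the agent performed in the expected order.
// A deterministic stand-in for an LLM judge, for illustration only.
function goldenPathCoverage(expected: string[], actual: string[]): number {
  if (expected.length === 0) return 1;
  return lcsLength(expected, actual) / expected.length;
}

// Hypothetical tool names for an RCA-style agent.
const expectedSteps = ["search_logs", "get_metrics", "summarize"];
const actualSteps = ["search_logs", "list_alerts", "summarize"];

console.log(goldenPathCoverage(expectedSteps, actualSteps).toFixed(2)); // 0.67
```

A score like this only captures step ordering; an LLM judge can also weigh whether the agent reached the expected outcome by a different but valid route.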

Who uses Agent Health:

AI teams building autonomous agents (RCA, customer support, data analysis)
QA engineers testing agent behavior across scenarios
Platform teams monitoring agent performance in production
Developers using AI coding agents who want visibility into usage, costs, and productivity

See it in action: Watch the demo video on YouTube

AI Agent Skills

Agent Health ships with built-in skill files for Claude Code and Kiro that teach your AI coding agent how to work with this project effectively. Copy the relevant directory into your workspace to unlock project-aware assistance:

Skill	Claude Code	Kiro	What it does
Add Connector	`.claude/skills/add-connector/SKILL.md`	`.kiro/steering/add-connector.md`	Guides creation of custom agent connectors
Write Test	`.claude/skills/write-test/SKILL.md`	`.kiro/steering/write-test.md`	Project test conventions, mocking patterns, coverage thresholds
Create PR	`.claude/skills/create-pr/SKILL.md`	`.kiro/steering/create-pr.md`	PR workflow with DCO signoff and CI compliance
Config & Auth	`.claude/skills/config-auth/SKILL.md`	—	Config loading, AWS auth, multi-profile setup

To use these skills:

Claude Code — Skills in .claude/skills/ are auto-discovered when the directory exists in your workspace root. No extra setup needed.
Kiro — Copy .kiro/steering/ to your workspace root. Kiro loads steering files automatically.

Installation

Get Agent Health running in minutes. Choose the option that best suits your needs:

Option 1: NPX (Fastest — No Setup)

# Start Agent Health with demo data (no configuration needed)
npx @opensearch-project/agent-health

This opens http://localhost:4001 with pre-loaded sample data for exploration. If port 4001 is already in use, the server automatically tries the next port (4002, 4003, and so on), for up to 10 attempts.
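The port fallback can be pictured with a short TypeScript sketch. This is an illustration of the behavior, not Agent Health's actual code: `firstFreePort` and the `isFree` probe are hypothetical, and the sketch assumes the 10 attempts include the base port.

```typescript
// Returns the first port in [basePort, basePort + maxAttempts - 1] that the
// probe reports as free, or undefined if every attempt fails.
// `isFree` is a hypothetical probe; a real server would try to listen on the port.
function firstFreePort(
  basePort: number,
  maxAttempts: number,
  isFree: (port: number) => boolean,
): number | undefined {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const port = basePort + attempt;
    if (isFree(port)) return port;
  }
  return undefined;
}

// Simulate ports 4001 and 4002 being taken by other processes.
const busyPorts = new Set<number>([4001, 4002]);
const chosenPort = firstFreePort(4001, 10, (port) => !busyPorts.has(port));

console.log(chosenPort); // 4003
```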

Option 2: Docker Compose

For the full observability stack with OpenSearch, OpenTelemetry Collector, and Data Prepper for trace ingestion:

Quick start (one command):

curl -fsSL https://raw.githubusercontent.com/opensearch-project/agent-health/main/scripts/install.sh | bash

This clones the repo, starts the Docker stack, waits for OpenSearch, auto-configures agent-health.config.json, and launches Agent Health.

Or step-by-step:

# Clone the repository
git clone https://github.com/opensearch-project/agent-health.git
cd agent-health

# Start the OpenSearch observability stack
docker compose up -d

# Copy Docker environment configuration
cp .env.docker .env

# Start Agent Health (connects to local OpenSearch automatically)
npx @opensearch-project/agent-health

This brings up:

OpenSearch — Stores traces, test cases, benchmarks, and evaluation results
OpenTelemetry Collector — Receives telemetry data via OTLP (ports 4317/4318)
Data Prepper — Transforms and enriches traces before OpenSearch ingestion

Prerequisites: Docker Desktop with 4GB+ memory allocated. See docker-compose.yml for configuration options.
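To check that the collector is receiving data, you can post a hand-built span to it. The TypeScript sketch below assumes the collector exposes the standard OTLP/HTTP JSON endpoint at /v1/traces on port 4318; the service name, span name, and helper functions are placeholders, and a real agent would normally use an OpenTelemetry SDK instead.

```typescript
// Random hex string of the given byte length (for example, 16 bytes gives 32 hex characters).
function hexId(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// Builds a minimal OTLP/HTTP JSON body containing a single span.
// startMs and endMs are integer Unix timestamps in milliseconds.
function buildTracePayload(serviceName: string, spanName: string, startMs: number, endMs: number) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: "manual-example" },
            spans: [
              {
                traceId: hexId(16), // 16 bytes, hex-encoded
                spanId: hexId(8), // 8 bytes, hex-encoded
                name: spanName,
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: `${startMs}000000`, // milliseconds to nanoseconds
                endTimeUnixNano: `${endMs}000000`,
              },
            ],
          },
        ],
      },
    ],
  };
}

// Posts the payload to the collector and returns the HTTP status code.
// The default endpoint is an assumption based on the OTLP/HTTP port listed above.
async function sendTrace(payload: unknown, endpoint = "http://localhost:4318/v1/traces"): Promise<number> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return res.status;
}

const now = Date.now();
const examplePayload = buildTracePayload("my-agent", "plan-step", now - 250, now);
console.log(JSON.stringify(examplePayload, null, 2));
```

With the Docker stack running, calling `sendTrace(examplePayload)` should return a 2xx status if the collector accepted the span.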

Option 3: AWS CloudFormation (Managed OpenSearch)

Deploy a fully managed observability backend using the included CloudFormation template:

aws cloudformation create-stack \
  --stack-name AgentHealthObservability \
  --template-body file://deployment/cloudformation/agent-health-observability.yaml \
  --capabilities CAPABILITY_NAMED_IAM

This deploys:

Amazon OpenSearch Service domain for trace storage
OpenSearch Ingestion (OSIS) pipeline for OTLP data collection
IAM roles for pipeline execution and agent telemetry ingestion

After deployment, connect it to Agent Health:

npx @opensearch-project/agent-health configure --from-stack AgentHealthObservability

Or manually copy the AgentHealthConfigJSON stack output into your agent-health.config.json. See deployment/cloudformation/ for details and regional Launch Stack URLs.
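If you take the manual route, the stack output can be pulled out of the JSON that `aws cloudformation describe-stacks` prints. The TypeScript sketch below assumes the AgentHealthConfigJSON output value is a JSON string; `extractConfigJson` is a hypothetical helper, and the sample output value is a placeholder rather than the real configuration.

```typescript
// Relevant part of the JSON printed by `aws cloudformation describe-stacks`.
interface StackOutput {
  OutputKey: string;
  OutputValue: string;
}
interface DescribeStacksResult {
  Stacks: { StackName: string; Outputs?: StackOutput[] }[];
}

// Finds the named stack output and parses its value as JSON.
function extractConfigJson(
  result: DescribeStacksResult,
  stackName: string,
  outputKey = "AgentHealthConfigJSON",
): unknown {
  const stack = result.Stacks.find((s) => s.StackName === stackName);
  if (!stack) throw new Error(`Stack not found: ${stackName}`);
  const output = (stack.Outputs ?? []).find((o) => o.OutputKey === outputKey);
  if (!output) throw new Error(`Output not found: ${outputKey}`);
  return JSON.parse(output.OutputValue);
}

// Placeholder data: the real output value will contain your actual configuration.
const sampleResult: DescribeStacksResult = {
  Stacks: [
    {
      StackName: "AgentHealthObservability",
      Outputs: [{ OutputKey: "AgentHealthConfigJSON", OutputValue: '{"placeholder":true}' }],
    },
  ],
};

console.log(JSON.stringify(extractConfigJson(sampleResult, "AgentHealthObservability"), null, 2));
```

The input can be produced with `aws cloudformation describe-stacks --stack-name AgentHealthObservability --output json`; the parsed result is what you would save as agent-health.config.json.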

Next Steps

Getting Started Guide — Step-by-step walkthrough from install to first evaluation
Configuration Guide — Connect your own agent and configure the environment
CLI Reference — Full command-line documentation

Features

Agent Evaluation & Observability

Feature	Description
Evals	Real-time agent evaluation with trajectory streaming
Experiments	Batch evaluation runs with configurable parameters
Compare	Side-by-side trace comparison with aligned and merged views
Agent Traces	Table-based trace view with latency histogram, filtering, and detailed flyout
Live Traces	Real-time trace monitoring with auto-refresh and filtering
Trace Views	Timeline and Flow visualizations for debugging
Reports	Evaluation reports with LLM judge reasoning
Connectors	Pluggable protocol adapters (AG-UI SSE, REST, CLI, Claude Code)

Coding Agent Analytics

A unified dashboard for monitoring AI coding agent usage across Claude Code, Kiro, and Codex CLI. Zero configuration — just run agent-health and it auto-detects installed agents.

Multi-agent dashboard: Session history, cost estimation, tool usage, activity patterns, and efficiency metrics
9 analytics tabs: Overview, Sessions, Projects, Costs, Activity, Efficiency, Tools, Advanced, and Workspace management
Interactive drill-downs: Click any chart, card, or metric to drill into filtered session views
Workspace management: View and edit Claude Code memory files, plans, tasks; browse Kiro MCP servers, agents, and extensions
Privacy-first: All data stays local — reads directly from ~/.claude/, ~/.kiro/, ~/.codex/

Full Coding Agent Analytics documentation
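One plausible way to picture the auto-detection is a check for each agent's data directory under the home directory. The TypeScript sketch below is an assumption about the approach, not the project's implementation: `detectAgents` and the injected `dirExists` check are hypothetical, and only the three directory names come from the list above.

```typescript
// Data directories listed in this README, keyed by agent name.
const AGENT_DIRS: Record<string, string> = {
  "Claude Code": ".claude",
  "Kiro": ".kiro",
  "Codex CLI": ".codex",
};

// Returns the agents whose data directory exists under homeDir.
// `dirExists` is injected so the sketch stays free of filesystem access;
// in Node.js it could be backed by fs.existsSync.
function detectAgents(homeDir: string, dirExists: (path: string) => boolean): string[] {
  return Object.keys(AGENT_DIRS).filter((name) => dirExists(`${homeDir}/${AGENT_DIRS[name]}`));
}

// Simulate a machine with Claude Code and Codex CLI data present.
const presentDirs = new Set<string>(["/home/dev/.claude", "/home/dev/.codex"]);
const detectedAgents = detectAgents("/home/dev", (path) => presentDirs.has(path));

console.log(detectedAgents); // [ 'Claude Code', 'Codex CLI' ]
```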

Supported Connectors

Connector	Protocol	Description
`agui-streaming`	AG-UI SSE	ML-Commons agents (default)
`rest`	HTTP POST	Non-streaming REST APIs
`openai-compatible`	OpenAI Chat	LiteLLM, Ollama, vLLM
`strands`	Bedrock Agent Runtime	Amazon Strands agents (server-only)
`langgraph`	LangGraph REST	Non-AG-UI LangGraph instances
`subprocess`	CLI	Command-line tools
`claude-code`	Claude CLI	Claude Code agent comparison
`mock`	In-memory	Demo and testing

For creating custom connectors, see docs/CONNECTORS.md.
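As a rough mental model of what "pluggable" means here, the TypeScript sketch below shows a registry that maps a connector type to an adapter. Every name in it (`Connector`, `ConnectorRegistry`, `send`) is hypothetical and does not reflect the project's real interface, which is described in docs/CONNECTORS.md.

```typescript
// Hypothetical adapter contract: one implementation per protocol.
interface Connector {
  readonly type: string;
  send(endpoint: string, prompt: string): Promise<string>;
}

// Looks up adapters by their connector type string.
class ConnectorRegistry {
  private readonly connectors = new Map<string, Connector>();

  register(connector: Connector): void {
    this.connectors.set(connector.type, connector);
  }

  get(type: string): Connector {
    const connector = this.connectors.get(type);
    if (!connector) throw new Error(`Unknown connector type: ${type}`);
    return connector;
  }
}

// In-memory adapter in the spirit of the `mock` row: echoes the prompt back.
const echoConnector: Connector = {
  type: "mock",
  async send(_endpoint: string, prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  },
};

const connectorRegistry = new ConnectorRegistry();
connectorRegistry.register(echoConnector);

connectorRegistry
  .get("mock")
  .send("", "ping")
  .then((reply) => console.log(reply)); // echo: ping
```

The appeal of this shape is that the evaluation code only depends on the contract, so adding a new protocol means registering one more adapter.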

Observio Sample Agent

Agent Health includes Observio, a reference ReAct agent you can use as a practice target for evaluating and improving agent performance:

cd observio-sample-agent && npm install && npm run start:ag-ui
npx @opensearch-project/agent-health run -t demo-otel-001 -a observio

See the Observio README for details.

Architecture

Agent Health uses a client-server architecture in which all clients (UI, CLI) access OpenSearch through a unified HTTP API. The server handles agent communication via pluggable connectors and proxies LLM judge calls to Amazon Bedrock.

For detailed architecture documentation, see docs/ARCHITECTURE.md.

Quick Configuration

Agent Health works out-of-the-box with demo data. Configure when you're ready to connect your own agent:

# Generate a config file with examples
npx @opensearch-project/agent-health init

// agent-health.config.ts
export default {
  agents: [
    {
      key: "my-agent",
      name: "My Agent",
      endpoint: "http://localhost:8000/agent",
      connectorType: "rest",  // or "agui-streaming", "langgraph", "strands", "subprocess"
      models: ["claude-sonnet-4"],
      useTraces: true,        // Enable OpenTelemetry trace collection (default: false)
    }
  ],
};

Tip: Run npx @opensearch-project/agent-health doctor to verify your configuration is loaded correctly.

For full configuration options including authentication hooks and environment variables, see CONFIGURATION.md.
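Before starting the server, it can help to sanity-check the agent entries yourself. The TypeScript sketch below is a hypothetical helper, not part of Agent Health: the field names come from the example above and the connector types from the Supported Connectors table, while the specific checks (unique keys, known connector type) are illustrative assumptions rather than documented rules.

```typescript
// Connector types listed in the Supported Connectors table.
const CONNECTOR_TYPES: readonly string[] = [
  "agui-streaming",
  "rest",
  "openai-compatible",
  "strands",
  "langgraph",
  "subprocess",
  "claude-code",
  "mock",
];

// Field names mirror the example config; optionality is an assumption.
interface AgentEntry {
  key: string;
  name: string;
  endpoint: string;
  connectorType: string;
  models: string[];
  useTraces?: boolean;
}

// Returns a list of human-readable problems; an empty list means no issues were found.
function validateAgents(agents: AgentEntry[]): string[] {
  const problems: string[] = [];
  const seenKeys = new Set<string>();
  for (const agent of agents) {
    if (seenKeys.has(agent.key)) {
      problems.push(`duplicate agent key: ${agent.key}`);
    }
    seenKeys.add(agent.key);
    if (CONNECTOR_TYPES.indexOf(agent.connectorType) === -1) {
      problems.push(`unknown connectorType for ${agent.key}: ${agent.connectorType}`);
    }
  }
  return problems;
}

const exampleAgents: AgentEntry[] = [
  {
    key: "my-agent",
    name: "My Agent",
    endpoint: "http://localhost:8000/agent",
    connectorType: "rest",
    models: ["claude-sonnet-4"],
    useTraces: true,
  },
];

console.log(validateAgents(exampleAgents)); // []
```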

Star History

If you find Agent Health useful, please consider giving us a star! Your support helps us grow our community and continue improving the project.

Contributing

We welcome contributions! There are many ways to get involved:

Report a Bug — Found something broken? Let us know
Request a Feature — Have an idea? We'd love to hear it
Submit a Pull Request — Code contributions are always welcome
Join the Discussion — Chat with us on the OpenSearch Slack

Development Quick Start

git clone https://github.com/opensearch-project/agent-health.git
cd agent-health
npm install
npm run dev          # Frontend on port 4000
npm run dev:server   # Backend on port 4001

Port conflicts: If port 4001 is already in use, the backend server automatically tries 4002, 4003, etc. (up to 10 attempts). The actual port is displayed in the console output.

All commits require DCO signoff (git commit -s) and all PRs must pass CI checks.

For detailed development setup, testing, CI pipeline, debugging, and troubleshooting, see the Developer Guide. For full contribution guidelines, see CONTRIBUTING.md.

Documentation

Guide	Description
Getting Started	Step-by-step walkthrough from install to first evaluation
Configuration	Connect your agent and configure the environment
CLI Reference	Command-line interface documentation
Coding Agent Analytics	Multi-agent dashboard and remote server monitoring
Observio Sample Agent	Reference agent for practicing evaluations
Developer Guide	Development setup, testing, CI, debugging
Connectors Guide	Create custom connectors for your agent type
Architecture	System design and patterns
ML-Commons Setup	OpenSearch ML-Commons integration

Made with care by the OpenSearch community
