An educational project for learning about OpenTelemetry instrumentation for AI agents and implementing evaluations (evals) for LLM-based applications.
This project demonstrates:
- OpenTelemetry Integration: How to instrument AI agents with distributed tracing using OpenTelemetry semantic conventions
- Multi-Backend Export: Sending telemetry data to multiple observability backends simultaneously (Langfuse + Azure Application Insights)
- Agent Evals: Running experiments against datasets and evaluating agent performance with custom evaluators
- LLM Observability Patterns: Tracing generations, tool calls, chains, and agent loops
Data flow:

1. The agent generates traces using the OpenTelemetry SDK with GenAI semantic conventions
2. The OTLP exporter sends traces to the OpenTelemetry Collector via gRPC
3. The Collector batches and forwards traces to multiple backends:
   - Langfuse: for LLM-specific observability, prompt management, and evals
   - Azure Application Insights: for APM, dashboards, and alerting
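The multi-backend fan-out happens in `otel-collector-config.yaml`. A rough sketch of what such a pipeline can look like (the exporter names, endpoints, and env-var syntax below are illustrative assumptions, not this repo's actual config):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  # Langfuse ingests OTLP traces over HTTP with Basic auth
  otlphttp/langfuse:
    endpoint: ${env:LANGFUSE_BASE_URL}/api/public/otel
    headers:
      Authorization: Basic ${env:LANGFUSE_AUTH}
  # Azure Monitor exporter (from opentelemetry-collector-contrib)
  azuremonitor:
    connection_string: ${env:APPLICATIONINSIGHTS_CONNECTION_STRING}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/langfuse, azuremonitor]
```

Both exporters sit in the same `traces` pipeline, so every span is delivered to both backends.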
Repository layout:

```
agent-observability/
├── agent-core/                      # Core agent abstractions
│   ├── ChatCompletion/              # Chat completion interfaces and models
│   ├── Prompts/                     # Prompt provider interfaces
│   ├── Providers/                   # OpenAI provider, Langfuse prompt provider
│   └── Tools/                       # Tool registry and execution
│
├── agent-telemetry/                 # OpenTelemetry instrumentation
│   ├── Services/                    # AgentTelemetry implementation
│   ├── Models/                      # TelemetryScope, GenerationScope
│   └── Constants/                   # GenAI semantic attributes
│
├── agent-evals/                     # Evaluation framework
│   ├── Evaluators/                  # NLP and trajectory evaluators
│   ├── Services/                    # ExperimentRunner, EvaluationRunner
│   └── Models/                      # EvaluationContext, EvaluationResult
│
├── agent-cli/                       # CLI utilities for interactive chat
│
├── demo/
│   └── manual-instrumented-agent/   # Demo agent with manual instrumentation
│       ├── Commands/                # CLI commands (chat, experiment, evaluate)
│       └── Tools/                   # Sample tools (dice, cards)
│
├── docker-compose.yml               # OpenTelemetry Collector setup
├── otel-collector-config.yaml       # Collector pipeline configuration
└── env.template                     # Environment variables template
```
Prerequisites:

- .NET 8 SDK
- Docker (for the OpenTelemetry Collector)
- OpenAI API key
- Langfuse account (free tier available at cloud.langfuse.com)
- (Optional) Azure Application Insights resource
Copy the environment template and fill in your credentials:
```bash
cp env.template .env
```

Edit `.env` with your values:
```bash
# Langfuse - Base64 encode your keys: echo -n "pk-lf-xxx:sk-lf-xxx" | base64
LANGFUSE_BASE_URL=https://cloud.langfuse.com
LANGFUSE_AUTH=<base64-encoded-public:secret>

# Azure Application Insights (optional)
APPLICATIONINSIGHTS_CONNECTION_STRING=<your-connection-string>
```

Create `demo/manual-instrumented-agent/appsettings.local.json`:
```json
{
  "OpenAI": {
    "ApiKey": "sk-..."
  },
  "Langfuse": {
    "PublicKey": "pk-lf-...",
    "SecretKey": "sk-lf-..."
  }
}
```

Start the OpenTelemetry Collector:

```bash
docker-compose up -d
```

Verify it's running:
```bash
docker-compose logs otel-collector
```

Start a chat session with the agent:
```bash
cd demo/manual-instrumented-agent
dotnet run
```

Or explicitly run the chat command:
```bash
dotnet run -- chat
```

The agent has access to sample tools (dice rolling, card dealing), and all interactions are traced to your configured backends.
Execute the agent against a Langfuse dataset:
```bash
dotnet run -- experiment --dataset <dataset-name>
```

Options:

- `--dataset` or `-d`: Name of the Langfuse dataset to run against
- `--run` or `-r`: Custom name for this experiment run
Evaluate a completed experiment run:
```bash
dotnet run -- evaluate --dataset <dataset-name> --run <run-name>
```

Options:

- `--dataset` or `-d`: Name of the dataset
- `--run` or `-r`: Name of the run to evaluate
In Langfuse:

- Go to cloud.langfuse.com
- Navigate to Traces to see agent executions
- Click on a trace to see the full hierarchy:
  - Root trace (session)
  - Agent span
  - Generation spans (LLM calls)
  - Tool spans
In Azure Application Insights:

- Go to your Application Insights resource in the Azure Portal
- Navigate to Transaction Search or Application Map
- Use End-to-end transaction details to explore distributed traces
This project follows GenAI semantic conventions for LLM observability:
| Attribute | Description |
|---|---|
| `gen_ai.system` | The AI provider (e.g., "openai") |
| `gen_ai.request.model` | Model used for generation |
| `gen_ai.operation.name` | Type of operation (chat, tool, etc.) |
| `gen_ai.usage.input_tokens` | Input token count |
| `gen_ai.usage.output_tokens` | Output token count |
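In .NET, attributes like these are typically attached as tags on a `System.Diagnostics.Activity`. A minimal sketch of the pattern (the source name and token counts are made up for illustration; `agent-telemetry` may structure this differently):

```csharp
using System.Diagnostics;

ActivitySource source = new("Agent.Telemetry");

// Span name per GenAI conventions: "{operation} {model}"
using Activity? activity = source.StartActivity("chat gpt-4o-mini", ActivityKind.Client);
activity?.SetTag("gen_ai.system", "openai");
activity?.SetTag("gen_ai.operation.name", "chat");
activity?.SetTag("gen_ai.request.model", "gpt-4o-mini");

// ... invoke the model, then record usage from the response:
activity?.SetTag("gen_ai.usage.input_tokens", 42);
activity?.SetTag("gen_ai.usage.output_tokens", 128);
```

Note that `StartActivity` returns `null` when no listener (e.g. the OpenTelemetry SDK) is subscribed, hence the `?.` calls.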
A typical trace hierarchy looks like:

```
Trace: "Manual Agent"
└── Agent: "Manual Agent"
    ├── Chain: "Call LLM"
    │   └── Generation: "gpt-4o-mini"
    ├── Chain: "Tools"
    │   └── Tool: "roll_dice"
    └── Chain: "Call LLM"
        └── Generation: "gpt-4o-mini"
```
The agent-evals library includes:
- NLP Evaluators: Semantic similarity, keyword matching, response length
- Trajectory Evaluators: Tool usage validation, step counting, expected behavior verification
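As an illustration of the kind of logic a simple NLP evaluator contains, here is a hypothetical keyword-match scorer. The class name and method shape are assumptions for this sketch, not the actual `agent-evals` evaluator contract:

```csharp
using System;
using System.Linq;

// Hypothetical evaluator: scores the fraction of expected keywords
// that appear in the agent's output (case-insensitive).
public sealed class KeywordMatchEvaluator
{
    private readonly string[] _expectedKeywords;

    public KeywordMatchEvaluator(params string[] expectedKeywords) =>
        _expectedKeywords = expectedKeywords;

    public double Score(string agentOutput) =>
        _expectedKeywords.Length == 0
            ? 1.0
            : _expectedKeywords.Count(k =>
                  agentOutput.Contains(k, StringComparison.OrdinalIgnoreCase))
              / (double)_expectedKeywords.Length;
}
```

A real evaluator would plug into the framework's `EvaluationContext`/`EvaluationResult` models and be run by the `EvaluationRunner`.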
Build the solution:

```bash
dotnet build
```

To add a new tool:

- Create a tool class in `demo/manual-instrumented-agent/Tools/`
- Decorate it with the `[Tool]` attribute
- Register it in `DemoAgent.cs`
Example:

```csharp
[Tool("my_tool", "Description of what the tool does")]
public class MyTool
{
    public static string Execute(string param1, int param2)
    {
        return $"Result: {param1}, {param2}";
    }
}
```

No traces reaching the collector:

- Check collector logs: `docker-compose logs otel-collector`
- Verify the agent is pointing to `localhost:4317`
- Ensure no firewall is blocking the port
No traces in Langfuse:

- Verify your `LANGFUSE_AUTH` is correctly base64 encoded
- Check collector logs for HTTP errors
- Ensure your Langfuse keys have the correct permissions
No traces in Application Insights:

- Verify your connection string is complete
- Check for firewall/network restrictions
- Review collector logs for specific error messages
Resources:

- OpenTelemetry .NET SDK
- GenAI Semantic Conventions
- Langfuse Documentation
- Azure Monitor OpenTelemetry
This project is for educational purposes.
