Telemetron: Unified Telemetry for Multi-Agent Systems

Overview

Telemetron is a lightweight telemetry service that provides a unified, machine-readable snapshot of system state for multi-agent AI systems. Instead of chasing traces across fragmented observability tools, Telemetron exposes a single JSON endpoint that consolidates agent activity, runtime workload, task queues, and LLM usage into one comprehensive view.

The Problem

Modern multi-agent systems are inherently difficult to debug in production:

- Agent reasoning traces stop at LLM invoke() calls
- Real failures occur in containerized runtimes, Kubernetes pods, or external services
- System behavior becomes fragmented across multiple observability contexts
- Engineers spend 70% of debugging time on manual trace correlation ("data archaeology")

The Solution

Telemetron addresses this by providing:

- Single unified schema for complete system state
- Machine-first observability designed for AI debugging tools (Claude Code, etc.)
- State representation over trace chasing
- Minimal surface area focused purely on state exposure

Features

Unified System Snapshot

A single endpoint (`/system/state`) provides a complete view of:

- Agent State: Active tasks, authorized models, deployment mapping
- Runtime Workload: Pod counts, resource usage, deployment status
- Task Queues: Pending work, priorities, blocking relationships
- LLM Usage: Rate limits, costs, provider status

Comprehensive Testing

- 81.8% coverage in service layer
- 67.4% coverage in repository layer
- HTTP endpoint testing with JSON validation
- Mock repositories for isolated testing

Clean Architecture

- Separation of concerns (handlers, services, repositories, models)
- Structured logging with Zap
- Configuration management
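
The layering above can be illustrated with a minimal sketch. It is not the project's actual code: `AgentRepository`, `mockAgentRepo`, `SystemService`, and `AgentNames` are hypothetical names chosen for illustration, and only the general idea (a service that depends on a repository interface, with a mock behind it) is taken from this README.

```go
package main

import (
	"errors"
	"fmt"
)

// Agent is a trimmed-down model; the real schema carries more fields.
type Agent struct {
	Name   string
	Models []string
}

// AgentRepository is a hypothetical data-access contract.
type AgentRepository interface {
	ListAgents() ([]Agent, error)
}

// errBackend simulates a failing data source.
var errBackend = errors.New("backend down")

// mockAgentRepo returns canned data (or a canned error) for isolated tests.
type mockAgentRepo struct {
	agents []Agent
	err    error
}

func (m mockAgentRepo) ListAgents() ([]Agent, error) { return m.agents, m.err }

// SystemService is the business-logic layer. It depends only on the
// interface, so a real repository can replace the mock without changes here.
type SystemService struct {
	agents AgentRepository
}

// AgentNames returns the names of all agents known to the repository.
func (s SystemService) AgentNames() ([]string, error) {
	agents, err := s.agents.ListAgents()
	if err != nil {
		return nil, fmt.Errorf("listing agents: %w", err)
	}
	names := make([]string, 0, len(agents))
	for _, a := range agents {
		names = append(names, a.Name)
	}
	return names, nil
}

func main() {
	healthy := SystemService{agents: mockAgentRepo{agents: []Agent{{Name: "agent-name"}}}}
	fmt.Println(healthy.AgentNames())

	failing := SystemService{agents: mockAgentRepo{err: errBackend}}
	fmt.Println(failing.AgentNames())
}
```

Because the service only sees the interface, the same mock can drive both the success and the failure path in tests.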

Developer Experience

- Swagger/OpenAPI documentation
- Test coverage reporting
- Race condition detection
- Clean commit history

Quick Start

Prerequisites

- Go 1.21 or higher
- Git

Installation & Running

# Clone the repository
git clone <repository-url>
cd telemetron

# Install dependencies
go mod download

# Run the server
go run cmd/server/main.go

By default the server starts on http://localhost:8080; set SERVER_PORT to use a different port.

Basic Usage

# Get complete system state
curl http://localhost:8080/system/state

# View API documentation (open is the macOS command; on Linux use xdg-open)
open http://localhost:8080/swagger/

API Documentation

Primary Endpoint

`GET /system/state`

Returns the unified system state as a single JSON document.

Response Structure:

{
  "id": "system-instance-id",
  "agents": [
    {
      "name": "agent-name",
      "description": "Agent purpose",
      "max_parallel_invocations": 10,
      "deployment_name": "k8s-deployment",
      "models": ["gpt-4", "claude-3"],
      "activity": {
        "active_task_ids": [
          {
            "id": "task-123",
            "started_on": "2026-02-06T10:00:00Z",
            "status": "running"
          }
        ],
        "updated_at": "2026-02-06T10:05:00Z"
      }
    }
  ],
  "workload": [...],
  "queues": [...],
  "litellm": [...]
}
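
For Go consumers, the response above can be decoded into typed structs. The sketch below is an assumption-laden illustration: the JSON field names come from the sample response, but the Go type names and the `parseState` helper are hypothetical and are not taken from `internal/models`. The `workload`, `queues`, and `litellm` sections are kept as raw JSON because their shape is elided above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ActiveTask mirrors one entry of "active_task_ids" in the sample response.
type ActiveTask struct {
	ID        string    `json:"id"`
	StartedOn time.Time `json:"started_on"`
	Status    string    `json:"status"`
}

// Activity holds an agent's currently running tasks.
type Activity struct {
	ActiveTasks []ActiveTask `json:"active_task_ids"`
	UpdatedAt   time.Time    `json:"updated_at"`
}

// Agent mirrors one entry of "agents".
type Agent struct {
	Name                   string   `json:"name"`
	Description            string   `json:"description"`
	MaxParallelInvocations int      `json:"max_parallel_invocations"`
	DeploymentName         string   `json:"deployment_name"`
	Models                 []string `json:"models"`
	Activity               Activity `json:"activity"`
}

// SystemState keeps sections whose shape is not shown here as raw JSON.
type SystemState struct {
	ID       string            `json:"id"`
	Agents   []Agent           `json:"agents"`
	Workload []json.RawMessage `json:"workload"`
	Queues   []json.RawMessage `json:"queues"`
	LiteLLM  []json.RawMessage `json:"litellm"`
}

// parseState decodes a /system/state response body.
func parseState(body []byte) (SystemState, error) {
	var s SystemState
	err := json.Unmarshal(body, &s)
	return s, err
}

// sample is the documented response with the elided arrays left empty.
const sample = `{
  "id": "system-instance-id",
  "agents": [{
    "name": "agent-name",
    "description": "Agent purpose",
    "max_parallel_invocations": 10,
    "deployment_name": "k8s-deployment",
    "models": ["gpt-4", "claude-3"],
    "activity": {
      "active_task_ids": [
        {"id": "task-123", "started_on": "2026-02-06T10:00:00Z", "status": "running"}
      ],
      "updated_at": "2026-02-06T10:05:00Z"
    }
  }],
  "workload": [], "queues": [], "litellm": []
}`

func main() {
	state, err := parseState([]byte(sample))
	if err != nil {
		panic(err)
	}
	a := state.Agents[0]
	fmt.Printf("%s runs %d task(s), limit %d\n",
		a.Name, len(a.Activity.ActiveTasks), a.MaxParallelInvocations)
}
```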

Additional Endpoints

- `GET /` - Welcome message and navigation
- `GET /swagger/` - Interactive API documentation
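
As a rough illustration of how routes like these could be wired with only the standard library, consider the sketch below. The router Telemetron actually uses is not stated in this README, so `newMux`, `status`, and the placeholder response bodies are assumptions. The Swagger route is omitted because it depends on a third-party package. Method checks are done by hand because method-qualified patterns such as `"GET /system/state"` require Go 1.22, while the stated minimum is Go 1.21.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux registers the documented routes on a standard-library mux.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()

	mux.HandleFunc("/system/state", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		// Placeholder body; a real handler would encode the full state.
		fmt.Fprint(w, `{"id":"system-instance-id","agents":[]}`)
	})

	// The "/" pattern matches every otherwise unmatched path,
	// so anything other than the exact root is answered with 404.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintln(w, "Telemetron: see /system/state and /swagger/")
	})
	return mux
}

// status issues an in-memory request and returns the response code.
func status(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println("GET  /system/state:", status(mux, http.MethodGet, "/system/state"))
	fmt.Println("POST /system/state:", status(mux, http.MethodPost, "/system/state"))
	fmt.Println("GET  /:", status(mux, http.MethodGet, "/"))
	fmt.Println("GET  /missing:", status(mux, http.MethodGet, "/missing"))
}
```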

Development

Project Structure

.
├── cmd/server/              # Application entry point
│   ├── main.go             # Server setup and routing
│   ├── main_test.go        # Integration tests
│   └── handler_test.go     # HTTP handler tests
├── internal/
│   ├── handlers/           # HTTP request handlers (placeholder)
│   ├── models/             # Data models and schemas
│   │   ├── system_state.go
│   │   └── system_state_test.go
│   ├── repositories/       # Data access layer (mock implementations)
│   │   ├── interfaces.go   # Repository contracts
│   │   ├── mock_agent.go   # Mock agent data
│   │   ├── mock_others.go  # Mock workload, queue, and LLM data
│   │   └── mock_test.go    # Repository tests
│   └── services/           # Business logic layer
│       ├── system_service.go
│       └── system_service_test.go
├── pkg/
│   ├── config/             # Configuration management (config.go)
│   └── logger/             # Structured logging (logger.go)
├── docs/                   # API documentation (Swagger/OpenAPI)
│   ├── docs.go
│   ├── swagger.json
│   └── swagger.yaml
├── go.mod                  # Go module definition
├── go.sum                  # Dependency checksums
└── README.md

Running Tests

# Run all tests with coverage
go test -v -cover ./...

# Generate coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html

# Run with race detection (requires CGO)
CGO_ENABLED=1 go test -race ./...

Test Coverage

| Package | Coverage |
| --- | --- |
| `internal/services` | 81.8% |
| `internal/repositories` | 67.4% |
| `cmd/server` | 18.5% |

Building

# Build for current platform
go build -o telemetron cmd/server/main.go

# Build for multiple platforms
GOOS=linux GOARCH=amd64 go build -o telemetron-linux cmd/server/main.go
GOOS=windows GOARCH=amd64 go build -o telemetron.exe cmd/server/main.go

Design Philosophy

Core Principles

- Unified system snapshot over dashboards
  - Structured JSON state rather than visual graphs
  - Machine consumption over human-oriented dashboards
- State representation over trace chasing
  - Consolidated view instead of following trace IDs across systems
  - Present runtime reality alongside agent intent
- Machine-first observability
  - Primary consumer is automated debugging agents
  - Schema enforcement and clarity prioritized
- Minimal surface area
  - Focus strictly on state exposure
  - Data collection and visualization out of scope

Schema-Driven Design

The system implements a comprehensive JSON schema that captures:

- System Identity: Unique instance identification
- Agent State: Active tasks, authorized models, deployment mapping
- Runtime Workload: Kubernetes-aligned pod and resource metrics
- Task Queues: Pending work with priorities and dependencies
- LLM Usage: Rate limits, costs, provider configurations

Because every snapshot follows the same schema, debugging agents can diff snapshots taken at different times and reason about behavioral changes.
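
A client-side diff of two snapshots might look like the sketch below. It assumes each snapshot has already been reduced to a map from agent name to active task IDs. The `Snapshot` and `Diff` types and the `diffSnapshots` function are hypothetical, and Telemetron itself does not currently offer a diff endpoint (see Future Improvements).

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot maps an agent name to the IDs of its active tasks.
type Snapshot map[string][]string

// Diff lists the task IDs that appeared or disappeared for one agent.
type Diff struct {
	Added   []string
	Removed []string
}

// diffSnapshots reports, per agent, which active tasks changed between
// prev and curr. Agents without changes are omitted from the result.
func diffSnapshots(prev, curr Snapshot) map[string]Diff {
	names := map[string]struct{}{}
	for n := range prev {
		names[n] = struct{}{}
	}
	for n := range curr {
		names[n] = struct{}{}
	}

	out := map[string]Diff{}
	for n := range names {
		d := Diff{Added: minus(curr[n], prev[n]), Removed: minus(prev[n], curr[n])}
		if len(d.Added)+len(d.Removed) > 0 {
			out[n] = d
		}
	}
	return out
}

// minus returns the sorted elements of a that are absent from b.
func minus(a, b []string) []string {
	seen := make(map[string]struct{}, len(b))
	for _, x := range b {
		seen[x] = struct{}{}
	}
	var out []string
	for _, x := range a {
		if _, ok := seen[x]; !ok {
			out = append(out, x)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	prev := Snapshot{"planner": {"task-1", "task-2"}, "coder": {"task-9"}}
	curr := Snapshot{"planner": {"task-2", "task-3"}, "coder": {"task-9"}}
	for name, d := range diffSnapshots(prev, curr) {
		fmt.Printf("%s: added=%v removed=%v\n", name, d.Added, d.Removed)
	}
}
```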

Configuration

Environment Variables

# Server configuration
SERVER_PORT=8080              # Default: 8080
LOG_LEVEL=info               # Default: info (debug, info, warn, error)

# Example
export SERVER_PORT=3000
export LOG_LEVEL=debug

Configuration File

The system uses pkg/config/config.go for centralized configuration management with sensible defaults.
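
The contents of `pkg/config/config.go` are not shown here, so the following is only a sketch of how environment-driven configuration with defaults could work. The `Config` struct and the `loadFrom` function are hypothetical; the variable names, defaults, and accepted log levels come from the Environment Variables section. Falling back to `info` for an unrecognized log level is an assumption of this sketch, not documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds the two documented settings.
type Config struct {
	ServerPort string
	LogLevel   string
}

// loadFrom builds a Config from a lookup function with the same signature
// as os.LookupEnv, which keeps the logic testable without touching the
// process environment.
func loadFrom(lookup func(string) (string, bool)) Config {
	cfg := Config{ServerPort: "8080", LogLevel: "info"} // documented defaults

	if v, ok := lookup("SERVER_PORT"); ok && v != "" {
		cfg.ServerPort = v
	}
	if v, ok := lookup("LOG_LEVEL"); ok {
		switch v {
		case "debug", "info", "warn", "error":
			cfg.LogLevel = v
		}
		// Any other value keeps the default (assumption of this sketch).
	}
	return cfg
}

func main() {
	cfg := loadFrom(os.LookupEnv)
	fmt.Printf("port=%s level=%s\n", cfg.ServerPort, cfg.LogLevel)
}
```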

Usage Scenarios

1. AI-Driven Debugging

import requests

# AI debugging agent queries system state
response = requests.get("http://telemetron:8080/system/state", timeout=5)
response.raise_for_status()  # fail loudly on non-2xx responses
system_state = response.json()

# Flag agents running more tasks than their configured limit
for agent in system_state["agents"]:
    active = len(agent["activity"]["active_task_ids"])
    if active > agent["max_parallel_invocations"]:
        print(f"Agent {agent['name']} is overloaded!")
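
The same check can be sketched in Go, the project's own language. The type names and the `fetchState` and `overloaded` helpers are hypothetical. The sketch starts a stand-in `httptest` server so that it runs without a live Telemetron instance; in practice the base URL would point at the real service.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// task, agent and systemState decode only the fields this check needs.
type task struct {
	ID string `json:"id"`
}

type agent struct {
	Name                   string `json:"name"`
	MaxParallelInvocations int    `json:"max_parallel_invocations"`
	Activity               struct {
		ActiveTasks []task `json:"active_task_ids"`
	} `json:"activity"`
}

type systemState struct {
	Agents []agent `json:"agents"`
}

// fetchState GETs baseURL/system/state with a timeout and decodes the body.
func fetchState(baseURL string) (systemState, error) {
	var s systemState
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/system/state")
	if err != nil {
		return s, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return s, fmt.Errorf("unexpected status %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&s)
	return s, err
}

// overloaded returns the names of agents running more tasks than their limit.
func overloaded(s systemState) []string {
	var names []string
	for _, a := range s.Agents {
		if len(a.Activity.ActiveTasks) > a.MaxParallelInvocations {
			names = append(names, a.Name)
		}
	}
	return names
}

func main() {
	// Stand-in server returning one agent that exceeds its limit.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"agents":[{"name":"busy","max_parallel_invocations":1,
			"activity":{"active_task_ids":[{"id":"t1"},{"id":"t2"}]}}]}`)
	}))
	defer srv.Close()

	state, err := fetchState(srv.URL)
	if err != nil {
		panic(err)
	}
	for _, name := range overloaded(state) {
		fmt.Printf("Agent %s is overloaded!\n", name)
	}
}
```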

2. Production Monitoring

# Health check script
curl -fsS http://telemetron:8080/system/state > /dev/null || exit 1

# State comparison for change detection
# (previous_state.json is a snapshot saved by an earlier run; jq -S sorts keys so the diff is stable)
curl -s http://telemetron:8080/system/state | jq -S . > current_state.json
diff previous_state.json current_state.json

3. Development Debugging

# Quick system overview
curl -s http://localhost:8080/system/state | jq '{
  agents: .agents | length,
  active_tasks: [.agents[].activity.active_task_ids[]] | length,
  queue_depth: [.queues[].tasks[]] | length
}'

Future Improvements

- Real Kubernetes workload integration
- Message queue connectivity (Kafka, RabbitMQ)
- LiteLLM proxy integration
- Agent activity collectors
- Historical state storage
- State diff endpoints
- Webhook notifications
- Performance metrics and alerting

Telemetron: Making multi-agent systems observable, debuggable, and explainable through unified state representation.
