Skip to content

DavidAkinpelu/energyevals

Repository files navigation

EnergyEvals

AI agent evaluation framework for energy analytics.

Features

  • ReAct Agent: Multi-provider LLM support (OpenAI, Anthropic, Google, DeepInfra)
  • Energy Tools: GridStatus, Tariffs, Renewables, Battery optimization, Dockets, Weather, Search
  • MCP Integration: External RAG and database tools via Model Context Protocol
  • Benchmark Framework: Evaluate agents across questions with metrics and comparison
  • Observability: JSON tracing with full execution data

Quick Start

# Install system dependencies (Ipopt solver for battery optimization)
sudo ./install.sh

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your keys

# Run a benchmark
python scripts/run_benchmark.py

Installation

System Dependencies

For battery optimization tools, install Ipopt solver:

# Debian/Ubuntu
sudo ./install.sh

The install script builds Ipopt and required third-party solvers from source, skipping Java test harness to avoid JDK issues.

Python Dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For development (includes testing and linting tools):

pip install -r requirements-dev.txt

Configuration

API Keys

Create a .env file with your credentials:

# LLM Providers (at least one required)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
DEEPINFRA_API_KEY=...

# Tools (optional - enables specific functionality)
EXA_API_KEY=...                      # SearchTool
GRIDSTATUS_API_KEY=...               # GridStatusAPITool
OPENWEATHER_API_KEY=...              # OpenWeatherTool
OPEN_EI_API_KEY=...                  # TariffsTool
RENEWABLES_NINJA_API_KEY=...         # RenewablesTool

Copy .env.example for a template.

MCP Servers

MCP servers provide RAG and database access. They connect via remote URLs configured in your .env file. Set the URL env vars below to enable them; if neither is set, MCP is effectively disabled even with mcp.enabled: true.

RAG_SERVER_URL=https://energyevals-rag-mcp.tume.ai/sse
DATABASE_SERVER_URL=https://energyevals-db-mcp.tume.ai/sse

Usage

Ask a Question (Interactive)

The quickest way to use EnergyEvals is the interactive agent script. Type a question, get an answer:

# Start interactive mode (defaults to openai / gpt-4o-mini)
python scripts/run_agent.py

# Choose a provider
python scripts/run_agent.py -p anthropic
python scripts/run_agent.py -p google

# Pick a specific model
python scripts/run_agent.py -p openai -m gpt-4o

# Enable MCP tools (RAG + database)
python scripts/run_agent.py --mcp

# Run without tools (pure LLM)
python scripts/run_agent.py --no-tools

# Ask a single question (no interactive loop)
python scripts/run_agent.py -q "What are current ERCOT energy prices?"

Inside the interactive session, type your question at the > prompt. The agent will use its tools to research and answer. Type quit to exit.

Running Benchmarks

For detailed benchmark configuration, custom questions, evaluation, and multi-model comparison, see the Benchmark Guide. Benchmark runs require at least one explicit models entry in config; there is no provider/model fallback.

Multi-trial seed controls are configured in agent:

agent:
  num_trials: 3
  shuffle: true
  seed_mode: rotate              # fixed | rotate | random_per_trial
  seed: 12345                    # optional base seed
  # seeds: [101, 202, 303]       # optional explicit per-trial seeds

Architecture

ReAct Agent Loop

The agent uses a Reasoning-Acting loop:

  1. Thought: Analyze the question and plan next action
  2. Action: Select and execute a tool
  3. Observation: Process tool output
  4. Repeat: Continue until answer is complete

Maximum iterations default to 25 (configurable).

Provider Abstraction

A unified interface to run models from any major LLM provider:

  • OpenAI — GPT, O1, O3, and more
  • Anthropic — Claude models (Sonnet, Opus, Haiku)
  • Google — Gemini models (Flash, Pro)
  • DeepInfra — Open-source models (Llama, Mistral, and more)

Providers implement a common BaseProvider protocol with tool calling and streaming support.

Tool System

Tools are registered via the default tool registry in create_default_registry():

  1. Direct registration: Tools instantiated and registered in code
  2. MCP servers: External tools via Model Context Protocol

Each tool provides:

  • JSON schema for LLM tool calling
  • Async execution
  • Error handling with structured results

Observability

Traces capture full execution:

  • All ReAct steps (thought, action, observation)
  • Tool inputs/outputs
  • Token usage and latency
  • Failed calls with errors

Trace output is stored as local JSON or JSONL files.

Development

Run tests:

pytest

Lint and type check:

ruff check .
mypy energyevals

Documentation

About

AI agent evaluation framework for energy analytics — multi-provider LLMs, benchmark runner, MCP-based tool ecosystem.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors