chinmaykadu19/agent-reasoning-evaluator


🤖 Agent Reasoning Evaluator

🧠 Overview

Agent Reasoning Evaluator is a lightweight web app built with Streamlit that analyzes and annotates reasoning trajectories of AI agents.
It simulates how intelligent agents reason, make decisions, and sometimes make logical or factual errors, allowing users to evaluate and visualize their performance.

The project was inspired by agentic AI concepts and by practical insights from workshops on the NVIDIA NeMo Toolkit and AgentToolkit.


🚀 Live Demo

👉 https://agent-reasoning-evaluator.streamlit.app/


🧩 Features

  • ✅ Upload agent reasoning data (agent_data.json)
  • 🔍 Automatically evaluate outputs for factual or logical errors
  • 🧾 Generate structured annotations (annotations.json)
  • 📊 View accuracy metrics and error distributions
  • 💾 Download annotated results
  • 🌐 Clean and intuitive Streamlit interface

🧰 Tech Stack

  • Python 3.11+
  • Streamlit
  • Pandas
  • JSON

📁 Project Structure

agent-reasoning-evaluator/
│
├── app.py              # Streamlit dashboard
├── evaluate_agent.py   # Command-line evaluation script
├── agent_data.json     # Sample reasoning trajectories
├── annotations.json    # Generated annotations
├── summary.csv         # Summary results
└── README.md
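The layout of summary.csv is not documented. A minimal sketch of how accuracy and an error distribution could be derived from annotations, assuming each annotation carries a hypothetical `label` field with values such as `correct`, `factual_error`, or `logical_error`. It uses only the standard library; with Pandas, `df["label"].value_counts()` would give the same distribution.

```python
import csv
import io
from collections import Counter

# Hypothetical annotations, shaped as {"id": ..., "label": ...} (assumption).
annotations = [
    {"id": "t1", "label": "correct"},
    {"id": "t2", "label": "logical_error"},
    {"id": "t3", "label": "factual_error"},
    {"id": "t4", "label": "correct"},
]


def summarize(rows):
    """Return overall accuracy and the count of each label."""
    counts = Counter(row["label"] for row in rows)
    accuracy = counts["correct"] / len(rows) if rows else 0.0
    return accuracy, counts


def to_csv(accuracy, counts):
    """Serialize the summary as CSV text with metric,value rows (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["metric", "value"])
    writer.writerow(["accuracy", f"{accuracy:.2f}"])
    for label, n in sorted(counts.items()):
        writer.writerow([label, n])
    return buf.getvalue()


accuracy, counts = summarize(annotations)
print(to_csv(accuracy, counts))
```

For the four sample rows, accuracy is 0.50 and each error type appears once. Writing the returned string to summary.csv would produce a file the dashboard could read back.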

⚙️ How to Run Locally

  1. Clone the repository
    git clone https://github.com/<your-username>/agent-reasoning-evaluator.git
    cd agent-reasoning-evaluator
  2. Install dependencies
    pip install streamlit pandas
  3. Run the Streamlit app
    streamlit run app.py
  4. Open your browser at the local URL Streamlit prints (by default http://localhost:8501)
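The steps above launch the dashboard. The repository also lists a command-line script, evaluate_agent.py, whose interface is not documented here. A minimal sketch of what such a script could look like, assuming hypothetical `--input` and `--output` flags and a simplified evaluator that only compares `final_answer` with `expected_answer` (both field names are assumptions):

```python
import argparse
import json


def evaluate(record):
    """Label a record by comparing final and expected answers (hypothetical schema)."""
    ok = str(record["final_answer"]).strip() == str(record["expected_answer"]).strip()
    return {"id": record["id"], "label": "correct" if ok else "factual_error"}


def main(argv=None):
    """Read trajectories, annotate them, and write the annotations as JSON."""
    parser = argparse.ArgumentParser(description="Annotate agent reasoning trajectories.")
    # Flag names and defaults are illustrative assumptions, not the project's CLI.
    parser.add_argument("--input", default="agent_data.json", help="trajectory file to read")
    parser.add_argument("--output", default="annotations.json", help="annotation file to write")
    args = parser.parse_args(argv)

    with open(args.input, encoding="utf-8") as f:
        records = json.load(f)
    annotations = [evaluate(r) for r in records]
    with open(args.output, "w", encoding="utf-8") as f:
        json.dump(annotations, f, indent=2)
    print(f"Wrote {len(annotations)} annotations to {args.output}")
    return annotations
```

To use it as a script, append `if __name__ == "__main__": main()` and run `python evaluate_agent.py --input agent_data.json --output annotations.json`. Passing `argv` explicitly, as in `main(["--input", "data.json"])`, also makes the function easy to call from tests or from the Streamlit app.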

💡 Future Enhancements

  • Add detailed reasoning step visualization
  • Integrate with real-world agent reasoning traces
  • Enable multi-file uploads and bulk evaluation
  • Add charts for reasoning accuracy trends

🧑‍💻 Author

Chinmay Kadu
Web Developer | AI Enthusiast | Agentic Systems Learner
🌐 https://agent-reasoning-evaluator.streamlit.app/
📧 chinmayrkadu@gmail.com

About

🧠 A Streamlit app that evaluates and visualizes reasoning trajectories of AI agents, built with Python and inspired by agentic AI workflows, reasoning analysis, and LLM evaluation.
