Agent Reasoning Evaluator is a lightweight web app built with Streamlit that analyzes and annotates reasoning trajectories of AI agents.
It simulates how intelligent agents reason, make decisions, and sometimes commit logical or factual errors, allowing users to evaluate and visualize their performance.
The project was inspired by agentic AI concepts and practical insights from workshops on NVIDIA NeMo Toolkit and AgentToolkit.
Live app: https://agent-reasoning-evaluator.streamlit.app/
Features

- Upload agent reasoning data (`agent_data.json`)
- Automatically evaluate outputs for factual or logical errors
- Generate structured annotations (`annotations.json`)
- View accuracy metrics and error distributions
- Download annotated results
- Clean and intuitive Streamlit interface
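The exact schema of `agent_data.json` is not documented here; a minimal illustrative example (all field names are assumptions, not the app's actual format) might look like:

```json
[
  {
    "id": "t1",
    "task": "What is 2 + 2?",
    "steps": [
      {"id": 1, "thought": "Add the two numbers.", "answer": "4", "expected": "4"},
      {"id": 2, "thought": "Double the result.", "answer": "9", "expected": "8"}
    ]
  }
]
```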
Tech Stack

- Python 3.11+
- Streamlit
- Pandas
- JSON
Project Structure

```
agent-reasoning-evaluator/
├── app.py               # Streamlit dashboard
├── evaluate_agent.py    # Command-line evaluation script
├── agent_data.json      # Sample reasoning trajectories
├── annotations.json     # Generated annotations
├── summary.csv          # Summary results
└── README.md
```
Installation

- Clone the repository

  ```
  git clone https://github.com/<your-username>/agent-reasoning-evaluator.git
  cd agent-reasoning-evaluator
  ```

- Install dependencies

  ```
  pip install streamlit pandas
  ```

- Run the Streamlit app

  ```
  streamlit run app.py
  ```

- Open your browser at http://localhost:8501
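For running evaluations from the command line, the repository ships `evaluate_agent.py`. Its internals are not shown here, so the following is only a minimal sketch of what a rule-based evaluation pass over a reasoning trace could look like; the function names and the `answer`/`expected` fields are assumptions, not the project's actual code or schema.

```python
import json

def evaluate_step(step):
    """Flag a reasoning step as an error when its claimed answer
    disagrees with the expected answer recorded in the trace.
    (Hypothetical schema: `id`, `answer`, `expected` are assumed fields.)"""
    is_error = step["answer"] != step["expected"]
    return {
        "step_id": step["id"],
        "is_error": is_error,
        "error_type": "factual" if is_error else None,
    }

def evaluate_trajectory(trajectory):
    """Annotate every step and compute a simple accuracy score."""
    annotations = [evaluate_step(s) for s in trajectory["steps"]]
    errors = sum(a["is_error"] for a in annotations)
    return {
        "trajectory_id": trajectory["id"],
        "accuracy": 1 - errors / len(annotations),
        "annotations": annotations,
    }

if __name__ == "__main__":
    sample = {
        "id": "t1",
        "steps": [
            {"id": 1, "answer": "4", "expected": "4"},
            {"id": 2, "answer": "5", "expected": "6"},
        ],
    }
    # Write annotations to a file, mirroring the app's annotations.json output
    with open("annotations.json", "w") as f:
        json.dump(evaluate_trajectory(sample), f, indent=2)
```

The real script may use a richer error taxonomy; this sketch only shows the shape of the input/output flow.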
Future Enhancements
- Add detailed reasoning step visualization
- Integrate with real-world agent reasoning traces
- Enable multi-file uploads and bulk evaluation
- Add charts for reasoning accuracy trends
Author
Chinmay Kadu
Web Developer | AI Enthusiast | Agentic Systems Learner

Live app: https://agent-reasoning-evaluator.streamlit.app/
Email: chinmayrkadu@gmail.com