This project is a UI developed for testing and comparing the performance of multiple large language models (LLMs). The interface is designed to support evaluation of different models' outputs using Chainlit as the backend for generating model responses.
The UI is configured to test the following models:
- Qwen/Qwen2-7B-Instruct
- Meta-Llama/Meta-Llama-3.1-8B-Instruct
- Google/Gemma-2-9B-IT
This LLM Testing UI includes belows
- Single-Turn
- Multi-Turn
- Select Model
- Select Hyper-parameter (temperature, top-p, ... etc)



