ConsistencyChecker is a Python package that evaluates Large Language Models (LLMs) through a novel tree-based approach. It tests an LLM's consistency by applying a series of transform-reverse operations (like translation) and measuring how well the model maintains semantic consistency across these transformations.
- 🤖 Support for both API-based (via LiteLLM) and local LLM evaluation
- 🌲 Tree-based evaluation structure with customizable depth and branching
- 📊 Multiple similarity metrics for consistency evaluation
- 🧰 CLI tool for easy evaluation
The local embedding model truncates its input at 4096 tokens.
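To make the idea above concrete, here is a minimal, self-contained sketch of a single transform-reverse round trip scored with an embedding-based similarity metric. It is an illustration only: the model names, prompts, helper function, and the use of litellm plus sentence-transformers below are assumptions made for the sketch, not the package's internal API.

```python
import litellm
from sentence_transformers import SentenceTransformer, util

def ask(model: str, prompt: str) -> str:
    """Send one single-turn prompt through litellm and return the reply text."""
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

model = "gpt-4"  # any model that litellm supports
original = "The cat sat quietly on the warm windowsill."

# Transform: translate into French; reverse: translate back into English.
french = ask(model, f"Translate into French. Reply with the translation only:\n{original}")
back = ask(model, f"Translate into English. Reply with the translation only:\n{french}")

# Score consistency as cosine similarity between sentence embeddings.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embedder.max_seq_length = min(embedder.max_seq_length, 4096)  # mirrors the 4096-token cap noted above
embeddings = embedder.encode([original, back], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Round-trip consistency: {score:.3f}")
```

In the package itself, such round trips are organized into a tree with configurable depth and branching, and consistency can be scored with multiple similarity metrics; the CLI described below handles all of this.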
The dependencies are managed by Poetry. First, install Poetry:

```bash
pip install poetry
```

Then install the dependencies:

```bash
poetry install
```

After that, simply use ConsistencyChecker as a CLI tool. There are three parameter combinations:
- Generate a benchmark only:

  ```bash
  llmcheck --config <path_to_config> --benchmark_output <path_for_saving_benchmark> --benchmark_only
  ```

- Evaluate using an existing benchmark:

  ```bash
  llmcheck --config <path_to_config> --benchmark <path_to_benchmark> --result_output_folder <path_for_saving_results>
  ```

- Generate a benchmark and evaluate:

  ```bash
  llmcheck --config <path_to_config> --result_output_folder <path_for_saving_results>
  ```
The config file format is somewhat involved; it is documented in the provided config.yaml. To avoid outputs overwriting one another, the result file is tagged with the current time by default.
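For illustration, the intent of that default naming is roughly the following; the exact filename format used by the tool may differ.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical sketch: tag each result file with the run's start time so
# repeated runs into the same folder never overwrite each other.
tag = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
result_path = Path("output_folder") / f"results_{tag}.yaml"
print(result_path)  # e.g. output_folder/results_2025-06-13_14-02-37.yaml
```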
Here are examples of running the two stages separately:

```bash
llmcheck --config config_coding.yaml --benchmark_output bench_coding.yaml --benchmark_only
llmcheck --config config_coding.yaml --benchmark bench_coding.yaml --result_output_folder output_folder
```

Because we use litellm to connect to LLMs, you have access to every model that litellm supports. Set the corresponding API keys as environment variables. For example, to use OpenAI's GPT-4, set:

```bash
export OPENAI_API_KEY=your_api_key
```
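llmcheck hands the model name to litellm, which picks up this key from the environment. If you want to confirm that the key and model name resolve correctly before launching a full evaluation, a direct litellm call (separate from llmcheck, shown here only as a sanity check) looks like this:

```python
import litellm

# Reads OPENAI_API_KEY from the environment; any litellm-supported model name works.
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(response.choices[0].message.content)
```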
We recommend using vllm or ollama to access modest-sized open-source LLMs.
Here is a simple example of serving a model with vllm. The number of GPUs you need depends on the model size and your setup; the command below serves a model on 4 GPUs. `--gpu_memory_utilization` is the fraction of GPU memory the model may use, and `--tensor-parallel-size` is the number of GPUs used for tensor parallelism. The host is set to localhost because binding to 0.0.0.0 could expose the served model to the internet without access control.
```bash
export NCCL_P2P_DISABLE=1
export OMP_NUM_THREADS=32
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve \
    meta-llama/Llama-3.1-70B-Instruct \
    --host 127.0.0.1 \
    --port 8001 \
    --max_model_len 65536 \
    --gpu_memory_utilization 0.8 \
    --tensor-parallel-size 4
```

Ollama comes as an installable package. If you do not have permission to install software on your machine, you can download a pre-compiled version of ollama from its releases page.
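Once the vLLM server above is running, it exposes an OpenAI-compatible endpoint on port 8001. As a quick check that llmcheck (or anything else built on litellm) can reach it, you can point a litellm call at that endpoint. The `openai/` prefix and `api_base` below are litellm's generic way of addressing OpenAI-compatible servers; how you reference the model in llmcheck's config.yaml may differ.

```python
import litellm

# Talk to the locally served model through its OpenAI-compatible API.
# api_key is a dummy value because the local server enforces no authentication.
response = litellm.completion(
    model="openai/meta-llama/Llama-3.1-70B-Instruct",
    api_base="http://127.0.0.1:8001/v1",
    api_key="EMPTY",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(response.choices[0].message.content)
```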
If you use ConsistencyChecker in your work, please cite:
```bibtex
@misc{hong2025consistencycheckertreebasedevaluationllm,
      title={ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities},
      author={Zhaochen Hong and Haofei Yu and Jiaxuan You},
      year={2025},
      eprint={2506.12376},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2506.12376},
}
```

Wrench icons created by flatart_icons - Flaticon
