This repository contains the code for reproducing the experiments from the paper "Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM".
This project investigates whether entropy-based signatures from language models can effectively estimate accuracy on mathematical and scientific reasoning benchmarks. We evaluate multiple language models across various reasoning tasks and train classifiers to predict performance from internal model signals.
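To make the underlying signal concrete (this is an illustration, not code from this repository), per-token decoding entropy is simply the Shannon entropy of the model's next-token distribution, computed from its logits:

```python
import numpy as np

def token_entropy(logits: np.ndarray) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    z = logits - logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    p = p[p > 0]                       # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# A peaked distribution (confident model) has low entropy;
# a flat distribution (uncertain model) has high entropy.
peaked = token_entropy(np.array([10.0, 0.0, 0.0]))
flat = token_entropy(np.array([1.0, 1.0, 1.0]))
```

Collecting this value at every decoding step yields the entropy trace that the pipeline below stores and featurizes.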
```
.
├── src/
│   ├── engine/            # Core experiment modules
│   ├── scripts/           # Execution scripts
│   └── data/              # Generated data (excluded from repo)
│       ├── features/      # Extracted entropy feature vectors
│       ├── models/        # Saved classifier models
│       └── runs/          # Stored activation profiles
└── requirements.txt       # Python dependencies
```
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Linux/Mac
  # or
  venv\Scripts\activate     # On Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment variables (if needed for evaluation):

  ```bash
  cp .env.example .env  # Edit as needed
  ```

To reproduce the experiments, run the following scripts in order:
Runs language models on benchmarks and stores activation profiles:

```bash
bash src/scripts/store_activations.sh
```

This processes multiple model-benchmark combinations and saves entropy profiles to `src/data/runs/`.
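Schematically, storage amounts to writing one entropy trace per question under a per-model, per-benchmark directory. The layout and file names below are illustrative assumptions, not the repository's actual on-disk format:

```python
import tempfile
from pathlib import Path

import numpy as np

def save_trace(root: Path, model: str, benchmark: str,
               qid: str, trace: np.ndarray) -> Path:
    """Save one per-token entropy trace (hypothetical layout)."""
    run_dir = root / model / benchmark
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{qid}.npy"
    np.save(path, trace)
    return path

# Example: one trace for one question, written to a temporary directory.
root = Path(tempfile.mkdtemp())
trace = np.array([0.12, 0.87, 1.45, 0.33])  # per-token decoding entropies
path = save_trace(root, "example-model", "gsm8k", "q0001", trace)
loaded = np.load(path)
```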
Evaluates model performance on the benchmarks:

```bash
bash src/scripts/evaluate_runs.sh
```

Computes accuracy metrics for each model-benchmark pair.
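The core metric is benchmark accuracy: the fraction of items whose extracted final answer matches the reference. A minimal sketch of that idea (the repository's answer extraction and matching are more involved):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of items whose predicted answer matches the reference."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

# Two of three answers match the references.
acc = accuracy(["42", "7", "3.14"], ["42", "8", "3.14"])
```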
Extracts statistical features from entropy profiles:

```bash
bash src/scripts/generate_features.sh
```

Generates feature vectors from the stored activations and saves them to `src/data/features/`.
Trains accuracy prediction models:

```bash
python -m src.scripts.train_classifiers
```

Trains multiple classifier configurations (Random Forest, Logistic Regression, Neural Networks) to predict accuracy from entropy features.
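To sketch the logistic-regression variant of this step (a from-scratch toy, not the repository's implementation): fit a classifier that maps entropy feature vectors to a binary high/low-accuracy label. The toy data and single mean-entropy feature are assumptions for illustration.

```python
import numpy as np

def fit_logreg(X: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 2000):
    """Logistic regression via plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of log-loss w.r.t. w
        b -= lr * (p - y).mean()                 # gradient w.r.t. b
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)

# Toy data: one feature (mean trace entropy); low entropy -> label 1 ("accurate").
X = np.array([[0.2], [0.3], [1.8], [2.1]])
y = np.array([1, 1, 0, 0])
w, b = fit_logreg(X, y)
preds = predict(X, w, b)
```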
The experiments evaluate the following models:
- Phi-3 (3.8B parameters)
- Qwen3 (4B and 8B parameters)
- Ministral-3 (3B and 8B parameters)
- Llama 3.1 (8B parameters)
- Gemma 3 (4B and 12B parameters)
- GPT-OSS (20B parameters)
On the following benchmarks:
- Mathematical Reasoning: GSM8K, MATH (Hendrycks), SVAMP, GSM-Symbolic, LiveMathBench
- Scientific Reasoning: GPQA, SciBench, TheoremQA, OlympiadBench, MatSciBench
- The scripts process multiple configurations and may take several hours to complete
- Intermediate results are saved to allow resuming if interrupted
- Training scripts automatically skip existing models to enable easy resumption