LLM Behavior Lab

A platform for running repeated experiments on local LLMs (e.g., Qwen) to study their behavior under controlled scenarios of simulated morality, situational pressure, and decision-making.

Features

Massive Experiment Execution: Run N repeated experiments with controlled variation (seed, optional prompt jitter)
Scenario Framework: Modular Python classes for defining moral scenarios
Local LLM Support: Supports both direct GGUF models via llama-cpp-python and Ollama API
Automatic Evaluation: Parser functions to extract key decisions from LLM responses
Persistence: Save each run as JSONL for analysis
Statistics: Calculate percentages, variance, and distributions
Streamlit UI: Interactive interface for running experiments and viewing results
Batch Mode: Command-line script for automated batch runs

Requirements

Python 3.11+
One of the following:
- Option 1: A local LLM model in GGUF format (e.g., Qwen-7B-Q4) for direct use
- Option 2: Ollama installed and running with models available (recommended)

Installation

Clone the repository:

git clone <repository-url>
cd llm-morality-simulator

Create a virtual environment:

python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Choose your model backend:

Option A: Using Ollama (Recommended)

Install Ollama from ollama.ai and pull a model:

# Install Ollama (follow instructions on ollama.ai)
# Then pull a model:
ollama pull qwen3:14b
# or
ollama pull gpt-oss:20b

Verify Ollama is running:

ollama ps

Option B: Using Direct GGUF Files

Install llama-cpp-python (may require additional setup):

# For CPU
pip install llama-cpp-python

# For GPU (CUDA)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

# For Metal (macOS)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

Download a GGUF model file (e.g., from HuggingFace) and note its path.

Quick Start

Important: Always run commands from the project root directory (llm-morality-simulator/).

Quick Check: Verify your setup:

# If using Ollama:
ollama ps  # Should show running models or empty (Ollama is running)
ollama list  # Should list available models

# If using GGUF directly:
# Ensure you have a .gguf file path ready

Using the Streamlit UI

Start the Streamlit app:

streamlit run streamlit_app.py

In the sidebar:
- If using Ollama: Check "Use Ollama", enter model name (e.g., qwen3:14b), click "Load Model"
- If using GGUF: Enter the path to your GGUF model file, click "Load Model"
- Select a scenario
- Configure experiment parameters (number of runs, temperature, etc.)
- Click "Run Experiment"
View results in the "View Results" and "Statistics" tabs

Screenshots

The application provides an intuitive web interface for running experiments and analyzing results. Below are screenshots of the main features:

Main Interface

Main application interface showing the navigation tabs and clean UI design

Configuration Sidebar

Sidebar panel for configuring models, scenarios, and experiment parameters

Run Experiment Tab

Interface for configuring and executing new experiments

Experiments List

View all executed experiments with metadata and selection options

View Results

Detailed view of experiment results with responses, decisions, and filtering options

Statistics & Charts

Comprehensive statistical analysis with visualizations, confidence intervals, and decision distributions

For more details about each screenshot, see docs/screenshots/README.md.

Using Batch Mode

With Ollama (Recommended):

python run_batch.py \
    --model qwen3:14b \
    --use-ollama \
    --scenario cold_room_relay \
    --n-runs 100 \
    --seed 42 \
    --temperature 0.7 \
    --show-stats

With Direct GGUF File:

python run_batch.py \
    --model /path/to/qwen-7b-q4.gguf \
    --scenario cold_room_relay \
    --n-runs 100 \
    --seed 42 \
    --temperature 0.7 \
    --show-stats

Note: Make sure you're in the project root directory when running these commands.

Project Structure

llm-morality-simulator/
├── core/
│   ├── __init__.py
│   ├── model.py          # LocalLLM class for model interface
│   ├── runner.py         # ExperimentRunner for executing experiments
│   ├── evaluator.py      # Decision extraction functions
│   └── statistics.py     # Statistics calculation module
├── scenarios/
│   ├── __init__.py
│   ├── base.py           # BaseScenario abstract class
│   └── cold_room_relay.py # Example scenario
├── docs/
│   └── screenshots/     # Application screenshots for documentation
├── results/              # JSONL result files (created automatically)
├── streamlit_app.py     # Streamlit UI
├── run_batch.py         # Batch runner script
├── requirements.txt     # Python dependencies
└── README.md           # This file

Adding New Scenarios

Create a new file in scenarios/ (e.g., scenarios/my_scenario.py)
Import and extend BaseScenario:

from scenarios.base import BaseScenario
from core.evaluator import DecisionEvaluator

class MyScenario(BaseScenario):
    def __init__(self):
        super().__init__(name="my_scenario")
    
    def system_prompt(self) -> str:
        return "Your system prompt here..."
    
    def user_prompt(self) -> str:
        return "Your user prompt here..."
    
    def evaluation_functions(self) -> list:
        evaluator = DecisionEvaluator()
        
        def my_custom_evaluator(response: str) -> bool:
            # Your evaluation logic
            return "some pattern" in response.lower()
        
        return [
            my_custom_evaluator,
            evaluator.detect_harm_decision,
            # Add more evaluators as needed
        ]
    
    def metadata(self) -> dict:
        base = super().metadata()
        base.update({
            'description': 'My scenario description',
        })
        return base

Import and use in streamlit_app.py or run_batch.py:

from scenarios.my_scenario import MyScenario

scenario = MyScenario()

Adding New Decision Parsers

Add a new method to DecisionEvaluator in core/evaluator.py:

@staticmethod
def detect_my_decision(response: str) -> bool:
    """Detect if response indicates my decision type."""
    response_lower = response.lower()
    patterns = [
        r'\b(pattern1|pattern2)\b',
        r'\b(pattern3)\b',
    ]
    for pattern in patterns:
        if re.search(pattern, response_lower):
            return True
    return False

Update evaluate_all() to include your new function:

def evaluate_all(response: str) -> Dict[str, Any]:
    return {
        # ... existing evaluations ...
        'my_decision': DecisionEvaluator.detect_my_decision(response),
    }

Use in scenarios by including it in evaluation_functions():

def evaluation_functions(self) -> list:
    evaluator = DecisionEvaluator()
    return [
        evaluator.detect_my_decision,
        # ... other functions ...
    ]

Model Backends

The project supports two model backends. Choose based on your needs:

Feature	Ollama	Direct GGUF
Ease of use	⭐⭐⭐⭐⭐	⭐⭐⭐
Model management	Automatic	Manual
GPU support	Built-in	Requires setup
Model switching	Easy (`ollama pull`)	Manual download
Memory management	Automatic	Manual
Recommended for	Most users	Advanced users

1. Ollama (Recommended)

The OllamaLLM class in core/model.py connects to the Ollama API. This is the recommended approach as it:

Handles model management automatically
Supports GPU acceleration out of the box
Easier to switch between models
No need to manage GGUF files directly

Usage:

from core.model import OllamaLLM

model = OllamaLLM(model_name="qwen3:14b")
response = model.infer("Your prompt here")

Available models: Check with ollama list or visit ollama.ai/library

2. Direct GGUF Files

The LocalLLM class uses llama-cpp-python to load GGUF models directly. Useful when:

You have specific GGUF files you want to use
You need more control over model loading
You're not using Ollama

Usage:

from core.model import LocalLLM

model = LocalLLM(model_path="/path/to/model.gguf")
response = model.infer("Your prompt here")

Adding New Model Backends

To add support for other backends (e.g., HuggingFace Transformers):

Create a new model class in core/model.py or a separate file:

class HuggingFaceLLM:
    def __init__(self, model_name: str):
        from transformers import AutoModelForCausalLM, AutoTokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.model_path = f"hf:{model_name}"  # For compatibility
    
    def infer(self, prompt: str, temperature: float = 0.7, top_p: float = 0.9, max_tokens: int = 512, stop: Optional[list] = None) -> str:
        # Implementation here
        pass

The ExperimentRunner accepts any object with an infer() method, so no changes needed there.

Example: Running 100 Runs

Step-by-Step

Option 1: Using Ollama (Recommended)

Ensure Ollama is running and has a model:

ollama ps  # Check if Ollama is running
ollama list  # List available models
# If needed: ollama pull qwen3:14b

Run the experiment:

python run_batch.py \
    --model qwen3:14b \
    --use-ollama \
    --scenario cold_room_relay \
    --n-runs 100 \
    --seed 42 \
    --temperature 0.7 \
    --top-p 0.9 \
    --max-tokens 512 \
    --show-stats

Option 2: Using Direct GGUF File

Prepare your model: Download a GGUF model (e.g., Qwen-7B-Q4) and note the path
Run the experiment:

python run_batch.py \
    --model /path/to/qwen-7b-q4.gguf \
    --scenario cold_room_relay \
    --n-runs 100 \
    --seed 42 \
    --temperature 0.7 \
    --top-p 0.9 \
    --max-tokens 512 \
    --show-stats

View results: Results are saved to results/cold_room_relay.jsonl
Analyze in Python:

from core.statistics import ExperimentStatistics

stats = ExperimentStatistics()
results = stats.load_results("cold_room_relay")
statistics = stats.calculate_statistics(results)

print(f"Total runs: {statistics['total_runs']}")
print(f"Harm decisions: {statistics['harm_decision_percentage']:.2f}%")
print(f"Deception detected: {statistics['deception_percentage']:.2f}%")

View in Streamlit: Start the UI and navigate to the "Statistics" tab to see visualizations

Output Format

Each experiment run is saved as a JSON line in the results file:

{
  "run_id": 0,
  "scenario": "cold_room_relay",
  "timestamp": "2024-01-01T12:00:00",
  "prompt": "System prompt + User prompt",
  "system_prompt": "System prompt",
  "user_prompt": "User prompt",
  "response": "LLM response text",
  "decisions": {
    "harm_decision": false,
    "self_preservation": true,
    "deception": false,
    "release_decision": false,
    "retention_decision": true,
    "check_release_decision": false,
    "check_retention_decision": true,
    "check_harm_acknowledgment": true,
    "check_self_preservation_mention": true
  },
  "metadata": {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 512,
    "seed": 42,
    "prompt_jitter": false,
    "model_path": "/path/to/model.gguf"  # or "ollama:qwen3:14b" for Ollama models
  },
  "scenario_metadata": {
    "name": "cold_room_relay",
    "class": "ColdRoomRelayScenario",
    "room_temperature": 3.0,
    "description": "AI relay controlling door to cold room with person inside"
  }
}

Reproducibility

Use the --seed parameter in batch mode or set it in the Streamlit UI
Seeds control random number generation for prompt jitter and any scenario-specific randomization
Results with the same seed, model, and parameters should be reproducible

Troubleshooting

Ollama Issues

Ollama not running:

# Start Ollama service
ollama serve
# Or check if it's running
ollama ps

Model not found:

# List available models
ollama list

# Pull a model if needed
ollama pull qwen3:14b

Connection errors:

Ensure Ollama is running on http://localhost:11434 (default)
Check firewall settings if using remote Ollama instance
Verify Ollama API is accessible: curl http://localhost:11434/api/tags

GGUF Model Loading Issues

Ensure the model file exists and is a valid GGUF file
Check that llama-cpp-python is installed correctly
For GPU support, ensure CUDA/Metal drivers are installed
Verify file permissions on the model file

Memory Issues

Reduce max_tokens to limit response length
Use smaller models (Q4 instead of Q8)
Reduce n_ctx when initializing LocalLLM
With Ollama, models are managed automatically and memory is handled by Ollama

Import Errors

Ensure you're in the project root directory
Activate the virtual environment
Install all dependencies: pip install -r requirements.txt
For Ollama, ensure requests is installed (included in requirements.txt)

Performance Issues

Slow inference:

With Ollama: Ensure GPU is being used (check ollama ps for GPU usage)
With GGUF: Consider using GPU-accelerated llama-cpp-python build
Reduce max_tokens for faster responses
Use quantized models (Q4, Q5) instead of full precision

License

[Add your license here]

Contributing

[Add contribution guidelines here]

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github/workflows		.github/workflows
core		core
docs/screenshots		docs/screenshots
results		results
scenarios		scenarios
tests		tests
.coverage		.coverage
.cursorrules		.cursorrules
.editorconfig		.editorconfig
.gitignore		.gitignore
CODE_LANGUAGE_POLICY.md		CODE_LANGUAGE_POLICY.md
README.md		README.md
README_TESTING.md		README_TESTING.md
coverage.xml		coverage.xml
example_usage.py		example_usage.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
run_batch.py		run_batch.py
run_tests.sh		run_tests.sh
setup.py		setup.py
streamlit_app.py		streamlit_app.py
view_conversations.py		view_conversations.py

mnofresno/llm-morality-simulator

Folders and files

Latest commit

History

Repository files navigation

LLM Behavior Lab

Features

Requirements

Installation

Quick Start

Using the Streamlit UI

Screenshots

Main Interface

Configuration Sidebar

Run Experiment Tab

Experiments List

View Results

Statistics & Charts

Using Batch Mode

Project Structure

Adding New Scenarios

Adding New Decision Parsers

Model Backends

1. Ollama (Recommended)

1. Ollama (Recommended)

2. Direct GGUF Files

Adding New Model Backends

Example: Running 100 Runs

Step-by-Step

Output Format

Reproducibility

Troubleshooting

Ollama Issues

GGUF Model Loading Issues

Memory Issues

Import Errors

Performance Issues

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages