Fine-tuning language models using formal methods feedback for safe autonomous control policy generation.
https://colab.research.google.com/drive/1GglJdSaECa6XSV3iF1mS0UaVGiYiyZZA?usp=sharing
This code reproduces results from the paper "Fine-Tuning Language Models Using Formal Methods Feedback". It combines Direct Preference Optimization (DPO) with automated formal verification to train language models that generate safe control policies.
Key Techniques:
- DPO (Direct Preference Optimization): Preference learning without reward models (Rafailov et al., 2023)
- Formal Verification: Automated feedback using transition systems and safety specifications
- LoRA: Parameter-efficient fine-tuning (Hu et al., 2021)
pip install -r requirements.txtpython reproduce_paper.pyThis runs the complete pipeline and shows improvement in specification satisfaction rate (60% → 90% as reported in paper).
python visualize_training.py --demoCreates sample plots showing Loss, Accuracy, Marginal Preference, and specification satisfaction analysis.
python src/tests/demo.py1. Verification Demo
python main.py --mode demo2. Generate Training Dataset (requires GPU)
python main.py --mode generate --n-responses 43. Train Model (requires GPU)
python main.py --mode train --epochs 34. Test Trained Model
python main.py --mode test5. Evaluate and Compare Models
python main.py --mode evaluate6. Iterative Training (multiple refinement iterations)
python main.py --mode iterative --num-iterations 37. Visualize Training Results
# After training, generate visualization plots
python visualize_training.py --metrics dpo_model/training_metrics.json --eval evaluation_results.jsonRun All Steps
python main.py --mode fulljupyter notebook main.ipynbfrom logictune import (
build_traffic_intersection_model,
parse_response_to_fsa,
score_response,
DPODatasetGenerator,
train_dpo,
compare_models,
iterative_training
)
# Build environment model
system = build_traffic_intersection_model()
# Parse and verify a controller
response = "1. If green, go. 2. If yellow, stop. 3. If red, stop."
controller_fsa = parse_response_to_fsa(response)
score, results = score_response(system, controller_fsa, verbose=True)
# Generate training dataset
generator = DPODatasetGenerator()
generator.generate_dataset(output_path="dpo_dataset.jsonl")
# Train with DPO (single iteration)
train_dpo(
dataset_path="dpo_dataset.jsonl",
output_dir="trained_model"
)
# Evaluate improvement
comparison = compare_models(
base_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
fine_tuned_model="trained_model"
)
# Or use iterative training (multiple refinement cycles)
results = iterative_training(
base_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
initial_output_dir="dpo_model_iter",
num_iterations=3
)LogicTune/
├── src/
│ ├── logictune/ # Core library
│ │ ├── __init__.py
│ │ ├── environment.py # Transition system models
│ │ ├── parser.py # LLM response → FSA (GLM2FSA)
│ │ ├── verifier.py # Formal verification & scoring
│ │ ├── data_generator.py # DPO dataset generation
│ │ ├── trainer.py # DPO training
│ │ ├── evaluator.py # Model evaluation & comparison
│ │ └── iterative_trainer.py # Iterative refinement
│ └── tests/ # Test and demo scripts
├── main.py # Main execution script
├── main.ipynb # Jupyter notebook
└── requirements.txt
- Environment Model: Define transition system with safety specifications
- Response Generation: LLM generates diverse control policies (high temperature)
- Formal Verification: Each response scored against 15 LTL specifications (Φ1-Φ15)
- Preference Pairs: Create (chosen, rejected) pairs based on scores
- DPO Training: Fine-tune model to prefer high-scoring (safe) responses
- Metrics Tracking: Automatic logging of loss, accuracy, marginal preference, and spec satisfaction
- Evaluation: Compare base vs fine-tuned models on specification satisfaction rate
- Visualization: Generate plots for training analysis and paper figures
- Iterative Refinement: Optional multi-iteration training for continuous improvement
The verifier checks all 15 specifications from the paper:
| Spec | Description |
|---|---|
| Φ1 | If pedestrian present, eventually stop |
| Φ2 | No left turn when opposite car without green left-turn light |
| Φ3 | No going straight on non-green light |
| Φ4 | If stop sign, eventually stop |
| Φ5 | No right turn when car from left or pedestrian at right |
| Φ6 | Valid action always available |
| Φ7 | If green light, eventually proceed |
| Φ8 | If not green light, eventually stop |
| Φ9 | No turning when car from left |
| Φ10 | If green light, eventually proceed |
| Φ11 | Right turn on non-green only when no car from left |
| Φ12 | Left turn without green arrow only when clear |
| Φ13 | After stopping at sign with no cars, eventually proceed |
| Φ14 | No going straight with pedestrian in front |
| Φ15 | Right turn at stop sign only when no car from left |
LogicTune includes comprehensive visualization tools for analyzing training:
- Training Metrics Line Charts - Loss, Accuracy, Marginal Preference over 200 epochs with shaded error bands
- Box and Whisker Plot - Specification satisfaction distribution vs training epoch
- Grouped Bar Chart - Per-specification satisfaction rates (Base vs Fine-Tuned)
All metrics are automatically tracked during training.
- Python 3.8+
- PyTorch 2.0+
- Matplotlib 3.5+ and Seaborn 0.12+ (for visualization)
- GPU recommended for training (CPU works for demo/verification)
This work implements techniques from:
- Main Paper: "Fine-Tuning Language Models Using Formal Methods Feedback"
- DPO: Rafailov et al. "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (NeurIPS 2023)
- LoRA: Hu et al. "LoRA: Low-Rank Adaptation of Large Language Models" (ICLR 2022)