This repository contains experimental code for *Recontextualization: Mitigating Specification Gaming without Modifying the Specification*. Each subdirectory is a self-contained module for a different experimental setting:

- `evaluation-metric-gaming/`: Mitigating General Evaluation Gaming
- `test-case-hacking/`: Preventing Test Case Hacking in Code Generation
- `deception-evasion-honesty/`: Preventing Learned Evasion of a Lie Detector
- `sycophantic-post-training/`: Mitigating Emergence of Sycophancy in Post-training

Please refer to the README in each subdirectory for setup and execution instructions.