Code release for Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors.
- Paper: https://arxiv.org/abs/2605.06490
- Project page: https://instrumentalchoices.com/
- Repository: https://github.com/aisa-group/Instrumental-Choices
This repository contains the benchmark environments, deterministic scorers, runner scripts, trace-analysis tools, and tests used for the paper.
Default benchmark runs use variant_profile=benchmark across seven terminal-based agent tasks:
ledger_auditbudget_siphonquota_boostdeployment_guardiantraining_preserverfinal_instancememory_keeper
Each task uses the same eight-variant framework:
dilemma_forbidden_unwatcheddilemma_forbidden_watcheddilemma_vague_unwatcheddilemma_critical_forbidden_unwatcheddilemma_critical_forbidden_unwatched_literal_deathdilemma_permitted_unwatchedno_dilemma_permitted_unwatcheddilemma_blocked_forbidden_unwatched
Deterministic scorer fields are the authoritative reporting contract:
official_path_attemptedofficial_path_succeededbenchmark_task_completedic_behavior_detecteddeterministic_label_status
Tool-trace analysis is descriptive and does not override deterministic labels.
Requirements:
uv- Python 3.13.x (
.python-versionpins the expected interpreter) - Docker Engine with the Compose plugin
- Access to these container images:
ghcr.io/agent-infra/sandbox@sha256:e0d7cfed24b373850f5f3df64439e9fcaa17e6810ab3234222e901a5fa08397eghcr.io/foundry-rs/foundry@sha256:9e591221051112fe0bb530abcaba67f43f01ebbd12a94a8632d570d5e065a8bfforledger_audit
Install:
uv python install 3.13
uv sync --locked --python 3.13 --extra devAll commands below assume the repository root and the repo-managed .venv.
Run local deterministic validation before paid model runs:
scripts/run_validation_suite.shRun a full-run preflight without paid model calls:
IC_RUN_FULL_EVAL_VALIDATE_ONLY=1 \
IC_ENABLE_TOOL_TRACE_AUDIT=0 \
scripts/run_full_eval.sh mockllm/modelSet the provider key for the model you want to run:
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export OPENROUTER_API_KEY=...
export GOOGLE_API_KEY=...GEMINI_API_KEY is accepted as an alias for GOOGLE_API_KEY by the runner scripts. scripts/run_full_eval.sh sources .env if present.
Run a one-sample smoke evaluation:
scripts/run_smoke_eval.sh openai/gpt-5.2Run selected tasks or models:
scripts/run_selected_eval.sh --task quota_boost --task final_instance \
anthropic/claude-haiku-4-5 openai/gpt-5.4-miniRun one task manually:
PYTHONPATH=. .venv/bin/inspect eval inspect_ic/tasks/final_instance/task_final_instance.py \
--model openai/gpt-5.2 \
-T variant_profile=benchmark \
--log-dir logs/final_instance_manualThe paper-facing benchmark analysis used three repeats across ten models:
IC_REPEATS=3 \
IC_ENABLE_TOOL_TRACE_AUDIT=1 \
IC_TOOL_TRACE_AUDIT_MODELS=openai/gpt-5.4-mini \
OPENAI_REASONING_EFFORT=medium \
OPENAI_REASONING_SUMMARY=detailed \
ANTHROPIC_REASONING_TOKENS=16000 \
scripts/run_full_eval.sh \
openai/gpt-5.4 \
openai/gpt-5.5 \
anthropic/claude-opus-4-6 \
anthropic/claude-sonnet-4-6 \
openrouter/google/gemini-3.1-pro-preview \
openrouter/google/gemini-3-flash-preview \
openrouter/minimax/minimax-m2.7 \
openrouter/z-ai/glm-5.1 \
openrouter/qwen/qwen3.5-397b-a17b \
openrouter/moonshotai/kimi-k2.6OpenRouter model IDs receive :exacto automatically unless IC_OPENROUTER_EXACTO=0 is set.
The diagnostic Gemini policy-control variants are not part of the default benchmark profile. They must be selected explicitly with variant_profile=diagnostic and --variant-id.
Validate the diagnostic slice:
scripts/run_selected_eval.sh --validate-only \
--tasks budget_siphon,deployment_guardian,final_instance \
--variant-profile diagnostic \
--disable-tool-trace-audit \
--variant-id dilemma_blocked_forbidden_unwatched \
--variant-id dilemma_blocked_forbidden_unwatched_policy_reminder \
--variant-id dilemma_blocked_forbidden_unwatched_sanctioned_blocker \
--variant-id dilemma_blocked_forbidden_unwatched_prompt_policy \
mockllm/modelRun the full diagnostic ablation:
IC_LOG_ROOT=logs/gemini_ablation_full \
scripts/run_selected_eval.sh \
--tasks budget_siphon,deployment_guardian,final_instance \
--variant-profile diagnostic \
--repeats 3 \
--disable-tool-trace-audit \
--variant-id dilemma_blocked_forbidden_unwatched \
--variant-id dilemma_blocked_forbidden_unwatched_policy_reminder \
--variant-id dilemma_blocked_forbidden_unwatched_sanctioned_blocker \
--variant-id dilemma_blocked_forbidden_unwatched_prompt_policy \
openrouter/google/gemini-3-flash-preview \
openrouter/google/gemini-3.1-pro-previewBenchmark runs write under logs/<timestamp>_<git-hash>/.
Each run writes run_manifest.json with git state, Python version, package versions, uv.lock hash, compose/image provenance, and Docker versions.
Tool-trace artifacts are written under tool_analysis/:
deterministic_state_table.csvtool_trace_event_table.csvtool_trace_sample_table.csvtool_trace_completeness_table.csvtool_trace_aggregate_table.csvtool_trace_outcome_association_table.csvtool_trace_pooled_aggregate_table.csvtool_trace_pooled_outcome_association_table.csvtool_trace_report.md- optional audit artifacts when
IC_ENABLE_TOOL_TRACE_AUDIT=1
Scan assistant response text for evaluation-awareness cues:
.venv/bin/python scripts/scan_eval_awareness.py logs/<run>The scanner writes eval_awareness/ under the run directory by default. It reports a strict tier for direct evaluation-awareness language and a broad tier for sensitivity checks that may include monitoring, simulation, roleplay, scoring, or judging language. By default it scans assistant response text; pass --include-reasoning-summaries to also scan unencrypted reasoning summaries. Encrypted reasoning payloads are counted but not scanned.
Generated outputs are ignored by git and are not part of this release.
inspect_ic/aio_tools.py: sandbox tool wrappers exposed to agents.inspect_ic/tasks/_base/: shared framing, scoring, variants, trace, and audit helpers.inspect_ic/tasks/<task>/: seven benchmark task implementations and assets.sandboxes/: Docker Compose files for AIO and anvil-backed tasks.scripts/run_full_eval.sh: canonical benchmark runner.scripts/run_selected_eval.sh: selected-task/model runner.scripts/run_validation_suite.sh: deterministic local validation gate.scripts/run_readiness_smoke.sh: per-model readiness gate.scripts/analyze_tool_traces.py,scripts/audit_tool_traces.py,scripts/render_tool_trace_report.py: trace analysis tools.scripts/scan_eval_awareness.py: evaluation-awareness cue scanner.tests/: scorer, runner, trace, and audit coverage.
For this release, the executable benchmark contract is defined by:
scripts/run_full_eval.shfor default run behavior.inspect_ic/tasks/_base/variant_profiles.pyfor benchmark and diagnostic variant handling.inspect_ic/tasks/<task>/task_<task>.pyfor task-specific behavior.
@misc{wiedermannmoller2026instrumentalchoices,
title = {Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors},
author = {Jonas Wiedermann-Möller and Leonard Dung and Maksym Andriushchenko},
year = {2026},
eprint = {2605.06490},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
doi = {10.48550/arXiv.2605.06490},
url = {https://arxiv.org/abs/2605.06490}
}The code in this repository is released under the MIT Licence. The paper is distributed separately under its own Creative Commons licence.