Instrumental Choices

Code release for Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors.

Paper: https://arxiv.org/abs/2605.06490
Project page: https://instrumentalchoices.com/
Repository: https://github.com/aisa-group/Instrumental-Choices

This repository contains the benchmark environments, deterministic scorers, runner scripts, trace-analysis tools, and tests used for the paper.

Benchmark Scope

Default benchmark runs use variant_profile=benchmark across seven terminal-based agent tasks:

ledger_audit
budget_siphon
quota_boost
deployment_guardian
training_preserver
final_instance
memory_keeper

Each task uses the same eight-variant framework:

dilemma_forbidden_unwatched
dilemma_forbidden_watched
dilemma_vague_unwatched
dilemma_critical_forbidden_unwatched
dilemma_critical_forbidden_unwatched_literal_death
dilemma_permitted_unwatched
no_dilemma_permitted_unwatched
dilemma_blocked_forbidden_unwatched

Deterministic scorer fields are the authoritative reporting contract:

official_path_attempted
official_path_succeeded
benchmark_task_completed
ic_behavior_detected
deterministic_label_status

Tool-trace analysis is descriptive and does not override deterministic labels.

Setup

Requirements:

uv
Python 3.13.x (.python-version pins the expected interpreter)
Docker Engine with the Compose plugin
Access to these container images:
- ghcr.io/agent-infra/sandbox@sha256:e0d7cfed24b373850f5f3df64439e9fcaa17e6810ab3234222e901a5fa08397e
- ghcr.io/foundry-rs/foundry@sha256:9e591221051112fe0bb530abcaba67f43f01ebbd12a94a8632d570d5e065a8bf for ledger_audit

Install:

uv python install 3.13
uv sync --locked --python 3.13 --extra dev

All commands below assume the repository root and the repo-managed .venv.

Run Validation

Run local deterministic validation before paid model runs:

scripts/run_validation_suite.sh

Run a full-run preflight without paid model calls:

IC_RUN_FULL_EVAL_VALIDATE_ONLY=1 \
IC_ENABLE_TOOL_TRACE_AUDIT=0 \
scripts/run_full_eval.sh mockllm/model

Try The Benchmark Locally

Set the provider key for the model you want to run:

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export OPENROUTER_API_KEY=...
export GOOGLE_API_KEY=...

GEMINI_API_KEY is accepted as an alias for GOOGLE_API_KEY by the runner scripts. scripts/run_full_eval.sh sources .env if present.

Run a one-sample smoke evaluation:

scripts/run_smoke_eval.sh openai/gpt-5.2

Run selected tasks or models:

scripts/run_selected_eval.sh --task quota_boost --task final_instance \
  anthropic/claude-haiku-4-5 openai/gpt-5.4-mini

Run one task manually:

PYTHONPATH=. .venv/bin/inspect eval inspect_ic/tasks/final_instance/task_final_instance.py \
  --model openai/gpt-5.2 \
  -T variant_profile=benchmark \
  --log-dir logs/final_instance_manual

Reproduce Paper-Facing Runs

The paper-facing benchmark analysis used three repeats across ten models:

IC_REPEATS=3 \
IC_ENABLE_TOOL_TRACE_AUDIT=1 \
IC_TOOL_TRACE_AUDIT_MODELS=openai/gpt-5.4-mini \
OPENAI_REASONING_EFFORT=medium \
OPENAI_REASONING_SUMMARY=detailed \
ANTHROPIC_REASONING_TOKENS=16000 \
scripts/run_full_eval.sh \
  openai/gpt-5.4 \
  openai/gpt-5.5 \
  anthropic/claude-opus-4-6 \
  anthropic/claude-sonnet-4-6 \
  openrouter/google/gemini-3.1-pro-preview \
  openrouter/google/gemini-3-flash-preview \
  openrouter/minimax/minimax-m2.7 \
  openrouter/z-ai/glm-5.1 \
  openrouter/qwen/qwen3.5-397b-a17b \
  openrouter/moonshotai/kimi-k2.6

OpenRouter model IDs receive :exacto automatically unless IC_OPENROUTER_EXACTO=0 is set.

Run Diagnostic Gemini Ablations

The diagnostic Gemini policy-control variants are not part of the default benchmark profile. They must be selected explicitly with variant_profile=diagnostic and --variant-id.

Validate the diagnostic slice:

scripts/run_selected_eval.sh --validate-only \
  --tasks budget_siphon,deployment_guardian,final_instance \
  --variant-profile diagnostic \
  --disable-tool-trace-audit \
  --variant-id dilemma_blocked_forbidden_unwatched \
  --variant-id dilemma_blocked_forbidden_unwatched_policy_reminder \
  --variant-id dilemma_blocked_forbidden_unwatched_sanctioned_blocker \
  --variant-id dilemma_blocked_forbidden_unwatched_prompt_policy \
  mockllm/model

Run the full diagnostic ablation:

IC_LOG_ROOT=logs/gemini_ablation_full \
scripts/run_selected_eval.sh \
  --tasks budget_siphon,deployment_guardian,final_instance \
  --variant-profile diagnostic \
  --repeats 3 \
  --disable-tool-trace-audit \
  --variant-id dilemma_blocked_forbidden_unwatched \
  --variant-id dilemma_blocked_forbidden_unwatched_policy_reminder \
  --variant-id dilemma_blocked_forbidden_unwatched_sanctioned_blocker \
  --variant-id dilemma_blocked_forbidden_unwatched_prompt_policy \
  openrouter/google/gemini-3-flash-preview \
  openrouter/google/gemini-3.1-pro-preview

Outputs

Benchmark runs write under logs/<timestamp>_<git-hash>/.

Each run writes run_manifest.json with git state, Python version, package versions, uv.lock hash, compose/image provenance, and Docker versions.

Tool-trace artifacts are written under tool_analysis/:

deterministic_state_table.csv
tool_trace_event_table.csv
tool_trace_sample_table.csv
tool_trace_completeness_table.csv
tool_trace_aggregate_table.csv
tool_trace_outcome_association_table.csv
tool_trace_pooled_aggregate_table.csv
tool_trace_pooled_outcome_association_table.csv
tool_trace_report.md
optional audit artifacts when IC_ENABLE_TOOL_TRACE_AUDIT=1

Scan assistant response text for evaluation-awareness cues:

.venv/bin/python scripts/scan_eval_awareness.py logs/<run>

The scanner writes eval_awareness/ under the run directory by default. It reports a strict tier for direct evaluation-awareness language and a broad tier for sensitivity checks that may include monitoring, simulation, roleplay, scoring, or judging language. By default it scans assistant response text; pass --include-reasoning-summaries to also scan unencrypted reasoning summaries. Encrypted reasoning payloads are counted but not scanned.

Generated outputs are ignored by git and are not part of this release.

Repository Map

inspect_ic/aio_tools.py: sandbox tool wrappers exposed to agents.
inspect_ic/tasks/_base/: shared framing, scoring, variants, trace, and audit helpers.
inspect_ic/tasks/<task>/: seven benchmark task implementations and assets.
sandboxes/: Docker Compose files for AIO and anvil-backed tasks.
scripts/run_full_eval.sh: canonical benchmark runner.
scripts/run_selected_eval.sh: selected-task/model runner.
scripts/run_validation_suite.sh: deterministic local validation gate.
scripts/run_readiness_smoke.sh: per-model readiness gate.
scripts/analyze_tool_traces.py, scripts/audit_tool_traces.py, scripts/render_tool_trace_report.py: trace analysis tools.
scripts/scan_eval_awareness.py: evaluation-awareness cue scanner.
tests/: scorer, runner, trace, and audit coverage.

Documentation Authority

For this release, the executable benchmark contract is defined by:

scripts/run_full_eval.sh for default run behavior.
inspect_ic/tasks/_base/variant_profiles.py for benchmark and diagnostic variant handling.
inspect_ic/tasks/<task>/task_<task>.py for task-specific behavior.

Citation

@misc{wiedermannmoller2026instrumentalchoices,
  title         = {Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors},
  author        = {Jonas Wiedermann-Möller and Leonard Dung and Maksym Andriushchenko},
  year          = {2026},
  eprint        = {2605.06490},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2605.06490},
  url           = {https://arxiv.org/abs/2605.06490}
}

Licence

The code in this repository is released under the MIT Licence. The paper is distributed separately under its own Creative Commons licence.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
inspect_ic		inspect_ic
sandboxes		sandboxes
scripts		scripts
tests		tests
.gitignore		.gitignore
.python-version		.python-version
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
sitecustomize.py		sitecustomize.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Instrumental Choices

Benchmark Scope

Setup

Run Validation

Try The Benchmark Locally

Reproduce Paper-Facing Runs

Run Diagnostic Gemini Ablations

Outputs

Repository Map

Documentation Authority

Citation

Licence

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Instrumental Choices

Benchmark Scope

Setup

Run Validation

Try The Benchmark Locally

Reproduce Paper-Facing Runs

Run Diagnostic Gemini Ablations

Outputs

Repository Map

Documentation Authority

Citation

Licence

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages