An autonomous MUD player with three main components:
- MH (Memory Head): Maintains situational awareness via parallel API calls
- DH (Decision Head): Chooses actions and updates goals
- Auto-Prompt-Engineering Loop (Critic → Engineer → Editor): Periodically reviews gameplay and improves the DH prompt
All testing was done on tbaMUD (based on CircleMUD/DikuMUD).
- Model: Configurable (default: gpt-4o-mini)
- Purpose: Maintains the agent's internal game state (situational awareness). Runs six API calls in parallel, one per file.
Per-file prompts (under prompts/): mh_current_location.txt, mh_session_summary.txt, mh_inventory.txt, mh_equipment.txt, mh_statbar.txt, mh_spells.txt.
Input (per call): New MUD output + previous content for that file.
Output: Updated memory files (current_location, session_summary, inventory, equipment, statbar, spells). If an individual call fails, its file keeps the previous value and a warning is logged; the other files are still updated.
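The parallel update with per-file fallback can be sketched as follows. This is a minimal illustration, not the project's actual code: `update_memory` and the `call_model` callback (standing in for one prompt-driven API call) are hypothetical names, and a thread pool is only one way to run the six calls concurrently.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

MEMORY_FILES = [
    "current_location", "session_summary", "inventory",
    "equipment", "statbar", "spells",
]

def update_memory(memory, mud_output, call_model):
    """Return a new memory dict after one MH update.

    call_model(name, previous, mud_output) -> str is a hypothetical
    stand-in for the per-file API call.
    """
    updated = dict(memory)
    with ThreadPoolExecutor(max_workers=len(MEMORY_FILES)) as pool:
        # Submit one call per memory file so they run in parallel.
        futures = {
            name: pool.submit(call_model, name, memory.get(name, ""), mud_output)
            for name in MEMORY_FILES
        }
        for name, future in futures.items():
            try:
                updated[name] = future.result()
            except Exception as exc:
                # Partial failure: keep the previous value and log a warning.
                logging.warning("MH update for %s failed: %s", name, exc)
    return updated
```

Because each future is resolved independently, one failed call does not block or discard the results of the other five.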
- Model: Configurable (default: gpt-4o-mini)
- Purpose: (1) Chooses the next MUD command. (2) Updates goals based on outcomes.
Two modes:
- Action mode: Given full MH state + goals + game buffer, output the single best command. Also outputs a reason/rationale for the choice.
- Goals mode: Given state, action taken, and MUD output, update goals.md (mark completed, add new goals).
Action criteria (priority order): Survival → Progress → Exploration. Respects inventory/spell constraints (won't try to eat food not in inventory or cast unknown spells).
Output format (action mode):
Reason: <rationale>
Command: <command>
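A caller has to split this two-line format back into its parts. The parser below is a hedged sketch; `parse_dh_output` is a hypothetical helper, and the case-insensitive matching and the error on a missing command are assumptions rather than documented behavior.

```python
def parse_dh_output(text):
    """Extract (reason, command) from action-mode output.

    Takes the first Reason: and Command: lines it finds; raises
    ValueError if there is no non-empty command.
    """
    reason = None
    command = None
    for line in text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if reason is None and lower.startswith("reason:"):
            reason = stripped[len("reason:"):].strip()
        elif command is None and lower.startswith("command:"):
            command = stripped[len("command:"):].strip()
    if not command:
        raise ValueError("no Command: line found in DH output")
    return reason or "", command
```

For example, `parse_dh_output("Reason: HP is low\nCommand: flee")` returns `("HP is low", "flee")`.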
Every N steps (configurable via orchestrator.critic_interval), three agents run in sequence to improve the DH prompt:
- Model: Smarter model (e.g. gpt-4o)
- Input: All gameplay log entries since last critic run
- Output: Structured diagnosis — what's going well (2-4 bullets), what's not going well (2-4 bullets)
- Logs to: data/logs/critic.jsonl
- Model: Smarter model (e.g. gpt-4o)
- Input: Critic's diagnosis + current DH prompt (prompts/dh.txt)
- Output: Strict, bounded edit instructions (max 2 one-sentence instructions) or the literal reply "No changes needed."
- Logs to: data/logs/engineer_changes.jsonl
- Model: Cheaper model (e.g. gpt-4o-mini)
- Input: Engineer's edit instructions + current DH prompt
- Output: Complete new DH prompt (written to prompts/dh.txt), or unchanged prompt if instructions are malformed/invalid
This loop allows the agent to learn from mistakes (e.g. repeating failed commands) and improve its decision-making prompt automatically.
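The Critic → Engineer → Editor sequence can be pictured as three chained calls. The sketch below assumes each agent is exposed as a plain function (`critic`, `engineer`, `editor` are hypothetical callables wrapping the API calls), and treating an empty Editor reply as "keep the old prompt" is an added assumption.

```python
NO_CHANGES = "No changes needed."

def run_prompt_loop(log_entries, dh_prompt, critic, engineer, editor):
    """Run one Critic -> Engineer -> Editor pass and return the DH prompt to use.

    critic(log_entries) -> diagnosis text
    engineer(diagnosis, dh_prompt) -> edit instructions or NO_CHANGES
    editor(instructions, dh_prompt) -> complete new prompt
    """
    diagnosis = critic(log_entries)
    instructions = engineer(diagnosis, dh_prompt).strip()
    if instructions == NO_CHANGES:
        # Engineer found nothing to fix; skip the Editor call entirely.
        return dh_prompt
    new_prompt = editor(instructions, dh_prompt)
    # Assumption: an empty reply is treated as a failed edit.
    return new_prompt if new_prompt.strip() else dh_prompt
```

Returning the prompt instead of writing prompts/dh.txt directly keeps the sketch side-effect free; the caller decides when to persist it.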
Guardrails currently in prompts:
- Engineer is constrained to tiny, deduplicated edits and must avoid metadata output like Reason:/Command:.
- Editor applies patch-style instructions only, preserves placeholders/section order, and falls back to unchanged prompt when instructions are invalid.
- Kickoff: Send commands like look, score, inventory, equipment, practice to populate initial state.
- Main Loop (each step):
  - MH Update: New MUD output → 6 parallel API calls → updated memory files
  - DH Action: Build context → run_dh_action → (reason, command)
  - Execute: Send command to MUD; wait for silence
  - DH Goals: run_dh_goals(state, action, output) → update goals.md
  - Debug log: Append to gameplay.jsonl
- Auto-Prompt-Engineering (every N steps):
  - Run Critic on gameplay log since last run
  - Run Engineer with diagnosis + current DH prompt
  - Run Editor to apply changes → write new prompts/dh.txt
- Repeat until max_steps or disconnect.
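The "every N steps" trigger in the flow above could be decided by a small check like the one below. `should_run_critic` is a hypothetical helper; counting steps from 1, firing on exact multiples of `critic_interval`, and treating non-positive values like `null` are all assumptions about the orchestrator.

```python
def should_run_critic(step, critic_interval):
    """Return True when the auto-prompt-engineering loop is due.

    critic_interval=None mirrors `critic_interval: null` in config.yaml
    and disables the loop.
    """
    if critic_interval is None or critic_interval <= 0:
        return False
    return step > 0 and step % critic_interval == 0
```

With `critic_interval: 20`, this fires after steps 20, 40, 60, and so on.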
Rollback workflow:
- Restore baseline prompts from prompts/baselines/ with python scripts/reset_prompts.py (or --all).
- Start orchestrator.
- If edits become repetitive/noisy again, reset prompts and continue.
Path: data/logs/gameplay.jsonl
Per-line JSON fields:
- step: int
- mh_context: object with current_location, session_summary, statbar, goals, inventory, equipment, commands, spells, mobs, game_buffer
- action: string (command sent)
- mud_output: string (MUD response)
- goals_after: string (goals.md after this step)
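Since the log is one JSON object per line, writing a step and collecting the entries for the next Critic run takes only a few lines. The helpers below are an illustrative sketch; `append_entry` and `entries_since` are hypothetical names, and filtering on the `step` field is an assumption about how "since last critic run" is tracked.

```python
import json

def append_entry(path, entry):
    """Append one gameplay step to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

def entries_since(path, last_step):
    """Return all log entries whose step is greater than last_step."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry["step"] > last_step:
                entries.append(entry)
    return entries
```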
Key settings in config.yaml:
orchestrator:
  critic_interval: 20 # Run critic/engineer/editor every N steps (null = disabled)
  game_buffer_max_lines: 100
  game_buffer_max_chars_for_critic: 16000
openai:
  model: gpt-4o-mini # Default model (MH, DH)
  model_critic: gpt-4o # Smarter model for critic
  model_engineer: gpt-4o # Smarter model for engineer
  model_editor: gpt-4o-mini # Cheaper model for editor
paths:
  gameplay_log: data/logs/gameplay.jsonl
  critic_log: data/logs/critic.jsonl
  engineer_changes_log: data/logs/engineer_changes.jsonl

- All agents are prompt-driven API calls (OpenAI).
- Memory files (except commands.md) are cleared at startup for a fresh run.
- Log files (gameplay, critic, engineer_changes) are also cleared at startup.
- Manual override: type a command in the orchestrator terminal to inject it instead of DH's choice.
- The DH prompt (prompts/dh.txt) may be modified by the auto-prompt-engineering loop during a run.
- Prompt baselines are stored in prompts/baselines/ and can be restored with scripts/reset_prompts.py.