Use this file when the task is operational: setup failures, weak outputs, missing files, empty simulations, or confusing report behavior.
Unless stated otherwise, file paths in this document refer to a local checkout of the upstream MiroFish engine repository.
For evidence labels and report-claim verification, also read:
- references/evidence-taxonomy.md
- references/evaluation-rubric.md
MiroFish expects its core configuration in the project root .env.
Required keys:
- `LLM_API_KEY`
- `ZEP_API_KEY`
Important defaults:
- `LLM_BASE_URL` defaults to `https://api.openai.com/v1`
- `LLM_MODEL_NAME` defaults to `gpt-4o-mini`
- `OASIS_DEFAULT_MAX_ROUNDS` defaults to `10`
Important behavior:
- config loading uses `load_dotenv(..., override=True)`
- that means values from the root `.env` can override values passed in from a parent process
If behavior looks inconsistent across shell, Flask, and subprocesses, inspect .env first.
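The override semantics are easy to misread, so here is a minimal sketch of what `load_dotenv(..., override=True)` effectively does to the process environment. This is an illustration of the behavior, not the python-dotenv implementation:

```python
import os

def apply_env_file(lines, override=True):
    """Apply KEY=VALUE lines the way load_dotenv applies a .env file.

    With override=True (MiroFish's setting), file values clobber
    anything already exported by a parent shell or process manager.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if override or key not in os.environ:
            os.environ[key] = value

# A parent process exported one model name...
os.environ["LLM_MODEL_NAME"] = "gpt-4o"
# ...but the root .env silently wins because override=True.
apply_env_file(["LLM_MODEL_NAME=gpt-4o-mini"])
print(os.environ["LLM_MODEL_NAME"])  # gpt-4o-mini
```

This is why the same command can behave differently under a shell, Flask, and a subprocess: whichever process reads the root `.env` last wins.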
Use the repository scripts instead of ad hoc commands:
- `npm run setup:all`
- `npm run dev`
- `npm run backend`
- `npm run frontend`
For backend-only runs, prefer the managed virtualenv flow (uv run python run.py) rather than system Python.
For a simulation:
- backend/uploads/simulations/<simulation_id>/state.json
- backend/uploads/simulations/<simulation_id>/simulation_config.json
- backend/uploads/simulations/<simulation_id>/reddit_profiles.json
- backend/uploads/simulations/<simulation_id>/twitter_profiles.csv
- backend/uploads/simulations/<simulation_id>/run_state.json
- backend/uploads/simulations/<simulation_id>/env_status.json
- backend/uploads/simulations/<simulation_id>/twitter/actions.jsonl
- backend/uploads/simulations/<simulation_id>/reddit/actions.jsonl
- backend/uploads/simulations/<simulation_id>/twitter_simulation.db
- backend/uploads/simulations/<simulation_id>/reddit_simulation.db
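The expected simulation layout can be checked mechanically before any debugging. The sketch below assumes the file list above is current; adjust `EXPECTED` if the upstream layout changes:

```python
import os

# Relative paths MiroFish is expected to produce per simulation.
EXPECTED = [
    "state.json",
    "simulation_config.json",
    "reddit_profiles.json",
    "twitter_profiles.csv",
    "run_state.json",
    "env_status.json",
    "twitter/actions.jsonl",
    "reddit/actions.jsonl",
    "twitter_simulation.db",
    "reddit_simulation.db",
]

def missing_artifacts(sim_dir):
    """Return expected simulation files that are absent or zero bytes."""
    problems = []
    for rel in EXPECTED:
        path = os.path.join(sim_dir, rel)
        if not os.path.exists(path):
            problems.append(rel + " (missing)")
        elif os.path.getsize(path) == 0:
            problems.append(rel + " (empty)")
    return problems
```

Run it against `backend/uploads/simulations/<simulation_id>` before reading any single file; a short "problems" list tells you which stage to fix first.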
For a report:
- backend/uploads/reports/<report_id>/agent_log.jsonl
- backend/uploads/reports/<report_id>/console_log.txt
- backend/uploads/reports/<report_id>/report.md
- backend/uploads/reports/<report_id>/report_outline.json
If those files are missing or obviously incomplete, do not start by editing prompts. Fix the broken stage first.
When a run looks wrong, check in this order:
- `state.json` and `simulation_config.json`
- filtered entities and generated profiles
- per-platform `actions.jsonl`
- `run_state.json` and `env_status.json`
- per-platform SQLite databases
- `agent_log.jsonl`
- the final `report.md`
That order prevents you from treating the report as the source of truth.
Likely causes:
- source material too vague or too short;
- missing named entities;
- too many unrelated topics in one document;
- ontology mismatch or weak extraction.
Check:
- graph build task status
- resulting entity count and entity types
- whether the source document actually contains usable facts and relationships
Useful fixes:
- add explicit dates, actors, and relationship statements;
- split mixed scenarios into separate source packages;
- normalize contradictory or superseded facts instead of leaving them implicit;
- retry with a route that is more reliable at structured JSON.
Known upstream issue-tracker patterns to keep in mind:
- free-tier Zep usage can hit `429` rate limits during ingestion;
- stricter graph tooling can reject badly normalized entity targets.
Treat those as reproducibility hints until you confirm them in your own run.
For a graph-stage-focused checklist, use references/graph-build-runbook.md.
Likely causes:
- weak extracted entities;
- thin source context;
- low-quality model for profile generation;
- irrelevant entity types surviving filtering.
Check:
- filtered entity list first
- then `reddit_profiles.json` and `twitter_profiles.csv`
- then simulation requirement wording
- then whether source text gave each actor enough context to differ from the others
Do not assume persona problems are caused by OASIS itself.
This often means the runtime did not perform much meaningful per-round reasoning.
Check:
- `run_state.json`
- `twitter/actions.jsonl` and `reddit/actions.jsonl`
- `twitter_simulation.db` and `reddit_simulation.db`
- whether only initial posts were created
- whether the environment stayed alive long enough to accumulate actions
Important interpretation:
- more rounds do not guarantee proportionally more LLM reasoning;
- runtime behavior can rely heavily on pre-generated profiles and environment logic.
Operator response:
- stop treating the run as valid evidence;
- inspect whether actions plateaued after setup;
- compare action diversity, not just action count;
- rerun with a smaller scenario and a stronger model or cleaner proxy route.
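Plateau and diversity checks can be scripted against the per-platform `actions.jsonl`. The record shape below (a `round` field and an `action` field per JSONL line) is an assumption; inspect one real line first and adjust the key names to match:

```python
import json
from collections import Counter, defaultdict

def summarize_actions(jsonl_lines, round_key="round", action_key="action"):
    """Count actions per round and tally distinct action types.

    Field names are assumptions about the actions.jsonl schema;
    verify them against a real line before trusting the output.
    """
    per_round = defaultdict(int)
    types = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        per_round[rec.get(round_key, "?")] += 1
        types[rec.get(action_key, "?")] += 1
    return dict(per_round), types

# Hypothetical sample lines standing in for twitter/actions.jsonl.
lines = [
    '{"round": 1, "action": "create_post"}',
    '{"round": 1, "action": "create_post"}',
    '{"round": 2, "action": "like"}',
]
per_round, types = summarize_actions(lines)
# If per_round collapses after the first round, or types is dominated
# by a single action, the run plateaued after setup: weak evidence.
print(per_round, types.most_common())
```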
The frontend polling logic in frontend/src/components/Step3Simulation.vue mainly treats `completed` and `stopped` as terminal runtime states and only logs fetch failures to the console.
That means a backend crash or broken status fetch can look like a stuck run from the UI.
Check:
- backend terminal output first;
- `run_state.json`, `state.json`, and `env_status.json`;
- whether `twitter/actions.jsonl` or `reddit/actions.jsonl` stopped growing.
If the UI and files disagree, trust the files.
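A quick way to settle "stuck run vs. stalled backend" is to sample the action log's size twice and see whether it grew. A minimal sketch:

```python
import os
import time

def stalled(path, wait_seconds=5.0):
    """Return True if the file did not grow over the sample window.

    A missing file also counts as stalled: the stage that should
    have created it never ran or crashed before writing.
    """
    if not os.path.exists(path):
        return True
    before = os.path.getsize(path)
    time.sleep(wait_seconds)
    return os.path.getsize(path) <= before
```

For a live run, point it at `twitter/actions.jsonl` with a window of a minute or more; JSONL writes can be bursty, so a short window will report false stalls.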
For runtime evidence review, use references/runtime-forensics.md.
Known issue pattern:
- assistant message content type compatibility can break custom proxies;
- some setups need assistant content mapped as `output_text` rather than `input_text`.
When a proxy is involved:
- verify request and response formats against the OpenAI-compatible layer you are using;
- separate transport-format failures from actual model-quality failures.
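When the symptom is the `input_text`/`output_text` mismatch, the repair is a mechanical rewrite of assistant message content before it reaches the proxy. This is a sketch of that rewrite under the assumption that messages use list-of-parts content with a `type` field; whether your proxy needs it is something to confirm against its own docs:

```python
def fix_assistant_content(messages):
    """Remap assistant content parts typed input_text to output_text.

    Only assistant turns with list-style content are touched; user
    turns and plain-string content pass through unchanged.
    """
    fixed = []
    for msg in messages:
        msg = dict(msg)
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), list):
            msg["content"] = [
                {**part, "type": "output_text"}
                if part.get("type") == "input_text" else part
                for part in msg["content"]
            ]
        fixed.append(msg)
    return fixed

msgs = [
    {"role": "assistant", "content": [{"type": "input_text", "text": "hi"}]},
    {"role": "user", "content": [{"type": "input_text", "text": "hello"}]},
]
fixed = fix_assistant_content(msgs)
```

If the proxy still rejects requests after this kind of normalization, you are likely looking at a transport-format failure, not a model-quality one.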
Reject a route for serious runs if it repeatedly fails any of these checks:
- returns malformed JSON during graph or config generation;
- produces near-empty runtime action logs;
- fails on tool-heavy report generation;
- hides useful error information behind generic HTTP success responses.
Report quality is often the last visible symptom, not the first root cause.
Check in order:
- Was the source material strong enough?
- Did the graph contain the right entities?
- Did the profiles contain useful personas?
- Did the simulation produce enough actions?
- What do `agent_log.jsonl` and `console_log.txt` show?
If the report logs show shallow tool usage or poor section reasoning, model quality is a plausible bottleneck.
Direct upstream code check:
- the report agent caps section tool calls at `5`;
- its section-generation prompt demands at least `3` tool calls before writing a final section.
Practical implication:
- if the log shows too few useful tool calls, repeated tool calls with no new information, or weak observations, the report stage was underpowered even if the final markdown looks polished.
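Given the 3-to-5 tool-call window per section, counting tool-call events per section in `agent_log.jsonl` gives a fast underpowered-report signal. The event shape below (a `type` field equal to `"tool_call"` and a `section` field) is a guess; read one real log line and adjust the keys:

```python
import json
from collections import defaultdict

def tool_calls_per_section(jsonl_lines, section_key="section", type_key="type"):
    """Count tool-call log events per report section.

    Key names are assumptions about the agent_log.jsonl schema;
    verify them against a real line before relying on the counts.
    """
    counts = defaultdict(int)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get(type_key) == "tool_call":
            counts[rec.get(section_key, "?")] += 1
    return dict(counts)

# Sections with 0-2 calls fell below the prompt's minimum of 3 and
# should be treated as underpowered regardless of how they read.
```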
Before trusting a report claim, verify:
- the claim appears in `report.md`;
- supporting actions exist in `twitter/actions.jsonl` or `reddit/actions.jsonl`;
- the platform databases contain compatible aggregate evidence;
- `agent_log.jsonl` shows which tools the report agent used;
- the claim can be classified with an evidence level from references/evidence-taxonomy.md.
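A crude first pass on the "supporting actions exist" step is a keyword scan over the action logs. The `content` field name is an assumption about the actions.jsonl schema; adjust it after inspecting a real record:

```python
import json

def evidence_lines(jsonl_lines, keywords):
    """Return action records whose text mentions every keyword.

    Keyword matching is deliberately crude: it finds candidate
    evidence for a report claim, it does not confirm the claim.
    """
    hits = []
    kws = [k.lower() for k in keywords]
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        text = str(rec.get("content", "")).lower()
        if all(k in text for k in kws):
            hits.append(rec)
    return hits
```

Zero hits for a claim's key terms across both platforms' logs means the claim should be labeled at the weakest evidence level until you find where the report agent got it.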
For a report-only audit workflow, use references/report-audit.md.
Check:
- `env_status.json` and `run_state.json`
- whether the backend process can still reach the simulation environment
The engine exposes environment-status and close-environment flows. Use state inspection before force-stopping processes.
When documenting a problem in this guide repository, use the evidence labels defined in references/evidence-taxonomy.md.