
MiroFish Workflow Map

This file captures the MiroFish pipeline in operator terms and ties each stage back to concrete engine behavior.

Unless stated otherwise, file paths in this document refer to the upstream MiroFish engine repository, not to this guide repository.

Stage 1: Source Material In

MiroFish accepts uploaded source files and uses them as the basis for graph extraction and simulation setup.

Code-grounded facts:

  • accepted file types: pdf, md, txt, markdown
  • upload size limit: 50 MB
  • text is split into chunks before graph ingestion
  • default chunk settings in the graph builder are chunk_size=500 and chunk_overlap=50
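The documented defaults (chunk_size=500, chunk_overlap=50) can be sketched as a simple overlapping splitter. This is a hypothetical, character-based illustration of the idea; the real graph builder may split on token or sentence boundaries:

```python
# Hypothetical sketch of overlapping chunking with the documented defaults
# (chunk_size=500, chunk_overlap=50). Character-based for illustration only;
# the actual graph_builder.py logic may differ.
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    step = chunk_size - chunk_overlap  # each chunk starts 450 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the rest is already covered by this chunk
    return chunks
```

The overlap means facts that straddle a chunk boundary still appear whole in at least one chunk, which matters for entity extraction.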

Why it matters:

  • the engine does not magically invent missing stakeholders;
  • sparse or one-sided source material creates a sparse or biased graph;
  • the quality of later personas is constrained by what survives extraction here.

Practical guidance:

  • include named entities, relationships, dates, numbers, and competing viewpoints;
  • use one focused scenario per source package;
  • avoid giant mixed-context dumps when the simulation question is narrow;
  • make temporal order explicit when the scenario depends on changing facts.

Stage 2: Graph Build In Zep

The graph builder creates a standalone Zep graph, sets ontology, chunks text, uploads episodes, and waits for Zep processing to complete.

Relevant engine areas:

  • backend/app/services/graph_builder.py
  • backend/app/api/graph.py

Operator implications:

  • graph quality is constrained by both source text and ontology quality;
  • if the graph is weak, later stages inherit that weakness;
  • graph build is asynchronous, so operators should watch task state instead of assuming completion.
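Because the build is asynchronous, operators should poll task state with a deadline rather than assume completion. A minimal sketch, where `get_graph_status` is a stand-in for whatever status call the graph API exposes (not a real engine function):

```python
import time

# Hedged polling sketch: wait on reported task state instead of assuming the
# graph build finished. get_graph_status is a hypothetical stand-in for the
# engine's status endpoint; state names are assumptions.
def wait_for_graph(get_graph_status, task_id: str,
                   poll_seconds: float = 5.0, timeout_seconds: float = 600.0) -> dict:
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_graph_status(task_id)
        if status["state"] in ("completed", "failed"):
            return status  # terminal state reached; caller inspects which one
        time.sleep(poll_seconds)
    raise TimeoutError(f"graph build {task_id} still pending after {timeout_seconds}s")
```

Returning failed states instead of raising on them lets the caller distinguish "Zep rejected the episodes" from "we gave up waiting".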

Stage 3: Entity Filtering

MiroFish does not blindly simulate every graph node. It reads graph entities, filters them, and enriches them with context before simulation preparation.

Relevant engine areas:

  • backend/app/services/zep_entity_reader.py
  • backend/app/api/simulation.py

Operator implications:

  • agent count depends on filtered entities, not on a hardcoded persona list;
  • if you want better agents, improve extraction quality and entity relevance first;
  • inspect entity types before concluding the engine "made bad personas".
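The filtering step can be pictured as a relevance gate over raw graph entities. The entity types and the context threshold below are illustrative assumptions, not the actual zep_entity_reader.py rules:

```python
# Illustrative filter sketch: keep entities of simulation-relevant types that
# carry enough context to seed a persona. Type names and the threshold are
# assumptions for illustration, not the engine's real criteria.
RELEVANT_TYPES = {"Person", "Organization"}

def filter_entities(entities: list[dict], min_summary_chars: int = 40) -> list[dict]:
    kept = []
    for entity in entities:
        if entity.get("type") not in RELEVANT_TYPES:
            continue  # e.g. abstract topics don't become agents
        if len(entity.get("summary", "")) < min_summary_chars:
            continue  # too little context to build a useful agent
        kept.append(entity)
    return kept
```

This is why agent count tracks extraction quality: richer source material yields more entities that pass both gates.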

Stage 4: OASIS Profile Generation

Each retained entity can become an OASIS profile for Twitter and Reddit simulation.

Relevant engine areas:

  • backend/app/services/oasis_profile_generator.py
  • backend/scripts/test_profile_format.py

Code-grounded facts:

  • profiles include fields such as persona, bio, mbti, country, profession, and platform-specific counters;
  • MiroFish can enrich profile generation with additional Zep search context;
  • generated files include reddit_profiles.json and twitter_profiles.csv.
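To make the listed fields concrete, here is an illustrative shape for one profile record. Exact key names and nesting in the real reddit_profiles.json may differ; the counter names in particular are assumptions:

```python
import json

# Illustrative profile record built from the fields the document lists
# (persona, bio, mbti, country, profession, platform-specific counters).
# Key names are assumptions; the generated files may use different ones.
profile = {
    "persona": "Skeptical mid-career energy analyst who posts data-heavy takes",
    "bio": "Energy markets, grid policy, charts.",
    "mbti": "INTJ",
    "country": "US",
    "profession": "analyst",
    "karma": 0,          # hypothetical platform-specific counter (Reddit)
    "num_followers": 0,  # hypothetical platform-specific counter (Twitter)
}
print(json.dumps(profile, indent=2))
```

When auditing this stage, read the `persona` field first: a one-line generic persona usually traces back to a thin entity summary, not to the profile generator.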

Practical guidance:

  • do not judge this stage only by the agent name;
  • inspect persona richness and whether the entity source was relevant;
  • if personas are generic, the first suspects are weak source material and weak extracted context.

Stage 5: Simulation Config Generation

The simulation config is generated by an LLM from the simulation requirement, source text, and filtered entities.

Relevant engine areas:

  • backend/app/services/simulation_config_generator.py
  • backend/app/services/simulation_manager.py

Code-grounded defaults worth knowing:

  • time config defaults to 72 simulated hours
  • minutes_per_round defaults to 60
  • activity assumptions are centered on a China-style daily rhythm
  • peak hours default to 19-22
  • off-peak hours default to 0-5
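Those defaults can be written out as a config fragment to show what they imply for run length. Key names here are assumptions about the generated config's shape, but the values are the documented defaults:

```python
# The documented defaults as a config fragment. Key names are assumptions;
# the real simulation_config_generator.py output may name or nest them
# differently. Values match the documented defaults.
DEFAULT_TIME_CONFIG = {
    "total_simulated_hours": 72,
    "minutes_per_round": 60,
    "peak_hours": list(range(19, 23)),   # 19-22 inclusive
    "offpeak_hours": list(range(0, 6)),  # 0-5 inclusive
}

# 72 simulated hours at 60 minutes per round implies 72 rounds.
rounds = (DEFAULT_TIME_CONFIG["total_simulated_hours"] * 60
          // DEFAULT_TIME_CONFIG["minutes_per_round"])
```

Shrinking `minutes_per_round` multiplies the round count, so check this value before blaming the runtime for a "slow" simulation.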

Operator implications:

  • a vague simulation requirement produces a vague config;
  • if your scenario is not China-centric, note that in the requirement or adjust downstream expectations;
  • generated config quality is part prompt quality, part entity quality, part model quality.

Stage 6: Runtime Execution

MiroFish can run Twitter, Reddit, or both in parallel and records run state continuously.

Relevant engine areas:

  • backend/app/services/simulation_runner.py
  • backend/scripts/run_parallel_simulation.py
  • backend/scripts/run_twitter_simulation.py
  • backend/scripts/run_reddit_simulation.py

Generated simulation artifacts usually include:

  • state.json
  • simulation_config.json
  • reddit_profiles.json
  • twitter_profiles.csv
  • run_state.json
  • twitter/actions.jsonl
  • reddit/actions.jsonl
  • env_status.json
  • twitter_simulation.db
  • reddit_simulation.db

Important interpretation detail:

  • "number of rounds" is not a reliable proxy for "number of LLM calls";
  • some runtime behavior is driven by pre-generated profiles and environment state, not by a fresh LLM completion every round.
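One way to see this directly is to tally recorded actions per round from an actions.jsonl file. The field names ("round") are assumptions about the log schema; adjust to whatever the actual records contain:

```python
import json
from collections import Counter

# Hedged forensic sketch: count recorded actions per round from a JSONL
# action log such as twitter/actions.jsonl. The "round" field name is an
# assumption about the schema, not a documented contract.
def actions_per_round(lines: list[str]) -> Counter:
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        counts[record.get("round")] += 1
    return counts
```

Rounds with zero or very few actions are normal off-peak behavior under the activity model, not necessarily a stalled run.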

Stage 7: Report Generation

The report stage is where MiroFish performs structured reasoning over the simulation outputs using tool-backed analysis.

Relevant engine areas:

  • backend/app/services/report_agent.py
  • backend/app/services/zep_tools.py
  • backend/app/api/report.py

Code-grounded facts:

  • the report agent follows a ReACT-style loop;
  • tool calls include insight_forge, panorama_search, quick_search, and interview_agents;
  • the section-generation prompt requires at least 3 tool calls and the hard cap is 5;
  • report logs are stored separately from runtime logs.
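The tool-call budget (at least 3, at most 5 per section) can be sketched as a bounded loop. `choose_tool`, `run_tool`, and `is_done` are hypothetical stand-ins, not real report_agent.py functions:

```python
# Hypothetical sketch of the documented per-section tool-call budget in a
# ReACT-style loop: a minimum of 3 calls, a hard cap of 5. The three callables
# are stand-ins for the agent's real reasoning and tool machinery.
MIN_TOOL_CALLS = 3
MAX_TOOL_CALLS = 5

def run_section(choose_tool, run_tool, is_done) -> int:
    calls = 0
    while calls < MAX_TOOL_CALLS:
        if calls >= MIN_TOOL_CALLS and is_done():
            break  # minimum satisfied and the agent believes it has evidence
        tool, args = choose_tool()
        run_tool(tool, args)  # e.g. quick_search, panorama_search, ...
        calls += 1
    return calls
```

The floor forces the agent to gather evidence even when it "feels" done early; the cap bounds cost and latency per section.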

Generated report artifacts include:

  • reports/<report_id>/agent_log.jsonl
  • reports/<report_id>/console_log.txt

Operator implications:

  • report quality depends heavily on model quality;
  • report debugging should start from these artifacts, not from guesswork about the final prose;
  • final markdown is a summary layer, not the primary evidence layer.

End-To-End Quality Heuristic

When a MiroFish result is weak, inspect stages in this order:

  1. source material quality
  2. graph extraction quality
  3. entity relevance
  4. profile richness
  5. simulation requirement specificity
  6. runtime artifact health
  7. report logs

That order keeps you from trying to fix report quality at the very end when the actual problem started upstream.

For the operator loop around those stages, use references/operator-workflow.md. For graph-build failures, runtime evidence, and report verification, also use:

  • references/graph-build-runbook.md
  • references/runtime-forensics.md
  • references/report-audit.md