This file captures the MiroFish pipeline in operator terms and ties each stage back to concrete engine behavior.
Unless stated otherwise, file paths in this document refer to the upstream MiroFish engine repository, not to this guide repository.
MiroFish accepts uploaded source files and uses them as the basis for graph extraction and simulation setup.
Code-grounded facts:
- accepted file types: `pdf`, `md`, `txt`, `markdown`
- upload size limit: 50 MB
- text is split into chunks before graph ingestion
- default chunk settings in the graph builder are `chunk_size=500` and `chunk_overlap=50`
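A minimal sketch of what those defaults imply for chunk boundaries. This is plain character-based splitting for illustration; whether the engine splits on characters, tokens, or sentences is its own implementation detail:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, mirroring the
    chunk_size=500 / chunk_overlap=50 defaults above (overlap < size assumed)."""
    step = chunk_size - chunk_overlap  # each chunk starts 450 chars after the previous one
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

The overlap means a fact sitting on a chunk boundary still appears whole in at least one chunk, which is why very dense, boundary-sensitive material benefits from explicit structure in the source text.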
Why it matters:
- the engine does not magically invent missing stakeholders;
- sparse or one-sided source material creates a sparse or biased graph;
- the quality of later personas is constrained by what survives extraction here.
Practical guidance:
- include named entities, relationships, dates, numbers, and competing viewpoints;
- use one focused scenario per source package;
- avoid giant mixed-context dumps when the simulation question is narrow;
- make temporal order explicit when the scenario depends on changing facts.
The graph builder creates a standalone Zep graph, sets ontology, chunks text, uploads episodes, and waits for Zep processing to complete.
Relevant engine areas:
- `backend/app/services/graph_builder.py`
- `backend/app/api/graph.py`
Operator implications:
- graph quality is constrained by both source text and ontology quality;
- if the graph is weak, later stages inherit that weakness;
- graph build is asynchronous, so operators should watch task state instead of assuming completion.
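Because the build is asynchronous, a polling loop against the build-status endpoint is the right operator habit. A sketch, assuming a hypothetical `GET /api/graph/status/<task_id>` route that returns JSON with a `status` field; check `backend/app/api/graph.py` for the real route and payload:

```python
import time

import requests


def wait_for_graph_build(base_url: str, task_id: str, timeout_s: int = 1800) -> dict:
    """Poll the graph-build task until it finishes. The route and the
    'status' field are assumptions; read backend/app/api/graph.py for
    the actual contract."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = requests.get(f"{base_url}/api/graph/status/{task_id}", timeout=30).json()
        if state.get("status") in ("completed", "failed"):
            return state
        time.sleep(10)  # Zep-side processing can take a while; poll gently
    raise TimeoutError(f"graph build {task_id} still running after {timeout_s}s")
```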
MiroFish does not simulate every possible node blindly. It reads graph entities, filters them, and enriches them with context before simulation preparation.
Relevant engine areas:
- `backend/app/services/zep_entity_reader.py`
- `backend/app/api/simulation.py`
Operator implications:
- agent count depends on filtered entities, not on a hardcoded persona list;
- if you want better agents, improve extraction quality and entity relevance first;
- inspect entity types before concluding the engine "made bad personas".
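As a mental model for the filtering step, a sketch of type-based entity filtering. The real criteria live in `backend/app/services/zep_entity_reader.py`; the field names here are assumptions:

```python
def filter_entities(entities: list[dict], allowed_types: set[str]) -> list[dict]:
    """Keep only named entities of simulation-relevant types.
    Illustrative only: the engine's actual filter conditions may differ."""
    return [
        e for e in entities
        if e.get("type") in allowed_types and e.get("name")
    ]

# e.g. filter_entities(raw_entities, {"Person", "Organization"})
```

The point of the model: if the graph extracted mostly `Document` or `Topic` nodes and few person-like entities, the agent roster will be thin no matter what happens downstream.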
Each retained entity can become an OASIS profile for Twitter and Reddit simulation.
Relevant engine areas:
- `backend/app/services/oasis_profile_generator.py`
- `backend/scripts/test_profile_format.py`
Code-grounded facts:
- profiles include fields such as `persona`, `bio`, `mbti`, `country`, `profession`, and platform-specific counters;
- MiroFish can enrich profile generation with additional Zep search context;
- generated files include `reddit_profiles.json` and `twitter_profiles.csv`.
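To make the field list concrete, a sketch of one profile record. The field names come from the facts above; the counter names, values, and exact schema are assumptions, so compare against `backend/scripts/test_profile_format.py` before relying on them:

```python
import json

# Illustrative record; exact keys and value formats are the engine's to define.
example_profile = {
    "persona": "Skeptical mid-career regulator who weighs evidence slowly",
    "bio": "Policy analyst focused on consumer-protection cases",
    "mbti": "ISTJ",
    "country": "Germany",
    "profession": "Regulatory analyst",
    # platform-specific counters (hypothetical names):
    "followers_count": 340,
    "karma": 1200,
}

with open("reddit_profiles.json", "w") as f:
    json.dump([example_profile], f, indent=2)
```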
Practical guidance:
- do not judge this stage only by the agent name;
- inspect persona richness and whether the entity source was relevant;
- if personas are generic, the first suspects are weak source material and weak extracted context.
The simulation config is generated by an LLM from the simulation requirement, source text, and filtered entities.
Relevant engine areas:
- `backend/app/services/simulation_config_generator.py`
- `backend/app/services/simulation_manager.py`
Code-grounded defaults worth knowing:
- time config defaults to 72 simulated hours;
- `minutes_per_round` defaults to 60;
- activity assumptions are centered on a China-style daily rhythm;
- peak hours default to 19-22;
- off-peak hours default to 0-5.
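Those defaults combine simply: 72 simulated hours at 60 minutes per round is 72 rounds. A sketch of the resulting time-config shape; only the values are code-grounded, and the key names other than `minutes_per_round` are assumptions about what `simulation_config_generator.py` emits:

```python
# Hypothetical layout; verify key names against a generated simulation_config.json.
time_config = {
    "total_hours": 72,
    "minutes_per_round": 60,
    "peak_hours": [19, 22],   # agents are most active in this window
    "offpeak_hours": [0, 5],  # agents are mostly quiet here
}

rounds = time_config["total_hours"] * 60 // time_config["minutes_per_round"]
assert rounds == 72
```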
Operator implications:
- a vague simulation requirement produces a vague config;
- if your scenario is not China-centric, note that in the requirement or adjust downstream expectations;
- generated config quality is part prompt quality, part entity quality, part model quality.
MiroFish can run Twitter, Reddit, or both in parallel and records run state continuously.
Relevant engine areas:
- `backend/app/services/simulation_runner.py`
- `backend/scripts/run_parallel_simulation.py`
- `backend/scripts/run_twitter_simulation.py`
- `backend/scripts/run_reddit_simulation.py`
Generated simulation artifacts usually include:
- `state.json`
- `simulation_config.json`
- `reddit_profiles.json`
- `twitter_profiles.csv`
- `run_state.json`
- `twitter/actions.jsonl`
- `reddit/actions.jsonl`
- `env_status.json`
- `twitter_simulation.db`
- `reddit_simulation.db`
Important interpretation detail:
- "number of rounds" is not a reliable proxy for "number of LLM calls";
- some runtime behavior is driven by pre-generated profiles and environment state, not by a fresh LLM completion every round.
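One consequence for forensics: count what actually happened from the action logs rather than inferring it from the round count. A sketch, assuming each `actions.jsonl` line is a JSON object with an `action` field; verify the real keys against a line from your own artifacts first:

```python
import json
from collections import Counter


def tally_actions(path: str) -> Counter:
    """Count action types in an actions.jsonl file. The 'action' key is
    an assumption about the log schema; inspect one real line first."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("action", "unknown")] += 1
    return counts

# e.g. tally_actions("twitter/actions.jsonl")
```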
The report stage is where MiroFish performs structured reasoning over the simulation outputs using tool-backed analysis.
Relevant engine areas:
- `backend/app/services/report_agent.py`
- `backend/app/services/zep_tools.py`
- `backend/app/api/report.py`
Code-grounded facts:
- the report agent follows a ReAct-style loop;
- tool calls include `insight_forge`, `panorama_search`, `quick_search`, and `interview_agents`;
- the section-generation prompt requires at least 3 tool calls, with a hard cap of 5;
- report logs are stored separately from runtime logs.
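A minimal sketch of how a min-3 / max-5 tool budget can be enforced around a ReAct-style loop. The tool names are real per the list above; everything else here, including the three callables, is an illustrative stand-in, not the engine's code (see `backend/app/services/report_agent.py` for the real loop):

```python
from typing import Callable, Optional

MIN_CALLS, MAX_CALLS = 3, 5  # code-grounded bounds from the section prompt


def run_section(pick_tool: Callable[[int], Optional[str]],
                call_tool: Callable[[str], str],
                draft: Callable[[list[str]], str]) -> str:
    """Enforce the 3..5 tool-call budget. pick_tool returns a tool name
    (e.g. 'quick_search') or None once the agent thinks it has enough
    evidence; all three callables stand in for the real agent internals."""
    evidence: list[str] = []
    while len(evidence) < MAX_CALLS:
        tool = pick_tool(len(evidence))
        if tool is None and len(evidence) >= MIN_CALLS:
            break                        # minimum met and agent is satisfied
        tool = tool or "quick_search"    # below the minimum: force another call
        evidence.append(call_tool(tool))
    return draft(evidence)               # hard cap reached or agent stopped early
```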
Generated report artifacts include:
- `reports/<report_id>/agent_log.jsonl`
- `reports/<report_id>/console_log.txt`
Operator implications:
- report quality depends heavily on model quality;
- report debugging should start from these artifacts, not from guesswork about the final prose;
- final markdown is a summary layer, not the primary evidence layer.
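A starting point for that debugging: check the tool-call budget per section in `agent_log.jsonl` before reading the prose. The `tool` and `section` keys are assumptions about the log schema; adapt them after looking at one real line:

```python
import json
from collections import defaultdict


def sections_outside_budget(log_path: str, lo: int = 3, hi: int = 5) -> dict:
    """Group tool calls by section and flag sections whose count falls
    outside the 3..5 budget. Key names are assumptions; verify first."""
    per_section: dict[str, int] = defaultdict(int)
    with open(log_path) as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            if "tool" in entry:
                per_section[entry.get("section", "?")] += 1
    return {s: n for s, n in per_section.items() if not lo <= n <= hi}
```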
When a MiroFish result is weak, inspect stages in this order:
- source material quality
- graph extraction quality
- entity relevance
- profile richness
- simulation requirement specificity
- runtime artifact health
- report logs
That order prevents you from trying to fix report quality at the very end when the actual problem started at the beginning.
For the operator loop around those stages, use references/operator-workflow.md.
For graph-build failures, runtime evidence, and report verification, also use:
- `references/graph-build-runbook.md`
- `references/runtime-forensics.md`
- `references/report-audit.md`