fsm-llm is a Python framework for building stateful LLM programs — dialog bots, agents, reasoning chains, workflows, and long-context pipelines — that all compile to and execute on the same typed λ-calculus runtime. You author programs in whichever surface fits the problem; one verb (Program.invoke) runs all of them.
from fsm_llm import Program
# Surface A — FSM JSON: dialog with persistent per-turn state.
prog = Program.from_fsm("my_bot.json")
result = prog.invoke(message="Hi, I'd like to book a flight")
print(result.value) # "Sure — where to?"
print(result.conversation_id) # auto-started session id
# Surface B — λ-DSL: a one-shot pipeline, agent, or recursion.
from fsm_llm import react_term
prog = Program.from_term(react_term(decide_prompt=..., synth_prompt=...))
result = prog.invoke(inputs={"question": "What is 17 * 23?"})
print(result.value) # the agent's final answer
print(result.oracle_calls) # 2, exactly what the planner predicted

The second guarantee, that oracle_calls matches the planner's static prediction, is Theorem 2 of the design. Every Fix subtree carries a closed-form cost; the executor honors it.
- One runtime. FSM JSON dialogs and λ-DSL pipelines compile to the same AST. There is no separate "agents engine" plus "workflows engine" plus "FSM engine" — there is one β-reduction interpreter, and everything is a λ-term.
- Theorem-2 cost prediction. For any program with planner-bounded recursion, the executor's oracle-call count equals the planner's prediction. Budget LLM calls before running.
- Provider-agnostic. Built on LiteLLM — 100+ providers (OpenAI, Anthropic, Ollama, Google, Bedrock, Together, …) behind one interface. Switch with a string.
- Layered API. Four documented layers: L1 substrate, L2 composition, L3 authoring, L4 invoke. Use only what you need.
- Typed throughout. Pydantic v2 models for AST, definitions, results. Frozen, JSON-roundtrippable.
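To make the cost-prediction bullet concrete, here is a minimal sketch of the idea, not fsm-llm's planner or executor. It assumes a depth-bounded recursion that splits a document k ways, with one oracle call per leaf and one per merge. The names CountingOracle, plan_depth, predict_calls, and run are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CountingOracle:
    """Stand-in for an LLM call; it only records how often it is invoked."""
    calls: int = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return prompt[:20]


def plan_depth(n: int, tau: int, k: int) -> int:
    """Smallest depth d such that tau * k**d covers a document of length n."""
    d = 0
    while tau * k**d < n:
        d += 1
    return d


def predict_calls(n: int, tau: int, k: int) -> int:
    """Closed-form call count for a full k-ary tree of depth d (assumes k >= 2)."""
    leaves = k ** plan_depth(n, tau, k)
    merges = (leaves - 1) // (k - 1)  # number of internal nodes
    return leaves + merges


def run(doc: str, depth: int, k: int, oracle: CountingOracle) -> str:
    """Depth-bounded recursion: split k ways, answer at the leaves, merge upward."""
    if depth == 0:
        return oracle(f"answer from: {doc}")
    size = -(-len(doc) // k)  # ceiling division
    parts = [doc[i * size:(i + 1) * size] for i in range(k)]
    partials = [run(part, depth - 1, k, oracle) for part in parts]
    return oracle("merge: " + " | ".join(partials))


doc = "x" * 1000
oracle = CountingOracle()
predicted = predict_calls(len(doc), tau=256, k=2)
run(doc, plan_depth(len(doc), tau=256, k=2), 2, oracle)
print(predicted, oracle.calls)  # 7 7
```

Because the recursion depth is fixed before execution, the call count depends only on n, tau, and k, which is what lets a budget be computed ahead of the run.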
pip install fsm-llm # core: dialog, runtime, stdlib
pip install fsm-llm[reasoning] # reasoning engine (no extra deps)
pip install fsm-llm[agents] # agents (no extra deps)
pip install fsm-llm[workflows] # workflows (no extra deps)
pip install fsm-llm[monitor] # web dashboard (fastapi, uvicorn)
pip install fsm-llm[mcp] # MCP tool provider
pip install fsm-llm[otel] # OpenTelemetry exporter
pip install fsm-llm[oolong] # OOLONG long-context bench loader
pip install fsm-llm[all] # everything

Python 3.10–3.12. Set OPENAI_API_KEY (or any provider key) in .env or your shell.
Program is the unified entry point. Three constructors fix the mode at construction time; the same .invoke(...) returns a Result in every mode.
Author a state machine as JSON, compile to a λ-term, run turn by turn:
from fsm_llm import Program
prog = Program.from_fsm("intake_bot.json") # path or dict or FSMDefinition
result = prog.invoke(message="hello", conversation_id=None)
# result.value: the response string
# result.conversation_id: auto-started or echoed back

See examples/basic, examples/intermediate, and examples/advanced for runnable FSMs.
Author a term directly in the DSL:
from fsm_llm import Program, leaf, let_, var
term = let_(
    "summary", leaf(template="Summarise: {doc}", input_vars=("doc",)),
    leaf(template="Translate to French: {summary}", input_vars=("summary",)),
)
prog = Program.from_term(term)
result = prog.invoke(inputs={"doc": "..."})

Or use a stdlib factory:
from fsm_llm import Program, react_term
term = react_term(
    decide_prompt="Given {question}, propose a tool call as JSON.",
    synth_prompt="Tool returned {observation}. Final answer:",
)
prog = Program.from_term(term)
result = prog.invoke(inputs={"question": "Capital of France?", "tool_dispatch": my_tools})

Program.from_factory calls a factory at construction time with explicit args:
from fsm_llm import Program
from fsm_llm.stdlib.long_context import niah_term
prog = Program.from_factory(
    niah_term,
    factory_kwargs={"question": "Where is the artefact stored?", "tau": 256, "k": 2},
)
result = prog.invoke(inputs={"document": long_doc})

Bundle prompt prefixes, leaf overrides, and provider kwargs at construction:
from fsm_llm import HarnessProfile, ProviderProfile, Program, register_harness_profile
register_harness_profile(
    "ollama:qwen3.5:4b",
    HarnessProfile(
        system_prompt_base="You are a precise, terse assistant.",
        leaf_template_overrides={"leaf_001_summarise": "Be brief: {doc}"},
        provider_profile_name="ollama:qwen3.5:4b",
    ),
)
prog = Program.from_term(my_term, profile="ollama:qwen3.5:4b")

Profiles apply once at construction; Theorem-2 strict equality is preserved.
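One way to picture "apply once at construction" is the sketch below. It is an illustration under stated assumptions, not fsm-llm's HarnessProfile implementation: MiniProfile, register, and apply_profile are hypothetical names, and prefixing the system prompt to each leaf template is a guess at the mechanism made only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MiniProfile:
    """Hypothetical stand-in for a harness profile."""
    system_prompt_base: str = ""
    leaf_template_overrides: dict[str, str] = field(default_factory=dict)


_REGISTRY: dict[str, MiniProfile] = {}


def register(name: str, profile: MiniProfile) -> None:
    """Store a profile under a lookup key."""
    _REGISTRY[name] = profile


def apply_profile(templates: dict[str, str], name: str) -> dict[str, str]:
    """Resolve every leaf template once; the set of leaves is left unchanged."""
    profile = _REGISTRY[name]
    resolved = {}
    for leaf_id, template in templates.items():
        body = profile.leaf_template_overrides.get(leaf_id, template)
        if profile.system_prompt_base:
            body = f"{profile.system_prompt_base}\n{body}"
        resolved[leaf_id] = body
    return resolved


register(
    "demo",
    MiniProfile(
        system_prompt_base="You are a precise, terse assistant.",
        leaf_template_overrides={"leaf_001_summarise": "Be brief: {doc}"},
    ),
)
templates = {
    "leaf_001_summarise": "Summarise: {doc}",
    "leaf_002_translate": "Translate to French: {summary}",
}
resolved = apply_profile(templates, "demo")
```

In this sketch the profile rewrites prompt text but never adds or removes a leaf, so the number of oracle calls is the same before and after it is applied.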
Hook into 8 timing points across an FSM turn or a term reduction. Two timings (PRE_PROCESSING, POST_PROCESSING) splice into the AST via compose; the other six dispatch host-side.
from fsm_llm import HandlerBuilder, HandlerTiming, Program
audit = (
    HandlerBuilder("audit")
    .at(HandlerTiming.PRE_PROCESSING)
    .do(lambda **kw: log_event(kw))
    .build()
)
prog = Program.from_fsm("bot.json", handlers=[audit])

See docs/handlers.md.
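The idea of splicing pre- and post-processing handlers around a turn can be sketched in plain Python. This is a toy model, not the library's compose or HandlerBuilder: Timing and compose_handlers are hypothetical names, and only the two timing names mentioned above are modelled.

```python
from enum import Enum, auto
from typing import Callable


class Timing(Enum):
    PRE_PROCESSING = auto()
    POST_PROCESSING = auto()


Handler = Callable[[dict], dict]


def compose_handlers(core: Handler, handlers: list[tuple[Timing, Handler]]) -> Handler:
    """Wrap a core step so PRE handlers run before it and POST handlers after it."""
    pre = [h for t, h in handlers if t is Timing.PRE_PROCESSING]
    post = [h for t, h in handlers if t is Timing.POST_PROCESSING]

    def wrapped(ctx: dict) -> dict:
        for handler in pre:
            ctx = {**ctx, **handler(ctx)}
        ctx = {**ctx, **core(ctx)}
        for handler in post:
            ctx = {**ctx, **handler(ctx)}
        return ctx

    return wrapped


log: list[tuple[str, str]] = []


def audit(ctx: dict) -> dict:
    log.append(("pre", ctx["message"]))
    return {}


def echo(ctx: dict) -> dict:
    return {"reply": ctx["message"].upper()}


def stamp(ctx: dict) -> dict:
    log.append(("post", ctx["reply"]))
    return {"audited": True}


turn = compose_handlers(
    echo,
    [(Timing.PRE_PROCESSING, audit), (Timing.POST_PROCESSING, stamp)],
)
result = turn({"message": "hi"})
```

Each handler returns a dict of context updates, so the wrapped step stays a single function of the context.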
The package ships five console scripts:
fsm-llm --mode run --fsm path/to/fsm.json # interactive run
fsm-llm --mode validate --fsm path/to/fsm.json # schema check
fsm-llm --mode visualize --fsm path/to/fsm.json # ASCII state graph
# Single-purpose subcommand aliases — same code, simpler signatures.
fsm-llm-validate --fsm path/to/fsm.json
fsm-llm-visualize --fsm path/to/fsm.json
fsm-llm-monitor # web dashboard (requires fsm-llm[monitor])
fsm-llm-meta # interactive artifact builder

FSM JSON (Category A)           λ-DSL (Category B / C)
    │                               │
    ▼ fsm_llm.dialog.compile_fsm    ▼ fsm_llm.runtime.dsl
┌─────────────────────────────────────────────────────┐
│                 λ-AST (typed Term)                  │
│   Var · Abs · App · Let · Case · Combinator · Fix   │
│   · Leaf                                            │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
     ┌──────────────────────────────────────────┐
     │  Executor (β-reduction, depth-bounded)   │
     │  Planner  (closed-form k*, τ*, d, calls) │
     │  Oracle   (one per Program, uniform)     │
     │  Session  (per-conversation persistence) │
     │  Cost     (per-leaf accumulator)         │
     └──────────────────────────────────────────┘
                           │
                           ▼
             Program.invoke(...) → Result
The kernel (runtime/) is closed against the dialog surface — no upward edges. The dialog surface (dialog/) is the FSM-JSON compiler and orchestrator. The standard library (stdlib/) is named factories built on the kernel. Program (in program.py) is the L4 facade.
See docs/architecture.md for the full picture.
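As a rough illustration of what "everything is a λ-term" means in practice, the sketch below evaluates a three-node subset of the AST (Var, Let, Leaf) against a fake oracle. It is a toy under stated assumptions: these dataclasses and the evaluate function are hypothetical and are not the fsm_llm.runtime types or executor.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Leaf:
    template: str
    input_vars: tuple[str, ...]


@dataclass(frozen=True)
class Let:
    name: str
    value: "Term"
    body: "Term"


Term = Union[Var, Leaf, Let]


def evaluate(term: Term, env: dict[str, str], oracle: Callable[[str], str]) -> str:
    """Evaluate a term; each Leaf costs exactly one oracle call."""
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Leaf):
        prompt = term.template.format(**{v: env[v] for v in term.input_vars})
        return oracle(prompt)
    if isinstance(term, Let):
        bound = evaluate(term.value, env, oracle)
        return evaluate(term.body, {**env, term.name: bound}, oracle)
    raise TypeError(f"unsupported term: {term!r}")


calls: list[str] = []


def fake_oracle(prompt: str) -> str:
    """Records the prompt and returns it wrapped in angle brackets."""
    calls.append(prompt)
    return f"<{prompt}>"


term = Let(
    "summary",
    Leaf("Summarise: {doc}", ("doc",)),
    Leaf("Translate to French: {summary}", ("summary",)),
)
out = evaluate(term, {"doc": "hello"}, fake_oracle)
print(out)         # <Translate to French: <Summarise: hello>>
print(len(calls))  # 2
```

The term has two leaves and the evaluator makes two oracle calls, which mirrors the per-leaf cost accounting shown in the diagram.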
| Doc | What it covers |
|---|---|
| docs/quickstart.md | Five-minute tour: install, FSM hello-world, λ-term hello-world, handlers, profiles |
| docs/api_reference.md | Every public name across L1–L4 with signatures and examples |
| docs/architecture.md | The runtime, the layers, Theorem 2, the cross-cutting decisions |
| docs/handlers.md | All 8 timing points; AST-side vs host-side; HandlerBuilder cookbook |
| docs/fsm_design.md | Patterns and anti-patterns for authoring FSM JSON |
| docs/migration_0.7_to_0.8.md | Migration guide: every removed surface, before/after, FAQ |
| docs/lambda.md | The architectural thesis: why λ-calculus is the substrate |
| docs/lambda_fsm_merge.md | Canonical merge contract: invariants I1–I6, falsification gates G1–G5, deprecation calendar |
| docs/threat_model.md | Trust boundaries, T-01..T-11, dismissed proposals |
| CHANGELOG.md | Release notes |
172 runnable examples across 10 trees. Run with:
python examples/basic/echo_bot/run.py
python examples/pipeline/react/run.py
python examples/long_context/niah_demo/run.py

All examples support OpenAI and Ollama out of the box. See examples/README.md for the index.
0.8.0 is the post-0.7.0 cleanup release. Eight more surfaces are hard
removals at the source-tree level — no new deprecation cycle was
introduced; the 0.7.0 deferred items shipped:
- from fsm_llm import Handler → from fsm_llm import FSMHandler (the alias is gone; BaseHandler is the canonical base class).
- from fsm_llm import LLMInterface → from fsm_llm.runtime._litellm import LLMInterface (top-level re-export removed).
- from fsm_llm import BUILTIN_OPS → from fsm_llm.runtime import BUILTIN_OPS (registry is still architecturally closed).
- has_workflows() / has_reasoning() / has_agents() and the matching get_* helpers → gone; the stdlib subpackages have not been optional since 0.7.0.
- from fsm_llm.dialog.definitions import FSMError (and the 4 other re-exports) → from fsm_llm.types import FSMError (canonical home since 0.7.0; the back-compat re-export was removed in 0.8.0).
- Program(_api=..., _profile=...) private kwargs → gone; the public ctor is term-mode only. Use Program.from_fsm for FSM mode.
- Program.from_fsm(**api_kwargs) catch-all → replaced with explicit kwargs (model, api_key, temperature, max_tokens, max_history_size, max_message_length, handler_error_mode, transition_config) plus **llm_kwargs for LiteLLM passthrough.
- Reasoning factory parameter renames: analytical_term(prompt_a, prompt_b, prompt_c) → analytical_term(decomposition_prompt, analysis_prompt, integration_prompt). Every reasoning factory in stdlib/reasoning/lam_factories.py was migrated to descriptive parameter names matching its bind_names.
Two private modules also moved: dialog/extraction.py (extracted from
turn.py; holds ExtractionEngine) and runtime/_handlers_ast.py
(holds compose + AST splicers, moved from handlers.py). Public API
is unchanged — from fsm_llm import compose and from fsm_llm.handlers import compose continue to work as re-exports.
See docs/migration_0.7_to_0.8.md for
the detailed before/after walkthrough per surface and
CHANGELOG.md for the full diff.
make install-dev # editable install with all extras + pre-commit
make test # pytest -v
make lint format # ruff
make type-check # mypy

make test should report ~3300 tests passing on a clean checkout. Verify
the exact count with pytest --collect-only -q | tail -3.
GPL-3.0-or-later. See LICENSE.
