Build AI agents your application can actually trust in production — with small models cheaply, or frontier models more reliably.
Most agent frameworks help you build an agent in an afternoon. fastWorkflow is for the moment after the demo, when you need the agent to stop calling the wrong tool, stop hallucinating parameters, and stop confidently doing the wrong thing on real, messy user input.
fastWorkflow improves agent reliability two ways:
- It lets small, free models (e.g. Mistral Small) perform far above their weight on structured workflows — matching frontier models on agentic benchmarks.
- It makes frontier models more reliable by shrinking the active toolset, validating every parameter, and forcing clarification instead of silent wrong actions.
You wire up a dozen tools to a capable frontier model. In dev, against clean prompts, it works great. Then real users show up:
User: "cancel that blue jacket order from last week and give me credit, not a refund"
[ Generic tool-calling stack + frontier model ]
search_orders(query="blue jacket") ✓
cancel_order(order_id="44821") ✓
process_refund(order_id="44821", method="original_payment") ✗ ← user asked for store credit
Nothing crashed. The logs look fine. But the customer asked for store credit and got a refund to their card. This is the dangerous failure class: plausible-looking, semantically wrong execution. No amount of prompt engineering reliably prevents it at scale, because the problem is structural — ambiguous language, missing parameters, and a crowded toolset — not a weak model.
Here's the same request through fastWorkflow:
User: "cancel that blue jacket order from last week and give me credit, not a refund"
[ fastWorkflow ]
Intent detected: cancel_order
Parameter validation: order_id unresolved → ask, don't guess
Agent: "I found two recent orders — #44821 (Blue Jacket, $89) and
#44798 (Blue Scarf, $34). Which should I cancel?"
User: "the jacket"
Parameter validation: refund_method = store_credit ✓ (from "credit, not a refund")
cancel_order(order_id="44821", refund_method="store_credit") ✓
notify_customer(order_id="44821") ✓
Same model. Same application code. Different execution discipline. The framework makes the system harder to use incorrectly.
Instead of dumping your whole tool catalog into a prompt and hoping the model navigates it, fastWorkflow puts a structured execution layer between natural language and your application's side effects:
- Intent detection is trained locally — a tiny BERT/DistilBERT classifier (runs on CPU, ~milliseconds) maps utterances to commands instead of relying entirely on the LLM to infer what the user "probably meant."
- Every parameter is validated against your Pydantic Field definitions before your code runs: malformed or missing values are caught, not executed.
- Clarification is a first-class behavior: when a required parameter is missing or ambiguous, the agent asks instead of guessing.
- Tools are organized into context hierarchies — the model only ever sees the handful of tools relevant to the current state, never all 40 at once.
- Your application code stays the source of truth — fastWorkflow wraps it; it never replaces or rewrites it.
A common reaction is: "Nice for cheap small models, but I already use GPT-4o / Claude / Bedrock." That's exactly where fastWorkflow still earns its place. Frontier models are better at language — but they still fail on the parts of agent systems that are architectural, not linguistic:
| Failure mode | What goes wrong | fastWorkflow's structural fix |
|---|---|---|
| Tool overload | Picks a valid-but-wrong tool from a crowded prompt | Context hierarchies keep the active toolset small |
| Parameter overconfidence | Extracts one slot wrong and executes anyway | Pydantic validation gate before execution |
| State blindness | Acts as if every tool is always available | Tools enabled/disabled by runtime context |
| Ambiguity collapse | Resolves uncertainty internally instead of asking | Clarification is built in, not prompted for |
With small models, fastWorkflow is mostly about cost. With large models, it's about reliability and reducing expensive mistakes. Either way, you get one consistent command layer for UI chat, backend automation, tests, and internal agents.
fastWorkflow was benchmarked on Tau Bench — an industry-standard benchmark for conversational agents that complete realistic, multi-step, tool-using customer-service workflows (order management, flight rebooking, policy enforcement). This measures exactly what breaks in production: reliable tool execution under ambiguity, not generic chat quality.
- Retail: orders, returns, account operations
- Airline: rebooking, baggage, loyalty workflows
fastWorkflow with Mistral Small (free tier) matches frontier models on these structured workflows — because the validation pipeline outweighs raw model capability where it counts.
Citation: Sanchit Satija, Aditya Bhatt, Priyanshu Jani, and Dhar Rawal. 2026. fastWorkflow: Closing the Performance Gap Between Small and Frontier Language Models for Conversational Agents. In Proceedings of the ACM Conference on AI Systems (CAIS '26). ACM, San Jose, CA, USA, 161–180. https://doi.org/10.1145/3786335.3813158
- Quick Start: run an example in 5 minutes
- AI-enable your own app (without restructuring it)
- How complex workflows scale: context hierarchies
- Production deployment
- Developer FAQ
- Key concepts (going deeper)
- Architecture overview
- Installation
- CLI reference
- Environment variables reference
- Troubleshooting / FAQ
- For contributors
- Our work & references
- License
This is the fastest way to see fastWorkflow in action.
# 1. Install (Linux/macOS; on Windows use WSL. Python 3.11+)
pip install fastworkflow
# 2. Fetch the hello_world example + env file templates
fastworkflow examples fetch hello_world
# 3. Add your API key (a free Mistral key works for every role)
nano ./examples/fastworkflow.passwords.env
# 4. Build the intent models for this command set (one-time, ~5 min on CPU)
fastworkflow train ./examples/hello_world ./examples/fastworkflow.env ./examples/fastworkflow.passwords.env
# 5. Run it
fastworkflow run ./examples/hello_world ./examples/fastworkflow.env ./examples/fastworkflow.passwords.env

You'll get a User > prompt. Try "what can you do?" or "add 49 + 51". Run fastworkflow examples list to see the rest.
Note
"Train" doesn't mean GPUs or fine-tuning a foundation model. fastworkflow train is closer to compiling a conversational interface: it generates synthetic utterances and fits small BERT-class intent classifiers for your commands. You run it once per command set, re-run it only when commands change, ship the resulting artifacts with your app, and need no GPU at runtime.
Tip
Get a free API key from Mistral AI (works with mistral-small-latest) or OpenRouter. You can assign different models to different roles in the same workflow.
You do not rewrite your application around fastWorkflow. You wrap your existing code with thin command files. Say you already have this service:
# your_app/orders.py ← your existing code, untouched
class OrderService:
    def cancel_order(self, order_id: str, refund_method: str) -> dict: ...
    def get_order_status(self, order_id: str) -> dict: ...
    def update_shipping_address(self, order_id: str, address: str) -> dict: ...

The fastest path for a non-trivial app is the integrate-chat-agent skill with Cursor or Claude Code:
Open fastworkflow/docs/integrate-chat-agent/SKILL.md
Prompt: "Integrate a fastWorkflow chat agent for OrderService in orders.py"
The agent introspects your code and generates _commands/cancel_order.py, _commands/get_order_status.py, the context_inheritance_model.json, and env scaffolding — then trains and smoke-tests it with you. Your orders.py is never modified.
# _commands/cancel_order.py ← new file; wraps your existing code
import fastworkflow
from fastworkflow.train.generate_synthetic import generate_diverse_utterances
from pydantic import BaseModel, Field
from your_app.orders import OrderService

class Signature:
    class Input(BaseModel):
        order_id: str = Field(
            description="The order ID to cancel",
            examples=["44821", "ORD-2024-001"],
            default="NOT_FOUND",  # missing → fastWorkflow asks instead of guessing
        )
        refund_method: str = Field(
            description="How to refund the customer",
            examples=["store_credit", "original_payment"],
            default="original_payment",
        )

    plain_utterances = [
        "Cancel order #44821 and give store credit",
        "cancel that blue jacket order, I want credit not a refund",
    ]

    @staticmethod
    def generate_utterances(workflow: fastworkflow.Workflow, command_name: str) -> list[str]:
        return [command_name.split("/")[-1].lower().replace("_", " ")] + \
            generate_diverse_utterances(Signature.plain_utterances, command_name)

class ResponseGenerator:
    def __call__(
        self,
        workflow: fastworkflow.Workflow,
        command: str,
        command_parameters: Signature.Input,
    ) -> fastworkflow.CommandOutput:
        result = OrderService().cancel_order(
            order_id=command_parameters.order_id,
            refund_method=command_parameters.refund_method,
        )
        return fastworkflow.CommandOutput(
            command_responses=[fastworkflow.CommandResponse(response=str(result))]
        )

Then fastworkflow train and fastworkflow run against your workflow directory. That's the entire integration pattern: a thin command layer over code you already have.
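The NOT_FOUND default on order_id can be read as a sentinel: a value that survives extraction unchanged signals "not provided", which is the cue to ask rather than execute. The sketch below illustrates that idea only. It uses a plain dataclass as a stand-in for Signature.Input, and the helper names (unresolved_fields, gate) are hypothetical, not part of fastWorkflow's API.

```python
from dataclasses import dataclass, fields

NOT_FOUND = "NOT_FOUND"  # sentinel default, mirroring the order_id field above


@dataclass
class CancelOrderParams:
    # Plain-dataclass stand-in for Signature.Input (hypothetical, for illustration).
    order_id: str = NOT_FOUND
    refund_method: str = "original_payment"


def unresolved_fields(params) -> list[str]:
    """Return the names of fields that still hold the sentinel value."""
    return [f.name for f in fields(params) if getattr(params, f.name) == NOT_FOUND]


def gate(params) -> str:
    """Ask for missing values instead of executing with a guess."""
    missing = unresolved_fields(params)
    if missing:
        return "Please provide: " + ", ".join(missing)
    return "execute"


print(gate(CancelOrderParams()))
print(gate(CancelOrderParams(order_id="44821", refund_method="store_credit")))
```

Note that refund_method has a real default here, so it never triggers a question; whether a field should fall back silently or force clarification is a per-field design choice.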
Tip
Prefer to learn by building the smallest possible workflow by hand first? fastworkflow examples fetch messaging_app_1 is a minimal, fully-worked single-command workflow you can read end-to-end.
At 5 tools, a frontier model is reliable. At 40 — a realistic enterprise workflow — accuracy drops: the model sees every tool in the prompt and starts choosing valid-but-wrong ones.
fastWorkflow keeps the active toolset small by modeling your application's object model as contexts. The agent only sees tools relevant to the current context:
User ← always visible
├── search_orders()
├── get_customer_info()
│
└── Order (active once an order is selected)
├── cancel_order()
├── update_address()
│
└── Refund (active during a refund flow)
├── issue_store_credit()
├── issue_original_payment()
└── escalate_to_human()
Context relationships live in one file, context_inheritance_model.json, not in code. Each entry uses base (the parent contexts whose commands are inherited) and, optionally, a "/" key (commands declared directly on the context):
{
  "Order": {
    "base": ["User"]
  },
  "Refund": {
    "base": ["Order"]
  }
}

This is what lets small models stay accurate as your app grows, and what keeps frontier models from drowning in tool definitions.
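To make the inheritance rule concrete, here is a minimal sketch of how a context's visible command set could be derived from such a model: its own commands plus everything reachable through base. This is an illustration under stated assumptions, not fastWorkflow's actual resolver. The function name visible_commands is hypothetical, the "/" lists are filled in by hand to mirror the tree above, and the model is assumed to be acyclic.

```python
def visible_commands(model: dict, context: str) -> set[str]:
    """Collect a context's own commands plus everything inherited via "base"."""
    entry = model.get(context, {})
    commands = set(entry.get("/", []))
    for parent in entry.get("base", []):
        # Recurse into each parent context (assumes no inheritance cycles).
        commands |= visible_commands(model, parent)
    return commands


# Hypothetical model mirroring the tree above; the "/" lists are illustrative.
model = {
    "User": {"/": ["search_orders", "get_customer_info"]},
    "Order": {"/": ["cancel_order", "update_address"], "base": ["User"]},
    "Refund": {
        "/": ["issue_store_credit", "issue_original_payment", "escalate_to_human"],
        "base": ["Order"],
    },
}

print(sorted(visible_commands(model, "User")))
print(sorted(visible_commands(model, "Order")))
```

In this sketch the User context exposes two commands and the Order context four, so the refund commands stay out of the prompt until a refund flow is active.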
Expose your workflow over HTTP with JWT auth, SSE/NDJSON streaming, and MCP support:
pip install "fastworkflow[server]"
python -m fastworkflow.run_fastapi_mcp \
  --workflow_path ./order_agent \
  --env_file_path ./fastworkflow.env \
  --passwords_file_path ./fastworkflow.passwords.env \
  --port 8000

Key endpoints: /initialize (create session + JWT), /invoke_agent, /invoke_agent_stream (SSE/NDJSON), /invoke_assistant (deterministic, non-agentic), /perform_action (direct programmatic calls), /new_conversation, /conversations, /probes/healthz, /probes/readyz.
The execution core is synchronous and transport-free. Create one WorkflowExecutionContext per session and call process_message per turn:
import fastworkflow
from dotenv import dotenv_values
from fastapi import FastAPI
from fastworkflow.workflow_execution_context import WorkflowExecutionContext

app = FastAPI()  # assumed host app; the route decorator below needs it

# Load env + secrets once at startup
env_vars = {
    **dotenv_values("fastworkflow.env"),
    **dotenv_values("fastworkflow.passwords.env"),
}
fastworkflow.init(env_vars=env_vars)

# One context + bound workflow per session
ctx = WorkflowExecutionContext(run_as_agent=True, session_key="user-123")
app_workflow = fastworkflow.Workflow.create("./order_agent", workflow_id_str="user-123")
ctx.bind_app_workflow(app_workflow)

@app.post("/chat")
def chat(message: str):
    output = ctx.process_message(message)  # synchronous; run in a worker thread under async
    return {"response": output.command_responses[0].response}

The service ships liveness/readiness probes out of the box. /probes/readyz returns 503 until the intent models are loaded, so traffic isn't routed before the agent is actually ready:
livenessProbe:
  httpGet: { path: /probes/healthz, port: 8000 }
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /probes/readyz, port: 8000 }
  initialDelaySeconds: 5
  periodSeconds: 5

Do I need a GPU?
No. Intent detection (BERT/DistilBERT) runs on CPU in milliseconds. LLM calls go to whatever API you configure.
Does training re-run on every deploy?
No. fastworkflow train runs once per command set and writes artifacts to ___command_info/. Bake those into your Docker image or CI artifact store; re-train only when you add or change commands.
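One way to enforce "re-train only when commands change" in CI is to fingerprint the _commands/ directory and compare it with a stamp saved after the last training run. The sketch below is a suggestion rather than a fastWorkflow feature: the helper names and the stamp-file convention are hypothetical, and it uses only the standard library.

```python
import hashlib
from pathlib import Path


def commands_fingerprint(commands_dir: str) -> str:
    """Hash every file under the commands directory in a stable order."""
    digest = hashlib.sha256()
    root = Path(commands_dir)
    for path in sorted(root.rglob("*")):
        # Skip directories and bytecode caches, which change without source edits.
        if not path.is_file() or "__pycache__" in path.parts:
            continue
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_retrain(commands_dir: str, stamp_file: str) -> bool:
    """True when no stamp exists or the commands changed since it was written."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != commands_fingerprint(commands_dir)
```

A pipeline could call needs_retrain before the train step and write commands_fingerprint(...) to the stamp file once training succeeds. Because context_inheritance_model.json lives in the same directory, edits to it also change the fingerprint.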
What actually ships to production?
Your application code + your _commands/ wrappers + the trained ___command_info/ artifacts (small BERT checkpoints). No GPU at runtime.
Can I use Claude / GPT-4o / Bedrock instead of Mistral?
Yes. fastWorkflow uses LiteLLM, so any provider works — set e.g. LLM_AGENT=openai/gpt-4o in fastworkflow.env. You can use different models for different roles (intent vs. extraction vs. response vs. planning).
Can I route through a corporate LiteLLM proxy?
Yes — prefix models with litellm_proxy/ and set LITELLM_PROXY_API_BASE. See Using LiteLLM Proxy.
What if a user asks something out of scope?
Intent detection returns low confidence and fastWorkflow surfaces a clarification; it does not hallucinate a tool call. That's the core reliability guarantee.
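The low-confidence path can be pictured as a threshold on the classifier's best score. The sketch below is a toy illustration of that pattern only: a word-overlap (Jaccard) scorer stands in for the trained BERT-class classifier, and the function names, example utterances, and the 0.3 threshold are all assumptions chosen for the demo.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def route(utterance: str, examples: dict[str, list[str]], threshold: float = 0.3):
    """Return (command, score), or (None, score) when confidence is too low."""
    best_command, best_score = None, 0.0
    for command, utterances in examples.items():
        # Each command is assumed to have at least one example utterance.
        score = max(token_overlap(utterance, u) for u in utterances)
        if score > best_score:
            best_command, best_score = command, score
    if best_score < threshold:
        return None, best_score  # caller should ask for clarification
    return best_command, best_score


examples = {
    "cancel_order": ["cancel my order", "cancel order and give store credit"],
    "get_order_status": ["where is my order", "what is the status of my order"],
}

print(route("please cancel my order", examples))
print(route("what's the weather in Paris", examples))
```

The first call resolves to cancel_order; the second falls below the threshold and returns None, which is where a clarification would be surfaced instead of a tool call.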
Can commands call REST APIs or databases, not just Python functions?
Yes. ResponseGenerator.__call__ is plain Python — call requests, httpx, an ORM, gRPC stubs, anything. fastWorkflow owns the NLP layer; your business logic is unrestricted.
Adaptive intent understanding — Misunderstandings happen in every conversation. fastWorkflow does 1-shot adaptation from intent-detection mistakes, learning your conversational vocabulary as you interact; corrections can be persisted to improve the model across sessions.
Signatures — Pydantic BaseModel + Field (à la DSPy) is the contract between natural language and your code. Strong descriptions and examples directly improve extraction accuracy, and the same schema feeds DSPy integration.
Context navigation at runtime — Classes hold state; method availability can change with state. fastWorkflow enables/disables commands and navigates object hierarchies at run-time, which is what makes complex, finite-state workflows possible.
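As a rough picture of state-dependent availability, the sketch below shows a domain object that reports which commands make sense in its current state. The Order class, its status values, and available_commands are hypothetical names for illustration; fastWorkflow's own mechanism for enabling and disabling commands may look quite different.

```python
class Order:
    """Hypothetical domain object whose available commands depend on its status."""

    def __init__(self, status: str = "placed"):
        self.status = status

    def available_commands(self) -> set[str]:
        # Status lookups are always allowed.
        commands = {"get_order_status"}
        if self.status == "placed":
            # Not shipped yet: cancelling and re-addressing are still possible.
            commands |= {"cancel_order", "update_shipping_address"}
        return commands


print(sorted(Order("placed").available_commands()))
print(sorted(Order("shipped").available_commands()))
```

Once the order has shipped, cancel_order is simply not offered, so the model cannot pick it by mistake.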
Deep code understanding — fastWorkflow understands classes, methods, inheritance, and aggregation, so you can AI-enable large-scale Python applications by mapping them onto contexts and commands.
DSPy for response generation — use dspy.Predict inside ResponseGenerator when deterministic logic isn't enough; dspySignature bridges your Pydantic models to DSPy signatures while preserving types, descriptions, and examples:
from fastworkflow.utils.dspy_utils import dspySignature
import dspy
dspy_sig = dspySignature(Signature.Input, Signature.Output)
prediction = dspy.Predict(dspy_sig)(command_parameters)

Startup commands & headless mode: initialize context or run non-interactively (batch/CI) by combining a startup command/action with --keep_alive False:
fastworkflow run my_workflow/ .env passwords.env \
  --startup_command "process daily report" --keep_alive False

Deep-dive articles:
- From functions to classes: building stateful AI agents
- Leveraging class inheritance in fastWorkflow
- Building complex context hierarchies
fastWorkflow separates build-time, train-time, and run-time. At build-time you create a command interface from your code (recommended via the integrate-chat-agent skill). train builds the NLP models; run executes the workflow. Your existing code is never modified — fastWorkflow sits as a layer on top.
graph LR
subgraph A[Build-Time]
A1(Your Python App) --> A2{Coding Agent + integrate-chat-agent skill};
A2 --> A3(Generated _commands);
A3 --> A4(context_inheritance_model.json);
A4 --> A5(Review & refine);
end
subgraph B[Train-Time — runs once per command set]
B1(_commands) --> B2{fastworkflow train};
B2 --> B3(Trained models in ___command_info);
end
subgraph C[Run-Time — per request]
C1(User/Agent input) --> C2{Intent detection + validation\nBERT, CPU};
C2 --> C3{Parameter extraction + Pydantic validation};
C3 -->|missing/ambiguous| C4(Clarification prompt);
C3 -->|valid| C5(CommandExecutor);
C5 --> C6(Your app logic — DSPy or deterministic);
C6 --> C7(Response);
end
A --> B --> C
order_agent/ # <-- The workflow_folderpath
├── application/ # <-- Your app code (untouched)
│ └── orders.py
├── _commands/ # <-- Command wrappers (generated + edited)
│ ├── cancel_order.py
│ └── context_inheritance_model.json
├── ___command_info/ # <-- Trained models (generated by `train`)
├── ___convo_info/ # <-- Conversation logs (run-time)
└── ___workflow_contexts/ # <-- Session state (run-time)
fastworkflow.env # model strings, logging, intent model ids
fastworkflow.passwords.env # API keys
Tip
Add ___workflow_contexts, ___command_info, and ___convo_info to your .gitignore.
pip install fastworkflow # core (CPU inference, plain litellm client)
pip install "fastworkflow[server]" # adds the FastAPI/MCP HTTP service
pip install "fastworkflow[training]" # adds HuggingFace datasets for the train step
# Or with uv: uv pip install fastworkflow

Notes
- Linux/macOS only; on Windows use WSL. Python 3.11+.
- Installs PyTorch; the first install may take a few minutes.
- fastworkflow train needs the optional HuggingFace datasets package (pip install datasets, or poetry install --with dev from this repo).
The core depends on plain litellm (client only — no proxy server stack), so it co-installs cleanly with downstream apps that pin a plain litellm. Server-only deps live behind the server extra.
| Package | Supported range | Notes |
|---|---|---|
| `transformers` | `>=4.48.2,<6.0.0` | Works on transformers 5.x (BERT/DistilBERT load natively) |
| `dspy` | `>=3.0.1,<4.0.0` | DSPy 3.x API |
| `openai` | `>=2.8.0` | Compatible with openai 2.x |
| `litellm` | `>=1.83.7,<2.0.0` | Client only; FastAPI server deps are in the server extra |
| `sentence-transformers` | not a dependency | imposes no constraint downstream |
The intent-detection base models are configurable via INTENT_DETECTION_TINY_MODEL / INTENT_DETECTION_LARGE_MODEL.
# Examples
fastworkflow examples list
fastworkflow examples fetch hello_world
# Train intent-detection models (once per command set)
fastworkflow train <workflow_dir> <env_file> <passwords_file>
# Run — agentic mode is the default
fastworkflow run <workflow_dir> <env_file> <passwords_file>
fastworkflow run <workflow_dir> <env_file> <passwords_file> --assistant # deterministic, non-agentic
# Headless (batch/CI)
fastworkflow run <workflow_dir> <env_file> <passwords_file> \
--startup_command "your command" --keep_alive False
# Host as a FastAPI/MCP service
python -m fastworkflow.run_fastapi_mcp --workflow_path ./wf --port 8000

Tip
Prefix a natural-language command with / during an interactive run to force deterministic (non-agentic) execution. Add --help to any command for its full options.
Two files per workflow (templates ship with fastworkflow examples fetch).
fastworkflow.env:

| Variable | Purpose | When needed | Default |
|---|---|---|---|
| `SPEEDDICT_FOLDERNAME` | Directory name for workflow contexts | Always | `___workflow_contexts` |
| `LOG_LEVEL` | Log level (DEBUG…CRITICAL) | Optional | `INFO` |
| `LLM_SYNDATA_GEN` | Model for synthetic utterance generation | train | `mistral/mistral-small-latest` |
| `LLM_PARAM_EXTRACTION` | Model for parameter extraction | train, run | `mistral/mistral-small-latest` |
| `LLM_RESPONSE_GEN` | Model for response generation | run | `mistral/mistral-small-latest` |
| `LLM_PLANNER` | Model for the agent's task planner | run (agent) | `mistral/mistral-small-latest` |
| `LLM_AGENT` | Model for the DSPy agent | run (agent) | `mistral/mistral-small-latest` |
| `LLM_CONVERSATION_STORE` | Model for conversation topic/summary | FastAPI service | `mistral/mistral-small-latest` |
| `LITELLM_PROXY_API_BASE` | LiteLLM Proxy URL | with litellm_proxy/ models | not set |
| `INTENT_DETECTION_TINY_MODEL` | HF id for the small intent model | train (optional) | `google/bert_uncased_L-4_H-128_A-2` |
| `INTENT_DETECTION_LARGE_MODEL` | HF id for the large intent model | train (optional) | `distilbert-base-uncased` |
fastworkflow.passwords.env:

| Variable | For | When needed |
|---|---|---|
| `LITELLM_API_KEY_SYNDATA_GEN` | `LLM_SYNDATA_GEN` | train |
| `LITELLM_API_KEY_PARAM_EXTRACTION` | `LLM_PARAM_EXTRACTION` | train, run |
| `LITELLM_API_KEY_RESPONSE_GEN` | `LLM_RESPONSE_GEN` | run |
| `LITELLM_API_KEY_PLANNER` | `LLM_PLANNER` | run (agent) |
| `LITELLM_API_KEY_AGENT` | `LLM_AGENT` | run (agent) |
| `LITELLM_API_KEY_CONVERSATION_STORE` | `LLM_CONVERSATION_STORE` | FastAPI service |
| `LITELLM_PROXY_API_KEY` | shared LiteLLM Proxy key | with litellm_proxy/ models |
Route LLM calls through a LiteLLM Proxy to centralize keys or unify providers — prefix model strings with litellm_proxy/:
# fastworkflow.env
LLM_AGENT=litellm_proxy/bedrock_mistral_large_2407
LITELLM_PROXY_API_BASE=http://127.0.0.1:4000
# fastworkflow.passwords.env
LITELLM_PROXY_API_KEY=your-proxy-api-key

When a model uses the litellm_proxy/ prefix, the per-role keys are ignored and the shared proxy key is used. You can mix proxied and direct models.
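The key-selection rule above, together with the naming pattern in the two variable tables, can be summarized in a few lines. This is a sketch of the documented convention, not fastWorkflow's internal code; the function name key_variable_for is hypothetical.

```python
PROXY_PREFIX = "litellm_proxy/"


def key_variable_for(role_variable: str, model: str) -> str:
    """Name the passwords-file variable that supplies the API key for a role.

    role_variable is a model setting such as "LLM_AGENT"; model is its value.
    """
    if model.startswith(PROXY_PREFIX):
        # Proxied models share one key; the per-role key is ignored.
        return "LITELLM_PROXY_API_KEY"
    return "LITELLM_API_KEY_" + role_variable.removeprefix("LLM_")


print(key_variable_for("LLM_AGENT", "litellm_proxy/bedrock_mistral_large_2407"))
print(key_variable_for("LLM_PARAM_EXTRACTION", "mistral/mistral-small-latest"))
```

A check like this can be handy in a startup script to confirm that every configured role has the key it will actually use, especially when proxied and direct models are mixed.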
- PARAMETER EXTRACTION ERROR: the LLM couldn't extract a required parameter. Rephrase more specifically, or strengthen the Field(description=…, examples=[…]) in your Signature.
- CRASH RUNNING FASTWORKFLOW: the ___workflow_contexts folder is corrupted. Delete it and re-run.
- Slow first training run: the first run downloads BERT/DistilBERT from HuggingFace and makes LLM calls for synthetic-utterance generation. Set HF_HOME=/path/to/cache to control model storage; later runs skip the download. A small workflow trains in ~5–8 minutes on CPU.
- Commands not recognized: a command module with an import/syntax error won't load and won't appear as an intent. Check your _commands/*.py files.
Tip
To debug command files, set up a VSCode launch.json with justMyCode: false, add breakpoints, and run in debug mode.
git clone https://github.com/radiantlogicinc/fastworkflow.git
cd fastworkflow
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Join our Discord: ask questions, discuss functionality, and showcase your fastWorkflows.
- Optimizing intent classification with a sentence-transformer pipeline — Part 1
- Optimizing intent classification with a sentence-transformer pipeline — Part 2
- Structured understanding: parameter extraction across leading LLMs
- A generalized parameter extraction framework
- DSPy — Compiling Declarative Language Model Calls into Self-Improving Pipelines
- LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
fastWorkflow is released under the Apache License 2.0 — see LICENSE.




