unillm is a minimalist LLM framework that replaces complex graph abstractions with a clean, stateful interpreter pattern. It treats an AI workflow like a standard program: data goes in, a visitor processes it, and the state stays right where you can see it.
- Logic-First: Define your workflow as data (Nodes) and your logic as processors (Evaluators).
- Zero Prop-Drilling: Store your state in self within the Executor. No passing context dictionaries around.
- First-Class Observability: Built-in OpenTelemetry and Phoenix support.
- Type Safe: Native Pydantic support for structured data extraction.
Nodes are simple data structures that represent a step in your graph.
```python
from dataclasses import dataclass

from unillm.graph import Node


@dataclass
class RouterNode(Node):
    routes: dict[str, Node]
    model: str = "gemini-2.5-flash"  # used by the routing LLM call below


@dataclass
class RAGNode(Node):
    model: str


@dataclass
class SpecialistNode(Node):
    model: str
    instructions: str
```

Create an Executor to act as the interpreter. Ideally, each request gets its own instance, ensuring clean state isolation.
NOTE: When you call `executor.execute(tree, query)`, the input is stored on the Executor instance as `self.query`. This makes it directly accessible in every evaluator without passing it around explicitly.
```python
from unillm.graph import Executor, evaluator
from unillm.models import GoogleLLMClient
from unillm.vectorstore import PineconeStore


class MyAssistant(Executor):
    def __init__(self):
        super().__init__()
        self.client = GoogleLLMClient(api_key="...")
        self.vector_store = PineconeStore(
            api_key="your-api-key",
            index="some-index",
            namespace="some-namespace",  # "__default__" if not set
            embedding_client=self.client,
            embedding_model="gemini-embedding-001",
        )

    @evaluator(RouterNode)
    def _route(self, node: RouterNode):
        response = self.client.generate_content(
            model=node.model,
            prompt=self.query,
            system_prompt=(
                "Determine whether the query is a technical question about "
                "unillm. If so, reply ONLY with 'unillm'; otherwise reply 'chat'."
            ),
        )
        route = node.routes.get(response.text, "I'm sorry, I can't help you with that.")
        # route is either a Node or a string. If it is a Node, the Executor
        # evaluates it and returns the result; any non-Node value is returned as-is.
        return route

    @evaluator(RAGNode)
    def _rag_flow(self, node: RAGNode):
        # Retrieve context
        docs = self.vector_store.retrieve(self.query, top_k=3)
        context = "\n".join(d.metadata["text"] for d in docs)
        # Answer using the context
        res = self.client.generate_content(
            model=node.model,
            prompt=f"Context:\n{context}\n\nQuestion: {self.query}",
            system_prompt="Answer strictly using the provided context.",
        )
        return res.text

    @evaluator(SpecialistNode)
    def _handle_specialty(self, node: SpecialistNode):
        res = self.client.generate_content(
            model=node.model,
            prompt=self.query,
            system_prompt=node.instructions,
        )
        return res.text
```

The visitor traverses the nodes, updating the internal state as you defined it in the evaluators.
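To make the dispatch mechanics concrete, here is a minimal sketch of how this pattern can be implemented in plain Python. This is an illustration of the pattern, not unillm's actual source; the names merely mirror the API shown above.

```python
# Illustrative sketch only -- not unillm's actual implementation.
class Node:
    """Base class for inert data nodes."""


def evaluator(node_type):
    # Mark a method as the handler for one node type.
    def wrap(fn):
        fn._handles = node_type
        return fn
    return wrap


class Executor:
    def __init__(self):
        # Collect every method marked with @evaluator into a dispatch table.
        self._evaluators = {}
        for name in dir(self):
            attr = getattr(self, name)
            if callable(attr) and hasattr(attr, "_handles"):
                self._evaluators[attr._handles] = attr

    def evaluate(self, value):
        # Nodes dispatch to their evaluator; the result is evaluated again,
        # so an evaluator may return another Node. Non-Node values pass through.
        if isinstance(value, Node):
            handler = self._evaluators[type(value)]
            return self.evaluate(handler(value))
        return value

    def execute(self, tree, query):
        self.query = query  # the input lives on the instance
        result = None
        for node in tree:
            result = self.evaluate(node)
        return result
```

An Executor subclass then only declares `@evaluator` methods, exactly as in the example above.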
```python
# Build the tree/graph
tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(
            model="gemini-2.5-flash",
            instructions="You are a specialist in Generative AI",
        ),
    })
]

# Execute it
assistant = MyAssistant()
result = assistant.execute(tree, "What is unillm?")
print(result)
```

unillm has native support for OpenTelemetry and Phoenix (Arize). It is as simple as registering a tracer and adding the Phoenix middleware to your Executor.
```python
from phoenix.otel import register

from unillm.graph.middleware.phoenix import PhoenixMiddleware

register(
    endpoint="http://localhost:6006/v1/traces",
    project_name="test-project",
)

tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(
            model="gemini-2.5-flash",
            instructions="You are a specialist in Generative AI",
        ),
    })
]

assistant = MyAssistant()
assistant.add_middleware(PhoenixMiddleware())
result = assistant.execute(tree, "What is unillm?")
print(result)
```

The tree is stateless and can be safely defined once at module level and shared across all requests. The Executor, however, holds request-scoped state and should always be instantiated fresh per request.
```python
from fastapi import FastAPI

app = FastAPI()

# Define once
tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(
            model="gemini-2.5-flash",
            instructions="You are a specialist in Generative AI",
        ),
    })
]


@app.post("/chat")
async def chat(query: str):
    # Fresh instance per request
    assistant = MyAssistant()
    return assistant.execute(tree, query)
```

Most LLM frameworks treat workflows as a Directed Acyclic Graph (DAG) where state is passed along edges. While this is powerful for massive autonomous agents, it often means you spend more time debugging the framework's state management than your own code and prompts.
unillm is built for simpler systems that don't need that machinery. You can think of unillm as a bare-bones tree interpreter, because that is exactly what it is. Its philosophy rests on three points:
- Visitor Pattern

  In a standard graph, nodes often contain both data and logic. In unillm, nodes are just data; the logic is declared in your Executor. This is how interpreters and compilers separate a program from its evaluator:

  - The tree (graph) is the "source code" (AST)
  - The Executor is the "virtual machine" (here, essentially a tree walker)

  This clean separation of data and logic makes the code more enjoyable to work with.
- Explicit State

  The visitor pattern has a crucial benefit: instead of storing state in a dictionary or context object and passing it around, your state lives on the Executor instance as plain attributes. Every evaluator that implements node logic has direct access to the Executor, which again makes coding more enjoyable.
- Observability

  Even though the telemetry implementation is minimal, it provides insight into the execution flow of your tree/graph. Nodes are automatically traced along with their outputs, and LLM calls and vector retrievals are traced automatically when invoked inside a node.
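The interpreter analogy holds beyond LLM workflows. A toy arithmetic interpreter, sketched here purely for illustration, has exactly the same shape: the tree is inert data, and a single walker class holds all of the logic and state.

```python
from dataclasses import dataclass


# Inert data: the "source code" (AST).
@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


# The walker ("virtual machine"): all logic and state live here, not on the nodes.
class Calculator:
    def __init__(self):
        self.steps = 0  # explicit state: just an attribute on the instance

    def evaluate(self, node):
        self.steps += 1
        if isinstance(node, Num):
            return node.value
        if isinstance(node, Add):
            return self.evaluate(node.left) + self.evaluate(node.right)
        if isinstance(node, Mul):
            return self.evaluate(node.left) * self.evaluate(node.right)
        raise TypeError(f"unknown node: {node!r}")


calc = Calculator()
expr = Add(Num(2), Mul(Num(3), Num(4)))  # the tree for 2 + 3 * 4
result = calc.evaluate(expr)  # → 14
```

Swap `Num`/`Add`/`Mul` for `RouterNode`/`RAGNode`/`SpecialistNode` and `Calculator` for your Executor, and you have unillm's model.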
Because logic lives in the Executor rather than the graph, you get imperative control flow for free: retries, loops, and conditionals are just Python, with no special node types or graph topology required. For example, retry logic is an ordinary evaluator:
```python
@dataclass
class RetryNode(Node):
    child: Node
    max_retries: int = 3


@evaluator(RetryNode)
def _retry(self, node: RetryNode):
    for attempt in range(node.max_retries):
        try:
            return self.evaluate(node.child)
        except Exception:
            if attempt == node.max_retries - 1:
                raise
```

Everybody is welcome to contribute. To keep the process smooth, please read the following before opening a PR.
1. Open an issue first for any non-trivial change (a bug fix, new feature, or architectural change). This avoids wasted effort if the change doesn't align with the project's direction.
2. Fork the repository and create a branch named `feature/<your-feature>` or `fix/<your-fix>`.
3. Open a PR against `main` with a clear description of what you changed and why.
Before building something, ask whether it fits the three principles in the Philosophy section. unillm is intentionally minimal; if a feature adds complexity that most users won't need, it probably belongs in userland rather than the core framework.
Good candidates:
- Bug fixes
- New built-in Node types or Evaluator patterns that are broadly useful
- New LLM or vector store client integrations
- Observability improvements
Likely out of scope:
- Features that reintroduce graph complexity (e.g. edge-based state passing)
- Opinionated tooling that only fits specific use cases
When in doubt, open an issue and ask.
- Class and type names use CamelCase; variables and functions use snake_case
- Follow the existing separation of data (Nodes) and logic (Executors), don't add logic to Node definitions!
- Keep the end-user API Pythonic and minimal; if it feels clunky to use, it probably needs rethinking