unillm is a minimalist LLM framework that replaces complex graph abstractions with a clean, stateful interpreter pattern. It treats an AI workflow like a standard program: data goes in, a visitor processes it, and the state stays right where you can see it.
- Logic-First: Define your workflow as data (Nodes) and your logic as processors (Evaluators).
- Zero Prop-Drilling: Store your state in self within the Executor. No passing context dictionaries around.
- First-Class Observability: Built-in OpenTelemetry and Phoenix support.
- Type Safe: Native Pydantic support for structured data extraction.
Nodes are simple data structures that represent a step in your graph.
```python
from dataclasses import dataclass

from unillm.graph import Node


@dataclass
class RouterNode(Node):
    routes: dict[str, Node]
    model: str = "gemini-2.5-flash"  # used by the routing LLM call below


@dataclass
class RAGNode(Node):
    model: str


@dataclass
class SpecialistNode(Node):
    model: str
    instructions: str
```

Create an Executor to act as the interpreter. Ideally, each request gets its own instance, ensuring clean state isolation.
NOTE: When you call `executor.execute(tree, query)`, the input is stored on the Executor instance as `self.query`. This makes it directly accessible in every evaluator without passing it around explicitly.
```python
from unillm.graph import Executor, evaluator
from unillm.models import GoogleLLMClient
from unillm.vectorstore import PineconeStore


class MyAssistant(Executor):
    def __init__(self):
        super().__init__()
        self.client = GoogleLLMClient(api_key="...")
        self.vector_store = PineconeStore(
            api_key="your-api-key",
            index="some-index",
            namespace="some-namespace",  # "__default__" if not set
            embedding_client=self.client,
            embedding_model="gemini-embedding-001",
        )

    @evaluator(RouterNode)
    def _route(self, node: RouterNode):
        response = self.client.generate_content(
            model=node.model,
            prompt=self.query,
            system_prompt=(
                "Determine whether the query is a technical question about "
                "unillm. If so, reply ONLY with 'unillm'; otherwise reply 'chat'."
            ),
        )
        route = node.routes.get(response.text, "I'm sorry, I can't help you with that.")
        # route is either a Node or a string. If it is a Node, the Executor
        # evaluates it and returns the result; any non-Node value is returned as-is.
        return route

    @evaluator(RAGNode)
    def _rag_flow(self, node: RAGNode):
        # Retrieve context
        docs = self.vector_store.retrieve(self.query, top_k=3)
        context = "\n".join(d.metadata["text"] for d in docs)
        # Answer using the context
        res = self.client.generate_content(
            model=node.model,
            prompt=f"Context:\n{context}\n\nQuestion: {self.query}",
            system_prompt="Answer strictly using the provided context.",
        )
        return res.text

    @evaluator(SpecialistNode)
    def _handle_specialty(self, node: SpecialistNode):
        res = self.client.generate_content(
            model=node.model,
            prompt=self.query,
            system_prompt=node.instructions,
        )
        return res.text
```

The visitor traverses the nodes, updating the internal state as you defined it in the evaluators.
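To make the dispatch mechanics concrete, here is a minimal sketch of how this pattern can be implemented in plain Python. This is an illustration of the pattern, not unillm's actual source; the names merely mirror the API shown above.

```python
# Illustrative sketch only -- not unillm's actual implementation.
class Node:
    """Base class for inert data nodes."""


def evaluator(node_type):
    # Mark a method as the handler for one node type.
    def wrap(fn):
        fn._handles = node_type
        return fn
    return wrap


class Executor:
    def __init__(self):
        # Collect every method marked with @evaluator into a dispatch table.
        self._evaluators = {}
        for name in dir(self):
            attr = getattr(self, name)
            if callable(attr) and hasattr(attr, "_handles"):
                self._evaluators[attr._handles] = attr

    def evaluate(self, value):
        # Nodes dispatch to their evaluator; the result is evaluated again,
        # so an evaluator may return another Node. Non-Node values pass through.
        if isinstance(value, Node):
            handler = self._evaluators[type(value)]
            return self.evaluate(handler(value))
        return value

    def execute(self, tree, query):
        self.query = query  # the input lives on the instance
        result = None
        for node in tree:
            result = self.evaluate(node)
        return result
```

An Executor subclass then only declares `@evaluator` methods, exactly as in the example above.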
```python
# Build the tree/graph
tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(
            model="gemini-2.5-flash",
            instructions="You are a specialist in Generative AI",
        ),
    })
]

# Execute it
assistant = MyAssistant()
result = assistant.execute(tree, "What is unillm?")
print(result)
```

unillm has native support for OpenTelemetry and Phoenix (Arize). It is as simple as registering a tracer and adding the Phoenix middleware to your Executor.
```python
from phoenix.otel import register

from unillm.graph.middleware.phoenix import PhoenixMiddleware

register(
    endpoint="http://localhost:6006/v1/traces",
    project_name="test-project",
)

tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(
            model="gemini-2.5-flash",
            instructions="You are a specialist in Generative AI",
        ),
    })
]

assistant = MyAssistant()
assistant.add_middleware(PhoenixMiddleware())
result = assistant.execute(tree, "What is unillm?")
print(result)
```

The tree is stateless and can be safely defined once at module level and shared across all requests. The Executor, however, holds request-scoped state and should always be instantiated fresh per request.
```python
from fastapi import FastAPI

app = FastAPI()

# Define once
tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(
            model="gemini-2.5-flash",
            instructions="You are a specialist in Generative AI",
        ),
    })
]


@app.post("/chat")
async def chat(query: str):
    # Fresh instance per request
    assistant = MyAssistant()
    return assistant.execute(tree, query)
```

Most LLM frameworks treat workflows as a Directed Acyclic Graph (DAG) where state is passed along edges. While this is powerful for massive autonomous agents, it often means you spend more time debugging the framework's state management than your own code and prompts.
unillm is built for simpler systems that don't need that machinery. You can think of unillm as a bare-bones tree interpreter, because that is exactly what it is. Its philosophy rests on three points:
- Visitor Pattern

  In a standard graph, nodes often contain both data and logic. In unillm, nodes are just data; the logic is declared in your Executor. This is how interpreters and compilers separate a program from its evaluator:

  - The tree (graph) is the "source code" (AST)
  - The Executor is the "virtual machine" (here, essentially a tree walker)

  This clean separation of data and logic makes the code more enjoyable to work with.
- Explicit State

  The visitor pattern has a crucial benefit: instead of storing state in a dictionary or context object and passing it around, your state lives on the Executor instance as plain attributes. Every evaluator that implements node logic has direct access to the Executor, which again makes coding more enjoyable.
- Observability

  Even though the telemetry implementation is minimal, it provides insight into the execution flow of your tree/graph. Nodes are automatically traced along with their outputs, and LLM calls and vector retrievals are traced automatically when invoked inside a node.
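The interpreter analogy holds beyond LLM workflows. A toy arithmetic interpreter, sketched here purely for illustration, has exactly the same shape: the tree is inert data, and a single walker class holds all of the logic and state.

```python
from dataclasses import dataclass


# Inert data: the "source code" (AST).
@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


# The walker ("virtual machine"): all logic and state live here, not on the nodes.
class Calculator:
    def __init__(self):
        self.steps = 0  # explicit state: just an attribute on the instance

    def evaluate(self, node):
        self.steps += 1
        if isinstance(node, Num):
            return node.value
        if isinstance(node, Add):
            return self.evaluate(node.left) + self.evaluate(node.right)
        if isinstance(node, Mul):
            return self.evaluate(node.left) * self.evaluate(node.right)
        raise TypeError(f"unknown node: {node!r}")


calc = Calculator()
expr = Add(Num(2), Mul(Num(3), Num(4)))  # the tree for 2 + 3 * 4
result = calc.evaluate(expr)  # → 14
```

Swap `Num`/`Add`/`Mul` for `RouterNode`/`RAGNode`/`SpecialistNode` and `Calculator` for your Executor, and you have unillm's model.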
Because logic lives in the Executor rather than the graph, you get imperative control flow for free: retries, loops, and conditionals are just Python, with no special node types or graph topology required. For example, retry logic is an ordinary evaluator:
```python
@dataclass
class RetryNode(Node):
    child: Node
    max_retries: int = 3


@evaluator(RetryNode)
def _retry(self, node: RetryNode):
    for attempt in range(node.max_retries):
        try:
            return self.evaluate(node.child)
        except Exception:
            if attempt == node.max_retries - 1:
                raise
```

Everybody is welcome to contribute. To keep the process smooth, please read the following before opening a PR.
1. Open an issue first for any non-trivial change (a bug fix, new feature, or architectural change). This avoids wasted effort if the change doesn't align with the project's direction.
2. Fork the repository and create a branch named `feature/<your-feature>` or `fix/<your-fix>`.
3. Open a PR against `main` with a clear description of what you changed and why.
Before building something, ask whether it fits the three principles in the Philosophy section. unillm is intentionally minimal; if a feature adds complexity that most users won't need, it probably belongs in userland rather than the core framework.
Good candidates:
- Bug fixes
- New built-in Node types or Evaluator patterns that are broadly useful
- New LLM or vector store client integrations
- Observability improvements
Likely out of scope:
- Features that reintroduce graph complexity (e.g. edge-based state passing)
- Opinionated tooling that only fits specific use cases
When in doubt, open an issue and ask.
- Class and type names use CamelCase; variables and functions use snake_case
- Follow the existing separation of data (Nodes) and logic (Executors), don't add logic to Node definitions!
- Keep the end-user API Pythonic and minimal; if it feels clunky to use, it probably needs rethinking