vladimirdabic/unillm

unillm

unillm is a minimalist LLM framework that replaces complex graph abstractions with a clean, stateful interpreter pattern. It treats an AI workflow like a standard program: data goes in, a visitor processes it, and the state stays right where you can see it.

Why unillm?

  • Logic-First: Define your workflow as data (Nodes) and your logic as processors (Evaluators).
  • Zero Prop-Drilling: Store your state in self within the Executor. No passing context dictionaries around.
  • First-Class Observability: Built-in OpenTelemetry and Phoenix support.
  • Type Safe: Native Pydantic support for structured data extraction.
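As a generic illustration of the last point (a sketch of the pattern, not unillm's exact API — `TicketInfo` and the raw JSON are invented for the example), Pydantic can validate an LLM's JSON output into a typed model:

```python
from pydantic import BaseModel

class TicketInfo(BaseModel):
    priority: str
    summary: str

# Pretend this string came back from an LLM asked to answer in JSON.
raw = '{"priority": "high", "summary": "Login page returns 500"}'

# Validation fails loudly on malformed output; success yields typed access.
ticket = TicketInfo.model_validate_json(raw)
print(ticket.priority)  # high
```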

Quick Start

1. Define your nodes

Nodes are simple data structures that represent a step in your graph.

from dataclasses import dataclass, field
from unillm.graph import Node

@dataclass
class RouterNode(Node):
    model: str = "gemini-2.5-flash"
    routes: dict[str, Node] = field(default_factory=dict)

@dataclass
class RAGNode(Node):
    model: str

@dataclass
class SpecialistNode(Node):
    model: str
    instructions: str

2. Implement the Logic

Create an Executor to act as the interpreter. Ideally, each request gets its own instance, ensuring clean state isolation.

NOTE: When you call executor.execute(tree, query), the input is stored on the Executor instance as self.query. This makes it directly accessible in every evaluator without passing it around explicitly.

from unillm.graph import Executor, evaluator
from unillm.models import GoogleLLMClient
from unillm.vectorstore import PineconeStore

class MyAssistant(Executor):
    def __init__(self):
        super().__init__()

        self.client = GoogleLLMClient(api_key="...")
        self.vector_store = PineconeStore(
            api_key="your-api-key",
            index="some-index",
            namespace="some-namespace", # __default__ if not set
            embedding_client=self.client,
            embedding_model="gemini-embedding-001"
        )

    @evaluator(RouterNode)
    def _route(self, node: RouterNode):
        response = self.client.generate_content(
            model=node.model,
            prompt=self.query,
            system_prompt="If the query is a technical question about unillm, reply ONLY with 'unillm'; otherwise reply ONLY with 'chat'."
        )

        route = node.routes.get(response.text, "I'm sorry I can't help you with that.")

        # `route` is either a Node or a string. If it is a Node, it is evaluated
        # and its result is returned; any non-Node value is returned as-is.
        return route

    @evaluator(RAGNode)
    def _rag_flow(self, node: RAGNode):
        # Retrieve context
        docs = self.vector_store.retrieve(self.query, top_k=3)
        context = "\n".join([d.metadata['text'] for d in docs])

        # Answer using the context
        res = self.client.generate_content(
            model=node.model,
            prompt=f"Context:\n{context}\n\nQuestion: {self.query}",
            system_prompt="Answer strictly using the provided context."
        )

        return res.text

    @evaluator(SpecialistNode)
    def _handle_specialty(self, node: SpecialistNode):
        res = self.client.generate_content(
            model=node.model,
            prompt=self.query,
            system_prompt=node.instructions
        )

        return res.text

3. Run It

The visitor traverses the nodes, updating internal state as defined in your evaluators.

# Build the tree/graph
tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(model="gemini-2.5-flash", instructions="You are a specialist in Generative AI")
    })
]

# Execute it
assistant = MyAssistant()
result = assistant.execute(tree, "What is unillm?")

print(result)

Observability

unillm has native support for OpenTelemetry and Arize Phoenix. It is as simple as registering the tracer and attaching the Phoenix middleware implementation.

from unillm.graph.middleware.phoenix import PhoenixMiddleware
from phoenix.otel import register

register(
    endpoint="http://localhost:6006/v1/traces",
    project_name="test-project"
)

tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(model="gemini-2.5-flash", instructions="You are a specialist in Generative AI")
    })
]

assistant = MyAssistant()
assistant.add_middleware(PhoenixMiddleware())
result = assistant.execute(tree, "What is unillm?")

print(result)

Usage with FastAPI

The tree is stateless and can be safely defined once at module level and shared across all requests. The Executor, however, holds request-scoped state and should always be instantiated fresh per request.

from fastapi import FastAPI

app = FastAPI()

# Define once
tree = [
    RouterNode(routes={
        "unillm": RAGNode(model="gemini-2.5-flash"),
        "chat": SpecialistNode(model="gemini-2.5-flash", instructions="You are a specialist in Generative AI")
    })
]

@app.post("/chat")
async def chat(query: str):
    # Fresh instance per request
    assistant = MyAssistant()
    return assistant.execute(tree, query)

Philosophy

Most LLM frameworks treat workflows as a Directed Acyclic Graph (DAG) where state is passed along edges. While this is powerful for massive autonomous agents, it often means you spend more time debugging the framework's state management than your own code and prompts.

unillm is built for simpler systems that don't need that overhead. You can think of unillm as a bare-bones tree interpreter, because that is exactly what it is. unillm follows three principles:

  1. Visitor Pattern

    In a standard graph, the nodes often contain both data and logic. In unillm, nodes are just data; the logic is declared in your Executor. This mirrors how interpreters and compilers are built:

    • The tree (graph) is the "source code" (AST)
    • The Executor is the "virtual machine" (essentially a tree walker)

    This ensures a clean separation of data and logic, which makes it more enjoyable to code.

  2. Explicit State

    The visitor pattern has a crucial benefit: instead of storing state in a dictionary or context object and passing it around, your state lives as attributes on the Executor instance. Every evaluator that implements node logic has direct access to the Executor, which again makes coding more enjoyable.

  3. Observability

    Even though the telemetry implementation is minimal, it provides insight into the execution flow of your tree/graph. Nodes are automatically traced along with their outputs, and LLM calls and vector retrievals are traced automatically when invoked within a node.
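The interpreter analogy can be sketched in plain Python with no unillm imports (`Add`, `Num`, and `Walker` are illustrative names invented for this sketch, not part of unillm):

```python
from dataclasses import dataclass

@dataclass
class Add:          # "source code": pure data, no behavior
    left: object
    right: object

@dataclass
class Num:
    value: int

class Walker:       # "virtual machine": all logic lives here
    def evaluate(self, node):
        if isinstance(node, Num):
            return node.value
        if isinstance(node, Add):
            return self.evaluate(node.left) + self.evaluate(node.right)
        raise TypeError(f"no evaluator for {type(node).__name__}")

tree = Add(Num(2), Add(Num(3), Num(4)))
print(Walker().evaluate(tree))  # 9
```

unillm's `@evaluator` decorator plays the role of the `isinstance` dispatch here: each registered method handles one node type, and the Executor walks the tree.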

Because logic lives in the Executor rather than the graph, you get imperative control flow for free: retries, loops, and conditionals are just Python, with no special node types or graph topology required. For example, retry logic needs no special graph constructs:

@evaluator(RetryNode)
def _retry(self, node: RetryNode):
    for attempt in range(node.max_retries):
        try:
            return self.evaluate(node.child)
        except Exception:
            if attempt == node.max_retries - 1:
                raise
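The RetryNode referenced above is not defined in this snippet; following the same conventions as the other nodes, it might look like this (field names inferred from the evaluator, with `Node` stood in by a bare class to keep the sketch self-contained):

```python
from dataclasses import dataclass

# `Node` comes from unillm.graph in the real project.
class Node: ...

@dataclass
class RetryNode(Node):
    child: Node           # the node to re-evaluate on failure
    max_retries: int = 3  # attempts before the exception propagates
```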

Contributions

Everybody is welcome to contribute. To keep the process smooth, please read the following before opening a PR.

Getting Started

Open an issue first for any non-trivial change (a bug fix, new feature, or architectural change). This avoids wasted effort if the change doesn't align with the project's direction. Fork the repository and create a branch named feature/<your-feature> or fix/<your-fix>. Open a PR against main with a clear description of what you changed and why.

Is My Contribution In Scope?

Before building something, ask whether it fits the three principles in the Philosophy section. unillm is intentionally minimal; if a feature adds complexity that most users won't need, it probably belongs in userland rather than in the core framework.

Good candidates:

  • Bug fixes
  • New built-in Node types or Evaluator patterns that are broadly useful
  • New LLM or vector store client integrations
  • Observability improvements

Likely out of scope:

  • Features that reintroduce graph complexity (e.g. edge-based state passing)
  • Opinionated tooling that only fits specific use cases

When in doubt, open an issue and ask.

Code Style

  • Class and type names use CamelCase, variables and functions use snake_case
  • Follow the existing separation of data (Nodes) and logic (Executors), don't add logic to Node definitions!
  • Keep the end-user API Pythonic and minimal; if it feels clunky to use, it probably needs rethinking
