
robbiebusinessacc/justllm

justllm


Production LLM calls. Just the three lines.


from justllm import LLM

llm = LLM("anthropic/claude-opus-4-8")
llm("Summarize this contract.")

Out of the box, that call already does the work you'd normally wire up yourself:

  • Context compression. Headroom shrinks tool output by 50–95% before it reaches the model.
  • Prompt-cache optimization. Cache breakpoints go where each provider wants them (Anthropic, OpenAI, Google).
  • Reliability. Calls retry with backoff, then fail over to the next provider.

You don't call any of these yourself; they run inside llm(...). To turn compression and prompt-cache optimization off for a given client: LLM(model, compress=False, cache="off").
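The reliability step ("retry with backoff, then fail over to the next provider") is a standard pattern. Below is a minimal, stdlib-only sketch of it, not justllm's actual code. `call_with_failover`, `AllProvidersFailed`, and the fake providers are hypothetical names, and each provider is modeled as a plain function from prompt to text.

```python
import time
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Hypothetical error raised when every provider has exhausted its retries."""


def call_with_failover(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try providers in order, retrying each with exponential backoff before failing over."""
    errors: list[Exception] = []
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except Exception as exc:  # a production client would retry only transient errors (timeouts, 429s, 5xx)
                errors.append(exc)
                if attempt < retries - 1:
                    # Wait 0.5s, 1s, 2s, ... before the next attempt on the same provider.
                    sleep(base_delay * (2 ** attempt))
    raise AllProvidersFailed(errors)


# Usage with two hypothetical providers: the first always times out, the second works.
def flaky(prompt: str) -> str:
    raise TimeoutError("provider A timed out")


def healthy(prompt: str) -> str:
    return f"summary of: {prompt}"


delays: list[float] = []  # record waits instead of actually sleeping
print(call_with_failover([flaky, healthy], "Summarize this contract.", sleep=delays.append))
# → summary of: Summarize this contract.
print(delays)  # → [0.5, 1.0]
```

Injecting `sleep` keeps the backoff testable; the failing provider gets three attempts (two waits) before the call moves on to the next one.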

pip install 'justllm[all]'

More, when you need it

You set up llm once (those three lines). After that, each of these is a single call on it. Reach for the ones you need and ignore the rest:

llm.stream("...")                    # token streaming
await llm.acall("...")               # async
llm.map(prompts, concurrency=8)      # many prompts at once, in order
llm.extract(Invoice, text)           # structured output (validated Pydantic)
llm.chat()                           # multi-turn, keeps history
llm.agent(system="...").run("...")   # tool-calling loop
llm.judge(output, criteria="...")    # LLM-as-judge score
llm.evaluate(cases)                  # run + grade a test set
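llm.map is described only as running "many prompts at once, in order". The sketch below shows one common way to get that behavior with asyncio: a semaphore caps how many calls are in flight, and gather returns results in input order. It is not justllm's implementation. `ordered_map` and `fake_llm` are hypothetical, and the model call is assumed to be an async function from prompt to text.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def ordered_map(
    fn: Callable[[str], Awaitable[str]],
    prompts: Sequence[str],
    concurrency: int = 8,
) -> list[str]:
    """Run fn over prompts with at most `concurrency` calls in flight, keeping input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(prompt: str) -> str:
        async with semaphore:  # waits here once `concurrency` calls are already running
            return await fn(prompt)

    # gather returns results in the order the awaitables were passed, not completion order.
    return list(await asyncio.gather(*(run_one(p) for p in prompts)))


async def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an async model call; longer prompts finish sooner.
    await asyncio.sleep(0.01 / len(prompt))
    return prompt.upper()


print(asyncio.run(ordered_map(fake_llm, ["a", "bb", "ccc"], concurrency=2)))
# → ['A', 'BB', 'CCC']
```

Even though "bb" finishes before "a", the output stays aligned with the input list, which is what lets callers zip prompts with results.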

Also available, all opt-in: llm.embed(...), routing (Router and Cascade), OpenTelemetry traces with the per-call dollar cost, Langfuse-backed prompts, and exact-match caching. Runnable versions of everything are in the cookbook.
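The README doesn't show the exact-match caching API, so here is a generic sketch of the idea only: responses are memoized under a hash of the exact model, prompt, and parameters. `ExactMatchCache` and `fake_call` are hypothetical names, not justllm's API, and parameters are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Callable


class ExactMatchCache:
    """Hypothetical wrapper: memoize responses keyed on the exact (model, prompt, params)."""

    def __init__(self, call: Callable[..., str]):
        self._call = call
        self._store: dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def key(model: str, prompt: str, **params) -> str:
        # sort_keys makes the key independent of keyword-argument order;
        # params must be JSON-serializable for this to work.
        payload = json.dumps({"model": model, "prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __call__(self, model: str, prompt: str, **params) -> str:
        k = self.key(model, prompt, **params)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        result = self._call(model, prompt, **params)
        self._store[k] = result
        return result


calls: list[str] = []


def fake_call(model: str, prompt: str, **params) -> str:
    # Hypothetical model call that records each real invocation.
    calls.append(prompt)
    return f"{model}: {prompt}"


cached = ExactMatchCache(fake_call)
cached("m", "hello", temperature=0)
cached("m", "hello", temperature=0)   # identical request: served from the cache
cached("m", "hello ", temperature=0)  # one extra space: a miss
print(len(calls), cached.hits)  # → 2 1
```

"Exact match" is the trade-off: any difference in whitespace, model, or parameters is a miss, which keeps the cache predictable at the cost of a lower hit rate than semantic caching.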


Why

The ecosystem splits two ways: powerful but heavy (LiteLLM, LangChain), or simple but thin (aisuite, any-llm). justllm sits in the middle: every optimization is on by default, and the surface stays at three lines. Keeping it that small was most of the work.

Feature                      justllm                     LiteLLM       aisuite
three-line call              yes                         yes           yes
cross-provider fallback      on by default               config        no
context compression          on by default (Headroom)    manual trim   no
prompt-cache optimization    on by default               passthrough   no
structured output            yes (instructor)            passthrough   no
tool-calling agent           yes (minimal)               no            no
surface area                 tiny                        large         tiny

It runs on LiteLLM underneath, so think of it as the opinionated layer on top rather than a replacement.


Alpha. The wiring is tested on CI (Python 3.10–3.13) and the call paths are checked against live models.

Cookbook · Roadmap · Changelog · Contributing · MIT

About

Production LLM calls. Just the three lines. Cross-provider fallback, native caching, and reversible context compression on by default.
