Skip to content

Releases: llm0ai/llm0

v0.2.0

28 May 21:11

Choose a tag to compare

What's Changed

  • moved mrmshfiq/llm0-gateway to llm0ai/llm0
  • chore: rename module to github.com/llm0ai/llm0 by @mrmushfiq in #1

Full Changelog: v0.1.3...v0.2.0

v0.1.3

25 May 20:29

Choose a tag to compare

Summary

Patch release: cleaner Ollama streaming, consistent cost display, better request validation, and SDK examples in the README.

Fixed

  • Filter empty role-only SSE chunks from Ollama streams (OLLAMA_FILTER_EMPTY_CHUNKS, default true)
  • Validate empty model/messages before calling upstream (catches smart-quote JSON paste bugs)
  • Round cost_usd to 6 decimals so headers, JSON body, and logs match

Added

  • README: Python and Node examples (single request + simple agent loop with per-user headers)

Upgrade

```bash
git pull
docker compose up -d --build gateway
docker compose up -d --force-recreate gateway ( if gateway was running during the build )
```
Full changelog: CHANGELOG.md
EOF

v0.1.2

20 Apr 15:52

Choose a tag to compare

Patch release. Redis durability fix + config-propagation documentation corrections. No schema changes, no env var changes, no API changes.

Fixed

  • Redis AOF persistence actually enabled in docker-compose.yml. The README and design doc both stated AOF was on; the compose file never set it, and there was no data volume, so docker compose down (or an OOM restart) silently wiped every spend counter. The redis service now runs with --appendonly yes --appendfsync everysec and a dedicated redis_data named volume.
  • Config-propagation docs corrected. README.md previously stated that per-project settings (monthly_cap_usd, rate_limit_per_minute, cache_enabled, semantic_cache_enabled, semantic_threshold) propagate within CUSTOMER_LIMIT_CACHE_TTL_SECONDS (default 60s). That is wrong — they ride the Redis apikey:* auth cache, which uses CACHE_TTL_SECONDS (default 3600s / 1 hour). CUSTOMER_LIMIT_CAE_TTL_SECONDS governs only the in-process customer_limits cache for end-user spend/request caps.

Added (docs only)

  • CUSTOMER_LIMIT_CACHE_TTL_SECONDS now documented in the env var table.
  • Updated CACHE_TTL_SECONDS description to reflect its dual role (exact-match cache TTL and API-key auth cache TTL).

Upgrade notes

```bash
git pull
docker compose down
docker compose up -d
```

The new `redis_data` volume starts empty.
Nothing else needs to be rebuilt: the gateway Go binary and the embedding image are unchanged.

Full diff: v0.1.1...v0.1.2

v0.1.1 — First public release

20 Apr 07:54

Choose a tag to compare

First public release of llm0-gateway.
An OpenAI-compatible LLM gateway with automatic failover, two-tier caching (exact + semantic), SSE streaming, per-customer spend caps, and scheduled maintenance workers. Runs locally via Docker Compose or go run and fronts four providers (OpenAI, Anthropic, Gemini, local Ollama) behind a single /v1/chat/completions endpoint.

Highlights

  • Four providers, one endpoint — prefix-based routing, drop-in OpenAI client compat
  • Automatic failover — 429 / 5xx / 404 / timeout / network, configurable FAILOVER_MODE
  • Streaming (SSE) across all four providers with trailing metadata frames
  • Two-tier caching — Redis (hot, <2 ms) + Postgres (warm) for exact match; pgvector HNSW for semantic (0.954 similarity hits in ~41 ms, $0 cost)
  • Per-customer spend caps — daily/monthly USD, block or downgrade on overflow
  • Scheduled workers — monthly speche cleanup, log retention, reconciliation
  • 1400+ RPS sustained on the hot path with p50/p99 cache-hit at ~11ms / 16 ms and rejection took 2 ms

Note on versioning

The first tag was accidentally pushed as v1.0.0 and has been withdrawn. v0.1.1 is the first public release. Versions before 1.0 reflect pre-stable status — the HTTP surface is intended to stay OpenAI-compatible, but operational semantics may shift in patch releases.

Links