Releases: llm0ai/llm0
v0.2.0
What's Changed
- Moved mrmushfiq/llm0-gateway to llm0ai/llm0
- chore: rename module to github.com/llm0ai/llm0 by @mrmushfiq in #1
Full Changelog: v0.1.3...v0.2.0
v0.1.3
Summary
Patch release: cleaner Ollama streaming, consistent cost display, better request validation, and SDK examples in the README.
Fixed
- Filter empty role-only SSE chunks from Ollama streams (`OLLAMA_FILTER_EMPTY_CHUNKS`, default true)
- Validate empty model/messages before calling upstream (catches smart-quote JSON paste bugs)
- Round `cost_usd` to 6 decimals so headers, JSON body, and logs match
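The validation and rounding fixes above can be sketched roughly as follows. This is an illustrative sketch, not the gateway's actual code: the `ChatRequest` and `Message` types and the `validateRequest` and `roundCost` helpers are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strings"
)

// Message and ChatRequest are hypothetical types that mirror the
// OpenAI-style chat request body; the real gateway types may differ.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// validateRequest rejects a request with an empty model or no messages,
// so the error surfaces before any upstream provider is called.
func validateRequest(r ChatRequest) error {
	if strings.TrimSpace(r.Model) == "" {
		return errors.New("model must not be empty")
	}
	if len(r.Messages) == 0 {
		return errors.New("messages must not be empty")
	}
	return nil
}

// roundCost rounds a USD amount to 6 decimal places so the same value
// can be written to headers, the JSON body, and logs.
func roundCost(usd float64) float64 {
	return math.Round(usd*1e6) / 1e6
}

func main() {
	fmt.Println(validateRequest(ChatRequest{}))
	fmt.Printf("%.6f\n", roundCost(0.0000123456))
}
```

Rounding once and reusing the rounded value everywhere is what keeps the three outputs in agreement; rounding separately at each call site with different precisions is the kind of drift this fix addresses.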
Added
- README: Python and Node examples (single request + simple agent loop with per-user headers)
Upgrade
```bash
git pull
docker compose up -d --build gateway
# Only needed if the gateway was running during the build:
docker compose up -d --force-recreate gateway
```
Full changelog: CHANGELOG.md
v0.1.2
Patch release. Redis durability fix + config-propagation documentation corrections. No schema changes, no env var changes, no API changes.
Fixed
- Redis AOF persistence actually enabled in `docker-compose.yml`. The README and design doc both stated AOF was on; the compose file never set it, and there was no data volume, so `docker compose down` (or an OOM restart) silently wiped every spend counter. The redis service now runs with `--appendonly yes --appendfsync everysec` and a dedicated `redis_data` named volume.
- Config-propagation docs corrected. `README.md` previously stated that per-project settings (`monthly_cap_usd`, `rate_limit_per_minute`, `cache_enabled`, `semantic_cache_enabled`, `semantic_threshold`) propagate within `CUSTOMER_LIMIT_CACHE_TTL_SECONDS` (default 60s). That is wrong: they ride the Redis `apikey:*` auth cache, which uses `CACHE_TTL_SECONDS` (default 3600s / 1 hour). `CUSTOMER_LIMIT_CACHE_TTL_SECONDS` governs only the in-process `customer_limits` cache for end-user spend/request caps.
Added (docs only)
- `CUSTOMER_LIMIT_CACHE_TTL_SECONDS` now documented in the env var table.
- Updated `CACHE_TTL_SECONDS` description to reflect its dual role (exact-match cache TTL and API-key auth cache TTL).
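To make the two propagation windows concrete, here is a minimal sketch of how the two TTLs might be read, using the defaults stated in these notes (3600s and 60s). The `ttlFromEnv` helper is hypothetical and is not taken from the gateway source; only the env var names and defaults come from the release notes.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// ttlFromEnv is a hypothetical helper: it reads a TTL in seconds via the
// given lookup function and falls back to defSeconds when the variable is
// unset, not a number, or not positive.
func ttlFromEnv(lookup func(string) (string, bool), name string, defSeconds int) time.Duration {
	if raw, ok := lookup(name); ok {
		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
			return time.Duration(n) * time.Second
		}
	}
	return time.Duration(defSeconds) * time.Second
}

func main() {
	// Per-project settings ride the API-key auth cache (CACHE_TTL_SECONDS).
	authTTL := ttlFromEnv(os.LookupEnv, "CACHE_TTL_SECONDS", 3600)
	// End-user spend/request caps use the in-process customer_limits cache.
	limitTTL := ttlFromEnv(os.LookupEnv, "CUSTOMER_LIMIT_CACHE_TTL_SECONDS", 60)

	fmt.Println("per-project settings may take up to", authTTL, "to propagate")
	fmt.Println("end-user caps may take up to", limitTTL, "to propagate")
}
```

With both variables unset, a change to `monthly_cap_usd` could take up to an hour to be seen, while a change to an end-user cap would be picked up within a minute.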
Upgrade notes
```bash
git pull
docker compose down
docker compose up -d
```
The new `redis_data` volume starts empty.
Nothing else needs to be rebuilt: the gateway Go binary and the embedding image are unchanged.
Full diff: v0.1.1...v0.1.2
v0.1.1 — First public release
First public release of llm0-gateway.
An OpenAI-compatible LLM gateway with automatic failover, two-tier caching (exact + semantic), SSE streaming, per-customer spend caps, and scheduled maintenance workers. Runs locally via Docker Compose or `go run` and fronts four providers (OpenAI, Anthropic, Gemini, local Ollama) behind a single `/v1/chat/completions` endpoint.
Highlights
- Four providers, one endpoint — prefix-based routing, drop-in OpenAI client compat
- Automatic failover — 429 / 5xx / 404 / timeout / network, configurable `FAILOVER_MODE`
- Streaming (SSE) across all four providers with trailing metadata frames
- Two-tier caching — Redis (hot, <2 ms) + Postgres (warm) for exact match; pgvector HNSW for semantic (0.954 similarity hits in ~41 ms, $0 cost)
- Per-customer spend caps — daily/monthly USD, block or downgrade on overflow
- Scheduled workers — monthly spend cleanup, log retention, reconciliation
- 1400+ RPS sustained on the hot path, with cache-hit latency of ~11 ms (p50) / 16 ms (p99) and rejections served in 2 ms
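The failover triggers listed above (429 / 5xx / 404 / timeout / network) could be classified along these lines. This is a sketch under assumptions: `shouldFailover` is a hypothetical function, the treatment of client-side cancellation is my own choice, and the sketch does not model `FAILOVER_MODE`, whose values are not described in these notes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// shouldFailover is a hypothetical classifier: it reports whether a
// response from one provider should cause the gateway to try the next.
// status is the upstream HTTP status (ignored when err is non-nil);
// err is the transport-level error, if any.
func shouldFailover(status int, err error) bool {
	if err != nil {
		// Assumption: a request cancelled by the caller is not an
		// upstream failure, so it does not trigger failover.
		if errors.Is(err, context.Canceled) {
			return false
		}
		// Timeouts and other network errors trigger failover.
		return true
	}
	switch {
	case status == http.StatusTooManyRequests: // 429
		return true
	case status == http.StatusNotFound: // 404
		return true
	case status >= 500 && status <= 599: // 5xx
		return true
	}
	return false
}

func main() {
	fmt.Println("429:", shouldFailover(http.StatusTooManyRequests, nil))
	fmt.Println("400:", shouldFailover(http.StatusBadRequest, nil))
	fmt.Println("timeout:", shouldFailover(0, context.DeadlineExceeded))
	fmt.Println("cancelled:", shouldFailover(0, context.Canceled))
}
```

Other 4xx responses such as 400 are left out of the failover set here because the highlight above names only 429 and 404 among the client-error statuses.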
Note on versioning
The first tag was accidentally pushed as v1.0.0 and has been withdrawn. v0.1.1 is the first public release. Versions before 1.0 reflect pre-stable status — the HTTP surface is intended to stay OpenAI-compatible, but operational semantics may shift in patch releases.
Links
- README — setup + features
- CHANGELOG — full list
- Managed cloud waitlist