bhoga

Zero-server Python library that routes AI harnesses to the best available provider for a model, based on live subscription quota.

bhoga (Sanskrit: route, path, enjoyment of resources) discovers quota automatically from Claude Code, Codex CLI, and HTTP response headers — no LLM calls, no servers, no configuration required to get started.

Quick start

from bhoga import Router, apply_to_hermes

router = Router()                          # background quota discovery starts immediately

rec = router.best_for("claude-opus-4")    # → RouterRecommendation | None
if rec is not None:
    print(rec.hermes_model)                # "claude-opus-4"  (normalised for Anthropic API)
    print(rec.provider_id, rec.quota_pct)  # claude_code 0.8

# After each API call, record the turn (non-blocking).
# `response` is the HTTP response returned by your own API client.
router.record_turn(
    pid="anthropic_api",
    model="claude-opus-4",
    input_tokens=1200,
    output_tokens=800,
    headers=dict(response.headers),
    status_code=response.status_code,
)

# Write recommendation directly into Hermes config.yaml
apply_to_hermes(router, "claude-opus-4")

Installation

pip install bhoga

Requires Python 3.11+.

How it works

No LLM dependency

Quota discovery uses structured APIs and regex parsing only:

Provider	Discovery method
Claude Code	Reads `~/.claude/.credentials.json` → OAuth API (`/api/oauth/usage`) → `claude -p /usage` fallback
Codex CLI	`codex --non-interactive /status` (JSON-first, TUI text fallback)
API providers	Calibrated from HTTP response headers after each turn
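
To make "calibrated from HTTP response headers" concrete, here is a minimal sketch that turns one response's rate-limit headers into a remaining fraction. It reads the token headers that Anthropic (`anthropic-ratelimit-tokens-*`) and OpenAI (`x-ratelimit-*-tokens`) document; whether bhoga's `calibrate()` reads these exact headers is an assumption, and `remaining_fraction` is a hypothetical name.

```python
from collections.abc import Mapping

# (remaining, limit) header pairs as documented by Anthropic and OpenAI.
# Which headers bhoga actually reads is an assumption of this sketch.
HEADER_PAIRS = (
    ("anthropic-ratelimit-tokens-remaining", "anthropic-ratelimit-tokens-limit"),
    ("x-ratelimit-remaining-tokens", "x-ratelimit-limit-tokens"),
)


def remaining_fraction(headers: Mapping[str, str]) -> float:
    """Return remaining/limit in 0.0-1.0, or -1.0 when no usable pair is present."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    for remaining_key, limit_key in HEADER_PAIRS:
        try:
            remaining = float(lowered[remaining_key])
            limit = float(lowered[limit_key])
        except (KeyError, ValueError):
            continue  # pair missing or not numeric; try the next vendor's headers
        if limit > 0:
            return max(0.0, min(1.0, remaining / limit))
    return -1.0  # same "unknown" sentinel that RouterRecommendation.quota_pct uses


print(remaining_fraction({
    "Anthropic-Ratelimit-Tokens-Remaining": "800",
    "anthropic-ratelimit-tokens-limit": "1000",
}))  # 0.8
```

Returning -1.0 rather than raising keeps a response without rate-limit headers from disturbing the rest of the turn.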

Automatic routing

best_for() walks the provider hierarchy for a model family and picks the provider with the most remaining quota. Priority breaks ties.

Family	Priority 1	Priority 2	Priority 3
anthropic	`claude_code`	`anthropic_api`	`github-copilot`
openai	`openai-codex`	`openai_api`	`github-copilot`

Dual cadence for subscription providers

Claude Code and Codex have two rate-limit windows:

burst (5 hours) — short-term usage cap
weekly (7 days) — longer-term usage cap

best_for() uses the tighter window (minimum remaining %) for routing decisions.

`Router.set_quota()` — manual fallback

When automatic discovery is unavailable (no CLI installed, running in CI) or you already know the quota state, set quota explicitly:

router = Router()

# Windowed providers (burst + optional separate weekly)
router.set_quota("openai-codex", pct_remaining=75.0)
router.set_quota("claude_code",  pct_remaining=90.0, weekly_pct=60.0)

# Request/token-based providers
router.set_quota("github-copilot", pct_remaining=85.0)
router.set_quota("anthropic_api",  pct_remaining=50.0)

# Target a specific model (default is "*" = all models for this provider)
router.set_quota("anthropic_api", pct_remaining=60.0, model="claude-opus-4")

Parameters:

Parameter	Type	Description
`pid`	`str`	Provider ID — `"claude_code"`, `"openai-codex"`, `"github-copilot"`, `"anthropic_api"`, `"openai_api"`
`pct_remaining`	`float`	Percentage of quota remaining, 0–100. For windowed providers, sets the burst window.
`weekly_pct`	`float | None`	Weekly window remaining percentage (windowed providers only). Defaults to `pct_remaining`.
`model`	`str`	Model key. Default `"*"` covers all models for this provider.

State is persisted to ~/.cache/bhoga/state.json (set the BHOGA_STATE env var to use a different path), so it survives restarts. Each write goes through an atomic rename, so the file is never left half-written.

Typical harness pattern

router = Router()

# Try automatic discovery first
rec = router.best_for("claude-opus-4")

# If our own monitoring already knows the quota, override it
# (`my_monitoring` is a placeholder for your harness's own tracking)
if my_monitoring.claude_code_burst_pct is not None:
    router.set_quota("claude_code",
                     pct_remaining=my_monitoring.claude_code_burst_pct,
                     weekly_pct=my_monitoring.claude_code_weekly_pct)
    rec = router.best_for("claude-opus-4")

apply_to_hermes(router, "claude-opus-4")

Codex model discovery

from bhoga import get_codex_models

models = get_codex_models()
# ['gpt-5.4', 'gpt-5.3-codex', 'gpt-5.4-mini', ...]

Resolution order (no network call):

~/.codex/models_cache.json — written by Codex CLI on first run
~/.codex/config.toml — the model the user has configured
Built-in defaults (current Codex model line-up)

`apply_to_hermes()` — Hermes integration

Writes the routing recommendation directly into ~/.hermes/config.yaml:

apply_to_hermes(router, "claude-opus-4")
# Writes:
#   model:
#     default: claude-opus-4
#     provider: anthropic
#     api_mode: anthropic_messages
#   compression:
#     summary_provider: anthropic

apply_to_hermes(router, "gpt-5.4", write_auxiliary=True)
# Also writes all auxiliary task providers (vision, compression, …)

Provider → Hermes mapping:

bhoga `provider_id`	Hermes canonical ID	`api_mode`
`claude_code`	`anthropic`	`anthropic_messages`
`anthropic_api`	`anthropic`	`anthropic_messages`
`openai-codex`	`openai-codex`	`codex_responses`
`openai_api`	`openrouter`	`chat_completions`
`github-copilot`	`copilot`	`chat_completions`
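
The mapping table translates directly into a lookup. This hypothetical `hermes_model_section` reproduces only the `model:` keys from the `apply_to_hermes()` example above; compression and auxiliary settings are left out.

```python
# The provider -> Hermes mapping from the table above, as data.
HERMES_PROVIDER: dict[str, tuple[str, str]] = {
    "claude_code":    ("anthropic",    "anthropic_messages"),
    "anthropic_api":  ("anthropic",    "anthropic_messages"),
    "openai-codex":   ("openai-codex", "codex_responses"),
    "openai_api":     ("openrouter",   "chat_completions"),
    "github-copilot": ("copilot",      "chat_completions"),
}


def hermes_model_section(pid: str, hermes_model: str) -> dict[str, str]:
    """Build the `model:` section shown in the apply_to_hermes() example."""
    provider, api_mode = HERMES_PROVIDER[pid]  # KeyError for an unknown provider ID
    return {"default": hermes_model, "provider": provider, "api_mode": api_mode}


print(hermes_model_section("claude_code", "claude-opus-4"))
# {'default': 'claude-opus-4', 'provider': 'anthropic', 'api_mode': 'anthropic_messages'}
```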

Model name normalisation (per hermes_cli/model_normalize.py):

Target provider	Format	Example
`openrouter`	`vendor/model`	`openai/gpt-5.4`
`anthropic`	bare, dots→hyphens	`claude-sonnet-4-6`
`copilot`	bare, dots preserved	`claude-sonnet-4.6`
`openai-codex`	bare, dots preserved	`gpt-5.4`
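
The four rules in the table can be sketched as below. `normalise` and `guess_vendor` are hypothetical; in particular the prefix heuristic for choosing the OpenRouter vendor is an assumption, and bhoga's own `model_family()` is the documented way to classify a model.

```python
def guess_vendor(model: str) -> str:
    # Hypothetical prefix heuristic; bhoga's model_family() is the real mapping.
    if model.startswith("claude"):
        return "anthropic"
    if model.startswith("gpt"):
        return "openai"
    if model.startswith("gemini"):
        return "google"
    raise ValueError(f"no vendor prefix known for {model!r}")


def normalise(target: str, model: str) -> str:
    """Apply the per-target naming rules from the table above."""
    bare = model.split("/", 1)[-1]  # strip any existing "vendor/" prefix
    if target == "openrouter":
        return f"{guess_vendor(bare)}/{bare}"
    if target == "anthropic":
        return bare.replace(".", "-")  # claude-sonnet-4.6 -> claude-sonnet-4-6
    if target in ("copilot", "openai-codex"):
        return bare  # dots preserved
    raise ValueError(f"unknown target provider {target!r}")
```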

For GitHub Copilot, a non-destructive providers.copilot stub is written so Hermes knows the base URL and auth env var (COPILOT_GITHUB_TOKEN).
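
One reasonable reading of "non-destructive" is a merge that only fills in keys the user has not set. A sketch on a plain dict (YAML loading and dumping left out); the key names `base_url` and `api_key_env` and both URLs are hypothetical placeholders, since the real stub contents are not given here.

```python
def merge_copilot_stub(config: dict, stub: dict[str, str]) -> dict:
    """Add providers.copilot defaults without overwriting keys the user already set."""
    providers = config.setdefault("providers", {})
    entry = providers.setdefault("copilot", {})
    for key, value in stub.items():
        entry.setdefault(key, value)  # existing user values win
    return config


# Key names and URLs are placeholders, not Hermes' real schema.
config = {"providers": {"copilot": {"base_url": "https://proxy.example.com"}}}
merge_copilot_stub(config, {"base_url": "https://copilot.example.invalid",
                            "api_key_env": "COPILOT_GITHUB_TOKEN"})
print(config["providers"]["copilot"])
# {'base_url': 'https://proxy.example.com', 'api_key_env': 'COPILOT_GITHUB_TOKEN'}
```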

GitHub Copilot auth

bhoga resolves the Copilot token in this order (matching hermes_cli/copilot_auth.py):

COPILOT_GITHUB_TOKEN → GH_TOKEN → GITHUB_TOKEN

Supported token types: gho_ (OAuth), github_pat_ (fine-grained PAT), ghu_ (GitHub App).
Classic PATs (ghp_) are not supported by the Copilot API.

API reference

`Router`

Router(*, state_path: Path | None = None, eager: bool = True)

state_path — override default state file location
eager=True — start background quota discovery immediately

Methods:

Method	Description
`best_for(model)`	Returns `RouterRecommendation | None`
`record_turn(pid, model, input_tokens, output_tokens, headers, status_code)`	Non-blocking; queued for background calibration
`set_quota(pid, pct_remaining, *, weekly_pct=None, model="*")`	Manual quota override
`quotas()`	Returns a snapshot of all tracked `ProviderQuota` objects

Module-level helpers

Function	Description
`apply_to_hermes(router, model, *, config_path=None, write_auxiliary=False)`	Write routing recommendation to Hermes config
`get_codex_models()`	List available Codex model IDs from local files or defaults
`model_family(model)`	Map a model name to its family (`"anthropic"`, `"openai"`, `"google"`, `"other"`)
`to_hermes_model(pid, mid)`	Normalise a model name for the given bhoga provider
`calibrate(q, headers)`	Update `ProviderQuota` from HTTP response headers
`mark_throttled(q, retry_after)`	Mark a provider as throttled
`check_quota(pid)`	Run quota discovery for a subscription provider

`RouterRecommendation`

@dataclass
class RouterRecommendation:
    provider_id:  str           # e.g. "claude_code"
    model_id:     str           # e.g. "claude-opus-4"
    quota_pct:    float         # 0.0–1.0, or -1.0 if unknown
    status:       ProviderStatus
    hermes_model: str           # normalised model name for target provider
    priority:     int
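
Note that `quota_pct` is a fraction from 0.0 to 1.0 with -1.0 for unknown, whereas `set_quota()` takes percentages from 0 to 100. A small hypothetical formatter that respects the sentinel:

```python
def describe_quota(quota_pct: float) -> str:
    """Render RouterRecommendation.quota_pct for logs or a status line."""
    if quota_pct < 0:  # -1.0 means the quota is unknown
        return "quota unknown"
    # quota_pct is a 0.0-1.0 fraction; set_quota() by contrast takes 0-100.
    return f"{quota_pct * 100:.0f}% remaining"


print(describe_quota(0.80))  # 80% remaining
print(describe_quota(-1.0))  # quota unknown
```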

State persistence

Quota state is stored in ~/.cache/bhoga/state.json. Override with BHOGA_STATE:

BHOGA_STATE=/tmp/my-bhoga-state.json python my_harness.py

All writes are atomic (write .tmp → rename) so concurrent processes cannot corrupt state.
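
A minimal sketch of the path override and the write-then-rename pattern, assuming JSON content (the actual state layout is not documented here, and both function names are hypothetical). `tempfile.mkstemp` gives every writer its own temp file, and `os.replace` swaps it in atomically on the same filesystem, so a reader sees either the old file or the new one. Atomic rename does not by itself merge concurrent updates: if two processes write at once, the last rename wins.

```python
import json
import os
import tempfile
from pathlib import Path


def state_path() -> Path:
    """Default location, overridable with BHOGA_STATE."""
    override = os.environ.get("BHOGA_STATE")
    return Path(override) if override else Path.home() / ".cache" / "bhoga" / "state.json"


def write_state_atomically(path: Path, state: dict) -> None:
    """Write to a temp file beside `path`, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory means same filesystem, which os.replace needs to be atomic.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)  # never leave a stray .tmp behind on failure
        raise
```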

Thread safety

All Router methods are thread-safe. Background threads:

_init_bg — initial quota discovery (one-shot, daemon)
_worker_loop — single persistent daemon that drains the record_turn queue serially, eliminating TOCTOU races and unbounded thread spawning
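
The single-consumer pattern described for `_worker_loop` looks roughly like this. `TurnRecorder`, `handle` and `wait_idle` are hypothetical names, not bhoga's code. Because only one thread ever applies updates, two turns cannot interleave a read-modify-write of the same quota record, and callers never wait on calibration work.

```python
import logging
import queue
import threading
from collections.abc import Callable


class TurnRecorder:
    """One daemon thread drains an unbounded queue, so turns are handled serially."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._queue: queue.Queue[dict] = queue.Queue()
        self._handle = handle
        self._worker = threading.Thread(target=self._worker_loop, daemon=True)
        self._worker.start()

    def record_turn(self, **turn) -> None:
        self._queue.put_nowait(turn)  # returns immediately; the queue has no maxsize

    def _worker_loop(self) -> None:
        while True:
            turn = self._queue.get()
            try:
                self._handle(turn)  # only this thread ever calls the handler
            except Exception:
                logging.exception("turn handling failed; worker keeps running")
            finally:
                self._queue.task_done()

    def wait_idle(self) -> None:
        """Block until every queued turn has been handled (handy in tests)."""
        self._queue.join()


seen: list[dict] = []
recorder = TurnRecorder(seen.append)
recorder.record_turn(pid="anthropic_api", model="claude-opus-4", input_tokens=1200)
recorder.wait_idle()
print(seen)  # [{'pid': 'anthropic_api', 'model': 'claude-opus-4', 'input_tokens': 1200}]
```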
