Update agent model configs and provider budgets for 2025 by ManuelKugelmann · Pull Request #127 · ManuelKugelmann/Augur

ManuelKugelmann · 2026-03-20T08:02:03Z

Summary

Updated agent model configurations, provider budget notes, and endpoint settings to reflect current API limits and model availability. Improved test script robustness for model validation.

Key Changes

Agent Model Configuration (agent-models.json)

Updated budget notes with accurate RPM/RPD/token limits for all providers (Groq, Cerebras, Gemini, GitHub Models, Mistral, Cohere, OpenRouter, Qwen)
Standardized Cerebras model to qwen-3-235b-a22b-instruct-2507 across all agents (replacing qwen3-235b and gpt-oss-120b)
Reassigned signals-analyst from GitHub Models to Groq for better budget efficiency
Moved news-the-augur from Cohere to Cerebras for consistency
Moved news-finanz-augur from GitHub Models to Mistral (leveraging Mistral's European language strength)
Updated L5 default from GitHub Models to Cohere (command-a-03-2025)
Clarified agent design philosophy: L1-L4 cron agents use production providers only; L5 live-chat and prototyping can use lower-budget providers

LibreChat Endpoint Configuration (librechat-user.yaml)

Updated provider comments with specific RPM/RPD/token limits
Groq: Added moonshotai/kimi-k2-instruct and qwen/qwen3-32b; removed kimi-k2-0905
Cerebras: Standardized to qwen-3-235b-a22b-instruct-2507; removed gpt-oss-120b
GitHub Models: Removed o4-mini (unavailable)
Qwen: Removed qwen-long
OpenRouter: Updated free model list to minimax/minimax-m2.5:free, nvidia/nemotron-3-super-120b-a12b:free, openai/gpt-oss-120b:free; removed Gemini and DeepSeek free models; updated titleModel

Test Script (test-models.py)

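Added a browser User-Agent header to requests so they pass Cloudflare bot protection (HTTP 403, error code 1010) on the Groq and Cerebras APIs
Improved response parsing to handle list-type content (e.g., reasoning models such as Mistral magistral that return content as an array of parts instead of a string)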
Enhanced robustness for edge cases (empty responses, non-string content)
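
The response-parsing change could look roughly like the sketch below. It is not the code from test-models.py: the function name `extract_text` and the assumption that text parts are dicts carrying a `"text"` key are illustrative only.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Flatten a chat-completion `content` field into plain text.

    Assumption: list-type content is a sequence of plain strings or dicts
    that carry their text under a "text" key; other parts are skipped.
    """
    if content is None:
        return ""  # empty response
    if isinstance(content, str):
        return content  # the common case
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif isinstance(part, dict) and isinstance(part.get("text"), str):
                parts.append(part["text"])
        return "".join(parts)
    return str(content)  # non-string scalar: fall back to its string form
```

Returning an empty string for missing content lets the caller treat "no text" as a single case instead of handling `None`, empty lists, and empty strings separately.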

Notable Details

All model names now use consistent naming conventions (e.g., qwen-3-235b-a22b-instruct-2507)
Budget allocation prioritizes production stability: high-budget providers (Groq, Cerebras, Gemini) for cron agents; lower-budget providers reserved for prototyping and live-chat fallbacks
OpenRouter free model selection updated to reflect current availability and performance

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

- Add browser User-Agent header to test-models.py to bypass Cloudflare bot protection (HTTP 403 error code 1010) on Groq and Cerebras APIs
- Handle list-type content in responses from reasoning models (Mistral magistral) that return content as array of parts instead of string
- Remove unavailable models: o4-mini (GitHub), qwen-long (Qwen)
- Update OpenRouter free model list to currently available models

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
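
As an illustration of the User-Agent change, a request could be built with the standard library as follows. This is a sketch under assumptions: the exact User-Agent string used in test-models.py is not given in this PR, so `BROWSER_USER_AGENT` is a placeholder, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Hypothetical browser-like value; the PR does not state the exact string.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_chat_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that identifies itself with a browser User-Agent."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Without this header urllib sends "Python-urllib/x.y".
            "User-Agent": BROWSER_USER_AGENT,
        },
        method="POST",
    )
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` would perform the call.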

Groq and Cerebras had hardcoded model lists with models that no longer exist (openai/gpt-oss-120b, kimi-k2-0905, qwen3-235b). Switch to fetch:true so LibreChat discovers available models from the API, keeping only known-good models as defaults. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Previous defaults (gemini-2.5-flash-preview, llama-4-scout, deepseek-r1-0528, qwen3-235b) don't exist in OpenRouter's free tier. Replaced with verified models from /models endpoint: gpt-oss-120b, nemotron-3-super-120b-a12b, llama-3.3-70b-instruct, hermes-3-llama-3.1-405b. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Add gpt-oss-120b, kimi-k2, qwen3-32b from confirmed /models response. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Model lists are now confirmed from live /models endpoints — no need for runtime discovery which can be blocked by Cloudflare. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
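
A check of this kind can be sketched as a comparison between the configured defaults and the IDs a provider reports. The sketch assumes an OpenAI-compatible `/models` response of the form `{"data": [{"id": ...}]}`; the helper name `missing_models` is hypothetical and not part of this repository.

```python
from typing import Iterable, List


def missing_models(configured: Iterable[str], models_response: dict) -> List[str]:
    """Return configured model IDs that the provider does not list.

    Assumption: `models_response` is the parsed JSON body of an
    OpenAI-compatible /models call, i.e. {"data": [{"id": "..."}, ...]}.
    """
    available = {
        entry["id"]
        for entry in models_response.get("data", [])
        if isinstance(entry, dict) and "id" in entry
    }
    return [model for model in configured if model not in available]
```

An empty result means every hardcoded default is still offered, which is the condition under which a static list is safe to ship.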

librechat-user.yaml:
- Add free usage amounts to each provider comment (RPM, RPD, tok/day)

agent-models.json:
- Fix Cerebras model IDs: qwen3-235b → qwen-3-235b-a22b-instruct-2507
- Remove gpt-oss-120b from Cerebras (doesn't exist there)
- Fix OpenRouter news agent: gemini-2.5-flash:free → minimax-m2.5:free
- Update budget notes with verified free tier limits

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

- Qwen: clarify one-time token grant (not monthly)
- GitHub Models: 10 RPM / 50 RPD for high-end, ~150 RPD for small
- OpenRouter: 20 RPM / 50 RPD (was ~50 RPD)
- agent-models.json: update all budget notes with verified numbers

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

- signals-analyst (L2): GitHub Models → Groq
- news-the-augur (L4): Cohere → Cerebras
- news-financial-augur (L4): OpenRouter → Groq
- news-finanz-augur (L4): GitHub Models → Mistral
- L5 default: GitHub Models → Cohere (allowed for live-chat tier)
- Cohere/GitHub Models/OpenRouter/Qwen reserved for prototyping and one-off bootstrap runs (e.g. Qwen's 1M-token data seeding)

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

- agent-models-bootstrap.json: Qwen-heavy config for initial data seeding (burns 1M-token free grant across qwen3-max/plus/flash)
- seed-agents.py --mode bootstrap|continuous: auto-selects models file
- Post-install info: model test, agent setup, bootstrap workflow
- 9 new tests: mode selection, bootstrap JSON validity, provider restrictions (no Qwen/GitHub/OpenRouter in L1-L4 continuous)

Workflow: seed --mode bootstrap → bootstrap-data.py → seed --mode continuous

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
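
The provider restriction for the continuous config can be expressed as a small validation function. The sketch below assumes a simplified schema (agent name mapped to a dict with `level` and `provider` keys); the real layout of agent-models.json is not shown in this PR, so the field names, provider spellings, and the function name are all hypothetical.

```python
from typing import Dict, List

# Providers that the PR says must not back L1-L4 agents in continuous mode.
# The lowercase spellings are an assumption about how the config names them.
RESTRICTED_PROVIDERS = {"qwen", "github", "openrouter"}


def restricted_cron_agents(agents: Dict[str, dict]) -> List[str]:
    """Return names of L1-L4 agents assigned to a restricted provider."""
    violations = []
    for name, cfg in agents.items():
        level = cfg.get("level")
        provider = str(cfg.get("provider", "")).lower()
        if isinstance(level, int) and 1 <= level <= 4 and provider in RESTRICTED_PROVIDERS:
            violations.append(name)
    return sorted(violations)
```

A test would then assert that the function returns an empty list for the continuous config, while L5 entries remain free to use any provider.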

- agent_client.py: shared AgentClient with discovery, streaming, .env loading; replaces duplicate code in cron dispatcher, trigger command, and bootstrap-data.py
- bootstrap-data.py: auto-loads AUGUR_AGENTS_API_KEY from .env, auto-discovers agent ID via API (no manual --api-key/--agent-id needed)
- Augur.sh cron/trigger: refactored to use AgentClient via PYTHONPATH
- Augur.sh bootstrap command: simplified, no more env var requirements
- 14 new tests for agent_client (load_env, find_agent, invoke, logging)
- Updated bootstrap tests for new AgentClient-based API

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
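
To illustrate the .env loading mentioned above, here is a minimal sketch of a `load_env` helper. Only the name `load_env` appears in this PR (in the test list); the signature, the parsing rules, and the `parse_env_text` helper are assumptions, not the actual agent_client.py code.

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key:
            values[key] = value
    return values


def load_env(
    path: str, environ: Optional[MutableMapping[str, str]] = None
) -> MutableMapping[str, str]:
    """Merge a .env file into `environ` without overriding existing keys."""
    env = os.environ if environ is None else environ
    env_file = Path(path)
    if env_file.is_file():
        for key, value in parse_env_text(env_file.read_text(encoding="utf-8")).items():
            env.setdefault(key, value)
    return env
```

Using `setdefault` means a variable already exported in the shell, such as AUGUR_AGENTS_API_KEY, takes precedence over the file; whether the real client makes the same choice is not stated in the PR.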

The agents command no longer uses --mode flags; it uses --group and --all instead. Bootstrap workflow simplified to reflect that AUGUR_AGENTS_API_KEY is now set in .env rather than auto-discovered. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

- --bootstrap: shorthand for --mode bootstrap (Qwen free tokens, core only)
- --trading/--news: shorthand for --group trading/news
- Remove --group flag from CLI (covered by shorthands + --all)
- Update post-install info with correct bootstrap workflow

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
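
The shorthand flags could be wired up with argparse along these lines. This is a sketch only: the program name `agents`, the function names, and the shape of the returned selection are hypothetical, and the real command may resolve defaults differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser with the shorthand flags described in this commit."""
    parser = argparse.ArgumentParser(prog="agents")  # placeholder program name
    parser.add_argument("--bootstrap", action="store_true",
                        help="use the bootstrap models file (Qwen free tokens)")
    parser.add_argument("--trading", action="store_true", help="select the trading group")
    parser.add_argument("--news", action="store_true", help="select the news group")
    parser.add_argument("--all", action="store_true", help="select every group")
    return parser


def resolve_selection(argv: Optional[List[str]] = None) -> dict:
    """Translate the flags into a mode plus the explicitly requested groups."""
    args = build_parser().parse_args(argv)
    mode = "bootstrap" if args.bootstrap else "continuous"
    groups = [name for name in ("trading", "news") if getattr(args, name)]
    return {"mode": mode, "groups": groups, "all": args.all}
```

For example, `resolve_selection(["--bootstrap"])` selects bootstrap mode with no extra groups, leaving the choice of default group to the caller.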

Don't prescribe --bootstrap as mandatory; show it as an alternative to regular agents for users who want to use free Qwen tokens. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Replace cron-planner as bootstrap target with a purpose-built bootstrap agent. It coordinates L1 data agents (market-data, osint-data, signals-data) which maintain the unified store interface, rather than calling data tools directly.

- New agent: bootstrap (L4, group: bootstrap) with edges to all 3 L1 agents
- New prompt: prompts/bootstrap.md with delegation rules per kind
- Update bootstrap-data.py default agent from cron-planner to bootstrap
- Add bootstrap to live-chat edges, both model config files, ALL_GROUPS
- Update test expectations for 17th agent

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Extend bootstrap from profiles-only to a full 4-phase bootstrapper:
- Phase 1: Profiles (existing)
- Phase 2: Timeseries (existing)
- Phase 3: Events - seed current events via L1 delegation (GDELT, market moves, trending signals) for countries/stocks/commodities/crypto/regions
- Phase 4: Plans - create initial research plans and watchlists so cron-planner has work from day one

New CLI flags: --events, --plans, --all-phases

Add google-news MCP to bootstrap agent tools for event research.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

…tats

Rewrite bootstrap-data.py:
- One agent call per kind per phase (not per batch of 10)
- No batching, no random sleeps; agent handles iteration internally
- Safe to re-run: enriches existing data, deduplicates events/plans
- Before/after profile coverage stats table
- Structured timestamped log output
- --phase flag for single phase, default runs all 4 phases
- 10min timeout per call (agent makes many tool calls per kind)

Rewrite agent_client.py:
- Retry with exponential backoff on 429, 5xx, timeout, network errors
- Up to 4 retries (2s, 4s, 8s, 16s), respects Retry-After header

Update tests for new API (no batch_targets, build_profiles_prompt, etc).

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
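
The retry policy described above (up to 4 retries at 2s, 4s, 8s, 16s, honoring Retry-After) can be sketched as follows. The names `RetryableError`, `is_retryable_status`, and `call_with_retry` are hypothetical, and letting a Retry-After value replace the computed delay is one possible reading of "respects"; the actual agent_client.py may differ.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the caller for 429, 5xx, timeout, or network failures."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def is_retryable_status(status: int) -> bool:
    """429 and any 5xx status are treated as transient."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RetryableError as exc:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** attempt)  # 2, 4, 8, 16 with the defaults
            if exc.retry_after is not None:
                delay = exc.retry_after  # assumption: server hint wins
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` as a parameter keeps the function testable without real waiting, since a test can pass a list's `append` method to record the delays.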

The EU Parliament API (data.europarl.europa.eu) regularly takes >60s to respond. Bump pytest-timeout to 120s for this specific test to avoid flaky CI failures. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

claude added 21 commits March 19, 2026 08:20

Enable fetch:true for OpenRouter to discover available models

9fcf7ac

Prioritize MiniMax M2.5 and Nemotron in OpenRouter defaults

3f20296

Update Groq defaults to verified available models

126bbf6

Add gpt-oss-120b, kimi-k2, qwen3-32b from confirmed /models response. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Add Cerebras Qwen-3-235B to defaults (verified from /models)

6277dcd

Switch all providers back to fetch:false with verified model lists

877e12a

Model lists are now confirmed from live /models endpoints — no need for runtime discovery which can be blocked by Cloudflare. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Note Qwen free quota 90-day expiry

2eeb922

Make bootstrap workflow flexible: regular or Qwen agents

0312852

Don't prescribe --bootstrap as mandatory; show it as an alternative to regular agents for users who want to use free Qwen tokens. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

Increase timeout for EU Parliament votes integration test

55b8077

The EU Parliament API (data.europarl.europa.eu) regularly takes >60s to respond. Bump pytest-timeout to 120s for this specific test to avoid flaky CI failures. https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

ManuelKugelmann merged commit dad7c1d into main Mar 23, 2026
10 checks passed

ManuelKugelmann merged 21 commits into main from claude/fix-groq-api-auth-4jj0k
