Skip to content

Update agent model configs and provider budgets for 2025#127

Merged
ManuelKugelmann merged 21 commits into
mainfrom
claude/fix-groq-api-auth-4jj0k
Mar 23, 2026
Merged

Update agent model configs and provider budgets for 2025#127
ManuelKugelmann merged 21 commits into
mainfrom
claude/fix-groq-api-auth-4jj0k

Conversation

@ManuelKugelmann
Copy link
Copy Markdown
Owner

Summary

Updated agent model configurations, provider budget notes, and endpoint settings to reflect current API limits and model availability. Improved test script robustness for model validation.

Key Changes

Agent Model Configuration (agent-models.json)

  • Updated budget notes with accurate RPM/RPD/token limits for all providers (Groq, Cerebras, Gemini, GitHub Models, Mistral, Cohere, OpenRouter, Qwen)
  • Standardized Cerebras model to qwen-3-235b-a22b-instruct-2507 across all agents (replacing qwen3-235b and gpt-oss-120b)
  • Reassigned signals-analyst from GitHub Models to Groq for better budget efficiency
  • Moved news-the-augur from Cohere to Cerebras for consistency
  • Moved news-finanz-augur from GitHub Models to Mistral (leveraging Mistral's European language strength)
  • Updated L5 default from GitHub Models to Cohere (command-a-03-2025)
  • Clarified agent design philosophy: L1-L4 cron agents use production providers only; L5 live-chat and prototyping can use lower-budget providers

LibreChat Endpoint Configuration (librechat-user.yaml)

  • Updated provider comments with specific RPM/RPD/token limits
  • Groq: Added moonshotai/kimi-k2-instruct and qwen/qwen3-32b; removed kimi-k2-0905
  • Cerebras: Standardized to qwen-3-235b-a22b-instruct-2507; removed gpt-oss-120b
  • GitHub Models: Removed o4-mini (unavailable)
  • Qwen: Removed qwen-long
  • OpenRouter: Updated free model list to minimax/minimax-m2.5:free, nvidia/nemotron-3-super-120b-a12b:free, openai/gpt-oss-120b:free; removed Gemini and DeepSeek free models; updated titleModel

Test Script (test-models.py)

  • Added User-Agent header to requests for better compatibility
  • Improved response parsing to handle list-type content (e.g., multimodal responses)
  • Enhanced robustness for edge cases (empty responses, non-string content)

Notable Details

  • All model names now use consistent naming conventions (e.g., qwen-3-235b-a22b-instruct-2507)
  • Budget allocation prioritizes production stability: high-budget providers (Groq, Cerebras, Gemini) for cron agents; lower-budget providers reserved for prototyping and live-chat fallbacks
  • OpenRouter free model selection updated to reflect current availability and performance

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd

claude added 21 commits March 19, 2026 08:20
- Add browser User-Agent header to test-models.py to bypass Cloudflare
  bot protection (HTTP 403 error code 1010) on Groq and Cerebras APIs
- Handle list-type content in responses from reasoning models (Mistral
  magistral) that return content as array of parts instead of string
- Remove unavailable models: o4-mini (GitHub), qwen-long (Qwen)
- Update OpenRouter free model list to currently available models

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Groq and Cerebras had hardcoded model lists with models that no longer
exist (openai/gpt-oss-120b, kimi-k2-0905, qwen3-235b). Switch to
fetch:true so LibreChat discovers available models from the API,
keeping only known-good models as defaults.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Previous defaults (gemini-2.5-flash-preview, llama-4-scout, deepseek-r1-0528,
qwen3-235b) don't exist in OpenRouter's free tier. Replaced with verified
models from /models endpoint: gpt-oss-120b, nemotron-3-super-120b-a12b,
llama-3.3-70b-instruct, hermes-3-llama-3.1-405b.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Add gpt-oss-120b, kimi-k2, qwen3-32b from confirmed /models response.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Model lists are now confirmed from live /models endpoints — no need
for runtime discovery which can be blocked by Cloudflare.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
librechat-user.yaml:
- Add free usage amounts to each provider comment (RPM, RPD, tok/day)

agent-models.json:
- Fix Cerebras model IDs: qwen3-235b → qwen-3-235b-a22b-instruct-2507
- Remove gpt-oss-120b from Cerebras (doesn't exist there)
- Fix OpenRouter news agent: gemini-2.5-flash:free → minimax-m2.5:free
- Update budget notes with verified free tier limits

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
- Qwen: clarify one-time token grant (not monthly)
- GitHub Models: 10 RPM / 50 RPD for high-end, ~150 RPD for small
- OpenRouter: 20 RPM / 50 RPD (was ~50 RPD)
- agent-models.json: update all budget notes with verified numbers

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
- signals-analyst (L2): GitHub Models → Groq
- news-the-augur (L4): Cohere → Cerebras
- news-financial-augur (L4): OpenRouter → Groq
- news-finanz-augur (L4): GitHub Models → Mistral
- L5 default: GitHub Models → Cohere (allowed for live-chat tier)
- Cohere/GitHub Models/OpenRouter/Qwen reserved for prototyping
  and one-off bootstrap runs (e.g. Qwen's 1M-token data seeding)

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
- agent-models-bootstrap.json: Qwen-heavy config for initial data
  seeding (burns 1M-token free grant across qwen3-max/plus/flash)
- seed-agents.py --mode bootstrap|continuous: auto-selects models file
- Post-install info: model test, agent setup, bootstrap workflow
- 9 new tests: mode selection, bootstrap JSON validity, provider
  restrictions (no Qwen/GitHub/OpenRouter in L1-L4 continuous)

Workflow: seed --mode bootstrap → bootstrap-data.py → seed --mode continuous

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
- agent_client.py: shared AgentClient with discovery, streaming,
  .env loading — replaces duplicate code in cron dispatcher, trigger
  command, and bootstrap-data.py
- bootstrap-data.py: auto-loads AUGUR_AGENTS_API_KEY from .env,
  auto-discovers agent ID via API (no manual --api-key/--agent-id needed)
- Augur.sh cron/trigger: refactored to use AgentClient via PYTHONPATH
- Augur.sh bootstrap command: simplified, no more env var requirements
- 14 new tests for agent_client (load_env, find_agent, invoke, logging)
- Updated bootstrap tests for new AgentClient-based API

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
The agents command no longer uses --mode flags; it uses --group
and --all instead. Bootstrap workflow simplified to reflect that
AUGUR_AGENTS_API_KEY is now set in .env rather than auto-discovered.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
- --bootstrap: shorthand for --mode bootstrap (Qwen free tokens, core only)
- --trading/--news: shorthand for --group trading/news
- Remove --group flag from CLI (covered by shorthands + --all)
- Update post-install info with correct bootstrap workflow

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Don't prescribe --bootstrap as mandatory; show it as an alternative
to regular agents for users who want to use free Qwen tokens.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Replace cron-planner as bootstrap target with a purpose-built bootstrap
agent. It coordinates L1 data agents (market-data, osint-data,
signals-data) which maintain the unified store interface, rather than
calling data tools directly.

- New agent: bootstrap (L4, group: bootstrap) with edges to all 3 L1 agents
- New prompt: prompts/bootstrap.md with delegation rules per kind
- Update bootstrap-data.py default agent from cron-planner to bootstrap
- Add bootstrap to live-chat edges, both model config files, ALL_GROUPS
- Update test expectations for 17th agent

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
Extend bootstrap from profiles-only to a full 4-phase bootstrapper:
- Phase 1: Profiles (existing)
- Phase 2: Timeseries (existing)
- Phase 3: Events - seed current events via L1 delegation (GDELT,
  market moves, trending signals) for countries/stocks/commodities/
  crypto/regions
- Phase 4: Plans - create initial research plans and watchlists so
  cron-planner has work from day one

New CLI flags: --events, --plans, --all-phases
Add google-news MCP to bootstrap agent tools for event research.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
…tats

Rewrite bootstrap-data.py:
- One agent call per kind per phase (not per batch of 10)
- No batching, no random sleeps — agent handles iteration internally
- Safe to re-run: enriches existing data, deduplicates events/plans
- Before/after profile coverage stats table
- Structured timestamped log output
- --phase flag for single phase, default runs all 4 phases
- 10min timeout per call (agent makes many tool calls per kind)

Rewrite agent_client.py:
- Retry with exponential backoff on 429, 5xx, timeout, network errors
- Up to 4 retries (2s, 4s, 8s, 16s), respects Retry-After header

Update tests for new API (no batch_targets, build_profiles_prompt, etc).

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
The EU Parliament API (data.europarl.europa.eu) regularly takes >60s to
respond. Bump pytest-timeout to 120s for this specific test to avoid
flaky CI failures.

https://claude.ai/code/session_01GrTcXSgevbdYJ97Jq8tsbd
@ManuelKugelmann ManuelKugelmann merged commit dad7c1d into main Mar 23, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants