feat: route all agents through Gemini via LiteLLM proxy #17

Open
farzanshibu wants to merge 6 commits into raroque:main from farzanshibu:feat/gemini-via-litellm-proxy
Conversation

@farzanshibu
Closes #16

Summary

  • litellm.config.yaml: maps Claude Code's internal model IDs (claude-sonnet-4-6, claude-haiku-4-5-20251001, claude-sonnet-4-5-20250929) to gemini/gemini-2.5-flash so requests route to Gemini transparently
  • scripts/start-proxy.sh: fixed .env.local loading (inline comments broke xargs); now runs LiteLLM on port 4001 + thin proxy on port 4000
  • scripts/anthropic-proxy.mjs: new thin Node.js proxy that intercepts /v1/messages/count_tokens (LiteLLM+Gemini bug: sends empty body → 500) and returns a mock 200; proxies all other requests to LiteLLM
  • server/interaction-agent.ts: fixed "(no reply)" literal being sent to iMessage when model produces no text; added self-description to safe-to-answer list; added unknown SDK message type logging
  • server/bot.ts: empty reply now sends a helpful fallback message instead of nothing
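
The routing decision described for scripts/anthropic-proxy.mjs can be sketched roughly as follows. This is a minimal illustration, not the file from the PR: `handleRequest`, `forward`, and `MOCK_INPUT_TOKENS` are hypothetical names, and the real proxy wires this logic into an HTTP server that forwards to LiteLLM on port 4001.

```javascript
// Hypothetical stand-in for the routing decision in scripts/anthropic-proxy.mjs.
// MOCK_INPUT_TOKENS and the `forward` callback are illustrative names.
const MOCK_INPUT_TOKENS = 10000; // value quoted in the review thread

function handleRequest(req, res, forward) {
  // count_tokens is answered locally so LiteLLM never sees the request.
  if (req.url?.startsWith("/v1/messages/count_tokens")) {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ input_tokens: MOCK_INPUT_TOKENS }));
    return "mocked";
  }
  // Every other request is handed to the forwarder (LiteLLM in this PR).
  forward(req, res);
  return "forwarded";
}
```

Keeping the decision in a plain function that takes `forward` as a parameter makes it testable with stub request and response objects, without opening a socket.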

Test plan

  • sh scripts/start-proxy.sh starts without env errors
  • LiteLLM logs show it listening on port 4001; thin proxy logs show it listening on port 4000
  • Send a message via iMessage/Sendblue — verify 200 OK in proxy logs and a real reply is received
  • /v1/messages/count_tokens calls return 200 (no more 500 floods)
  • Ask "which model are you" — agent answers directly without spawning

🤖 Generated with Claude Code

farzanshibu and others added 4 commits April 28, 2026 03:05
Replace custom Sendblue integration with chat-adapter-sendblue and
@chat-adapter/telegram, both unified under Chat SDK. All platforms
share the same handlers, memory, automations, and Composio tools.

- Add server/bot.ts with env-driven adapter registry (registerIfConfigured)
- Delete server/sendblue.ts (replaced by chat-adapter-sendblue)
- Refactor server/index.ts webhook bridge to forward all headers + debug logs
- Add scripts/telegram-webhook.mjs for auto-registration on dev boot
- Update scripts/dev.mjs: auto-register Telegram webhook alongside Sendblue
- Add README section: "Adding more chat platforms" with Slack walkthrough
- Add .agents/skills/chat-sdk/SKILL.md for agent-assisted Chat SDK work

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Capture raw request body via express.json verify callback so HMAC
  signature verification works for Slack, GitHub, Discord, etc.
- Proxy adapter responses faithfully (status + headers + raw bytes)
  instead of forcing JSON — fixes URL verification challenge flows
- Fix double-handler: scope onNewMessage catch-all to sendblue only;
  all other platforms use onDirectMessage to avoid firing twice on @-mentions
- Gate webhook body logging behind DEBUG_WEBHOOKS=true env var to avoid
  PII in production logs
- Surface webhook registration failures explicitly in autoRegisterWebhook
- Update dev banner to list all active platform webhooks dynamically
  instead of hardcoding Sendblue-only messaging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- litellm.config.yaml: map Claude model IDs (claude-sonnet-4-6,
  claude-haiku-4-5-20251001, etc.) to gemini/gemini-2.5-flash so
  Claude Code CLI accepts the model name while LiteLLM routes to Gemini
- scripts/start-proxy.sh: load .env.local stripping inline comments;
  run LiteLLM on port 4001, thin proxy on port 4000
- scripts/anthropic-proxy.mjs: intercepts /v1/messages/count_tokens
  and returns a mock 200 (LiteLLM+Gemini sends empty body → 500 bug);
  proxies all other requests to LiteLLM on 4001
- server/interaction-agent.ts: fix empty-reply fallback (was sending
  literal "(no reply)" string to iMessage); add self-description to
  safe-to-answer list; log unknown SDK message types for debugging
- server/bot.ts: send helpful fallback when model produces no text;
  log elapsed time in no-reply path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 27, 2026 22:30

…rt and enhance execution agent logic

Co-authored-by: Copilot <copilot@github.com>
@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR routes all agent requests through a LiteLLM proxy to Gemini, adds a thin Node.js proxy to work around a LiteLLM+Gemini count_tokens bug, brings up a Telegram adapter, and fixes an empty-reply fallback in bot.ts.

  • gemini-main / gpt-main not in LiteLLM config: .env.example tells users to set BOOP_EXECUTION_MODEL=gemini-main or gpt-main, but neither model name is defined in litellm.config.yaml — any agent turn using those values will fail with a model-not-found error from LiteLLM.

Confidence Score: 4/5

Safe to merge after resolving the litellm.config.yaml / .env.example model-name mismatch; all other changes are straightforward.

One P1 defect: documented model names (gemini-main, gpt-main) are absent from the LiteLLM config, causing immediate failures for any user who follows the .env.example instructions. No P0 issues found.

litellm.config.yaml and .env.example — model name definitions must be consistent between them.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/anthropic-proxy.mjs | New thin Node.js proxy that mocks count_tokens and forwards other requests to LiteLLM; mock token value (10000) already flagged in previous review |
| scripts/start-proxy.sh | Fixed .env.local parsing (strips inline comments via sed before sourcing); starts LiteLLM on 4001 and proxy on 4000; empty OPENAI_API_KEY export already flagged |
| litellm.config.yaml | Maps three Claude model IDs to gemini-2.5-flash; missing gemini-main/gpt-main entries documented in .env.example cause model-not-found errors if users follow the example |
| server/bot.ts | Added Telegram adapter, fallback message for empty replies, and onSubscribedMessage handler; Sendblue double-firing issue already flagged in previous review |
| server/interaction-agent.ts | Added self-description to safe-to-answer list and unknown SDK message type logging; no new issues found |
| .env.example | Documents gemini-main/gpt-main model names that are not defined in litellm.config.yaml; using them would cause model-not-found errors |

Sequence Diagram

sequenceDiagram
    participant CC as Claude Code CLI
    participant P as anthropic-proxy :4000
    participant L as LiteLLM :4001
    participant G as Gemini API

    CC->>P: POST /v1/messages/count_tokens
    P-->>CC: 200 {input_tokens: 10000} (mocked)

    CC->>P: POST /v1/messages (claude-sonnet-4-6)
    P->>L: forward request
    L->>G: gemini/gemini-2.5-flash
    G-->>L: response
    L-->>P: response
    P-->>CC: response

    Note over CC,G: All three mapped Claude model IDs route to gemini-2.5-flash

Comments Outside Diff (1)

  1. .env.example, line 303-304

    P1 gemini-main / gpt-main not defined in LiteLLM config

    The comments instruct users to set BOOP_EXECUTION_MODEL=gemini-main or gpt-main, but neither name appears in litellm.config.yaml. LiteLLM will respond with a "model not found" error for any request using those values; the agent will fail on every turn.

    litellm.config.yaml only maps claude-sonnet-4-6, claude-sonnet-4-5-20250929, and claude-haiku-4-5-20251001. Either add the missing entries to the config, or update the comment to reflect a model name that is actually mapped.
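
One way to surface this mismatch early is a startup check that compares the configured name against the names the config is known to map. The sketch below is an assumption-laden illustration: `checkExecutionModel` is a hypothetical helper, the set is hard-coded from the three names listed above rather than read from litellm.config.yaml, and treating an unset variable as valid is a guess about the default behavior.

```javascript
// Model names mapped in litellm.config.yaml, per the review comment above.
const MAPPED_MODELS = new Set([
  "claude-sonnet-4-6",
  "claude-sonnet-4-5-20250929",
  "claude-haiku-4-5-20251001",
]);

// Hypothetical startup check: returns an error string, or null when the name is usable.
function checkExecutionModel(name, mapped = MAPPED_MODELS) {
  if (!name) return null; // unset: assumed to fall back to a default elsewhere
  if (mapped.has(name)) return null;
  return (
    `BOOP_EXECUTION_MODEL="${name}" is not defined in litellm.config.yaml; ` +
    `mapped names: ${[...mapped].join(", ")}`
  );
}
```

Called with `process.env.BOOP_EXECUTION_MODEL` at boot, a check like this would report `gemini-main` or `gpt-main` before the first agent turn instead of failing inside LiteLLM.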

Reviews (2): Last reviewed commit: "Merge branch 'main' into feat/gemini-via..."

Comment thread server/bot.ts
Comment on lines +130 to +131

// ---------------------------------------------------------------------------

P1 Duplicate turn for subscribed Sendblue threads

After the first Sendblue message, handleTurn calls thread.subscribe(). The thread is therefore already subscribed when each subsequent message arrives, so onSubscribedMessage fires, and onNewMessage(/[\s\S]*/) (which only returns early when adapter.name !== "sendblue") fires as well. The result is two calls to handleTurn, two thread.post() calls, and two Convex mutations per message.

A simple fix is to add an early-exit guard inside onSubscribedMessage since onNewMessage already covers follow-up messages for that adapter:

bot.onSubscribedMessage(async (thread, message) => {
  // Sendblue messages are already handled via onNewMessage above
  if (thread.adapter.name === "sendblue") return;
  await handleTurn(thread, message);
});
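
The effect of the guard can be checked with a toy model of the dispatch described above. This is not the Chat SDK: `simulateSendblue` is a hypothetical function that simply assumes, as the review states, that both handlers fire for a message on an already-subscribed Sendblue thread.

```javascript
// Toy model of the behavior described in the review; NOT the Chat SDK itself.
function simulateSendblue(withGuard) {
  let turns = 0;
  const thread = { adapter: { name: "sendblue" }, subscribed: false };
  const handleTurn = (t) => {
    turns += 1;
    t.subscribed = true; // stands in for thread.subscribe()
  };

  // Catch-all handler, scoped to Sendblue as in the PR.
  const onNewMessage = (t) => {
    if (t.adapter.name !== "sendblue") return;
    handleTurn(t);
  };
  // Subscribed-thread handler, optionally with the suggested guard.
  const onSubscribedMessage = (t) => {
    if (withGuard && t.adapter.name === "sendblue") return;
    handleTurn(t);
  };

  // Deliver two messages; both handlers fire once the thread is subscribed.
  for (let i = 0; i < 2; i += 1) {
    const wasSubscribed = thread.subscribed;
    onNewMessage(thread);
    if (wasSubscribed) onSubscribedMessage(thread);
  }
  return turns;
}

console.log(simulateSendblue(false)); // 3: the second message is handled twice
console.log(simulateSendblue(true)); // 2: one turn per message
```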

Comment thread scripts/anthropic-proxy.mjs
Comment on lines +18 to +21
  if (req.url?.startsWith("/v1/messages/count_tokens")) {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ input_tokens: 10000 }));
    return;

P2 Hardcoded mock token count may mis-signal context capacity

Returning input_tokens: 10000 for every count_tokens request means Claude Code always believes the context is ~10 k tokens, regardless of actual conversation length. If a conversation approaches the real model limit, Claude Code won't get a signal to compact or warn the user — it will silently exceed the context window and see truncation or errors at inference time.

A safer sentinel is a very large value (e.g. 200_000) so Claude Code never hits the heuristic threshold, or the comment should explicitly document the chosen value and its rationale.
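
A third option, not proposed in the PR or the review, is to return a rough estimate derived from the request body so the count at least grows with the conversation. The sketch below assumes about 4 characters per token, a common rule of thumb for English text rather than a property of Gemini's tokenizer; `estimateInputTokens` and `CHARS_PER_TOKEN` are hypothetical names.

```javascript
// Rough heuristic (assumption): about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: estimates tokens from the raw JSON body of a count_tokens call.
function estimateInputTokens(rawBody) {
  if (!rawBody) return 0; // empty body: nothing to count
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    // Not valid JSON: fall back to the raw length.
    return Math.ceil(rawBody.length / CHARS_PER_TOKEN);
  }
  // Prefer the messages array when present; otherwise count the whole payload.
  const text = JSON.stringify(payload?.messages ?? payload);
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}
```

The estimate is deliberately crude and ignores tools, images, and system prompts unless they sit under `messages`, so it should be read as a monotonic signal rather than an accurate count.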

Comment thread scripts/start-proxy.sh
set +a
fi

# Ensure mandatory keys are set for LiteLLM

P2 Unconditional empty-string export may confuse LiteLLM

Exporting OPENAI_API_KEY as an empty string (via the :-"" default) always sets the variable in the environment, even when OpenAI is not in use. LiteLLM inspects environment variable presence to decide whether to validate/route OpenAI requests; an empty string may trigger unexpected validation errors or suppress the clearer "key not configured" diagnostic. Only exporting it when the value is non-empty would be safer.
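
The same idea can be illustrated outside the shell script. The sketch below is a hypothetical Node.js helper, not a translation of scripts/start-proxy.sh: `parseEnvFile` strips inline comments (the problem the PR fixes for .env.local) and skips keys whose value is empty, so an unset key stays absent instead of being exported as "". The key names in the usage example are illustrative.

```javascript
// Parses KEY=VALUE lines, dropping comments and empty values.
// Simplification: a " #" inside a quoted value is also treated as a comment.
function parseEnvFile(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/\s+#.*$/, "").trim(); // strip inline comment
    if (!line || line.startsWith("#")) continue; // blank or full-line comment
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not KEY=VALUE
    const key = line.slice(0, eq).trim();
    // Remove one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    if (value !== "") env[key] = value; // leave empty keys unset
  }
  return env;
}

const sample = 'GEMINI_API_KEY=abc # my key\nOPENAI_API_KEY=\n# comment\nPORT="4000"';
console.log(parseEnvFile(sample)); // { GEMINI_API_KEY: 'abc', PORT: '4000' }
```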

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Development

Successfully merging this pull request may close these issues.

Route all agents through Gemini via LiteLLM proxy

2 participants