Add Budget Manager + Support for Anthropic, Cohere, Palm (100+ LLMs using LiteLLM) #99

ishaan-jaff wants to merge 2 commits into StampyAI:main from
Conversation
@FraserLee @henri123lemoine can I get a review on this PR? If this initial commit looks good I can add docs/testing too.
```python
ENCODER = tiktoken.get_encoding("cl100k_base")

# initialize a budget manager to control costs for gpt-4/other llms
budget_manager = litellm.BudgetManager(project_name="stampy_chat")
```
how does the budget per user get configured? Could you add a new item to env.py so that it can be configured? Also, what would the units be? (I had a very quick glance at the litellm docs, but otherwise don't know anything about it)
(I'll give it a proper look tomorrow)
```python
# convert talk_to_robot_internal from dict generator into json generator
def talk_to_robot(index, query: str, mode: str, history: List[Dict[str, str]], k: int = STANDARD_K, log: Callable = print):
    session_id = str(uuid.uuid4())
    yield from (json.dumps(block) for block in talk_to_robot_internal(index, query, mode, history, k, log))
```
If I understand this, then budget manager has an internal dict to count how much a given session has used? But if you're creating a new id with each call to this function, then each session will have max 1 call? I'm planning on adding session ids, as it will be needed for logging anyway, so could you do this by extracting the session id from the request params in the main.py functions?
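One way the extraction could look, as a minimal sketch: reuse the session id from the request payload when the client sends one, otherwise mint a fresh one. The `sessionId` key and the helper name are assumptions for illustration, not the repo's actual request schema.

```python
import uuid

def get_session_id(params: dict) -> str:
    # "sessionId" is an assumed key name - match whatever key
    # main.py's request payload actually uses for session ids.
    return params.get("sessionId") or str(uuid.uuid4())
```

With this, repeated calls from the same session share one id, so the budget manager's per-user accounting accumulates across calls instead of resetting on every request.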
```python
def talk_to_robot(index, query: str, mode: str, history: List[Dict[str, str]], k: int = STANDARD_K, log: Callable = print):
    session_id = str(uuid.uuid4())
    budget_manager.create_budget(total_budget=10, user=session_id)  # init $10 budget
    yield from (json.dumps(block) for block in talk_to_robot_internal(index, query, mode, history, k, log))
```
don't hardcode it - create a setting in env.py
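A sketch of what the env.py entry could look like, assuming the repo's convention of reading settings from environment variables. The setting name `SESSION_BUDGET_DOLLARS` and its default are assumptions, not names from the actual codebase.

```python
import os

# Hypothetical env.py entry (name and default are assumptions):
# per-session budget in US dollars, overridable via the environment.
SESSION_BUDGET_DOLLARS = float(os.environ.get("SESSION_BUDGET_DOLLARS", "10.0"))
```

The call site would then become `budget_manager.create_budget(total_budget=SESSION_BUDGET_DOLLARS, user=session_id)`.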
```python
# check if budget exceeded for session
if budget_manager.get_current_cost(user=session_id) <= budget_manager.get_total_budget(session_id):
    for chunk in openai.ChatCompletion.create(
```
is this for the number of allowed tokens or chat calls? Is it a hard total, or does it get reset every now and then? The code is run on gunicorn workers - how will that influence it, as I'm guessing litellm won't communicate across processes?
Addressing:
#47
#55
This PR addresses two problems:

1. Add support for 100+ LLMs using LiteLLM (https://github.com/BerriAI/litellm/). LiteLLM is a lightweight package that simplifies LLM API calls - use any LLM as a drop-in replacement for gpt-3.5-turbo.
2. Use a budget manager to limit $ spend per session or per user. LiteLLM exposes a budget manager for each session/user.
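To make the control flow in this PR readable without litellm installed, here is a minimal in-memory stand-in that mirrors the `BudgetManager` calls the diff uses (`create_budget`, `get_current_cost`, `get_total_budget`). This is a sketch, not litellm's implementation: the real `BudgetManager` computes per-model dollar costs from token usage (its cost-update method takes a completion response rather than a raw dollar amount) and can persist budgets, and - as raised in review - its state is per-process, so each gunicorn worker would track budgets independently.

```python
class FakeBudgetManager:
    """In-memory stand-in mirroring the BudgetManager calls used in this PR."""

    def __init__(self, project_name: str):
        self.project_name = project_name
        self.budgets: dict[str, float] = {}  # user -> total budget in USD
        self.spent: dict[str, float] = {}    # user -> cost accrued so far in USD

    def create_budget(self, total_budget: float, user: str) -> None:
        self.budgets[user] = total_budget
        self.spent.setdefault(user, 0.0)

    def get_total_budget(self, user: str) -> float:
        return self.budgets[user]

    def get_current_cost(self, user: str) -> float:
        return self.spent.get(user, 0.0)

    def update_cost(self, user: str, cost: float) -> None:
        # Simplified: litellm's real update method derives cost from a
        # completion response instead of taking a dollar amount directly.
        self.spent[user] = self.spent.get(user, 0.0) + cost


budget_manager = FakeBudgetManager(project_name="stampy_chat")
budget_manager.create_budget(total_budget=10.0, user="session-1")

# the gate pattern from the diff: only call the model while under budget
if budget_manager.get_current_cost(user="session-1") <= budget_manager.get_total_budget("session-1"):
    pass  # proceed with the completion call, then record its cost
```

The key design question this surfaces is where the dict lives: with multiple gunicorn workers, a purely in-process budget would need to be replaced by shared storage (or litellm's persistence hooks) to enforce a true per-session cap.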