AI Model Custom Pricing and Cost Calculation #294
Open
Sanjusha-tridz wants to merge 9 commits into
- Single source of truth for all LLM cost calculations
- Priority: custom DB prices → LiteLLM auto → 0.0/unknown
- Industry-standard formula: (tokens / 1M) × price
- Redis caching with a 10-minute TTL per model
- LiteLLM registry sync and cache invalidation helpers
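The priority chain and formula listed in this commit can be sketched as below. This is a minimal illustration rather than the module's actual code: `resolve_cost`, `ModelPrices`, and the `litellm_fallback` callable are hypothetical names, and the real `calculate_cost()` signature may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass
class ModelPrices:
    """Hypothetical container for custom prices, per 1M tokens."""
    input_per_1m: float
    output_per_1m: float


def resolve_cost(
    prompt_tokens: int,
    completion_tokens: int,
    custom: Optional[ModelPrices] = None,
    litellm_fallback: Optional[Callable[[], Optional[float]]] = None,
) -> float:
    # Priority 1: custom prices stored with the model.
    if custom is not None:
        return (
            prompt_tokens / TOKENS_PER_MILLION * custom.input_per_1m
            + completion_tokens / TOKENS_PER_MILLION * custom.output_per_1m
        )
    # Priority 2: whatever the fallback (for example LiteLLM) can compute.
    if litellm_fallback is not None:
        try:
            cost = litellm_fallback()
        except Exception:
            cost = None  # an unmapped model should not break the request
        if cost is not None:
            return float(cost)
    # Priority 3: price unknown, report 0.0.
    return 0.0
```

With custom prices of 2.5 and 10.0 per 1M tokens, a call with 1,000 prompt tokens and 500 completion tokens works out to 0.0025 + 0.005 = 0.0075.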
- validate(): enforces that input and output prices are set together, and skips the check if use_custom_pricing is off
- on_update(): invalidates the Redis cache on model save and immediately re-registers prices in the LiteLLM registry
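The validation rule can be illustrated with plain Python. The sketch below is an assumption about the shape of the check, not the DocType's code: `AIModelPricing` is a hypothetical stand-in for the document, unset prices are modelled as `None`, and a `ValueError` stands in for whatever error the framework raises.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIModelPricing:
    """Hypothetical stand-in for the pricing fields on the AI Model document."""
    use_custom_pricing: bool = False
    input_cost_per_1m_tokens: Optional[float] = None
    output_cost_per_1m_tokens: Optional[float] = None


def validate_pricing(doc: AIModelPricing) -> None:
    # Skip the check entirely when custom pricing is switched off.
    if not doc.use_custom_pricing:
        return
    has_input = doc.input_cost_per_1m_tokens is not None
    has_output = doc.output_cost_per_1m_tokens is not None
    # Both prices must be present together.
    if not (has_input and has_output):
        raise ValueError(
            "Set both input and output prices when custom pricing is enabled."
        )
```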
- Removed the direct litellm.completion_cost() import and call
- Replaced it with huf.ai.cost_calculator.calculate_cost()
- Passes the model name, token counts, and raw response for fallback
- Added cost calculation to the streaming path (run_stream), which previously had none
- The complete chunk now includes a 'cost' key
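As a rough picture of the streaming change, the generator below emits a final chunk that carries a 'cost' key. The chunk layout (`type`, `content`), the function name `stream_with_cost`, and the `cost_fn` callable are all hypothetical; only the presence of a 'cost' key on the complete chunk comes from the commit notes.

```python
from typing import Callable, Iterable, Iterator


def stream_with_cost(
    deltas: Iterable[str],
    prompt_tokens: int,
    completion_tokens: int,
    cost_fn: Callable[[int, int], float],
) -> Iterator[dict]:
    parts = []
    for delta in deltas:
        parts.append(delta)
        # Intermediate chunks carry text only.
        yield {"type": "delta", "content": delta}
    # The final chunk carries the full text plus the computed cost.
    yield {
        "type": "complete",
        "content": "".join(parts),
        "cost": cost_fn(prompt_tokens, completion_tokens),
    }
```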
- Fix the cache-aware cost formula to prevent double-charging by subtracting cached tokens from total input tokens before applying the standard input price
- Introduce unregister_model_pricing_with_litellm() to handle memory cleanup when custom pricing is disabled
- Implement strict JSON fallback logic inside the unregister function to safely restore native LiteLLM model mappings (e.g. gpt-4o) instead of destroying them in the internal dictionary
- Add Just-In-Time (JIT) memory validation immediately before the litellm.completion_cost() fallback to prevent multi-worker environments (Gunicorn) from using stale custom prices
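The cache-aware fix can be shown with a small worked example. The sketch assumes the reported input token count already includes the cached tokens, as the first bullet implies; the function name and the clamping of `cached_tokens` are illustrative additions, not taken from the PR.

```python
def cache_aware_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    cached_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    # Clamp so a bad usage report cannot produce negative billable input.
    cached = min(max(cached_tokens, 0), input_tokens)
    # Cached tokens are billed once, at the cached rate only.
    uncached = input_tokens - cached
    total = (
        uncached * input_price_per_1m
        + cached * cached_price_per_1m
        + output_tokens * output_price_per_1m
    )
    return total / 1_000_000
```

For 10,000 input tokens of which 4,000 are cached, 1,000 output tokens, and prices of 2.0, 0.5, and 8.0 per 1M tokens, the weighted sum is 6,000 × 2.0 + 4,000 × 0.5 + 1,000 × 8.0 = 22,000, or 0.022 after dividing by 1M. Charging all 10,000 input tokens at the standard rate and then adding the cached charge would give 0.030 instead.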
- Update the on_update hook to explicitly trigger unregister_model_pricing_with_litellm() when use_custom_pricing is toggled off
- Ensure the active worker purges the override from LiteLLM's internal registry upon save
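The restore-instead-of-delete behaviour of the unregister step might look roughly like this. Plain dictionaries stand in for LiteLLM's in-memory price map and for the native price data it ships with; `unregister_custom_price` and both parameter names are hypothetical, and the real helper's fallback logic is not shown in this page.

```python
import copy


def unregister_custom_price(registry: dict, native_defaults: dict, model: str) -> None:
    if model in native_defaults:
        # A natively known model (e.g. gpt-4o) gets its stock entry back
        # rather than being removed from the registry.
        registry[model] = copy.deepcopy(native_defaults[model])
    else:
        # A purely custom model has nothing to restore, so drop the override.
        registry.pop(model, None)
```

Copying the native entry keeps later in-place edits to the registry from leaking back into the defaults.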
- Centralize cost calculation by replacing direct, hardcoded calls to litellm.completion_cost() with the unified calculate_cost() module
- Add logic to preferentially extract and use the cost metadata already embedded in the final chunk, eliminating redundant recalculations
- Ensure mock response fallback generation properly inherits the normalized pricing_model to prevent unmapped provider prefix errors when the LiteLLM fallback is required
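The "prefer the embedded cost" rule from the second bullet can be sketched as a small helper. `cost_from_final_chunk` and the `recompute` callable are hypothetical names, and the sketch assumes the final chunk is a mapping with an optional numeric 'cost' key.

```python
from typing import Any, Callable, Mapping


def cost_from_final_chunk(
    final_chunk: Mapping[str, Any],
    recompute: Callable[[], float],
) -> float:
    embedded = final_chunk.get("cost")
    # Use the cost already attached to the chunk when it is a real number.
    if isinstance(embedded, (int, float)) and not isinstance(embedded, bool):
        return float(embedded)
    # Otherwise fall back to a fresh calculation.
    return recompute()
```

Note that this sketch treats an embedded 0.0 as a valid cost and does not recompute; the actual code may handle that case differently.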
Summary
This PR introduces a unified, priority-based cost calculation system for all AI interactions within the HUF framework. It moves away from relying solely on LiteLLM's built-in token prices by allowing users to define custom, model-specific pricing directly on the AI Model DocType. This ensures that organizations can accurately track AI costs based on their negotiated enterprise rates, custom-deployed models, or updated pricing that LiteLLM does not yet support.
Key Changes
- Cost calculator (huf.ai.cost_calculator): Added a centralized module serving as the single source of truth for token cost calculations. Prices are resolved in priority order:
  - Custom prices from the AI Model DocType if use_custom_pricing is enabled.
  - litellm.completion_cost() if no custom pricing is set.
  - 0.0 if the price is unknown, to ensure costs are never silently miscalculated.
- Added the pricing fields input_cost_per_1m_tokens, output_cost_per_1m_tokens, and cached_input_cost_per_1m_tokens. Added validation to ensure both input and output prices are set when custom pricing is toggled on.
- The Redis price cache is invalidated through on_update hooks.
- Custom prices are registered with LiteLLM through after_migrate hooks and dynamically when an AI model is updated, ensuring complete consistency across the system.
- Replaced litellm.completion_cost usage across agent_integration.py and providers/litellm.py (both streaming and sync responses) with the unified calculate_cost function.
Impact