AI Model Custom Pricing and Cost Calculation #294

Open
Sanjusha-tridz wants to merge 9 commits into develop from feat/ai-model-pricing

@Sanjusha-tridz

Collaborator

Summary

This PR introduces a unified, priority-based cost calculation system for all AI interactions within the HUF framework. It moves away from relying solely on LiteLLM's built-in token prices by allowing users to define custom, model-specific pricing directly on the AI Model DocType.

This lets organizations track AI costs against their negotiated enterprise rates, price custom-deployed models, and apply updated provider pricing before LiteLLM's built-in price list reflects it.

Key Changes

  • Cost Calculator Module (huf.ai.cost_calculator): Added a centralized module serving as the single source of truth for token cost calculations.
  • Priority-Based Calculation:
    1. Custom Pricing (Highest Priority): Uses prices defined on the AI Model DocType if use_custom_pricing is enabled.
    2. LiteLLM Fallback: Falls back to litellm.completion_cost() if no custom pricing is set.
    3. Unknown/Free: Returns 0.0 when neither custom pricing nor LiteLLM has a price for the model, so an unpriced model is reported as zero cost rather than given a guessed rate.
  • AI Model DocType Updates: Added fields for input_cost_per_1m_tokens, output_cost_per_1m_tokens, and cached_input_cost_per_1m_tokens. Added validation to ensure both input and output prices are set when custom pricing is toggled on.
  • Caching & Performance: Custom prices are cached in Redis with a 10-minute TTL to prevent database hits on every LLM call. The cache is automatically invalidated via on_update hooks.
  • LiteLLM Registry Sync: Custom prices are synced into LiteLLM's in-memory registry. This happens via after_migrate hooks and again whenever an AI Model is updated, so LiteLLM's own cost lookups agree with the prices stored on the DocType.
  • Standardized Integration: Replaced raw litellm.completion_cost usage across agent_integration.py and providers/litellm.py (both streaming and sync responses) with the unified calculate_cost function.
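
The priority order described above can be sketched roughly as follows. This is an illustration only, not the code in huf.ai.cost_calculator: the function name calculate_cost_sketch, its parameters, and the litellm_fallback callable are hypothetical, and the fallback is passed in as a plain callable so the sketch runs without LiteLLM installed. The field names match those added to the AI Model DocType.

```python
from typing import Callable, Optional

TOKENS_PER_MILLION = 1_000_000


def calculate_cost_sketch(
    prompt_tokens: int,
    completion_tokens: int,
    custom_pricing: Optional[dict] = None,
    litellm_fallback: Optional[Callable[[], float]] = None,
) -> float:
    """Hypothetical sketch of the custom -> LiteLLM -> 0.0 priority order."""
    # 1. Custom pricing wins when it is enabled on the model.
    if custom_pricing and custom_pricing.get("use_custom_pricing"):
        input_cost = (
            prompt_tokens / TOKENS_PER_MILLION
            * custom_pricing["input_cost_per_1m_tokens"]
        )
        output_cost = (
            completion_tokens / TOKENS_PER_MILLION
            * custom_pricing["output_cost_per_1m_tokens"]
        )
        return input_cost + output_cost

    # 2. Otherwise defer to the fallback (litellm.completion_cost() in the PR).
    if litellm_fallback is not None:
        try:
            return float(litellm_fallback())
        except Exception:
            # Assumption: a failed lookup (e.g. unmapped model) counts as unknown.
            pass

    # 3. Unknown or free model.
    return 0.0


pricing = {
    "use_custom_pricing": True,
    "input_cost_per_1m_tokens": 2.0,
    "output_cost_per_1m_tokens": 8.0,
}
print(calculate_cost_sketch(1_000_000, 500_000, custom_pricing=pricing))
```

Passing the fallback as a callable keeps the three tiers easy to test in isolation; the actual module may well call LiteLLM directly.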

Impact

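  • Accurate Cost Tracking: Prompt caching discounts are reflected in the cost, because cached input tokens are billed at cached_input_cost_per_1m_tokens rather than the standard input price.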
  • Flexibility: Enables pricing support for self-hosted, local, or newly released models that LiteLLM has not yet added to its internal pricing list.
  • Performance: Redis caching and in-memory registry updates avoid a database lookup on every LLM call.
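
To illustrate the caching behaviour, here is a minimal sketch of a TTL cache with explicit invalidation. It uses an in-memory dict as a stand-in for Redis so that it runs on its own; the class name PricingCache, the loader callable, and the injectable clock are hypothetical and not taken from the PR. Only the 10-minute TTL and the invalidate-on-save behaviour come from the description above.

```python
import time
from typing import Callable, Dict, Tuple

PRICING_TTL_SECONDS = 600  # 10 minutes, as described in the PR


class PricingCache:
    """In-memory stand-in for the Redis-backed pricing cache (hypothetical)."""

    def __init__(
        self,
        loader: Callable[[str], dict],
        ttl: float = PRICING_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. a database read of the AI Model record
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, model_name: str) -> dict:
        """Return cached pricing, reloading it once the TTL has expired."""
        now = self._clock()
        entry = self._entries.get(model_name)
        if entry is not None and entry[0] > now:
            return entry[1]
        pricing = self._loader(model_name)
        self._entries[model_name] = (now + self._ttl, pricing)
        return pricing

    def invalidate(self, model_name: str) -> None:
        """Drop the entry; this is what an on_update hook would trigger."""
        self._entries.pop(model_name, None)
```

In this sketch the loader runs at most once per model per TTL window unless invalidate() is called, which mirrors the intent of avoiding a database hit on every LLM call.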

- Single source of truth for all LLM cost calculations
- Priority: custom DB prices → LiteLLM auto → 0.0/unknown
- Industry-standard formula: (tokens / 1M) × price
- Redis caching with 10-min TTL per model
- LiteLLM registry sync and cache invalidation helpers
- validate(): requires input and output prices to be set together; skips the check if use_custom_pricing is off
- on_update(): invalidates the Redis cache on model save and immediately re-registers prices in the LiteLLM registry
- Removed direct litellm.completion_cost() import and call
- Replaced with huf.ai.cost_calculator.calculate_cost()
- Passes model name, token counts, and raw response for fallback
- Added cost calculation to the streaming path (run_stream), which previously had none
- The complete chunk now includes a 'cost' key
- Fix cache-aware cost formula to prevent double-charging by subtracting cached tokens from total input tokens before applying standard input price
- Introduce unregister_model_pricing_with_litellm() to handle memory cleanup when custom pricing is disabled
- Implement strict JSON fallback logic inside the unregister function to safely restore native LiteLLM model mappings (e.g. gpt-4o) instead of destroying them in the internal dictionary
- Add Just-In-Time (JIT) memory validation immediately before litellm.completion_cost() fallback to prevent multi-worker environments (Gunicorn) from using stale custom prices
- Update on_update hook to explicitly trigger unregister_model_pricing_with_litellm() when use_custom_pricing is toggled off
- Ensure the active worker purges the override from LiteLLM's internal registry upon save
- Centralize cost calculation by replacing direct, hardcoded calls to litellm.completion_cost() with the unified calculate_cost() module
- Add logic to preferentially extract and use the cost metadata already embedded in the final chunk, eliminating redundant recalculations
- Ensure mock response fallback generation properly inherits the normalized pricing_model to prevent unmapped provider prefix errors when LiteLLM fallback is required
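
The cache-aware formula fix above can be illustrated with a small worked sketch. It assumes the reported prompt token count already includes the cached tokens, which is what subtracting cached tokens from total input tokens implies. The function name cache_aware_cost and its parameters are hypothetical, and the prices are made-up example values per 1M tokens.

```python
def cache_aware_cost(
    prompt_tokens: int,
    cached_tokens: int,
    completion_tokens: int,
    input_price: float,
    cached_input_price: float,
    output_price: float,
) -> float:
    """Hypothetical sketch: bill cached and uncached input tokens separately."""
    # Clamp defensively in case a provider reports inconsistent counts.
    cached = min(max(cached_tokens, 0), prompt_tokens)
    # Subtract cached tokens first so they are not charged at both rates.
    uncached = prompt_tokens - cached
    total = (
        uncached * input_price
        + cached * cached_input_price
        + completion_tokens * output_price
    )
    return total / 1_000_000


# 10,000 prompt tokens (4,000 of them cached) and 2,000 completion tokens.
cost = cache_aware_cost(10_000, 4_000, 2_000, 2.5, 1.25, 10.0)
print(round(cost, 6))  # 0.04
```

Without the subtraction, the same request would bill all 10,000 prompt tokens at the standard rate and the 4,000 cached tokens again at the cached rate, for 0.05 instead of 0.04.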