AI Model Custom Pricing and Cost Calculation by Sanjusha-tridz · Pull Request #294 · tridz-dev/huf

Sanjusha-tridz · 2026-06-08T08:38:41Z

Summary

This PR introduces a unified, priority-based cost calculation system for all AI interactions within the HUF framework. It moves away from relying solely on LiteLLM's built-in token prices by allowing users to define custom, model-specific pricing directly on the AI Model DocType.

This lets organizations track AI costs accurately against negotiated enterprise rates, custom-deployed models, or updated prices that LiteLLM does not yet officially support.

Key Changes

Cost Calculator Module (huf.ai.cost_calculator): Added a centralized module serving as the single source of truth for token cost calculations.
Priority-Based Calculation:
1. Custom Pricing (Highest Priority): Uses prices defined on the AI Model DocType if use_custom_pricing is enabled.
2. LiteLLM Fallback: Falls back to litellm.completion_cost() if no custom pricing is set.
3. Unknown/Free: Defaults to 0.0 when neither custom pricing nor LiteLLM provides a price, so an unknown model never gets a guessed cost.
AI Model DocType Updates: Added fields for input_cost_per_1m_tokens, output_cost_per_1m_tokens, and cached_input_cost_per_1m_tokens. Added validation to ensure both input and output prices are set when custom pricing is toggled on.
Caching & Performance: Custom prices are cached in Redis with a 10-minute TTL to prevent database hits on every LLM call. The cache is automatically invalidated via on_update hooks.
LiteLLM Registry Sync: Custom prices are actively synced into LiteLLM's in-memory registry. This happens via after_migrate hooks and dynamically when an AI model is updated, ensuring complete consistency across the system.
Standardized Integration: Replaced raw litellm.completion_cost usage across agent_integration.py and providers/litellm.py (both streaming and sync responses) with the unified calculate_cost function.
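
The priority order above can be sketched roughly as follows. This is a minimal illustration, not the code in huf.ai.cost_calculator: the `CustomPricing` dataclass, the `resolve_cost` name, and the injected `litellm_fallback` callable are all assumptions made so the example runs without Frappe or LiteLLM installed. The fallback is assumed to raise when the model is unmapped.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass
class CustomPricing:
    """Hypothetical stand-in for the pricing fields on the AI Model DocType."""
    use_custom_pricing: bool
    input_cost_per_1m_tokens: float
    output_cost_per_1m_tokens: float


def resolve_cost(
    pricing: Optional[CustomPricing],
    input_tokens: int,
    output_tokens: int,
    litellm_fallback: Optional[Callable[[], float]] = None,
) -> float:
    """Return the cost using the first source that can price the call."""
    # 1. Custom pricing wins whenever it is enabled on the model.
    if pricing is not None and pricing.use_custom_pricing:
        return (
            input_tokens / TOKENS_PER_MILLION * pricing.input_cost_per_1m_tokens
            + output_tokens / TOKENS_PER_MILLION * pricing.output_cost_per_1m_tokens
        )

    # 2. Otherwise defer to the fallback (in the PR: litellm.completion_cost).
    if litellm_fallback is not None:
        try:
            return float(litellm_fallback())
        except Exception:
            # Assumed behaviour: the fallback raises for unmapped models.
            pass

    # 3. Nothing could price the call: report 0.0 rather than guess.
    return 0.0
```

With `CustomPricing(True, 2.0, 8.0)`, one million input tokens and half a million output tokens come to 2.0 + 4.0 = 6.0. Passing the fallback in as a callable keeps the priority logic testable in isolation.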

Impact

Accurate Cost Tracking: Prompt-caching discounts are tracked natively through the separate cached-input price.
Flexibility: Enables pricing support for self-hosted, local, or newly released models that LiteLLM has not yet added to its internal pricing list.
Performance: Redis caching and in-memory registry updates keep pricing lookups off the database during LLM calls.

- Single source of truth for all LLM cost calculations
- Priority: custom DB prices → LiteLLM auto → 0.0/unknown
- Industry-standard formula: (tokens / 1M) × price
- Redis caching with 10-min TTL per model
- LiteLLM registry sync and cache invalidation helpers
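
The per-model TTL caching and invalidation described here might look something like the sketch below. It uses an in-memory dictionary as a stand-in for Redis so that it runs anywhere; the `PricingCache` class, its `loader` callback, and the injectable `clock` are hypothetical names, not part of the PR.

```python
import time
from typing import Callable, Dict, Optional, Tuple

PRICING_TTL_SECONDS = 600  # 10 minutes, matching the TTL stated in the PR


class PricingCache:
    """In-memory stand-in for the Redis pricing cache (illustrative only)."""

    def __init__(
        self,
        loader: Callable[[str], Optional[dict]],
        ttl: float = PRICING_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. a database lookup for the AI Model
        self._ttl = ttl
        self._clock = clock
        # model name -> (expiry timestamp, cached pricing or None)
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}

    def get(self, model_name: str) -> Optional[dict]:
        """Return cached pricing, reloading it once the entry has expired."""
        now = self._clock()
        entry = self._entries.get(model_name)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = self._loader(model_name)
        self._entries[model_name] = (now + self._ttl, value)
        return value

    def invalidate(self, model_name: str) -> None:
        """Drop one model's entry, as an on_update hook would after a save."""
        self._entries.pop(model_name, None)
```

Caching a `None` result as well means models without custom pricing also avoid a database hit on every call, at the cost of waiting for expiry or invalidation before a newly enabled price is seen.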

- validate(): enforces that input and output prices are set together, and skips the check if use_custom_pricing is off
- on_update(): invalidates the Redis cache on model save and re-registers prices in the LiteLLM registry immediately
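
As a rough illustration of the validate() rule, the check could be written as a plain function like the one below. The function name and the use of `ValueError` are assumptions for a standalone example (a Frappe controller would more likely call frappe.throw), and "not set" is taken to mean `None`; the real DocType may treat 0 as unset instead.

```python
from typing import Optional


def validate_custom_pricing(
    use_custom_pricing: bool,
    input_cost_per_1m_tokens: Optional[float],
    output_cost_per_1m_tokens: Optional[float],
) -> None:
    """Raise if custom pricing is on but either required price is missing."""
    # The check is skipped entirely when custom pricing is switched off.
    if not use_custom_pricing:
        return

    required = {
        "input_cost_per_1m_tokens": input_cost_per_1m_tokens,
        "output_cost_per_1m_tokens": output_cost_per_1m_tokens,
    }
    missing = [name for name, value in required.items() if value is None]
    if missing:
        raise ValueError(
            "Custom pricing requires both prices; missing: " + ", ".join(missing)
        )
```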

- Removed the direct litellm.completion_cost() import and call
- Replaced it with huf.ai.cost_calculator.calculate_cost()
- Passes model name, token counts, and raw response for fallback
- Added cost calculation to the streaming path (run_stream), which previously had none
- The complete chunk now includes a 'cost' key

- Fix the cache-aware cost formula to prevent double-charging by subtracting cached tokens from total input tokens before applying the standard input price
- Introduce unregister_model_pricing_with_litellm() to handle memory cleanup when custom pricing is disabled
- Implement strict JSON fallback logic inside the unregister function to safely restore native LiteLLM model mappings (e.g. gpt-4o) instead of destroying them in the internal dictionary
- Add Just-In-Time (JIT) memory validation immediately before the litellm.completion_cost() fallback to prevent multi-worker environments (Gunicorn) from using stale custom prices
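
The double-charging fix in the first bullet can be illustrated with a small worked example. This is a sketch under the assumption that the reported input token count already includes the cached tokens; the function and parameter names are hypothetical.

```python
TOKENS_PER_MILLION = 1_000_000


def cache_aware_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    cached_input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Price cached and uncached input tokens separately, then add output."""
    # Clamp so a malformed usage report cannot yield negative uncached tokens.
    cached = min(max(cached_tokens, 0), input_tokens)
    # Cached tokens are removed from the total before the standard input
    # price is applied; otherwise they would be billed twice.
    uncached = input_tokens - cached
    total = (
        uncached * input_price_per_1m
        + cached * cached_input_price_per_1m
        + output_tokens * output_price_per_1m
    )
    return total / TOKENS_PER_MILLION
```

For 1,000,000 input tokens of which 400,000 are cached, plus 100,000 output tokens, at prices of 2.0, 0.5, and 8.0 per million, this gives 1.2 + 0.2 + 0.8 = 2.2. Charging all input tokens at the standard rate and then adding the cached charge on top would give 3.0 instead.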

- Update the on_update hook to explicitly trigger unregister_model_pricing_with_litellm() when use_custom_pricing is toggled off
- Ensure the active worker purges the override from LiteLLM's internal registry upon save
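
The register-or-unregister behaviour on save could be sketched as below, using plain dictionaries in place of LiteLLM's in-memory registry. Everything here is an assumption for illustration: the function names, the `native_defaults` snapshot standing in for LiteLLM's bundled pricing data, and the example prices. Only the per-token key names follow LiteLLM's pricing entries.

```python
from typing import Dict

Registry = Dict[str, dict]
TOKENS_PER_MILLION = 1_000_000


def register_pricing(
    registry: Registry, model: str, input_per_1m: float, output_per_1m: float
) -> None:
    """Write a custom override, converting per-1M prices to per-token."""
    registry[model] = {
        "input_cost_per_token": input_per_1m / TOKENS_PER_MILLION,
        "output_cost_per_token": output_per_1m / TOKENS_PER_MILLION,
    }


def unregister_pricing(
    registry: Registry, native_defaults: Registry, model: str
) -> None:
    """Remove an override, restoring the native entry if one exists."""
    if model in native_defaults:
        # Natively known model (e.g. gpt-4o): restore, do not delete.
        registry[model] = dict(native_defaults[model])
    else:
        # Purely custom model: drop it from the registry altogether.
        registry.pop(model, None)


def on_model_saved(
    registry: Registry,
    native_defaults: Registry,
    model: str,
    use_custom_pricing: bool,
    input_per_1m: float = 0.0,
    output_per_1m: float = 0.0,
) -> None:
    """Mirror the described on_update flow: register when on, purge when off."""
    if use_custom_pricing:
        register_pricing(registry, model, input_per_1m, output_per_1m)
    else:
        unregister_pricing(registry, native_defaults, model)
```

Restoring from a snapshot rather than deleting the key is what keeps a natively priced model usable after its override is turned off.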

- Centralize cost calculation by replacing direct, hardcoded calls to litellm.completion_cost() with the unified calculate_cost() module
- Add logic to preferentially extract and use the cost metadata already embedded in the final chunk, eliminating redundant recalculations
- Ensure mock response fallback generation properly inherits the normalized pricing_model to prevent unmapped provider prefix errors when the LiteLLM fallback is required
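
Preferring the cost already embedded in the final chunk might look like the sketch below. The chunk shape is an assumption: the PR only states that the complete chunk carries a 'cost' key, so the `"type": "complete"` marker and the `recalculate` callback are hypothetical.

```python
from typing import Any, Callable, Iterable, Mapping, Optional


def cost_from_stream(
    chunks: Iterable[Mapping[str, Any]],
    recalculate: Callable[[], float],
) -> float:
    """Use the cost on the final chunk if present; otherwise recalculate."""
    final: Optional[Mapping[str, Any]] = None
    for chunk in chunks:
        # Assumed convention: the last chunk is marked type == "complete".
        if chunk.get("type") == "complete":
            final = chunk

    if final is not None and final.get("cost") is not None:
        # The cost was computed once on the streaming path; reuse it.
        return float(final["cost"])

    # No embedded cost: fall back to a fresh calculation.
    return recalculate()
```

Because `recalculate` is only invoked when no cost is embedded, the fallback path (and any LiteLLM lookup behind it) is skipped entirely in the common case.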

Sanjusha-tridz added 9 commits June 1, 2026 08:04

feat: add pricing fields to AI Model

669c01d

feat: add validate and on_update hooks to AIModel

d8e6b98

feat: sync custom model pricing in after_migrate

00d61a5

docs: document custom pricing system on AI Model

71c85d9

fix: trigger LiteLLM memory cleanup on disable

a3bf579

Sanjusha-tridz wants to merge 9 commits into develop from feat/ai-model-pricing
