
Conversation


Abhiraj-GetGarak commented Dec 16, 2025

Summary

  • Adds --track_usage flag to enable token usage tracking without limits
  • Adds --token_limit and --cost_limit CLI options to cap scan resource usage
  • Tracks prompt/completion token breakdown during scans
  • Estimates costs based on model pricing from model_pricing.yaml
  • Displays token usage summary at end of scan

Related Issues

  • Closes #1532 - Feature Request: Track and display token usage during scans
  • Closes #1533 - Feature Request: Budget limits to cap token usage and costs
  • Closes #1534 - Feature Request: Estimate and report scan costs

Motivation

Users running garak scans against commercial LLM APIs have no visibility into token consumption and no way to control costs. This means:

  • Unexpected expenses from extended scans (potentially hundreds of dollars)
  • No budget enforcement mechanism
  • Limited ability to test configurations safely
  • A barrier to enterprise adoption

Community feedback (Discord, April/September 2025):

  • "Does GARAK have any way to track or estimate how many tokens are used in each scan?"
  • "I'm trying to reduce the cost of running GARAK scans against commercial LLM APIs"

Usage

# Track usage without limits (report only)
python -m garak --target_type openai --target_name gpt-3.5-turbo --probes encoding --track_usage

# Limit scan to 10,000 tokens (automatically enables tracking)
python -m garak --target_type openai --target_name gpt-3.5-turbo --probes encoding --token_limit 10000

# Limit scan to $5.00 USD
python -m garak --target_type openai --target_name gpt-4o --probes dan --cost_limit 5.00

# Both limits (stops at whichever is hit first)
python -m garak --target_type openai --target_name gpt-4o --probes dan --token_limit 50000 --cost_limit 10.00

Output Example

Token Usage Summary:
  Total tokens: 1,686 (prompt: 1,424, completion: 262)
  Estimated cost: $0.0011 USD
  API calls: 2
  Token limit: 1,000

Supported Generators

Token usage tracking is implemented for:

  • OpenAI (openai.py)
  • LiteLLM (litellm.py) - supports all LiteLLM-compatible providers
  • Mistral (mistral.py)
  • Ollama (ollama.py)
  • Bedrock (bedrock.py)

Trade-offs and Limitations

1. Token Counting Accuracy

Cost estimation depends on the API returning token counts. When an API doesn't provide them, garak falls back to an estimate of roughly 4 characters per token (marked as "estimated" in the summary).

Impact: Cost estimates may be less accurate for providers that don't return token counts.
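The fallback heuristic is simple enough to sketch. This is an illustrative version only; the function name and rounding behavior are assumptions, not the exact code in this PR:

```python
# Hypothetical sketch of the ~4 chars/token fallback used when an API
# omits usage counts; names and rounding are illustrative.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for providers that don't report usage."""
    return max(1, round(len(text) / chars_per_token))

# A 40-character prompt estimates to ~10 tokens; the summary would
# flag this figure as "estimated" rather than exact.
prompt_tokens = estimate_tokens("x" * 40)
```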

2. Parallel Execution Overshoot

With --parallel_attempts, the budget may slightly overshoot the limit since multiple workers dispatch simultaneously. We use batch processing to minimize this.

Impact: If you set --token_limit 1000, actual usage might be ~1100-1200 tokens due to in-flight requests completing.
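The overshoot follows from how a shared budget has to work across workers: the counter is checked before dispatch, but requests already in flight still complete and get recorded. A minimal sketch of that pattern, assuming a `multiprocessing.Value`-backed counter (the actual `BudgetManager` internals may differ):

```python
# Illustrative sketch of a process-shared token budget; not the actual
# garak BudgetManager implementation.
from multiprocessing import Value


class SharedTokenBudget:
    def __init__(self, token_limit: int):
        self.token_limit = token_limit
        self._used = Value("l", 0)  # shared across worker processes

    def record(self, tokens: int) -> None:
        # Called after a response arrives, so in-flight requests
        # can push usage past the limit before anyone notices.
        with self._used.get_lock():
            self._used.value += tokens

    def exceeded(self) -> bool:
        # Checked before dispatching the next batch of attempts.
        return self._used.value >= self.token_limit
```

With `--parallel_attempts`, several workers can pass the `exceeded()` check simultaneously, which is exactly the ~10-20% overshoot described above; batching dispatch narrows, but cannot eliminate, that window.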

3. Pricing Data Staleness

Model prices in model_pricing.yaml may become outdated. Users should verify current rates with providers for production budgeting.

Impact: Cost estimates are approximations. The file includes an update timestamp (2025-12) for reference.

Unknown models use conservative defaults ($5/$15 per 1M tokens input/output).
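The cost arithmetic itself can be sketched as below. The function name, the pricing-dict shape, and the per-model rates are assumptions for illustration (they mirror, but are not, the `model_pricing.yaml` schema); only the $5/$15 per 1M fallback comes from the PR:

```python
# Hedged sketch of cost estimation from per-1M-token rates; the real
# model_pricing.yaml schema and lookup logic may differ.
DEFAULT_PRICING = {"input": 5.00, "output": 15.00}  # USD per 1M tokens (PR default)


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  pricing: dict) -> float:
    """Return estimated USD cost; unknown models fall back to defaults."""
    rates = pricing.get(model, DEFAULT_PRICING)
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1_000_000


# Illustrative rates; verify against current provider pricing.
pricing = {"gpt-3.5-turbo": {"input": 0.50, "output": 1.50}}
cost = estimate_cost("gpt-3.5-turbo", 1424, 262, pricing)  # ≈ $0.0011
```

With the prompt/completion split from the output example above (1,424 / 262), these illustrative rates land at roughly the $0.0011 figure shown in the summary.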

Test Plan

  • Unit tests for BudgetManager, TokenUsage, cost calculation
  • Integration tests for multiprocessing shared state
  • Manual testing with OpenAI gpt-3.5-turbo (verified 2 API calls, proper token tracking)
  • Manual testing with LiteLLM/Claude (verified budget exceeded at ~1013 tokens)
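The unit tests for limit enforcement presumably take roughly this shape; the `BudgetManager` API below is a hypothetical stand-in (only the class name and `BudgetExceededError` appear in the PR), included to show the behavior under test:

```python
# Illustrative test shape for budget enforcement; the actual BudgetManager
# API in this PR may differ.
class BudgetExceededError(Exception):
    pass


class BudgetManager:
    def __init__(self, token_limit=None, cost_limit=None):
        self.token_limit = token_limit
        self.cost_limit = cost_limit
        self.tokens_used = 0
        self.cost_used = 0.0

    def record(self, tokens: int, cost: float) -> None:
        self.tokens_used += tokens
        self.cost_used += cost

    def check(self) -> None:
        if self.token_limit is not None and self.tokens_used >= self.token_limit:
            raise BudgetExceededError(f"token limit {self.token_limit} reached")
        if self.cost_limit is not None and self.cost_used >= self.cost_limit:
            raise BudgetExceededError(f"cost limit ${self.cost_limit} reached")


def test_token_limit_enforced():
    # Mirrors the manual LiteLLM/Claude test: limit 1000, exceeded at ~1013.
    bm = BudgetManager(token_limit=1000)
    bm.record(1013, 0.0)
    try:
        bm.check()
        assert False, "expected BudgetExceededError"
    except BudgetExceededError:
        pass
```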

Implements comprehensive budget management for garak scans:
- Add --token_limit and --cost_limit CLI options
- Track prompt/completion token usage across all generators
- Estimate costs using model_pricing.yaml
- Support multiprocessing with shared state budget enforcement
- Display token usage summary at end of scan

Closes NVIDIA#1532, NVIDIA#1533, NVIDIA#1534
Remove token usage tracking from nvcf and watsonx generators as these
have not been tested and should be added in a separate PR once verified.
Abhiraj-GetGarak marked this pull request as ready for review December 16, 2025 21:15
- Remove DEFAULT_PARAMS from BudgetManager (service class, not plugin)
- Add _capture_oai_token_usage() and _capture_dict_token_usage() helpers
  to Generator base class for consistent token tracking across providers
- Refactor bedrock, litellm, mistral, ollama, openai generators to use
  new base class helpers
- Use self.budget_manager for limit values in harness instead of _config
- Return (budget_manager, exception) tuple from probewise_run/pxd_run
  instead of storing in _config.transient
- Auto-enable track_usage when cost/token limits are set in CLI
- Add :rtype: annotations to docstrings
- Move _last_usage from class variable to instance variable
- Add handle_budget_exceeded() function to command.py (was missing but called from cli.py)
- Remove redundant BudgetExceededError imports in harnesses/base.py (already imported at top)
- Remove dead code fallback to _config.transient.budget_manager in end_run()