-
-OpenTelemetry-native governance for AI systems
-
-Turn AI telemetry into actionable accountability
-
----
-
-## What is GenOps AI?
-
-GenOps AI is an **open-source governance framework** that brings cost attribution, policy enforcement, and compliance automation to AI systems using **OpenTelemetry standards**.
-
-While [OpenLLMetry](https://github.com/traceloop/openllmetry) tells you *what* your AI is doing (prompts, completions, tokens), **GenOps AI tells you *why and how*, with governance telemetry** that enables:
-
-- **Cost Attribution** across teams, projects, features, and customers
-- **Policy Enforcement** with configurable limits and content filtering
-- **Budget Tracking** with automated alerts and spend controls
-- **Compliance Automation** with evaluation metrics and audit trails
-- **Observability Integration** with your existing monitoring stack
-
-**Built on OpenTelemetry standards, works alongside OpenLLMetry and other observability tools.**
-
-## Quick Start
-
-### Installation
-
-=== "Basic"
-    ```bash
-    pip install genops
-    ```
-
-=== "With AI Providers"
-    ```bash
-    pip install "genops[openai,anthropic]"  # For OpenAI + Anthropic
-    pip install "genops[all]"               # All providers
-    ```
-
-=== "Development"
-    ```bash
-    git clone https://github.com/KoshiHQ/GenOps-AI.git
-    cd GenOps-AI
-    make dev-install  # Sets up everything including pre-commit hooks
-    ```
-
-### 30-Second Test
-
-Verify your installation works:
-
-```bash
-# Test the CLI
-genops --version
-
-# Quick Python test
-python -c "import genops; print('✅ GenOps AI installed successfully!')"
-```
-
-### 5-Minute Governance Setup
-
-```python
-from genops.providers.openai import instrument_openai
-import genops
-
-# 1. Set default attribution (once at app startup)
-genops.set_default_attributes(
- team="platform-engineering",
- project="ai-services",
- environment="production"
-)
-
-# 2. Instrument your AI providers
-client = instrument_openai(api_key="your-openai-key")
-
-# 3. Use normally - defaults inherited automatically
-response = client.chat_completions_create(
- model="gpt-3.5-turbo",
- messages=[{"role": "user", "content": "Hello!"}],
- # Only specify what's unique to this operation
- customer_id="enterprise-123",
- feature="chat-assistant"
- # team, project, environment automatically included!
-)
-
-# 4. OpenTelemetry exports complete attribution data
-# ✅ Cost, tokens, team, customer, feature → Your observability platform
-```
-
-## Key Features
-
-### **Provider Instrumentation**
-
-Automatic governance tracking for major AI providers:
-
-```python
-from genops.providers.openai import instrument_openai
-
-# Instrument OpenAI with automatic governance tracking
-client = instrument_openai(api_key="your-openai-key")
-
-# All calls now include cost, token, and governance telemetry
-response = client.chat_completions_create(
- model="gpt-4",
- messages=[{"role": "user", "content": "Hello!"}],
- # Governance attributes
- team="support-team",
- project="ai-assistant",
- customer_id="enterprise-123"
-)
-# ✅ Cost, tokens, policies automatically tracked and exported via OpenTelemetry
-```
-
-### **Policy Enforcement**
-
-Configurable governance policies with real-time enforcement:
-
-```python
-from genops.core.policy import register_policy, PolicyResult, _policy_engine
-
-# Register governance policies
-register_policy(
- name="cost_limit",
- enforcement_level=PolicyResult.BLOCKED,
- conditions={"max_cost": 5.00}
-)
-
-# Evaluate policies before operations
-def safe_ai_operation(prompt: str, estimated_cost: float):
-    # Check policy before operation
-    result = _policy_engine.evaluate_policy("cost_limit", {"cost": estimated_cost})
-
-    if result.result == PolicyResult.BLOCKED:
-        raise Exception(f"Policy violation: {result.reason}")
-
-    return call_ai_model(prompt)  # Proceeds if policy allows
-```
-
-### **Rich Governance Telemetry**
-
-Comprehensive tracking with OpenTelemetry integration:
-
-```python
-from genops.core.telemetry import GenOpsTelemetry
-
-telemetry = GenOpsTelemetry()
-
-with telemetry.trace_operation(operation_name="document_analysis") as span:
- # AI processing...
- ai_result = process_document()
-
- # Record comprehensive governance signals
- telemetry.record_cost(span, cost=2.50, currency="USD", provider="openai")
- telemetry.record_policy(span, policy_name="content_safety", result="allowed")
- telemetry.record_evaluation(span, metric_name="quality_score", score=0.92)
- telemetry.record_budget(span, budget_name="monthly_ai_spend", allocated=1000, consumed=150)
-```
-
-## Why GenOps AI?
-
-**Traditional AI monitoring tells you what happened. GenOps AI tells you what it cost, who did it, whether it should have been allowed, and how well it worked.**
-
-- **For DevOps Teams**: Integrate AI governance into existing observability workflows
-- **For FinOps Teams**: Get precise cost attribution and budget controls
-- **For Compliance Teams**: Automated policy enforcement with audit trails
-- **For Product Teams**: Feature-level AI cost analysis and optimization insights
-
-**Open source, OpenTelemetry-native, and designed to work with your existing stack.**
-
-## Next Steps
-
-
-
-- :material-clock-fast:{ .lg .middle } **Quick Start**
-
- ---
-
- Get up and running in 5 minutes with our comprehensive quick start guide.
-
- [:octicons-arrow-right-24: Quick Start](quickstart.md)
-
-- :material-book-open-page-variant:{ .lg .middle } **User Guide**
-
- ---
-
- Learn core concepts and best practices for AI governance.
-
- [:octicons-arrow-right-24: User Guide](user-guide/concepts.md)
-
-- :material-puzzle:{ .lg .middle } **Integrations**
-
- ---
-
- Connect GenOps AI with your AI providers and observability stack.
-
- [:octicons-arrow-right-24: Integrations](integrations/index.md)
-
-- :material-api:{ .lg .middle } **API Reference**
-
- ---
-
- Detailed API documentation for all GenOps AI components.
-
- [:octicons-arrow-right-24: API Reference](api/index.md)
-
-
-
-## Community
-
-We welcome contributions! GenOps AI is built by the community, for the community.
-
-- **GitHub**: [KoshiHQ/GenOps-AI](https://github.com/KoshiHQ/GenOps-AI)
-- **Discussions**: [GitHub Discussions](https://github.com/KoshiHQ/GenOps-AI/discussions)
-- **Issues**: [Report bugs or request features](https://github.com/KoshiHQ/GenOps-AI/issues)
-
----
-
-*Ready to bring governance to your AI systems?*
-
-```bash
-pip install genops
-```
\ No newline at end of file
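
The attribution flow in the removed quick start above (defaults set once at startup, per-call attributes layered on top) can be illustrated with a small standalone sketch. This is not GenOps code: `DEFAULT_ATTRIBUTES`, `set_default_attributes`, and `effective_attributes` here are hypothetical stand-ins that only show the merge order, assuming per-call values override defaults.

```python
from typing import Any

# Hypothetical module-level store; the real library may keep defaults elsewhere.
DEFAULT_ATTRIBUTES: dict[str, Any] = {}


def set_default_attributes(**attributes: Any) -> None:
    """Record attributes that every later operation should inherit."""
    DEFAULT_ATTRIBUTES.update(attributes)


def effective_attributes(**per_call: Any) -> dict[str, Any]:
    """Merge defaults with per-call attributes; per-call values win."""
    return {**DEFAULT_ATTRIBUTES, **per_call}


# Set defaults once, then add only what is unique to a single operation.
set_default_attributes(
    team="platform-engineering",
    project="ai-services",
    environment="production",
)
attrs = effective_attributes(customer_id="enterprise-123", feature="chat-assistant")
print(attrs["team"], attrs["customer_id"])
```

Returning a fresh dictionary from the merge keeps the shared defaults untouched by any single call, which is what attribution at the team and project level presumably relies on.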
diff --git a/docs/integrations/anthropic.md b/docs/integrations/anthropic.md
deleted file mode 100644
index ad54c1a..0000000
--- a/docs/integrations/anthropic.md
+++ /dev/null
@@ -1,706 +0,0 @@
-# Anthropic Integration Guide
-
-## Overview
-
-The GenOps Anthropic adapter provides comprehensive governance telemetry for Claude applications, including:
-
-- **Message completion tracking** with detailed cost and performance metrics
-- **Multi-model cost optimization** across Claude 3 variants (Haiku, Sonnet, Opus)
-- **Token usage analytics** for cost forecasting and optimization
-- **Conversation tracking** for multi-turn dialog systems
-- **Policy enforcement** with governance attribute propagation
-
-## Quick Start
-
-### Installation
-
-```bash
-pip install "genops-ai[anthropic]"  # Quoted so shells such as zsh do not expand the brackets
-```
-
-### Basic Setup
-
-The simplest way to add GenOps tracking to your Anthropic application:
-
-```python
-from genops.providers.anthropic import instrument_anthropic
-
-# Initialize GenOps Anthropic adapter
-client = instrument_anthropic(api_key="your_anthropic_key")
-
-# Your existing Anthropic code works unchanged
-response = client.messages_create(
- model="claude-3-5-sonnet-20241022",
- max_tokens=300,
- messages=[{"role": "user", "content": "Explain machine learning"}],
- team="ai-research",
- project="claude-assistant",
- customer_id="customer_123"
-)
-```
-
-### Auto-Instrumentation (Recommended)
-
-For zero-code setup, enable auto-instrumentation:
-
-```python
-from genops import auto_instrument
-
-# Automatically instrument all supported providers
-auto_instrument()
-
-# Your Anthropic code automatically gets governance telemetry
-from anthropic import Anthropic
-client = Anthropic()
-response = client.messages.create(
- model="claude-3-5-haiku-20241022",
- max_tokens=200,
- messages=[{"role": "user", "content": "Your query here"}]
-) # Automatically tracked!
-```
-
-## Core Features
-
-### 1. Message Completion Tracking
-
-Track Claude messages with detailed telemetry:
-
-```python
-from genops.providers.anthropic import instrument_anthropic
-
-client = instrument_anthropic()
-
-# Track message with governance attributes
-response = client.messages_create(
- model="claude-3-5-sonnet-20241022",
- max_tokens=1000,
- messages=[
- {"role": "user", "content": "Analyze this business strategy document and provide insights"}
- ],
-
- # Governance attributes for cost attribution
- team="strategy-team",
- project="business-analysis",
- environment="production",
- customer_id="enterprise_customer_789",
-
- # Claude parameters
- temperature=0.7,
- top_p=0.9,
- top_k=40
-)
-```
-
-**Telemetry Captured:**
-- Request/response timing and latency
-- Token usage (input, output) by Claude model
-- Exact cost calculation using current Anthropic pricing
-- Success/error rates and error categorization
-- Governance attribute propagation
-
-### 2. Multi-Model Intelligence and Cost Optimization
-
-Intelligent model selection across Claude 3 variants:
-
-```python
-def smart_claude_completion(prompt: str, complexity: str = "balanced"):
- """Choose optimal Claude model based on task complexity."""
-
- model_configs = {
- "simple": {
- "model": "claude-3-haiku-20240307",
- "max_tokens": 200,
- "temperature": 0.3,
- "cost_per_1m_input": 0.25,
- "cost_per_1m_output": 1.25,
- "use_case": "Simple Q&A, basic text processing"
- },
- "balanced": {
- "model": "claude-3-5-haiku-20241022",
- "max_tokens": 500,
- "temperature": 0.5,
- "cost_per_1m_input": 1.00,
- "cost_per_1m_output": 5.00,
- "use_case": "General tasks, moderate complexity"
- },
- "advanced": {
- "model": "claude-3-5-sonnet-20241022",
- "max_tokens": 1000,
- "temperature": 0.7,
- "cost_per_1m_input": 3.00,
- "cost_per_1m_output": 15.00,
- "use_case": "Complex reasoning, analysis, coding"
- },
- "expert": {
- "model": "claude-3-opus-20240229",
- "max_tokens": 1500,
- "temperature": 0.8,
- "cost_per_1m_input": 15.00,
- "cost_per_1m_output": 75.00,
- "use_case": "Highest quality, creative tasks"
- }
- }
-
- config = model_configs.get(complexity, model_configs["balanced"])
-
- response = client.messages_create(
- model=config["model"],
- max_tokens=config["max_tokens"],
- temperature=config["temperature"],
- messages=[{"role": "user", "content": prompt}],
-
- # Cost attribution and optimization tracking
- team="optimization-team",
- project="smart-routing",
- complexity_level=complexity,
- estimated_cost_per_1m=config["cost_per_1m_input"],
- use_case=config["use_case"]
- )
-
- return response.content[0].text
-```
-
-### 3. Multi-Turn Conversations
-
-Handle conversational flows with comprehensive tracking:
-
-```python
-from genops import track
-
-def conversational_agent(conversation_history: list, customer_id: str):
-    """Handle multi-turn conversations with detailed cost tracking."""
-
-    with track("conversation_session",
-               customer_id=customer_id,
-               team="customer-support") as span:
-
-        response = client.messages_create(
-            model="claude-3-5-sonnet-20241022",
-            max_tokens=600,
-            messages=conversation_history,
-
-            # Conversation-specific attributes
-            team="customer-support",
-            customer_id=customer_id,
-            conversation_turn=len(conversation_history),
-            conversation_type="support_chat"
-        )
-
-        # Track conversation metrics
-        total_chars = sum(len(msg.get("content", "")) for msg in conversation_history)
-        span.set_attribute("conversation_turns", len(conversation_history))
-        span.set_attribute("total_conversation_chars", total_chars)
-        span.set_attribute("customer_tier", "enterprise")  # Dynamic customer data
-
-        return response.content[0].text
-```
-
-### 4. Document Analysis and Processing
-
-Specialized patterns for document analysis:
-
-```python
-def analyze_legal_document(document_text: str, analysis_type: str):
-    """Analyze legal documents with specialized prompts."""
-
-    analysis_prompts = {
-        "contract_review": "Review this contract for key terms, obligations, and potential risks:",
-        "compliance_check": "Check this document for regulatory compliance issues:",
-        "summary": "Provide a concise executive summary of this legal document:",
-        "risk_assessment": "Identify and assess legal risks in this document:"
-    }
-
-    system_prompt = analysis_prompts.get(analysis_type, analysis_prompts["summary"])
-
-    response = client.messages_create(
-        model="claude-3-5-sonnet-20241022",  # Best for complex analysis
-        max_tokens=2000,
-        # Anthropic's Messages API takes the system prompt as a top-level
-        # parameter, not as a "system" role inside `messages`
-        system=system_prompt,
-        messages=[{"role": "user", "content": document_text}],
-
-        # Legal analysis specific attributes
-        team="legal-team",
-        project="document-analysis",
-        analysis_type=analysis_type,
-        document_length=len(document_text),
-        requires_expertise="legal"
-    )
-
-    return response.content[0].text
-```
-
-### 5. Code Generation and Review
-
-Track coding assistance with detailed metrics:
-
-```python
-def code_assistant(code_request: str, language: str = "python"):
-    """Generate or review code with Claude."""
-
-    system_prompts = {
-        "python": "You are an expert Python developer. Write clean, efficient, well-documented code.",
-        "javascript": "You are an expert JavaScript developer. Follow modern ES6+ standards.",
-        "sql": "You are a database expert. Write efficient, secure SQL queries.",
-        "review": "You are a senior code reviewer. Provide constructive feedback on code quality."
-    }
-
-    response = client.messages_create(
-        model="claude-3-5-sonnet-20241022",  # Best for coding
-        max_tokens=1500,
-        # System prompt goes in the top-level `system` parameter
-        system=system_prompts.get(language, system_prompts["python"]),
-        messages=[{"role": "user", "content": code_request}],
-
-        # Code-specific attributes
-        team="engineering-team",
-        project="ai-coding-assistant",
-        programming_language=language,
-        task_type="code_generation",
-        complexity="intermediate"
-    )
-
-    return response.content[0].text
-```
-
-## Integration Patterns
-
-### Pattern 1: Decorator-Based Instrumentation
-
-```python
-from genops.decorators import track_anthropic
-
-@track_anthropic(
-    team="research-team",
-    project="academic-writing"
-)
-def generate_research_summary(papers: list, topic: str) -> str:
-    combined_content = "\n\n".join(papers)
-
-    response = client.messages_create(
-        model="claude-3-5-sonnet-20241022",
-        max_tokens=1200,
-        # System prompt goes in the top-level `system` parameter
-        system="Synthesize research papers into a comprehensive summary",
-        messages=[
-            {"role": "user", "content": f"Topic: {topic}\n\nPapers:\n{combined_content}"}
-        ]
-    )
-    return response.content[0].text
-
-# Automatic telemetry on every call
-summary = generate_research_summary(paper_list, "AI Ethics")
-```
-
-### Pattern 2: Context Manager Pattern
-
-```python
-from genops import track
-
-def multi_step_content_creation(brief: str, customer_id: str):
-    """Create content through multiple Claude interactions."""
-
-    with track(f"content_creation_{customer_id}",
-               customer_id=customer_id,
-               team="content-marketing") as span:
-
-        # Step 1: Outline creation
-        outline = client.messages_create(
-            model="claude-3-5-haiku-20241022",  # Fast for outlining
-            max_tokens=300,
-            messages=[{"role": "user", "content": f"Create an outline for: {brief}"}]
-        )
-
-        # Step 2: Content expansion
-        content = client.messages_create(
-            model="claude-3-5-sonnet-20241022",  # Better for detailed content
-            max_tokens=1500,
-            messages=[
-                {"role": "user", "content": f"Write detailed content based on: {outline.content[0].text}"}
-            ]
-        )
-
-        # Step 3: SEO optimization
-        seo_content = client.messages_create(
-            model="claude-3-5-haiku-20241022",  # Cost-effective for optimization
-            max_tokens=800,
-            messages=[
-                {"role": "user", "content": f"Optimize for SEO: {content.content[0].text}"}
-            ]
-        )
-
-        span.set_attribute("content_creation_steps", 3)
-        span.set_attribute("total_tokens_estimated", 2600)
-
-        return seo_content.content[0].text
-```
-
-### Pattern 3: Policy Enforcement
-
-```python
-from genops.core.policy import enforce_policy
-
-@enforce_policy("content_safety")
-def process_user_content(user_input: str, user_id: str):
-    """Process user content with safety checks."""
-
-    return client.messages_create(
-        model="claude-3-5-sonnet-20241022",
-        max_tokens=500,
-        # System prompt goes in the top-level `system` parameter
-        system="Review and moderate user content for safety",
-        messages=[{"role": "user", "content": user_input}],
-        user_id=user_id,
-        team="content-moderation",
-        safety_check=True
-    )
-```
-
-## Configuration
-
-### Environment Variables
-
-```bash
-# Anthropic configuration
-export ANTHROPIC_API_KEY="your_anthropic_key"
-
-# OpenTelemetry configuration
-export OTEL_SERVICE_NAME="my-claude-app"
-export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
-
-# GenOps Anthropic configuration
-export GENOPS_ANTHROPIC_AUTO_INSTRUMENT=true
-export GENOPS_ANTHROPIC_COST_TRACKING=true
-export GENOPS_ANTHROPIC_MAX_RETRIES=3
-```
-
-### Programmatic Configuration
-
-```python
-from genops.providers.anthropic import configure_anthropic_adapter
-
-configure_anthropic_adapter({
- "auto_instrument": True,
- "cost_tracking": {
- "enabled": True,
- "include_system_messages": True,
- "track_conversation_context": True
- },
- "telemetry": {
- "service_name": "my-claude-service",
- "attributes": {
- "deployment.environment": "production",
- "service.version": "1.0.0"
- }
- },
- "model_defaults": {
- "temperature": 0.7,
- "max_tokens": 1000,
- "top_p": 0.9
- }
-})
-```
-
-## Advanced Features
-
-### Streaming Responses
-
-```python
-def streaming_claude_response(prompt: str):
-    """Handle streaming responses from Claude."""
-
-    stream = client.messages.create(
-        model="claude-3-5-sonnet-20241022",
-        max_tokens=1000,
-        messages=[{"role": "user", "content": prompt}],
-        stream=True,
-
-        # Governance attributes
-        team="streaming-team",
-        project="real-time-chat",
-        streaming=True
-    )
-
-    full_response = ""
-    for event in stream:
-        if event.type == "content_block_delta":
-            content = event.delta.text
-            full_response += content
-            print(content, end="", flush=True)
-
-    return full_response
-```
-
-### System Message Optimization
-
-```python
-def optimized_system_prompts(task_type: str, user_query: str):
-    """Use optimized system prompts for different tasks."""
-
-    system_prompts = {
-        "analysis": """You are an expert analyst. Provide thorough, structured analysis with:
-        1. Executive summary
-        2. Key findings
-        3. Detailed analysis
-        4. Recommendations
-        Be concise but comprehensive.""",
-
-        "creative": """You are a creative writing expert. Focus on:
-        - Engaging storytelling
-        - Vivid imagery
-        - Compelling characters
-        - Original ideas
-        Let creativity flow while maintaining quality.""",
-
-        "technical": """You are a technical expert. Provide:
-        - Accurate technical information
-        - Step-by-step explanations
-        - Best practices
-        - Practical examples
-        Be precise and actionable."""
-    }
-
-    system_prompt = system_prompts.get(task_type, "You are a helpful assistant.")
-
-    response = client.messages_create(
-        model="claude-3-5-sonnet-20241022",
-        max_tokens=1200,
-        # Anthropic's Messages API takes the system prompt as a top-level
-        # parameter, not as a "system" role inside `messages`
-        system=system_prompt,
-        messages=[{"role": "user", "content": user_query}],
-
-        # System prompt optimization tracking
-        team="prompt-optimization",
-        task_type=task_type,
-        system_prompt_version="v2.1",
-        optimization_strategy="task_specific"
-    )
-
-    return response.content[0].text
-```
-
-### Batch Processing Optimization
-
-```python
-from genops import track
-
-def batch_process_documents(documents: list, operation: str, customer_id: str):
-    """Process multiple documents efficiently with cost optimization."""
-
-    # Choose model based on operation complexity
-    model_map = {
-        "summarize": "claude-3-5-haiku-20241022",    # Fast and cost-effective
-        "analyze": "claude-3-5-sonnet-20241022",     # Balanced capability/cost
-        "detailed_review": "claude-3-opus-20240229"  # Highest quality
-    }
-
-    model = model_map.get(operation, "claude-3-5-haiku-20241022")
-
-    results = []
-
-    with track(f"batch_{operation}_{customer_id}",
-               customer_id=customer_id,
-               team="document-processing") as span:
-
-        for i, document in enumerate(documents):
-            response = client.messages_create(
-                model=model,
-                max_tokens=500 if operation == "summarize" else 1000,
-                # System prompt goes in the top-level `system` parameter
-                system=f"Please {operation} this document",
-                messages=[{"role": "user", "content": document}],
-
-                # Individual document tracking
-                team="document-processing",
-                customer_id=customer_id,
-                document_index=i,
-                batch_operation=operation,
-                batch_size=len(documents)
-            )
-
-            results.append(response.content[0].text)
-
-        # Batch-level metrics
-        span.set_attribute("documents_processed", len(documents))
-        span.set_attribute("operation_type", operation)
-        span.set_attribute("model_used", model)
-
-    return results
-```
-
-## Troubleshooting
-
-### Common Issues
-
-#### Issue: "Anthropic API key not found"
-```python
-# Solution: Verify API key setup
-import os
-print("API key set:", bool(os.getenv("ANTHROPIC_API_KEY")))
-
-# Check key format
-key = os.getenv("ANTHROPIC_API_KEY")
-if key:
- print("Correct format:", key.startswith("sk-ant-"))
-
-# Or set programmatically
-from genops.providers.anthropic import instrument_anthropic
-client = instrument_anthropic(api_key="your_key_here")
-```
-
-#### Issue: Cost tracking not working
-```python
-# Check if cost calculation is enabled
-from genops.providers.anthropic import validate_setup, print_validation_result
-
-result = validate_setup()
-print_validation_result(result)
-
-# Enable debug logging
-import logging
-logging.getLogger("genops.providers.anthropic").setLevel(logging.DEBUG)
-```
-
-#### Issue: Model not available errors
-```python
-# Use current Claude model names
-models = {
- "fastest": "claude-3-haiku-20240307",
- "balanced": "claude-3-5-haiku-20241022",
- "advanced": "claude-3-5-sonnet-20241022",
- "expert": "claude-3-opus-20240229"
-}
-
-# Always check Anthropic docs for latest model names
-response = client.messages_create(
- model=models["balanced"], # Use mapped model names
- max_tokens=500,
- messages=[{"role": "user", "content": "Hello Claude"}]
-)
-```
-
-### Debug Mode
-
-Enable comprehensive debug logging:
-
-```python
-import logging
-
-# Enable GenOps debug logging
-logging.getLogger("genops").setLevel(logging.DEBUG)
-
-# Enable Anthropic adapter debug logging
-logging.getLogger("genops.providers.anthropic").setLevel(logging.DEBUG)
-
-# Enable OpenTelemetry debug logging
-logging.getLogger("opentelemetry").setLevel(logging.DEBUG)
-```
-
-### Validation Utilities
-
-Verify your setup is working correctly:
-
-```python
-from genops.providers.anthropic import validate_setup, print_validation_result
-
-# Run comprehensive setup validation
-validation_result = validate_setup()
-print_validation_result(validation_result)
-
-if validation_result.is_valid:
-    print("✅ GenOps Anthropic setup is valid!")
-else:
-    print("❌ Setup issues found:")
-    for issue in validation_result.issues:
-        if issue.level == "error":
-            print(f"  - ERROR: {issue.message}")
-            if issue.fix_suggestion:
-                print(f"    Fix: {issue.fix_suggestion}")
-```
-
-## Performance Considerations
-
-### Best Practices
-
-1. **Choose appropriate Claude models** based on task complexity and cost sensitivity
-2. **Use system messages effectively** to provide context and reduce prompt repetition
-3. **Implement streaming** for long responses to improve user experience
-4. **Batch similar operations** to reduce API overhead
-
-### Performance Tuning
-
-```python
-from genops.providers.anthropic import configure_performance
-
-configure_performance({
- "connection_pool_size": 8,
- "request_timeout": 60, # Claude can take longer than OpenAI
- "max_retries": 3,
- "retry_delay": 1.0,
- "stream_timeout": 120,
- "async_export": True
-})
-```
-
-## Cost Management
-
-### Claude Model Cost Comparison
-
-| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
-|-------|----------------------|------------------------|----------|
-| Claude 3 Haiku | $0.25 | $1.25 | Simple tasks, high volume |
-| Claude 3.5 Haiku | $1.00 | $5.00 | General purpose, speed |
-| Claude 3.5 Sonnet | $3.00 | $15.00 | Complex reasoning, analysis |
-| Claude 3 Opus | $15.00 | $75.00 | Highest quality, creative tasks |
-
-### Cost Optimization Strategies
-
-```python
-def cost_aware_completion(prompt: str, max_cost: float = 0.50):
-    """Choose Claude model based on cost constraints."""
-
-    estimated_tokens = len(prompt.split()) * 1.3
-    output_tokens = 500  # Estimated
-
-    # (model, input cost per token, output cost per token), cheapest first
-    models = [
-        ("claude-3-haiku-20240307", 0.25/1000000, 1.25/1000000),
-        ("claude-3-5-haiku-20241022", 1.00/1000000, 5.00/1000000),
-        ("claude-3-5-sonnet-20241022", 3.00/1000000, 15.00/1000000),
-        ("claude-3-opus-20240229", 15.00/1000000, 75.00/1000000)
-    ]
-
-    for model, input_cost, output_cost in models:
-        estimated_cost = (estimated_tokens * input_cost) + (output_tokens * output_cost)
-
-        if estimated_cost <= max_cost:
-            response = client.messages_create(
-                model=model,
-                max_tokens=output_tokens,
-                messages=[{"role": "user", "content": prompt}],
-
-                # Cost tracking
-                team="cost-optimization",
-                estimated_cost=estimated_cost,
-                max_budget=max_cost,
-                model_selection="cost_optimized"
-            )
-            return response.content[0].text
-
-    raise ValueError(f"No Claude model available within budget of ${max_cost}")
-```
-
-## Next Steps
-
-- Explore the [complete examples](../examples/anthropic/) for advanced patterns
-- Check out [governance scenarios](../examples/governance_scenarios/) for policy enforcement
-- Review [observability integration](../observability/) for dashboard setup
-- See [API reference](../api/anthropic.md) for detailed method documentation
-
-## Support
-
-- **Issues:** [GitHub Issues](https://github.com/genops-ai/genops-ai/issues)
-- **Discussions:** [GitHub Discussions](https://github.com/genops-ai/genops-ai/discussions)
-- **Documentation:** [Full Documentation](https://docs.genops.ai)
-- **Anthropic Docs:** [Claude API Documentation](https://docs.anthropic.com/claude/reference/)
\ No newline at end of file
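
The cost comparison table and `cost_aware_completion` in the removed Anthropic guide above rest on simple per-token arithmetic, which can be sketched without any SDK. The prices below are copied from that table and may be out of date; `PRICES_PER_1M`, `estimate_cost`, and `cheapest_model_within` are hypothetical names for illustration, not part of GenOps or the Anthropic SDK.

```python
from typing import Optional

# USD per 1M tokens as (input, output), taken from the guide's comparison table.
# Assumption: these figures are illustrative; check current pricing before use.
PRICES_PER_1M = {
    "claude-3-haiku-20240307": (0.25, 1.25),
    "claude-3-5-haiku-20241022": (1.00, 5.00),
    "claude-3-5-sonnet-20241022": (3.00, 15.00),
    "claude-3-opus-20240229": (15.00, 75.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


def cheapest_model_within(max_cost: float, input_tokens: int, output_tokens: int) -> Optional[str]:
    """Return the cheapest model whose estimate fits the budget, or None."""
    ranked = sorted(
        PRICES_PER_1M,
        key=lambda name: estimate_cost(name, input_tokens, output_tokens),
    )
    for name in ranked:
        if estimate_cost(name, input_tokens, output_tokens) <= max_cost:
            return name
    return None


print(estimate_cost("claude-3-5-sonnet-20241022", 1_000_000, 1_000_000))
print(cheapest_model_within(0.50, input_tokens=1_000, output_tokens=500))
```

Keeping the estimate separate from the API call makes the budget check testable on its own. A variant that prefers the most capable model within budget would iterate over `ranked` in reverse.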
diff --git a/docs/integrations/anyscale.md b/docs/integrations/anyscale.md
deleted file mode 100644
index 53603de..0000000
--- a/docs/integrations/anyscale.md
+++ /dev/null
@@ -1,1880 +0,0 @@
-# Anyscale Endpoints Integration Guide
-
-Comprehensive guide for integrating Anyscale Endpoints with GenOps AI governance and telemetry.
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Installation & Setup](#installation--setup)
-- [Integration Patterns](#integration-patterns)
-- [Multi-Model Support](#multi-model-support)
-- [Cost Intelligence](#cost-intelligence)
-- [Enterprise Governance](#enterprise-governance)
-- [Production Deployment](#production-deployment)
-- [Performance Optimization](#performance-optimization)
-- [Observability Integration](#observability-integration)
-- [Advanced Use Cases](#advanced-use-cases)
-- [Troubleshooting](#troubleshooting)
-- [API Reference](#api-reference)
-
-## Overview
-
-GenOps provides comprehensive Anyscale Endpoints integration with:
-
-- **Multi-model support**: Llama-2, Llama-3, Mistral, CodeLlama, and embedding models
-- **Real-time cost tracking**: Token-level precision with client-side cost calculation
-- **Enterprise governance**: Team, project, and customer-level cost attribution
-- **Zero-code instrumentation**: Works with existing OpenAI SDK applications unchanged
-- **OpenTelemetry native**: Exports to any OTLP-compatible observability platform
-- **Cost optimization**: Model recommendations and alternative suggestions
-
-### Architecture Overview
-
-```
-Application Code
-        ↓
-GenOps Anyscale Adapter
-        ↓
-Anyscale Endpoints API       ← OpenAI-compatible interface
-        ↓
-OpenTelemetry Pipeline       ← Rich governance telemetry
-        ↓
-Your Observability Platform  ← Datadog, Grafana, Honeycomb, etc.
-```
-
-### Why Anyscale + GenOps?
-
-**Anyscale Endpoints** provides managed LLM inference with:
-- Production-scale infrastructure
-- OpenAI-compatible API for easy migration
-- Competitive pricing (often 50%+ cheaper than alternatives)
-- High availability and reliability
-
-**GenOps adds a governance layer**:
-- Per-customer cost attribution for billing
-- Team and project-level budget tracking
-- Real-time cost optimization recommendations
-- Compliance and audit trails via OpenTelemetry
-
-## Installation & Setup
-
-### Quick Installation
-
-```bash
-# Core installation
-pip install genops-ai
-
-# Verify installation
-python -c "from genops.providers.anyscale import instrument_anyscale; print('✅ GenOps Anyscale provider installed')"
-```
-
-### Anyscale API Key Setup
-
-GenOps requires an Anyscale API key to access Endpoints:
-
-```bash
-# Get your API key from: https://console.anyscale.com/credentials
-
-# Set environment variable
-export ANYSCALE_API_KEY='your-api-key-here'
-
-# Verify it's set
-echo $ANYSCALE_API_KEY
-```
-
-### Environment Configuration
-
-```bash
-# Required
-export ANYSCALE_API_KEY="your-api-key-here"
-export ANYSCALE_BASE_URL="https://api.endpoints.anyscale.com/v1" # Optional, this is the default
-
-# OpenTelemetry configuration
-export OTEL_SERVICE_NAME="anyscale-ai-application"
-export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
-
-# GenOps configuration
-export GENOPS_ENVIRONMENT="production"
-export GENOPS_PROJECT="anyscale-ai-project"
-export GENOPS_TEAM="ml-engineering"
-
-# Performance tuning (optional)
-export GENOPS_SAMPLING_RATE="1.0" # Full sampling (0.0-1.0)
-export GENOPS_ASYNC_EXPORT="true" # Non-blocking telemetry
-export GENOPS_DEBUG="false" # Debug logging
-```
-
-### Setup Validation
-
-```python
-from genops.providers.anyscale import validate_setup, print_validation_result
-
-result = validate_setup()
-print_validation_result(result)
-
-if result.success:
-    print("✅ Ready to start using GenOps with Anyscale!")
-else:
-    print("❌ Please resolve the issues above before continuing")
-```
-
-**Expected validation output:**
-
-```
-┌─────────────────────────────────────────────────────────┐
-│ Anyscale Setup Validation                               │
-├─────────────────────────────────────────────────────────┤
-│ ✅ Dependencies: All required packages installed        │
-│ ✅ Configuration: ANYSCALE_API_KEY set                  │
-│ ✅ Connectivity: Anyscale API reachable                 │
-│ ✅ Models: 12+ models available                         │
-│ ✅ Pricing: Complete pricing database loaded            │
-├─────────────────────────────────────────────────────────┤
-│ Status: PASSED (Score: 100/100)                         │
-└─────────────────────────────────────────────────────────┘
-```
-
-## Integration Patterns
-
-### 1. Zero-Code Auto-Instrumentation
-
-**Automatically instrument existing OpenAI SDK applications with zero code changes:**
-
-```python
-import os
-from genops.providers.anyscale import auto_instrument
-
-# Enable automatic instrumentation with default governance attributes
-auto_instrument(
- team="ml-research",
- project="chatbot",
- environment="production"
-)
-
-# Your existing OpenAI SDK code now automatically tracked!
-import openai
-
-client = openai.OpenAI(
- api_key=os.getenv("ANYSCALE_API_KEY"),
- base_url="https://api.endpoints.anyscale.com/v1"
-)
-
-response = client.chat.completions.create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[
- {"role": "user", "content": "What is the capital of France?"}
- ],
- # Governance attributes automatically added
- customer_id="acme-corp" # Per-request governance override
-)
-
-# Cost, tokens, and governance automatically tracked and exported via OpenTelemetry
-```
-
-**Benefits:**
-- Zero refactoring required
-- Existing applications work unchanged
-- Governance attributes propagate automatically
-- Full OpenTelemetry tracing with cost attribution
-
-### 2. Manual Adapter Integration
-
-**Full control over instrumentation with governance attributes:**
-
-```python
-from genops.providers.anyscale import instrument_anyscale
-
-# Create adapter with default governance attributes
-adapter = instrument_anyscale(
- team="ml-engineering",
- project="customer-support-bot",
- environment="production",
- cost_center="Engineering"
-)
-
-# Make a completion request with per-request governance
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[
- {"role": "system", "content": "You are a helpful assistant."},
- {"role": "user", "content": "Analyze this customer feedback..."}
- ],
- temperature=0.7,
- max_tokens=500,
-
- # Per-request governance attributes (override defaults)
- customer_id="customer-789",
- feature="feedback-analysis"
-)
-
-# Response includes usage and governance metadata
-print(f"Response: {response['choices'][0]['message']['content']}")
-print(f"Tokens used: {response['usage']['total_tokens']}")
-
-# Calculate cost
-from genops.providers.anyscale import calculate_completion_cost
-cost = calculate_completion_cost(
- model="meta-llama/Llama-2-70b-chat-hf",
- input_tokens=response['usage']['prompt_tokens'],
- output_tokens=response['usage']['completion_tokens']
-)
-print(f"💰 Cost: ${cost:.6f}")
-```
-
-### 3. Context Manager Pattern
-
-**Multi-operation workflows with unified governance:**
-
-```python
-from genops.providers.anyscale import instrument_anyscale
-
-adapter = instrument_anyscale(
-    team="data-science",
-    project="analytics-pipeline"
-)
-
-# Placeholders for this example: supply your own document and routing rule
-document_text = "..."
-
-def needs_extraction(classification) -> bool:
-    """Decide from the classification response whether entities are needed."""
-    return True
-
-# Context manager for workflow-level governance
-with adapter.governance_context(
-    customer_id="enterprise-client",
-    feature="document-processing",
-    workflow_id="doc-proc-12345"
-) as context:
-
-    # Step 1: Classify document
-    classification = adapter.completion_create(
-        model="meta-llama/Llama-2-7b-chat-hf",  # Cheaper model for classification
-        messages=[{"role": "user", "content": f"Classify: {document_text[:100]}"}],
-        max_tokens=50
-    )
-
-    # Step 2: Extract entities (if needed)
-    if needs_extraction(classification):
-        entities = adapter.completion_create(
-            model="meta-llama/Llama-2-70b-chat-hf",  # More powerful model
-            messages=[{"role": "user", "content": f"Extract entities: {document_text}"}],
-            max_tokens=300
-        )
-
-    # Step 3: Summarize
-    summary = adapter.completion_create(
-        model="mistralai/Mistral-7B-Instruct-v0.1",
-        messages=[{"role": "user", "content": f"Summarize: {document_text}"}],
-        max_tokens=200
-    )
-
-# All operations automatically attributed to customer, feature, and workflow
-# Total cost aggregated and exported to observability platform
-```
-
-## Multi-Model Support
-
-### Supported Models
-
-The GenOps Anyscale integration supports 12+ models in two categories, chat completion and embedding:
-
-#### Chat Completion Models
-
-**Llama-2 Series:**
-```python
-models = [
- "meta-llama/Llama-2-70b-chat-hf", # $1.00/M tokens
- "meta-llama/Llama-2-13b-chat-hf", # $0.25/M tokens
- "meta-llama/Llama-2-7b-chat-hf", # $0.15/M tokens
-]
-```
-
-**Llama-3 Series:**
-```python
-models = [
- "meta-llama/Meta-Llama-3-70B-Instruct", # $1.00/M tokens
- "meta-llama/Meta-Llama-3-8B-Instruct", # $0.15/M tokens
-]
-```
-
-**Mistral Series:**
-```python
-models = [
- "mistralai/Mixtral-8x7B-Instruct-v0.1", # $0.50/M tokens
- "mistralai/Mistral-7B-Instruct-v0.1", # $0.15/M tokens
- "mistralai/Mistral-7B-Instruct-v0.2", # $0.15/M tokens
-]
-```
-
-**CodeLlama Series:**
-```python
-models = [
- "codellama/CodeLlama-70b-Instruct-hf", # $1.00/M tokens
- "codellama/CodeLlama-34b-Instruct-hf", # $0.80/M tokens
-]
-```
-
-#### Embedding Models
-
-```python
-embedding_models = [
- "thenlper/gte-large", # $0.05/M tokens
- "BAAI/bge-large-en-v1.5", # $0.05/M tokens
-]
-```
-
-### Model Comparison and Selection
-
-```python
-from genops.providers.anyscale import AnyscalePricing
-
-pricing = AnyscalePricing()
-
-# Get pricing for specific model
-model_pricing = pricing.get_model_pricing("meta-llama/Llama-2-70b-chat-hf")
-print(f"Model: {model_pricing.model_name}")
-print(f"Input cost: ${model_pricing.input_cost_per_million}/M tokens")
-print(f"Output cost: ${model_pricing.output_cost_per_million}/M tokens")
-print(f"Context window: {model_pricing.context_window} tokens")
-
-# Get cost-effective alternatives
-alternatives = pricing.get_model_alternatives("meta-llama/Llama-2-70b-chat-hf")
-print("\n💡 Cost-effective alternatives:")
-for model, cost_ratio, description in alternatives:
-    print(f"  {model}: {description}")
-
-# Output:
-# meta-llama/Llama-2-13b-chat-hf: 75% cheaper, good for most tasks
-# meta-llama/Llama-2-7b-chat-hf: 85% cheaper, best for simple tasks
-# mistralai/Mistral-7B-Instruct-v0.1: 85% cheaper, alternative architecture
-```
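
The percentages in the sample output follow from the per-million-token prices in the model lists above. A minimal sketch of that arithmetic, assuming a single blended price per model; `PRICES` and `savings_vs` are hypothetical names used for illustration and are not part of GenOps:

```python
# Hypothetical price table (USD per million tokens), copied from the model lists above
PRICES = {
    "meta-llama/Llama-2-70b-chat-hf": 1.00,
    "meta-llama/Llama-2-13b-chat-hf": 0.25,
    "meta-llama/Llama-2-7b-chat-hf": 0.15,
    "mistralai/Mistral-7B-Instruct-v0.1": 0.15,
}


def savings_vs(baseline: str, candidate: str) -> float:
    """Return the fractional saving of `candidate` relative to `baseline`."""
    return 1.0 - PRICES[candidate] / PRICES[baseline]


baseline = "meta-llama/Llama-2-70b-chat-hf"
for name in PRICES:
    if name != baseline:
        print(f"{name}: {savings_vs(baseline, name):.0%} cheaper")
```

The saving is relative to the baseline price only; it says nothing about output quality, which still has to be evaluated per task.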
-
-### Multi-Model Workflows
-
-```python
-from genops.providers.anyscale import instrument_anyscale
-
-# Route by task complexity
-def select_model(task_complexity: str) -> str:
-    """Cost-optimized model selection."""
-    if task_complexity == "simple":
-        return "meta-llama/Llama-2-7b-chat-hf"  # $0.15/M
-    elif task_complexity == "medium":
-        return "meta-llama/Llama-2-13b-chat-hf"  # $0.25/M
-    elif task_complexity == "complex":
-        return "meta-llama/Llama-2-70b-chat-hf"  # $1.00/M
-    else:
-        return "mistralai/Mistral-7B-Instruct-v0.1"  # Default
-
-# Example: Adaptive model selection
-adapter = instrument_anyscale(team="optimization-team")
-
-# user_queries and estimate_complexity() are application-specific placeholders;
-# each query is assumed to expose .text and .customer_id
-for query in user_queries:
-    complexity = estimate_complexity(query.text)
-    model = select_model(complexity)
-
-    response = adapter.completion_create(
-        model=model,
-        messages=[{"role": "user", "content": query.text}],
-        customer_id=query.customer_id
-    )
-
-    # Cost automatically tracked per customer and model
-```
-
-## Cost Intelligence
-
-### Real-Time Cost Tracking
-
-```python
-from genops.providers.anyscale import instrument_anyscale, calculate_completion_cost
-
-adapter = instrument_anyscale(
- team="finance-ai",
- project="cost-monitoring"
-)
-
-# Make request
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[{"role": "user", "content": "Analyze quarterly revenue..."}],
- max_tokens=500
-)
-
-# Calculate cost
-cost = calculate_completion_cost(
- model="meta-llama/Llama-2-70b-chat-hf",
- input_tokens=response['usage']['prompt_tokens'],
- output_tokens=response['usage']['completion_tokens']
-)
-
-print("📊 Token Usage:")
-print(f"  Input: {response['usage']['prompt_tokens']} tokens")
-print(f"  Output: {response['usage']['completion_tokens']} tokens")
-print(f"  Total: {response['usage']['total_tokens']} tokens")
-print(f"💰 Cost: ${cost:.6f}")
-```
-
-### Cost Attribution
-
-**Team-Level Attribution:**
-```python
-# All costs automatically attributed to team
-adapter = instrument_anyscale(team="data-science-team")
-
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[{"role": "user", "content": "..."}]
-)
-
-# OpenTelemetry span includes: genops.team="data-science-team"
-```
-
-**Project-Level Attribution:**
-```python
-adapter = instrument_anyscale(
- team="ml-engineering",
- project="customer-support-bot"
-)
-
-# Costs attributed to project
-response = adapter.completion_create(...)
-
-# OpenTelemetry span includes:
-# genops.team="ml-engineering"
-# genops.project="customer-support-bot"
-```
-
-**Customer-Level Attribution:**
-```python
-adapter = instrument_anyscale(team="saas-platform")
-
-# Per-customer cost tracking for billing
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[...],
- customer_id="enterprise-client-123"
-)
-
-# Query your observability platform to aggregate costs per customer:
-# SUM(genops.anyscale.cost.total) WHERE genops.customer_id="enterprise-client-123"
-```
-
-### Cost Optimization Strategies
-
-**1. Model Selection by Task:**
-```python
-# Use cheaper models for simple tasks
-simple_tasks = ["classification", "routing", "validation"]
-complex_tasks = ["analysis", "generation", "reasoning"]
-
-model = (
- "meta-llama/Llama-2-7b-chat-hf" if task in simple_tasks
- else "meta-llama/Llama-2-70b-chat-hf"
-)
-
-# Potential savings: 85% for simple tasks
-```
-
-**2. Max Tokens Optimization:**
-```python
-# Cap max_tokens so short answers cannot run long
-response = adapter.completion_create(
-    model="meta-llama/Llama-2-70b-chat-hf",
-    messages=[{"role": "user", "content": "Yes or no: ..."}],
-    max_tokens=10  # Output tokens are billed as generated; this bounds them
-)
-```
-
-**3. Batch Processing:**
-```python
-# Process multiple items in single request
-batch_prompt = "Classify each of the following:\n" + "\n".join(items)
-
-response = adapter.completion_create(
- model="meta-llama/Llama-2-13b-chat-hf",
- messages=[{"role": "user", "content": batch_prompt}],
- max_tokens=len(items) * 50
-)
-
-# Cost per item reduced by sharing prompt overhead
-```
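
To make the "shared prompt overhead" claim concrete, here is a rough sketch of the arithmetic. The token counts are illustrative assumptions, not measurements, and `cost_usd` is a hypothetical helper that applies one blended per-million price (the $0.25/M figure listed above for Llama-2-13b):

```python
def cost_usd(input_tokens: int, output_tokens: int, price_per_million: float) -> float:
    """Cost at a single blended price per million tokens."""
    return (input_tokens + output_tokens) * price_per_million / 1_000_000


PRICE = 0.25                 # USD per million tokens, from the model list
INSTRUCTION = 200            # assumed tokens of shared instructions per request
ITEM_IN, ITEM_OUT = 40, 50   # assumed input/output tokens per item
N = 20                       # number of items

# One request per item: the instructions are paid for N times
individual = N * cost_usd(INSTRUCTION + ITEM_IN, ITEM_OUT, PRICE)

# One batched request: the instructions are paid for once
batched = cost_usd(INSTRUCTION + N * ITEM_IN, N * ITEM_OUT, PRICE)

print(f"individual: ${individual:.6f}, batched: ${batched:.6f}")
```

The saving grows with the ratio of shared instructions to per-item content. Very large batches can hit the model's context window or degrade answer quality, so the batch size is worth tuning.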
-
-**4. Caching Strategy:**
-```python
-from functools import lru_cache
-
-@lru_cache(maxsize=1000)
-def cached_completion(prompt: str, model: str):
-    """Cache identical (prompt, model) pairs to avoid redundant API calls."""
-    return adapter.completion_create(
-        model=model,
-        messages=[{"role": "user", "content": prompt}]
-    )
-
-# Use cache: lru_cache keys on the arguments, so the prompt itself is the key
-prompt = "What is the capital of France?"
-result = cached_completion(prompt, "meta-llama/Llama-2-7b-chat-hf")
-```
-
-## Enterprise Governance
-
-### Multi-Tenant Cost Attribution
-
-```python
-from genops.providers.anyscale import instrument_anyscale
-
-# SaaS application with multiple customers
-adapter = instrument_anyscale(
-    team="saas-platform",
-    project="ai-features",
-    environment="production"
-)
-
-def process_customer_request(customer_id: str, request_data: dict):
-    """Process customer request with cost attribution."""
-
-    response = adapter.completion_create(
-        model="meta-llama/Llama-2-70b-chat-hf",
-        messages=request_data['messages'],
-
-        # Governance attributes for billing
-        customer_id=customer_id,
-        feature=request_data.get('feature', 'chat'),
-        cost_center="Product-AI"
-    )
-
-    # Cost automatically attributed to customer
-    # Query observability platform for monthly billing:
-    # SUM(cost) WHERE customer_id="..." AND month="2026-01"
-
-    return response
-
-# Process requests from different customers
-process_customer_request("customer-A", {...})
-process_customer_request("customer-B", {...})
-process_customer_request("customer-C", {...})
-
-# Each customer's costs tracked separately in OpenTelemetry
-```
-
-### Budget Enforcement
-
-```python
-# Track spending against budget
-from datetime import datetime
-
-from genops.providers.anyscale import instrument_anyscale
-
-class BudgetExceededError(Exception):
-    """Raised when a customer has used up their monthly budget."""
-
-class BudgetEnforcer:
-    def __init__(self, monthly_budget_usd: float):
-        self.monthly_budget = monthly_budget_usd
-
-    def check_budget(self, customer_id: str) -> bool:
-        """Check if customer has budget remaining."""
-        # Resolve the month on each call so long-running processes roll over
-        current_month = datetime.now().strftime("%Y-%m")
-        # Query your observability platform for current month spend
-        current_spend = self.get_customer_spend(customer_id, current_month)
-        return current_spend < self.monthly_budget
-
-    def get_customer_spend(self, customer_id: str, month: str) -> float:
-        """Query observability platform for customer spend."""
-        # Example: Query Datadog, Grafana, or Honeycomb
-        # Implement based on your observability platform
-        raise NotImplementedError("Connect this to your observability backend")
-
-# Usage
-adapter = instrument_anyscale(team="saas-platform")
-budget_enforcer = BudgetEnforcer(monthly_budget_usd=100.0)
-
-def process_with_budget_check(customer_id: str, messages: list):
-    """Process request with budget enforcement."""
-
-    if not budget_enforcer.check_budget(customer_id):
-        raise BudgetExceededError(
-            f"Customer {customer_id} has exceeded monthly budget"
-        )
-
-    return adapter.completion_create(
-        model="meta-llama/Llama-2-70b-chat-hf",
-        messages=messages,
-        customer_id=customer_id
-    )
-```
-
-### Compliance and Audit Trails
-
-```python
-# All operations automatically generate audit trails via OpenTelemetry
-
-adapter = instrument_anyscale(
- team="healthcare-ai",
- project="patient-analysis",
- environment="production",
- cost_center="Healthcare-IT"
-)
-
-# Request tracking with audit attributes (e.g. to support HIPAA audit requirements)
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[
- {"role": "user", "content": "Analyze patient symptoms..."}
- ],
-
- # Audit trail attributes
- customer_id="hospital-123",
- feature="symptom-analysis",
- request_id="req-abc-123",
- user_id="doctor-456"
-)
-
-# OpenTelemetry span includes complete audit trail:
-# - timestamp
-# - team, project, environment
-# - customer_id, user_id, request_id
-# - model, tokens, cost
-# - latency, success/failure
-# - All governance attributes
-
-# Query your observability platform for compliance reports:
-# - All operations by customer
-# - All operations by user
-# - Cost attribution by cost center
-# - Performance SLAs by environment
-```
-
-### Access Control Integration
-
-```python
-# Integrate with existing access control systems
-
-from genops.providers.anyscale import instrument_anyscale
-
-class AccessControlAdapter:
-    def __init__(self, adapter):
-        self.adapter = adapter
-        # Maps user_id -> set of allowed model names;
-        # load from your access control system
-        self.permissions: dict[str, set[str]] = {}
-
-    def check_model_access(self, user_id: str, model: str) -> bool:
-        """Check if user has permission to use model."""
-        allowed_models = self.permissions.get(user_id, set())
-        return model in allowed_models
-
-    def completion_create(self, user_id: str, model: str, **kwargs):
-        """Completion with access control check."""
-
-        if not self.check_model_access(user_id, model):
-            raise PermissionError(
-                f"User {user_id} not authorized to use {model}"
-            )
-
-        return self.adapter.completion_create(
-            model=model,
-            user_id=user_id,  # Include in governance attributes
-            **kwargs
-        )
-
-# Usage
-adapter = instrument_anyscale(team="enterprise")
-access_controlled_adapter = AccessControlAdapter(adapter)
-
-try:
-    response = access_controlled_adapter.completion_create(
-        user_id="employee-789",
-        model="meta-llama/Llama-2-70b-chat-hf",
-        messages=[...]
-    )
-except PermissionError as e:
-    print(f"Access denied: {e}")
-```
-
-## Production Deployment
-
-### High-Availability Configuration
-
-```python
-from genops.providers.anyscale import instrument_anyscale
-from tenacity import retry, stop_after_attempt, wait_exponential
-
-# Production adapter with retry logic
-@retry(
-    stop=stop_after_attempt(3),
-    wait=wait_exponential(multiplier=1, min=1, max=10),
-    reraise=True  # Surface the original exception instead of tenacity.RetryError
-)
-def resilient_completion(adapter, **kwargs):
-    """Completion with automatic retry on failure (up to 3 attempts)."""
-    return adapter.completion_create(**kwargs)
-
-# Initialize adapter
-adapter = instrument_anyscale(
-    team="production-team",
-    project="customer-facing-app",
-    environment="production"
-)
-
-# Use in production
-try:
-    response = resilient_completion(
-        adapter,
-        model="meta-llama/Llama-2-70b-chat-hf",
-        messages=[{"role": "user", "content": "..."}],
-        customer_id="customer-123"
-    )
-except Exception as e:
-    # Log error and fallback
-    print(f"Failed after 3 attempts: {e}")
-    # Implement fallback logic
-```
-
-### Load Balancing and Rate Limiting
-
-```python
-import asyncio
-
-from genops.providers.anyscale import instrument_anyscale
-
-class RateLimitedAdapter:
-    def __init__(self, adapter, max_concurrent: int = 10):
-        self.adapter = adapter
-        self.semaphore = asyncio.Semaphore(max_concurrent)
-
-    async def completion_create(self, **kwargs):
-        """Concurrency-limited completion."""
-        async with self.semaphore:
-            # The adapter is synchronous, so run it in a worker thread to keep
-            # the event loop free; at most max_concurrent calls run at once
-            return await asyncio.to_thread(
-                self.adapter.completion_create, **kwargs
-            )
-
-# Usage
-adapter = instrument_anyscale(team="high-volume-app")
-rate_limited = RateLimitedAdapter(adapter, max_concurrent=10)
-
-# Process high-volume requests
-async def process_batch(requests):
-    tasks = [
-        rate_limited.completion_create(**req)
-        for req in requests
-    ]
-    return await asyncio.gather(*tasks)
-```
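
A semaphore bounds how many requests are in flight, not how many are started per second. Where a provider enforces a requests-per-second quota, a token bucket is one common complement. The sketch below uses only the standard library; `TokenBucket` is a hypothetical helper rather than a GenOps class, and the rate and capacity values are arbitrary:

```python
import asyncio
import time


class TokenBucket:
    """Allow at most `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Refill in proportion to elapsed time, capped at capacity
                self.tokens = min(
                    self.capacity, self.tokens + (now - self.updated) * self.rate
                )
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Wait until roughly one token has accumulated, then re-check
                await asyncio.sleep((1 - self.tokens) / self.rate)


async def demo() -> float:
    bucket = TokenBucket(rate=50, capacity=5)
    start = time.monotonic()
    for _ in range(10):
        await bucket.acquire()  # call this before each completion request
    return time.monotonic() - start


elapsed = asyncio.run(demo())
print(f"10 acquisitions took {elapsed:.2f}s")
```

The first five acquisitions drain the initial burst immediately; the remaining five wait for refills at 50 tokens per second. In practice the bucket would be acquired inside the semaphore-guarded method, so both limits apply.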
-
-### Monitoring and Alerting
-
-```python
-# Configure OpenTelemetry metrics for alerting
-
-from opentelemetry import metrics
-from opentelemetry.sdk.metrics import MeterProvider
-from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
-from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
-
-from genops.providers.anyscale import instrument_anyscale, calculate_completion_cost
-
-# Setup metrics pipeline
-metric_reader = PeriodicExportingMetricReader(
-    OTLPMetricExporter(endpoint="http://localhost:4317")
-)
-provider = MeterProvider(metric_readers=[metric_reader])
-metrics.set_meter_provider(provider)
-
-meter = metrics.get_meter("anyscale.monitoring")
-
-# Create custom metrics
-request_counter = meter.create_counter(
-    "anyscale.requests.total",
-    description="Total Anyscale API requests"
-)
-
-error_counter = meter.create_counter(
-    "anyscale.errors.total",
-    description="Total Anyscale API errors"
-)
-
-# A counter can be updated per request; an observable gauge is instead
-# read through callbacks and has no set() method
-cost_counter = meter.create_counter(
-    "anyscale.cost.total",
-    unit="USD",
-    description="Cumulative Anyscale cost"
-)
-
-# Use in application
-adapter = instrument_anyscale(team="monitored-app")
-
-def monitored_completion(**kwargs):
-    """Completion with custom metrics."""
-    model = kwargs.get("model", "unknown")
-    request_counter.add(1, {"model": model})
-
-    try:
-        response = adapter.completion_create(**kwargs)
-
-        # Record cost metric
-        cost = calculate_completion_cost(
-            model=model,
-            input_tokens=response['usage']['prompt_tokens'],
-            output_tokens=response['usage']['completion_tokens']
-        )
-        cost_counter.add(cost, {"model": model})
-
-        return response
-
-    except Exception as e:
-        error_counter.add(1, {"error_type": type(e).__name__})
-        raise
-
-# Configure alerts in your observability platform:
-# - Alert when anyscale.errors.total > 10 in 5 minutes
-# - Alert when anyscale.cost.total exceeds budget_threshold
-# - Alert when p99 latency > 5 seconds
-```
-
-### Disaster Recovery
-
-```python
-# Implement fallback to alternative providers
-
-from genops.providers.anyscale import instrument_anyscale
-
-class MultiProviderAdapter:
-    def __init__(self):
-        self.anyscale_adapter = instrument_anyscale(team="multi-provider")
-        self.fallback_available = self._check_fallback()
-
-    def _check_fallback(self) -> bool:
-        """Check if fallback provider is available."""
-        try:
-            # Check OpenAI, Replicate, or other fallback
-            return True
-        except Exception:
-            return False
-
-    def completion_create(self, **kwargs):
-        """Completion with automatic fallback."""
-        try:
-            return self.anyscale_adapter.completion_create(**kwargs)
-        except Exception as e:
-            print(f"Anyscale failed: {e}")
-
-            if not self.fallback_available:
-                raise
-
-            print("Falling back to alternative provider...")
-            return self._fallback_completion(**kwargs)
-
-    def _fallback_completion(self, **kwargs):
-        """Fallback completion implementation."""
-        # Implement OpenAI or other provider fallback
-        raise NotImplementedError("No fallback provider configured")
-
-# Usage
-adapter = MultiProviderAdapter()
-response = adapter.completion_create(...)  # Automatic fallback on failure
-```
-
-## Performance Optimization
-
-### Telemetry Sampling
-
-```python
-# Reduce overhead in high-volume scenarios
-
-adapter = instrument_anyscale(
- team="high-volume-app",
- project="production-api",
-
- # Sample 10% of requests for telemetry
- sampling_rate=0.1
-)
-
-# 90% of requests skip detailed telemetry, reducing overhead
-# 10% of requests include full governance tracking
-```
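
How `sampling_rate` is applied inside the adapter is not shown here. As an illustration of ratio-based sampling in general, the sketch below makes a deterministic keep/drop decision per request ID, so every span belonging to one request shares the same outcome. `should_sample` is a hypothetical helper, not a GenOps function:

```python
import hashlib


def should_sample(request_id: str, sampling_rate: float) -> bool:
    """Deterministically keep roughly `sampling_rate` of all request IDs."""
    digest = hashlib.sha256(request_id.encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sampling_rate


kept = sum(should_sample(f"req-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 requests")
```

If costs are recorded only on sampled requests, totals derived from that telemetry would need to be scaled by `1 / sampling_rate` to estimate actual spend, which matters for billing-grade attribution.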
-
-### Async Operations
-
-```python
-import asyncio
-from typing import List
-
-from genops.providers.anyscale import instrument_anyscale
-
-async def async_batch_processing(prompts: List[str]):
-    """Process multiple prompts concurrently."""
-
-    adapter = instrument_anyscale(team="async-team")
-
-    async def process_single(prompt: str):
-        # The adapter is synchronous, so each call runs in a worker thread;
-        # calling it directly here would block the event loop
-        return await asyncio.to_thread(
-            adapter.completion_create,
-            model="meta-llama/Llama-2-7b-chat-hf",
-            messages=[{"role": "user", "content": prompt}]
-        )
-
-    # Process all prompts concurrently
-    tasks = [process_single(prompt) for prompt in prompts]
-    results = await asyncio.gather(*tasks)
-
-    return results
-
-# Usage
-prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]
-results = asyncio.run(async_batch_processing(prompts))
-```
-
-### Caching and Memoization
-
-```python
-import hashlib
-import json
-from collections import OrderedDict
-
-from genops.providers.anyscale import instrument_anyscale
-
-class CachedAnyscaleAdapter:
-    def __init__(self, adapter, cache_size: int = 1000):
-        self.adapter = adapter
-        self.cache_size = cache_size
-        self._cache = OrderedDict()  # request hash -> response, oldest first
-
-    def _hash_request(self, model: str, messages: list, **kwargs) -> str:
-        """Create hash of request parameters."""
-        request_dict = {
-            "model": model,
-            "messages": messages,
-            **kwargs
-        }
-        request_str = json.dumps(request_dict, sort_keys=True)
-        return hashlib.md5(request_str.encode()).hexdigest()
-
-    def completion_create(self, model: str, messages: list, **kwargs):
-        """Completion with LRU caching to avoid redundant API calls."""
-        request_hash = self._hash_request(model, messages, **kwargs)
-
-        if request_hash in self._cache:
-            self._cache.move_to_end(request_hash)  # Mark as recently used
-            return self._cache[request_hash]
-
-        response = self.adapter.completion_create(
-            model=model,
-            messages=messages,
-            **kwargs
-        )
-
-        self._cache[request_hash] = response
-        if len(self._cache) > self.cache_size:
-            self._cache.popitem(last=False)  # Evict least recently used
-        return response
-
-# Usage
-adapter = instrument_anyscale(team="cached-app")
-cached_adapter = CachedAnyscaleAdapter(adapter, cache_size=1000)
-
-# Identical requests return cached results
-response1 = cached_adapter.completion_create(
-    model="meta-llama/Llama-2-7b-chat-hf",
-    messages=[{"role": "user", "content": "What is 2+2?"}]
-)
-
-response2 = cached_adapter.completion_create(
-    model="meta-llama/Llama-2-7b-chat-hf",
-    messages=[{"role": "user", "content": "What is 2+2?"}]
-)
-
-# response2 returned from cache, no API call made
-```
-
-### Connection Pooling
-
-```python
-# Reuse HTTP connections for better performance
-
-import requests
-from requests.adapters import HTTPAdapter
-from urllib3.util.retry import Retry
-
-def create_resilient_session():
-    """Create HTTP session with connection pooling and retries."""
-    session = requests.Session()
-
-    # Retry skips POST by default; pass allowed_methods to include it
-    retry_strategy = Retry(
-        total=3,
-        backoff_factor=1,
-        status_forcelist=[429, 500, 502, 503, 504]
-    )
-
-    adapter = HTTPAdapter(
-        max_retries=retry_strategy,
-        pool_connections=10,
-        pool_maxsize=20
-    )
-
-    session.mount("https://", adapter)
-    session.mount("http://", adapter)
-
-    return session
-
-# Use custom session in production
-# (Note: Adapter would need to be modified to accept custom session)
-```
-
-## Observability Integration
-
-### Datadog Integration
-
-```python
-# Export Anyscale telemetry to Datadog
-import os
-
-from opentelemetry import trace
-from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
-from opentelemetry.sdk.trace import TracerProvider
-from opentelemetry.sdk.trace.export import BatchSpanProcessor
-
-# Configure Datadog OTLP endpoint
-provider = TracerProvider()
-processor = BatchSpanProcessor(
-    OTLPSpanExporter(
-        endpoint="http://localhost:4317",  # Datadog Agent OTLP endpoint
-        headers={
-            "DD-API-KEY": os.getenv("DD_API_KEY")
-        }
-    )
-)
-provider.add_span_processor(processor)
-trace.set_tracer_provider(provider)  # Register globally so spans reach the exporter
-
-# Use adapter - telemetry automatically exported to Datadog
-adapter = instrument_anyscale(
- team="datadog-integration",
- project="production-app"
-)
-
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[...]
-)
-
-# Query in Datadog:
-# - Trace search: service:anyscale-ai-application
-# - Metrics: genops.anyscale.cost.total
-# - Logs: genops.team:datadog-integration
-```
-
-### Grafana / Prometheus Integration
-
-```python
-# Export metrics to Prometheus
-
-import time
-
-from prometheus_client import Counter, Histogram
-from prometheus_client import start_http_server
-
-from genops.providers.anyscale import instrument_anyscale, calculate_completion_cost
-
-# Define metrics
-# Note: a customer_id label creates one time series per customer;
-# drop it if the number of customers is large
-anyscale_requests = Counter(
-    'anyscale_requests_total',
-    'Total Anyscale API requests',
-    ['model', 'team', 'customer_id']
-)
-
-# Cost accumulates across requests, so it is a Counter rather than a Gauge
-# (a Gauge set per request would only ever show the most recent cost)
-anyscale_cost = Counter(
-    'anyscale_cost_usd_total',
-    'Cumulative Anyscale cost in USD',
-    ['model', 'customer_id']
-)
-
-anyscale_latency = Histogram(
-    'anyscale_latency_seconds',
-    'Anyscale request latency',
-    ['model']
-)
-
-# Start Prometheus metrics server
-start_http_server(8000)
-
-# Instrument adapter
-adapter = instrument_anyscale(team="prometheus-integration")
-
-def monitored_completion(**kwargs):
-    """Completion with Prometheus metrics."""
-    model = kwargs.get("model")
-    customer_id = kwargs.get("customer_id", "unknown")
-
-    anyscale_requests.labels(
-        model=model,
-        team="prometheus-integration",
-        customer_id=customer_id
-    ).inc()
-
-    start_time = time.time()
-
-    response = adapter.completion_create(**kwargs)
-
-    latency = time.time() - start_time
-    anyscale_latency.labels(model=model).observe(latency)
-
-    cost = calculate_completion_cost(
-        model=model,
-        input_tokens=response['usage']['prompt_tokens'],
-        output_tokens=response['usage']['completion_tokens']
-    )
-    anyscale_cost.labels(model=model, customer_id=customer_id).inc(cost)
-
-    return response
-
-# Metrics available at: http://localhost:8000/metrics
-# Add Prometheus as a Grafana data source for visualization
-```
-
-### Honeycomb Integration
-
-```python
-# Export to Honeycomb for observability
-import os
-
-from opentelemetry import trace
-from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
-from opentelemetry.sdk.trace import TracerProvider
-from opentelemetry.sdk.trace.export import BatchSpanProcessor
-
-# Configure Honeycomb
-provider = TracerProvider()
-processor = BatchSpanProcessor(
-    OTLPSpanExporter(
-        endpoint="https://api.honeycomb.io/v1/traces",
-        headers={
-            "x-honeycomb-team": os.getenv("HONEYCOMB_API_KEY"),
-            "x-honeycomb-dataset": "anyscale-telemetry"
-        }
-    )
-)
-provider.add_span_processor(processor)
-trace.set_tracer_provider(provider)  # Register globally so spans reach the exporter
-
-# Use adapter
-adapter = instrument_anyscale(team="honeycomb-team")
-
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[...],
- customer_id="customer-123"
-)
-
-# Query in Honeycomb:
-# - Traces with genops.anyscale.* attributes
-# - Cost analysis by customer: SUM(genops.anyscale.cost.total) GROUP BY genops.customer_id
-# - Latency p99 by model: P99(duration_ms) GROUP BY genops.anyscale.model
-```
-
-## Advanced Use Cases
-
-### Multi-Model Router
-
-```python
-# Intelligent routing based on task complexity and cost
-
-from genops.providers.anyscale import instrument_anyscale
-
-class IntelligentRouter:
-    def __init__(self, adapter):
-        self.adapter = adapter
-        self.model_tiers = {
-            "simple": "meta-llama/Llama-2-7b-chat-hf",
-            "medium": "meta-llama/Llama-2-13b-chat-hf",
-            "complex": "meta-llama/Llama-2-70b-chat-hf",
-        }
-
-    def estimate_complexity(self, prompt: str) -> str:
-        """Estimate task complexity."""
-        # Simple length heuristic - replace with an ML model in production
-        if len(prompt) < 100:
-            return "simple"
-        elif len(prompt) < 500:
-            return "medium"
-        else:
-            return "complex"
-
-    def route_completion(self, messages: list, **kwargs):
-        """Route to appropriate model based on complexity."""
-        # Judge complexity from the most recent message
-        prompt = messages[-1]['content'] if messages else ""
-        complexity = self.estimate_complexity(prompt)
-        model = self.model_tiers[complexity]
-
-        print(f"Routing to {complexity} tier: {model}")
-
-        return self.adapter.completion_create(
-            model=model,
-            messages=messages,
-            **kwargs
-        )
-
-# Usage
-adapter = instrument_anyscale(team="intelligent-routing")
-router = IntelligentRouter(adapter)
-
-# Routed to the cheapest tier that matches the estimated complexity
-response = router.route_completion(
-    messages=[{"role": "user", "content": "What is 2+2?"}],
-    customer_id="customer-123"
-)
-```
-
-### A/B Testing Framework
-
-```python
-# A/B test different models for performance and cost
-
-import random
-import time
-from typing import Dict
-
-from genops.providers.anyscale import instrument_anyscale, calculate_completion_cost
-
-class ABTestingAdapter:
-    def __init__(self, adapter):
-        self.adapter = adapter
-        self.experiments = {}
-        self.results = []
-
-    def create_experiment(
-        self,
-        name: str,
-        variants: Dict[str, str],
-        traffic_split: Dict[str, float]
-    ):
-        """Create A/B test experiment."""
-        self.experiments[name] = {
-            "variants": variants,
-            "traffic_split": traffic_split
-        }
-
-    def select_variant(self, experiment_name: str) -> str:
-        """Select a variant at random, weighted by the traffic split."""
-        split = self.experiments[experiment_name]["traffic_split"]
-        return random.choices(
-            list(split.keys()), weights=list(split.values()), k=1
-        )[0]
-
-    def experimental_completion(self, experiment_name: str, messages: list, **kwargs):
-        """Run completion as part of A/B test."""
-        variant = self.select_variant(experiment_name)
-        model = self.experiments[experiment_name]["variants"][variant]
-
-        start_time = time.time()
-
-        response = self.adapter.completion_create(
-            model=model,
-            messages=messages,
-            experiment_name=experiment_name,
-            variant=variant,
-            **kwargs
-        )
-
-        latency = time.time() - start_time
-
-        # Record results
-        self.results.append({
-            "experiment": experiment_name,
-            "variant": variant,
-            "model": model,
-            "latency": latency,
-            "tokens": response['usage']['total_tokens'],
-            "cost": calculate_completion_cost(
-                model=model,
-                input_tokens=response['usage']['prompt_tokens'],
-                output_tokens=response['usage']['completion_tokens']
-            )
-        })
-
-        return response
-
-    def analyze_results(self, experiment_name: str):
-        """Analyze A/B test results."""
-        exp_results = [r for r in self.results if r["experiment"] == experiment_name]
-
-        by_variant = {}
-        for result in exp_results:
-            variant = result["variant"]
-            if variant not in by_variant:
-                by_variant[variant] = {"latency": [], "cost": []}
-
-            by_variant[variant]["latency"].append(result["latency"])
-            by_variant[variant]["cost"].append(result["cost"])
-
-        # Calculate averages
-        for variant, data in by_variant.items():
-            avg_latency = sum(data["latency"]) / len(data["latency"])
-            avg_cost = sum(data["cost"]) / len(data["cost"])
-
-            print(f"\n{variant}:")
-            print(f"  Average latency: {avg_latency:.3f}s")
-            print(f"  Average cost: ${avg_cost:.6f}")
-
-# Usage
-adapter = instrument_anyscale(team="ab-testing")
-ab_adapter = ABTestingAdapter(adapter)
-
-# Create experiment: Llama-2-70B vs Llama-2-13B
-ab_adapter.create_experiment(
-    name="model_comparison",
-    variants={
-        "control": "meta-llama/Llama-2-70b-chat-hf",
-        "variant_a": "meta-llama/Llama-2-13b-chat-hf"
-    },
-    traffic_split={
-        "control": 0.5,
-        "variant_a": 0.5
-    }
-)
-
-# Run experiment
-for i in range(100):
-    response = ab_adapter.experimental_completion(
-        experiment_name="model_comparison",
-        messages=[{"role": "user", "content": f"Query {i}"}]
-    )
-
-# Analyze results
-ab_adapter.analyze_results("model_comparison")
-```
-
-### Cost Budgeting and Alerts
-
-```python
-# Implement cost budgets with real-time alerts
-
-from datetime import datetime
-from typing import Optional
-
-from genops.providers.anyscale import instrument_anyscale, calculate_completion_cost
-
-class BudgetExceededError(Exception):
-    """Raised when a request would exceed the daily or monthly budget."""
-
-class BudgetManager:
-    def __init__(
-        self,
-        adapter,
-        daily_budget_usd: float,
-        monthly_budget_usd: float
-    ):
-        self.adapter = adapter
-        self.daily_budget = daily_budget_usd
-        self.monthly_budget = monthly_budget_usd
-        self.daily_spend = 0.0
-        self.monthly_spend = 0.0
-        today = datetime.now().date()
-        self.last_reset_date = today
-        self.last_reset_month = (today.year, today.month)
-        self.alert_thresholds = [0.5, 0.75, 0.9, 1.0]  # 50%, 75%, 90%, 100%
-        self.alerts_sent = set()
-
-    def check_and_reset(self):
-        """Reset daily and monthly spend when the period rolls over."""
-        today = datetime.now().date()
-        if today > self.last_reset_date:
-            self.daily_spend = 0.0
-            self.last_reset_date = today
-            # Clear only daily alerts so monthly alerts are not re-sent
-            self.alerts_sent = {
-                key for key in self.alerts_sent if not key.startswith("daily_")
-            }
-        if (today.year, today.month) != self.last_reset_month:
-            self.monthly_spend = 0.0
-            self.last_reset_month = (today.year, today.month)
-            self.alerts_sent = {
-                key for key in self.alerts_sent if not key.startswith("monthly_")
-            }
-
-    def check_budget(self, estimated_cost: float) -> tuple[bool, Optional[str]]:
-        """Check if request would exceed budget."""
-        self.check_and_reset()
-
-        new_daily = self.daily_spend + estimated_cost
-        new_monthly = self.monthly_spend + estimated_cost
-
-        if new_daily > self.daily_budget:
-            return False, f"Would exceed daily budget: ${new_daily:.2f} > ${self.daily_budget:.2f}"
-
-        if new_monthly > self.monthly_budget:
-            return False, f"Would exceed monthly budget: ${new_monthly:.2f} > ${self.monthly_budget:.2f}"
-
-        return True, None
-
-    def send_alert(self, threshold: float, budget_type: str):
-        """Send budget alert."""
-        alert_key = f"{budget_type}_{threshold}"
-        if alert_key not in self.alerts_sent:
-            percentage = int(threshold * 100)
-            print(f"🚨 ALERT: {percentage}% of {budget_type} budget consumed")
-            self.alerts_sent.add(alert_key)
-            # Implement actual alerting: email, Slack, PagerDuty, etc.
-
-    def check_alert_thresholds(self):
-        """Check if alert thresholds reached."""
-        daily_pct = self.daily_spend / self.daily_budget
-        monthly_pct = self.monthly_spend / self.monthly_budget
-
-        for threshold in self.alert_thresholds:
-            if daily_pct >= threshold:
-                self.send_alert(threshold, "daily")
-            if monthly_pct >= threshold:
-                self.send_alert(threshold, "monthly")
-
-    def completion_create(self, model: str, messages: list, **kwargs):
-        """Completion with budget enforcement."""
-        # Estimate cost before making request
-        # Rough estimate: about 1.3 tokens per whitespace-separated word
-        prompt_tokens = sum(len(m['content'].split()) for m in messages) * 1.3
-        estimated_output_tokens = kwargs.get('max_tokens', 500)
-
-        estimated_cost = calculate_completion_cost(
-            model=model,
-            input_tokens=int(prompt_tokens),
-            output_tokens=estimated_output_tokens
-        )
-
-        # Check budget
-        allowed, reason = self.check_budget(estimated_cost)
-        if not allowed:
-            raise BudgetExceededError(reason)
-
-        # Make request
-        response = self.adapter.completion_create(
-            model=model,
-            messages=messages,
-            **kwargs
-        )
-
-        # Record actual cost
-        actual_cost = calculate_completion_cost(
-            model=model,
-            input_tokens=response['usage']['prompt_tokens'],
-            output_tokens=response['usage']['completion_tokens']
-        )
-
-        self.daily_spend += actual_cost
-        self.monthly_spend += actual_cost
-
-        # Check alert thresholds
-        self.check_alert_thresholds()
-
-        return response
-
-# Usage
-adapter = instrument_anyscale(team="budget-controlled")
-budget_manager = BudgetManager(
-    adapter,
-    daily_budget_usd=10.0,
-    monthly_budget_usd=200.0
-)
-
-# Use with budget enforcement
-try:
-    response = budget_manager.completion_create(
-        model="meta-llama/Llama-2-70b-chat-hf",
-        messages=[{"role": "user", "content": "..."}]
-    )
-except BudgetExceededError as e:
-    print(f"Budget limit reached: {e}")
-```
-
-## Troubleshooting
-
-### Common Issues
-
-#### Issue: "ANYSCALE_API_KEY not set"
-
-**Symptom:**
-```
-ValidationError: ANYSCALE_API_KEY environment variable not set
-```
-
-**Fix:**
-```bash
-# Set API key
-export ANYSCALE_API_KEY='your-api-key-here'
-
-# Verify
-echo $ANYSCALE_API_KEY
-
-# Permanent fix (add to ~/.bashrc or ~/.zshrc)
-echo 'export ANYSCALE_API_KEY="your-api-key-here"' >> ~/.bashrc
-source ~/.bashrc
-```
-
-#### Issue: "Authentication Failed"
-
-**Symptom:**
-```
-AuthenticationError: Invalid API key
-```
-
-**Fix:**
-1. Verify API key at: https://console.anyscale.com/credentials
-2. Check for extra spaces when copying
-3. Ensure key hasn't expired
-4. Create new API key if needed
-
-```bash
-# Test API key manually
-curl -H "Authorization: Bearer $ANYSCALE_API_KEY" \
- https://api.endpoints.anyscale.com/v1/models
-```
-
-#### Issue: "Model not found"
-
-**Symptom:**
-```
-ModelNotFoundError: Model 'meta-llama/Llama-2-70b' not available
-```
-
-**Fix:**
-```python
-# List available models
-from genops.providers.anyscale import ANYSCALE_PRICING
-
-print("Available models:")
-for model in ANYSCALE_PRICING.keys():
- print(f" - {model}")
-
-# Use exact model name including suffix
-model = "meta-llama/Llama-2-70b-chat-hf" # Correct
-# model = "meta-llama/Llama-2-70b" # Wrong - missing suffix
-```
-
-#### Issue: "Connection timeout"
-
-**Symptom:**
-```
-ConnectionError: Request timeout after 60s
-```
-
-**Fix:**
-1. Check network connectivity
-2. Verify firewall settings
-3. Check DNS resolution
-4. Try different network
-
-```bash
-# Test connectivity
-curl https://api.endpoints.anyscale.com/v1/models
-
-# Check DNS
-nslookup api.endpoints.anyscale.com
-
-# Test with timeout
-curl --max-time 30 https://api.endpoints.anyscale.com/v1/models
-```
-
-#### Issue: "Rate limit exceeded"
-
-**Symptom:**
-```
-RateLimitError: Too many requests (429)
-```
-
-**Fix:**
-```python
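-# Retry with exponential backoff when the API returns 429
-# Assumption: the adapter surfaces the OpenAI SDK's RateLimitError
-from openai import RateLimitError
-from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
-
-from genops.providers.anyscale import instrument_anyscale
-
-@retry(
-    retry=retry_if_exception_type(RateLimitError),  # Do not retry auth or validation errors
-    wait=wait_exponential(multiplier=1, min=1, max=60),
-    stop=stop_after_attempt(5)
-)
-def rate_limited_completion(adapter, **kwargs):
-    """Completion with automatic retry on rate limits."""
-    try:
-        return adapter.completion_create(**kwargs)
-    except RateLimitError:
-        print("Rate limit hit, retrying...")
-        raise  # Re-raise so tenacity schedules the next attempt
-
-# Usage
-adapter = instrument_anyscale(team="rate-limited-app")
-response = rate_limited_completion(adapter, model="...", messages=[...])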
-```
-
-#### Issue: "Telemetry not appearing in observability platform"
-
-**Symptom:**
-OpenTelemetry spans not visible in Datadog/Grafana/Honeycomb
-
-**Fix:**
-1. Verify OTLP exporter configuration
-2. Check endpoint URL and port
-3. Verify authentication headers
-4. Test OTLP endpoint connectivity
-
-```python
-# Debug telemetry export
-import os
-os.environ['OTEL_LOG_LEVEL'] = 'debug'
-
-# Check exporter configuration
-from opentelemetry import trace
-from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
-
-# Add console exporter for debugging
-provider = trace.get_tracer_provider()
-provider.add_span_processor(
- SimpleSpanProcessor(ConsoleSpanExporter())
-)
-
-# Use adapter - spans will print to console
-adapter = instrument_anyscale(team="debug-team")
-response = adapter.completion_create(...)
-```
-
-### Validation Troubleshooting
-
-```python
-# Run comprehensive validation
-from genops.providers.anyscale import validate_setup, print_validation_result
-
-result = validate_setup()
-print_validation_result(result)
-
-# Print the details of every reported issue
-if not result.success:
-    for issue in result.issues:
-        print(f"\n❌ {issue.title}")
-        print(f"   Category: {issue.category}")
-        print(f"   Level: {issue.level}")
-        print(f"   Description: {issue.description}")
-        print(f"   Fix: {issue.fix_suggestion}")
-```
-
-### Debug Logging
-
-```python
-# Enable debug logging
-import logging
-
-logging.basicConfig(level=logging.DEBUG)
-logging.getLogger('genops.providers.anyscale').setLevel(logging.DEBUG)
-
-# Use adapter - detailed logs will show all operations
-adapter = instrument_anyscale(
- team="debug-team",
- debug=True # Enable debug mode
-)
-
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[{"role": "user", "content": "Test"}]
-)
-
-# Output includes:
-# - API request details
-# - Token usage calculations
-# - Cost calculations
-# - OpenTelemetry span creation
-# - Governance attribute propagation
-```
-
-## API Reference
-
-### Core Functions
-
-#### `instrument_anyscale(**governance_defaults)`
-
-Create GenOps Anyscale adapter with governance defaults.
-
-**Parameters:**
-- `anyscale_api_key` (str, optional): Anyscale API key (defaults to `ANYSCALE_API_KEY` env var)
-- `anyscale_base_url` (str, optional): Base URL (default: "https://api.endpoints.anyscale.com/v1")
-- `telemetry_enabled` (bool): Enable OpenTelemetry tracing (default: True)
-- `cost_tracking_enabled` (bool): Enable cost tracking (default: True)
-- `debug` (bool): Enable debug logging (default: False)
-- `**governance_defaults`: Default governance attributes (team, project, environment, etc.)
-
-**Returns:** `GenOpsAnyscaleAdapter`
-
-**Example:**
-```python
-adapter = instrument_anyscale(
- team="ml-team",
- project="chatbot",
- environment="production"
-)
-```
-
-#### `auto_instrument(**governance_defaults)`
-
-Enable zero-code auto-instrumentation of the OpenAI SDK.
-
-**Parameters:**
-- `**governance_defaults`: Default governance attributes for all operations
-
-**Returns:** `bool` - True if successful
-
-**Example:**
-```python
-from genops.providers.anyscale import auto_instrument
-
-auto_instrument(team="auto-team", project="auto-project")
-
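-# Existing OpenAI SDK code is now tracked automatically
-import os
-import openai
-
-client = openai.OpenAI(
-    base_url="https://api.endpoints.anyscale.com/v1",
-    api_key=os.environ["ANYSCALE_API_KEY"],  # The OpenAI client does not read this variable on its own
-)
-response = client.chat.completions.create(...)  # Tracked!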
-```
-
-#### `validate_setup(anyscale_api_key=None, anyscale_base_url=None)`
-
-Validate Anyscale setup and configuration.
-
-**Parameters:**
-- `anyscale_api_key` (str, optional): API key to validate
-- `anyscale_base_url` (str, optional): Base URL to validate
-
-**Returns:** `ValidationResult`
-
-**Example:**
-```python
-from genops.providers.anyscale import validate_setup, print_validation_result
-
-result = validate_setup()
-print_validation_result(result)
-
-if result.success:
-    print("✅ Setup validated")
-```
-
-### Adapter Methods
-
-#### `adapter.completion_create(model, messages, **kwargs)`
-
-Create chat completion with governance tracking.
-
-**Parameters:**
-- `model` (str): Model ID (e.g., "meta-llama/Llama-2-70b-chat-hf")
-- `messages` (list): Chat messages in OpenAI format
-- `temperature` (float, optional): Sampling temperature (0.0-2.0)
-- `max_tokens` (int, optional): Maximum tokens to generate
-- `top_p` (float, optional): Nucleus sampling parameter
-- `frequency_penalty` (float, optional): Frequency penalty (-2.0-2.0)
-- `presence_penalty` (float, optional): Presence penalty (-2.0-2.0)
-- `**governance_attrs`: Per-request governance attributes
-
-**Returns:** dict with OpenAI-compatible response
-
-**Example:**
-```python
-response = adapter.completion_create(
- model="meta-llama/Llama-2-70b-chat-hf",
- messages=[
- {"role": "system", "content": "You are a helpful assistant."},
- {"role": "user", "content": "Hello!"}
- ],
- temperature=0.7,
- max_tokens=500,
- customer_id="customer-123"
-)
-```
-
-#### `adapter.embeddings_create(model, input, **kwargs)`
-
-Create embeddings with governance tracking.
-
-**Parameters:**
-- `model` (str): Embedding model ID (e.g., "thenlper/gte-large")
-- `input` (str or list): Text to embed
-- `**governance_attrs`: Per-request governance attributes
-
-**Returns:** dict with OpenAI-compatible response
-
-**Example:**
-```python
-response = adapter.embeddings_create(
- model="thenlper/gte-large",
- input="Text to embed",
- customer_id="customer-123"
-)
-
-embeddings = response['data'][0]['embedding']
-```
-
-### Pricing Functions
-
-#### `calculate_completion_cost(model, input_tokens, output_tokens)`
-
-Calculate cost for chat completion.
-
-**Parameters:**
-- `model` (str): Model ID
-- `input_tokens` (int): Number of input tokens
-- `output_tokens` (int): Number of output tokens
-
-**Returns:** float (cost in USD)
-
-**Example:**
-```python
-from genops.providers.anyscale import calculate_completion_cost
-
-cost = calculate_completion_cost(
- model="meta-llama/Llama-2-70b-chat-hf",
- input_tokens=100,
- output_tokens=50
-)
-print(f"Cost: ${cost:.6f}")
-```
-
-#### `calculate_embedding_cost(model, tokens)`
-
-Calculate cost for embeddings.
-
-**Parameters:**
-- `model` (str): Embedding model ID
-- `tokens` (int): Number of tokens
-
-**Returns:** float (cost in USD)
-
-#### `get_model_pricing(model)`
-
-Get pricing information for model.
-
-**Parameters:**
-- `model` (str): Model ID
-
-**Returns:** `ModelPricing` dataclass
-
-**Example:**
-```python
-from genops.providers.anyscale import get_model_pricing
-
-pricing = get_model_pricing("meta-llama/Llama-2-70b-chat-hf")
-print(f"Input: ${pricing.input_cost_per_million}/M tokens")
-print(f"Output: ${pricing.output_cost_per_million}/M tokens")
-print(f"Context window: {pricing.context_window} tokens")
-```
-
-### Data Classes
-
-#### `AnyscaleCostSummary`
-
-Cost summary for operations.
-
-**Attributes:**
-- `total_cost` (float): Total cost in USD
-- `cost_by_model` (dict): Costs grouped by model
-- `cost_by_customer` (dict): Costs grouped by customer_id
-- `total_tokens` (int): Total tokens used
-- `operation_count` (int): Number of operations
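-
-A minimal sketch of how per-request records could roll up into the fields listed above. The `summarize` helper and the record layout (`model`, `customer_id`, `cost`, and `tokens` keys) are assumptions made for illustration; the adapter's own bookkeeping may differ.
-
-```python
-from collections import defaultdict
-
-
-def summarize(records):
-    """Aggregate per-request cost records into summary totals."""
-    cost_by_model = defaultdict(float)
-    cost_by_customer = defaultdict(float)
-    total_cost = 0.0
-    total_tokens = 0
-    for record in records:
-        total_cost += record["cost"]
-        total_tokens += record["tokens"]
-        cost_by_model[record["model"]] += record["cost"]
-        cost_by_customer[record["customer_id"]] += record["cost"]
-    return {
-        "total_cost": total_cost,
-        "cost_by_model": dict(cost_by_model),
-        "cost_by_customer": dict(cost_by_customer),
-        "total_tokens": total_tokens,
-        "operation_count": len(records),
-    }
-
-
-# Made-up records, for illustration only
-records = [
-    {"model": "meta-llama/Llama-2-70b-chat-hf", "customer_id": "customer-123", "cost": 0.5, "tokens": 150},
-    {"model": "thenlper/gte-large", "customer_id": "customer-123", "cost": 0.25, "tokens": 40},
-]
-summary = summarize(records)
-print(summary["total_cost"], summary["operation_count"])
-```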
-
-#### `ModelPricing`
-
-Pricing information for a model.
-
-**Attributes:**
-- `model_name` (str): Model identifier
-- `input_cost_per_million` (float): Input cost per million tokens
-- `output_cost_per_million` (float): Output cost per million tokens
-- `currency` (str): Currency (USD)
-- `category` (str): Model category (chat, embedding)
-- `context_window` (int): Maximum context length
-- `notes` (str): Additional notes
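-
-As a rough illustration of how per-million pricing turns into a request cost, the sketch below mirrors two of the `ModelPricing` attributes in a local dataclass. The class name `PricingSketch`, the helper `estimate_cost`, and the rates are hypothetical placeholders, not the library's implementation or real Anyscale prices.
-
-```python
-from dataclasses import dataclass
-
-
-@dataclass(frozen=True)
-class PricingSketch:
-    """Hypothetical stand-in for two of the documented ModelPricing attributes."""
-    input_cost_per_million: float   # USD per 1M input tokens
-    output_cost_per_million: float  # USD per 1M output tokens
-
-
-def estimate_cost(pricing: PricingSketch, input_tokens: int, output_tokens: int) -> float:
-    """Scale each token count by its per-million rate and sum the two parts."""
-    input_cost = input_tokens / 1_000_000 * pricing.input_cost_per_million
-    output_cost = output_tokens / 1_000_000 * pricing.output_cost_per_million
-    return input_cost + output_cost
-
-
-# Placeholder rates chosen for easy arithmetic, not actual prices
-example = PricingSketch(input_cost_per_million=1.0, output_cost_per_million=2.0)
-print(f"${estimate_cost(example, input_tokens=100, output_tokens=50):.6f}")
-```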
-
----
-
-## Next Steps
-
-**Congratulations!** You now have a complete reference for the GenOps Anyscale integration.
-
-### Recommended Actions
-
-1. **Start Simple**: Use the [Quickstart Guide](../anyscale-quickstart.md) for 5-minute setup
-2. **Explore Examples**: Try `examples/anyscale/basic_completion.py`
-3. **Enable Auto-Instrumentation**: Zero-code setup for existing applications
-4. **Configure Observability**: Export to your platform (Datadog, Grafana, etc.)
-5. **Optimize Costs**: Use model selection and caching strategies
-6. **Scale to Production**: Implement budgets, monitoring, and high-availability patterns
-
-### Additional Resources
-
-- **Quickstart Guide**: [docs/anyscale-quickstart.md](../anyscale-quickstart.md)
-- **Example Scripts**: `examples/anyscale/`
-- **Anyscale Documentation**: https://docs.anyscale.com
-- **GenOps GitHub**: https://github.com/KoshiHQ/GenOps-AI
-
-### Community
-
-- **Issues**: [GitHub Issues](https://github.com/KoshiHQ/GenOps-AI/issues)
-- **Discussions**: [GitHub Discussions](https://github.com/KoshiHQ/GenOps-AI/discussions)
-- **Contributing**: See [CONTRIBUTING.md](../../CONTRIBUTING.md)
-
----
-
-**Built with GenOps AI** - Governance for AI, Built on OpenTelemetry
diff --git a/docs/integrations/arize.md b/docs/integrations/arize.md
deleted file mode 100644
index 592e542..0000000
--- a/docs/integrations/arize.md
+++ /dev/null
@@ -1,1914 +0,0 @@
-# Arize AI Integration
-
-> **Navigation:** [Quickstart (5 min)](../arize-quickstart.md) → **Complete Guide** → [Examples](../../examples/arize/)
-
-Complete integration guide for Arize AI model monitoring with GenOps governance, cost intelligence, and policy enforcement.
-
-## 🗺️ Choose Your Learning Path
-
-**New to Arize + GenOps?** Start here:
-1. **[5-minute Quickstart](../arize-quickstart.md)** - Get running with zero code changes
-2. **[Interactive Examples](../../examples/arize/)** - Copy-paste working code
-3. **Come back here** for deep-dive documentation
-
-**Looking for specific info?** Jump to:
-- [Cost Intelligence & ROI](../cost-intelligence-guide.md) - Calculate ROI and optimize costs
-- [Enterprise Governance](../enterprise-governance-templates.md) - Compliance templates (SOX, GDPR, HIPAA)
-- [Production Patterns](#enterprise-deployment-patterns) - HA, scaling, monitoring
-
-## 🗺️ Visual Learning Path
-
-```
-START HERE: 5-minute Quickstart
-│   ├── Zero-code setup
-│   ├── Basic validation
-│   └── Success confirmation
-│
-├─── HANDS-ON: Interactive Examples (5-30 min)
-│   ├── basic_tracking.py → See governance in action
-│   ├── cost_optimization.py → Learn cost intelligence
-│   ├── advanced_features.py → Multi-model patterns
-│   └── production_patterns.py → Enterprise deployment
-│
-├─── DEEP-DIVE: Complete Guide (15-60 min)
-│   ├── Manual Configuration → Full control & customization
-│   ├── Governance Policies → Team attribution & budgets
-│   ├── Production Monitoring → Dashboards & alerting
-│   └── Troubleshooting → Problem solving
-│
-├─── 💰 BUSINESS: Cost Intelligence (15-45 min)
-│   ├── ROI Calculator → Business justification
-│   ├── Cost Optimization → Reduce monitoring costs
-│   └── Budget Forecasting → Plan future investments
-│
-└─── 🏢 ENTERPRISE: Governance Templates (30-120 min)
-    ├── SOX Compliance → Financial regulations
-    ├── GDPR Compliance → EU data protection
-    ├── HIPAA Compliance → Healthcare requirements
-    └── Multi-Tenant Setup → SaaS deployments
-```
-
-**🎯 Choose your path based on:**
-- **Time available:** 5 min (Quickstart) → 30 min (Examples) → 60+ min (Enterprise)
-- **Role:** Developer (Examples) → FinOps (Cost Intelligence) → Architect (Enterprise)
-- **Goal:** Quick setup → Production deployment → Compliance requirements
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Quick Start](#quick-start) ⏱️ 5 minutes
-- [Manual Adapter Usage](#manual-adapter-usage) ⏱️ 15 minutes
-- [Cost Intelligence Features](#cost-intelligence-features) ⏱️ 10 minutes
-- [Governance Configuration](#governance-configuration) ⏱️ 20 minutes
-- [Enterprise Deployment Patterns](#enterprise-deployment-patterns) ⏱️ 30 minutes
-- [Production Monitoring](#production-monitoring) ⏱️ 20 minutes
-- [Validation and Troubleshooting](#validation-and-troubleshooting) ⏱️ 10 minutes
-- [API Reference](#api-reference)
-
-**Advanced Guides:**
-- **[Cost Intelligence & ROI Guide](../cost-intelligence-guide.md)** - ROI templates, cost optimization, and budget forecasting
-- **[Production Deployment Patterns](../../examples/arize/production_patterns.py)** - Enterprise architecture and scaling patterns
-
-## Overview
-
-The GenOps Arize AI integration provides comprehensive governance for machine learning model monitoring operations. Arize AI is a leading ML observability platform that helps teams monitor, troubleshoot, and improve model performance in production. This integration adds cost tracking, team attribution, and policy enforcement to your Arize AI workflows.
-
-### Quick Value Proposition
-
-| ⏱️ Time Investment | 💰 Value Delivered | 🎯 Use Case |
-|-------------------|-------------------|-------------|
-| **5 minutes** | Zero-code governance for existing Arize workflows | Quick wins |
-| **30 minutes** | Complete cost intelligence and optimization | Production ready |
-| **2 hours** | Enterprise governance with compliance | Mission critical |
-
-### Key Features
-
-- **Model Monitoring Governance**: Enhanced prediction logging and model performance tracking with cost attribution
-- **Data Quality Intelligence**: Cost tracking for data drift detection and quality monitoring operations
-- **Alert Management**: Governed alert creation with cost optimization and team attribution
-- **Dashboard Analytics**: Cost tracking for dashboard access and custom analytics
-- **Budget Enforcement**: Real-time cost tracking with configurable budget limits and alerts
-- **Zero-Code Auto-Instrumentation**: Transparent governance for existing Arize AI code
-- **Multi-Environment Support**: Environment-specific monitoring with governance policies
-
-> 💡 **New to Arize AI?** Check our [5-minute quickstart guide](../arize-quickstart.md) for immediate setup.
-
-## Quick Start
-
-### Prerequisites
-
-```bash
-# Install Arize AI SDK and GenOps
-pip install "genops[arize]"  # Quote the extras so shells like zsh do not expand the brackets
-
-# Or install dependencies separately
-pip install genops arize pandas
-```
-
-### Environment Setup
-
-```bash
-# Required: Arize AI credentials
-export ARIZE_API_KEY="your-arize-api-key"
-export ARIZE_SPACE_KEY="your-arize-space-key"
-
-# Recommended: GenOps governance attributes
-export GENOPS_TEAM="ml-platform"
-export GENOPS_PROJECT="fraud-detection"
-export GENOPS_ENVIRONMENT="production"
-export GENOPS_DAILY_BUDGET_LIMIT="50.0"
-```
-
-### Zero-Code Auto-Instrumentation
-
-```python
-from genops.providers.arize import auto_instrument
-
-# Enable automatic governance for all Arize operations
-auto_instrument(
- team="ml-platform",
- project="fraud-detection"
-)
-
-# Your existing Arize code now includes GenOps governance
-from arize.pandas.logger import Client
-
-arize_client = Client(
- api_key="your-api-key",
- space_key="your-space-key"
-)
-
-# This is automatically tracked with cost attribution and governance
-response = arize_client.log(
- prediction_id="pred-123",
- prediction_label="positive",
- actual_label="positive",
- model_id="sentiment-model-v2",
- model_version="2.1"
-)
-```
-
-## Manual Adapter Usage
-
-### Basic Configuration
-
-```python
-from genops.providers.arize import GenOpsArizeAdapter
-
-# Initialize with governance configuration
-adapter = GenOpsArizeAdapter(
- arize_api_key="your-api-key",
- arize_space_key="your-space-key",
- team="ml-platform-team",
- project="production-monitoring",
- environment="production",
- daily_budget_limit=50.0,
- max_monitoring_cost=25.0,
- enable_cost_alerts=True
-)
-```
-
-### Model Monitoring Session
-
-```python
-# Track complete monitoring lifecycle with governance
-with adapter.track_model_monitoring_session(
- model_id="fraud-detection-v3",
- model_version="3.1",
- environment="production"
-) as session:
-
- # Log prediction batch with cost tracking
- predictions_df = load_predictions() # Your prediction data
- session.log_prediction_batch(
- predictions_df,
- cost_per_prediction=0.001
- )
-
- # Monitor data quality with governance
- quality_metrics = calculate_quality_metrics()
- session.log_data_quality_metrics(
- quality_metrics,
- cost_estimate=0.05
- )
-
- # Create governed performance alerts
- session.create_performance_alert(
- metric="accuracy",
- threshold=0.85,
- cost_per_alert=0.10
- )
-
- # Update monitoring costs manually if needed
- session.update_monitoring_cost(additional_cost=0.20)
-```
-
-### Governed Artifact Logging
-
-```python
-import wandb  # Requires the Weights & Biases SDK; the artifact below is a W&B Artifact object
-
-# Create and log artifacts with governance metadata
-model_artifact = wandb.Artifact("trained-model-v3", type="model")
-model_artifact.add_file("fraud_model.pkl")
-
-adapter.log_governed_artifact(
- artifact=model_artifact,
- cost_estimate=1.50,
- governance_metadata={
- "compliance_level": "SOX",
- "data_classification": "sensitive",
- "retention_period": "7_years"
- }
-)
-```
-
-## Cost Intelligence Features
-
-### Real-Time Cost Tracking
-
-```python
-# Get current monitoring session cost breakdown
-session_cost = adapter.get_monitoring_cost_summary("session-123")
-
-print(f"Total Cost: ${session_cost.total_cost:.2f}")
-print(f"Prediction Logging: ${session_cost.prediction_logging_cost:.2f}")
-print(f"Data Quality: ${session_cost.data_quality_cost:.2f}")
-print(f"Alert Management: ${session_cost.alert_management_cost:.2f}")
-print(f"Dashboard Analytics: ${session_cost.dashboard_cost:.2f}")
-print(f"Efficiency Score: {session_cost.efficiency_score:.2f} predictions/hour")
-```
-
-### Cost Aggregation and Analysis
-
-```python
-from genops.providers.arize_cost_aggregator import ArizeCostAggregator
-
-# Initialize cost aggregator for detailed analysis
-cost_aggregator = ArizeCostAggregator(
- team="ml-platform",
- project="fraud-detection",
- budget_limit=1000.0
-)
-
-# Calculate comprehensive monitoring costs
-session_cost = cost_aggregator.calculate_monitoring_session_cost(
- model_id="fraud-model-v3",
- model_version="3.1",
- environment="production",
- prediction_count=100000,
- data_quality_checks=50,
- active_alerts=5,
- session_duration_hours=24
-)
-
-print(f"Session Cost Breakdown:")
-print(f" Total: ${session_cost.total_cost:.2f}")
-print(f" Cost per Prediction: ${session_cost.cost_per_prediction:.6f}")
-print(f" Efficiency Score: {session_cost.efficiency_score:.2f}")
-```
-
-### Cost Optimization Recommendations
-
-```python
-# Get monthly cost summary and optimization suggestions
-monthly_summary = cost_aggregator.get_monthly_cost_summary()
-optimization_recommendations = cost_aggregator.get_cost_optimization_recommendations()
-
-print(f"Monthly Summary:")
-print(f" Total Cost: ${monthly_summary.total_cost:.2f}")
-print(f" Budget Utilization: {monthly_summary.budget_utilization:.1f}%")
-print(f" Top Cost Driver: {monthly_summary.top_cost_drivers[0]}")
-
-print(f"\nOptimization Opportunities:")
-for rec in optimization_recommendations:
- print(f" โข {rec.title}")
- print(f" Potential Savings: ${rec.potential_savings:.2f}")
- print(f" Effort Level: {rec.effort_level}")
- print(f" Priority Score: {rec.priority_score:.1f}/100")
-```
-
-## Advanced Features
-
-### Multi-Model Cost Tracking
-
-```python
-# Track costs across multiple models with unified governance
-models_to_monitor = [
- ("fraud-detection-v3", "3.1"),
- ("credit-scoring-v2", "2.3"),
- ("risk-assessment-v1", "1.5")
-]
-
-total_monthly_cost = 0.0
-cost_by_model = {}
-
-for model_id, version in models_to_monitor:
- model_cost = cost_aggregator.calculate_monitoring_session_cost(
- model_id=model_id,
- model_version=version,
- prediction_count=50000,
- data_quality_checks=20,
- active_alerts=3,
- session_duration_hours=720 # Monthly (30 days * 24 hours)
- )
-
- cost_by_model[f"{model_id}-{version}"] = model_cost.total_cost
- total_monthly_cost += model_cost.total_cost
-
-print(f"Multi-Model Monitoring Costs:")
-for model, cost in cost_by_model.items():
- print(f" {model}: ${cost:.2f}")
-print(f"Total Monthly Cost: ${total_monthly_cost:.2f}")
-```
-
-### Custom Pricing and Forecasting
-
-```python
-from genops.providers.arize_pricing import ArizePricingCalculator, PricingTier
-
-# Initialize pricing calculator with enterprise tier
-calculator = ArizePricingCalculator(
- tier=PricingTier.ENTERPRISE,
- region="us-east-1",
- currency="USD",
- enterprise_discount=15.0 # 15% enterprise discount
-)
-
-# Calculate detailed costs with volume discounts
-prediction_cost = calculator.calculate_prediction_logging_cost(
- prediction_count=1000000, # 1M predictions
- model_tier="production",
- time_period_days=30
-)
-
-print(f"Prediction Logging Cost Breakdown:")
-print(f" Base Cost: ${prediction_cost.base_cost:.2f}")
-print(f" Volume Discount: ${prediction_cost.volume_discount:.2f}")
-print(f" Final Cost: ${prediction_cost.final_cost:.2f}")
-print(f" Effective Rate: ${prediction_cost.effective_rate:.6f} per prediction")
-
-# Get monthly estimate with optimization
-monthly_estimate = calculator.estimate_monthly_cost(
- models=10,
- predictions_per_model=100000,
- optimize_for_cost=True
-)
-
-print(f"\nMonthly Estimate:")
-print(f" Total Estimated Cost: ${monthly_estimate.total_estimated_cost:.2f}")
-print(f" Recommended Tier: {monthly_estimate.recommended_tier.value}")
-print(f" Potential Savings: ${monthly_estimate.potential_savings:.2f}")
-print(f" Optimization Opportunities:")
-for opportunity in monthly_estimate.optimization_opportunities:
- print(f" โข {opportunity}")
-```
-
-### Environment-Specific Governance
-
-```python
-# Configure different governance policies by environment
-environments = ["development", "staging", "production"]
-governance_configs = {
- "development": {
- "daily_budget_limit": 10.0,
- "max_monitoring_cost": 5.0,
- "enable_cost_alerts": False,
- "governance_policy": "advisory"
- },
- "staging": {
- "daily_budget_limit": 25.0,
- "max_monitoring_cost": 12.0,
- "enable_cost_alerts": True,
- "governance_policy": "advisory"
- },
- "production": {
- "daily_budget_limit": 100.0,
- "max_monitoring_cost": 50.0,
- "enable_cost_alerts": True,
- "governance_policy": "enforced"
- }
-}
-
-# Create environment-specific adapters
-adapters = {}
-for env in environments:
- adapters[env] = GenOpsArizeAdapter(
- team="ml-platform",
- project="multi-env-monitoring",
- environment=env,
- **governance_configs[env]
- )
-
-# Use appropriate adapter based on deployment environment
-current_env = "production" # This would come from your deployment config
-adapter = adapters[current_env]
-
-# Monitoring operations now use environment-specific governance
-with adapter.track_model_monitoring_session("model-v1") as session:
- # Environment-specific cost limits and policies are enforced
- pass
-```
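-
-The hard-coded `current_env` above would normally be resolved at startup. One possible approach, sketched here with only the standard library, reads the `GENOPS_ENVIRONMENT` variable from the Environment Setup section and falls back to the lowest-budget profile. The `resolve_environment` helper and the fallback choice are assumptions, not part of the GenOps API.
-
-```python
-import os
-
-KNOWN_ENVIRONMENTS = ("development", "staging", "production")
-
-
-def resolve_environment(default: str = "development") -> str:
-    """Return the deployment environment, rejecting unknown values early."""
-    env = os.environ.get("GENOPS_ENVIRONMENT", default).strip().lower()
-    if env not in KNOWN_ENVIRONMENTS:
-        raise ValueError(f"Unknown environment {env!r}; expected one of {KNOWN_ENVIRONMENTS}")
-    return env
-
-
-current_env = resolve_environment()
-print(f"Using governance profile: {current_env}")
-```
-
-Failing fast on an unrecognized value avoids silently running production traffic under a development budget.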
-
-## Enterprise Deployment Patterns
-
-### High-Availability Architecture
-
-```python
-from genops.providers.arize import GenOpsArizeAdapter
-from typing import Dict, List, Optional
-import logging
-
-class EnterpriseArizeDeployment:
- """Enterprise-grade Arize deployment with HA and failover."""
-
- def __init__(self, regions: List[str], environment: str = "production"):
- self.regions = regions
- self.environment = environment
- self.adapters: Dict[str, GenOpsArizeAdapter] = {}
- self.primary_region = regions[0] if regions else "us-east-1"
- self.logger = logging.getLogger(f"genops.arize.enterprise.{environment}")
-
- # Initialize regional adapters
- self._setup_regional_adapters()
-
- def _setup_regional_adapters(self):
- """Set up Arize adapters for each region."""
- for region in self.regions:
- is_primary = region == self.primary_region
-
- self.adapters[region] = GenOpsArizeAdapter(
- team=f"enterprise-{region}",
- project=f"ha-monitoring-{self.environment}",
- environment=self.environment,
- daily_budget_limit=500.0 if is_primary else 300.0,
- max_monitoring_cost=100.0 if is_primary else 75.0,
- enable_governance=True,
- enable_cost_alerts=True,
- tags={
- 'deployment_type': 'enterprise',
- 'region': region,
- 'role': 'primary' if is_primary else 'secondary',
- 'ha_enabled': 'true',
- 'failover_capable': 'true'
- }
- )
-
- self.logger.info(f"Initialized {region} adapter ({'PRIMARY' if is_primary else 'SECONDARY'})")
-
- def monitor_with_failover(self, model_id: str, predictions_data, max_retries: int = 2):
- """Monitor with automatic failover across regions."""
-
- for attempt in range(max_retries + 1):
- current_region = self.regions[attempt % len(self.regions)]
- adapter = self.adapters[current_region]
-
- try:
- self.logger.info(f"Attempting monitoring in {current_region} (attempt {attempt + 1})")
-
- with adapter.track_model_monitoring_session(
- model_id=model_id,
- environment=self.environment,
- max_cost=50.0
- ) as session:
- # Log predictions
- session.log_prediction_batch(predictions_data, cost_per_prediction=0.001)
-
- # Monitor data quality
- quality_metrics = {'accuracy': 0.94, 'data_drift_score': 0.12}
- session.log_data_quality_metrics(quality_metrics, cost_estimate=0.05)
-
- # Create performance alerts
- session.create_performance_alert('accuracy', 0.90, 0.15)
-
- self.logger.info(f"Successfully monitored in {current_region}")
- return {
- 'success': True,
- 'region': current_region,
- 'cost': session.estimated_cost,
- 'predictions': session.prediction_count
- }
-
- except Exception as e:
- self.logger.warning(f"Monitoring failed in {current_region}: {e}")
- if attempt == max_retries:
- self.logger.error(f"All regions failed after {max_retries + 1} attempts")
- raise e
- continue
-
- return {'success': False, 'region': None}
-
-# Example: Multi-region enterprise deployment
-enterprise_deployment = EnterpriseArizeDeployment(
- regions=['us-east-1', 'us-west-2', 'eu-west-1'],
- environment='production'
-)
-
-# Use with automatic failover
-import pandas as pd
-sample_predictions = pd.DataFrame({'prediction': [1, 0, 1, 1, 0] * 100})
-
-result = enterprise_deployment.monitor_with_failover(
- model_id='enterprise-fraud-model-v3',
- predictions_data=sample_predictions
-)
-
-print(f"Monitoring result: {result}")
-```
-
-### Auto-Scaling Configuration
-
-```python
-class AutoScalingArizeConfig:
- """Auto-scaling configuration for variable workloads."""
-
- def __init__(self):
- self.scaling_tiers = {
- 'light': {
- 'daily_budget': 50.0,
- 'max_session_cost': 15.0,
- 'sampling_rate': 1.0,
- 'alert_threshold': 0.90
- },
- 'medium': {
- 'daily_budget': 150.0,
- 'max_session_cost': 40.0,
- 'sampling_rate': 0.8,
- 'alert_threshold': 0.85
- },
- 'heavy': {
- 'daily_budget': 400.0,
- 'max_session_cost': 100.0,
- 'sampling_rate': 0.3,
- 'alert_threshold': 0.80
- },
- 'enterprise': {
- 'daily_budget': 1000.0,
- 'max_session_cost': 200.0,
- 'sampling_rate': 0.1,
- 'alert_threshold': 0.75
- }
- }
-
- def get_optimal_tier(self, daily_prediction_volume: int) -> str:
- """Determine optimal scaling tier based on volume."""
- if daily_prediction_volume < 100_000:
- return 'light'
- elif daily_prediction_volume < 1_000_000:
- return 'medium'
- elif daily_prediction_volume < 10_000_000:
- return 'heavy'
- else:
- return 'enterprise'
-
- def create_scaled_adapter(self, daily_volume: int, team: str, project: str):
- """Create appropriately scaled adapter."""
- tier = self.get_optimal_tier(daily_volume)
- config = self.scaling_tiers[tier]
-
- return GenOpsArizeAdapter(
- team=team,
- project=project,
- daily_budget_limit=config['daily_budget'],
- max_monitoring_cost=config['max_session_cost'],
- enable_governance=True,
- enable_cost_alerts=True,
- tags={
- 'scaling_tier': tier,
- 'daily_volume': str(daily_volume),
- 'sampling_rate': str(config['sampling_rate']),
- 'auto_scaled': 'true'
- }
- )
-
-# Example auto-scaling usage
-scaling_config = AutoScalingArizeConfig()
-
-# Different workloads get appropriate configurations
-light_adapter = scaling_config.create_scaled_adapter(50_000, "startup-team", "mvp-model")
-enterprise_adapter = scaling_config.create_scaled_adapter(25_000_000, "enterprise-ml", "production-models")
-
-print(f"Light workload tier: {scaling_config.get_optimal_tier(50_000)}")
-print(f"Enterprise workload tier: {scaling_config.get_optimal_tier(25_000_000)}")
-```
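
The tiers above record a `sampling_rate`, but `create_scaled_adapter` only stores it as a tag; nothing in the example applies it. Below is a minimal sketch of one way to apply such a rate, assuming each prediction carries a stable string ID. `should_sample` is a hypothetical helper, not part of the GenOps or Arize APIs. It hashes the ID so the same prediction always gets the same decision, which a plain `random.random()` check does not guarantee across retries or processes.

```python
import hashlib


def should_sample(prediction_id: str, sampling_rate: float) -> bool:
    """Return True if this prediction falls inside the sampled fraction.

    Hypothetical helper: hashes the ID into [0, 1) so the decision is
    deterministic across processes and retries.
    """
    if sampling_rate >= 1.0:
        return True
    if sampling_rate <= 0.0:
        return False
    digest = hashlib.sha256(prediction_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and scale into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sampling_rate


# Example: apply the 'heavy' tier rate (0.3) to a stream of IDs.
ids = [f"pred-{i}" for i in range(10_000)]
kept = [pid for pid in ids if should_sample(pid, 0.3)]
print(f"Kept {len(kept)} of {len(ids)} predictions")
```

The kept fraction should land close to the configured rate for a large enough stream; the exact count depends on the IDs.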
-
-### Compliance and Audit Patterns
-
-```python
-from typing import Dict
-
-class ComplianceArizeAdapter:
- """Compliance-ready Arize adapter with audit trail."""
-
- def __init__(self, compliance_level: str, team: str, project: str):
- self.compliance_level = compliance_level
- self.audit_trail = []
-
- # Compliance-specific configurations
- compliance_configs = {
- 'SOX': {
- 'data_retention_years': 7,
- 'access_logging': 'comprehensive',
- 'change_approval': 'required',
- 'audit_frequency': 'quarterly'
- },
- 'GDPR': {
- 'data_residency': 'eu_only',
- 'pii_handling': 'anonymized',
- 'right_to_deletion': 'supported',
- 'consent_tracking': 'enabled'
- },
- 'HIPAA': {
- 'data_classification': 'phi',
- 'encryption': 'aes_256',
- 'access_controls': 'strict',
- 'minimum_necessary': 'enforced'
- }
- }
-
- config = compliance_configs.get(compliance_level, {})
-
- self.adapter = GenOpsArizeAdapter(
- team=team,
- project=project,
- enable_governance=True,
- cost_center=f'{compliance_level}-ML-001',
- tags={
- 'compliance_framework': compliance_level,
- 'audit_trail': 'enabled',
- **config
- }
- )
-
- def audit_log(self, action: str, details: Dict):
- """Log compliance-relevant actions."""
- from datetime import datetime, timezone
-
- audit_entry = {
- 'timestamp': datetime.now(timezone.utc).isoformat(), # utcnow() is deprecated since Python 3.12
- 'compliance_level': self.compliance_level,
- 'action': action,
- 'details': details,
- 'user_context': 'system' # Would include actual user in production
- }
-
- self.audit_trail.append(audit_entry)
-
- def compliant_monitoring_session(self, model_id: str, **kwargs):
- """Create monitoring session with compliance logging."""
-
- self.audit_log('monitoring_session_start', {
- 'model_id': model_id,
- 'compliance_checks': 'enabled',
- 'data_handling': 'compliant'
- })
-
- return self.adapter.track_model_monitoring_session(model_id, **kwargs)
-
- def generate_audit_report(self) -> Dict:
- """Generate compliance audit report."""
- return {
- 'compliance_level': self.compliance_level,
- 'audit_period': f"{len(self.audit_trail)} events",
- 'audit_trail': self.audit_trail,
- 'compliance_status': 'COMPLIANT', # Placeholder: derive this from real checks in production
- 'recommendations': [
- 'Continue current compliance practices',
- 'Schedule quarterly compliance review',
- 'Update data retention policies as needed'
- ]
- }
-
-# Example compliance implementations
-sox_adapter = ComplianceArizeAdapter('SOX', 'financial-ml-team', 'risk-models')
-gdpr_adapter = ComplianceArizeAdapter('GDPR', 'eu-ml-team', 'customer-models')
-hipaa_adapter = ComplianceArizeAdapter('HIPAA', 'healthcare-ml', 'diagnosis-models')
-
-# Compliant monitoring example
-with sox_adapter.compliant_monitoring_session('financial-risk-model-v2') as session:
- # All operations are automatically logged for compliance
- sample_data = pd.DataFrame({'prediction': [1, 0, 1] * 10})
- session.log_prediction_batch(sample_data, cost_per_prediction=0.001)
-
-# Generate audit report
-audit_report = sox_adapter.generate_audit_report()
-print(f"Compliance audit: {audit_report['compliance_status']}")
-```
-
-## Production Monitoring & Alerting
-
-### Advanced Alert Management
-
-```python
-from dataclasses import dataclass
-from enum import Enum
-from typing import Dict, List, Optional
-
-class AlertPriority(Enum):
- CRITICAL = "critical"
- HIGH = "high"
- MEDIUM = "medium"
- LOW = "low"
-
-class AlertChannel(Enum):
- EMAIL = "email"
- SLACK = "slack"
- PAGERDUTY = "pagerduty"
- WEBHOOK = "webhook"
-
-@dataclass
-class AlertRule:
- """Advanced alert rule configuration."""
- name: str
- metric: str
- threshold: float
- comparison: str # "gt", "lt", "eq"
- priority: AlertPriority
- channels: List[AlertChannel]
- cost_per_trigger: float
- suppression_window_minutes: int = 60
- escalation_delay_minutes: int = 30
- auto_resolution_enabled: bool = True
-
-class ProductionAlertManager:
- """Production-grade alert management for Arize monitoring."""
-
- def __init__(self, adapter: GenOpsArizeAdapter):
- self.adapter = adapter
- self.alert_rules: Dict[str, AlertRule] = {}
- self.active_alerts: Dict[str, Dict] = {}
- self.alert_history: List[Dict] = []
-
- def register_alert_rule(self, rule: AlertRule):
- """Register a new alert rule."""
- self.alert_rules[rule.name] = rule
- print(f"✅ Registered alert rule: {rule.name} ({rule.priority.value})")
-
- def create_ml_ops_alerts(self):
- """Create standard ML operations alert rules."""
-
- # Critical business-impact alerts
- self.register_alert_rule(AlertRule(
- name="model_accuracy_critical_drop",
- metric="accuracy",
- threshold=0.85,
- comparison="lt",
- priority=AlertPriority.CRITICAL,
- channels=[AlertChannel.PAGERDUTY, AlertChannel.SLACK],
- cost_per_trigger=0.25,
- suppression_window_minutes=30,
- escalation_delay_minutes=15
- ))
-
- self.register_alert_rule(AlertRule(
- name="severe_data_drift",
- metric="data_drift_score",
- threshold=0.30,
- comparison="gt",
- priority=AlertPriority.CRITICAL,
- channels=[AlertChannel.PAGERDUTY, AlertChannel.EMAIL],
- cost_per_trigger=0.20,
- suppression_window_minutes=120,
- escalation_delay_minutes=20
- ))
-
- # High-priority operational alerts
- self.register_alert_rule(AlertRule(
- name="prediction_latency_spike",
- metric="prediction_latency_p95",
- threshold=500, # 500ms
- comparison="gt",
- priority=AlertPriority.HIGH,
- channels=[AlertChannel.SLACK, AlertChannel.EMAIL],
- cost_per_trigger=0.15,
- suppression_window_minutes=60
- ))
-
- self.register_alert_rule(AlertRule(
- name="daily_budget_exceeded",
- metric="daily_cost_utilization",
- threshold=0.90, # 90% of budget
- comparison="gt",
- priority=AlertPriority.HIGH,
- channels=[AlertChannel.SLACK, AlertChannel.WEBHOOK],
- cost_per_trigger=0.10
- ))
-
- # Medium-priority monitoring alerts
- self.register_alert_rule(AlertRule(
- name="feature_distribution_shift",
- metric="feature_distribution_divergence",
- threshold=0.20,
- comparison="gt",
- priority=AlertPriority.MEDIUM,
- channels=[AlertChannel.EMAIL],
- cost_per_trigger=0.08,
- suppression_window_minutes=240 # 4 hours
- ))
-
- # Low-priority informational alerts
- self.register_alert_rule(AlertRule(
- name="weekly_cost_trend_anomaly",
- metric="weekly_cost_variance",
- threshold=0.25, # 25% variance from trend
- comparison="gt",
- priority=AlertPriority.LOW,
- channels=[AlertChannel.EMAIL],
- cost_per_trigger=0.05,
- suppression_window_minutes=1440 # 24 hours
- ))
-
- def trigger_alert(self, rule_name: str, current_value: float, context: Optional[Dict] = None):
- """Trigger an alert with contextual information."""
- if rule_name not in self.alert_rules:
- return False
-
- rule = self.alert_rules[rule_name]
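- # hash() of a string changes between processes and repeats for equal values,
- # so use a per-manager sequence number for a stable, unique ID instead.
- alert_id = f"{rule_name}_{len(self.alert_history) + 1}"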
-
- # Check if alert is in suppression window
- if self._is_suppressed(rule_name):
- return False
-
- from datetime import datetime, timezone
-
- alert_data = {
- 'id': alert_id,
- 'rule_name': rule_name,
- 'metric': rule.metric,
- 'threshold': rule.threshold,
- 'current_value': current_value,
- 'priority': rule.priority.value,
- 'channels': [ch.value for ch in rule.channels],
- 'cost': rule.cost_per_trigger,
- 'context': context or {},
- 'timestamp': datetime.now(timezone.utc).isoformat()
- }
-
- # Add to active alerts
- self.active_alerts[alert_id] = alert_data
- self.alert_history.append(alert_data)
-
- # Send to configured channels
- self._send_alert_notifications(alert_data)
-
- # Track cost
- self.adapter.add_monitoring_cost(rule.cost_per_trigger, f"Alert: {rule_name}")
-
- print(f"🚨 ALERT TRIGGERED: {rule_name}")
- print(f"   Current value: {current_value} (threshold: {rule.threshold})")
- print(f"   ⚡ Priority: {rule.priority.value.upper()}")
- print(f"   💰 Cost: ${rule.cost_per_trigger}")
-
- return True
-
- def _is_suppressed(self, rule_name: str) -> bool:
- """Check if alert is in suppression window."""
- # Implementation would check last alert time vs suppression window
- return False # Simplified for example
-
- def _send_alert_notifications(self, alert_data: Dict):
- """Send alert to configured notification channels."""
- for channel in alert_data['channels']:
- if channel == 'slack':
- self._send_slack_alert(alert_data)
- elif channel == 'email':
- self._send_email_alert(alert_data)
- elif channel == 'pagerduty':
- self._send_pagerduty_alert(alert_data)
- elif channel == 'webhook':
- self._send_webhook_alert(alert_data)
-
- def _send_slack_alert(self, alert_data: Dict):
- """Send Slack notification (stub: prints instead of calling Slack)."""
- print(f"📱 Slack alert sent: {alert_data['rule_name']}")
-
- def _send_email_alert(self, alert_data: Dict):
- """Send email notification (stub: prints instead of sending mail)."""
- print(f"📧 Email alert sent: {alert_data['rule_name']}")
-
- def _send_pagerduty_alert(self, alert_data: Dict):
- """Send PagerDuty notification (stub: prints instead of paging)."""
- print(f"PagerDuty alert sent: {alert_data['rule_name']}")
-
- def _send_webhook_alert(self, alert_data: Dict):
- """Send webhook notification (stub: prints instead of posting)."""
- print(f"Webhook alert sent: {alert_data['rule_name']}")
-
- def get_alert_summary(self) -> Dict:
- """Get comprehensive alert summary."""
- total_cost = sum(alert['cost'] for alert in self.alert_history)
- alerts_by_priority = {}
-
- for alert in self.alert_history:
- priority = alert['priority']
- if priority not in alerts_by_priority:
- alerts_by_priority[priority] = 0
- alerts_by_priority[priority] += 1
-
- return {
- 'total_alerts': len(self.alert_history),
- 'active_alerts': len(self.active_alerts),
- 'total_cost': total_cost,
- 'alerts_by_priority': alerts_by_priority,
- 'top_triggered_rules': self._get_top_rules(),
- 'average_cost_per_alert': total_cost / max(len(self.alert_history), 1)
- }
-
- def _get_top_rules(self) -> List[Dict]:
- """Get most frequently triggered rules."""
- rule_counts = {}
- for alert in self.alert_history:
- rule = alert['rule_name']
- rule_counts[rule] = rule_counts.get(rule, 0) + 1
-
- return [{'rule': rule, 'count': count}
- for rule, count in sorted(rule_counts.items(),
- key=lambda x: x[1], reverse=True)[:3]]
-
-# Example usage: Production alert setup
-alert_manager = ProductionAlertManager(adapter)
-alert_manager.create_ml_ops_alerts()
-
-# Simulate alert triggers
-alert_manager.trigger_alert("model_accuracy_critical_drop", 0.82, {
- 'model_id': 'fraud-detection-v3',
- 'environment': 'production',
- 'recent_predictions': 15000
-})
-
-alert_manager.trigger_alert("daily_budget_exceeded", 0.95, {
- 'daily_spending': 285.50,
- 'budget_limit': 300.00,
- 'time_remaining': '4 hours'
-})
-
-# Get alert summary
-summary = alert_manager.get_alert_summary()
-print("\nAlert Summary:")
-print(f"Total Alerts: {summary['total_alerts']}")
-print(f"Alert Cost: ${summary['total_cost']:.2f}")
-print(f"By Priority: {summary['alerts_by_priority']}")
-```
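
In the example above, `trigger_alert` never consults the rule's `comparison` field, and `_is_suppressed` always returns `False`. The sketch below shows one way those two pieces might be filled in. `threshold_breached` and `SuppressionTracker` are hypothetical names introduced here for illustration; they are not part of GenOps or Arize. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
import operator
import time
from typing import Dict, Optional

# Map the comparison codes used by AlertRule to Python operators.
COMPARATORS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}


def threshold_breached(value: float, threshold: float, comparison: str) -> bool:
    """Return True when `value` violates the rule's threshold."""
    return COMPARATORS[comparison](value, threshold)


class SuppressionTracker:
    """Hypothetical helper that remembers when each rule last fired."""

    def __init__(self) -> None:
        self._last_fired: Dict[str, float] = {}

    def is_suppressed(self, rule_name: str, window_minutes: int,
                      now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self._last_fired.get(rule_name)
        return last is not None and (now - last) < window_minutes * 60

    def record(self, rule_name: str, now: Optional[float] = None) -> None:
        self._last_fired[rule_name] = time.time() if now is None else now


tracker = SuppressionTracker()
rule_name = "model_accuracy_critical_drop"

# Accuracy 0.82 is below the 0.85 threshold, so the rule fires at t=0.
if threshold_breached(0.82, 0.85, "lt") and not tracker.is_suppressed(rule_name, 30, now=0.0):
    tracker.record(rule_name, now=0.0)

# Ten minutes later the rule is still inside its 30-minute window.
print(tracker.is_suppressed(rule_name, 30, now=600.0))   # True
# Forty minutes later it may fire again.
print(tracker.is_suppressed(rule_name, 30, now=2400.0))  # False
```

Inside `ProductionAlertManager`, the tracker would be called with the rule's own `suppression_window_minutes` before any notification is sent.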
-
-### Dashboard Integration Patterns
-
-```python
-class ArizeDataSourceIntegration:
- """Integration patterns for popular monitoring dashboards."""
-
- def __init__(self, adapter: GenOpsArizeAdapter):
- self.adapter = adapter
-
- def generate_grafana_dashboard_config(self) -> Dict:
- """Generate Grafana dashboard configuration."""
- return {
- "dashboard": {
- "title": "Arize AI + GenOps Monitoring",
- "tags": ["ml", "arize", "genops", "production"],
- "panels": [
- {
- "title": "Model Performance Metrics",
- "type": "graph",
- "targets": [
- {
- "expr": "arize_model_accuracy",
- "legendFormat": "{{model_id}} Accuracy"
- },
- {
- "expr": "arize_data_drift_score",
- "legendFormat": "{{model_id}} Drift Score"
- }
- ],
- "yAxes": [{"min": 0, "max": 1}],
- "thresholds": [
- {"value": 0.85, "colorMode": "critical", "op": "lt"},
- {"value": 0.20, "colorMode": "critical", "op": "gt", "yAxisId": 1}
- ]
- },
- {
- "title": "Cost Tracking & Budget",
- "type": "stat",
- "targets": [
- {
- "expr": "genops_daily_cost_total",
- "legendFormat": "Daily Spending"
- },
- {
- "expr": "genops_budget_remaining",
- "legendFormat": "Budget Remaining"
- }
- ],
- "fieldConfig": {
- "thresholds": [
- {"color": "green", "value": 0},
- {"color": "yellow", "value": 0.8},
- {"color": "red", "value": 0.95}
- ]
- }
- },
- {
- "title": "Prediction Volume & Latency",
- "type": "graph",
- "targets": [
- {
- "expr": "rate(arize_predictions_total[5m])",
- "legendFormat": "Predictions/sec"
- },
- {
- "expr": "arize_prediction_latency_p95",
- "legendFormat": "P95 Latency (ms)"
- }
- ]
- },
- {
- "title": "Alert Status",
- "type": "table",
- "targets": [
- {
- "expr": "arize_active_alerts",
- "format": "table"
- }
- ]
- }
- ],
- "time": {"from": "now-24h", "to": "now"},
- "refresh": "30s"
- }
- }
-
- def generate_datadog_dashboard_config(self) -> Dict:
- """Generate Datadog dashboard configuration."""
- return {
- "title": "Arize AI ML Monitoring",
- "description": "Comprehensive ML model monitoring with cost governance",
- "template_variables": [
- {
- "name": "model_id",
- "prefix": "model_id",
- "default": "*"
- },
- {
- "name": "environment",
- "prefix": "environment",
- "default": "production"
- }
- ],
- "widgets": [
- {
- "definition": {
- "title": "Model Accuracy Over Time",
- "type": "timeseries",
- "requests": [
- {
- "q": "avg:arize.model.accuracy{$model_id,$environment}",
- "display_type": "line",
- "style": {"palette": "dog_classic"}
- }
- ],
- "markers": [
- {
- "value": "y = 0.85",
- "display_type": "error dashed"
- }
- ]
- }
- },
- {
- "definition": {
- "title": "Cost Governance Overview",
- "type": "query_value",
- "requests": [
- {
- "q": "sum:genops.cost.daily{$model_id,$environment}",
- "aggregator": "last"
- }
- ],
- "custom_links": [
- {
- "label": "Cost Optimization Guide",
- "link": "https://docs.genops.ai/cost-optimization"
- }
- ]
- }
- },
- {
- "definition": {
- "title": "Data Quality Heatmap",
- "type": "heatmap",
- "requests": [
- {
- "q": "avg:arize.data.quality.score{$model_id,$environment} by {feature_name}"
- }
- ]
- }
- }
- ],
- "layout_type": "free"
- }
-
- def setup_prometheus_metrics(self) -> Dict:
- """Build a Prometheus scrape job configuration for the metrics endpoint."""
- return {
- "job_name": "arize-genops-monitoring",
- "metrics_path": "/metrics",
- "scrape_interval": "15s",
- "static_configs": [
- {
- "targets": ["localhost:8080"]
- }
- ],
- "metric_relabel_configs": [
- {
- "source_labels": ["__name__"],
- "regex": "arize_(.*)",
- "target_label": "service",
- "replacement": "arize-ai"
- },
- {
- "source_labels": ["__name__"],
- "regex": "genops_(.*)",
- "target_label": "service",
- "replacement": "genops-governance"
- }
- ]
- }
-
- def create_alertmanager_rules(self) -> Dict:
- """Create Prometheus alerting rules (Alertmanager routes the resulting alerts)."""
- return {
- "groups": [
- {
- "name": "arize-ml-alerts",
- "rules": [
- {
- "alert": "ModelAccuracyDrop",
- "expr": "arize_model_accuracy < 0.85",
- "for": "5m",
- "labels": {
- "severity": "critical",
- "service": "arize-ai"
- },
- "annotations": {
- "summary": "Model accuracy below threshold",
- "description": "Model {{$labels.model_id}} accuracy is {{$value}}, below 0.85 threshold"
- }
- },
- {
- "alert": "BudgetThresholdExceeded",
- "expr": "genops_daily_budget_utilization > 0.90",
- "for": "1m",
- "labels": {
- "severity": "warning",
- "service": "genops-governance"
- },
- "annotations": {
- "summary": "Daily budget threshold exceeded",
- "description": "Daily budget utilization is {{$value | humanizePercentage}}"
- }
- }
- ]
- }
- ]
- }
-
-# Example dashboard integration
-dashboard_integration = ArizeDataSourceIntegration(adapter)
-
-# Generate configurations
-grafana_config = dashboard_integration.generate_grafana_dashboard_config()
-datadog_config = dashboard_integration.generate_datadog_dashboard_config()
-prometheus_config = dashboard_integration.setup_prometheus_metrics()
-
-print("Dashboard Integration Configs Generated:")
-print(f"Grafana panels: {len(grafana_config['dashboard']['panels'])}")
-print(f"Datadog widgets: {len(datadog_config['widgets'])}")
-print(f"Prometheus job: {prometheus_config['job_name']}")
-```
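
The generator methods above return plain dictionaries, so they still need to be written somewhere a dashboard tool can import them. A minimal sketch of that step, assuming JSON output is acceptable (Grafana dashboards are imported as JSON): `write_json_config` is a hypothetical helper, and the sample dictionary stands in for the output of `generate_grafana_dashboard_config()` so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_json_config(config: Dict[str, Any], path: Path) -> Path:
    """Serialize a config dictionary to pretty-printed JSON (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path


# Stand-in for generate_grafana_dashboard_config() so the sketch runs alone.
sample_config = {"dashboard": {"title": "Arize AI + GenOps Monitoring", "panels": []}}

out_dir = Path(tempfile.mkdtemp())
written = write_json_config(sample_config, out_dir / "grafana" / "arize-genops.json")
print(json.loads(written.read_text(encoding="utf-8"))["dashboard"]["title"])
```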
-
-### Performance Monitoring Integration
-
-```python
-class PerformanceMonitoringIntegration:
- """Integration with APM tools for ML model performance monitoring."""
-
- def __init__(self, adapter: GenOpsArizeAdapter):
- self.adapter = adapter
- self.performance_metrics = {}
-
- def setup_honeycomb_tracing(self) -> Dict:
- """Setup Honeycomb distributed tracing for ML operations."""
- return {
- "service_name": "arize-ml-monitoring",
- "honeycomb_config": {
- "write_key": "${HONEYCOMB_API_KEY}",
- "dataset": "ml-monitoring",
- "sample_rate": 1
- },
- "custom_fields": [
- "model_id",
- "model_version",
- "environment",
- "team",
- "project",
- "prediction_count",
- "monitoring_cost",
- "data_quality_score"
- ],
- "trace_examples": [
- {
- "operation_name": "model_monitoring_session",
- "duration_ms": 250,
- "custom_fields": {
- "model_id": "fraud-detection-v3",
- "prediction_count": 1500,
- "monitoring_cost": 1.25,
- "data_quality_score": 0.94
- }
- },
- {
- "operation_name": "prediction_batch_logging",
- "duration_ms": 45,
- "custom_fields": {
- "batch_size": 1000,
- "cost_per_prediction": 0.001,
- "latency_p95": 23
- }
- }
- ]
- }
-
- def setup_new_relic_monitoring(self) -> Dict:
- """Setup New Relic monitoring for ML operations."""
- return {
- "app_name": "Arize ML Monitoring",
- "license_key": "${NEW_RELIC_LICENSE_KEY}",
- "custom_events": [
- {
- "eventType": "ModelMonitoringSession",
- "attributes": [
- "modelId", "modelVersion", "environment",
- "predictionCount", "monitoringCost", "sessionDuration",
- "dataQualityScore", "alertsTriggered"
- ]
- },
- {
- "eventType": "MLCostGovernance",
- "attributes": [
- "team", "project", "dailyCost", "budgetUtilization",
- "costPerPrediction", "optimizationOpportunities"
- ]
- }
- ],
- "custom_metrics": [
- {
- "name": "Custom/ML/ModelAccuracy",
- "unit": "ratio"
- },
- {
- "name": "Custom/ML/DataDriftScore",
- "unit": "ratio"
- },
- {
- "name": "Custom/ML/MonitoringCost",
- "unit": "currency"
- }
- ]
- }
-
- def create_slo_definitions(self) -> List[Dict]:
- """Create Service Level Objective definitions for ML systems."""
- return [
- {
- "name": "Model Accuracy SLO",
- "description": "Model accuracy should remain above 85% for 99.5% of the time",
- "sli": "arize_model_accuracy",
- "threshold": 0.85,
- "target": 0.995, # 99.5%
- "time_window": "30d",
- "alerting": {
- "error_budget_burn_rate": [
- {"threshold": 0.02, "duration": "1h"}, # 2% error budget in 1 hour
- {"threshold": 0.05, "duration": "6h"} # 5% error budget in 6 hours
- ]
- }
- },
- {
- "name": "Prediction Latency SLO",
- "description": "95% of predictions processed within 100ms",
- "sli": "arize_prediction_latency_p95",
- "threshold": 100, # ms
- "target": 0.95,
- "time_window": "7d"
- },
- {
- "name": "Data Quality SLO",
- "description": "Data quality score above 90% for 99% of the time",
- "sli": "arize_data_quality_score",
- "threshold": 0.90,
- "target": 0.99,
- "time_window": "30d"
- },
- {
- "name": "Cost Governance SLO",
- "description": "Daily spending stays within budget 95% of the time",
- "sli": "genops_daily_budget_adherence",
- "threshold": 1.0, # 100% budget adherence
- "target": 0.95,
- "time_window": "30d"
- }
- ]
-
- def generate_sli_queries(self) -> Dict[str, Dict[str, str]]:
- """Generate SLI queries for different monitoring systems."""
- return {
- "prometheus": {
- "model_accuracy": """
- sum(rate(arize_model_predictions_correct_total[5m])) /
- sum(rate(arize_model_predictions_total[5m]))
- """,
- "prediction_latency_p95": "histogram_quantile(0.95, sum(rate(arize_prediction_duration_seconds_bucket[5m])) by (le))",
- "data_quality_score": "avg(arize_data_quality_score)",
- "budget_adherence": "genops_daily_spending / genops_daily_budget_limit"
- },
- "datadog": {
- "model_accuracy": "sum:arize.predictions.correct{*}.as_rate() / sum:arize.predictions.total{*}.as_rate()",
- "prediction_latency_p95": "p95:arize.prediction.duration{*}",
- "data_quality_score": "avg:arize.data.quality.score{*}",
- "budget_adherence": "sum:genops.daily.spending{*} / sum:genops.daily.budget{*}"
- }
- }
-
-# Example performance monitoring setup
-perf_monitoring = PerformanceMonitoringIntegration(adapter)
-
-# Generate monitoring configurations
-honeycomb_config = perf_monitoring.setup_honeycomb_tracing()
-newrelic_config = perf_monitoring.setup_new_relic_monitoring()
-slo_definitions = perf_monitoring.create_slo_definitions()
-sli_queries = perf_monitoring.generate_sli_queries()
-
-print("🎯 Performance Monitoring Setup:")
-print(f"Honeycomb custom fields: {len(honeycomb_config['custom_fields'])}")
-print(f"New Relic custom events: {len(newrelic_config['custom_events'])}")
-print(f"SLO definitions: {len(slo_definitions)}")
-print(f"SLI query systems: {list(sli_queries.keys())}")
-
-# Display SLO examples
-for slo in slo_definitions[:2]: # First 2 SLOs
- print(f"\nSLO: {slo['name']}")
- print(f" Target: {slo['target']*100}% over {slo['time_window']}")
- print(f" Threshold: {slo['threshold']}")
-```
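
The SLO definitions above imply concrete numbers that are worth working out. A 99.5% target over 30 days leaves an error budget of 0.5% of the window, and the two alerting thresholds (2% of the budget in 1 hour, 5% in 6 hours) each correspond to a burn rate. The sketch below does that arithmetic; both function names are hypothetical and the burn-rate definition is the common "multiple of the sustainable spend rate" convention, which is an assumption about how these thresholds are meant to be read.

```python
def error_budget_minutes(target: float, window_days: int) -> float:
    """Minutes in the window during which the SLI may be out of bounds."""
    return (1.0 - target) * window_days * 24 * 60


def burn_rate_for_consumption(budget_fraction: float, alert_hours: float,
                              window_days: int) -> float:
    """Burn rate that spends `budget_fraction` of the budget in `alert_hours`.

    A burn rate of 1.0 would spend exactly the whole budget over the window.
    """
    return budget_fraction * (window_days * 24) / alert_hours


budget = error_budget_minutes(target=0.995, window_days=30)
print(f"Allowed out-of-SLO time: {budget:.0f} minutes")  # 216 minutes

fast = burn_rate_for_consumption(0.02, alert_hours=1, window_days=30)
slow = burn_rate_for_consumption(0.05, alert_hours=6, window_days=30)
print(f"2% of budget in 1h -> burn rate {fast:.1f}x")  # 14.4x
print(f"5% of budget in 6h -> burn rate {slow:.1f}x")  # 6.0x
```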
-
-## Validation and Troubleshooting
-
-### Setup Validation
-
-```python
-from genops.providers.arize_validation import validate_setup, print_validation_result
-
-# Comprehensive setup validation
-result = validate_setup()
-print_validation_result(result)
-
-# Expected output:
-# ✅ Overall Status: SUCCESS
-# Validation Summary:
-#   • SDK Installation: 0 issues
-#   • Authentication: 0 issues
-#   • Configuration: 0 issues
-#   • Governance: 0 issues
-# 💡 Recommendations:
-#   1. All validation checks passed successfully!
-# Next Steps:
-# 1. You can now use GenOps Arize integration with confidence
-```
-
-### Manual Validation Components
-
-```python
-from genops.providers.arize_validation import ArizeSetupValidator
-
-validator = ArizeSetupValidator(verbose=True)
-
-# Validate specific components
-sdk_result = validator.validate_sdk_installation()
-auth_result = validator.validate_authentication()
-config_result = validator.validate_governance_configuration(
- team="ml-platform",
- project="fraud-detection"
-)
-
-# Runtime health check
-health_result = validator.perform_health_check()
-
-# Display results
-for result in [sdk_result, auth_result, config_result, health_result]:
- validator.print_validation_result(result)
-```
-
-### Troubleshooting Decision Trees
-
-#### 🚨 Problem: "Cannot Import Arize AI SDK"
-
-```
-Error: ImportError: No module named 'arize'
-    │
-    ├─ Check Python environment
-    │   ├─ ✅ Virtual environment active?
-    │   │   └─ pip install "arize>=6.0.0" "genops[arize]"
-    │   │
-    │   ├─ ❌ Wrong Python version?
-    │   │   └─ Requires Python 3.8+ → upgrade Python
-    │   │
-    │   └─ ❌ Package conflicts?
-    │       └─ pip install --upgrade --force-reinstall arize
-    │
-    ├─ Alternative installation methods
-    │   ├─ conda install -c conda-forge arize
-    │   ├─ pip install --user arize (user install)
-    │   └─ poetry add arize (Poetry projects)
-    │
-    └─ Still failing?
-        └─ Check system PATH and Python installation
-```
-
-#### Problem: "Authentication Failed"
-
-```
-Error: Authentication failed / Invalid API credentials
-    │
-    ├─ Verify credentials exist
-    │   ├─ echo $ARIZE_API_KEY (should show key)
-    │   ├─ echo $ARIZE_SPACE_KEY (should show space)
-    │   └─ ❌ Empty? → Set environment variables:
-    │       export ARIZE_API_KEY="your-api-key"
-    │       export ARIZE_SPACE_KEY="your-space-key"
-    │
-    ├─ Validate credential format
-    │   ├─ API Key: Should be 32+ character string
-    │   ├─ Space Key: Should be UUID format
-    │   └─ ❌ Wrong format? → Get new credentials from Arize dashboard
-    │
-    ├─ Test network connectivity
-    │   ├─ curl -I https://app.arize.com
-    │   └─ ❌ Connection failed? → Check firewall/proxy settings
-    │
-    └─ Advanced troubleshooting
-        ├─ python -c "from arize.utils.logging import log_schema; log_schema()"
-        └─ Contact Arize support with error details
-```
-
-#### 💰 Problem: "Budget Exceeded" / Cost Issues
-
-```
-Error: Monitoring session would exceed daily budget
-    │
-    ├─ Check current usage
-    │   ├─ Run: python -c "from genops.providers.arize import get_current_adapter; print(get_current_adapter().get_metrics())"
-    │   └─ Review daily/monthly cost trends
-    │
-    ├─ Immediate solutions
-    │   ├─ Increase budget limit:
-    │   │   adapter = GenOpsArizeAdapter(daily_budget_limit=200.0)
-    │   │
-    │   ├─ Switch to advisory mode:
-    │   │   adapter = GenOpsArizeAdapter(governance_policy="advisory")
-    │   │
-    │   └─ Implement sampling:
-    │       if random.random() < 0.1:  # Log 10% of predictions
-    │           arize_client.log(prediction)
-    │
-    ├─ Long-term optimization
-    │   ├─ Run cost optimization analysis:
-    │   │   python examples/arize/cost_optimization.py
-    │   │
-    │   ├─ Review alert frequency and thresholds
-    │   └─ Implement batch processing for high-volume scenarios
-    │
-    └─ Enterprise solutions
-        ├─ Multi-tier budget allocation by model importance
-        ├─ Dynamic sampling based on remaining budget
-        └─ Contact GenOps for enterprise budget management
-```
-
-#### Problem: "Network/Connection Issues"
-
-```
-Error: Connection timeout / Network unreachable
-    │
-    ├─ Basic connectivity check
-    │   ├─ ping app.arize.com
-    │   ├─ curl -I https://app.arize.com
-    │   └─ ❌ Failed? → Check internet connection
-    │
-    ├─ Proxy/Firewall configuration
-    │   ├─ Corporate network?
-    │   │   ├─ Set HTTP_PROXY and HTTPS_PROXY
-    │   │   ├─ Add *.arize.com to firewall allowlist
-    │   │   └─ Contact IT for port 443/80 access
-    │   │
-    │   └─ VPN issues?
-    │       └─ Try connection with/without VPN
-    │
-    ├─ DNS resolution
-    │   ├─ nslookup app.arize.com
-    │   └─ ❌ DNS failed? → Try alternate DNS (8.8.8.8)
-    │
-    └─ SSL/TLS issues
-        ├─ openssl s_client -connect app.arize.com:443
-        ├─ Check certificate chain validity
-        └─ Update CA certificates if needed
-```
-
-#### Problem: "Data/Predictions Not Appearing"
-
-```
-Error: Predictions logged but not visible in Arize dashboard
-    │
-    ├─ Verify logging success
-    │   ├─ Check response status codes
-    │   ├─ Look for error messages in logs
-    │   └─ Enable debug logging:
-    │       logging.getLogger('arize').setLevel(logging.DEBUG)
-    │
-    ├─ Data format validation
-    │   ├─ prediction_id: Must be unique string
-    │   ├─ model_id: Must match dashboard configuration
-    │   ├─ model_version: Must be consistent
-    │   └─ timestamp: Must be valid datetime
-    │
-    ├─ Dashboard configuration
-    │   ├─ Check model exists in Arize dashboard
-    │   ├─ Verify space configuration
-    │   ├─ Check data retention settings
-    │   └─ Review model schema alignment
-    │
-    └─ Timing issues
-        ├─ Allow 2-5 minutes for data ingestion
-        ├─ Check dashboard time range filters
-        └─ Verify timezone configuration
-```
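
The "Data format validation" branch above can be checked locally before anything is sent. The following is a minimal sketch, assuming each prediction is held as a dictionary with the four fields named in the tree. `validate_prediction_record` is a hypothetical helper, not an Arize or GenOps function, and it only covers the checks listed above (it does not know what the Arize dashboard expects for a given model).

```python
from datetime import datetime
from typing import Any, Dict, List, Set


def validate_prediction_record(record: Dict[str, Any], seen_ids: Set[str]) -> List[str]:
    """Return a list of problems found in one prediction record (empty if none)."""
    problems: List[str] = []

    prediction_id = record.get("prediction_id")
    if not isinstance(prediction_id, str) or not prediction_id:
        problems.append("prediction_id must be a non-empty string")
    elif prediction_id in seen_ids:
        problems.append(f"duplicate prediction_id: {prediction_id}")
    else:
        seen_ids.add(prediction_id)

    for field in ("model_id", "model_version"):
        value = record.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} must be a non-empty string")

    if not isinstance(record.get("timestamp"), datetime):
        problems.append("timestamp must be a datetime")

    return problems


seen: Set[str] = set()
good = {"prediction_id": "p-1", "model_id": "fraud-model", "model_version": "v3",
        "timestamp": datetime(2024, 1, 15, 10, 30)}
repeat = dict(good)  # same ID sent twice
bad = {"prediction_id": 17, "model_id": "fraud-model", "model_version": "",
       "timestamp": "yesterday"}

good_problems = validate_prediction_record(good, seen)
repeat_problems = validate_prediction_record(repeat, seen)
bad_problems = validate_prediction_record(bad, seen)
for label, found in (("good", good_problems), ("repeat", repeat_problems), ("bad", bad_problems)):
    print(label, found)
```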
-
-#### ⚡ Problem: "Performance/Speed Issues"
-
-```
-Error: Slow monitoring operations / High latency
-    │
-    ├─ Identify bottlenecks
-    │   ├─ Network latency (ping times to Arize)
-    │   ├─ Large payload sizes (reduce data volume)
-    │   └─ High frequency logging (implement batching)
-    │
-    ├─ Optimization strategies
-    │   ├─ Batch predictions:
-    │   │   session.log_prediction_batch(df, batch_size=1000)
-    │   │
-    │   ├─ Async logging:
-    │   │   Use async Arize client if available
-    │   │
-    │   ├─ Reduce the frequency of data quality checks
-    │   └─ Implement intelligent sampling
-    │
-    ├─ Resource optimization
-    │   ├─ Monitor memory usage during bulk operations
-    │   ├─ Use streaming for large datasets
-    │   └─ Configure appropriate timeout values
-    │
-    └─ Enterprise solutions
-        ├─ Dedicated Arize instance for high-volume workloads
-        ├─ Regional deployment optimization
-        └─ Contact Arize for performance consultation
-```
-
-#### 🔧 Problem: "GenOps Governance Issues"
-
-```
-Error: Governance tracking not working / Missing cost attribution
-    │
-    ├─ Verify GenOps configuration
-    │   ├─ Check GENOPS_TEAM environment variable
-    │   ├─ Check GENOPS_PROJECT environment variable
-    │   ├─ Validate adapter initialization:
-    │   │   adapter = GenOpsArizeAdapter(
-    │   │       team="your-team",
-    │   │       project="your-project",
-    │   │       enable_governance=True
-    │   │   )
-    │   │
-    │   └─ Run setup validation:
-    │       python -c "from genops.providers.arize_validation import validate_setup, print_validation_result; print_validation_result(validate_setup())"
-    │
-    ├─ Cost tracking issues
-    │   ├─ Enable cost tracking explicitly:
-    │   │   adapter = GenOpsArizeAdapter(enable_cost_alerts=True)
-    │   │
-    │   ├─ Check telemetry export:
-    │   │   Verify OTLP endpoint configuration
-    │   │
-    │   └─ Review cost calculation methods:
-    │       adapter.get_metrics()
-    │
-    ├─ Telemetry export problems
-    │   ├─ OTEL_EXPORTER_OTLP_ENDPOINT configured?
-    │   ├─ OTEL_EXPORTER_OTLP_HEADERS authentication?
-    │   └─ Check observability platform connectivity
-    │
-    └─ Advanced debugging
-        ├─ Enable debug mode: GENOPS_DEBUG=true
-        ├─ Check span creation and attribute attachment
-        └─ Verify OpenTelemetry instrumentation setup
-```
-
-### Quick Diagnostic Commands
-
-```bash
-# Complete system health check
-python -c "
-from genops.providers.arize_validation import validate_setup, print_validation_result
-result = validate_setup()
-print_validation_result(result)
-"
-
-# Check current cost usage
-# (quoted heredoc, so the shell does not try to expand the \${...} in the f-strings)
-python - <<'PY'
-from genops.providers.arize import get_current_adapter
-adapter = get_current_adapter()
-if adapter:
-    metrics = adapter.get_metrics()
-    print(f"Daily usage: ${metrics['daily_usage']:.2f}")
-    print(f"Budget remaining: ${metrics['budget_remaining']:.2f}")
-else:
-    print('No active adapter found')
-PY
-
-# Test basic connectivity
-python -c "
-import requests
-response = requests.get('https://app.arize.com', timeout=10)
-print(f'Arize connectivity: {response.status_code}')
-"
-
-# Validate environment setup (reports presence only, never the secret itself)
-python -c "
-import os
-required_vars = ['ARIZE_API_KEY', 'ARIZE_SPACE_KEY']
-for var in required_vars:
-    value = os.getenv(var)
-    status = '✅' if value else '❌'
-    display = f'set ({len(value)} chars)' if value else 'Not set'
-    print(f'{status} {var}: {display}')
-"
-```
-
-### Getting Help
-
-#### Self-Service Resources
-1. **Run validation first**: `python examples/arize/setup_validation.py`
-2. **Check examples**: All examples in `examples/arize/` are tested and working
-3. **Review documentation**: This guide covers most common scenarios
-4. **Enable debug logging**: Set `GENOPS_DEBUG=true` for detailed diagnostics
-
-#### Community Support
-- **GitHub Issues**: [Report bugs and feature requests](https://github.com/KoshiHQ/GenOps-AI/issues)
-- **Discussions**: [Community Q&A and best practices](https://github.com/KoshiHQ/GenOps-AI/discussions)
-- **Arize Community**: [Arize Slack workspace](https://arize-ai.slack.com)
-
-#### Enterprise Support
-- **Email**: support@genops.ai
-- **Professional Services**: Custom integration assistance
-- **Training**: Team onboarding and best practices workshops
-- **Priority Support**: SLA-backed issue resolution for enterprise customers
-
-#### When Creating Support Requests
-
-**Include this diagnostic information:**
-```bash
-# System information
-python --version
-pip show genops arize
-echo "OS: $(uname -s -r)"
-
-# Configuration (sanitized)
-echo "Environment variables:"
-env | grep -E "(GENOPS|ARIZE|OTEL)" | sed 's/=.*/=***hidden***/'
-
-# Validation results
-python -c "
-from genops.providers.arize_validation import validate_setup, print_validation_result
-result = validate_setup()
-print_validation_result(result)
-"
-```
-
-## Performance Considerations
-
-### High-Volume Optimization
-
-For high-volume monitoring scenarios (>1M predictions/day):
-
-```python
-import random
-
-# Cap daily spend and enable cost alerts on the adapter
-adapter = GenOpsArizeAdapter(
-    enable_cost_alerts=True,
-    daily_budget_limit=200.0
-)
-
-def should_log_prediction(sampling_rate=0.1):
-    """Sample predictions to reduce logging costs."""
-    return random.random() < sampling_rate
-
-# Log only sampled predictions.
-# `high_volume_predictions` and `arize_client` are assumed to exist in your application.
-for prediction in high_volume_predictions:
-    if should_log_prediction(sampling_rate=0.05):  # Log 5% of predictions
-        arize_client.log(prediction)
-```
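
Sampling reduces how many predictions are logged; batching reduces how many calls are made for the ones that remain. A minimal sketch of the batching half, assuming predictions arrive as any iterable: `chunked` is a hypothetical helper written here with the standard library only, and each yielded list could then be handed to a batch call such as the `log_prediction_batch` method used elsewhere in this guide.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a stream of sampled predictions.
predictions = ({"prediction_id": f"pred-{i}"} for i in range(2_500))
sizes = [len(batch) for batch in chunked(predictions, 1_000)]
print(sizes)  # [1000, 1000, 500]
```

Because `chunked` consumes its input lazily, it works on generators without loading the full stream into memory.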
-
-### Cost-Aware Monitoring
-
-```python
-# Monitor cost usage and adjust behavior dynamically
-metrics = adapter.get_metrics()
-current_usage = metrics['daily_usage']
-budget_remaining = metrics['budget_remaining']
-
-# Implement dynamic sampling based on budget remaining
-if budget_remaining < 10.0:      # Less than $10 remaining
-    sampling_rate = 0.01         # Reduce to 1% sampling
-elif budget_remaining < 25.0:    # Less than $25 remaining
-    sampling_rate = 0.05         # Reduce to 5% sampling
-else:
-    sampling_rate = 0.10         # Normal 10% sampling
-
-print(f"Current Usage: ${current_usage:.2f}")
-print(f"Budget Remaining: ${budget_remaining:.2f}")
-print(f"Active Sampling Rate: {sampling_rate*100:.1f}%")
-```
-
-## Integration Examples
-
-### Flask Web Application
-
-```python
-from flask import Flask, request, jsonify
-from genops.providers.arize import auto_instrument
-
-app = Flask(__name__)
-
-# Enable Arize governance for the entire application
-auto_instrument(
- team="web-api-team",
- project="prediction-service",
- environment="production"
-)
-
-@app.route('/predict', methods=['POST'])
-def predict():
-    data = request.json
-
-    # Your prediction logic here (`model` is your already-loaded model)
-    prediction = model.predict(data['features'])
-
-    # This call is automatically tracked by GenOps
-    # (`arize_client` is your already-configured Arize client)
-    arize_client.log(
-        prediction_id=data['prediction_id'],
-        prediction_label=prediction,
-        model_id="production-model",
-        model_version="1.0"
-    )
-
-    return jsonify({'prediction': prediction})
-```
-
-### Jupyter Notebook Analysis
-
-```python
-# Notebook: Model Monitoring Analysis
-import pandas as pd
-from genops.providers.arize import GenOpsArizeAdapter
-
-# Initialize adapter for notebook environment
-adapter = GenOpsArizeAdapter(
- team="data-science",
- project="model-analysis",
- environment="development",
- daily_budget_limit=20.0
-)
-
-# Load and analyze monitoring data
-with adapter.track_model_monitoring_session("analysis-session") as session:
-    # Load prediction data
-    predictions_df = pd.read_csv('model_predictions.csv')
-
-    # Log batch predictions with cost tracking
-    session.log_prediction_batch(predictions_df, cost_per_prediction=0.001)
-
-    # Analyze data quality (`detect_outliers` is your own helper)
-    quality_metrics = {
-        # Percentage of all cells that are missing
-        'missing_values_pct': predictions_df.isnull().mean().mean() * 100,
-        'duplicate_records': int(predictions_df.duplicated().sum()),
-        'outlier_count': detect_outliers(predictions_df)
-    }
-
-    session.log_data_quality_metrics(quality_metrics, cost_estimate=0.05)
-
-    print(f"Analysis complete. Session cost: ${session.estimated_cost:.2f}")
-```
-
-### Batch Processing Pipeline
-
-```python
-import logging
-import time
-
-import schedule
-from more_itertools import chunked
-from genops.providers.arize import GenOpsArizeAdapter
-
-logger = logging.getLogger(__name__)
-
-# Scheduled batch monitoring with governance.
-# `load_daily_predictions` and `generate_quality_report` are your own helpers.
-def run_daily_monitoring():
-    adapter = GenOpsArizeAdapter(
-        team="ml-ops",
-        project="batch-monitoring",
-        environment="production",
-        daily_budget_limit=75.0
-    )
-
-    with adapter.track_model_monitoring_session("daily-batch") as session:
-        # Load daily predictions
-        daily_predictions = load_daily_predictions()
-
-        # Process in chunks to manage costs
-        chunk_size = 10000
-        for chunk in chunked(daily_predictions, chunk_size):
-            session.log_prediction_batch(
-                chunk,
-                cost_per_prediction=0.0005
-            )
-
-            # Stop early once the session cost passes the $25 threshold
-            if session.estimated_cost > 25.0:
-                logger.warning("Approaching cost limit, stopping batch processing")
-                break
-
-        # Generate daily quality report
-        quality_report = generate_quality_report(daily_predictions)
-        session.log_data_quality_metrics(quality_report, cost_estimate=0.10)
-
-        print(f"Daily monitoring complete. Total cost: ${session.estimated_cost:.2f}")
-
-# Schedule daily monitoring
-schedule.every().day.at("02:00").do(run_daily_monitoring)
-
-while True:
-    schedule.run_pending()
-    time.sleep(60)  # Check every minute so the 02:00 job starts on time
-```
-
-## Best Practices
-
-### 1. Cost Management
-- Set appropriate budget limits for each environment
-- Use sampling for high-volume scenarios
-- Monitor cost trends and optimize regularly
-- Implement dynamic sampling based on budget remaining
-
-### 2. Governance Configuration
-- Always set team and project attributes for proper attribution
-- Use environment-specific policies (advisory for dev, enforced for prod)
-- Configure cost alerts to prevent budget overruns
-- Validate setup and configuration regularly
-
-### 3. Performance Optimization
-- Use batch logging for multiple predictions
-- Implement prediction sampling for cost optimization
-- Monitor session costs and adjust behavior dynamically
-- Cache expensive operations where appropriate
-
-### 4. Security and Compliance
-- Store API keys securely using environment variables
-- Use governance metadata for compliance tracking
-- Implement proper access controls for different environments
-- Audit governance policies and compliance regularly
-
-## Support and Resources
-
-### Documentation Links
-- [Arize AI Documentation](https://docs.arize.com/)
-- [Arize Python SDK Reference](https://docs.arize.com/arize/sdks/python-sdk)
-- [GenOps Core Documentation](../README.md)
-- [OpenTelemetry Specifications](https://opentelemetry.io/docs/)
-
-### Community Support
-- [GenOps GitHub Issues](https://github.com/KoshiHQ/GenOps-AI/issues)
-- [GenOps Discussions](https://github.com/KoshiHQ/GenOps-AI/discussions)
-- [Arize Community Slack](https://arize-ai.slack.com)
-
-### Enterprise Support
-- Professional services for enterprise deployments
-- Custom governance policy development
-- Integration with existing observability stacks
-- Training and onboarding for teams
-
----
-
-Ready to get started? Follow our [Quick Start Guide](#quick-start) or try the [5-minute integration example](../examples/arize/README.md).
\ No newline at end of file
diff --git a/docs/integrations/autogen.md b/docs/integrations/autogen.md
deleted file mode 100644
index 07886e9..0000000
--- a/docs/integrations/autogen.md
+++ /dev/null
@@ -1,703 +0,0 @@
-# AutoGen + GenOps: Comprehensive Integration Guide
-
-**Add enterprise-grade governance to your AutoGen multi-agent conversations in under 3 minutes with zero code changes.**
-
-Turn your AutoGen applications into cost-aware, compliant, and optimized multi-agent systems with comprehensive tracking across all LLM providers.
-
-## Table of Contents
-
-- [Quick Start (3 Minutes)](#quick-start-3-minutes) - Get started immediately
-- [What You Get](#what-you-get) - Benefits and capabilities
-- [How to Use It](#how-to-use-it) - Different ways to integrate
-- [Advanced Features](#advanced-features) - Cost optimization and monitoring
-- [Production Deployment](#production-deployment) - Enterprise patterns
-- [Performance & Scaling](#performance--scaling) - Optimization strategies
-- [Troubleshooting](#troubleshooting) - Common issues and solutions
-- [Complete API Reference](#api-reference) - Technical documentation
-
----
-
-## Quick Start (3 Minutes)
-
-### 1. Installation (30 seconds)
-
-```bash
-pip install genops[autogen]
-```
-
-### 2. Validation (30 seconds)
-
-```python
-from genops.providers.autogen import quick_validate
-result = quick_validate()
-print("✅ Ready!" if result else "❌ Issues found")
-```
-
-### 3. Enable Governance (1 line)
-
-```python
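-# Add this ONE line to any AutoGen script
-from genops.providers.autogen import enable_governance; enable_governance()
-
-# Your existing AutoGen code works unchanged
-import os
-import autogen
-
-# Example configuration; substitute your own model and credentials
-llm_config = {"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]}
-assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
-user_proxy = autogen.UserProxyAgent(
-    name="user_proxy",
-    human_input_mode="NEVER",
-    max_consecutive_auto_reply=1,
-    code_execution_config=False,
-)
-user_proxy.initiate_chat(assistant, message="Hello!")
-# Now tracked with comprehensive governance!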
-```
-
-**That's it!** You now have enterprise-grade AutoGen governance.
-
----
-
-## What You Get
-
-### Enterprise-Grade AutoGen Governance
-
-Transform your AutoGen multi-agent conversations with comprehensive tracking and control:
-
-**💰 Financial Control**
-- **Real-time cost tracking** across OpenAI, Anthropic, Google, and all LLM providers
-- **Budget monitoring** with automatic alerts and spending limits
-- **Cost attribution** by team, project, and customer for accurate billing
-
-**Performance Insights**
-- **Conversation analytics** with turn-by-turn analysis and quality metrics
-- **Agent performance monitoring** with individual optimization recommendations
-- **Multi-agent coordination** tracking for group chat efficiency
-
-**Enterprise Compliance**
-- **OpenTelemetry-standard telemetry** for seamless observability integration
-- **Audit trails** with complete conversation logging and attribution
-- **Policy enforcement** with automated governance controls
-
-### How It Works (Technical Components)
-
-The integration uses five key components working together:
-1. **Adapter** - Main integration class for your AutoGen applications
-2. **Cost Aggregator** - Multi-provider cost calculation and optimization
-3. **Conversation Monitor** - Real-time flow analysis and performance metrics
-4. **Auto-Instrumentation** - Zero-code setup that works with existing applications
-5. **Validation System** - Comprehensive diagnostics and troubleshooting
-
----
-
-## How to Use It
-
-### Pattern 1: Zero-Code Auto-Instrumentation
-
-**Best for**: Existing AutoGen applications, quick setup, minimal changes
-
-```python
-from genops.providers.autogen import enable_governance
-enable_governance()
-
-# All your existing AutoGen code now has governance
-# No other changes needed!
-```
-
-**Advantages**:
-- No changes to existing AutoGen code
-- Automatic detection and instrumentation
-- Works with any AutoGen pattern
-
-### Pattern 2: Manual Adapter Configuration
-
-**Best for**: Custom governance settings, team/project specific configuration
-
-```python
-from genops.providers.autogen import GenOpsAutoGenAdapter
-
-adapter = GenOpsAutoGenAdapter(
- team="ai-research",
- project="customer-service",
- environment="production",
- daily_budget_limit=100.0,
- governance_policy="enforced"
-)
-
-# Then instrument your agents
-assistant = adapter.instrument_agent(assistant, "customer_assistant")
-```
-
-**Advantages**:
-- Full control over governance settings
-- Custom budget limits and policies
-- Detailed configuration options
-
-### Pattern 3: Context Manager Tracking
-
-**Best for**: Granular conversation tracking, detailed analytics
-
-```python
-with adapter.track_conversation("customer-inquiry") as context:
-    response = user_proxy.initiate_chat(assistant, message="Help request")
-
-    # Real-time cost and metrics available
-    print(f"Cost: ${context.total_cost:.6f}")
-    print(f"Turns: {context.turns_count}")
-```
-
-**Advantages**:
-- Conversation-level cost attribution
-- Real-time metrics during execution
-- Granular tracking control
-
-### Pattern 4: Group Chat Monitoring
-
-**Best for**: Multi-agent group conversations, team coordination tracking
-
-```python
-with adapter.track_group_chat("research-team", participants=agent_names) as context:
-    # Start the group chat through the manager, as in standard AutoGen usage
-    result = user_proxy.initiate_chat(group_chat_manager, message="Kick off the research discussion")
-
-    # Group dynamics and coordination metrics
-    print(f"Participants: {len(context.participants)}")
-    print(f"Speaker transitions: {context.speaker_transitions}")
-```
-
-**Advantages**:
-- Multi-agent coordination tracking
-- Speaker transition analysis
-- Group dynamics insights
-
----
-
-## Advanced Features
-
-### Multi-Provider Cost Optimization
-
-Automatically optimize costs across multiple LLM providers:
-
-```python
-from genops.providers.autogen import analyze_conversation_costs
-
-analysis = analyze_conversation_costs(adapter, time_period_hours=24)
-
-for recommendation in analysis['recommendations']:
-    print(f"💡 {recommendation['reasoning']}")
-    print(f"   Potential savings: ${recommendation['potential_savings']:.4f}")
-```
-
-### Real-Time Budget Monitoring
-
-Set spending limits and get automatic alerts:
-
-```python
-adapter = GenOpsAutoGenAdapter(
- team="marketing",
- project="campaign-bots",
- daily_budget_limit=50.0, # $50/day limit
- governance_policy="enforced" # Hard limit
-)
-
-# Budget validation before expensive operations
-estimated_cost = 2.50  # Your own estimate for the upcoming conversation, in USD
-if adapter.validate_budget(estimated_cost):
-    # Proceed with conversation
-    pass
-else:
-    print("⚠️ Budget limit would be exceeded")
-```
-
-### Performance Analytics
-
-Get detailed performance insights for optimization:
-
-```python
-from genops.providers.autogen import get_conversation_insights
-
-insights = get_conversation_insights(monitor, "conversation-id")
-
-print(f"Quality score: {insights['conversation_quality_score']:.2f}")
-print(f"Avg response time: {insights['avg_response_time_ms']:.1f}ms")
-print(f"Efficiency score: {insights['efficiency_score']:.2f}")
-```
-
-### Custom Governance Policies
-
-Implement custom rules and controls:
-
-```python
-adapter = GenOpsAutoGenAdapter(
- team="legal-review",
- project="contract-analysis",
- governance_policy="custom",
- custom_policies={
- "max_conversation_turns": 10,
- "require_human_approval": True,
- "log_all_interactions": True
- }
-)
-```
-
----
-
-## Production Deployment
-
-### Environment Configuration
-
-**Development Environment:**
-```bash
-export GENOPS_TEAM=dev-team
-export GENOPS_PROJECT=autogen-dev
-export GENOPS_ENVIRONMENT=development
-export GENOPS_BUDGET_LIMIT=10.0
-```
-
-**Production Environment:**
-```bash
-export GENOPS_TEAM=prod-ai-team
-export GENOPS_PROJECT=customer-service
-export GENOPS_ENVIRONMENT=production
-export GENOPS_BUDGET_LIMIT=1000.0
-export GENOPS_GOVERNANCE_POLICY=enforced
-```
-
-### Docker Deployment
-
-```dockerfile
-FROM python:3.9
-
-# Install dependencies
-COPY requirements.txt .
-RUN pip install -r requirements.txt
-
-# Install AutoGen + GenOps
-RUN pip install genops[autogen]
-
-# Copy application
-COPY . /app
-WORKDIR /app
-
-# Environment variables
-ENV GENOPS_TEAM=production-team
-ENV GENOPS_PROJECT=autogen-service
-ENV GENOPS_ENVIRONMENT=production
-
-# Validate setup at image build time (the build fails if validation fails)
-RUN python -c "from genops.providers.autogen import quick_validate; assert quick_validate()"
-
-CMD ["python", "app.py"]
-```
-
-### Kubernetes Deployment
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: autogen-service
-spec:
-  replicas: 3
-  selector:
-    matchLabels:
-      app: autogen-service
-  template:
-    metadata:
-      labels:
-        app: autogen-service
-    spec:
-      containers:
-      - name: autogen-app
-        image: autogen-service:latest
-        env:
-        - name: GENOPS_TEAM
-          value: "k8s-ai-team"
-        - name: GENOPS_PROJECT
-          value: "autogen-service"
-        - name: GENOPS_ENVIRONMENT
-          value: "kubernetes"
-        - name: GENOPS_BUDGET_LIMIT
-          value: "500.0"
-        - name: OPENAI_API_KEY
-          valueFrom:
-            secretKeyRef:
-              name: api-secrets
-              key: openai-key
-        resources:
-          limits:
-            memory: "1Gi"
-            cpu: "500m"
-        readinessProbe:
-          exec:
-            command:
-            - python
-            - -c
-            - "from genops.providers.autogen import quick_validate; exit(0 if quick_validate() else 1)"
-          initialDelaySeconds: 10
-          periodSeconds: 30
-```
-
-### Observability Integration
-
-**Datadog Integration:**
-```python
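-from opentelemetry import trace
-from opentelemetry.sdk.trace import TracerProvider
-from opentelemetry.exporter.datadog import DatadogExporter, DatadogExportSpanProcessor
-from genops.providers.autogen import enable_governance
-
-# Set a tracer provider if your application has not already configured one
-trace.set_tracer_provider(TracerProvider())
-
-# Configure Datadog exporter for GenOps telemetry.
-# The exporter must be wrapped in a span processor before it is registered.
-trace.get_tracer_provider().add_span_processor(
-    DatadogExportSpanProcessor(
-        DatadogExporter(
-            agent_url="http://datadog-agent:8126",
-            service="autogen-governance"
-        )
-    )
-)
-
-# GenOps telemetry automatically flows to Datadog
-enable_governance()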
-```
-
-**Grafana + Tempo Integration:**
-```python
-from opentelemetry import trace
-from opentelemetry.exporter.jaeger.thrift import JaegerExporter
-from opentelemetry.sdk.trace.export import BatchSpanProcessor
-
-# Configure for Grafana Tempo (Jaeger Thrift over HTTP on the collector port)
-jaeger_exporter = JaegerExporter(
-    collector_endpoint="http://tempo:14268/api/traces",
-)
-
-trace.get_tracer_provider().add_span_processor(
-    BatchSpanProcessor(jaeger_exporter)
-)
-```
-
----
-
-## Performance & Scaling
-
-### Benchmarks
-
-| Scenario | Overhead | Throughput Impact | Memory Usage |
-|----------|----------|-------------------|--------------|
-| Single conversation | <5ms | <2% | +15MB |
-| Group chat (5 agents) | <15ms | <5% | +45MB |
-| High volume (1000/min) | <2ms avg | <1% | +200MB |
-| Enterprise (10K/hr) | <1ms avg | <0.5% | +500MB |
-
-### Scaling Recommendations
-
-**Small Deployments (< 100 conversations/day):**
-```python
-# Minimal configuration
-enable_governance() # Uses defaults, minimal overhead
-```
-
-**Medium Deployments (100-10K conversations/day):**
-```python
-adapter = GenOpsAutoGenAdapter(
- daily_budget_limit=500.0,
- enable_conversation_tracking=True,
- enable_agent_tracking=True,
- max_concurrent_conversations=50
-)
-```
-
-**Large Deployments (10K+ conversations/day):**
-```python
-adapter = GenOpsAutoGenAdapter(
- daily_budget_limit=5000.0,
- enable_conversation_tracking=True,
- enable_agent_tracking=False, # Reduce overhead
- max_concurrent_conversations=200,
- sampling_rate=0.1 # Sample 10% for detailed tracking
-)
-```
-
-### Performance Optimization
-
-**1. Sampling Configuration:**
-```python
-# Track 10% of conversations in detail, 100% for costs
-adapter = GenOpsAutoGenAdapter(
- conversation_sampling_rate=0.1,
- cost_tracking_rate=1.0 # Always track costs
-)
-```
-
-**2. Async Telemetry Export:**
-```python
-# Minimize application blocking
-from opentelemetry.sdk.trace.export import BatchSpanProcessor
-
-processor = BatchSpanProcessor(
- exporter,
- max_queue_size=2048,
- schedule_delay_millis=5000, # Batch every 5 seconds
- max_export_batch_size=512
-)
-```
-
-**3. Circuit Breaker Pattern:**
-```python
-adapter = GenOpsAutoGenAdapter(
- enable_circuit_breaker=True,
- circuit_breaker_threshold=0.1, # 10% failure rate
- circuit_breaker_timeout=30 # 30 second recovery
-)
-```
-
----
-
-## Troubleshooting
-
-### Top 10 Common Issues
-
-#### 1. **AutoGen Not Installed**
-```
-❌ ImportError: No module named 'autogen'
-```
-**Fix:** `pip install pyautogen` (not `autogen`)
-
-#### 2. **API Key Format Issues**
-```
-❌ Invalid API Key Format: OPENAI_API_KEY
-```
-**Fix:** OpenAI keys start with `sk-`, Anthropic with `sk-ant-`
-
-#### 3. **Wrong AutoGen Package**
-```
-❌ AttributeError: module 'autogen' has no attribute 'AssistantAgent'
-```
-**Fix:** `pip uninstall autogen && pip install pyautogen`
-
-#### 4. **GenOps Import Errors**
-```
-❌ ImportError: No module named 'genops.providers.autogen'
-```
-**Fix:** `pip install genops` or `pip install genops[autogen]`
-
-#### 5. **Virtual Environment Issues**
-```
-❌ Package conflicts or import errors
-```
-**Fix:** Use virtual environment: `python -m venv venv && source venv/bin/activate`
-
-#### 6. **Proxy Configuration Problems**
-```
-❌ Connection timeout errors
-```
-**Fix:** Configure `NO_PROXY` or proxy settings for API endpoints
-
-#### 7. **Budget Limit Exceeded**
-```
-❌ Budget limit would be exceeded
-```
-**Fix:** Increase limit or check usage: `adapter.get_session_summary()`
-
-#### 8. **Docker Permission Issues**
-```
-❌ Docker permission denied for code execution
-```
-**Fix:** Add user to docker group or use `use_docker=False`
-
-#### 9. **Telemetry Export Failures**
-```
-❌ OTLP export failed
-```
-**Fix:** Check observability platform configuration and connectivity
-
-#### 10. **Performance Degradation**
-```
-❌ Slow response times
-```
-**Fix:** Reduce sampling rate or disable detailed tracking for high volume
-
-### Diagnostic Commands
-
-**Complete Setup Validation:**
-```bash
-python -c "
-from genops.providers.autogen import validate_autogen_setup, print_validation_result
-result = validate_autogen_setup(verify_connectivity=True, run_performance_tests=True)
-print_validation_result(result, verbose=True)
-"
-```
-
-**Quick Health Check:**
-```python
-from genops.providers.autogen import quick_validate, get_instrumentation_stats
-
-print("✅ Ready!" if quick_validate() else "❌ Issues")
-print("Stats:", get_instrumentation_stats())
-```
-
-**Performance Profiling:**
-```python
-import time
-from genops.providers.autogen import GenOpsAutoGenAdapter
-
-start = time.time()
-adapter = GenOpsAutoGenAdapter()
-print(f"Adapter creation: {(time.time() - start)*1000:.1f}ms")
-```
-
----
-
-## API Reference
-
-### Core Classes
-
-#### `GenOpsAutoGenAdapter`
-
-Main adapter class for AutoGen governance.
-
-```python
-class GenOpsAutoGenAdapter:
-    def __init__(
-        self,
-        team: str = "default-team",
-        project: str = "autogen-app",
-        environment: str = "development",
-        daily_budget_limit: float = 100.0,
-        governance_policy: str = "advisory",
-        enable_conversation_tracking: bool = True,
-        enable_agent_tracking: bool = True,
-        enable_cost_tracking: bool = True
-    )
-```
-
-**Methods:**
-- `track_conversation(conversation_id, participants)` - Track conversation
-- `track_group_chat(group_chat_id, participants)` - Track group chat
-- `instrument_agent(agent, agent_name)` - Instrument individual agent
-- `get_session_summary()` - Get session analytics
-- `validate_budget(cost)` - Check budget before operation
-
-### Convenience Functions
-
-#### `enable_governance(**kwargs)`
-
-Ultra-simple one-line setup.
-
-```python
-def enable_governance(
- team: str = None, # Auto-detects from env
- project: str = None, # Auto-detects from env
- daily_budget_limit: float = None # Auto-detects from env
-) -> GenOpsAutoGenAdapter
-```
-
-#### `auto_instrument(**kwargs)`
-
-Zero-code instrumentation with full configuration.
-
-```python
-def auto_instrument(
- team: str = "default-team",
- project: str = "autogen-app",
- environment: str = "development",
- daily_budget_limit: float = 100.0,
- governance_policy: str = "advisory"
-) -> GenOpsAutoGenAdapter
-```
-
-### Validation Functions
-
-#### `validate_autogen_setup(**kwargs)`
-
-Comprehensive environment validation.
-
-```python
-def validate_autogen_setup(
- team: str = "default-team",
- project: str = "autogen-validation",
- check_models: List[str] = None,
- verify_connectivity: bool = True,
- run_performance_tests: bool = False,
- api_timeout_seconds: int = 10
-) -> ValidationResult
-```
-
-#### `quick_validate()`
-
-Ultra-fast validation for CI/CD.
-
-```python
-def quick_validate() -> bool
-```
-
-### Cost Analysis
-
-#### `analyze_conversation_costs(adapter, time_period_hours)`
-
-Get cost analysis and optimization recommendations.
-
-```python
-def analyze_conversation_costs(
- adapter: GenOpsAutoGenAdapter,
- time_period_hours: int = 24
-) -> Dict[str, Any]
-```
-
-**Returns:**
-```python
-{
- "total_cost": float,
- "cost_by_provider": Dict[str, float],
- "cost_by_agent": Dict[str, float],
- "recommendations": List[Dict],
- "provider_summaries": Dict
-}
-```
-
-### Data Classes
-
-#### `ValidationResult`
-
-```python
-@dataclass
-class ValidationResult:
-    success: bool
-    overall_score: float  # 0-100
-    timestamp: datetime
-    environment_info: Dict[str, Any]
-    issues: List[ValidationIssue]
-    checks_performed: List[str]
-    recommendations: List[str]
-    performance_metrics: Dict[str, Any]
-```
-
-#### `AutoGenConversationResult`
-
-```python
-@dataclass
-class AutoGenConversationResult:
-    conversation_id: str
-    start_time: datetime
-    end_time: datetime
-    total_cost: Decimal
-    turns_count: int
-    participants: List[str]
-    total_tokens: int
-    code_executions: int
-    function_calls: int
-```
-
----
-
-## Next Steps
-
-🎯 **Ready for Production?**
-1. **Review production deployment patterns** in this guide
-2. **Set up observability integration** with your platform
-3. **Configure monitoring and alerts** for budgets and performance
-4. **Implement custom governance policies** for your use case
-
-**Learn More:**
-- [AutoGen Examples](../../examples/autogen/) - Progressive learning examples
-- [AutoGen Quickstart Guide](../quickstart/autogen-quickstart.md) - 3-minute setup
-- [Performance Benchmarking](../performance-benchmarking.md) - General performance patterns
-- [Security Best Practices](../security-best-practices.md) - Enterprise security guidelines
-- [Contributing Guidelines](../../CONTRIBUTING.md) - How to contribute improvements
-
-🤝 **Get Help:**
-- [GitHub Discussions](https://github.com/KoshiHQ/GenOps-AI/discussions)
-- [GitHub Issues](https://github.com/KoshiHQ/GenOps-AI/issues)
-- [Community Examples](https://github.com/KoshiHQ/GenOps-AI/tree/main/community)
-
----
-
-**Congratulations!** You now have comprehensive AutoGen governance. Your multi-agent conversations are tracked, optimized, and compliant with enterprise standards.
\ No newline at end of file
diff --git a/docs/integrations/bedrock.md b/docs/integrations/bedrock.md
deleted file mode 100644
index d79effd..0000000
--- a/docs/integrations/bedrock.md
+++ /dev/null
@@ -1,1427 +0,0 @@
-# AWS Bedrock Integration Guide
-
-Comprehensive guide for integrating AWS Bedrock with GenOps AI governance and telemetry.
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Installation & Setup](#installation--setup)
-- [Integration Patterns](#integration-patterns)
-- [Multi-Model Support](#multi-model-support)
-- [Cost Intelligence](#cost-intelligence)
-- [Enterprise Governance](#enterprise-governance)
-- [Production Deployment](#production-deployment)
-- [Performance Optimization](#performance-optimization)
-- [Observability Integration](#observability-integration)
-- [Advanced Use Cases](#advanced-use-cases)
-- [Troubleshooting](#troubleshooting)
-- [API Reference](#api-reference)
-
-## Overview
-
-GenOps provides comprehensive AWS Bedrock integration with:
-
-- **Multi-model support**: Claude, Titan, Jurassic, Cohere Command, Llama, and Mistral
-- **Real-time cost tracking**: Token-level precision across all models
-- **Enterprise governance**: Audit trails and attribution to support SOC 2, HIPAA, and PCI DSS compliance efforts
-- **Zero-code instrumentation**: Works with existing boto3 applications unchanged
-- **OpenTelemetry native**: Exports to any OTLP-compatible observability platform
-- **Regional optimization**: Cross-region cost comparison and optimization
-
-### Architecture Overview
-
-```
-Application Code
-       ↓
-GenOps Bedrock Adapter
-       ↓
-AWS Bedrock Service ← Multi-region support
-       ↓
-OpenTelemetry Pipeline ← Rich governance telemetry
-       ↓
-Your Observability Platform ← Datadog, Grafana, etc.
-```
-
-## Installation & Setup
-
-### Quick Installation
-
-```bash
-# Core installation
-pip install genops-ai[bedrock]
-
-# Or install all components
-pip install genops-ai[all]
-```
-
-### AWS Configuration
-
-GenOps requires standard AWS credentials and Bedrock model access:
-
-```bash
-# Configure AWS credentials
-aws configure
-
-# Verify access
-aws sts get-caller-identity
-aws bedrock list-foundation-models --region us-east-1
-```
-
-**Required IAM Permissions:**
-```json
-{
- "Version": "2012-10-17",
- "Statement": [
- {
- "Effect": "Allow",
- "Action": [
- "bedrock:InvokeModel",
- "bedrock:InvokeModelWithResponseStream",
- "bedrock:ListFoundationModels"
- ],
- "Resource": "*"
- }
- ]
-}
-```
-
-### Environment Configuration
-
-```bash
-# Required
-export AWS_REGION="us-east-1"
-export AWS_DEFAULT_REGION="us-east-1"
-
-# OpenTelemetry configuration
-export OTEL_SERVICE_NAME="bedrock-ai-application"
-export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
-
-# GenOps configuration
-export GENOPS_ENVIRONMENT="production"
-export GENOPS_PROJECT="bedrock-ai-project"
-
-# Performance tuning
-export GENOPS_SAMPLING_RATE="1.0" # Full sampling (0.0-1.0)
-export GENOPS_ASYNC_EXPORT="true" # Non-blocking telemetry
-export GENOPS_CIRCUIT_BREAKER="true" # Resilience protection
-```
-
-### Setup Validation
-
-```python
-from genops.providers.bedrock import validate_bedrock_setup, print_validation_result
-
-result = validate_bedrock_setup()
-print_validation_result(result)
-
-if result.success:
-    print("✅ Ready to start using GenOps with Bedrock!")
-else:
-    print("❌ Please resolve the issues above before continuing")
-```
-
-## Integration Patterns
-
-### 1. Zero-Code Auto-Instrumentation
-
-**Automatically instrument existing Bedrock applications with zero code changes:**
-
-```python
-from genops.providers.bedrock import auto_instrument_bedrock
-
-# Enable automatic instrumentation
-auto_instrument_bedrock()
-
-# Your existing boto3 code is now automatically tracked!
-import boto3
-import json
-
-bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
-
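-response = bedrock.invoke_model(
-    modelId='anthropic.claude-3-haiku-20240307-v1:0',
-    body=json.dumps({
-        # Claude 3 models on Bedrock use the Messages API request format
-        "anthropic_version": "bedrock-2023-05-31",
-        "max_tokens": 300,
-        "messages": [
-            {"role": "user", "content": "Analyze this financial report..."}
-        ]
-    })
-)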
-
-# Cost and performance automatically tracked and exported
-```
-
-### 2. Manual Adapter Integration
-
-**Full control over instrumentation with governance attributes:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-adapter = GenOpsBedrockAdapter(
- region_name='us-east-1',
- default_model='anthropic.claude-3-haiku-20240307-v1:0'
-)
-
-result = adapter.text_generation(
- prompt="Analyze market trends in renewable energy",
- model_id="anthropic.claude-3-sonnet-20240229-v1:0",
- max_tokens=500,
- temperature=0.3,
-
- # Governance attributes for cost attribution
- team="research-team",
- project="market-analysis",
- customer_id="energy-client-789",
- environment="production",
- cost_center="Research-Analytics"
-)
-
-print(f"💰 Operation cost: ${result.cost_usd:.6f}")
-print(f"⚡ Latency: {result.latency_ms}ms")
-print(f"🏷️ Attributed to: {result.governance_attributes}")
-```
-
-### 3. Context Manager Pattern
-
-**Multi-operation cost tracking with automatic aggregation:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-from genops.providers.bedrock_cost_aggregator import create_bedrock_cost_context
-
-with create_bedrock_cost_context("financial_analysis_workflow") as cost_context:
-    adapter = GenOpsBedrockAdapter()
-
-    # Step 1: Document classification
-    classification = adapter.text_generation(
-        prompt="Classify this document type...",
-        model_id="anthropic.claude-3-haiku-20240307-v1:0",
-        team="finance-ai"
-    )
-
-    # Step 2: Detailed analysis with more powerful model
-    analysis = adapter.text_generation(
-        prompt="Perform detailed financial analysis...",
-        model_id="anthropic.claude-3-opus-20240229-v1:0",  # Premium model
-        team="finance-ai"
-    )
-
-    # Step 3: Executive summary
-    summary = adapter.text_generation(
-        prompt="Create executive summary...",
-        model_id="amazon.titan-text-express-v1",  # Cost-effective
-        team="finance-ai"
-    )
-
-    # Get unified cost summary across all operations
-    final_summary = cost_context.get_current_summary()
-    print(f"💰 Total workflow cost: ${final_summary.total_cost:.6f}")
-    print(f"Models used: {list(final_summary.unique_models)}")
-    print(f"🏭 Providers: {list(final_summary.unique_providers)}")
-```
-
-### 4. Function Decorator Pattern
-
-**Automatic instrumentation for specific functions:**
-
-```python
-from genops import track_usage
-
-@track_usage(
- operation_name="document_analysis",
- team="ai-platform",
- project="document-intelligence",
- customer_id="enterprise-client"
-)
-def analyze_document(document_content: str) -> dict:
-    from genops.providers.bedrock import GenOpsBedrockAdapter
-
-    adapter = GenOpsBedrockAdapter()
-
-    result = adapter.text_generation(
-        prompt=f"Analyze this document: {document_content}",
-        model_id="anthropic.claude-3-sonnet-20240229-v1:0"
-    )
-
-    return {"analysis": result.content, "cost": result.cost_usd}
-
-# Function calls automatically tracked with governance
-result = analyze_document("QUARTERLY FINANCIAL RESULTS...")
-```
-
-## Multi-Model Support
-
-GenOps supports all major Bedrock foundation models with intelligent cost optimization:
-
-### Supported Models
-
-**Anthropic Claude Models:**
-```python
-models = {
- "anthropic.claude-3-opus-20240229-v1:0": "Premium - highest quality",
- "anthropic.claude-3-sonnet-20240229-v1:0": "Balanced - quality + performance",
- "anthropic.claude-3-haiku-20240307-v1:0": "Fast - cost-effective",
- "anthropic.claude-instant-v1": "Fastest - real-time responses"
-}
-```
-
-**Amazon Titan Models:**
-```python
-models = {
- "amazon.titan-text-express-v1": "Balanced text generation",
- "amazon.titan-text-lite-v1": "Cost-effective option",
- "amazon.titan-embed-text-v1": "Text embeddings"
-}
-```
-
-**AI21 Labs Jurassic Models:**
-```python
-models = {
- "ai21.j2-ultra-v1": "Highest quality",
- "ai21.j2-mid-v1": "Balanced performance",
- "ai21.j2-light-v1": "Fast and cost-effective"
-}
-```
-
-**Cohere Command Models:**
-```python
-models = {
- "cohere.command-text-v14": "Latest command model",
- "cohere.command-light-text-v14": "Lighter variant"
-}
-```
-
-### Intelligent Model Selection
-
-**Cost-aware model selection based on complexity and budget:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-from genops.providers.bedrock_pricing import get_cost_optimization_recommendations
-
-adapter = GenOpsBedrockAdapter()
-
-# Analyze task complexity and recommend optimal model
-task_prompt = "Analyze this complex financial derivative contract..."
-
-recommendations = get_cost_optimization_recommendations(
-    prompt=task_prompt,
-    budget_constraint=0.05,  # $0.05 maximum
-    quality_requirement="high",  # Options: low, medium, high, premium
-    region="us-east-1"
-)
-
-print(f"Recommended model: {recommendations.recommended_model}")
-print(f"Estimated cost: ${recommendations.estimated_cost:.6f}")
-print(f"Expected latency: {recommendations.estimated_latency_ms}ms")
-
-# Use the recommendation
-result = adapter.text_generation(
-    prompt=task_prompt,
-    model_id=recommendations.recommended_model,
-    team="financial-analysis"
-)
-```
-
-### Multi-Model Comparison
-
-**Compare performance and costs across different models:**
-
-```python
-from genops.providers.bedrock_pricing import compare_bedrock_models
-
-models_to_compare = [
-    "anthropic.claude-3-opus-20240229-v1:0",
-    "anthropic.claude-3-sonnet-20240229-v1:0",
-    "anthropic.claude-3-haiku-20240307-v1:0",
-    "amazon.titan-text-express-v1"
-]
-
-comparison = compare_bedrock_models(
-    prompt="Analyze quarterly financial performance",
-    models=models_to_compare,
-    region="us-east-1",
-    expected_output_tokens=300
-)
-
-for model_result in comparison.model_comparisons:
-    print(model_result.model_id)
-    print(f"  Cost: ${model_result.estimated_cost:.6f}")
-    print(f"  Speed: {model_result.estimated_latency_ms}ms")
-    print(f"  Quality Score: {model_result.quality_score}/10")
-    print()
-
-print(f"Best for cost: {comparison.best_for_cost}")
-print(f"Best for speed: {comparison.best_for_speed}")
-print(f"Best for quality: {comparison.best_for_quality}")
-```
-
-## Cost Intelligence
-
-### Real-Time Cost Tracking
-
-**Accurate cost attribution with token-level precision:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-adapter = GenOpsBedrockAdapter()
-
-result = adapter.text_generation(
-    prompt="Long complex analysis prompt...",
-    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
-    max_tokens=1000,
-    team="analytics-team",
-    project="cost-optimization-study"
-)
-
-# Detailed cost breakdown
-print(f"Total cost: ${result.cost_usd:.6f}")
-print(f"Input cost: ${result.input_cost:.6f} ({result.input_tokens} tokens)")
-print(f"Output cost: ${result.output_cost:.6f} ({result.output_tokens} tokens)")
-print(f"Cost per 1K tokens: ${result.cost_per_1k_tokens:.6f}")
-print(f"Region: {result.region}")
-```
-
-### Budget-Constrained Operations
-
-**Operate within budget constraints with automatic optimization:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-from genops.providers.bedrock_workflow import production_workflow_context, ComplianceLevel
-
-BUDGET_LIMIT = 2.00  # $2.00 maximum budget
-
-with production_workflow_context(
-    workflow_name="budget_conscious_analysis",
-    customer_id="startup-client",
-    budget_limit=BUDGET_LIMIT,
-    team="cost-optimization",
-    project="budget-analysis",  # Illustrative value; `project` is required per the API Reference
-    compliance_level=ComplianceLevel.SOC2
-) as (workflow, workflow_id):
-
-    adapter = GenOpsBedrockAdapter()
-
-    # Step 1: Quick classification with budget tracking
-    workflow.record_step("classification")
-    classification = adapter.text_generation(
-        prompt="Classify document type quickly...",
-        model_id="anthropic.claude-3-haiku-20240307-v1:0",  # Cost-effective
-        max_tokens=50
-    )
-
-    # Check budget before expensive operation
-    current_cost = workflow.get_current_cost_summary()
-    if current_cost.total_cost < 1.50:  # Leave buffer
-        # Step 2: Detailed analysis only if budget allows
-        workflow.record_step("detailed_analysis")
-        analysis = adapter.text_generation(
-            prompt="Perform detailed analysis...",
-            model_id="anthropic.claude-3-sonnet-20240229-v1:0",
-            max_tokens=500
-        )
-    else:
-        print("Skipping detailed analysis - budget constraint")
-
-    final_cost = workflow.get_current_cost_summary()
-    print(f"Final cost: ${final_cost.total_cost:.6f}")
-    print(f"Budget utilization: {(final_cost.total_cost / BUDGET_LIMIT) * 100:.1f}%")
-```
-
-### Regional Cost Optimization
-
-**Compare costs across AWS regions and optimize:**
-
-```python
-from genops.providers.bedrock_pricing import calculate_regional_costs
-
-prompt = "Analyze market opportunities in renewable energy sector"
-model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
-
-regional_costs = calculate_regional_costs(
-    prompt=prompt,
-    model_id=model_id,
-    regions=["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1"],
-    expected_output_tokens=400
-)
-
-print("Regional Cost Comparison:")
-for region_cost in regional_costs:
-    print(f"  {region_cost.region}: ${region_cost.total_cost:.6f}")
-    print(f"    Input: ${region_cost.input_cost:.6f}")
-    print(f"    Output: ${region_cost.output_cost:.6f}")
-    print(f"    Availability: {region_cost.model_available}")
-    print()
-
-# Select the extremes explicitly instead of relying on the order of the results
-cheapest = min(regional_costs, key=lambda rc: rc.total_cost)
-most_expensive = max(regional_costs, key=lambda rc: rc.total_cost)
-print(f"Cheapest region: {cheapest.region}")
-print(f"Potential savings: ${most_expensive.total_cost - cheapest.total_cost:.6f}")
-```
-
-## Enterprise Governance
-
-### SOC2 Compliance Workflows
-
-**Enterprise-grade workflows with comprehensive audit trails:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-from genops.providers.bedrock_workflow import production_workflow_context, ComplianceLevel
-
-with production_workflow_context(
-    workflow_name="financial_document_analysis",
-    customer_id="financial_services_client",
-    team="compliance_ai",
-    project="regulatory_reporting",
-    environment="production",
-    compliance_level=ComplianceLevel.SOC2,
-    cost_center="Compliance-Technology",
-    enable_cloudtrail=True,
-    alert_webhooks=["https://alerts.company.com/compliance"]
-) as (workflow, workflow_id):
-
-    adapter = GenOpsBedrockAdapter()
-
-    # Step 1: Document classification with compliance tracking
-    workflow.record_step("document_classification", {
-        "classification_types": ["financial", "pii", "confidential"],
-        "compliance_framework": "SOC2"
-    })
-
-    classification = adapter.text_generation(
-        prompt="Classify this financial document for SOC2 compliance...",
-        model_id="anthropic.claude-3-haiku-20240307-v1:0",
-        temperature=0.1  # Low temperature for consistency
-    )
-
-    # Compliance checkpoint
-    workflow.record_checkpoint("classification_complete", {
-        "pii_detected": False,
-        "financial_data_classified": True,
-        "compliance_level_maintained": "SOC2"
-    })
-
-    # Step 2: Content analysis with audit trail
-    workflow.record_step("content_analysis", {
-        "analysis_type": "financial_risk_assessment",
-        "data_handling": "encrypted_in_transit_and_rest"
-    })
-
-    analysis = adapter.text_generation(
-        prompt="Perform SOC2-compliant analysis...",
-        model_id="anthropic.claude-3-sonnet-20240229-v1:0"
-    )
-
-    # Final compliance validation
-    workflow.record_checkpoint("analysis_complete", {
-        "audit_trail_complete": True,
-        "data_retention_compliant": True,
-        "access_controls_verified": True,
-        "encryption_maintained": True
-    })
-
-    # Performance and cost metrics
-    final_cost = workflow.get_current_cost_summary()
-    workflow.record_performance_metric("total_cost", final_cost.total_cost, "USD")
-    workflow.record_performance_metric("compliance_score", 1.0, "percentage")
-
-    print("SOC2-compliant workflow completed")
-    print(f"Workflow ID: {workflow_id}")
-    print(f"Total cost: ${final_cost.total_cost:.6f}")
-    print("Compliance checkpoints: Passed")
-```
-
-### Multi-Tenant Customer Attribution
-
-**Comprehensive cost attribution and isolation for multi-tenant applications:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-from genops.providers.bedrock_cost_aggregator import create_bedrock_cost_context
-
-# Process multiple customers with unified cost tracking
-customers = [
-    {"id": "enterprise_client_1", "tier": "premium"},
-    {"id": "startup_client_2", "tier": "standard"},
-    {"id": "enterprise_client_3", "tier": "premium"}
-]
-
-customer_costs = {}
-adapter = GenOpsBedrockAdapter()  # One adapter is reused for every customer
-
-for customer in customers:
-    customer_id = customer["id"]
-    tier = customer["tier"]
-
-    # Customer-specific cost context
-    with create_bedrock_cost_context(f"customer_analysis_{customer_id}") as cost_context:
-        # Tier-based model selection
-        if tier == "premium":
-            model = "anthropic.claude-3-opus-20240229-v1:0"  # Best quality
-        else:
-            model = "anthropic.claude-3-haiku-20240307-v1:0"  # Cost-effective
-
-        # Customer analysis
-        result = adapter.text_generation(
-            prompt=f"Analyze requirements for {customer_id}...",
-            model_id=model,
-            customer_id=customer_id,
-            team="customer_success",
-            service_tier=tier
-        )
-
-        # Store customer-specific costs
-        summary = cost_context.get_current_summary()
-        customer_costs[customer_id] = {
-            "total_cost": summary.total_cost,
-            "model_used": model,
-            "tier": tier,
-            "operations": summary.total_operations
-        }
-
-# Generate customer billing report
-print("Customer Cost Attribution Report:")
-total_cost = 0
-for customer_id, cost_data in customer_costs.items():
-    print(f"  {customer_id}")
-    print(f"    Cost: ${cost_data['total_cost']:.6f}")
-    print(f"    Model: {cost_data['model_used']}")
-    print(f"    Tier: {cost_data['tier']}")
-    print()
-    total_cost += cost_data['total_cost']
-
-print(f"Total cost: ${total_cost:.6f}")
-```
-
-## Production Deployment
-
-### Serverless Deployment (AWS Lambda)
-
-**Optimized Lambda deployment with cold-start optimization:**
-
-```python
-import json
-import os
-from genops.providers.bedrock import GenOpsBedrockAdapter, instrument_bedrock
-
-# Enable auto-instrumentation for optimal Lambda performance
-instrument_bedrock()
-
-# Initialize outside handler for connection reuse
-adapter = GenOpsBedrockAdapter(
-    region_name=os.environ.get('AWS_REGION', 'us-east-1'),
-    default_model="anthropic.claude-3-haiku-20240307-v1:0"  # Fast model for Lambda
-)
-
-def lambda_handler(event, context):
-    """Lambda handler optimized for serverless AI processing."""
-
-    try:
-        document_text = event.get('document_text', '')
-        customer_id = event.get('customer_id', 'unknown')
-
-        # Fast document analysis optimized for Lambda
-        result = adapter.text_generation(
-            prompt=f"Quick analysis: {document_text[:500]}",
-            model_id="anthropic.claude-3-haiku-20240307-v1:0",
-            max_tokens=200,
-            temperature=0.2,
-            team="serverless-ai",
-            customer_id=customer_id,
-            environment="lambda"
-        )
-
-        return {
-            'statusCode': 200,
-            'body': json.dumps({
-                'analysis': result.content,
-                'cost': result.cost_usd,
-                'latency': result.latency_ms,
-                'customer_id': customer_id
-            })
-        }
-
-    except Exception as e:
-        return {
-            'statusCode': 500,
-            'body': json.dumps({'error': str(e)})
-        }
-```
-
-**SAM Template for Lambda deployment:**
-
-```yaml
-# template.yaml
-AWSTemplateFormatVersion: '2010-09-09'
-Transform: AWS::Serverless-2016-10-31
-
-Globals:
-  Function:
-    Runtime: python3.9
-    Timeout: 300
-    MemorySize: 1024
-    Environment:
-      Variables:
-        GENOPS_ENVIRONMENT: production
-        GENOPS_PROJECT: bedrock-lambda
-        OTEL_SERVICE_NAME: bedrock-lambda-ai
-
-Resources:
-  BedrockAnalysisFunction:
-    Type: AWS::Serverless::Function
-    Properties:
-      CodeUri: src/
-      Handler: lambda_handler.lambda_handler
-      Policies:
-        - AWSLambdaBasicExecutionRole
-        - Version: '2012-10-17'
-          Statement:
-            - Effect: Allow
-              Action:
-                - bedrock:InvokeModel
-                - bedrock:InvokeModelWithResponseStream
-              Resource: '*'
-      Events:
-        ApiEvent:
-          Type: Api
-          Properties:
-            Path: /analyze
-            Method: post
-```
-
-### Container Deployment (ECS)
-
-**Production-ready container configuration:**
-
-```dockerfile
-# Dockerfile
-FROM python:3.9-slim
-
-# curl is required by the health check below; the slim base image does not ship it
-RUN apt-get update \
-    && apt-get install -y --no-install-recommends curl \
-    && rm -rf /var/lib/apt/lists/*
-
-WORKDIR /app
-
-# Install dependencies
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy application
-COPY . .
-
-# Set GenOps environment
-ENV GENOPS_ENVIRONMENT=production
-ENV GENOPS_PROJECT=bedrock-ecs
-ENV OTEL_SERVICE_NAME=bedrock-ecs-service
-
-# Health check
-HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
-    CMD curl -f http://localhost:8080/health || exit 1
-
-# Run application
-CMD ["python", "app.py"]
-```
-
-**ECS Task Definition:**
-
-```json
-{
-  "family": "genops-bedrock-service",
-  "networkMode": "awsvpc",
-  "requiresCompatibilities": ["FARGATE"],
-  "cpu": "1024",
-  "memory": "2048",
-  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
-  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/genops-bedrock-task-role",
-  "containerDefinitions": [
-    {
-      "name": "genops-bedrock-app",
-      "image": "your-account.dkr.ecr.region.amazonaws.com/genops-bedrock:latest",
-      "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
-      "environment": [
-        {"name": "AWS_REGION", "value": "us-east-1"},
-        {"name": "GENOPS_ENVIRONMENT", "value": "production"},
-        {"name": "OTEL_SERVICE_NAME", "value": "bedrock-ecs"}
-      ],
-      "logConfiguration": {
-        "logDriver": "awslogs",
-        "options": {
-          "awslogs-group": "/ecs/genops-bedrock-service",
-          "awslogs-region": "us-east-1",
-          "awslogs-stream-prefix": "ecs"
-        }
-      },
-      "healthCheck": {
-        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
-        "interval": 30,
-        "timeout": 5,
-        "retries": 3
-      }
-    }
-  ]
-}
-```
-
-## Performance Optimization
-
-### High-Volume Applications
-
-**Configuration for applications processing 10,000+ operations per day:**
-
-```python
-import os
-
-# Performance configuration (set before importing GenOps so it is picked up at import time)
-os.environ.update({
-    "GENOPS_SAMPLING_RATE": "0.1",  # Sample 10% for reduced overhead
-    "GENOPS_ASYNC_EXPORT": "true",  # Non-blocking telemetry
-    "GENOPS_BATCH_SIZE": "50",  # Smaller batches
-    "GENOPS_CIRCUIT_BREAKER": "true",  # Protect against failures
-    "GENOPS_CB_THRESHOLD": "3"  # Quick failure detection
-})
-
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-# High-volume processing with optimized configuration
-adapter = GenOpsBedrockAdapter(
-    enable_sampling=True,
-    async_export=True,
-    circuit_breaker_enabled=True
-)
-
-# Batch processing with a cost-effective model
-batch_size = 10
-documents = ["doc1", "doc2", "doc3"] * 100  # 300 documents
-
-for i in range(0, len(documents), batch_size):
-    batch = documents[i:i + batch_size]
-
-    # Process batch with cost-effective model
-    for doc in batch:
-        result = adapter.text_generation(
-            prompt=f"Process: {doc}",
-            model_id="amazon.titan-text-lite-v1",  # Most cost-effective
-            max_tokens=100,
-            team="batch-processing"
-        )
-
-    # Log progress every 10 batches (100 documents)
-    processed = min(i + batch_size, len(documents))
-    if processed % 100 == 0:
-        print(f"Processed {processed} documents")
-```
-
-### Connection Pooling and Caching
-
-**Optimize for repeated operations:**
-
-```python
-import functools
-
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-# Connection pooling for high-frequency operations
-adapter = GenOpsBedrockAdapter(
-    region_name='us-east-1',
-    connection_pool_size=20,  # Increased pool size
-    enable_connection_reuse=True
-)
-
-# Caching for repeated prompts: lru_cache keys on the (prompt, model_id) arguments,
-# so the prompt text itself is passed and no manual hashing is needed
-@functools.lru_cache(maxsize=1000)
-def cached_classification(prompt: str, model_id: str):
-    """Cache classification results for repeated prompts."""
-    return adapter.text_generation(
-        prompt=prompt,
-        model_id=model_id,
-        max_tokens=50,
-        temperature=0.0  # Deterministic for caching
-    )
-
-# High-frequency processing with caching
-documents = ["doc1", "doc2", "doc1"]  # Sample input; the repeated entry hits the cache
-for document in documents:
-    # Identical prompts return the cached result without a new Bedrock call
-    result = cached_classification(
-        f"Classify: {document}",
-        "anthropic.claude-3-haiku-20240307-v1:0"
-    )
-```
-
-### Circuit Breaker Pattern
-
-**Resilience for production workloads:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-# Circuit breaker configuration
-adapter = GenOpsBedrockAdapter(
-    circuit_breaker_enabled=True,
-    circuit_breaker_threshold=5,  # Open after 5 failures
-    circuit_breaker_timeout=60,  # Reset after 60 seconds
-    circuit_breaker_fallback="cache"  # Fallback strategy
-)
-
-def resilient_analysis(document: str) -> dict:
-    """Analysis with circuit breaker protection."""
-
-    try:
-        result = adapter.text_generation(
-            prompt=f"Analyze: {document}",
-            model_id="anthropic.claude-3-haiku-20240307-v1:0",
-            team="resilient-ai"
-        )
-
-        return {
-            "analysis": result.content,
-            "cost": result.cost_usd,
-            "source": "live"
-        }
-
-    except Exception as e:
-        if "circuit breaker" in str(e).lower():
-            # Fallback to cached result or simplified analysis
-            return {
-                "analysis": "Circuit breaker active - using fallback analysis",
-                "cost": 0.0,
-                "source": "fallback",
-                "error": str(e)
-            }
-        else:
-            raise  # Re-raise non-circuit-breaker errors
-```
-
-## Observability Integration
-
-### AWS CloudWatch Integration
-
-**Native integration with CloudWatch for comprehensive monitoring:**
-
-```python
-import json
-
-import boto3
-
-# GenOps exports metrics under the GenOps/Bedrock namespace; the boto3 calls
-# below create a dashboard and an alarm on top of those metrics.
-cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
-
-# Custom dashboard configuration (CloudWatch dashboard body format)
-dashboard_body = {
-    "widgets": [
-        {
-            "type": "metric",
-            "properties": {
-                "metrics": [
-                    ["GenOps/Bedrock", "OperationCount", "Team", "ai-platform"],
-                    ["GenOps/Bedrock", "TotalCost", "Team", "ai-platform"],
-                    ["GenOps/Bedrock", "AverageLatency", "Team", "ai-platform"],
-                    ["GenOps/Bedrock", "ErrorRate", "Team", "ai-platform"]
-                ],
-                "period": 300,
-                "stat": "Average",
-                "region": "us-east-1",
-                "title": "GenOps Bedrock Metrics"
-            }
-        }
-    ]
-}
-
-cloudwatch.put_dashboard(
-    DashboardName="GenOps-Bedrock-Operations",
-    DashboardBody=json.dumps(dashboard_body)
-)
-
-# Alarm for cost monitoring: fires when the average cost per operation exceeds $0.01.
-# Statistic, Period, and Dimensions mirror the dashboard widget above.
-cloudwatch.put_metric_alarm(
-    AlarmName="GenOps-Bedrock-HighCost",
-    AlarmDescription="Alert when Bedrock costs exceed threshold",
-    MetricName="CostPerOperation",
-    Namespace="GenOps/Bedrock",
-    Dimensions=[{"Name": "Team", "Value": "ai-platform"}],
-    Statistic="Average",
-    Period=300,
-    Threshold=0.01,  # $0.01 per operation
-    ComparisonOperator="GreaterThanThreshold",
-    EvaluationPeriods=2
-)
-```
-
-### Datadog Integration
-
-**Export rich telemetry to Datadog:**
-
-```python
-# Configure Datadog exporter
-import os
-
-os.environ.update({
-    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otlp.datadoghq.com:4317",
-    "OTEL_EXPORTER_OTLP_HEADERS": "dd-api-key=your-datadog-api-key",
-    "OTEL_RESOURCE_ATTRIBUTES": "service.name=bedrock-ai,env=production"
-})
-
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-adapter = GenOpsBedrockAdapter()
-
-# Telemetry automatically flows to Datadog with rich tags
-result = adapter.text_generation(
-    prompt="Customer support inquiry analysis",
-    model_id="anthropic.claude-3-haiku-20240307-v1:0",
-    # Rich tagging for Datadog dashboards
-    team="customer-support",
-    project="ai-assistant",
-    customer_id="enterprise-client",
-    priority="high",
-    department="support"
-)
-
-# Datadog dashboard will show:
-# - Costs by team, project, customer
-# - Latency percentiles by model
-# - Error rates and success metrics
-# - Custom business metrics
-```
-
-### Custom OTLP Integration
-
-**Works with any OTLP-compatible backend:**
-
-```python
-import os
-
-# Configure for your observability platform
-os.environ.update({
-    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://your-collector:4317",
-    "OTEL_SERVICE_NAME": "bedrock-ai-service",
-    "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=production,team.name=ai-platform"
-})
-
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-adapter = GenOpsBedrockAdapter()
-
-# Rich telemetry exported includes:
-# - Span data with AWS context
-# - Custom metrics for cost and performance
-# - Resource attributes with business context
-# - Baggage for cross-service correlation
-
-result = adapter.text_generation(
-    prompt="Multi-service analysis request",
-    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
-    # Business context propagated in telemetry
-    team="analytics",
-    project="cross-team-analysis",
-    trace_id="parent-trace-id",  # Correlation with other services
-    span_context="inherited"
-)
-```
-
-## Advanced Use Cases
-
-### Multi-Region Failover
-
-**Automatic failover across AWS regions:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-# Multi-region configuration
-regions = ["us-east-1", "us-west-2", "eu-west-1"]
-adapters = {
-    region: GenOpsBedrockAdapter(region_name=region)
-    for region in regions
-}
-
-def resilient_analysis(prompt: str, primary_region: str = "us-east-1"):
-    """Analysis with automatic regional failover."""
-    last_error = None
-
-    # Try the primary region first, then the remaining regions in order
-    for region in [primary_region] + [r for r in regions if r != primary_region]:
-        try:
-            adapter = adapters[region]
-
-            result = adapter.text_generation(
-                prompt=prompt,
-                model_id="anthropic.claude-3-haiku-20240307-v1:0",
-                team="resilient-ai",
-                region=region
-            )
-
-            print(f"Success in region: {region}")
-            return result
-
-        except Exception as e:
-            print(f"Failed in region {region}: {e}")
-            last_error = e
-
-    # Chain the last failure so the root cause stays visible in the traceback
-    raise RuntimeError("All regions failed - check service health") from last_error
-
-# Use with automatic failover
-result = resilient_analysis("Analyze customer feedback trends")
-```
-
-### A/B Testing for Model Performance
-
-**Compare model performance in production:**
-
-```python
-import random
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-adapter = GenOpsBedrockAdapter()
-
-def ab_test_models(prompt: str, customer_id: str):
-    """A/B test different models for the same task."""
-
-    # Model variants for testing
-    models = {
-        "variant_a": "anthropic.claude-3-haiku-20240307-v1:0",  # Control
-        "variant_b": "anthropic.claude-3-sonnet-20240229-v1:0",  # Test
-    }
-
-    # Random assignment (50/50 split)
-    variant = "variant_a" if random.random() < 0.5 else "variant_b"
-    model = models[variant]
-
-    result = adapter.text_generation(
-        prompt=prompt,
-        model_id=model,
-        team="ab-testing",
-        customer_id=customer_id,
-        # A/B testing metadata
-        experiment_variant=variant,
-        experiment_name="model_quality_test",
-        experiment_id="exp_001"
-    )
-
-    # Log for analysis
-    print(f"A/B Test - Variant: {variant}, Cost: ${result.cost_usd:.6f}")
-
-    return result, variant
-
-# Usage in production
-# (customer_requests and track_conversion are supplied by your application)
-for customer_request in customer_requests:
-    result, variant = ab_test_models(
-        prompt=customer_request["prompt"],
-        customer_id=customer_request["customer_id"]
-    )
-
-    # Track conversion metrics by variant
-    track_conversion(variant, customer_request["customer_id"], result)
-```
-
-### Dynamic Budget Management
-
-**Real-time budget management with alerts:**
-
-```python
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-class BudgetManager:
-    def __init__(self, daily_budget: float = 100.0):
-        self.daily_budget = daily_budget
-        self.current_spend = 0.0
-        self.alert_thresholds = [0.5, 0.8, 0.9]  # 50%, 80%, 90%
-
-    def check_budget(self, operation_cost: float) -> bool:
-        """Check if operation is within budget."""
-        projected_spend = self.current_spend + operation_cost
-        return projected_spend <= self.daily_budget
-
-    def record_spend(self, amount: float):
-        """Record spending and check for alerts."""
-        self.current_spend += amount
-        utilization = self.current_spend / self.daily_budget
-
-        # Collect crossed thresholds first; removing items from a list while
-        # iterating over it would skip entries
-        crossed = [t for t in self.alert_thresholds if utilization >= t]
-        for threshold in crossed:
-            self.send_budget_alert(threshold, utilization)
-
-        # Keep only thresholds that have not fired yet (prevents duplicate alerts)
-        self.alert_thresholds = [t for t in self.alert_thresholds if utilization < t]
-
-    def send_budget_alert(self, threshold: float, utilization: float):
-        """Send budget alert."""
-        print(f"Budget Alert: {utilization:.1%} of daily budget used (threshold: {threshold:.1%})")
-
-# Usage with budget management
-budget_manager = BudgetManager(daily_budget=50.0)
-adapter = GenOpsBedrockAdapter()
-
-def budget_aware_analysis(prompt: str, max_cost: float = 0.05):
-    """Perform analysis within budget constraints."""
-
-    if not budget_manager.check_budget(max_cost):
-        return {"error": "Budget exceeded - operation denied"}
-
-    # Choose model based on remaining budget
-    remaining_budget = budget_manager.daily_budget - budget_manager.current_spend
-
-    if remaining_budget > 10.0:
-        model = "anthropic.claude-3-opus-20240229-v1:0"  # Premium
-    elif remaining_budget > 1.0:
-        model = "anthropic.claude-3-sonnet-20240229-v1:0"  # Balanced
-    else:
-        model = "anthropic.claude-3-haiku-20240307-v1:0"  # Cost-effective
-
-    result = adapter.text_generation(
-        prompt=prompt,
-        model_id=model,
-        team="budget-conscious-ai"
-    )
-
-    # Record actual spend
-    budget_manager.record_spend(result.cost_usd)
-
-    return {
-        "analysis": result.content,
-        "cost": result.cost_usd,
-        "model_used": model,
-        "budget_remaining": budget_manager.daily_budget - budget_manager.current_spend
-    }
-
-# Budget-aware processing (daily_requests is supplied by your application)
-for request in daily_requests:
-    response = budget_aware_analysis(request["prompt"])
-    if "error" not in response:
-        print(f"Analysis complete - Remaining budget: ${response['budget_remaining']:.2f}")
-    else:
-        print(response['error'])
-```
-
-## Troubleshooting
-
-### Common Issues and Solutions
-
-| Issue | Symptoms | Solution |
-|-------|----------|----------|
-| **AWS Credentials** | `NoCredentialsError`, `CredentialsNotFound` | Run `aws configure` or set environment variables |
-| **Bedrock Access** | `AccessDeniedException`, `UnauthorizedOperation` | Enable model access in AWS Console → Bedrock → Model access |
-| **Region Issues** | `EndpointConnectionError`, `InvalidRegion` | Use supported region like `us-east-1` |
-| **Model Not Available** | `ValidationException`, `ModelNotFound` | Check model availability in your region |
-| **High Costs** | Budget alerts, unexpected bills | Use cost optimization tools and budget limits |
-| **Circuit Breaker** | "Circuit breaker is open" | Wait for cooldown or disable circuit breaker |
-| **No Telemetry** | Missing observability data | Set `OTEL_EXPORTER_OTLP_ENDPOINT` |
-
-### Comprehensive Diagnostics
-
-```python
-from genops.providers.bedrock import validate_bedrock_setup, print_validation_result
-
-# Run complete diagnostic
-result = validate_bedrock_setup(verbose=True)
-print_validation_result(result)
-
-# Check specific issues
-if not result.success:
-    print("\nDetailed Diagnostics:")
-
-    for check_name, check_result in result.detailed_checks.items():
-        if not check_result.passed:
-            print(f"Failed check {check_name}: {check_result.error}")
-            print(f"Fix: {check_result.fix_suggestion}")
-            if check_result.documentation_link:
-                print(f"Docs: {check_result.documentation_link}")
-            print()
-```
-
-### Debug Mode
-
-```python
-import logging
-import os
-
-# Enable debug mode
-os.environ["GENOPS_LOG_LEVEL"] = "DEBUG"
-logging.getLogger("genops").setLevel(logging.DEBUG)
-
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-# Debug information will be logged
-adapter = GenOpsBedrockAdapter(debug_mode=True)
-
-result = adapter.text_generation(
-    prompt="Debug test prompt",
-    model_id="anthropic.claude-3-haiku-20240307-v1:0",
-    team="debugging"
-)
-
-# Debug output includes:
-# - Request/response details
-# - Cost calculations step-by-step
-# - Telemetry export information
-# - AWS SDK interactions
-```
-
-### Performance Profiling
-
-```python
-import time
-from genops.providers.bedrock import GenOpsBedrockAdapter
-
-adapter = GenOpsBedrockAdapter(enable_profiling=True)
-
-# Performance profiling (perf_counter is a monotonic clock suited to measuring intervals)
-start_time = time.perf_counter()
-
-result = adapter.text_generation(
-    prompt="Performance test prompt",
-    model_id="anthropic.claude-3-haiku-20240307-v1:0",
-    team="performance-testing"
-)
-
-end_time = time.perf_counter()
-
-print(f"Total time: {(end_time - start_time) * 1000:.2f}ms")
-print(f"GenOps overhead: {result.genops_overhead_ms:.2f}ms")
-print(f"Model latency: {result.model_latency_ms:.2f}ms")
-print(f"Telemetry export: {result.telemetry_export_ms:.2f}ms")
-```
-
-## API Reference
-
-### Core Classes
-
-#### `GenOpsBedrockAdapter`
-
-**Main adapter class for Bedrock integration:**
-
-```python
-class GenOpsBedrockAdapter:
-    def __init__(
-        self,
-        region_name: str = "us-east-1",
-        default_model: str = "anthropic.claude-3-haiku-20240307-v1:0",
-        enable_sampling: bool = True,
-        sampling_rate: float = 1.0,
-        async_export: bool = True,
-        circuit_breaker_enabled: bool = False,
-        debug_mode: bool = False
-    ):
-        """Initialize GenOps Bedrock adapter."""
-
-    def text_generation(
-        self,
-        prompt: str,
-        model_id: str,
-        max_tokens: int = 256,
-        temperature: float = 0.7,
-        top_p: float = 1.0,
-        team: Optional[str] = None,
-        project: Optional[str] = None,
-        customer_id: Optional[str] = None,
-        environment: Optional[str] = None,
-        cost_center: Optional[str] = None,
-        feature: Optional[str] = None,
-        **kwargs
-    ) -> BedrockResult:
-        """Generate text with comprehensive governance tracking."""
-
-    def is_available(self) -> bool:
-        """Check if Bedrock service is available."""
-
-    def get_supported_models(self, region: Optional[str] = None) -> List[str]:
-        """Get list of supported models in region."""
-```
-
-#### `BedrockResult`
-
-**Result object with cost and governance data:**
-
-```python
-@dataclass
-class BedrockResult:
-    content: str                           # Generated content
-    cost_usd: float                        # Total cost in USD
-    input_cost: float                      # Input token cost
-    output_cost: float                     # Output token cost
-    input_tokens: int                      # Number of input tokens
-    output_tokens: int                     # Number of output tokens
-    latency_ms: float                      # Total latency
-    model_latency_ms: float                # Model-only latency
-    genops_overhead_ms: float              # GenOps processing overhead
-    region: str                            # AWS region used
-    model_id: str                          # Model identifier
-    governance_attributes: Dict[str, str]  # Governance metadata
-    span_id: str                           # OpenTelemetry span ID
-    trace_id: str                          # OpenTelemetry trace ID
-```
-
-### Utility Functions
-
-#### `validate_bedrock_setup()`
-
-```python
-def validate_bedrock_setup(
-    region: str = "us-east-1",
-    verbose: bool = False
-) -> ValidationResult:
-    """Comprehensive setup validation."""
-```
-
-#### `auto_instrument_bedrock()`
-
-```python
-def auto_instrument_bedrock(
-    sampling_rate: float = 1.0,
-    enable_cost_tracking: bool = True,
-    export_to_cloudwatch: bool = True
-) -> None:
-    """Enable zero-code auto-instrumentation."""
-```
-
-#### Cost Intelligence Functions
-
-```python
-def calculate_bedrock_cost(
-    input_tokens: int,
-    output_tokens: int,
-    model_id: str,
-    region: str = "us-east-1"
-) -> CostBreakdown:
-    """Calculate precise costs for Bedrock operation."""
-
-def compare_bedrock_models(
-    prompt: str,
-    models: List[str],
-    region: str = "us-east-1"
-) -> ModelComparison:
-    """Compare costs and performance across models."""
-
-def get_cost_optimization_recommendations(
-    prompt: str,
-    budget_constraint: Optional[float] = None,
-    quality_requirement: str = "medium",
-    region: str = "us-east-1"
-) -> OptimizationRecommendations:
-    """Get intelligent model recommendations."""
-```
-
-### Context Managers
-
-#### `create_bedrock_cost_context()`
-
-```python
-def create_bedrock_cost_context(
-    context_id: str,
-    budget_limit: Optional[float] = None,
-    alert_threshold: float = 0.8,
-    enable_optimization_recommendations: bool = True
-) -> BedrockCostContext:
-    """Create cost tracking context for multi-operation workflows."""
-```
-
-#### `production_workflow_context()`
-
-```python
-def production_workflow_context(
-    workflow_name: str,
-    customer_id: str,
-    team: str,
-    project: str,
-    environment: str = "production",
-    compliance_level: ComplianceLevel = ComplianceLevel.BASIC,
-    cost_center: Optional[str] = None,
-    budget_limit: Optional[float] = None,
-    region: str = "us-east-1",
-    enable_cloudtrail: bool = False,
-    alert_webhooks: Optional[List[str]] = None
-) -> ContextManager[Tuple[WorkflowContext, str]]:
-    """Create enterprise workflow context with full governance.
-
-    Used in a `with` statement; yields a (workflow, workflow_id) tuple.
-    """
-```
-
----
-
-## Next Steps
-
-**You're now ready to use GenOps with AWS Bedrock!**
-
-- **Quick Start**: Try the [5-minute quickstart guide](../bedrock-quickstart.md)
-- **Examples**: Explore comprehensive examples in [`examples/bedrock/`](../../examples/bedrock/)
-- **Community**: Join discussions at [GitHub Discussions](https://github.com/KoshiHQ/GenOps-AI/discussions)
-- **Support**: Report issues at [GitHub Issues](https://github.com/KoshiHQ/GenOps-AI/issues)
-
-**Related Documentation:**
-- [OpenTelemetry Integration](./opentelemetry.md)
-- [Multi-Provider Comparison](./providers-comparison.md)
-- [Enterprise Deployment Guide](./enterprise-deployment.md)
\ No newline at end of file
diff --git a/docs/integrations/cohere.md b/docs/integrations/cohere.md
deleted file mode 100644
index 617b47f..0000000
--- a/docs/integrations/cohere.md
+++ /dev/null
@@ -1,2308 +0,0 @@
-# Cohere Integration Guide
-
-**Complete reference for integrating GenOps AI governance with Cohere's enterprise AI platform**
-
-This guide provides comprehensive documentation for all GenOps Cohere features, from basic cost tracking to advanced multi-operation optimization for enterprise AI workflows.
-
-## Overview
-
-GenOps provides complete governance for Cohere deployments including:
-
-- **Multi-Operation Tracking** - Unified cost tracking across chat, embed, and rerank operations
-- **💰 Token + Operation-Based Pricing** - Accurate costs for Cohere's hybrid pricing model
-- **🎯 Enterprise Optimization** - Cost intelligence for complex AI workflows using multiple operations
-- **🏷️ Team Attribution** - Attribute costs to teams, projects, and customers across all operation types
-- **⚡ Advanced Analytics** - Performance insights and recommendations for multi-operation workflows
-- **🛡️ Budget Controls** - Set limits, alerts, and automatic cost enforcement
-- **OpenTelemetry Integration** - Export to your existing observability stack
-
-## Quick Start
-
-> **New to GenOps + Cohere?** Start with the [5-Minute Quickstart Guide](../cohere-quickstart.md) for an instant working example, then return here for the comprehensive reference.
-
-### Installation
-
-```bash
-# Install Cohere client
-pip install cohere
-
-# Install GenOps
-pip install genops-ai
-
-# Set your API key
-export CO_API_KEY="your-cohere-api-key"
-```
-
-### Basic Setup
-
-```python
-from genops.providers.cohere import instrument_cohere
-
-# Enable comprehensive tracking for all Cohere operations
-adapter = instrument_cohere(
- team="ai-team",
- project="enterprise-ai"
-)
-
-# Your existing Cohere code now includes GenOps tracking
-response = adapter.chat(
- message="What is machine learning?",
- model="command-r-plus-08-2024"
-)
-
-# Multi-operation workflow with unified tracking
-embeddings = adapter.embed(
- texts=["machine learning", "artificial intelligence"],
- model="embed-english-v4.0"
-)
-
-rankings = adapter.rerank(
- query="machine learning",
- documents=["ML is about algorithms", "AI includes ML"],
- model="rerank-english-v3.0"
-)
-
-# All operations automatically tracked with cost attribution
-```
-
-## Core Components
-
-### 1. GenOpsCohereAdapter
-
-The main adapter class for comprehensive Cohere instrumentation with multi-operation cost tracking.
-
-```python
-from genops.providers.cohere import GenOpsCohereAdapter
-
-# Create adapter with advanced configuration
-adapter = GenOpsCohereAdapter(
- api_key="your-api-key", # Optional, uses CO_API_KEY env var
-
- # Cost tracking configuration
- cost_tracking_enabled=True,
- budget_limit=100.0, # $100 budget limit
- cost_alert_threshold=0.8, # 80% threshold for alerts
-
- # Governance defaults
- default_team="ml-engineering",
- default_project="ai-platform",
- default_environment="production",
-
- # Performance settings
- timeout=60.0,
- max_retries=3,
- enable_streaming=True
-)
-```
-
-#### Chat Operations
-
-```python
-# Conversational AI with governance tracking
-response = adapter.chat(
- message="Explain quantum computing",
- model="command-r-plus-08-2024",
- temperature=0.7,
- max_tokens=500,
- team="research-team",
- project="quantum-ai",
- customer_id="enterprise-123"
-)
-
-print(f"Response: {response.content}")
-print(f"Cost: ${response.usage.total_cost:.6f}")
-print(f"Tokens: {response.usage.total_tokens}")
-```
-
-#### Text Generation
-
-```python
-# Direct text generation
-response = adapter.generate(
- prompt="Write a summary of machine learning:",
- model="command-r-08-2024",
- temperature=0.5,
- max_tokens=200,
- stop_sequences=[".", "!", "?"]
-)
-
-print(f"Generated text: {response.content}")
-print(f"Cost breakdown: Input=${response.usage.input_cost:.6f}, Output=${response.usage.output_cost:.6f}")
-```
-
-#### Embedding Operations
-
-```python
-# Text embeddings with cost tracking
-response = adapter.embed(
- texts=[
- "Machine learning is a subset of AI",
- "Deep learning uses neural networks",
- "AI transforms business processes"
- ],
- model="embed-english-v4.0",
- input_type="search_document",
- team="search-team",
- project="semantic-search"
-)
-
-print(f"Embeddings: {len(response.embeddings)} vectors")
-print(f"Embedding cost: ${response.usage.total_cost:.6f}")
-print(f"Cost per embedding: ${response.usage.total_cost / len(response.embeddings):.6f}")
-```
-
-#### Reranking Operations
-
-```python
-# Document reranking for search optimization
-response = adapter.rerank(
- query="machine learning applications",
- documents=[
- "ML helps in medical diagnosis",
- "Machine learning improves search results",
- "AI assists in financial trading",
- "Deep learning powers image recognition"
- ],
- model="rerank-english-v3.0",
- top_n=3,
- team="search-team"
-)
-
-print("Top rankings:")
-for i, ranking in enumerate(response.rankings[:3]):
-    print(f"{i+1}. Score: {ranking['relevance_score']:.3f} - {ranking['document']['text'][:50]}...")
-
-print(f"Rerank cost: ${response.usage.total_cost:.6f}")
-```
-
-### 2. Multi-Operation Workflows
-
-Cohere's strength lies in combining multiple operations. GenOps provides unified cost tracking:
-
-```python
-def intelligent_search_pipeline(query: str, documents: list[str]):
- """Complete search pipeline with unified cost tracking."""
-
- # Step 1: Generate query embeddings
- query_embedding = adapter.embed(
- texts=[query],
- model="embed-english-v4.0",
- input_type="search_query"
- )
-
- # Step 2: Generate document embeddings
- doc_embeddings = adapter.embed(
- texts=documents,
- model="embed-english-v4.0",
- input_type="search_document"
- )
-
- # Step 3: Rerank documents for relevance
- rankings = adapter.rerank(
- query=query,
- documents=documents,
- model="rerank-english-v3.0",
- top_n=5
- )
-
- # Step 4: Generate summary of top results
- top_docs = [r['document']['text'] for r in rankings.rankings[:3]]
- summary = adapter.chat(
- message=f"Summarize these search results for '{query}': {'; '.join(top_docs)}",
- model="command-r-08-2024"
- )
-
- # Unified cost tracking across all operations
- total_cost = (query_embedding.usage.total_cost +
- doc_embeddings.usage.total_cost +
- rankings.usage.total_cost +
- summary.usage.total_cost)
-
- return {
- "summary": summary.content,
- "rankings": rankings.rankings,
- "total_cost": total_cost,
- "cost_breakdown": {
- "query_embedding": query_embedding.usage.total_cost,
- "doc_embeddings": doc_embeddings.usage.total_cost,
- "reranking": rankings.usage.total_cost,
- "summarization": summary.usage.total_cost
- }
- }
-
-# Execute pipeline with full cost attribution
-result = intelligent_search_pipeline(
- "machine learning applications",
- ["AI in healthcare", "ML in finance", "Deep learning for vision"]
-)
-print(f"Pipeline cost: ${result['total_cost']:.6f}")
-```
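The manual summation in the pipeline above can be factored into a small accumulator. The following is a minimal sketch and not part of GenOps: `CostLedger` and its methods are hypothetical names, and the only assumption is that each step yields a numeric cost such as `response.usage.total_cost`. The dollar amounts are placeholders, not real Cohere prices.

```python
from dataclasses import dataclass, field


@dataclass
class CostLedger:
    """Hypothetical helper: accumulates per-step costs for one workflow run."""
    steps: dict[str, float] = field(default_factory=dict)

    def record(self, step: str, cost: float) -> None:
        # Add to any existing entry so a step that runs twice is not overwritten.
        self.steps[step] = self.steps.get(step, 0.0) + cost

    @property
    def total(self) -> float:
        return sum(self.steps.values())

    def breakdown(self) -> dict[str, float]:
        """Return each step's share of the total as a fraction (0.0 to 1.0)."""
        total = self.total
        if total == 0:
            return {step: 0.0 for step in self.steps}
        return {step: cost / total for step, cost in self.steps.items()}


ledger = CostLedger()
ledger.record("query_embedding", 0.0001)  # placeholder costs, not real prices
ledger.record("doc_embeddings", 0.0004)
ledger.record("reranking", 0.002)
ledger.record("summarization", 0.0015)

print(f"Pipeline cost: ${ledger.total:.6f}")
for step, share in ledger.breakdown().items():
    print(f"  {step}: {share:.1%} of total")
```

Keeping the per-step shares alongside the total makes it easier to see which operation dominates a workflow's spend before deciding where to optimize.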
-
-### 3. Cost Optimization and Model Comparison
-
-```python
-# Compare costs across Cohere models
-from genops.providers.cohere_pricing import CohereCalculator
-
-calculator = CohereCalculator()
-
-# Compare generation models
-models = ["command-light", "command-r-08-2024", "command-r-plus-08-2024"]
-comparison = calculator.compare_model_costs(
- models=models,
- operation="CHAT",
- input_tokens=100,
- output_tokens=150
-)
-
-print("Model cost comparison:")
-for model, cost_breakdown in comparison.items():
- print(f"{model}: ${cost_breakdown.total_cost:.6f}")
-
-# Find cheapest model for operation
-cheapest = calculator.get_cheapest_model(
- models=models,
- operation="CHAT",
- input_tokens=100,
- output_tokens=150
-)
-print(f"Cheapest model: {cheapest}")
-```
-
-### 4. Advanced Cost Analytics
-
-```python
-from genops.providers.cohere import GenOpsCohereAdapter
-from genops.providers.cohere_cost_aggregator import CohereCostAggregator, TimeWindow
-
-# Initialize cost aggregator
-aggregator = CohereCostAggregator(
- enable_detailed_tracking=True,
- cost_alert_threshold=50.0, # $50 alert threshold
- budget_period_hours=24
-)
-
-# Use aggregator with adapter
-adapter = GenOpsCohereAdapter(cost_aggregator=aggregator)
-
-# Run various operations...
-# (operations are automatically tracked in aggregator)
-
-# Get comprehensive analytics
-summary = aggregator.get_cost_summary(TimeWindow.DAY)
-print(f"Daily cost: ${summary['overview']['total_cost']:.6f}")
-print(f"Operations: {summary['overview']['total_operations']}")
-
-# Get optimization insights
-insights = aggregator.get_cost_optimization_insights()
-for recommendation in insights['recommendations']:
-    print(f"💡 {recommendation}")
-
-# Export data for analysis
-cost_data = aggregator.export_cost_data(format="dict")
-```
-
-## Advanced Features
-
-### Auto-Instrumentation
-
-For zero-code integration with existing Cohere applications:
-
-```python
-from genops.providers.cohere import auto_instrument
-
-# Enable automatic instrumentation
-success = auto_instrument()
-
-if success:
- # Your existing Cohere code now has GenOps tracking
- import cohere
- client = cohere.ClientV2()
-
- # This is automatically tracked
- response = client.chat(
- model="command-r-plus-08-2024",
- messages=[{"role": "user", "content": "Hello!"}]
- )
-```
-
-### Streaming Responses
-
-```python
-# Streaming chat with cost tracking
-def stream_chat(message: str, model: str = "command-r-08-2024"):
-    response = adapter.chat(
-        message=message,
-        model=model,
-        stream=True,
-        team="realtime-team"
-    )
-
-    # Process streaming response
-    for chunk in response:
-        if chunk.content:
-            print(chunk.content, end="", flush=True)
-
-    # Usage totals are read only after the stream has been fully consumed
-    print(f"\nStreaming cost: ${response.usage.total_cost:.6f}")
-```
-
-### Budget Controls and Alerts
-
-```python
-# Configure budget controls
-adapter = GenOpsCohereAdapter(
- budget_limit=100.0, # $100 daily limit
- cost_alert_threshold=0.8, # Alert at 80% of limit
-
-    # Custom alert handler
-    alert_callback=lambda cost, limit: print(f"⚠️ Cost alert: ${cost:.2f} / ${limit:.2f}")
-)
-
-# Operations will automatically check budget.
-# Note: BudgetExceededException must be imported before use; its module path
-# is not shown in this guide.
-try:
-    response = adapter.chat(
-        message="Long conversation...",
-        model="command-r-plus-08-2024"
-    )
-except BudgetExceededException as e:
-    print(f"Operation blocked: {e}")
-```
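The alert-then-block behaviour described by `budget_limit` and `cost_alert_threshold` can be illustrated without the adapter. This is a minimal sketch under stated assumptions, not the GenOps implementation: `BudgetGuard` and `BudgetExceeded` are hypothetical names, and it assumes the alert fires once when spend first reaches the threshold fraction, while any charge that would exceed the limit is rejected.

```python
class BudgetExceeded(Exception):
    """Hypothetical error raised when a charge would exceed the limit."""


class BudgetGuard:
    """Hypothetical guard: alert at a fraction of the limit, block past it."""

    def __init__(self, limit: float, alert_threshold: float = 0.8, on_alert=None):
        self.limit = limit
        self.alert_threshold = alert_threshold
        self.on_alert = on_alert
        self.spent = 0.0
        self._alerted = False

    def charge(self, cost: float) -> None:
        # Reject the charge before recording it, so spend never passes the limit.
        if self.spent + cost > self.limit:
            raise BudgetExceeded(
                f"${self.spent + cost:.2f} would exceed ${self.limit:.2f}"
            )
        self.spent += cost
        # Fire the alert only once, when the threshold is first crossed.
        if not self._alerted and self.spent >= self.alert_threshold * self.limit:
            self._alerted = True
            if self.on_alert:
                self.on_alert(self.spent, self.limit)


alerts = []
guard = BudgetGuard(limit=100.0, on_alert=lambda cost, limit: alerts.append((cost, limit)))
guard.charge(50.0)
guard.charge(35.0)  # crosses 80% of the limit, so the alert fires
try:
    guard.charge(20.0)  # 105 > 100, so this charge is rejected
    blocked = False
except BudgetExceeded:
    blocked = True

print(alerts, blocked, guard.spent)
```

Checking the limit before recording the charge is a design choice: it keeps recorded spend at or below the limit, at the price of rejecting an operation whose cost is only known as an estimate.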
-
-### Enterprise Integration Patterns
-
-```python
-# Enterprise deployment with comprehensive governance
-from genops.providers.cohere import GenOpsCohereAdapter
-from genops.providers.cohere_cost_aggregator import CohereCostAggregator
-
-class EnterpriseCohere:
-    def __init__(self):
-        self.adapters = {}
-        self.aggregator = CohereCostAggregator(
-            cost_alert_threshold=1000.0,  # $1000 alert threshold
-            enable_detailed_tracking=True
-        )
-
-    def get_team_adapter(self, team: str, project: str):
-        """Get team-specific adapter with governance."""
-        key = f"{team}-{project}"
-        if key not in self.adapters:
-            self.adapters[key] = GenOpsCohereAdapter(
-                default_team=team,
-                default_project=project,
-                cost_aggregator=self.aggregator,
-                budget_limit=100.0  # Per-team budget
-            )
-        return self.adapters[key]
-
-    def get_usage_report(self) -> dict:
-        """Generate enterprise usage report."""
-        return {
-            "summary": self.aggregator.get_cost_summary(),
-            "by_team": self.aggregator.get_operation_summary(),
-            "optimization": self.aggregator.get_cost_optimization_insights()
-        }
-
-# Usage
-enterprise = EnterpriseCohere()
-
-# Team-specific usage
-ml_adapter = enterprise.get_team_adapter("ml-team", "recommendation-engine")
-search_adapter = enterprise.get_team_adapter("search-team", "semantic-search")
-
-# Generate reports
-report = enterprise.get_usage_report()
-```
-
-## Cost Optimization Strategies
-
-### 1. Model Selection Optimization
-
-```python
-# Intelligent model selection based on requirements
-from typing import Optional
-
-from genops.providers.cohere_pricing import CohereCalculator
-
-def select_optimal_model(
-    use_case: str,
-    max_cost_per_operation: float,
-    quality_priority: str = "balanced"
-) -> Optional[str]:
-    """Select optimal Cohere model, or None if no candidate fits the cost limit."""
-
-    calculator = CohereCalculator()
-
-    if use_case == "chat":
-        candidates = ["command-light", "command-r-08-2024", "command-r-plus-08-2024"]
-    elif use_case == "embedding":
-        candidates = ["embed-english-v3.0", "embed-english-v4.0"]
-    elif use_case == "rerank":
-        candidates = ["rerank-english-v3.0", "rerank-multilingual-v3.0"]
-    else:
-        raise ValueError(f"Unsupported use case: {use_case!r}")
-
-    # Filter by cost constraints
-    affordable_models = []
-    for model in candidates:
-        cost = calculator.estimate_cost(
-            model=model,
-            operation=use_case.upper(),
-            input_text_length=1000,  # Estimate
-            expected_output_length=500
-        )
-
-        if cost <= max_cost_per_operation:
-            affordable_models.append((model, cost))
-
-    if not affordable_models:
-        return None
-
-    # Select based on quality priority
-    if quality_priority == "cost":
-        return min(affordable_models, key=lambda x: x[1])[0]
-    elif quality_priority == "quality":
-        return max(affordable_models, key=lambda x: x[1])[0]  # Assume higher cost = higher quality
-    else:  # balanced: take the median-priced candidate
-        return sorted(affordable_models, key=lambda x: x[1])[len(affordable_models)//2][0]
-
-# Usage
-optimal_model = select_optimal_model(
- use_case="chat",
- max_cost_per_operation=0.001, # $0.001 limit
- quality_priority="balanced"
-)
-print(f"Optimal model: {optimal_model}")
-```
-
-### 2. Batching and Caching Strategies
-
-```python
-# Efficient embedding with batching
-import hashlib
-
-# Module-level cache so entries survive across calls.
-# In production, use Redis or similar.
-_embedding_cache = {}
-
-def batch_embed_with_caching(
-    texts: list[str],
-    batch_size: int = 96,  # Cohere's batch limit
-    cache_key_prefix: str = ""
-) -> list[list[float]]:
-    """Batch embedding with caching for cost optimization."""
-
-    # Use a stable digest: built-in hash() is salted per process for strings
-    keys = [
-        f"{cache_key_prefix}:{hashlib.sha256(text.encode()).hexdigest()}"
-        for text in texts
-    ]
-
-    # Check cache first; collect uncached texts, de-duplicated by key
-    to_embed = {}
-    for text, key in zip(texts, keys):
-        if key not in _embedding_cache:
-            to_embed[key] = text
-
-    # Batch embed uncached texts
-    pending = list(to_embed.items())
-    for i in range(0, len(pending), batch_size):
-        batch = pending[i:i + batch_size]
-
-        response = adapter.embed(
-            texts=[text for _, text in batch],
-            model="embed-english-v4.0",
-            team="optimization-team"
-        )
-
-        # Cache results
-        for (key, _), embedding in zip(batch, response.embeddings):
-            _embedding_cache[key] = embedding
-
-    # Return embeddings in the same order as the input texts
-    return [_embedding_cache[key] for key in keys]
-```
-
-### 3. Multi-Operation Workflow Optimization
-
-```python
-# Optimize complex workflows
-def optimize_search_workflow(
-    query: str,
-    documents: list[str],
-    quality_threshold: float = 0.8
-) -> dict:
-    """Optimized search with adaptive quality/cost trade-offs."""
-
-    # Step 1: Use fast reranking for initial filtering
-    initial_ranking = adapter.rerank(
-        query=query,
-        documents=documents,
-        model="rerank-english-v3.0",
-        # Reduce search space, but always request at least one result
-        top_n=max(1, min(10, len(documents) // 2))
-    )
-
-    # Step 2: Only embed high-quality candidates
-    high_quality_docs = [
-        r['document']['text'] for r in initial_ranking.rankings
-        if r['relevance_score'] > quality_threshold
-    ]
-
-    if high_quality_docs:
-        # Step 3: Generate embeddings for detailed analysis
-        embeddings = adapter.embed(
-            texts=high_quality_docs,
-            model="embed-english-v4.0"
-        )
-
-        # Step 4: Generate summary only for top candidates
-        summary = adapter.chat(
-            message=f"Summarize: {'; '.join(high_quality_docs[:3])}",
-            model="command-light"  # Use cost-effective model
-        )
-
-        return {
-            "summary": summary.content,
-            "candidates": high_quality_docs,
-            "optimization": "adaptive_quality_filtering"
-        }
-    else:
-        # Fallback: direct summarization without embeddings
-        summary = adapter.chat(
-            message=f"Summarize search results for '{query}': {'; '.join(documents[:5])}",
-            model="command-light"
-        )
-
-        return {
-            "summary": summary.content,
-            "candidates": documents[:5],
-            "optimization": "cost_optimized_fallback"
-        }
-```
-
-## Validation and Diagnostics
-
-### Setup Validation
-
-```python
-from genops.providers.cohere_validation import validate_setup, print_validation_result
-
-# Comprehensive setup validation
-result = validate_setup(
- api_key="your-api-key", # Optional, uses env var
- include_performance_tests=True
-)
-
-# Print detailed results
-print_validation_result(result, detailed=True)
-
-# Check specific aspects
-if result.has_critical_issues:
-    print("❌ Critical issues found - setup incomplete")
-    for issue in result.issues:
-        if issue.level.value == "critical":
-            print(f"  {issue.title}: {issue.fix_suggestion}")
-
-elif result.success:
-    print("✅ Setup validated - ready for production")
-
-    # Show performance metrics
-    if result.performance_metrics:
-        print("Performance metrics:")
-        for metric, value in result.performance_metrics.items():
-            print(f"  {metric}: {value:.1f}ms")
-```
-
-### Quick Health Check
-
-```python
-from genops.providers.cohere_validation import (
-    print_validation_result,
-    quick_validate,
-    validate_setup,
-)
-
-# Simple success/failure check
-if quick_validate():
-    print("✅ Cohere integration ready")
-else:
-    print("❌ Setup issues detected")
-    # Run full validation for details
-    result = validate_setup()
-    print_validation_result(result)
-```
-
-## Monitoring and Observability
-
-### OpenTelemetry Integration
-
-GenOps Cohere automatically exports telemetry to your existing observability stack:
-
-```python
-# Configure OpenTelemetry (standard setup)
-from opentelemetry import trace
-from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
-from opentelemetry.sdk.trace import TracerProvider
-from opentelemetry.sdk.trace.export import BatchSpanProcessor
-
-from genops.providers.cohere import GenOpsCohereAdapter
-
-# Set up exporter (example: an OTLP/HTTP endpoint such as a Jaeger collector)
-otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
-tracer_provider = TracerProvider()
-tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
-
-# Register the provider globally so instrumentation can pick it up
-trace.set_tracer_provider(tracer_provider)
-
-# GenOps will automatically use configured tracer
-adapter = GenOpsCohereAdapter()
-response = adapter.chat(message="Hello")  # Automatically traced
-```
-
-### Custom Metrics Export
-
-```python
-# Export metrics to custom systems
-from datadog import statsd
-from prometheus_client import Gauge
-
-# Register Prometheus gauges once at import time; creating them inside the
-# export function would raise a duplicate-metric error on the second call
-COST_GAUGE = Gauge('genops_cohere_total_cost', 'Total Cohere cost')
-OPS_GAUGE = Gauge('genops_cohere_operations', 'Total Cohere operations')
-
-def export_to_datadog(aggregator: CohereCostAggregator):
-    """Export cost metrics to Datadog."""
-    summary = aggregator.get_cost_summary()
-
-    # Example Datadog integration
-    statsd.gauge('genops.cohere.total_cost', summary['overview']['total_cost'])
-    statsd.gauge('genops.cohere.operations', summary['overview']['total_operations'])
-    statsd.gauge('genops.cohere.avg_cost_per_op', summary['overview']['avg_cost_per_operation'])
-
-def export_to_prometheus(aggregator: CohereCostAggregator):
-    """Export metrics to Prometheus."""
-    summary = aggregator.get_cost_summary()
-    COST_GAUGE.set(summary['overview']['total_cost'])
-    OPS_GAUGE.set(summary['overview']['total_operations'])
-```
-
-## Security Best Practices
-
-### API Key Management
-
-```python
-# Secure API key handling
-import os
-from genops.providers.cohere import GenOpsCohereAdapter
-
-# Use environment variables (recommended)
-adapter = GenOpsCohereAdapter() # Automatically uses CO_API_KEY
-
-# Or use secure key management
-from your_key_manager import get_secret
-
-api_key = get_secret("cohere-api-key")
-adapter = GenOpsCohereAdapter(api_key=api_key)
-```
-
-### Data Privacy Controls
-
-```python
-# Configure privacy controls
-adapter = GenOpsCohereAdapter(
- # Disable request/response logging in production
- debug=False,
-
- # Enable request sanitization
- sanitize_requests=True,
-
- # Configure data retention
- telemetry_retention_days=30
-)
-```
-
-### Access Controls
-
-```python
-# Team-based access controls
-class SecureCohere:
-    def __init__(self):
-        self.team_budgets = {
-            "ml-team": 500.0,
-            "search-team": 200.0,
-            "research-team": 1000.0
-        }
-
-    def get_adapter(self, team: str, user: str) -> GenOpsCohereAdapter:
-        if team not in self.team_budgets:
-            raise PermissionError(f"Team {team} not authorized")
-
-        return GenOpsCohereAdapter(
-            default_team=team,
-            budget_limit=self.team_budgets[team],
-            user_id=user  # For audit trails
-        )
-
-secure_cohere = SecureCohere()
-adapter = secure_cohere.get_adapter("ml-team", "alice")
-```
-
-## Performance Optimization and Benchmarks
-
-### Performance Benchmarks
-
-GenOps adds minimal overhead to Cohere operations while providing comprehensive tracking. Here are typical performance characteristics:
-
-#### Operation Latency Overhead
-- **Chat Operations**: < 5ms additional latency
-- **Embed Operations**: < 3ms additional latency
-- **Rerank Operations**: < 2ms additional latency
-- **Telemetry Export**: Async, 0ms blocking time
-
-#### Throughput Benchmarks
-Based on testing with production workloads:
-
-```
-Operation Type | Baseline RPS | With GenOps | Overhead
-------------------|--------------|-------------|----------
-Chat (small) | 100 RPS | 98 RPS | 2%
-Chat (large) | 50 RPS | 49 RPS | 2%
-Embed (batch=10) | 200 RPS | 195 RPS | 2.5%
-Embed (batch=50) | 80 RPS | 78 RPS | 2.5%
-Rerank (10 docs) | 150 RPS | 147 RPS | 2%
-Rerank (100 docs) | 30 RPS | 29 RPS | 3%
-```
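The overhead column follows directly from the two throughput columns. As a short worked check (the helper name `overhead_pct` is illustrative only), overhead is the relative drop in requests per second; the table's 3% for Rerank (100 docs) is 3.3% before rounding.

```python
# (operation, baseline RPS, RPS with GenOps) copied from the table above
rows = [
    ("Chat (small)", 100, 98),
    ("Chat (large)", 50, 49),
    ("Embed (batch=10)", 200, 195),
    ("Embed (batch=50)", 80, 78),
    ("Rerank (10 docs)", 150, 147),
    ("Rerank (100 docs)", 30, 29),
]


def overhead_pct(baseline_rps: float, instrumented_rps: float) -> float:
    """Relative throughput loss, as a percentage of the baseline."""
    return (baseline_rps - instrumented_rps) / baseline_rps * 100


for name, baseline, with_genops in rows:
    print(f"{name:<18} {overhead_pct(baseline, with_genops):.1f}%")
```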
-
-#### Memory Usage
-- **Base overhead**: ~5MB per adapter instance
-- **Per operation**: ~500 bytes (detailed tracking enabled)
-- **Per operation**: ~100 bytes (detailed tracking disabled)
-
-### High-Volume Optimization
-
-For applications processing >1000 operations/minute:
-
-```python
-# Optimized adapter configuration for high volume
-adapter = GenOpsCohereAdapter(
- # Reduce telemetry overhead
- detailed_tracking=False,
- sampling_rate=0.1, # Sample 10% of operations
-
- # Optimize batch processing
- batch_telemetry=True,
- telemetry_batch_size=100,
-
- # Connection pooling
- max_connections=20,
- connection_pool_size=10,
-
- # Async telemetry export
- async_telemetry=True,
- telemetry_buffer_size=1000
-)
-```
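How `sampling_rate` is applied internally is not described in this guide. One common approach, shown here purely as an assumption, is an independent random draw per operation; `should_sample` is a hypothetical helper, not a GenOps function. The sketch also shows why sampled counts must be scaled by `1 / sampling_rate` to estimate totals.

```python
import random


def should_sample(sampling_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sampling_rate` of calls (hypothetical helper)."""
    # rng.random() is uniform on [0.0, 1.0), so a rate of 1.0 always samples
    # and a rate of 0.0 never does.
    return rng.random() < sampling_rate


rng = random.Random(42)  # seeded only to make the demonstration repeatable
sampling_rate = 0.1
total_operations = 10_000

sampled = sum(should_sample(sampling_rate, rng) for _ in range(total_operations))

# Scale the sampled count back up to estimate the true operation count.
estimated_total = sampled / sampling_rate
print(f"Sampled {sampled} of {total_operations}; estimated total {estimated_total:.0f}")
```

With sampling enabled, per-operation telemetry becomes a statistical estimate, so exact cost accounting would need to come from a source that is not sampled.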
-
-### Scaling Guidelines
-
-#### Single Instance Limits
-- **Maximum concurrent operations**: 50
-- **Maximum operations/second**: 100
-- **Memory usage at scale**: ~50MB for 1000 ops/minute
-
-#### Multi-Instance Deployment
-For >100 RPS, use multiple adapter instances:
-
-```python
-# Load balancing across multiple adapters
-import random
-from concurrent.futures import ThreadPoolExecutor
-
-class CohereAdapterPool:
-    def __init__(self, pool_size: int = 5):
-        self.adapters = [
-            GenOpsCohereAdapter(
-                cost_tracking_enabled=True,
-                sampling_rate=1.0 / pool_size  # Distribute sampling
-            ) for _ in range(pool_size)
-        ]
-        self.executor = ThreadPoolExecutor(max_workers=pool_size * 2)
-
-    def execute_operation(self, operation_func, **kwargs):
-        """Execute operation on random adapter from pool."""
-        adapter = random.choice(self.adapters)
-        return self.executor.submit(operation_func, adapter, **kwargs)
-
-# Usage
-pool = CohereAdapterPool(pool_size=10)
-future = pool.execute_operation(
- lambda adapter, **kw: adapter.chat(**kw),
- message="Hello",
- model="command-light"
-)
-result = future.result()
-```
-
-### Performance Monitoring
-
-Track GenOps performance impact in production:
-
-```python
-import time
-from genops.providers.cohere import GenOpsCohereAdapter
-
-class PerformanceMonitor:
-    def __init__(self, adapter: GenOpsCohereAdapter):
-        self.adapter = adapter
-        self.metrics = {
-            'total_operations': 0,
-            'total_latency': 0.0,
-            'genops_overhead': 0.0
-        }
-
-    def monitored_operation(self, operation_func, **kwargs):
-        """Execute operation with performance monitoring."""
-        # Time the full call (Cohere request plus GenOps instrumentation)
-        start = time.perf_counter()
-
-        # Execute with GenOps
-        result = operation_func(**kwargs)
-
-        genops_end = time.perf_counter()
-        genops_latency = genops_end - start
-
-        # Track metrics
-        self.metrics['total_operations'] += 1
-        self.metrics['total_latency'] += genops_latency
-
-        # Rough overhead estimate, not a measurement:
-        # assume 5% of observed latency, capped at 10ms
-        estimated_overhead = min(genops_latency * 0.05, 0.010)
-        self.metrics['genops_overhead'] += estimated_overhead
-
-        return result
-
-    def get_performance_summary(self) -> dict:
-        """Get performance impact summary."""
-        if self.metrics['total_operations'] == 0:
-            return {}
-
-        avg_latency = self.metrics['total_latency'] / self.metrics['total_operations']
-        avg_overhead = self.metrics['genops_overhead'] / self.metrics['total_operations']
-        overhead_percentage = (avg_overhead / avg_latency) * 100 if avg_latency > 0 else 0.0
-
-        return {
-            'total_operations': self.metrics['total_operations'],
-            'average_latency_ms': avg_latency * 1000,
-            'average_overhead_ms': avg_overhead * 1000,
-            'overhead_percentage': overhead_percentage,
-            # Assumes operations ran sequentially
-            'operations_per_second': self.metrics['total_operations'] / self.metrics['total_latency'] if self.metrics['total_latency'] > 0 else 0
-        }
-
-# Usage
-monitor = PerformanceMonitor(adapter)
-
-# Monitor operations
-result = monitor.monitored_operation(
- lambda **kw: adapter.chat(**kw),
- message="Performance test",
- model="command-light"
-)
-
-# Get performance report
-summary = monitor.get_performance_summary()
-print(f"GenOps overhead: {summary['overhead_percentage']:.1f}%")
-```
-
-### Optimization Strategies
-
-#### 1. Model Selection for Performance
-
-```python
-# Performance-optimized model selection
-PERFORMANCE_OPTIMIZED_MODELS = {
- 'chat': {
- 'fastest': 'command-light', # ~200ms avg latency
- 'balanced': 'command-r-08-2024', # ~500ms avg latency
- 'quality': 'command-r-plus-08-2024' # ~800ms avg latency
- },
- 'embed': {
- 'fastest': 'embed-english-v3.0', # ~150ms for 10 texts
- 'balanced': 'embed-english-v4.0', # ~200ms for 10 texts
- },
- 'rerank': {
- 'fastest': 'rerank-english-v3.0', # ~100ms for 10 docs
- 'multilingual': 'rerank-multilingual-v3.0' # ~150ms for 10 docs
- }
-}
-
-def select_performance_optimized_model(operation: str, priority: str = 'balanced'):
- """Select model optimized for performance requirements."""
- return PERFORMANCE_OPTIMIZED_MODELS.get(operation, {}).get(priority)
-```
-
-#### 2. Batch Processing Optimization
-
-```python
-# Optimize embedding operations with batching
-# (a plain function: the body makes blocking calls and awaits nothing)
-def optimized_embed_workflow(texts: list[str], adapter: GenOpsCohereAdapter):
-    """Process large text collections efficiently."""
-
-    # Cohere's optimal batch size for embeddings
-    OPTIMAL_BATCH_SIZE = 96
-
-    results = []
-    for i in range(0, len(texts), OPTIMAL_BATCH_SIZE):
-        batch = texts[i:i + OPTIMAL_BATCH_SIZE]
-
-        # Process batch with minimal overhead
-        result = adapter.embed(
-            texts=batch,
-            model="embed-english-v4.0",
-            # Reduce tracking overhead for bulk operations
-            detailed_tracking=False
-        )
-
-        results.extend(result.embeddings)
-
-    return results
-```
-
-#### 3. Caching Strategies
-
-```python
-# Implement intelligent caching for repeated operations
-import hashlib
-
-class CohereCache:
-    def __init__(self, adapter: GenOpsCohereAdapter, cache_size: int = 1000):
-        self.adapter = adapter
-        self.cache = {}
-        self.cache_size = cache_size
-
-    def _generate_cache_key(self, operation: str, **kwargs) -> str:
-        """Generate cache key for operation."""
-        # Create deterministic key from operation and parameters
-        key_data = f"{operation}:{sorted(kwargs.items())}"
-        return hashlib.md5(key_data.encode()).hexdigest()
-
-    def cached_embed(self, texts: list[str], **kwargs):
-        """Embed with intelligent caching."""
-        cache_key = self._generate_cache_key('embed', texts=tuple(texts), **kwargs)
-
-        if cache_key in self.cache:
-            # Return cached result (but still track for cost)
-            self.adapter._track_cached_operation('embed', kwargs.get('model', ''))
-            return self.cache[cache_key]
-
-        # Execute operation and cache result
-        result = self.adapter.embed(texts=texts, **kwargs)
-
-        # Once the cache is full, new results are simply not stored (no eviction)
-        if result.success and len(self.cache) < self.cache_size:
-            self.cache[cache_key] = result
-
-        return result
-```
-
-### Troubleshooting Performance Issues
-
-#### Common Performance Problems
-
-**1. High Latency**
-```python
-# Diagnose high latency issues
-def diagnose_latency_issues(adapter):
-    import time
-
-    # Test individual operations
-    operations = [
-        ('chat', lambda: adapter.chat(message="test", model="command-light")),
-        ('embed', lambda: adapter.embed(texts=["test"], model="embed-english-v4.0")),
-        ('rerank', lambda: adapter.rerank(query="test", documents=["doc"], model="rerank-english-v3.0"))
-    ]
-
-    for op_name, op_func in operations:
-        times = []
-        for _ in range(5):  # Test 5 times
-            start = time.perf_counter()
-            result = op_func()
-            end = time.perf_counter()
-            if result.success:
-                times.append(end - start)
-
-        if times:
-            avg_time = sum(times) / len(times)
-            print(f"{op_name}: {avg_time*1000:.1f}ms avg")
-
-            if avg_time > 2.0:  # > 2 seconds is concerning
-                print(f"    ⚠️ High latency detected for {op_name}")
-```
-
-**2. Memory Usage**
-```python
-# Monitor memory usage
-import psutil
-import os
-
-def monitor_memory_usage(adapter, num_operations=100):
-    """Monitor memory usage during operations."""
-
-    process = psutil.Process(os.getpid())
-    initial_memory = process.memory_info().rss / 1024 / 1024  # MB
-
-    # Execute operations
-    for i in range(num_operations):
-        adapter.chat(message=f"test {i}", model="command-light")
-
-    final_memory = process.memory_info().rss / 1024 / 1024  # MB
-
-    memory_increase = final_memory - initial_memory
-    memory_per_operation = memory_increase / num_operations
-
-    print("Memory usage:")
-    print(f"  Initial: {initial_memory:.1f} MB")
-    print(f"  Final: {final_memory:.1f} MB")
-    print(f"  Increase: {memory_increase:.1f} MB")
-    print(f"  Per operation: {memory_per_operation*1024:.1f} KB")
-
-    if memory_per_operation > 0.5:  # > 0.5 MB (512 KB) per operation
-        print("  ⚠️ High memory usage per operation")
-```
-
-## Production Deployment
-
-### Kubernetes Deployment
-
-```yaml
-# cohere-genops-deployment.yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: cohere-genops-service
-spec:
-  replicas: 3
-  selector:
-    matchLabels:
-      app: cohere-genops
-  template:
-    metadata:
-      labels:
-        app: cohere-genops
-    spec:
-      containers:
-      - name: cohere-service
-        image: your-registry/cohere-genops:latest
-        env:
-        - name: CO_API_KEY
-          valueFrom:
-            secretKeyRef:
-              name: cohere-secrets
-              key: api-key
-        - name: GENOPS_TELEMETRY_ENABLED
-          value: "true"
-        - name: OTEL_EXPORTER_OTLP_ENDPOINT
-          value: "http://jaeger-collector:4318"
-        resources:
-          requests:
-            memory: "256Mi"
-            cpu: "250m"
-          limits:
-            memory: "512Mi"
-            cpu: "500m"
----
-apiVersion: v1
-kind: Secret
-metadata:
-  name: cohere-secrets
-type: Opaque
-data:
-  api-key: