perf: Implement response caching for frequently asked questions #18

@doughayden

Description

Performance Optimization

Implement intelligent caching for frequently asked questions to improve response times and reduce API costs.

Level of Effort: 🟡 Medium (3-4 days)

  • Cache implementation: 2 days for caching logic and storage
  • Cache strategy: 1 day for intelligent cache management
  • Testing and tuning: 1 day for performance validation

Current Performance Characteristics

Response time breakdown:

  • Discovery Engine API call: 2-5 seconds (majority of time)
  • Response processing: 100-500ms
  • BigQuery logging: 50-200ms
  • Total: 2.5-6 seconds per query

Caching opportunities:

  • Identical questions from different users
  • Similar questions with equivalent answers
  • Popular topics that get asked repeatedly
  • FAQ-style questions with stable answers

Proposed Caching Strategy

1. Multi-Layer Caching

Layer 1: Exact Match Cache

```python
# src/answer_app/cache_manager.py
from typing import Optional


class ResponseCache:
    def __init__(self, redis_client=None, ttl_seconds: int = 3600):
        self.redis_client = redis_client
        self.memory_cache: dict = {}
        self.ttl_seconds = ttl_seconds

    async def get_cached_response(self, query_hash: str) -> Optional["CachedResponse"]:
        """Return the cached response for an exact query match, or None."""

    async def cache_response(self, query: str, response: dict, metadata: dict) -> None:
        """Cache a successful response together with its metadata."""

    def generate_query_hash(self, query: str, user_context: dict = None) -> str:
        """Generate a consistent hash for query caching."""
```
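As a concrete illustration of Layer 1, a minimal in-memory exact-match cache with TTL might look like the sketch below. The class name, normalization, and lazy eviction are assumptions for illustration, not the final design; note the use of `hashlib` rather than the built-in `hash()`, which is not stable across processes.

```python
import hashlib
import time


class ExactMatchCache:
    """Minimal in-memory exact-match cache with TTL (illustrative sketch)."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl_seconds = ttl_seconds
        self._store: dict = {}

    @staticmethod
    def query_hash(query: str) -> str:
        # hashlib gives stable keys across processes, unlike built-in hash()
        return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()

    def get(self, query: str):
        entry = self._store.get(self.query_hash(query))
        if entry is None:
            return None
        expires_at, response = entry
        if time.monotonic() > expires_at:
            del self._store[self.query_hash(query)]  # lazily evict expired entries
            return None
        return response

    def put(self, query: str, response: dict) -> None:
        key = self.query_hash(query)
        self._store[key] = (time.monotonic() + self.ttl_seconds, response)
```

Because keys are normalized before hashing, trivially different phrasings ("What is VAT?" vs. "what is vat?  ") hit the same entry.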

Layer 2: Semantic Similarity Cache

```python
from typing import List, Optional


class SemanticCache:
    def __init__(self, similarity_threshold: float = 0.85):
        self.similarity_threshold = similarity_threshold
        self.query_embeddings: dict = {}

    async def find_similar_cached_query(self, query: str) -> Optional["CachedResponse"]:
        """Find a semantically similar cached query, if any."""

    async def compute_query_embedding(self, query: str) -> List[float]:
        """Generate an embedding for semantic similarity comparison."""
```
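The similarity lookup reduces to a nearest-neighbour search over query embeddings. A minimal sketch using plain cosine similarity follows; in practice `compute_query_embedding` would call an embedding model, so the vectors here are stand-ins.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_best_match(query_vec, cached: dict, threshold: float = 0.85):
    """Return the cached query whose embedding is most similar, if above threshold."""
    best_key, best_score = None, threshold
    for key, vec in cached.items():
        score = cosine_similarity(query_vec, vec)
        if score >= best_score:
            best_key, best_score = key, score
    return best_key
```

A linear scan is fine at small scale; beyond a few thousand cached embeddings, an approximate nearest-neighbour index would be the natural upgrade.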

2. Cache Storage Options

Option A: Redis (Recommended for production)

  • Fast in-memory caching
  • Distributed caching across instances
  • Automatic expiration handling
  • Persistence options available

Option B: In-Memory Cache (Development)

  • Simple implementation for testing
  • No external dependencies
  • Limited to single instance

Option C: BigQuery Cache Table

  • Persistent cache storage
  • Query-based cache management
  • Integration with existing BigQuery setup
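To keep the three storage options interchangeable, a small common interface helps. A sketch using `typing.Protocol` — the method names and the `MemoryBackend` class are assumptions for illustration, not the project's API:

```python
from typing import Optional, Protocol


class CacheBackend(Protocol):
    """Interface any of the three storage options could implement."""

    def get(self, key: str) -> Optional[dict]: ...
    def set(self, key: str, value: dict, ttl_seconds: int) -> None: ...


class MemoryBackend:
    """Option B: a single-instance dict backend for development."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict, ttl_seconds: int) -> None:
        # TTL is ignored in this development sketch
        self._data[key] = value
```

Redis and BigQuery backends would implement the same two methods, so the provider can be swapped via configuration without touching call sites.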

3. Intelligent Cache Management

Cache key strategy:

```python
import hashlib


def generate_cache_key(query: str, user_context: dict) -> str:
    """Generate a cache key that accounts for user context."""
    # Normalize query text for consistent lookups
    normalized_query = normalize_query(query)

    # Include relevant context in the key
    context_hash = hash_user_context(user_context)

    # hashlib is stable across processes and restarts, unlike built-in hash()
    query_hash = hashlib.sha256(normalized_query.encode("utf-8")).hexdigest()
    return f"query:{query_hash}:ctx:{context_hash}"


def normalize_query(query: str) -> str:
    """Normalize a query for consistent caching."""
    # Lowercase and trim whitespace; punctuation stripping and
    # synonym handling can be layered on later
    return query.lower().strip()
```

Cache invalidation strategy:

  • Time-based expiration (TTL)
  • Manual invalidation for updated content
  • LRU eviction for memory management
  • Version-based invalidation for content updates
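Of these policies, LRU eviction is the easiest to get subtly wrong; Python's `OrderedDict` makes a size-bounded LRU straightforward. A sketch, not the project's implementation:

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded cache that evicts the least recently used entry."""

    def __init__(self, max_entries: int = 10000):
        self.max_entries = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```

In production this would be combined with TTL expiry, so an entry disappears at whichever limit it hits first.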

4. Cache Analytics and Monitoring

Cache performance metrics:

```python
class CacheMetrics:
    def track_cache_hit(self, query_type: str, response_time_saved: float):
        """Track successful cache hits."""

    def track_cache_miss(self, query_type: str, reason: str):
        """Track cache misses and the reasons for them."""

    def get_cache_statistics(self) -> dict:
        """Return cache performance statistics."""
        return {
            "hit_rate": self.calculate_hit_rate(),
            "avg_response_time_saved": self.avg_time_saved,
            "popular_queries": self.get_top_cached_queries(),
            "cache_size": self.get_cache_size(),
        }
```
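The hit-rate and time-saved calculations behind those statistics can be as simple as a pair of counters. A minimal working sketch (field names are assumptions):

```python
class SimpleCacheMetrics:
    """Minimal hit/miss tracking behind a statistics API like the one above."""

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.time_saved_total = 0.0

    def track_cache_hit(self, query_type: str, response_time_saved: float) -> None:
        self.hits += 1
        self.time_saved_total += response_time_saved

    def track_cache_miss(self, query_type: str, reason: str) -> None:
        self.misses += 1

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def avg_time_saved(self) -> float:
        return self.time_saved_total / self.hits if self.hits else 0.0
```

These counters map directly onto the `alert_on_low_hit_rate` threshold in the configuration below.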

Implementation Areas

Backend Components:

  • src/answer_app/cache_manager.py: Core caching logic
  • src/answer_app/semantic_cache.py: Similarity-based caching
  • src/answer_app/cache_metrics.py: Cache performance tracking
  • src/answer_app/main.py: Cache middleware integration

Configuration:

  • src/answer_app/config.yaml: Cache configuration options
  • Cache deployment: Redis or alternative cache storage

Infrastructure:

  • terraform/modules/cache/: Redis deployment (if using Redis)
  • Monitoring: Cache performance dashboards

Configuration Options

Add to config.yaml:

```yaml
caching:
  enabled: true
  provider: "redis"  # redis, memory, bigquery

  redis:
    host: "localhost"
    port: 6379
    db: 0
    password: null

  cache_settings:
    default_ttl: 3600  # 1 hour
    max_cache_size: 10000  # entries
    similarity_threshold: 0.85

  cache_policies:
    exact_match_ttl: 3600
    semantic_match_ttl: 1800
    popular_query_ttl: 7200

  monitoring:
    log_cache_hits: true
    track_performance: true
    alert_on_low_hit_rate: 0.3
```
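Once the YAML is parsed (e.g. with `yaml.safe_load`), applying safe defaults keeps missing keys from crashing the app. A sketch — key names follow the snippet above, the defaults themselves are assumptions:

```python
def resolve_cache_settings(config: dict) -> dict:
    """Apply safe defaults to the parsed 'caching' block of config.yaml."""
    caching = dict(config.get("caching") or {})
    caching.setdefault("enabled", False)      # caching stays off unless configured
    caching.setdefault("provider", "memory")  # safest default for development
    settings = dict(caching.get("cache_settings") or {})
    settings.setdefault("default_ttl", 3600)
    settings.setdefault("max_cache_size", 10000)
    caching["cache_settings"] = settings
    return caching
```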

Cache Warming Strategies

1. Popular Query Pre-loading

  • Identify frequently asked questions from BigQuery logs
  • Pre-populate cache with common queries
  • Update cache during low-traffic periods
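Identifying the candidates can happen offline from the existing logs. The BigQuery fetch itself is elided here; this just shows the ranking step, with light normalization matching the cache-key strategy:

```python
from collections import Counter


def top_queries(logged_queries: list, n: int = 20) -> list:
    """Rank logged queries by frequency after light normalization."""
    counts = Counter(q.strip().lower() for q in logged_queries)
    return counts.most_common(n)
```

Feeding the top-N results through the normal answer path during a low-traffic window pre-populates the cache before users ask.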

2. Proactive Caching

  • Cache responses for trending topics
  • Pre-generate responses for FAQ content
  • Background cache refresh for expiring popular items

Testing Strategy

Performance Testing:

  • Cache hit rate measurement
  • Response time improvement validation
  • Memory usage monitoring
  • Cache invalidation testing

Functionality Testing:

  • Exact match caching accuracy
  • Semantic similarity matching
  • Cache expiration behavior
  • Cache consistency across instances
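Cache expiration is the easiest of these behaviours to pin down in a unit test if the clock is injectable, so tests never have to sleep. A sketch — the `TTLCache` class and `clock` parameter are assumptions about a testable design, not the project's code:

```python
class TTLCache:
    """Tiny TTL cache with an injectable clock, so expiry is testable without sleeping."""

    def __init__(self, ttl_seconds: float, clock):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # callable returning the current time in seconds
        self._store: dict = {}

    def put(self, key, value) -> None:
        self._store[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() > expires_at:
            del self._store[key]
            return None
        return value


def test_entry_expires_after_ttl():
    now = [0.0]
    cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
    cache.put("q", "answer")
    assert cache.get("q") == "answer"  # still fresh
    now[0] = 11.0                      # advance the fake clock past the TTL
    assert cache.get("q") is None      # entry has expired
```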

Expected Performance Improvements

With 50% cache hit rate:

  • Average response time: 2.5-6s → ~1.5-3s (cached queries return in well under a second, roughly halving the average)
  • Discovery Engine API calls: 50% reduction
  • Cost savings: ~50% reduction in Discovery Engine costs
  • Improved user experience for repeat questions

With semantic caching:

  • Additional 20-30% cache hit rate for similar questions
  • Better handling of question variations
  • Reduced API costs for semantically similar queries

Acceptance Criteria

  • Exact match caching implemented and working
  • Semantic similarity caching (optional advanced feature)
  • Configurable cache TTL and size limits
  • Cache performance monitoring and metrics
  • Cache hit rate > 40% for typical usage patterns
  • Response time improvement measurable
  • Cache invalidation working correctly
  • Documentation for cache configuration and tuning

Priority

Low - Performance optimization that provides value but isn't critical at current scale.

When to Implement

This becomes more valuable when:

  • Application handles >100 queries/day consistently
  • Discovery Engine API costs become significant
  • Response time optimization becomes important
  • Users frequently ask similar questions
  • Traffic patterns show repeated query patterns
