perf: Implement HTTP client connection pooling for improved performance #21

@doughayden

Description

Performance Optimization

Implement HTTP client connection pooling to reduce connection overhead and improve request performance for service-to-service communication.

Level of Effort: 🟢 Small (1-2 days)

  • Implementation: 1 day for connection pooling setup and configuration
  • Testing: 0.5 day for performance validation and testing
  • Documentation: 0.5 day for configuration and usage documentation

Current Implementation

File: src/client/utils.py (lines 212-230)

async def send_request(self, url: str, data: dict) -> dict:
    """Send HTTP request with authentication."""
    id_token = await self._get_id_token()
    # Current: creates a new httpx.AsyncClient for each request
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url,
            json=data,
            headers={"Authorization": f"Bearer {id_token}"},
            timeout=30.0,
        )
        return response.json()

Performance issues:

  • New HTTP connection established for each request
  • TCP handshake overhead on every API call
  • No connection reuse between requests
  • Potential connection pool exhaustion under load

Proposed Implementation

1. Singleton HTTP Client with Connection Pooling

Enhanced client with connection pooling:

# src/client/utils.py
import asyncio
from typing import Optional

import httpx

class UtilHandler:
    def __init__(self, log_level: str = "INFO"):
        # ... existing initialization
        self._http_client: Optional[httpx.AsyncClient] = None
        self._client_lock = asyncio.Lock()

    async def _get_http_client(self) -> httpx.AsyncClient:
        """Get or create the HTTP client with connection pooling."""
        if self._http_client is None:
            async with self._client_lock:
                # Re-check after acquiring the lock so concurrent
                # callers don't each create a client.
                if self._http_client is None:
                    limits = httpx.Limits(
                        max_keepalive_connections=20,
                        max_connections=100,
                        keepalive_expiry=30.0,
                    )
                    timeout = httpx.Timeout(30.0, connect=10.0)

                    self._http_client = httpx.AsyncClient(
                        limits=limits,
                        timeout=timeout,
                        http2=True,  # requires the httpx[http2] extra (h2)
                    )
        return self._http_client

    async def send_request(self, url: str, data: dict) -> dict:
        """Send HTTP request with connection pooling."""
        id_token = await self._get_id_token()
        client = await self._get_http_client()

        response = await client.post(
            url,
            json=data,
            headers={"Authorization": f"Bearer {id_token}"},
        )

        if response.status_code == 200:
            return response.json()
        else:
            # ... error handling
            response.raise_for_status()

    async def close(self) -> None:
        """Close the HTTP client and clean up connections."""
        if self._http_client:
            await self._http_client.aclose()
            self._http_client = None
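The lock-then-re-check pattern above is worth isolating: without the second `is None` check, concurrent first callers could each build a client. A minimal sketch, with the expensive client creation stubbed out so it runs offline:

```python
import asyncio
from typing import Optional

class LazyResource:
    """Lazy, lock-guarded singleton, mirroring _get_http_client above."""

    def __init__(self) -> None:
        self._resource: Optional[object] = None
        self._lock = asyncio.Lock()
        self.creations = 0  # counts how many times the resource was built

    async def get(self) -> object:
        if self._resource is None:
            async with self._lock:
                if self._resource is None:  # re-check after acquiring the lock
                    await asyncio.sleep(0)  # simulate awaited setup work
                    self.creations += 1
                    self._resource = object()
        return self._resource

async def main() -> int:
    lr = LazyResource()
    # Many concurrent callers; the lock ensures exactly one creation.
    await asyncio.gather(*(lr.get() for _ in range(50)))
    return lr.creations

count = asyncio.run(main())
```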

2. Backend HTTP Client Optimization

If backend makes external HTTP calls:

# src/answer_app/utils.py (if needed for external API calls)
class UtilHandler:
    def __init__(self, log_level: str = "INFO"):
        # ... existing initialization
        self._http_client: Optional[httpx.AsyncClient] = None
        self._client_lock = asyncio.Lock()

    async def _get_http_client(self) -> httpx.AsyncClient:
        """Get HTTP client for external API calls."""
        if self._http_client is None:
            async with self._client_lock:
                # Re-check under the lock, as in the client-side handler.
                if self._http_client is None:
                    limits = httpx.Limits(
                        max_keepalive_connections=10,
                        max_connections=50,
                        keepalive_expiry=60.0,
                    )
                    self._http_client = httpx.AsyncClient(limits=limits)
        return self._http_client

3. Application Lifecycle Management

Proper client lifecycle in FastAPI:

# src/answer_app/main.py
# Note: newer FastAPI versions deprecate @app.on_event in favor of the
# lifespan parameter; the hooks below work on versions that support it.
@app.on_event("startup")
async def startup_event():
    """Initialize shared resources."""
    # The HTTP client is created lazily on first use.
    pass

@app.on_event("shutdown")
async def shutdown_event():
    """Clean up shared resources."""
    # Close pooled HTTP connections via the handler's public API.
    await utils.close()
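For newer FastAPI versions, the equivalent lifespan pattern looks like the sketch below: code before `yield` runs at startup, code after it at shutdown (where `await utils.close()` would go). The sketch uses a plain async context manager and an event list so it runs without FastAPI installed.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def lifespan(app: object):
    # Startup phase: create/warm shared resources here.
    events.append("startup")
    yield
    # Shutdown phase: e.g. `await utils.close()` in the real app.
    events.append("shutdown")

async def main() -> list[str]:
    # FastAPI would drive this via `FastAPI(lifespan=lifespan)`;
    # here we enter the context manager directly.
    async with lifespan(app=None):
        events.append("serving requests")
    return events

order = asyncio.run(main())
```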

Streamlit app lifecycle:

# src/client/streamlit_app.py
import asyncio
import atexit

# Register cleanup; close() is a no-op when no client was ever created.
atexit.register(lambda: asyncio.run(utils.close()))

Configuration Options

Add to application config:

# config.yaml
http_client:
  max_keepalive_connections: 20
  max_connections: 100
  keepalive_expiry: 30.0
  connection_timeout: 10.0
  request_timeout: 30.0
  enable_http2: true
  
  # Retry configuration
  max_retries: 3
  retry_backoff_factor: 0.5

Expected Performance Improvements

Connection Overhead Reduction:

  • TCP handshake elimination: Reuse existing connections
  • SSL/TLS handshake savings: Keep secure connections alive
  • DNS lookup reduction: Connection pooling reduces DNS queries

Performance Metrics:

  • Latency improvement: 10-50ms reduction per request (depending on network)
  • Throughput increase: 20-40% improvement for concurrent requests
  • Resource efficiency: Reduced system socket usage

Load Handling:

  • Better concurrency: Efficient handling of multiple simultaneous requests
  • Connection limits: Prevent connection pool exhaustion
  • Graceful degradation: Proper timeout and retry handling

Implementation Areas

Files to Modify:

  • src/client/utils.py: Add connection pooling to UtilHandler
  • src/client/streamlit_app.py: Add proper client lifecycle management
  • src/answer_app/main.py: Add startup/shutdown hooks (if backend needs HTTP client)
  • Tests: Update mocking to work with persistent client

Configuration:

  • src/answer_app/config.yaml: Add HTTP client configuration options
  • Environment variables: HTTP client tuning parameters

Testing Strategy

Performance Testing:

  • Measure request latency before/after implementation
  • Test concurrent request handling
  • Validate connection reuse metrics
  • Monitor resource usage under load

Functional Testing:

  • Ensure all existing functionality works with pooled connections
  • Test connection recovery after network issues
  • Validate proper client cleanup on application shutdown

Edge Case Testing:

  • Connection timeout scenarios
  • Server connection limits
  • Network interruption recovery
  • Long-running connection behavior

Monitoring and Metrics

Connection pool metrics to track:

  • Active connections count
  • Connection reuse rate
  • Connection creation/destruction frequency
  • Request latency improvements
  • Failed connection attempts

Logging enhancements:

# httpx does not expose public connection-pool counters, so log
# per-request timing (httpx.Response.elapsed) to observe reuse gains:
logging.info(
    f"request to {response.request.url} completed in "
    f"{response.elapsed.total_seconds() * 1000:.1f} ms"
)

Acceptance Criteria

  • HTTP client connection pooling implemented in client utils
  • Configurable connection pool parameters
  • Proper client lifecycle management (startup/shutdown)
  • Performance improvement measurable (latency reduction)
  • No regression in existing functionality
  • Connection pool metrics and monitoring
  • Updated tests to work with persistent client
  • Documentation for configuration options

Priority

Low - Performance optimization that provides value but isn't critical at current scale.

When to Implement

This becomes more valuable when:

  • Application handles >50 requests/hour consistently
  • Network latency between services becomes noticeable
  • Multiple concurrent users create performance bottlenecks
  • Service-to-service communication frequency increases
  • Performance optimization becomes a focus area
