perf: Implement HTTP client connection pooling for improved performance #21

@doughayden

Description

Performance Optimization

Implement HTTP client connection pooling to reduce connection overhead and improve request performance for service-to-service communication.

Level of Effort: 🟢 Small (1-2 days)

  • Implementation: 1 day for connection pooling setup and configuration
  • Testing: 0.5 day for performance validation and testing
  • Documentation: 0.5 day for configuration and usage documentation

Current Implementation

File: src/client/utils.py (lines 212-230)

async def send_request(self, url: str, data: dict) -> dict:
    """Send HTTP request with authentication."""
    id_token = await self._get_id_token()
    # Current: creates a new httpx.AsyncClient for each request
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url,
            json=data,
            headers={"Authorization": f"Bearer {id_token}"},
            timeout=30.0,
        )
        return response.json()

Performance issues:

  • New HTTP connection established for each request
  • TCP handshake overhead on every API call
  • No connection reuse between requests
  • Potential connection pool exhaustion under load

Proposed Implementation

1. Singleton HTTP Client with Connection Pooling

Enhanced client with connection pooling:

# src/client/utils.py
import asyncio
from typing import Optional

import httpx

class UtilHandler:
    def __init__(self, log_level: str = "INFO"):
        # ... existing initialization
        self._http_client: Optional[httpx.AsyncClient] = None
        self._client_lock = asyncio.Lock()

    async def _get_http_client(self) -> httpx.AsyncClient:
        """Get or create the HTTP client with connection pooling."""
        if self._http_client is None:
            async with self._client_lock:
                # Re-check after acquiring the lock so concurrent
                # callers don't each create a client.
                if self._http_client is None:
                    limits = httpx.Limits(
                        max_keepalive_connections=20,
                        max_connections=100,
                        keepalive_expiry=30.0,
                    )
                    timeout = httpx.Timeout(30.0, connect=10.0)

                    self._http_client = httpx.AsyncClient(
                        limits=limits,
                        timeout=timeout,
                        http2=True,  # requires the httpx[http2] extra (h2)
                    )
        return self._http_client

    async def send_request(self, url: str, data: dict) -> dict:
        """Send HTTP request with connection pooling."""
        id_token = await self._get_id_token()
        client = await self._get_http_client()

        response = await client.post(
            url,
            json=data,
            headers={"Authorization": f"Bearer {id_token}"},
        )

        if response.status_code == 200:
            return response.json()
        else:
            # ... error handling
            response.raise_for_status()

    async def close(self) -> None:
        """Close the HTTP client and clean up connections."""
        if self._http_client:
            await self._http_client.aclose()
            self._http_client = None
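The lock-then-re-check pattern above is worth isolating: without the second `is None` check, concurrent first callers could each build a client. A minimal sketch, with the expensive client creation stubbed out so it runs offline:

```python
import asyncio
from typing import Optional

class LazyResource:
    """Lazy, lock-guarded singleton, mirroring _get_http_client above."""

    def __init__(self) -> None:
        self._resource: Optional[object] = None
        self._lock = asyncio.Lock()
        self.creations = 0  # counts how many times the resource was built

    async def get(self) -> object:
        if self._resource is None:
            async with self._lock:
                if self._resource is None:  # re-check after acquiring the lock
                    await asyncio.sleep(0)  # simulate awaited setup work
                    self.creations += 1
                    self._resource = object()
        return self._resource

async def main() -> int:
    lr = LazyResource()
    # Many concurrent callers; the lock ensures exactly one creation.
    await asyncio.gather(*(lr.get() for _ in range(50)))
    return lr.creations

count = asyncio.run(main())
```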

2. Backend HTTP Client Optimization

If backend makes external HTTP calls:

# src/answer_app/utils.py (if needed for external API calls)
class UtilHandler:
    def __init__(self, log_level: str = "INFO"):
        # ... existing initialization
        self._http_client: Optional[httpx.AsyncClient] = None
        self._client_lock = asyncio.Lock()

    async def _get_http_client(self) -> httpx.AsyncClient:
        """Get HTTP client for external API calls."""
        if self._http_client is None:
            async with self._client_lock:
                # Re-check under the lock, as in the client-side handler.
                if self._http_client is None:
                    limits = httpx.Limits(
                        max_keepalive_connections=10,
                        max_connections=50,
                        keepalive_expiry=60.0,
                    )
                    self._http_client = httpx.AsyncClient(limits=limits)
        return self._http_client

3. Application Lifecycle Management

Proper client lifecycle in FastAPI:

# src/answer_app/main.py
# Note: newer FastAPI versions deprecate @app.on_event in favor of the
# lifespan parameter; the hooks below work on versions that support it.
@app.on_event("startup")
async def startup_event():
    """Initialize shared resources."""
    # The HTTP client is created lazily on first use.
    pass

@app.on_event("shutdown")
async def shutdown_event():
    """Clean up shared resources."""
    # Close pooled HTTP connections via the handler's public API.
    await utils.close()
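For newer FastAPI versions, the equivalent lifespan pattern looks like the sketch below: code before `yield` runs at startup, code after it at shutdown (where `await utils.close()` would go). The sketch uses a plain async context manager and an event list so it runs without FastAPI installed.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def lifespan(app: object):
    # Startup phase: create/warm shared resources here.
    events.append("startup")
    yield
    # Shutdown phase: e.g. `await utils.close()` in the real app.
    events.append("shutdown")

async def main() -> list[str]:
    # FastAPI would drive this via `FastAPI(lifespan=lifespan)`;
    # here we enter the context manager directly.
    async with lifespan(app=None):
        events.append("serving requests")
    return events

order = asyncio.run(main())
```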

Streamlit app lifecycle:

# src/client/streamlit_app.py
import asyncio
import atexit

# Register cleanup; close() is a no-op when no client was ever created.
atexit.register(lambda: asyncio.run(utils.close()))

Configuration Options

Add to application config:

# config.yaml
http_client:
  max_keepalive_connections: 20
  max_connections: 100
  keepalive_expiry: 30.0
  connection_timeout: 10.0
  request_timeout: 30.0
  enable_http2: true
  
  # Retry configuration
  max_retries: 3
  retry_backoff_factor: 0.5

Expected Performance Improvements

Connection Overhead Reduction:

  • TCP handshake elimination: Reuse existing connections
  • SSL/TLS handshake savings: Keep secure connections alive
  • DNS lookup reduction: Connection pooling reduces DNS queries

Performance Metrics:

  • Latency improvement: 10-50ms reduction per request (depending on network)
  • Throughput increase: 20-40% improvement for concurrent requests
  • Resource efficiency: Reduced system socket usage

Load Handling:

  • Better concurrency: Efficient handling of multiple simultaneous requests
  • Connection limits: Prevent connection pool exhaustion
  • Graceful degradation: Proper timeout and retry handling

Implementation Areas

Files to Modify:

  • src/client/utils.py: Add connection pooling to UtilHandler
  • src/client/streamlit_app.py: Add proper client lifecycle management
  • src/answer_app/main.py: Add startup/shutdown hooks (if backend needs HTTP client)
  • Tests: Update mocking to work with persistent client

Configuration:

  • src/answer_app/config.yaml: Add HTTP client configuration options
  • Environment variables: HTTP client tuning parameters

Testing Strategy

Performance Testing:

  • Measure request latency before/after implementation
  • Test concurrent request handling
  • Validate connection reuse metrics
  • Monitor resource usage under load

Functional Testing:

  • Ensure all existing functionality works with pooled connections
  • Test connection recovery after network issues
  • Validate proper client cleanup on application shutdown

Edge Case Testing:

  • Connection timeout scenarios
  • Server connection limits
  • Network interruption recovery
  • Long-running connection behavior

Monitoring and Metrics

Connection pool metrics to track:

  • Active connections count
  • Connection reuse rate
  • Connection creation/destruction frequency
  • Request latency improvements
  • Failed connection attempts

Logging enhancements:

# httpx does not expose public connection-pool counters, so log
# per-request timing (httpx.Response.elapsed) to observe reuse gains:
logging.info(
    f"request to {response.request.url} completed in "
    f"{response.elapsed.total_seconds() * 1000:.1f} ms"
)

Acceptance Criteria

  • HTTP client connection pooling implemented in client utils
  • Configurable connection pool parameters
  • Proper client lifecycle management (startup/shutdown)
  • Performance improvement measurable (latency reduction)
  • No regression in existing functionality
  • Connection pool metrics and monitoring
  • Updated tests to work with persistent client
  • Documentation for configuration options

Priority

Low - Performance optimization that provides value but isn't critical at current scale.

When to Implement

This becomes more valuable when:

  • Application handles >50 requests/hour consistently
  • Network latency between services becomes noticeable
  • Multiple concurrent users create performance bottlenecks
  • Service-to-service communication frequency increases
  • Performance optimization becomes a focus area
