Production Deployment

Guide for deploying py-code-mode in production environments.

Architecture

Production deployments typically combine:

  • RedisStorage - Shared workflow library across instances
  • ContainerExecutor - Isolated code execution
  • Pre-configured dependencies - Locked down environment
  • Monitoring and observability - Health checks and logging

import os
from py_code_mode import Session, RedisStorage
from py_code_mode.execution import ContainerExecutor, ContainerConfig

# Shared workflow library
storage = RedisStorage(url=os.getenv("REDIS_URL"), prefix="production")

# Isolated execution with authentication and pre-configured deps
config = ContainerConfig(
    timeout=60.0,
    allow_runtime_deps=False,  # Lock down package installation
    auth_token=os.getenv("CONTAINER_AUTH_TOKEN"),  # Required for production
    deps=["pandas>=2.0", "numpy", "requests"],  # Pre-configured dependencies
)
executor = ContainerExecutor(config)

async with Session(storage=storage, executor=executor, sync_deps_on_start=True) as session:
    result = await session.run(agent_code)

Security Best Practices

1. Enable API Authentication

The container HTTP API requires authentication by default. Never deploy without authentication.

# Load token from environment/secret store
token = os.getenv("CONTAINER_AUTH_TOKEN")
# Or: token = azure_keyvault.get_secret("container-auth-token")
# Or: token = hashicorp_vault.read("secret/container-auth")["token"]

config = ContainerConfig(
    auth_token=token,  # Required - server refuses to start without it
)

Fail-closed design: If you forget to configure auth, the container refuses to start. This prevents accidental unauthenticated deployments.
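You can make the same fail-closed behavior explicit in your own startup path, so a misconfigured deployment dies before any container is created. A minimal sketch (the load_auth_token helper is hypothetical, not part of the library):

```python
import os

def load_auth_token() -> str:
    """Fail fast at process startup if the container auth token is missing."""
    token = os.getenv("CONTAINER_AUTH_TOKEN")
    if not token:
        raise RuntimeError(
            "CONTAINER_AUTH_TOKEN is not set; refusing to start. "
            "Never deploy the container API without authentication."
        )
    return token
```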

2. Lock Down Dependencies

Prevent agents from installing arbitrary packages:

config = ContainerConfig(
    allow_runtime_deps=False,  # Block runtime installation
    deps=["pandas>=2.0", "requests>=2.28.0"],  # Pre-configure allowed packages
)
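If the dependency list itself comes from upstream configuration, filtering it against an allowlist before building the ContainerConfig adds a second layer. An illustrative sketch, assuming exact-name matching (ALLOWED_DEPS and filter_deps are hypothetical helpers, not library API):

```python
# Hypothetical allowlist applied before constructing ContainerConfig
ALLOWED_DEPS = {"pandas", "numpy", "requests"}

def filter_deps(requested: list[str]) -> list[str]:
    allowed = []
    for req in requested:
        # Strip version specifiers ("pandas>=2.0" -> "pandas") before comparing
        name = req.split(">=")[0].split("==")[0].split("<")[0].strip()
        if name in ALLOWED_DEPS:
            allowed.append(req)
    return allowed
```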

3. Use Container Isolation

Run untrusted agent code in containers:

executor = ContainerExecutor(ContainerConfig(
    timeout=60.0,
    auth_token=os.getenv("CONTAINER_AUTH_TOKEN"),
    network_disabled=False,  # Set True to disable network
    memory_limit="512m",
    cpu_quota=None
))

4. Validate Input

Never trust agent code without validation:

# Bad: Direct execution
result = await session.run(user_provided_code)

# Better: Validation layer
if is_safe(user_provided_code):
    result = await session.run(user_provided_code)
else:
    raise SecurityError("Unsafe code detected")
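What is_safe looks like is application-specific. One naive possibility is an AST-based import check, sketched below; static checks like this are easy to bypass, so treat them as defense in depth alongside container isolation, never a replacement for it:

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}

def is_safe(code: str) -> bool:
    """Naive static check: reject code that imports blocked modules."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False  # unparseable code is rejected outright
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in BLOCKED_MODULES
                   for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                return False
    return True
```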

5. Isolate Storage by Tenant

Use a stable environment prefix plus workspace_id for multi-tenant deployments:

def get_storage(tenant_id: str, redis_url: str) -> RedisStorage:
    return RedisStorage(
        url=redis_url,
        prefix="production",
        workspace_id=tenant_id,
    )

If workspace_id is omitted, the system uses the legacy default namespace. That is one shared unscoped namespace, so multi-tenant deployments should set workspace_id explicitly.
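One way to picture the resulting isolation is as a composed key namespace: prefix scopes the environment, workspace_id scopes the tenant. The actual Redis key layout is library-defined; this helper is purely illustrative:

```python
def redis_key(prefix: str, workspace_id: str, kind: str, name: str) -> str:
    # Each tenant's entries live under a distinct namespace, so one tenant
    # can never read or overwrite another tenant's workflows.
    return f"{prefix}:{workspace_id}:{kind}:{name}"
```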


Scalability Patterns

Horizontal Scaling

Multiple agent instances share the workflow library via Redis:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Instance 1 │────▶│    Redis    │◀────│  Instance 2 │
└─────────────┘     │ (Workflows) │     └─────────────┘
                    └─────────────┘
                           ▲
                           │
                    ┌─────────────┐
                    │  Instance 3 │
                    └─────────────┘

All instances benefit when any instance creates a workflow.

Load Balancing

# Each instance runs the same code
async def handle_request(agent_code: str, tenant_id: str):
    storage = get_storage(tenant_id, os.environ["REDIS_URL"])
    executor = ContainerExecutor(config)

    async with Session(storage=storage, executor=executor) as session:
        return await session.run(agent_code)

A load balancer distributes requests across the instances.

Remote Session Servers

For remote ContainerExecutor(remote_url=...) deployments:

  • the client provides workspace_id through the storage backend
  • the session server creates an execution session_id
  • workflow/artifact isolation is enforced by the server's workspace-scoped storage bundle

Configure the session server with server-owned storage roots:

  • storage_base_path for file-backed storage
  • storage_prefix for Redis-backed storage

The host storage configuration and the remote session server must refer to the same logical backing store. In practice, Redis-backed storage is the recommended production topology for remote deployments because both sides can share one namespace directly.


Container Image Management

Building Images

# Build base image
docker build -t py-code-mode:base -f docker/Dockerfile.base .

# Build with additional tools
docker build -t py-code-mode:tools -f docker/Dockerfile.tools .

Updating Images

When you update py-code-mode library code:

# Rebuild images with new code
docker build -t py-code-mode:base -f docker/Dockerfile.base .

# Restart containers to use new image
# (Kubernetes will do this automatically on rollout)

Multi-Stage Builds

Use multi-stage builds to keep images small:

# Dockerfile.base
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user -r requirements.txt

FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
COPY src/ /app/src/
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "-m", "py_code_mode.container.server"]

Monitoring and Observability

Health Checks

from fastapi import FastAPI
from fastapi.responses import JSONResponse
from py_code_mode import Session, RedisStorage

app = FastAPI()

@app.get("/health")
async def health():
    try:
        # Check Redis connectivity (redis_client, storage, and executor are
        # module-level objects configured at startup)
        redis_client.ping()

        # Check the executor can start and run code
        async with Session(storage=storage, executor=executor) as session:
            await session.run("print('health check')")

        return {"status": "healthy"}
    except Exception as e:
        # Return 503 so load balancers take the instance out of rotation
        return JSONResponse(
            status_code=503,
            content={"status": "unhealthy", "error": str(e)},
        )

Logging

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

async def run_agent(code: str):
    logger.info("Starting agent execution", extra={"code_length": len(code)})

    try:
        result = await session.run(code)
        logger.info("Execution succeeded", extra={"result_type": type(result.value).__name__})
        return result
    except Exception as e:
        logger.error("Execution failed", extra={"error": str(e)}, exc_info=True)
        raise

Metrics

Track key metrics:

  • Execution time per request
  • Success/failure rates
  • Workflow creation rate
  • Redis memory usage
  • Container startup time
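These can start as a minimal in-process tracker before you wire up Prometheus or StatsD. A sketch (the Metrics class below is hypothetical, not library API):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class Metrics:
    """Minimal in-process metrics; replace with Prometheus/StatsD in production."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name: str) -> None:
        self.counters[name] += 1

    @contextmanager
    def timer(self, name: str):
        # Record wall-clock duration of the wrapped block
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)

metrics = Metrics()
with metrics.timer("execution_seconds"):
    pass  # run the agent code here
metrics.incr("executions_succeeded")
```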

Example Deployment: Azure Container Apps

See examples/azure-container-apps/ for a complete production deployment example including:

  • Docker image configuration
  • Azure Container Apps deployment
  • Redis integration
  • Environment configuration
  • Scaling policies

Checklist

Before going to production:

  • Container API authentication configured (auth_token set from secret store)
  • Dependencies pre-configured and locked (allow_runtime_deps=False)
  • Using ContainerExecutor for isolation
  • Redis configured with persistence and backups
  • Health checks implemented
  • Logging and metrics in place
  • Multi-instance testing completed
  • Resource limits set (memory, CPU, timeout)
  • Secrets management configured (API keys, credentials, container auth token)
  • Disaster recovery plan documented
  • Monitoring and alerting configured