168 changes: 168 additions & 0 deletions BUILDZ_AI_FIXES.md
@@ -0,0 +1,168 @@
# buildz.ai Error Fixes

This document outlines the fixes applied to resolve the 500 error and WebSocket connection issues on buildz.ai/workspace.

## Issues Identified

1. **API 500 Error**: `/api/workspaces` returned a generic 500 because exceptions thrown inside the handler were not caught, so the underlying cause was neither logged nor reported
2. **WebSocket Connection Failure**: the connection to `wss://buildz.ai/socket.io/` was closed before it could be established

## Root Causes

### 1. API Error Handling
- The `/api/workspaces` endpoint lacked comprehensive try-catch error handling
- Database connection errors or session issues were not properly caught and logged
- Error responses didn't provide sufficient debugging information

### 2. WebSocket Configuration Issues
- Missing WebSocket-specific ingress annotations for GKE
- No BackendConfig for proper WebSocket connection handling
- Potential CORS configuration issues for buildz.ai domain
- Client-side socket URL validation needed improvement

## Fixes Applied

### 1. Enhanced API Error Handling

**File**: `apps/sim/app/api/workspaces/route.ts`

- Wrapped the entire GET function in try-catch block
- Added comprehensive logging for debugging
- Enhanced error responses with detailed error messages
- Added user context logging for better troubleshooting
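
The error-response shaping described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `toErrorBody` and its `exposeDetails` flag are hypothetical names, and gating the raw message behind a flag is an assumption made here so that internal error text is only returned to clients when that is wanted.

```typescript
// Hypothetical helper, not part of the PR: maps any thrown value to a
// 500 response body, optionally withholding the raw message from clients.
type ErrorBody = { error: string; details?: string }

function toErrorBody(summary: string, error: unknown, exposeDetails: boolean): ErrorBody {
  // Thrown values are not guaranteed to be Error instances.
  const message = error instanceof Error ? error.message : 'Unknown error'
  return exposeDetails ? { error: summary, details: message } : { error: summary }
}

// Example call as it might appear inside a catch block
// (NextResponse is left out to keep the sketch standalone).
const errorBody = toErrorBody('Failed to fetch workspaces', new Error('connection refused'), true)
console.log(JSON.stringify(errorBody))
```

In a route handler the returned object would be passed to `NextResponse.json(..., { status: 500 })`, with the full error still written to the server log.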

### 2. WebSocket Infrastructure Improvements

**File**: `helm/sim/examples/ingress-buildz.yaml`

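- Added WebSocket-specific annotations:
  - `nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"`
  - `nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"`
  - `cloud.google.com/backend-config` reference for WebSocket support
- Note: the `nginx.ingress.kubernetes.io/*` annotations take effect only when the ingress-nginx controller serves this Ingress, and GKE reads `cloud.google.com/backend-config` from the Service rather than the Ingress, so verify which controller is in use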

**File**: `helm/sim/examples/backend-config-buildz.yaml` (NEW)

- Created BackendConfig for WebSocket connections:
  - Connection draining for graceful shutdowns
  - Extended timeout for WebSocket connections (3600s)
  - Session affinity with CLIENT_IP
  - Health check configuration pointing to `/health` endpoint

### 3. Socket Server CORS Configuration

**File**: `apps/sim/socket-server/config/socket.ts`

- Explicitly added buildz.ai domains to allowed origins:
  - `https://buildz.ai`
  - `https://www.buildz.ai`
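
Allowed-origin lists are usually compared against the request's `Origin` header as exact strings, so stray whitespace or empty entries in a comma-separated environment variable can silently fail to match. The sketch below shows one way to normalize such a list. `mergeAllowedOrigins` is a hypothetical helper, not the PR's `getAllowedOrigins`, and `https://staging.example.com` is a placeholder value.

```typescript
// Hypothetical helper; the name and signature are illustrative only.
function mergeAllowedOrigins(base: string[], extraCsv: string | undefined): string[] {
  const extra = (extraCsv || '')
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
  // Keep the first occurrence of each origin so duplicates do not accumulate.
  return base.concat(extra).filter((entry, index, all) => all.indexOf(entry) === index)
}

const allowedOrigins = mergeAllowedOrigins(
  ['https://buildz.ai', 'https://www.buildz.ai'],
  ' https://staging.example.com, https://buildz.ai ,'
)
console.log(allowedOrigins)
```

Trimming matters here because an entry such as `' https://staging.example.com'` (with a leading space) would not equal the header value the browser sends.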

### 4. Client-Side Socket Configuration

**File**: `apps/sim/contexts/socket-context.tsx`

- Added validation to detect socket URL misconfigurations
- Enhanced logging for socket connection debugging
- Added environment variable debugging information
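
A stricter variant of the URL validation compares the parsed hostname instead of searching for substrings. This is a sketch under stated assumptions: `checkSocketUrl` and the `SocketUrlCheck` type are hypothetical names, and the expected host is passed in by the caller rather than hard-coded. Unlike the PR's check, it would also reject a local development URL unless the caller passes `localhost` as the expected host.

```typescript
// Hypothetical validator; not the code in socket-context.tsx.
type SocketUrlCheck = { ok: true; host: string } | { ok: false; reason: string }

function checkSocketUrl(socketUrl: string, expectedHost: string): SocketUrlCheck {
  let parsed: URL
  try {
    parsed = new URL(socketUrl)
  } catch {
    return { ok: false, reason: 'not a valid URL: ' + socketUrl }
  }
  // Compare the hostname exactly instead of using substring matching.
  if (parsed.hostname !== expectedHost) {
    return { ok: false, reason: 'expected host ' + expectedHost + ', got ' + parsed.hostname }
  }
  return { ok: true, host: parsed.hostname }
}

console.log(checkSocketUrl('https://buildz.ai', 'ws.buildz.ai'))
console.log(checkSocketUrl('https://ws.buildz.ai', 'ws.buildz.ai'))
```

Returning a result object instead of throwing lets the caller decide whether a mismatch should block the connection or only be logged.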

## Deployment Instructions

### Automatic Deployment

Run the deployment script:

```bash
./deploy-buildz-fix.sh
```

### Manual Deployment

1. **Apply BackendConfig**:
```bash
kubectl apply -f helm/sim/examples/backend-config-buildz.yaml
```

2. **Update Ingress**:
```bash
kubectl apply -f helm/sim/examples/ingress-buildz.yaml
```

3. **Upgrade Helm Deployment**:
```bash
helm upgrade sim-gcp ./helm/sim \
  --namespace simstudio \
  --values helm/sim/examples/values-gcp-buildz.yaml \
  --wait \
  --timeout=10m
```

4. **Verify Deployment**:
```bash
kubectl get pods -n simstudio -l app.kubernetes.io/name=sim-gcp
kubectl get ingress -n simstudio sim-ingress
```

## Verification Steps

### 1. API Endpoint Test
```bash
# Test the workspaces API (requires authentication)
curl -H "Authorization: Bearer <token>" https://buildz.ai/api/workspaces
```

### 2. WebSocket Health Check
```bash
# Test WebSocket server health
curl https://ws.buildz.ai/health
```

### 3. WebSocket Connection Test
- Open browser DevTools on https://buildz.ai
- Navigate to Network tab, filter by "WS"
- Look for successful socket.io connections to ws.buildz.ai
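
The same check can be scripted without a browser by requesting the Socket.IO polling handshake and inspecting the reply. The sketch below only builds the handshake URL and parses the reply, leaving the HTTP request to the caller. It assumes Socket.IO v3 or v4 (Engine.IO protocol 4, where the open packet is the character `0` followed by JSON) and the default `/socket.io/` path. Neither is confirmed by this document, and `handshakeUrl` and `parseOpenPacket` are hypothetical names.

```typescript
// Hypothetical helpers; assumes Engine.IO protocol 4 and the default path.
type OpenPacket = { sid: string; upgrades: string[] }

function handshakeUrl(baseUrl: string): string {
  const url = new URL('/socket.io/', baseUrl)
  url.searchParams.set('EIO', '4')
  url.searchParams.set('transport', 'polling')
  return url.toString()
}

function parseOpenPacket(payload: string): OpenPacket | null {
  // An Engine.IO "open" packet starts with the packet type character '0'.
  if (payload.charAt(0) !== '0') return null
  try {
    const data = JSON.parse(payload.slice(1))
    if (typeof data.sid !== 'string' || !Array.isArray(data.upgrades)) return null
    return { sid: data.sid, upgrades: data.upgrades }
  } catch {
    // Malformed JSON, or a non-object payload, is treated as a failed handshake.
    return null
  }
}

console.log(handshakeUrl('https://ws.buildz.ai'))
```

Fetching the printed URL and passing the response body to `parseOpenPacket` should yield a session id. An `upgrades` array containing `websocket` indicates that the server is offering the WebSocket upgrade.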

### 4. Diagnostic Script
Run the diagnostic script for comprehensive checks:
```bash
./diagnose-buildz-issue.sh
```

## Expected Behavior After Fixes

1. **API Endpoints**: Should return proper JSON responses or detailed error messages instead of generic 500 errors
2. **WebSocket Connections**: Should successfully connect to `wss://ws.buildz.ai/socket.io/`
3. **Real-time Features**: Collaborative editing, presence indicators, and live updates should work properly

## Monitoring and Troubleshooting

### Log Monitoring
```bash
# Monitor application logs
kubectl logs -n simstudio -l app.kubernetes.io/name=sim-gcp -f

# Monitor WebSocket server logs
kubectl logs -n simstudio -l app=sim-gcp-realtime -f
```

### Common Issues and Solutions

1. **DNS Resolution**: Ensure ws.buildz.ai resolves correctly
2. **SSL Certificate**: Verify certificate covers both buildz.ai and ws.buildz.ai
3. **Environment Variables**: Check NEXT_PUBLIC_SOCKET_URL is set correctly in pods
4. **Load Balancer**: Ensure GKE ingress properly routes WebSocket traffic

## Files Modified

1. `apps/sim/app/api/workspaces/route.ts` - Enhanced error handling
2. `apps/sim/socket-server/config/socket.ts` - Added buildz.ai CORS origins
3. `apps/sim/contexts/socket-context.tsx` - Added URL validation and debugging
4. `helm/sim/examples/ingress-buildz.yaml` - Added WebSocket annotations

## Files Created

1. `deploy-buildz-fix.sh` - Automated deployment script
2. `diagnose-buildz-issue.sh` - Diagnostic script
3. `helm/sim/examples/backend-config-buildz.yaml` - WebSocket backend configuration
4. `BUILDZ_AI_FIXES.md` - This documentation

The fixes address both the immediate 500 error and the underlying WebSocket connectivity issues, providing a more robust and debuggable system.
78 changes: 47 additions & 31 deletions apps/sim/app/api/workspaces/route.ts
@@ -10,46 +10,62 @@ const logger = createLogger('Workspaces')

 // Get all workspaces for the current user
 export async function GET() {
-  const session = await getSession()
+  try {
+    const session = await getSession()

-  if (!session?.user?.id) {
-    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
-  }
+    if (!session?.user?.id) {
+      return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
+    }

-  // Get all workspaces where the user has permissions
-  const userWorkspaces = await db
-    .select({
-      workspace: workspace,
-      permissionType: permissions.permissionType,
-    })
-    .from(permissions)
-    .innerJoin(workspace, eq(permissions.entityId, workspace.id))
-    .where(and(eq(permissions.userId, session.user.id), eq(permissions.entityType, 'workspace')))
-    .orderBy(desc(workspace.createdAt))
+    logger.info('Fetching workspaces for user', { userId: session.user.id })
+
+    // Get all workspaces where the user has permissions
+    const userWorkspaces = await db
+      .select({
+        workspace: workspace,
+        permissionType: permissions.permissionType,
+      })
+      .from(permissions)
+      .innerJoin(workspace, eq(permissions.entityId, workspace.id))
+      .where(and(eq(permissions.userId, session.user.id), eq(permissions.entityType, 'workspace')))
+      .orderBy(desc(workspace.createdAt))

-  if (userWorkspaces.length === 0) {
-    // Create a default workspace for the user
-    const defaultWorkspace = await createDefaultWorkspace(session.user.id, session.user.name)
+    if (userWorkspaces.length === 0) {
+      logger.info('No workspaces found, creating default workspace', { userId: session.user.id })
+      // Create a default workspace for the user
+      const defaultWorkspace = await createDefaultWorkspace(session.user.id, session.user.name)

-    // Migrate existing workflows to the default workspace
-    await migrateExistingWorkflows(session.user.id, defaultWorkspace.id)
+      // Migrate existing workflows to the default workspace
+      await migrateExistingWorkflows(session.user.id, defaultWorkspace.id)

-    return NextResponse.json({ workspaces: [defaultWorkspace] })
-  }
+      return NextResponse.json({ workspaces: [defaultWorkspace] })
+    }
+
+    // If user has workspaces but might have orphaned workflows, migrate them
+    await ensureWorkflowsHaveWorkspace(session.user.id, userWorkspaces[0].workspace.id)

-  // If user has workspaces but might have orphaned workflows, migrate them
-  await ensureWorkflowsHaveWorkspace(session.user.id, userWorkspaces[0].workspace.id)
+    // Format the response with permission information
+    const workspacesWithPermissions = userWorkspaces.map(
+      ({ workspace: workspaceDetails, permissionType }) => ({
+        ...workspaceDetails,
+        role: permissionType === 'admin' ? 'owner' : 'member', // Map admin to owner for compatibility
+        permissions: permissionType,
+      })
+    )

-  // Format the response with permission information
-  const workspacesWithPermissions = userWorkspaces.map(
-    ({ workspace: workspaceDetails, permissionType }) => ({
-      ...workspaceDetails,
-      role: permissionType === 'admin' ? 'owner' : 'member', // Map admin to owner for compatibility
-      permissions: permissionType,
+    logger.info('Successfully fetched workspaces', {
+      userId: session.user.id,
+      workspaceCount: workspacesWithPermissions.length
     })
-  )

-  return NextResponse.json({ workspaces: workspacesWithPermissions })
+    return NextResponse.json({ workspaces: workspacesWithPermissions })
+  } catch (error) {
+    logger.error('Failed to fetch workspaces:', error)
+    return NextResponse.json(
+      { error: 'Failed to fetch workspaces', details: error instanceof Error ? error.message : 'Unknown error' },
+      { status: 500 }
+    )
+  }
 }

 // POST /api/workspaces - Create a new workspace
12 changes: 12 additions & 0 deletions apps/sim/contexts/socket-context.tsx
@@ -165,12 +165,24 @@ export function SocketProvider({ children, user }: SocketProviderProps) {
         const token = await generateSocketToken()

         const socketUrl = getEnv('NEXT_PUBLIC_SOCKET_URL') || 'http://localhost:3002'
+
+        // Validate that we have a proper socket URL and it's not defaulting to the main domain
+        if (socketUrl.includes('buildz.ai') && !socketUrl.includes('ws.buildz.ai')) {
+          logger.error('Invalid socket URL detected - should use ws.buildz.ai subdomain', {
+            socketUrl,
+            envVar: getEnv('NEXT_PUBLIC_SOCKET_URL'),
+            processEnv: process.env.NEXT_PUBLIC_SOCKET_URL,
+          })
+          throw new Error('Socket server URL misconfiguration detected')
+        }

         logger.info('Attempting to connect to Socket.IO server', {
           url: socketUrl,
           userId: user?.id || 'no-user',
           hasToken: !!token,
           timestamp: new Date().toISOString(),
+          envVar: getEnv('NEXT_PUBLIC_SOCKET_URL'),
+          processEnv: process.env.NEXT_PUBLIC_SOCKET_URL,
         })

         const socketInstance = io(socketUrl, {
3 changes: 3 additions & 0 deletions apps/sim/socket-server/config/socket.ts
@@ -15,6 +15,9 @@ function getAllowedOrigins(): string[] {
     env.NEXT_PUBLIC_VERCEL_URL,
     'http://localhost:3000',
     'http://localhost:3001',
+    // Explicitly add buildz.ai domains
+    'https://buildz.ai',
+    'https://www.buildz.ai',
     ...(env.ALLOWED_ORIGINS?.split(',') || []),
   ].filter((url): url is string => Boolean(url))

59 changes: 59 additions & 0 deletions deploy-buildz-fix.sh
@@ -0,0 +1,59 @@
#!/bin/bash

# Deployment script for buildz.ai WebSocket and API fixes
# This script applies the necessary configurations to fix the 500 error and WebSocket connection issues

set -e

echo "🚀 Deploying fixes for buildz.ai..."

# Check if kubectl is available
if ! command -v kubectl &> /dev/null; then
  echo "❌ kubectl not found. Please install kubectl and configure it for your cluster."
  exit 1
fi

# Check if we're in the right directory
if [[ ! -f "helm/sim/examples/values-gcp-buildz.yaml" ]]; then
  echo "❌ Please run this script from the project root directory"
  exit 1
fi

echo "📋 Applying configurations..."

# Apply the backend config for WebSocket support
echo "1. Creating BackendConfig for WebSocket support..."
kubectl apply -f helm/sim/examples/backend-config-buildz.yaml

# Apply the updated ingress configuration
echo "2. Updating Ingress configuration..."
kubectl apply -f helm/sim/examples/ingress-buildz.yaml

# Update the Helm deployment with the latest configurations
echo "3. Upgrading Helm deployment..."
helm upgrade sim-gcp ./helm/sim \
  --namespace simstudio \
  --values helm/sim/examples/values-gcp-buildz.yaml \
  --wait \
  --timeout=10m

echo "4. Waiting for pods to be ready..."
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=sim-gcp --namespace=simstudio --timeout=300s

echo "5. Checking deployment status..."
kubectl get pods -n simstudio -l app.kubernetes.io/name=sim-gcp

echo "6. Checking ingress status..."
kubectl get ingress -n simstudio sim-ingress

echo "✅ Deployment completed successfully!"
echo ""
echo "🔍 To verify the fixes:"
echo "1. Check API endpoint: curl -H 'Authorization: Bearer <token>' https://buildz.ai/api/workspaces"
echo "2. Check WebSocket health: curl https://ws.buildz.ai/health"
echo "3. Monitor logs: kubectl logs -n simstudio -l app=sim-gcp-realtime -f"
echo ""
echo "📝 If issues persist, check:"
echo "- DNS resolution for ws.buildz.ai"
echo "- SSL certificate for ws.buildz.ai subdomain"
echo "- Environment variable NEXT_PUBLIC_SOCKET_URL in the app pods"