Frontend Design Specification

Cloud Console Automation System - Electron + Next.js Frontend

Version: 1.0
Last Updated: November 2025
Tech Stack: Electron 28+, Next.js 14+, TypeScript, Tailwind CSS, Zustand (state), Socket.io-client


1. Overview

1.1 Purpose

The Electron frontend is a desktop application that embeds the browser controlled by browser-use directly in the application window. Users can:

  • View and interact with the GCP console browser
  • Send natural language commands via chat interface
  • Monitor agent thinking and actions in real-time
  • Control agent execution (play/pause)
  • Review task history with screenshots

1.2 High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           ELECTRON APPLICATION                               │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                         MAIN PROCESS                                    │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │ │
│  │  │   Backend   │  │   Window    │  │  BrowserView│  │     IPC       │  │ │
│  │  │  Launcher   │  │   Manager   │  │   Manager   │  │   Handler     │  │ │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └───────────────┘  │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                     │                                        │
│                              IPC Communication                               │
│                                     │                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                       RENDERER PROCESS (Next.js)                        │ │
│  │                                                                          │ │
│  │  ┌──────────────────────────────────┬──────────────────────────────┐   │ │
│  │  │                                  │                              │   │ │
│  │  │      BROWSER VIEW AREA           │      CHAT PANEL              │   │ │
│  │  │      (Embedded Browser)          │                              │   │ │
│  │  │                                  │   ┌──────────────────────┐   │   │ │
│  │  │   ┌─────┬─────┬─────┐           │   │  Agent Messages      │   │   │ │
│  │  │   │ Tab │ Tab │ Tab │           │   │  - Thinking          │   │   │ │
│  │  │   └─────┴─────┴─────┘           │   │  - Actions           │   │   │ │
│  │  │   ┌─────────────────────────┐   │   │  - Responses         │   │   │ │
│  │  │   │                         │   │   └──────────────────────┘   │   │ │
│  │  │   │   GCP Console View      │   │                              │   │ │
│  │  │   │   (Click-enabled)       │   │   ┌──────────────────────┐   │   │ │
│  │  │   │                         │   │   │  Command Input       │   │   │ │
│  │  │   │                         │   │   │  [____________] Send │   │   │ │
│  │  │   └─────────────────────────┘   │   └──────────────────────┘   │   │ │
│  │  │                                  │                              │   │ │
│  │  └──────────────────────────────────┴──────────────────────────────┘   │ │
│  │                                                                          │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐   │ │
│  │  │                    CONTROL BAR                                    │   │ │
│  │  │   [⏸ Pause/▶ Play]  Status: Running Task  [📋 History]           │   │ │
│  │  └──────────────────────────────────────────────────────────────────┘   │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

1.3 Key Features

| Feature | Description |
|---|---|
| Embedded Browser | Real Chromium browser embedded via Electron BrowserView |
| Tab Display | Shows browser tabs (read-only, user cannot manage) |
| Click Passthrough | User can click in browser for manual actions |
| Chat Interface | Send commands, view agent responses |
| Real-time Updates | Live agent thinking, actions, progress |
| Play/Pause Control | Start/stop agent execution |
| Task History | View past tasks, screenshots, logs |

2. Directory Structure

frontend/
├── package.json                       # Dependencies and scripts
├── tsconfig.json                      # TypeScript configuration
├── next.config.js                     # Next.js configuration
├── tailwind.config.js                 # Tailwind CSS configuration
├── electron-builder.yml               # Electron builder config
├── .env.local                         # Environment variables
│
├── main/                              # Electron main process
│   ├── index.ts                       # Main entry point
│   ├── window.ts                      # Window management
│   ├── browser-view.ts                # BrowserView management for embedded browser
│   ├── backend-launcher.ts            # Launches Django backend subprocess
│   ├── ipc-handlers.ts                # IPC communication handlers
│   ├── preload.ts                     # Preload script for renderer security
│   └── utils/
│       ├── paths.ts                   # Path utilities
│       └── logger.ts                  # Main process logging
│
├── renderer/                          # Next.js renderer process
│   ├── app/                           # Next.js App Router
│   │   ├── layout.tsx                 # Root layout with providers
│   │   ├── page.tsx                   # Main application page
│   │   ├── globals.css                # Global styles
│   │   └── history/
│   │       └── page.tsx               # Task history page
│   │
│   ├── components/
│   │   ├── layout/
│   │   │   ├── AppShell.tsx           # Main app shell with split layout
│   │   │   ├── Header.tsx             # Application header (minimal)
│   │   │   └── ControlBar.tsx         # Bottom control bar with play/pause
│   │   │
│   │   ├── browser/
│   │   │   ├── BrowserContainer.tsx   # Container for embedded browser
│   │   │   ├── TabBar.tsx             # Display browser tabs (read-only)
│   │   │   ├── TabItem.tsx            # Individual tab display
│   │   │   ├── BrowserOverlay.tsx     # Overlay for click capture
│   │   │   └── LoadingState.tsx       # Browser loading indicator
│   │   │
│   │   ├── chat/
│   │   │   ├── ChatPanel.tsx          # Main chat container
│   │   │   ├── MessageList.tsx        # Scrollable message list
│   │   │   ├── MessageItem.tsx        # Individual message component
│   │   │   ├── ThinkingMessage.tsx    # Agent thinking indicator
│   │   │   ├── ActionMessage.tsx      # Agent action display
│   │   │   ├── ErrorMessage.tsx       # Error message display
│   │   │   ├── CommandInput.tsx       # User command input field
│   │   │   └── TypingIndicator.tsx    # Agent is processing indicator
│   │   │
│   │   ├── controls/
│   │   │   ├── PlayPauseButton.tsx    # Play/pause toggle button
│   │   │   ├── StatusIndicator.tsx    # Current agent status display
│   │   │   ├── ProgressBar.tsx        # Task progress indicator
│   │   │   └── HistoryButton.tsx      # Navigate to history view
│   │   │
│   │   ├── history/
│   │   │   ├── HistoryPanel.tsx       # Task history list container
│   │   │   ├── TaskCard.tsx           # Individual task summary card
│   │   │   ├── TaskDetail.tsx         # Expanded task details view
│   │   │   ├── ScreenshotFlow.tsx     # Horizontal scrollable screenshots
│   │   │   ├── ScreenshotItem.tsx     # Individual screenshot with caption
│   │   │   └── LogViewer.tsx          # Task log display
│   │   │
│   │   ├── modals/
│   │   │   ├── LoginPrompt.tsx        # Manual login required modal
│   │   │   ├── ErrorModal.tsx         # Error display modal
│   │   │   └── ConfirmDialog.tsx      # Generic confirmation dialog
│   │   │
│   │   └── ui/                        # Reusable UI primitives
│   │       ├── Button.tsx
│   │       ├── Input.tsx
│   │       ├── Card.tsx
│   │       ├── ScrollArea.tsx
│   │       ├── Tooltip.tsx
│   │       ├── Badge.tsx
│   │       ├── Spinner.tsx
│   │       └── Toast.tsx
│   │
│   ├── hooks/
│   │   ├── useWebSocket.ts            # WebSocket connection hook
│   │   ├── useAgentStatus.ts          # Agent status subscription
│   │   ├── useChat.ts                 # Chat messages and commands
│   │   ├── useTasks.ts                # Task management hook
│   │   ├── useElectron.ts             # Electron IPC communication
│   │   ├── useBrowserView.ts          # Browser view control
│   │   └── useHistory.ts              # Task history fetching
│   │
│   ├── stores/
│   │   ├── index.ts                   # Store exports
│   │   ├── agentStore.ts              # Agent state (status, paused, etc.)
│   │   ├── chatStore.ts               # Chat messages state
│   │   ├── taskStore.ts               # Current and queued tasks
│   │   ├── sessionStore.ts            # Session state (logged in, etc.)
│   │   └── uiStore.ts                 # UI state (modals, panels, etc.)
│   │
│   ├── services/
│   │   ├── api.ts                     # REST API client
│   │   ├── websocket.ts               # WebSocket service
│   │   └── electron-bridge.ts         # Electron IPC bridge
│   │
│   ├── types/
│   │   ├── agent.ts                   # Agent-related types
│   │   ├── chat.ts                    # Chat message types
│   │   ├── task.ts                    # Task types
│   │   ├── session.ts                 # Session types
│   │   ├── websocket.ts               # WebSocket message types
│   │   └── electron.ts                # Electron IPC types
│   │
│   ├── lib/
│   │   ├── utils.ts                   # Utility functions
│   │   ├── constants.ts               # Application constants
│   │   └── formatters.ts              # Date, text formatters
│   │
│   └── styles/
│       └── themes/
│           └── dark.css               # Dark theme (primary)
│
├── public/
│   ├── icons/                         # App icons
│   └── fonts/                         # Custom fonts
│
└── resources/                         # Electron build resources
    ├── icon.icns                      # macOS icon
    ├── icon.ico                       # Windows icon
    └── icon.png                       # Linux icon

3. Electron Main Process

3.1 Main Entry Point (main/index.ts)

Responsibilities:

  • Initialize Electron application
  • Launch Django backend as subprocess
  • Create main application window
  • Set up IPC handlers
  • Manage application lifecycle

Startup Flow:

1. App ready event
2. Launch Django backend subprocess
3. Wait for backend health check
4. Create main BrowserWindow
5. Create BrowserView for embedded browser
6. Load Next.js renderer
7. Connect to backend WebSocket
8. Request browser CDP info from backend
9. Attach BrowserView to CDP endpoint
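
Step 3 of the startup flow (waiting for the backend health check) could be sketched as a bounded polling loop. This is a minimal sketch, not the project's implementation: `waitForHealthy`, its options, and the injected `probe` and `sleep` functions are hypothetical names, and the spec does not define the health endpoint or retry limits.

```typescript
// Hypothetical helper; names and defaults are illustrative, not from the spec.
type HealthProbe = () => Promise<boolean>;

interface WaitOptions {
  attempts: number;   // maximum number of probes before giving up
  intervalMs: number; // delay between probes
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real timers
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Resolves to the 1-based attempt on which the probe first succeeded.
// Rejects if the backend never reports healthy within the attempt budget.
async function waitForHealthy(probe: HealthProbe, opts: WaitOptions): Promise<number> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    // A probe that throws (for example, connection refused) counts as "not ready yet".
    const ok = await probe().catch(() => false);
    if (ok) return attempt;
    if (attempt < opts.attempts) await sleep(opts.intervalMs);
  }
  throw new Error(`Backend not healthy after ${opts.attempts} attempts`);
}
```

In the main process, the probe would typically be an HTTP request to whatever health route the backend exposes, and window creation (step 4) would run only after the returned promise resolves.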

3.2 Window Manager (main/window.ts)

Responsibilities:

  • Create and configure main application window
  • Handle window events (close, minimize, maximize)
  • Manage window state persistence
  • Configure window frame and title bar

Window Configuration:

| Property | Value | Description |
|---|---|---|
| Width | 1400px | Default window width |
| Height | 900px | Default window height |
| Min Width | 1200px | Minimum width |
| Min Height | 700px | Minimum height |
| Frame | true | Use native window frame |
| Title Bar Style | hidden (macOS) | Custom title bar on macOS |
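
The configuration above might translate into an options object like the following. This is a sketch: `MainWindowOptions` is a locally declared type that mirrors a subset of Electron's `BrowserWindow` constructor options so the example stays self-contained, and `mainWindowOptions` is a hypothetical helper name.

```typescript
// Local stand-in for the subset of BrowserWindow options used by this spec.
interface MainWindowOptions {
  width: number;
  height: number;
  minWidth: number;
  minHeight: number;
  frame: boolean;
  titleBarStyle?: 'default' | 'hidden';
}

// Builds the window options from the table; the title bar is hidden on macOS only.
function mainWindowOptions(platform: string): MainWindowOptions {
  const base: MainWindowOptions = {
    width: 1400,
    height: 900,
    minWidth: 1200,
    minHeight: 700,
    frame: true,
  };
  return platform === 'darwin' ? { ...base, titleBarStyle: 'hidden' } : base;
}
```

In main/window.ts the result would be passed to the window constructor, for example `new BrowserWindow(mainWindowOptions(process.platform))`.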

3.3 BrowserView Manager (main/browser-view.ts)

This is the most critical component for embedding the browser.

Responsibilities:

  • Create Electron BrowserView for embedded browser
  • Connect BrowserView to browser-use browser's CDP endpoint
  • Handle BrowserView bounds (resize with window)
  • Manage BrowserView lifecycle
  • Forward user clicks from overlay to actual browser

How Browser Embedding Works:

┌─────────────────────────────────────────────────────────────┐
│                    ELECTRON WINDOW                           │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               RENDERER (Next.js)                       │  │
│  │  ┌─────────────────────────┐  ┌─────────────────────┐ │  │
│  │  │   BrowserContainer      │  │    ChatPanel        │ │  │
│  │  │   (Placeholder div)     │  │                     │ │  │
│  │  │                         │  │                     │ │  │
│  │  │   This area is where    │  │                     │ │  │
│  │  │   BrowserView renders   │  │                     │ │  │
│  │  │                         │  │                     │ │  │
│  │  └─────────────────────────┘  └─────────────────────┘ │  │
│  └───────────────────────────────────────────────────────┘  │
│                            │                                 │
│  ┌─────────────────────────▼─────────────────────────────┐  │
│  │              ELECTRON BROWSERVIEW                      │  │
│  │          (Drawn on top of renderer content)           │  │
│  │                                                        │  │
│  │    Connected to browser-use browser via CDP            │  │
│  │    ws://localhost:9222/devtools/browser/xxxxx         │  │
│  │                                                        │  │
│  └────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

BrowserView Attachment Method:

  1. Backend starts browser-use browser with --remote-debugging-port=9222
  2. Backend returns CDP WebSocket endpoint to frontend
  3. Electron main process creates BrowserView
  4. BrowserView navigates to http://localhost:9222 (DevTools frontend) or directly attaches to the browser page
  5. BrowserView bounds are calculated based on BrowserContainer element position
  6. BrowserView is attached to main window at correct position

Alternative Approach (Simpler):

Instead of complex CDP attachment, we can:

  1. Let browser-use open browser normally (headed mode)
  2. Get the browser window handle
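  3. Reparent that window into the Electron window with platform-specific native code (Electron has no built-in API for this; BrowserView.setAutoResize() only controls how a BrowserView resizes with its window)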

Key Methods:

| Method | Description |
|---|---|
| createBrowserView() | Creates new BrowserView instance |
| attachToCDP(endpoint) | Connects BrowserView to CDP endpoint |
| updateBounds(rect) | Updates BrowserView position/size |
| forwardClick(x, y) | Sends click event to embedded browser |
| destroy() | Cleans up BrowserView |
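
Because the renderer measures the placeholder in fractional CSS pixels while Electron's bounds rectangles take integer values, `updateBounds(rect)` likely needs a conversion step. A minimal sketch, assuming the renderer sends a DOMRect-like object; `toViewBounds` and `ViewBounds` are hypothetical names:

```typescript
interface ViewBounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Converts a DOMRect-like measurement into an integer rectangle.
// Edges are rounded first and sizes derived from them, so adjacent panels
// that share an edge do not end up with a one-pixel gap or overlap.
function toViewBounds(rect: { left: number; top: number; width: number; height: number }): ViewBounds {
  const x = Math.round(rect.left);
  const y = Math.round(rect.top);
  const right = Math.round(rect.left + rect.width);
  const bottom = Math.round(rect.top + rect.height);
  return { x, y, width: Math.max(0, right - x), height: Math.max(0, bottom - y) };
}
```

The result can then be handed to the view's `setBounds` call in the main process.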

3.4 Backend Launcher (main/backend-launcher.ts)

Responsibilities:

  • Spawn Django backend as child process
  • Monitor backend health
  • Handle backend crashes/restarts
  • Graceful shutdown on app close

Process Management:

| Event | Action |
|---|---|
| App Start | Spawn uvicorn process |
| Backend Exit | Attempt restart (max 3 times) |
| App Close | Send SIGTERM, wait, then SIGKILL |
| Backend Ready | Emit event to renderer |
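
The "max 3 times" restart rule can be isolated from the process-spawning code as a small counter. This is a sketch under assumptions: `RestartPolicy` is a hypothetical class, and resetting the counter once the backend is healthy again is a design choice the spec does not state.

```typescript
// Tracks how many automatic restarts have been attempted.
class RestartPolicy {
  private restarts = 0;
  private readonly maxRestarts: number;

  constructor(maxRestarts: number = 3) {
    this.maxRestarts = maxRestarts;
  }

  // Call when the backend exits unexpectedly.
  // Returns true if another restart should be attempted.
  shouldRestart(): boolean {
    if (this.restarts >= this.maxRestarts) return false;
    this.restarts += 1;
    return true;
  }

  // Assumption: a backend that becomes healthy again earns a fresh budget.
  reset(): void {
    this.restarts = 0;
  }
}
```

The backend launcher would consult `shouldRestart()` in its child-process exit handler and fall back to the restart dialog from section 7.3 once it returns false.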

3.5 IPC Handlers (main/ipc-handlers.ts)

Responsibilities:

  • Handle IPC messages from renderer
  • Coordinate between renderer and main process
  • Forward browser control commands

IPC Channels:

| Channel | Direction | Purpose |
|---|---|---|
| browser:get-bounds | Renderer → Main | Get BrowserView position |
| browser:update-bounds | Renderer → Main | Update BrowserView position |
| browser:forward-click | Renderer → Main | Forward click to browser |
| backend:status | Main → Renderer | Backend health status |
| app:ready | Main → Renderer | App fully initialized |
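
One way to keep channel names consistent between the preload script and the handlers is a single constant that both import. The sketch below uses only the channels listed in the table; `IPC_CHANNELS` and `isRendererToMainChannel` are hypothetical names.

```typescript
// Single source of truth for channel names, grouped by direction.
const IPC_CHANNELS = {
  rendererToMain: ['browser:get-bounds', 'browser:update-bounds', 'browser:forward-click'],
  mainToRenderer: ['backend:status', 'app:ready'],
} as const;

type RendererToMainChannel = (typeof IPC_CHANNELS.rendererToMain)[number];

// Type guard: true only for channels the renderer is allowed to invoke.
function isRendererToMainChannel(channel: string): channel is RendererToMainChannel {
  return (IPC_CHANNELS.rendererToMain as readonly string[]).includes(channel);
}
```

A handler registration loop can iterate over `IPC_CHANNELS.rendererToMain`, so a channel that is not in the list never gets a handler.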

3.6 Preload Script (main/preload.ts)

Responsibilities:

  • Expose safe APIs to renderer via contextBridge
  • Prevent direct Node.js access in renderer
  • Define IPC communication interface

Exposed APIs:

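import { contextBridge, ipcRenderer } from 'electron';

type Rect = { x: number; y: number; width: number; height: number };

// With context isolation enabled, assigning to window directly does not reach
// the page; contextBridge exposes a narrow API and keeps ipcRenderer private.
contextBridge.exposeInMainWorld('electronAPI', {
  // Browser control
  browser: {
    updateBounds: (rect: Rect) => ipcRenderer.invoke('browser:update-bounds', rect),
    forwardClick: (x: number, y: number) => ipcRenderer.invoke('browser:forward-click', x, y),
  },
  // Backend status (the IPC event object is dropped before calling back)
  backend: {
    onStatusChange: (callback: (status: unknown) => void) =>
      ipcRenderer.on('backend:status', (_event, status) => callback(status)),
  },
  // App lifecycle
  app: {
    onReady: (callback: () => void) => ipcRenderer.on('app:ready', () => callback()),
    quit: () => ipcRenderer.send('app:quit'),
  },
});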

4. Renderer Process (Next.js)

4.1 Page Structure

Main Page (app/page.tsx)

The primary application interface with split layout.

Layout Structure:

┌─────────────────────────────────────────────────────────────┐
│  AppShell                                                    │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Header (minimal - app title, status)                  │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────┬─────────────────────────┐  │
│  │                             │                         │  │
│  │     BrowserContainer        │      ChatPanel          │  │
│  │     (70% width)             │      (30% width)        │  │
│  │                             │                         │  │
│  │  ┌─────────────────────┐   │   ┌─────────────────┐   │  │
│  │  │  TabBar             │   │   │  MessageList    │   │  │
│  │  │  [Tab][Tab][Tab]    │   │   │  (scrollable)   │   │  │
│  │  └─────────────────────┘   │   │                 │   │  │
│  │  ┌─────────────────────┐   │   │                 │   │  │
│  │  │                     │   │   │                 │   │  │
│  │  │  Browser View Area  │   │   │                 │   │  │
│  │  │  (BrowserView here) │   │   │                 │   │  │
│  │  │                     │   │   └─────────────────┘   │  │
│  │  │                     │   │   ┌─────────────────┐   │  │
│  │  │                     │   │   │ CommandInput    │   │  │
│  │  └─────────────────────┘   │   └─────────────────┘   │  │
│  │                             │                         │  │
│  └─────────────────────────────┴─────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  ControlBar                                            │  │
│  │  [▶/⏸ Play/Pause] [Status: Running] [📜 History]     │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

History Page (app/history/page.tsx)

Task history view with screenshots and logs.

Layout Structure:

┌─────────────────────────────────────────────────────────────┐
│  HistoryPanel                                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Header: Task History                    [← Back]      │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  TaskCard (completed)                                  │  │
│  │  "Create a new Cloud Storage bucket named test-bucket" │  │
│  │  Status: ✅ Completed | Duration: 45s | 12 steps      │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │ ScreenshotFlow (horizontal scroll)              │  │  │
│  │  │ [📷][📷][📷][📷][📷][📷] →                      │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  │  [View Logs]                                          │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  TaskCard (failed)                                     │  │
│  │  "Delete all unused VM instances"                      │  │
│  │  Status: ❌ Failed | Error: Permission denied          │  │
│  │  ...                                                   │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

4.2 Component Details

BrowserContainer (components/browser/BrowserContainer.tsx)

Purpose: Provides the container where the Electron BrowserView will be positioned.

Responsibilities:

  • Renders a placeholder div with specific dimensions
  • Reports its position/size to main process via IPC
  • Updates main process when container resizes
  • Handles click events on overlay and forwards to browser

Behavior:

  • On mount: Calculate bounds, send to main process
  • On resize: Recalculate bounds, update main process
  • On click: Capture click coordinates, forward via IPC
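
The "On click" step needs window coordinates translated into coordinates relative to the browser area before they are forwarded. A minimal sketch, assuming the container's rectangle comes from `getBoundingClientRect()`; `toBrowserCoordinates` is a hypothetical helper name.

```typescript
interface ContainerRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Returns the click position relative to the container's top-left corner,
// or null when the click landed outside the container.
function toBrowserCoordinates(
  clientX: number,
  clientY: number,
  rect: ContainerRect,
): { x: number; y: number } | null {
  const x = clientX - rect.left;
  const y = clientY - rect.top;
  if (x < 0 || y < 0 || x >= rect.width || y >= rect.height) return null;
  // Floor rather than round so the result never equals the width or height.
  return { x: Math.floor(x), y: Math.floor(y) };
}
```

A non-null result would be passed to `window.electronAPI.browser.forwardClick(x, y)`; a null result means the click belongs to some other part of the UI.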

TabBar (components/browser/TabBar.tsx)

Purpose: Displays browser tabs (read-only).

Responsibilities:

  • Fetch tab information from backend/agent
  • Display tab titles and favicons
  • Show active tab indicator
  • Tabs are NOT clickable (the agent controls tabs)

Data Source:

  • WebSocket events when agent changes tabs
  • Backend API for initial tab state

ChatPanel (components/chat/ChatPanel.tsx)

Purpose: Main chat interface container.

Responsibilities:

  • Contains MessageList and CommandInput
  • Manages chat scroll behavior
  • Auto-scrolls to new messages

MessageList (components/chat/MessageList.tsx)

Purpose: Scrollable list of chat messages.

Responsibilities:

  • Render different message types
  • Virtual scrolling for performance (for 100+ messages, see 8.1)
  • Smooth scroll to bottom on new messages

MessageItem (components/chat/MessageItem.tsx)

Purpose: Renders individual chat messages.

Message Types and Styling:

| Type | Visual Style | Icon |
|---|---|---|
| user_command | Blue background, right-aligned | 👤 User |
| agent_thinking | Gray italic, animated dots | 🤔 |
| agent_action | Yellow background | ⚡ Action |
| agent_response | Green background | 🤖 Agent |
| system | Neutral gray | ℹ️ System |
| error | Red background | ❌ Error |
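
The message types above map naturally onto a string union with a lookup table, so MessageItem cannot forget a type. This is a sketch: the `ChatMessage` fields other than `type` are assumptions, and the Tailwind class names are placeholders rather than the final palette from section 5.4.

```typescript
type ChatMessageType =
  | 'user_command'
  | 'agent_thinking'
  | 'agent_action'
  | 'agent_response'
  | 'system'
  | 'error';

// Assumed message shape; only `type` is defined by the spec.
interface ChatMessage {
  id: string;
  type: ChatMessageType;
  text: string;
  timestamp: number;
}

// Record<ChatMessageType, ...> makes the compiler reject a missing entry.
const MESSAGE_STYLES: Record<ChatMessageType, { label: string; className: string }> = {
  user_command: { label: 'User', className: 'bg-blue-900 ml-auto' },
  agent_thinking: { label: '', className: 'italic text-gray-400' },
  agent_action: { label: 'Action', className: 'bg-yellow-900' },
  agent_response: { label: 'Agent', className: 'bg-green-900' },
  system: { label: 'System', className: 'bg-neutral-800' },
  error: { label: 'Error', className: 'bg-red-900' },
};

// Picks the style entry for a message.
function styleFor(message: ChatMessage): { label: string; className: string } {
  return MESSAGE_STYLES[message.type];
}
```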

ThinkingMessage (components/chat/ThinkingMessage.tsx)

Purpose: Animated display of agent thinking process.

Features:

  • Animated typing indicator
  • Shows partial thoughts as they stream
  • Collapses when action is taken

ActionMessage (components/chat/ActionMessage.tsx)

Purpose: Displays agent actions.

Features:

  • Action type icon (click, type, navigate, etc.)
  • Target element description
  • Success/failure indicator
  • Expandable details

CommandInput (components/chat/CommandInput.tsx)

Purpose: User input field for natural language commands.

Features:

  • Text input with placeholder
  • Send button
  • Disabled when agent is running (unless paused)
  • Enter key to submit
  • Input validation

PlayPauseButton (components/controls/PlayPauseButton.tsx)

Purpose: Toggle button for agent execution.

States:

  • Play (▶): Agent is paused, click to resume
  • Pause (⏸): Agent is running, click to pause
  • Disabled: No task running
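
The three button states can be derived from the agent status instead of being stored separately. A minimal sketch, using the status values from StatusIndicator; `playPauseState` and the `ariaLabel` strings are hypothetical.

```typescript
type ButtonAgentStatus = 'idle' | 'running' | 'paused' | 'waiting_login' | 'error';

interface PlayPauseState {
  icon: 'play' | 'pause';
  disabled: boolean;
  ariaLabel: string;
}

// Running shows the pause icon, paused shows the play icon,
// and every other status disables the button because no task is active.
function playPauseState(agentStatus: ButtonAgentStatus): PlayPauseState {
  switch (agentStatus) {
    case 'running':
      return { icon: 'pause', disabled: false, ariaLabel: 'Pause agent' };
    case 'paused':
      return { icon: 'play', disabled: false, ariaLabel: 'Resume agent' };
    default:
      return { icon: 'play', disabled: true, ariaLabel: 'No task running' };
  }
}
```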

StatusIndicator (components/controls/StatusIndicator.tsx)

Purpose: Shows current agent/system status.

Status Values:

| Status | Display | Color |
|---|---|---|
| idle | "Ready" | Gray |
| running | "Running: {task}" | Green |
| paused | "Paused" | Yellow |
| waiting_login | "Waiting for login" | Blue |
| error | "Error occurred" | Red |

LoginPrompt (components/modals/LoginPrompt.tsx)

Purpose: Modal prompting user to complete GCP login.

Features:

  • Explains manual login requirement
  • "I've logged in" confirmation button
  • Appears when session starts
  • Non-dismissible until login confirmed

4.3 State Management (Zustand Stores)

agentStore (stores/agentStore.ts)

State:

| Property | Type | Description |
|---|---|---|
| status | enum | idle, running, paused, error |
| isPaused | boolean | Whether agent is paused |
| currentAction | string | Current action description |
| thinking | string | Current thinking content |

Actions:

| Action | Description |
|---|---|
| setStatus(status) | Update agent status |
| pause() | Request pause |
| resume() | Request resume |
| setThinking(text) | Update thinking text |
| clearThinking() | Clear thinking display |
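
The agentStore state and actions can be sketched as pure updater functions, which is the shape a Zustand `set((state) => ...)` callback expects. The sketch leaves out the Zustand wiring to stay dependency-free. The function names are hypothetical, and treating pause and resume as immediate local transitions (rather than waiting for the backend to confirm) is an assumption.

```typescript
type AgentRunStatus = 'idle' | 'running' | 'paused' | 'error';

interface AgentState {
  status: AgentRunStatus;
  isPaused: boolean;
  currentAction: string;
  thinking: string;
}

const initialAgentState: AgentState = {
  status: 'idle',
  isPaused: false,
  currentAction: '',
  thinking: '',
};

// Keeps isPaused in sync with status so the two cannot disagree.
function withStatus(state: AgentState, next: AgentRunStatus): AgentState {
  return { ...state, status: next, isPaused: next === 'paused' };
}

// Pause is only meaningful while running; otherwise the state is returned unchanged.
function applyPause(state: AgentState): AgentState {
  return state.status === 'running' ? withStatus(state, 'paused') : state;
}

// Resume is only meaningful while paused.
function applyResume(state: AgentState): AgentState {
  return state.status === 'paused' ? withStatus(state, 'running') : state;
}

function withThinking(state: AgentState, text: string): AgentState {
  return { ...state, thinking: text };
}

function clearThinking(state: AgentState): AgentState {
  return { ...state, thinking: '' };
}
```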

chatStore (stores/chatStore.ts)

State:

| Property | Type | Description |
|---|---|---|
| messages | Message[] | All chat messages |
| isLoading | boolean | Command being processed |

Actions:

| Action | Description |
|---|---|
| addMessage(msg) | Add new message |
| updateMessage(id, data) | Update existing message |
| loadHistory() | Load messages from backend |
| clearMessages() | Clear all messages |

taskStore (stores/taskStore.ts)

State:

| Property | Type | Description |
|---|---|---|
| currentTask | Task | Currently executing task |
| history | Task[] | Completed tasks |

Actions:

| Action | Description |
|---|---|
| submitTask(command) | Submit new task |
| setCurrentTask(task) | Set current task |
| completeTask(result) | Mark task complete |
| loadHistory() | Fetch task history |

sessionStore (stores/sessionStore.ts)

State:

| Property | Type | Description |
|---|---|---|
| isConnected | boolean | Backend connected |
| isLoggedIn | boolean | GCP login complete |
| sessionId | string | Current session ID |

Actions:

| Action | Description |
|---|---|
| connect() | Establish session |
| confirmLogin() | Confirm GCP login |
| disconnect() | End session |

4.4 Services

API Service (services/api.ts)

Purpose: REST API client for backend communication.

Endpoints Wrapped:

| Method | Description |
|---|---|
| sessions.start() | Start new session |
| sessions.current() | Get current session |
| sessions.confirmLogin() | Confirm login complete |
| tasks.submit(command) | Submit new task |
| tasks.list() | Get task history |
| tasks.get(id) | Get task details |
| tasks.pause(id) | Pause task |
| tasks.resume(id) | Resume task |
| chat.getMessages() | Get chat history |
| screenshots.getForTask(taskId) | Get task screenshots |

WebSocket Service (services/websocket.ts)

Purpose: Manages WebSocket connection for real-time updates.

Features:

  • Auto-reconnect on disconnect
  • Event subscription system
  • Message parsing and routing
  • Heartbeat handling

Event Handlers:

| Event | Handler |
|---|---|
| agent.thinking | Update chatStore and agentStore |
| agent.action | Add action message to chat |
| agent.status_change | Update agentStore status |
| task.completed | Update taskStore, add message |
| screenshot.captured | Notify for history refresh |
| session.login_required | Show login modal |
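
The "event subscription system" and "message parsing and routing" features could be built around a small router like the one below. This is a sketch under a stated assumption: the wire format is taken to be JSON of the form `{ "event": string, "data": ... }`, which the spec does not define. `EventRouter` is a hypothetical class name.

```typescript
type WsHandler = (data: unknown) => void;

class EventRouter {
  private handlers = new Map<string, Set<WsHandler>>();

  // Registers a handler and returns a function that removes it again.
  subscribe(eventName: string, handler: WsHandler): () => void {
    const set = this.handlers.get(eventName) ?? new Set<WsHandler>();
    set.add(handler);
    this.handlers.set(eventName, set);
    return () => {
      set.delete(handler);
    };
  }

  // Parses a raw frame and dispatches it.
  // Returns false for malformed frames or events with no subscribers.
  route(raw: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch (_err) {
      return false;
    }
    if (typeof parsed !== 'object' || parsed === null) return false;
    const frame = parsed as { event?: unknown; data?: unknown };
    if (typeof frame.event !== 'string') return false;
    const set = this.handlers.get(frame.event);
    if (!set || set.size === 0) return false;
    set.forEach((handler) => handler(frame.data));
    return true;
  }
}
```

Each row of the table then becomes one `subscribe` call, for example a handler on `agent.status_change` that updates agentStore.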

Electron Bridge (services/electron-bridge.ts)

Purpose: Wrapper around window.electronAPI for type safety.

Methods:

| Method | Description |
|---|---|
| updateBrowserBounds(rect) | Update BrowserView position |
| forwardClick(x, y) | Forward click to browser |
| onBackendReady(callback) | Listen for backend ready |
| isElectron() | Check if running in Electron |
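
`isElectron()` matters because the Next.js renderer can also run in a plain browser or during server-side rendering, where the preload API is absent. A minimal sketch, assuming the preload script exposes its API under the name `electronAPI`; the `scope` parameter is a hypothetical addition that makes the check testable outside Electron.

```typescript
// True when the preload-exposed API object is present on the given global scope.
function isElectron(scope: object = globalThis): boolean {
  const api = (scope as { electronAPI?: unknown }).electronAPI;
  return typeof api === 'object' && api !== null;
}
```

The other bridge methods can start with this guard and turn into no-ops when it returns false, which keeps `npm run dev` usable in a regular browser tab.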

4.5 Hooks

useWebSocket (hooks/useWebSocket.ts)

Purpose: Hook for WebSocket connection and events.

Returns:

  • isConnected: Connection status
  • subscribe(event, handler): Subscribe to events
  • send(event, data): Send message
  • reconnect(): Force reconnect

useAgentStatus (hooks/useAgentStatus.ts)

Purpose: Hook for agent status and control.

Returns:

  • status: Current status
  • isPaused: Pause state
  • thinking: Current thinking
  • pause(): Pause agent
  • resume(): Resume agent

useChat (hooks/useChat.ts)

Purpose: Hook for chat functionality.

Returns:

  • messages: All messages
  • sendCommand(text): Send new command
  • isLoading: Command processing

useBrowserView (hooks/useBrowserView.ts)

Purpose: Hook for browser view management.

Returns:

  • containerRef: Ref for container div
  • handleClick(event): Click handler for overlay

5. Visual Design

5.1 Color Scheme (Dark Theme)

| Element | Color | Hex |
|---|---|---|
| Background | Dark Gray | #0f0f0f |
| Surface | Slightly lighter | #1a1a1a |
| Border | Subtle gray | #2a2a2a |
| Primary | Blue accent | #3b82f6 |
| Success | Green | #22c55e |
| Warning | Yellow | #eab308 |
| Error | Red | #ef4444 |
| Text Primary | White | #ffffff |
| Text Secondary | Gray | #9ca3af |

5.2 Typography

| Element | Font | Size | Weight |
|---|---|---|---|
| App Title | Inter | 18px | 600 |
| Tab Labels | Inter | 13px | 400 |
| Chat Messages | Inter | 14px | 400 |
| Command Input | Inter | 14px | 400 |
| Status Text | Inter | 12px | 500 |
| Timestamps | Inter | 11px | 400 |

5.3 Layout Specifications

| Element | Dimension |
|---|---|
| Browser Area Width | 70% of window |
| Chat Panel Width | 30% of window |
| Minimum Chat Width | 300px |
| Tab Bar Height | 40px |
| Control Bar Height | 48px |
| Message Padding | 12px |
| Border Radius | 8px (cards), 4px (buttons) |

5.4 Component Visual States

PlayPauseButton States

| State | Background | Icon | Cursor |
|---|---|---|---|
| Playing | Green | | pointer |
| Paused | Yellow | | pointer |
| Disabled | Gray | | not-allowed |
| Hover | Lighter shade | - | - |

Message Types Visual

| Type | Background | Border | Text Color |
|---|---|---|---|
| User Command | #1e3a5f | Blue left | White |
| Agent Thinking | #1f1f1f | None | Gray italic |
| Agent Action | #2d2a1f | Yellow left | White |
| Agent Response | #1f2d1f | Green left | White |
| Error | #2d1f1f | Red left | White |

6. User Interaction Flows

6.1 Application Startup Flow

1. User launches app
   ↓
2. Splash screen shows "Starting..."
   ↓
3. Backend subprocess starts
   ↓
4. Backend health check passes
   ↓
5. Main window appears
   ↓
6. Browser view initializes
   ↓
7. Navigates to GCP console
   ↓
8. Login modal appears: "Please log into Google Cloud"
   ↓
9. User logs in manually in browser view
   ↓
10. User clicks "I've logged in" button
    ↓
11. System verifies login (checks URL/cookies)
    ↓
12. Chat panel shows "Ready! Enter a command."
    ↓
13. User can now enter commands

6.2 Task Execution Flow

1. User types command: "Create a new VM named test-vm"
   ↓
2. User clicks Send (or presses Enter)
   ↓
3. CommandInput disabled, shows loading
   ↓
4. User command appears in chat
   ↓
5. Agent starts, status changes to "Running"
   ↓
6. WebSocket sends "agent.thinking" events
   ↓
7. Chat shows animated thinking messages
   ↓
8. WebSocket sends "agent.action" events
   ↓
9. Chat shows action messages (Click, Type, Navigate)
   ↓
10. Browser view shows actions happening
    ↓
11. Screenshots captured at key moments
    ↓
12. Task completes, status returns to "Ready"
    ↓
13. Chat shows completion message
    ↓
14. CommandInput re-enabled

6.3 Pause/Resume Flow

1. Agent is running task
   ↓
2. User clicks Pause button
   ↓
3. Button changes to Play icon
   ↓
4. Status shows "Paused"
   ↓
5. Agent stops at next safe point
   ↓
6. Chat shows "Agent paused by user"
   ↓
7. User can click in browser if needed
   ↓
8. User clicks Play button
   ↓
9. Agent resumes from where it stopped
   ↓
10. Status returns to "Running"

6.4 User Manual Click Flow

1. Agent is running or paused
   ↓
2. User clicks on browser area
   ↓
3. BrowserOverlay captures click coordinates
   ↓
4. Coordinates sent to main process via IPC
   ↓
5. Main process forwards to BrowserView
   ↓
6. Click executed in actual browser
   ↓
7. Agent observes page change (if relevant)

7. Error Handling

7.1 Error Display Strategy

| Error Type | Display Method |
|---|---|
| Connection Lost | Toast + status indicator |
| Task Failed | Chat error message + modal option |
| Backend Crash | Full-screen error with restart button |
| Login Required | Modal prompt |
| Invalid Command | Chat error message |

7.2 Error Message Format

┌────────────────────────────────────────────────┐
│ ❌ Error                                        │
│                                                 │
│ Task failed: Permission denied                  │
│                                                 │
│ The agent could not create the VM because       │
│ you don't have compute.instances.create         │
│ permission in this project.                     │
│                                                 │
│ [View Details] [Dismiss]                        │
└────────────────────────────────────────────────┘

7.3 Recovery Actions

| Error | Recovery |
|---|---|
| WebSocket disconnect | Auto-reconnect with backoff |
| Backend crash | Show restart dialog |
| Agent failure | Allow retry or new command |
| Browser crash | Restart browser session |
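
"Auto-reconnect with backoff" is commonly implemented as a capped exponential delay. The sketch below shows one such schedule; the 1-second base and 30-second cap are illustrative defaults, not values from this spec, and `reconnectDelayMs` is a hypothetical name.

```typescript
// Delay before reconnect attempt number `attempt` (0-based):
// baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

The attempt counter would reset to zero after a successful connection. Adding a small random jitter to each delay is a common refinement when many clients reconnect at once, though a single desktop client may not need it.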

8. Performance Considerations

8.1 Optimization Strategies

| Area | Strategy |
|---|---|
| Chat Messages | Virtual scrolling for 100+ messages |
| Screenshots | Lazy loading in history view |
| WebSocket | Debounce rapid thinking updates |
| BrowserView | Throttle bounds updates on resize |
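
Throttling the bounds updates could look like the leading-edge throttle below. This is a sketch, not the project's utility: `throttle` is written here from scratch, and the injectable `now` clock is an assumption added so the behaviour can be checked without real timers.

```typescript
// Leading-edge throttle: the first call runs immediately, and later calls are
// dropped until at least intervalMs has elapsed since the last accepted call.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: A) => void {
  let lastRun = -Infinity;
  return (...args: A) => {
    const current = now();
    if (current - lastRun >= intervalMs) {
      lastRun = current;
      fn(...args);
    }
  };
}
```

Because a leading-edge throttle drops trailing calls, the final size after a resize can be missed. Sending one extra bounds update when resizing stops avoids leaving the BrowserView at a stale size.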

8.2 Memory Management

  • Clear old thinking messages after task completes
  • Limit in-memory message history (paginate older)
  • Clean up screenshot blobs after viewing
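
The first two points can be sketched as pure helpers that chatStore applies when a task completes. The names `dropThinking` and `capMessages` are hypothetical, and the spec does not fix the message limit.

```typescript
// Removes transient thinking messages, keeping everything else in order.
function dropThinking<T extends { type: string }>(messages: readonly T[]): T[] {
  return messages.filter((message) => message.type !== 'agent_thinking');
}

// Keeps only the newest `limit` messages; older ones would be paginated from the backend.
function capMessages<T>(messages: readonly T[], limit: number): T[] {
  const keep = Math.max(0, limit);
  return messages.length > keep ? messages.slice(messages.length - keep) : messages.slice();
}
```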

9. Accessibility

9.1 Keyboard Navigation

| Key | Action |
|---|---|
| Tab | Navigate between controls |
| Enter | Submit command / Confirm dialogs |
| Space | Toggle play/pause |
| Escape | Close modals |
| Ctrl+H | Open history |

9.2 Screen Reader Support

  • Proper ARIA labels on all controls
  • Live regions for status updates
  • Meaningful alt text for screenshots

10. Build and Distribution

10.1 Development

# Install dependencies
npm install

# Start Next.js dev server
npm run dev

# Start Electron
npm run electron:dev

10.2 Production Build

# Build Next.js
npm run build

# Package Electron
npm run electron:build

10.3 Distribution Targets

| Platform | Format |
|---|---|
| Windows | .exe installer |
| macOS | .dmg disk image |
| Linux | .AppImage, .deb |

10.4 Auto-Update (Future)

  • electron-updater integration
  • Update notifications in app
  • Background download and install

11. Dependencies

11.1 Production Dependencies

{
  "next": "^14.0.0",
  "react": "^18.2.0",
  "react-dom": "^18.2.0",
  "zustand": "^4.4.0",
  "socket.io-client": "^4.7.0",
  "tailwindcss": "^3.4.0",
  "@tanstack/react-virtual": "^3.0.0",
  "lucide-react": "^0.300.0"
}

11.2 Dev Dependencies

electron is listed here rather than under production dependencies because electron-builder only accepts it in devDependencies.

{
  "typescript": "^5.3.0",
  "electron": "^28.0.0",
  "electron-builder": "^24.9.0",
  "@types/react": "^18.2.0",
  "concurrently": "^8.2.0",
  "wait-on": "^7.2.0"
}

12. Security Considerations

12.1 Electron Security

  • Context isolation enabled
  • Node integration disabled in renderer
  • Preload script for safe API exposure
  • CSP headers configured

12.2 IPC Security

  • Validate all IPC messages
  • Whitelist allowed IPC channels
  • No arbitrary code execution
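
"Validate all IPC messages" means the main process should treat every renderer payload as untrusted input. A minimal sketch for the `browser:update-bounds` payload, assuming it carries non-negative integer `x`, `y`, `width` and `height` fields; `parseBoundsPayload` is a hypothetical name.

```typescript
interface BoundsPayload {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Returns a clean copy of the payload, or null if any field is missing,
// non-integer (including NaN and Infinity), or negative.
function parseBoundsPayload(payload: unknown): BoundsPayload | null {
  if (typeof payload !== 'object' || payload === null) return null;
  const record = payload as Record<string, unknown>;
  const fields = ['x', 'y', 'width', 'height'] as const;
  const result: BoundsPayload = { x: 0, y: 0, width: 0, height: 0 };
  for (let i = 0; i < fields.length; i++) {
    const value = record[fields[i]];
    if (typeof value !== 'number' || !Number.isInteger(value) || value < 0) return null;
    result[fields[i]] = value;
  }
  return result;
}
```

The handler would call this first and ignore the message when the result is null. Copying only the known fields also keeps unexpected extra properties from reaching the BrowserView code.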

12.3 Browser Embedding Security

  • BrowserView sandboxed
  • No cross-origin access from renderer
  • CDP connection localhost only

13. Future Enhancements (Out of Scope for V1)

  • Light theme option
  • Customizable layout (panel sizes)
  • Command history (up arrow)
  • Keyboard shortcuts panel
  • Export chat/history to PDF
  • Multi-monitor support
  • Custom accent colors
  • Zoom controls for browser view

End of Frontend Design Specification