Version: 1.0
Last Updated: November 2025
Tech Stack: Electron 28+, Next.js 14+, TypeScript, Tailwind CSS, Zustand (state), Socket.io-client
The Electron frontend is a desktop application that embeds the browser controlled by browser-use directly in the application window. Users can:
- View and interact with the GCP console browser
- Send natural language commands via chat interface
- Monitor agent thinking and actions in real-time
- Control agent execution (play/pause)
- Review task history with screenshots
┌─────────────────────────────────────────────────────────────────────────────┐
│ ELECTRON APPLICATION │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ MAIN PROCESS │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────────┐ │ │
│ │ │ Backend │ │ Window │ │ BrowserView│ │ IPC │ │ │
│ │ │ Launcher │ │ Manager │ │ Manager │ │ Handler │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └───────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ IPC Communication │
│ │ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ RENDERER PROCESS (Next.js) │ │
│ │ │ │
│ │ ┌──────────────────────────────────┬──────────────────────────────┐ │ │
│ │ │ │ │ │ │
│ │ │ BROWSER VIEW AREA │ CHAT PANEL │ │ │
│ │ │ (Embedded Browser) │ │ │ │
│ │ │ │ ┌──────────────────────┐ │ │ │
│ │ │ ┌─────┬─────┬─────┐ │ │ Agent Messages │ │ │ │
│ │ │ │ Tab │ Tab │ Tab │ │ │ - Thinking │ │ │ │
│ │ │ └─────┴─────┴─────┘ │ │ - Actions │ │ │ │
│ │ │ ┌─────────────────────────┐ │ │ - Responses │ │ │ │
│ │ │ │ │ │ └──────────────────────┘ │ │ │
│ │ │ │ GCP Console View │ │ │ │ │
│ │ │ │ (Click-enabled) │ │ ┌──────────────────────┐ │ │ │
│ │ │ │ │ │ │ Command Input │ │ │ │
│ │ │ │ │ │ │ [____________] Send │ │ │ │
│ │ │ └─────────────────────────┘ │ └──────────────────────┘ │ │ │
│ │ │ │ │ │ │
│ │ └──────────────────────────────────┴──────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────────┐ │ │
│ │ │ CONTROL BAR │ │ │
│ │ │ [⏸ Pause/▶ Play] Status: Running Task [📋 History] │ │ │
│ │ └──────────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Feature | Description |
|---|---|
| Embedded Browser | Real Chromium browser embedded via Electron BrowserView |
| Tab Display | Shows browser tabs (read-only, user cannot manage) |
| Click Passthrough | User can click in browser for manual actions |
| Chat Interface | Send commands, view agent responses |
| Real-time Updates | Live agent thinking, actions, progress |
| Play/Pause Control | Start/stop agent execution |
| Task History | View past tasks, screenshots, logs |
frontend/
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
├── next.config.js # Next.js configuration
├── tailwind.config.js # Tailwind CSS configuration
├── electron-builder.yml # Electron builder config
├── .env.local # Environment variables
│
├── main/ # Electron main process
│ ├── index.ts # Main entry point
│ ├── window.ts # Window management
│ ├── browser-view.ts # BrowserView management for embedded browser
│ ├── backend-launcher.ts # Launches Django backend subprocess
│ ├── ipc-handlers.ts # IPC communication handlers
│ ├── preload.ts # Preload script for renderer security
│ └── utils/
│ ├── paths.ts # Path utilities
│ └── logger.ts # Main process logging
│
├── renderer/ # Next.js renderer process
│ ├── app/ # Next.js App Router
│ │ ├── layout.tsx # Root layout with providers
│ │ ├── page.tsx # Main application page
│ │ ├── globals.css # Global styles
│ │ └── history/
│ │ └── page.tsx # Task history page
│ │
│ ├── components/
│ │ ├── layout/
│ │ │ ├── AppShell.tsx # Main app shell with split layout
│ │ │ ├── Header.tsx # Application header (minimal)
│ │ │ └── ControlBar.tsx # Bottom control bar with play/pause
│ │ │
│ │ ├── browser/
│ │ │ ├── BrowserContainer.tsx # Container for embedded browser
│ │ │ ├── TabBar.tsx # Display browser tabs (read-only)
│ │ │ ├── TabItem.tsx # Individual tab display
│ │ │ ├── BrowserOverlay.tsx # Overlay for click capture
│ │ │ └── LoadingState.tsx # Browser loading indicator
│ │ │
│ │ ├── chat/
│ │ │ ├── ChatPanel.tsx # Main chat container
│ │ │ ├── MessageList.tsx # Scrollable message list
│ │ │ ├── MessageItem.tsx # Individual message component
│ │ │ ├── ThinkingMessage.tsx # Agent thinking indicator
│ │ │ ├── ActionMessage.tsx # Agent action display
│ │ │ ├── ErrorMessage.tsx # Error message display
│ │ │ ├── CommandInput.tsx # User command input field
│ │ │ └── TypingIndicator.tsx # Agent is processing indicator
│ │ │
│ │ ├── controls/
│ │ │ ├── PlayPauseButton.tsx # Play/pause toggle button
│ │ │ ├── StatusIndicator.tsx # Current agent status display
│ │ │ ├── ProgressBar.tsx # Task progress indicator
│ │ │ └── HistoryButton.tsx # Navigate to history view
│ │ │
│ │ ├── history/
│ │ │ ├── HistoryPanel.tsx # Task history list container
│ │ │ ├── TaskCard.tsx # Individual task summary card
│ │ │ ├── TaskDetail.tsx # Expanded task details view
│ │ │ ├── ScreenshotFlow.tsx # Horizontal scrollable screenshots
│ │ │ ├── ScreenshotItem.tsx # Individual screenshot with caption
│ │ │ └── LogViewer.tsx # Task log display
│ │ │
│ │ ├── modals/
│ │ │ ├── LoginPrompt.tsx # Manual login required modal
│ │ │ ├── ErrorModal.tsx # Error display modal
│ │ │ └── ConfirmDialog.tsx # Generic confirmation dialog
│ │ │
│ │ └── ui/ # Reusable UI primitives
│ │ ├── Button.tsx
│ │ ├── Input.tsx
│ │ ├── Card.tsx
│ │ ├── ScrollArea.tsx
│ │ ├── Tooltip.tsx
│ │ ├── Badge.tsx
│ │ ├── Spinner.tsx
│ │ └── Toast.tsx
│ │
│ ├── hooks/
│ │ ├── useWebSocket.ts # WebSocket connection hook
│ │ ├── useAgentStatus.ts # Agent status subscription
│ │ ├── useChat.ts # Chat messages and commands
│ │ ├── useTasks.ts # Task management hook
│ │ ├── useElectron.ts # Electron IPC communication
│ │ ├── useBrowserView.ts # Browser view control
│ │ └── useHistory.ts # Task history fetching
│ │
│ ├── stores/
│ │ ├── index.ts # Store exports
│ │ ├── agentStore.ts # Agent state (status, paused, etc.)
│ │ ├── chatStore.ts # Chat messages state
│ │ ├── taskStore.ts # Current and queued tasks
│ │ ├── sessionStore.ts # Session state (logged in, etc.)
│ │ └── uiStore.ts # UI state (modals, panels, etc.)
│ │
│ ├── services/
│ │ ├── api.ts # REST API client
│ │ ├── websocket.ts # WebSocket service
│ │ └── electron-bridge.ts # Electron IPC bridge
│ │
│ ├── types/
│ │ ├── agent.ts # Agent-related types
│ │ ├── chat.ts # Chat message types
│ │ ├── task.ts # Task types
│ │ ├── session.ts # Session types
│ │ ├── websocket.ts # WebSocket message types
│ │ └── electron.ts # Electron IPC types
│ │
│ ├── lib/
│ │ ├── utils.ts # Utility functions
│ │ ├── constants.ts # Application constants
│ │ └── formatters.ts # Date, text formatters
│ │
│ └── styles/
│ └── themes/
│ └── dark.css # Dark theme (primary)
│
├── public/
│ ├── icons/ # App icons
│ └── fonts/ # Custom fonts
│
└── resources/ # Electron build resources
├── icon.icns # macOS icon
├── icon.ico # Windows icon
└── icon.png # Linux icon
Main process entry point (`main/index.ts`) responsibilities:
- Initialize Electron application
- Launch Django backend as subprocess
- Create main application window
- Set up IPC handlers
- Manage application lifecycle
Startup Flow:
1. App ready event
2. Launch Django backend subprocess
3. Wait for backend health check
4. Create main BrowserWindow
5. Create BrowserView for embedded browser
6. Load Next.js renderer
7. Connect to backend WebSocket
8. Request browser CDP info from backend
9. Attach BrowserView to CDP endpoint
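
Step 3 can be implemented as a bounded polling loop. The sketch below assumes the backend exposes some health endpoint. The function names and the probe abstraction are hypothetical and not part of this spec. Because the probe is injected, the loop can run without a live backend.

```typescript
// Hypothetical sketch of startup step 3: poll until the Django backend reports healthy.
export type Probe = () => Promise<boolean>;

export interface WaitOptions {
  attempts: number; // maximum number of probes before giving up
  intervalMs: number; // delay between failed probes
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Resolves true as soon as `probe` succeeds, or false once the attempt budget is spent. */
export async function waitForBackend(probe: Probe, opts: WaitOptions): Promise<boolean> {
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      if (await probe()) return true;
    } catch {
      // Connection refused while uvicorn is still starting: keep polling.
    }
    if (attempt < opts.attempts) await sleep(opts.intervalMs);
  }
  return false;
}

/** Probe that treats any 2xx response from `url` as healthy (global fetch, Node 18+). */
export const httpProbe =
  (url: string): Probe =>
  async () =>
    (await fetch(url)).ok;
```

In `main/index.ts`, this would run between spawning the backend and creating the `BrowserWindow`. For example: `await waitForBackend(httpProbe(healthUrl), { attempts: 60, intervalMs: 500 })`, where `healthUrl` is whatever health route the backend actually provides.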

Window manager (`main/window.ts`) responsibilities:
- Create and configure main application window
- Handle window events (close, minimize, maximize)
- Manage window state persistence
- Configure window frame and title bar
Window Configuration:
| Property | Value | Description |
|---|---|---|
| Width | 1400px | Default window width |
| Height | 900px | Default window height |
| Min Width | 1200px | Minimum width |
| Min Height | 700px | Minimum height |
| Frame | true | Use native window frame |
| Title Bar Style | hidden (macOS) | Custom title bar on macOS |
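
The window table maps directly onto `BrowserWindow` constructor options. The sketch below keeps them as a plain object so they can be checked without Electron. The `webPreferences` values are not in the table. Context isolation on and Node integration off follow the Security list, and `sandbox: true` is an added assumption. `buildWindowOptions` is a hypothetical name.

```typescript
// Hypothetical main/window.ts helper; pass the result to `new BrowserWindow(...)`.
export interface WindowOptions {
  width: number;
  height: number;
  minWidth: number;
  minHeight: number;
  frame: boolean;
  titleBarStyle: 'default' | 'hidden';
  webPreferences: {
    preload: string;
    contextIsolation: boolean;
    nodeIntegration: boolean;
    sandbox: boolean;
  };
}

export function buildWindowOptions(platform: string, preloadPath: string): WindowOptions {
  return {
    width: 1400,
    height: 900,
    minWidth: 1200,
    minHeight: 700,
    frame: true, // native window frame
    // Custom (hidden) title bar on macOS only; other platforms keep the default one.
    titleBarStyle: platform === 'darwin' ? 'hidden' : 'default',
    webPreferences: {
      preload: preloadPath,
      contextIsolation: true, // per the Security section
      nodeIntegration: false, // per the Security section
      sandbox: true, // assumption: not stated in the spec
    },
  };
}
```

Usage in the main process: `new BrowserWindow(buildWindowOptions(process.platform, path.join(__dirname, 'preload.js')))`.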

The BrowserView manager (`main/browser-view.ts`) is the most critical component for embedding the browser.
Responsibilities:
- Create Electron BrowserView for embedded browser
- Connect BrowserView to browser-use browser's CDP endpoint
- Handle BrowserView bounds (resize with window)
- Manage BrowserView lifecycle
- Forward user clicks from overlay to actual browser
How Browser Embedding Works:
┌─────────────────────────────────────────────────────────────┐
│ ELECTRON WINDOW │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ RENDERER (Next.js) │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────┐ │ │
│ │ │ BrowserContainer │ │ ChatPanel │ │ │
│ │ │ (Placeholder div) │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ This area is where │ │ │ │ │
│ │ │ BrowserView renders │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ └─────────────────────────┘ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────▼─────────────────────────────┐ │
│ │ ELECTRON BROWSERVIEW │ │
│ │ (Layered over the renderer at the container's bounds) │ │
│ │ │ │
│ │ Connected to browser-use browser via CDP │ │
│ │ ws://localhost:9222/devtools/browser/xxxxx │ │
│ │ │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
BrowserView Attachment Method:
- Backend starts the browser-use browser with `--remote-debugging-port=9222`
- Backend returns the CDP WebSocket endpoint to the frontend
- Electron main process creates the BrowserView
- BrowserView navigates to `http://localhost:9222` (DevTools frontend) or attaches directly to the browser page
- BrowserView bounds are calculated from the BrowserContainer element's position
- BrowserView is attached to the main window at that position
Alternative Approach (Simpler):
Instead of complex CDP attachment, we can:
- Let browser-use open browser normally (headed mode)
- Get the browser window handle
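- Use Electron's `BrowserView.setAutoResize()` to keep the embedded view sized with the window. Note that `setAutoResize()` only resizes a BrowserView together with its parent window. Electron has no public API for embedding another process's native window, so the embedding step itself would need platform-specific window reparenting.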
Key Methods:
| Method | Description |
|---|---|
| `createBrowserView()` | Creates new BrowserView instance |
| `attachToCDP(endpoint)` | Connects BrowserView to CDP endpoint |
| `updateBounds(rect)` | Updates BrowserView position/size |
| `forwardClick(x, y)` | Sends click event to embedded browser |
| `destroy()` | Cleans up BrowserView |
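
For `updateBounds(rect)`, the renderer measures the container with `getBoundingClientRect()`, which returns fractional CSS pixels. Electron's `Rectangle` documents its fields as integers, so the main process should round before calling `setBounds()`. The sketch below uses a hypothetical helper name and assumes a page zoom factor of 1, so that CSS pixels equal device-independent pixels.

```typescript
// Hypothetical main-process helper for updateBounds(rect).
export interface Rect { x: number; y: number; width: number; height: number }

export function toViewBounds(rect: Rect): Rect {
  const x = Math.round(rect.x);
  const y = Math.round(rect.y);
  return {
    x,
    y,
    // Round the far edges rather than the size, so the view's right/bottom edge
    // lines up with the rounded container edge; negative sizes clamp to 0.
    width: Math.max(0, Math.round(rect.x + rect.width) - x),
    height: Math.max(0, Math.round(rect.y + rect.height) - y),
  };
}
```

The result can be passed straight to `view.setBounds(toViewBounds(rect))`.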

Backend launcher (`main/backend-launcher.ts`) responsibilities:
- Spawn Django backend as child process
- Monitor backend health
- Handle backend crashes/restarts
- Graceful shutdown on app close
Process Management:
| Event | Action |
|---|---|
| App Start | Spawn uvicorn process |
| Backend Exit | Attempt restart (max 3 times) |
| App Close | Send SIGTERM, wait, then SIGKILL |
| Backend Ready | Emit event to renderer |
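
The process-management table describes a small state machine. In the sketch below, `BackendSupervisor` and `ManagedProcess` are hypothetical names. The process handle is abstracted so that, in Electron, it would wrap `child_process.spawn()` of the uvicorn command, for example `kill: (s) => child.kill(s as NodeJS.Signals)` and `onExit: (cb) => child.on('exit', cb)`. On Windows, Node's `kill()` ignores the signal name and terminates the process forcefully, so the SIGTERM grace period only matters on macOS and Linux.

```typescript
// Hypothetical restart/shutdown policy for main/backend-launcher.ts.
export interface ManagedProcess {
  kill(signal?: string): boolean;
  onExit(callback: (code: number | null) => void): void;
}

export class BackendSupervisor {
  private proc: ManagedProcess | null = null;
  private restarts = 0;
  private stopping = false;
  private readonly spawnBackend: () => ManagedProcess;
  private readonly maxRestarts: number;
  private readonly onGiveUp: () => void;

  constructor(spawnBackend: () => ManagedProcess, maxRestarts = 3, onGiveUp: () => void = () => {}) {
    this.spawnBackend = spawnBackend;
    this.maxRestarts = maxRestarts;
    this.onGiveUp = onGiveUp;
  }

  get restartCount(): number {
    return this.restarts;
  }

  /** "App Start": spawn the backend and restart it on unexpected exits. */
  start(): void {
    const proc = this.spawnBackend();
    this.proc = proc;
    proc.onExit(() => {
      if (this.stopping) return; // expected exit during shutdown
      if (this.restarts < this.maxRestarts) {
        this.restarts += 1; // "Backend Exit": attempt restart (max 3 times)
        this.start();
      } else {
        this.onGiveUp(); // e.g. show the full-screen "Backend Crash" error
      }
    });
  }

  /** "App Close": send SIGTERM, then SIGKILL if the process has not exited after graceMs. */
  stop(graceMs = 5000): void {
    this.stopping = true;
    const proc = this.proc;
    if (!proc) return;
    let exited = false;
    proc.onExit(() => {
      exited = true;
    });
    proc.kill('SIGTERM');
    setTimeout(() => {
      if (!exited) proc.kill('SIGKILL');
    }, graceMs);
  }
}
```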

IPC handlers (`main/ipc-handlers.ts`) responsibilities:
- Handle IPC messages from renderer
- Coordinate between renderer and main process
- Forward browser control commands
IPC Channels:
| Channel | Direction | Purpose |
|---|---|---|
| `browser:get-bounds` | Renderer → Main | Get BrowserView position |
| `browser:update-bounds` | Renderer → Main | Update BrowserView position |
| `browser:forward-click` | Renderer → Main | Forward click to browser |
| `backend:status` | Main → Renderer | Backend health status |
| `app:ready` | Main → Renderer | App fully initialized |
| `app:quit` | Renderer → Main | Quit the application |
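
From the main process's point of view, the renderer is untrusted input, so each handler should validate its payload before touching the BrowserView. The sketch below uses hypothetical helper names. `parseRect` checks the `browser:update-bounds` payload. `clickEvents` turns a `browser:forward-click` payload into the mouse-down/mouse-up pair that `webContents.sendInputEvent()` accepts; the object shape follows Electron's `MouseInputEvent`.

```typescript
// Hypothetical validation helpers for main/ipc-handlers.ts.
export interface Rect { x: number; y: number; width: number; height: number }

const isFiniteNumber = (v: unknown): v is number => typeof v === 'number' && Number.isFinite(v);

/** Accepts a `browser:update-bounds` payload only if it is a well-formed rectangle. */
export function parseRect(payload: unknown): Rect | null {
  if (typeof payload !== 'object' || payload === null) return null;
  const { x, y, width, height } = payload as Record<string, unknown>;
  if (!isFiniteNumber(x) || !isFiniteNumber(y) || !isFiniteNumber(width) || !isFiniteNumber(height)) return null;
  if (width < 0 || height < 0) return null;
  return { x, y, width, height };
}

// Same shape as Electron's MouseInputEvent, so each item can go to webContents.sendInputEvent().
export interface MouseInput {
  type: 'mouseDown' | 'mouseUp';
  x: number;
  y: number;
  button: 'left';
  clickCount: number;
}

/** A click is a mouseDown followed by a mouseUp at the same integer point. */
export function clickEvents(x: unknown, y: unknown): MouseInput[] | null {
  if (!isFiniteNumber(x) || !isFiniteNumber(y)) return null;
  const px = Math.round(x);
  const py = Math.round(y);
  return [
    { type: 'mouseDown', x: px, y: py, button: 'left', clickCount: 1 },
    { type: 'mouseUp', x: px, y: py, button: 'left', clickCount: 1 },
  ];
}
```

In `ipc-handlers.ts`, the `browser:forward-click` handler would return early when `clickEvents(x, y)` is `null`. Otherwise it would pass each event to `view.webContents.sendInputEvent(ev)`.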

Preload script (`main/preload.ts`) responsibilities:
- Expose safe APIs to renderer via contextBridge
- Prevent direct Node.js access in renderer
- Define IPC communication interface
Exposed APIs:
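```typescript
// main/preload.ts: with contextIsolation enabled, APIs reach the renderer only via contextBridge
import { contextBridge, ipcRenderer, type IpcRendererEvent } from 'electron';

type Rect = { x: number; y: number; width: number; height: number };

contextBridge.exposeInMainWorld('electronAPI', {
  // Browser control
  browser: {
    updateBounds: (rect: Rect) => ipcRenderer.invoke('browser:update-bounds', rect),
    forwardClick: (x: number, y: number) => ipcRenderer.invoke('browser:forward-click', x, y),
  },
  // Backend status (the IpcRendererEvent object is not passed through to the renderer)
  backend: {
    onStatusChange: (callback: (status: unknown) => void) => {
      const listener = (_event: IpcRendererEvent, status: unknown) => callback(status);
      ipcRenderer.on('backend:status', listener);
      return () => ipcRenderer.removeListener('backend:status', listener);
    },
  },
  // App lifecycle
  app: {
    onReady: (callback: () => void) => {
      const listener = () => callback();
      ipcRenderer.on('app:ready', listener);
      return () => ipcRenderer.removeListener('app:ready', listener);
    },
    quit: () => ipcRenderer.send('app:quit'),
  },
});
```
Main page (`app/page.tsx`): the primary application interface, with a split layout.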
Layout Structure:
┌─────────────────────────────────────────────────────────────┐
│ AppShell │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Header (minimal - app title, status) │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────┬─────────────────────────┐ │
│ │ │ │ │
│ │ BrowserContainer │ ChatPanel │ │
│ │ (70% width) │ (30% width) │ │
│ │ │ │ │
│ │ ┌─────────────────────┐ │ ┌─────────────────┐ │ │
│ │ │ TabBar │ │ │ MessageList │ │ │
│ │ │ [Tab][Tab][Tab] │ │ │ (scrollable) │ │ │
│ │ └─────────────────────┘ │ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ Browser View Area │ │ │ │ │ │
│ │ │ (BrowserView here) │ │ │ │ │ │
│ │ │ │ │ └─────────────────┘ │ │
│ │ │ │ │ ┌─────────────────┐ │ │
│ │ │ │ │ │ CommandInput │ │ │
│ │ └─────────────────────┘ │ └─────────────────┘ │ │
│ │ │ │ │
│ └─────────────────────────────┴─────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ ControlBar │ │
│ │ [▶/⏸ Play/Pause] [Status: Running] [📜 History] │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

History page (`app/history/page.tsx`): the task history view, with screenshots and logs.
Layout Structure:
┌─────────────────────────────────────────────────────────────┐
│ HistoryPanel │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Header: Task History [← Back] │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ TaskCard (completed) │ │
│ │ "Create a new Cloud Storage bucket named test-bucket" │ │
│ │ Status: ✅ Completed | Duration: 45s | 12 steps │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ ScreenshotFlow (horizontal scroll) │ │ │
│ │ │ [📷][📷][📷][📷][📷][📷] → │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ │ [View Logs] │ │
│ └───────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ TaskCard (failed) │ │
│ │ "Delete all unused VM instances" │ │
│ │ Status: ❌ Failed | Error: Permission denied │ │
│ │ ... │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

`BrowserContainer` purpose: Provides the container where the Electron BrowserView will be positioned.
Responsibilities:
- Renders a placeholder div with specific dimensions
- Reports its position/size to main process via IPC
- Updates main process when container resizes
- Handles click events on overlay and forwards to browser
Behavior:
- On mount: Calculate bounds, send to main process
- On resize: Recalculate bounds, update main process
- On click: Capture click coordinates, forward via IPC
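
Together with the "Throttle bounds updates on resize" strategy listed under performance, this behavior amounts to a throttled reporter with a trailing update, so the final size after a drag-resize is never dropped. The sketch below is framework-free. `createBoundsReporter` is a hypothetical name, and `send` would call `window.electronAPI.browser.updateBounds`.

```typescript
// Hypothetical renderer-side sketch for BrowserContainer / useBrowserView:
// a leading + trailing throttle that also skips unchanged rectangles.
export interface Rect { x: number; y: number; width: number; height: number }

export function createBoundsReporter(send: (rect: Rect) => void, intervalMs: number): (rect: Rect) => void {
  let lastSent: Rect | null = null;
  let pending: Rect | null = null;
  let timer: ReturnType<typeof setTimeout> | null = null;

  const sameAs = (a: Rect | null, b: Rect): boolean =>
    a !== null && a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;

  const flush = (): void => {
    timer = null;
    if (pending !== null && !sameAs(lastSent, pending)) {
      lastSent = pending;
      send(pending);
    }
    pending = null;
  };

  return (rect) => {
    // Copy into a plain object: DOMRect fields are getters, and DOM objects cannot cross Electron IPC.
    pending = { x: rect.x, y: rect.y, width: rect.width, height: rect.height };
    if (timer === null) {
      flush(); // leading edge: report the first change immediately
      timer = setTimeout(flush, intervalMs); // trailing edge: report the latest rect of a burst
    }
  };
}
```

Because the reporter copies fields explicitly, a `ResizeObserver` callback can pass `el.getBoundingClientRect()` to it directly.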

`TabBar` purpose: Displays browser tabs (read-only).
Responsibilities:
- Fetch tab information from backend/agent
- Display tab titles and favicons
- Show active tab indicator
- Tabs are NOT clickable (agents control tabs)
Data Source:
- WebSocket events when agent changes tabs
- Backend API for initial tab state

`ChatPanel` purpose: Main chat interface container.
Responsibilities:
- Contains MessageList and CommandInput
- Manages chat scroll behavior
- Auto-scrolls to new messages

`MessageList` purpose: Scrollable list of chat messages.
Responsibilities:
- Render different message types
- Virtual scrolling for performance (if many messages)
- Smooth scroll to bottom on new messages

`MessageItem` purpose: Renders individual chat messages.
Message Types and Styling:
| Type | Visual Style | Icon |
|---|---|---|
| `user_command` | Blue background, right-aligned | 👤 User |
| `agent_thinking` | Gray italic, animated dots | 🤔 |
| `agent_action` | Yellow background | ⚡ Action |
| `agent_response` | Green background | 🤖 Agent |
| `system` | Neutral gray | ℹ️ System |
| `error` | Red background | ❌ Error |
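
The message table suggests a discriminated union in `types/chat.ts`. In the sketch below, the `type` values and icons come from the table. The other `ChatMessage` fields, and the left alignment of non-user messages, are assumptions.

```typescript
// Hypothetical sketch of types/chat.ts.
export type MessageType =
  | 'user_command'
  | 'agent_thinking'
  | 'agent_action'
  | 'agent_response'
  | 'system'
  | 'error';

export interface ChatMessage {
  id: string; // assumed field
  type: MessageType;
  content: string; // assumed field
  timestamp: string; // assumed field, ISO 8601
}

// Record<> forces every message type to have an entry, so adding a type without UI fails to compile.
export const MESSAGE_META: Record<MessageType, { icon: string; label: string; align: 'left' | 'right' }> = {
  user_command: { icon: '👤', label: 'User', align: 'right' },
  agent_thinking: { icon: '🤔', label: '', align: 'left' },
  agent_action: { icon: '⚡', label: 'Action', align: 'left' },
  agent_response: { icon: '🤖', label: 'Agent', align: 'left' },
  system: { icon: 'ℹ️', label: 'System', align: 'left' },
  error: { icon: '❌', label: 'Error', align: 'left' },
};
```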

`ThinkingMessage` purpose: Animated display of the agent's thinking process.
Features:
- Animated typing indicator
- Shows partial thoughts as they stream
- Collapses when action is taken

`ActionMessage` purpose: Displays agent actions.
Features:
- Action type icon (click, type, navigate, etc.)
- Target element description
- Success/failure indicator
- Expandable details

`CommandInput` purpose: User input field for natural language commands.
Features:
- Text input with placeholder
- Send button
- Disabled when agent is running (unless paused)
- Enter key to submit
- Input validation
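
The CommandInput rules can be kept as pure functions that the component calls. In the sketch below, the function names, the 2,000-character limit, and the Shift+Enter behavior are illustrative assumptions rather than part of the spec. The same applies to the IME check: `KeyboardEvent.isComposing` is consulted so that an Enter that confirms an IME composition does not submit.

```typescript
// Hypothetical helpers behind CommandInput.
export type AgentStatus = 'idle' | 'running' | 'paused' | 'waiting_login' | 'error';

export const MAX_COMMAND_LENGTH = 2000; // illustrative limit, not from the spec

/** "Disabled when agent is running (unless paused)". */
export function isInputEnabled(status: AgentStatus): boolean {
  return status !== 'running';
}

export type Validation = { ok: true; command: string } | { ok: false; reason: string };

export function validateCommand(text: string): Validation {
  const command = text.trim();
  if (command.length === 0) return { ok: false, reason: 'Enter a command first.' };
  if (command.length > MAX_COMMAND_LENGTH) {
    return { ok: false, reason: `Commands are limited to ${MAX_COMMAND_LENGTH} characters.` };
  }
  return { ok: true, command };
}

/** Enter submits; Shift+Enter and Enter during IME composition do not. */
export function shouldSubmitOnKey(key: string, shiftKey: boolean, isComposing: boolean): boolean {
  return key === 'Enter' && !shiftKey && !isComposing;
}
```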

`PlayPauseButton` purpose: Toggle button for agent execution.
States:
- Play (▶): Agent is paused, click to resume
- Pause (⏸): Agent is running, click to pause
- Disabled: No task running
`StatusIndicator` purpose: Shows current agent/system status.
Status Values:
| Status | Display | Color |
|---|---|---|
| `idle` | "Ready" | Gray |
| `running` | "Running: {task}" | Green |
| `paused` | "Paused" | Yellow |
| `waiting_login` | "Waiting for login" | Blue |
| `error` | "Error occurred" | Red |
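
The PlayPauseButton states and the StatusIndicator table are both pure functions of the agent status. In the sketch below, `AgentStatus` uses the StatusIndicator values. The function names are hypothetical. Treating every status other than `running` and `paused` as "no task running" is an assumption.

```typescript
// Hypothetical display logic for StatusIndicator and PlayPauseButton.
export type AgentStatus = 'idle' | 'running' | 'paused' | 'waiting_login' | 'error';

export interface StatusView { text: string; color: 'gray' | 'green' | 'yellow' | 'blue' | 'red' }

const STATUS_DISPLAY: Record<AgentStatus, StatusView> = {
  idle: { text: 'Ready', color: 'gray' },
  running: { text: 'Running', color: 'green' },
  paused: { text: 'Paused', color: 'yellow' },
  waiting_login: { text: 'Waiting for login', color: 'blue' },
  error: { text: 'Error occurred', color: 'red' },
};

/** Implements the "Running: {task}" row; falls back to "Running" without a task name. */
export function statusDisplay(status: AgentStatus, task = ''): StatusView {
  const view = STATUS_DISPLAY[status];
  if (status === 'running' && task) return { ...view, text: `Running: ${task}` };
  return view;
}

export interface PlayPauseState { icon: '▶' | '⏸'; disabled: boolean; onClick: 'pause' | 'resume' | null }

export function playPauseState(status: AgentStatus): PlayPauseState {
  if (status === 'running') return { icon: '⏸', disabled: false, onClick: 'pause' };
  if (status === 'paused') return { icon: '▶', disabled: false, onClick: 'resume' };
  return { icon: '▶', disabled: true, onClick: null }; // no task running
}
```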

`LoginPrompt` purpose: Modal prompting the user to complete GCP login.
Features:
- Explains manual login requirement
- "I've logged in" confirmation button
- Appears when session starts
- Non-dismissible until login is confirmed

`agentStore` state:
| Property | Type | Description |
|---|---|---|
| `status` | enum | idle, running, paused, error |
| `isPaused` | boolean | Whether agent is paused |
| `currentAction` | string | Current action description |
| `thinking` | string | Current thinking content |

`agentStore` actions:
| Action | Description |
|---|---|
| `setStatus(status)` | Update agent status |
| `pause()` | Request pause |
| `resume()` | Request resume |
| `setThinking(text)` | Update thinking text |
| `clearThinking()` | Clear thinking display |
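
`agentStore` maps onto a Zustand store. The sketch below is written as a state-creator function, with local `SetState`/`GetState` types standing in for Zustand's so it runs without the package. `requestPause` and `requestResume` are hypothetical callbacks, for example wrapping `tasks.pause(id)` and `tasks.resume(id)`. The optimistic update with rollback is a design choice, not something the spec prescribes.

```typescript
// Hypothetical stores/agentStore.ts as a Zustand-compatible state creator.
export type AgentStatus = 'idle' | 'running' | 'paused' | 'error';

export interface AgentState {
  status: AgentStatus;
  isPaused: boolean;
  currentAction: string;
  thinking: string;
  setStatus: (status: AgentStatus) => void;
  pause: () => Promise<void>;
  resume: () => Promise<void>;
  setThinking: (text: string) => void;
  clearThinking: () => void;
}

// Local stand-ins for Zustand's setter/getter signatures.
type SetState<T> = (partial: Partial<T> | ((state: T) => Partial<T>)) => void;
type GetState<T> = () => T;

export const createAgentSlice =
  (requestPause: () => Promise<void>, requestResume: () => Promise<void>) =>
  (set: SetState<AgentState>, get: GetState<AgentState>): AgentState => ({
    status: 'idle',
    isPaused: false,
    currentAction: '',
    thinking: '',
    setStatus: (status) => set({ status, isPaused: status === 'paused' }),
    pause: async () => {
      if (get().status !== 'running') return;
      set({ status: 'paused', isPaused: true }); // optimistic: the button flips immediately
      try {
        await requestPause();
      } catch (err) {
        set({ status: 'running', isPaused: false }); // roll back if the backend refused
        throw err;
      }
    },
    resume: async () => {
      if (get().status !== 'paused') return;
      set({ status: 'running', isPaused: false });
      try {
        await requestResume();
      } catch (err) {
        set({ status: 'paused', isPaused: true });
        throw err;
      }
    },
    setThinking: (text) => set({ thinking: text }),
    clearThinking: () => set({ thinking: '' }),
  });
```

With Zustand 4, the store would be created as `create<AgentState>()(createAgentSlice(pauseFn, resumeFn))`. The authoritative status still arrives through the `agent.status_change` WebSocket event.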

`chatStore` state:
| Property | Type | Description |
|---|---|---|
| `messages` | `Message[]` | All chat messages |
| `isLoading` | boolean | Command being processed |

`chatStore` actions:
| Action | Description |
|---|---|
| `addMessage(msg)` | Add new message |
| `updateMessage(id, data)` | Update existing message |
| `loadHistory()` | Load messages from backend |
| `clearMessages()` | Clear all messages |

`taskStore` state:
| Property | Type | Description |
|---|---|---|
| `currentTask` | `Task` | Currently executing task |
| `history` | `Task[]` | Completed tasks |

`taskStore` actions:
| Action | Description |
|---|---|
| `submitTask(command)` | Submit new task |
| `setCurrentTask(task)` | Set current task |
| `completeTask(result)` | Mark task complete |
| `loadHistory()` | Fetch task history |

`sessionStore` state:
| Property | Type | Description |
|---|---|---|
| `isConnected` | boolean | Backend connected |
| `isLoggedIn` | boolean | GCP login complete |
| `sessionId` | string | Current session ID |

`sessionStore` actions:
| Action | Description |
|---|---|
| `connect()` | Establish session |
| `confirmLogin()` | Confirm GCP login |
| `disconnect()` | End session |

`services/api.ts` purpose: REST API client for backend communication.
Endpoints Wrapped:
| Method | Description |
|---|---|
| `sessions.start()` | Start new session |
| `sessions.current()` | Get current session |
| `sessions.confirmLogin()` | Confirm login complete |
| `tasks.submit(command)` | Submit new task |
| `tasks.list()` | Get task history |
| `tasks.get(id)` | Get task details |
| `tasks.pause(id)` | Pause task |
| `tasks.resume(id)` | Resume task |
| `chat.getMessages()` | Get chat history |
| `screenshots.getForTask(taskId)` | Get task screenshots |
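
The sketch below shows one shape for this client wrapper. The URL paths are hypothetical placeholders, because this spec does not list the backend routes. `fetchImpl` is injected so the client can be tested; in the renderer it would simply be the global `fetch`.

```typescript
// Hypothetical services/api.ts. Every path below is a PLACEHOLDER, not a documented route.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export function createApiClient(baseUrl: string, fetchImpl: FetchLike) {
  async function request<T>(method: string, path: string, body?: unknown): Promise<T> {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers: { 'Content-Type': 'application/json' },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return (await res.json()) as T;
  }
  const id = (value: string) => encodeURIComponent(value);

  return {
    sessions: {
      start: () => request('POST', '/api/sessions/'),
      current: () => request('GET', '/api/sessions/current/'),
      confirmLogin: () => request('POST', '/api/sessions/confirm-login/'),
    },
    tasks: {
      submit: (command: string) => request('POST', '/api/tasks/', { command }),
      list: () => request('GET', '/api/tasks/'),
      get: (taskId: string) => request('GET', `/api/tasks/${id(taskId)}/`),
      pause: (taskId: string) => request('POST', `/api/tasks/${id(taskId)}/pause/`),
      resume: (taskId: string) => request('POST', `/api/tasks/${id(taskId)}/resume/`),
    },
    chat: { getMessages: () => request('GET', '/api/chat/messages/') },
    screenshots: { getForTask: (taskId: string) => request('GET', `/api/tasks/${id(taskId)}/screenshots/`) },
  };
}
```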

`services/websocket.ts` purpose: Manages the WebSocket connection for real-time updates.
Features:
- Auto-reconnect on disconnect
- Event subscription system
- Message parsing and routing
- Heartbeat handling
Event Handlers:
| Event | Handler |
|---|---|
| `agent.thinking` | Update chatStore and agentStore |
| `agent.action` | Add action message to chat |
| `agent.status_change` | Update agentStore status |
| `task.completed` | Update taskStore, add message |
| `screenshot.captured` | Notify for history refresh |
| `session.login_required` | Show login modal |
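
A small event router is enough for the subscription and routing part of this service. Reconnection does not need to be hand-written: socket.io-client reconnects automatically, backing off exponentially (with jitter) from `reconnectionDelay` up to `reconnectionDelayMax`. The event names below come from the table; the router itself is a hypothetical sketch.

```typescript
// Hypothetical event router for services/websocket.ts.
export type ServerEvent =
  | 'agent.thinking'
  | 'agent.action'
  | 'agent.status_change'
  | 'task.completed'
  | 'screenshot.captured'
  | 'session.login_required';

type Handler = (payload: unknown) => void;

export function createEventRouter() {
  const handlers = new Map<string, Set<Handler>>();
  return {
    /** Returns an unsubscribe function, convenient as a React effect cleanup. */
    subscribe(event: ServerEvent, handler: Handler): () => void {
      let set = handlers.get(event);
      if (!set) {
        set = new Set();
        handlers.set(event, set);
      }
      set.add(handler);
      return () => {
        handlers.get(event)?.delete(handler);
      };
    },
    /** Delivers one incoming message and returns how many handlers received it. */
    dispatch(event: string, payload: unknown): number {
      const set = handlers.get(event);
      if (!set) return 0; // unknown or unsubscribed events are ignored, not thrown
      set.forEach((handler) => handler(payload));
      return set.size;
    },
  };
}
```

Wiring it up could look like `socket.onAny((event, payload) => router.dispatch(event, payload))`, with each store subscribing to the events it owns.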

`services/electron-bridge.ts` purpose: Wrapper around `window.electronAPI` for type safety.
Methods:
| Method | Description |
|---|---|
| `updateBrowserBounds(rect)` | Update BrowserView position |
| `forwardClick(x, y)` | Forward click to browser |
| `onBackendReady(callback)` | Listen for backend ready |
| `isElectron()` | Check if running in Electron |
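
The sketch below shows one form of this wrapper. `ElectronAPI` mirrors part of the object exposed by the preload script, and `onBackendReady` is assumed here to listen on `app:ready`. When the page runs in a plain browser (for example `next dev` opened outside Electron), the calls become no-ops. That same condition is what `isElectron()` detects.

```typescript
// Hypothetical services/electron-bridge.ts sketch.
export interface Rect { x: number; y: number; width: number; height: number }

export interface ElectronAPI {
  browser: {
    updateBounds(rect: Rect): Promise<unknown>;
    forwardClick(x: number, y: number): Promise<unknown>;
  };
  app: { onReady(callback: () => void): unknown };
}

// The preload script installs `electronAPI` on the renderer's global object.
function getApi(): ElectronAPI | undefined {
  return (globalThis as unknown as { electronAPI?: ElectronAPI }).electronAPI;
}

export const electronBridge = {
  isElectron: (): boolean => getApi() !== undefined,
  updateBrowserBounds: async (rect: Rect): Promise<void> => {
    await getApi()?.browser.updateBounds(rect);
  },
  forwardClick: async (x: number, y: number): Promise<void> => {
    await getApi()?.browser.forwardClick(x, y);
  },
  // Assumption: "backend ready" is signalled by the app:ready channel.
  onBackendReady: (callback: () => void): void => {
    getApi()?.app.onReady(callback);
  },
};
```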

`useWebSocket` purpose: Hook for the WebSocket connection and events.
Returns:
- `isConnected`: Connection status
- `subscribe(event, handler)`: Subscribe to events
- `send(event, data)`: Send message
- `reconnect()`: Force reconnect

`useAgentStatus` purpose: Hook for agent status and control.
Returns:
- `status`: Current status
- `isPaused`: Pause state
- `thinking`: Current thinking
- `pause()`: Pause agent
- `resume()`: Resume agent

`useChat` purpose: Hook for chat functionality.
Returns:
- `messages`: All messages
- `sendCommand(text)`: Send new command
- `isLoading`: Command processing

`useBrowserView` purpose: Hook for browser view management.
Returns:
- `containerRef`: Ref for container div
- `handleClick(event)`: Click handler for overlay

Color palette (dark theme):
| Element | Color | Hex |
|---|---|---|
| Background | Dark Gray | #0f0f0f |
| Surface | Slightly lighter | #1a1a1a |
| Border | Subtle gray | #2a2a2a |
| Primary | Blue accent | #3b82f6 |
| Success | Green | #22c55e |
| Warning | Yellow | #eab308 |
| Error | Red | #ef4444 |
| Text Primary | White | #ffffff |
| Text Secondary | Gray | #9ca3af |

Typography:
| Element | Font | Size | Weight |
|---|---|---|---|
| App Title | Inter | 18px | 600 |
| Tab Labels | Inter | 13px | 400 |
| Chat Messages | Inter | 14px | 400 |
| Command Input | Inter | 14px | 400 |
| Status Text | Inter | 12px | 500 |
| Timestamps | Inter | 11px | 400 |

Layout dimensions:
| Element | Dimension |
|---|---|
| Browser Area Width | 70% of window |
| Chat Panel Width | 30% of window |
| Minimum Chat Width | 300px |
| Tab Bar Height | 40px |
| Control Bar Height | 48px |
| Message Padding | 12px |
| Border Radius | 8px (cards), 4px (buttons) |
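
The palette, typography, and dimension tables above can be registered once in the Tailwind theme, so components use class names such as `bg-surface` or `rounded-card` instead of hex literals. This is a sketch of a `tailwind.config.ts`. The spec lists `tailwind.config.js`, where the same object works via `module.exports`. The token names are hypothetical; the hex values, font, radii, and content path follow the tables and the file tree.

```typescript
// Hypothetical tailwind.config.ts; token names are illustrative, values come from the tables.
const config = {
  content: ['./renderer/**/*.{ts,tsx}'],
  theme: {
    extend: {
      colors: {
        background: '#0f0f0f',
        surface: '#1a1a1a',
        border: '#2a2a2a',
        primary: '#3b82f6',
        success: '#22c55e',
        warning: '#eab308',
        error: '#ef4444',
        foreground: '#ffffff', // Text Primary
        muted: '#9ca3af', // Text Secondary
      },
      fontFamily: {
        sans: ['Inter', 'sans-serif'],
      },
      borderRadius: {
        card: '8px',
        button: '4px',
      },
    },
  },
};

export default config;
```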

Play/pause button states:
| State | Background | Icon | Cursor |
|---|---|---|---|
| Playing | Green | ⏸ | pointer |
| Paused | Yellow | ▶ | pointer |
| Disabled | Gray | ▶ | not-allowed |
| Hover | Lighter shade | - | - |

Chat message styling:
| Type | Background | Border | Text Color |
|---|---|---|---|
| User Command | #1e3a5f | Blue left | White |
| Agent Thinking | #1f1f1f | None | Gray italic |
| Agent Action | #2d2a1f | Yellow left | White |
| Agent Response | #1f2d1f | Green left | White |
| Error | #2d1f1f | Red left | White |

First-launch flow:
1. User launches app
↓
2. Splash screen shows "Starting..."
↓
3. Backend subprocess starts
↓
4. Backend health check passes
↓
5. Main window appears
↓
6. Browser view initializes
↓
7. Navigates to GCP console
↓
8. Login modal appears: "Please log into Google Cloud"
↓
9. User logs in manually in browser view
↓
10. User clicks "I've logged in" button
↓
11. System verifies login (checks URL/cookies)
↓
12. Chat panel shows "Ready! Enter a command."
↓
13. User can now enter commands
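
Step 11 leaves the verification method open (URL or cookies). The sketch below covers only the URL half. It assumes that a signed-in session ends up on `console.cloud.google.com`, while an unauthenticated one stays on the Google sign-in host. `looksLoggedIn` is a hypothetical helper, and a cookie check would still be needed for a reliable answer.

```typescript
// Hypothetical URL heuristic for first-launch step 11.
const CONSOLE_HOST = 'console.cloud.google.com';

export function looksLoggedIn(currentUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(currentUrl);
  } catch {
    return false; // not a parseable URL
  }
  // Still on the sign-in flow (accounts.google.com) or elsewhere: not logged in yet.
  return url.protocol === 'https:' && url.hostname === CONSOLE_HOST;
}
```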

Command execution flow:
1. User types command: "Create a new VM named test-vm"
↓
2. User clicks Send (or presses Enter)
↓
3. CommandInput disabled, shows loading
↓
4. User command appears in chat
↓
5. Agent starts, status changes to "Running"
↓
6. WebSocket sends "agent.thinking" events
↓
7. Chat shows animated thinking messages
↓
8. WebSocket sends "agent.action" events
↓
9. Chat shows action messages (Click, Type, Navigate)
↓
10. Browser view shows actions happening
↓
11. Screenshots captured at key moments
↓
12. Task completes, status returns to "Ready"
↓
13. Chat shows completion message
↓
14. CommandInput re-enabled

Pause/resume flow:
1. Agent is running task
↓
2. User clicks Pause button
↓
3. Button changes to Play icon
↓
4. Status shows "Paused"
↓
5. Agent stops at next safe point
↓
6. Chat shows "Agent paused by user"
↓
7. User can click in browser if needed
↓
8. User clicks Play button
↓
9. Agent resumes from where it stopped
↓
10. Status returns to "Running"

Manual intervention flow:
1. Agent is running or paused
↓
2. User clicks on browser area
↓
3. BrowserOverlay captures click coordinates
↓
4. Coordinates sent to main process via IPC
↓
5. Main process forwards to BrowserView
↓
6. Click executed in actual browser
↓
7. Agent observes page change (if relevant)
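
Steps 3 and 4 need a coordinate conversion. The overlay sees `clientX`/`clientY` relative to the renderer viewport, while `webContents.sendInputEvent()` takes coordinates relative to the embedded page. The sketch below assumes the BrowserView sits exactly at the container's bounds. `toViewPoint` is a hypothetical name.

```typescript
// Hypothetical renderer-side conversion for BrowserOverlay clicks.
export interface Rect { x: number; y: number; width: number; height: number }

export function toViewPoint(
  clientX: number,
  clientY: number,
  container: Rect,
): { x: number; y: number } | null {
  const x = clientX - container.x;
  const y = clientY - container.y;
  // Ignore clicks that land outside the browser area (e.g. on the tab bar).
  if (x < 0 || y < 0 || x >= container.width || y >= container.height) return null;
  return { x: Math.round(x), y: Math.round(y) };
}
```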

Error display:
| Error Type | Display Method |
|---|---|
| Connection Lost | Toast + status indicator |
| Task Failed | Chat error message + modal option |
| Backend Crash | Full-screen error with restart button |
| Login Required | Modal prompt |
| Invalid Command | Chat error message |

Example error modal:
┌────────────────────────────────────────────────┐
│ ❌ Error │
│ │
│ Task failed: Permission denied │
│ │
│ The agent could not create the VM because │
│ you don't have compute.instances.create │
│ permission in this project. │
│ │
│ [View Details] [Dismiss] │
└────────────────────────────────────────────────┘

Error recovery:
| Error | Recovery |
|---|---|
| WebSocket disconnect | Auto-reconnect with backoff |
| Backend crash | Show restart dialog |
| Agent failure | Allow retry or new command |
| Browser crash | Restart browser session |

Performance optimizations:
| Area | Strategy |
|---|---|
| Chat Messages | Virtual scrolling for 100+ messages |
| Screenshots | Lazy loading in history view |
| WebSocket | Debounce rapid thinking updates |
| BrowserView | Throttle bounds updates on resize |

Memory management:
- Clear old thinking messages after the task completes
- Limit in-memory message history (paginate older messages)
- Clean up screenshot blobs after viewing

Keyboard shortcuts:
| Key | Action |
|---|---|
| Tab | Navigate between controls |
| Enter | Submit command / Confirm dialogs |
| Space | Toggle play/pause |
| Escape | Close modals |
| Ctrl+H | Open history |
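
The shortcut table has one conflict: Space also types a space in the command input. The resolver sketched below only toggles play/pause when focus is outside an editable field; that guard is an assumption, not part of the spec. Tab is left to native focus navigation, and the action names are hypothetical.

```typescript
// Hypothetical keyboard shortcut resolver for the table above.
export type ShortcutAction = 'submit' | 'togglePlayPause' | 'closeModal' | 'openHistory';

export interface KeyInput {
  key: string; // KeyboardEvent.key, e.g. ' ' for Space
  ctrlKey: boolean;
  targetIsEditable: boolean; // focus is in an input, textarea, or contenteditable element
}

export function resolveShortcut(e: KeyInput): ShortcutAction | null {
  if (e.key === 'Escape') return 'closeModal';
  if (e.ctrlKey && e.key.toLowerCase() === 'h') return 'openHistory';
  // Enter submits from the command input; dialogs confirm through their focused button.
  if (e.key === 'Enter' && e.targetIsEditable) return 'submit';
  // Space must still type a space inside text fields, so only toggle elsewhere.
  if (e.key === ' ' && !e.targetIsEditable) return 'togglePlayPause';
  return null; // includes Tab, which keeps native focus navigation
}
```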

Accessibility:
- Proper ARIA labels on all controls
- Live regions for status updates
- Meaningful alt text for screenshots

Development:
```bash
# Install dependencies
npm install
# Start Next.js dev server
npm run dev
# Start Electron
npm run electron:dev
```
Production build:
```bash
# Build Next.js
npm run build
# Package Electron
npm run electron:build
```
Packaging targets:
| Platform | Format |
|---|---|
| Windows | .exe installer |
| macOS | .dmg disk image |
| Linux | .AppImage, .deb |

Auto-updates:
- `electron-updater` integration
- Update notifications in app
- Background download and install

Dependencies:
```json
{
  "next": "^14.0.0",
  "react": "^18.2.0",
  "react-dom": "^18.2.0",
  "zustand": "^4.4.0",
  "socket.io-client": "^4.7.0",
  "tailwindcss": "^3.4.0",
  "@tanstack/react-virtual": "^3.0.0",
  "lucide-react": "^0.300.0"
}
```
Dev dependencies (`electron` belongs here because electron-builder refuses to package an app that declares it under `dependencies`):
```json
{
  "electron": "^28.0.0",
  "typescript": "^5.3.0",
  "electron-builder": "^24.9.0",
  "@types/react": "^18.2.0",
  "concurrently": "^8.2.0",
  "wait-on": "^7.2.0"
}
```
Security:
- Context isolation enabled
- Node integration disabled in renderer
- Preload script for safe API exposure
- CSP headers configured
- Validate all IPC messages
- Whitelist allowed IPC channels
- No arbitrary code execution
- BrowserView sandboxed
- No cross-origin access from renderer
- CDP connection localhost only
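
Validating IPC messages also means validating who sent them, and Electron's security checklist recommends checking the sender frame's URL in every handler. In the sketch below, the URL would come from `event.senderFrame?.url`. The trusted origins are assumptions: the Next.js dev server on its default port 3000, and a hypothetical `app://renderer` scheme for the packaged build. The origin is rebuilt from protocol and host because `URL.origin` is the string `"null"` for custom schemes.

```typescript
// Hypothetical sender check for main/ipc-handlers.ts.
const TRUSTED_ORIGINS = new Set(['http://localhost:3000', 'app://renderer']);

/** Returns true only if the IPC message came from the app's own renderer page. */
export function isTrustedSender(frameUrl: string | undefined): boolean {
  if (!frameUrl) return false;
  let url: URL;
  try {
    url = new URL(frameUrl);
  } catch {
    return false; // not a parseable URL
  }
  // URL.origin is "null" for custom schemes, so compare protocol + host instead.
  return TRUSTED_ORIGINS.has(`${url.protocol}//${url.host}`);
}
```

Each `ipcMain.handle()` callback would start by rejecting the call when `isTrustedSender(event.senderFrame?.url)` is false.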

Future enhancements:
- Light theme option
- Customizable layout (panel sizes)
- Command history (up arrow)
- Keyboard shortcuts panel
- Export chat/history to PDF
- Multi-monitor support
- Custom accent colors
- Zoom controls for browser view
End of Frontend Design Specification