Karan-05/Cloud_Automation_Agent-

Cloud Console Manager

Cloud Console Automation Agent – submit natural-language cloud console tasks, review an automatically generated plan, confirm login, and watch a mock (or real) browser-use agent execute the run while capturing evidence.

Overview

Cloud Console Manager is an intelligent desktop application that allows you to control Google Cloud Platform through natural language commands. Instead of manually clicking through the GCP Console interface, simply tell the agent what you want to do:

  • "Create a new VM instance named test-vm in us-central1"
  • "List all Cloud Storage buckets"
  • "Delete unused VM instances"
  • "Set up a load balancer for my production VMs"

The application embeds a real Chrome browser showing the GCP Console, while an AI agent (powered by Google's Gemini) autonomously performs the requested actions. You can watch it work in real time, pause execution, or intervene manually when needed.

Key Benefits

  • ✅ Natural Language Control: Use plain English instead of complex CLI commands
  • ✅ Visual Feedback: Watch the agent work in an embedded browser
  • ✅ Safe & Controllable: Pause/resume execution, manual intervention
  • ✅ Full History: Review past tasks with screenshots and logs
  • ✅ Session Persistence: Automatic login cookie management


Architecture

Cloud Console Manager is a three-tier desktop solution:

  • Django orchestration backend (/backend) exposes the REST/WebSocket APIs, task queue, governance/audit subsystems, and agent lifecycle controls.
  • Next.js + Electron wizard (/frontend) renders the 4-step Submit → Review → Login → Run UI plus role selection, approvals, comments, and evidence cards.
  • Browser-Use automation stack (bundled under /browser_use) drives Playwright/Chromium (or a mock agent when AGENT_USE_MOCK=true) to perform user commands.

Key platform capabilities now include:

  • Role-aware sessions with per-role permissions and filtered task views.
  • Natural-language task submission with auto-generated plans, dry-run previews, and policy-driven approvals.
  • Governance primitives: audit logs, comments, evidence bundles, CSV exports, and monitoring metrics.

The components interact as follows:

┌─────────────────────────────────────────────────────────────────┐
│                    ELECTRON DESKTOP APP                          │
│  ┌───────────────────────────┬───────────────────────────────┐  │
│  │   Embedded Browser        │   Chat Interface              │  │
│  │   (GCP Console)           │   - Send commands             │  │
│  │   - Real Chrome           │   - Agent thinking            │  │
│  │   - User can click        │   - Action logs               │  │
│  │   - Agent controlled      │   - Play/Pause control        │  │
│  └───────────────────────────┴───────────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────────┘
                       │ REST API + WebSocket
┌──────────────────────▼──────────────────────────────────────────┐
│                    DJANGO BACKEND                                │
│  - Agent lifecycle management                                    │
│  - Task queuing & execution                                      │
│  - WebSocket real-time updates                                   │
│  - Session & cookie persistence                                  │
│  - Chat history & screenshot storage                             │
└──────────────────────┬──────────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────────┐
│                BROWSER-USE AGENT FRAMEWORK                       │
│  - AI-powered browser automation                                 │
│  - Gemini 2.5 Flash LLM integration                              │
│  - Chrome DevTools Protocol (CDP)                                │
│  - DOM analysis & action execution                               │
└──────────────────────────────────────────────────────────────────┘

Technology Stack

Backend:

  • Django 5.x (REST API & WebSocket)
  • Django Channels (WebSocket communication)
  • SQLite (data persistence)
  • browser-use (AI agent framework)
  • Google Vertex AI (Gemini LLM)

Frontend:

  • Electron 28+ (desktop framework)
  • Next.js 14+ (React framework)
  • TypeScript (type safety)
  • Tailwind CSS (styling)
  • Zustand (state management)
  • Socket.io (WebSocket client)

Agent:

  • browser-use library (async Python)
  • cdp-use (Chrome DevTools Protocol)
  • Gemini 2.5 Flash (LLM)

Features

🤖 AI-Powered Automation

  • Natural language command processing
  • Intelligent GCP Console navigation
  • Automatic action execution with retries
  • Context-aware decision making

🖥️ Desktop Experience

  • Native desktop application (Windows, macOS, Linux)
  • Embedded browser view of GCP Console
  • Real-time agent thinking and action display
  • Chat-style command interface

🎮 User Control

  • Play/Pause: Stop agent execution anytime
  • Manual Intervention: Click in browser when needed
  • Task Queue: Submit multiple commands
  • Action Review: See every step the agent takes

📊 History & Monitoring

  • Complete task history
  • Screenshot timeline for each task
  • Detailed execution logs
  • Success/failure tracking

🔐 Security & Persistence

  • Encrypted cookie storage
  • Automatic GCP login management
  • Local data storage (no cloud sync)
  • Session persistence across restarts

Prerequisites

System Requirements

  • Operating System: Windows 10+, macOS 11+, or Linux (Ubuntu 20.04+)
  • Display: Minimum 1200x700 resolution
  • Memory: 4GB RAM minimum, 8GB recommended
  • Storage: 2GB free space

Software Requirements

Backend:

  • Python 3.11 or higher
  • Chrome/Chromium browser installed
  • pip or uv package manager

Frontend:

  • Node.js 18 or higher
  • npm or yarn package manager

Google Cloud:

  • GCP project with Vertex AI enabled
  • Service account with appropriate permissions
  • Service account credentials JSON file

Containerized Quickstart

Prerequisite: Install Docker Desktop (or Docker Engine) with Docker Compose Plugin enabled.

  1. From the repo root run docker compose up --build (or make dev to use the Makefile wrapper).
  2. Open http://localhost:3000 for the Next.js wizard and http://localhost:8000/api/tasks/ for the Django API.
  3. Containers default to mock agent mode (AGENT_USE_MOCK=true) and persist SQLite + media data inside the backend-data volume.
  4. Stop or inspect with make stop / make logs.
  5. Run backend integration tests in-container anytime via make test-backend; run frontend unit tests with make test-frontend.

Switching to a real agent: edit docker-compose.yml (or create docker-compose.override.yml) to set AGENT_USE_MOCK=false, provide your AGENT_MODEL, Vertex/Gemini credentials, and any Playwright-specific env like BROWSER_CDP_PORT. The backend image already includes Chromium + Playwright dependencies—just ensure the required API keys are mounted/available before docker compose up.


Quick Start

Local Development (Mock Mode by Default)

  1. Clone & bootstrap
git clone https://github.com/dpraj007/CC_Manager-.git
cd CC_Manager-
uv venv --python 3.11
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -r backend/requirements.txt
uv pip install -e .
cp backend/.env.example backend/.env
# Optional (only for real browser automation): uvx playwright install chromium
  2. Run the Django API
cd backend
python manage.py migrate
python manage.py runserver 0.0.0.0:8000

Leave the terminal running. You can confirm readiness anytime with:

curl http://localhost:8000/health
  3. Run the Next.js wizard
cd frontend
npm install
# Optionally point at a different backend:
# export NEXT_PUBLIC_API_URL=http://localhost:8000
npm run dev

Visit http://localhost:3000 and walk through Submit → Review → Login → Run. The UI pings /health before enabling actions, so if the backend is down you’ll see a “Retry connection” banner instead of silent failures. AGENT_USE_MOCK=true in backend/.env, so mock evidence flows out-of-the-box; flip it to false plus real credentials when you are ready for a full browser automation run.

  4. First run script

    1. Enter a natural-language command on the Submit step (for example, “List all VM instances in my project”).

    2. Confirm the command on the Review step or go back to edit it.

    3. Manually log into the embedded Google Cloud Console, then use the Login step to check/confirm authentication. By default AGENT_USE_MOCK=true, so local runs use a safe mock agent; set it to false in backend/.env when you are ready for the full browser-use automation.

    4. Click Run task to trigger execution. The Run step polls /api/tasks/{id}/ for status transitions and automatically fetches /api/screenshots/?task_id=... once the task completes.
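
The readiness check from step 2 (the same /health endpoint the UI pings) can also be scripted. The following is a minimal sketch, not code from this repository: it assumes the endpoint answers with HTTP 200 when the backend is ready, and the function names, retry count, and delay are illustrative choices.

```python
import time
import urllib.error
import urllib.request
from collections.abc import Callable


def backend_is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200 (assumed to mean ready)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not ready.
        return False


def wait_for_backend(
    check: Callable[[], bool] = backend_is_healthy,
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> bool:
    """Retry the health check a bounded number of times before giving up."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling wait_for_backend() before starting the frontend gives a script the same "retry connection" behaviour the wizard shows in its banner; passing a custom check makes the retry loop easy to test without a running server.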

Demo Script (Submit → Review → Login → Run)

  1. Start backend

    source .venv/bin/activate
    python backend/manage.py runserver

    Ensure backend/.env has AGENT_USE_MOCK=true so the flow does not require real cloud credentials.

  2. Start frontend

    cd frontend
    npm run dev

    Visit http://localhost:3000.

Supported Real Automations (AGENT_USE_MOCK=false)

The production agent supports a small, safe catalog of cloud-insight tasks, each handled by a planner → structured executor → evidence pipeline. Commands outside this catalog are marked “unsupported” and the Run button stays disabled. See docs/capabilities.md for the full capability matrix.

| Capability | Natural-language cues | Requirements | Returned Data |
|---|---|---|---|
| list_vms | “how many vm instances”, “list the GCE instances” | include project <id> (zone/region optional) | VM rows with status, zone, machine type, IPs |
| list_projects | “list projects”, “show all projects” | none | Project id, display name, state |
| list_service_accounts | “service accounts in project X” | include project <id> | account display name, email, disabled flag |
| check_bigquery_api | “is BigQuery API enabled for X” | include project <id> | API enablement state + timestamp |

Every successful run persists a structured JSON payload (task.result_payload) and the UI renders it in the Run + History views alongside the action timeline and screenshots. Mock mode emits believable data for each capability so demos feel identical to live mode.

Enabling the real agent

  1. Provide Google Cloud credentials and agent env vars:
    AGENT_USE_MOCK=false
    AGENT_MODEL=ChatBrowserUse
    GOOGLE_CLOUD_PROJECT=your-project
    GOOGLE_CLOUD_LOCATION=us-central1
    GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-creds.json
    BROWSER_CDP_PORT=9222
    SECRET_REDACT_HINTS=password,secret,api_key
    
  2. Install Playwright in the backend image (python -m playwright install chromium) – already baked into the provided Dockerfile.
  3. When using Docker Compose, mount the service-account JSON plus persistent browser artifacts:
    services:
      backend:
        volumes:
          - backend-data:/data
          - ./secrets/gcp-creds.json:/secrets/gcp-creds.json:ro
          - ./storage/screenshots:/data/storage/screenshots
          - ./storage/profiles:/data/storage/profiles
  4. From the Next.js wizard, submit one of the supported tasks. The Review step shows the recognized plan type + parameters, and the Run step streams structured action logs, metrics, and real screenshots captured from Chromium.
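
Before switching to the real agent, it can help to confirm that the variables from step 1 are actually set. The sketch below is an illustration only: treating these four variables as mandatory, and treating any value other than "false" as mock mode, are assumptions rather than documented backend behaviour.

```python
from collections.abc import Mapping

# Variables named in step 1; treating all of them as mandatory for a real
# run is an assumption of this sketch.
REQUIRED_FOR_REAL_AGENT = (
    "AGENT_MODEL",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def uses_mock_agent(env: Mapping[str, str]) -> bool:
    """Mock mode is the documented default, so a missing value counts as true."""
    return env.get("AGENT_USE_MOCK", "true").strip().lower() != "false"


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """List the variables that still need values before a real-agent run."""
    if uses_mock_agent(env):
        return []
    return [name for name in REQUIRED_FOR_REAL_AGENT if not env.get(name, "").strip()]
```

Passing os.environ to missing_settings() returns an empty list in mock mode and the names of any unset variables otherwise, so a start-up script can stop early with a clear message.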

Every real run logs planner decisions, task state transitions, and agent actions via apps.governance.audit_logger. Sensitive values (keywords listed in SECRET_REDACT_HINTS or registered secrets) are automatically redacted before persisting logs, comments, or action metadata.
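
The redaction idea can be sketched as follows. This is not the code in apps.governance.audit_logger. It assumes SECRET_REDACT_HINTS is a comma-separated list of keywords, as in the example value above, and that a value is masked when its key contains one of them; the function names and the "[REDACTED]" marker are hypothetical.

```python
from typing import Any

REDACTED = "[REDACTED]"  # hypothetical marker; the real logger may use another


def parse_hints(raw: str) -> tuple[str, ...]:
    """Split a comma-separated hint string such as 'password,secret,api_key'."""
    return tuple(part.strip().lower() for part in raw.split(",") if part.strip())


def redact(value: Any, hints: tuple[str, ...]) -> Any:
    """Return a copy of value with hinted keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: REDACTED if any(hint in str(key).lower() for hint in hints) else redact(item, hints)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, hints) for item in value]
    return value
```

Because redact() builds new containers instead of mutating its input, the original action metadata stays intact in memory while only the masked copy is persisted.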

Each run walks through the following steps:

  1. Submit – Type a natural-language instruction (e.g., “Create a mock VM named demo”), optionally add a business summary, and choose whether to run as a dry-run preview.

  2. Review – Inspect the generated plan steps, see whether policy requires approval, and (if you are an Admin) approve sensitive tasks.

  3. Login – Switch to the embedded browser window, complete the manual Google login (in mock mode you can simply click “I’ve logged in”). The wizard polls /api/sessions/login-status/ and surfaces the session ID + login flag.

  4. Run – Click “Run task”. The frontend calls POST /api/tasks/{id}/execute/, then polls /api/tasks/{id}/ until it transitions from pending → running → completed/failed. When completed, it fetches /api/tasks/{id}/evidence/ to render screenshots, comments, and logs.

  5. Collaborate & Report – Add review comments, refresh audit logs/metrics, and download the CSV report for business stakeholders.
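
The Run step's execute, poll, and fetch-evidence sequence can be sketched outside the UI. The endpoint paths below are the ones named in step 4. Everything else is an assumption for illustration: the "status" field name, the polling interval, and the request callable, which stands in for whatever HTTP client you use and must return the decoded JSON body.

```python
import time
from collections.abc import Callable
from typing import Any

TERMINAL_STATES = {"completed", "failed"}


def run_task(
    task_id: int,
    request: Callable[[str, str], dict[str, Any]],
    poll_seconds: float = 1.0,
    max_polls: int = 120,
) -> dict[str, Any]:
    """Execute a task, poll until it reaches a terminal state, then collect evidence."""
    request("POST", f"/api/tasks/{task_id}/execute/")
    for _ in range(max_polls):
        task = request("GET", f"/api/tasks/{task_id}/")
        # "status" is an assumed field name for the pending/running/completed/failed state.
        if task.get("status") in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
    evidence = None
    if task["status"] == "completed":
        evidence = request("GET", f"/api/tasks/{task_id}/evidence/")
    return {"task": task, "evidence": evidence}
```

Injecting the request function keeps the loop independent of any HTTP library and lets it be exercised with a stub that returns pending, running, and completed in turn.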

Containerized Development

# All commands from repo root
docker compose up --build             # or: make dev
# backend → http://localhost:8000
# frontend → http://localhost:3000

The compose file (and Makefile) default to mock mode, mount sqlite/screenshots into the backend-data volume, and wire the frontend to the backend via NEXT_PUBLIC_API_URL=http://backend:8000. Stop the stack with docker compose down, tail logs with make logs, and run backend tests in-container via make test-backend. When you are ready for live automation, override AGENT_USE_MOCK=false and provide the relevant Vertex/Gemini credentials through docker-compose.override.yml or environment variables.

Running Example Scripts (Optional)

The project includes example scripts for direct browser automation without the full desktop application. The main example script is examples/gcp_manual_login_persistent_new17.py.

Prerequisites:

  • Ensure you have installed dependencies using uv sync --all-extras (or uv sync if you prefer)
  • Credentials Setup:
    • The actual Google Cloud service account credentials are stored in the JSON file (e.g., nice-script-404403-9d12fbf6a127.json in the project root)
    • The .env file does NOT store credentials - it only contains a path reference to the JSON file
    • Configure your .env file to point to the credentials JSON file:
      GOOGLE_CLOUD_PROJECT=your-project-id
      GOOGLE_CLOUD_LOCATION=us-central1
      GOOGLE_APPLICATION_CREDENTIALS=/home/ubuntu/SE/All_project/BROWSER_USE_TEST/cloud_console_automation_agent/nice-script-404403-9d12fbf6a127.json
    • Important: Set GOOGLE_APPLICATION_CREDENTIALS to the absolute path of your own credentials JSON file (the path shown above is only an example). The credentials themselves remain in the JSON file, not in .env.

Run the example script:

# From the project root directory
uv run python examples/gcp_manual_login_persistent_new17.py

What the script does:

  1. Opens a browser and navigates to GCP Console sign-in page
  2. Waits for you to manually log in (handles 2FA, security checks)
  3. After login, accepts natural language commands interactively
  4. Executes GCP console tasks using AI agents (Gemini 2.5 Flash)
  5. Maintains a persistent browser session across multiple commands
  6. Captures screenshots and logs all actions in runs/ directory
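
The interactive part of this flow (steps 3 to 5) boils down to a read-execute loop. The sketch below is not taken from the example script: only the prompt text matches the usage shown in this README, and run_task is a placeholder for the call that hands a command to the agent.

```python
from collections.abc import Callable, Iterable

PROMPT = "Next GCP action (or 'exit'): "


def command_loop(lines: Iterable[str], run_task: Callable[[str], str]) -> list[str]:
    """Feed each non-empty command to run_task until 'exit' is entered."""
    results: list[str] = []
    for line in lines:
        command = line.strip()
        if command.lower() == "exit":
            break
        if command:  # ignore blank input instead of sending it to the agent
            results.append(run_task(command))
    return results
```

For interactive use, the lines can come from the terminal, for example command_loop(iter(lambda: input(PROMPT), None), run_task), while tests can pass a plain list of strings.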

Example usage:

$ uv run python examples/gcp_manual_login_persistent_new17.py
[Agent navigates to GCP console...]

Log in manually, then press Enter …
[Complete login in browser]
<Enter>

Next GCP action (or 'exit'): List all VM instances
[Agent executes task...]

Next GCP action (or 'exit'): Create a new storage bucket
[Agent executes task...]

Next GCP action (or 'exit'): exit

Note: Use uv run to ensure you're using the correct Python environment with all dependencies installed. Alternatively, you can activate the virtual environment first:

source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python examples/gcp_manual_login_persistent_new17.py

For detailed documentation about this script, see examples/README_gcp_manual_login_persistent_new17.md.


Project Structure

CC_Manager-/
├── backend/                        # Django backend server
│   ├── config/                    # Django configuration
│   │   ├── settings/             # Environment-specific settings
│   │   ├── urls.py               # URL routing
│   │   └── asgi.py               # WebSocket support
│   ├── apps/                     # Django applications
│   │   ├── agents/               # Agent lifecycle management
│   │   ├── tasks/                # Task execution & queuing
│   │   ├── chat/                 # Chat history storage
│   │   ├── sessions/             # Session & cookie management
│   │   └── screenshots/          # Screenshot capture & storage
│   ├── core/                     # Shared utilities
│   ├── storage/                  # File storage (cookies, screenshots)
│   ├── tests/                    # Backend tests
│   ├── manage.py                 # Django management
│   └── requirements.txt          # Python dependencies
│
├── frontend/                      # Electron + Next.js frontend
│   ├── main/                     # Electron main process
│   │   ├── index.ts              # Application entry point
│   │   ├── window.ts             # Window management
│   │   ├── browser-view.ts       # Embedded browser controller
│   │   ├── backend-launcher.ts   # Backend process manager
│   │   └── ipc-handlers.ts       # IPC communication
│   ├── renderer/                 # Next.js renderer process
│   │   ├── app/                  # Next.js App Router pages
│   │   ├── components/           # React components
│   │   │   ├── browser/          # Browser view components
│   │   │   ├── chat/             # Chat interface
│   │   │   ├── controls/         # Play/pause controls
│   │   │   └── history/          # Task history
│   │   ├── hooks/                # Custom React hooks
│   │   ├── stores/               # Zustand state stores
│   │   ├── services/             # API & WebSocket clients
│   │   └── types/                # TypeScript types
│   ├── package.json              # Node dependencies
│   └── electron-builder.yml      # Build configuration
│
├── browser_use/                   # browser-use agent framework
│   ├── agent/                    # Agent orchestration
│   ├── browser/                  # Browser session management
│   ├── dom/                      # DOM extraction & analysis
│   ├── llm/                      # LLM integration layer
│   └── tools/                    # Action registry
│
├── docs/                          # Documentation
│   ├── backend_design_spec.md    # Backend architecture
│   ├── frontend_design_spec.md   # Frontend architecture
│   └── testing_plan.md           # Testing strategy
│
├── tests/                         # Integration tests
├── examples/                      # Usage examples
├── docker/                        # Docker configuration
│
├── README.md                      # This file
├── CLAUDE.md                      # Instructions for Claude Code
├── LICENSE                        # MIT License
└── pyproject.toml                # Python project config

User Story Coverage

The running list of specification epics / stories and their implementation state lives in docs/user_story_coverage.md. Update that file whenever a story’s status changes or when the missing specification artifacts are committed.


Development

Backend Development

cd backend

# Activate virtual environment
source ../.venv/bin/activate  # the Quick Start creates .venv in the repo root

# Run development server with auto-reload
uvicorn config.asgi:application --reload --port 8000

# Run tests
python manage.py test

# Create database migrations
python manage.py makemigrations
python manage.py migrate

# Access admin panel
python manage.py createsuperuser
# Visit http://localhost:8000/admin/

Frontend Development

cd frontend

# Start Next.js dev server (hot reload)
npm run dev:next

# Start Electron (restart on main process changes)
npm run dev:electron

# Run concurrently (both at once)
npm run dev

# Build for production
npm run build
npm run electron:build

# Type checking
npm run type-check

# Linting
npm run lint

Development Workflow

  1. Backend changes: Edit Python files, server auto-reloads
  2. Frontend UI: Edit renderer/ components, hot reload active
  3. Electron main: Edit main/ files, restart Electron app
  4. Database changes: Create migrations, apply with migrate
  5. API changes: Update both backend endpoints and frontend services

Documentation

Detailed documentation is available in the /docs directory.

Additional resources:

  • CLAUDE.md: Instructions for working with this codebase using Claude Code
  • Examples: Usage examples and code samples
  • Browser-Use Docs: browser-use framework documentation

Testing

Backend Tests

cd backend

# Run all tests
python manage.py test

# Run specific app tests
python manage.py test apps.agents
python manage.py test apps.tasks

# Run with coverage
pip install coverage
coverage run --source='.' manage.py test
coverage report

Frontend Tests

cd frontend

# Run unit tests (when implemented)
npm test

# Run E2E tests (when implemented)
npm run test:e2e

# Type checking
npm run type-check

Integration Tests

# From repository root
cd tests/

# Run integration test suite (when implemented)
pytest -v

Deployment

Desktop Application Build

Build for current platform:

cd frontend
npm run build
npm run electron:build

Platform-specific builds:

npm run electron:build:mac     # macOS (.dmg)
npm run electron:build:win     # Windows (.exe)
npm run electron:build:linux   # Linux (.AppImage, .deb)

Output location:

  • Windows: frontend/dist/win-unpacked/ and .exe installer
  • macOS: frontend/dist/mac/ and .dmg disk image
  • Linux: frontend/dist/linux-unpacked/ and .AppImage

Distribution

The built application is self-contained:

  • Includes Django backend bundled
  • Includes Python runtime
  • Includes Node.js/Electron runtime
  • Requires only Chrome/Chromium on target system

Installation:

  1. Download platform-specific installer
  2. Run installer (may require admin/sudo)
  3. Launch "Cloud Console Manager"
  4. Configure Google Cloud credentials on first run
  5. Log into GCP Console
  6. Start automating!

Troubleshooting

Backend Issues

Backend won't start:

  • Check Python version: python3 --version (needs 3.11+)
  • Verify dependencies: pip install -r requirements.txt
  • Check port 8000 not in use: lsof -i :8000 (macOS/Linux)
  • Review backend logs for errors
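
lsof is not available on Windows, so a small cross-platform port check can be useful here. This is a minimal sketch using only the standard library; the function name and the 0.5-second timeout are illustrative choices.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0
```

port_in_use(8000) covers the backend port; the same call with 9222 covers the CDP port mentioned under Common Issues.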

Agent execution fails:

  • Verify Google Cloud credentials in .env
  • Ensure Vertex AI API is enabled in GCP project
  • Check service account has necessary permissions
  • Confirm AGENT_MODEL is valid (default: gemini-2.5-flash)

Database errors:

  • Delete db.sqlite3 and run migrations again
  • Check file permissions on database file
  • For SQLite locked errors, ensure only one backend instance running

Frontend Issues

Electron won't start:

  • Check Node.js version: node --version (needs 18+)
  • Clear node_modules and reinstall: rm -rf node_modules && npm install
  • Check backend is running on port 8000
  • Review Electron main process logs in terminal

Browser view not showing:

  • Ensure Chrome/Chromium installed on system
  • Check IPC communication logs
  • Verify browser-view bounds calculation
  • Try restarting application

WebSocket connection fails:

  • Confirm backend WebSocket endpoint: ws://localhost:8000/ws/agent/
  • Check firewall not blocking WebSocket connections
  • Verify NEXT_PUBLIC_WS_URL in .env.local
  • Review browser console for errors

Common Issues

"Permission denied" errors in GCP:

  • Service account needs proper IAM roles
  • Check project-level permissions
  • Some operations require Owner/Editor role

Browser automation not working:

  • GCP Console UI may have changed (agents adapt but may need updates)
  • Clear browser profile: rm -rf backend/storage/profiles/
  • Check CDP port not in use: lsof -i :9222

Login session expires:

  • Manually log in again when prompted
  • Check whether the cookie encryption key changed (DJANGO_SECRET_KEY)
  • Clear stored cookies: rm -rf backend/storage/cookies/

Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes:
    • Backend: Follow Django best practices, add tests
    • Frontend: Follow React/TypeScript conventions, add types
    • Documentation: Update relevant .md files
  4. Test thoroughly:
    • Run backend tests: python manage.py test
    • Run frontend type check: npm run type-check
    • Test manually in application
  5. Commit changes: git commit -m "Add amazing feature"
  6. Push to branch: git push origin feature/amazing-feature
  7. Open Pull Request

Code Style

Python (Backend):

  • Use tabs for indentation (not spaces)
  • Type hints with modern syntax (str | None, not Optional[str])
  • Async/await for concurrent operations
  • Django conventions for models/views/serializers

TypeScript (Frontend):

  • Strict mode enabled
  • Explicit types for function parameters/returns
  • React hooks for state/effects
  • Zustand for global state

See CLAUDE.md for detailed development guidelines.


Roadmap

v1.0 (Current)

  • ✅ Basic agent execution
  • ✅ Embedded browser view
  • ✅ Chat interface
  • ✅ Task history
  • ✅ Cookie persistence

v1.1 (Planned)

  • Multi-cloud support (AWS, Azure)
  • Task templates/presets
  • Video recording of sessions
  • Export chat/history to PDF
  • Keyboard shortcuts

v2.0 (Future)

  • Multi-user support
  • Cloud sync of task history
  • Browser-Use Cloud integration
  • Custom agent prompts
  • Plugin system

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • browser-use: AI browser automation framework
  • Django: Python web framework
  • Electron: Desktop application framework
  • Next.js: React framework
  • Google Cloud: Vertex AI and Gemini LLM

Made with ❤️ for GCP automation
