Build intelligent systems that can think, decide and act independently.
A fully-featured Windows desktop AI agent that controls a real humanoid robot head over Bluetooth.
Voice conversations · Live lip-sync · Robot animations · OS automation · Web search · Email · Vision · Screen share · Deep research · File control · ADB mobile link · and more
Brutus has a companion mobile app: a full-featured Android AI assistant with its own robot BLE control, Gemini Live voice, and 25+ tools.
Brutus: final build with glowing eyes & servo face
Face close-up: eye & mouth servos
Early assembly: servo layout & wiring
Completed assembly: Arduino + HM-10 + servos
Windows desktop: Brutus command center
Windows desktop: Brutus executing commands
Brutus is two things in one:
- A Windows Desktop AI Agent: built with Electron + React, powered by open-source LLMs via Groq (LLaMA 3) and local inference via Xenova Transformers. It offers real-time voice conversations through a fully open-source STT → LLM → TTS pipeline, plus OS automation, vision, screen control, emails, deep research, and 40+ tools, all through natural speech or text.
- A Physical Robot Head: an Arduino-powered humanoid face with 4 servos (eyes X/Y, eyelid, mouth), an LED, and a sound sensor. The desktop app drives the robot's expressions, lip-syncs its mouth to the TTS voice output, and triggers named animation sequences, all over Bluetooth Low Energy.
When Brutus talks to you, his robot face moves its mouth in sync, changes expressions based on the emotion in its speech, and nods, winks, or laughs on command.
Looking for the Android / mobile version? See the Brutus Mobile App.
| Metric | Value |
|---|---|
| AI Tools | 40+ callable tools |
| Robot Animations | 20 (10 macros + 10 tricks) |
| Expressions | 6 (+ intensity slider 0-100%) |
| Servos | 4 × SG90 (eyes X/Y, eyelid, mouth) |
| BLE Commands | 11 command types |
| NPM Dependencies | 50+ packages |
| AI Providers | Groq (LLaMA 3), HuggingFace, Tavily, local Xenova |
| Voice Stack | Whisper STT + Meta MMS / Kokoro TTS (open-source) |
| Lines of Code (approx.) | 15,000+ |
| Architecture | Electron (main) + React 19 (renderer) + IPC bridge |
| Vector DB | LanceDB (embedded, local-first) |
| Mobile Link | ADB over Wi-Fi (Android deep control) |
| Mobile Companion | Brutus Android App |
You speak a command
             │
             ▼
┌──────────────────────────┐
│ Whisper STT (local/API)  │ ◄── speech → text transcription
└────────────┬─────────────┘
             │
             ▼
┌──────────────────┐      ┌───────────────────┐
│  LLaMA 3 / Groq  │ ◄──► │  Vision / Screen  │
│  (reasoning LLM) │      │  (screenshots)    │
└────────┬─────────┘      └───────────────────┘
         │
   ┌─────┴───────────────────────────┐
   │                                 │
   ▼                                 ▼
┌──────────────────────┐  ┌──────────────────────┐
│  Meta MMS / Kokoro   │  │  Tool Calls (40+)    │
│  TTS (voice output)  │  │  (OS, web, files,    │
└──────────┬───────────┘  │  ADB, email, etc.)   │
           │              └──────────────────────┘
           ▼
┌────────────────────────┐
│ Robot (BLE via HM-10)  │
│ lip-sync + emotion     │
│ + LED patterns         │
└────────────────────────┘
Brutus is designed to run a fully open-source, self-hosted voice pipeline, with no proprietary voice APIs required. The architecture is modular, so each component can be swapped or fine-tuned independently.
| Model | Source | Why It Fits |
|---|---|---|
| Meta MMS-TTS (VITS) | facebook/mms-tts | Facebook's Massively Multilingual Speech: VITS-based, 1100+ languages, fine-tuneable via HuggingFace Trainer |
| Kokoro-82M | hexgrad/Kokoro-82M | 82M-param open-weight TTS, Apache 2.0, near-commercial quality, runs on CPU |
| StyleTTS 2 | yl4579/StyleTTS2 | Human-level TTS via style diffusion: zero-shot speaker cloning, emotion control |
| Coqui XTTS-v2 | coqui-ai/TTS | Voice cloning from 6s reference clip, 17 languages, actively maintained forks |
| Piper TTS | rhasspy/piper | Ultra-fast local neural TTS, runs offline on low-end hardware, great for real-time lip-sync |
| Model | Source | Why It Fits |
|---|---|---|
| OpenAI Whisper | openai/whisper | MIT-licensed, multilingual, runs fully local, from whisper-base to whisper-large-v3 |
| Whisper.cpp | ggerganov/whisper.cpp | C++ port of Whisper: extremely fast on CPU, ideal for real-time desktop use |
| Meta MMS-ASR | facebook/mms-1b-fl102 | Wav2Vec2-based ASR for 100+ languages, fine-tuneable with adapter modules |
| WhisperSpeech | WhisperSpeech/WhisperSpeech | Inverted Whisper: both STT and TTS from the same architecture |
The voice pipeline is built to be personalized. To train a custom "Brutus voice":
# 1. Fine-tune Meta MMS-TTS on your voice dataset using HuggingFace
git clone https://github.com/ylacombe/finetune-hf-vits
cd finetune-hf-vits
pip install -r requirements.txt
# 2. Prepare your audio dataset (10-30 min of clean speech recommended)
# Dataset format: audio files + transcript CSV
# 3. Launch training (check the script name and flags against the finetune-hf-vits README before running)
python train.py \
--model_name_or_path "facebook/mms-tts-eng" \
--dataset_path "./your_voice_dataset" \
--output_dir "./brutus-voice-model"
# 4. Load the fine-tuned model in Brutus via HuggingFace Inference
The @xenova/transformers package already bundled in Brutus can load fine-tuned Whisper and MMS models directly in Node.js, so no Python is needed at runtime.
When the LLM produces a response, Brutus automatically:
| AI Status | Voice Action | Robot Behavior |
|---|---|---|
| Listening | Whisper STT active, VAD gating | Eyes center, LED solid |
| Thinking | LLM inferencing via Groq | Eyes drift up-left, LED pulse |
| Speaking | MMS/Kokoro TTS → audio chunks → lip-sync | Mouth angle from audio amplitude |
| Idle | Voice pipeline paused | Eyes center, LED pulse |
| Error | TTS error tone | Sad expression, LED fast blink |
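The Speaking row derives the mouth angle from audio amplitude. The sketch below shows one way that mapping could work. It assumes float samples in [-1, 1] and uses hypothetical closed/open angles of 90 and 140, since the real servo calibration is not documented in this README. `rms`, `mouthAngleForChunk`, and `mouthCommand` are illustrative names, not functions from the Brutus codebase.

```typescript
// Hypothetical calibration values; the real servo limits are not given in this README.
const MOUTH_CLOSED = 90;
const MOUTH_OPEN = 140;

/** Root-mean-square amplitude of a chunk of samples in [-1, 1]. */
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

/** Map a chunk's loudness to a mouth angle, clamped to the assumed servo range. */
function mouthAngleForChunk(samples: Float32Array, gain: number = 2): number {
  const level = Math.min(1, rms(samples) * gain);
  return Math.round(MOUTH_CLOSED + (MOUTH_OPEN - MOUTH_CLOSED) * level);
}

/** Format the angle as a newline-terminated M<a> lip-sync command. */
function mouthCommand(samples: Float32Array): string {
  return `M${mouthAngleForChunk(samples)}\n`;
}
```

Silence maps to the closed angle and loud chunks saturate at the open angle. The `gain` parameter is an assumption that lets quiet TTS voices still open the mouth fully.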
| Feature | Description |
|---|---|
| Real-time voice | Open-source STT → LLaMA 3 reasoning → open-source TTS pipeline |
| Local inference | Xenova Transformers runs Whisper and small models in-process |
| Groq fast inference | LLaMA 3 / Mixtral via Groq for ultra-low latency responses |
| Text fallback | Full text chat interface when voice isn't practical |
| Live transcripts | See what you and Brutus are saying in real time |
| Chat history | Persisted locally via electron-store |
| Context awareness | Maintains conversation context across sessions |
| Barge-in | Interrupt Brutus mid-sentence and the TTS stops immediately |
| Feature | Description |
|---|---|
| Screenshot vision | Brutus captures your screen and understands what it sees via multimodal AI |
| Screen Peeler (OCR) | Instantly extract text from any visible UI element using Tesseract.js |
| Ghost Coder | Inline IDE generation triggered by Ctrl+Alt+Space |
| Gallery analysis | Point at any local image and Brutus describes and reasons about it |
- Tailwind CSS v4 with a Neon Emerald aesthetic
- Framer Motion + GSAP for cinematic UI animations
- Three.js + React Three Fiber for 3D neural visualizations
- React 19 component-based frontend
- Floating desktop widgets that live on top of your workflow
- Dark-mode map via Leaflet + OpenStreetMap
- Syntax-highlighted Monaco Editor for code output
- XTerm.js embedded terminal for live shell output
| Capability | Brutus AI | ChatGPT Desktop | Copilot | Standard Chatbots |
|---|---|---|---|---|
| Physical robot face w/ lip-sync | β | β | β | β |
| Emotion-driven servo expressions | β | β | β | β |
| 20 named animation macros | β | β | β | β |
| Fully open-source voice pipeline | β | β | β | β |
| Fine-tuneable custom voice | β | β | β | β |
| Real OS file & app control | β | β | β | |
| Ghost typing / tap automation | β | β | β | β |
| ADB mobile deep link | β | β | β | β |
| Screen vision (live OCR) | β | β | β | β |
| Gmail read + compose | β | β | β | β |
| Deep multi-source research | β | β | β | |
| RAG over your own documents | β | β | β | β |
| LanceDB local vector store | β | β | β | β |
| Biometric face-lock vault | β | β | β | β |
| Fully open-source & self-hostable | β | β | β | β |
| Bring-your-own API keys | β | β | β | β |
| Android companion app | β | β | β | β |
⚠️ = partial / requires additional setup
Brutus has a physical humanoid face that brings the AI to life. The robot head uses 4 micro servos, an LED, a sound sensor, and an HM-10 BLE module, all controlled by an Arduino Uno.
| Component | Qty | Pin | Purpose |
|---|---|---|---|
| Arduino Uno (or Nano) | 1 | n/a | Main controller |
| HM-10 BLE Module | 1 | D10 (RX), D11 (TX) | Wireless communication with desktop |
| SG90 Micro Servo (Eye L/R) | 1 | D3 | Horizontal eye movement |
| SG90 Micro Servo (Eye U/D) | 1 | D5 | Vertical eye movement |
| SG90 Micro Servo (Eyelid) | 1 | D6 | Eyelid open/close + blink |
| SG90 Micro Servo (Mouth) | 1 | D9 | Jaw / lip-sync |
| LED (any color) | 1 | D8 | Status indicator / emotion display |
| Sound Sensor (analog) | 1 | A0 | Mic for idle mode autonomous lip-sync |
| 5V Power Supply (2A+) | 1 | n/a | Power for servos (USB alone isn't enough) |
Estimated Build Cost: ~$15-25 USD (Arduino clone + 4× SG90 + HM-10 + LED + misc)
Custom PCB layout: Arduino + HM-10 BLE + servo headers + power rails
3D Model: The full Brutus head assembly is available as a .glb file you can inspect interactively on GitHub: View Brutus-1.glb in 3D on GitHub
(GitHub renders .glb files with a built-in 3D viewer: pan, rotate, and zoom the full head assembly directly in the browser)
                  ┌──────────────────────┐
                  │     Arduino Uno      │
                  │                      │
HM-10 TXD   ────► │ D10 (SoftSerial RX)  │
HM-10 RXD   ◄──── │ D11 (SoftSerial TX)  │  ← use 5V→3.3V voltage divider!
                  │                      │
Eye L/R Servo ◄── │ D3  (PWM)            │
Eye U/D Servo ◄── │ D5  (PWM)            │
Eyelid Servo  ◄── │ D6  (PWM)            │
Mouth Servo   ◄── │ D9  (PWM)            │
                  │                      │
LED           ◄── │ D8  (Digital)        │
Sound Sensor  ──► │ A0  (Analog)         │
                  │                      │
5V (external) ──► │ 5V                   │
GND ───────────── │ GND (common ground)  │
                  └──────────────────────┘
⚠️ Important: The HM-10's RXD pin is 3.3V logic. Use a voltage divider (1kΩ + 2kΩ) between Arduino D11 (5V TX) and HM-10 RXD. TXD → Arduino D10 is fine without a divider.
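The arithmetic behind the divider is easy to check. The README gives the resistor values but not their orientation, so this sketch assumes the 1 kΩ resistor sits between D11 and RXD and the 2 kΩ resistor between RXD and GND. `dividerOutput` is a hypothetical helper that applies the standard formula Vout = Vin × R2 / (R1 + R2).

```typescript
/**
 * Output voltage of a two-resistor divider.
 * rTop is between the source and the tap, rBottom between the tap and ground.
 */
function dividerOutput(vin: number, rTop: number, rBottom: number): number {
  return (vin * rBottom) / (rTop + rBottom);
}

// Assumed orientation: 1 kΩ on top (D11 side), 2 kΩ on the bottom (GND side).
const rxdVoltage = dividerOutput(5, 1000, 2000);
console.log(rxdVoltage.toFixed(2)); // prints "3.33"
```

Swapping the two resistors would give about 1.67 V instead, which may be too low for the module to read reliably as a logic high, so the orientation matters.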
The desktop app communicates with the robot over BLE GATT serial (UUID 0000FFE1). Commands are newline-terminated ASCII:
| Command | Description | Example |
|---|---|---|
| E<n> | Set expression (0-5) | E0 = Happy |
| E<n>,<i> | Expression with intensity (0-100) | E1,50 = slightly angry |
| M<a> | Mouth angle (0-180) for lip-sync | M140 |
| L<lr>,<ud> | Eye look-at (both axes, 0-180) | L60,70 |
| B | Trigger a blink | B |
| I<0\|1> | Idle fallback on/off | I1 |
| S<0\|1> | Freeze mode (disable all autonomous) | S1 |
| A<n> | Play animation macro (0-9) | A3 = Wink |
| W<n> | Play movement trick (0-9) | W5 = Jaw Drop |
| C<n> | LED pattern (0=off, 1=solid, 2=pulse, 3=fast) | C2 |
| H | Heartbeat: replies OK\n | H |
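Because every command is a short newline-terminated ASCII line, the desktop side can build them with a few small helpers. The sketch below is illustrative only. The names `robotCommand`, `checkRange`, and `toBytes` are hypothetical, and the range checks mirror the limits in the command table rather than anything confirmed from the firmware.

```typescript
/** Reject values outside the documented integer range for a command argument. */
function checkRange(label: string, value: number, min: number, max: number): void {
  if (Math.floor(value) !== value || value < min || value > max) {
    throw new RangeError(`${label} must be an integer in ${min}..${max}, got ${value}`);
  }
}

// Hypothetical builders for a subset of the commands in the table.
const robotCommand = {
  expression(index: number, intensity?: number): string {
    checkRange("expression", index, 0, 5);
    if (intensity === undefined) return `E${index}\n`;
    checkRange("intensity", intensity, 0, 100);
    return `E${index},${intensity}\n`;
  },
  mouth(angle: number): string {
    checkRange("mouth angle", angle, 0, 180);
    return `M${angle}\n`;
  },
  lookAt(lr: number, ud: number): string {
    checkRange("eye L/R", lr, 0, 180);
    checkRange("eye U/D", ud, 0, 180);
    return `L${lr},${ud}\n`;
  },
  led(pattern: number): string {
    checkRange("LED pattern", pattern, 0, 3);
    return `C${pattern}\n`;
  },
  blink(): string {
    return "B\n";
  },
  heartbeat(): string {
    return "H\n";
  },
};

/** Convert an ASCII command to bytes, ready for a BLE characteristic write. */
function toBytes(command: string): Uint8Array {
  const bytes = new Uint8Array(command.length);
  for (let i = 0; i < command.length; i++) {
    bytes[i] = command.charCodeAt(i);
  }
  return bytes;
}
```

For example, `robotCommand.expression(1, 50)` yields the `E1,50` line from the table. Validating on the desktop side keeps malformed values from ever reaching the serial link.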
| Index | Expression | Description |
|---|---|---|
| 0 | Happy | Relaxed eyes, slight smile |
| 1 | Angry | Squinted eyes, jaw clenched |
| 2 | Sad | Droopy eyes, averted gaze, frown |
| 3 | Thinking | Eyes up-left, neutral mouth |
| 4 | Sleepy | Nearly closed eyes, relaxed |
| 5 | Surprised | Max wide eyes + mouth open |
Each expression can be dialed from 0% (neutral) to 100% (full) using the intensity parameter. The formula: servo_target = 90 + (preset - 90) × intensity / 100.
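The intensity formula is a linear blend between the neutral angle (90) and the expression's preset angle. A small worked sketch follows. `servoTarget` is a hypothetical name and the preset of 150 degrees is an invented example, since the actual presets live in the firmware and are not listed here. The clamping and rounding are also assumptions.

```typescript
/** servo_target = 90 + (preset - 90) * intensity / 100, with intensity clamped to 0..100. */
function servoTarget(preset: number, intensity: number): number {
  const clamped = Math.min(100, Math.max(0, intensity));
  return Math.round(90 + ((preset - 90) * clamped) / 100);
}

// Hypothetical preset of 150 degrees for one servo:
console.log(servoTarget(150, 0));   // 90  (neutral)
console.log(servoTarget(150, 50));  // 120 (halfway)
console.log(servoTarget(150, 100)); // 150 (full expression)
```

Presets below 90 work the same way in the other direction. A preset of 30 at 50% intensity gives 60.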
10 pre-baked multi-step animation sequences stored on the Arduino. Each runs as a non-blocking keyframe sequence, so the robot stays responsive to new commands while animating.
| Index | Name | What It Does |
|---|---|---|
| A0 | Nod | Head bobs up/down (yes) |
| A1 | Shake | Head turns left/right (no) |
| A2 | Look Around | Dramatic room scan |
| A3 | Wink | Quick eyelid close-open with smile |
| A4 | Yawn | Big mouth, sleepy eyes, slow close |
| A5 | Laugh | Rapid mouth flutter with happy eyes |
| A6 | Eye Roll | Dramatic circular eye sweep |
| A7 | Mouth Cycle | Rhythmic open-close |
| A8 | Eye Cycle | Eyelids open-close rhythmically |
| A9 | Wiggle | Playful side-to-side jiggle |
| Index | Name | What It Does |
|---|---|---|
| W0 | Crazy Eyes | Rapid random eye darting |
| W1 | Chatter | Teeth-chattering mouth |
| W2 | Slow Scan | Dramatic slow left-to-right pan |
| W3 | Peek-a-boo | Eyes shut tight → surprise pop open |
| W4 | Double Blink | Two quick blinks |
| W5 | Jaw Drop | Dramatic slow mouth open + shock face |
| W6 | Drowsy | Drift to sleep, then snap awake |
| W7 | Side Eye | Suspicious side glance |
| W8 | Happy Bounce | Excited bouncing motion |
| W9 | Confused | Uncertain tilting and looking around |
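Both tables map one-to-one onto the A<n> and W<n> commands, so a tool handler that receives an animation name only needs a lookup. The sketch below is minimal and assumes the handler is given the display name used in the tables. `animationCommand` is a hypothetical helper, not a function from the Brutus codebase.

```typescript
// Names in table order; the array index is the command argument.
const MACROS: string[] = [
  "Nod", "Shake", "Look Around", "Wink", "Yawn",
  "Laugh", "Eye Roll", "Mouth Cycle", "Eye Cycle", "Wiggle",
];
const TRICKS: string[] = [
  "Crazy Eyes", "Chatter", "Slow Scan", "Peek-a-boo", "Double Blink",
  "Jaw Drop", "Drowsy", "Side Eye", "Happy Bounce", "Confused",
];

/** Case-insensitive position of a name in a list, or -1 if absent. */
function indexOfName(names: string[], wanted: string): number {
  return names.map((n) => n.toLowerCase()).indexOf(wanted);
}

/** Resolve an animation name to its BLE command, or null if it is unknown. */
function animationCommand(name: string): string | null {
  const wanted = name.trim().toLowerCase();
  const macro = indexOfName(MACROS, wanted);
  if (macro !== -1) return `A${macro}\n`;
  const trick = indexOfName(TRICKS, wanted);
  if (trick !== -1) return `W${trick}\n`;
  return null;
}
```

Returning `null` for unknown names lets the caller report the failure back to the LLM instead of sending a malformed command to the robot.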
The LLM can trigger robot animations through natural speech via tool calls:
"Brutus, nod your head" → plays Nod animation
"Wink at them" → plays Wink animation
"Do crazy eyes" → plays Crazy Eyes trick
"Act confused" → plays Confused trick
brutus-ai/
├── src/
│   ├── main/                        # Electron Main Process (Node.js)
│   │   ├── index.ts                 # App entry, IPC registration, BLE manager
│   │   ├── handlers/                # IPC tool handlers (PhantomControl, ScreenPeeler, SmartDropZone)
│   │   ├── logic/                   # Core logic modules (40+ tools)
│   │   │   ├── adb-manager.ts       # ADB over Wi-Fi mobile control
│   │   │   ├── ghost-control.ts     # Phantom typing & keyboard injection
│   │   │   ├── telekinesis.ts       # Desktop window management
│   │   │   ├── reality-hacker.ts    # Puppeteer DOM manipulation
│   │   │   ├── permanent-memory.ts  # LanceDB vector memory
│   │   │   ├── gmail-manager.ts     # Gmail read/compose
│   │   │   ├── file-ops.ts          # File system operations
│   │   │   └── ...                  # 20+ more logic modules
│   │   └── auto/
│   │       ├── website-builder.ts   # Agentic GSAP/Tailwind site gen
│   │       └── widget-manager.ts    # Floating desktop widget spawner
│   ├── preload/                     # Context isolation + IPC bridge
│   └── renderer/                    # React 19 frontend
│       └── src/
│           ├── components/          # UI components (widgets, visualizations)
│           ├── pages/               # Feature screens
│           ├── store/               # Zustand global state
│           └── styles/              # Tailwind v4 + custom CSS
├── assets/                          # Screenshots, build photos, Arduino files
│   ├── Display_Emotion.ino          # Arduino firmware for robot face
│   └── eyes.h                       # Eye servo constants
├── resources/                       # App icons
├── .env.example                     # API key template
├── electron.vite.config.ts          # Vite split-process config
└── electron-builder.yml             # Windows .exe packaging config
| Layer | Technology |
|---|---|
| Desktop runtime | Electron 41.x + electron-vite |
| Frontend | React 19 + Tailwind CSS v4 |
| State | Zustand |
| Animations | Framer Motion + GSAP 3 |
| 3D visuals | Three.js + React Three Fiber |
| LLM reasoning | Groq SDK (LLaMA 3 / Mixtral) |
| Local inference | @xenova/transformers (Whisper, small LMs) |
| TTS (voice out) | Meta MMS-TTS / Kokoro-82M / StyleTTS2 via @huggingface/inference |
| STT (voice in) | OpenAI Whisper via @xenova/transformers or whisper.cpp |
| Image generation | @huggingface/inference (SDXL / Stable Diffusion) |
| Vector DB | LanceDB (embedded, local-first) |
| Web automation | Puppeteer + puppeteer-extra-stealth |
| OS automation | Nut.js (mouse, keyboard, coordinates) |
| OCR | Tesseract.js (eng.traineddata) |
| Code editor | Monaco Editor |
| Terminal | XTerm.js |
| Maps | Leaflet + React Leaflet (OpenStreetMap) |
| Charts | Recharts |
| Auth / Google | @google-cloud/local-auth + googleapis |
| Notion | @notionhq/client |
| Web search | @tavily/core |
| Face recognition | face-api.js |
| BLE (robot) | Node.js BLE via serial bridge to HM-10 |
Brutus connects to your Android phone wirelessly using ADB over Wi-Fi (TCP/IP). You only need a USB cable once for first-time setup.
Prerequisites: Your PC and phone must be on the same Wi-Fi network, and adb must be installed.
Download Android Platform Tools and add the extracted folder to your Windows PATH.
- Go to Settings → About Phone
- Tap Build Number 7 times rapidly
- Go to Settings → Developer Options → Enable USB Debugging
Plug your phone in. Approve the "Allow USB debugging?" dialog on your phone.
adb tcpip 5555
You should see: restarting in TCP mode port: 5555
Go to Settings → Wi-Fi → tap your network → IP Address (e.g. 192.168.1.47)
- Unplug USB
- Open Brutus → PHONE tab → NEW DEVICE
- Enter your phone's IP and port 5555
- Click ESTABLISH CONNECTION
Brutus will remember and auto-reconnect on next launch.
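Under the hood, a wireless connection like this comes down to running `adb connect <ip>:<port>`. The sketch below shows how the IP and port from the form could be validated and turned into arguments for that command. It is an assumption about the general approach, since the actual logic in adb-manager.ts is not shown in this README, and `adbConnectArgs` is a hypothetical name.

```typescript
/** Build the argument list for `adb connect <ip>:<port>` after basic validation. */
function adbConnectArgs(ip: string, port: number = 5555): string[] {
  const octets = ip.trim().split(".");
  const validIp =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!validIp) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${port}`);
  }
  return ["connect", `${octets.join(".")}:${port}`];
}

console.log(adbConnectArgs("192.168.1.47").join(" ")); // prints "connect 192.168.1.47:5555"
```

In an Electron main process, the returned array could be passed to Node's `child_process.execFile("adb", args, callback)`. Passing arguments as an array rather than a shell string avoids quoting problems with user-entered input.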
| Problem | Fix |
|---|---|
| "Connection refused" | The phone is not in TCP mode yet: run adb tcpip 5555 over USB first |
| Can't find adb | Download Platform Tools, extract, add folder to PATH |
| IP keeps changing | Set a static IP in your phone's Wi-Fi settings |
| Phone not detected | Try a different USB cable (data cable, not charge-only) |
- Node.js 18+
- Windows 10 / 11
- A Groq API key (free) from Groq Console for LLaMA 3
- (For robot) Arduino IDE + hardware listed above
git clone https://github.com/Aditya060806/Brutus.git
cd Brutus
npm install
Copy the template and fill in your keys:
cp .env.example .env
Minimum required in .env:
MAIN_VITE_GROQ_API_KEY="your_groq_api_key" # LLaMA 3 reasoning
VITE_BRUTUS_AI_API_KEY="your_gemini_api_key" # optional fallback
Full setup (unlocks all features):
VITE_IMAGE_AI_API_KEY="your_huggingface_api_key" # image gen + MMS/Kokoro TTS
VITE_TAVILY_API_KEY="your_tavily_api_key" # web search + research
VITE_NOTION_API_KEY="your_notion_key" # Notion sync
VITE_NOTION_DATABASE_ID="your_notion_database_id"
For Gmail / Google auth, set up a backend server (see backend.env.example):
PORT=4000
GOOGLE_CLIENT_ID="your_google_client_id"
GOOGLE_CLIENT_SECRET="your_google_client_secret"
GOOGLE_CALLBACK_URL="http://localhost:4000/users/google/callback"
JWT_ACCESS_SECRET="your_jwt_access_secret"
JWT_REFRESH_SECRET="your_jwt_refresh_secret"
Add http://localhost:4000/users/google/callback as an Authorized redirect URI in Google Cloud Console.
npm run dev
npm run build:win
- Open assets/Display_Emotion.ino in Arduino IDE
- Select Arduino Uno (or your board)
- Upload the sketch
- Power servos with an external 5V 2A+ supply
- In Brutus, go to Robot Control → Scan → tap your HM-10 device to connect
Note: The HM-10 typically advertises as HMSoft, BT05, or MLT-BT05. No pairing is needed because it's BLE, not classic Bluetooth.
| Key | Required | Purpose | Get it |
|---|---|---|---|
| MAIN_VITE_GROQ_API_KEY | ✅ | LLaMA 3 / Mixtral reasoning (fast, free tier) | Groq Console |
| VITE_IMAGE_AI_API_KEY | ✅ | HuggingFace: MMS TTS + image gen | HuggingFace Tokens |
| VITE_TAVILY_API_KEY | 🟡 | Deep web research | Tavily Portal |
| VITE_BRUTUS_AI_API_KEY | 🟡 | Gemini AI fallback (optional) | Google AI Studio |
| VITE_NOTION_API_KEY | 🟡 | Notion database sync | Notion Integrations |
| Google OAuth | 🟡 | Gmail read/compose | Google Cloud Console |
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 | Windows 11 |
| RAM | 4 GB | 8 GB (for heavy RAG indexing + local TTS) |
| Storage | 3.5 GB | 5 GB+ (for vector DB + TTS model weights) |
| Node.js | 18.x | 20.x LTS |
| GPU | Not required | CUDA GPU speeds up local Whisper + StyleTTS2 |
- Custom wake word ("Hey Brutus") via Porcupine or openWakeWord
- Fine-tuned Brutus voice model (Meta MMS-TTS trained on custom dataset)
- StyleTTS2 integration for emotion-expressive voice output
- Fully offline mode: local Whisper + local LLM (Ollama / llama.cpp)
- macOS + Linux support
- Plugin marketplace for community tools
- Memory graph visualization
- Multi-agent collaboration mode
- Neopixel RGB LED strip for true color emotions
- Neck pan/tilt servo for head tracking
- Desktop + Cloud hybrid sync
- Deeper integration with Brutus Android app
- 100% BYOK: Bring Your Own Keys. Your API keys never leave your machine.
- Local encryption: Keys stored via OS keychain / electron-store.
- Zero-trust: No external key storage, no telemetry, no phone-home.
- Face-lock vault: Optional biometric face recognition via face-api.js to restrict access.
- Open-source voice: No audio sent to proprietary voice APIs. STT and TTS run locally.
Contributions are welcome! Feel free to open issues or submit pull requests.
- Fork the repository
- Create your feature branch: git checkout -b feature/amazing-feature
- Copy .env.example → .env and fill in your keys
- Match existing patterns (Tailwind for UI, strict IPC typing for the backend)
- Test thoroughly: ensure tools do not block the Electron main thread
- Commit: git commit -m 'feat: add amazing feature (#45)'
- Push: git push origin feature/amazing-feature
- Open a Pull Request with a clear description and screenshots if UI is changed
Read the full Contribution Guide before submitting.
| Project | Platform | Description |
|---|---|---|
| Brutus AI (this repo) | Windows Desktop | Electron + React desktop agent with robot BLE control |
| Brutus Mobile | Android | Flutter app with Gemini Live, robot BLE, and 25+ tools |
Brutus has deep system-level execution capabilities: file writes, OS automation, ADB mobile control, and web automation. Use responsibly. The maintainers are not liable for misuse.
Aditya Pandey, AI Systems Engineer
- GitHub: @Aditya060806
- LinkedIn: Aditya Pandey
- Email: aditya060806@gmail.com
This project is licensed under the MIT License. See LICENSE for details.
Built with ❤️ using Electron, React, LLaMA, Meta MMS, and Arduino
Brutus AI: Because your AI assistant deserves a face.






