Skip to content

AVADSA25/codec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

145 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CODEC

CODEC

Open-Source Intelligent Command Layer

What if your computer could actually hear you, see what you see, and speak back?

Your voice. Your computer. Your rules.

One AI system that replaces the 15 apps you click through every day — drafts emails, runs research, writes code, controls your lights, reads your screen. 50+ skills. 12 autonomous agents. Zero tabs.

opencodec.org · AVA Digital LLC · Get Started · Support · Enterprise


The Problem

You spend 4+ hours a day switching tabs, copy-pasting, formatting emails, checking calendars, re-reading Slack threads, Googling the same things, clicking through the same 15 steps to get one task done.

AI chatbots don't fix this. They're another tab. You still copy, paste, switch, click.


7 Products. One System.

Product One-liner
1 CODEC Core 50+ voice skills, keyboard shortcuts, direct chat replies — your Mac's AI command layer
2 CODEC Dictate Hold a key, speak, release — text appears in any app instantly
3 CODEC Instant Select text, right-click — AI rewrites, translates, replies in-place. Instant.
4 CODEC Chat 250K-context chat with file uploads, vision, web search, and 12 autonomous agent crews
5 CODEC Vibe AI coding IDE — describe what you want, watch it build, live preview in browser
6 CODEC Voice Call your AI like a phone call — speak to interrupt, live webcam feed, it acts while you talk
7 CODEC Overview Your AI dashboard — every tool, every agent, one screen

What hits different

Reply to any message without typing. Slack pings. WhatsApp buzzes. iMessage lights up. You don't switch apps, don't open keyboards. Press a key, say "reply saying I'll be there in 10, casual tone" — CODEC reads the conversation, drafts the reply, and pastes it directly into the chat input. You hit send. Done.

Keyboard shortcuts that replace entire workflows. Hold F18 → speak → release. Your voice becomes a command. Double-tap * → CODEC screenshots your screen and tells you what it sees. Select any text → right-click → CODEC Instant rewrites it, translates it, proofreads it, explains it. No app to open. No tab to switch to. It's already there.

Your screen is the context. CODEC reads what's in front of you — your IDE, your browser, your email. "What's wrong with this code?" It sees the error. "Summarize this article" — it reads the page. "Fill in this form" — it types into the fields. No copy-paste. No explaining what you're looking at.

Live webcam, not just screenshots. Click the camera icon on any page — a draggable picture-in-picture feed appears in the corner. Point it at your whiteboard, your front door, a document on your desk. Hit the snapshot button and CODEC analyzes what it sees. The feed stays open while you keep working.

AI agents that work while you don't. Not chat responses — full autonomous workflows. "Deep research AI in healthcare" → 8 agents fan out, run 20+ searches, write a 10,000-word report with images, deliver it to your Google Docs. Schedule any crew to run on repeat — morning briefings, competitor analysis, inbox triage — all on cron.

Nothing leaves your machine. Run Qwen, Llama, or Mistral locally. Conversations stored in local SQLite. No cloud. No telemetry. No analytics. End-to-end encrypted between your browser and server (AES-256-GCM + ECDH key exchange). Your data is yours. Period.


Real Workflows That Save Hours

Instead of this... With CODEC
Open Gmail → scan 47 emails → draft 3 replies → format → send "Check my email, flag anything urgent, draft replies"
Open Google Docs → blank page → research → write 10 pages → images → format "Deep research AI in healthcare, save to Docs" — 10,000-word report delivered
Slack notification → switch app → read thread → type reply → proofread → send Press key → "Reply saying I'll review it tonight" → pasted into Slack
Copy error → browser → paste → Stack Overflow → try fix → repeat "Read my screen and fix this error" — never leaves your editor
Open competitor site → notes → pricing → write analysis "Run competitor analysis on [company]" — SWOT delivered to Docs
Open 6 tabs → read news → check industry → take notes Runs automatically at 8am — briefing waiting in your Drive
Select text → copy → open translator → paste → copy result → paste back Right-click → CODEC Translate — replaced in-place, one click
Write LinkedIn post → rewrite 4 times → check tone → proofread "Write a LinkedIn post about [topic]" — polished, ready to post

Screenshots

Quick Chat
Chat — ask anything, drag & drop files, full conversation history

Chat with File Analysis
Deep Chat — upload files, select agents, get structured analysis

Voice Call
Voice Call — real-time conversation with live transcript

Vibe Code
Vibe Code — describe what you want, get working code with live preview

Deep Research Report
Deep Research — multi-agent reports delivered to Google Docs

Tasks & Schedules
Scheduled automations — morning briefings, competitor analysis, on cron

More screenshots

Settings
Settings — LLM, TTS, STT, hotkeys, wake word configuration

Agent Options
12 specialized agent crews

Authentication
Touch ID + PIN + 2FA authentication

Right-Click Menu
Right-click integration — CODEC in every app

Terminal
50+ skills loaded at startup


Quick Start

git clone https://github.com/AVADSA25/codec.git
cd codec
./install.sh        # one-line setup wizard
python3 codec.py    # start CODEC

Or step by step:

pip3 install -r requirements.txt
python3 setup_codec.py    # guided 9-step configuration
python3 codec.py

Requires macOS. Python 3.10+. Linux support planned.


How It Works

Voice → Action Pipeline

You speak → Whisper STT → intent dispatch → skill / agent crew → action on your Mac

Triggers:

Input What happens
Hold F18, speak, release Voice command — say it and it's done
Double-tap F18 PTT Lock — hands-free recording, tap again to stop
F16 / F9 Type a command instead of speaking
Double-tap * * Screenshot + AI reads your screen
Double-tap + + Analyze document in clipboard
Camera icon (any page) Live webcam PIP — drag around, snapshot anytime
Select text → right-click 8 AI services in context menu

50+ Skills

Grouped by what they do, not marketing categories:

Your day: Google Calendar, Gmail, Google Tasks, Google Keep, daily briefing, timer, pomodoro Your files: Google Drive, Google Docs, Google Sheets, Google Slides, file search, clipboard Your browser: open sites, search, read pages, fill forms, extract data, scroll, manage tabs, automate morning routines Your writing: draft emails, proofread, elevate, translate, explain, reply, LinkedIn posts Research: web search, URL summarizer, deep research (10,000-word multi-agent reports), competitor analysis Your Mac: process manager, network info, brightness, screenshot OCR, terminal commands, AX bridge (click any button in any app) Coding: Vibe IDE with Monaco editor, live preview, inspect mode, Skill Forge (auto-generate plugins) Smart home: Philips Hue lights — on/off, brightness, colors, scenes, room targeting Meta: memory search, skill marketplace (install/publish), scheduler (cron agents)

12 Agent Crews

Not single prompts — full multi-step AI workflows that run autonomously:

Crew What you get
Deep Research 10,000-word report with images → Google Docs
Daily Briefing Morning industry news + your calendar → Google Docs
Competitor Analysis SWOT + competitive positioning → Google Docs
Trip Planner Full itinerary with hotels, flights, activities → Google Docs
Email Handler Triage inbox, draft replies, summarize threads
Social Media Platform-specific posts for Twitter, LinkedIn, Instagram
Code Review Bug hunt + security audit + clean code suggestions
Data Analysis Gather data, find trends, write insights report
Content Writer Blog posts, articles, marketing copy
Meeting Summarizer Extract action items from transcripts
Invoice Generator Create and send professional invoices
Custom Agent Build your own — define role, tools, task

Schedule any crew: "Run competitor analysis every Monday at 9am"

Right-Click Services (CODEC Instant)

Select text anywhere → right-click:

Service Result
Proofread Grammar, spelling, clarity — fixed and replaced
Elevate Rewritten at executive level
Translate Translated to English (or configured language)
Explain Plain-English explanation
Reply Smart reply with optional :tone syntax
Prompt Optimized as an LLM prompt
Read Aloud Spoken via Kokoro TTS
Save Saved to Google Keep or local notes

MCP Server — CODEC Inside Claude, Cursor, VS Code

CODEC exposes tools as an MCP server. Any MCP-compatible client can invoke CODEC skills directly:

{
  "mcpServers": {
    "codec": {
      "command": "python3",
      "args": ["/path/to/codec-repo/codec_mcp.py"]
    }
  }
}

Then in Claude Desktop: "Use CODEC to check my calendar for tomorrow."

Skills opt-in to MCP exposure with SKILL_MCP_EXPOSE = True.


Privacy & Security

This isn't a marketing section. It's the architecture.

Your data never leaves. CODEC runs on your machine. Conversations, files, calendar data, memory — stored locally in SQLite. No cloud sync. No analytics endpoint. No telemetry. Check the source.

Run any LLM locally. Qwen, Llama, Mistral, Gemma — via MLX, Ollama, or LM Studio. Zero API calls if you want. Or use cloud APIs (OpenAI, Claude, Gemini) — your choice.

5-layer security stack for remote access. Cloudflare Zero Trust tunnel (or Tailscale VPN — no domain needed) → PIN → Touch ID biometrics → TOTP 2FA (Google Authenticator / Authy) → E2E encryption (AES-256-GCM + ECDH P-256 key exchange). Every request between browser and server is encrypted end-to-end on top of TLS.

Command safety. Dangerous command blocklist. Subprocess isolation with resource limits (512MB RAM, 120s CPU). Review-and-approve gate before any script runs. LLM-generated skills require human review.

Memory is yours. Full-text search (SQLite FTS5) across every conversation — but only on your machine. Parameterized queries prevent injection. No external memory service.

Layer Protection
Network Cloudflare Zero Trust tunnel or Tailscale VPN, CORS restricted origins
Encryption AES-256-GCM + ECDH P-256 key exchange, per-session keys
Auth Touch ID + PIN + TOTP 2FA, timing-safe token comparison
Sessions SameSite=Strict, CSRF tokens, conditional Secure flag
Execution Subprocess isolation, resource limits, command blocklist
Skills Blocked imports, human review gate, SHA-256 marketplace verification
Data Local SQLite, parameterized queries, FTS5 sanitization

Supported LLMs

Model How to run
Qwen 3.5 35B (recommended) mlx-lm.server --model mlx-community/Qwen3.5-35B-A3B-4bit
Llama 3.3 70B mlx-lm.server --model mlx-community/Llama-3.3-70B-Instruct-4bit
Mistral 24B mlx-lm.server --model mlx-community/Mistral-Small-3.1-24B-Instruct-2503-4bit
Gemma 3 27B mlx-lm.server --model mlx-community/gemma-3-27b-it-4bit
GPT-4o (cloud) "llm_url": "https://api.openai.com/v1"
Claude (cloud) OpenAI-compatible proxy
Ollama (any model) "llm_url": "http://localhost:11434/v1"

Configure in ~/.codec/config.json:

{
  "llm_url": "http://localhost:8081/v1",
  "model": "mlx-community/Qwen3.5-35B-A3B-4bit"
}

Keyboard Shortcuts

Extended keyboard (F13-F18):

Key Action
F13 Toggle CODEC ON/OFF
F18 (hold) Record voice → release to send
F18 (double-tap) PTT Lock — hands-free recording
F16 Text input dialog
* * Screenshot + AI analysis
+ + Document mode

Laptop (F1-F12): F5 = toggle, F8 = voice, F9 = text input

Custom shortcuts in ~/.codec/config.json. Restart after changes: pm2 restart open-codec


Troubleshooting

Keys don't work
  • Laptop? Run python3 setup_codec.py → select "Laptop / Compact" in Step 4
  • macOS stealing F-keys? System Settings → Keyboard → "Use F1, F2, etc. as standard function keys"
  • After config change: pm2 restart open-codec
Wake word doesn't trigger
  • Check Whisper: pm2 logs whisper-stt --lines 5 --nostream
  • Check mic permission: System Settings → Privacy → Microphone
  • Say "Hey CODEC" clearly — 3 distinct syllables
  • 4-layer noise gate handles most backgrounds, but loud music near the mic can interfere
No voice output
  • Check Kokoro TTS: curl http://localhost:8085/v1/models
  • Fallback: "tts_engine": "say" in config.json (macOS built-in)
  • Disable: "tts_engine": "none"
Dashboard not loading
  • Check: curl http://localhost:8090/
  • Restart: pm2 restart codec-dashboard
  • Remote via Cloudflare: pm2 logs cloudflared --lines 3 --nostream
  • Remote via Tailscale: install Tailscale on your Mac and phone — access CODEC at http://100.x.x.x:8090 with no domain or tunnel setup needed
Skills not loading
  • Check: pm2 logs open-codec --lines 20 --nostream | grep -i skill
  • Count: ls ~/.codec/skills/*.py | wc -l
Agents failing
  • First run takes 2-5 min — multi-step research
  • Check: pm2 logs codec-dashboard --lines 30 --nostream | grep Agents
  • Agents run as background jobs — no Cloudflare timeout

Project Structure

codec.py              — Entry point
codec_config.py       — Configuration + transcript cleaning
codec_keyboard.py     — Keyboard listener, PTT lock, wake word
codec_dispatch.py     — Skill matching and dispatch
codec_agent.py        — LLM session builder
codec_agents.py       — Multi-agent crew framework (12 crews)
codec_voice.py        — WebSocket voice pipeline
codec_voice.html      — Voice call UI
codec_dashboard.py    — Web API + dashboard (60+ endpoints)
codec_dashboard.html  — Dashboard UI
codec_chat.html       — Chat UI
codec_vibe.html       — Vibe Code IDE
codec_auth.html       — Authentication (Touch ID + PIN + TOTP 2FA)
codec_textassist.py   — 8 right-click services
codec_search.py       — DuckDuckGo + Serper search
codec_mcp.py          — MCP server
codec_memory.py       — FTS5 memory search
codec_heartbeat.py    — Health monitoring + task auto-execution
codec_scheduler.py    — Cron-like agent scheduling
codec_marketplace.py  — Skill marketplace CLI
ax_bridge/            — Swift AX accessibility bridge
swift-overlay/        — SwiftUI status bar app
skills/               — 50+ built-in skills
tests/                — 212+ pytest tests
install.sh            — One-line installer
setup_codec.py        — Setup wizard (9 steps)

What's Coming

  • Linux support
  • Windows via WSL
  • Multi-machine sync (skills + memory across devices)
  • iOS app (dictation + remote dashboard)
  • Streaming voice responses (first token plays while rest generates)
  • Multi-LLM routing (fast model for simple, strong model for complex)

Contributing

All skill contributions welcome. 50+ built-in, marketplace growing.

git clone https://github.com/AVADSA25/codec.git
cd codec && ./install.sh
python3 -m pytest   # all tests must pass

See CONTRIBUTING.md.


Support the Project

If CODEC saves you time:


Professional Setup

Need CODEC configured for your business, integrated with your tools, or deployed across a team?

Contact AVA Digital for professional setup and custom skill development.


Built by AVA Digital LLC · MIT License