VoxAI_Chat_API

██╗   ██╗ ██████╗ ██╗  ██╗       █████╗ ██╗
██║   ██║██╔═══██╗╚██╗██╔╝      ██╔══██╗██║
██║   ██║██║   ██║ ╚███╔╝ █████╗███████║██║
╚██╗ ██╔╝██║   ██║ ██╔██╗ ╚════╝██╔══██║██║
 ╚████╔╝ ╚██████╔╝██╔╝ ██╗      ██║  ██║██║
  ╚═══╝   ╚═════╝ ╚═╝  ╚═╝      ╚═╝  ╚═╝╚═╝

A Hybrid Local/Cloud Architecture for Large Language Models.

VoxAI is a specialized Python-based interface that bridges local hardware (APUs and consumer GPUs) with high-performance cloud infrastructure (RunPod). It lets users run efficient models locally and "burst" to the cloud on demand for 70B+ parameter models, all within a single, unified chat interface.

🌟 Key Features

🛡️ Hybrid Engine (Local + Cloud)

Local Mode: Optimized for consumer hardware (e.g., AMD RX 6600, NVIDIA RTX 3060). Uses llama-cpp-python with Vulkan/CUDA acceleration for standard models like Qwen 7B and Llama 3.
Cloud Mode: On-demand uplink to RunPod GPU pods (A40, A6000, A100) for running heavyweights like Midnight-Miqu 70B and Qwen2.5 72B.

🔄 Smart Swapping & Auto-Healing

In-Pod Swapping: Hot-swaps models inside the running container to reuse the GPU, reducing switch time from ~5 minutes to seconds.
Zombie Process Protection: If an in-pod swap fails (e.g., library mismatch), the system automatically detects the failure, kills the "zombie" pod, and spins up a fresh compatible instance.
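A minimal sketch of the swap-then-fallback flow described above. It assumes three hypothetical hooks (`in_pod_swap`, `terminate_pod`, `rent_fresh_pod`) standing in for whatever `runpod_interface.py` actually does. It is not the project's implementation. It only illustrates the control flow: try the fast in-pod swap first, and if that fails, discard the pod and rent a fresh one.

```python
from typing import Callable


def swap_model(
    target_model: str,
    in_pod_swap: Callable[[str], bool],
    terminate_pod: Callable[[], None],
    rent_fresh_pod: Callable[[str], None],
) -> str:
    """Swap to target_model, preferring the fast in-pod path.

    All three callables are hypothetical hooks, not VoxAI's real API.
    Returns "in-pod" or "fresh-pod" to report which path was taken.
    """
    # Fast path: reload the model inside the running container,
    # reusing the GPU that is already rented.
    if in_pod_swap(target_model):
        return "in-pod"

    # Slow path: the in-pod swap failed (e.g. a library mismatch),
    # so treat the pod as a zombie, terminate it, and rent a fresh one.
    terminate_pod()
    rent_fresh_pod(target_model)
    return "fresh-pod"


if __name__ == "__main__":
    events = []
    path = swap_model(
        "Qwen/Qwen3-14B",
        in_pod_swap=lambda model: False,  # simulate a failed in-pod swap
        terminate_pod=lambda: events.append("terminate"),
        rent_fresh_pod=lambda model: events.append(f"rent {model}"),
    )
    print(path, events)  # fresh-pod ['terminate', 'rent Qwen/Qwen3-14B']
```

Because the pod operations are passed in as callables, the orchestration logic can be tested without calling the RunPod API.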

💰 Cost Efficiency

Tiered GPU Selection: Automatically selects the cheapest viable GPU.
- Small Models (<30B): Rents an NVIDIA A40 or RTX A6000 (~$0.30/hr).
- Ultra Models (70B+): Rents an A100 80GB (~$1.79/hr).
Auto-Shutdown: Prevents billing accidents by terminating cloud resources on exit.
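The tier rule above can be sketched as a lookup table ordered from cheapest to most expensive. `GPU_TIERS` and `select_gpu_tier` are hypothetical names. The GPU labels are display strings, not confirmed RunPod GPU type IDs, and the prices are the approximate figures quoted in this README. The README does not say how models between 30B and 70B are handled, so this sketch assumes they go to the A100 tier.

```python
# Hypothetical tier table, cheapest first. Labels are display names and the
# prices are the rough hourly figures quoted in this README, not live rates.
GPU_TIERS = [
    # (exclusive upper bound in billions of parameters, GPU options, ~USD/hr)
    (30.0, ["NVIDIA A40", "NVIDIA RTX A6000"], 0.30),
    # Assumption: everything from 30B upward (including 70B+) uses the A100.
    (float("inf"), ["NVIDIA A100 80GB"], 1.79),
]


def select_gpu_tier(params_billions: float) -> tuple[list[str], float]:
    """Return (GPU options, approx hourly price) for the cheapest fitting tier."""
    if params_billions <= 0:
        raise ValueError("model size must be positive")
    for upper_bound, gpus, price in GPU_TIERS:
        if params_billions < upper_bound:
            return gpus, price
    # Unreachable while the last tier's bound is infinite.
    raise ValueError(f"no GPU tier fits a {params_billions}B model")


if __name__ == "__main__":
    for size in (14, 72):
        gpus, price = select_gpu_tier(size)
        print(f"{size}B -> {' / '.join(gpus)} (~${price:.2f}/hr)")
```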

🚀 Quick Start

1. Installation

Make sure a Python 3.10 interpreter is installed (pip cannot install Python itself), then clone the repository and install dependencies:

git clone https://github.com/yourusername/VoxAI_Chat_API.git
cd VoxAI_Chat_API
pip install -r requirements.txt

2. Configuration

Rename the template and add your RunPod credentials:

mv config.example.py config.py

Edit config.py:

API_KEY = "YOUR_RUNPOD_API_KEY"
POD_ID = "YOUR_POD_ID" # Optional: Initial Pod ID
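The sketch below shows one way to read the two settings with a fail-fast check. It assumes `config.py` can be imported as a module named `config`. `read_runpod_settings` is a hypothetical helper. Treating the untouched `YOUR_POD_ID` placeholder as "no initial pod" is an assumption, not documented behavior.

```python
from types import SimpleNamespace
from typing import Optional

API_KEY_PLACEHOLDER = "YOUR_RUNPOD_API_KEY"
POD_ID_PLACEHOLDER = "YOUR_POD_ID"


def read_runpod_settings(cfg) -> tuple[str, Optional[str]]:
    """Return (api_key, pod_id) from a config.py-like object.

    Hypothetical helper: raises early if the API key was never filled in,
    and maps a missing or placeholder POD_ID to None ("no initial pod").
    """
    api_key = getattr(cfg, "API_KEY", "")
    if not api_key or api_key == API_KEY_PLACEHOLDER:
        raise RuntimeError("Set API_KEY in config.py before using CLOUD mode.")

    pod_id = getattr(cfg, "POD_ID", None)
    if pod_id in (None, "", POD_ID_PLACEHOLDER):
        pod_id = None
    return api_key, pod_id


if __name__ == "__main__":
    # Stand-in for `import config`, so the sketch runs without a real config.py.
    demo = SimpleNamespace(API_KEY="abc123", POD_ID=POD_ID_PLACEHOLDER)
    print(read_runpod_settings(demo))  # ('abc123', None)
```

In real use you would pass the module itself: `import config` followed by `read_runpod_settings(config)`.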

3. Usage

Launch the unified start script:

start.bat

You will be prompted to select your environment:

[1] LOCAL (RX 6600) | [2] CLOUD (RunPod)

🛠️ Customization

Adding Your Own Models

To add a new model to the menu, simply edit the MODEL_MAP in your config.py file.

Format: "Menu Display Name": "Local Filename"

MODEL_MAP = {
    # ... existing models ...
    "My New Cool Model": "my-model-v1.gguf",
}

Note: Ensure the .gguf file is placed inside the models/ directory.
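Here is a minimal sketch of resolving a menu entry to its file. It assumes the `models/` directory is relative to the working directory. `resolve_model` is a hypothetical helper, and the project's real lookup code may differ. Checking the file up front turns a missing download into a clear error before the engine starts loading.

```python
from pathlib import Path

MODELS_DIR = Path("models")

# Mirrors the MODEL_MAP format: menu display name -> local filename.
MODEL_MAP = {
    "My New Cool Model": "my-model-v1.gguf",
}


def resolve_model(display_name: str, model_map: dict[str, str],
                  models_dir: Path = MODELS_DIR) -> Path:
    """Hypothetical helper: map a menu name to an existing .gguf file."""
    if display_name not in model_map:
        raise KeyError(f"{display_name!r} is not listed in MODEL_MAP")
    path = models_dir / model_map[display_name]
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"{path.name!r} is not a .gguf file")
    if not path.is_file():
        raise FileNotFoundError(f"Place the model file at {path}")
    return path


if __name__ == "__main__":
    for name in MODEL_MAP:
        try:
            print(name, "->", resolve_model(name, MODEL_MAP))
        except FileNotFoundError as err:
            print(name, "-> missing:", err)
```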

🎮 Usage Examples

Local GGUF Loading (Hardware Handshake)

VoxAI performs a hardware handshake on boot to determine optimal thread counts and GPU layers. On mid-range hardware, local 14B models typically respond in under 15 seconds.

[LOCAL] 🛡️ Loading GGUF: models/Qwen3-VL-8B-Instruct.gguf...
[HANDSHAKE] Detected 4 Physical Cores. Mode: APU (Hybrid/Vulkan)
[LOCAL] 🟢 Backend drivers loaded manually.
[LOCAL] ✅ Engine Online.
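One way to think of the handshake's output is as a set of keyword arguments for `llama_cpp.Llama`, whose constructor accepts `model_path`, `n_threads`, `n_gpu_layers`, and `n_ctx`. `plan_local_engine` is a hypothetical helper. The per-mode layer counts and the 4096-token context are placeholder assumptions to tune. They are not values taken from `machine_engine_handshake.py`.

```python
import os
from typing import Optional


def plan_local_engine(model_path: str,
                      physical_cores: Optional[int] = None,
                      gpu_mode: str = "dedicated") -> dict:
    """Hypothetical sketch: turn a rough hardware profile into llama_cpp.Llama kwargs.

    gpu_mode is "dedicated" (discrete GPU), "apu" (integrated GPU sharing
    system RAM), or anything else for CPU-only.
    """
    if physical_cores is None:
        # The standard library only reports logical CPUs, so assume 2-way
        # SMT as a rough guess; a real handshake would query physical cores.
        physical_cores = max(1, (os.cpu_count() or 2) // 2)

    if gpu_mode == "dedicated":
        n_gpu_layers = -1   # llama-cpp-python: -1 offloads every layer
    elif gpu_mode == "apu":
        n_gpu_layers = 20   # placeholder partial offload; tune per machine
    else:
        n_gpu_layers = 0    # keep everything on the CPU

    return {
        "model_path": model_path,
        "n_threads": physical_cores,
        "n_gpu_layers": n_gpu_layers,
        "n_ctx": 4096,      # placeholder context window
    }


if __name__ == "__main__":
    print(plan_local_engine("models/my-model-v1.gguf", physical_cores=4, gpu_mode="apu"))
```

With llama-cpp-python installed, you can pass the result straight through: `Llama(**plan_local_engine("models/my-model-v1.gguf", physical_cores=4, gpu_mode="apu"))`.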

The "Zombie" Recovery (Self-Healing)

During testing, swapping from a 70B model (A100) to a 14B model (A40) caused a library mismatch. VoxAI self-corrected in <45s.

[PHOENIX] 🔥 Initiating Swap...
[PHOENIX] ♻️  Optimizing: Reusing active GPU...
[DEBUG] Valid Endpoint, but ID mismatch (1/5): 'openai/gpt-oss-20b' != 'Qwen/Qwen3-14B'
[PHOENIX] ❌ Swap Verification Failed: Persistent Old Model Detected.
[PHOENIX] ⚠️ In-Pod Swap failed. Retrying with fresh pod...
[PHOENIX] ☠️ Terminating old pod...
[PHOENIX] 🐣 Renting NVIDIA A40...
[PHOENIX] ✅ Online! Serving: Qwen/Qwen3-14B
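The `ID mismatch (1/5)` line suggests that the swap is verified by repeatedly asking the pod which model it is serving. Below is a sketch of that check. It assumes the pod exposes an OpenAI-compatible `GET /v1/models` endpoint, as servers such as vLLM do. The README does not say which inference server VoxAI uses. `fetch_served_model` and `verify_swap` are hypothetical names, and the 5 attempts and 10-second delay are assumptions.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_served_model(base_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the first model ID from GET {base_url}/v1/models, or None on any failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            payload = json.load(resp)
        return payload["data"][0]["id"]
    except (OSError, ValueError, KeyError, IndexError, TypeError):
        # Unreachable pod, bad JSON, or an unexpected response shape.
        return None


def verify_swap(expected_id: str,
                fetch: Callable[[], Optional[str]],
                attempts: int = 5,
                delay_s: float = 10.0,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the pod reports expected_id; False means the old model persisted."""
    for attempt in range(1, attempts + 1):
        served = fetch()
        if served == expected_id:
            return True
        if served is not None:
            print(f"[DEBUG] ID mismatch ({attempt}/{attempts}): "
                  f"{served!r} != {expected_id!r}")
        if attempt < attempts:
            sleep(delay_s)
    return False


if __name__ == "__main__":
    # Simulated responses: the old model lingers once, then the new one appears.
    responses = iter(["openai/gpt-oss-20b", "Qwen/Qwen3-14B"])
    ok = verify_swap("Qwen/Qwen3-14B", lambda: next(responses), sleep=lambda s: None)
    print("swap verified:", ok)
```

In use, the two pieces combine as `verify_swap("Qwen/Qwen3-14B", lambda: fetch_served_model(pod_url))`, where `pod_url` is a hypothetical base URL for the pod. A `False` result signals that the system should fall back to a fresh pod.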

Creative Generation

User: "Can u write me a song Like AJR kinda song?"

VoxAI (Qwen 14B):

Title: "Echoes in the Static"
Verse 1: Driving through the city, dashboard lights are glowing...
Chorus: I hear echoes in the static, whispers in the noise...

📂 Project Structure

start.bat - Main entry point.
vox_core_chat.py - The brain. Handles input, local inference, and cloud orchestration.
runpod_interface.py - The driver. Manages RunPod API, renting, and swapping.
machine_engine_handshake.py - Hardware detection logic for local optimization.
config.py - User settings (GitIgnored).

📄 License

This project is open-source. Feel free to fork and modify!

