This repository provides AI services for the AutismyVR platform, including a Flask REST API and a Streamlit Debug UI, both integrated with a PostgreSQL database and large language models (LLMs).
- API: Flask REST API with Swagger documentation (`/apidocs`).
- UI: Streamlit interface for chat and debugging.
- Database: PostgreSQL for storing chat sessions and interactions.
- Services: Shared logic in `src/` for consistent behavior across API and UI.
- Models: Abstraction layer for LLM (Ollama) and other AI models.
```
src/
├── models/                    # SQLAlchemy database models
│   ├── __init__.py
│   └── chat_models.py
├── services/                  # Business logic layer
│   ├── __init__.py
│   ├── chat_service.py
│   ├── audio_service.py
│   └── title_service.py
├── clients/                   # External service clients
│   ├── __init__.py
│   ├── ollama_client.py
│   ├── whisper_client.py
│   ├── tts_client.py
│   └── liveportrait_client.py
├── controllers/               # RESTful API controllers
│   ├── __init__.py
│   ├── chat_controller.py
│   └── audio_controller.py
├── auth.py                    # Firebase authentication
└── db.py                      # Database configuration

api/
├── routes.py                  # Route registration
└── app.py                     # Flask application factory
```
- Docker and Docker Compose
- Python 3.12 (for local development)
To start the entire stack (Postgres, API, UI, Whisper, TTS, LivePortrait):
```bash
docker-compose up -d --build
```

Services:
- API Swagger: http://localhost:5001/apidocs
- Streamlit UI: http://localhost:8501
- Whisper API: http://localhost:8000
- TTS API: http://localhost:8001
- LivePortrait API: http://localhost:8002
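Once the containers are up, a quick smoke test confirms the API is reachable (using the Swagger URL above):

```bash
# Should return HTTP 200 once the API container is ready
curl -I http://localhost:5001/apidocs
```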
Note: For Mac users, GPU services will run on CPU. For NVIDIA GPU support on Windows/Linux, ensure Docker Desktop has GPU support enabled.
- Install Dependencies:
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
- Run Tests:
  ```bash
  export PYTHONPATH=.
  pytest --cov=src --cov=api tests/
  ```
- Run API:
  ```bash
  export PYTHONPATH=.
  python api/app.py
  ```
- Run Streamlit:
  ```bash
  export PYTHONPATH=.
  streamlit run app.py
  ```
We chose HTTP (REST) over gRPC to minimize server complexity and resource usage. While gRPC offers lower latency for streaming, HTTP is stateless, easier to scale, and sufficient for the current "User Speaks -> Processing -> Response" flow where client-side VAD (Voice Activity Detection) handles the cutoff.
We implemented a custom Python service with direct Ollama client integration rather than n8n Agent nodes.
- Why: We require low-latency, granular control over the context window and chat history, which "black box" agent nodes often abstract away.
- Implementation: We use a pure `OllamaClient` in `src/clients` to interact directly with the LLM. This allows strict typing of input/output and simpler debugging integration with the rest of our application logic.
- Benefit: Easier testing (100% coverage), better performance, and a guarantee that the "brain" of the VR avatar behaves exactly as defined, without overhead from an orchestration platform.
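For reference, such a client essentially wraps Ollama's HTTP chat endpoint; the underlying request looks roughly like this (a sketch assuming Ollama's default port 11434; `llama3` is a placeholder model name, the actual model is configured in the client):

```bash
# The kind of request OllamaClient issues under the hood
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'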
The API follows RESTful design principles, using clear resource-based endpoints and appropriate HTTP methods. All endpoints are documented using Swagger. Once running, access it at `/apidocs`.
Text Chat:
- `POST /chat` - Create a new text chat session
- `POST /chat/:session_uuid` - Send a message to an existing text session
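For example, creating a session in dev mode looks roughly like this (the `message` field name is an assumption here; the actual request schema is in Swagger):

```bash
# Create a new text chat session (dev mode, no auth header required)
curl -X POST http://localhost:5001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, avatar!"}'
```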
Audio Chat:
- `POST /audio2audio` - Create a new audio chat session (receives audio file)
- `POST /audio2audio/:session_uuid` - Send audio to an existing audio session
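Since these endpoints receive an audio file, a multipart upload is the natural shape (a sketch; the `audio` form field name is an assumption, check `/apidocs` for the real contract):

```bash
# Start a new audio chat session by uploading a recording
curl -X POST http://localhost:5001/audio2audio \
  -F "audio=@recording.wav"
```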
History:
- `GET /history` - List all sessions (dev: all interactions, stag/prod: session list)
- `GET /history/:session_uuid` - Get the full history of a specific session
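Both history endpoints are plain GETs (the UUID below is a placeholder):

```bash
# List sessions, then fetch the full history of one of them
curl http://localhost:5001/history
curl http://localhost:5001/history/<session_uuid>
```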
All API endpoints require Firebase Authentication in staging and production environments. Include the Firebase ID token in the `Authorization` header as `Bearer <token>`. See the Firebase Setup Guide for configuration details.
Development Mode: When `ENV_LEVEL=dev` (default), Firebase authentication is bypassed for testing purposes. In this mode, all requests are automatically authenticated with a mock user (`dev-user`). This allows testing the API without Firebase credentials. Note: Authentication bypass is only available in development. Staging (`ENV_LEVEL=stag`) and production (`ENV_LEVEL=prod`) environments require valid Firebase tokens.
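Against staging or production, the same request carries the token (the host, the token, and the `message` field are placeholders/assumptions here):

```bash
# Authenticated request; <firebase-id-token> must be a valid Firebase ID token
curl -X POST "https://<your-host>/chat" \
  -H "Authorization: Bearer <firebase-id-token>" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'
```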
LivePortrait can be enabled via:
- Environment variable: `LIVEPORTRAIT_ENABLED=true` (default: `false`)
- Query parameter: `?liveportrait=true` (overrides the environment variable)
When enabled, audio responses include LivePortrait avatar animation data.
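For a one-off request, the query parameter is the quickest toggle (same caveat as above: the `audio` form field name is an assumption):

```bash
# Request LivePortrait animation data for a single audio interaction
curl -X POST "http://localhost:5001/audio2audio?liveportrait=true" \
  -F "audio=@recording.wav"
```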
The Docker Compose configuration supports GPU acceleration for AI services:
- NVIDIA GPUs (Windows/Linux): Automatically uses GPU via Docker GPU runtime
- Mac (Metal): Services run on CPU by default. For Metal acceleration, configure the services to run directly on the host or use Metal-compatible containers.
Services that use GPU:
- Whisper: Audio transcription
- TTS (Piper): Text-to-speech synthesis
- LivePortrait: Avatar animation generation
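Before starting the stack on Windows/Linux, it can help to confirm that Docker actually sees the GPU (a standard sanity check, not specific to this repo; the CUDA image tag is just an example):

```bash
# Prints the nvidia-smi device table if the NVIDIA container runtime is configured
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```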
The project includes integration and BDD tests.
- Integration: `tests/integration/`
- BDD: `tests/bdd/`, with feature files in `tests/bdd/features/`
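Each suite can also be run on its own (a convenience sketch; this assumes pytest discovers both directories independently, which matches the layout above):

```bash
# Run the integration suite or the BDD suite in isolation
export PYTHONPATH=.
pytest tests/integration/
pytest tests/bdd/
```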
CI/CD is configured via GitHub Actions.