Status: Fully functional / deployment-ready with Vertex AI Vector Search
A mental health chatbot with intelligent memory powered by Google Cloud Vertex AI Vector Search, providing scalable, low-latency semantic memory retrieval. The system stores encrypted conversation summaries in Firestore and uses Vertex AI's dedicated vector search infrastructure for similarity matching.
New: The conversation logic is now clinically-grounded, using the Transtheoretical Model (TTM) of Change to guide the user through five distinct therapeutic stages, with all responses rooted in Cognitive Behavioural Therapy (CBT) principles.
Features KMS-backed encryption of sensitive data at rest using Google Cloud KMS, secure Firebase OAuth authentication, guest sessions, dynamic time-aware conversations, and both a command-line interface and a modern web frontend.
Traditional chatbots either forget everything after each session or store everything indiscriminately, leading to irrelevant responses or privacy concerns. Those that do implement memory typically use inefficient full-database scans for each query, resulting in poor performance and scalability issues. Additionally, most mental health chatbots store sensitive conversation data in plaintext, creating significant security and compliance risks.
This chatbot implements a production-grade, scalable memory architecture that combines:
- Clinically-Grounded Conversation: A TTM/CBT framework guides the conversation through 5 stages (from Relationship Building to Intervention) for effective support.
- Intelligent curation of meaningful therapeutic conversations.
- Vertex AI Vector Search for high-performance semantic retrieval.
- Encryption of sensitive data at rest using Google Cloud KMS, as a foundation for HIPAA-ready security.
- User-specific namespace filtering ensuring complete privacy isolation between users.
- Temporal awareness for contextually appropriate responses.
The result is a chatbot that provides personalized, continuous mental health support with enterprise-level performance and security.
- Vertex AI Vector Search Integration: Dedicated, scalable vector search infrastructure.
- O(log n) Performance: Approximate nearest neighbor search keeps memory retrieval fast (typically 50-200ms) and largely independent of database size.
- User Namespace Isolation: Each user's vectors are isolated in their own namespace.
- Automatic Failover: Gracefully handles Vector Search unavailability.
- Hybrid Storage: Encrypted metadata in Firestore, vectors in Vector Search.
- Firebase OAuth Authentication: Secure Google sign-in with guest session support.
- Multiple Authentication Options: Google OAuth or anonymous guest sessions.
- Privacy-First Design: Explicit user consent required before storing any conversations.
- User Data Control: Users can delete all their memories and change consent settings anytime.
- Instruction Reset: Users can clear all custom instructions given to the chatbot, which also resets their conversation stage back to Stage 1.
- Encryption at Rest: All sensitive data encrypted at rest using Google Cloud KMS.
- HIPAA-Grade Security: Enterprise-level encryption for mental health data protection.
- Complete User Isolation: Users can only access their own memories via namespace filtering.
- Clinically-Grounded Conversation (TTM & CBT): The AI actively manages the conversation's flow based on the Transtheoretical Model (TTM) / Stages of Change. It tracks the user's current_stage (saved in their profile) and adapts its goals, from Stage 1 (Relationship Building) to Stage 4 (Intervention), ensuring all responses are appropriate and therapeutically aligned with CBT principles.
- Multilingual Support: Serena is multilingual and will automatically respond in whatever language the user is using, allowing for natural conversation in the user's preferred language.
- Intelligent Memory: Only saves conversations with significant therapeutic value (determined by a separate AI analysis call) to prevent memory clutter.
- Semantic Memory Retrieval: Uses 768-dimensional vector embeddings (text-embedding-004) for accurate similarity matching.
- Global User Instructions: Users can provide direct instructions (e.g., "Always call me 'Captain'") that the chatbot will remember and follow in all future conversations.
- Encrypted Storage: Conversation summaries, PII, and internal therapeutic context are encrypted before storage.
- Plaintext Processing: Vector embeddings generated from plaintext for accurate similarity matching.
- Granular Memory Timestamps: Each retrieved memory includes precise temporal context (e.g., "2 days ago", "5 minutes ago").
- Temporal Conversation Flow: Recognizes time patterns and provides contextually appropriate responses. If a user is inactive for over 24 hours, the bot automatically resets to "Stage 1: Relationship Building" to re-establish rapport.
- Cloud KMS Integration: AES-256 encryption using Google Cloud Key Management Service.
- Selective Encryption: All sensitive fields encrypted including PII, conversation summaries, user instructions, and internal therapeutic context.
- Performance Optimized: Embeddings stored in Vector Search for fast semantic search.
- Automatic Decryption: Transparent decryption during memory retrieval and context loading.
- Key Rotation Support: Compatible with KMS key rotation policies.
- Dynamic Time-Aware Greetings: Automatically adapts opening messages based on actual time elapsed since last interaction.
- Dynamic Response Generation: Explicitly varies phrasing and avoids repetitive opening lines.
- Context-Aware Responses: Seamlessly integrates relevant past memories into conversations.
- React Web Frontend: Modern, responsive web interface for seamless user experience.
- Gemini Integration: Uses latest Google Gemini 2.5 Flash model for empathetic, stage-aware responses.
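The time-aware behaviour listed above (relative memory timestamps and the 24-hour reset to Stage 1) can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names `humanize_age` and `stage_after_inactivity` are hypothetical, and the rounding rules are an assumption.

```python
from datetime import datetime, timedelta, timezone

STAGE_ONE = "Stage 1: Relationship Building"
INACTIVITY_RESET = timedelta(hours=24)  # threshold taken from the feature description

# Largest unit first, so 2 days is reported as "2 days ago", not "48 hours ago".
_UNITS = (("day", 86_400), ("hour", 3_600), ("minute", 60))


def humanize_age(created_at: datetime, now: datetime) -> str:
    """Return a coarse relative label such as '2 days ago' (hypothetical helper)."""
    seconds = max(0, int((now - created_at).total_seconds()))
    for unit, size in _UNITS:
        if seconds >= size:
            count = seconds // size
            plural = "" if count == 1 else "s"
            return f"{count} {unit}{plural} ago"
    return "just now"


def stage_after_inactivity(current_stage: str, last_seen: datetime, now: datetime) -> str:
    """Fall back to Stage 1 when the user has been away for more than 24 hours."""
    if now - last_seen > INACTIVITY_RESET:
        return STAGE_ONE
    return current_stage


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(humanize_age(now - timedelta(days=2), now))
    print(stage_after_inactivity("Stage 3", now - timedelta(hours=30), now))
```

Passing `now` explicitly rather than calling `datetime.now()` inside the helpers keeps them easy to test.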
User Input
↓
[1. Generate Embedding] (text-embedding-004, 768 dimensions)
↓
[2. Query Vector Search] (with user namespace filter)
↓ (returns memory IDs + similarity scores)
↓
[3. Hydrate from Firestore] (batch get encrypted summaries)
↓
[4. Decrypt Summaries & Context] (Google Cloud KMS)
↓
[5. Generate AI Response (Call 1)] (Gemini 2.5 Flash with TTM/CBT prompt)
↓ (returns empathetic reply_text + new_stage + updated_context)
↓
[6. Update User Profile] (Save new_stage and encrypt/save updated context)
↓
[7. Analyze Conversation (Call 2)] (Separate, non-empathetic AI call)
↓ (returns significance, summary, instruction)
↓
[8. If Significant:]
├─ Encrypt Summary (KMS)
├─ Store Metadata (Firestore)
└─ Upsert Vector (Vector Search with user namespace)
↓
Response to User
- User sends message → AI generates response (Call 1)
- Internal context updated and encrypted before saving
- Conversation analyzed for significance (Call 2)
- If significant:
- Generate embedding from plaintext summary (768-dim vector)
- Encrypt summary using Cloud KMS
- Save encrypted summary + metadata to Firestore
- Upsert vector to Vertex AI Vector Search with user namespace restriction
- Memory ID links Firestore document to Vector Search datapoint
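The storage steps above can be sketched as a single function. This is an illustrative outline only: the four callables (`embed`, `encrypt`, `save_document`, `upsert_vector`) are hypothetical stand-ins for the embedding model, Cloud KMS, Firestore, and Vector Search clients, and the UUID-based memory ID is an assumption rather than the project's real ID scheme.

```python
import uuid
from typing import Callable, Sequence


def store_memory(
    user_id: str,
    summary: str,
    *,
    embed: Callable[[str], Sequence[float]],
    encrypt: Callable[[str], str],
    save_document: Callable[[str, str, dict], None],
    upsert_vector: Callable[[str, Sequence[float], str], None],
) -> str:
    """Persist one significant memory; returns the shared memory ID."""
    memory_id = f"mem_{uuid.uuid4().hex}"  # hypothetical ID scheme

    # 1. Embed the PLAINTEXT summary; ciphertext would give meaningless vectors.
    vector = embed(summary)

    # 2. Encrypt the summary before it touches the database.
    encrypted_summary = encrypt(summary)

    # 3. Firestore holds the encrypted summary and metadata, but no embedding.
    save_document(user_id, memory_id, {
        "user_id": user_id,
        "summary": encrypted_summary,
        "summary_encrypted": True,
    })

    # 4. Vector Search holds the vector, restricted to this user's namespace.
    upsert_vector(memory_id, vector, user_id)
    return memory_id
```

Because the same `memory_id` is used as the Firestore document ID and the Vector Search datapoint ID, a search hit can be mapped back to its encrypted summary without any extra lookup table.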
- User sends new message
- Retrieve user's profile from Firestore (includes encrypted context)
- Decrypt user's context field (internal TTM/CBT notes) using Cloud KMS
- Retrieve user's current_stage from profile
- Generate query embedding (768-dim vector)
- Query Vertex AI Vector Search with user namespace filter
- Receive top-k similar memory IDs and distances
- Batch-fetch corresponding documents from Firestore
- Decrypt conversation summaries using Cloud KMS
- Add temporal context ("2 days ago")
- Include all context (decrypted internal notes, memories, stage) in TTM/CBT prompt for AI (Call 1)
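The retrieval steps above follow a search-then-hydrate pattern, which might look roughly like this. All four callables are hypothetical placeholders for the real embedding, Vector Search, Firestore, and KMS clients; the `(memory_id, distance)` hit shape is an assumption, and the default `top_k=3` follows the "Top 3 most similar memories" behaviour described in this README.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def retrieve_memories(
    user_id: str,
    message: str,
    *,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], str, int], List[Tuple[str, float]]],
    fetch_documents: Callable[[str, List[str]], Dict[str, dict]],
    decrypt: Callable[[str], str],
    top_k: int = 3,
) -> List[dict]:
    """Return decrypted memories for the user's message, best match first."""
    query_vector = embed(message)

    # Vector Search returns IDs and distances only, never the summaries.
    hits = search(query_vector, user_id, top_k)

    # One batched Firestore read for every hit.
    docs = fetch_documents(user_id, [memory_id for memory_id, _ in hits])

    memories = []
    for memory_id, distance in hits:
        doc = docs.get(memory_id)
        if doc is None:
            continue  # vector exists but its document is gone; skip it
        summary = doc["summary"]
        if doc.get("summary_encrypted"):
            summary = decrypt(summary)
        memories.append({
            "memory_id": memory_id,
            "summary": summary,
            "distance": distance,
            "created_at": doc.get("created_at"),
        })
    return memories
```

Skipping hits with no matching document keeps the function tolerant of the brief window in which a Firestore delete has completed but the vector removal has not.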
- Each vector stored with user_id namespace restriction
- Queries filtered to only search within user's namespace
- Impossible for User A to retrieve User B's memories
- Double-protected: Firestore subcollections + Vector Search namespaces
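To make the namespace rule concrete, here is a small local model of it. The datapoint shape mirrors the restricts structure documented in this README; the `visible` function is a simplified simulation of allow-list matching written for illustration, not the Vector Search service's actual filtering code, and all three function names are hypothetical.

```python
from typing import Dict, List, Sequence

NAMESPACE = "user_id"


def build_datapoint(memory_id: str, vector: Sequence[float], user_id: str) -> dict:
    """Datapoint tagged so that only `user_id` can match it."""
    return {
        "datapoint_id": memory_id,
        "feature_vector": list(vector),
        "restricts": [{"namespace": NAMESPACE, "allow_list": [user_id]}],
    }


def build_query_restricts(user_id: str) -> List[Dict]:
    """Filter attached to every query issued on behalf of `user_id`."""
    return [{"namespace": NAMESPACE, "allow_list": [user_id]}]


def visible(datapoint: dict, query_restricts: List[Dict]) -> bool:
    """Simplified model: each queried namespace must share a token with the datapoint."""
    for restrict in query_restricts:
        wanted = set(restrict["allow_list"])
        tagged = {
            token
            for dp_restrict in datapoint["restricts"]
            if dp_restrict["namespace"] == restrict["namespace"]
            for token in dp_restrict["allow_list"]
        }
        if not wanted & tagged:
            return False
    return True


if __name__ == "__main__":
    mine = build_datapoint("mem_1", [0.1, 0.2], "user_a")
    theirs = build_datapoint("mem_2", [0.3, 0.4], "user_b")
    query = build_query_restricts("user_a")
    print(visible(mine, query), visible(theirs, query))
```

The important design point is that the backend builds the query filter from the verified token's user ID, never from anything the client sends in the request body.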
- Authentication: Firebase Auth with ID token verification
- Backend: Flask server with Vertex AI integration and token-based security
- Database (Metadata): Google Cloud Firestore for user profiles and encrypted memory summaries
- Vector Search: Vertex AI Vector Search for high-performance semantic similarity
- Encryption: Google Cloud KMS for at-rest encryption of sensitive data
- AI Models:
- Gemini 2.5 Flash for conversations (TTM/CBT logic)
- text-embedding-004 for 768-dimensional memory vectors
- Frontend: React + Vite web interface
The project leverages Vertex AI Vector Search for its high-performance, scalable architecture:
- Vectors are stored in dedicated, optimized Vector Search infrastructure.
- It uses approximate nearest neighbor (ANN) algorithms, providing O(log n) query complexity.
- Similarity computation is hardware-accelerated.
- Query latency stays roughly flat (typically 50-200ms) as the total memory count grows.
- This architecture also provides storage cost reduction by not storing large embedding arrays in Firestore.
The application uses a hybrid storage architecture combining Firestore and Vertex AI Vector Search.
users (collection)
└── {sanitized_user_id} (document) - e.g., "user_abc123def"
|
├── profile (map) - Contains the user's metadata
│ ├── username (string) 🔒 ENCRYPTED
│ ├── username_encrypted (boolean)
│ ├── email (string) 🔒 ENCRYPTED
│ ├── email_encrypted (boolean)
│ ├── consent (boolean) - Memory storage permission
│ ├── is_anonymous (boolean)
│ ├── user_instructions (list) 🔒 ENCRYPTED - A list of global instructions from the user.
│ ├── context (string) 🔒 ENCRYPTED - The bot's internal TTM/CBT tracking notes.
│ ├── context_encrypted (boolean) - Flag indicating context encryption status
│ ├── current_stage (string) - The user's TTM stage (e.g., "Stage 1: Relationship Building")
│ ├── created_at (string) - ISO 8601 timestamp
│ └── updated_at (string) - ISO 8601 timestamp (last interaction)
|
└── memories (sub-collection) - Encrypted summaries only
|
└── {memory_id} (document) - e.g., "mem_1758..."
├── user_id (string)
├── summary (string) 🔒 ENCRYPTED
├── summary_encrypted (boolean)
├── metadata (map) - {"topic": "...", "session_id": "..."}
└── created_at (string) - ISO 8601 timestamp
❌ embedding array REMOVED (now in Vector Search)
🔒 = Encrypted at rest using Google Cloud KMS
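The `*_encrypted` boolean flags in the schema above make it possible to decrypt only the fields that were actually encrypted. A minimal sketch of that idea follows; `decrypt_flagged_fields` is a hypothetical helper, `decrypt` stands in for the real KMS call, and fields without a flag in the schema (such as user_instructions) would need separate handling.

```python
from typing import Callable

FLAG_SUFFIX = "_encrypted"


def decrypt_flagged_fields(record: dict, decrypt: Callable[[str], str]) -> dict:
    """Return a copy of `record` with every flagged string field decrypted.

    A field `name` is treated as ciphertext only when `name_encrypted` is True,
    so records written before encryption was enabled still load correctly.
    """
    result = dict(record)
    for key, flag in record.items():
        if not key.endswith(FLAG_SUFFIX) or flag is not True:
            continue
        field = key[: -len(FLAG_SUFFIX)]
        value = result.get(field)
        if isinstance(value, str):
            result[field] = decrypt(value)
    return result


if __name__ == "__main__":
    profile = {
        "email": "moc.elpmaxe@a",
        "email_encrypted": True,
        "current_stage": "Stage 1: Relationship Building",
    }
    # Reversing the string is a toy stand-in for real decryption.
    print(decrypt_flagged_fields(profile, lambda text: text[::-1]))
```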
Vector Search Index: "chatbot-memory-index"
├── Dimensions: 768 (text-embedding-004)
├── Distance Measure: Dot Product
└── Algorithm: Tree-AH (approximate nearest neighbors)
Datapoints:
└── {memory_id} (e.g., "mem_1758...")
├── datapoint_id: mem_1758... (links to Firestore)
├── feature_vector: [0.123, -0.456, ...] (768 dimensions)
└── restricts: [
{
namespace: "user_id",
allow_list: ["user_abc123def"]
}
]
🔐 = User namespace isolation ensures privacy
- Write: Firestore document created → Context encrypted before save → Vector upserted to Vector Search (with namespace)
- Read: Load profile → Decrypt context → Vector Search query → Returns memory IDs → Firestore batch fetch → Decrypt summaries
- Delete: Firestore documents deleted → Vectors removed from Vector Search
- User profile PII (username, email)
- Global user instructions
- Internal bot context (TTM/CBT tracking notes) 🔒 FULLY ENCRYPTED
- Conversation summaries stored in memories
- Any personally identifiable information
- Vector embeddings (stored in Vector Search, not human-readable)
- Timestamps (needed for temporal processing)
- User IDs (already pseudonymized Firebase UIDs)
- Boolean flags (consent, anonymous status, encryption status flags)
- current_stage (non-PII, needed for logic)
- Embeddings must be generated from plaintext to ensure accurate similarity matching. The workflow is:
- Generate embedding from plaintext summary
- Encrypt the summary text
- Store encrypted summary in Firestore (without embedding)
- Store vector in Vector Search with user namespace
- Internal therapeutic context (context field) is fully encrypted to protect the bot's clinical notes about the user's therapeutic progress and observations
- During retrieval: Load profile → Decrypt context → Query Vector Search → Get memory IDs → Fetch from Firestore → Decrypt summaries
- This ensures security (encrypted storage), privacy (namespace isolation), and performance (fast vector search).
- Python 3.11 installed
- Node.js (LTS version)
- Google Cloud Project with:
- Vertex AI API enabled
- Vertex AI Vector Search (index + endpoint deployed)
- Firestore database (Native mode)
- Firebase Authentication (Google OAuth configured)
- Cloud KMS API (for encryption)
- Service account with roles:
- Vertex AI User (roles/aiplatform.user)
- Cloud Datastore User (roles/datastore.user)
- Firebase Admin SDK Administrator
- Cloud KMS CryptoKey Encrypter/Decrypter
- Service account key (for local development only)
- Firebase project configuration
- KMS key ring and encryption key
- Vector Search Infrastructure:
- Vector Search Index created (768 dimensions, Dot Product similarity)
- Index Endpoint deployed
- Deployed Index ID noted
# Clone/download the project and navigate to it
cd genAI
# Create Python 3.11 virtual environment
python3.11 -m venv venv
# Activate virtual environment
# Windows:
.\venv\Scripts\Activate.ps1
# macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Set up authentication (local development only)
# Windows:
$env:GOOGLE_APPLICATION_CREDENTIALS=".\service-account-key.json"
# macOS/Linux:
export GOOGLE_APPLICATION_CREDENTIALS="./service-account-key.json"

Create a firebase.env file in your project root:
# firebase.env
FIREBASE_API_KEY=your_api_key_here
FIREBASE_AUTH_DOMAIN=your-project.firebaseapp.com
FIREBASE_PROJECT_ID=your-project-id
FIREBASE_STORAGE_BUCKET=your-project.appspot.com
FIREBASE_MESSAGING_SENDER_ID=123456789
FIREBASE_APP_ID=1:123456789:web:abcdef123456

# Enable Cloud KMS API
gcloud services enable cloudkms.googleapis.com --project=your-project-id
# Create key ring (one-time setup)
gcloud kms keyrings create chatbot-encryption \
--location=your-region \
--project=your-project-id
# Create encryption key
gcloud kms keys create memory-encryption-key \
--location=your-region \
--keyring=chatbot-encryption \
--purpose=encryption \
--project=your-project-id
# Grant permissions
gcloud kms keys add-iam-policy-binding memory-encryption-key \
--location=your-region \
--keyring=chatbot-encryption \
--member="serviceAccount:your-service-account@your-project.iam.gserviceaccount.com" \
--role="roles/cloudkms.cryptoKeyEncrypterDecrypter" \
--project=your-project-id

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com --project=your-project-id
# Note: Vector Search Index and Endpoint must be created via Cloud Console
# See "Vector Search Configuration" section below

Set these environment variables for your application:
# Required
export GOOGLE_CLOUD_PROJECT="your-project-id"
export REGION="your-region"
# Vector Search Configuration (get these from Cloud Console)
export VECTOR_SEARCH_ENDPOINT_ID="projects/YOUR_PROJECT_NUMBER/locations/YOUR_REGION/indexEndpoints/YOUR_ENDPOINT_ID"
export DEPLOYED_INDEX_ID="your-deployed-index-id"
export VECTOR_SEARCH_INDEX_ID="your-index-id"
# Optional
export LLM_MODEL="gemini-2.5-flash"
export EMBEDDING_MODEL="text-embedding-004"

# Terminal 1: Start backend
python main.py
# Terminal 2: Start frontend
cd genai-frontend
npm install
npm run dev
# Open browser to http://localhost:5173

- Create Vector Search Index (via Cloud Console):
  - Go to: Vertex AI → Vector Search → Indexes
  - Click "Create Index"
  - Display name: chatbot-memory-index
  - Region: your-region
  - Dimensions: 768 ⚠️ CRITICAL (must match text-embedding-004)
  - Distance measure: Dot Product
  - Update method: Streaming updates
  - Algorithm: Tree-AH
- Create Index Endpoint (via Cloud Console):
  - Go to: Vertex AI → Vector Search → Index Endpoints
  - Click "Create Endpoint"
  - Display name: chatbot-memory-endpoint
  - Region: your-region
- Deploy Index to Endpoint:
  - Select your endpoint
  - Click "Deploy Index"
  - Select your index
  - Deployed index ID: serena_memory_deployed (or your choice)
  - Machine type: e2-standard-2
  - Min replicas: 1
  - Max replicas: 2
- Note Your Configuration:
  - After deployment, note these values for your environment variables:
    - VECTOR_SEARCH_ENDPOINT_ID: Full endpoint resource name
    - DEPLOYED_INDEX_ID: The deployed index ID you chose
    - VECTOR_SEARCH_INDEX_ID: Your index ID
genAI/
├── main.py # Flask backend with TTM/CBT logic & Vector Search
├── encryption.py # KMS encryption/decryption service
├── requirements.txt # Python dependencies
├── service-account-key.json # Service account key (local dev only)
├── firebase.env # Firebase configuration
├── venv/ # Python virtual environment
└── genai-frontend/ # React web interface
├── package.json
├── vite.config.js
└── src/
├── App.jsx
├── pages/
│ ├── Onboarding.jsx
│ ├── Chat.jsx
│ ├── Users.jsx
│ └── Settings.jsx
└── lib/
├── api.js
└── storage.js
All endpoints require Firebase ID token authentication (except /health):
- GET /health - Service health check
- POST /login - Verify token and create/retrieve encrypted user profile
- POST /dialogflow-webhook - Main chat endpoint with TTM logic and Vector Search
- POST /consent - User consent management
- POST /delete_memories - Delete from both Firestore and Vector Search
- POST /reset_instructions - Clears custom instructions and resets TTM stage
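A rough sketch of the token check each protected endpoint would perform is shown below. It assumes the ID token arrives in an `Authorization: Bearer <token>` header, which this README does not state explicitly. `verify_token` is an injected callable; in a real deployment it would most likely wrap `firebase_admin.auth.verify_id_token`, which returns the decoded claims including `uid`. Both helper names are hypothetical.

```python
from typing import Callable, Mapping, Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Pull the token out of 'Bearer <token>'; return None for anything else."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authenticate(
    headers: Mapping[str, str],
    verify_token: Callable[[str], Mapping[str, object]],
) -> Optional[str]:
    """Return the verified user ID, or None if the request is not authenticated."""
    token = extract_bearer_token(headers.get("Authorization"))
    if token is None:
        return None
    try:
        claims = verify_token(token)
    except Exception:
        # Expired, malformed, or revoked tokens all count as unauthenticated.
        return None
    uid = claims.get("uid")
    return uid if isinstance(uid, str) else None
```

A handler would call `authenticate(request.headers, verify_token)` first and respond with 401 when it returns None, leaving `/health` as the only route that skips the check.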
- POST /dialogflow-webhook - Core Conversation Flow:
  - Receives user message
  - Loads and decrypts user profile (including internal context)
  - Retrieves current_stage and memories (via Vector Search + Firestore)
  - Performs Call 1 (Main Prompt) to get the empathetic reply_text, new_stage, and updated_context
  - Encrypts and saves the updated_context and new_stage to the user's profile
  - Performs Call 2 (Analysis Prompt) to determine significance, summary, and instructions
  - If significant: encrypts summary, stores in Firestore, upserts to Vector Search
- POST /delete_memories - Complete Data Removal:
  - Retrieves all memory IDs from Firestore
  - Deletes vectors from Vector Search (by datapoint IDs)
  - Batch-deletes documents from Firestore
  - Returns deletion count
- POST /reset_instructions - Clear Custom Instructions:
  - Clears the user_instructions array in the user's profile
  - Resets the current_stage to 'Stage 1: Relationship Building'
  - Clears the encrypted context field (internal therapeutic notes)
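The delete flow for /delete_memories can be outlined as follows. This is a sketch under assumptions: the three callables are hypothetical wrappers around the Firestore and Vector Search clients, and the batch size of 400 is simply a conservative illustrative value, not a documented limit.

```python
from typing import Callable, Iterator, List

BATCH_SIZE = 400  # assumed conservative chunk size for batched deletes


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_all_memories(
    user_id: str,
    *,
    list_memory_ids: Callable[[str], List[str]],
    remove_vectors: Callable[[List[str]], None],
    delete_documents: Callable[[str, List[str]], None],
) -> int:
    """Remove a user's memories from both stores; returns how many were deleted."""
    memory_ids = list_memory_ids(user_id)
    if not memory_ids:
        return 0

    # Vectors go first: if this step fails, the Firestore documents (and
    # therefore the list of IDs) still exist, so the request can be retried.
    remove_vectors(memory_ids)

    for batch in chunked(memory_ids, BATCH_SIZE):
        delete_documents(user_id, batch)
    return len(memory_ids)
```

Deleting in the opposite order would risk orphaned vectors whose IDs can no longer be discovered from Firestore.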
- Start backend: python main.py
- Navigate to genai-frontend/
- Run npm install then npm run dev
- Open http://localhost:5173
- Sign in with Google or continue as guest
- Complete consent and start chatting
- Experience fast, context-aware responses powered by Vector Search
- Chat naturally with Serena
- The conversation is guided by a clinical framework (TTM)
- Significant conversations are automatically remembered
- Previous context seamlessly integrated into responses
- Time-aware greetings ("Welcome back, it's been 3 days...")
- Instant responses regardless of conversation history length
- Every message triggers profile load with context decryption
- Vector Search query executed (50-200ms)
- Top 3 most similar memories retrieved
- User's current_stage and decrypted context fed into TTM/CBT prompt (Call 1)
- Response (reply_text + new_stage + updated_context) generated and parsed
- Updated context encrypted and saved to profile
- Separate analysis (Call 2) runs to check for significance
- New significant exchanges saved to Firestore + Vector Search
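The "generated and parsed" step for Call 1 could be handled defensively along these lines. This sketch assumes the model is prompted to return a JSON object with the three keys named above; the README does not specify the wire format, so treat both the format and the `parse_call_one` helper as hypothetical.

```python
import json
from typing import Optional, TypedDict


class CallOneResult(TypedDict):
    reply_text: str
    new_stage: str
    updated_context: Optional[str]


def parse_call_one(raw: str, fallback_stage: str) -> CallOneResult:
    """Parse the model output, degrading gracefully if it is not valid JSON.

    On any parsing problem the raw text is used as the reply and the user's
    stage is left unchanged, so a malformed response never loses the stage.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None

    if not isinstance(data, dict) or not isinstance(data.get("reply_text"), str):
        return {
            "reply_text": raw.strip(),
            "new_stage": fallback_stage,
            "updated_context": None,
        }

    new_stage = data.get("new_stage")
    context = data.get("updated_context")
    return {
        "reply_text": data["reply_text"],
        "new_stage": new_stage if isinstance(new_stage, str) and new_stage else fallback_stage,
        "updated_context": context if isinstance(context, str) else None,
    }
```

Returning `updated_context=None` on failure lets the caller skip the encrypt-and-save step instead of overwriting good notes with garbage.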
The web interface works locally with:
- Flask development server
- Vector Search integration (if configured)
- Full encryption capabilities (if KMS configured)
- Automatic failover to Firestore if Vector Search unavailable
# Build container
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/genai-chatbot
# Deploy with generic placeholders
gcloud run deploy genai-chatbot \
--image gcr.io/YOUR_PROJECT_ID/genai-chatbot \
--region=YOUR_REGION \
--allow-unauthenticated \
--platform managed \
--service-account=YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com \
--set-env-vars GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID,\
REGION=YOUR_REGION,\
VECTOR_SEARCH_ENDPOINT_ID=projects/YOUR_PROJECT_NUMBER/locations/YOUR_REGION/indexEndpoints/YOUR_ENDPOINT_ID,\
DEPLOYED_INDEX_ID=your-deployed-index-id,\
VECTOR_SEARCH_INDEX_ID=your-index-id
# Deploy front end
cd genai-frontend
npm run build
firebase deploy --only hosting

Required environment variables in Cloud Run:
- GOOGLE_CLOUD_PROJECT: Your GCP project ID
- REGION: Your region (e.g., asia-south1)
- VECTOR_SEARCH_ENDPOINT_ID: Full endpoint resource name
- DEPLOYED_INDEX_ID: Your deployed index ID
- VECTOR_SEARCH_INDEX_ID: Your index ID
- LLM_MODEL: gemini-2.5-flash (optional, has default)
- EMBEDDING_MODEL: text-embedding-004 (optional, has default)
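One way to read and validate these variables at startup is sketched below. The variable names come from this README; the `Settings` class, the `load_settings` function, and the exact default values for the two optional variables are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = (
    "GOOGLE_CLOUD_PROJECT",
    "REGION",
    "VECTOR_SEARCH_ENDPOINT_ID",
    "DEPLOYED_INDEX_ID",
    "VECTOR_SEARCH_INDEX_ID",
)


@dataclass(frozen=True)
class Settings:
    project: str
    region: str
    endpoint_id: str
    deployed_index_id: str
    index_id: str
    llm_model: str
    embedding_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, failing fast on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        project=env["GOOGLE_CLOUD_PROJECT"],
        region=env["REGION"],
        endpoint_id=env["VECTOR_SEARCH_ENDPOINT_ID"],
        deployed_index_id=env["DEPLOYED_INDEX_ID"],
        index_id=env["VECTOR_SEARCH_INDEX_ID"],
        # Assumed defaults, matching the example values shown in this README.
        llm_model=env.get("LLM_MODEL", "gemini-2.5-flash"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-004"),
    )
```

Failing at startup with the full list of missing names tends to be easier to debug on Cloud Run than a later error from the first Vector Search call.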
- Encryption Standard:
  - Algorithm: AES-256-GCM (Google Cloud KMS)
  - Key Management: Google Cloud Key Management Service
  - Key Storage: Hardware Security Modules (HSMs), provided the key is created with the HSM protection level
  - Key Rotation: Automatic via KMS once a rotation schedule is set on the key
- Data Classification:
  - Encrypted at Rest:
    - User profiles: PII (username, email)
    - User instructions (global preferences)
    - Internal therapeutic context (bot's TTM/CBT clinical notes) 🔒
    - Conversation summaries (stored in memories subcollection)
  - Not Encrypted:
    - Vector embeddings (mathematical representations, not human-readable, stored in Vector Search)
    - Timestamps (needed for temporal processing)
    - User IDs (already pseudonymized Firebase UIDs)
    - Boolean flags (consent, anonymous status, encryption flags)
    - current_stage (non-PII, needed for stage logic)
  - Encrypted in Transit: All API calls use HTTPS/TLS 1.3
  - Never Stored: Raw chat messages (only curated summaries)
- Privacy Isolation:
  - Each user's vectors tagged with unique namespace
  - Query filters prevent cross-user data access
  - Double-protected: Firestore subcollections + Vector Search namespaces
  - Impossible for User A to access User B's memories or context
- HIPAA-ready architecture (requires BAA with Google Cloud)
- GDPR-compliant data handling
- Right to erasure via /delete_memories endpoint
- Explicit consent management
- Audit logging via Cloud Console
- Never commit service-account-key.json to version control
- Use environment variables for all sensitive configuration
- Enable audit logging for KMS and Vector Search operations
- Regular key rotation via Cloud KMS
- Monitor Vector Search performance metrics
- Backup Firestore before major changes
MIT License - Ensure compliance with privacy regulations when handling user data. This application handles encrypted personal mental health conversations with high-performance vector search and requires appropriate privacy safeguards, security audits, and compliance verification in production environments.
Security Note: While this implementation provides strong encryption at rest (including full encryption of internal therapeutic context) and complete user isolation via namespaces, full HIPAA compliance requires additional measures including Business Associate Agreements (BAA) with Google Cloud, comprehensive audit logging, access controls, and regular security assessments. Consult with legal and security professionals before handling Protected Health Information (PHI).