A production-ready Retrieval-Augmented Generation (RAG) server that uses Vertex AI Search and the Discovery Engine API to serve the Answer method: a conversational search experience with generative answers grounded in document data.
- 🤖 Fully-managed RAG pipeline: Stateful multi-turn conversational search with generative answers
- ⚡ Production-capable performance: Autoscaling, concurrency, and regional redundancy via multiple Cloud Run services
- 🔗 Integration flexibility: Authenticated external HTTPS endpoint using Google-managed TLS
- 📊 Resource observability: Single-pane-of-glass Cloud Monitoring dashboard with customizable metrics and alerts
- 🔍 Explainable and debuggable results: Investigate generative answers using the full RAG pipeline results logged to BigQuery
- 📈 Data-driven LLM-ops: Tune the conversational search agent using question/answer pairs labelled with user feedback
- 🛡️ Identity-Aware Proxy: Secure access control
- 👤 Google OAuth: Personalized sessions with user authentication
- 🚀 Automated deployments: One-click install and uninstall with Terraform and Cloud Build
- The Global External Application Load Balancer provides planet-scale availability
- The load balancer backend service interfaces with regional serverless network endpoint group backends composed of Cloud Run services
- Zonal failover: Cloud Run services replicate across multiple zones within a Compute region, so a single zonal failure does not cause an outage
- Autoscaling: add/remove instances to match demand and maintain a minimum instance count for high availability
- Concurrency: instances process multiple requests simultaneously
- Regional redundancy: services can span multiple regions to optimize latency and optionally deliver higher availability in case of regional outages.
- The Vertex AI Search app and data store automate document preparation for semantic search and retrieval
- The Conversational Search Service (the interface for the Answer method) uses Gemini-based answer generation models to power grounded generative answers
- The application asynchronously writes the full session data and user feedback responses to BigQuery for offline analysis
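
As a rough illustration of the Answer method mentioned above, the sketch below only builds the URL and JSON body for a REST `answer` call and does not send anything. It is not taken from this project's code: the function name `build_answer_request`, the placeholder IDs, the `default_collection` and `default_search` path segments, and the global API host are assumptions based on the public Discovery Engine REST API. A session ID of `-` is assumed to ask the service to start a new session.

```python
import json

# Assumed global endpoint; regional data stores use a location-prefixed host.
API_ROOT = "https://discoveryengine.googleapis.com/v1"


def build_answer_request(project, location, engine_id, question, session_id="-"):
    """Return (url, body) for a hypothetical Answer method call.

    Nothing is sent here; the caller would POST `body` as JSON to `url`
    with an OAuth access token in the Authorization header.
    """
    engine = (
        f"projects/{project}/locations/{location}"
        f"/collections/default_collection/engines/{engine_id}"
    )
    # Serving config name is an assumption; check your own app's settings.
    url = f"{API_ROOT}/{engine}/servingConfigs/default_search:answer"
    body = {
        "query": {"text": question},
        # Reusing a returned session name keeps the conversation multi-turn.
        "session": f"{engine}/sessions/{session_id}",
    }
    return url, body


if __name__ == "__main__":
    url, body = build_answer_request(
        "my-project", "global", "my-engine", "What is RAG?"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Keeping request construction separate from the HTTP call makes the payload easy to inspect and unit test before any credentials are involved.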
- Google Cloud Project with Owner permissions
- Terraform and `gcloud` CLI installed
See detailed deployment prerequisites →
1. Configure OAuth for user authentication
   📖 Complete OAuth setup guide →
2. Deploy the application: `source scripts/install.sh`
3. Enable Vertex AI Agent Builder in the Cloud Console and import your documents
4. Configure Identity-Aware Proxy to secure access

📖 View complete installation guide →
- ✅ Prerequisites - Environment setup
- 🔐 OAuth Setup Guide - Step-by-step OAuth client configuration
- 📋 Deployment - Deployment and post-deployment steps
- 🧪 Development Guide - Local development, testing, and Docker usage
- 📖 API Reference - Answer method configuration options
- 🏷️ Version Management - Automated semantic release and versioning
- 🏗️ Terraform Overview - General Terraform patterns and best practices (reusable)
- 🚀 Bootstrap Process - Initial project setup and service accounts
- ☁️ Cloud Build Automation - Automated deployments and CI/CD
- 🔄 Rollbacks - Rolling back deployments and managing revisions
- ⚙️ Infrastructure Changes - Applying infrastructure-only changes
- 🛠️ Helper Scripts - Automation scripts reference
- ❗ Known Issues - Common problems and solutions
Remove all resources:
`source scripts/uninstall.sh`

This project uses:
- Python 3.13+ with Poetry for dependency management
- FastAPI backend with an example Streamlit frontend
- Terraform for infrastructure as code
- pytest for testing
answer-app/
├── src/
│ ├── answer_app/ # FastAPI backend service
│ ├── client/ # Streamlit frontend application
│ └── package_scripts/ # Helper scripts (OAuth secrets)
├── terraform/
│ ├── bootstrap/ # Initial project setup
│ ├── main/ # Main infrastructure deployment
│ └── modules/ # Reusable Terraform modules
├── docs/ # Modular documentation
│ ├── installation/ # Setup guides
│ ├── infrastructure/ # Infrastructure documentation
│ ├── development/ # Development & API docs
│ └── troubleshooting/ # Known issues & solutions
├── scripts/ # Automation scripts
├── tests/ # Unit tests
└── assets/ # Documentation screenshots
This project is licensed under the MIT License - see the LICENSE file for details.
