Use a dataset of Airbnb listings with associated descriptions and geospatial metadata (longitude/latitude). Combine spatial filtering (find properties in a specific area) with semantic search (e.g., "garden", "3 bedrooms") using OpenAI embeddings and DocumentDB's native vector search.
Dataset link: https://insideairbnb.com/get-the-data/
- Vector Search: DocumentDB's native cosmosSearch with IVF (Inverted File Index) for efficient similarity search
- Geospatial Queries: Find properties within a radius using MongoDB-compatible 2dsphere indexes
- Semantic Search: OpenAI embeddings for natural language understanding
- Hybrid Search: Combine vector similarity with filters (amenities, location, price)
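As a rough illustration of the radius query mentioned above, the sketch below builds a MongoDB-style `$geoWithin` / `$centerSphere` filter. It assumes that listings store a GeoJSON point in a field named `location` (a hypothetical name, not confirmed by this README) and that DocumentDB accepts the same operator syntax as MongoDB.

```python
# Equatorial Earth radius in kilometres, used to convert a distance to radians.
EARTH_RADIUS_KM = 6378.1


def within_radius_filter(longitude: float, latitude: float, radius_km: float,
                         field: str = "location") -> dict:
    """Build a filter matching documents within radius_km of a point.

    `field` is an assumed name for the 2dsphere-indexed GeoJSON field.
    """
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    return {
        field: {
            "$geoWithin": {
                # Coordinates are [longitude, latitude]; the radius is in radians.
                "$centerSphere": [[longitude, latitude], radius_km / EARTH_RADIUS_KM]
            }
        }
    }


# Example: listings within 2 km of a point in San Francisco.
query = within_radius_filter(-122.4194, 37.7749, 2.0)
# With pymongo this could be passed as: collection.find(query)
```

Note that the coordinate order is longitude first, which is easy to get backwards when copying values from a map.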
- Click the badge above or create a new Codespace from this repository
- Set your OpenAI API key (choose one method):
  - Codespaces Secret (recommended): go to GitHub Settings → Codespaces and add the secret OPENAI_API_KEY=your-key-here
  - Or edit the .env file in the Codespace
- Wait for the devcontainer to build (DocumentDB will start automatically)
- Open contoso-booking.ipynb and run cells to load data
- Start the backend: cd src/api && uvicorn main:app --reload
- Start the frontend: cd src/frontend && npm start
The Codespace includes:
- ✅ DocumentDB running on port 10260
- ✅ Python 3.11 + Node.js 18
- ✅ All dependencies pre-installed
- ✅ DocumentDB VS Code extension
- ✅ Ports auto-forwarded (3000, 8000, 10260)
The easiest way to run the complete application locally:
# Quick start (builds and runs everything)
./scripts/quickstart.sh
# Or use make commands
make up # Start all services
make down # Stop all services
make logs # View logs
This will start:
- Frontend: http://localhost:3000 (React app)
- Backend API: http://localhost:8000/docs (FastAPI with Swagger UI)
- DocumentDB: mongodb://admin:password123@localhost:10260
See make help for all available commands.
- Python 3.8+
- Node.js 16+
- Docker (for DocumentDB)
- OpenAI API Key from https://platform.openai.com/
Pull and run the DocumentDB Docker image:
# Pull the latest DocumentDB image
docker pull ghcr.io/documentdb/documentdb/documentdb-local:latest
# Tag for convenience
docker tag ghcr.io/documentdb/documentdb/documentdb-local:latest documentdb
# Run the container
docker run -dt -p 10260:10260 --name documentdb-container documentdb \
--username admin --password password123
Note: Replace admin and password123 with your desired credentials. You must set these when creating the container for authentication to work.
Copy the .env.example file and rename it to .env:
cp .env.example .env
Update the .env file with your values:
DOCUMENTDB_CONNECTION_STRING=mongodb://admin:password123@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true&authMechanism=SCRAM-SHA-256
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_CHAT_MODEL=gpt-3.5-turbo
Note on Connection String Hostnames:
- localhost: Use when running in Codespaces/DevContainer (since network_mode: service:documentdb shares the network)
- host.docker.internal: Use when running in a separate Docker container that needs to reach DocumentDB on the host
- localhost: Also works when running the app directly on your machine (not in a container)
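To make the hostname choice concrete, here is a minimal sketch that assembles the connection string shown above from its parts. The helper name `build_connection_string` and the `DOCUMENTDB_HOST` variable are hypothetical and are not something the project defines; the project itself reads the full string from DOCUMENTDB_CONNECTION_STRING.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str,
                            host: str = "localhost", port: int = 10260) -> str:
    """Assemble a DocumentDB URI with the TLS and auth options from .env.example.

    Credentials are percent-encoded so characters such as '@' or ':' in a
    password do not break the URI.
    """
    options = "tls=true&tlsAllowInvalidCertificates=true&authMechanism=SCRAM-SHA-256"
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/?{options}"
    )


# DOCUMENTDB_HOST is a hypothetical variable used only in this sketch:
# leave it unset for localhost, or set it to host.docker.internal when the
# app runs in a separate container.
host = os.environ.get("DOCUMENTDB_HOST", "localhost")
print(build_connection_string("admin", "password123", host=host))
```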
Open and run the contoso-booking.ipynb notebook to:
- Connect to DocumentDB
- Create vector search index using cosmosSearch (IVF algorithm)
- Create geospatial and amenity indexes
- Load the Airbnb listing data
- Generate OpenAI embeddings for each listing
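The embedding step can be sketched as a small batching helper. This is not the notebook's code: the field names `name` and `description`, the batch size, and the `embed_batch` callable are assumptions made for illustration. The commented lines show how the OpenAI client could be plugged in as that callable.

```python
from typing import Callable, List


def listing_text(listing: dict) -> str:
    """Join the (assumed) name and description fields into one string to embed."""
    parts = [listing.get("name", ""), listing.get("description", "")]
    return ". ".join(p for p in parts if p)


def embed_listings(listings: List[dict],
                   embed_batch: Callable[[List[str]], List[List[float]]],
                   batch_size: int = 100) -> List[dict]:
    """Return copies of the listings with an 'embedding' field added."""
    enriched: List[dict] = []
    for start in range(0, len(listings), batch_size):
        batch = listings[start:start + batch_size]
        vectors = embed_batch([listing_text(item) for item in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedding function returned a wrong number of vectors")
        for item, vector in zip(batch, vectors):
            enriched.append({**item, "embedding": vector})
    return enriched


# One possible embed_batch, using the OpenAI Python client (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# def openai_embed(texts):
#     response = client.embeddings.create(model="text-embedding-3-small", input=texts)
#     return [item.embedding for item in response.data]

# Stand-in embedder so the sketch runs offline: one number per text.
fake_embed = lambda texts: [[float(len(t))] for t in texts]
docs = embed_listings([{"name": "Loft", "description": "Sunny garden"}], fake_embed)
print(docs[0]["embedding"])
```

Batching keeps the number of API calls low; the right batch size depends on listing length and the provider's request limits.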
The notebook will create:
- Vector Index: vector-ivf with cosine similarity for semantic search
- Geospatial Index: 2dsphere for location-based queries
- Amenity Index: For fast filtering by amenities
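For orientation, the three indexes could be defined roughly as follows. This is a sketch rather than the notebook's actual code: the collection name `listings`, the field names `embedding`, `location`, and `amenities`, the index name, and the `numLists` value are all assumptions. The dimension count of 1536 matches the output size of text-embedding-3-small.

```python
def vector_index_command(collection: str, path: str = "embedding",
                         dimensions: int = 1536, num_lists: int = 100) -> dict:
    """Build a createIndexes command for a cosmosSearch IVF index.

    num_lists is a placeholder; a suitable value depends on the dataset size.
    """
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "name": f"{path}_vector_ivf",      # hypothetical index name
                "key": {path: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",
                    "numLists": num_lists,
                    "similarity": "COS",           # cosine similarity
                    "dimensions": dimensions,
                },
            }
        ],
    }


# Key specifications for the two conventional indexes, in pymongo's
# (field, type) form. Field names are assumed.
GEO_INDEX_KEYS = [("location", "2dsphere")]
AMENITY_INDEX_KEYS = [("amenities", 1)]

command = vector_index_command("listings")
# With pymongo:
# db.command(command)
# db["listings"].create_index(GEO_INDEX_KEYS)
# db["listings"].create_index(AMENITY_INDEX_KEYS)
```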
cd src/api && pip install -r ../../requirements.txt
cd ../frontend && npm install
Terminal 1 (Backend):
cd src/api && uvicorn main:app --reload
Terminal 2 (Frontend):
cd src/frontend && npm run start
The application will be available at http://localhost:3000
- Backend: FastAPI with OpenAI for embeddings and chat completions
- Database: DocumentDB with native vector search support
  - Vector Search: cosmosSearch operator with IVF indexing
  - Geospatial: MongoDB 2dsphere indexes
  - Filters: Compound queries combining vector similarity, location, and amenities
- Frontend: React with Leaflet/OpenStreetMap integration
- Search Flow:
  - User query → OpenAI embedding generation
  - DocumentDB cosmosSearch finds similar listings
  - Filters applied (location radius, amenities)
  - Results ranked by similarity score
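The search flow above maps naturally onto an aggregation pipeline. The sketch below follows the order described (vector search first, filters afterwards); the backend may instead filter inside cosmosSearch or order the stages differently. The function name, the `embedding` field, and the `amenities` field are assumptions. Because the `$project` stage nests each listing under `document`, the filter is written against `document.<field>`.

```python
from typing import List, Optional


def build_search_pipeline(query_vector: List[float], k: int = 10,
                          post_filter: Optional[dict] = None,
                          path: str = "embedding") -> List[dict]:
    """Return a pipeline: vector search, score projection, optional filter."""
    if k <= 0:
        raise ValueError("k must be positive")
    pipeline: List[dict] = [
        {
            "$search": {
                # k nearest neighbours of query_vector on the indexed field
                "cosmosSearch": {"vector": query_vector, "path": path, "k": k},
                "returnStoredSource": True,
            }
        },
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT",
            }
        },
    ]
    if post_filter:
        # Applied after the vector search, so fewer than k results may remain.
        pipeline.append({"$match": post_filter})
    return pipeline


pipeline = build_search_pipeline(
    query_vector=[0.12, -0.03, 0.57],  # stand-in for a 1536-dimension embedding
    k=20,
    post_filter={"document.amenities": {"$all": ["Wifi", "Garden"]}},
)
# With pymongo: results = list(collection.aggregate(pipeline))
```

Since filtering happens after the nearest-neighbour step in this sketch, requesting a larger k than the number of results to display leaves room for listings that the filter removes.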
DocumentDB supports MongoDB-compatible vector search through the cosmosSearch operator:
- Index Types:
  - vector-ivf: Inverted File Index (used in this project)
  - vector-hnsw: Hierarchical Navigable Small World
- Similarity Metrics: Cosine (COS), L2 distance, Inner Product (IP)
- Dimensions: Supports high-dimensional vectors (2,000+ dimensions)
- Performance: Optimized for large-scale vector similarity search
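To make the three similarity metrics concrete, the following plain-Python sketch computes each one for a pair of small vectors. It is only an illustration of the definitions; DocumentDB computes these internally and its exact score scaling is not described here.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """IP: larger means more similar."""
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 (Euclidean) distance: smaller means more similar."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """COS: inner product of the vectors after scaling each to unit length."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / norms


a, b = [3.0, 4.0], [4.0, 3.0]
print(inner_product(a, b), l2_distance(a, b), cosine_similarity(a, b))
```

OpenAI embeddings are normalized to unit length, so for them cosine similarity and inner product produce the same ranking.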