PastPortals is an intelligent, AI-powered museum guide system developed as a response to limitations in traditional and existing digital museum information systems. The platform integrates Correction + Retrieval-Augmented Generation (CRAG), natural language processing, multimodal interaction, vector-based retrieval, voice-first conversational AI, and continuous self-improving feedback loops to deliver accurate, context-aware, and engaging cultural heritage experiences.
Traditional museum systems rely on static methods such as printed labels, brochures, and audio guides, while existing digital guide systems suffer from:
- Hallucination: Generation of inaccurate or unsupported information
- Lack of domain grounding: Insufficient knowledge of historical and cultural contexts
- No transparency: Inability to verify information sources
- Limited multimodal support: Restricted to single interaction modes
- Poor scalability: Inefficient handling of simultaneous visitors
PastPortals implements Correction + Retrieval-Augmented Generation (CRAG) to bridge this gap by:
- Retrieving verified information from curated knowledge bases
- Validating and correcting generated content through fact-checking mechanisms
- Supporting multimodal interaction (text, voice, image, video)
- Enabling voice-first conversational AI for hands-free cultural exploration
- Implementing intelligent feedback loops that refine system behavior with each user interaction
- Enabling multilingual communication across 18+ languages
- Ensuring scalability and continuous improvement for high-traffic museum environments
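The validate-and-correct step above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the token-overlap heuristic, the `0.5` threshold, and the names `support_score` and `correct_draft` are all assumptions chosen for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def support_score(sentence: str, passages: list[str]) -> float:
    """Best fraction of the sentence's tokens found in any single passage."""
    sent = tokens(sentence)
    if not sent:
        return 0.0
    return max((len(sent & tokens(p)) / len(sent) for p in passages), default=0.0)


def correct_draft(draft: str, passages: list[str], threshold: float = 0.5):
    """Keep sentences supported by retrieved passages; report the rest.

    The threshold is a hypothetical default, not a value from the project.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    kept, dropped = [], []
    for sentence in sentences:
        target = kept if support_score(sentence, passages) >= threshold else dropped
        target.append(sentence)
    return " ".join(kept), dropped


passages = ["The Rosetta Stone was found in 1799 and carries a decree in three scripts."]
draft = "The Rosetta Stone was found in 1799. It was carved by Cleopatra herself."
answer, unsupported = correct_draft(draft, passages)
```

Here the second sentence has no support in the retrieved passage, so it lands in `unsupported` instead of the final answer. A production validator would likely use an LLM or entailment model in place of token overlap.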
┌─────────────────────────────────────────────────────────────────────┐
│ Frontend Application Layer │
│ React 18 | Document Upload | Voice Interface │
└──────────────────────────────┬──────────────────────────────────────┘
│ HTTP/REST API
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Content Processing Layer │
│ Document Extraction | OCR | Video Analysis | Voice │
│ (PyMuPDF | python-docx | Tesseract | OpenCV) │
└──────────────────────────────┬──────────────────────────────────────┘
│ Extracted Content + Metadata
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Retrieval & Ranking Layer │
│ FAISS Vector Search | Wikipedia API │
│ Historical Content Classification │
└──────────────────────────────┬──────────────────────────────────────┘
│ Contextually Relevant Information
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Generation & Response Layer │
│ Google Gemini 2.5 Flash | Fallback Enrichment │
│ Fact Validation | Response Synthesis │
└──────────────────────────────┬──────────────────────────────────────┘
│ Generated Response with Metadata
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Response Delivery Layer │
│ Markdown Rendering | Audio Output | Related Topics │
└─────────────────────────────────────────────────────────────────────┘
| Component | Technology | Purpose |
|---|---|---|
| UI Framework | React 18.2 | Component-based interface with virtual DOM rendering |
| Routing | React Router 6 | Client-side navigation and state management |
| Animations | Framer Motion | Smooth transitions and interactive UI elements |
| Icons | Lucide React | Comprehensive, accessible icon system |
| HTTP Client | Axios | RESTful API communication with request/response interceptors |
| Testing | Jest + React Testing Library | 40+ component tests with coverage reporting |
| Component | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI | High-performance async REST API framework |
| Language | Python 3.13 | Primary backend language with modern features |
| Async Processing | asyncio + uvicorn | Non-blocking concurrent request handling |
| Testing | pytest | 50+ unit tests with comprehensive coverage |
| API Documentation | Pydantic + Swagger | Auto-generated interactive API documentation |
| Component | Technology | Purpose |
|---|---|---|
| PDF Extraction | PyMuPDF (fitz) | High-fidelity text and metadata extraction |
| Word Documents | python-docx | Structured parsing of DOCX format |
| Optical Character Recognition | pytesseract + Tesseract | Text extraction from images and scanned documents |
| Video Analysis | OpenCV (cv2) | Frame sampling and temporal processing (8 frames/video) |
| Voice Processing | Web Speech API | Real-time speech-to-text transcription |
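The "8 frames/video" sampling in the table above could be implemented along these lines. The midpoint strategy and the function name `sample_frame_indices` are assumptions; the source states only the frame count.

```python
def sample_frame_indices(total_frames: int, samples: int = 8) -> list[int]:
    """Return up to `samples` evenly spaced frame indices.

    Each index is the midpoint of one equal-length segment of the video,
    which avoids always sampling the very first (often black) frame.
    """
    if total_frames <= 0 or samples <= 0:
        return []
    count = min(samples, total_frames)
    step = total_frames / count
    return [int(step * i + step / 2) for i in range(count)]


indices = sample_frame_indices(240)
```

With OpenCV, the total would typically come from `cv2.CAP_PROP_FRAME_COUNT`, and each index could be reached by setting `cv2.CAP_PROP_POS_FRAMES` before reading a frame.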
| Component | Technology | Purpose |
|---|---|---|
| LLM Generation | Google Gemini 2.5 Flash | Advanced language generation with low latency |
| Retrieval-Augmented | CRAG (Correction Module) | Fact validation and hallucination correction |
| Vector Similarity | FAISS | Fast approximate nearest neighbor search |
| Sentence Embeddings | Sentence Transformers | Dense vector representation of content |
| Domain Classification | Historical Keyword Analysis | Context-aware content categorization |
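As a rough sketch of the "Historical Keyword Analysis" row above, a query can be classified by counting distinct keyword hits. The keyword list, the `min_hits` default, and both function names are hypothetical; a real deployment would curate the vocabulary per collection.

```python
import re

# Hypothetical keyword list used only for illustration.
HISTORICAL_KEYWORDS = frozenset({
    "ancient", "dynasty", "empire", "artifact", "century",
    "archaeology", "medieval", "civilization", "heritage", "museum",
})


def historical_hits(text: str) -> set[str]:
    """Distinct historical keywords that appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & HISTORICAL_KEYWORDS


def is_historical(text: str, min_hits: int = 2) -> bool:
    """Treat the text as historical when enough distinct keywords match."""
    return len(historical_hits(text)) >= min_hits
```

For example, `is_historical("Which dynasty ruled the empire in the third century?")` matches three keywords and returns `True`, while a question about cafe opening hours matches none.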
| Component | Technology | Purpose |
|---|---|---|
| Speech-to-Text | Google Cloud Speech-to-Text / Web Speech API | Multilingual voice input processing |
| Natural Language Understanding | LLM + RAG Pipeline | Intent extraction and query comprehension |
| Text-to-Speech | Google Cloud Text-to-Speech | Natural-sounding response delivery |
| Voice Assistant Framework | Custom voice conversation bot | Context-aware dialogue management |
| Real-time Streaming | WebSocket support | Continuous voice interaction with minimal latency |
| Component | Technology | Purpose |
|---|---|---|
| Vector Database | FAISS with in-memory indexing | Millisecond-level similarity search |
| Knowledge Bases | Wikipedia API + Smithsonian Open Access | Curated historical content retrieval |
| Domain Datasets | Custom museum collections | Institution-specific artifact metadata |
| Feedback Storage | JSON + structured logs | User interaction tracking for improvement |
| Cache Layer | Redis (optional) | Response caching and session management |
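To make the vector-search row concrete, the sketch below runs an exact cosine-similarity search in pure Python. It is a stand-in for what FAISS does at scale, not FAISS itself; the three-dimensional vectors and document IDs are invented for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Return the k most similar (doc_id, score) pairs, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings; real ones would come from a sentence-embedding model.
index = {
    "rosetta_stone": [0.9, 0.1, 0.0],
    "roman_aqueduct": [0.1, 0.9, 0.2],
    "ming_vase": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], index, k=2)
```

An exhaustive scan like this is linear in the number of documents, which is the cost an approximate nearest-neighbor index is meant to avoid.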
| Component | Technology | Purpose |
|---|---|---|
| User Interaction Tracking | Event logging pipeline | Capture queries, dwell time, user ratings |
| Feedback Collection | Implicit + explicit signals | Track relevance, accuracy, and satisfaction |
| Vector Similarity Refinement | Weight adjustment algorithms | Dynamically tune ranking for domain-specific queries |
| Model Adaptation | Online learning mechanisms | Continuous improvement of retrieval quality |
| Performance Monitoring | Metrics & analytics dashboard | Track system improvement across sessions |
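One possible shape for the logged interaction events is sketched below, serialized as one JSON object per line. The field names and the `FeedbackEvent` class are assumptions; the source specifies only that queries, dwell time, and ratings are captured.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FeedbackEvent:
    """A single logged interaction (hypothetical schema)."""
    query: str
    response_id: str
    rating: Optional[int] = None   # explicit signal, e.g. a 1-5 score
    dwell_seconds: float = 0.0     # implicit signal
    requeried: bool = False        # implicit signal: user asked again
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(event: FeedbackEvent) -> str:
    """Serialize an event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = FeedbackEvent("Who built this?", "resp-1", rating=4, dwell_seconds=12.5)
line = to_log_line(event)
```

Appending such lines to a file gives a log that is easy to replay when tuning ranking offline.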
The development of PastPortals targets the following key objectives:
- Accuracy & Reliability: Ground all responses in trusted, curated datasets with fact-checking mechanisms to reduce hallucination and enhance credibility
- Voice-First Interaction: Enable seamless voice-based conversational interfaces for hands-free cultural exploration with natural language understanding
- Continuous Self-Improvement: Implement intelligent feedback loops that refine retrieval ranking, response quality, and domain understanding from every user interaction
- Multilingual Support: Provide accessibility across 18+ languages for diverse visitor populations with cultural context preservation
- Multimodal Delivery: Process and respond to diverse input modalities (text, voice, image, video) while delivering content in preferred formats
- Scalability & Performance: Handle multiple concurrent users without degradation using FastAPI async architecture
- Accessibility & Cultural Sensitivity: Maintain authenticity in heritage interpretation while supporting diverse learning styles and accessibility requirements
| Feature | Implementation |
|---|---|
| Document Processing | PDF, DOCX, TXT, MD, JSON, CSV, HTML extraction |
| Image Recognition | Tesseract-based OCR for photographic content |
| Video Analysis | Frame sampling with temporal OCR processing |
| Voice Interaction | WebRTC recording + transcription pipeline |
| Unified API | Single endpoint supporting all input modalities |
| Progress Tracking | Real-time upload status visualization (0-100%) |
| Content Validation | Format and size limit enforcement with user feedback |
| Fallback Responses | Wikipedia-enriched responses for API unavailability |
| Comprehensive Testing | 50+ backend + 40+ frontend unit tests |
| Museum Integration | Curated museum data and virtual tour content |
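The fallback behavior listed above might look like the following sketch, which tries a primary generator and switches to a secondary source on failure. The function names, the stub generators, and the choice to catch every `Exception` are assumptions made for illustration.

```python
from typing import Callable


def answer_with_fallback(
    question: str,
    generate: Callable[[str], str],
    fallback: Callable[[str], str],
) -> dict:
    """Use the primary generator; on any failure, use the fallback source."""
    try:
        return {"success": True, "response": generate(question), "fallback": False}
    except Exception as exc:  # broad catch is an assumption for this sketch
        return {
            "success": True,
            "response": fallback(question),
            "fallback": True,
            "notes": [f"primary generator unavailable: {exc}"],
        }


def unavailable_llm(question: str) -> str:
    """Stub that simulates an unreachable generation API."""
    raise ConnectionError("API unreachable")


def wiki_summary(question: str) -> str:
    """Stub standing in for a Wikipedia-based summary."""
    return f"Summary for: {question}"


result = answer_with_fallback("Who built the Colosseum?", unavailable_llm, wiki_summary)
```

The `fallback` flag in the returned dictionary mirrors the field of the same name in the API response schema.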
| Category | Maximum Size | Supported Formats |
|---|---|---|
| Documents | 50 MB | PDF, DOCX, TXT, MD, CSV, JSON, HTML, HTM |
| Images | 25 MB | PNG, JPG, JPEG, WEBP, BMP, TIFF, TIF |
| Video | 500 MB | MP4, MOV, AVI, MKV, WEBM, M4V |
| Voice | N/A | Real-time recording via WebRTC |
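The limits in the table above translate into a small validation routine. This is a sketch under two assumptions: 1 MB is taken as 1024 × 1024 bytes, and the names `LIMITS` and `validate_upload` are illustrative, not taken from the codebase.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes

# Size limit and allowed extensions per category, copied from the table.
LIMITS = {
    "document": (50 * MB, {".pdf", ".docx", ".txt", ".md", ".csv", ".json", ".html", ".htm"}),
    "image": (25 * MB, {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tiff", ".tif"}),
    "video": (500 * MB, {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v"}),
}


def validate_upload(filename: str, size_bytes: int, mode: str) -> list[str]:
    """Return a list of problems; an empty list means the upload is acceptable."""
    if mode not in LIMITS:
        return [f"unsupported mode: {mode}"]
    max_bytes, extensions = LIMITS[mode]
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in extensions:
        problems.append(f"extension {ext or '(none)'} not allowed for {mode}")
    if size_bytes > max_bytes:
        problems.append(f"file exceeds {max_bytes // MB} MB limit for {mode}")
    return problems
```

Returning a list of messages instead of raising lets the frontend show every problem at once.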
Every user interaction represents an opportunity for system learning. PastPortals v2 incorporates a sophisticated feedback pipeline that continuously refines retrieval accuracy, response relevance, and domain understanding.
Stage 1: User Feedback Captured
- Explicit ratings and implicit signals (re-queries, dwell time) logged per interaction
- Domain context stored with each query-response pair
- User satisfaction metrics tracked across museum exhibition types
Stage 2: Ranking Model Updated
- Feedback dynamically adjusts vector similarity weights
- Domain classifier confidence thresholds refined based on user validation
- Historical accuracy data incorporated into retrieval ranking
Stage 3: System Evolution
- Retrieval and ranking quality is designed to improve measurably as feedback accumulates across user sessions
- Adaptive behavior emerges from aggregated feedback signals
- Cultural context understanding deepens through continuous learning
- Adaptive Responses: Museum guides learn visitor preferences and knowledge levels
- Domain Refinement: Historical accuracy improves through expert feedback integration
- Personalization: Interaction quality increases for returning visitors
- Continuous Validation: User corrections automatically retrain ranking models
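The weight adjustment described in Stage 2 could take many forms. One simple possibility is sketched here: a per-document boost that moves toward the latest rating and is then blended into the similarity score. The update rule, the learning rate, the blend factor `alpha`, and both function names are assumptions, not the project's algorithm.

```python
def update_boost(boost: float, rating: int, lr: float = 0.1) -> float:
    """Move a document's boost toward a target derived from a 1-5 rating.

    A rating of 3 is neutral (target 0.0); 5 maps to +1.0 and 1 to -1.0.
    """
    target = (rating - 3) / 2
    return boost + lr * (target - boost)


def rerank(
    candidates: list[tuple[str, float]],
    boosts: dict[str, float],
    alpha: float = 0.2,
) -> list[tuple[str, float]]:
    """Add a scaled feedback boost to each similarity score and re-sort."""
    adjusted = [(doc, sim + alpha * boosts.get(doc, 0.0)) for doc, sim in candidates]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Document "b" starts slightly behind "a" but has consistently positive feedback.
ranked = rerank([("a", 0.80), ("b", 0.79)], {"b": 1.0})
```

Because the boost is an exponential moving average, a single outlier rating shifts the ranking only slightly.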
PastPortals v2 delivers a seamless, hands-free cultural exploration experience through intelligent voice-first conversational AI.
A museum guide that listens, reasons, and speaks back in real time, turning every artifact into a conversation instead of a static label.
- Hands-free discovery: Ask questions naturally and get spoken answers without typing or navigating menus.
- Grounded responses: Every reply is filtered through CRAG so the assistant stays accurate, contextual, and museum-ready.
- Multilingual conversations: Visitors can interact in 18+ languages, making the experience accessible and global.
| Feature | Technology | Implementation |
|---|---|---|
| Speech-to-Text Input | Google Cloud Speech-to-Text / Web Speech API | Converts user voice into text queries in real-time |
| AI Understanding | LLM + RAG + CRAG Pipeline | Processes natural language intent with cultural context |
| Text-to-Speech Output | Google Cloud Text-to-Speech | Delivers responses as natural, human-like voice |
| Real-Time Interaction | WebSocket streaming protocol | Conversational feedback with minimal latency |
| Context-Aware Dialogue | Domain-aware conversation state | Adapts responses based on museum location and artifact |
| Multilingual Support | 18+ language voice processing | Multilingual interactions for international visitors |
- Voice Input: Web Speech API + Whisper transcription
- Voice Processing: TensorFlow Lite for on-device optimization
- Response Generation: Gemini 2.5 Flash with domain context
- Voice Output: Google Cloud TTS with natural prosody
- Conversation Management: State machine for dialogue flow
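The dialogue state machine mentioned above could be as small as a transition table. The states and event names below are hypothetical; the source says only that a state machine manages dialogue flow.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


# (current state, event) -> next state; unknown pairs leave the state unchanged.
TRANSITIONS = {
    (State.IDLE, "wake"): State.LISTENING,
    (State.LISTENING, "transcript_ready"): State.THINKING,
    (State.LISTENING, "timeout"): State.IDLE,
    (State.THINKING, "response_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.LISTENING,
    (State.SPEAKING, "interrupt"): State.LISTENING,
}


def step(state: State, event: str) -> State:
    """Apply one event; events that are invalid in this state are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], start: State = State.IDLE) -> State:
    """Fold a sequence of events into a final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Keeping transitions in a table makes it easy to add behaviors such as barge-in (the `interrupt` event) without touching control flow.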
PastPortals v2 represents a complete data journey, from diverse user inputs to intelligent, verified outputs, constantly refining itself through feedback.
- User Input Acquisition → Text, Voice, Image, or Video submission
- Multimodal Processing → Speech-to-Text, OCR, Frame Extraction, Document Parsing
- Domain Classification → Historical/cultural context detection
- Vector Retrieval → FAISS semantic search of curated knowledge bases
- LLM Generation → Google Gemini 2.5 Flash response synthesis
- Fact Validation → CRAG correction module validates accuracy
- Output Delivery → Markdown-formatted response + voice synthesis
- Feedback Collection → User interaction logged for continuous improvement
- System Refinement → Ranking and understanding models updated
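The data journey above can be modeled as an ordered list of stages that each take and return a context dictionary. The stage functions here are stubs with invented behavior; only the ordering follows the list.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(context: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Pass a context dict through each stage, recording execution order."""
    context = {**context, "trace": []}
    for name, stage in stages:
        context = stage(context)
        context["trace"].append(name)
    return context


# Stub stages: each returns a copy of the context with one field added.
def classify(ctx: dict) -> dict:
    domain = "historical" if "empire" in ctx["question"].lower() else "general"
    return {**ctx, "domain": domain}


def retrieve(ctx: dict) -> dict:
    return {**ctx, "passages": ["(retrieved passage placeholder)"]}


def generate(ctx: dict) -> dict:
    return {**ctx, "draft": f"Answer based on {len(ctx['passages'])} passage(s)."}


def validate(ctx: dict) -> dict:
    return {**ctx, "response": ctx["draft"], "validated": True}


STAGES = [("classify", classify), ("retrieve", retrieve),
          ("generate", generate), ("validate", validate)]
result = run_pipeline({"question": "When did the Roman Empire fall?"}, STAGES)
```

The `trace` list is a convenience for debugging and is not part of the documented response.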
- Node.js: v16 or higher
- Python: v3.10 or higher
- Tesseract OCR: System-level installation required
- Virtual Environment: Python venv or equivalent
# Activate virtual environment (Windows PowerShell; on macOS/Linux use: source .venv/bin/activate)
& .venv\Scripts\Activate.ps1
# Install backend dependencies
pip install -r backend/requirements.txt
# Install frontend dependencies
cd frontend
npm install
# Configure environment variables
# Root .env file:
# GEMINI_API_KEY=your_api_key
# CORS_ORIGINS=http://localhost:3001
# frontend/.env file:
# PORT=3001
# REACT_APP_API_URL=http://localhost:5000
Terminal 1 - Backend Server:
cd backend
# FastAPI server (async support for concurrent requests)
uvicorn app:app --reload --port 5000
# Or using Python directly (if configured)
python app.py
# Server runs on http://localhost:5000 with auto-generated docs at http://localhost:5000/docs
Terminal 2 - Frontend Application:
cd frontend
npm start
# Application accessible at http://localhost:3001
Navigate to http://localhost:3001/multimodal to access the multimodal input interface.
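On the backend, the environment variables from the setup steps could be read roughly as follows. This is a sketch: the `Settings` class, the comma-separated format for `CORS_ORIGINS`, and its default value are assumptions rather than the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    cors_origins: tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    # Assumption: multiple origins are separated by commas.
    raw_origins = env.get("CORS_ORIGINS", "http://localhost:3001")
    origins = tuple(o.strip() for o in raw_origins.split(",") if o.strip())
    return Settings(gemini_api_key=key, cors_origins=origins)


settings = load_settings({
    "GEMINI_API_KEY": "example-key",
    "CORS_ORIGINS": "http://localhost:3001, http://example.org",
})
```

Failing fast on a missing key surfaces configuration mistakes at startup instead of at the first generation request.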
# Execute all backend tests with the virtualenv's interpreter (no activation needed)
./.venv/bin/python -m pytest -q backend
# Generate coverage report
./.venv/bin/python -m pytest -q backend --cov=backend.utils --cov=backend.routes --cov-report=html
# Test specific modules
./.venv/bin/python -m pytest -q backend/tests/test_multimodal_utils.py
./.venv/bin/python -m pytest -q backend/tests/test_multimodal_routes.py
If you already have the virtualenv activated, pytest -q backend also works from the repository root.
Test Coverage:
- test_multimodal_utils.py: 35+ tests (content extraction, OCR validation, response generation)
- test_multimodal_routes.py: 15+ tests (API endpoint validation, error handling)
- Aggregate Coverage: 90%+ of core functionality
cd frontend
npm test # Execute all component tests
npm test -- --coverage # Generate coverage report
npm test MultimodalPanel # Test specific component
Test Coverage:
- MultimodalPanel.test.jsx: 40+ tests (file validation, upload workflow, results display)
- Framework: Jest + React Testing Library
Endpoint: POST /api/multimodal/analyze
Request Format:
Content-Type: multipart/form-data
Parameters:
- file (optional): File object (document/image/video)
- question (required): User query string
- mode (required): Input modality (document|image|video|voice)
Response Schema:
{
"success": boolean,
"mode": "document|image|video|voice",
"method": "text-file|pdf|docx|ocr-image|ocr-video|generic-text",
"extracted_text": "Full text extracted from input",
"response": "Generated or fallback response (900-1100 words)",
"metadata": {
"filename": "original_filename.ext",
"extension": ".pdf|.jpg|.mp4|...",
"size_bytes": number,
"processing_method": "extraction_method_used"
},
"notes": ["Processing note 1", "Processing note 2"],
"related_topics": [
{
"title": "Topic Title",
"extract": "Brief description from Wikipedia"
}
],
"fallback": false
}
This repository includes comprehensive technical documentation:
Detailed technical documentation covering:
- CRAG architecture and the 4-stage retrieval, generation, validation, and correction flow
- Multimodal integration points and response structure
- Validation metrics, configuration defaults, and troubleshooting guidance
- Practical testing notes, including the existing multimodal test files and recommended CRAG cases
Intended Audience: Developers, code maintainers, technical architects, and QA engineers
- 3D Artifact Visualization: Interactive 3D models of museum pieces with voice guidance
- Mobile Voice Assistant: Dedicated mobile app with voice-first experience
- Knowledge Graph Integration: Semantic relationship mapping for cultural artifacts
- Performance Optimization: Latency reduction to <500ms for voice interactions
- Collaborative Annotation: Visitor annotations that improve cultural understanding
| Metric | Value |
|---|---|
| Total Lines of Code | 8,500+ |
| Test Coverage | 90%+ |
| API Endpoints | 15+ |
| Supported File Types | 20+ |
| Supported Languages | 18+ (planned) |
| Museum Partnerships | 6 institutions |
| Development Duration | 3+ months |
- Content: Wikipedia (en.wikipedia.org), operated by the Wikimedia Foundation
- Historical Images: Wikimedia Commons
- Museum Data: Smithsonian Open Access, Louvre API, British Museum Collections
- AI Generation: Google Generative AI (Gemini 2.5 Flash)
- OCR Engine: Tesseract Open Source OCR
- Vector Search: Facebook FAISS
- Video Processing: OpenCV Foundation
Please submit issues via GitHub Issues with:
- Detailed description and reproduction steps
- Environment specifications (OS, Python version, Node version)
- Error logs and stack traces
- Screenshots or relevant attachments
- Create feature branch: git checkout -b feature/feature-name
- Implement changes and execute tests locally
- Commit with descriptive messages following conventional commits
- Push to remote and create pull request
- Submit for code review and CI/CD validation
{
"success": true,
"mode": "document",
"method": "pdf",
"extracted_text": "The Roman Empire was one of the most influential civilizations in human history, spanning over 500 years...",
"response": "The Roman Empire, originating from the Italian peninsula, became a dominant force that transformed Western civilization. From 27 BCE to 476 CE, Rome developed sophisticated administrative systems, advanced architectural techniques, and influential legal frameworks. Key achievements include the construction of infrastructure such as aqueducts, roads, and amphitheaters, alongside the development of Latin as a universal language. The Roman military was renowned for its organization and effectiveness, while Roman law established principles that continue to influence modern legal systems.",
"metadata": {
"filename": "roman_history.pdf",
"extension": ".pdf",
"size_bytes": 2048576,
"processing_method": "pdf_extraction"
},
"notes": [
"PDF extracted successfully with 8 keywords identified",
"Content grounded in Wikipedia historical data"
],
"related_topics": [
{
"title": "Roman Republic",
"extract": "The Roman Republic was the period of Roman history when the state operated as a republic..."
},
{
"title": "Julius Caesar",
"extract": "Gaius Julius Caesar was a Roman military general and statesman who played a critical role..."
}
],
"fallback": false
}
Search Results with Historical Context
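A client consuming responses like the example above may want a light structural check before rendering. The sketch below treats every top-level field of the documented schema as required, which is an assumption; `schema_problems` is an illustrative name.

```python
import json

# Top-level fields and their expected JSON types, per the response schema.
REQUIRED_FIELDS = {
    "success": bool, "mode": str, "method": str, "extracted_text": str,
    "response": str, "metadata": dict, "notes": list,
    "related_topics": list, "fallback": bool,
}


def schema_problems(payload: dict) -> list[str]:
    """List structural problems in a response; empty means it looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    topics = payload.get("related_topics")
    if isinstance(topics, list):
        for i, topic in enumerate(topics):
            if not (isinstance(topic, dict) and {"title", "extract"} <= topic.keys()):
                problems.append(f"related_topics[{i}] needs title and extract")
    return problems


raw = """{"success": true, "mode": "document", "method": "pdf",
 "extracted_text": "...", "response": "...", "metadata": {}, "notes": [],
 "related_topics": [{"title": "Roman Republic", "extract": "..."}],
 "fallback": false}"""
problems = schema_problems(json.loads(raw))
```

Since the backend uses Pydantic, a shared model would be the more idiomatic place for this check on the server side.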

This project is distributed under the MIT License for educational and research purposes.
Data Attribution:
- Historical Content: Wikipedia (Wikimedia Foundation)
- Imagery: Wikimedia Commons (Creative Commons License)
- Museum Information: Official institutional APIs
- AI Capabilities: Google Gemini API
- OCR Technology: Tesseract OCR Project
- Yash Kumar Kalirawan
Artificial Intelligence · Museums · Conversational Agents · Retrieval-Augmented Generation · Multimodal AI · Visitor Engagement · Cultural Heritage · Natural Language Processing · Vector Databases · OCR Technology
PastPortals v2 — Advancing Cultural Heritage Interpretation Through Intelligent Technology
Contributions are welcome!
- Fork the repository.
- Create a feature branch.
- Commit your changes.
- Push to the branch.
- Open a Pull Request.
⭐ If you like my work, drop a ⭐ and let's connect!


