A sophisticated semantic search chatbot for Wroclaw University of Economics and Business (WUEB) governing documents. The system uses OpenAI embeddings and GPT models to provide accurate answers grounded strictly in university policies and procedures, with safeguards against hallucination.
- 🔍 Semantic Search: Uses OpenAI embeddings for understanding document meaning
- 🛡️ Anti-Hallucination: Only uses information from provided documents
- 🇵🇱 Polish Language Support: Handles Polish text extraction and processing
- 📊 Confidence Scoring: Shows confidence levels for transparency
- 🗄️ Vector Database: ChromaDB for efficient similarity search
- 🌐 Beautiful UI: Streamlit interface with chat experience
- 📚 Document Management: Easy loading and reloading of PDFs
```
📄 PDF Documents (Polish)
        ↓
🔍 Text Extraction & Cleaning
        ↓
✂️ Semantic Chunking (1000 chars, 200 overlap)
        ↓
🧠 OpenAI Embeddings (text-embedding-ada-002)
        ↓
🗄️ ChromaDB Vector Database
        ↓
❓ User Query → Embedding → Similarity Search
        ↓
📋 Context Retrieval (Top 5 results)
        ↓
🤖 GPT-3.5-turbo Response Generation
        ↓
✅ Accurate Answer with Confidence Score
```
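The chunking step in the pipeline above can be sketched in a few lines. This is a simplified illustration using the sizes from `config.py`; the actual implementation in `pdf_processor.py` may split on semantic boundaries rather than raw character counts:

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks, mirroring the semantic-chunking
    step above (defaults match CHUNK_SIZE / CHUNK_OVERLAP in config.py)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
        start += chunk_size - overlap  # step back by the overlap
    return chunks

# Toy sizes so the overlap is visible on a short string
print([len(c) for c in chunk_text("a" * 25, chunk_size=10, overlap=4)])  # → [10, 10, 10, 7]
```

The overlap ensures that a sentence falling on a chunk boundary still appears intact in at least one chunk, which keeps the embeddings meaningful.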
- Python 3.8 or higher
- OpenAI API key
- WUEB PDF documents
1. Clone the repository

   ```bash
   git clone https://github.com/harshitha/arch/wueb-chatbot.git
   cd wueb-chatbot
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment

   ```bash
   cp env_example.txt .env
   # Edit .env file and add your OpenAI API key
   ```

4. Add your PDF documents

   ```bash
   mkdir pdfs
   # Copy your WUEB PDF documents to the pdfs/ directory
   ```

5. Run the application

   ```bash
   python quick_start.py
   # or
   streamlit run app.py
   ```
```
wueb-chatbot/
├── 📄 app.py             # Main Streamlit application
├── 📄 chatbot.py         # Core chatbot logic
├── 📄 config.py          # Configuration settings
├── 📄 data_loader.py     # PDF processing pipeline
├── 📄 pdf_processor.py   # Text extraction & chunking
├── 📄 vector_store.py    # Vector database operations
├── 🧪 test_chatbot.py    # System testing
├── 🚀 quick_start.py     # Automated setup
├── 📋 requirements.txt   # Python dependencies
├── 📖 README.md          # Project documentation
├── 📝 USAGE_GUIDE.md     # Detailed usage guide
├── ⚙️ setup.py           # Package installation
├── 🚫 .gitignore         # Git ignore rules
├── 📝 env_example.txt    # Environment template
├── 📁 pdfs/              # PDF documents directory
└── 📁 vector_db/         # ChromaDB vector database
```
Q: "What are the admission requirements?"
Q: "How do I apply for a program?"
Q: "What are the tuition fees?"
Q: "Tell me about the university structure and governance"
Q: "What are the academic calendar dates for 2024?"
Q: "Explain the student rights and responsibilities"
Q: "Jakie są wymagania rekrutacyjne?" ("What are the admission requirements?")
Q: "Ile kosztuje czesne?" ("How much is the tuition?")
Q: "Jakie są prawa studentów?" ("What are students' rights?")
Key settings can be modified in `config.py`:

```python
# Chunking Settings
CHUNK_SIZE = 1000           # Characters per chunk
CHUNK_OVERLAP = 200         # Overlap between chunks

# Search Settings
TOP_K_RESULTS = 5           # Number of similar documents
SIMILARITY_THRESHOLD = 0.7  # Minimum similarity score

# Model Settings
OPENAI_MODEL = "gpt-3.5-turbo"
EMBEDDING_MODEL = "text-embedding-ada-002"
MAX_TOKENS = 1000
TEMPERATURE = 0.1           # Low for factual responses
```

Run the comprehensive test suite:

```bash
python test_chatbot.py
```

This tests:
- ✅ System initialization
- ✅ PDF directory validation
- ✅ Document loading
- ✅ Chatbot queries
- ✅ System information
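As an illustration of how the `TOP_K_RESULTS` and `SIMILARITY_THRESHOLD` settings interact during retrieval, here is a toy sketch using 2-D vectors in place of ada-002's 1536-dimensional embeddings. This is not the project's actual ChromaDB query path:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, doc_vecs, top_k=5, threshold=0.7):
    """Score every stored chunk, drop those below the similarity
    threshold, then keep the top_k best matches."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    passing = [(i, s) for i, s in scored if s >= threshold]
    return sorted(passing, key=lambda p: p[1], reverse=True)[:top_k]

# 2-D toy vectors; real ada-002 embeddings have 1536 dimensions
docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
print([i for i, score in retrieve((1.0, 0.0), docs, top_k=2)])  # → [0, 1]
```

Raising the threshold trades recall for precision: fewer chunks reach the model, which reduces noise in the prompt but can cause "no relevant documents" responses for loosely worded questions.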
Use the chatbot programmatically:

```python
from chatbot import WUEBChatbot
from data_loader import DataLoader

# Initialize
chatbot = WUEBChatbot()
data_loader = DataLoader()

# Load documents
data_loader.load_documents()

# Ask questions
result = chatbot.process_query("What are the admission requirements?")
print(result['response'])
print(f"Confidence: {result['confidence']}")
```

- ✅ API keys stored in environment variables
- ✅ No sensitive data logged
- ✅ User queries validated
- ✅ Vector database stored locally
- ✅ PDF documents remain private
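The query-validation point above can be sketched as follows; `validate_query` and its limits are illustrative, not the project's actual checks:

```python
def validate_query(query, max_len=500):
    """Trim and bound a user query before it reaches the model.
    (This helper and its max_len are hypothetical, for illustration.)"""
    query = query.strip()
    if not query:
        return None  # reject empty / whitespace-only input
    return query[:max_len]  # cap very long queries

print(validate_query("  What are the tuition fees?  "))
```

Bounding query length also caps token usage per request, which keeps API costs predictable.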
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the embedding and language models
- ChromaDB for the vector database
- Streamlit for the web interface framework
- WUEB for the governing documents
For issues and questions:
- Check the USAGE_GUIDE.md
- Run `python test_chatbot.py` for diagnostics
- Review configuration in `config.py`
- Open an issue on GitHub
🎓 Ready to help with WUEB questions!