An intelligent document retrieval and chat system that integrates Groq's free Llama 3.3 70B model with Model Context Protocol (MCP) tools for seamless document search and web browsing capabilities.
Demo video: MCP.RAG.Video.Demo.mp4
- Free Powerful LLM: Groq's Llama 3.3 70B model (completely free!)
- Smart Document Retrieval: Hybrid RAG with reranking from your Qdrant database
- Real-time Web Search: Integrated Tavily API for current information
- MCP Integration: Model Context Protocol for seamless tool calling
- Intelligent Chat: LLM automatically decides when to use tools
- Beautiful UI: Modern Gradio interface with multiple tabs
- Fast Performance: Sub-second document retrieval, 2-5 s total response time
Ask questions like:
- "Who is Jawher Khalifa?" β Uses document retrieval
- "What are the latest AI trends in 2025?" β Uses web search
- "Tell me about the ML projects in the resume" β Uses both tools
The LLM intelligently chooses which tools to use based on your question!
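Under the hood, this routing relies on function calling: the client advertises each tool's JSON schema, and the model picks a tool per request. The sketch below shows what those schemas and the local dispatch might look like — the schemas are illustrative, not the repository's exact definitions, and the handler bodies are stand-ins for the real implementations in `mcp_tools.py`:

```python
# Illustrative tool schemas (not the repository's exact definitions)
# that a client could pass to Groq's chat-completions API so the model
# can decide, per question, which tool to call.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_tool",
            "description": "Search the local Qdrant document collection.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "k": {"type": "integer", "description": "top-k results"},
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "websearch_tool",
            "description": "Search the web (Tavily) for current information.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "k": {"type": "integer", "description": "top-k results"},
                },
                "required": ["query"],
            },
        },
    },
]

# Stand-in handlers; the real ones live in mcp_tools.py.
HANDLERS = {
    "retrieve_tool": lambda query, k=5: f"docs for {query!r}",
    "websearch_tool": lambda query, k=5: f"web hits for {query!r}",
}

def dispatch(name: str, arguments: dict) -> str:
    """Route a model-issued tool call to the matching local handler."""
    return HANDLERS[name](**arguments)
```

The model never runs the tools itself; it only names one and supplies arguments, and the client dispatches locally.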
Before deployment, ensure you have:
- Python 3.8+ installed
- Qdrant Database running (local or remote)
- Groq API Key (free from console.groq.com)
- Tavily API Key (optional, for web search)
- Documents loaded in your Qdrant collection
```bash
# Navigate to your project directory
git clone https://github.com/jawherkh/MCP-RAG-Agent.git
cd "MCP-RAG-Agent"

# Install required packages
pip install groq gradio python-dotenv qdrant-client langchain-qdrant langchain-huggingface tavily-python sentence-transformers
```

- Visit console.groq.com
- Sign up for a free account
- Go to "API Keys" section
- Create a new API key
- Copy the key (starts with `gsk_`)
- Visit tavily.com
- Sign up and get your API key
- Copy the key (starts with `tvly-`)
Manually create or edit a `.env` file:

```
GROQ_API_KEY=gsk_your_groq_api_key_here
TAVILY_API_KEY=tvly_your_tavily_api_key_here
```

```bash
# Download and run Qdrant locally
docker run -p 6333:6333 -v ${pwd}/qdrant_data:/qdrant/storage qdrant/qdrant
# Follow instructions at: https://qdrant.tech/documentation/quick-start/
```

If you haven't loaded documents yet:
```bash
# Open and run the chunking notebook
jupyter notebook data/chunking.ipynb
```

```bash
# Start the Gradio web app
python llm_app.py
```

Then open: http://localhost:7861
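To confirm the app is actually listening before opening the browser, a small stdlib check can help (hypothetical helper; port 7861 as above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Example: check the Gradio app's default port from this README.
# port_open("localhost", 7861)
```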
```bash
# Start the CLI chat
python llm_mcp_client.py
```

- `GroqMCPClient.chat(message, system_prompt)` - main chat function
- `retrieve_tool(query, k)` - document retrieval
- `websearch_tool(query, k)` - web search
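For reference, the tool-call round trip in a client like this typically looks like the sketch below: the model returns OpenAI-style `tool_calls`, each one is executed locally, and its result is appended as a `tool` message for the next completion request. This is an illustration of the pattern, not the repository's actual loop:

```python
import json

def run_tool_calls(tool_calls: list, handlers: dict) -> list:
    """Execute each tool call the model requested and build the
    'tool'-role messages fed back into the next chat turn.
    tool_calls items mimic Groq/OpenAI-style objects as plain dicts."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        # Arguments arrive as a JSON string from the model.
        args = json.loads(call["function"]["arguments"])
        result = handlers[name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(result),
        })
    return messages
```

After these messages are appended to the history, a second completion request lets the model write its final answer from the tool results.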
| Variable | Required | Description |
|---|---|---|
| `GROQ_API_KEY` | Yes | Groq API key for Llama access |
| `TAVILY_API_KEY` | No | Tavily API key for web search |
| `QDRANT_URL` | No | Qdrant URL (default: `localhost:6333`) |
| `QDRANT_API_KEY` | No | Qdrant API key if using cloud |
```
MCP RAG/
├── llm_app.py            # Gradio web interface for chat and tool testing
├── llm_mcp_client.py     # LLM client that integrates Groq with MCP tools (CLI)
├── mcp_tools.py          # Shared tools for both server and client (retrieval, websearch)
├── server.py             # MCP server exposing tools via FastMCP
├── requirements.txt      # Python dependencies
├── README.md             # Documentation and deployment instructions
├── .env                  # Environment variables (API keys, not committed)
├── .env.example          # Example environment file for deployment
├── deploy.py             # Automated deployment script (optional)
├── Dockerfile            # Docker container configuration (optional)
├── docker-compose.yml    # Docker Compose for app + Qdrant (optional)
├── utils/
│   ├── retrievers.py     # Document retrieval logic (hybrid, reranking)
│   └── ranker.py         # Reranking utilities
├── data/
│   ├── chunking.ipynb    # Notebook for document chunking and loading into Qdrant
│   └── docs/             # Folder for your PDF and other documents
├── __pycache__/          # Python cache files (ignored)
└── ...                   # Other supporting files
```
- llm_app.py: Main Gradio app for chatting with the LLM and using tools via UI.
- llm_mcp_client.py: Command-line client for LLM + MCP tools (for testing or automation).
- mcp_tools.py: Core logic for retrieval and websearch, shared by both server and client.
- server.py: Runs the MCP server, exposing tools for LLM or other clients.
- utils/: Custom retrieval and reranking logic for hybrid RAG.
- data/: Notebooks and document storage for chunking/loading into Qdrant.
- requirements.txt: All Python dependencies for the project.
- .env / .env.example: API keys and environment configuration.
- Dockerfile / docker-compose.yml: For containerized and production deployments.
- Fork the repository
- Create a feature branch
- Add your improvements
- Test thoroughly
- Submit a pull request
This project is for educational and research purposes. Please respect the terms of service for all APIs used.
Your MCP RAG system with Groq Llama 3.3 70B is now ready for deployment. The LLM will automatically use document retrieval and web search to provide intelligent, contextual responses.
Need help? Check the troubleshooting section or create an issue.
Enjoy your free, powerful AI assistant!