This project is a Retrieval-Augmented Generation (RAG) chatbot that extracts and processes data from any website through its URL. The chatbot allows users to query website-related information and receive AI-generated responses based on scraped content.
- Language: Python
- Web Scraping: Jina Reader (r.jina.ai), Requests
- Data Processing: LangChain (Text Splitting)
- Vector Storage: FAISS (Facebook AI Similarity Search)
- Embeddings: Google Generative AI API
- Frontend: Streamlit
- Deployment: Streamlit Cloud
git clone https://github.com/sanaa9012/scrappy.git
cd scrappy
pip install -r requirements.txt

Create a .env file and add:
GOOGLE_API_KEY=your_google_api_key
ANY_WEBSITE_URL=(add URL)
JINA_API = "https://r.jina.ai"

The project scrapes content from the website using Jina Reader.
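As a rough sketch of the scraping step: Jina Reader returns a text rendering of a page when the page URL is appended to the Reader endpoint. The helper names `build_reader_url` and `fetch_page_text` are hypothetical and not taken from this repository, and the timeout value is an assumption.

```python
def build_reader_url(target_url: str, reader_base: str = "https://r.jina.ai") -> str:
    """Prefix the target URL with the Reader endpoint (hypothetical helper)."""
    return f"{reader_base.rstrip('/')}/{target_url}"


def fetch_page_text(target_url: str,
                    reader_base: str = "https://r.jina.ai",
                    timeout: float = 30.0) -> str:
    """Download the Reader rendering of a page as plain text."""
    # Imported lazily so the URL helper above works without Requests installed.
    import requests

    response = requests.get(build_reader_url(target_url, reader_base), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of indexing an error page
    return response.text
```

In the app, `reader_base` and `target_url` would presumably come from the `JINA_API` and `ANY_WEBSITE_URL` values in the `.env` file, for example via `os.environ` once the file has been loaded.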
Once the data is extracted, it is split into smaller chunks using LangChain's text splitting utilities.
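To illustrate what chunking with overlap does, here is a plain-Python sliding-window splitter. It is a simplified stand-in, not LangChain's splitter (which also tries to break on separators such as paragraphs and sentences), and the default `chunk_size` and `chunk_overlap` values are assumptions rather than the project's settings.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.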
The cleaned and split text is converted into vector embeddings using the Google Generative AI API, and the vectors are stored in a FAISS index.
When a user asks a question, the query is embedded with the same model and compared against the stored vectors in FAISS, and the most similar chunks are retrieved.
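The retrieval idea can be sketched without FAISS or the embedding API: the snippet below does a brute-force cosine-similarity search over a few hand-written toy vectors. A flat FAISS index performs the same kind of exhaustive comparison (typically with L2 or inner-product distance), only much faster. The function names and the toy data are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          stored: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [text for _, text in scored[:k]]


# Toy 2-dimensional "embeddings"; real embedding vectors have hundreds of dimensions.
index = [("pricing page", [1.0, 0.0]),
         ("careers page", [0.0, 1.0]),
         ("pricing FAQ", [1.0, 1.0])]
nearest = top_k([1.0, 0.1], index, k=2)
```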
The retrieved chunks are passed, together with the user's question, to Google Generative AI for response generation.
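One way to hand the retrieved chunks to the model is to place them in the prompt ahead of the question. The sketch below only builds that prompt string; the wording of the instructions and the `build_prompt` name are assumptions, and the actual call to the Google Generative AI client is left out.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context and the user's question into a single prompt."""
    # Number the chunks so the model (and a human reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Telling the model to rely only on the supplied context is a common way to keep answers tied to the scraped content rather than to the model's general knowledge.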
The chatbot interface is built with Streamlit. To run it locally:
streamlit run app.py

- Deploy on Streamlit Cloud.
- Ensure API keys are added as environment variables in deployment settings.
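A missing key is easier to diagnose if the app checks for it at startup instead of failing on the first API call. The helper below is a minimal sketch of such a check; `require_env` is a hypothetical name, and it assumes the deployment exposes secrets as ordinary environment variables.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

At startup the app could then call `require_env("GOOGLE_API_KEY")` once and pass the result to the embedding and generation clients.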
This project combines web scraping, vector search, and AI-powered text generation into an interactive chatbot that answers questions about the content of the configured website. Because each response is generated from retrieved passages (RAG), answers stay grounded in the scraped content and are more likely to be accurate and contextually relevant.