Camille-Ferrell/rag-project

Hello Summer 2 folks! Hope you're enjoying Sydney!!!! Here is the current state of the RAG project.

Brief TL;DR of RAG Benchmark & Structure

The purpose of this project is to time how long it takes to access various databases used to store additional data that an LLM can query. Our project has several components:

  • Custom UI found in /src/flet_gui_connected.py and /src/flet_gui_option.py
  • An LLM (currently using Mistral-7b) found in /src/llm.py
  • 4 Suggested Databases found in /src/Adapters
  • Simulating data "streaming" in /src/DataIngestion
  • Simulating data querying in /src/simulate_queries.py

How to run (from scratch) using venv

  1. Create virtual environment
python -m venv .venv
  2. Activate virtual environment (the command below is for Windows; on macOS/Linux use source .venv/bin/activate)
.\.venv\Scripts\activate
  3. Install dependencies (might take a hot second; highly suggest doing this on the university wifi, as the hotel wifi is iffy)
pip install -r requirements.txt
  4. Run main
python main.py

Note: We used Python 3.11.3. We're not sure how strict the version requirement is.

Note: Need a HuggingFace token which you can get (for free) by making an account on the HuggingFace website.

Current State & Next Steps

LLM

The LLM we use is Mistral-7b, but querying it takes forever. We download the model locally and send queries to it. A team member suggested hosting the model remotely and got better results that way. I tried to find the code for it but couldn't; I'm sure it's here somewhere..... :C Try looking for comments from GitHub user jmunen.

Benchmarking

Currently, we time how long it takes to query whichever database is in use and record the result to a CSV. Next steps would be to gather data for all four databases and analyze which ones perform best. You can also add more metrics, such as how long the LLM takes to respond.
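To make the timing idea concrete, here is a minimal sketch of timing a call, appending the result to a CSV, and summarizing by database. It is not the code in this repo: the function names (time_call, record_timing, summarize), the column names, and the "stage" label are all hypothetical, chosen only for illustration.

```python
import csv
import statistics
import time
from pathlib import Path
from typing import Any, Callable


def time_call(fn: Callable[[], Any]) -> tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def record_timing(csv_path: Path, database: str, stage: str, seconds: float) -> None:
    """Append one timing row, writing a header first if the file is new.

    The column layout here is an assumption, not the repo's actual CSV format.
    """
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["database", "stage", "seconds"])
        writer.writerow([database, stage, f"{seconds:.6f}"])


def summarize(csv_path: Path) -> dict[tuple[str, str], float]:
    """Return the median seconds for each (database, stage) pair in the CSV."""
    samples: dict[tuple[str, str], list[float]] = {}
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            key = (row["database"], row["stage"])
            samples.setdefault(key, []).append(float(row["seconds"]))
    return {key: statistics.median(values) for key, values in samples.items()}
```

In use, you would wrap the real database query (or the LLM call, for the extra metric) in a lambda, pass it to time_call, and hand the elapsed time to record_timing with a stage label such as "db_query" or "llm_response". The median is used in the summary because a single slow run (for example, a cold start) would skew a mean.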

Databases

We have implementations to add data to and query all four databases LOCALLY. However, we were only able to remotely host ClickHouse (and maybe Cassandra? I'm not sure). Hosting the databases remotely using only free trials or free student credits was one of our biggest challenges.

ClickHouse can be hosted through ClickHouse's website, but the others need to be hosted by other services. Azure has potential but is extremely frustrating to work with, and we didn't explore AWS much.

Later on, you may be able to use NetApp (the stakeholder) to host some of the databases.

UI

The UI is mostly complete, but feel free to play around. The origin/ui branch has additional features (like a light/dark mode).

Other Branches

There are quite a few other branches, but most of them just contain components for things on the main branch. I didn't want to delete them as they may contain useful components, though nothing specific comes to mind. You should be fine working off main, but if you're ever lost, feel free to peruse the other branches.

How to Start

First, install the requirements (which will take a while because of PyTorch). Then, try to get main running. Then, use the .env.example as a framework to make your .env file. You will need to create free student accounts and such to populate the fields.
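As a quick sanity check while filling in your .env, a sketch like the one below can list which fields from .env.example are still missing or empty. It assumes both files use plain KEY=VALUE lines; the helper names (read_env_keys, missing_fields) are hypothetical and not part of this repo.

```python
from pathlib import Path


def read_env_keys(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_fields(example: Path, actual: Path) -> list[str]:
    """Return keys named in the example file that are absent or empty in the real file."""
    wanted = read_env_keys(example)
    have = read_env_keys(actual) if actual.exists() else {}
    return [key for key in wanted if not have.get(key)]
```

For example, calling missing_fields(Path(".env.example"), Path(".env")) from the repo root returns an empty list once every field has a value.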

Acknowledging the Other Repo

There is an additional repo that another team member created. Ultimately, we found as a group (minus said other team member) that the work was too confusing and not a team effort, and we decided to stick with this current repo. If you are up for a deciphering challenge, feel free to look into it. However, part of our goal for this current repo is to be much more digestible and hopefully not over-complicate the goal of the project.

As a lesson from our semester, we cannot explain enough how important it is to have group discussions on next steps. We highly encourage you to never make big decisions on your own and make sure everyone is updated on the current state of the project. Do not be afraid to challenge ideas and always ask clarifying questions.

Best of luck!
