
RAG-ICP-CLUSTERING

Applications of Large Language Models and AI to Neutron Scattering

Author: Aditya Purohit (Richard Montgomery High School)

Mentor: Dr. William Ratcliff, NIST Center for Neutron Research (NCNR)

Overview

This project integrates Large Language Models (LLMs) and AI-powered tools into neutron scattering research workflows at the NIST Center for Neutron Research.

The NCNR serves a diverse user base, many of whom are not deeply familiar with specific instruments or related prior work.

Our system uses the Model Context Protocol (MCP) to connect LLMs with validated NCNR tools and datasets, improving research efficiency and accessibility.

Key Components

1. Instrument Control Program (ICP) Tool

  • Computes motor angles for triple-axis spectrometers based on crystallographic inputs.
  • Allows natural language descriptions of experimental setups via a chat interface.
  • Benefits:
    • Reduces setup complexity.
    • Speeds up configuration.
    • Makes instrumentation more accessible to non-expert users.
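For intuition, the elastic part of this calculation can be sketched with Bragg's law. This sketch is not the ICP tool's code. It assumes an orthogonal lattice (cubic, tetragonal, or orthorhombic), elastic scattering (final energy equal to incident energy), and pyrolytic graphite PG(002) monochromator and analyzer crystals (d = 3.354 Å). The helper names and the A1/A2/A4/A5/A6 labels follow common triple-axis conventions and are not taken from this repository. Scattering senses (signs) are ignored, and the sample rotation A3, which depends on how the crystal is oriented, is not computed.

```python
import math

# Assumption: PG(002) monochromator/analyzer crystals, d-spacing in Å.
D_PG002 = 3.354


def wavelength_from_energy(energy_mev: float) -> float:
    """Neutron wavelength in Å from kinetic energy in meV: λ ≈ 9.045 / sqrt(E)."""
    return 9.045 / math.sqrt(energy_mev)


def d_spacing_orthogonal(h: float, k: float, l: float,
                         a: float, b: float, c: float) -> float:
    """Interplanar spacing d_hkl for an orthogonal lattice (all cell angles 90°)."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)


def bragg_two_theta(d: float, wavelength: float) -> float:
    """Scattering angle 2θ in degrees from Bragg's law λ = 2 d sin θ."""
    sin_theta = wavelength / (2.0 * d)
    if sin_theta > 1.0:
        raise ValueError("reflection is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


def elastic_motor_angles(hkl, lattice, energy_mev):
    """Hypothetical helper: elastic triple-axis angles (degrees) for one reflection."""
    wavelength = wavelength_from_energy(energy_mev)
    two_theta_mono = bragg_two_theta(D_PG002, wavelength)
    two_theta_sample = bragg_two_theta(d_spacing_orthogonal(*hkl, *lattice), wavelength)
    return {
        "A1": two_theta_mono / 2,  # monochromator crystal rotation
        "A2": two_theta_mono,      # monochromator take-off angle
        "A4": two_theta_sample,    # sample scattering angle
        "A5": two_theta_mono / 2,  # analyzer rotation (elastic: Ef = Ei)
        "A6": two_theta_mono,      # analyzer scattering angle
    }


# Example: (1, 0, 0) reflection of a cubic crystal with a = 4 Å at Ei = 14.7 meV.
angles = elastic_motor_angles((1, 0, 0), (4.0, 4.0, 4.0), 14.7)
print({name: round(value, 2) for name, value in angles.items()})
```

The real tool also has to handle non-orthogonal cells, inelastic energy transfers, and the sample orientation, which is where most of the complexity lies.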

2. BT7 Retrieval-Augmented Generation (RAG) Tool

  • Uses a database of NCNR publications (2016–2017) to answer natural language queries.
  • Links queries to prior BT7 experiments, showing setups, parameters, and results.
  • Benefits:
    • Speeds up literature searches.
    • Improves accuracy of experiment planning.
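The retrieval step behind a tool like this can be sketched as follows: embed the query, rank stored passages by cosine similarity, and paste the best matches into the prompt sent to the LLM. This is an illustration only. It uses a toy bag-of-words vector in place of a real embedding model, and the passages, function names, and prompt template are hypothetical, not taken from the BT7 database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Hypothetical prompt template that grounds the LLM in the retrieved passages."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


# Hypothetical passages standing in for chunks of BT7 publications.
passages = [
    "Spin waves in a frustrated magnet measured on BT7 with polarized neutrons.",
    "Phonon dispersion of a thermoelectric crystal measured at high temperature.",
    "Magnetic order and spin waves in an iron pnictide studied on BT7.",
]
print(build_prompt("spin waves on BT7", retrieve("spin waves on BT7", passages)))
```

Including numbered sources in the prompt is what lets the model point back to specific prior experiments rather than answering from memory.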

3. Proposal Classifier & Clustering Tool

  • Applies semantic search and RAG-based clustering to incoming NCNR proposals.
  • Groups proposals by scientific theme to streamline review.
  • Benefits:
    • Highlights thematic overlaps.
    • Creates clusters with descriptive keywords.
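One way to picture thematic grouping with descriptive keywords is k-means on document vectors, with each cluster labeled by its most frequent terms. This sketch is not the repository's clustering.py. It assumes toy word-count vectors instead of semantic embeddings, a deterministic initialization from the first k documents, cosine similarity for assignment (spherical k-means), a fixed number of iterations, and a small hypothetical stopword list and set of proposal titles.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with", "for", "by"}


def vectorize(text: str) -> Counter:
    """Toy document vector: counts of non-stopword terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[w] for w, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def kmeans(docs: list[str], k: int, iterations: int = 10) -> list[int]:
    """Spherical k-means: assign by cosine similarity, recompute centroids as summed vectors."""
    if k > len(docs):
        raise ValueError("k cannot exceed the number of documents")
    vectors = [vectorize(d) for d in docs]
    centroids = [Counter(v) for v in vectors[:k]]  # deterministic initialization
    labels = [0] * len(docs)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = sum(members, Counter())
    return labels


def cluster_keywords(docs: list[str], labels: list[int], top_n: int = 3) -> dict[int, list[str]]:
    """Label each cluster with its most frequent terms."""
    totals: dict[int, Counter] = {}
    for doc, label in zip(docs, labels):
        totals.setdefault(label, Counter()).update(vectorize(doc))
    return {label: [w for w, _ in counts.most_common(top_n)] for label, counts in totals.items()}


# Hypothetical proposal titles.
proposals = [
    "Spin waves in a frustrated magnet",
    "Polymer chain dynamics in thin films",
    "Magnetic order and spin waves in a pnictide",
    "Polymer blend structure in thin films",
]
labels = kmeans(proposals, k=2)
print(labels, cluster_keywords(proposals, labels))
```

With semantic embeddings in place of word counts, proposals that share a theme but not vocabulary (for example "magnetic excitations" and "spin waves") can land in the same cluster, which is the motivation for embedding-based clustering.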

Methodology

  • Data Integration: NCNR publications and proposal documents ingested into searchable formats.
  • MCP Interface: LLMs connected to specialized tools through a secure API.
  • Tool Development: ICP Tool for instrument setup; RAG Tool for targeted literature retrieval; Clustering tool for thematic grouping of proposals.
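The ingestion step (turning publications into retrievable pieces) can be sketched as heading-based chunking of Markdown. The sketch assumes Markdown input, such as the output of a PDF-to-Markdown conversion, and a hypothetical maximum chunk size measured in characters. The repository's own chunking and database scripts may split documents differently.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split Markdown at headings, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sections = re.split(r"\n(?=#{1,6} )", markdown.strip())
    chunks = []
    for section in sections:
        paragraphs = [p.strip() for p in section.split("\n\n") if p.strip()]
        current = ""
        for paragraph in paragraphs:
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = paragraph
        if current:
            chunks.append(current)
    return chunks


doc = "# Intro\n\nNeutrons probe magnetism.\n\n## Methods\n\nWe used BT7.\n\nData were reduced."
for chunk in chunk_markdown(doc):
    print(repr(chunk))
```

Splitting on headings first keeps each chunk within one section of a paper, so a retrieved chunk tends to carry its own context (for example, a Methods section with the instrument settings).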

Results

  • Faster literature searches for experiment planning.
  • Reduced setup complexity in instrument configurations.
  • Streamlined proposal review through thematic clustering.
  • Increased accessibility of advanced NCNR tools to non-experts.

Future Directions

  • Expand RAG database to include all NCNR instruments and external labs.
  • Integrate clustering outputs with reviewer assignment systems.
  • Create seamless, multi-tool workflows connecting setup, search, and review.

Acknowledgements

  • Mentor: Dr. William Ratcliff
  • Infrastructure & Review: Dr. Paul Kienzle
  • Administrative Support: Dr. Julie Borchers
  • Program Support: NIST SHIP Program
  • Personal Support: Parents and all who contributed.

Getting Started

git clone https://github.com/scattering/RAG-ICP-CLUSTERING.git
cd RAG-ICP-CLUSTERING

Create and activate a conda environment

conda env create -f environment.yml
conda activate rag-env

Use the provided scripts to run the ICP, RAG, or clustering tools. Documentation and examples will be added for each module.

Create a .env file in the project root directory and add your RChat API key like this:

RCHAT_API_KEY="your_rchat_api_key_here"

Replace your_rchat_api_key_here with your actual RChat API key.
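To confirm that the key has been filled in, a small check like the following can be run from the project root. It is a minimal sketch with a hand-written parser for simple KEY=value lines. The project itself may load the file differently, for example with the python-dotenv package. The script reports only whether a real key is present and never prints the key itself.

```python
from pathlib import Path

PLACEHOLDER = "your_rchat_api_key_here"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blanks and comments; strip surrounding quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def has_real_key(values: dict[str, str]) -> bool:
    """True if RCHAT_API_KEY is present and not left at the placeholder value."""
    key = values.get("RCHAT_API_KEY", "")
    return bool(key) and key != PLACEHOLDER


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("RCHAT_API_KEY configured:", has_real_key(values))
```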


Configure the MCP server

The file config.json contains a list of MCP servers that will be made available for your local Open WebUI instance:

  • search_database.py: Python tool that searches a RAG vector database
  • lattice_calculator.py: Python tool that provides motor coordinates based on reciprocal-space coordinates
  • PDFtoMD.py: Python tool that converts PDF documents to Markdown
  • createdb_AI.py: Python tool that creates and initializes an AI database
  • control_AI.py: Python tool that manages embeddings and database control operations
  • remove_credits.py: Python tool that removes credits from documents or data entries
  • onechunkdoc.py: Python tool that combines document chunks into a single chunk for processing
  • clustering.py: Python tool that performs document clustering
  • clusterhierarchy.py: Python tool that generates hierarchical clustering of documents
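Because every entry's "command" has to point at the Python interpreter of your own environment (as described under "Start the MCP server"), the edit can be scripted. This sketch assumes config.json uses the "mcpServers" layout that mcpo accepts, where each server name maps to an object with "command" and "args". The server names and script paths in the example are hypothetical, inferred from the tool URLs listed under "Register the tools", so compare them with the actual file. It relies on sys.executable, which is the interpreter running the script, so run it with your conda environment activated.

```python
import json
import sys
from pathlib import Path


def point_commands_at(config: dict, python_path: str) -> dict:
    """Return a copy of an mcpo-style config with every server's "command" set to python_path."""
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    for server in updated.get("mcpServers", {}).values():
        server["command"] = python_path
    return updated


# Hypothetical excerpt of config.json; the real file lists all nine servers.
example = {
    "mcpServers": {
        "LatticeCalculator": {"command": "/old/python", "args": ["lattice_calculator.py"]},
        "query": {"command": "/old/python", "args": ["search_database.py"]},
    }
}

if __name__ == "__main__":
    # Run from nist-chat-main/resources/mcp_server; prints the updated config for review.
    path = Path("config.json")
    source = json.loads(path.read_text()) if path.exists() else example
    print(json.dumps(point_commands_at(source, sys.executable), indent=2))
```

Printing instead of overwriting lets you review the result before saving it over config.json.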

Start the MCP server

Run:

cd RAG-ICP-CLUSTERING/nist-chat-main/resources/mcp_server

Open config.json in that folder and change the "command" for each tool to the path of the Python interpreter in your own conda environment (for example, the path printed by "which python" while the environment is active). This step is essential: the tools will NOT work without it. Then, from the mcp_server folder with your conda environment activated, run the following command:

uvx mcpo --port 8081 --api-key "CHANGE_ME" --config ./config.json

Notes:

  • The api-key should be a random string of your choosing; it is NOT your RChat API key.
  • The example api-key shown in the mcpo documentation (and the CHANGE_ME placeholder above) should be replaced with your own value.
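One way to produce a suitable random string is Python's standard secrets module. The 32-byte length below is an arbitrary choice, not a requirement of mcpo.

```python
import secrets

# Generate a URL-safe random string to pass to mcpo as --api-key.
# 32 random bytes (about 43 characters once encoded) is an arbitrary but comfortable length.
api_key = secrets.token_urlsafe(32)
print(api_key)
```

Use the same value both in the mcpo command and in the API KEY field when registering the tools in Open WebUI.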

Start Python Embedding

In a new terminal, from the RAG-ICP-CLUSTERING directory, run the following command to start the Python embedding model:

python BAAI_LARGE.py

Note: A GPU is recommended for running this model. We tested it on a single NVIDIA RTX 3090, which gave fast and accurate results; smaller hardware can be used, though processing will likely take longer.
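Before starting the embedding script, it can help to confirm that a GPU is visible. This sketch assumes the embedding model runs on PyTorch, which is an assumption about BAAI_LARGE.py rather than something stated in this guide. If PyTorch is not installed, the check simply says so.

```python
def describe_compute_device() -> str:
    """Report whether PyTorch can see a CUDA GPU in the current environment."""
    try:
        import torch  # assumption: the embedding model runs on PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA GPU available: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU visible to PyTorch; expect slower, CPU-only processing"


if __name__ == "__main__":
    print(describe_compute_device())
```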

Start Open WebUI

Navigate to the nist-chat-main/rag_execution directory and run the frontend with:

cd RAG-ICP-CLUSTERING/nist-chat-main/rag_execution
pip install open-webui
python launch_frontend.py
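Once the MCP server (port 8081, from the mcpo command above) and Open WebUI (most likely port 8080) are running, a quick socket check can confirm that both are listening before you register the tools. This is a minimal standard-library sketch; the port numbers are the ones used in this guide and should be adjusted if you changed them.

```python
import socket


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in [("mcpo (MCP tools)", 8081), ("Open WebUI", 8080)]:
        status = "listening" if is_listening(port) else "NOT reachable"
        print(f"{name} on port {port}: {status}")
```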

Register the tools in your local Open WebUI

In a browser, go to your local Open WebUI instance, most likely at http://localhost:8080, and add the tools in the admin dashboard.

Each MCP server from the config.json file needs to be added (nine in this case):

PDFtoMD
URL: http://localhost:8081/PDFtoMD
API KEY: the api-key set in the mcpo command
Name: PDFtoMD

Database
URL: http://localhost:8081/database
API KEY: the api-key set in the mcpo command
Name: CreateDB

Embedding Control
URL: http://localhost:8081/embedding_control
API KEY: the api-key set in the mcpo command
Name: ControlEmbeddings

Lattice Calculator
URL: http://localhost:8081/LatticeCalculator
API KEY: the api-key set in the mcpo command
Name: Lattice Calculator

Remove Credits
URL: http://localhost:8081/remove_credits
API KEY: the api-key set in the mcpo command
Name: Remove Credits

OneChunkDoc
URL: http://localhost:8081/onechunkdoc
API KEY: the api-key set in the mcpo command
Name: One Doc Chunking

Clustering
URL: http://localhost:8081/clustering
API KEY: the api-key set in the mcpo command
Name: Clustering

Query
URL: http://localhost:8081/query
API KEY: the api-key set in the mcpo command
Name: SearchDB

Hierarchy
URL: http://localhost:8081/hierarchy
API KEY: the api-key set in the mcpo command
Name: Hierarchical Clustering
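To check a registration outside the chat interface, you can fetch a server's OpenAPI description directly. This sketch assumes mcpo's behavior of mounting each server at its own path (for example /LatticeCalculator) with an OpenAPI schema at that path's /openapi.json, and of checking the api-key as a Bearer token. The key value below is a placeholder.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"
API_KEY = "CHANGE_ME"  # placeholder: use the api-key passed to mcpo


def schema_request(server_path: str, api_key: str = API_KEY) -> urllib.request.Request:
    """Build a GET request for one MCP server's OpenAPI schema, authenticated with a Bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/{server_path.strip('/')}/openapi.json",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_tool_routes(server_path: str) -> list[str]:
    """Fetch the schema and return the tool routes it advertises (requires mcpo to be running)."""
    with urllib.request.urlopen(schema_request(server_path), timeout=5) as response:
        schema = json.load(response)
    return sorted(schema.get("paths", {}))


if __name__ == "__main__":
    try:
        print(list_tool_routes("LatticeCalculator"))
    except OSError as error:  # URLError and HTTPError are subclasses of OSError
        print("Could not reach the MCP server:", error)
```

If the routes print, the URL and key you entered in Open WebUI for that server should work as well.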

You should then be ready to use the tools in the Open WebUI interface: toggle a tool on to activate it, and you can then prompt it.

Videos demonstrating each tool, along with example prompts you can use to verify that your tools are working correctly, are in the "Documentation" folder, together with the project's slides and poster.
