Author: Aditya Purohit (Richard Montgomery High School)
Mentor: Dr. William Ratcliff, NIST Center for Neutron Research (NCNR)
This project integrates Large Language Models (LLMs) and AI-powered tools into neutron scattering research workflows at the NIST Center for Neutron Research.
The NCNR serves a diverse user base, many of whom are not deeply familiar with specific instruments or related prior work.
Our system uses the Model Context Protocol (MCP) to connect LLMs with validated NCNR tools and datasets, improving research efficiency and accessibility.
- Computes motor angles for triple-axis spectrometers based on crystallographic inputs.
- Allows natural language descriptions of experimental setups via a chat interface.
- Benefits:
- Reduces setup complexity.
- Speeds up configuration.
- Makes instrumentation more accessible to non-expert users.
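To make the angle computation concrete, here is a minimal sketch of the simplest case: the elastic scattering angle 2θ for a reflection (h, k, l) of a cubic lattice, from Bragg's law. It is not the code in lattice_calculator.py. The function name, the cubic-lattice restriction, and the elastic-scattering assumption are illustrative only; a real triple-axis calculation also needs the sample orientation and the remaining motor angles.

```python
import math


def two_theta_cubic(h, k, l, a_angstrom, wavelength_angstrom):
    """Return the elastic scattering angle 2-theta (degrees) for (h, k, l).

    Hypothetical helper: assumes a cubic lattice with parameter a_angstrom
    and elastic scattering at the given neutron wavelength.
    """
    norm_sq = h * h + k * k + l * l
    if norm_sq == 0:
        raise ValueError("(0, 0, 0) is not a reflection")
    # Interplanar spacing of a cubic lattice: d = a / sqrt(h^2 + k^2 + l^2)
    d_spacing = a_angstrom / math.sqrt(norm_sq)
    # Bragg's law: lambda = 2 d sin(theta)
    sin_theta = wavelength_angstrom / (2.0 * d_spacing)
    if sin_theta > 1.0:
        raise ValueError("reflection not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))
```

For example, `two_theta_cubic(1, 0, 0, 4.0, 2.0)` evaluates the (1, 0, 0) reflection of a 4 Å cubic cell at a 2 Å wavelength.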
- Uses a database of NCNR publications (2016–2017) to answer natural language queries.
- Links queries to prior BT7 experiments, showing setups, parameters, and results.
- Benefits:
- Speeds up literature searches.
- Improves accuracy of experiment planning.
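The retrieval step behind a RAG query can be sketched as ranking stored text chunks by cosine similarity between their embedding vectors and the embedding of the question. The sketch below is a generic illustration, not the logic of search_database.py; the function names and the tiny two-dimensional vectors are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to query_vec.

    chunks is a list of (text, vector) pairs; in a real system the vectors
    would come from an embedding model rather than being written by hand.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved chunks would then be passed to the LLM as context for answering the query.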
- Applies semantic search and RAG-based clustering to incoming NCNR proposals.
- Groups proposals by scientific theme to streamline review.
- Benefits:
- Highlights thematic overlaps.
- Creates clusters with descriptive keywords.
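As a rough illustration of grouping proposals by theme, the sketch below uses a simple greedy threshold scheme: each proposal embedding joins the first cluster whose seed vector it resembles closely enough, and otherwise starts a new cluster. This is a stand-in technique chosen for brevity, not the algorithm used in clustering.py or clusterhierarchy.py; the function names and the 0.8 threshold are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_clusters(vectors, threshold=0.8):
    """Group vector indices into clusters by similarity to each cluster's seed.

    Returns a list of index lists; the first index in each list is the seed.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([i])
    return clusters
```

The result depends on input order and on the threshold, which is one reason production pipelines usually prefer k-means or hierarchical methods.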
- Data Integration: NCNR publications and proposal documents ingested into searchable formats.
- MCP Interface: LLMs connected to specialized tools through a secure API.
- Tool Development: ICP Tool for instrument setup; RAG Tool for targeted literature retrieval; Clustering tool for thematic grouping of proposals.
- Faster literature searches for experiment planning.
- Reduced setup complexity in instrument configurations.
- Streamlined proposal review through thematic clustering.
- Increased accessibility of advanced NCNR tools to non-experts.
- Expand RAG database to include all NCNR instruments and external labs.
- Integrate clustering outputs with reviewer assignment systems.
- Create seamless, multi-tool workflows connecting setup, search, and review.
- Mentor: Dr. William Ratcliff
- Infrastructure & Review: Dr. Paul Kienzle
- Administrative Support: Dr. Julie Borchers
- Program Support: NIST SHIP Program
- Personal Support: Parents and all who contributed.
git clone https://github.com/scattering/RAG-ICP-CLUSTERING.git
cd RAG-ICP-CLUSTERING
Create and activate a conda environment
conda env create -f environment.yml
conda activate rag-env
Use the provided scripts to run the ICP, RAG, or clustering tools. Documentation and examples will be added for each module.
Create a .env file in the project root directory and add your RChat API key like this:
RCHAT_API_KEY="your_rchat_api_key_here"
Replace your_rchat_api_key_here with your actual RChat API key.
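To check that the key is readable before starting the servers, a small standard-library parser such as the sketch below can help. How the project's own scripts load the .env file is not shown here, so `read_env_value` is a hypothetical helper for verification only.

```python
def read_env_value(text, key):
    """Return the value assigned to key in .env-style text, or None.

    Hypothetical helper: skips blank lines and comments, and strips
    surrounding whitespace and quotes from the value.
    """
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None
```

For instance, calling `read_env_value(open(".env").read(), "RCHAT_API_KEY")` from the project root should return your key rather than `None`.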
Configure the MCP server
The file config.json contains a list of MCP servers that will be made available for your local Open WebUI instance:
search_database.py: Python tool that searches a RAG vector database
lattice_calculator.py: Python tool that provides motor coordinates based on reciprocal space coordinates
PDFtoMD.py: Python tool that converts PDF documents to Markdown
createdb_AI.py: Python tool that creates and initializes an AI database
control_AI.py: Python tool that manages embeddings and database control operations
remove_credits.py: Python tool that removes credits from documents or data entries
onechunkdoc.py: Python tool that combines document chunks into a single chunk for processing
clustering.py: Python tool that performs document clustering
clusterhierarchy.py: Python tool that generates hierarchical clustering of documents
Start the MCP server
Run: cd RAG-ICP-CLUSTERING/nist-chat-main/resources/mcp_server
Open the config.json file and change the "command" for each tool to the path of the Python interpreter in your own conda environment. This is very important: the tools will NOT work without this change.
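Editing nine entries by hand is error-prone, so the change can also be scripted. The sketch below assumes config.json has a top-level "mcpServers" object whose entries each carry a "command" field; check this against your copy of the file before relying on it. The function name `point_commands_at` is hypothetical.

```python
import json
import sys


def point_commands_at(config, python_path):
    """Return a copy of config with every server "command" set to python_path.

    Assumes a layout of {"mcpServers": {name: {"command": ..., "args": [...]}}}.
    """
    updated = json.loads(json.dumps(config))  # simple deep copy via JSON
    for server in updated.get("mcpServers", {}).values():
        server["command"] = python_path
    return updated


def rewrite_config_file(path, python_path=sys.executable):
    """Rewrite the file at path in place, defaulting to the running interpreter."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(point_commands_at(config, python_path), handle, indent=2)
```

Calling `rewrite_config_file("config.json")` with the conda environment activated would point every tool at that environment's interpreter.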
From the mcp_server folder, with your python env activated, run the following command:
uvx mcpo --port 8081 --api-key "CHANGE_ME" --config ./config.json
Notes:
- The api-key should be a random string. It is NOT your RChat API key.
- An api-key is provided in the mcpo documentation as an example and should be changed.
Start Python Embedding
In a new terminal, from the RAG-ICP-CLUSTERING directory, run the following command to start the Python embedding:
python BAAI_LARGE.py
Note: A graphics card is recommended to run this model effectively. We tested it on a single 3090, which gave fast and accurate results. Smaller hardware may also work, most likely with longer processing times.
Start Open WebUI
Navigate to the nist-chat-main/rag_execution directory and run the frontend with:
cd RAG-ICP-CLUSTERING/nist-chat-main/rag_execution
pip install open-webui
python launch_frontend.py
Register the tools in your local Open WebUI
In a browser, go to your local Open WebUI instance, most likely at http://localhost:8080, and add the tools in the admin dashboard.
Each MCP server from the config.json file needs to be added (9 in this case):
PDFtoMD
URL: http://localhost:8081/PDFtoMD
API KEY: the api-key set in the mcpo command
Name: PDFtoMD
Database
URL: http://localhost:8081/database
API KEY: the api-key set in the mcpo command
Name: CreateDB
Embedding Control
URL: http://localhost:8081/embedding_control
API KEY: the api-key set in the mcpo command
Name: ControlEmbeddings
Lattice Calculator
URL: http://localhost:8081/LatticeCalculator
API KEY: the api-key set in the mcpo command
Name: Lattice Calculator
Remove Credits
URL: http://localhost:8081/remove_credits
API KEY: the api-key set in the mcpo command
Name: Remove Credits
OneChunkDoc
URL: http://localhost:8081/onechunkdoc
API KEY: the api-key set in the mcpo command
Name: One Doc Chunking
Clustering
URL: http://localhost:8081/clustering
API KEY: the api-key set in the mcpo command
Name: Clustering
Query
URL: http://localhost:8081/query
API KEY: the api-key set in the mcpo command
Name: SearchDB
Hierarchy
URL: http://localhost:8081/hierarchy
API KEY: the api-key set in the mcpo command
Name: Hierarchical Clustering
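Since all nine entries share the same base URL and API key, the registration values can be generated and sanity-checked with a short script. The names and routes below are copied from the list above; the `/openapi.json` path and the Bearer authorization header are assumptions about how mcpo serves each tool, so confirm them against the mcpo documentation. `is_reachable` is a hypothetical helper and needs the mcpo server to be running.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8081"

# Display name in Open WebUI -> route served by mcpo (from the list above)
TOOL_ROUTES = {
    "PDFtoMD": "PDFtoMD",
    "CreateDB": "database",
    "ControlEmbeddings": "embedding_control",
    "Lattice Calculator": "LatticeCalculator",
    "Remove Credits": "remove_credits",
    "One Doc Chunking": "onechunkdoc",
    "Clustering": "clustering",
    "SearchDB": "query",
    "Hierarchical Clustering": "hierarchy",
}


def tool_urls(base_url=BASE_URL):
    """Return a mapping of tool name to the URL to register in Open WebUI."""
    return {name: f"{base_url}/{route}" for name, route in TOOL_ROUTES.items()}


def is_reachable(tool_url, api_key, timeout=5):
    """Return True if the tool's OpenAPI schema responds with HTTP 200.

    Assumes the schema lives at <tool_url>/openapi.json and that the
    api-key is accepted as a Bearer token.
    """
    request = urllib.request.Request(
        tool_url + "/openapi.json",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Looping over `tool_urls().items()` and calling `is_reachable` on each URL gives a quick check that every tool started before you register it.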
The tools should now be ready to use in the Open WebUI interface. Toggle each tool on to activate it, and you can then call it from your prompts.
Videos showing how each tool works, along with example prompts you can use to check that your tools are working correctly, are in the "Documentation" folder, together with the slides and poster for this project.