The LLM Health Assistant provides general information only and does not constitute medical advice. It does not establish a doctor-patient relationship. Always consult a qualified healthcare professional for medical concerns. We are not responsible for any decisions made based on the platform’s information.
- 📖 Introduction
- 🏗️ Service Architecture
- 🛠️ Technology Stack and Development Tools
- 🚀 Usage
- 🎥 Project Display
- 🔍 Reflection
- 📜 License
## 📖 Introduction

The LLM Health Assistant is a health consultation platform based on a large language model (LLM), leveraging generative AI and retrieval-augmented generation (RAG) technologies to provide users with personalized and intelligent health Q&A services. The system integrates multiple functional modules, including text interaction, voice interaction, PubMed paper retrieval, user information management, and conversation storage.
## 🏗️ Service Architecture

The system follows a four-layer architecture (not counting the presentation layer) to ensure efficiency, scalability, and security:
- **Presentation Layer**
  - Provides the user interface for interactions, supporting both text and voice input.
  - Sends user requests to the Process Centric Layer for processing.
  - Key components:
    - Web Frontend (HTML/CSS/JS): login, health consultation, voice chat, user profile management.
- **Process Centric Layer**
  - Coordinates the overall business logic and invokes various APIs for task execution.
  - Core functionalities:
    - User Authentication (OAuth2 + JWT)
    - Text Chat Processing (GLM-4-Plus)
    - Voice Processing (GLM-4-Voice)
    - History Retrieval (Pinecone)
    - Medical Paper Search (PubMed API)
- **Business Logic Layer**
  - Handles AI interaction, context retrieval, and query parsing.
  - Core functionalities:
    - LLM Processing (GLM-4-Plus for intelligent responses)
    - Context Retrieval (Pinecone for historical conversation storage)
    - Query Handling (PubMed API for medical paper retrieval)
- **Adapter Services Layer**
  - Manages interactions with external APIs and ensures system extensibility.
  - Key components:
    - GLM-4-Plus API (processes text-based queries)
    - GLM-4-Voice API (handles voice interactions)
    - Pinecone Adapter (stores and retrieves user conversations)
    - SQLite Adapter (manages user authentication and data)
    - PubMed API (fetches the latest medical research)
- **Data Services Layer**
  - Provides the foundational AI and database services that power the application.
  - Key components:
    - GLM-4-Voice (processes speech input and generates voice responses)
    - GLM-4-Plus (handles text-based health queries and generates intelligent responses)
    - Pinecone (stores and retrieves user conversation history for context-aware interactions)
    - PubMed (provides medical research data for evidence-based health consultations)
    - SQLite (manages user authentication and stores basic user information)
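The layer separation above can be sketched as plain Python functions, with each layer calling only the one beneath it. This is a minimal, hypothetical illustration: all function names are invented here, and the stubs stand in for the real GLM-4-Plus and Pinecone adapters.

```python
# Hypothetical sketch of the layer separation; the real adapters wrap
# the GLM-4 and Pinecone clients instead of returning canned values.

# --- Adapter Services Layer: one thin wrapper per external service ---
def glm4_plus_adapter(prompt: str) -> str:
    return f"[GLM-4-Plus reply to: {prompt}]"   # stub for the GLM-4-Plus API

def pinecone_adapter(user_id: str, query: str) -> list:
    return ["previous conversation snippet"]    # stub for Pinecone retrieval

# --- Business Logic Layer: builds a context-aware prompt ---
def answer_health_query(user_id: str, question: str) -> str:
    context = pinecone_adapter(user_id, question)
    prompt = f"Context: {'; '.join(context)}\nQuestion: {question}"
    return glm4_plus_adapter(prompt)

# --- Process Centric Layer: coordinates one full chat request ---
def handle_chat_request(user_id: str, message: str) -> dict:
    reply = answer_health_query(user_id, message)
    return {"user_id": user_id, "reply": reply}

print(handle_chat_request("u1", "Is coffee bad for sleep?")["reply"])
```

The point of the sketch is the dependency direction: the process layer never talks to external services directly, so an adapter (say, swapping Pinecone for another vector store) can be replaced without touching the business logic.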
## 🛠️ Technology Stack and Development Tools

- FastAPI (Lightweight web framework supporting high concurrency)
- SQLite (Lightweight database for user data storage)
- Pinecone (Vector database for user conversation history)
- OAuth2.0 + JWT (User authentication for API security)
- PubMed API (Medical literature retrieval)
- HTML, CSS, JavaScript (For the user interface)
- Fetch API (For frontend-backend communication)
- GLM-4-Plus (Text-based health consultation)
- GLM-4-Voice (Voice input processing)
- Sentence Transformer (all-MiniLM-L6-v2) (Text embedding for context retrieval and semantic search)
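To illustrate the context-retrieval idea, the sketch below ranks stored conversation snippets against a new query by cosine similarity. The toy 3-dimensional vectors are hypothetical stand-ins for all-MiniLM-L6-v2 embeddings (384 dimensions in practice), and the in-memory list stands in for the Pinecone index.

```python
import math

# Hypothetical stand-ins: tiny vectors instead of 384-d all-MiniLM-L6-v2
# embeddings, and a plain list instead of a Pinecone index.
stored = [
    ("I am 34 years old and jog twice a week.", [0.9, 0.1, 0.2]),
    ("My cat is named Whiskers.",               [0.1, 0.9, 0.3]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_context(query_vec, top_k=1):
    """Return the top_k stored snippets most similar to the query vector."""
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# A query like "How old am I?" would embed close to the first snippet.
print(retrieve_context([0.8, 0.2, 0.1]))
```

In the actual system the same ranking is delegated to Pinecone's similarity search over real sentence embeddings; this sketch only shows the scoring principle.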
- Hardware
  - Operating System: Windows 11 Home
  - CPU: Intel(R) Core(TM) i7-14700HX @ 2.1 GHz
  - GPU: NVIDIA GeForce RTX 4070 Laptop GPU (8 GB)
  - Memory: 32 GB
- Software
| Tool | Purpose |
|---|---|
| Anaconda | Development environment management |
| VS Code | Code development |
| JupyterLab | Early-stage experiment exploration |
| Edge Browser | Frontend interface testing |
| Postman | API testing |
- External API Key Sources
## 🚀 Usage

- During development, `torch 2.6+cu124` was used for acceleration, but CUDA is not mandatory. Since CPU computation speed is within an acceptable range, the Docker image is built with the CPU version of Torch for convenience. If you wish to use GPU acceleration within the image, install the NVIDIA Container Toolkit; the `Dockerfile` and `docker-compose.yml` will need to be reconfigured.
- For user data management, this project also includes a database management system that allows querying and removing accounts from the two databases.
### Running Code

Use conda to manage the environment:

```bash
conda create -n sde python=3.9
conda activate sde
```

Clone the repository:

```bash
git clone https://github.com/Avalon-S/LLM-Health-Assistant
cd LLM-Health-Assistant
```

Create a `.env` file in the project root and add the following API keys:

```bash
# SECRET_KEY can be generated randomly by you.
SECRET_KEY=your_secret_key
ZHIPU_API_KEY=your_glm_api_key
PINECONE_API_KEY=your_pinecone_key
```

For the Pinecone configuration, set the index name to `healthassistant`, the region to `us-east-1`, and the cloud to AWS.

Install the dependencies:

```bash
pip install -r requirements.txt
```

Run the FastAPI server:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Once started, the API documentation can be accessed via `http://localhost:8000/redoc`.

Run the frontend: open `http://localhost:8000/` in your browser to access the LLM Health Assistant web interface.

Enter the administrator system:

```bash
python CLI_DB_Manager.py
```

### Build & Run the Docker Image
Before doing this, make sure the Docker CLI is enabled. It is recommended to install Docker Desktop.
```bash
git clone https://github.com/Avalon-S/LLM-Health-Assistant
cd LLM-Health-Assistant
```

Create a `.env` file in the project root and add the following API keys:

```bash
# SECRET_KEY can be generated randomly by you.
SECRET_KEY=your_secret_key
ZHIPU_API_KEY=your_glm_api_key
PINECONE_API_KEY=your_pinecone_key
```

For the Pinecone configuration, set the index name to `healthassistant`, the region to `us-east-1`, and the cloud to AWS.
Build the image (without using the cache); this takes about 10-20 minutes, depending on your internet speed:

```bash
docker-compose build --no-cache
```

Start the container (in the background):

```bash
docker-compose up -d
```

Stop all containers started by `docker-compose`:

```bash
docker-compose down
```

Enter the administrator system (keep the container running):

```bash
docker ps  # Get the CONTAINER ID
docker exec -it <CONTAINER ID> /bin/bash
python CLI_DB_Manager.py
```

## 🔍 Reflection

At the beginning, the initial plan was to locally deploy LLaMA 3.2 1B and 3B. However, during later development there were numerous dependency conflicts, and the models performed extremely poorly in multi-turn dialogues, with severe hallucinations. Moreover, locally deploying an LLM would result in an excessively large Docker image, making deployment time-consuming. Therefore, we switched to GLM-4-Plus, which delivers performance comparable to GPT-4o, and the results have been satisfactory.
It should be noted that the strategy for deciding whether to call specific APIs to enhance the prompt follows an expert-system approach. Specifically, if certain keywords are detected, such as "my age" or "paper", the system automatically calls the Pinecone or PubMed API, respectively, for retrieval. This is a simple, fast, and effective strategy. LangChain was not used because experiments showed that the task was not complex (no deep reasoning required), and using an agent to determine which API to call took significantly longer than letting the LLM respond directly. Additionally, there was no difference in answer quality; GLM-4-Plus was already powerful enough.
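The keyword-driven routing described above can be sketched in a few lines of Python. The keyword lists and function names here are hypothetical, chosen only to illustrate the idea; the real system maps each route to the Pinecone or PubMed adapter.

```python
# Hypothetical sketch of the expert-system routing: simple keyword matching
# decides which retrieval API, if any, enriches the prompt before it is
# sent to GLM-4-Plus.
PINECONE_KEYWORDS = ("my age", "my weight", "last time")  # personal-history cues
PUBMED_KEYWORDS = ("paper", "study", "research")          # literature cues

def route_query(message: str) -> str:
    """Return which retrieval backend should enrich the prompt."""
    text = message.lower()
    if any(kw in text for kw in PINECONE_KEYWORDS):
        return "pinecone"   # fetch the user's conversation history
    if any(kw in text for kw in PUBMED_KEYWORDS):
        return "pubmed"     # fetch recent medical literature
    return "llm_only"       # let GLM-4-Plus answer directly

print(route_query("What did I say my age was?"))    # -> pinecone
print(route_query("Any recent paper on statins?"))  # -> pubmed
print(route_query("Is coffee bad for sleep?"))      # -> llm_only
```

Compared to an agent loop, this router costs effectively nothing per request, which is why it outperformed the LangChain-based alternative on latency for this task.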
Overall, despite the tight timeline, I am fairly satisfied with the implementation of this project.
## 📜 License

This project is licensed under the MIT License. See the LICENSE file for details.







