xiaohangguo/pdfChat

PDFChat System

A document chat system that supports multimodal interaction with documents containing both text and images.

Features

  • Document Processing:

    • Support for Markdown documents with embedded images
    • Automatic text chunking with image preservation
    • Vector embeddings using BAAI/bge-large-zh
    • FAISS vector store for efficient similarity search
  • Chat Capabilities:

    • Context-aware responses using InternLM XComposer
    • Support for multimodal interactions (text + images)
    • Chat history management
    • Streaming responses
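The README does not say how chunking keeps images intact. Below is a minimal sketch of one way to do it, assuming chunks are built from blank-line-separated paragraphs. Paragraphs are never split, so a Markdown image reference (`![alt](path)`) is never cut in half, and an image-only paragraph is attached to the paragraph before it so a figure stays with its introducing text. `chunk_markdown`, `images_in`, and `max_chars` are hypothetical names, not the project's code.

```python
import re
from typing import List

# Matches Markdown image references such as ![caption](images/fig1.png)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def chunk_markdown(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack paragraphs into chunks of roughly max_chars characters.

    Paragraphs are never split, so image references always stay whole.
    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    # Attach image-only paragraphs to the preceding paragraph so a figure
    # stays in the same chunk as the text that introduces it
    merged: List[str] = []
    for para in paragraphs:
        if merged and IMAGE_RE.fullmatch(para):
            merged[-1] += "\n\n" + para
        else:
            merged.append(para)

    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for para in merged:
        # Flush the current chunk if this paragraph would push it over the limit
        if current and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def images_in(chunk: str) -> List[str]:
    """Return the image references contained in a chunk."""
    return IMAGE_RE.findall(chunk)
```

Packing whole paragraphs trades exact chunk sizes for the guarantee that no image link or sentence is split mid-way.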

Setup

  1. Install dependencies:
     pip install -r requirements.txt
  2. Start the backend server:
     cd backend
     python run.py

The server will start at http://localhost:8000

API Endpoints

Documents

  • POST /api/documents/upload - Upload a markdown document
  • GET /api/documents/list - List all processed documents
  • DELETE /api/documents/{document_name} - Delete a document
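A minimal client sketch for the document endpoints, using only the Python standard library. It assumes the default address from Setup, that the upload route accepts a multipart form field named `file` (check the backend route before relying on this), and that responses are JSON. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
import uuid

# Assumption: default address from the Setup section
BASE_URL = "http://localhost:8000"


def build_upload_request(filename: str, content: bytes,
                         field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST for /api/documents/upload.

    The form field name "file" is an assumption, not taken from the backend.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/markdown\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/documents/upload",
        data=head + content + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def build_list_request() -> urllib.request.Request:
    return urllib.request.Request(f"{BASE_URL}/api/documents/list", method="GET")


def build_delete_request(document_name: str) -> urllib.request.Request:
    # Percent-encode the name so spaces or non-ASCII characters are safe in the path
    quoted = urllib.parse.quote(document_name, safe="")
    return urllib.request.Request(f"{BASE_URL}/api/documents/{quoted}", method="DELETE")


def send(req: urllib.request.Request):
    """Send a request and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_list_request())` would return the document list, and `send(build_upload_request("paper.md", open("paper.md", "rb").read()))` would upload a file, provided the assumed field name matches.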

Chat

  • POST /api/chat/chat - Chat with a document
  • POST /api/chat/clear-history - Clear chat history
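A similar sketch for the chat endpoints. The JSON field names (`document_name`, `message`) are assumptions, since the README does not document the request schema, and `stream_chat` assumes the reply is streamed as plain UTF-8 text. The incremental decoder matters for Chinese output, whose multi-byte characters can be split across network reads.

```python
import codecs
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: default address from the Setup section
BASE_URL = "http://localhost:8000"


def _post_json(path: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_chat_request(document_name: str, message: str) -> urllib.request.Request:
    # Hypothetical schema: field names are not taken from the backend code
    return _post_json("/api/chat/chat", {"document_name": document_name, "message": message})


def build_clear_history_request(document_name: str) -> urllib.request.Request:
    # Hypothetical schema: which fields clear-history expects is not documented
    return _post_json("/api/chat/clear-history", {"document_name": document_name})


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode a byte stream to text without breaking multi-byte UTF-8
    characters that straddle two chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def stream_chat(document_name: str, message: str, chunk_size: int = 64) -> Iterator[str]:
    """Send a chat request and yield the reply as it arrives.

    Reads the response in fixed-size pieces; a smaller chunk_size gives
    more responsive output at the cost of more read calls.
    """
    req = build_chat_request(document_name, message)
    with urllib.request.urlopen(req) as resp:
        yield from decode_stream(iter(lambda: resp.read(chunk_size), b""))
```

Typical use would be `for piece in stream_chat("paper.md", "What does Figure 2 show?"): print(piece, end="", flush=True)`.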

Usage Example

  1. Prepare your markdown document with embedded images
  2. Upload the document using the upload endpoint
  3. Start chatting with the document using the chat endpoint
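Before uploading, it can help to list the images a document references. The README does not say which image forms the backend can resolve, so this hypothetical `classify_images` helper only reports whether each target is an inline data URI, a remote URL, or a local path, and flags local paths that do not exist. Resolving relative paths against `base_dir` (the document's folder) is an assumption.

```python
import re
from pathlib import Path
from typing import List, Tuple

# ![alt](target) or ![alt](target "title"); captures the target only
IMAGE_RE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def classify_images(markdown: str, base_dir: str = ".") -> List[Tuple[str, str]]:
    """Return (kind, target) for every image reference in the document.

    kind is one of "data-uri", "remote", "local", or "missing".
    """
    results: List[Tuple[str, str]] = []
    for target in IMAGE_RE.findall(markdown):
        if target.startswith("data:image/"):
            kind = "data-uri"
        elif target.startswith(("http://", "https://")):
            kind = "remote"
        else:
            # Assumption: relative paths are resolved against the document's folder
            kind = "local" if (Path(base_dir) / target).is_file() else "missing"
        results.append((kind, target))
    return results
```

Any entry marked `missing` points to an image that would likely be lost during processing.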

Requirements

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • 16GB+ RAM

Models Used

  • Text Embeddings: BAAI/bge-large-zh
  • Multimodal Chat: internlm/internlm-xcomposer2d5-7b
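To illustrate what the FAISS similarity search over the text embeddings computes, here is a plain-Python sketch of exact top-k search by inner product over L2-normalised vectors, which equals cosine similarity. This is the result a flat inner-product FAISS index (`IndexFlatIP`) returns for normalised inputs; which index type pdfChat actually uses is not stated. The 3-dimensional toy vectors are made up for illustration (bge-large-zh embeddings have 1024 dimensions).

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """Exact search: score every document vector against the query by the
    inner product of normalised vectors (cosine similarity) and return the
    k best (index, score) pairs, highest score first."""
    q = normalize(query)
    scores = [(i, sum(a * b for a, b in zip(q, normalize(d)))) for i, d in enumerate(docs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

An exact scan is linear in the number of chunks; FAISS performs the same computation in optimised native code and also offers approximate indexes for larger collections.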

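TODO:

  1. Support uploading PDF, Word, and TXT files; any images embedded in them should be processed the same way as in Markdown documents.
  2. Replace the VLM and sentence-embedding models with API calls.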

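Development commands:

  • Start the Streamlit front end:
    streamlit run Home.py
  • Restart the backend with auto-reload (stops any running uvicorn instance first):
    cd /home/lvshuhang/pdfChat && pkill -f "uvicorn backend.app.main:app" || true && uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000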

Example query: What does "Figure 2 | AIME accuracy of DeepSeek-R1-Zero" show? Please answer in Chinese.

About

A VLM-based PDF chat system
