bugman-007/PDF-QA-POC_FOR_GUY


PDF QA POC (HE/EN, RTL-ready)

  • Parses local PDFs from ./pdfs with Azure Document Intelligence (auto HE/EN hints)
  • Normalizes text (NFKC, strips niqqud)
  • Auto-detects language per chunk and per question
  • Embeds with OpenAI, indexes with FAISS, stores metadata in SQLite
  • Streamlit chat UI with Start/Rebuild button and page citations
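
The normalization and language-detection bullets above could look roughly like the following. This is a minimal standard-library sketch, not the project's actual code: the function names `normalize_text` and `detect_language` are hypothetical, and the "count Hebrew letters vs. Latin letters" heuristic is an assumption about how HE/EN detection might be done.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFKC, then drop Hebrew combining marks (niqqud and cantillation)."""
    # NFKC first: presentation forms and ligatures decompose into base
    # letters plus marks, so the marks can be removed in the next step.
    text = unicodedata.normalize("NFKC", text)
    # Remove only non-spacing marks (category "Mn") in the Hebrew points
    # range, so punctuation such as maqaf (U+05BE) is kept.
    return "".join(
        ch
        for ch in text
        if not ("\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn")
    )


def detect_language(text: str) -> str:
    """Return "he" if Hebrew letters outnumber ASCII Latin letters, else "en".

    Assumed heuristic for a two-language corpus; ties and empty input
    fall back to "en".
    """
    hebrew = sum(1 for ch in text if "\u05d0" <= ch <= "\u05ea")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "he" if hebrew > latin else "en"
```

Running the same `detect_language` on each chunk and on each incoming question would give the per-chunk and per-question tags the feature list describes.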

Setup

  1. Fill in .env with your AZURE_DI_* settings and OPENAI_API_KEY
  2. Put sample PDFs into ./pdfs
  3. Run ./run.sh
  4. Start the UI: streamlit run ui/app.py
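
Before running the steps above, it can help to confirm that the settings from step 1 are visible to the process. The sketch below is an illustration only: `missing_settings` is a hypothetical helper, and it assumes the values from .env have already been exported into the environment. Because the README names the Azure settings only as `AZURE_DI_*`, the check looks for any non-empty variable with that prefix rather than guessing specific names.

```python
import os


def missing_settings(env=None):
    """Return the names of required settings that are absent or empty."""
    env = os.environ if env is None else env
    missing = []
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    # The exact Azure variable names are not listed in the README, so this
    # only checks that at least one AZURE_DI_-prefixed variable is set.
    if not any(k.startswith("AZURE_DI_") and v for k, v in env.items()):
        missing.append("AZURE_DI_*")
    return missing


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        print("Missing settings:", ", ".join(problems))
    else:
        print("All required settings found.")
```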

Notes

  • Cached Azure DI JSON lands in ./data/raw
  • Index files: ./data/faiss.index and ./data/meta.sqlite
  • Switch models via .env (EMBED_MODEL, LLM_MODEL)
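
One plausible way for the FAISS index and the SQLite metadata file to work together is to use each vector's position in the index as the primary key of its metadata row, so that search results can be turned into page citations. The sketch below shows only the SQLite side, using the standard library. The table name, the columns, and the helpers `build_meta` and `citations` are assumptions for illustration, not the repository's actual schema.

```python
import sqlite3


def build_meta(conn, chunks):
    """Store one metadata row per chunk, keyed by its position in the index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "vec_id INTEGER PRIMARY KEY, file TEXT, page INTEGER, lang TEXT, text TEXT)"
    )
    # Assumes vectors are added to the index in this same order, so the
    # enumerate() position matches the id a flat FAISS index returns.
    conn.executemany(
        "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
        [
            (i, c["file"], c["page"], c["lang"], c["text"])
            for i, c in enumerate(chunks)
        ],
    )
    conn.commit()


def citations(conn, vec_ids):
    """Map vector ids from a similarity search to 'file, p. N' strings."""
    out = []
    for vid in vec_ids:
        row = conn.execute(
            "SELECT file, page FROM chunks WHERE vec_id = ?", (vid,)
        ).fetchone()
        if row is not None:
            out.append(f"{row[0]}, p. {row[1]}")
    return out
```

With a connection opened via `sqlite3.connect("./data/meta.sqlite")`, the ids returned by a search over `./data/faiss.index` could be passed straight to `citations`.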
