# PDF QA POC (HE/EN, RTL-ready)

- Parses local PDFs from `./pdfs` with Azure Document Intelligence (auto HE/EN hints)
- Normalizes text (NFKC, strips niqqud)
- Auto-detects language per chunk and per question
- Embeds with OpenAI, indexes with FAISS, metadata in SQLite
- Streamlit chat UI with Start/Rebuild button and page citations

## Setup

- Fill `.env` with your `AZURE_DI_*` and `OPENAI_API_KEY`
- Put sample PDFs into `./pdfs`
- `./run.sh`
- `streamlit run ui/app.py`

## Notes

- Cached Azure DI JSON lands in `./data/raw`
- Index files: `./data/faiss.index` and `./data/meta.sqlite`
- Switch models via `.env` (`EMBED_MODEL`, `LLM_MODEL`)
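
The text normalization (NFKC, niqqud stripping) and HE/EN language detection mentioned above could look roughly like the sketch below. It is an illustration only, not the project's actual code: the function names `normalize_text` and `detect_language` are hypothetical, the set of code points treated as niqqud is an assumption, and the letter-counting heuristic is a simple stand-in for whatever detection the project really uses.

```python
import re
import unicodedata

# Assumption: "niqqud" here covers Hebrew vowel points and cantillation marks.
# Hebrew punctuation such as maqaf (U+05BE) and sof pasuq (U+05C3) is kept.
_NIQQUD_RE = re.compile(r"[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")


def normalize_text(text: str) -> str:
    """Apply NFKC, then remove Hebrew points and cantillation marks."""
    # NFKC runs first so that presentation forms (e.g. U+FB2A, shin with
    # shin dot) are expanded into base letter + mark before stripping.
    text = unicodedata.normalize("NFKC", text)
    return _NIQQUD_RE.sub("", text)


def detect_language(text: str) -> str:
    """Return 'he', 'en', or 'unknown' by counting letters (naive heuristic)."""
    hebrew = sum(1 for ch in text if "\u05D0" <= ch <= "\u05EA")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if hebrew == 0 and latin == 0:
        return "unknown"
    # Ties go to Hebrew; this choice is arbitrary.
    return "he" if hebrew >= latin else "en"


if __name__ == "__main__":
    pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"  # "shalom" with niqqud
    plain = normalize_text(pointed)
    print(plain, detect_language(plain))
    print(detect_language("What does page 3 say?"))
```

The same `detect_language` call can be applied to both a chunk and a question, which matches the "per chunk and per question" behavior described above. Mixed-language chunks are classified by whichever script has more letters, so a real implementation may want a threshold or a dedicated library.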