Docker Compose stack for scalable TEI embeddings (multi-GPU) fronted by a FastAPI proxy with a Qdrant cache. 🐳⛓️💾
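A stack like this can be sketched in Compose: one TEI container per GPU, a proxy in front, and Qdrant as the cache store. This is a minimal illustration, not the project's actual file; the image tags, model name, and port mappings are assumptions.

```yaml
# Hypothetical sketch of a multi-GPU TEI deployment with a proxy and Qdrant.
services:
  tei-gpu0:
    image: ghcr.io/huggingface/text-embeddings-inference:latest  # tag is illustrative
    command: --model-id BAAI/bge-small-en-v1.5                   # model is illustrative
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]        # pin this replica to GPU 0
              capabilities: [gpu]
  tei-gpu1:
    image: ghcr.io/huggingface/text-embeddings-inference:latest
    command: --model-id BAAI/bge-small-en-v1.5
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]        # pin this replica to GPU 1
              capabilities: [gpu]
  qdrant:
    image: qdrant/qdrant:latest        # vector store used as an embedding cache
  proxy:
    build: ./proxy                     # hypothetical FastAPI proxy service
    ports:
      - "8080:8080"
    depends_on: [tei-gpu0, tei-gpu1, qdrant]
```

The `device_ids` reservations are what split the replicas across GPUs; the proxy would round-robin between `tei-gpu0` and `tei-gpu1` and check Qdrant before forwarding a request.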
An unofficial Python wrapper library for Text Embeddings Inference (TEI), for batch processing with asyncio.
Self-hosted text embeddings server powered by Hugging Face TEI, with an OpenAI-compatible API. Supports BGE, Nomic, MiniLM and other models. Features optional API key auth, offline/air-gapped mode, and persistent model cache.
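A client for an OpenAI-compatible server like this just needs to build a standard `/v1/embeddings` request. The sketch below constructs one; the base URL, API key, and model name are placeholders, not values from the project.

```python
import json


def build_embedding_request(base_url, api_key, model, texts):
    """Construct an OpenAI-style /v1/embeddings request.

    Sketch only: base_url, api_key, and model are hypothetical values;
    the body follows the OpenAI embeddings request schema.
    """
    return {
        "url": f"{base_url}/v1/embeddings",
        "headers": {
            "Content-Type": "application/json",
            # Bearer auth matches the optional API-key feature described above.
            "Authorization": f"Bearer {api_key}",
        },
        "body": json.dumps({"model": model, "input": texts}),
    }


req = build_embedding_request(
    "http://localhost:8080",        # hypothetical self-hosted endpoint
    "sk-local-demo",                # hypothetical API key
    "BAAI/bge-small-en-v1.5",       # one of the supported BGE models
    ["hello world"],
)
print(req["url"])  # http://localhost:8080/v1/embeddings
```

Because the request shape matches OpenAI's, existing OpenAI SDKs can usually be pointed at such a server by overriding only the base URL and key.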
Repository of the project Efficient Edge Embeddings (E*3) project, subgrant 2dAI2OC07 from EU Horizon dAIEDGE
It automatically batches inference requests from independent users into a single batch for efficiency: each user sees the interface of an individual request, while internally the requests are handled as one batch.
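That kind of transparent micro-batching can be sketched with asyncio: each caller awaits its own future, while a shared queue is flushed as one batch when it fills up or a short timer expires. The class and function names below are hypothetical, and `fake_embed_batch` stands in for a real TEI batch call.

```python
import asyncio


class MicroBatcher:
    """Collects individual embed() calls and flushes them as one batch.

    Hypothetical sketch: `embed_batch` is any coroutine that maps a list
    of texts to a list of embeddings (e.g. one TEI request).
    """

    def __init__(self, embed_batch, max_batch_size=32, max_wait=0.01):
        self._embed_batch = embed_batch
        self._max_batch_size = max_batch_size
        self._max_wait = max_wait          # seconds to wait for more requests
        self._queue = []                   # pending (text, future) pairs
        self._flush_task = None

    async def embed(self, text):
        # Each caller gets its own future, so the API looks per-request.
        fut = asyncio.get_running_loop().create_future()
        self._queue.append((text, fut))
        if len(self._queue) >= self._max_batch_size:
            await self._flush()            # batch is full: flush immediately
        elif self._flush_task is None:
            # First item in a new batch: start the flush timer.
            self._flush_task = asyncio.create_task(self._delayed_flush())
        return await fut

    async def _delayed_flush(self):
        await asyncio.sleep(self._max_wait)
        await self._flush()

    async def _flush(self):
        batch, self._queue = self._queue, []
        task, self._flush_task = self._flush_task, None
        if task is not None and task is not asyncio.current_task():
            task.cancel()                  # size-triggered flush beat the timer
        if not batch:
            return
        embeddings = await self._embed_batch([t for t, _ in batch])
        for (_, fut), emb in zip(batch, embeddings):
            fut.set_result(emb)            # hand each result back to its caller


async def fake_embed_batch(texts):
    # Stand-in "model": embeds each text as its length.
    return [[float(len(t))] for t in texts]


async def main():
    batcher = MicroBatcher(fake_embed_batch, max_batch_size=4, max_wait=0.005)
    # Three independent "users" issue requests concurrently; they are
    # served by a single fake_embed_batch call.
    return await asyncio.gather(*(batcher.embed(t) for t in ["a", "bb", "ccc"]))


results = asyncio.run(main())
print(results)  # [[1.0], [2.0], [3.0]]
```

The trade-off is the usual one for dynamic batching: `max_wait` bounds the extra latency any single request pays in exchange for better GPU utilization.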
An enterprise-grade RAG Chatbot built with AI-assisted development. Features local LLMs (Ollama), LlamaIndex integration, and Entra ID SSO. Designed for secure, air-gapped PDF analysis. 📄