A local API to run ChatGPT-style large language models — fully open-source, extensible, and designed for real-world developer workflows.
As large language models grow larger and more capable, accessing them through cloud APIs brings latency, cost, and privacy concerns. Local LLM runtimes exist, but they lack a unified, developer-friendly API that enables advanced use cases.
COGNIX fills that gap. It’s a Python module and HTTP API that mimics the OpenAI ChatGPT API experience while running fully locally on your machine or private infrastructure. It supports cutting-edge backends like llama.cpp, Ollama, and others, giving you the flexibility and control that cloud APIs can’t provide.
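Because the API surface mirrors OpenAI’s, the official `openai` Python client works against a COGNIX server unchanged. A minimal sketch (the port `8000` and the model name `llama-3-8b` are placeholders, not COGNIX defaults):

```python
# Minimal sketch: point the official OpenAI client at a local COGNIX server.
# The address and model name below are placeholders, not COGNIX defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local COGNIX endpoint (assumed)
    api_key="not-needed-locally",         # no cloud key required
)

response = client.chat.completions.create(
    model="llama-3-8b",  # whichever model your backend serves
    messages=[{"role": "user", "content": "Summarize the benefits of local LLMs."}],
)
print(response.choices[0].message.content)
```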
- Multi-modal and multi-domain: Beyond chat completions, COGNIX supports reasoning workflows, document and image processing, and complex conversational management — all through the same familiar API interface (see the image-query sketch after this list).
- Web and code integration: Inspired by projects like Cursor, COGNIX implements code hashing and model interaction patterns that can seamlessly integrate with code editors like VS Code. This enables advanced coding assistance powered by local LLMs directly in your development environment.
- Privacy and control: Keep sensitive data on-premises with zero cloud dependencies.
- Cost efficiency: Eliminate usage-based API billing by running models on your own hardware.
- Developer experience: Use the same API patterns and tooling you already know from OpenAI, minimizing friction.
- Extensibility: Add new backends, tool integrations, and capabilities easily.
- Open source: Community-driven and hackable by design.
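To make the multi-modal point above concrete, here is a hedged sketch of an image-based query. It assumes COGNIX accepts the OpenAI-style multi-part message shape, and `llava-1.6` stands in for any locally served vision-capable model:

```python
# Sketch of an image query, assuming OpenAI-style multi-part messages are
# accepted; the model name and server address are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

# Encode a local image as a data URL, the same way the OpenAI API expects it.
with open("invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llava-1.6",  # placeholder vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the total amount on this invoice?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```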
COGNIX is designed to be the universal local API for large language models — a bridge between the power of local LLMs and the rich ecosystem of tools built around the OpenAI API style.
By supporting advanced use cases like multi-step reasoning, document understanding, image-based queries, and interactive conversations — plus planned integration with code editors through smart code hashing — COGNIX aims to make local LLMs as powerful, accessible, and seamless to use as cloud APIs.
- Implement core OpenAI-compatible API endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/models`.
- Build a flexible adapter system supporting llama.cpp, Ollama, vLLM, and more (see the adapter sketch after this list).
- Add multi-modal support for document and image processing.
- Enable conversational context management with advanced memory features.
- Integrate streaming and function/tool-calling APIs (see the streaming example after this list).
- Develop usage tracking, authentication, and rate limiting.
- Build integrations for IDEs and code editors leveraging code hashing techniques for contextual coding assistance.
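To illustrate the adapter-system item above, one possible shape for the backend abstraction is sketched below. `BackendAdapter`, `LlamaCppAdapter`, and their method names are illustrative, not a finalized COGNIX API:

```python
# Illustrative sketch of a backend adapter layer; class and method names are
# hypothetical, not COGNIX's actual API.
from abc import ABC, abstractmethod
from typing import Iterator


class BackendAdapter(ABC):
    """Uniform interface every inference backend would implement."""

    @abstractmethod
    def complete(self, messages: list[dict], **params) -> str:
        """Return a full chat completion for the given messages."""

    @abstractmethod
    def stream(self, messages: list[dict], **params) -> Iterator[str]:
        """Yield completion text incrementally, token by token."""


class LlamaCppAdapter(BackendAdapter):
    """Example adapter wrapping llama.cpp via the llama-cpp-python bindings."""

    def __init__(self, model_path: str):
        from llama_cpp import Llama  # optional dependency, loaded lazily
        self.llm = Llama(model_path=model_path)

    def complete(self, messages, **params):
        result = self.llm.create_chat_completion(messages=messages, **params)
        return result["choices"][0]["message"]["content"]

    def stream(self, messages, **params):
        for chunk in self.llm.create_chat_completion(
            messages=messages, stream=True, **params
        ):
            delta = chunk["choices"][0]["delta"]
            if "content" in delta:
                yield delta["content"]
```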
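Streaming would ride on the same client-side pattern developers already use against OpenAI (placeholder address and model name again):

```python
# Streaming sketch: token deltas arrive as they are generated.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

stream = client.chat.completions.create(
    model="llama-3-8b",
    messages=[{"role": "user", "content": "Explain code hashing in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g. the final one); skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```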
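And one plausible reading of the code-hashing idea: hash fixed-size chunks of a source file so an editor integration can re-send only the chunks whose hashes changed. The chunk size and helper names below are illustrative guesses, not a committed design:

```python
# Sketch of chunk-level code hashing for editor integrations; the chunk size
# and function names are illustrative, not COGNIX's actual design.
import hashlib


def chunk_hashes(source: str, lines_per_chunk: int = 20) -> dict[int, str]:
    """Map starting line index -> SHA-256 hash of that chunk of the file."""
    lines = source.splitlines()
    return {
        i: hashlib.sha256(
            "\n".join(lines[i : i + lines_per_chunk]).encode()
        ).hexdigest()
        for i in range(0, len(lines), lines_per_chunk)
    }


def changed_chunks(old: dict[int, str], new: dict[int, str]) -> list[int]:
    """Starting indices of chunks whose content changed between snapshots."""
    return [i for i in new if old.get(i) != new[i]]
```

Only the changed chunks would then be re-sent to the model as fresh context, keeping editor round-trips small.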
COGNIX is a community-first project. Developers, researchers, and enthusiasts are invited to collaborate and help build the future of local LLMs — faster, cheaper, more private, and tightly integrated with real workflows.
MIT License — free for anyone to use, modify, and distribute.
Powered by the local LLM revolution.