Adaptive Evidence Distillation Engine: The Context Optimization Engine that Respects Your Context Window.
AEDE is a context optimization engine that sits on top of modern RAG and reasoning workflows and targets the "context bloat" problem. It distills large datasets into compact, high-signal evidence before anything reaches a frontier model, cutting token usage by up to 84% in our benchmarks.
Stop throwing tokens into the void. Start reasoning with intent.
- Up to 84% context reduction
- Tested across 50+ benchmark questions
- Tested on annual reports, research papers, agent conversations, and chat histories
- Adaptive routing with LangGraph
- Multi-model reasoning pipeline
- 90% answer quality retention, measured by manually comparing answers against ground-truth responses across 50+ benchmark questions
We tested AEDE on four common context types. The results aren't just incremental; they're transformative.
| Context Type | Questions Tested | Reduction Range |
|---|---|---|
| Annual Reports | 20+ | 49% – 84% |
| Research Papers | 10+ | 42% – 70% |
| Agent Conversations | 10+ | 43% – 84% |
| Conversation History | 10+ | 55% – 63% |
Benchmarks represent token savings compared to raw document injection into Gemini 3.1 Flash Lite without AEDE optimization.
Question:
Compare Tesla revenue across the latest two fiscal years.
Raw Context:
- 1,967 tokens
AEDE Optimized Context:
- 315 tokens
Token Reduction:
- 84%
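The 84% figure follows directly from the two token counts above. A minimal sketch of the arithmetic, where `token_reduction` is a hypothetical helper name and not part of AEDE:

```python
def token_reduction(raw_tokens: int, optimized_tokens: int) -> float:
    """Return the fraction of tokens removed, between 0 and 1."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return 1 - optimized_tokens / raw_tokens


# Token counts taken from the Tesla revenue example above.
reduction = token_reduction(1967, 315)
print(f"{reduction:.0%}")  # prints "84%"
```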
Workflow: retrieve → extract → analyze → compress → reason
Traditional RAG retrieves relevant context and delegates all reasoning to the final model. AEDE extracts, analyzes, and distills. It identifies core concepts, pulls supporting evidence (claims + quotes), and merges redundancies before the final LLM even sees it.
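To make the "claims + quotes" idea concrete, here is a minimal sketch of merging redundant evidence. AEDE performs this step with a distillation model; this sketch swaps in plain text normalization as a stand-in, and the `Evidence` class, `distill` function, and sample data are all illustrative assumptions rather than AEDE's actual code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Evidence:
    claim: str
    quotes: list[str] = field(default_factory=list)


def _normalize(text: str) -> str:
    """Lowercase and drop punctuation so near-identical claims compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def distill(items: list[Evidence]) -> list[Evidence]:
    """Merge evidence sharing a normalized claim, keeping each quote once."""
    merged: dict[str, Evidence] = {}
    for item in items:
        key = _normalize(item.claim)
        if key not in merged:
            # Keep the first wording of the claim as the canonical one.
            merged[key] = Evidence(item.claim, [])
        for quote in item.quotes:
            if quote not in merged[key].quotes:
                merged[key].quotes.append(quote)
    return list(merged.values())


# Illustrative input: two chunks state the same claim in slightly different form.
raw = [
    Evidence("Revenue grew year over year.", ["Total revenues increased"]),
    Evidence("revenue grew year over year", ["Total revenues increased", "Revenue rose"]),
    Evidence("Margins declined.", ["Gross margin fell"]),
]
for ev in distill(raw):
    print(ev.claim, ev.quotes)
```

Three evidence items collapse into two, and the duplicated quote is stored once, so the final model receives each fact a single time with all of its supporting quotes attached.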
AEDE orchestrates a multi-model dance for maximum efficiency:
- Distillation Model (Llama 3.1 8B via Groq): Handles extraction, analysis, and distillation in milliseconds.
- Reasoning Model (Gemini 3.1 Flash Lite): Performs the final, high-context reasoning on the optimized evidence.
Powered by LangGraph, the system makes real-time decisions:
- retrieve_more: Not enough coverage? It goes back for more.
- compress: Too much noise? It aggressively de-duplicates.
- direct_answer: Simple query? It skips the heavy lifting to save time and cost.
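The routing decision can be pictured as a small pure function. This is a sketch under stated assumptions: the input signals (`coverage`, `redundancy`, `is_simple_query`), the threshold values, and the `choose_route` name are hypothetical, and the fall-through `reason` route is inferred from the workflow path shown in the dashboard. In a LangGraph graph, a function of this shape could plausibly be registered as the path function of `add_conditional_edges`.

```python
from typing import Literal

Route = Literal["direct_answer", "retrieve_more", "compress", "reason"]


def choose_route(
    coverage: float,
    redundancy: float,
    is_simple_query: bool,
    min_coverage: float = 0.7,    # assumed threshold, not taken from AEDE
    max_redundancy: float = 0.3,  # assumed threshold, not taken from AEDE
) -> Route:
    """Pick the next workflow step from coverage and redundancy signals."""
    if is_simple_query:
        return "direct_answer"   # skip the heavy lifting
    if coverage < min_coverage:
        return "retrieve_more"   # not enough evidence yet
    if redundancy > max_redundancy:
        return "compress"        # too much duplicated evidence
    return "reason"              # evidence is sufficient and clean


print(choose_route(coverage=0.4, redundancy=0.1, is_simple_query=False))  # retrieve_more
```

Checking the simple-query case first keeps cheap questions off the expensive path, and checking coverage before redundancy avoids compressing evidence that is still incomplete.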
If there's no quote, it didn't happen. AEDE preserves the lineage of every claim, ensuring that the final answer is grounded in verifiable facts from your own data.
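One simple way to enforce this rule is to drop any claim whose quote cannot be found in the chunk it cites. The sketch below assumes evidence is stored as dictionaries with `claim`, `quote`, and `chunk_id` keys; those field names and both function names are hypothetical, not AEDE's schema.

```python
def is_grounded(quote: str, source_text: str) -> bool:
    """True if a non-empty quote occurs in the source, ignoring case and spacing."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    needle = norm(quote)
    return bool(needle) and needle in norm(source_text)


def filter_grounded(evidence: list[dict], chunks: dict[str, str]) -> list[dict]:
    """Keep only evidence whose quote appears in the chunk it cites."""
    return [
        ev for ev in evidence
        if is_grounded(ev["quote"], chunks.get(ev["chunk_id"], ""))
    ]


# Illustrative data: the second item cites a quote that is not in its chunk.
chunks = {"c1": "Total revenues   increased compared to the prior year."}
evidence = [
    {"claim": "Revenue grew.", "quote": "total revenues increased", "chunk_id": "c1"},
    {"claim": "Headcount doubled.", "quote": "headcount doubled", "chunk_id": "c1"},
]
print([ev["claim"] for ev in filter_grounded(evidence, chunks)])  # ['Revenue grew.']
```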
AEDE isn't just a CLI; it's a full-stack experience. The built-in dashboard provides:
- Performance Sparklines: Track token savings and latency in real-time.
- Workflow Tracing: See exactly which path the AEDE engine took (retrieve_more->compress->reason).
- Collection Management: Effortlessly ingest PDFs or paste text blobs into persistent ChromaDB collections.
- Confidence Metrics: Visual indicators of "Evidence Coverage" so you know how much to trust the answer.
- Backend: FastAPI, LangGraph, ChromaDB, Pydantic.
- Frontend: Next.js 14, Tailwind CSS, Framer Motion, Lucide.
- AI Models: Llama 3.1 8B (via Groq), Gemini 3.1 Flash Lite (via Google AI).
- Embeddings: Sentence-Transformers (MiniLM-L6-v2).
git clone https://github.com/your-username/AEDE.git
cd AEDE
cd backend
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
Create a .env file in the backend/ directory:
GROQ_API_KEY=your_groq_key
GEMINI_API_KEY=your_gemini_key
cd ../frontend
npm install
Start Backend:
# From the backend directory
uvicorn api:app --reload --port 8000
Start Frontend:
# From the frontend directory
npm run dev
Open http://localhost:3000 to see the magic happen.
- Extract Concepts: Identify what the user is actually asking for.
- Focused Retrieval: Pull the most relevant chunks from ChromaDB.
- Evidence Extraction: Turn raw chunks into "Claims + Quotes".
- Coverage Analysis: Check if we have enough information to answer.
- Workflow Compiler: Decide whether to retrieve more, compress, or answer.
- Evidence Distillation: Merge redundant facts into a clean context.
- Final Reasoning: Answer the query using the optimized context.
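The coverage analysis step above can be approximated as the share of query concepts that at least one extracted claim mentions. This is a minimal sketch assuming substring matching; AEDE's real scoring is not described here, and `evidence_coverage` is a hypothetical name.

```python
def evidence_coverage(concepts: list[str], claims: list[str]) -> float:
    """Fraction of query concepts mentioned by at least one claim (0 to 1)."""
    if not concepts:
        return 1.0  # nothing to cover, so treat the query as fully covered
    lowered_claims = [claim.lower() for claim in claims]
    covered = sum(
        any(concept.lower() in claim for claim in lowered_claims)
        for concept in concepts
    )
    return covered / len(concepts)


# Illustrative query concepts and extracted claims.
concepts = ["revenue", "fiscal year", "margin"]
claims = ["Revenue increased in the latest fiscal year."]
print(round(evidence_coverage(concepts, claims), 2))  # 0.67
```

A score like this is the kind of signal the workflow compiler needs to choose between retrieving more and moving on, and it maps naturally onto an "Evidence Coverage" indicator.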
Any Context Source
(PDFs, Agent Chats, Conversations)
↓
Retrieve
↓
Evidence Extraction
↓
Evidence Analysis
↓
Coverage Evaluation
↓
Adaptive Routing
┌──────────┬──────────┐
│          │          │
▼          ▼          ▼
Direct     Compress   Reason
└──────────┴──────────┘
↓
Frontier Model
↓
Answer
We're building the future of context-efficient AI. If you have ideas for better distillation algorithms or smarter routing, we'd love your help!
- Fork the Project
- Create your Feature Branch
- Commit your Changes
- Push to the Branch
- Open a Pull Request
Reducing context. Preserving evidence. Improving reasoning.