CodeSight AI is a production-style full-stack application for exploring unfamiliar GitHub repositories with agentic AI. It combines local RAG over cloned source code with MCP-powered external repo intelligence (CodeWiki), then returns grounded answers, file references, Mermaid diagrams, and onboarding notes.
- Public GitHub repo ingestion and indexing
- Code-aware chunking with language and symbol metadata
- Embeddings + vector search with ChromaDB
- Agentic orchestration with LangGraph
- MCP integration layer for CodeWiki with graceful fallback
- Mermaid diagram generation + UI rendering
- Structured learning notes (onboarding, architecture, auth-flow, api-flow, key-modules)
- Grounded references (file path + line ranges)
- Premium motion-first frontend (landing + workspace transitions)
- Dedicated 3D Notes reader route with page-turn interaction (/notes/{repo_id})
- Dockerized backend and frontend
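To illustrate "code-aware chunking with language and symbol metadata" and "grounded references (file path + line ranges)", here is a minimal sketch of how a Python file might be split into one chunk per top-level symbol. The `CodeChunk` shape and `chunk_python_file` helper are illustrative assumptions, not the backend's actual API:

```python
import ast
from dataclasses import dataclass

@dataclass
class CodeChunk:
    path: str
    language: str
    symbol: str      # enclosing top-level function/class name
    start_line: int  # 1-based, usable for grounded references
    end_line: int
    text: str

def chunk_python_file(path: str, source: str) -> list[CodeChunk]:
    """Split a Python file into one chunk per top-level def/class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(CodeChunk(
                path=path,
                language="python",
                symbol=node.name,
                start_line=node.lineno,
                end_line=node.end_lineno,
                text="\n".join(lines[node.lineno - 1:node.end_lineno]),
            ))
    return chunks
```

Each chunk carries enough metadata for the retriever to return answers with file path + line ranges attached.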
- Backend: FastAPI, LangChain, LangGraph, ChromaDB, sentence-transformers, GitPython, Pydantic
- Frontend: Next.js 14, TypeScript, Tailwind CSS, Radix UI primitives, Mermaid
- Infra: Docker, docker-compose, Makefile
codesight-ai/
backend/
app/
api/
agent/
core/
ingestion/
retrieval/
mcp/
prompts/
services/
tools/
models/
utils/
main.py
tests/
pyproject.toml
Dockerfile
.env.example
frontend/
app/
components/
hooks/
lib/
public/
types/
Dockerfile
package.json
docker-compose.yml
Makefile
README.md
flowchart LR
UI[Next.js Frontend] --> API[FastAPI Backend]
API --> INGEST[Repo Ingestion Pipeline]
INGEST --> STORE[(Repo Metadata JSON)]
INGEST --> VS[(ChromaDB Vector Store)]
API --> GRAPH[LangGraph Agent]
GRAPH --> RET[Retriever]
GRAPH --> MAP[Repo Mapper]
GRAPH --> MCP[CodeWiki MCP Adapter]
GRAPH --> LLM[LLM Service]
RET --> VS
GRAPH --> NOTES[Notes Service]
flowchart TD
A[User Query] --> B[classify_intent]
B --> C[run_retriever]
C --> D[run_repo_mapper]
D --> E[run_codewiki_mcp]
E --> F[run_file_explainer]
F --> G[run_diagram_generator]
G --> H[run_notes_generator]
H --> I[compose_response]
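The agent graph above is a linear pass over shared state. Ignoring the actual LangGraph wiring, it can be sketched as a plain pipeline; node names come from the diagram, but the bodies here are placeholder assumptions, not the real implementations:

```python
# Sketch of the linear agent pass. Real nodes would call the retriever,
# repo mapper, CodeWiki MCP adapter, LLM service, etc.
def classify_intent(state: dict) -> dict:
    q = state["query"].lower()
    state["intent"] = "architecture" if "architecture" in q else "general"
    return state

def run_retriever(state: dict) -> dict:
    state["chunks"] = []  # would hold top-k vector-search hits from ChromaDB
    return state

def compose_response(state: dict) -> dict:
    state["answer"] = f"[{state['intent']}] grounded in {len(state['chunks'])} chunks"
    return state

NODES = [classify_intent, run_retriever, compose_response]  # abridged: diagram has 8 nodes

def invoke_graph(repo_id: str, query: str) -> dict:
    state = {"repo_id": repo_id, "query": query}
    for node in NODES:  # LangGraph would wire these as edges instead of a loop
        state = node(state)
    return state
```

In the real graph, LangGraph owns the state object and the edges, which also allows conditional routing (e.g. skipping the MCP node when it is disabled).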
sequenceDiagram
participant U as User
participant F as Frontend
participant B as Backend API
participant G as LangGraph
participant V as ChromaDB
participant M as CodeWiki MCP
U->>F: Ask codebase question
F->>B: POST /api/chat
B->>G: invoke graph(repo_id, query)
G->>V: retrieve top-k chunks
G->>M: optional MCP query
G->>B: answer + refs + diagram + trace
B->>F: ChatResponse JSON
F->>U: Render answer, references, Mermaid
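The "ChatResponse JSON" in the last two steps bundles the answer, grounded references, Mermaid source, and the agent trace. The backend's actual models are Pydantic; the field names below are assumptions inferred from the sequence diagram, shown with stdlib dataclasses so the sketch is self-contained:

```python
from __future__ import annotations
from dataclasses import dataclass, field, asdict

@dataclass
class FileReference:
    path: str
    start_line: int
    end_line: int

@dataclass
class ChatResponse:
    answer: str
    references: list[FileReference] = field(default_factory=list)
    diagram: str | None = None                       # Mermaid source, rendered client-side
    trace: list[str] = field(default_factory=list)   # node names the graph executed

resp = ChatResponse(
    answer="Auth is handled via JWT middleware.",    # illustrative content
    references=[FileReference("backend/app/core/security.py", 12, 48)],
    diagram="flowchart LR\n  A[Request] --> B[JWT check]",
    trace=["classify_intent", "run_retriever", "compose_response"],
)
payload = asdict(resp)  # the dict the backend would serialize as ChatResponse JSON
```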
- POST /api/repos/ingest
- GET /api/repos/{repo_id}
- POST /api/chat
- POST /api/notes/generate
- GET /api/notes/{repo_id}
- GET /api/health
Copy from backend/.env.example.
Key values:
- LLM_API_KEY: OpenAI-compatible API key
- LLM_API_BASE: OpenAI-compatible endpoint
- LLM_MODEL: model id
- MCP_ENABLED: enable/disable CodeWiki integration
- MCP_CODEWIKI_ENDPOINT: MCP server endpoint
- EMBEDDING_MODEL_NAME: sentence-transformers model
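A hedged sketch of how these keys might be read at startup. The real backend presumably uses Pydantic settings; the key names match the list above, while the defaults here are purely illustrative placeholders:

```python
from __future__ import annotations
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    llm_api_key: str
    llm_api_base: str
    llm_model: str
    mcp_enabled: bool
    mcp_codewiki_endpoint: str
    embedding_model_name: str

def load_settings(env: dict[str, str] | None = None) -> Settings:
    """Read config from the environment (or an injected dict, for tests)."""
    env = env if env is not None else dict(os.environ)
    return Settings(
        llm_api_key=env.get("LLM_API_KEY", ""),
        llm_api_base=env.get("LLM_API_BASE", "https://api.openai.com/v1"),  # placeholder default
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),                      # placeholder default
        mcp_enabled=env.get("MCP_ENABLED", "true").lower() in ("1", "true", "yes"),
        mcp_codewiki_endpoint=env.get("MCP_CODEWIKI_ENDPOINT", ""),
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2"),
    )
```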
Copy from frontend/.env.example.
NEXT_PUBLIC_API_BASE_URL=http://localhost:8000
cd backend
cp .env.example .env
python3 -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

cd frontend
cp .env.example .env
npm install
npm run dev

Frontend runs at http://localhost:3000.
Main UI routes:
- /: landing + ingest
- /workspace/{repo_id}: chat workspace
- /notes/{repo_id}: premium 3D notes reader
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env
docker-compose up --build

- Backend: http://localhost:8000
- Frontend: http://localhost:3000
cd backend
pytest -q

Covered test areas:
- GitHub repo URL validation
- file filtering rules
- chunking behavior
- retrieval pipeline pass-through
- chat response schema typing
- MCP fallback handling
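As an illustration of the first area, a URL validator and its pytest-style check might look like the sketch below. The actual rules live in the backend's test suite; this regex is an assumption, not the project's real validation logic:

```python
import re

# Accept only public github.com repo URLs of the form
# https://github.com/<owner>/<repo> (optional trailing slash).
_GITHUB_REPO_RE = re.compile(
    r"^https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+/?$"
)

def is_valid_github_repo_url(url: str) -> bool:
    return bool(_GITHUB_REPO_RE.match(url.strip()))

def test_repo_url_validation():
    assert is_valid_github_repo_url("https://github.com/pallets/flask")
    assert is_valid_github_repo_url("https://github.com/pallets/flask/")
    assert not is_valid_github_repo_url("https://gitlab.com/foo/bar")
    assert not is_valid_github_repo_url("github.com/pallets/flask")
```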
- "Explain how authentication works and cite files"
- "Give me the request flow from API route to database"
- "Create an onboarding path for a new engineer"
- "Generate architecture notes for key modules"
Add screenshots to frontend/public/screenshots/ and reference them here:
- landing.png
- workspace-chat.png
- workspace-diagram.png
- If LLM_API_KEY is missing, the backend uses deterministic fallback text where possible.
- If CodeWiki MCP is unavailable, chat still works on the local RAG-only path.
- Repositories and vectors persist under backend/data/.
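The MCP graceful fallback can be pictured as a thin wrapper: try CodeWiki, and on any failure fall back to the local RAG context alone. This is a sketch; the real adapter lives under backend/app/mcp/ and its actual interface is not shown here:

```python
from __future__ import annotations
from typing import Callable

def query_with_fallback(
    query: str,
    local_retrieve: Callable[[str], list[str]],
    codewiki_query: Callable[[str], str] | None,
    mcp_enabled: bool = True,
) -> dict:
    """Combine local RAG context with CodeWiki, degrading gracefully."""
    context = {"local_chunks": local_retrieve(query), "codewiki": None, "mcp_used": False}
    if mcp_enabled and codewiki_query is not None:
        try:
            context["codewiki"] = codewiki_query(query)
            context["mcp_used"] = True
        except Exception:
            pass  # MCP down or erroring: keep the local-RAG-only answer path
    return context
```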
- SSE streaming chat responses
- richer dependency graph extraction via tree-sitter + static analysis
- RBAC and multi-user workspaces
- async background ingest jobs with progress polling
- reranker model for improved retrieval precision