Engineering Codebase Intelligence Systems for Rapid FDE Onboarding.
The Legacy Code Cartographer / Brownfield Cartographer is a multi-agent system that ingests any GitHub repository or local path and produces a living, queryable knowledge graph of the system's architecture, data flows, and semantic structure.
- Structural Analysis (Surveyor Agent): Uses tree-sitter for language-agnostic AST parsing. Builds module import graphs, identifies architectural hubs (via PageRank), and detects circular dependencies.
- Data Lineage (Hydrologist Agent): Specialized for data engineering. Analyzes data flows across Python (pandas, PySpark), SQL (sqlglot), and configuration boundaries.
- Semantic Analysis (Semanticist Agent): Uses the Gemini LLM to generate business-oriented purpose statements for every module.
- Semantic Search (Semantic Index): Vector-indexed knowledge base powered by Qdrant and Gemini embeddings.
- Living Context (Archivist Agent): Produces CODEBASE.md and onboarding_brief.md for instant architectural awareness.
- Interactive REPL: Persistent command-line interface with /command syntax for a seamless workflow.
- Web GUI: React-based dashboard with real-time WebSocket updates and interactive visualizations.
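The Surveyor's hub and cycle detection can be illustrated with a stdlib-only sketch. This is not the Surveyor's code: it assumes the import graph has already been extracted as (importer, imported) pairs, ranks modules with plain power-iteration PageRank, and reports circular dependencies as strongly connected components found with Tarjan's algorithm. The module names are made up. The project itself lists networkx, whose `pagerank` and `strongly_connected_components` functions cover the same ground.

```python
from collections import defaultdict


def pagerank(edges: list[tuple[str, str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over (importer, imported) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by modules that import nothing is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                # Each import passes an equal share of the importer's rank onward.
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def circular_groups(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Strongly connected components containing 2+ modules (Tarjan's algorithm)."""
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
        graph.setdefault(dst, [])  # modules with no imports still count as nodes
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    groups: list[list[str]] = []

    def visit(node: str) -> None:
        index[node] = low[node] = len(index)  # discovery order
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1:
                groups.append(sorted(component))

    for node in list(graph):
        if node not in index:
            visit(node)
    return groups


edges = [("app", "utils"), ("app", "models"), ("models", "utils"),
         ("services", "models"), ("models", "services")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # models
print(circular_groups(edges))     # [['models', 'services']]
```

The recursive `visit` is fine for a sketch; very large repositories would want an iterative version or the networkx implementations.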
Prerequisites:
- uv for dependency management.
- A Google Gemini API key (GEMINI_API_KEY).
- A Qdrant cluster (endpoint and API key); optional, needed only for semantic search.
git clone <this-repo-url>
cd "The-Brownfield-Cartographer"
uv pip install -e .
This installs all dependencies, including:
- Core analysis libraries (networkx, tree-sitter, sqlglot)
- LLM integration (google-genai, openai)
- CLI tools (typer, rich)
- Web framework (fastapi, uvicorn)
- Frontend dependencies (React via npm)
Create a .env file in the root directory:
GEMINI_API_KEY=your_gemini_key
QDRANT_API_KEY=your_qdrant_key # Optional
QDRANT_CLUSTER_ENDPOINT=your_endpoint # Optional
To inspect the configuration, start the REPL:
uv run python cli.py
# Then use /config_show to view current settings
Or set values with the Typer CLI:
uv run python cartograph.py config set --key llm.provider --value gemini
uv run python cartograph.py config set --key llm.api_key --value YOUR_KEY
Configuration is stored in ~/.cartographer/config.json.
The Brownfield Cartographer offers three interfaces (an interactive REPL, a Typer CLI, and a web GUI) with 100% feature parity.
The persistent interactive session allows you to execute multiple commands without re-invoking Python:
uv run python cli.py
Interactive Commands:
Legacy-Code-Cartographer > /help # Show all commands
Legacy-Code-Cartographer > /list # List analyzed projects
Legacy-Code-Cartographer > /analyze ./my-repo # Analyze a repository
Legacy-Code-Cartographer > /summary my-repo # Show project summary
Legacy-Code-Cartographer > /map my-repo --view structure # Generate graphs
Legacy-Code-Cartographer > /artifacts my-repo # List artifact paths
Legacy-Code-Cartographer > /config_show # Display configuration
Legacy-Code-Cartographer > /whereami # Show current directory
Legacy-Code-Cartographer > /clear # Clear terminal
Legacy-Code-Cartographer > /exit # Exit session
Features:
- ✅ Persistent session state
- ✅ Direct service integration (no subprocess overhead)
- ✅ Robust error handling (Ctrl+C cancels operation, doesn't crash)
- ✅ Rich terminal output with colors and tables
- ✅ /command syntax for an intuitive workflow
For scripting and automation, use the Typer-based CLI:
# Analyze a repository
uv run python cartograph.py analyze ./my-project
uv run python cartograph.py analyze https://github.com/user/repo.git
# List all projects
uv run python cartograph.py list
# View project summary
uv run python cartograph.py summary my-project --detailed
# Generate visualizations
uv run python cartograph.py map my-project --view structure
uv run python cartograph.py map my-project --view lineage --format json
# Manage configuration
uv run python cartograph.py config show
uv run python cartograph.py config set --key llm.model --value gemini-2.0-flash-exp
All commands support:
- --help for detailed usage
- --verbose for detailed logging
- --full for non-incremental analysis
Launch the interactive web dashboard:
uv run python main.py gui
To launch with a fresh build:
uv run python main.py gui --build
Then navigate to http://localhost:5001 in your browser.
GUI Features:
- 🎨 Interactive graph visualizations (pyvis-powered)
- 📊 Real-time analysis progress via WebSockets
- 📝 Rendered markdown documentation
- 🔍 Searchable semantic index
- 🗺️ Module structure and data lineage views
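Semantic search ranks modules by how close their embedding is to the query's embedding. Qdrant does this with an approximate nearest-neighbour index (HNSW); the brute-force sketch below only shows the ranking idea with cosine similarity. The three-dimensional vectors and module paths are made up and stand in for real Gemini embeddings, which have far more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: list[float], index: dict[str, list[float]],
           top_k: int = 3) -> list[tuple[str, float]]:
    """Brute-force top-k: score every module, then keep the best matches."""
    scored = [(module, cosine(query, vector)) for module, vector in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" for hypothetical modules.
index = {
    "src/ingest/loader.py": [0.9, 0.1, 0.0],
    "src/core/config.py": [0.0, 0.2, 0.9],
    "src/lineage/sql.py": [0.7, 0.6, 0.1],
}
for module, score in search([1.0, 0.2, 0.0], index, top_k=2):
    print(f"{score:.3f}  {module}")
```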
For backward compatibility, direct analysis is still supported:
# Analyze a GitHub repo
uv run python main.py ingest https://github.com/meltano/meltano
# Analyze a local path
uv run python main.py ingest /path/to/your/project
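This README does not say how <project_name> is derived from the ingest source. A plausible rule, assumed here and not confirmed by the project, is the last URL segment without .git for GitHub URLs and the resolved directory name for local paths; that would match the CI example below, which passes the current directory's basename to map. `project_name` and `output_dir` are hypothetical helpers.

```python
from pathlib import Path
from urllib.parse import urlparse


def project_name(source: str) -> str:
    """Hypothetical naming rule: last URL segment (minus .git) or the directory name."""
    if source.startswith(("http://", "https://")):
        last_segment = urlparse(source).path.rstrip("/").rsplit("/", 1)[-1]
        return last_segment.removesuffix(".git")
    # resolve() turns "." into the absolute working directory, so it gets a real name.
    return Path(source).resolve().name


def output_dir(source: str) -> Path:
    return Path(".cartography") / project_name(source)


print(output_dir("https://github.com/meltano/meltano"))  # .cartography/meltano
```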
All analysis results are stored in .cartography/<project_name>/:
.cartography/my-project/
├── knowledge_graph.json # Unified NetworkX graph (nodes + edges)
├── module_graph.html # Interactive module structure visualization
├── lineage_graph.html # Interactive data lineage visualization
├── CODEBASE.md # Comprehensive architectural documentation
├── ONBOARDING_BRIEF.md # Quick-start guide for new developers
├── semantic_index.json # Module metadata with LLM-generated purposes
└── trace.json # Analysis execution trace
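Because knowledge_graph.json is described as a NetworkX graph, it is likely in node-link JSON form (the format `networkx.node_link_data` writes). The sketch below assumes that format, with "id" on nodes and "source"/"target" on edges, and traces everything upstream of a dataset, which is the question lineage is usually asked to answer. The table names are hypothetical.

```python
import json
from collections import deque
from pathlib import Path


def parse_graph(data: dict) -> tuple[list[dict], list[dict]]:
    """Split node-link JSON into node and edge lists."""
    # Depending on the networkx version, edges are stored under "links" or "edges".
    edges = data.get("links", data.get("edges", []))
    return data.get("nodes", []), edges


def load_graph(path: Path) -> tuple[list[dict], list[dict]]:
    return parse_graph(json.loads(path.read_text()))


def upstream(edges: list[dict], target: str) -> set[str]:
    """Every node with a path to `target` (BFS over reversed source -> target edges)."""
    parents: dict[str, list[str]] = {}
    for edge in edges:
        parents.setdefault(edge["target"], []).append(edge["source"])
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage: raw table -> staging model -> revenue mart.
sample = {
    "directed": True, "multigraph": False, "graph": {},
    "nodes": [{"id": "raw.orders"}, {"id": "stg.orders"}, {"id": "mart.revenue"}],
    "links": [{"source": "raw.orders", "target": "stg.orders"},
              {"source": "stg.orders", "target": "mart.revenue"}],
}
nodes, edges = parse_graph(sample)
print(sorted(upstream(edges, "mart.revenue")))  # ['raw.orders', 'stg.orders']
```

Against real output, `load_graph(Path(".cartography/my-project/knowledge_graph.json"))` would replace the inline sample.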
Access artifacts via:
- REPL: /artifacts <project>
- CLI: files are in .cartography/<project>/
- GUI: navigate to the project dashboard
Agents:
- Surveyor: Static structure analyst using tree-sitter AST parsing
- Hydrologist: Data flow & lineage analyst for Python/SQL
- Semanticist: LLM-powered semantic purpose generator
- Archivist: Living documentation maintainer
- Visualizer: Interactive graph renderer (pyvis)
- Navigator: Advanced query engine (LangGraph)
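One way to picture how the analysis agents cooperate is a pipeline over shared state, where each agent reads what earlier agents recorded and adds its own findings. The state shape, the agent order, and the stub bodies below are assumptions for illustration: tree-sitter parsing, lineage extraction, and Gemini calls are replaced by fixed values, and the Visualizer and Navigator are left out.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class AnalysisState:
    """Shared state that each agent reads and enriches (hypothetical shape)."""
    repo_path: str
    modules: list[str] = field(default_factory=list)
    lineage: list[tuple[str, str]] = field(default_factory=list)
    purposes: dict[str, str] = field(default_factory=dict)
    docs: dict[str, str] = field(default_factory=dict)


class Agent(Protocol):
    def run(self, state: AnalysisState) -> None: ...


class Surveyor:
    def run(self, state: AnalysisState) -> None:
        # Stand-in for tree-sitter parsing of state.repo_path.
        state.modules = ["pkg/ingest.py", "pkg/report.py"]


class Hydrologist:
    def run(self, state: AnalysisState) -> None:
        # Stand-in for pandas/PySpark/sqlglot data-flow extraction.
        state.lineage = [("raw.orders", "stg.orders")]


class Semanticist:
    def run(self, state: AnalysisState) -> None:
        for module in state.modules:
            # A real implementation would ask the LLM for a business-level purpose.
            state.purposes[module] = f"purpose of {module}"


class Archivist:
    def run(self, state: AnalysisState) -> None:
        lines = [f"- {module}: {purpose}" for module, purpose in state.purposes.items()]
        lines += [f"- data flow: {src} -> {dst}" for src, dst in state.lineage]
        state.docs["CODEBASE.md"] = "\n".join(lines)


def run_pipeline(repo_path: str, agents: list[Agent]) -> AnalysisState:
    state = AnalysisState(repo_path)
    for agent in agents:  # later agents build on what earlier ones recorded
        agent.run(state)
    return state


state = run_pipeline("./my-repo", [Surveyor(), Hydrologist(), Semanticist(), Archivist()])
print(state.docs["CODEBASE.md"])
```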
All business logic resides in src/core/:
- CartographyService: analysis orchestration
- VisualizationService: graph generation
- ConfigService: settings management
100% Feature Parity: GUI, REPL, and CLI all use the same core services, ensuring identical outputs.
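Feature parity follows from keeping each front-end thin: it only parses its own input format and hands off to the shared service. The sketch below shows that shape. `CartographyService` here is a stub, not the class in src/core/, and `analyze`, `render`, `repl_analyze`, and `cli_analyze` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisResult:
    project: str
    full: bool


class CartographyService:
    """Stub standing in for the real service; its actual methods are not documented here."""

    def analyze(self, path: str, full: bool = False) -> AnalysisResult:
        project = path.rstrip("/").rsplit("/", 1)[-1]
        return AnalysisResult(project=project, full=full)


def render(result: AnalysisResult) -> str:
    mode = "full" if result.full else "incremental"
    return f"{result.project}: {mode} analysis"


def repl_analyze(service: CartographyService, line: str) -> str:
    # REPL front-end: "/analyze ./my-repo --full"
    _, path, *flags = line.split()
    return render(service.analyze(path, full="--full" in flags))


def cli_analyze(service: CartographyService, argv: list[str]) -> str:
    # CLI front-end: ["analyze", "./my-repo", "--full"]
    _, path, *flags = argv
    return render(service.analyze(path, full="--full" in flags))


service = CartographyService()
print(repl_analyze(service, "/analyze ./my-repo --full"))        # my-repo: full analysis
print(cli_analyze(service, ["analyze", "./my-repo", "--full"]))  # my-repo: full analysis
```

Since neither front-end contains analysis logic, a bug fix or new feature in the service reaches the REPL, CLI, and GUI at once.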
By default, the system performs incremental analysis (only re-analyzes changed files):
# REPL
/analyze ./my-repo
# CLI
uv run python cartograph.py analyze ./my-repo --incremental
For full re-analysis:
# REPL
/analyze ./my-repo --full
# CLI
uv run python cartograph.py analyze ./my-repo --full
To write results to a custom output directory:
# CLI only
uv run python cartograph.py analyze ./my-repo --output /custom/path
To export graph data as JSON:
# REPL
/map my-project --view structure --format json
# CLI
uv run python cartograph.py map my-project --view structure --format json
To analyze several repositories in one go:
# Using CLI in a loop
for repo in repo1 repo2 repo3; do
  uv run python cartograph.py analyze "./$repo"
done
For CI, a GitHub Actions step might look like:
# .github/workflows/analyze.yml
- name: Analyze Codebase
  run: |
    uv pip install -e .
    uv run python cartograph.py analyze . --full
    uv run python cartograph.py map "$(basename "$PWD")" --view both --format json
Troubleshooting:
Error: API key not configured
Solution:
# Via REPL
/config_show # Check current config
# Via CLI
uv run python cartograph.py config set --key llm.api_key --value YOUR_KEY
# Or set environment variable
export GEMINI_API_KEY=your_key
Error: Project 'xyz' not found
Solution:
# List available projects
# REPL: /list
# CLI: uv run python cartograph.py list
# Analyze the project first
# REPL: /analyze ./path/to/project
# CLI: uv run python cartograph.py analyze ./path/to/project
The project requires Python 3.13 or higher; Python 3.14 is fully supported. Check your version:
python3 --version # Should be 3.13 or higher
If you see ModuleNotFoundError:
# Reinstall dependencies
uv pip install -e .
# Or use uv run to ensure correct environment
uv run python cli.py
If the REPL appears frozen:
- Press Ctrl+C to cancel the current operation.
- Press Ctrl+D or type /exit to quit.
- Use /clear to reset the terminal.
- CLI Guide: see CLI_README.md for detailed CLI documentation.
- Constitution: see .agents/rules/constitution.md for architectural principles.
- Agent Rules: see .agents/rules/agent.md for development guidelines.
- Implementation Summary: see IMPLEMENTATION_SUMMARY.md for technical details.
MIT License - see LICENSE for details.