Your Advanced AI Coding Assistant - Multi-provider AI integration, VCS intelligence, IDE integration, and enterprise features for modern development workflows.
For a detailed overview of the project, see "Building AI Coding Assistants" (ISBN 979-8-9937022-0-9).
- Intelligent Routing across Ollama, OpenAI, Anthropic, and Google AI
- Response Fusion with conflict resolution and consensus building
- Local Fine-Tuning and custom model deployment
- Cost Optimization with usage tracking and budget management
- Git Hooks Management with AI-powered validation
- CI/CD Pipeline Integration (GitHub, GitLab, Azure, CircleCI)
- Code Quality Tracking with regression analysis
- Automated Pull Request Review and commit message generation
- VS Code Extension with real-time AI assistance
- WebSocket Communication for live workspace analysis
- 8+ AI Providers integrated seamlessly
- Context-Aware Suggestions with intelligent code completion
- Distributed Processing for large codebases
- Advanced Caching with predictive optimization
- Security & Compliance with audit logging
- Performance Monitoring and analytics dashboard
- Quick Start
- Installation
- CLI Modes
- Multi-Provider AI Setup
- llama.cpp Setup
- VCS Intelligence
- IDE Integration
- Core Features
- Configuration
- Development
- Documentation
- Performance
- Security
# Install globally
npm install -g ollama-code
# Quick test
ollama-code ask "Explain async/await in TypeScript"
# Interactive setup
ollama-code --interactive

# Configure multiple AI providers
ollama-code config set ai.providers.openai.apiKey "${OPENAI_API_KEY}"
ollama-code config set ai.providers.anthropic.apiKey "${ANTHROPIC_API_KEY}"
# Test intelligent routing
ollama-code fusion generate "Create a React authentication component"

# Install Git hooks for AI validation
ollama-code setup-hooks --install-all
# Generate CI/CD pipeline
ollama-code generate-pipeline github --enable-quality-gates

- Node.js ≥18.0.0
- Git (for VCS features)
- Ollama or llama.cpp (local AI models - see llama.cpp Setup)
- VS Code (for IDE integration)
# Global installation
npm install -g ollama-code

# Local installation (per project)
npm install ollama-code

# From source
git clone https://github.com/erichchampion/ollama-code.git
cd ollama-code
yarn install && yarn build

ollama-code --version # Interactive selector
ollama-code-simple --version # Simple CLI mode
ollama-code-advanced --version # Advanced CLI mode
ollama-code-interactive --version # Interactive mode

ollama-code # Launches guided mode selection
DEBUG=enhanced-fast-path-router ollama-code --interactive

🚀 Optimized Initialization - The interactive mode now features:
- Streaming Startup: Essential components load first, advanced features load in background
- Smart Component Loading: Only loads components needed for your specific requests
- 80% Faster Startup: Reduced initialization time from 8-15s to 1-3s
- Progressive Enhancement: Immediate basic functionality with continuous capability expansion
- Fallback Protection: Graceful degradation when components fail to load
- Real-time Status: See component loading progress with the /status command
- Performance Monitoring: Track system performance and optimization metrics
- Terminal Compatibility: Works in CI/CD, TTY, and non-interactive environments
- Background Loading: Heavy components load while you work
# Force legacy mode for testing/compatibility
OLLAMA_SKIP_ENHANCED_INIT=true ollama-code --interactive
# Enable debug logging for optimization
DEBUG=enhanced-fast-path-router ollama-code --interactive
# Silent mode for CI/CD environments
ollama-code --interactive --silent
# Configure logging level (default: ERROR for quiet operation)
LOG_LEVEL=0 ollama-code # DEBUG - Most verbose, shows all logs
LOG_LEVEL=1 ollama-code # INFO - Informational messages
LOG_LEVEL=2 ollama-code # WARN - Warning messages only
LOG_LEVEL=3 ollama-code # ERROR - Error messages only (default)
LOG_LEVEL=4 ollama-code # SILENT - No logs

ollama-code-simple ask "question"
ollama-code-simple list-models
ollama-code-simple --help

ollama-code-advanced fusion generate "prompt"
ollama-code-advanced setup-hooks --install-all
ollama-code-advanced fine-tune train --base-model qwen2.5-coder:latest

🚀 Optimized Advanced Mode - Now includes:
- Selective Loading: Only initializes components required by the specific command
- Background Preloading: Common components preload while executing commands
- Timeout Protection: All component initialization has timeout safeguards
- Legacy Fallback: Automatic fallback to legacy initialization if needed
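Selective loading with a fallback can be pictured as a registry that builds each component on first use and degrades gracefully when construction fails. The sketch below is illustrative only: `LazyRegistry` and the component names are hypothetical and do not describe ollama-code's actual initialization code.

```typescript
// Hypothetical lazy component registry (not ollama-code's real loader).
type Factory<T> = () => T;

interface Entry {
  create: Factory<unknown>;
  fallback: Factory<unknown> | undefined;
}

class LazyRegistry {
  private entries = new Map<string, Entry>();
  private instances = new Map<string, unknown>();

  // Registering is cheap: nothing is constructed yet.
  register<T>(name: string, create: Factory<T>, fallback?: Factory<T>): void {
    this.entries.set(name, { create, fallback });
  }

  // Build the component on first request, then reuse the cached instance.
  get<T>(name: string): T {
    if (this.instances.has(name)) {
      return this.instances.get(name) as T;
    }
    const entry = this.entries.get(name);
    if (!entry) {
      throw new Error(`Unknown component: ${name}`);
    }
    let instance: unknown;
    try {
      instance = entry.create();
    } catch (err) {
      if (!entry.fallback) throw err;
      instance = entry.fallback(); // degrade gracefully instead of failing
    }
    this.instances.set(name, instance);
    return instance as T;
  }

  isLoaded(name: string): boolean {
    return this.instances.has(name);
  }
}

// Usage: only the component a command asks for is ever constructed.
const registry = new LazyRegistry();
let builds = 0;
registry.register("projectIndex", () => {
  builds += 1;
  return { files: 42 };
});
registry.register<string[]>(
  "gitHistory",
  () => {
    throw new Error("git not available");
  },
  () => [] // fallback: empty history
);

registry.get("projectIndex");
registry.get("projectIndex"); // cached, factory not called again
console.log(builds, registry.get<string[]>("gitHistory").length);
```

A real loader would likely be asynchronous and add per-component timeouts; the synchronous form keeps the caching and fallback logic easy to follow.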
- Ollama - Local models with fine-tuning (default)
- llama.cpp - Direct GGUF model inference via llama-server
- OpenAI - GPT models with cost optimization
- Anthropic - Claude models with enterprise features
- Google AI - Gemini with multimodal capabilities
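One way to picture intelligent routing across these providers is as a weighted sum over cost, speed, and quality, using the routing weights this README configures (0.3, 0.3, 0.4). The sketch below is an assumption for illustration, not ollama-code's actual router: `ProviderMetrics`, `scoreProvider`, `pickProvider`, and the sample metric values are all hypothetical placeholders, not measured results.

```typescript
// Each metric is assumed to be normalized to 0..1, where higher is better
// (so a free local model has cost = 1.0).
interface ProviderMetrics {
  name: string;
  cost: number;
  speed: number;
  quality: number;
}

interface RoutingWeights {
  cost: number;
  speed: number;
  quality: number;
}

// Weighted sum of the three normalized metrics.
function scoreProvider(p: ProviderMetrics, w: RoutingWeights): number {
  return w.cost * p.cost + w.speed * p.speed + w.quality * p.quality;
}

// Return the provider with the highest weighted score.
function pickProvider(
  providers: readonly ProviderMetrics[],
  w: RoutingWeights
): ProviderMetrics {
  if (providers.length === 0) {
    throw new Error("No providers enabled");
  }
  return providers.reduce((best, p) =>
    scoreProvider(p, w) > scoreProvider(best, w) ? p : best
  );
}

// Placeholder numbers, invented for the example.
const sampleProviders: ProviderMetrics[] = [
  { name: "ollama", cost: 1.0, speed: 0.5, quality: 0.6 },
  { name: "openai", cost: 0.4, speed: 0.8, quality: 0.9 },
  { name: "anthropic", cost: 0.3, speed: 0.7, quality: 0.95 },
];

const readmeWeights: RoutingWeights = { cost: 0.3, speed: 0.3, quality: 0.4 };
const costFirst: RoutingWeights = { cost: 0.8, speed: 0.1, quality: 0.1 };

console.log(pickProvider(sampleProviders, readmeWeights).name); // openai
console.log(pickProvider(sampleProviders, costFirst).name); // ollama
```

Shifting weight toward cost moves the choice to the local model, which is the kind of trade-off the three weights are meant to express.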
# Configure all providers
ollama-code config set ai.providers.ollama.enabled true
ollama-code config set ai.providers.openai.enabled true
ollama-code config set ai.providers.anthropic.enabled true
ollama-code config set ai.providers.google.enabled true
# Set intelligent routing
ollama-code config set ai.routing.strategy "intelligent"
ollama-code config set ai.routing.weights.cost 0.3
ollama-code config set ai.routing.weights.speed 0.3
ollama-code config set ai.routing.weights.quality 0.4
# Enable response fusion
ollama-code config set ai.fusion.enabled true
ollama-code config set ai.fusion.strategy "consensus"

# Fine-tune local models
ollama-code fine-tune train --dataset training_data.jsonl
# Deploy custom models
ollama-code deploy-model custom-model --load-balancer round-robin
# Response fusion for critical tasks
ollama-code fusion generate "complex prompt" --providers "ollama,openai,anthropic"
# Provider benchmarking
ollama-code benchmark-providers --task "code-generation" --iterations 10

llama.cpp provides an alternative to Ollama for running local AI models. It allows you to run GGUF models directly via llama-server without needing Ollama installed.
- Direct GGUF Support - Run any GGUF model without conversion
- Lower Memory Overhead - More efficient than Ollama for single-model use
- Fine-Grained Control - Direct control over GPU layers, context size, and more
- No Additional Services - Just the model file and llama-server
# macOS (Homebrew)
brew install llama.cpp
# Build from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
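# Recent llama.cpp versions build with CMake (older checkouts used `make -j`)
cmake -B build
cmake --build build --config Release -j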
# The server binary is at ./llama-server (or build/bin/llama-server)

# Example: Qwen 2.5 Coder (recommended for coding tasks)
# Download from Hugging Face: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
# Or use any GGUF model compatible with llama.cpp

# Required: Set the provider and model path
export AI_PROVIDER=llamacpp
export LLAMACPP_MODEL_PATH=~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf
# Optional: Custom server URL (default: http://localhost:8080)
export LLAMACPP_API_URL=http://localhost:8080
# Optional: Performance tuning
export LLAMACPP_GPU_LAYERS=-1 # -1 = auto (use all GPU layers)
export LLAMACPP_CONTEXT_SIZE=8192 # Context window size
export LLAMACPP_FLASH_ATTENTION=true # Enable flash attention
export LLAMACPP_THREADS=8 # CPU threads for inference
# Then run ollama-code
ollama-code --interactive

Add to your .ollama-code.json or ~/.config/ollama-code/config.json:

{
  "provider": "llamacpp",
  "llamacpp": {
    "enabled": true,
    "baseUrl": "http://localhost:8080",
    "modelPath": "~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf",
    "contextSize": 8192,
    "gpuLayers": -1,
    "flashAttention": true
  }
}

# Start llama-server manually with your preferred settings
llama-server \
-m ~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf \
--port 8080 \
-c 8192 \
-ngl -1 \
--flash-attn
# Then configure ollama-code to use it
export AI_PROVIDER=llamacpp
export LLAMACPP_API_URL=http://localhost:8080
ollama-code --interactive

# Check llama-server status
ollama-code llamacpp-status
# Show current provider configuration
ollama-code show-provider
# Switch between providers
ollama-code set-provider llamacpp
ollama-code set-provider ollama
# Get help loading a model
ollama-code llamacpp-load ~/models/model.gguf

| Variable | Description | Default |
|---|---|---|
| AI_PROVIDER | Active provider (ollama, llamacpp, openai, etc.) | ollama |
| LLAMACPP_API_URL | llama-server URL | http://localhost:8080 |
| LLAMACPP_MODEL_PATH | Path to GGUF model file | None |
| LLAMACPP_EXECUTABLE | Path to llama-server binary | Auto-detect |
| LLAMACPP_GPU_LAYERS | GPU layers to offload (-1 = all) | -1 |
| LLAMACPP_CONTEXT_SIZE | Context window size | 4096 |
| LLAMACPP_FLASH_ATTENTION | Enable flash attention | false |
| LLAMACPP_THREADS | CPU threads for inference | Auto |
| LLAMACPP_PARALLEL | Parallel sequences | 1 |
| LLAMACPP_ENABLED | Enable llama.cpp provider | false |
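
The defaults in this table can be read as a small resolution step: take each variable from the environment when it is set, otherwise fall back to the listed default. A minimal sketch under that assumption follows; `resolveLlamaCppConfig`, `LlamaCppConfig`, and `intOr` are hypothetical names, and the real loader may validate values or merge a config file differently.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

interface LlamaCppConfig {
  apiUrl: string;
  modelPath: string | undefined; // table default: None
  executable: string | undefined; // undefined = auto-detect
  gpuLayers: number;
  contextSize: number;
  flashAttention: boolean;
  threads: number | undefined; // undefined = auto
  parallel: number;
  enabled: boolean;
}

// Parse a base-10 integer, falling back when the value is missing or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function resolveLlamaCppConfig(env: Env): LlamaCppConfig {
  const threads = Number.parseInt(env["LLAMACPP_THREADS"] ?? "", 10);
  return {
    apiUrl: env["LLAMACPP_API_URL"] ?? "http://localhost:8080",
    modelPath: env["LLAMACPP_MODEL_PATH"],
    executable: env["LLAMACPP_EXECUTABLE"],
    gpuLayers: intOr(env["LLAMACPP_GPU_LAYERS"], -1),
    contextSize: intOr(env["LLAMACPP_CONTEXT_SIZE"], 4096),
    flashAttention: env["LLAMACPP_FLASH_ATTENTION"] === "true",
    threads: Number.isNaN(threads) ? undefined : threads,
    parallel: intOr(env["LLAMACPP_PARALLEL"], 1),
    enabled: env["LLAMACPP_ENABLED"] === "true",
  };
}

// Usage: only the overridden values differ from the table defaults.
const resolved = resolveLlamaCppConfig({
  LLAMACPP_CONTEXT_SIZE: "8192",
  LLAMACPP_FLASH_ATTENTION: "true",
});
console.log(resolved.contextSize, resolved.gpuLayers, resolved.flashAttention);
```

In a Node.js process the same function could be called with `process.env`.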
- Qwen 2.5 Coder (7B/14B) - Excellent for code generation and analysis
- DeepSeek Coder (6.7B/33B) - Strong coding performance
- CodeLlama (7B/13B/34B) - Meta's code-specialized model
- StarCoder2 (3B/7B/15B) - Good balance of size and capability
# Check if llama-server is installed
which llama-server
# Start manually to see errors
llama-server -m /path/to/model.gguf --port 8080 -v

# Reduce GPU layers (use more CPU)
export LLAMACPP_GPU_LAYERS=20 # Only offload 20 layers to GPU
# Use a smaller quantization (q4_k_m instead of q8_0)

# Enable GPU acceleration
export LLAMACPP_GPU_LAYERS=-1
# Enable flash attention
export LLAMACPP_FLASH_ATTENTION=true
# Increase threads for CPU inference
export LLAMACPP_THREADS=8

# Install AI-powered Git hooks
ollama-code setup-hooks --install-all
# Configure quality thresholds
ollama-code config set vcs.hooks.qualityThreshold 0.8
ollama-code config set vcs.hooks.enableCommitValidation true
# Test hooks
git commit -m "test commit" # Triggers AI validation

# Generate GitHub Actions workflow
ollama-code generate-pipeline github \
--enable-quality-gates \
--enable-security-analysis \
--enable-performance-analysis
# Generate GitLab CI configuration
ollama-code generate-pipeline gitlab --enable-regression-analysis
# Universal CI API for multi-platform support
ollama-code config set cicd.platform "github"
ollama-code config set cicd.qualityGates.minScore 85

# Initialize quality tracking
ollama-code init-quality-tracking --baseline
# Run comprehensive analysis
ollama-code analyze-quality --full-report
# Generate quality dashboard
ollama-code quality-dashboard --format html
# Regression analysis
ollama-code analyze-regression --compare-branch feature/new-feature

# Install from VS Code Marketplace
# Search for "Ollama Code" in Extensions
# Or install from VSIX
code --install-extension ollama-code.vsix
# Development installation
cd extensions/vscode && yarn install && yarn build

- Inline Code Completion with AI suggestions
- Code Actions for AI-powered quick fixes
- Hover Information with intelligent context
- Real-time Diagnostics and error detection
- Workspace Analysis with live updates
# Start WebSocket server for real-time integration
ollama-code-advanced --enable-websocket --port 3002
# Configure in VS Code settings.json
{
  "ollamaCode.serverPort": 3002,
  "ollamaCode.enableInlineCompletion": true,
  "ollamaCode.realTimeAnalysis": true
}

# Code assistance
ollama-code ask "How to implement OAuth2?"
ollama-code explain src/auth.ts
ollama-code fix src/buggy-file.js
ollama-code refactor src/legacy-code.js
# Code generation
ollama-code generate class UserAuth --language typescript
ollama-code generate tests src/utils.js
ollama-code generate docs src/api/
# Analysis and review
ollama-code analyze-architecture --format detailed
ollama-code review-code --provider anthropic src/
ollama-code security-audit src/ --comprehensive

# List and manage models
ollama-code list-models
ollama-code pull-model qwen2.5-coder:latest
ollama-code set-model qwen2.5-coder:latest
# Model performance testing
ollama-code test-model qwen2.5-coder:latest --benchmark
ollama-code compare-models --models "codellama:7b,qwen2.5-coder:latest"

# Project analysis
ollama-code analyze-project --depth comprehensive
ollama-code workspace-insights --format json
# File operations
ollama-code search "authentication" --type function
ollama-code edit src/config.ts --ai-assisted
ollama-code optimize-imports src/ --language typescript

{
  "ai": {
    "defaultProvider": "ollama",
    "defaultModel": "qwen2.5-coder:latest",
    "defaultTemperature": 0.7,
    "providers": {
      "ollama": {
        "enabled": true,
        "baseUrl": "http://localhost:11434"
      },
      "openai": {
        "enabled": true,
        "apiKey": "${OPENAI_API_KEY}",
        "models": ["gpt-4", "gpt-3.5-turbo"]
      }
    },
    "routing": {
      "strategy": "intelligent",
      "weights": {
        "cost": 0.3,
        "speed": 0.3,
        "quality": 0.4
      }
    }
  },
  "vcs": {
    "hooks": {
      "enableCommitValidation": true,
      "qualityThreshold": 0.8
    },
    "cicd": {
      "platform": "github",
      "enableQualityGates": true
    }
  }
}

# View configuration
ollama-code config view
ollama-code config view --section ai.providers
# Set configuration
ollama-code config set ai.defaultProvider "openai"
ollama-code config set vcs.hooks.enableCommitValidation true
# Reset configuration
ollama-code config reset
ollama-code config reset --section ai.providers.openai

# Clone and install
git clone https://github.com/erichchampion/ollama-code.git
cd ollama-code
yarn install
# Build and test
yarn build
yarn test:all
yarn docs:generate

# Core development
yarn dev # Development mode with ts-node
yarn build # Compile TypeScript
yarn test # Unit tests
yarn lint # ESLint
yarn clean # Remove build artifacts
# Testing
yarn test # Main test suite (fast, stable tests only)
yarn test:ci # CI-friendly test suite (excludes performance tests)
yarn test:unit # Unit tests only
yarn test:integration # All integration tests
yarn test:integration:other # Non-performance integration tests
# Performance testing (resource-intensive, run separately)
yarn test:performance # All performance tests (unit + integration)
yarn test:performance:unit # Performance-sensitive unit tests
yarn test:integration:performance # Performance integration tests
yarn test:integration:optimization-migration # Optimization migration tests
# Other test suites
yarn test:e2e # End-to-end tests with Playwright
yarn test:docs # Documentation tests
yarn test:security # Security tests
yarn test:all # All tests in recommended order (CI + performance + e2e)
yarn test:all:full # All tests in parallel (may have flaky failures)
# Documentation
yarn docs:generate # TypeDoc API documentation
yarn docs:watch # Watch and regenerate
yarn docs:validate # Validate links and examples
yarn docs:check-all # Complete validation

src/
├── ai/ # Multi-provider AI system
│ ├── providers/ # AI provider implementations
│ ├── vcs/ # VCS intelligence features
│ └── performance/ # Performance optimization
├── commands/ # CLI command system
├── config/ # Configuration management
├── terminal/ # Terminal interface
├── utils/ # Shared utilities
├── cli-selector.ts # Interactive mode selector
├── simple-cli.ts # Simple CLI mode
└── cli.ts # Advanced CLI mode
extensions/
└── vscode/ # VS Code extension
    ├── src/providers/ # Language providers
    ├── src/services/ # Extension services
    └── src/client/ # WebSocket client
docs/
├── API_REFERENCE.md # Complete API documentation
├── CONFIGURATION.md # Configuration guide
├── ARCHITECTURE.md # System architecture
└── OLLAMA.md # Setup and integration guide
- API Reference - All 50+ commands with examples
- Configuration Guide - Complete configuration options
- Architecture - System architecture and design
- Setup Guide - Installation and integration
- TypeDoc API Docs - Generated from TypeScript source
- GitHub Actions - Automated documentation updates
- Link Validation - Automated link checking and validation
# Generate all documentation
yarn docs:generate-all
# TypeDoc API documentation
yarn docs:generate
# Validate documentation quality
yarn docs:check-all
# Watch for changes
yarn docs:watch

- Streaming Initialization: Essential components load first, advanced features in background
- Lazy Component Loading: Components load only when needed for specific requests
- Progressive Enhancement: Immediate functionality with continuous capability expansion
- Smart Dependency Management: Eliminates circular dependencies and recursive loading
- Terminal Compatibility: Optimized for CI/CD, TTY, and non-interactive environments
- 80% Faster Startup: Interactive mode now starts in 1-3s (previously 8-15s)
- 95%+ Success Rate: Robust initialization with fallback protection
- Memory Efficient: Only loads required components, reducing memory usage by 60%
- Background Loading: Heavy components load while you work on immediate tasks
- Distributed Processing for large codebases (10,000+ files)
- Predictive AI Caching with multi-tier strategy
- Incremental Analysis with file watching
- Memory Optimization with automatic cleanup
- Component Status Monitoring: Real-time health checks and performance metrics
- Interactive Startup: 1-3s (optimized) vs 8-15s (legacy)
- Advanced Mode: < 2s for simple commands, < 5s for complex operations
- Command Response: < 100ms for basic commands
- AI Processing: Variable (2-30s) based on model and complexity
- Large Codebase: Handles 10,000+ files efficiently
- Component Loading: Essential components ready in < 1s, full system in < 5s
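
Distributed processing of a large codebase can be thought of as splitting the file list into one batch per worker, bounded by a worker limit such as the `performance.distributed.maxWorkers` setting this README configures. The sketch below shows only that partitioning step and is an assumption for illustration; `partitionForWorkers` is a hypothetical helper, not part of ollama-code.

```typescript
// Split items round-robin into at most `workers` batches.
// Fewer batches are returned when there are fewer items than workers.
function partitionForWorkers<T>(items: readonly T[], workers: number): T[][] {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError("workers must be a positive integer");
  }
  const count = Math.min(workers, items.length);
  const batches: T[][] = [];
  for (let w = 0; w < count; w++) {
    // Batch w receives items w, w + count, w + 2 * count, ...
    batches.push(items.filter((_, i) => i % count === w));
  }
  return batches;
}

// Usage: ten files over eight workers gives eight batches of one or two files.
const files = Array.from({ length: 10 }, (_, i) => `src/file${i}.ts`);
const batches = partitionForWorkers(files, 8);
console.log(batches.length, batches[0]);
```

Round-robin assignment keeps batch sizes within one item of each other; a real scheduler might instead balance by file size or analysis cost.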
# Configure for large repositories
ollama-code config set performance.largeCodebase.enabled true
ollama-code config set performance.distributed.maxWorkers 8
# Enable predictive caching
ollama-code config set performance.predictiveCache.enabled true
# Monitor performance (interactive mode)
/status # Component status and health
/performance # Performance metrics
/metrics # Detailed system metrics
# Performance monitoring (CLI)
ollama-code performance-dashboard --port 8080
ollama-code monitor-resources --interval 60
# Optimization controls
DEBUG=enhanced-fast-path-router ollama-code --interactive # Debug mode
OLLAMA_SKIP_ENHANCED_INIT=true ollama-code --interactive # Legacy mode
ollama-code --interactive --silent # Silent mode

# In interactive mode, use these commands:
/status # Show component loading status
/status --detailed # Detailed component information
/status --json # JSON format for automation
/performance # Performance metrics and recommendations
/metrics --export # Export metrics for analysis

- Local Processing - All AI processing via local Ollama
- No Data Transmission - Code never leaves your machine unless you enable a cloud provider
- Optional Cloud Providers - User-controlled API integration
- Audit Logging - Comprehensive activity tracking
- Input Validation - Zod schema validation for all inputs
- Path Traversal Protection - Secure file access controls
- Command Sanitization - Safe command execution
- API Key Management - Secure credential storage
- Type Safety - TypeScript strict mode throughout
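
Path traversal protection usually comes down to resolving a requested path against the project root and rejecting anything that ends up outside it. The sketch below shows that check for POSIX-style paths; it is an assumption for illustration, and `normalizeSegments` and `isInsideRoot` are hypothetical helpers, not ollama-code's actual implementation.

```typescript
// Resolve "." and ".." segments of a POSIX-style path without touching the filesystem.
function normalizeSegments(path: string): string[] {
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      out.pop(); // step up one level; popping an empty list is a no-op
      continue;
    }
    out.push(segment);
  }
  return out;
}

// True when `requested` resolves to the root itself or a location beneath it.
function isInsideRoot(root: string, requested: string): boolean {
  const rootSegments = normalizeSegments(root);
  // Relative requests are resolved against the root; absolute ones stand alone.
  const target = requested.startsWith("/")
    ? normalizeSegments(requested)
    : normalizeSegments(`${root}/${requested}`);
  // Compare whole segments so "/project-secrets" is not mistaken for "/project".
  return rootSegments.every((segment, i) => target[i] === segment);
}

const projectRoot = "/home/dev/project";
console.log(isInsideRoot(projectRoot, "src/auth.ts")); // true
console.log(isInsideRoot(projectRoot, "../../etc/passwd")); // false
console.log(isInsideRoot(projectRoot, "src/../../project-secrets/key")); // false
```

This string-level check does not follow symlinks. In Node.js, a production version would more likely build on `path.resolve` together with `fs.realpath` so that links pointing outside the root are caught as well.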
# Configure audit logging
ollama-code config set security.audit.enabled true
ollama-code config set security.audit.retentionDays 90
# Access control
ollama-code config set security.access.requireAuth true
# Compliance features
ollama-code config set compliance.gdpr.enabled true

We welcome contributions! Please follow the contribution steps below:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Update documentation
- Submit a pull request
- Follow TypeScript best practices
- Add comprehensive tests
- Update documentation for API changes
- Ensure all CI checks pass
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Privacy-Focused - Your code stays local
- Multi-Provider Flexibility - Choose the best AI for each task
- IDE Integration - Seamless VS Code experience
- Git Intelligence - AI-powered version control
- Code Quality Automation - Consistent quality enforcement
- CI/CD Integration - Automated pipeline generation
- Collaboration Tools - Shared configurations and workflows
- Performance Analytics - Team productivity insights
- Scalable Architecture - Handles large codebases efficiently
- Security & Compliance - Enterprise-grade security features
- Cost Management - AI usage tracking and optimization
- Custom Deployments - Fine-tuned models for your domain
Get Started Today - Transform your development workflow with intelligent AI assistance.
npm install -g ollama-code
ollama-code --interactive

Built with ❤️ using TypeScript, Node.js, and the power of local AI.