Ollama Code CLI

Your Advanced AI Coding Assistant - Multi-provider AI integration, VCS intelligence, IDE integration, and enterprise features for modern development workflows.

For a detailed overview of the project, see Building AI Coding Assistants (ISBN 979-8-9937022-0-9).

🚀 Features Overview

🤖 Multi-Provider AI Integration

Intelligent Routing across Ollama, OpenAI, Anthropic, and Google AI
Response Fusion with conflict resolution and consensus building
Local Fine-Tuning and custom model deployment
Cost Optimization with usage tracking and budget management

🔧 VCS Intelligence

Git Hooks Management with AI-powered validation
CI/CD Pipeline Integration (GitHub, GitLab, Azure, CircleCI)
Code Quality Tracking with regression analysis
Automated Pull Request Review and commit message generation

💻 IDE Integration

VS Code Extension with real-time AI assistance
WebSocket Communication for live workspace analysis
8+ AI Providers integrated seamlessly
Context-Aware Suggestions with intelligent code completion

🏢 Enterprise Features

Distributed Processing for large codebases
Advanced Caching with predictive optimization
Security & Compliance with audit logging
Performance Monitoring and analytics dashboard

⚡ Quick Start

Basic Installation

# Install globally
npm install -g ollama-code

# Quick test
ollama-code ask "Explain async/await in TypeScript"

# Interactive setup
ollama-code --interactive

Multi-Provider Setup

# Configure multiple AI providers
ollama-code config set ai.providers.openai.apiKey "${OPENAI_API_KEY}"
ollama-code config set ai.providers.anthropic.apiKey "${ANTHROPIC_API_KEY}"

# Test intelligent routing
ollama-code fusion generate "Create a React authentication component"

VCS Intelligence Setup

# Install Git hooks for AI validation
ollama-code setup-hooks --install-all

# Generate CI/CD pipeline
ollama-code generate-pipeline github --enable-quality-gates

📦 Installation

Prerequisites

Node.js ≥18.0.0
Git (for VCS features)
Ollama or llama.cpp (local AI models - see llama.cpp Setup)
VS Code (for IDE integration)

Installation Methods

1. Global Installation (Recommended)

npm install -g ollama-code

2. Local Project Installation

npm install ollama-code

3. Development Installation

git clone https://github.com/erichchampion/ollama-code.git
cd ollama-code
yarn install && yarn build

Verify Installation

ollama-code --version              # Interactive selector
ollama-code-simple --version       # Simple CLI mode
ollama-code-advanced --version     # Advanced CLI mode
ollama-code-interactive --version  # Interactive mode

🎯 CLI Modes

Interactive Mode Selector (Default)

ollama-code                    # Launches guided mode selection
DEBUG=enhanced-fast-path-router ollama-code --interactive

🚀 Optimized Initialization - The interactive mode now features:

Streaming Startup: Essential components load first, advanced features load in background
Smart Component Loading: Only loads components needed for your specific requests
80% Faster Startup: Reduced initialization time from 8-15s to 1-3s
Progressive Enhancement: Immediate basic functionality with continuous capability expansion
Fallback Protection: Graceful degradation when components fail to load
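
The loading behaviour listed above (load on first use, degrade gracefully on failure) can be sketched as follows. This is a minimal illustration, not the project's initialization code: `LazyRegistry`, the component names, and the fallback value are hypothetical.

```typescript
// Hypothetical sketch: none of these names come from the ollama-code source.
type Factory<T> = () => T;

class LazyRegistry<T> {
  private readonly factories = new Map<string, Factory<T>>();
  private readonly instances = new Map<string, T>();
  private readonly fallback: Factory<T>;

  constructor(fallback: Factory<T>) {
    this.fallback = fallback;
  }

  register(id: string, factory: Factory<T>): void {
    this.factories.set(id, factory);
  }

  // Build a component on first use; later calls reuse the cached instance.
  get(id: string): T {
    if (this.instances.has(id)) return this.instances.get(id) as T;

    let instance: T;
    try {
      const factory = this.factories.get(id);
      if (!factory) throw new Error(`unknown component: ${id}`);
      instance = factory();
    } catch {
      // Graceful degradation: use the fallback instead of aborting startup.
      instance = this.fallback();
    }
    this.instances.set(id, instance);
    return instance;
  }

  isLoaded(id: string): boolean {
    return this.instances.has(id);
  }
}

const registry = new LazyRegistry<string>(() => "basic-mode");
registry.register("router", () => "enhanced-router");
registry.register("indexer", () => {
  throw new Error("failed to load");
});

console.log(registry.isLoaded("router")); // false: nothing is built until requested
console.log(registry.get("router"));      // "enhanced-router"
console.log(registry.get("indexer"));     // "basic-mode" (fallback)
```

Caching the fallback result means a failing component is not retried on every request; a real loader might prefer to retry after a delay.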

Interactive Mode Features

Real-time Status: See component loading progress with /status command
Performance Monitoring: Track system performance and optimization metrics
Terminal Compatibility: Works in CI/CD, TTY, and non-interactive environments
Background Loading: Heavy components load while you work

Environment Variables

# Force legacy mode for testing/compatibility
OLLAMA_SKIP_ENHANCED_INIT=true ollama-code --interactive

# Enable debug logging for optimization
DEBUG=enhanced-fast-path-router ollama-code --interactive

# Silent mode for CI/CD environments
ollama-code --interactive --silent

# Configure logging level (default: ERROR for quiet operation)
LOG_LEVEL=0 ollama-code    # DEBUG - Most verbose, shows all logs
LOG_LEVEL=1 ollama-code    # INFO - Informational messages
LOG_LEVEL=2 ollama-code    # WARN - Warning messages only
LOG_LEVEL=3 ollama-code    # ERROR - Error messages only (default)
LOG_LEVEL=4 ollama-code    # SILENT - No logs
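
As a rough sketch of how the numeric levels above could be interpreted: the level numbers and the ERROR default come from the list above, while `parseLogLevel`, `shouldLog`, and the choice to fall back to the default on invalid input are assumptions made for illustration.

```typescript
// Level numbers follow the LOG_LEVEL list above.
const LOG_LEVELS = { DEBUG: 0, INFO: 1, WARN: 2, ERROR: 3, SILENT: 4 } as const;

const DEFAULT_LOG_LEVEL: number = LOG_LEVELS.ERROR; // documented default

// Hypothetical helper: parse a raw LOG_LEVEL value.
// Assumption: missing or invalid values fall back to the default.
function parseLogLevel(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_LOG_LEVEL;
  const n = Number(raw);
  const valid = Number.isInteger(n) && n >= LOG_LEVELS.DEBUG && n <= LOG_LEVELS.SILENT;
  return valid ? n : DEFAULT_LOG_LEVEL;
}

// A message is emitted when its level is at or above the configured threshold,
// and never when the threshold is SILENT.
function shouldLog(messageLevel: number, configured: number): boolean {
  return configured !== LOG_LEVELS.SILENT && messageLevel >= configured;
}

const configuredLevel = parseLogLevel("2"); // WARN
console.log(shouldLog(LOG_LEVELS.ERROR, configuredLevel)); // true
console.log(shouldLog(LOG_LEVELS.INFO, configuredLevel));  // false
```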

Simple CLI Mode

ollama-code-simple ask "question"
ollama-code-simple list-models
ollama-code-simple --help

Advanced CLI Mode

ollama-code-advanced fusion generate "prompt"
ollama-code-advanced setup-hooks --install-all
ollama-code-advanced fine-tune train --base-model qwen2.5-coder:latest

🚀 Optimized Advanced Mode - Now includes:

Selective Loading: Only initializes components required by the specific command
Background Preloading: Common components preload while executing commands
Timeout Protection: All component initialization has timeout safeguards
Legacy Fallback: Automatic fallback to legacy initialization if needed

🤖 Multi-Provider AI Setup

Supported Providers

Ollama - Local models with fine-tuning (default)
llama.cpp - Direct GGUF model inference via llama-server
OpenAI - GPT models with cost optimization
Anthropic - Claude models with enterprise features
Google AI - Gemini with multimodal capabilities

Configuration Example

# Configure all providers
ollama-code config set ai.providers.ollama.enabled true
ollama-code config set ai.providers.openai.enabled true
ollama-code config set ai.providers.anthropic.enabled true
ollama-code config set ai.providers.google.enabled true

# Set intelligent routing
ollama-code config set ai.routing.strategy "intelligent"
ollama-code config set ai.routing.weights.cost 0.3
ollama-code config set ai.routing.weights.speed 0.3
ollama-code config set ai.routing.weights.quality 0.4

# Enable response fusion
ollama-code config set ai.fusion.enabled true
ollama-code config set ai.fusion.strategy "consensus"
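
To make the routing weights concrete, here is a minimal sketch of weighted provider selection. It is not the project's router: `ProviderScores`, `pickProvider`, and the per-provider numbers are hypothetical, and every score is assumed to be normalised to [0, 1] with higher meaning better.

```typescript
// Hypothetical sketch of weighted provider selection; not the project's router.
interface RoutingWeights {
  cost: number;
  speed: number;
  quality: number;
}

// Assumption: each score is normalised to [0, 1] and higher is better,
// so a cheap provider has a HIGH cost score.
interface ProviderScores {
  provider: string;
  cost: number;
  speed: number;
  quality: number;
}

function weightedScore(p: ProviderScores, w: RoutingWeights): number {
  return p.cost * w.cost + p.speed * w.speed + p.quality * w.quality;
}

// Return the candidate with the highest weighted score (first one wins ties).
function pickProvider(candidates: ProviderScores[], w: RoutingWeights): ProviderScores {
  if (candidates.length === 0) throw new Error("no providers enabled");
  return candidates.reduce((best, c) => (weightedScore(c, w) > weightedScore(best, w) ? c : best));
}

// Weights from the configuration example above.
const routingWeights: RoutingWeights = { cost: 0.3, speed: 0.3, quality: 0.4 };

// Illustrative numbers only, not measurements.
const routingCandidates: ProviderScores[] = [
  { provider: "ollama", cost: 1.0, speed: 0.5, quality: 0.6 }, // score 0.69
  { provider: "openai", cost: 0.4, speed: 0.8, quality: 0.9 }, // score 0.72
];

console.log(pickProvider(routingCandidates, routingWeights).provider); // "openai"
```

With these made-up scores, raising the cost weight would tip the choice back to the local provider, which is the kind of trade-off the three weights are meant to express.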

Advanced Features

# Fine-tune local models
ollama-code fine-tune train --dataset training_data.jsonl

# Deploy custom models
ollama-code deploy-model custom-model --load-balancer round-robin

# Response fusion for critical tasks
ollama-code fusion generate "complex prompt" --providers "ollama,openai,anthropic"

# Provider benchmarking
ollama-code benchmark-providers --task "code-generation" --iterations 10
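
One simple way to picture a "consensus" fusion strategy is majority voting across provider answers. The sketch below is an assumption about what consensus could mean, not the project's fusion algorithm; `ProviderAnswer` and `fuseByConsensus` are hypothetical names. Exact-match voting is a deliberate simplification, since free-form model output would need semantic comparison.

```typescript
// Hypothetical sketch of consensus by majority vote; not the project's fusion code.
interface ProviderAnswer {
  provider: string;
  text: string;
}

// Normalise whitespace and case so trivially different answers compare equal.
function normaliseAnswer(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Return the answer shared by the most providers plus the agreement ratio.
// Ties go to the answer that appeared first.
function fuseByConsensus(answers: ProviderAnswer[]): { text: string; agreement: number } {
  if (answers.length === 0) throw new Error("no answers to fuse");

  const tallies: { key: string; text: string; count: number }[] = [];
  for (const a of answers) {
    const key = normaliseAnswer(a.text);
    const existing = tallies.find((t) => t.key === key);
    if (existing) existing.count += 1;
    else tallies.push({ key, text: a.text, count: 1 });
  }

  const best = tallies.reduce((b, t) => (t.count > b.count ? t : b));
  return { text: best.text, agreement: best.count / answers.length };
}

const fused = fuseByConsensus([
  { provider: "ollama", text: "Use bcrypt" },
  { provider: "openai", text: "use  bcrypt " },
  { provider: "anthropic", text: "Use MD5" },
]);
console.log(fused.text); // "Use bcrypt" (two of three providers agree)
```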

🦙 llama.cpp Setup

llama.cpp provides an alternative to Ollama for running local AI models. It allows you to run GGUF models directly via llama-server without needing Ollama installed.

Why Use llama.cpp?

Direct GGUF Support - Run any GGUF model without conversion
Lower Memory Overhead - More efficient than Ollama for single-model use
Fine-Grained Control - Direct control over GPU layers, context size, and more
No Additional Services - Just the model file and llama-server

Installation

1. Install llama.cpp

# macOS (Homebrew)
brew install llama.cpp

# Build from source (current llama.cpp versions build with CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# The server binary is at build/bin/llama-server

2. Download a GGUF Model

# Example: Qwen 2.5 Coder (recommended for coding tasks)
# Download from Hugging Face: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF

# Or use any GGUF model compatible with llama.cpp

Configuration

Option 1: Environment Variables

# Required: Set the provider and model path
export AI_PROVIDER=llamacpp
export LLAMACPP_MODEL_PATH=~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf

# Optional: Custom server URL (default: http://localhost:8080)
export LLAMACPP_API_URL=http://localhost:8080

# Optional: Performance tuning
export LLAMACPP_GPU_LAYERS=-1        # -1 = auto (use all GPU layers)
export LLAMACPP_CONTEXT_SIZE=8192    # Context window size
export LLAMACPP_FLASH_ATTENTION=true # Enable flash attention
export LLAMACPP_THREADS=8            # CPU threads for inference

# Then run ollama-code
ollama-code --interactive

Option 2: Configuration File

Add to your .ollama-code.json or ~/.config/ollama-code/config.json:

{
  "provider": "llamacpp",
  "llamacpp": {
    "enabled": true,
    "baseUrl": "http://localhost:8080",
    "modelPath": "~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf",
    "contextSize": 8192,
    "gpuLayers": -1,
    "flashAttention": true
  }
}

Option 3: Manual Server Start

# Start llama-server manually with your preferred settings
llama-server \
  -m ~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf \
  --port 8080 \
  -c 8192 \
  -ngl -1 \
  --flash-attn

# Then configure ollama-code to use it
export AI_PROVIDER=llamacpp
export LLAMACPP_API_URL=http://localhost:8080
ollama-code --interactive
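
If you start the server from a script instead of by hand, the flags above can be assembled from a settings object. This is a hedged sketch: the field names mirror the configuration file example, `port` and `buildLlamaServerArgs` are hypothetical additions, and only flags already shown in the manual command are used.

```typescript
// Hypothetical helper; field names mirror the configuration file example above.
interface LlamaServerOptions {
  modelPath: string;
  port: number; // assumption: not a field in the documented config file
  contextSize: number;
  gpuLayers: number;
  flashAttention: boolean;
}

// Build the argument list for the manual llama-server invocation shown above.
function buildLlamaServerArgs(o: LlamaServerOptions): string[] {
  const args = [
    "-m", o.modelPath,
    "--port", String(o.port),
    "-c", String(o.contextSize),
    "-ngl", String(o.gpuLayers),
  ];
  if (o.flashAttention) args.push("--flash-attn");
  return args;
}

const serverArgs = buildLlamaServerArgs({
  modelPath: "/models/model.gguf",
  port: 8080,
  contextSize: 8192,
  gpuLayers: -1,
  flashAttention: true,
});
console.log(serverArgs.join(" "));
// -m /models/model.gguf --port 8080 -c 8192 -ngl -1 --flash-attn
```

In Node.js the resulting array could be passed to `child_process.spawn("llama-server", serverArgs)`. Passing arguments as an array avoids shell quoting problems with model paths that contain spaces.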

Provider Commands

# Check llama-server status
ollama-code llamacpp-status

# Show current provider configuration
ollama-code show-provider

# Switch between providers
ollama-code set-provider llamacpp
ollama-code set-provider ollama

# Get help loading a model
ollama-code llamacpp-load ~/models/model.gguf

Environment Variables Reference

| Variable | Description | Default |
| --- | --- | --- |
| `AI_PROVIDER` | Active provider (`ollama`, `llamacpp`, `openai`, etc.) | `ollama` |
| `LLAMACPP_API_URL` | llama-server URL | `http://localhost:8080` |
| `LLAMACPP_MODEL_PATH` | Path to GGUF model file | None |
| `LLAMACPP_EXECUTABLE` | Path to llama-server binary | Auto-detect |
| `LLAMACPP_GPU_LAYERS` | GPU layers to offload (-1 = all) | `-1` |
| `LLAMACPP_CONTEXT_SIZE` | Context window size | `4096` |
| `LLAMACPP_FLASH_ATTENTION` | Enable flash attention | `false` |
| `LLAMACPP_THREADS` | CPU threads for inference | Auto |
| `LLAMACPP_PARALLEL` | Parallel sequences | `1` |
| `LLAMACPP_ENABLED` | Enable llama.cpp provider | `false` |
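
The defaults in this reference can be read as a small loader that fills in whatever the environment leaves unset. The sketch below is illustrative only: the variable names and defaults come from the reference above, while `loadLlamaCppConfig`, the result shape, and the parsing rules (for example, only the string `true` enables a boolean) are assumptions.

```typescript
// Hypothetical loader; variable names and defaults follow the reference above.
type Env = Record<string, string | undefined>;

interface LlamaCppEnvConfig {
  apiUrl: string;
  modelPath: string | undefined;  // no default
  executable: string | undefined; // undefined = auto-detect
  gpuLayers: number;
  contextSize: number;
  flashAttention: boolean;
  threads: number | undefined;    // undefined = auto
  parallel: number;
  enabled: boolean;
}

function intFrom(raw: string | undefined, fallback: number): number {
  const n = raw === undefined ? NaN : parseInt(raw, 10);
  return Number.isNaN(n) ? fallback : n;
}

// Assumption: only the literal string "true" (any case) counts as enabled.
function boolFrom(raw: string | undefined, fallback: boolean): boolean {
  return raw === undefined ? fallback : raw.toLowerCase() === "true";
}

function loadLlamaCppConfig(env: Env): LlamaCppEnvConfig {
  const threads = env["LLAMACPP_THREADS"];
  return {
    apiUrl: env["LLAMACPP_API_URL"] ?? "http://localhost:8080",
    modelPath: env["LLAMACPP_MODEL_PATH"],
    executable: env["LLAMACPP_EXECUTABLE"],
    gpuLayers: intFrom(env["LLAMACPP_GPU_LAYERS"], -1),
    contextSize: intFrom(env["LLAMACPP_CONTEXT_SIZE"], 4096),
    flashAttention: boolFrom(env["LLAMACPP_FLASH_ATTENTION"], false),
    threads: threads === undefined ? undefined : intFrom(threads, 1),
    parallel: intFrom(env["LLAMACPP_PARALLEL"], 1),
    enabled: boolFrom(env["LLAMACPP_ENABLED"], false),
  };
}

const llamaEnvConfig = loadLlamaCppConfig({
  LLAMACPP_CONTEXT_SIZE: "8192",
  LLAMACPP_FLASH_ATTENTION: "true",
});
console.log(llamaEnvConfig.contextSize); // 8192
console.log(llamaEnvConfig.gpuLayers);   // -1 (default)
```

In Node.js, `process.env` can be passed directly as the `env` argument.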

Recommended Models for Coding

Qwen 2.5 Coder (7B/14B) - Excellent for code generation and analysis
DeepSeek Coder (6.7B/33B) - Strong coding performance
CodeLlama (7B/13B/34B) - Meta's code-specialized model
StarCoder2 (3B/7B/15B) - Good balance of size and capability

Troubleshooting

Server Not Starting

# Check if llama-server is installed
which llama-server

# Start manually to see errors
llama-server -m /path/to/model.gguf --port 8080 -v

Out of Memory

# Reduce GPU layers (use more CPU)
export LLAMACPP_GPU_LAYERS=20  # Only offload 20 layers to GPU

# Use a smaller quantization (q4_k_m instead of q8_0)

Slow Performance

# Enable GPU acceleration
export LLAMACPP_GPU_LAYERS=-1

# Enable flash attention
export LLAMACPP_FLASH_ATTENTION=true

# Increase threads for CPU inference
export LLAMACPP_THREADS=8

🔧 VCS Intelligence

Git Hooks Management

# Install AI-powered Git hooks
ollama-code setup-hooks --install-all

# Configure quality thresholds
ollama-code config set vcs.hooks.qualityThreshold 0.8
ollama-code config set vcs.hooks.enableCommitValidation true

# Test hooks
git commit -m "test commit"  # Triggers AI validation

CI/CD Pipeline Integration

# Generate GitHub Actions workflow
ollama-code generate-pipeline github \
  --enable-quality-gates \
  --enable-security-analysis \
  --enable-performance-analysis

# Generate GitLab CI configuration
ollama-code generate-pipeline gitlab --enable-regression-analysis

# Universal CI API for multi-platform support
ollama-code config set cicd.platform "github"
ollama-code config set cicd.qualityGates.minScore 85

Code Quality Tracking

# Initialize quality tracking
ollama-code init-quality-tracking --baseline

# Run comprehensive analysis
ollama-code analyze-quality --full-report

# Generate quality dashboard
ollama-code quality-dashboard --format html

# Regression analysis
ollama-code analyze-regression --compare-branch feature/new-feature
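
Conceptually, regression analysis compares current quality metrics against a stored baseline. The following sketch shows that idea under stated assumptions: the metric names, the "higher is better" convention, and `findRegressions` are hypothetical and do not describe the tool's actual report format.

```typescript
// Hypothetical sketch of baseline comparison; metric names are made up.
type QualityMetrics = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  current: number;
}

// Flag every metric that dropped by more than `tolerance` versus the baseline.
// Assumption: higher is better for all metrics.
function findRegressions(baseline: QualityMetrics, current: QualityMetrics, tolerance = 0): Regression[] {
  const regressions: Regression[] = [];
  for (const metric of Object.keys(baseline)) {
    const before = baseline[metric];
    const after = current[metric];
    // Metrics missing from either side are skipped rather than guessed.
    if (before === undefined || after === undefined) continue;
    if (before - after > tolerance) {
      regressions.push({ metric, baseline: before, current: after });
    }
  }
  return regressions;
}

const baselineMetrics: QualityMetrics = { coverage: 0.82, maintainability: 0.75 };
const branchMetrics: QualityMetrics = { coverage: 0.78, maintainability: 0.76 };

const found = findRegressions(baselineMetrics, branchMetrics, 0.02);
console.log(found.map((r) => r.metric)); // [ 'coverage' ]
```

A small tolerance keeps noise in the measurements from failing a build over negligible changes.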

💻 IDE Integration

VS Code Extension

# Install from VS Code Marketplace
# Search for "Ollama Code" in Extensions

# Or install from VSIX
code --install-extension ollama-code.vsix

# Development installation
cd extensions/vscode && yarn install && yarn build

Real-time Features

Inline Code Completion with AI suggestions
Code Actions for AI-powered quick fixes
Hover Information with intelligent context
Real-time Diagnostics and error detection
Workspace Analysis with live updates

WebSocket Server

# Start WebSocket server for real-time integration
ollama-code-advanced --enable-websocket --port 3002

# Configure in VS Code settings.json
{
  "ollamaCode.serverPort": 3002,
  "ollamaCode.enableInlineCompletion": true,
  "ollamaCode.realTimeAnalysis": true
}

🎨 Core Features

AI-Powered Commands

# Code assistance
ollama-code ask "How to implement OAuth2?"
ollama-code explain src/auth.ts
ollama-code fix src/buggy-file.js
ollama-code refactor src/legacy-code.js

# Code generation
ollama-code generate class UserAuth --language typescript
ollama-code generate tests src/utils.js
ollama-code generate docs src/api/

# Analysis and review
ollama-code analyze-architecture --format detailed
ollama-code review-code --provider anthropic src/
ollama-code security-audit src/ --comprehensive

Model Management

# List and manage models
ollama-code list-models
ollama-code pull-model qwen2.5-coder:latest
ollama-code set-model qwen2.5-coder:latest

# Model performance testing
ollama-code test-model qwen2.5-coder:latest --benchmark
ollama-code compare-models --models "codellama:7b,qwen2.5-coder:latest"

Workspace Operations

# Project analysis
ollama-code analyze-project --depth comprehensive
ollama-code workspace-insights --format json

# File operations
ollama-code search "authentication" --type function
ollama-code edit src/config.ts --ai-assisted
ollama-code optimize-imports src/ --language typescript

⚙️ Configuration

Hierarchical Configuration System

{
  "ai": {
    "defaultProvider": "ollama",
    "defaultModel": "qwen2.5-coder:latest",
    "defaultTemperature": 0.7,
    "providers": {
      "ollama": {
        "enabled": true,
        "baseUrl": "http://localhost:11434"
      },
      "openai": {
        "enabled": true,
        "apiKey": "${OPENAI_API_KEY}",
        "models": ["gpt-4", "gpt-3.5-turbo"]
      }
    },
    "routing": {
      "strategy": "intelligent",
      "weights": {
        "cost": 0.3,
        "speed": 0.3,
        "quality": 0.4
      }
    }
  },
  "vcs": {
    "hooks": {
      "enableCommitValidation": true,
      "qualityThreshold": 0.8
    },
    "cicd": {
      "platform": "github",
      "enableQualityGates": true
    }
  }
}
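
A sketch of how layered configuration like this could be resolved follows. It assumes that "hierarchical" means later sources (for example a project file) override earlier ones (defaults or a user file), and that `${VAR}` placeholders are expanded from the environment. Both are assumptions, and `deepMerge` and `expandEnv` are hypothetical helpers, not functions from the project.

```typescript
// Hypothetical helpers for layered configuration; not the project's loader.
type ConfigValue = string | number | boolean | ConfigValue[] | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isConfigObject(v: ConfigValue | undefined): v is ConfigObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Merge `override` into `base`: nested objects merge key by key,
// while scalars and arrays are replaced wholesale.
function deepMerge(base: ConfigObject, override: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const key of Object.keys(override)) {
    const b = result[key];
    const o = override[key];
    if (o === undefined) continue;
    result[key] = isConfigObject(b) && isConfigObject(o) ? deepMerge(b, o) : o;
  }
  return result;
}

// Replace ${NAME} placeholders with values from `env`.
// Assumption: unknown placeholders are left untouched.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Z0-9_]+)\}/g, (match: string, name: string) => env[name] ?? match);
}

const defaultConfig: ConfigObject = { ai: { defaultProvider: "ollama", defaultTemperature: 0.7 } };
const projectConfig: ConfigObject = { ai: { defaultProvider: "openai" } };

console.log(JSON.stringify(deepMerge(defaultConfig, projectConfig)));
// {"ai":{"defaultProvider":"openai","defaultTemperature":0.7}}

console.log(expandEnv("${OPENAI_API_KEY}", { OPENAI_API_KEY: "sk-example" })); // "sk-example"
```

Keeping the placeholder in the file and expanding it at load time means the API key itself never has to be written to disk.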

Configuration Commands

# View configuration
ollama-code config view
ollama-code config view --section ai.providers

# Set configuration
ollama-code config set ai.defaultProvider "openai"
ollama-code config set vcs.hooks.enableCommitValidation true

# Reset configuration
ollama-code config reset
ollama-code config reset --section ai.providers.openai
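
Dotted keys such as `ai.routing.weights.cost` address nested fields in the configuration object. A minimal sketch of that mapping is shown below; `setByPath` and `getByPath` are hypothetical helpers, not the CLI's implementation.

```typescript
// Hypothetical helpers showing how dotted keys can map onto nested objects.
type ConfigTree = { [key: string]: unknown };

// Keys that could tamper with object prototypes are rejected outright.
const FORBIDDEN_KEYS = ["__proto__", "constructor", "prototype"];

function setByPath(root: ConfigTree, dottedPath: string, value: unknown): void {
  const keys = dottedPath.split(".");
  if (keys.some((k) => k === "" || FORBIDDEN_KEYS.indexOf(k) !== -1)) {
    throw new Error(`invalid config path: ${dottedPath}`);
  }
  const last = keys.pop();
  if (last === undefined) return;

  let node = root;
  for (const key of keys) {
    const next = node[key];
    if (typeof next === "object" && next !== null && !Array.isArray(next)) {
      node = next as ConfigTree;
    } else {
      // Create (or overwrite a non-object with) an intermediate section.
      const created: ConfigTree = {};
      node[key] = created;
      node = created;
    }
  }
  node[last] = value;
}

function getByPath(root: ConfigTree, dottedPath: string): unknown {
  let node: unknown = root;
  for (const key of dottedPath.split(".")) {
    if (typeof node !== "object" || node === null) return undefined;
    node = (node as ConfigTree)[key];
  }
  return node;
}

const configTree: ConfigTree = {};
setByPath(configTree, "ai.routing.weights.cost", 0.3);
setByPath(configTree, "ai.defaultProvider", "openai");

console.log(getByPath(configTree, "ai.routing.weights.cost")); // 0.3
console.log(JSON.stringify(getByPath(configTree, "ai.routing")));
// {"weights":{"cost":0.3}}
```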

🛠️ Development

Setup Development Environment

# Clone and install
git clone https://github.com/erichchampion/ollama-code.git
cd ollama-code
yarn install

# Build and test
yarn build
yarn test:all
yarn docs:generate

Development Commands

# Core development
yarn dev                      # Development mode with ts-node
yarn build                    # Compile TypeScript
yarn test                     # Main test suite (see Testing below)
yarn lint                     # ESLint
yarn clean                    # Remove build artifacts

# Testing
yarn test                     # Main test suite (fast, stable tests only)
yarn test:ci                  # CI-friendly test suite (excludes performance tests)
yarn test:unit                # Unit tests only
yarn test:integration         # All integration tests
yarn test:integration:other   # Non-performance integration tests

# Performance testing (resource-intensive, run separately)
yarn test:performance         # All performance tests (unit + integration)
yarn test:performance:unit    # Performance-sensitive unit tests
yarn test:integration:performance        # Performance integration tests
yarn test:integration:optimization-migration  # Optimization migration tests

# Other test suites
yarn test:e2e                 # End-to-end tests with Playwright
yarn test:docs                # Documentation tests
yarn test:security            # Security tests
yarn test:all                 # All tests in recommended order (CI + performance + e2e)
yarn test:all:full            # All tests in parallel (may have flaky failures)

# Documentation
yarn docs:generate            # TypeDoc API documentation
yarn docs:watch               # Watch and regenerate
yarn docs:validate            # Validate links and examples
yarn docs:check-all           # Complete validation

Project Architecture

src/
├── ai/                       # Multi-provider AI system
│   ├── providers/           # AI provider implementations
│   ├── vcs/                 # VCS intelligence features
│   └── performance/         # Performance optimization
├── commands/                 # CLI command system
├── config/                   # Configuration management
├── terminal/                 # Terminal interface
├── utils/                    # Shared utilities
├── cli-selector.ts          # Interactive mode selector
├── simple-cli.ts            # Simple CLI mode
└── cli.ts                   # Advanced CLI mode

extensions/
└── vscode/                   # VS Code extension
    ├── src/providers/       # Language providers
    ├── src/services/        # Extension services
    └── src/client/          # WebSocket client

docs/
├── API_REFERENCE.md         # Complete API documentation
├── CONFIGURATION.md         # Configuration guide
├── ARCHITECTURE.md          # System architecture
└── OLLAMA.md               # Setup and integration guide

📚 Documentation

Complete Documentation Suite

API Reference - All 50+ commands with examples
Configuration Guide - Complete configuration options
Architecture - System architecture and design
Setup Guide - Installation and integration

Auto-Generated Documentation

TypeDoc API Docs - Generated from TypeScript source
GitHub Actions - Automated documentation updates
Link Validation - Automated link checking and validation

Documentation Commands

# Generate all documentation
yarn docs:generate-all

# TypeDoc API documentation
yarn docs:generate

# Validate documentation quality
yarn docs:check-all

# Watch for changes
yarn docs:watch

⚡ Performance

🚀 Enhanced Optimization System

Streaming Initialization: Essential components load first, advanced features in background
Lazy Component Loading: Components load only when needed for specific requests
Progressive Enhancement: Immediate functionality with continuous capability expansion
Smart Dependency Management: Eliminates circular dependencies and recursive loading
Terminal Compatibility: Optimized for CI/CD, TTY, and non-interactive environments

Performance Improvements

80% Faster Startup: Interactive mode now starts in 1-3s (previously 8-15s)
95%+ Success Rate: Robust initialization with fallback protection
Memory Efficient: Only loads required components, reducing memory usage by 60%
Background Loading: Heavy components load while you work on immediate tasks

Enterprise-Scale Performance

Distributed Processing for large codebases (10,000+ files)
Predictive AI Caching with multi-tier strategy
Incremental Analysis with file watching
Memory Optimization with automatic cleanup
Component Status Monitoring: Real-time health checks and performance metrics

Performance Metrics

Interactive Startup: 1-3s (optimized) vs 8-15s (legacy)
Advanced Mode: < 2s for simple commands, < 5s for complex operations
Command Response: < 100ms for basic commands
AI Processing: Variable (2-30s) based on model and complexity
Large Codebase: Handles 10,000+ files efficiently
Component Loading: Essential components ready in < 1s, full system in < 5s

Optimization Features

# Configure for large repositories
ollama-code config set performance.largeCodebase.enabled true
ollama-code config set performance.distributed.maxWorkers 8

# Enable predictive caching
ollama-code config set performance.predictiveCache.enabled true

# Monitor performance (interactive mode)
/status                                    # Component status and health
/performance                               # Performance metrics
/metrics                                   # Detailed system metrics

# Performance monitoring (CLI)
# Note: 8080 is also the default llama-server port; choose a free port if both run locally
ollama-code performance-dashboard --port 8080
ollama-code monitor-resources --interval 60

# Optimization controls
DEBUG=enhanced-fast-path-router ollama-code --interactive    # Debug mode
OLLAMA_SKIP_ENHANCED_INIT=true ollama-code --interactive     # Legacy mode
ollama-code --interactive --silent                          # Silent mode

Component Status Commands

# In interactive mode, use these commands:
/status              # Show component loading status
/status --detailed   # Detailed component information
/status --json       # JSON format for automation
/performance         # Performance metrics and recommendations
/metrics --export    # Export metrics for analysis

🔒 Security

Privacy-First Architecture

Local Processing - AI processing runs on local models (Ollama or llama.cpp) by default
No Data Transmission - Code stays on your machine unless you enable a cloud provider
Optional Cloud Providers - User-controlled API integration
Audit Logging - Comprehensive activity tracking

Security Features

Input Validation - Zod schema validation for all inputs
Path Traversal Protection - Secure file access controls
Command Sanitization - Safe command execution
API Key Management - Secure credential storage
Type Safety - TypeScript strict mode throughout
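
As an illustration of path traversal protection, the sketch below resolves a user-supplied path against a workspace root and rejects anything that escapes it. It uses only Node's built-in `path` module. `resolveInsideRoot` is a hypothetical name, and the project's actual checks may differ (this sketch does not follow symlinks, for instance).

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve `requested` against `root` and reject escapes.
// Note: symlinks are not resolved here; a stricter check would also compare real paths.
function resolveInsideRoot(root: string, requested: string): string {
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, requested);
  const rel = path.relative(absRoot, target);

  // A relative path that climbs out of the root starts with "..";
  // on Windows a different drive yields an absolute path instead.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`path escapes workspace root: ${requested}`);
  }
  return target;
}

const workspaceRoot = path.resolve("workspace");
console.log(resolveInsideRoot(workspaceRoot, "src/auth.ts")); // a path inside the workspace

try {
  resolveInsideRoot(workspaceRoot, "../secrets.txt");
} catch (err) {
  console.log((err as Error).message); // path escapes workspace root: ../secrets.txt
}
```

Comparing via `path.relative` avoids the classic prefix bug where `/work/project-evil` passes a naive `startsWith("/work/project")` check.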

Enterprise Security

# Configure audit logging
ollama-code config set security.audit.enabled true
ollama-code config set security.audit.retentionDays 90

# Access control
ollama-code config set security.access.requireAuth true

# Compliance features
ollama-code config set compliance.gdpr.enabled true

🤝 Contributing

We welcome contributions! Please follow the contribution steps below:

Quick Contribution Steps

1. Fork the repository
2. Create a feature branch
3. Make your changes with tests
4. Update documentation
5. Submit a pull request

Development Guidelines

Follow TypeScript best practices
Add comprehensive tests
Update documentation for API changes
Ensure all CI checks pass

📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.

🌟 Why Ollama Code CLI?

For Individual Developers

Privacy-Focused - Your code stays local
Multi-Provider Flexibility - Choose the best AI for each task
IDE Integration - Seamless VS Code experience
Git Intelligence - AI-powered version control

For Teams

Code Quality Automation - Consistent quality enforcement
CI/CD Integration - Automated pipeline generation
Collaboration Tools - Shared configurations and workflows
Performance Analytics - Team productivity insights

For Enterprises

Scalable Architecture - Handles large codebases efficiently
Security & Compliance - Enterprise-grade security features
Cost Management - AI usage tracking and optimization
Custom Deployments - Fine-tuned models for your domain

Get Started Today - Transform your development workflow with intelligent AI assistance.

npm install -g ollama-code
ollama-code --interactive

Built with ❤️ using TypeScript, Node.js, and the power of local AI.
