A powerful Android application that transforms your device into a local AI server. OLLAMINI downloads AI models directly from various sources and runs them using a custom native C++ AI runner with GPU acceleration support.
- Model Management: Download and manage AI models from Hugging Face and other sources
- Local AI Server: Run models locally with a custom C++ implementation
- GPU Acceleration: Native GPU support for faster inference
- Chat Interface: Interactive chat with AI models
- Server Control: Start/stop the local AI server
- Statistics: Monitor server performance and model usage
- Settings: Customize app behavior and server configuration
- Documentation: Built-in help and usage guides
- Model Downloader: Downloads .gguf models directly from URLs
- Native AI Runner: Custom C++ implementation for model inference
- Local Server: HTTP API server for external access
- Android UI: Jetpack Compose interface for management
Internet → Model Download → Local Storage → Native AI Runner → HTTP API → External Clients
- Home: Server status and quick actions
- Models: Browse, download, and manage AI models
- Chat: Interactive conversations with AI models
- Server: Start/stop and configure the local server
- Statistics: Performance metrics and usage data
- Settings: App configuration and preferences
- Documentation: Help and usage guides
- Language: Kotlin
- UI Framework: Jetpack Compose
- Database: Room with SQLite
- Networking: Retrofit for API calls
- Native Code: C++ with JNI for AI inference
- Background Services: Android WorkManager
- GPU Support: OpenCL integration
- Android 8.0 (API 26) or higher
- 4GB+ RAM recommended
- 2GB+ free storage for models
- GPU with OpenCL support (optional, for acceleration)
- Install the app from the APK
- Grant permissions for storage and network access
- Browse models in the Models tab
- Download a model (e.g., Llama 2 7B)
- Start the server from the Server tab
- Chat with the AI or use the HTTP API
The app uses a JSON file to define available models with human-readable sizes:
```json
[
  {
    "id": "llama2-7b",
    "name": "Llama 2 7B",
    "description": "Meta's Llama 2 7B parameter model",
    "size": "4GB",
    "download_url": "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf",
    "parameters": "7B",
    "model_files": {
      "model_url": "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf",
      "model_size": "4GB",
      "params_url": "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/params",
      "params_size": "1KB",
      "config_url": "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/config.json",
      "config_size": "2KB"
    }
  }
]
```

Required Fields:
- id: Unique identifier for the model (required)
- name: Display name shown in the app (required)
- description: Model description and details (optional)
- size: Human-readable model size like "4GB", "1.5GB", "500MB" (required)
- download_url: Main download URL for the model (required)
- parameters: Model parameter count like "7B", "13B", "14B" (required)
Model Files (Optional):
- model_files.model_url: Main .gguf model file URL
- model_files.model_size: Human-readable size of the model file
- model_files.params_url: Model parameters file URL (optional)
- model_files.params_size: Size of params file (e.g., "1KB")
- model_files.config_url: Model configuration file URL (optional)
- model_files.config_size: Size of config file (e.g., "2KB")
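To make the schema concrete, here is a hypothetical sketch in Python of typed records matching the JSON above. The class and function names (`ModelDefinition`, `ModelFiles`, `load_models`) are illustrative, not taken from the app's source; field names follow the JSON keys.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelFiles:
    # Mirrors the optional "model_files" object in the JSON schema above.
    model_url: str
    model_size: str
    params_url: Optional[str] = None
    params_size: Optional[str] = None
    config_url: Optional[str] = None
    config_size: Optional[str] = None

@dataclass
class ModelDefinition:
    # Required fields first; optional fields default to None.
    id: str
    name: str
    size: str
    download_url: str
    parameters: str
    description: Optional[str] = None
    model_files: Optional[ModelFiles] = None

def load_models(json_text: str) -> list[ModelDefinition]:
    """Parse the model-list JSON into typed records."""
    out = []
    for entry in json.loads(json_text):
        files = entry.get("model_files")
        out.append(ModelDefinition(
            id=entry["id"],
            name=entry["name"],
            size=entry["size"],
            download_url=entry["download_url"],
            parameters=entry["parameters"],
            description=entry.get("description"),
            model_files=ModelFiles(**files) if files else None,
        ))
    return out
```

A loader like this fails fast on missing required keys (a `KeyError`) while tolerating absent optional fields, which matches the required/optional split in the field list above.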
Size Format Support:
- Bytes: "1024B"
- Kilobytes: "1KB", "1.5KB"
- Megabytes: "500MB", "1.5MB"
- Gigabytes: "4GB", "1.5GB"
- Terabytes: "1TB"
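A parser for these size strings can be sketched as follows. Note the assumptions: `parse_size` is an illustrative helper (not part of the app), and binary multipliers (1 KB = 1024 B) are assumed; the app may use decimal multipliers instead.

```python
import re

# Byte multipliers for each supported unit (binary multipliers assumed).
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}

def parse_size(size: str) -> int:
    """Convert strings like '4GB' or '1.5KB' to a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*", size, re.IGNORECASE)
    if not m:
        raise ValueError(f"Unrecognized size: {size!r}")
    value, unit = m.groups()
    return int(float(value) * _UNITS[unit.upper()])
```

For example, `parse_size("1.5KB")` yields 1536 bytes under these assumptions.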
- Port: Default 8080 (configurable)
- Host: 0.0.0.0 (reachable from other devices on the local network)
- API Endpoints: RESTful interface for model interaction
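For comparison, this is what binding an HTTP endpoint on 0.0.0.0:8080 looks like in Python. It only illustrates the host/port configuration above: OLLAMINI's actual server is the native C++ implementation, and the `/health` route here is invented for the sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    # Placeholder handler; OLLAMINI's real endpoints are served by its
    # native C++ server, not by Python's http.server.
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    # Binding to 0.0.0.0 makes the server reachable from other devices
    # on the local network, matching the configuration above.
    return HTTPServer((host, port), HealthHandler)
```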
Once the server is running, you can interact with models via HTTP:
```bash
# Generate text
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2-7b", "prompt": "Hello, how are you?"}'

# Chat conversation
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2-7b", "message": "Tell me a joke"}'
```

- Personal AI Assistant: Run AI models locally for privacy
- Development Testing: Test AI integrations without cloud costs
- Offline AI: Use AI capabilities without an internet connection
- Educational: Learn about AI models and inference
- Prototyping: Quick AI model testing and experimentation
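The HTTP API shown in the curl examples can also be called programmatically, for instance in the development-testing and prototyping use cases above. A minimal Python sketch: the endpoint path and request fields come from the curl examples, but the response format is not documented here, so the raw body is returned.

```python
import json
import urllib.request

def generate(base_url: str, model: str, prompt: str) -> str:
    """POST a prompt to the /generate endpoint and return the raw response body."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Usage would look like `generate("http://localhost:8080", "llama2-7b", "Hello, how are you?")`, assuming the server is running on the default port.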
- Local Processing: All AI inference happens on your device
- No Cloud Dependencies: Models run entirely locally
- Data Privacy: Your conversations stay on your device
- Network Control: Choose which devices can access your AI server
- Storage: Download and store AI models
- Network: Access model repositories and serve HTTP API
- WiFi State: Detect network configuration for server access
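The permissions above typically correspond to manifest entries like the following. This is a sketch using the standard Android permission names; the exact entries in OLLAMINI's manifest may differ (for example, Android 13+ uses scoped media permissions instead of the legacy storage ones).

```xml
<!-- Sketch: typical manifest entries for the permissions listed above -->
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
```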
- Model Loading: Optimized for Android devices
- Memory Management: Efficient RAM usage for large models
- GPU Acceleration: Optional OpenCL support for faster inference
- Background Processing: Non-blocking model operations
- Model Updates: Automatic version checking
- App Updates: Regular feature and security updates
- Community Models: Easy addition of new model sources
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face: Model hosting and distribution
- TheBloke: GGUF model conversions
- Meta: Llama 2 models
- Mistral AI: Mistral models
- Microsoft: Phi and Orca models
- Issues: Report bugs on GitHub
- Discussions: Ask questions in GitHub Discussions
- Documentation: Check the in-app help section
OLLAMINI - Your personal AI server, powered by local inference.