Releases: DeVenLucaz/llamdrop
LLAMdrop v0.9.5
What's New in v0.9.5
Bug Fixes & Code Audit
- RAM monitoring unified: eliminated 4 redundant implementations across chat.py, downloader.py, and ram_monitor.py. Single source of truth in specs.py.
- Inference blocking fixed: a non-blocking stdout collection loop prevents the "Thinking..." animation from stuttering on single-core or constrained devices.
- Ollama first-class: the Ollama backend now receives the full DeviceProfile for hardware-aware auto-tuning, the same as the native llama.cpp path.
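The v0.8.5 notes describe replacing a blocking proc.stdout.read() with a daemon thread that collects output line by line, and this release describes a non-blocking collection loop. The sketch below shows one way to get that behavior in Python, assuming a queue between a reader thread and the spinner loop. The names run_with_spinner and _pump_lines are hypothetical, the spinner frames are arbitrary, and stderr is simply discarded here for brevity.

```python
import queue
import subprocess
import sys
import threading


def _pump_lines(stream, sink: queue.Queue) -> None:
    """Reader thread: push each stdout line to the queue, then a None sentinel at EOF."""
    for line in iter(stream.readline, ""):
        sink.put(line)
    stream.close()
    sink.put(None)


def run_with_spinner(cmd: list[str], frame_delay: float = 0.1) -> str:
    """Run cmd and return its stdout, animating a spinner while output is pending."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
    lines: queue.Queue = queue.Queue()
    # daemon=True: a reader stuck on a hung process never blocks interpreter exit.
    threading.Thread(target=_pump_lines, args=(proc.stdout, lines), daemon=True).start()

    frames, tick, collected = "|/-\\", 0, []
    while True:
        try:
            # Wait at most one frame for output so the spinner keeps moving.
            line = lines.get(timeout=frame_delay)
        except queue.Empty:
            sys.stderr.write(f"\r{frames[tick % len(frames)]} Thinking...")
            sys.stderr.flush()
            tick += 1
            continue
        if line is None:  # reader hit EOF
            break
        collected.append(line)
    proc.wait()
    sys.stderr.write("\r" + " " * 20 + "\r")  # erase the spinner line
    return "".join(collected)
```

Because the main loop never waits longer than one frame on the queue, a slow first token on a single-core device delays the answer but does not freeze the animation.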
New Features
- doctor --cleanup: detects and removes orphaned .part files left by interrupted downloads. One command, no manual hunting.
- Formal test suite: tests/ directory added with 6 passing tests covering hardware tiering and prompt builder logic
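A minimal sketch of what a .part cleanup pass could look like. It assumes models live under one directory (the v0.7.0 notes mention ~/.llamdrop/models/) and that no download is running while it executes, since an in-progress download also writes a .part file. The function names are hypothetical.

```python
from pathlib import Path


def find_orphaned_parts(models_dir: Path) -> list[Path]:
    """Return every leftover *.part file under models_dir, searching subfolders too."""
    if not models_dir.is_dir():
        return []
    return sorted(p for p in models_dir.rglob("*.part") if p.is_file())


def cleanup_orphaned_parts(models_dir: Path, dry_run: bool = False) -> list[Path]:
    """Delete orphaned *.part files and return the paths that were (or would be) removed.

    Assumes no download is currently running; an active download also owns a .part file.
    """
    found = find_orphaned_parts(models_dir)
    if not dry_run:
        for part in found:
            part.unlink(missing_ok=True)
    return found
```

Calling cleanup_orphaned_parts(Path.home() / ".llamdrop" / "models", dry_run=True) would list candidates without deleting anything.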
Housekeeping
- All repository URLs migrated from the old username to DeVenLucaz/llamdrop
- Version badges and roadmap updated across README and models.json
Full changelog
See CHANGELOG.md
LLAMdrop v0.9.1
🦙 llamdrop v0.9.1
Stability and refinement release — no new features.
Everything that ships here has been tested on-device.
🧹 Chat Output Pipeline — Fully Cleaned
- llama.cpp startup banner no longer bleeds into chat responses
- Responses no longer print twice (caused by -p flag behaviour in some builds)
- Timing stats ([ Prompt: X t/s | Generation: Y t/s ]), the "Exiting..." line, and ChatML boundary tags (<|im_start|>/<|im_end|>) no longer leak into responses
- Noise filter now applies to stderr only, so model stdout is no longer silently corrupted
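The leaks listed above follow recognizable patterns, so a line-based filter can remove them from captured text. This is an illustrative sketch only: the regular expressions are written from the examples in these notes rather than from llamdrop's source, and clean_response is a hypothetical name.

```python
import re

# Patterns written from the examples in these notes, not from llamdrop's source.
# A role name right after a tag is stripped only when it is a whole word.
_CHATML_TAG = re.compile(r"<\|im_(?:start|end)\|>(?:(?:system|user|assistant)\b)?")
_TIMING_LINE = re.compile(r"^\s*\[\s*Prompt:.*t/s\s*\|\s*Generation:.*t/s\s*\]\s*$")
_EXITING_LINE = re.compile(r"^\s*Exiting\.\.\.\s*$")


def clean_response(raw: str) -> str:
    """Drop timing-stat and 'Exiting...' lines, strip ChatML tags, trim surrounding blanks."""
    kept = []
    for line in raw.splitlines():
        if _TIMING_LINE.match(line) or _EXITING_LINE.match(line):
            continue
        kept.append(_CHATML_TAG.sub("", line))
    return "\n".join(kept).strip()
```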
📋 Model Browser
- Browser now shows only models that fit your device tier and available RAM
- Catalog expanded from 25 to 41 models across 6 tiers (Micro → Desktop/Workstation)
- Category switching is now instant — compatibility check cached at open, not repeated per keypress
- Cancelled downloads are deleted immediately — partial files never appear as valid models
- File size check added: files under 50MB are ignored by My Models and no longer get the downloaded tick
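The 50MB floor can be applied when listing model files. A minimal sketch, assuming models are plain .gguf files in a single directory and reading "50MB" as 50 MiB; list_usable_models and MIN_MODEL_BYTES are hypothetical names.

```python
from pathlib import Path

# "under 50MB" read as 50 MiB; the exact unit is an assumption.
MIN_MODEL_BYTES = 50 * 1024 * 1024


def list_usable_models(models_dir: Path) -> list[Path]:
    """Return .gguf files big enough to be real models, skipping stubs and truncated leftovers."""
    if not models_dir.is_dir():
        return []
    return sorted(
        p for p in models_dir.glob("*.gguf")
        if p.is_file() and p.stat().st_size >= MIN_MODEL_BYTES
    )
```

Partial downloads named *.gguf.part are excluded automatically, because the glob only matches names that end in .gguf.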
⚡ Performance & Internals
- Hardware detection now runs once at startup instead of three separate times
- RAM reads consolidated to one shared function across all modules
- Incremental prompt buffer — prompt no longer rebuilt from scratch on every message
- Background GGUF scanner — My Models screen no longer freezes, shows live count while scanning
- Smarter context trimming — first exchange always preserved when history is shortened
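One way to implement "first exchange always preserved" is to pin the first user/assistant pair and fill the remaining budget with the newest whole exchanges. The sketch assumes history is a flat list of role/content dicts made of complete pairs and that the budget is counted in messages rather than tokens; trim_history is a hypothetical name.

```python
def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Shorten history to at most max_messages while always keeping the first exchange.

    Assumes history is a flat list of {"role", "content"} dicts made of complete
    user/assistant pairs, so the first exchange is history[:2].
    """
    if max_messages < 2:
        raise ValueError("max_messages must leave room for the first exchange")
    if len(history) <= max_messages:
        return list(history)
    first, rest = history[:2], history[2:]
    # Spend the remaining budget on the newest messages, rounded down to whole
    # exchanges so an answer is never kept without the question it answers.
    keep = (max_messages - 2) // 2 * 2
    return first + rest[len(rest) - keep:]
```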
🔧 Minor Fixes
- Auto-save threshold is now a named constant, _AUTOSAVE_EVERY_TURNS = 10 (previously a magic number); help text corrected to "every 5 exchanges", since 10 turns equal 5 user/assistant exchanges
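The auto-save bookkeeping can be sketched with the constant named in this release and the _last_save_len counter named in the v0.8.5 notes. The ChatSession class, the save callback, and the rule "save once 10 new messages have accumulated" are assumptions for illustration; the point it shows is that clearing history must reset the counter as well.

```python
_AUTOSAVE_EVERY_TURNS = 10  # from the notes: 10 messages, i.e. 5 user/assistant exchanges


class ChatSession:
    """Hypothetical session object showing only the auto-save bookkeeping."""

    def __init__(self, save_fn):
        self.history: list[dict] = []
        self._last_save_len = 0  # history length at the most recent save
        self._save_fn = save_fn  # hypothetical callback that persists history

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})
        # Assumed rule: save once enough new messages have piled up since the last save.
        if len(self.history) - self._last_save_len >= _AUTOSAVE_EVERY_TURNS:
            self._save_fn(self.history)
            self._last_save_len = len(self.history)

    def clear(self) -> None:
        # Reset both together; a stale _last_save_len is the /clear bug fixed in v0.8.5.
        self.history = []
        self._last_save_len = 0
```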
Full changelog: CHANGELOG.md
LLAMdrop v0.8.5
🦙 llamdrop v0.8.5
Bug fix patch — 5 targeted fixes on top of the v0.8.0 smart device backend.
🐛 Bug Fixes
- browser.py content swap: the file contained benchmarks.py content, causing ImportError: cannot import name 'show_browser' on every launch after updating. Correct file restored.
- Blocking stdout read: proc.stdout.read() replaced with a daemon thread doing line-by-line collection, so the 🦙 Thinking spinner actually animates during inference instead of freezing.
- Retry path blocking: the fallback retry path for old llama-cli had the same blocking read bug. It now uses the same daemon-thread fix as the main inference path.
- Mali Vulkan false positive: /dev/mali0 existing only proves Mali GPU hardware is present, not that a Vulkan driver is loaded. Detection now checks for a Vulkan ICD directory before claiming GPU acceleration, matching the Adreno fix from v0.7.1.
- /clear didn't reset auto-save counter: clearing history reset history to [] but left _last_save_len at its old value, silently delaying the next auto-save by up to 10 messages. The counter now resets to 0 alongside history.
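The corrected Mali check can be expressed as "device node present and a Vulkan ICD manifest installed". A hedged sketch: /usr/share/vulkan/icd.d and /etc/vulkan/icd.d are standard Vulkan loader search paths on Linux, while the Termux prefix entry and the requirement that the directory contain at least one .json manifest (slightly stricter than checking that the directory exists) are assumptions. Function names are hypothetical.

```python
import os
from pathlib import Path

# Standard Linux Vulkan loader search paths, plus a Termux prefix path (assumed).
_ICD_DIRS = [
    "/usr/share/vulkan/icd.d",
    "/etc/vulkan/icd.d",
    os.path.join(os.environ.get("PREFIX", "/data/data/com.termux/files/usr"), "share", "vulkan", "icd.d"),
]


def has_vulkan_icd(dirs=None) -> bool:
    """True if any ICD directory holds at least one .json driver manifest."""
    for d in (dirs if dirs is not None else _ICD_DIRS):
        p = Path(d)
        if p.is_dir() and any(p.glob("*.json")):
            return True
    return False


def mali_vulkan_usable(dev_node="/dev/mali0", icd_dirs=None) -> bool:
    """/dev/mali0 only proves the GPU exists; also require an installed Vulkan driver."""
    return Path(dev_node).exists() and has_vulkan_icd(icd_dirs)
```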
📦 What's in the v0.8.x series (since v0.7.0)
v0.8.0 introduced the full smart device-aware backend:
- New modules/specs.py: complete device intelligence (platform, RAM, CPU big.LITTLE, GPU)
- Device tier classification: Micro → Workstation
- Auto-tuned runtime flags (--threads, --ctx-size, --n-gpu-layers, etc.)
- Android always runs CPU-only: Mali/Adreno GPUs are slower than the CPU for LLM inference
- Native Windows PowerShell installer (install.ps1)
- macOS Homebrew + Ollama install path
- SHA-256 binary verification before extraction
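Verifying a downloaded archive before extracting it only needs a streaming SHA-256 digest and a comparison. This sketch assumes the expected hex digest comes from a trusted source published alongside the binary, which these notes do not specify; sha256_of and verify_before_extract are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives never sit fully in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_before_extract(archive: Path, expected_hex: str) -> None:
    """Raise (and delete the archive) unless its digest matches the expected one."""
    actual = sha256_of(archive)
    if actual != expected_hex.strip().lower():
        archive.unlink(missing_ok=True)  # never leave a bad archive around to be extracted later
        raise ValueError(f"SHA-256 mismatch for {archive.name}: expected {expected_hex}, got {actual}")
```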
v0.8.1 fixed 13 bugs including resume download corruption, Gemma response truncation, stale config cache, and battery icon ranges.
v0.8.5 — this release, 5 fixes above.
Full changelog: CHANGELOG.md
LLAMdrop v0.7.0
v0.7.0 — 2026-04-27
New Features
- Chip-aware thread selection — 30+ chips mapped to their actual big core count. Dimensity 720 now correctly uses 2 performance cores instead of the blind cores//2 heuristic.
- Fixed context thresholds — was 512 tokens at <2GB RAM (barely 2 exchanges). Now 2048 at <2GB, 4096 at <5GB, 8192 on high-RAM devices. Uses zram-inclusive effective RAM.
- Device class detection — classifies your device as ultra_low / low / mid / high / desktop and picks optimal backend and model tier automatically.
- Tiered install welcome screen — first launch shows detected hardware (chip, RAM, class, threads, context) and recommends which models to download. Runs once, never again.
- Ollama backend — on Linux/desktop, llamdrop detects if Ollama is running and routes inference through it automatically. New 🤖 Ollama Chat menu item appears when Ollama is active.
- IQ quant support — IQ3_M and IQ2_M variants added to 10 models (all 3B+). Better quality than Q2_K at similar RAM. Preference order updated. Vulkan auto-disabled for IQ quants (incompatible).
- Conditional mmap — models on internal storage (~/.llamdrop/models/) now use mmap, reducing peak RAM by 15–30%. External/sdcard paths keep --no-mmap for safety.
- Inference extraction: _run_inference(), _extract_response(), and _print_response() separated from the chat loop, giving a clean backend abstraction for future backends.
- Backend abstraction: modules/backends/ package added with ollama.py. A new _dispatch_inference() routes to the correct backend automatically.
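The thread, context, and mmap rules above translate into a small flag builder. This is a sketch, not llamdrop's code: only the Dimensity 720 entry of the chip table and the RAM thresholds come from these notes, while treating 5 GB and above as "high-RAM" and falling back to the old cores//2 heuristic for unmapped chips are assumptions, and all function names are hypothetical. The flags -m, --threads, --ctx-size and --no-mmap are standard llama.cpp options.

```python
import os
from pathlib import Path

# Hypothetical excerpt of a chip -> big-core table. Only the Dimensity 720 entry
# is taken from the release notes; the real table maps 30+ chips.
BIG_CORES = {"dimensity 720": 2}


def pick_threads(chip: str, total_cores: int) -> int:
    """Use the chip's known big-core count, else fall back to cores // 2 (assumed)."""
    known = BIG_CORES.get(chip.strip().lower())
    return known if known is not None else max(1, total_cores // 2)


def pick_ctx_size(effective_ram_gb: float) -> int:
    """v0.7.0 thresholds; effective RAM is assumed to already include zram."""
    if effective_ram_gb < 2:
        return 2048
    if effective_ram_gb < 5:
        return 4096
    return 8192  # assumption: 5 GB and above counts as "high-RAM"


def use_mmap(model_path: str) -> bool:
    """mmap only for models inside the internal models directory."""
    models_dir = Path(os.path.expanduser("~/.llamdrop/models")).resolve()
    return Path(os.path.expanduser(model_path)).resolve().is_relative_to(models_dir)


def build_flags(model_path: str, chip: str, total_cores: int, effective_ram_gb: float) -> list[str]:
    path = os.path.expanduser(model_path)
    flags = ["-m", path,
             "--threads", str(pick_threads(chip, total_cores)),
             "--ctx-size", str(pick_ctx_size(effective_ram_gb))]
    if not use_mmap(path):
        flags.append("--no-mmap")  # external/sdcard storage keeps mmap off
    return flags
```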
Improvements
- Device info screen now shows device class, recommended backend, and suggested models.
- Launch summary shows mmap status and IQ quant Vulkan warning.
- llamdrop doctor now checks Ollama installation and server status.
- llamdrop update now pulls backends/ files.
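An installation and server-status check like the one doctor performs can be done with the standard library alone. The sketch assumes Ollama's default local address (port 11434) and uses its GET /api/tags endpoint, which lists locally available models; ollama_status is a hypothetical name, and "installed" here only means an ollama executable is on PATH.

```python
import http.client
import json
import shutil
import urllib.error
import urllib.request


def ollama_status(base_url: str = "http://127.0.0.1:11434", timeout: float = 1.5) -> dict:
    """Report whether Ollama looks installed and whether its local server answers."""
    status = {"installed": shutil.which("ollama") is not None, "running": False, "models": []}
    try:
        # GET /api/tags lists local models; any valid JSON reply means the server is up.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        status["running"] = True
        status["models"] = [m.get("name", "") for m in data.get("models", [])]
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        pass  # refused, timed out, or not JSON: treat the server as not running
    return status
```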
Bug Fixes
- Context size was critically under-provisioned for most devices.
- Thread count heuristic was wrong for many chip layouts (e.g. Helio G85).
LLAMdrop v0.6.1
Bug fixes for v0.6
Fixed
- Battery monitoring import error on devices without battery sysfs path
- Config file not created on first install in some Termux environments
- Prompt format fallback to chatml when model has no prompt_format field
- /export path detection across different Android configurations
- updater.py now correctly includes config.py and battery.py in the update list
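The chatml fallback amounts to a default value when a catalog entry lacks prompt_format. A minimal sketch of a ChatML prompt builder under that assumption; build_prompt and the history shape are hypothetical, and other formats are deliberately left out.

```python
def build_prompt(model: dict, history: list[dict], system: str = "") -> str:
    """Build a ChatML prompt, defaulting to chatml when the entry has no prompt_format."""
    fmt = model.get("prompt_format") or "chatml"
    if fmt != "chatml":
        raise NotImplementedError(f"prompt format {fmt!r} is not covered by this sketch")
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for msg in history:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # left open so the model writes the reply
    return "".join(parts)
```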
Still works
- All v0.6 features intact — config file, chat export, battery monitoring, prompt format auto-detect
- llamdrop update from v0.6.0 → v0.6.1 works cleanly
Install / Update
# Fresh install
curl -sL https://raw.githubusercontent.com/ypatole035-ai/llamdrop/main/install.sh | bash
# Already installed
llamdrop update
LLAMdrop v0.5
What's new in v0.5
New commands
- llamdrop update: self-update without reinstalling
- llamdrop doctor: health check for your install
- llamdrop version / llamdrop help: quick CLI info
New features
- ⚡ Model benchmarking — tokens/sec recorded after each chat, shown in browser
- 🩺 Doctor in main menu — check binary, RAM, storage, network, permissions
- 🆙 Update in main menu — one tap to pull latest version
Improved
- Session delete: type D2 in the Resume screen to delete session 2
- Banner fixed — no more garbled blocks on startup
- Atomic file writes during update — safe even if connection drops
- Full CHANGELOG.md added to repo
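Atomic replacement is usually done by writing to a temporary file in the same directory and renaming it over the target, so an interrupted update leaves either the old file or the new one, never a half-written mix. This is a generic sketch of that pattern, assumed rather than taken from llamdrop's updater; atomic_write_bytes is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Replace target with data so readers see either the old file or the new one."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory (same filesystem) for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=f".{target.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename makes them visible
        os.replace(tmp_name, target)
    except BaseException:
        # Any failure leaves the original target untouched; just drop the temp file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

In an updater, each new file would be downloaded completely before this is called, so a dropped connection fails before anything on disk changes.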
Install
curl -sL https://raw.githubusercontent.com/ypatole035-ai/llamdrop/main/install.sh | bash
LLAMdrop v0.4
🚀 LLAMdrop v0.4 - The Stability & Style Update
This release marks a major milestone in mobile-first AI inference. We've cleaned up the logic, optimized the hardware detection, and professionalized the documentation.
✨ What's New?
- Optimized Chipset Detection: Smoother performance on MediaTek and Snapdragon budget chips.
- Enhanced UI: Cleaner terminal menus for better navigation in Termux.
- Pro Docs: Integrated Wiki and SEO-ready README for better discovery.
- Security First: Implemented CodeQL scanning to ensure a safe environment.
📦 Installation
If you already have LLAMdrop, just run the update command. For new users:
curl -sSL https://raw.githubusercontent.com/ypatole035-ai/llamdrop/main/install.sh | bash
Built entirely on Android with curiosity and grit.
LLAMdrop v0.3
🚀 What's New in v0.3
This release focuses on making local AI seamless across different hardware capabilities.
✨ Key Features
- Smart Hardware Detection: Automatically identifies your chipset (CPU/GPU) to recommend the best configuration.
- Expanded Model Catalog: Now includes 12 verified models categorized into 3 performance tiers (Low-end, Mid-range, and Flagship).
- Chip Name Fixes: Improved accuracy for mobile processor identification.
- One-Line Installer: Refined install.sh for a smoother setup in Termux and Linux environments.
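On Linux and Android, /proc/cpuinfo is one common starting point for identifying the chip. A hedged sketch of parsing it: which fields llamdrop actually reads is not stated in these notes, newer Android kernels often omit the Hardware line, and parse_cpuinfo is a hypothetical name.

```python
def parse_cpuinfo(text: str) -> dict:
    """Extract the SoC name, CPU model name and logical core count from /proc/cpuinfo text."""
    info = {"hardware": None, "model": None, "cores": 0}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "processor" and value.isdigit():
            info["cores"] += 1  # one "processor : N" entry per logical CPU
        elif key == "hardware" and value and info["hardware"] is None:
            info["hardware"] = value  # SoC name on many Android/ARM kernels
        elif key == "model name" and value and info["model"] is None:
            info["model"] = value  # CPU name on x86 and some ARM kernels
    return info
```

On a device it would be called as parse_cpuinfo(open("/proc/cpuinfo").read()). The isdigit() check skips the "Processor : AArch64 ..." header line that some ARM kernels print.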
🛠 Improvements
- Resilient downloader with auto-retry logic.
- Fixed context trimming issues in the chat loop.
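Auto-retry is commonly implemented as a loop with exponential backoff. The notes do not say which policy the downloader uses, so the doubling delay, the default of three attempts, and the name with_retries are assumptions in this sketch; the sleep parameter exists so tests can skip real waiting.

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (OSError,), sleep=time.sleep):
    """Call fn until it succeeds, waiting base_delay, 2*base_delay, ... between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```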
📱 Device Compatibility
Tested and confirmed working on:
- Android (via Termux)
- Raspberry Pi 4/5
- Low-spec Linux Laptops
LLAMdrop is and always will be free and open source under the GPL v3 license.