Releases: DeVenLucaz/llamdrop

LLAMdrop v0.9.5

04 Jun 00:43
33c855e

What's New in v0.9.5

Bug Fixes & Code Audit

  • RAM monitoring unified — eliminated 4 redundant implementations across chat.py, downloader.py, and ram_monitor.py. Single source of truth in specs.py
  • Inference blocking fixed — non-blocking stdout collection loop prevents "Thinking..." animation from stuttering on single-core or constrained devices
  • Ollama first-class — Ollama backend now receives full DeviceProfile for hardware-aware auto-tuning, same as native llama.cpp path

New Features

  • doctor --cleanup — detects and removes orphaned .part files left by interrupted downloads. One command, no manual hunting.
  • Formal test suite: tests/ directory added with 6 passing tests covering hardware tiering and prompt builder logic
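
As a rough illustration of what a cleanup pass like doctor --cleanup might do, here is a minimal sketch. The function names and the rule "every *.part file is orphaned" are assumptions made for this example; the real command may apply extra checks, such as skipping downloads that are still running.

```python
from pathlib import Path


def find_orphaned_parts(models_dir: Path) -> list[Path]:
    """Return every *.part file under models_dir, sorted for stable output."""
    # Assumption: any .part file is treated as an interrupted download.
    return sorted(p for p in models_dir.rglob("*.part") if p.is_file())


def cleanup_parts(models_dir: Path, dry_run: bool = False) -> int:
    """Delete partial downloads and return the number of bytes reclaimed.

    With dry_run=True nothing is deleted; the return value is the number
    of bytes that would be reclaimed.
    """
    freed = 0
    for part in find_orphaned_parts(models_dir):
        size = part.stat().st_size
        if not dry_run:
            part.unlink()
        freed += size
    return freed
```

A dry run first lets the user see how much space would be reclaimed before anything is removed.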

Housekeeping

  • All repository URLs migrated from old username to DeVenLucaz/llamdrop
  • Version badges and roadmap updated across README and models.json

Full changelog

See CHANGELOG.md

LLAMdrop v0.9.1

28 Apr 13:05
89c5af0

🦙 llamdrop v0.9.1

Stability and refinement release — no new features.
Everything that ships here has been tested on-device.


🧹 Chat Output Pipeline — Fully Cleaned

  • llama.cpp startup banner no longer bleeds into chat responses
  • Responses no longer print twice (was caused by -p flag behaviour in some builds)
  • Timing stats [ Prompt: X t/s | Generation: Y t/s ], Exiting..., and ChatML boundary tags (<|im_start|> / <|im_end|>) no longer leak into responses
  • Noise filter moved to stderr only — model stdout is no longer silently corrupted

📋 Model Browser

  • Browser now shows only models that fit your device tier and available RAM
  • Catalog expanded from 25 to 41 models across 6 tiers (Micro → Desktop/Workstation)
  • Category switching is now instant — compatibility check cached at open, not repeated per keypress
  • Cancelled downloads are deleted immediately — partial files never appear as valid models
  • File size check added — files under 50MB are ignored in My Models and the downloaded tick

⚡ Performance & Internals

  • Hardware detection now runs once at startup instead of three separate times
  • RAM reads consolidated to one shared function across all modules
  • Incremental prompt buffer — prompt no longer rebuilt from scratch on every message
  • Background GGUF scanner — My Models screen no longer freezes, shows live count while scanning
  • Smarter context trimming — first exchange always preserved when history is shortened
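
The context-trimming rule above (keep the first exchange, drop from the middle) can be sketched as follows. This is an illustration only: trim_history is a hypothetical name, and it counts messages, whereas the actual code may budget by tokens.

```python
def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Keep the first user/assistant exchange plus the most recent messages."""
    if max_messages < 2:
        raise ValueError("max_messages must leave room for the first exchange")
    if len(history) <= max_messages:
        return list(history)
    head = history[:2]  # the first exchange is always preserved
    tail_len = max_messages - 2
    # history[-0:] would return the whole list, so handle zero explicitly
    tail = history[-tail_len:] if tail_len else []
    return head + tail
```

Keeping the opening exchange tends to preserve the instructions or framing the user set at the start of the chat, which is usually what later messages depend on.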

🔧 Minor Fixes

  • Auto-save threshold is now a named constant, _AUTOSAVE_EVERY_TURNS = 10 (previously a magic number); help text corrected to "every 5 exchanges"

Full changelog: CHANGELOG.md

LLAMdrop v0.8.5

27 Apr 15:03
cb290a5

🦙 llamdrop v0.8.5

Bug fix patch — 5 targeted fixes on top of the v0.8.0 smart device backend.


🐛 Bug Fixes

  • browser.py content swap — File contained benchmarks.py content, causing ImportError: cannot import name 'show_browser' on every launch after updating. Correct file restored.

  • Blocking stdout read: proc.stdout.read() replaced with a daemon thread doing line-by-line collection, so the 🦙 Thinking spinner actually animates during inference instead of freezing.

  • Retry path blocking — Fallback retry path for old llama-cli had the same blocking read bug. Now uses the same daemon thread fix as the main inference path.

  • Mali Vulkan false positive: /dev/mali0 existing only proves Mali GPU hardware is present, not that a Vulkan driver is loaded. Detection now checks for a Vulkan ICD directory before claiming GPU acceleration. Matches the Adreno fix from v0.7.1.

  • /clear didn't reset auto-save counter — Clearing history reset history to [] but left _last_save_len at its old value, silently delaying the next auto-save by up to 10 messages. Counter now resets to 0 alongside history.
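
For readers curious about the blocking-read fix, the pattern looks roughly like this. It is a minimal sketch, not llamdrop's actual code: run_with_spinner and its parameters are names invented for this example, and stderr is simply discarded here to keep the example short.

```python
import subprocess
import sys
import threading


def run_with_spinner(cmd: list[str], show_spinner: bool = True) -> str:
    """Run cmd, collecting stdout on a daemon thread while a spinner animates."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
    )
    lines: list[str] = []

    def collect() -> None:
        # Iterating the pipe yields one line at a time until EOF,
        # so the main thread never blocks on a single big read().
        with proc.stdout:
            for line in proc.stdout:
                lines.append(line)

    reader = threading.Thread(target=collect, daemon=True)
    reader.start()

    frames = "|/-\\"
    tick = 0
    while reader.is_alive():
        if show_spinner:
            sys.stderr.write(f"\rThinking {frames[tick % len(frames)]}")
            sys.stderr.flush()
        tick += 1
        reader.join(timeout=0.1)  # wait briefly, then redraw the spinner

    proc.wait()
    if show_spinner:
        sys.stderr.write("\r" + " " * 12 + "\r")  # erase the spinner line
    return "".join(lines)
```

Because the reader thread owns the pipe, the main loop is free to redraw the spinner about ten times per second regardless of how slowly the model emits tokens.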


📦 What's in the v0.8.x series (since v0.7.0)

v0.8.0 introduced the full smart device-aware backend:

  • New modules/specs.py — complete device intelligence (platform, RAM, CPU big.LITTLE, GPU)
  • Device tier classification: Micro → Workstation
  • Auto-tuned runtime flags (--threads, --ctx-size, --n-gpu-layers, etc.)
  • Android GPU always CPU-only — Mali/Adreno are slower than CPU for LLMs
  • Native Windows PowerShell installer (install.ps1)
  • macOS Homebrew + Ollama install path
  • SHA-256 binary verification before extraction
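
Checksum verification before extraction can be sketched with the standard library alone. The helper names below are hypothetical, and where the expected digest comes from (for example a published checksum file) is an assumption left to the caller.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's SHA-256 matches the expected digest."""
    # Normalise whitespace and case; compare_digest does a constant-time compare.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

An installer would call verify_archive() on the downloaded file and refuse to extract it when the result is False.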

v0.8.1 fixed 13 bugs including resume download corruption, Gemma response truncation, stale config cache, and battery icon ranges.

v0.8.5 — this release, 5 fixes above.


Full changelog: CHANGELOG.md

LLAMdrop v0.7.0

26 Apr 22:46
e8443cb

v0.7.0 — 2026-04-27

New Features

  • Chip-aware thread selection — 30+ chips mapped to their actual big core count. Dimensity 720 now correctly uses 2 performance cores instead of the blind cores//2 heuristic.
  • Fixed context thresholds — was 512 tokens at <2GB RAM (barely 2 exchanges). Now 2048 at <2GB, 4096 at <5GB, 8192 on high-RAM devices. Uses zram-inclusive effective RAM.
  • Device class detection — classifies your device as ultra_low / low / mid / high / desktop and picks optimal backend and model tier automatically.
  • Tiered install welcome screen — first launch shows detected hardware (chip, RAM, class, threads, context) and recommends which models to download. Runs once, never again.
  • Ollama backend — on Linux/desktop, llamdrop detects if Ollama is running and routes inference through it automatically. New 🤖 Ollama Chat menu item appears when Ollama is active.
  • IQ quant support — IQ3_M and IQ2_M variants added to 10 models (all 3B+). Better quality than Q2_K at similar RAM. Preference order updated. Vulkan auto-disabled for IQ quants (incompatible).
  • Conditional mmap — models on internal storage (~/.llamdrop/models/) now use mmap, reducing peak RAM by 15–30%. External/sdcard paths keep --no-mmap for safety.
  • Inference extraction: _run_inference(), _extract_response(), _print_response() separated from chat loop. Clean backend abstraction for future backends.
  • Backend abstraction: modules/backends/ package added with ollama.py. New _dispatch_inference() routes to correct backend automatically.
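
Three of the tuning rules listed above (chip-aware threads, context thresholds, conditional mmap) are simple enough to sketch. Everything here is illustrative: the function names and the BIG_CORES table are invented for this example, only the Dimensity 720 entry comes from these notes, and treating 5 GB and above as "high-RAM" is an assumption.

```python
from pathlib import Path

# Hypothetical excerpt of a chip -> performance-core table.
# Only the Dimensity 720 entry is taken from the release notes.
BIG_CORES = {"dimensity 720": 2}


def pick_threads(chip_name: str, total_cores: int) -> int:
    """Use the mapped big-core count, else fall back to cores // 2."""
    known = BIG_CORES.get(chip_name.strip().lower())
    if known is not None:
        return known
    return max(1, total_cores // 2)


def pick_ctx_size(effective_ram_gb: float) -> int:
    """Map zram-inclusive effective RAM to a context size in tokens."""
    if effective_ram_gb < 2:
        return 2048
    if effective_ram_gb < 5:
        return 4096
    return 8192  # assumption: anything from 5 GB up counts as high-RAM


def mmap_flags(model_path: Path, internal_dir: Path) -> list[str]:
    """Return [] to allow mmap, or ['--no-mmap'] outside internal storage."""
    try:
        model_path.resolve().relative_to(internal_dir.resolve())
        return []
    except ValueError:
        # relative_to raises ValueError when the model is not under internal_dir
        return ["--no-mmap"]
```

On an 8-core Dimensity 720, pick_threads returns 2 where the old cores // 2 heuristic would have returned 4.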

Improvements

  • Device info screen now shows device class, recommended backend, and suggested models.
  • Launch summary shows mmap status and IQ quant Vulkan warning.
  • llamdrop doctor now checks Ollama installation and server status.
  • llamdrop update now pulls backends/ files.

Bug Fixes

  • Context size was critically under-provisioned for most devices.
  • Thread count heuristic was wrong for many chip layouts (e.g. Helio G85).

LLAMdrop v0.6.1

25 Apr 18:19
e9a36cf

Bug fixes for v0.6

Fixed

  • Battery monitoring import error on devices without battery sysfs path
  • Config file not created on first install in some Termux environments
  • Prompt format fallback to chatml when model has no prompt_format field
  • /export path detection across different Android configurations
  • updater.py now correctly includes config.py and battery.py in update list

Still works

  • All v0.6 features intact — config file, chat export, battery monitoring, prompt format auto-detect
  • llamdrop update from v0.6.0 → v0.6.1 works cleanly

Install / Update

# Fresh install
curl -sL https://raw.githubusercontent.com/ypatole035-ai/llamdrop/main/install.sh | bash

# Already installed
llamdrop update

LLAMdrop v0.5

25 Apr 13:09
df2e02c

What's new in v0.5

New commands

  • llamdrop update — self-update without reinstalling
  • llamdrop doctor — health check for your install
  • llamdrop version / llamdrop help — quick CLI info

New features

  • ⚡ Model benchmarking — tokens/sec recorded after each chat, shown in browser
  • 🩺 Doctor in main menu — check binary, RAM, storage, network, permissions
  • 🆙 Update in main menu — one tap to pull latest version

Improved

  • Session delete — type D2 in Resume screen to delete session 2
  • Banner fixed — no more garbled blocks on startup
  • Atomic file writes during update — safe even if connection drops
  • Full CHANGELOG.md added to repo
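
The atomic-write idea mentioned above usually means "write to a temporary file, then rename over the target". A minimal sketch of that pattern follows; atomic_write is a hypothetical helper, not necessarily how the updater implements it.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target: Path, data: bytes) -> None:
    """Write data to a temp file beside target, then swap it into place."""
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the swap
        os.replace(tmp_name, target)  # replaces target in a single step
    except BaseException:
        # On any failure, remove the temp file and leave target untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the connection drops mid-download, the old file is still intact because the target is only replaced after the new content is fully written.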

Install

curl -sL https://raw.githubusercontent.com/ypatole035-ai/llamdrop/main/install.sh | bash

LLAMdrop v0.4

24 Apr 22:06
a926427

Pre-release

🚀 LLAMdrop v0.4 - The Stability & Style Update

This release marks a major milestone in mobile-first AI inference. We've cleaned up the logic, optimized the hardware detection, and professionalized the documentation.

✨ What's New?

  • Optimized Chipset Detection: Smoother performance on MediaTek and Snapdragon budget chips.
  • Enhanced UI: Cleaner terminal menus for better navigation in Termux.
  • Pro Docs: Integrated Wiki and SEO-ready README for better discovery.
  • Security First: Implemented CodeQL scanning to ensure a safe environment.

📦 Installation

If you already have LLAMdrop, just run the update command. For new users:
curl -sSL https://raw.githubusercontent.com/ypatole035-ai/llamdrop/main/install.sh | bash


Built entirely on Android with curiosity and grit.

LLAMdrop v0.3

24 Apr 10:04

Pre-release

🚀 What's New in v0.3

This release focuses on making local AI seamless across different hardware capabilities.

✨ Key Features

  • Smart Hardware Detection: Automatically identifies your chipset (CPU/GPU) to recommend the best configuration.
  • Expanded Model Catalog: Now includes 12 verified models categorized into 3 performance tiers (Low-end, Mid-range, and Flagship).
  • Chip Name Fixes: Improved accuracy for mobile processor identification.
  • One-Line Installer: Refined install.sh for a smoother setup in Termux and Linux environments.

🛠 Improvements

  • Resilient downloader with auto-retry logic.
  • Fixed context trimming issues in the chat loop.
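
Auto-retry for a flaky download can be sketched as a small wrapper. This is an illustration under stated assumptions: with_retries is a hypothetical name, and the exponential backoff and the choice to retry only on OSError (which covers connection errors in the standard library) are choices made for this example, not details confirmed by these notes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on OSError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.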

📱 Device Compatibility

Tested and confirmed working on:

  • Android (via Termux)
  • Raspberry Pi 4/5
  • Low-spec Linux Laptops

LLAMdrop is and always will be free & open source under the GPL v3 License.