Ultra-fast local LLM and embedding inference directly inside your JVM process: zero-copy, zero HTTP overhead, C++ native speed.
FastAIModel is a retained-memory local inference engine for Java that wraps llama.cpp (for GGUF) and ONNX Runtime (for ONNX) using direct JNI bindings. It is the engine that drives offline execution in the FastJava Ecosystem, giving Java developers native LLM and embedding capabilities without keeping heavy external apps (like LM Studio or Ollama) open.
- Why FastAIModel?
- Key Features
- Installation
- API Reference
- Performance
- Project Structure
- Roadmap
- License
```java
import fastaimodel.FastAIModel;

public class Demo {
    public static void main(String[] args) {
        // Load local GGUF model directly into memory
        try (FastAIModel model = new FastAIModel("models/qwen2.5-coder-1.5b.gguf", 2048, 0.7f)) {
            model.generate("Write a quicksort in Java:", token -> {
                System.out.print(token);
                System.out.flush();
            });
        }
    }
}
```

Running LLMs locally in Java typically requires invoking external subprocesses or running local HTTP servers. FastAIModel eliminates this overhead by running the model directly inside your Java process:
- True In-Process Execution: Runs the model in the same process space, avoiding inter-process context switches and network sockets.
- Zero HTTP/JSON Overhead: Text and tokens flow directly between Java and C++ memory.
- Low Memory Overhead: Eliminates the footprint of keeping GUI-based desktop inference servers running in the background.
- Native llama.cpp Performance: Direct integration with CPU AVX2/AVX512 instruction sets and GPU computation (Vulkan/CUDA).
- Direct Token Streaming: Native callbacks stream tokens back to your Java consumer in real time.
- GGUF Support: Native compatibility with GGUF quantized models (Llama, Qwen, Mistral, Gemma).
- Zero-Copy Memory: Shared token handling that minimizes garbage-collection strain on the JVM.
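The streaming callback can do more than print. As a minimal sketch, the class below buffers each token while still echoing it live, so the full completion is available once generation ends. It assumes the callback passed to `generate` is compatible with `java.util.function.Consumer<String>`, which this README does not confirm; `TokenCollector` and `fakeGenerate` are hypothetical names used only for illustration, and `fakeGenerate` is a stand-in token source, not part of FastAIModel.

```java
import java.util.function.Consumer;

/** Sketch: buffer streamed tokens while still printing them as they arrive. */
final class TokenCollector implements Consumer<String> {
    private final StringBuilder buffer = new StringBuilder();
    private int tokenCount = 0;

    @Override
    public void accept(String token) {
        buffer.append(token);      // keep the full text for later use
        tokenCount++;
        System.out.print(token);   // live echo, as in the Demo above
        System.out.flush();
    }

    /** Returns everything received so far. */
    String text() {
        return buffer.toString();
    }

    /** Returns how many callback invocations were received. */
    int tokenCount() {
        return tokenCount;
    }

    // Hypothetical stand-in for a streaming model; NOT the FastAIModel API.
    static void fakeGenerate(String[] tokens, Consumer<String> onToken) {
        for (String t : tokens) {
            onToken.accept(t);
        }
    }

    public static void main(String[] args) {
        TokenCollector collector = new TokenCollector();
        fakeGenerate(new String[] {"int", " ", "x", ";"}, collector);
        System.out.println();
        System.out.println(collector.tokenCount() + " tokens, "
                + collector.text().length() + " chars");
    }
}
```

If the real callback accepts a lambda as shown in the Demo, the collector could presumably be passed as `collector::accept`. The sketch is not thread-safe, so it would need synchronization if callbacks arrive on a different thread than the one reading `text()`.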
Add the JitPack repository and the dependencies to your pom.xml:
```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <!-- FastAIModel Engine -->
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>FastAIModel</artifactId>
        <version>0.1.0</version>
    </dependency>
    <!-- FastCore (Mandatory Native DLL Loader) -->
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>FastCore</artifactId>
        <version>0.1.0</version>
    </dependency>
</dependencies>
```

For Gradle, add the equivalent to your build.gradle:

```groovy
repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.andrestubbe:FastAIModel:0.1.0'
    implementation 'com.github.andrestubbe:FastCore:0.1.0'
}
```

- ROADMAP.md: Planned milestone features and performance extensions.
- REFERENCE.md: JNI contracts and configuration options.
- PHILOSOPHY.md: In-process design decisions.
- CHANGELOG.md: Release history.
| Platform | Status |
|---|---|
| Windows 10/11 (x64) | Fully supported |
| Linux | Planned |
| macOS | Planned |
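Because only Windows x64 is currently supported, an application may want to check the platform before it tries to load a model. The following is a minimal sketch using the standard `os.name` and `os.arch` system properties; `PlatformGuard` and `isSupported` are hypothetical names, not part of FastAIModel, and the check does not distinguish Windows 10/11 from older Windows releases.

```java
import java.util.Locale;

/** Sketch: guard against running on a platform without native binaries. */
final class PlatformGuard {

    /** Returns true only for Windows on a 64-bit x86 JVM. */
    static boolean isSupported(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean windows = os.startsWith("windows");
        // 64-bit x86 JVMs typically report "amd64"; "x86_64" is accepted too.
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        return windows && x64;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        if (!isSupported(os, arch)) {
            System.err.println("FastAIModel natives are not available on "
                    + os + " (" + arch + ")");
            return;
        }
        System.out.println("Platform OK: " + os + " (" + arch + ")");
    }
}
```

Running the check up front lets the application fall back or report a clear message instead of failing later during native library loading.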
MIT License. See the LICENSE file for details.
Part of the FastJava Ecosystem: Making the JVM faster. Small package. Maximum speed. Zero bloat.