Run powerful LLMs on your PC. Chat from anywhere on your phone.
Layla Server is a lightweight wrapper around llama-server (or any OpenAI-compatible inference engine) that exposes your local LLM over WebRTC — so you can connect to it from anywhere, just by scanning a QR code with the Layla app.
Your PC has the horsepower to run large, capable models. Your phone has the convenience. Layla Server bridges the two — no port forwarding, no static IP, no cloud subscription required.
- ✅ One QR code to connect your phone to your PC's LLM
- ✅ Works anywhere — WebRTC punches through NAT and firewalls
- ✅ Long-lived sessions — polling-based signalling keeps the server up indefinitely
- ✅ One-click installer: ships with a bundled llama-server snapshot, no setup needed
- ✅ BYOM: bring your own GGUF models
- ✅ Swap backends — switch to any OpenAI-compatible inference engine in settings
┌─────────────────────────────┐        WebRTC        ┌──────────────┐
│ Your PC                     │ ◄──────────────────► │  Layla App   │
│                             │                      │  (iPhone /   │
│ llama-server ←→ Layla       │       QR code        │   Android)   │
│ (or any OpenAI-compat API)  │                      └──────────────┘
└─────────────────────────────┘
- Layla Server starts llama-server (or connects to your preferred backend)
- It generates a QR code encoding the WebRTC connection offer
- Scan the QR code with the Layla app — a peer-to-peer connection is established
- All OpenAI-compatible HTTP calls from the app are proxied over WebRTC to your local model
- Reconnect any time by scanning a new QR code — the server stays alive via polling-based signalling
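The QR step above can be pictured with a small sketch of how a connection offer might be packed into a QR-friendly string. The payload shape (`v`, `type`, `sdp`) and the function names are assumptions made for illustration, not Layla Server's actual wire format; rendering the string as a QR image is left to a QR library.

```javascript
// Hypothetical QR payload helpers; the field names are illustrative only.

// Pack a WebRTC session description into a URL-safe string for a QR code.
function encodeOffer(offer) {
  const json = JSON.stringify({ v: 1, type: offer.type, sdp: offer.sdp });
  return Buffer.from(json, "utf8").toString("base64url");
}

// Reverse of encodeOffer: what a scanning client would do with the QR text.
function decodeOffer(text) {
  const parsed = JSON.parse(Buffer.from(text, "base64url").toString("utf8"));
  if (parsed.v !== 1 || typeof parsed.sdp !== "string") {
    throw new Error("Unsupported or malformed QR payload");
  }
  return { type: parsed.type, sdp: parsed.sdp };
}

// A tiny stand-in for a real SDP offer.
const sample = { type: "offer", sdp: "v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\n" };
const qrText = encodeOffer(sample);
console.log(qrText.length, decodeOffer(qrText).type);
```

A version field such as `v` lets a scanner reject payloads it does not understand instead of failing later during connection setup.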
Download the latest release for your platform from the Releases page.
Layla Server ships with a bundled llama-server snapshot, so no separate installation is required. You can swap it out for a newer version, or for a different server entirely.
- Download and unzip the release
- Drop your .gguf model file into the models/ folder
- Launch layla-server
- Open the Layla app on your phone and scan the QR code
That's it. You're now chatting with your local model from your phone.
Prerequisites: Node.js 18+
git clone https://github.com/l3utterfly/Layla-Server.git
cd Layla-Server
npm install
npm start
Build an .exe for Windows:
npm run dist
All settings are available in the Settings panel of the Layla Server UI.
| Setting | Description |
|---|---|
| Model path | Path to your .gguf model file |
| Inference backend | URL of any OpenAI-compatible server (default: bundled llama-server) |
| Additional Cmd Args | Additional command-line arguments to pass to the server executable |
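As a rough sketch, the Model path and Additional Cmd Args settings could be combined into a launch command for the bundled llama-server along these lines. The `settings` keys and the `buildServerArgs` helper are hypothetical names, not Layla Server's real schema; `-m`, `--port`, `--ctx-size` and `--n-gpu-layers` are standard llama-server flags.

```javascript
// Hypothetical helper: the settings keys are illustrative, not the real schema.
function buildServerArgs(settings) {
  const args = ["-m", settings.modelPath, "--port", String(settings.port)];
  // Naive split on whitespace; quoted arguments containing spaces are not supported.
  const extra = (settings.additionalArgs || "").trim();
  if (extra.length > 0) {
    args.push(...extra.split(/\s+/));
  }
  return args;
}

const args = buildServerArgs({
  modelPath: "models/my-model.gguf",
  port: 8080,
  additionalArgs: "--ctx-size 8192 --n-gpu-layers 99",
});
console.log(["llama-server", ...args].join(" "));
// prints: llama-server -m models/my-model.gguf --port 8080 --ctx-size 8192 --n-gpu-layers 99
```

Passing the arguments as an array (for example to `child_process.spawn`) avoids shell quoting problems with model paths that contain spaces.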
Layla Server proxies to any OpenAI-compatible endpoint. To use a different inference engine (LM Studio, Ollama, vLLM, etc.), just point the backend URL to it in Settings — no other changes needed.
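Because every backend speaks the same API, switching engines amounts to changing a base URL. The sketch below assumes the backend exposes the standard `/v1/chat/completions` route. The `buildChatRequest` helper is illustrative, and the ports shown are the usual defaults for llama-server, LM Studio and Ollama, which may differ on your machine.

```javascript
// Build a fetch()-ready request for any OpenAI-compatible backend.
function buildChatRequest(baseUrl, model, userText) {
  // Strip trailing slashes so the joined URL never contains "//v1".
  const url = baseUrl.replace(/\/+$/, "") + "/v1/chat/completions";
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userText }],
      }),
    },
  };
}

// Assumed default addresses; adjust to match your own setup.
const backends = {
  llamaServer: "http://127.0.0.1:8080",
  lmStudio: "http://127.0.0.1:1234",
  ollama: "http://127.0.0.1:11434/",
};
for (const [name, baseUrl] of Object.entries(backends)) {
  console.log(name, buildChatRequest(baseUrl, "local-model", "Hello").url);
}
// With a backend running: const { url, init } = buildChatRequest(...); await fetch(url, init);
```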
Layla Server supports any model in GGUF format.
- Download a GGUF model (e.g. from Hugging Face)
- Select your model in the settings page
- Restart the server — the new model is loaded automatically
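Before selecting a downloaded file, it can be handy to confirm that it really is a GGUF file. GGUF files begin with the four ASCII bytes `GGUF`, and the sketch below checks for that magic number. The helper names are hypothetical, and this is not necessarily a check that Layla Server itself performs.

```javascript
const fs = require("node:fs");

// Check the first four bytes of a buffer for the GGUF magic number.
function hasGgufMagic(bytes) {
  return bytes.length >= 4 && bytes.subarray(0, 4).toString("latin1") === "GGUF";
}

// Read only the first four bytes of a file rather than loading a multi-gigabyte model.
function looksLikeGguf(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(4);
    const bytesRead = fs.readSync(fd, header, 0, 4, 0);
    return hasGgufMagic(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}

console.log(hasGgufMagic(Buffer.from("GGUF\x03\x00\x00\x00", "latin1"))); // true
console.log(hasGgufMagic(Buffer.from("PK\x03\x04", "latin1")));           // false
```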
- Launch Layla Server on your PC
- A QR code will appear in the UI
- Open the Layla app on your phone
- Tap Connect → Scan QR code
- Point your camera at the code — connection is established instantly
The server uses polling-based WebRTC signalling, so it stays alive and connectable for as long as it's running. You can close the app, leave for hours, and reconnect whenever you like, with no need to scan the QR code again.
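The idea behind polling-based signalling can be sketched as a loop that periodically asks a signalling service for pending offers. Everything here is an assumption for illustration: `fetchOffers` and `onOffer` stand in for whatever Layla Server actually uses, and the interval is arbitrary.

```javascript
// Hypothetical polling loop; fetchOffers and onOffer are placeholders supplied by the caller.
async function pollForOffers({ fetchOffers, onOffer, intervalMs, signal }) {
  while (!signal.aborted) {
    try {
      const offers = await fetchOffers();
      for (const offer of offers) {
        await onOffer(offer);
      }
    } catch (err) {
      // A failed poll is not fatal: log it and try again on the next tick.
      console.error("poll failed:", err.message);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo with an in-memory queue standing in for the signalling service.
async function demo() {
  const queue = [["offer-1"], [], ["offer-2", "offer-3"]];
  const seen = [];
  const controller = new AbortController();
  await pollForOffers({
    fetchOffers: async () => queue.shift() ?? [],
    onOffer: async (offer) => {
      seen.push(offer);
      if (seen.length === 3) controller.abort(); // stop once everything is handled
    },
    intervalMs: 10,
    signal: controller.signal,
  });
  return seen;
}

demo().then((seen) => console.log(seen)); // [ 'offer-1', 'offer-2', 'offer-3' ]
```

Since each poll is an independent request, a dropped network connection costs at most one failed tick, which is what makes this approach suit a server that runs for days.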
- WebRTC transport — peer-to-peer data channel; traffic goes directly between your devices once connected
- Polling signalling — no persistent WebSocket required; the server polls for new connection offers at a configurable interval, making it robust for long-running sessions
- OpenAI-compatible proxy — the WebRTC channel transparently forwards HTTP requests and responses, so the Layla app doesn't need to know anything about the underlying transport
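One way to picture the proxy: each HTTP request is serialised into a message, sent over the data channel, and answered with a message carrying the same id. The sketch below uses an in-memory stand-in for the channel and a fake backend. The message fields (`id`, `method`, `path`, `body`, `status`) and the function names are assumptions, not the real Layla protocol.

```javascript
// Server side: turn an incoming channel message into a backend call and a reply.
async function handleChannelMessage(raw, callBackend) {
  const req = JSON.parse(raw);
  const res = await callBackend(req.method, req.path, req.body);
  return JSON.stringify({ id: req.id, status: res.status, body: res.body });
}

// Client side: send requests over the channel and match replies by id.
function createChannelClient(send) {
  const pending = new Map();
  let nextId = 1;
  return {
    request(method, path, body) {
      const id = nextId++;
      return new Promise((resolve) => {
        pending.set(id, resolve);
        send(JSON.stringify({ id, method, path, body }));
      });
    },
    onMessage(raw) {
      const res = JSON.parse(raw);
      const resolve = pending.get(res.id);
      if (resolve) {
        pending.delete(res.id);
        resolve({ status: res.status, body: res.body });
      }
    },
  };
}

// Wire both ends together in memory; a real setup would use an RTCDataChannel.
const fakeBackend = async (method, path, body) => ({
  status: 200,
  body: { echoed: { method, path, body } },
});
const client = createChannelClient(async (raw) => {
  client.onMessage(await handleChannelMessage(raw, fakeBackend));
});

client
  .request("POST", "/v1/chat/completions", { messages: [] })
  .then((res) => console.log(res.status, res.body.echoed.path));
// prints: 200 /v1/chat/completions
```

Matching replies by id lets several requests be in flight over one channel at once, which a plain request/response pairing by arrival order would not allow.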
- Layla App — the offline LLM mobile client
- llama.cpp — the inference engine powering the bundled backend
Apache 2.0 © Layla Network Pty Ltd