---
title: Convert Models to MLX
description: Convert models to MLX format for use with the MLX engine in LM Studio
index: 6
---

## Convert a model to MLX format
Convert models to MLX to use with LM Studio’s MLX engine.

LM Studio’s MLX support integrates two implementations: **mlx-lm** for text-only models and **mlx-vlm** for vision models.


## Prerequisites

You’ll need a Mac with Apple Silicon (M-series).

### Install the packages

**Text-only models**

```bash
pip install mlx-lm
```

**Vision models**

```bash
pip install mlx-vlm
pip install torch torchvision
```

### Run the conversion scripts
Use the conversion package that corresponds to your model type below.


## Convert a Hugging Face model to MLX

### Text-only models

```bash
python -m mlx_lm convert \
--hf-path <huggingface-model-id> \
--mlx-path /path/to/output/mlx-model
```

Example command to convert Qwen3-0.6B:

```bash
python -m mlx_lm convert --hf-path Qwen/Qwen3-0.6B
```
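If you prefer to drive the conversion from a script, the CLI invocation above can be assembled programmatically. A minimal sketch — the `-q` quantization flag is an assumption about mlx-lm's CLI; check `--help` for your installed version:

```python
import subprocess


def build_convert_cmd(hf_path: str, mlx_path: str, quantize: bool = False) -> list[str]:
    """Assemble the mlx-lm conversion command shown above."""
    cmd = [
        "python", "-m", "mlx_lm", "convert",
        "--hf-path", hf_path,
        "--mlx-path", mlx_path,
    ]
    if quantize:
        cmd.append("-q")  # assumption: -q requests quantized output; verify with --help
    return cmd


# To actually run the conversion: subprocess.run(build_convert_cmd(...), check=True)
print(build_convert_cmd("Qwen/Qwen3-0.6B", "mlx-qwen3"))
```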

### Vision models

```bash
mlx_vlm.convert \
--hf-path <huggingface-model-id> \
--mlx-path /path/to/output/mlx-model
```

Example command to convert Qwen2.5-VL-3B-Instruct:

```bash
mlx_vlm.convert --hf-path Qwen/Qwen2.5-VL-3B-Instruct
```

Both conversion tools accept the following flags:

- `--hf-path`: the path to the Hugging Face model
- `--mlx-path`: where you’d like the converted model to be saved

To directly place the converted model in LM Studio’s model directory, we recommend setting `--mlx-path` to the following:

```bash
~/.lmstudio/models/publisher/modelName
```

For the Qwen2.5-VL-3B-Instruct model above, an example command would look like:

```bash
mlx_vlm.convert --hf-path Qwen/Qwen2.5-VL-3B-Instruct --mlx-path ~/.lmstudio/models/publisher/Qwen2.5-VL-3B-Instruct-MLX
```

This way, you’ll see the model in LM Studio as soon as the conversion is complete – no import step needed.
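The `publisher/modelName` layout can be derived directly from a Hugging Face model ID. A small illustrative helper — the function name and `-MLX` suffix are just conventions for this sketch:

```python
from pathlib import Path


def lmstudio_model_dir(hf_id: str, suffix: str = "-MLX") -> Path:
    """Map a Hugging Face model ID onto LM Studio's publisher/modelName layout."""
    publisher, name = hf_id.split("/", 1)
    return Path.home() / ".lmstudio" / "models" / publisher / f"{name}{suffix}"


print(lmstudio_model_dir("Qwen/Qwen2.5-VL-3B-Instruct"))
# e.g. /Users/you/.lmstudio/models/Qwen/Qwen2.5-VL-3B-Instruct-MLX
```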

Use the `--help` flag on either command (e.g., `python -m mlx_vlm.convert --help`) to view advanced conversion options such as quantization.

## Use the converted model in LM Studio

**Note**: Skip this step if you placed the conversion output directly in LM Studio’s model directory.

If you do not include the `--mlx-path` flag, the tool creates a folder named `mlx_model` in your current working directory. In that case, you’ll need to import the converted MLX model into LM Studio manually by moving it into LM Studio’s expected models directory structure. By default, LM Studio stores models in `~/.lmstudio/models/`.


See our [MLX overview](../advanced/mlx) for more information about using MLX models in LM Studio.
---
title: What is llama.cpp
description: Learn about the llama.cpp inference engine in LM Studio
index: 2
---

## What is llama.cpp

If you’ve tinkered with open source models, you’ve likely heard of [llama.cpp](https://github.com/ggml-org/llama.cpp). llama.cpp is an open-source inference engine written in C++, developed by Georgi Gerganov in 2023. The goal of this project is to make LLM inference accessible across a wide range of hardware, with minimal setup and without compromising on performance. LM Studio integrates llama.cpp under the hood as one of our primary engines.


### Relevant Terminology

- **GGUF**: GGUF (GPT-Generated Unified Format) is a file format for packaging a model with its weights and all the metadata needed to run them into a single, portable file. Any model in GGUF format can be loaded and run with the llama.cpp engine.
- **Quantization**: Today, most foundation open source models are still very large in size — a 7B parameter model at full float32 precision is around 28GB. In order to run these models on consumer hardware with limited memory, llama.cpp leverages quantization – a method to reduce model size with minimal quality loss – and stores quantized versions in the GGUF format.
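The sizes above are simple arithmetic: bytes ≈ parameters × bits per parameter ÷ 8. A quick illustration — the ~4.5 bits/parameter figure used here for Q4_K_M is an approximate average, since llama.cpp mixes quantization types across tensors:

```python
def model_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate on-disk model size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


print(model_size_gb(7e9, 32))   # float32: 28.0 GB
print(model_size_gb(7e9, 4.5))  # ~Q4_K_M: roughly 3.9 GB
```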


## llama.cpp in LM Studio

In the LM Studio app, open Runtime settings (⌘⇧R) to see llama.cpp as the runtime selection for GGUF.

<img src="/assets/docs/settings-runtime-llamacpp.png" data-caption="llama.cpp engine" />

When downloading models in LM Studio, you may see filenames like `<Model-Name>-Q4_K_M.gguf` in the model card. The `Q4_K_M` suffix indicates a 4-bit quantized version of the model stored in GGUF format.
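This naming convention can be parsed mechanically. A small illustrative helper — the filename below is hypothetical, and the pattern only covers common `Qn_*` and float suffixes:

```python
import re


def split_gguf_name(filename: str) -> tuple[str, str]:
    """Split '<Model-Name>-<Quant>.gguf' into (model name, quantization label)."""
    m = re.match(r"(.+)-(Q\d+[A-Z0-9_]*|F16|F32|BF16)\.gguf$", filename)
    if m is None:
        raise ValueError(f"not a recognized GGUF filename: {filename}")
    return m.group(1), m.group(2)


print(split_gguf_name("Llama-3.2-1B-Instruct-Q4_K_M.gguf"))
```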

## Download a GGUF model

**From the GUI**

In the app, head to the Model Search tab and filter by GGUF to see only models in that format.

<img src="/assets/docs/modelsearch-gguf.png" data-caption="Filter by GGUF in model search" />

**Using the CLI**

From the terminal, use `lms get` and include the `--gguf` flag to only show models in the GGUF format:

```bash
lms get --gguf
```

## Converting models to GGUF format

Many popular models already have GGUF versions available. Before converting manually, check whether the model already exists in GGUF format in [lmstudio-community](https://huggingface.co/lmstudio-community) or by filtering Hugging Face for the GGUF library (`hf.co/models?library=gguf`).

Models stored locally in other formats can be converted to GGUF to run with llama.cpp, using the Python conversion scripts in the [llama.cpp repository](https://github.com/ggml-org/llama.cpp).
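After converting, you can sanity-check the output: every GGUF file begins with the four ASCII bytes `GGUF`. A minimal check:

```python
def looks_like_gguf(path: str) -> bool:
    """Check the 4-byte magic that starts every GGUF file."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```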

---
title: What is MLX
description: Learn about the MLX inference engine in LM Studio
index: 3
---

## What is MLX

[MLX](https://github.com/ml-explore/mlx) is a machine learning framework and library developed by Apple to optimize running ML workloads on Apple Silicon. [mlx-lm](https://github.com/ml-explore/mlx-lm/tree/main) and [mlx-vlm](https://github.com/Blaizzy/mlx-vlm) are packages built on top of the MLX framework that focus specifically on LLM and VLM inference, respectively. These packages maximize the speed and efficiency of running models on Apple Silicon.

LM Studio integrates both mlx-lm and mlx-vlm into the [LM Studio MLX engine](https://github.com/lmstudio-ai/mlx-engine). Because MLX is built by Apple for Apple hardware, it is a Mac-only engine and is not available on other platforms.

## MLX and unified memory

Unlike traditional systems where CPU and GPU maintain separate memory pools, Apple Silicon uses a unified memory architecture. Both GGUF and MLX take advantage of this hardware design, but MLX is built specifically for Apple Silicon and typically results in better performance (faster inference speeds) on Macs compared to GGUF.

## MLX in LM Studio

If you’re using LM Studio on a Mac, you will see MLX as an available engine under Runtime Settings (⌘⇧R).

<img src="/assets/docs/settings-runtime-mlx.png" data-caption="MLX engine" />

When browsing models in LM Studio, you may see entries like `lmstudio-community/Qwen3-Coder-Next-MLX-4bit`. The LM Studio team maintains our own MLX conversions for popular open source models on Hugging Face at [lmstudio-community](https://huggingface.co/lmstudio-community).

### Download an MLX model

**From the GUI**

In the app, head to the Model Search tab and filter by MLX to see only models in that format.

<img src="/assets/docs/modelsearch-mlx.png" data-caption="Filter by MLX in model search" />

**Using the CLI**

From the terminal, use `lms get` and include the `--mlx` flag to only show MLX models:

```bash
lms get --mlx
```

Read more about `lms get` [here](https://lmstudio.ai/docs/cli/local-models/get).

## Choosing between MLX and llama.cpp

Note that the MLX engine only supports models in MLX format, and the llama.cpp engine only supports models in GGUF format.

If you're on a Mac, our Staff Picks will recommend the MLX version of a model when it's available, unless the GGUF version is preferred for that model.
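The format-to-engine mapping is strict, which a small selection sketch makes concrete (the function and names here are illustrative, not LM Studio's actual API):

```python
def engine_for(fmt: str, on_mac: bool) -> str:
    """Pick the only engine that can load a given model format."""
    if fmt == "gguf":
        return "llama.cpp"  # GGUF loads only with the llama.cpp engine
    if fmt == "mlx":
        if not on_mac:
            raise ValueError("the MLX engine is Mac-only; use a GGUF version instead")
        return "MLX"        # MLX models load only with the MLX engine
    raise ValueError(f"unknown model format: {fmt}")
```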