Multimodal models have been gaining traction recently, and it would be great if this project supported them. The underlying llama.cpp already supports [vision language models](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#multimodal), so this shouldn't be too difficult to implement.