Replies: 2 comments
-
KoboldCpp gets its model support from llama.cpp; you should make the request there.
-
Gotcha. I went to llama.cpp and found it's already more or less supported, or being worked on, likely among the 63 updates KoboldCPP is behind on: https://github.com/ggml-org/llama.cpp/pulls?q=llada Okay, I'll just hope KoboldCPP gets updated to support it soon :)
-
There are new diffusion models coming out that I would like to see KoboldCPP support. There are currently 8B and 100B models that, while experimental, look very promising.
To my understanding, it first determines the size of the reply, then does multiple passes, re-masking the sections that are likely low confidence (or random? Maybe fixed patterns like a checkerboard?) until the response is done. At present the AR (autoregressive) approach, one token at a time, doesn't allow for this.
The diffusion method has two main advantages: it is faster (fewer passes over the whole thing), and unlike AR models, if a bad word/token is generated early, it can be replaced with a better one in a subsequent pass (rather than the model being stuck with it and working around it, or becoming inconsistent later).
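To make the loop above concrete, here's a toy Python sketch of low-confidence re-masking: start from a fully masked reply of fixed length, fill every masked position each pass, then re-mask the fills the model was least confident about, with the re-mask budget shrinking to zero on the last pass. The mock model, confidence scores, and linear schedule are all made up for illustration; LLaDA's real sampler differs in the details.

```python
import random

MASK = "<MASK>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def mock_model(tokens, seed):
    # Hypothetical stand-in for one forward pass of a diffusion LM:
    # it proposes a token plus a confidence score for every position.
    rng = random.Random(seed)
    return [(rng.choice(VOCAB), rng.random()) for _ in tokens]

def diffusion_decode(length, steps=4, seed=0):
    """Iterative demasking with low-confidence re-masking (toy version)."""
    tokens = [MASK] * length
    for step in range(steps):
        proposals = mock_model(tokens, seed + step)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Fill every masked position in parallel; committed tokens are
        # kept here, though a real model could also revise them (the
        # "fix a bad early token" advantage described above).
        for i in masked:
            tokens[i] = proposals[i][0]
        # Shrinking re-mask budget: hits zero on the final pass.
        n_remask = round(length * (1 - (step + 1) / steps))
        # Re-mask the freshly filled tokens with the lowest confidence.
        for i in sorted(masked, key=lambda j: proposals[j][1])[:n_remask]:
            tokens[i] = MASK
    return tokens
```

With `length=8` and `steps=4` this commits roughly a quarter of the reply per pass, which is where the speedup over one-token-at-a-time decoding comes from.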
https://llada.pro - LLaDa demo
https://huggingface.co/mradermacher/LLaDA-8B-Instruct-i1-GGUF