[Why] do models require a new version? It can already take arbitrary gguf; I ass...

ynniv · on April 28, 2024

They do, and I was using the "new" models before the update. Perhaps there are tuning changes or bug fixes for them? Or they just want to confirm that these are supported. Some new models do have different architectures, so sometimes an update is necessary.
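To illustrate why a new architecture can force an update even though the file format is unchanged, here is a minimal sketch. GGUF files carry a `general.architecture` metadata string; everything else here (the supported set, the `check_loadable` helper, and the metadata passed in as a plain dict) is hypothetical and not taken from any real runtime.

```python
# Hypothetical set of architectures that this build of a runtime can execute.
SUPPORTED_ARCHITECTURES = {"llama", "gemma"}


def check_loadable(metadata: dict) -> None:
    """Raise if the file names an architecture this build has no code for.

    `metadata` stands in for the key/value section of a GGUF file,
    already parsed into a dict (the parsing itself is not shown).
    """
    arch = metadata.get("general.architecture")
    if arch not in SUPPORTED_ARCHITECTURES:
        raise ValueError(f"unknown model architecture: {arch!r}")


# A file with a known architecture passes silently.
check_loadable({"general.architecture": "llama"})
```

The container can be read either way; what is missing for an unfamiliar architecture is the code that builds and runs its compute graph, which is what an update supplies.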

refulgentis · on April 29, 2024

Phi 3 has a unique architecture that needed some additions to llama.cpp's conversion script. Phi 3 is also an absolute mess: there's no reliable way to detect when it's done writing a message. No one wants to admit it, so people are patching around it instead.

E.g., I could condition on "\n\n<|assistant|>||<|system|>||<|user|>", but it'd still be wrong.
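A rough sketch of what that kind of patch looks like, assuming the model output arrives as a stream of text chunks. The function names and the `STOPS` list are illustrative, built only from the markers mentioned above. This is the workaround, not a fix: it cuts the text at the first role marker but cannot tell whether the model actually meant to stop there.

```python
# Illustrative stop markers, taken from the role tags mentioned above.
STOPS = ["<|assistant|>", "<|system|>", "<|user|>"]


def find_stop(text, stop_sequences):
    """Return the index of the earliest stop sequence in text, or -1."""
    hits = [i for i in (text.find(s) for s in stop_sequences) if i != -1]
    return min(hits) if hits else -1


def stream_until_stop(chunks, stop_sequences):
    """Yield text from chunks, ending just before the first stop sequence.

    The tail of the buffer is held back so that a marker split across
    two chunks is never emitted in part.
    """
    hold = max(len(s) for s in stop_sequences) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        cut = find_stop(buffer, stop_sequences)
        if cut != -1:
            yield buffer[:cut]
            return
        safe = len(buffer) - hold  # characters that cannot begin a marker
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    yield buffer  # stream ended without any marker


chunks = ["Hello", " wor", "ld<|us", "er|>junk"]
print("".join(stream_until_stop(chunks, STOPS)))
```

Holding back the last few characters costs a little latency, but without it a marker split across chunk boundaries would leak into the output.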

Pretty much everything about Phi 3 feels like it had to ship within 48 hours, a month too early. The ONNX genai library doesn't work on Mac at all, and the mobile SDKs don't support it... sigh

FieryTransition · on April 28, 2024

Because it takes time to get the quantization bug-free when new architectures are released. If a model was quantized with a known bug in the quantizer, those quantized versions are effectively buggy too, and they need to be requantized with a newer version of llama.cpp that has the fix.