Hacker News

I wonder if Ollama has, or plans to have, "supported backends" other than llama.cpp. It's listed on the very last line of their README, as if the llama.cpp dependency were incidental and a minor detail, rather than Ollama being a deployment mechanism for llama.cpp and GGUF-based models.


Yes, we are also looking at integrating MLX [1], which is optimized for Apple Silicon and built by an amazing team of individuals, a few of whom were behind the original Torch [2] project. There's also TensorRT-LLM [3] by Nvidia, optimized for their recent hardware.

All of this of course acknowledging that llama.cpp is an incredible project with competitive performance and support for almost any platform.

[1] https://github.com/ml-explore/mlx

[2] https://en.wikipedia.org/wiki/Torch_(machine_learning)

[3] https://github.com/NVIDIA/TensorRT-LLM


MLX and TensorRT would be really nice!


I don't think they will move away from llama.cpp until they are forced to. The number of people contributing to llama.cpp is quite significant [1] and it wouldn't make sense to use another backend given how quickly llama.cpp is iterating and growing.

[1] https://devboard.gitsense.com/ggerganov?r=ggerganov%2Fllama....

Full disclosure: This is my tool


ghost of christmas future

The chance ONNX becomes significantly relevant here went from 1% to 15% this week. They're demoing ~2x faster inference with Phi-3. There have been fits and starts on LLMs in ONNX for a year, but with Wintel's AI PC™ push, and all the constituent parts in place (4-bit quants! adaptive quants!), I'd put very good money on it.


So you are saying Ollama is a strong MS acquisition in the future if onnx works out.


No, ONNX is a Microsoft project. I don't know why people know what Ollama is, and I don't think they will in a year.


I know it is a Microsoft project. My reasoning is: if Ollama supports ONNX, and if it can provide performance on par with or better than llama.cpp, it would make sense for Microsoft to acquire Ollama for distribution reasons.


Llama.cpp is the valuable bit here; Ollama is only good for end-user convenience. It saves you 20 minutes of googling and futzing with the million and one llama.cpp wrappers available for every language, and lets you set things up to load on startup, but if you're building something for scale or for a backend, neither llama.cpp nor Ollama is coming along for the ride. At best it'll live through a proof-of-concept stage, but as soon as you start caring about performance it's getting discarded.

Microsoft isn't going to pay for something that amounts to a useful setup script wrapped around an inefficient convenience library intended for people to be able to run AI on consumer hardware. There's no exploitable value proposition, whereas building their own closed source AI systems that are tightly coupled to the Windows ecosystem and favor cloud services allows them to extract maximum rent.
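For context, the convenience being described is largely Ollama's local HTTP API. A minimal sketch of calling it (this assumes an Ollama server running on its default port 11434 and a model, here "llama3" as an example, already pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint

def build_payload(prompt, model="llama3"):
    # stream=False asks for the full completion in a single JSON response
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3"):
    # POST to Ollama's /api/generate endpoint and return the completion text
    req = urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The point stands either way: this is setup and plumbing convenience, not inference-engine value.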


Their behaviour around llama.cpp acknowledgement is very shady. Until very recently there was no mention of llama.cpp in their README at all, and now it's tucked away at the very bottom. Compare that to the originally proposed PR, for example: https://github.com/ollama/ollama/pull/3700


Do you know maybe what are these alternative engines they're talking about? Or is it just a way to evade the fact that at the end of the day it is just a wrapper around llama.cpp?


It was mentioned in another reply to the parent. There are no alternatives currently; the whole thing has been built on llama.cpp since its inception.


Ollama is great. I actually wish they would wrap OpenAI and Azure and generally act as a proxy for third-party APIs. Having a consistent, well-thought-out API that isn't tied to a single provider would be really good for the community.

Edit: this would be useful because in many cases some workloads can be local, but others cannot... e.g. if you really need gpt4 for specific queries.


It is open source, so if you want to see this in ollama, pull requests are welcome. :)



