Hacker News

If you want to run decently heavy models, I'd recommend getting at least 48GB. That lets you run 34b Llama models with ease, 70b models quantized, and Mixtral without problems.

If you want to run most models, get 64GB. This just gives you some more room to work with.

If you want to run anything, get 128GB or more. Unquantized 70b? Check. Goliath 120b? Check.
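The tiers above follow from simple arithmetic: a quantized model's weights take roughly params × bits / 8 bytes, before KV cache and runtime overhead. A sketch (the numbers are illustrative, not exact file sizes):

```shell
# Rough weight size for a quantized model: parameters (in billions)
# times bits per weight, divided by 8, gives approximate GB.
params_b=70   # a 70b model
bits=4        # 4-bit quantization
weights_gb=$(( params_b * bits / 8 ))
echo "~${weights_gb} GB of weights"   # a 70b model at 4-bit is ~35 GB
```

Real GGUF files run a bit larger than this, and you need extra room on top for context, so ~35 GB of weights comfortably fits a 48GB machine but not 32GB.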

Note that high-end consumer GPUs top out at 24GB of VRAM. I have one 7900 XTX for running LLMs, and the largest it can reliably run is 4-bit quantized 34b models; anything larger spills partially into regular RAM.



Thank you for this detailed response. I'm not sure if it was clear, but I was going to use just the Apple Silicon CPU/GPU, not a discrete Nvidia GPU.

Is there anything useful you can do with 24 or 32GB of RAM for LLMs? Regular M2 Mac minis can only be ordered with up to 24GB of RAM; the M2 Pro Mac mini is upgradable to 32GB.


I've been unable to get Mixtral (mixtral-8x7b-instruct-v0.1.Q6_K.gguf) to run well on my M2 MacBook Air (24 GB). It's super slow and eventually freezes after about 12-15 tokens of a response. You should look at M3 options with more RAM -- 64 GB or even the weird-sounding 96 GB might be a good choice.


https://www.reddit.com/r/LocalLLaMA/comments/17kcgjv/how_doe... This reddit thread discusses some of the pros and cons of an M3 Max with 128 GB, costing ~$5-6K.


You can buy multiple 4090s for that money and will get real GPUs including tensor cores. Still relevant it seems: https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...


32GB will run a good quantized Mixtral, though I can't confidently explain how much of a quality difference there is from unquantized.


Data point: I'm currently having issues getting Mixtral Q4_K_M running in LM Studio on my 32 GB M1 Max. I'm trying Q3 to see if it fits.

I can run it on CPU, which is very slow, but offloading to the GPU runs out of memory.


You just have to allow more than 75% memory to be allocated to the GPU by running sudo sysctl -w iogpu.wired_limit_mb=30720 (for a 30 GB limit in this case).
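For reference, here's how the 30720 figure falls out, sketched for a hypothetical 32 GB machine (the 2 GB of OS headroom is my assumption, not a documented requirement):

```shell
# Pick a wired limit that leaves some headroom for macOS itself.
total_gb=32
headroom_gb=2
limit_mb=$(( (total_gb - headroom_gb) * 1024 ))
echo "sudo sysctl -w iogpu.wired_limit_mb=${limit_mb}"
# prints: sudo sysctl -w iogpu.wired_limit_mb=30720
```

Note that a sysctl set this way doesn't persist across reboots, so you'd need to re-run it after restarting.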


1. That worked after some tweaking. 2. I had to lower the context window size to get LM Studio to load it up. 3. LM Studio has two distinct checkboxes that both say "Apple Metal GPU". No idea if they do the same thing....

Thanks a ton! I'm running on GPU w/ Mixtral 8x Instruct Q4_K_M now. tok/sec is about 4x what CPU only was. (Now at 26 tok/sec or so).


I was talking about M2 Macs. Just pointing out that the best you can do with a consumer GPU is 24GB, while Macs go far beyond that because of their unified memory.


Just be aware you don't get to use all of it. I believe you only get access to ~20.8GB of GPU memory on a 32GB Apple Silicon Mac, and perhaps something like ~48GB on a 64GB Mac. I think there are some options to reconfigure the balance but they are technical and do not come without risk.
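Taking those figures at face value, the usable fraction differs by configuration; a quick check (the ~65% and ~75% fractions are observed values from this thread, not a documented formula):

```shell
# Approximate GPU-accessible memory under the default wired limit,
# using the fractions reported above.
awk 'BEGIN { printf "32 GB Mac: ~%.1f GB usable\n", 32 * 0.65 }'
awk 'BEGIN { printf "64 GB Mac: ~%.1f GB usable\n", 64 * 0.75 }'
```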


This is an important consideration. Thanks for mentioning it.


Ok, sorry. I did not understand that you just mentioned that to give more context. Totally makes sense.



