coremltools is the only way to run on ANE, so less of a trick and more of a requ... | Hacker News


smpanaro 11 months ago | on: Run LLMs on Apple Neural Engine (ANE)

coremltools is the only way to run on ANE, so less of a trick and more of a requirement. The tricks are more around optimizing for the hardware capabilities/constraints. For instance:

- conv2d is faster than linear (see Apple's post [0]) so you rewrite the model for that (example from the repo [1])

- inputs/outputs are static shapes, so KV cache requires some creativity (I wrote about that here [2])

- compute is float16 (not bfloat16) so occasionally you have to avoid activation overflows

[0]: https://machinelearning.apple.com/research/neural-engine-tra...

[1]: https://github.com/Anemll/Anemll/blob/4bfa0b08183a437e759798...

[2]: https://stephenpanaro.com/blog/kv-cache-for-neural-engine
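To make the first bullet concrete: a linear layer applied to every token is mathematically the same as a 1x1 convolution over a channels-first layout, which is why the rewrite does not change the model's outputs. The following is only a minimal pure-Python sketch of that equivalence, not the code from the repo [1] or from Apple's post [0]. The function names, the toy weights, and the (C, 1, S) layout used here are illustrative assumptions.

```python
def linear(x, weight, bias):
    """Apply y[o] = sum_i weight[o][i] * x[i] + bias[o] to one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def conv2d_1x1(x, weight, bias):
    """1x1 convolution over a (C_in, H, W) nested list, returning (C_out, H, W)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for row, b in zip(weight, bias):
        plane = [
            [sum(row[c] * x[c][i][j] for c in range(c_in)) + b for j in range(w)]
            for i in range(h)
        ]
        out.append(plane)
    return out


# Toy sizes (assumed for illustration): 3 input features, 2 output features, 4 tokens.
weight = [[1.0, 2.0, 3.0], [-1.0, 0.5, 0.0]]  # shape (C_out, C_in)
bias = [0.5, -0.5]
tokens = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 3.0, 3.0], [-1.0, 2.0, 0.5]]

# Linear path: apply the layer to each token vector, giving shape (S, C_out).
linear_out = [linear(t, weight, bias) for t in tokens]

# Conv path: lay the same tokens out channels-first as (C_in, 1, S).
x_chw = [[[t[c] for t in tokens]] for c in range(3)]
conv_out = conv2d_1x1(x_chw, weight, bias)  # shape (C_out, 1, S)

# Transpose the conv result back to (S, C_out) so the two paths can be compared.
conv_as_tokens = [[conv_out[o][0][s] for o in range(2)] for s in range(4)]
assert conv_as_tokens == linear_out
```

In a PyTorch model the same idea would amount to reshaping a linear weight of shape (out, in) to (out, in, 1, 1) and feeding channels-first activations. Any speedup on ANE comes from the hardware, not from the math, which is identical.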

thadk 11 months ago

Sounds like M2-era chips onward have bfloat16: https://eclecticlight.co/2024/01/13/how-m1-macs-may-lag-behi...

anemll 11 months ago

Yes for GPU, however ANE only supports FP16 plus integers. M4/A17 added accelerated int8 that is twice as fast as FP16.
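For anyone wondering why FP16-only compute leads to the activation overflows mentioned upthread: the largest finite float16 value is 65504, whereas bfloat16 keeps float32's exponent range. Here is a rough stdlib-only sketch of the failure mode. The `to_fp16` helper and the scaling workaround are hypothetical illustrations, not something taken from coremltools or the Anemll repo.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(value):
    """Round a Python float to float16 precision; return +/-inf on overflow."""
    try:
        # The "e" format is IEEE 754 binary16; struct raises if the value is too large.
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


activation = 300.0

# 300 fits in float16, but its square (90000) exceeds FP16_MAX and overflows.
in_range = to_fp16(activation)
overflowed = to_fp16(activation * activation)

# Hypothetical mitigation (an assumption, not the thread's method): scale the
# activation down before the risky op, then account for the scale afterwards.
scale = 1.0 / 16.0
scaled_square = to_fp16(to_fp16(activation * scale) ** 2)

print(in_range, overflowed, math.isfinite(scaled_square))
```

The scaled result stays finite at the cost of some precision, so where to put such a rescale (if at all) is a per-model judgment call.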
