
Granite Rapids or Sapphire Rapids CPUs are very underrated for MoE inference workloads. But you still need a GPU for the KV cache.

Plus, many boards also support CXL for RAM expansion over PCIe 5.0!

Source: building a hybrid inference business for regulated industry workloads.
