Hacker News

> gpt-3.5-turbo is generally considered to be about 20B params. An 8B model does not exceed it.

The industry has moved past the old Chinchilla scaling regime, and with it the conviction that LLM capability is dictated mainly by parameter count. OpenAI never disclosed how much pretraining data went into GPT-3.5-Turbo, but GPT-3 was trained on roughly 300 billion tokens of text. In contrast, Llama 3.1 was trained on 15 trillion tokens.
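To put those numbers in perspective, here is a back-of-envelope sketch using the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter (the exact ratio is an approximation, not a figure from this thread):

```python
# Rough comparison of training-token budgets, assuming the ~20 tokens/param
# Chinchilla heuristic. All figures are order-of-magnitude estimates.

def chinchilla_optimal_tokens(params: float) -> float:
    """Approximate compute-optimal token count under the Chinchilla heuristic."""
    return 20 * params

llama_8b_params = 8e9
llama_8b_tokens = 15e12  # Llama 3.1 pretraining corpus: 15 trillion tokens

optimal = chinchilla_optimal_tokens(llama_8b_params)  # 1.6e11 (160B tokens)
ratio = llama_8b_tokens / optimal                     # ~94x the heuristic
print(f"Chinchilla-optimal: {optimal:.1e} tokens")
print(f"Actual: {llama_8b_tokens:.1e} tokens ({ratio:.0f}x the heuristic)")
```

Training an 8B model on nearly 100x the Chinchilla-optimal token count is exactly the kind of "over-training" that lets small models punch above their parameter count.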

Objectively, Llama 3.1 8B and other small models have surpassed GPT-3.5-Turbo on benchmarks and in human-preference scores.

> Is a $8000 MBP regular consumer hardware?

As user `bloomingkales` notes elsewhere in the thread, a $499 Mac Mini can run 8B-parameter models. An $8,000 expenditure is not required.
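The memory arithmetic bears this out. A rough sketch of the weight footprint of an 8B model at common quantization levels (ignoring KV-cache and runtime overhead, which add a bit more):

```python
# Approximate on-disk/in-memory size of model weights at a given bit-width.
# Overheads (KV cache, activations, runtime) are not included.

def model_size_gb(params: float, bits_per_weight: float) -> float:
    return params * bits_per_weight / 8 / 1e9

params = 8e9
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {model_size_gb(params, bits):.1f} GB")
# fp16: 16.0 GB, 8-bit: 8.0 GB, 4-bit: 4.0 GB
```

At a common 4-bit quantization, the weights come to about 4 GB, which fits comfortably in the unified memory of an entry-level Mac Mini.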


