7B vs 70B parameters, I think. The small ones fit in the memory of consumer-grade cards. That's more or less all I know (still waiting for my new computer to arrive this week)
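For a rough sense of why the small ones fit: a back-of-the-envelope sketch, counting weights only (real usage adds KV cache and runtime overhead; the bytes-per-weight figures are the usual assumptions for each precision):

    # Rough VRAM needed just to hold the weights, in GiB.
    def vram_gb(params_billion, bytes_per_param):
        return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

    for params in (7, 70):
        for label, bpp in (("fp16", 2), ("int8", 1), ("int4", 0.5)):
            print(f"{params}B @ {label}: ~{vram_gb(params, bpp):.0f} GB")

    # 7B:  ~13 GB fp16, ~7 GB int8, ~3 GB int4 -> fits a single consumer GPU
    # 70B: ~130 GB fp16, ~33 GB int4 -> multiple GPUs or heavy quantization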


How many parameters did ChatGPT have in Dec 2022 when it first broke into mainstream news?


GPT-3 had 175B parameters, and the original ChatGPT was probably just a GPT-3 finetune (although they called it GPT-3.5, so it could have been different). However, it was severely undertrained. Llama-3.1-8B is better in most ways than the original ChatGPT, and a well-trained ~70B usually feels GPT-4-level. The latest Llama release, Llama-3.3-70B, goes toe-to-toe even with much larger models (albeit bad at coding, like all Llama models so far; that's not inherent to the size, since Qwen is good at it, so I'm hoping the Llama 4 series is trained on more coding tokens).


> However, it was severely undertrained

By modern standards. At the time, it was trained according to the neural scaling laws OpenAI believed to hold.


Sure, at the time Chinchilla-style scaling wasn't yet understood. Nonetheless it was severely undertrained, even if they didn't know it back then.
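Back-of-the-envelope, using the published GPT-3 numbers (~175B parameters, ~300B training tokens) and Chinchilla's rough 20-tokens-per-parameter rule of thumb:

    params = 175e9                 # GPT-3 parameter count
    tokens_seen = 300e9            # tokens GPT-3 was actually trained on (per the paper)
    tokens_optimal = 20 * params   # Chinchilla rule of thumb: ~20 tokens per parameter

    print(f"Chinchilla-optimal: ~{tokens_optimal / 1e12:.1f}T tokens")
    print(f"Actually trained on: ~{tokens_seen / 1e9:.0f}B tokens "
          f"(~{tokens_optimal / tokens_seen:.0f}x less than optimal)")

    # -> roughly 3.5T tokens optimal vs ~300B seen, i.e. about 12x short.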


I don't think that's ever been shared, but its predecessor, GPT-3 Da Vinci, was 175B.

One of the most exciting trends of the past year has been models getting dramatically smaller while maintaining similar levels of capability.



